Archive Elasticsearch Indices to Azure Blob Storage using the Azure Cloud Plugin

Published on April 19, 2016

Azure Solution Architect, Cloud and Enterprise

The Azure Cloud plugin for Elasticsearch integrates your Elasticsearch environment with Azure. One of its capabilities is snapshotting indices to Azure Blob Storage and restoring them from it, which gives you a cost-effective and highly available option for recovering your indices. It can also serve as a way to archive indices that are no longer in use but that you would like to be able to access in the future. In this document, we will walk through the deployment of an Elasticsearch cluster on Azure VMs with the newly integrated Azure Cloud plugin and detail the steps to snapshot and restore an index for archival purposes.

Setup

Create a storage account

One of the first things you will need is a storage account to use as a target for your snapshots. In the Azure Portal, select New, Data + Storage, Storage account and create a new storage account using the redundancy option that best suits your requirements (LRS, ZRS, GRS or RA-GRS).

Once it is created, copy the storage account name and one of the access keys, as we will need them when we deploy our Elasticsearch cluster. To get this information, open your storage account, expand All settings and select Access keys. You'll find the name of the storage account as well as the access keys. Click the copy button to copy each value to your clipboard, and paste them into a Notepad document so they are easy to find later on.

Build an Elasticsearch cluster with the Azure Cloud plugin installed

The Azure Quickstart Template for Elasticsearch has recently been updated to include the capability to install and configure the Azure Cloud plugin. To deploy a cluster, go to the GitHub template link and click the Deploy to Azure button. This will bring you back to the Azure Portal where you can deploy a cluster into your Azure subscription. You can customize the other parameters that pertain to your environment or take the defaults.

We recommend installing the latest version of Elasticsearch (2.3.1 at the time of this writing). The template has some additional input parameters for the plugin (CLOUDAZURE, CLOUDAZURESTORAGEACCOUNT and CLOUDAZURESTORAGEKEY) that let you install and configure it automatically. Make sure you select yes to install the plugin (it is set to no by default), then enter the name and access key of the storage account you created earlier.

This plugin implementation adds a couple of additional lines of configuration to the elasticsearch.yml file on each node in the cluster, specifying the storage account and key. Full information on the allowed parameters can be found on the Elasticsearch Azure Repository documentation page.

Once you have deployed your cluster, you are ready to create an index and go through the snapshot process.

Create and populate your index

For demonstration purposes, we are going to create a basic index of data to walk through the steps required to snapshot and restore. You will use the Sense UI deployed with your cluster to do this, but you can also use curl or any other utility that is capable of sending standard HTTP requests. Sense is deployed on the Kibana server. To get the URL, open the parent resource group that contains your cluster and click on the Last deployment link.

Select Kibana and you will find the KIBANA-URL output from the deployment; click the copy button to copy the URL to your clipboard.

The URL for Sense will be http://X.X.X.X:5601/app/sense, substituting your IP address. In Sense, update the server URL to one of the internal addresses of the Elasticsearch nodes.

Now we can create an index called myindex with a few documents. Copy/paste the following HTTP requests into Sense and run each command by clicking the green arrow (the cursor will need to be in each command for the arrow to show up).

PUT myindex
{
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 0
  }
}
POST /myindex/documents
{
  "name" : "document1",
  "content" : "Some example content for document1"
}
POST /myindex/documents
{
  "name" : "document2",
  "content" : "Some example content for document2"
}
POST /myindex/documents
{
  "name" : "document3",
  "content" : "Some example content for document3"
}
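
If you would rather script these requests than paste them into Sense, the same calls can be built with any HTTP client. The following is a minimal Python sketch that uses only the standard library. The node address in ES_URL and the helper names build_request and index_setup_requests are assumptions made for illustration, and the index settings are nested under an explicit settings key. The script only builds and prints the requests; nothing is sent until you pass a request to urllib.request.urlopen.

```python
import json
import urllib.request

# Assumed address of one Elasticsearch node; replace it with your own.
ES_URL = "http://10.0.0.10:9200"


def build_request(method, path, body=None, base_url=ES_URL):
    """Build (but do not send) an HTTP request equivalent to a Sense command."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/" + path.lstrip("/"),
        data=data,
        method=method,
    )
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def index_setup_requests(index="myindex", doc_type="documents", count=3):
    """Return the create-index request followed by one POST per document."""
    requests = [
        build_request("PUT", index, {
            "settings": {"number_of_shards": 1, "number_of_replicas": 0}
        })
    ]
    for i in range(1, count + 1):
        name = "document%d" % i
        requests.append(build_request("POST", "%s/%s" % (index, doc_type), {
            "name": name,
            "content": "Some example content for " + name,
        }))
    return requests


if __name__ == "__main__":
    for request in index_setup_requests():
        # Printing keeps the sketch free of side effects. To send a request
        # for real, call urllib.request.urlopen(request) against a live node.
        print(request.get_method(), request.full_url)
```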

Now, run the following commands and you will be able to see the newly created index and search against it to see the entries we just added:

GET _cat/indices?v
GET myindex/_search

Now that we have an index, we will go through the process to archive with a snapshot.

Snapshot and archive an index

Create a snapshot repository

The first thing we need to do is create a repository. A repository is a logical container that points to the location where snapshots should be stored. In this case, we are going to store our snapshots in Azure. Set up a new repository by running the following in Sense:

PUT _snapshot/myrepository
{
  "type" : "azure"
}

This creates a new repository called myrepository using the parameters for your storage account that are specified in the elasticsearch.yml file.

Take a snapshot

Now take a snapshot of the index:

PUT /_snapshot/myrepository/myindexsnapshot
{
  "indices" : "myindex",
  "include_global_state" : false
}

We have specified a couple of options here. The first is indices, which lets us specify the index (or several indices separated by commas) that we want to include in this snapshot. The second is include_global_state, which we set to false so that the snapshot contains only the index we want to archive and not the global state of the cluster. You can monitor the progress of the snapshot with the following request:

GET _snapshot/myrepository/myindexsnapshot/_status

When the snapshot has completed, its state should be SUCCESS.
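
For a scripted archive job, you may want to wait for that state rather than check it by hand. Below is a rough Python sketch of a polling loop. It assumes the parsed _status response holds a snapshots list whose entries carry a state field, and that INIT and STARTED are the in-progress state names, so check both against your Elasticsearch version. The fetch_status callable is hypothetical: it stands in for whatever code you use to issue the GET request and parse the JSON.

```python
import time


def snapshot_state(status_response):
    """Return the state of the first snapshot in a parsed _status response."""
    snapshots = status_response.get("snapshots", [])
    return snapshots[0]["state"] if snapshots else None


def wait_for_snapshot(fetch_status, interval_seconds=5.0, max_attempts=60,
                      sleep=time.sleep):
    """Poll until the snapshot leaves its in-progress states.

    fetch_status is a hypothetical callable that returns the parsed JSON of
    GET _snapshot/<repository>/<snapshot>/_status.
    """
    # Assumed in-progress values; None covers an empty snapshots list.
    in_progress = {None, "INIT", "STARTED"}
    for _ in range(max_attempts):
        state = snapshot_state(fetch_status())
        if state not in in_progress:
            return state
        sleep(interval_seconds)
    raise TimeoutError("snapshot did not finish in time")
```

The function returns whatever final state it sees, so the caller should still compare the result with SUCCESS before treating the index as archived.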

If you take a look in your storage account you will see the files associated with your index in a blob container, which by default is called elasticsearch-snapshots. This is configurable by adding the cloud.azure.storage.default.container parameter to your elasticsearch.yml config file with the name of the container you'd like to store the snapshots in, or by passing the container option on the HTTP request.
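
To illustrate the second approach, here is a small Python sketch that builds the body for the PUT _snapshot/myrepository request with the container option set. The helper name azure_repository_body and the container name myindex-archive are made up for this example. The base_path option is an assumption taken from the plugin's repository settings and is not covered in this walkthrough, so confirm it against the documentation for your plugin version.

```python
import json


def azure_repository_body(container=None, base_path=None):
    """Build the JSON body used to register an Azure snapshot repository."""
    settings = {}
    if container is not None:
        # Overrides the default elasticsearch-snapshots container.
        settings["container"] = container
    if base_path is not None:
        # Assumed option: a path prefix inside the container.
        settings["base_path"] = base_path
    body = {"type": "azure"}
    if settings:
        body["settings"] = settings
    return body


if __name__ == "__main__":
    # "myindex-archive" is a hypothetical container name.
    print(json.dumps(azure_repository_body(container="myindex-archive"), indent=2))
```

Called with no arguments, the helper returns the same body used earlier, so the repository falls back to the default container.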

Delete the index

Now that we have a snapshot of the index, we can close the index to ensure nothing is being written to it and then remove it from the cluster.

POST myindex/_close
DELETE myindex

At this point the index is no longer on the cluster and is not consuming any resources other than the Azure Blob storage that the snapshot is using.
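
Because deleting the index is irreversible without a good snapshot, an automated job should only issue these two requests after the snapshot has reported SUCCESS. A minimal Python sketch of that guard follows. The function name removal_requests is hypothetical, and the snapshot state is assumed to have been read from the _status request shown earlier.

```python
def removal_requests(index, snapshot_state):
    """Return the close and delete requests, but only for a good snapshot."""
    if snapshot_state != "SUCCESS":
        raise ValueError(
            "refusing to delete %r: snapshot state is %r" % (index, snapshot_state)
        )
    # Same order as in Sense: close the index first, then delete it.
    return [("POST", index + "/_close"), ("DELETE", index)]


if __name__ == "__main__":
    for method, path in removal_requests("myindex", "SUCCESS"):
        print(method, path)
```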

Restore an index

Restoring an index to the same cluster is a straightforward process. Since the repository is already registered (assuming that it has not been unregistered), simply POST a restore request that names the repository and the snapshot you want to restore:

POST _snapshot/myrepository/myindexsnapshot/_restore

You can also restore the snapshot to a different cluster. Supposing we had a second cluster that was configured to use the same Azure Cloud plugin settings, we would need to register the repository with the cluster:

PUT _snapshot/myarchiverepository
{
  "type":"azure"
}

Note that the name of the repository is different. This is simply to demonstrate that the name of the repository does not need to match the name on the cluster where the snapshot was taken, but the name of the snapshot itself will need to match. Now that the repository is registered, we can restore the index:

POST _snapshot/myarchiverepository/myindexsnapshot/_restore
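
The restore request also accepts a body, which is useful when a snapshot holds several indices and you only want some of them back. The Python sketch below builds such a request with the indices and include_global_state options, mirroring the options used when the snapshot was taken. The helper name restore_request is an assumption for illustration, and the function only returns the method, path and body rather than contacting a cluster.

```python
def restore_request(repository, snapshot, indices=None,
                    include_global_state=False):
    """Return (method, path, body) for a snapshot restore call."""
    path = "_snapshot/%s/%s/_restore" % (repository, snapshot)
    body = {"include_global_state": include_global_state}
    if indices:
        # The API takes a comma-separated string of index names.
        body["indices"] = ",".join(indices)
    return "POST", path, body


if __name__ == "__main__":
    print(restore_request("myarchiverepository", "myindexsnapshot", ["myindex"]))
```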

You can monitor recovery by running the following command, which will show details of the recovery activities on the cluster. Additional options are detailed in the Elasticsearch Indices Recovery documentation.

GET _cat/recovery?v

Run a search against the index, and you should see the entries that were added when we initially created the index:

GET myindex/_search
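
If the restore runs as part of a script, a quick sanity check is to compare the number of hits with what you expect, which is three documents in this walkthrough. Here is a short Python sketch of that check, where total_hits is a hypothetical helper that reads the count from a parsed search response. It accepts hits.total either as a plain number, as returned by the Elasticsearch version used here, or as an object with a value field, as newer versions return.

```python
def total_hits(search_response):
    """Read the hit count from a parsed _search response."""
    total = search_response["hits"]["total"]
    # Newer Elasticsearch versions wrap the count in an object.
    return total["value"] if isinstance(total, dict) else total


if __name__ == "__main__":
    # Hand-written sample response, not real cluster output.
    sample = {"hits": {"total": 3, "hits": []}}
    print("restored documents:", total_hits(sample))
```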

Learn more

The Azure Cloud plugin for Elasticsearch provides a great option for archiving your Elasticsearch indices to low-cost Azure Blob storage, giving you the ability to reduce resources and expenses associated with maintaining indices that may be stale or no longer needed in an immediately online state. For more information on running Elasticsearch in Azure IaaS or on the Elasticsearch Azure Cloud plugin please visit the following links.

 

Special thanks to Hans Krijger and Harold Perry for their assistance with this post!