This tutorial on taking Elasticsearch snapshots using curator is divided into sections. One obvious section is how to take snapshots. The other, less obvious part covers configuring a shared directory using the Network File System (NFS) on Linux. I will be using a RHEL 7 based cluster of three machines for this tutorial. Once you are done with the basics I outline here, you should start using curator to manage your aliases as my next post details.
As usual I will start with WHY followed by HOW.
WHY
You want to take backups. If you are running an ELK stack then sooner or later you will have old logs which you want to archive to free up space on your cluster. When you upgrade your cluster you have to take snapshots before doing anything. And there is always that hardware failure scenario.
HOW
You can take Elasticsearch snapshots in many ways. The simplest is via curl commands. But it is better to use the tool provided by Elastic. It is called ….. drum roll please ….. Curator.
Steps to install curator on a RHEL/CentOS machine
Some housekeeping work first. Since Elasticsearch is evolving rapidly, you should check the latest instructions here.
sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
Create a curator.repo file
sudo vi /etc/yum.repos.d/curator.repo
and put this content in it
[curator-5]
name=CentOS/RHEL 7 repository for Elasticsearch Curator 5.x packages
baseurl=http://packages.elastic.co/curator/5/centos/7
gpgcheck=1
gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1
Actual installation
sudo yum install elasticsearch-curator
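Once the install finishes, a quick sanity check confirms the binary is on your PATH and shows which version you got:

curator --version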
Unlike other tools, it does not come with a config file already prepared and ready for you to change.
So you have to create a config file, curator.yml, yourself. A great starting template is located here. Just change the hosts and the port and you will be good to go. If curator is running on the same machine as Elasticsearch, you really can use this file as it is.
To make life easier, store this file in the ~/.curator location. Otherwise you have to pass the file location using the --config option every time you run the tool to take Elasticsearch snapshots. And who wants to do that? Not me.
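Just to show what you would be signing up for, the explicit form looks like this (both paths are placeholders):

curator --config /path/to/curator.yml /path/to/action_file.yml
curator_cli --config /path/to/curator.yml show_indices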
So create a directory
mkdir ~/.curator
Create a file curator.yml in it.
vi ~/.curator/curator.yml
Put this into the file.
# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
client:
  hosts:
  port:
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile: /home/elastic/logs/curator.log
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
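If curator runs on a different machine than Elasticsearch, the client section needs real values. A minimal sketch, assuming a node reachable at the hypothetical address 10.0.0.5 on the default port:

client:
  hosts:
    - 10.0.0.5
  port: 9200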
Curator needs somewhere to dump its logs. Hence the /home/elastic/logs folder which you see in the config above; the log file itself, curator.log, will be created inside it.
Create the folder and give it the necessary permissions. Though I am logged in as the elastic user (belonging to the group elk), I am doing this explicitly. The most common problem when setting up Elasticsearch snapshots is folder permissions, hence I am making it very visible here.
mkdir -p /home/elastic/logs
cd /home/elastic
sudo chown -R elastic:elk logs
Now check if everything is working fine.
curator_cli show_indices
Once things are working fine, it's time to do something useful. Curator can perform a variety of tasks on your indices. These are called Actions, and the full list is here.
You pass curator the actions via an action file, whose location you give on the command line. One nice thing about the tool is the dry-run option, which lets you do a test run safely without actually changing anything in the cluster.
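A dry run is just a flag on the normal invocation (the action file path here is a placeholder):

curator --dry-run /path/to/action_file.yml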
You can see a whole list of sample action files at this location.
I will pick up the snapshot action file and change it to suit my needs. Then I can take a snapshot of a given series of indices. Elasticsearch snapshots … here I come.
Tedious networking stuff
You have to create a shared directory for the nodes first. This is the hard part, where a lot of networking issues can trip you up. The short story is that you need a location which is visible to all the nodes in the cluster, and these nodes should have read/write permissions on that shared location. Can't stress this point enough. The idea is that when the snapshot command is issued, every node dumps the part of the index it holds onto the shared location. If you already have this sorted out then skip this section by clicking here.
I will use the Network File System (NFS) on RHEL 7 to create a shared directory. Then I will create a folder on each node; the folder name and path will be the same everywhere. And I will mount the shared directory on each node at that particular folder.
Installing the needed software (I will keep this brief. I used this site. In case anything fails, refer to this one or Google).
On the machine where the NFS server will be running and the shared folder will be located.
sudo yum install nfs-utils libnfsidmap
sudo systemctl enable rpcbind
sudo systemctl enable nfs-server
sudo systemctl start rpcbind
sudo systemctl start nfs-server
sudo systemctl start rpc-statd
sudo systemctl start nfs-idmapd
Now you create the directory to share with the clients.
mkdir /home/elastic/backups
sudo chown -R elastic:elk /home/elastic/backups
Modify /etc/exports
sudo vi /etc/exports
and put this in it.
/home/elastic/backups *(rw,sync,no_root_squash)
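The * exports the directory to any host that can reach the server. If you prefer to be stricter, you could limit the export to your cluster's subnet; a sketch, assuming the hypothetical subnet 192.168.1.0/24:

/home/elastic/backups 192.168.1.0/24(rw,sync,no_root_squash)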
Then export the shared directory.
sudo exportfs -r
On the cluster nodes some installs and configuration are needed too.
Let us create a folder on each of the nodes.
mkdir -p /home/elastic/mount/backups
sudo chown -R elastic:elk /home/elastic/mount/backups
We will mount the shared directory at this location.
NFS related software installs
sudo yum -y install nfs-utils libnfsidmap
sudo systemctl enable rpcbind
sudo systemctl start rpcbind
Check if the exported directory is visible on the client.
showmount -e ServerHostingNFS
This should show
Export list for ServerHostingNFS:
/home/elastic/backups *
You want the mounting of this shared directory to happen automatically on the clients when a reboot happens, because reboots happen.
Open /etc/fstab
sudo vi /etc/fstab
and add line
ServerHostingNFS:/home/elastic/backups /home/elastic/mount/backups nfs defaults 0 0
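An optional variation: the _netdev mount option tells the system to wait for the network before attempting the mount, which can avoid boot-time race conditions on some setups; whether you need it depends on your environment:

ServerHostingNFS:/home/elastic/backups /home/elastic/mount/backups nfs defaults,_netdev 0 0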
You want to check that the auto mount works. One option is to reboot the machine and then check. But if you have any syntax mistakes in /etc/fstab there is a chance the machine might not boot up, which is an issue when the machine is remote. So it is better to unmount the shared folder on the client machine and then just do a mount; the shared folder should get mounted automatically.
sudo umount /home/elastic/mount/backups
sudo mount -a
mount
You should see that the shared directory is mounted.
ServerHostingNFS:/home/elastic/backups on /home/elastic/mount/backups type nfs4 .....blah blah
Check if the mount is writeable after the automount
touch /home/elastic/mount/backups/test
You should be able to see the file on all the nodes and inside the shared directory on the ServerHostingNFS. Try out create and delete combinations to find out if there are any permission issues, as sketched below. Reboot the nodes and see if the automounting happens.
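A minimal version of that check (run the touch on one node and the ls everywhere else; the test- file names are just for illustration):

# On one node, drop a file stamped with that node's hostname:
touch /home/elastic/mount/backups/test-$(hostname)
# On every other node (and on ServerHostingNFS), it should show up:
ls -l /home/elastic/mount/backups/
# Clean up when you are done:
rm /home/elastic/mount/backups/test-*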
Once you have the shared directory sorted out, the rest of the stuff is actually easy.
Configuring Elasticsearch
You have to add an entry in the elasticsearch.yml
Open the file
sudo vim /etc/elasticsearch/elasticsearch.yml
and add
path.repo: ["/home/elastic/mount/backups"]
Then restart Elasticsearch on each of the nodes.
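On an RPM-based install the service is managed by systemd, so the restart on each node would typically be:

sudo systemctl restart elasticsearch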
If you have any issues, Elasticsearch will refuse to start up. Go through the logs to find the issue. Most of the time it is because of permissions.
Now you have to create a repository in Elasticsearch and map it to the location where the shared file system is mounted.
Use a curl command in the Linux terminal:
curl -XPUT 'yourelasticserverip:9200/_snapshot/logs_backup' -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": {
    "location": "/home/elastic/mount/backups",
    "compress": true
  }
}'
Now in Elasticsearch I have registered a repository named logs_backup. All the nodes will dump their data to /home/elastic/mount/backups, which actually refers to the /home/elastic/backups shared file system.
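Before moving on, you can read the repository definition back, and ask Elasticsearch to verify that all nodes can actually write to it:

curl -XGET 'yourelasticserverip:9200/_snapshot/logs_backup?pretty'
curl -XPOST 'yourelasticserverip:9200/_snapshot/logs_backup/_verify?pretty'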
Now we have to have an action file for curator to work with. Let us call it action_snapshot.yml and put this content in it.
actions:
  1:
    action: snapshot
    description: >-
      Snapshot log-production- prefixed indices older than 1 day (based on
      index creation_date) with the default snapshot name pattern of
      'curator-%Y%m%d%H%M%S'. Wait for the snapshot to complete. Do not
      skip the repository filesystem access check. Use the other options
      to create the snapshot.
    options:
      repository: logs_backup
      # Leaving name blank will result in the default
      # 'curator-%Y%m%d%H%M%S'
      name: ProductionLogs-%Y%m%d%H%M%S
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: log-production-
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 1
Line 11: We pass the name of the repository we registered earlier.
Line 14: The name of the snapshot. Note how it will be suffixed with time information.
Line 24: The index pattern to match. Here it will pick up all the indices whose names start with log-production-.
Now you can take Elasticsearch snapshots!! If you are a sane person you will want to do a dry run first. Not me.
curator action_snapshot.yml
Output is something like this
2017-08-02 17:08:13,897 INFO      Preparing Action ID: 1, "snapshot"
2017-08-02 17:08:13,905 INFO      Trying Action ID: 1, "snapshot": Snapshot log-production- prefixed indices older than 1 day (based on index creation_date) with the default snapshot name pattern of 'curator-%Y%m%d%H%M%S'. Wait for the snapshot to complete. Do not skip the repository filesystem access check. Use the other options to create the snapshot.
2017-08-02 17:08:13,991 INFO      Creating snapshot "ProductionLogs-20170802070813" from indices: ['log-production-2017.06', 'log-production-2017.07']
2017-08-02 17:08:14,049 INFO      Snapshot ProductionLogs-20170802070813 still in progress.
2017-08-02 17:08:23,061 INFO      Snapshot ProductionLogs-20170802070813 still in progress.
2017-08-02 17:08:32,072 INFO      Snapshot ProductionLogs-20170802070813 still in progress.
2017-08-02 17:08:41,103 INFO      Snapshot ProductionLogs-20170802070813 successfully completed.
2017-08-02 17:08:41,104 INFO      Action ID: 1, "snapshot" completed.
2017-08-02 17:08:41,104 INFO      Job completed.
Go and take a peek at the shared location, /home/elastic/backups in our case. It should contain the backup files.
You can also issue a command on the terminal to see the snapshots.
curl -XGET 'http://yourserver:9200/_snapshot/logs_backup/_all?pretty'
Output is something like this
{ "snapshots" : [ { "snapshot" : "ProductionLogs-20170802070813", "uuid" : "bWjLfMTaSgWkbWTbxL1XTA", "version_id" : 5020299, "version" : "5.2.2", "indices" : [ "log-production-2017.07", "log-production-2017.06" ....blah..... ....blah..... } ] }
Now with that done only one thing is left: do a restore using the Elasticsearch snapshots you have taken. How you verify it is something you have to decide. For me it is simple. Since I am working with test data I will count the number of documents in the indices whose snapshot was taken. Then I will delete the indices. Then restore. And if the document count matches the initial one, I know the restore worked.
Count of initial docs
curl -XGET 'http://yourserver:9200/log-production-*/_stats?pretty'
Output
{ "_shards" : { "total" : 20, "successful" : 20, "failed" : 0 }, "_all" : { "primaries" : { "docs" : { "count" : 5368390, "deleted" : 0 }, "store" : { "size_in_bytes" : 1496195150, "throttle_time_in_millis" : 0 ....... .......
Document count is 5368390
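If you happen to have jq installed, you can pull out just that number instead of eyeballing the JSON (the index pattern matches the one used above):

curl -s 'http://yourserver:9200/log-production-*/_stats' | jq '._all.primaries.docs.count'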
Then a delete
curl -XDELETE 'http://yourserver:9200/log-production-*?pretty'
To restore you need an action file.
I will create an action file “action_snapshot_restore.yml”
actions:
  1:
    action: restore
    description: >-
      Restore all indices in the most recent curator-* snapshot with state
      SUCCESS. Wait for the restore to complete before continuing. Do not
      skip the repository filesystem access check. Use the other options to
      define the index/shard settings for the restore.
    options:
      repository: logs_backup
      # If name is blank, the most recent snapshot by age will be selected
      name: ProductionLogs-20170803003417
      # If indices is blank, all indices in the snapshot will be restored
      indices:
      include_aliases: False
      ignore_unavailable: False
      include_global_state: False
      partial: False
      rename_pattern:
      rename_replacement:
      extra_settings:
      wait_for_completion: True
      skip_repo_fs_check: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: ProductionLogs-
    - filtertype: state
      state: SUCCESS
Line 10: I specify the repository to be used.
Line 12: I choose the snapshot. This is useful if you want to restore the indices only to a point in the past. To restore the indices to the present, leave it blank; Elasticsearch will use the latest snapshot.
Line 28: I want to work with the Elasticsearch snapshots whose names begin with ProductionLogs-.
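A side note: if you ever want to restore without deleting the originals first, the rename_pattern and rename_replacement options (left empty above) can rewrite index names on the way in. A sketch, with a hypothetical restored_ prefix:

rename_pattern: 'log-production-(.+)'
rename_replacement: 'restored_log-production-$1'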
curator action_snapshot_restore.yml
Output
2017-08-03 12:54:33,491 INFO      Preparing Action ID: 1, "restore"
2017-08-03 12:54:33,499 INFO      Trying Action ID: 1, "restore": Restore all indices in the most recent curator-* snapshot with state SUCCESS. Wait for the restore to complete before continuing. Do not skip the repository filesystem access check. Use the other options to define the index/shard settings for the restore.
2017-08-03 12:54:33,514 INFO      Restoring indices "['log-production-2017.07', 'log-production-2017.06']" from snapshot: ProductionLogs-20170803003417
2017-08-03 12:54:33,586 INFO      _recovery returned an empty response. Trying again.
2017-08-03 12:54:42,611 INFO      Index "log-production-2017.07" is still in stage "INDEX"
2017-08-03 12:54:51,630 INFO      Index "log-production-2017.07" is still in stage "INDEX"
2017-08-03 12:55:00,646 INFO      Index "log-production-2017.07" is still in stage "INDEX"
2017-08-03 12:55:09,664 INFO      Index "log-production-2017.07" is still in stage "INDEX"
2017-08-03 12:55:18,674 INFO      Action ID: 1, "restore" completed.
2017-08-03 12:55:18,674 INFO      Job completed.
A quick curl command to check if the Elasticsearch snapshot restore worked. See the count of documents restored.
curl -XGET 'http://yourserver:9200/log-production-*/_stats?pretty'
Output
{ "_shards" : { "total" : 20, "successful" : 20, "failed" : 0 }, "_all" : { "primaries" : { "docs" : { "count" : 5368390, "deleted" : 0 }, "store" : { "size_in_bytes" : 1496195150, "throttle_time_in_millis" : 0 }, ...... ......
The count is spot on. You tamed the Elasticsearch snapshots. Now you are ready to take your curator skills to the next level. Start managing your aliases with curator.