Taking Elasticsearch snapshots using Curator

By Pankaj K | August 7, 2017

This tutorial on taking Elasticsearch snapshots using Curator is divided into sections. One obvious section is how to take snapshots. A less obvious one covers configuring a shared directory using the Network File System (NFS) on Linux. I will be using a RHEL 7 based cluster of three machines for this tutorial. Once you are done with the basics I outline here, you should start using Curator to manage your aliases, as my next post details.

As usual I will start with WHY followed by HOW.

WHY


You want to take backups. If you are running an ELK stack then sooner or later you will have old logs which you want to archive to free up space on your cluster. When you upgrade your cluster you have to take snapshots before doing anything. And there is always the hardware failure scenario.

HOW

You can take Elasticsearch snapshots in many ways. The simplest is via curl commands. But it is better to use the tool provided by Elastic. It is called ..... drum roll please ..... Curator.

Steps to install curator on a RHEL/CentOS machine

Some housekeeping work first. Since Elasticsearch is evolving rapidly, you should check the latest install instructions in the official Curator documentation.

sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch

Create a curator.repo file

sudo vi /etc/yum.repos.d/curator.repo

and put this content in it

[curator-5]
name=CentOS/RHEL 7 repository for Elasticsearch Curator 5.x packages
baseurl=http://packages.elastic.co/curator/5/centos/7
gpgcheck=1
gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1

Actual installation

sudo yum install elasticsearch-curator

Unlike other tools, Curator does not come with a config file already prepared and ready for you to change.

So you have to create a config file, curator.yml, yourself. A great starting template is available in the Curator documentation. Just change the hosts and the port and you will be good to go. If Curator is running on the same machine as Elasticsearch, you really can use that template as it is.
To make life easier, store this file in the ~/.curator location. Otherwise you have to pass the file location using the --config option every time you run the tool to take Elasticsearch snapshots. And who wants to do that? Not me.
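
If you do keep the file somewhere else, every run needs the flag spelled out. A sketch, with placeholder paths:

curator --config /path/to/curator.yml /path/to/action_file.yml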

So create a directory

mkdir ~/.curator

Create a file curator.yml in it.

vi ~/.curator/curator.yml

Put this into the file.

# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"

client:
  hosts: 
  port: 
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile: /home/elastic/logs/curator.log
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
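
For reference, a minimal filled-in client section, assuming Curator runs on the same machine as a plain HTTP Elasticsearch listening on the default port, could look like this:

client:
  hosts:
    - 127.0.0.1
  port: 9200
  use_ssl: False
  ssl_no_validate: False
  timeout: 30
  master_only: False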

For its logs Curator needs a folder to write into; hence the /home/elastic/logs folder, with the logfile setting above pointing to a file inside it.

Create the folder and give it the necessary permissions. Though I am logged in as the elastic user (belonging to the group elk), I am doing this explicitly. The most common problem when setting up Elasticsearch snapshots is folder permissions, hence I am making it very visible here.

mkdir -p /home/elastic/logs
cd /home/elastic
sudo chown -R elastic:elk logs

Now check if everything is working fine.

curator_cli show_indices
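
If you want more than bare index names, show_indices also accepts a --verbose flag (index state, document counts, sizes), at least in Curator 5:

curator_cli show_indices --verbose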

Once things are working fine, it is time to do something useful. Curator can perform a variety of tasks on your indices. These are called actions, and the full list is in the Curator documentation.
You pass Curator the actions via an action file whose location you give on the command line. One nice thing about the tool is the --dry-run option, which lets you do a test run safely without actually changing anything in the cluster; a sketch follows below.
You can find a whole set of sample action files in the Curator examples.
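
As a sketch, a test run of an action file looks like this (the path is a placeholder):

curator --dry-run /path/to/action_file.yml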

I will pick up the snapshot action file and change it to suit my needs. Then I can take a snapshot of a given series of indices. Elasticsearch snapshots … here I come.

Tedious networking stuff
You have to create a shared directory for the nodes first, and this is the hard part where a lot of networking issues can trip you up. The short story is that you need a location which is visible to all the nodes in the cluster, and these nodes must have read/write permissions on that shared location. I can't stress this point enough. The idea is that when the snapshot command is issued, every node dumps the data for the part of the index it holds onto the shared location. If you already have this sorted out, skip ahead to the Configuring Elasticsearch section.

I will use the Network File System (NFS) on RHEL 7 to create a shared directory. Then I will create a folder on each node, with the same name and path, and mount the shared directory on each node at that particular folder.

Installing the needed software (I will keep this brief; if anything fails, refer to the NFS documentation for RHEL or Google around).

On the machine where the NFS server will be running and the shared folder will be located.

sudo yum install nfs-utils libnfsidmap
sudo systemctl enable rpcbind
sudo systemctl enable nfs-server
sudo systemctl start rpcbind
sudo systemctl start nfs-server
sudo systemctl start rpc-statd
sudo systemctl start nfs-idmapd

Now you create the directory to share with the clients.

mkdir /home/elastic/backups 
sudo chown -R elastic:elk /home/elastic/backups

Modify /etc/exports

sudo vi /etc/exports

and put this in it.

/home/elastic/backups *(rw,sync,no_root_squash)

Then export the shared directory.

sudo exportfs -r
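
To confirm the export is live, exportfs can list it back together with its options:

sudo exportfs -v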

On the cluster nodes some installs and configuration are needed too.
Let us create a folder on each of the nodes.

mkdir -p /home/elastic/mount/backups
sudo chown -R elastic:elk /home/elastic/mount/backups

We will mount the shared directory at this location.

NFS related software installs

sudo yum -y install nfs-utils libnfsidmap
sudo systemctl enable rpcbind
sudo systemctl start rpcbind

Check if the exported dir is visible on the client

showmount -e ServerHostingNFS

This should show

Export list for ServerHostingNFS:
/home/elastic/backups *

You want this shared directory to be mounted automatically on the clients when a reboot happens, because reboots happen.
Open /etc/fstab

sudo vi /etc/fstab

and add this line

ServerHostingNFS:/home/elastic/backups /home/elastic/mount/backups        nfs     defaults        0 0

You want to check that the auto mount works. One option is to reboot the machine and then check. But if you have any syntax mistakes in /etc/fstab, there is a chance the machine will not boot up, which can be a real issue when the machine is remote. So it is better to unmount the shared folder on the client machine and then just run a mount. The shared folder should get mounted automatically.

sudo umount /home/elastic/mount/backups
sudo mount -a
mount

You should see that the shared directory is mounted.

ServerHostingNFS:/home/elastic/backups on /home/elastic/mount/backups type nfs4 .....blah blah

Check if the mount is writable after the automount

touch /home/elastic/mount/backups/test

You should be able to see the file on all the nodes and inside the shared directory on ServerHostingNFS. Try out create and delete combinations to find out if there are any permission issues, as in the sketch below. Reboot the nodes and see if the automounting happens.
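
A quick round trip, with node2 standing in for any other node in the cluster (the hostname and passwordless ssh access are assumptions about your setup):

touch /home/elastic/mount/backups/test2
ssh node2 'ls -l /home/elastic/mount/backups/test2'
rm /home/elastic/mount/backups/test2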

Once you have the shared directory sorted out, the rest of the stuff is actually easy.

Configuring Elasticsearch
You have to add an entry to elasticsearch.yml
Open the file

sudo vim /etc/elasticsearch/elasticsearch.yml

and add

path.repo: ["/home/elastic/mount/backups"]

Then restart Elasticsearch on each of the nodes.
If there are any issues, Elasticsearch will refuse to start up. Go through the logs to find the problem; most of the time it is a permissions issue.
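
On an RPM based install like this one, the restart is simply:

sudo systemctl restart elasticsearch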

Now you have to create a repository in Elasticsearch and map it to the location where the shared file system is mounted.
Use a curl command in the Linux terminal:

curl -XPUT 'yourelasticserverip:9200/_snapshot/logs_backup' -H 'Content-Type: application/json' -d '{ "type": "fs", "settings": {"location": "/home/elastic/mount/backups","compress": true}}'

Elasticsearch now has a registered repository named logs_backup. All the nodes will dump their data to /home/elastic/mount/backups, which actually refers to the /home/elastic/backups shared file system.
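
Elasticsearch can also verify that every node is able to write to the repository; the snapshot API has a _verify endpoint for exactly this:

curl -XPOST 'yourelasticserverip:9200/_snapshot/logs_backup/_verify?pretty'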

Now we need an action file for Curator to work with. Let us call it action_snapshot.yml and put this content in it.

actions:
  1:
    action: snapshot
    description: >-
      Snapshot log-production- prefixed indices older than 1 day (based on index
      creation_date) with the default snapshot name pattern of
      'curator-%Y%m%d%H%M%S'.  Wait for the snapshot to complete.  Do not skip
      the repository filesystem access check.  Use the other options to create
      the snapshot.
    options:
      repository: logs_backup
      # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
      name: ProductionLogs-%Y%m%d%H%M%S
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: log-production-
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 1

The repository option passes the name of the repository we registered earlier.
The name option sets the name of the snapshot; see how it gets appended with time information.
The pattern filter controls which indices are matched; here it picks all the indices which start with log-production-.

Now you can take Elasticsearch snapshots!! If you are a sane person you will want to do a dry run first. Not me.

curator action_snapshot.yml

Output is something like this

2017-08-02 17:08:13,897 INFO      Preparing Action ID: 1, "snapshot"
2017-08-02 17:08:13,905 INFO      Trying Action ID: 1, "snapshot": Snapshot log-production- prefixed indices older than 1 day (based on index creation_date) with the default snapshot name pattern of 'curator-%Y%m%d%H%M%S'.  Wait for the snapshot to complete.  Do not skip the repository filesystem access check.  Use the other options to create the snapshot.
2017-08-02 17:08:13,991 INFO      Creating snapshot "ProductionLogs-20170802070813" from indices: ['log-production-2017.06', 'log-production-2017.07']
2017-08-02 17:08:14,049 INFO      Snapshot ProductionLogs-20170802070813 still in progress.
2017-08-02 17:08:23,061 INFO      Snapshot ProductionLogs-20170802070813 still in progress.
2017-08-02 17:08:32,072 INFO      Snapshot ProductionLogs-20170802070813 still in progress.
2017-08-02 17:08:41,103 INFO      Snapshot ProductionLogs-20170802070813 successfully completed.
2017-08-02 17:08:41,104 INFO      Action ID: 1, "snapshot" completed.
2017-08-02 17:08:41,104 INFO      Job completed.

Go and take a peek at the shared location, /home/elastic/backups in our case. It should contain the backup files.
You can also issue a command on the terminal to see the snapshots.

curl -XGET 'http://yourserver:9200/_snapshot/logs_backup/_all?pretty'

Output is something like this

{
  "snapshots" : [
    {
      "snapshot" : "ProductionLogs-20170802070813",
      "uuid" : "bWjLfMTaSgWkbWTbxL1XTA",
      "version_id" : 5020299,
      "version" : "5.2.2",
      "indices" : [
        "log-production-2017.07",
        "log-production-2017.06"
        ....blah.....
        ....blah.....
    }
  ]
}
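
Curator's own CLI can list the snapshots too, given the repository name (assuming the show_snapshots subcommand of Curator 5):

curator_cli show_snapshots --repository logs_backup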

Now with that done, only one thing is left: do a restore using the Elasticsearch snapshots you have taken. How you verify it is something you have to decide. For me it is simple. Since I am working with test data, I will count the number of documents in the indices whose snapshot was taken. Then I will delete the indices. Then restore. And if the count of documents matches the initial one, I know that the restore worked.

Count of initial docs

curl -XGET 'http://yourserver:9200/log-production-*/_stats?pretty'

Output

{
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "docs" : {
        "count" : 5368390,
        "deleted" : 0
      },
      "store" : {
        "size_in_bytes" : 1496195150,
        "throttle_time_in_millis" : 0
      .......
      .......

Document count is 5368390
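
If you only want the number, the _count API returns it without the rest of the stats (same placeholder server name as before):

curl -XGET 'http://yourserver:9200/log-production-*/_count?pretty'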

Then a delete

curl -XDELETE 'http://yourserver:9200/log-production-*?pretty'

To restore you need an action file too. I will create one called action_snapshot_restore.yml with this content.

actions:
  1:
    action: restore
    description: >-
      Restore all indices in the most recent curator-* snapshot with state
      SUCCESS.  Wait for the restore to complete before continuing.  Do not skip
      the repository filesystem access check.  Use the other options to define
      the index/shard settings for the restore.
    options:
      repository: logs_backup
      # If name is blank, the most recent snapshot by age will be selected
      name: ProductionLogs-20170803003417
      # If indices is blank, all indices in the snapshot will be restored
      indices:
      include_aliases: False
      ignore_unavailable: False
      include_global_state: False
      partial: False
      rename_pattern:
      rename_replacement:
      extra_settings:
      wait_for_completion: True
      skip_repo_fs_check: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: ProductionLogs-
    - filtertype: state
      state: SUCCESS

The repository option specifies the repository to be used.
The name option chooses the snapshot. This is useful if you want to restore the indices only up to a point in the past. To restore up to the present, leave it blank; Elasticsearch will use the latest snapshot.
The pattern filter restricts the candidates to Elasticsearch snapshots whose names begin with ProductionLogs-.

Will it work?
Time to push the button.

curator action_snapshot_restore.yml

Output

2017-08-03 12:54:33,491 INFO      Preparing Action ID: 1, "restore"
2017-08-03 12:54:33,499 INFO      Trying Action ID: 1, "restore": Restore all indices in the most recent curator-* snapshot with state SUCCESS.  Wait for the restore to complete before continuing.  Do not skip the repository filesystem access check.  Use the other options to define the index/shard settings for the restore.
2017-08-03 12:54:33,514 INFO      Restoring indices "['log-production-2017.07', 'log-production-2017.06']" from snapshot: ProductionLogs-20170803003417
2017-08-03 12:54:33,586 INFO      _recovery returned an empty response. Trying again.
2017-08-03 12:54:42,611 INFO      Index "log-production-2017.07" is still in stage "INDEX"
2017-08-03 12:54:51,630 INFO      Index "log-production-2017.07" is still in stage "INDEX"
2017-08-03 12:55:00,646 INFO      Index "log-production-2017.07" is still in stage "INDEX"
2017-08-03 12:55:09,664 INFO      Index "log-production-2017.07" is still in stage "INDEX"
2017-08-03 12:55:18,674 INFO      Action ID: 1, "restore" completed.
2017-08-03 12:55:18,674 INFO      Job completed.

A quick curl command to check if the Elasticsearch snapshot restore worked. See the count of documents restored.

curl -XGET 'http://yourserver:9200/log-production-*/_stats?pretty'

Output

{
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "docs" : {
        "count" : 5368390,
        "deleted" : 0
      },
      "store" : {
        "size_in_bytes" : 1496195150,
        "throttle_time_in_millis" : 0
      },
      ......
      ......

The count is spot on. You have tamed Elasticsearch snapshots. Now you are ready to take your Curator skills to the next level: start managing your aliases with Curator.

5 thoughts on "Taking Elasticsearch snapshots using Curator"

  1. Brendan

    Thanks for sharing! I have an index with the format mylog-2018.05.14-000002 and mylog-2018.05.14-1. Curator errors out saying it can't find any indices. I assume it is something with the timestring value. Any ideas?

    1. Pankaj K (post author)

      I will assume that you are trying to take a snapshot. I think you are rightly concerned about the timestring values. The example code I have put up uses filter chaining: a pattern match followed by another filter which uses time. So I would remove the time-based filter and see what happens.
      Something like replacing

          filters:
          - filtertype: pattern
            kind: prefix
            value: log-production-
          - filtertype: age
            source: creation_date
            direction: older
            unit: days
            unit_count: 1  
      

      with this

          filters:
          - filtertype: pattern
            kind: prefix
            value: log-production-
      
