ElasticSearch Sample Data

By | July 25, 2016

This ElasticSearch sample data is to be used for learning purposes only. It is randomly generated, but care has been taken to make it look like real-world data.

WHY

Many people who are on the path to learning ElasticSearch get stumped on one thing: ElasticSearch sample data.

Yes, indeed. Where is the large ElasticSearch sample data they can use to hone their ElasticSearch Kung-Fu? Well, you are at the right place.

HOW

Here is ElasticSearch sample data in the form of two formatted JSON data files that I created for myself for learning purposes.

Employees100K
Employees50K

One has records of 50,000 employees, while the other has 100,000.
Feel free to use this ElasticSearch sample data. However, I assume no responsibility for any damage that might/can/will/should result from that. 🙂

For newbies, here are the steps to load the data into your ElasticSearch cluster:
1–Install curl. I am using Linux, which usually ships with curl.
2–Download and extract the data files.
3–Run these commands to load the data. The first command creates an index with the right mapping. The second one loads the data. It might take some time.

4–Open the URL http://localhost:9200/companydatabase/_count?pretty=1 to check whether the data is there.
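
For readers who would rather script the load-and-check steps than type them by hand, here is a rough Python sketch. It is not the original pair of commands. It assumes the data file is already in ElasticSearch bulk format (one action line followed by one document line, with the index name inside the action line), that the index and its mapping have already been created, and that the cluster listens on localhost:9200. The function names and the chunk size are made up for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # cluster address used in the post
INDEX = "companydatabase"           # index name used in the post


def chunk_bulk_lines(lines, pairs_per_chunk=5000):
    """Group bulk-format lines into request bodies.

    Assumes lines alternate: action line, document line. Each chunk holds
    an even number of lines so a pair is never split across requests.
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    step = 2 * pairs_per_chunk
    for start in range(0, len(lines), step):
        # The bulk API expects every line, including the last, to end in "\n".
        yield "\n".join(lines[start:start + step]) + "\n"


def post_bulk(body):
    """Send one bulk request body and return the parsed JSON response."""
    req = urllib.request.Request(
        BASE_URL + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def count_docs():
    """Ask the _count endpoint how many documents the index holds."""
    with urllib.request.urlopen(f"{BASE_URL}/{INDEX}/_count") as resp:
        return json.load(resp)["count"]


def load_file(path):
    """Load a bulk-format file chunk by chunk, then return the document count."""
    with open(path, encoding="utf-8") as fh:
        for body in chunk_bulk_lines(fh):
            result = post_bulk(body)
            if result.get("errors"):
                raise RuntimeError("some bulk items were rejected")
    return count_docs()
```

Calling `load_file("Employees50K.json")` (the file extension is an assumption) should return a count close to 50000 if everything went through. Sending the file in chunks keeps each request small, which tends to be gentler on a laptop-sized cluster than one huge request.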

Some words about data
Although it is randomly generated, I have tried to keep a lot of structure in it.
There is one CEO.
The Presidents, Vice Presidents, Delivery Managers, Managers, Architects, HR Managers, Team Leads, Senior Software Engineers, Software Engineers and Trainees all follow a secret ratio (an approximate one, that is). Do post in the comments section if you find it, along with any other interesting tidbits, like male managers who have pole dancing as a hobby. 🙂

The JSON format of data is like this

Giving credit where it is due
I used this website to generate the random addresses.
I used this website to generate the list of hobbies.
Source of Male first names is this.
Source of Female first names is this.
Source of Surnames is this.

I will post the program I wrote to generate the random data once I have cleaned it up and it is presentable.
Meanwhile, if you want smaller ElasticSearch sample data sets, give me a shout. I will generate them and put them up.

Note: I have hardcoded the index name to companydatabase and the type name to employees, by the way. Sorry about that. You can change both in any editor.
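
If editing a 100K-record file by hand sounds tedious, the rename can also be scripted. The following is a minimal Python sketch, assuming the files are in bulk format where every odd line is an action line such as `{"index": {"_index": "companydatabase", "_type": "employees"}}` and every even line is the document itself. That layout is an assumption, so check the first few lines of the file before relying on it. The function names and the replacement names in the usage note are hypothetical.

```python
import json


def retarget_action(line, index, doc_type=None):
    """Return a bulk action line that points at a different index (and type)."""
    action = json.loads(line)
    # An action line looks like {"index": {"_index": ..., "_type": ...}}.
    for meta in action.values():
        meta["_index"] = index
        if doc_type is not None:
            meta["_type"] = doc_type
    return json.dumps(action)


def retarget_bulk_lines(lines, index, doc_type=None):
    """Rewrite every action line; document lines pass through unchanged.

    Assumes strict alternation of action line and document line.
    """
    is_action = True
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines without disturbing the alternation
        yield retarget_action(line, index, doc_type) if is_action else line
        is_action = not is_action
```

To use it, open the original file for reading and a new file for writing, then write each line yielded by `retarget_bulk_lines(source, "mycompany", "staff")` followed by a newline. Parsing the action lines as JSON, rather than doing a blind search and replace, avoids touching an employee record that happens to contain the same text.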

3 thoughts on “ElasticSearch Sample Data”

  1. Greg

    Maybe add a StartDate date field (yyyy-mm-dd); could open demo possibilities for timeline style queries like “what is the most popular hiring month?” etc.

    1. Pankaj K Post author

      Sorry for the late response. It is not that I did not read the comment, just that I was too busy to work on it. That being said, I have updated the sample data, and you can use it if you still need it.

      As a bonus, I have made the joining month of most of the people coincide with the most popular months for switching jobs. See if your Kibana visualisations can pick that up. 🙂

  2. Miko

    Very helpful – I was just looking for exactly this type of data. Thanks a lot! PS. time based fields and timestamps would be a good extension
