Elasticsearch on EC2 with node auto-discovery
At my previous employer, Wakoopa, we used Elasticsearch for a few of our applications, so we set up a centralized Elasticsearch cluster managed with AWS CloudFormation and Chef.
For the uninitiated: Elasticsearch is a search engine that is built with high availability and horizontal scaling in mind. Node auto-discovery is the process in which an Elasticsearch node (typically a single server) is added to a cluster automatically.
To set up Elasticsearch on EC2, I followed this excellent tutorial. Most of the
heavy lifting regarding EC2 (and auto-discovery through the AWS API) is done by
the elasticsearch-cloud-aws plugin. However, I had some difficulty getting node
auto-discovery to work. At one point I had 3 EC2 instances running
Elasticsearch, but each of these nodes promoted itself to master because the
other nodes could not be found.
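A quick way to confirm this symptom is to ask every node for its cluster health and flag the ones that see fewer members than expected. The sketch below assumes the nodes answer HTTP on port 9200 and relies on the number_of_nodes field returned by the /_cluster/health endpoint; the function names are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_health(host, port=9200, timeout=5):
    """Return the parsed JSON body of GET /_cluster/health for one node."""
    url = f"http://{host}:{port}/_cluster/health"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def isolated_nodes(health_by_host, expected_nodes):
    """Return the hosts whose view of the cluster is smaller than expected.

    health_by_host maps a host name to its /_cluster/health response.
    A node that elected itself master of a one-node cluster reports
    number_of_nodes == 1.
    """
    return sorted(
        host
        for host, health in health_by_host.items()
        if health.get("number_of_nodes", 0) < expected_nodes
    )
```

Calling fetch_health for each of the three instances and passing the results to isolated_nodes with expected_nodes=3 would list all three hosts in the situation described above, and an empty list once discovery works.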
After some googling, it turned out that the discovery.type,
discovery.ec2.groups, and cloud.aws.region configuration options are the key to
getting this to work.
The discovery.type setting must be set to ec2 to tell Elasticsearch to use
the AWS API to find suitable EC2 instances that are Elasticsearch nodes.
Suitable nodes then get added to the cluster automatically.
The discovery.ec2.groups setting tells Elasticsearch to limit the search for
EC2 instances to a certain EC2 Security Group. Without this setting, all
running instances in your AWS account will be pinged to see if the instance is
an Elasticsearch node. For me this failed. To solve this, add all Elasticsearch
nodes to a Security Group and specify the name of the Security Group in this
configuration setting. In our case this is elasticsearch.
The cloud.aws.region setting further limits the search for instances, this time
to a specific AWS region.
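To make the combined effect of these settings concrete, here is a simplified model of the filtering in Python. It is not the plugin's actual implementation. The instance records and their field names (state, region, security_groups, private_ip) are hypothetical stand-ins for what the AWS API returns.

```python
def candidate_hosts(instances, group, region):
    """Return private IPs of running instances in the given group and region.

    Mirrors the narrowing described above: only instances in the configured
    region and Security Group are considered as possible cluster members.
    """
    return [
        inst["private_ip"]
        for inst in instances
        if inst["state"] == "running"
        and inst["region"] == region
        and group in inst["security_groups"]
    ]


# Hypothetical inventory: two search nodes, one web server, one node elsewhere.
instances = [
    {"private_ip": "10.0.0.11", "state": "running",
     "region": "eu-west-1", "security_groups": ["elasticsearch"]},
    {"private_ip": "10.0.0.12", "state": "running",
     "region": "eu-west-1", "security_groups": ["elasticsearch", "default"]},
    {"private_ip": "10.0.0.20", "state": "running",
     "region": "eu-west-1", "security_groups": ["web"]},
    {"private_ip": "10.1.0.11", "state": "running",
     "region": "us-east-1", "security_groups": ["elasticsearch"]},
]

hosts = candidate_hosts(instances, group="elasticsearch", region="eu-west-1")
```

Without the group filter, the web server at 10.0.0.20 would also be probed, which is the situation that failed for me.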
So, putting it all together, this is how our configuration looks:
cluster:
  name: search.example.com
node:
  name: your-node-name # hostname or IP address
path:
  data: /mnt/elasticsearch/data
  logs: /mnt/elasticsearch/logs
discovery:
  type: ec2
  ec2:
    groups: elasticsearch
gateway:
  type: s3
  s3:
    bucket: your-bucket
cloud:
  aws:
    region: eu-west-1
index:
  number_of_shards: 6
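Since a missing discovery setting fails silently (each node simply becomes its own master), it can help to check a configuration before rolling it out. The following is a minimal sketch that assumes the YAML has already been parsed into nested dictionaries by whatever loader you use; the helper names are hypothetical.

```python
# The three settings discussed above, in dotted notation.
REQUIRED = ("discovery.type", "discovery.ec2.groups", "cloud.aws.region")


def lookup(config, dotted_key):
    """Walk nested dicts along a dotted key; return None if any part is missing."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_settings(config):
    """Return the required discovery settings that are absent from config."""
    return [key for key in REQUIRED if lookup(config, key) is None]


# The discovery-related part of the configuration, as nested dicts.
config = {
    "discovery": {"type": "ec2", "ec2": {"groups": "elasticsearch"}},
    "cloud": {"aws": {"region": "eu-west-1"}},
}
```

For the configuration above, missing_settings returns an empty list. Dropping the cloud section would make it report cloud.aws.region.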
Bonus: to view the status of all your nodes, install the amazing Paramedic plugin. It's an embedded Ember.js app that polls the status of your indices and nodes, and visualizes their performance.