cluster_chef
Infrastructure as code: describe and orchestrate whole clusters of cloud or virtual machines.
Overview
The cluster_chef toolset takes chef to the next level:
- the cluster_chef gem: assembles machines and roles into a working cluster using a clean, expressive domain-specific language.
- the cluster_chef-knife knife plugin: orchestrates clusters of machines, in the cloud or in VMs.
- cluster_chef-homebase: our collection of industrial-strength, cloud-ready recipes for Hadoop, HBase, Cassandra, Elasticsearch, Zabbix and more.
- the metachef cookbook: coordinates discovery of services ("list all the machines for awesome_webapp, that I might load balance them") and aspects of those systems.
Walkthrough
Here's a very simple cluster:
ClusterChef.cluster 'awesome' do
  cloud(:ec2) do
    flavor 't1.micro'
  end

  role :base_role
  role :chef_client
  role :ssh

  # The database server
  facet :dbnode do
    instances 1
    role :mysql_server
    cloud do
      flavor  'm1.large'
      backing 'ebs'
    end
  end

  # A throwaway facet for development.
  facet :webnode do
    instances 2
    role :nginx_server
    role :awesome_webapp
  end
end
This code defines a cluster named awesome. A cluster is a group of servers united around a common purpose, in this case to serve a scalable web application.
The awesome cluster has two 'facets' -- dbnode and webnode. A facet is a subgroup of interchangeable servers that provide a logical set of systems: in this case, the systems that store the website's data and those that render it.
The dbnode facet has one server, which will be named 'awesome-dbnode-0'; the webnode facet has two servers, 'awesome-webnode-0' and 'awesome-webnode-1'.
Each server inherits the appropriate behaviors from its facet and cluster. All the servers in this cluster have the base_role, chef_client and ssh roles. The dbnode machines additionally house a MySQL server, while the webnodes have an nginx reverse proxy for the custom awesome_webapp.
As you can see, the dbnode facet asks for a different flavor of machine ('m1.large') than the cluster default ('t1.micro'). Settings in a facet override those in the cluster, and settings in a server override those of its facet. You economically describe only what's significant about each machine.
Cluster-level tools
$ knife cluster show awesome
+--------------------+-------+------------+-------------+--------------+---------------+-----------------+----------+--------------+------------+------------+
| Name | Chef? | InstanceID | State | Public IP | Private IP | Created At | Flavor | Image | AZ | SSH Key |
+--------------------+-------+------------+-------------+--------------+---------------+-----------------+----------+--------------+------------+------------+
| awesome-dbnode-0 | yes | i-43c60e20 | running | 107.22.6.104 | 10.88.112.201 | 20111029-204156 | t1.micro | ami-cef405a7 | us-east-1a | awesome |
| awesome-webnode-0 | yes | i-1233aef1 | running | 102.99.3.123 | 10.88.112.123 | 20111029-204156 | t1.micro | ami-cef405a7 | us-east-1a | awesome |
| awesome-webnode-1 | yes | i-0986423b | not running | | | | | | | |
+--------------------+-------+------------+-------------+--------------+---------------+-----------------+----------+--------------+------------+------------+
The commands available are:
- list -- list known clusters
- show -- show the named servers
- launch -- launch any named servers that aren't already running
- bootstrap -- bootstrap the named servers
- sync -- sync the cluster definition to the chef server and cloud provider
- ssh -- run an ssh command across the named servers
- start/stop -- start or stop the named servers
- kill -- terminate the named servers
- kick -- trigger a chef-client run on each named machine, tailing the logs until the run completes
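For example, a typical session against the awesome cluster above might look like this (output elided; naming a facet such as webnode restricts the command to that facet's machines):
$ knife cluster show awesome             # report the state of every machine in the cluster
$ knife cluster launch awesome webnode   # launch any webnode machines that aren't already running
$ knife cluster kick awesome webnode     # run chef-client on each webnode, tailing the logs
$ knife cluster kill awesome webnode     # tear the webnode machines back down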
Advanced clusters remain simple
Let's say the app is truly awesome, and its features and demand increase. This cluster adds an ElasticSearch server for search, a haproxy load balancer, and spreads the webnodes across two availability zones.
ClusterChef.cluster 'webserver_demo' do
  cloud(:ec2) do
    image_name         "maverick"
    flavor             "t1.micro"
    availability_zones ['us-east-1a']
  end

  # The database server
  facet :dbnode do
    instances 1
    role :mysql_server
    cloud do
      flavor  'm1.large'
      backing 'ebs'
    end

    volume(:data) do
      size        20
      keep        true
      device      '/dev/sdi'
      mount_point '/data'
      snapshot_id 'snap-a10234f'
      attachable  :ebs
    end
  end

  facet :webnode do
    instances 6
    cloud.availability_zones ['us-east-1a', 'us-east-1b']

    role :nginx_server
    role :awesome_webapp
    role :elasticsearch_client

    volume(:server_logs) do
      size        5
      keep        true
      device      '/dev/sdi'
      mount_point '/server_logs'
      snapshot_id 'snap-d9c1edb1'
    end
  end

  facet :esnode do
    instances 1
    role "elasticsearch_data_esnode"
    role "elasticsearch_http_esnode"
    cloud.flavor "m1.large"
  end

  facet :loadbalancer do
    instances 1
    role "haproxy"
    cloud.flavor "m1.xlarge"
    elastic_ip "128.69.69.23"
  end

  cluster_role.override_attributes({
    :elasticsearch => {
      :version => '0.17.8',
    },
  })
end
The facets are described and scale independently. If you'd like to add more webnodes, just increase the instance count; if a machine misbehaves, just terminate it. Running knife cluster launch webserver_demo webnode will note which machines are missing, then launch and configure them appropriately.
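For instance, scaling out the web tier is just a change to the instance count followed by a launch. A sketch against the webserver_demo definition above:
# in your cluster file: grow the facet from 6 to 10 webnodes
facet :webnode do
  instances 10
  # ...the roles, volumes and cloud settings stay exactly as before
end
Running knife cluster launch webserver_demo webnode afterwards brings up only the machines that are missing.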
ClusterChef speaks naturally to both Chef and your cloud provider. The cluster_role.override_attributes statement will be synchronized to the chef server, pinning the elasticsearch version across the clients and the server. Your chef roles should hold system-specific information; the cluster file lets you see the architecture as a whole.
With these simple settings, if you have already set up chef's knife to launch cloud servers, typing knife cluster launch demosimple --bootstrap will (using Amazon EC2 as an example):
- Synchronize to the chef server:
  - create chef roles on the server for the cluster and each facet.
  - apply role directives (e.g. the homebase's default_attributes declaration).
  - create a node for each machine.
  - apply the runlist to each node.
- Set up security isolation:
  - use a keypair (login ssh key) isolated to that cluster.
  - recognize the ssh role, and add a security group ssh that by default opens port 22.
  - recognize the nfs_server role, and add security groups nfs_server and nfs_client.
  - authorize the nfs_server to accept connections from all nfs_clients. Machines in other clusters that you mark as nfs_clients can connect to the NFS server, but are not automatically granted any other access to the machines in this cluster. ClusterChef's opinionated behavior is about more than saving you effort -- tying this behavior to the chef role means you can't screw it up.
- Launch the machines in parallel:
  - using the image name and the availability zone, it determines the appropriate region, image ID, and other implied behavior.
  - it passes a JSON-encoded user_data hash specifying the machine's chef node_name and client key (a hypothetical sketch follows this list). An appropriately-configured machine image will need no further bootstrapping -- it will connect to the chef server with the appropriate identity and proceed completely unattended.
- Synchronize to the cloud provider:
  - apply EC2 tags to the machine, making your console intelligible.
  - connect external (EBS) volumes, if any, to the correct mount point -- it uses (and applies) tags to the volumes, so they know which machine to adhere to. If you've manually added volumes, just make sure they're defined correctly in your cluster file and run knife cluster sync {cluster_name}; it will paint them with the correct tags.
  - associate an elastic IP, if any, with the machine.
- Bootstrap the machine using knife bootstrap.
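To illustrate the user_data hand-off, here is a hypothetical sketch of that payload, expressed as the Ruby hash knife would JSON-encode (the field names are illustrative -- this document doesn't spell out the exact keys cluster_chef sends):
user_data = {
  'chef_server' => 'https://api.opscode.com/organizations/CHEF_ORGANIZATION', # where to register
  'node_name'   => 'awesome-webnode-0',                                       # the machine's chef identity
  'client_key'  => '-----BEGIN RSA PRIVATE KEY-----...',                      # its pre-generated client key
}
A machine image that reads this at first boot can register with the chef server and run its runlist with no further hand-holding.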
Getting Started
This assumes you have installed chef, have a working chef server, and have an AWS account. If you can run knife and use the web browser to see your EC2 console, you can start here. If not, see the instructions below.
Setup
bundle install
Your first cluster
Let's create a cluster called 'demosimple'. It's, well, a simple demo cluster.
Create a simple demo cluster
Create a directory for your clusters; copy the demosimple cluster and its associated roles from cluster_chef:
mkdir -p $CHEF_REPO_DIR/clusters
cp cluster_chef/clusters/{defaults,demosimple}.rb ./clusters/
cp cluster_chef/roles/{big_package,nfs_*,ssh,base_role,chef_client}.rb ./roles/
Lastly, add the cookbooks, site-cookbooks, and meta-cookbooks directories from cluster_chef to the cookbook_path in your knife.rb, and push everything to the chef server (see below for details).
knife cluster launch
Hooray! You're ready to launch a cluster:
knife cluster launch demosimple homebase --bootstrap
It will kick off a node and then bootstrap it. You'll see it install a whole bunch of things. Yay.
Philosophy
Some general principles of how we use chef.
- Chef server is never the repository of truth -- it only mirrors the truth.
  - a file is tangible and immediate to access.
- Specifically, we want truth to live in the git repo, and be enforced by the chef server. There is no truth but git, and chef is its messenger.
  - this means that everything is versioned, documented and exchangeable.
- Systems, services and significant modifications to a cluster should be obvious from the clusters file. I don't want to have to bounce around nine different files to find out which thing installed a redis server.
  - basically, the existence of anything that opens a port should be obvious when I look at the cluster file.
- Roles define systems, clusters assemble systems into machines.
  - For example, a resque worker queue has a redis, a webserver and some config files -- your cluster should invoke a @whatever_queue@ role, and the @whatever_queue@ role should include recipes for the component services.
  - the existence of anything that opens a port or runs as a service should be obvious when I look at the roles file.
- include_recipe considered harmful: do NOT use include_recipe for anything that a) provides a service, b) launches a daemon or c) is interesting in any way (so: @include_recipe java@ yes; @include_recipe iptables@ no). Instead, note the dependency in the metadata.rb -- see the sketch after this list. This seems weird, but the breaking behavior is purposeful: it makes you explicitly state all dependencies.
- It's nice when machines are in full control of their destiny.
  - initial setup (an elastic IP, attaching a drive) is often best enforced externally.
  - but machines should be able to independently assert things like load balancer registration that might change at any point in their lifetime.
- It's even nicer, though, to have full idempotency from the command line: I can at any time push truth from the git repo to the chef server and know that it will take hold.
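Here is a minimal metadata.rb sketch for the include_recipe point above (the cookbook and its dependency list are hypothetical; depends is the standard Chef metadata declaration):
# metadata.rb for a hypothetical awesome_webapp cookbook
maintainer  'Your Name'
description 'Installs and configures the awesome_webapp service'
version     '0.1.0'
# State every dependency here, even the ones you never include_recipe:
depends 'java'      # passive prerequisite -- fine to include_recipe in a recipe
depends 'iptables'  # opens ports / runs as a service -- invoke it from a role, never include_recipe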
Advanced Superpowers
Auto-vivifying machines (no bootstrap required!)
On EC2, you can make a machine that auto-vivifies -- no bootstrap necessary. Burn an AMI that has the config/client.rb file in /etc/chef/client.rb. It will use the EC2 user data (passed in by knife) to realize its purpose in life, its identity, and the chef server to connect to; everything happens automagically from there. No parallel ssh required!
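A rough sketch of preparing such an image, assuming you are logged into the machine you are about to snapshot and chef-client is already installed:
sudo mkdir -p /etc/chef
sudo cp cluster_chef/config/client.rb /etc/chef/client.rb
# now burn the AMI with your usual tooling; machines launched from it configure themselves at boot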
EBS Volumes
Define a snapshot_id for your volumes and set create_at_launch true.
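For example, a sketch modeled on the dbnode volume above (the snapshot id is a placeholder for your own):
facet :dbnode do
  volume(:data) do
    size             20
    keep             true
    device           '/dev/sdi'
    mount_point      '/data'
    snapshot_id      'snap-a10234f'   # placeholder: the snapshot to clone
    attachable       :ebs
    create_at_launch true             # create the volume from that snapshot when the machine launches
  end
end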
Extended Installation Notes
Set up Knife on your local machine, and a Chef Server in the cloud
If you already have a working chef installation you can skip this section.
To get started with knife and chef, follow the "Chef Quickstart":http://wiki.opscode.com/display/chef/Quick+Start. We use the hosted chef service and are very happy, but there are instructions on the wiki to set up a chef server too. Stop when you get to "Bootstrap the Ubuntu system" -- cluster_chef is going to make that much easier.
Cloud setup
Next,
- sign up for an AWS account
- Follow the "Knife with AWS quickstart" on the opscode wiki.
Right now cluster chef works well with AWS. If you're interested in modifying it to work with other cloud providers, "see here":https://github.com/infochimps/cluster_chef/issues/28 or get in touch.
Knife setup
In your .chef/knife.rb, modify the cookbook path to include cluster_chef's cookbooks, meta-cookbooks and site-cookbooks directories, and add settings for cluster_chef_path, cluster_path and keypair_path. Here's mine:
current_dir = File.dirname(__FILE__)
organization = 'CHEF_ORGANIZATION'
username = 'CHEF_USERNAME'
# The full path to your cluster_chef installation
cluster_chef_path File.expand_path("#{current_dir}/../cluster_chef")
# The list of paths holding clusters
cluster_path [ File.expand_path("#{current_dir}/../clusters") ]
# The directory holding your cloud keypairs
keypair_path File.expand_path(current_dir)
log_level :info
log_location STDOUT
node_name username
client_key "#{keypair_path}/#{username}.pem"
validation_client_name "#{organization}-validator"
validation_key "#{keypair_path}/#{organization}-validator.pem"
chef_server_url "https://api.opscode.com/organizations/#{organization}"
cache_type 'BasicFile'
cache_options( :path => "#{ENV['HOME']}/.chef/checksums" )
# The first things have lowest priority (so, site-cookbooks gets to win)
cookbook_path [
"#{cluster_chef_path}/cookbooks", # std cookbooks from opscode/cookbooks
"#{cluster_chef_path}/meta-cookbooks", # coordinate services among cookbooks
"#{cluster_chef_path}/site-cookbooks", # infochimps' collection of cookbooks
"#{current_dir}/../cookbooks",
"#{current_dir}/../site-cookbooks", # your internal cookbooks
]
# If you primarily use AWS cloud services:
knife[:ssh_address_attribute] = 'cloud.public_hostname'
knife[:ssh_user] = 'ubuntu'
# Configure bootstrapping
knife[:bootstrap_runs_chef_client] = true
bootstrap_chef_version "~> 0.10.0"
# AWS access credentials
knife[:aws_access_key_id] = "XXXXXXXXXXX"
knife[:aws_secret_access_key] = "XXXXXXXXXXXXX"
Push to chef server
To send all the cookbooks and roles to the chef server, go to your chef repo directory and run:
cd $CHEF_REPO_DIR
mkdir -p $CHEF_REPO_DIR/site-cookbooks
knife cookbook upload --all
for foo in roles/*.rb ; do knife role from file $foo & sleep 1 ; done
You should see all the cookbooks defined in cluster_chef/cookbooks (ant, apt, ...) listed among those it uploads.