Every configuration management tool has the concept of a config hierarchy. Puppet has Hiera, Ansible has groups.
I recently moved from a company that used Puppet to a Windows shop that was greenfielding with Chef. Chef is really growing on me, but there is one design pattern that it does not handle well: “datacenters”.
With Puppet, if you have multiple datacenters and each needs its own config, you simply create a Hiera hierarchy and put your variables in the appropriate level.
Chef has a different data model. As of Chef 12, there are 15 different levels of precedence for attributes. The main hierarchy looks like the following:
+------+
| node |
+------+
+----------+
| role |
+----------+
+-------------+
| environment |
+-------------+
Dealing with datacenter / role specific attributes
There is still one corner case that the above design pattern doesn’t cover. Suppose you have a specific group of servers that need overrides that are different per datacenter. This creates a two-dimensional matrix (datacenter by server type) that normally could only be modeled by creating a role per server type, per datacenter. That’s no big deal if you only have a couple of DCs, but it quickly becomes unmanageable. Suppose you had 8 datacenters and 5 types of web servers: you would need 40 roles to model them all! Any time you wanted to make a change that applies to all of them, you would need to modify 40 files.
roles
datacenter-us-west-web-basic
datacenter-us-west-web-custom1
datacenter-us-west-web-custom2
datacenter-us-west-web-custom3
datacenter-us-east-web-basic
datacenter-us-east-web-custom1
datacenter-us-east-web-custom2
datacenter-us-east-web-custom3
datacenter-uk-north-web-basic
....
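To put numbers on that, here is a rough Ruby sketch that enumerates the matrix and compares it with a layered layout (one role per datacenter, one per server type, plus override roles only where needed), which is the approach the rest of this post describes. The datacenter and server type names are placeholders, and the assumption that only one server type needs per-datacenter overrides is purely illustrative.

```ruby
# Placeholder names: the post only says "8 datacenters and 5 types of web servers".
DATACENTERS = (1..8).map { |i| "dc#{i}" }.freeze
SERVER_TYPES = (1..5).map { |i| "web-type#{i}" }.freeze

# Assumption for illustration: only one server type differs per datacenter.
TYPES_NEEDING_OVERRIDES = ["web-type2"].freeze

# Matrix approach: one role for every (datacenter, server type) pair.
def matrix_roles(datacenters, server_types)
  datacenters.product(server_types).map { |dc, type| "datacenter-#{dc}-#{type}" }
end

# Layered approach: datacenter roles + service roles + override roles
# for just the server types that need per-datacenter settings.
def layered_roles(datacenters, server_types, override_types)
  datacenter_roles = datacenters.map { |dc| "datacenter-#{dc}" }
  override_roles = override_types.product(datacenters).map do |type, dc|
    "override-#{type}-#{dc}"
  end
  datacenter_roles + server_types + override_roles
end

puts "matrix roles:  #{matrix_roles(DATACENTERS, SERVER_TYPES).size}"
puts "layered roles: #{layered_roles(DATACENTERS, SERVER_TYPES, TYPES_NEEDING_OVERRIDES).size}"
```

Under these assumptions the matrix needs 40 roles and the layered layout needs 21 (8 + 5 + 8). The saving shrinks as more server types need per-datacenter overrides, but a setting shared by one server type still lives in a single service role.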
Matrices should be avoided at all costs
The solution we came up with was to have a datacenter role that contains datacenter specific settings, and override roles that contain the attributes that are unique per service, per datacenter.
+------+
| node |
+------+
+----------------+
| override-roleA |
+----------------+
+-------+ +--------------------+
| roleA | | datacenter-us-west |
+-------+ +--------------------+
+--------------------------------+
| environment |
+--------------------------------+
This only works as long as you keep each attribute unique per level. For example, if both roleA and roleB set the foo attribute at the same precedence level, the one that appears later in the run list would win.
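One way to guard against that is a small check that flags attributes defined by more than one role at the same level. The sketch below is a hypothetical helper, not part of Chef. It works on role data that has already been parsed into plain Ruby hashes and only looks at top-level attribute names. The roleA and roleB contents are made up for the example.

```ruby
# Hypothetical helper: returns { attribute name => [role names] } for every
# attribute that more than one role defines at the given level
# ("default_attributes" or "override_attributes"). Top-level keys only.
def attribute_collisions(roles, level)
  owners = Hash.new { |hash, key| hash[key] = [] }
  roles.each do |role|
    role.fetch(level, {}).each_key { |attr| owners[attr] << role["name"] }
  end
  owners.select { |_attr, names| names.size > 1 }
end

# Illustrative role data, in the same shape as the role JSON files.
ROLE_A = { "name" => "roleA", "default_attributes" => { "foo" => 42 } }.freeze
ROLE_B = { "name" => "roleB", "default_attributes" => { "foo" => 33, "bar" => 1 } }.freeze

collisions = attribute_collisions([ROLE_A, ROLE_B], "default_attributes")
collisions.each do |attr, names|
  puts "#{attr} is set by more than one role: #{names.join(', ')}"
end
```

Running a check like this over the roles that can land on the same node would catch the foo clash before the run list order silently decides the winner.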
Example
We have web servers, web upload servers and custom mail servers that are deployed from a monolithic build. The .tgz file created by our build pipeline contains the code for all 3. A web server, an upload server and a mail server differ only by the services that are running. Web servers have role specific settings, upload servers have both role specific and datacenter specific settings, whereas mail servers have just datacenter specific settings.
Eventually we will move away from the monolithic builds and re-architect our software so that it does not require datacenter specific and role specific settings, but for now this is what we have.
The roles and environments
roles
web-default
web-mail
datacenter-us-west
datacenter-us-east
override-web-upload-us-west
override-web-upload-us-east
environment
prod
stage
So if you need a web server, you apply roles
web-default, datacenter-us-east
For a mail server, you apply roles
web-mail, datacenter-us-east
For an upload server you apply roles
web-default, override-web-upload-us-west, datacenter-us-west
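The three combinations above follow one rule: service role first, then the override role if the server type has one, then the datacenter role. A minimal Ruby sketch of that rule follows. The helper names and lookup tables are hypothetical and exist only for this example. The role names come from the list above, and role[...] is the usual run list item syntax.

```ruby
# Service role for each server type, taken from the examples above.
SERVICE_ROLE = {
  web: "web-default",
  mail: "web-mail",
  upload: "web-default"
}.freeze

# Server types that have a per-datacenter override role, with its name prefix.
OVERRIDE_PREFIX = { upload: "override-web-upload" }.freeze

# Hypothetical helper: the roles to apply to a node, in order.
def roles_for(server_type, datacenter)
  roles = [SERVICE_ROLE.fetch(server_type)]
  prefix = OVERRIDE_PREFIX[server_type]
  roles << "#{prefix}-#{datacenter}" if prefix
  roles << "datacenter-#{datacenter}"
  roles
end

# Formats role names as run list items, e.g. "role[web-default]".
def run_list_for(server_type, datacenter)
  roles_for(server_type, datacenter).map { |name| "role[#{name}]" }
end

puts run_list_for(:web, "us-east").join(", ")
puts run_list_for(:mail, "us-east").join(", ")
puts run_list_for(:upload, "us-west").join(", ")
```

Keeping the rule in one place means adding a datacenter is a matter of creating its datacenter role (and any override roles) without touching the service roles.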
Role principles
- Put as many attributes as possible at the lowest possible level. Environment -> Role -> Override-Role -> Node
- Avoid using node specific settings like the plague
- Use as few environments as you really need, each has an administrative cost
- Use as few roles as possible
- Every node gets a datacenter role
- Only service roles have run_lists (web, mail, etc.)
- Only create datacenter specific service roles if absolutely necessary.
Web server role
{
"name": "web-default",
"chef_type": "role",
"default_attributes": {
"webserver": true,
"foo": 42
},
"override_attributes": {
},
"run_list": [
"recipe[web-cookbook]",
"recipe[web-server]"
]
}
Mail server role
{
"name": "web-mail",
"chef_type": "role",
"default_attributes": {
"mailserver": true,
"foo": 33
},
"override_attributes": {
},
"run_list": [
"recipe[web-cookbook]",
"recipe[mail-server]"
]
}
Datacenter specific role
{
"name": "datacenter-us-west",
"chef_type": "role",
"default_attributes": {
"loadbalancer": "5.5.5.5"
},
"override_attributes": {
}
}
Upload servers in every datacenter need their own specific settings. Put those settings in an “override” role, one per datacenter:
{
"name": "override-web-upload-us-west",
"chef_type": "role",
"default_attributes": {
},
"override_attributes": {
"loadbalancer": "10.10.10.10"
},
"run_list": [
"recipe[web-cookbook]",
"recipe[web-upload]"
]
}
{
"name": "override-web-upload-us-east",
"chef_type": "role",
"default_attributes": {
},
"override_attributes": {
"loadbalancer": "20.20.20.20"
},
"run_list": [
"recipe[web-cookbook]",
"recipe[web-upload]"
]
}
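To see why the upload servers end up with the override load balancer, here is a deliberately simplified Ruby model of how these roles combine. It is a sketch and not Chef's actual implementation. It only models role default_attributes and role override_attributes with top-level keys, whereas real Chef has 15 precedence levels and deep-merges nested hashes. The role data is copied from the JSON above.

```ruby
# Simplified model (an assumption, not Chef's real merge logic):
# 1. merge default_attributes of every role in run list order (later wins),
# 2. merge override_attributes the same way,
# 3. overrides always beat defaults, regardless of run list position.
def effective_attributes(roles)
  defaults = roles.map { |role| role.fetch("default_attributes", {}) }.reduce({}, :merge)
  overrides = roles.map { |role| role.fetch("override_attributes", {}) }.reduce({}, :merge)
  defaults.merge(overrides)
end

WEB_DEFAULT = {
  "name" => "web-default",
  "default_attributes" => { "webserver" => true, "foo" => 42 }
}.freeze

DATACENTER_US_WEST = {
  "name" => "datacenter-us-west",
  "default_attributes" => { "loadbalancer" => "5.5.5.5" }
}.freeze

OVERRIDE_WEB_UPLOAD_US_WEST = {
  "name" => "override-web-upload-us-west",
  "override_attributes" => { "loadbalancer" => "10.10.10.10" }
}.freeze

web = effective_attributes([WEB_DEFAULT, DATACENTER_US_WEST])
upload = effective_attributes([WEB_DEFAULT, OVERRIDE_WEB_UPLOAD_US_WEST, DATACENTER_US_WEST])

puts "web server load balancer:    #{web['loadbalancer']}"
puts "upload server load balancer: #{upload['loadbalancer']}"
```

In this model a plain web server in us-west gets 5.5.5.5 from the datacenter role, while an upload server gets 10.10.10.10 even though the datacenter role comes later in its run list, because the value sits in override_attributes.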
While designing our roles, we brainstormed every possible pattern in a GitHub gist (see Additional Information). The final design is a hybrid that uses pattern B as much as possible, and pattern A for the datacenter specific roles.
This pattern works very well for us, though understandably it won’t work perfectly for everyone. Chef is working on the policy file, and it will be interesting to see how much of our design it replaces once it is finalized.
Questions
- Roles don’t have versions, won’t a change to a role affect all environments at once?
That is exactly why Chef is creating the policy file. In our infrastructure, run lists don’t change very often, and even when they do, they are in version control so a change can be reverted. We put as many settings as possible at the lowest hierarchy level (environment). Every remaining setting goes to the datacenter specific role next. This minimizes the number of servers that are affected by a single change.
- What happens if you have the same setting in 2 roles?
The last one will win. That’s why the override roles have all their settings in the override_attributes section.
- Don’t you have to make a lot of changes if you want to change a setting everywhere?
A change to the entire infrastructure requires making the change to every environment. We have 4 environments (dev, test, stage, prod). So at most 4 files need to be modified.
Additional Information
https://gist.github.com/spuder/00aa024e61392d16f4cc
http://serverfault.com/questions/700860/can-chef-have-different-databags-in-different-datacenters