By default, Elasticsearch will create 5 shards for each new index it receives from Logstash. While 5 shards may be a good default, there are times when you may want to increase or decrease this value.
Suppose you are splitting your data up into a number of indexes, and you are keeping data for 30 days:
- web-servers
- database-servers
- mail-servers
At 10 shards per index per day (5 shards x 2 copies), three daily indexes kept for 30 days adds up to 900 shards. Considering that each shard is its own Lucene index, this has the potential to be a lot of overhead.
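If you want to see how quickly shards accumulate on your own cluster, the _cat API can count them (a quick check, assuming your node answers on the usual port):
# each line of output from _cat/shards represents one shard
curl -s elasticsearch.example.com:9200/_cat/shards | wc -l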
Templates
Elasticsearch uses templates to define the settings, including shard counts, for new indexes. You can view the Elasticsearch template for Logstash with this HTTP GET:
_template/logstash?pretty
From a Linux / Mac terminal:
curl elasticsearch.example.com:9200/_template/logstash?pretty
My config will likely look different from yours, since it enables ‘doc_values’ as described in this blog.
Observe below how the template uses a wildcard, logstash-*, to apply to all Logstash indexes:
{
"logstash" : {
"order" : 0,
"template" : "logstash-*",
"settings" : { },
"mappings" : {
"_default_" : {
"dynamic_templates" : [ {
"date_fields" : {
"mapping" : {
"format" : "dateOptionalTime",
"doc_values" : true,
"type" : "date"
},
"match" : "*",
"match_mapping_type" : "date"
}
}, {
"byte_fields" : {
"mapping" : {
"doc_values" : true,
"type" : "byte"
},
"match" : "*",
"match_mapping_type" : "byte"
}
}, {
"double_fields" : {
"mapping" : {
"doc_values" : true,
"type" : "double"
},
"match" : "*",
"match_mapping_type" : "double"
}
}, {
"float_fields" : {
"mapping" : {
"doc_values" : true,
"type" : "float"
},
"match" : "*",
"match_mapping_type" : "float"
}
}, {
"integer_fields" : {
"mapping" : {
"doc_values" : true,
"type" : "integer"
},
"match" : "*",
"match_mapping_type" : "integer"
}
}, {
"long_fields" : {
"mapping" : {
"doc_values" : true,
"type" : "long"
},
"match" : "*",
"match_mapping_type" : "long"
}
}, {
"short_fields" : {
"mapping" : {
"doc_values" : true,
"type" : "short"
},
"match" : "*",
"match_mapping_type" : "short"
}
}, {
"string_fields" : {
"mapping" : {
"index" : "not_analyzed",
"omit_norms" : true,
"doc_values" : true,
"type" : "string"
},
"match" : "*",
"match_mapping_type" : "string"
}
} ],
"properties" : {
"@version" : {
"index" : "not_analyzed",
"doc_values" : true,
"type" : "string"
}
},
"_all" : {
"enabled" : true
}
}
},
"aliases" : { }
}
}
Modify default shard count
To change the default shard count, we will need to modify the settings field in the template.
"settings" : {
"number_of_shards" : 2
},
...
Before you proceed, understand that this only applies to new indexes. You can’t change the shard count (or mapping) of an existing index without reindexing your data.
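If you want to double-check the shard count on an index that already exists, you can read it back from the index settings (a quick sketch; the index name below is just an example):
# look for index.number_of_shards in the response
curl elasticsearch.example.com:9200/logstash-2015.06.01/_settings?pretty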
Save the template to your workstation
elk='http://elasticsearch.example.com:9200'
curl "$elk/_template/logstash?pretty" > ~/Desktop/logstash-template.json
Back up the file, then edit the ‘settings’ section of your file to reflect the number of shards that you want. I’m going to change mine from the default of 5 down to 2.
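A simple copy is enough for the backup (assuming the path from the previous step):
# keep a pristine copy in case the edit goes wrong
cp ~/Desktop/logstash-template.json ~/Desktop/logstash-template.json.bak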
Remove unneeded fields from template
The most important, and least clearly documented, step of uploading a template is removing the following lines:
"logstash" : {
"order" : 0,
}
Before
{
"logstash" : {
"order" : 0,
"template" : "logstash-*",
"settings" : { },
"mappings" : {
"_default_" : {
"dynamic_templates" : [ {
After
{
"template" : "logstash-*",
"settings" : {
"number_of_shards": 2
},
"mappings" : {
"_default_" : {
"dynamic_templates" : [ {
Final result
Here is the full config with the ‘logstash’ and ‘order’ fields removed:
{
"template" : "logstash-*",
"settings" : {
"number_of_shards": 2
},
"mappings" : {
"_default_" : {
"dynamic_templates" : [ {
"date_fields" : {
"mapping" : {
"format" : "dateOptionalTime",
"doc_values" : true,
"type" : "date"
},
"match" : "*",
"match_mapping_type" : "date"
}
}, {
"byte_fields" : {
"mapping" : {
"doc_values" : true,
"type" : "byte"
},
"match" : "*",
"match_mapping_type" : "byte"
}
}, {
"double_fields" : {
"mapping" : {
"doc_values" : true,
"type" : "double"
},
"match" : "*",
"match_mapping_type" : "double"
}
}, {
"float_fields" : {
"mapping" : {
"doc_values" : true,
"type" : "float"
},
"match" : "*",
"match_mapping_type" : "float"
}
}, {
"integer_fields" : {
"mapping" : {
"doc_values" : true,
"type" : "integer"
},
"match" : "*",
"match_mapping_type" : "integer"
}
}, {
"long_fields" : {
"mapping" : {
"doc_values" : true,
"type" : "long"
},
"match" : "*",
"match_mapping_type" : "long"
}
}, {
"short_fields" : {
"mapping" : {
"doc_values" : true,
"type" : "short"
},
"match" : "*",
"match_mapping_type" : "short"
}
}, {
"string_fields" : {
"mapping" : {
"index" : "not_analyzed",
"omit_norms" : true,
"doc_values" : true,
"type" : "string"
},
"match" : "*",
"match_mapping_type" : "string"
}
} ],
"properties" : {
"@version" : {
"index" : "not_analyzed",
"doc_values" : true,
"type" : "string"
}
},
"_all" : {
"enabled" : true
}
}
},
"aliases" : { }
}
Verify that you have valid JSON by using a tool like this one.
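If you prefer to check from the terminal, Python’s built-in json.tool will also catch syntax errors (assuming Python is installed):
# prints the parsed JSON on success, an error message on failure
python -m json.tool < ~/Desktop/logstash-template.json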
Double-check your config, then upload it to any Elasticsearch node. You can specify a file to upload by prefacing the filename with @. In this case, I named my file ‘foobar.json’:
cd ~/Desktop
elk='http://elasticsearch.example.com:9200'
curl -XPUT "$elk/_template/logstash" -d "@foobar.json"
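If the upload succeeds, Elasticsearch should respond with something like:
{"acknowledged":true}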
If everything uploaded correctly, you can check with the same command you ran earlier
curl elasticsearch.example.com:9200/_template/logstash?pretty
Tomorrow’s index should only have 2 shards.
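One way to check is the _cat shards API (quote the URL so your shell doesn’t expand the wildcard; new indexes should show 2 primaries plus their replicas):
curl 'elasticsearch.example.com:9200/_cat/shards/logstash-*?v'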
If there is a problem reading or uploading the file, you will see a warning (this one is from curl) and the template will be left unchanged.
Warning: Couldn't read data from file "foobar", this makes an empty
Warning: POST.
Additional Resources