Posts in Monitoring
Collecting Nginx metrics with the Prometheus nginx_exporter
Over the past year I've rolled out numerous Prometheus exporters to provide visibility into the infrastructure I manage. Exporters are server processes that interface with an application (HAProxy, MySQL, Redis, etc.) and make its operational metrics available through an HTTP endpoint. The nginx_exporter is an exporter for Nginx that makes it easy to gather the metrics exposed by Nginx's stub_status module. To use this exporter, you will first need to download the nginx_exporter binary from the project's GitHub release page…
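To make "stub_status metrics" concrete, here is a small Python sketch that parses the plain-text page served by Nginx's stub_status module into named counters, which is roughly the raw data an exporter turns into Prometheus metrics. This is not the nginx_exporter's code; the sample page and its numbers are made up for illustration, and the function name `parse_stub_status` is hypothetical.

```python
import re

# Made-up sample of the text page served by Nginx's stub_status module.
SAMPLE = """Active connections: 3 
server accepts handled requests
 120 120 450 
Reading: 0 Writing: 1 Waiting: 2 
"""


def parse_stub_status(text):
    """Hypothetical helper: turn a stub_status page into a dict of ints."""
    metrics = {}

    # Currently open client connections.
    active = re.search(r"Active connections:\s*(\d+)", text)
    if active:
        metrics["active"] = int(active.group(1))

    # The accepts/handled/requests counters sit on the line after their header.
    counters = re.search(
        r"server accepts handled requests\s+(\d+)\s+(\d+)\s+(\d+)", text
    )
    if counters:
        metrics["accepts"], metrics["handled"], metrics["requests"] = (
            int(value) for value in counters.groups()
        )

    # Per-state connection gauges on the last line.
    for state in ("Reading", "Writing", "Waiting"):
        match = re.search(rf"{state}:\s*(\d+)", text)
        if match:
            metrics[state.lower()] = int(match.group(1))

    return metrics


print(parse_stub_status(SAMPLE))
```

An exporter scrapes this page on each Prometheus poll, so counters such as `requests` grow monotonically while the per-state values behave like gauges.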
Using elastic's metricbeat to collect system utilization data
Over the past month I've been evaluating Metricbeat. Metricbeat and the ELK stack are incredibly powerful tools for deriving meaning from metrics and unstructured log data. Metricbeat lets you funnel system and application metrics (e.g., CPU utilization, number of HTTP GET requests, number of SQL queries, HTTP endpoint response times) into Elasticsearch, where Kibana's visualizations can help you make sense of them. To get up and running with Metricbeat, you will first need to configure your Logstash infrastructure to accept incoming beats…
Install metricbeats with ansible and the elastic yum repository
Last month I started playing with Elastic's Metricbeat, and you could say I fell in love at first beat. I've created some amazing visualizations with the metrics it produces and am blown away by how much visibility I gain by correlating disparate event streams. A good example is being able to see VMware hypervisor utilization, system utilization and HTTP endpoint latency stacked on top of each other. Elastic hosts a yum repository for Metricbeat, and it's easy to deploy it to Fedora-derived servers with Ansible's templating capabilities and the yum_repository module…
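Ansible's yum_repository module ultimately writes a `.repo` file under `/etc/yum.repos.d/`. As a rough illustration of what that file contains, this Python sketch renders one with `configparser`. It assumes Elastic's 5.x package repository; the version, the section name and the helper `render_repo_file` are assumptions for illustration, so check Elastic's install docs for the release you actually run.

```python
import configparser
import io


def render_repo_file(major_version="5.x"):
    """Hypothetical helper: render a yum .repo stanza for Elastic's packages.

    The default 5.x version is an assumption, not taken from the post.
    """
    config = configparser.ConfigParser()
    config[f"elastic-{major_version}"] = {
        "name": f"Elastic repository for {major_version} packages",
        "baseurl": f"https://artifacts.elastic.co/packages/{major_version}/yum",
        "gpgcheck": "1",
        "gpgkey": "https://artifacts.elastic.co/GPG-KEY-elasticsearch",
        "enabled": "1",
    }
    buffer = io.StringIO()
    # yum expects plain key=value lines, so drop the spaces around "=".
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(render_repo_file())
```

In an Ansible role the same values would be passed as yum_repository parameters (`baseurl`, `gpgcheck`, `gpgkey`, and so on), with templating used to swap in the version per environment.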
Troubleshooting a bizarre logstash CPU problem
This week I was converting some legacy shell scripts to Ansible roles and ran into a bizarre issue with one of my Elasticsearch servers. After I committed a couple of changes, my CI system rejected the commit due to a system resource issue. When I logged into the system to troubleshoot, I noticed the CPU was pegged. This system is used solely to test changes, so it should have been 100% idle. Running htop showed the Logstash Java process as the top CPU consumer, so I headed to the log directory to see if anything was awry…
How elasticsearch bootstrap checks affect development and production mode
Earlier this week one of my friends asked me to help him with an Elasticsearch issue. He was bringing up a new cluster to see how ES compares to Splunk and was getting a "bootstrap checks failed" error at startup. This was causing his Elasticsearch Java processes to bind to localhost instead of the hostname he had assigned to network.host. I started by reviewing the logs. Elasticsearch has two modes of operation: development and production…
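The distinction between the two modes can be sketched as a simple rule: a node bound only to loopback addresses runs in development mode, where failed bootstrap checks are logged as warnings, while binding to a non-loopback address puts it in production mode, where failed checks stop startup. The Python below is a simplified model of that documented behavior, not Elasticsearch code. It assumes the bound addresses are already resolved IPs and ignores special cases such as `discovery.type: single-node`; the function names are hypothetical.

```python
import ipaddress
from typing import Sequence


def is_production_mode(bound_addresses: Sequence[str]) -> bool:
    """Simplified model: any non-loopback bind address means production mode."""
    return any(
        not ipaddress.ip_address(address).is_loopback
        for address in bound_addresses
    )


def bootstrap_outcome(bound_addresses: Sequence[str], failed_checks: Sequence[str]) -> str:
    """Hypothetical helper: what happens at startup under the model above."""
    if not failed_checks:
        return "start"
    if is_production_mode(bound_addresses):
        # Production mode: failed checks are fatal.
        return "refuse to start"
    # Development mode: failed checks are only logged as warnings.
    return "start with warnings"


print(bootstrap_outcome(["127.0.0.1", "::1"], ["max file descriptors"]))
print(bootstrap_outcome(["10.0.0.5"], ["max file descriptors"]))
```

This is why a configuration that starts fine on a laptop bound to localhost can refuse to start once network.host points at a real interface.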