Docker Monitoring using Prometheus, cAdvisor, Node Exporter and Grafana
As companies increasingly adopt Docker containers to enhance application deployment efficiency and speed, monitoring and observability become paramount for running containers in production environments. Robust monitoring provides invaluable metrics, logs, and insights into the performance of both applications and the underlying infrastructure. This enables teams to proactively troubleshoot issues before they escalate and cause downstream impacts, as well as optimize resource utilization and spending on containerized resources.
In this comprehensive guide, we will establish an integrated open-source monitoring stack for visualizing Docker host and container metrics using the following tools:
- Prometheus: A powerful time-series database for storing and querying metrics.
- cAdvisor: A container resource usage analyzer that collects and exposes container metrics.
- Node Exporter: An exporter for hardware and OS metrics.
- Grafana: A data visualization tool for creating dashboards and visualizing metrics.
Collectively, these tools offer end-to-end observability into Dockerized environments, ranging from the physical infrastructure up to the running applications. We will deploy them using Docker and configure metrics ingestion pipelines, storage, querying, and dashboarding to gain a clear understanding of how containers utilize host resources over time.
Why Monitor Containers and Hosts?
First, let’s address the fundamental question: why is monitoring so critical for container infrastructure?
As applications are packaged into portable, isolated containers, they become more distributed across fluid pools of virtualized container hosts, such as nodes in a cluster.
This ephemeral architecture introduces several visibility challenges, including:
- Dynamic Resource Allocation: Containers can be created, destroyed, and scaled dynamically, making it difficult to track resource usage over time.
- Isolation: Containers are isolated from the host OS, making it challenging to access host-level metrics directly.
- Complexity: Containerized environments can be complex, with many containers running across multiple hosts.
While containerization offers architectural advantages through loose coupling and portability, the environment becomes increasingly complex from an operational perspective.
By collecting, storing, and charting detailed time-series metrics, we regain visibility, including:
- Resource Usage: CPU, memory, disk, and network usage per container and host.
- Application Performance: Response times, error rates, and other application-specific metrics.
- System Health: Overall health and performance of the Docker hosts.
- Anomaly Detection: Identifying unusual patterns or performance bottlenecks.
Understanding precisely how applications consume resources enables more efficient operation, automated responses, and helps prevent instability due to overutilization or resource bottlenecks.
Prerequisites
To follow along with all components, you will need:
- A Linux server (e.g., Ubuntu 20.04) with Docker installed.
- Docker Compose (optional, but recommended for easier deployment).
For convenience, we will use an Ubuntu 20.04 system. However, any modern Linux distribution should work well.
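To quickly confirm the prerequisites are in place, check the installed versions:
$ docker --version
$ docker compose version   # Compose v2 plugin; standalone v1 installs use docker-compose --version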
Now, let’s explore how to wire up Prometheus metrics ingestion and visualize dashboards for container environments!
Step 1 – Create an Isolated Docker Monitoring Network
First, create a user-defined bridge network for communication between the monitoring services using:
$ docker network create monitoring-net
This lets the services reach one another by consistent hostnames instead of dynamic IP addresses that can change:
- prometheus
- grafana
- cadvisor
- node-exporter
Now, run each service attached to this network for simplified connectivity.
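You can verify that the network exists (and, later, see which containers have joined it) with:
$ docker network inspect monitoring-net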
Step 2 – Set Up Prometheus Time Series Database
Prometheus is a specialized time-series database optimized for ingesting and querying numeric metrics like counters, gauges, and histograms, even at high scale or cardinality. It scrapes and stores numeric metrics at regular intervals over time.
Prometheus runs well in containers and integrates with Docker environments using exporters to provide metrics about containers, images, volumes, and more.
Pull the latest Prometheus server image:
$ docker pull prom/prometheus:latest
Next, create a directory for persistence across container restarts:
$ mkdir -p /prometheus-data
We need to define a Prometheus configuration file that tells Prometheus what to scrape. Create it at /prometheus-data/prometheus.yml on your host with the following initial job definitions:
global:
  scrape_interval: 10s
  external_labels:
    monitor: production-01

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'cadvisor'
    scrape_interval: 5s
    static_configs:
      - targets: ['cadvisor:8080']
  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['node-exporter:9100']
This configures connectivity to:
- Prometheus itself (for self-monitoring).
- cAdvisor (for container metrics).
- Node Exporter (for host metrics).
We will define these services next.
Finally, run the Prometheus server in detached mode:
$ docker run -d --name=prometheus \
    --network=monitoring-net \
    -p 9090:9090 \
    -v /prometheus-data:/prometheus-data \
    prom/prometheus:latest \
    --config.file=/prometheus-data/prometheus.yml
This launches Prometheus attached to the monitoring network, loading the configuration file from the bind-mounted host directory. Note that only /prometheus-data is persisted this way; the metric data itself lives in the container's /prometheus directory, so mount that to a host path as well if you want the time series to survive container recreation.
You can access the Prometheus UI at http://<server-ip>:9090. We will integrate Grafana shortly for dashboards.
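As a quick sanity check, you can also query the Prometheus HTTP API from the Docker host to see the health of each scrape target (the cadvisor and node targets will report as down until we start those services in the next steps):
$ curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[a-z]*"'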
Step 3 – Install cAdvisor for Container Metrics
cAdvisor (Container Advisor) is a utility for collecting, aggregating, processing, and exporting performance and resource usage metrics from running containers.
For example, cAdvisor exposes CPU, memory, filesystem, and network usage statistics per container. This allows you to understand how much of the host's resources each container consumes over time.
In a Kubernetes cluster, cAdvisor typically runs as a DaemonSet on each node to monitor resource usage. Here, we will run it standalone with Docker.
Pull the latest cAdvisor image:
$ docker pull gcr.io/cadvisor/cadvisor:latest
Then, launch the containerized cAdvisor agent:
$ docker run \
    --name=cadvisor \
    --network=monitoring-net \
    --volume=/:/rootfs:ro \
    --volume=/var/run:/var/run:rw \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:ro \
    --publish=8080:8080 \
    --detach=true \
    gcr.io/cadvisor/cadvisor:latest
This runs cAdvisor with access to:
- The root filesystem, mounted at /rootfs (read-only).
- The /var/run directory, which includes the Docker socket (read-write).
- The /sys directory (read-only).
- The Docker data directory /var/lib/docker (read-only).
cAdvisor scrapes these sources and exposes aggregated metrics on port 8080.
Because the container is named cadvisor on the monitoring network, Prometheus resolves the cadvisor:8080 target we defined earlier and begins scraping it automatically.
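Once the cadvisor job is up, you can already experiment with PromQL in the Prometheus UI. Two example queries against standard cAdvisor metrics:
# Per-container CPU usage, averaged over the last 5 minutes
rate(container_cpu_usage_seconds_total{image!=""}[5m])
# Current memory usage per container, in bytes
container_memory_usage_bytes{image!=""}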
Step 4 – Install Node Exporter on Docker Hosts
While cAdvisor exposes metrics about running containers, Node Exporter gathers OS and hardware metrics from the Docker hosts themselves, such as CPU, memory, disk utilization, network, and systemd services.
This reveals performance and saturation issues at the layers beneath the containers, such as the operating system or physical hardware, that could impact applications.
Using Docker, launch Node Exporter similarly:
$ docker run -d \
    --name=node-exporter \
    --network=monitoring-net \
    -p 9100:9100 \
    prom/node-exporter:latest
This exposes host metrics on port 9100, which Prometheus scrapes into time series such as:
node_cpu_seconds_total{mode="idle"}
node_memory_MemAvailable_bytes
node_network_transmit_bytes_total
We now have two pipelines sending system and container metrics into Prometheus.
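For example, these PromQL expressions (a sketch; adjust the 5m range to taste) turn the raw Node Exporter counters into utilization percentages:
# Host CPU utilization in percent
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory currently available, as a percentage of total
100 * node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes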
Step 5 – Install Grafana for Beautiful Data Visualization
Raw metric values can be hard to interpret in the Prometheus UI or over the CLI alone. For richer analysis, Grafana provides flexible dashboards with graphs, gauges, and breakdowns that mix multiple metrics together.
Pull and run the official Grafana image:
$ docker run -d \
    --name=grafana \
    -p 3000:3000 \
    --network=monitoring-net \
    grafana/grafana:latest
This runs Grafana connected to our monitoring services on port 3000.
Navigate to http://<server-ip>:3000 and log into Grafana using the default credentials:
- Username: admin
- Password: admin
Let’s set up our Prometheus data source next…
Step 6 – Configure Prometheus Data Source in Grafana
From the Grafana sidebar menu, click on “Configuration” then “Data Sources”.
Here, you can manage connections to monitoring databases like Prometheus, Graphite, InfluxDB, and more.
Select “Add Data Source” and set the following fields:
- Name: Prometheus
- Type: Prometheus
- URL: http://prometheus:9090 (because the containers share the monitoring-net network)
Then, click “Save & Test”. Grafana now has access to all metrics stored in Prometheus!
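Alternatively, if you prefer configuration as code, Grafana supports file-based provisioning: a YAML file bind-mounted into the container under /etc/grafana/provisioning/datasources/ is loaded at startup. A minimal sketch:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true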
Step 7 – Import Dashboard Templates
Rather than creating dashboards completely from scratch, we can leverage the Grafana community dashboard ecosystem with pre-built templates.
Hover over the “+” icon on the left menu and select “Import”. Then, enter a community dashboard ID such as 893 (Docker and system monitoring, for container metrics) or 1860 (Node Exporter Full, for the host view).
Grafana imports these templates pre-populated with graphs and breakdowns wired to our new Prometheus data source. Customize them or extend them with more focused dashboards!
Now, your Grafana instance should have insightful dashboards monitoring:
- Docker container resource usage (CPU, memory, network, disk).
- Docker host system performance (CPU, memory, network, disk).
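With every piece verified, you can optionally consolidate the individual docker run commands from the earlier steps into a single Docker Compose file for easier re-deployment. The following is a minimal sketch using the same images, ports, and volumes (here Compose creates and manages the network itself); it assumes the /prometheus-data host directory created earlier:
version: "3.8"

networks:
  monitoring-net:

services:
  prometheus:
    image: prom/prometheus:latest
    command: ["--config.file=/prometheus-data/prometheus.yml"]
    volumes:
      - /prometheus-data:/prometheus-data
    ports:
      - "9090:9090"
    networks:
      - monitoring-net

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring-net

  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    networks:
      - monitoring-net

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    networks:
      - monitoring-net
Bring the stack up with docker compose up -d and tear it down with docker compose down.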
Prometheus and Grafana now provide end-to-end data pipelines, storage, and visualization for container environments. Next, let’s discuss administrative functionality to maintain and scale our new monitoring stack.
Administering the Monitoring Services
Now that you have Prometheus, Node Exporter, cAdvisor, and Grafana all running, here are best practices for administering these over the long term.
Persisting Prometheus Metrics History
By default, Prometheus stores metrics locally on disk, which limits capacity and durability. For production systems, use remote storage to retain history for longer trend analysis and capacity planning.
Popular long-term stores compatible with Prometheus include:
- Thanos: A highly available, distributed Prometheus setup.
- Cortex: A horizontally scalable, multi-tenant Prometheus-as-a-Service.
- Cloud Provider Monitoring Solutions: AWS CloudWatch, Google Cloud Monitoring, Azure Monitor.
Configure these under the remote_write and remote_read sections of prometheus.yml.
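For illustration, a remote_write/remote_read pair might look like the following; the endpoint URLs are placeholders, and the real paths depend entirely on the backend you choose:
remote_write:
  - url: "http://remote-storage.example.com/api/v1/receive"   # placeholder endpoint
remote_read:
  - url: "http://remote-storage.example.com/api/v1/read"      # placeholder endpoint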
Retaining Dashboard History in Grafana
Grafana stores dashboards in its internal database (SQLite by default, kept under /var/lib/grafana inside the container). To persist them across container recreation, mount that directory as a volume or point Grafana at an external database such as PostgreSQL or MySQL via the [database] section of grafana.ini (or the equivalent GF_DATABASE_* environment variables). Additionally, sync dashboard JSON to source control for version history.
Limiting Data Cardinality
Because the number of per-container series can grow very quickly, watch for metric “cardinality explosions”. Use metric relabeling to filter what gets ingested and control storage growth; dropping less valuable metrics can also improve performance.
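As a sketch, the following metric_relabel_configs block on the cadvisor scrape job drops two cAdvisor series families before they are stored; the metric names are only examples of series you might decide you do not need:
scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'container_tasks_state|container_memory_failures_total'
        action: drop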
Horizontal Sharding
To distribute load as environments grow larger, run multiple Prometheus instances and shard the scrape targets between them, or use federation or a hashring-based setup such as Thanos. Similarly, Grafana can be made highly available by running several replicas behind a load balancer backed by a shared database.
Summary
In this guide, we have built a comprehensive monitoring stack for Docker hosts and container estates, including:
- Prometheus: for collecting and storing time-series metrics.
- cAdvisor: for monitoring container resource usage.
- Node Exporter: for monitoring host system performance.
- Grafana: for visualizing metrics in dashboards.
Together, these tools provide end-to-end visibility and alerts to detect anomalies across dynamic container environments.
Now that you have a working monitoring foundation, potential next steps include:
- Setting up alerting rules in Prometheus to trigger notifications on critical events (a minimal example follows this list).
- Creating custom Grafana dashboards tailored to your specific application needs.
- Integrating with logging systems like Fluentd or ELK stack for comprehensive observability.
- Exploring advanced monitoring techniques like distributed tracing.
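For the first item, a minimal Prometheus alerting rule file (referenced from prometheus.yml via rule_files and paired with an Alertmanager for notifications, which is beyond this guide) might look like this; the 1 GiB threshold is only an example:
groups:
  - name: container-alerts
    rules:
      - alert: ContainerHighMemoryUsage
        expr: container_memory_usage_bytes{image!=""} > 1073741824
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} has used more than 1 GiB of memory for 5 minutes"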
As you move containers and microservices to production, I hope this exploration into metrics, monitoring, and visibility helps run infrastructure reliably! Let me know if you have any other questions.
Alternative Solutions for Docker Monitoring
While the Prometheus, cAdvisor, Node Exporter, and Grafana stack is a popular and powerful solution, alternative approaches exist for Docker monitoring. Here are two different ways to solve the problem, along with explanations and code examples where applicable.
1. Using the ELK Stack (Elasticsearch, Logstash, Kibana)
The ELK stack, now often referred to as the Elastic Stack, is another widely used open-source platform for log management and analysis. While primarily known for logs, it can also be adapted to monitor Docker metrics.
Explanation:
Instead of relying on Prometheus as the time-series database, the ELK stack uses Elasticsearch. Logstash acts as the data pipeline, collecting logs and metrics from various sources, including Docker containers and hosts. Kibana provides the visualization and dashboarding capabilities.
To monitor Docker metrics with the ELK stack, you would typically use the following:
- Filebeat (or other Beats): Filebeat is a lightweight shipper that can forward log files or metrics to Logstash or directly to Elasticsearch.
- Docker Logging Driver: Configure Docker containers to send their logs to a specific driver (e.g., gelf, json-file, syslog).
- Metricbeat (optional): Metricbeat is another Beat that can collect system and service metrics, including Docker metrics, and ship them to Elasticsearch (a minimal configuration sketch follows the Filebeat example below).
Code Example (Filebeat Configuration):
This example shows a basic Filebeat configuration for collecting Docker container logs:
filebeat.inputs:
  # Read Docker's JSON log files directly from the host
  - type: container
    paths:
      - '/var/lib/docker/containers/*/*.log'
    processors:
      # Parse JSON-formatted application logs found in the "message" field
      - decode_json_fields:
          fields: ["message"]
          target: "json"
          process_array: true
          overwrite_keys: true

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
Explanation:
- filebeat.inputs: Defines the input source for Filebeat. Here, it is configured to read Docker container log files.
- type: container: Specifies that the input is container logs.
- paths: The path to the Docker container log files.
- processors: Uses the decode_json_fields processor to parse JSON-formatted log messages (if your containers log in JSON).
- output.elasticsearch: Configures Filebeat to send the data to Elasticsearch.
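If you also want numeric Docker metrics (not just logs) in Elasticsearch, Metricbeat's docker module covers this. A minimal sketch of metricbeat.yml might look like:
metricbeat.modules:
  - module: docker
    metricsets: ["container", "cpu", "memory", "network"]
    hosts: ["unix:///var/run/docker.sock"]
    period: 10s

output.elasticsearch:
  hosts: ["elasticsearch:9200"]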
Advantages of ELK Stack:
- Centralized Logging: Excellent for aggregating and analyzing logs from all your containers and hosts in one place.
- Powerful Search Capabilities: Elasticsearch provides robust search capabilities for quickly finding specific events or errors in your logs.
- Flexible Data Processing: Logstash allows for complex data transformation and enrichment.
Disadvantages of ELK Stack:
- Resource Intensive: ELK stack can be more resource-intensive than the Prometheus/Grafana stack, especially for large-scale deployments.
- Complexity: Configuring and managing the ELK stack can be complex, requiring a good understanding of Elasticsearch, Logstash, and Kibana.
- Not Optimized for Time-Series Data: While Elasticsearch can handle time-series data, it’s not specifically optimized for it like Prometheus.
2. Using Datadog
Datadog is a popular commercial monitoring and analytics platform that provides a comprehensive solution for monitoring Docker containers and hosts.
Explanation:
Datadog uses an agent that runs on each host to collect metrics, logs, and traces. The agent automatically discovers Docker containers and collects relevant metrics, such as CPU, memory, network, and disk usage.
Configuration:
- Install the Datadog Agent: Follow the Datadog documentation to install the agent on your Docker hosts.
- Enable the Docker Integration: The Datadog agent automatically discovers Docker containers. You can configure the integration to collect specific metrics and logs.
- Use Datadog Dashboards: Datadog provides pre-built dashboards for Docker monitoring, or you can create custom dashboards tailored to your needs.
Code Example (Illustrative – No actual code needed for basic Datadog setup):
While you don’t directly write code for basic Datadog setup, you might use configuration files to customize the Docker integration. Here’s an example of what a configuration file might contain (this is for illustrative purposes, refer to the Datadog documentation for the specific format):
init_config:

instances:
  - docker_url: "unix://var/run/docker.sock"
    collect_container_size: true
    collect_images_stats: true
    excluded_images:
      - "datadog/agent"
Explanation:
- docker_url: Specifies the URL for the Docker socket.
- collect_container_size: Enables the collection of container size metrics.
- collect_images_stats: Enables the collection of Docker image statistics.
- excluded_images: Excludes specific images from monitoring.
Advantages of Datadog:
- Easy to Use: Datadog is generally easier to set up and use than open-source solutions like Prometheus/Grafana or ELK.
- Comprehensive Features: Datadog provides a wide range of features, including monitoring, logging, tracing, security monitoring, and more.
- Excellent Support: Datadog offers excellent customer support.
Disadvantages of Datadog:
- Cost: Datadog is a commercial product and can be expensive, especially for large-scale deployments.
- Vendor Lock-in: Using Datadog can create vendor lock-in.
- Less Customization: While Datadog is highly configurable, it may not offer the same level of customization as open-source solutions.
Conclusion:
The choice of monitoring solution depends on your specific needs and priorities. If you need a free, open-source solution with a strong focus on time-series data, Prometheus/Grafana is a good choice. If you need centralized logging and powerful search capabilities, the ELK stack is a viable option. If you prioritize ease of use and comprehensive features, Datadog is a strong contender. The Prometheus, cAdvisor, Node Exporter, and Grafana stack covered in this guide remains a great default, but it is worth knowing these alternatives exist.