Learning - Prometheus Monitor
Monitoring Tool for
- Highly dynamic container environments
- Container & Microservices Infrastructure
- Traditional bare-metal servers
- constantly monitor all the services
- alert when a service crashes
- identify problems before they occur
- checking memory usage
- notify administrator
- Trigger alert at 50%
- Monitor network loads
Prometheus Server
Does the actual monitoring work
- Time Series Database: Storage - stores metrics data (CPU usage, number of exceptions)
- Data Retrieval Worker: Retrieval - pulls metrics data (applications, servers, ...)
- HTTP Server: accepts PromQL queries (from Prometheus Web UI, Grafana, etc.)
Targets and Metrics
Targets
What does Prometheus monitor?
- Linux/Windows Server
- Single Application
- Apache Server
- Service, like Database
Which units of those targets are monitored?
- CPU Status
- Memory/Disk Space Usage
- Requests Count
- Exceptions Count
- Request Duration
Metrics
- Format: human-readable, text-based
- Metrics entries: TYPE and HELP attributes
- HELP - description of what the metric is
- TYPE - the metric type; 3 common types:
  - Counter - how many times did x happen?
  - Gauge - what is the current value of x now?
  - Histogram - how long or how big?
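To make the HELP and TYPE attributes concrete, the following is a minimal Python sketch that renders one metric in the text-based format described above. The helper name `render_metric` and the metric `http_requests_total` with its `status` label are assumptions made for illustration, not something the notes prescribe.

```python
def render_metric(name, metric_type, help_text, samples):
    """Render one metric family as Prometheus-style text.

    samples is a list of (labels_dict, value) pairs; an empty dict
    means the sample carries no labels.
    """
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for labels, value in samples:
        if labels:
            # Labels are written as key="value" pairs inside braces.
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

Calling `render_metric("http_requests_total", "counter", "Total HTTP requests.", [({"status": "200"}, 1027)])` yields a HELP line, a TYPE line, and one sample line, which is the shape a scraper expects to find behind /metrics.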
Collecting Metrics Data from Targets
Data Retrieval Worker => pull over HTTP => Target (Linux Server, External Service)
- Pulls from HTTP endpoints: hostaddress/metrics
- Data must be in the correct format
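The pull over HTTP can be sketched in a few lines of Python using only the standard library. This is a simplified illustration, not how the Prometheus server is implemented: the function names `scrape` and `parse_samples` are hypothetical, and the parser assumes sample lines of the form `series value` without trailing timestamps.

```python
from urllib.request import urlopen


def parse_samples(text):
    """Parse text-format metrics into a {series: value} dict.

    Simplification: comment lines (# HELP, # TYPE) are skipped and each
    sample line is assumed to end with its value, with no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


def scrape(host_address, timeout=5.0):
    """Pull metrics from http://<host_address>/metrics and parse them."""
    with urlopen(f"http://{host_address}/metrics", timeout=timeout) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

A target that answers with malformed lines would make `parse_samples` raise, which mirrors the note above that the data must be in the correct format.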
Target Endpoints and Exporters
- Some services expose /metrics endpoints by default
- Many services need another component
Exporter
- fetches metrics from target (some service)
- converts to correct format
- exposes /metrics endpoint
List of official exporters ...
Monitor a Linux Server?
- download the Node Exporter
- untar and execute
- converts metrics of the server
- exposes /metrics endpoint
- configure Prometheus to scrape this endpoint
Monitoring your own applications
- How many requests?
- How many exceptions?
- How many server resources are used?
Using client libraries, you can expose a /metrics endpoint
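As a rough idea of what a client library does for you, here is a sketch that exposes request and exception counters on /metrics using only the Python standard library. In practice you would more likely use an official client library (for Python, `prometheus_client`); the class `AppMetrics`, the metric names, and `start_metrics_server` below are hypothetical names chosen for this example.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class AppMetrics:
    """Thread-safe in-process counters (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.requests = 0
        self.exceptions = 0

    def inc_requests(self):
        with self._lock:
            self.requests += 1

    def inc_exceptions(self):
        with self._lock:
            self.exceptions += 1

    def render(self):
        """Return the current counters in the text format."""
        with self._lock:
            return (
                "# HELP app_requests_total Requests handled.\n"
                "# TYPE app_requests_total counter\n"
                f"app_requests_total {self.requests}\n"
                "# HELP app_exceptions_total Exceptions raised.\n"
                "# TYPE app_exceptions_total counter\n"
                f"app_exceptions_total {self.exceptions}\n"
            )


METRICS = AppMetrics()


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_metrics_server(port=0):
    """Serve /metrics on localhost in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The application calls `METRICS.inc_requests()` and `METRICS.inc_exceptions()` in its own code paths, and Prometheus is then configured to scrape the port returned by `start_metrics_server().server_address`.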
Pull Mechanism
Data Retrieval Worker pulls from the targets' /metrics endpoints
Push system
Amazon CloudWatch, New Relic - applications/servers push to a centralized collection platform
- high load of network traffic
- monitoring can become your bottleneck
- install additional software or tool to push metrics
Pull system - more advantages
- multiple Prometheus instances can pull metrics data
- better detection/insight if service is up and running
Pushgateway
What if a target only runs for a short time?
"short-lived job" => push metrics at exit => Pushgateway
Pushgateway <= pull <= Prometheus Server
Prometheus targets <= pull <= Prometheus Server
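The "push metrics at exit" step for a short-lived job might look like the sketch below. It assumes a Pushgateway reachable at a hypothetical address such as `localhost:9091` and uses the Pushgateway's `/metrics/job/<job_name>` HTTP path; the job name, the metric name, and both function names are made up for illustration.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_push_request(gateway, job, body):
    """Build an HTTP PUT that replaces the metrics stored for one job.

    gateway is "host:port" of the Pushgateway (an assumption of this sketch);
    body is metrics text in the same format a /metrics endpoint would serve.
    """
    url = f"http://{gateway}/metrics/job/{quote(job, safe='')}"
    return Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push_at_exit(gateway, job, duration_seconds):
    """Send the job's final metrics to the Pushgateway before the process ends."""
    body = (
        "# HELP batch_job_duration_seconds Duration of the last run.\n"
        "# TYPE batch_job_duration_seconds gauge\n"
        f"batch_job_duration_seconds {duration_seconds}\n"
    )
    with urlopen(build_push_request(gateway, job, body), timeout=5) as resp:
        return resp.status
```

The Prometheus server never talks to the batch job itself; it scrapes the Pushgateway like any other target, which is why the job only needs to push once before it exits.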
Configuring Prometheus
How does Prometheus know what to scrape and when?
- prometheus.yml
  - which targets?
  - at what interval?
- service discovery
service discovery <= discover targets <= Prometheus Server
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  # - "first.rules"
  # - "second.rules"
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  - job_name: node_exporter
    scrape_interval: 1m
    scrape_timeout: 1m
    static_configs:
      - targets: ['localhost:9100']
- global: how often Prometheus will scrape its targets and evaluate its rules
- rule_files: rules for aggregating metric values or creating alerts when a condition is met
- scrape_configs: what resources Prometheus monitors
- Prometheus has its own /metrics endpoint
Alertmanager
- How does Prometheus trigger the alerts?
- Who receives the alerts?
Prometheus Server => push alerts => Alertmanager => Email, Slack, etc.
Prometheus Data Storage
Where does Prometheus store the data?
- Local - Disk (HDD/SSD)
- Remote Storage Systems
- Custom Time Series Format
- Can't write Prometheus data directly into a relational database
PromQL Query Language
Prometheus Web UI => PromQL => Prometheus Server
Data Visualization Tools => PromQL => Prometheus Server
- Query target directly
- Or use more powerful visualization tools - e.g. Grafana
PromQL Query
Query all HTTP status codes except 4xx ones
http_requests_total{status!~"4.."}
Returns the 5-minute rate of the http_requests_total metric over the past 30 minutes
rate(http_requests_total[5m])[30m:]
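To show what the two queries above compute, here is a small Python emulation. It is a sketch under simplifying assumptions, not the PromQL engine: `not_regex_match` mimics a `!~` matcher (PromQL regexes are fully anchored, hence `fullmatch`), and `simple_rate` computes the average per-second increase over a window while ignoring the counter resets and extrapolation that the real `rate()` accounts for. Both function names are hypothetical.

```python
import re


def not_regex_match(series, label, pattern):
    """Keep series whose label does NOT fully match the regex.

    series is a list of label dicts; a missing label is treated as
    an empty string, as in PromQL.
    """
    regex = re.compile(pattern)
    return [s for s in series if regex.fullmatch(s.get(label, "")) is None]


def simple_rate(samples):
    """Average per-second increase of a counter over a window.

    samples is a time-ordered list of (timestamp_seconds, value) pairs.
    Simplification: no counter-reset handling and no extrapolation.
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)
```

For example, filtering statuses `200`, `404`, and `500` with the pattern `4..` keeps `200` and `500`, and a counter that grows from 100 to 400 across a 300-second window gives a rate of 1.0 requests per second.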
Prometheus Characteristics
Pros
- reliable
- stand-alone and self-contained
- works even if other parts of the infrastructure are broken
- no extensive set-up needed
- less complex
Cons
- difficult to scale
- a single server limits how much can be monitored
Workarounds
- increase Prometheus server capacity
- limit number of metrics
Prometheus with Docker and Kubernetes
- fully compatible
- Prometheus components available as Docker images
- can easily be deployed in Container Environments like Kubernetes
- Monitoring of K8s cluster node resources out of the box
References
How Prometheus Monitoring works | Prometheus Architecture explained