Learning - Prometheus Monitor

Monitoring Tool for

Highly dynamic container environments
Container & Microservices Infrastructure
Traditional bare-metal servers
constantly monitor all the services
alert when a service crashes
identify problems before they occur
checking memory usage
notify administrator
Trigger alert at 50%
Monitor network loads

Prometheus Server

Does the actual monitoring work

Time Series Database
Storage - stores metrics data (CPU usage, number of exceptions)
Data Retrieval Worker
Retrieval - pulls metrics data (Applications, Servers, ...)
Accepts PromQL queries
HTTP Server - accepts queries
Prometheus Web UI
Grafana
etc.

Targets and Metrics

Targets

What does Prometheus monitor?
- Linux/Windows Server
- Single Application
- Apache Server
- Service, like Database
Which units of those targets are monitored?
- CPU Status
- Memory/Disk Space Usage
- Requests Count
- Exceptions Count
- Request Duration

Metrics

Format: Human-readable text-based
Metrics entries: TYPE and HELP attributes
HELP - description of what the metric is
TYPE - one of 3 main metric types (Prometheus also defines a fourth, Summary)
- Counter - how many times x happened
- Gauge - what is the current value of x now?
- Histogram - how long or how big?
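
To make the text format concrete, here is a minimal sketch of what a /metrics response looks like and how its HELP and TYPE lines can be read. The metric memory_usage_bytes and the sample values are made up for illustration, and the parser only handles label-free samples, so it is not a full implementation of the exposition format.

```python
# Sample /metrics output. The values and memory_usage_bytes are hypothetical.
SAMPLE = """\
# HELP http_requests_total Total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total 1027
# HELP memory_usage_bytes Current memory usage in bytes.
# TYPE memory_usage_bytes gauge
memory_usage_bytes 524288
"""


def parse_metrics(text):
    """Parse label-free exposition text into {name: {"help", "type", "value"}}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# HELP ") or line.startswith("# TYPE "):
            # "# HELP <name> <description>" or "# TYPE <name> <type>"
            _, keyword, name, rest = line.split(" ", 3)
            metrics.setdefault(name, {})[keyword.lower()] = rest
        elif not line.startswith("#"):
            # "<name> <value>" (labels and timestamps are not handled here)
            name, value = line.split(" ", 1)
            metrics.setdefault(name, {})["value"] = float(value)
    return metrics
```

Calling parse_metrics(SAMPLE) gives one entry per metric, each with its help text, type and current value.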

Collecting Metrics Data from Targets

Data Retrieval Worker => pull over HTTP => Target (Linux Server, External Service)

Pulls from HTTP endpoints
hostaddress/metrics
must be in correct format
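
The pull step can be sketched with the Python standard library alone: a tiny target serves text at /metrics and a scrape function fetches it over HTTP. The metric up_demo, the handler class and the function names are hypothetical, and a real Prometheus server does much more (scheduling, timeouts per job, storage).

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric served by the demo target.
METRICS_TEXT = "# HELP up_demo Demo gauge.\n# TYPE up_demo gauge\nup_demo 1\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """A target that exposes its metrics at /metrics."""

    def do_GET(self):
        if self.path == "/metrics":
            body = METRICS_TEXT.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def scrape(host, port, timeout=5.0):
    """Pull the target's metrics over HTTP, as the retrieval worker would."""
    url = f"http://{host}:{port}/metrics"
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def demo():
    """Start the target on a free local port, scrape it once, shut it down."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return scrape("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

demo() returns the same text the target serves, which is what the server would then parse and store.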

Target Endpoints and Exporters

Some services expose a /metrics endpoint by default
Many services need another component to do so

Exporter

fetches metrics from target (some service)
converts to correct format
expose /metrics
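
The three exporter steps can be sketched as a small conversion function. Everything here is hypothetical: the native stats, the demo_db_* metric names and the mapping table stand in for whatever a real service reports, and official exporters are far more complete.

```python
# Hypothetical mapping: native stat key -> (metric name, type, help text)
MAPPING = {
    "connections": ("demo_db_connections", "gauge", "Open client connections."),
    "queries": ("demo_db_queries_total", "counter", "Queries served since start."),
}


def fetch_native_stats():
    """Stand-in for querying the real service in its own format."""
    return {"connections": 12, "queries": 3456, "version": "1.2.3"}


def to_exposition(stats):
    """Convert native stats into the text format served at /metrics."""
    lines = []
    for key, (name, metric_type, help_text) in MAPPING.items():
        if key not in stats:
            continue  # skip stats the service did not report
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {stats[key]}")
    return "\n".join(lines) + "\n"
```

Stats without a mapping (such as version above) are simply dropped; the result of to_exposition(fetch_native_stats()) is what the exporter would serve at its own /metrics endpoint.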

List of official exporters ...

Monitor a Linux Server?

download a node exporter
untar and execute
converts metrics of the server
exposes /metrics endpoint
configure prometheus to scrape this endpoint

Monitoring your own applications

How many requests?
How many exceptions?
How many server resources are used?

Using client libraries you can expose /metrics endpoint
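
As a rough sketch of what a client library does for you, the hand-rolled counter below tracks requests and exceptions inside an application. The Counter class, the instrumented decorator and the app_* metric names are stand-ins invented for this example; in practice you would use the official client library for your language instead.

```python
import functools
import threading


class Counter:
    """Minimal stand-in for a client-library counter (only ever goes up)."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters can only increase")
        with self._lock:
            self._value += amount

    @property
    def value(self):
        with self._lock:
            return self._value

    def render(self):
        """Render this counter in the text format served at /metrics."""
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {self.value}\n"
        )


REQUESTS = Counter("app_requests_total", "Requests handled.")
EXCEPTIONS = Counter("app_exceptions_total", "Requests that raised an exception.")


def instrumented(handler):
    """Count every call to handler, and every call that raises."""

    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        REQUESTS.inc()
        try:
            return handler(*args, **kwargs)
        except Exception:
            EXCEPTIONS.inc()
            raise

    return wrapper


@instrumented
def handle(path):
    """Hypothetical request handler; "/boom" simulates a failure."""
    if path == "/boom":
        raise RuntimeError("simulated failure")
    return "ok"
```

Serving REQUESTS.render() + EXCEPTIONS.render() at /metrics answers the first two questions above; resource usage would need gauges fed from the runtime or operating system.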

Pull Mechanism

Data Retrieval Worker pulls from the targets' /metrics endpoints

Push system

Amazon CloudWatch, New Relic - Applications/Servers push to a centralized collection platform

high load of network traffic
monitoring can become your bottleneck
install additional software or tool to push metrics

Pull system - more advantages

multiple Prometheus instances can pull metrics data
better detection/insight if service is up and running

Pushgateway

What if a target only runs for a short time?

"short-lived job" => push metrics at exit => Pushgateway

Pushgateway <= pull <= Prometheus Server
Prometheus targets <= pull <= Prometheus Server
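
A short-lived job pushing at exit could look roughly like this. The Pushgateway accepts metrics in the text format via HTTP PUT or POST at /metrics/job/<job_name>; the gateway address, the job name and the batch_job_duration_seconds metric are assumptions made for the example.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway, job, metrics_text):
    """Build a PUT request that replaces all metrics pushed for this job."""
    url = f"http://{gateway}/metrics/job/{quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=metrics_text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push_at_exit(gateway, job, duration_seconds):
    """Push one gauge describing the finished run (needs a running Pushgateway)."""
    body = (
        "# TYPE batch_job_duration_seconds gauge\n"
        f"batch_job_duration_seconds {duration_seconds}\n"
    )
    request = build_push_request(gateway, job, body)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

A job would call something like push_at_exit("localhost:9091", "nightly_backup", 42.5) as its last step; Prometheus then scrapes the Pushgateway like any other target.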

Configuring Prometheus

How does Prometheus know what to scrape and when?

prometheus.yml
- which targets?
- at what interval?
service discovery
service discovery <= discover targets <= Prometheus Server

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  - job_name: node_exporter
    scrape_interval: 1m
    scrape_timeout: 1m
    static_configs:
      - targets: ['localhost:9100']

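global - how often Prometheus scrapes its targets (scrape_interval) and evaluates its rules (evaluation_interval)
rule_files - rules for aggregating metric values or creating alerts when a condition is met
scrape_configs - what resources Prometheus monitors
- Prometheus has its own /metrics endpoint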
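
To see how the global and per-job settings combine, the sketch below mirrors the config above as a Python dict (to avoid a YAML dependency) and works out the effective interval and timeout for each job. It assumes Prometheus' documented defaults of 1m for scrape_interval and 10s for scrape_timeout, handles only single-unit durations such as 15s or 1m, and simplifies the real validation logic.

```python
import re

_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration(text):
    """Convert a single-unit duration such as "15s" or "1m" to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d|w|y)", text)
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


# The prometheus.yml from these notes, written out as a dict.
CONFIG = {
    "global": {"scrape_interval": "15s", "evaluation_interval": "15s"},
    "scrape_configs": [
        {"job_name": "prometheus",
         "static_configs": [{"targets": ["localhost:9090"]}]},
        {"job_name": "node_exporter",
         "scrape_interval": "1m",
         "scrape_timeout": "1m",
         "static_configs": [{"targets": ["localhost:9100"]}]},
    ],
}


def effective_schedule(config):
    """Return {job_name: (interval_seconds, timeout_seconds)}."""
    global_cfg = config.get("global", {})
    schedule = {}
    for job in config["scrape_configs"]:
        # Job-level values override the global ones, which override defaults.
        interval = job.get("scrape_interval", global_cfg.get("scrape_interval", "1m"))
        timeout = job.get("scrape_timeout", global_cfg.get("scrape_timeout", "10s"))
        if parse_duration(timeout) > parse_duration(interval):
            raise ValueError(f"{job['job_name']}: scrape_timeout exceeds scrape_interval")
        schedule[job["job_name"]] = (parse_duration(interval), parse_duration(timeout))
    return schedule
```

With the config above, the prometheus job inherits the global 15s interval while node_exporter uses its own 1m interval and timeout.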

Alertmanager

How does Prometheus trigger the alerts?
Who receives the alerts?

Prometheus Server => push alerts => Alertmanager => Email, Slack, etc.

Prometheus Data Storage

Where does Prometheus store the data?

Local - Disk (HDD/SSD)
Remote Storage Systems
Custom Time Series Format
- Can't write Prometheus data directly into a relational database

PromQL Query Language

Prometheus Web UI => PromQL => Prometheus Server
Data Visualization Tools => PromQL => Prometheus Server

Query a target's metrics directly in the Prometheus Web UI
Or use more powerful visualization tools - e.g. Grafana

PromQL Query

Query all HTTP status codes except 4xx ones

http_requests_total{status!~"4.."}
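
What the !~ matcher does can be emulated in a few lines. PromQL regex matchers are fully anchored, so "4.." matches only three-character values starting with 4. The series list is made up, and Python's re module is used here in place of the RE2 syntax that Prometheus uses, which behaves the same for this simple pattern.

```python
import re


def not_regex_match(value, pattern):
    """Emulate PromQL's !~ matcher: true when the WHOLE value does not match."""
    return re.fullmatch(pattern, value) is None


# Hypothetical series, identified by their labels.
SERIES = [
    {"__name__": "http_requests_total", "status": "200"},
    {"__name__": "http_requests_total", "status": "404"},
    {"__name__": "http_requests_total", "status": "500"},
    {"__name__": "http_requests_total", "status": "403"},
    {"__name__": "http_requests_total", "status": "4040"},
    {"__name__": "other_metric_total", "status": "200"},
]


def select(series, name, label, pattern):
    """Return the series of one metric whose label does not match pattern."""
    return [
        s for s in series
        if s["__name__"] == name and not_regex_match(s.get(label, ""), pattern)
    ]
```

select(SERIES, "http_requests_total", "status", "4..") keeps the 200, 500 and 4040 series: 404 and 403 are excluded, while 4040 survives because the pattern must match the entire value.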

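Returns the 5-minute rate of the http_requests_total metric for the past 30 minutes (a subquery; the empty resolution after the colon falls back to the default evaluation interval)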

rate(http_requests_total[5m])[30m:]
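
The idea behind this query can be sketched as follows: compute a per-second rate over a 5-minute window, then repeat that at regular steps across the last 30 minutes. This is a simplification under stated assumptions: the real rate() also extrapolates to the window boundaries, and real subqueries align their steps differently, so the numbers will not match Prometheus exactly. The function names and the sample data are hypothetical.

```python
def simple_rate(samples, window_end, window_seconds):
    """Per-second increase of a counter over (window_end - window_seconds, window_end].

    samples is a time-sorted list of (timestamp_seconds, counter_value).
    Returns None when fewer than two samples fall inside the window.
    """
    start = window_end - window_seconds
    in_window = [(t, v) for t, v in samples if start < t <= window_end]
    if len(in_window) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        # A drop in value means the counter was reset and restarted from zero.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = in_window[-1][0] - in_window[0][0]
    return increase / elapsed


def subquery(samples, now, range_seconds, step_seconds, window_seconds):
    """Evaluate simple_rate at each step over the past range_seconds."""
    points = []
    for t in range(now - range_seconds, now + 1, step_seconds):
        points.append((t, simple_rate(samples, t, window_seconds)))
    return points
```

For a counter that grows by one per second and is scraped every 15 seconds, subquery(samples, 3600, 1800, 60, 300) yields a rate of 1.0 at every step.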

Prometheus Characteristics

Pros

reliable
stand-alone and self-contained
works even if other parts of the infrastructure are broken
no extensive set-up needed
less complex

Cons

difficult to scale
limits monitoring

Workarounds

increase Prometheus server capacity
limit number of metrics

Prometheus with Docker and Kubernetes

fully compatible
Prometheus components available as Docker images
can easily be deployed in Container Environments like Kubernetes
Monitoring of K8s Cluster Node Resources out of the box!

References

How Prometheus Monitoring works | Prometheus Architecture explained


Day: March 13, 2022