Learning - Prometheus Monitor
Monitoring Tool for
Highly dynamic container environments
Container & Microservices Infrastructure
Traditional, bare server
constantly monitor all the services
alert when a service crashes
identify problems before they occur
checking memory usage
Trigger alert at 50%
Monitor network loads
Does the actual monitoring work
Time Series Database
Storage - stores metrics data (CPU usage, number of exceptions)
Data Retrieval Worker
Retrieval - pulls metrics data (Applications, Servers, ...)
Accepts PromQL queries
HTTP Server - accepts queries
Prometheus Web UI
Targets and Metrics
What does Prometheus monitor?
- Linux/Windows Server
- Single Application
- Apache Server
- Service, like Database
Which units of those targets are monitored?
- CPU Status
- Memory/Disk Space Usage
- Requests Count
- Exceptions Count
- Request Duration
- Format: Human-readable text-based
- Metrics entries: TYPE and HELP attributes
HELP - description of what the metric is
TYPE - 3 metrics types
- Counter - how many times x happened
- Gauge - what is the current value of x now?
- Histogram - how long or how big?
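The three metric types above can be illustrated with a small sketch of the text format. The metric names, the sample values, and the helper `render` are made up for illustration; a real application would normally rely on an official client library instead of formatting lines by hand.

```python
def render(name, mtype, help_text, samples):
    """Render one metric as HELP and TYPE lines followed by its samples.

    samples is a list of (suffix_with_labels, value) pairs,
    e.g. ("", 42) or ('_bucket{le="0.5"}', 3).
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {mtype}"]
    for suffix, value in samples:
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines)


# Counter: how many times something happened (only goes up).
counter = render("http_requests_total", "counter",
                 "Total HTTP requests handled.", [("", 1027)])

# Gauge: the current value, which can go up and down.
gauge = render("memory_usage_bytes", "gauge",
               "Current memory usage in bytes.", [("", 524288)])

# Histogram: observations counted into cumulative buckets, plus sum and count.
durations = [0.1, 0.3, 0.7, 1.2]   # hypothetical request durations in seconds
bounds = [0.5, 1.0]                # hypothetical bucket upper bounds
hist_samples = [(f'_bucket{{le="{b}"}}', sum(d <= b for d in durations))
                for b in bounds]
hist_samples.append(('_bucket{le="+Inf"}', len(durations)))
hist_samples.append(("_sum", round(sum(durations), 3)))
hist_samples.append(("_count", len(durations)))
histogram = render("request_duration_seconds", "histogram",
                   "Request duration in seconds.", hist_samples)

print("\n".join([counter, gauge, histogram]))
```

Each bucket counts every observation less than or equal to its bound, which is why the counts never decrease from one bucket to the next.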
Collecting Metrics Data from Targets
Data Retrieval Worker => pull over HTTP => Target (Linux Server, External Service)
- Pulls from HTTP endpoints
- must be in correct format
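As a rough illustration of the pull step, the sketch below fetches a target's metrics over HTTP and parses the text body. It is not how Prometheus itself is implemented; the function names are hypothetical, and the parser is deliberately simplified (it ignores HELP/TYPE comments, does not split out labels, and assumes samples carry no trailing timestamp).

```python
import urllib.request


def parse_metrics(text):
    """Parse exposition text into {sample_name_with_labels: float}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, value = line.rsplit(" ", 1)  # the value is the last field
        samples[name] = float(value)
    return samples


def scrape(url, timeout=10):
    """Pull one target's metrics over HTTP (not called in this sketch)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


example_body = """\
# HELP http_requests_total Total HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{status="200"} 1027
http_requests_total{status="500"} 3
"""
print(parse_metrics(example_body))
```

A body that is not in this format would fail at `float(value)` or at the split, which mirrors why targets must expose metrics in the correct format.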
Target Endpoints and Exporters
/metrics endpoint by default
- Many services need another component
- fetches metrics from target (some service)
- converts to correct format
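The two exporter steps above (fetch, then convert) can be sketched with the Python standard library. Everything service-specific here is an assumption: `fetch_service_stats` stands in for querying a real service, the `myservice` prefix and the port 9200 are arbitrary, and the rule "names ending in _total are counters" is only a convention used for this sketch.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_service_stats():
    """Stand-in for querying the real service (hypothetical values)."""
    return {"connections_active": 12, "queries_total": 3400}


def to_exposition(stats, prefix="myservice"):
    """Convert a flat dict of numbers into text-format metric lines."""
    lines = []
    for key in sorted(stats):
        mtype = "counter" if key.endswith("_total") else "gauge"
        lines.append(f"# TYPE {prefix}_{key} {mtype}")
        lines.append(f"{prefix}_{key} {stats[key]}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the converted metrics at /metrics and 404s everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = to_exposition(fetch_service_stats()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type",
                         "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9200):
    """Serve /metrics until interrupted (not called in this sketch)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


print(to_exposition(fetch_service_stats()), end="")
```

Calling `serve()` would start the endpoint that Prometheus can then be configured to scrape.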
List of official exporters ...
Monitor a Linux Server?
- download a node exporter
- untar and execute
- converts metrics of the server
- configure Prometheus to scrape this endpoint
Monitoring your own applications
- How many requests?
- How many exceptions?
- How many server resources are used?
Using client libraries you can expose
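To make the three questions above concrete, here is a dependency-free sketch of instrumenting a request handler. It does not use a real client library (for Python the official one is `prometheus_client`); the `METRICS` dict, the `instrumented` decorator, and `handle_request` are hypothetical names chosen for this example.

```python
import functools
import time

# In-process metric store; a client library would manage this for you.
METRICS = {"requests_total": 0, "exceptions_total": 0, "request_seconds_sum": 0.0}


def instrumented(func):
    """Count calls, count exceptions, and add up time spent in func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        METRICS["requests_total"] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            METRICS["exceptions_total"] += 1
            raise  # the caller still sees the original error
        finally:
            METRICS["request_seconds_sum"] += time.perf_counter() - start
    return wrapper


@instrumented
def handle_request(payload):
    if payload is None:
        raise ValueError("empty payload")
    return payload.upper()


handle_request("ok")
try:
    handle_request(None)
except ValueError:
    pass

print(METRICS["requests_total"], METRICS["exceptions_total"])
```

The counters collected this way would then be exposed on the application's metrics endpoint for Prometheus to pull.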
Data Retrieval Worker pulls from targets
Amazon CloudWatch, New Relic - Applications/Servers push to a centralized collection platform
- high load of network traffic
- monitoring can become your bottleneck
- install additional software or tool to push metrics
Pull system - more advantages
- multiple Prometheus instances can pull metrics data
- better detection/insight if service is up and running
What if a target only runs for a short time?
"short-lived job" => push metrics at exit => Pushgateway
Pushgateway <= pull <= Prometheus Server
Prometheus targets <= pull <= Prometheus Server
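A short-lived job pushing at exit could look roughly like the sketch below. It only builds the HTTP request and does not send it. The gateway address `http://localhost:9091`, the job name `nightly_backup`, and the metric name are assumptions for illustration; the `/metrics/job/<job_name>` path follows the Pushgateway's documented push API.

```python
import urllib.request


def build_push_request(gateway, job, metrics_text):
    """Build (but do not send) a PUT that replaces all metrics for job."""
    url = f"{gateway.rstrip('/')}/metrics/job/{job}"
    return urllib.request.Request(
        url,
        data=metrics_text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# The body uses the same text format as a scraped endpoint and must end
# with a newline.
body = (
    "# TYPE backup_last_success_timestamp_seconds gauge\n"
    "backup_last_success_timestamp_seconds 1700000000\n"
)
req = build_push_request("http://localhost:9091", "nightly_backup", body)
print(req.get_method(), req.full_url)

# A real job would call urllib.request.urlopen(req) just before exiting.
```

Prometheus then scrapes the Pushgateway like any other target, so the pushed values outlive the job that produced them.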
How does Prometheus know what to scrape and when?
- which targets?
- at what interval?
service discovery <= discover targets <= Prometheus Server
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  # - "first.rules"
  # - "second.rules"
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  - job_name: node_exporter
    scrape_interval: 1m
    scrape_timeout: 1m
    static_configs:
      - targets: ['localhost:9100']
- global: how often Prometheus will scrape its targets
- rule_files: rules for aggregating metric values or creating alerts when a condition is met
- scrape_configs: what resources Prometheus monitors
- Prometheus has its own
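How the global interval and the per-job override in the sample configuration combine can be sketched as follows. The configuration is copied into a plain Python dict to avoid a YAML dependency, and `scrape_plan` is a hypothetical helper, not part of Prometheus.

```python
# The sample configuration, written as a dict.
config = {
    "global": {"scrape_interval": "15s", "evaluation_interval": "15s"},
    "scrape_configs": [
        {"job_name": "prometheus",
         "static_configs": [{"targets": ["localhost:9090"]}]},
        {"job_name": "node_exporter",
         "scrape_interval": "1m",
         "scrape_timeout": "1m",
         "static_configs": [{"targets": ["localhost:9100"]}]},
    ],
}


def scrape_plan(cfg):
    """Return (job, target, interval) for every statically configured target."""
    default = cfg["global"]["scrape_interval"]
    plan = []
    for job in cfg["scrape_configs"]:
        # A job-level scrape_interval overrides the global default.
        interval = job.get("scrape_interval", default)
        for static in job.get("static_configs", []):
            for target in static["targets"]:
                plan.append((job["job_name"], target, interval))
    return plan


for job_name, target, interval in scrape_plan(config):
    print(f"{job_name}: scrape {target} every {interval}")
```

Here the `prometheus` job inherits the global 15s interval, while `node_exporter` is scraped once a minute.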
- How does Prometheus trigger the alerts?
- Who receives the alerts?
Prometheus Server => push alerts => Alertmanager => Email, Slack, etc.
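The triggering side can be approximated with a small sketch: a condition must stay true for a hold period before the alert fires. This is loosely modeled on how alerting rules wait before firing and is not Prometheus's actual implementation; the function name, the 50% threshold, the 30-second hold, and the sample values are assumptions. Routing to email or Slack, which is the Alertmanager's job, is not modeled.

```python
def alert_states(samples, threshold, hold_for):
    """Evaluate (timestamp_seconds, value) samples in order.

    Returns the alert state after each evaluation: "inactive" while the
    value is at or below the threshold, "pending" once it is above, and
    "firing" after it has stayed above for at least hold_for seconds.
    """
    states = []
    active_since = None
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts
            held = ts - active_since
            states.append("firing" if held >= hold_for else "pending")
        else:
            active_since = None  # condition cleared, reset the timer
            states.append("inactive")
    return states


# Hypothetical memory usage in percent, sampled every 15 seconds.
memory_pct = [(0, 40), (15, 55), (30, 60), (45, 58), (60, 45)]
print(alert_states(memory_pct, threshold=50, hold_for=30))
```

The hold period keeps a brief spike from paging anyone; only a sustained breach reaches the firing state and would be pushed on to the Alertmanager.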
Prometheus Data Storage
Where does Prometheus store the data?
Local - Disk (HDD/SSD)
Remote Storage Systems
Custom Time Series Format
- Can't write Prometheus data directly into a relational database
PromQL Query Language
Prometheus Web UI => PromQL => Prometheus Server
Data Visualization Tools => PromQL => Prometheus Server
- Query target directly
- Or use more powerful visualization tools - e.g. Grafana
Query all HTTP status codes except 4xx ones
Returns the 5-minute rate of the http_requests_total metric for the past 30 minutes
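The two descriptions above correspond to PromQL expressions along the lines of the ones in this sketch, which also shows how a tool could wrap them in a request to the server's HTTP query endpoint (`/api/v1/query`). The server address is taken from the sample configuration, the `status` label name and the 1-minute subquery resolution are assumptions, and `instant_query_url` is a hypothetical helper. No request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://localhost:9090"  # assumed Prometheus server address

QUERIES = {
    # All HTTP status codes except 4xx ones (negative regex match on a label).
    "non_4xx": 'http_requests_total{status!~"4.."}',
    # 5-minute rate over the past 30 minutes, at an assumed 1-minute resolution.
    "rate_subquery": "rate(http_requests_total[5m])[30m:1m]",
}


def instant_query_url(promql, base=BASE):
    """Build the URL for an instant query, URL-encoding the expression."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


for name, promql in QUERIES.items():
    print(name, instant_query_url(promql))
```

The Web UI and visualization tools such as Grafana send the same kind of expressions to the server; only the front end differs.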
- stand-alone and self-containing
- works, even if other parts of infrastructure broken
- no extensive set-up needed
- less complex
- difficult to scale
- limits monitoring
- increase Prometheus server capacity
- limit number of metrics
Prometheus with Docker and Kubernetes
- fully compatible
- Prometheus components available as Docker images
- can easily be deployed in Container Environments like Kubernetes
- Monitoring of K8s Cluster Node Resources out of the box!