# Prometheus + PromQL: observability metrics that matter
Prometheus scrapes and stores time-series metrics, and PromQL is the language for querying them. A handful of query patterns, covered below, answer most day-to-day operational questions.
## Basic queries
```promql
http_requests_total
http_requests_total{job="api"} # filter by label
http_requests_total{status=~"5.."} # regex (5xx)
```
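To make the selector semantics concrete, here is a minimal Python sketch of label matching. `matches` is a hypothetical helper, not part of any Prometheus library. It assumes, as Prometheus does, that regex matchers must match the whole label value. Prometheus uses RE2 syntax, which Python's `re` only approximates for simple patterns like these.

```python
import re


# Hypothetical helper: mimics how a PromQL selector filters series by label.
# "=" / "!=" compare exactly; "=~" / "!~" use regexes anchored to the whole
# label value (re.fullmatch), which is how Prometheus treats them.
def matches(labels: dict[str, str], matchers: list[tuple[str, str, str]]) -> bool:
    for name, op, value in matchers:
        actual = labels.get(name, "")  # a missing label behaves like ""
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True


series = [
    {"job": "api", "status": "200"},
    {"job": "api", "status": "503"},
    {"job": "web", "status": "500"},
]

# Rough equivalent of http_requests_total{job="api", status=~"5.."}
selected = [s for s in series if matches(s, [("job", "=", "api"), ("status", "=~", "5..")])]
```

Because of the anchoring, `status=~"5.."` matches `503` but not `5003`.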
## Rate (most common pattern)
```promql
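rate(http_requests_total[5m])                   # per-second rate over the last 5m, per series
sum by (job) (rate(http_requests_total[5m]))    # requests/s per service
# 5xx error ratio per service: aggregate both sides so the status label can't break matching
sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
  / sum by (job) (rate(http_requests_total[5m]))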
```
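The following simplified Python sketch, which is not Prometheus's code, shows roughly what `rate()` computes for a single counter series. It assumes the samples are `(unix_seconds, value)` pairs that already fall inside the window. It handles counter resets but skips the extrapolation toward the window edges that real Prometheus performs. The second part shows the "aggregate first, then divide" shape of the error ratio, using hypothetical per-status rates.

```python
# Simplified sketch of rate(counter[window]) for ONE series; not Prometheus code.
# Assumption: `samples` are (unix_seconds, value) pairs already inside the window.
# Real Prometheus also extrapolates toward the window edges; this sketch skips
# that and divides by the time between the first and last sample.
def simple_rate(samples: list[tuple[float, float]]) -> float:
    if len(samples) < 2:
        raise ValueError("rate needs at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. a restart), so count up from zero.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


# 15s scrape interval; the counter resets between t=30 and t=45.
samples = [(0, 100), (15, 130), (30, 160), (45, 20), (60, 50)]
rps = simple_rate(samples)  # 110 counted increments over 60s

# Error ratio: sum the per-status rates first, then divide.
# These per-status values are hypothetical.
rates_by_status = {"200": 9.5, "500": 0.3, "503": 0.2}
errors = sum(r for status, r in rates_by_status.items() if status.startswith("5"))
error_ratio = errors / sum(rates_by_status.values())
```

Dividing per series instead of per aggregate would pair each 5xx series only with the identically labelled series on the right-hand side, so every ratio would come out as 1.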
## Aggregation
```promql
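sum by (instance) (node_memory_MemAvailable_bytes)                 # available memory per host
# mean latency per job: rate of total seconds spent / rate of request count
sum by (job) (rate(http_request_duration_seconds_sum[5m]))
  / sum by (job) (rate(http_request_duration_seconds_count[5m]))
quantile(0.99, sum by (instance) (rate(http_requests_total[5m])))  # p99 of per-instance rps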
```
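This minimal Python sketch shows what `sum by (job)` does and why mean latency is a ratio of two sums. `sum_by` is a hypothetical helper, and the per-instance rates are made-up numbers, not output from a real exporter.

```python
from collections import defaultdict


# Hypothetical helper mimicking `sum by (label) (...)`: drop every other label
# and add up the values of series that share the kept label's value.
def sum_by(series: list[tuple[dict[str, str], float]], label: str) -> dict[str, float]:
    out: dict[str, float] = defaultdict(float)
    for labels, value in series:
        out[labels.get(label, "")] += value
    return dict(out)


# Made-up per-instance rates of http_request_duration_seconds_sum and _count.
duration_sum_rate = [({"job": "api", "instance": "a"}, 2.0),
                     ({"job": "api", "instance": "b"}, 1.0)]
request_count_rate = [({"job": "api", "instance": "a"}, 20.0),
                      ({"job": "api", "instance": "b"}, 5.0)]

sums = sum_by(duration_sum_rate, "job")
counts = sum_by(request_count_rate, "job")
# Mean latency per job = seconds spent per second / requests per second.
mean_latency = {job: sums[job] / counts[job] for job in sums}

# For contrast: averaging the per-instance means weights both instances equally.
naive = sum(s / c for (_, s), (_, c) in zip(duration_sum_rate, request_count_rate)) / 2
```

Here the ratio of sums gives 0.12 s, while the naive average gives 0.15 s, because it gives the quiet instance "b" as much weight as the busy one.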
## Histogram percentiles
```promql
histogram_quantile(0.95,
sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) # p95 latency
```
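`histogram_quantile` does not see individual requests. It estimates the percentile from cumulative bucket counts, interpolating linearly inside the bucket where the target rank falls. The following is a sketch of that idea in Python, not Prometheus's implementation. It assumes non-negative bucket bounds with the lowest bucket starting at 0, requires a `+Inf` bucket, and ignores edge cases such as non-monotonic counts.

```python
import math


# Sketch of histogram_quantile(q, buckets); not Prometheus's code.
# `buckets` holds cumulative (le, count) pairs, like the output of
# sum by (le) (rate(..._bucket[5m])), and must include the +Inf bucket.
def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    if not 0 < q < 1:
        raise ValueError("this sketch only handles 0 < q < 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least one finite bucket plus +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total  # how many observations lie at or below the answer
    prev_le, prev_count = 0.0, 0.0  # lowest bucket assumed to start at 0
    for le, count in buckets:
        if count >= rank:
            if le == math.inf:
                # Can't interpolate into +Inf: report the last finite bound.
                return prev_le
            # Assume observations are spread evenly inside this bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_le + (le - prev_le) * fraction
        prev_le, prev_count = le, count
    return math.nan  # unreachable when counts are cumulative


# Hypothetical request-latency buckets (seconds, cumulative counts).
latency = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)]
p50 = histogram_quantile(0.5, latency)
p90 = histogram_quantile(0.9, latency)
p95 = histogram_quantile(0.95, latency)
```

Because the result is interpolated, a percentile is only as precise as the bucket layout. In this example, p95 lands exactly on the 0.5 s bucket boundary.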
## Alerting expressions
```promql
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.8   # CPU busy above 80%
increase(kube_pod_container_status_restarts_total[1h]) > 0                    # container restarted in the last hour
predict_linear(node_filesystem_avail_bytes[1h], 4 * 3600) < 0                 # disk projected to fill within 4h
```
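The disk alert relies on `predict_linear`, which fits a least-squares line to the samples in the range and extends it into the future. The following Python sketch, not Prometheus's code, anchors the fit at the evaluation time and then projects `seconds_ahead` forward. The filesystem numbers are hypothetical.

```python
# Sketch of predict_linear(v[range], t): ordinary least-squares line through
# the samples, anchored at the evaluation time, then extended t seconds ahead.
# Not Prometheus's code; `samples` are assumed (unix_seconds, value) pairs.
def predict_linear(samples: list[tuple[float, float]], seconds_ahead: float,
                   eval_time: float) -> float:
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [t - eval_time for t, _ in samples]  # time relative to "now"
    ys = [v for _, v in samples]
    n = len(samples)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var  # value change per second
    intercept = mean_y - slope * mean_x  # fitted value at eval_time
    return intercept + slope * seconds_ahead


# Hypothetical filesystem: 50 GB free at t=0, shrinking by 5 MB/s,
# sampled every 10 minutes over the last hour.
samples = [(t, 50e9 - 5e6 * t) for t in range(0, 3601, 600)]
forecast = predict_linear(samples, 4 * 3600, eval_time=3600)
fires = forecast < 0  # same condition as the alert expression
```

With 32 GB left and 5 MB/s of growth, the projection 4h ahead is about -40 GB, so the alert fires. Fitting over the full 1h window smooths short spikes; a shorter range tends to react faster but is noisier.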
## Tip
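If PromQL feels foreign, give an AI assistant the metric name and your question in plain English, and ask for the PromQL. Then ask it to explain back what the query computes, and check the result in the Prometheus expression browser before you rely on it.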