Prometheus
Tags: monitoring · prometheus · grafana
2025-10-27
🏗️ Architecture Diagram
🚀 Deployment Guide
Docker

```shell
# Note: a comment cannot follow the trailing backslash of a continued line,
# so the mount explanations are given here instead:
#   /path/to/config -> configuration directory
#   /path/to/data   -> data directory, mounted for persistence
docker run -d \
  -p 9090:9090 \
  -v /path/to/config:/etc/prometheus \
  -v /path/to/data:/prometheus \
  -e TZ=Asia/Shanghai \
  --memory="2g" \
  --cpus="1.0" \
  --user="$(id -u):$(id -g)" \
  --restart=unless-stopped \
  --name=prometheus \
  prom/prometheus:latest \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/prometheus \
  --storage.tsdb.retention.time=30d \
  --web.enable-lifecycle
```

Docker Compose
Check the current user's UID and GID:

```shell
id
```

Note that `UID` and `GID` are shell variables that are usually not exported, so Compose may not see them; define them in an `.env` file if the substitution fails.

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    user: "${UID}:${GID}"
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - /path/to/config:/etc/prometheus # configuration directory
      - /path/to/data:/prometheus       # data directory, mounted for persistence
    environment:
      - TZ=Asia/Shanghai
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    deploy:
      resources:
        limits:
          memory: '2g'
          cpus: '1.0'
```

⚙️ Configuration Examples
📦 Prometheus (itself)
```yaml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  scrape_timeout: 10s # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    scrape_interval: 30s
    static_configs:
      - targets: [ 'localhost:9090' ]
```

📦 Node exporter
Deployment
- Docker
```shell
docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  --restart=unless-stopped \
  --name=node-exporter \
  quay.io/prometheus/node-exporter:latest \
  --path.rootfs=/host \
  --path.procfs=/host/proc \
  --path.sysfs=/host/sys \
  '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($|/)'
```

The exclude pattern is quoted with a single `$` here; the doubled `$$` form is only needed inside Compose files, where `$$` escapes `$`.

- Docker Compose

```yaml
services:
  node-exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    network_mode: host
    pid: host
    volumes:
      - /:/host:ro,rslave
    command:
      - --path.rootfs=/host
      - --path.procfs=/host/proc
      - --path.sysfs=/host/sys
      - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
```

Configuration

```yaml
scrape_configs:
  - job_name: 'node-exporter'
    scrape_interval: 15s
    static_configs:
      - targets:
          - 'target.homelab.lan:9100'
    relabel_configs:
      # Use the domain name or IP address as the instance name, to keep it short
      - source_labels: [__address__]
        target_label: instance
        regex: '([^:]+):\d+' # capture the domain name or IP address
        replacement: '${1}'
```

Restart

Restart the Prometheus service for the configuration to take effect (with `--web.enable-lifecycle` enabled, a `POST` to `/-/reload` also works), or use file-based service discovery as follows:

```yaml
scrape_configs:
  - job_name: 'node-exporter'
    scrape_interval: 15s
    static_configs:
      - targets:
          - 'server.homelab.lan:9100'
    file_sd_configs:
      - files:
          - '/etc/prometheus/targets/node-exporters.yml'
        refresh_interval: 1m
    relabel_configs:
      # Use the domain name or IP address as the instance name, to keep it short
      - source_labels: [__address__]
        target_label: instance
        regex: '([^:]+):\d+' # capture the domain name or IP address
        replacement: '${1}'
```

- /etc/prometheus/targets/node-exporters.yml

```yaml
- targets:
    - 'server1.homelab.lan:9100'
    - 'server2.homelab.lan:9100'
    - 'server3.homelab.lan:9100'
  labels:
    group: 'production'
```

Dashboards

- 1860 Node Exporter Full
- 8919 TenSunS auto-sync version
- 16098 Universal JOB-grouped version
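The `relabel_configs` entries above shorten `__address__` into an `instance` label. A minimal offline sketch of that substitution in plain Python (the helper name is hypothetical, not Prometheus code; Prometheus applies the anchored regex and the `${1}` replacement itself):

```python
import re

def relabel_instance(address: str) -> str:
    """Mimic the relabel rule: regex '([^:]+):\\d+', replacement '${1}'."""
    # Prometheus anchors the regex against the full label value.
    m = re.fullmatch(r"([^:]+):\d+", address)
    # On no match, a 'replace' action leaves the target label unchanged,
    # and 'instance' defaults to '__address__'.
    return m.group(1) if m else address

print(relabel_instance("target.homelab.lan:9100"))  # target.homelab.lan
print(relabel_instance("10.0.0.5:9100"))            # 10.0.0.5
```

Because unmatched addresses simply keep their default `instance` value, the rule is safe to apply to every target in the job.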
📦 cAdvisor
Deployment
- Docker
```shell
VERSION=v0.49.1 # use the latest release version from https://github.com/google/cadvisor/releases
sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  --privileged \
  --device=/dev/kmsg \
  gcr.io/cadvisor/cadvisor:$VERSION
```

- Docker Compose

```yaml
services:
  cadvisor:
    # use the latest release version from https://github.com/google/cadvisor/releases
    image: gcr.io/cadvisor/cadvisor:${VERSION:-v0.49.1}
    container_name: cadvisor
    privileged: true
    ports:
      - "8080:8080"
    volumes:
      - '/:/rootfs:ro'
      - '/var/run:/var/run:ro'
      - '/sys:/sys:ro'
      - '/var/lib/docker/:/var/lib/docker:ro'
      - '/dev/disk/:/dev/disk:ro'
    devices:
      - '/dev/kmsg:/dev/kmsg'
```

Configuration

```yaml
scrape_configs:
  - job_name: 'cadvisor'
    scrape_interval: 15s
    static_configs:
      - targets:
          - 'server.homelab.lan:8080'
```

Restart

Restart the Prometheus service for the configuration to take effect, or use file-based service discovery.

Dashboards
📦 DCGM-Exporter
Deployment
- Docker
```shell
VERSION=4.4.1-4.6.0
docker run -d \
  -p 9400:9400 \
  --gpus all \
  --cap-add SYS_ADMIN \
  --restart=unless-stopped \
  --name=dcgm-exporter \
  nvcr.io/nvidia/k8s/dcgm-exporter:${VERSION}-ubuntu22.04
```

- Docker Compose

```yaml
services:
  dcgm-exporter:
    image: nvcr.io/nvidia/k8s/dcgm-exporter:${VERSION:-4.4.1-4.6.0}-ubuntu22.04
    container_name: dcgm-exporter
    restart: unless-stopped
    ports:
      - "9400:9400"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [utility]
              count: all
    cap_add:
      - SYS_ADMIN
```

- Helm

```shell
# Add the repository
helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
# Update the repository
helm repo update
# Install
helm install --generate-name gpu-helm-charts/dcgm-exporter
```

Configuration

```yaml
scrape_configs:
  - job_name: 'dcgm-exporter'
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets:
          - 'server.homelab.lan:9400'
```

Restart

Restart the Prometheus service for the configuration to take effect, or use file-based service discovery.
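Once the job above is being scraped, dashboards typically build on DCGM series such as `DCGM_FI_DEV_GPU_UTIL` (percent) and `DCGM_FI_DEV_FB_USED` (MiB). A sketch in recording-rule form, with illustrative rule names:

```yaml
groups:
  - name: gpu
    rules:
      - record: instance:gpu_utilization:avg
        expr: avg by (instance) (DCGM_FI_DEV_GPU_UTIL)
      - record: instance:gpu_fb_used_bytes:sum
        expr: sum by (instance) (DCGM_FI_DEV_FB_USED * 1024 * 1024) # metric is reported in MiB
```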
Dashboards
nvidia_gpu_exporter
Deployment

```shell
docker run -d \
  --name nvidia_smi_exporter \
  --restart unless-stopped \
  --device /dev/nvidiactl:/dev/nvidiactl \
  --device /dev/nvidia0:/dev/nvidia0 \
  -v /usr/lib/x86_64-linux-gnu/libnvidia-ml.so:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so \
  -v /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 \
  -v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi \
  -p 9835:9835 \
  utkuozdemir/nvidia_gpu_exporter:1.3.1
```

Configuration

Restart

Restart the Prometheus service for the configuration to take effect, or use file-based service discovery.
MinIO
Deployment

Not applicable.

Configuration

- Directory layout

```
/etc/prometheus/
├── prometheus.yml
├── file_sd/
│   └── minio-targets.yml
└── secrets/
    ├── minio-token-prod
    └── minio-token-staging
```

```yaml
scrape_configs:
  - job_name: 'minio-cluster'
    scrape_interval: 15s
    metrics_path: /minio/v2/metrics/cluster
    scheme: https
    tls_config:
      insecure_skip_verify: false
    file_sd_configs:
      - files:
          - '/etc/prometheus/file_sd/minio-targets.yml'
        refresh_interval: 1m
    relabel_configs:
      # Extract the Bearer Token file path from the target metadata
      - source_labels: [__meta_minio_auth_token_file]
        target_label: __bearer_token_file
        replacement: /etc/prometheus/secrets/${1}
      # Configure TLS dynamically
      - source_labels: [__meta_minio_tls_skip_verify]
        target_label: __tls_insecure_skip_verify
        regex: "false"
        replacement: "true"
      # Environment labels
      - source_labels: [__meta_minio_environment]
        target_label: environment
      - source_labels: [__meta_minio_cluster]
        target_label: cluster
      - source_labels: [__meta_minio_region]
        target_label: region
      # Use the domain name or IP address as the instance name
      - source_labels: [__address__]
        target_label: instance
        regex: '([^:]+):\d+' # capture the domain name or IP address
        replacement: '${1}'
      - target_label: job
        replacement: "minio"
```

- /etc/prometheus/file_sd/minio-targets.yml

```yaml
- targets:
    - 'minio.homelab.lan:9000'
  labels:
    # Environment labels
    environment: 'prod'
    cluster: 'xxx'
    region: 'us-east-1'
    # Metadata
    __meta_minio_auth_token_file: 'minio-token-prod'
    __meta_minio_tls_skip_verify: 'false'
- targets:
    - 'minio.staging.homelab.lan:9000'
  labels:
    environment: 'staging'
    cluster: 'yyy'
    region: 'us-east-1'
    __meta_minio_auth_token_file: 'minio-token-staging'
    __meta_minio_tls_skip_verify: 'true'
- targets:
    - 'minio.testing.homelab.lan:9000'
  labels:
    environment: 'testing'
    cluster: 'zzz'
    region: 'us-east-1'
    __meta_minio_auth_token_file: 'minio-token-testing'
    __meta_minio_tls_skip_verify: 'true'
```

Restart

Restart the Prometheus service for the configuration to take effect, or use file-based service discovery.
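The file_sd document above is just a list of target groups, so it is easy to sanity-check before deployment. A small Python sketch of such a check (the data mirrors what a YAML loader would return for the file above; the helper name is hypothetical):

```python
# Target groups as a YAML loader would decode minio-targets.yml
# (only two groups shown for brevity).
target_groups = [
    {"targets": ["minio.homelab.lan:9000"],
     "labels": {"environment": "prod", "cluster": "xxx", "region": "us-east-1"}},
    {"targets": ["minio.staging.homelab.lan:9000"],
     "labels": {"environment": "staging", "cluster": "yyy", "region": "us-east-1"}},
]

def check_groups(groups):
    """Require a non-empty target list and an environment label per group."""
    for g in groups:
        assert g["targets"], "empty target list"
        assert "environment" in g["labels"], "missing environment label"
    return sorted(g["labels"]["environment"] for g in groups)

print(check_groups(target_groups))  # ['prod', 'staging']
```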
📦 MySQL Server Exporter
Deployment

```shell
docker run -d \
  -p 9104:9104 \
  -v ./my.cnf:/.my.cnf \
  --restart=unless-stopped \
  --name=mysqld-exporter \
  prom/mysqld-exporter:latest
```

- my.cnf

```ini
[client]
user=exporter
password=your_password

[client.prod]
host=mysql.homelab.lan
port=3306

[client.staging]
host=mysql.staging.homelab.lan
port=3306
```

Configuration

```yaml
scrape_configs:
  - job_name: mysql
    scrape_interval: 30s
    metrics_path: /probe
    static_configs:
      - targets:
          - mysql.homelab.lan:3306
          - mysql.staging.homelab.lan:3306
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
        regex: '([^:]+)\.homelab\.lan:(\d+)'
        replacement: '${1}.homelab.lan:${2}'
      - source_labels: [__address__]
        target_label: instance
        regex: '([^:]+)\.homelab\.lan:\d+'
        replacement: '${1}'
      - target_label: __address__
        replacement: 'docker.homelab.lan:9104'
```

Restart

Restart the Prometheus service for the configuration to take effect, or use file-based service discovery.
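The relabeling above implements the exporter's multi-target pattern: the database address moves into a `?target=` query parameter while `__address__` is rewritten to the exporter itself. A sketch of the probe URL Prometheus ends up scraping (hostnames are the ones assumed in the config above; the helper is illustrative):

```python
from urllib.parse import urlencode

def probe_url(db_address: str, exporter: str = "docker.homelab.lan:9104") -> str:
    """Build the /probe URL that the relabel rules produce for one target."""
    return f"http://{exporter}/probe?{urlencode({'target': db_address})}"

print(probe_url("mysql.homelab.lan:3306"))
# http://docker.homelab.lan:9104/probe?target=mysql.homelab.lan%3A3306
```

One exporter instance can therefore scrape every database listed in `static_configs`, with credentials per target coming from the `[client.<name>]` sections of `my.cnf`.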
Dashboards
📦 Prometheus Valkey & Redis Metrics Exporter
Deployment

```shell
docker run -d \
  -p 9121:9121 \
  --name redis_exporter \
  oliver006/redis_exporter
```

Configuration

```yaml
scrape_configs:
  - job_name: redis
    scrape_interval: 30s
    metrics_path: /scrape
    static_configs:
      - targets:
          - redis://redis.homelab.lan:6379
          - redis://redis.staging.homelab.lan:6379
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
        regex: 'redis://([^:]+\.homelab\.lan:\d+)'
        replacement: 'redis://${1}'
      - source_labels: [__param_target]
        target_label: instance
        regex: 'redis://([^:]+)\.homelab\.lan:\d+'
        replacement: '${1}'
      - target_label: __address__
        replacement: 'docker.homelab.lan:9121'
```

Restart

Restart the Prometheus service for the configuration to take effect, or use file-based service discovery.

Dashboards
PostgreSQL Server Exporter
Deployment

```shell
# Start an example database
docker run --net=host -it --rm -e POSTGRES_PASSWORD=password postgres
# Connect to it
docker run \
  --net=host \
  -e DATA_SOURCE_URI="localhost:5432/postgres?sslmode=disable" \
  -e DATA_SOURCE_USER=postgres \
  -e DATA_SOURCE_PASS=password \
  quay.io/prometheuscommunity/postgres-exporter
```

Configuration

```yaml
scrape_configs:
  - job_name: postgres
    scrape_interval: 30s
    metrics_path: /metrics
    static_configs:
      - targets:
          - postgres.homelab.lan:9187
          - postgres.staging.homelab.lan:9187
```

Restart

Restart the Prometheus service for the configuration to take effect, or use file-based service discovery.

Dashboards

- 12485 PostgreSQL Exporter
APISIX prometheus plugin
Deployment

Edit the `config.yaml` configuration file:

```yaml
plugin_attr:
  prometheus:                               # Plugin: prometheus attributes
    export_uri: /apisix/prometheus/metrics  # Set the URI for the Prometheus metrics endpoint.
    metric_prefix: apisix_                  # Set the prefix for Prometheus metrics generated by APISIX.
    enable_export_server: true              # Enable the Prometheus export server.
    export_addr:                            # Set the address for the Prometheus export server.
      ip: 127.0.0.1                         # Set the IP.
      port: 9091                            # Set the port.
    # metrics:                              # Create extra labels for metrics.
    #   http_status:                        # These metrics will be prefixed with `apisix_`.
    #     extra_labels:                     # Set the extra labels for http_status metrics.
    #       - upstream_addr: $upstream_addr
    #       - status: $upstream_status
    #     expire: 0                         # The expiration time of metrics in seconds. 0 means the metrics will not expire.
    #   http_latency:
    #     extra_labels:                     # Set the extra labels for http_latency metrics.
    #       - upstream_addr: $upstream_addr
    #     expire: 0                         # The expiration time of metrics in seconds. 0 means the metrics will not expire.
    #   bandwidth:
    #     extra_labels:                     # Set the extra labels for bandwidth metrics.
    #       - upstream_addr: $upstream_addr
    #     expire: 0                         # The expiration time of metrics in seconds. 0 means the metrics will not expire.
    # default_buckets:                      # Set the default buckets for the `http_latency` metrics histogram.
    #   - 10
    #   - 50
    #   - 100
    #   - 200
    #   - 500
    #   - 1000
    #   - 2000
    #   - 5000
    #   - 10000
    #   - 30000
    #   - 60000
    #   - 500
```

Configuration

```yaml
scrape_configs:
  - job_name: 'apisix'
    scrape_interval: 5s
    metrics_path: /apisix/prometheus/metrics
    static_configs:
      - targets:
          - 'apisix:9091'
        labels:
          service: 'apisix-gateway'
          component: 'api-gateway'
```

Restart

Restart the Prometheus service for the configuration to take effect, or use file-based service discovery.
Blackbox exporter
Deployment
Note: You may want to enable ipv6 in your docker configuration
```shell
docker run --rm \
  -p 9115/tcp \
  -v $(pwd):/config \
  --name blackbox_exporter \
  quay.io/prometheus/blackbox-exporter:latest \
  --config.file=/config/blackbox.yml
```

Configuration

```yaml
scrape_configs:
  - job_name: blackbox_all
    metrics_path: /probe
    params:
      module: [ http_2xx ] # Look for a HTTP 200 response.
    dns_sd_configs:
      - names:
          - example.com
          - prometheus.io
        type: A
        port: 443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
        replacement: https://$1/ # Make probe URL be like https://1.2.3.4:443/
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115 # The blackbox exporter's real hostname:port.
      - source_labels: [__meta_dns_name]
        target_label: __param_hostname # Make domain name become 'Host' header for probe requests
      - source_labels: [__meta_dns_name]
        target_label: vhost # and store it in 'vhost' label
```

Restart

Restart the Prometheus service for the configuration to take effect, or use file-based service discovery.
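The same multi-target pattern drives the blackbox job: the DNS-discovered address becomes `?target=`, the `module` parameter selects the prober, and `__address__` is rewritten to the exporter. A sketch of the URL Prometheus ends up scraping (exporter address as configured above; the helper is illustrative and omits the `hostname` parameter for brevity):

```python
from urllib.parse import urlencode

def blackbox_probe(name: str, module: str = "http_2xx",
                   exporter: str = "127.0.0.1:9115") -> str:
    """Build the /probe URL the relabel chain produces for one DNS name."""
    params = urlencode({"module": module, "target": f"https://{name}/"})
    return f"http://{exporter}/probe?{params}"

print(blackbox_probe("example.com"))
# http://127.0.0.1:9115/probe?module=http_2xx&target=https%3A%2F%2Fexample.com%2F
```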
Windows exporter
Deployment

```shell
docker run -d \
  --restart=unless-stopped \
  --name=windows-exporter \
  prometheuscommunity/windows-exporter
```

Configuration

Restart

Restart the Prometheus service for the configuration to take effect, or use file-based service discovery.
💻 Best Practices
Hosts & Containers 🔥🔥🔥
Deployment

```yaml
networks:
  monitoring:
    driver: bridge

services:
  node-exporter: # Node Exporter - collects host system metrics
    image: quay.io/prometheus/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    pid: host
    ports:
      - 9100:9100
    volumes:
      - /:/host:ro,rslave
    command:
      - --path.rootfs=/host
      - --path.procfs=/host/proc
      - --path.sysfs=/host/sys
      - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
    networks:
      - monitoring

  cadvisor: # cAdvisor - collects container metrics
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    ports:
      - 8080:8080
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    devices:
      - /dev/kmsg:/dev/kmsg
    networks:
      - monitoring
```

Configuration

```yaml
scrape_configs:
  - job_name: 'node-exporter'
    scrape_interval: 15s
    static_configs:
      - targets:
          - 'target.homelab.lan:9100'
    relabel_configs:
      # Use the domain name or IP address as the instance name, to keep it short
      - source_labels: [__address__]
        target_label: instance
        regex: '([^:]+):\d+' # capture the domain name or IP address
        replacement: '${1}'
  - job_name: 'cadvisor'
    scrape_interval: 15s
    static_configs:
      - targets:
          - 'server.homelab.lan:8080'
```

Dashboards

- 1860 Node Exporter Full
- 16098 Universal JOB-grouped version
- 193 Docker monitoring
- 19908 cAdvisor Docker Insights
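The dashboards above are built from a handful of core series. Hedged PromQL examples in recording-rule form (rule names are illustrative; metric names are standard node_exporter and cAdvisor ones):

```yaml
groups:
  - name: host-and-containers
    rules:
      - record: instance:node_cpu_busy:ratio
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
      - record: instance:node_memory_used:ratio
        expr: 1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
      - record: container:memory_working_set_bytes:sum
        expr: sum by (instance, name) (container_memory_working_set_bytes{name!=""})
```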
1Panel database suite (MySQL, Redis, PostgreSQL)
Deployment

```yaml
networks:
  1panel-network:
    external: true

services:
  mysqld-exporter: # MySQL
    image: prom/mysqld-exporter:latest
    container_name: mysqld-exporter
    restart: unless-stopped
    volumes:
      - ./my.cnf:/.my.cnf
    ports:
      - 9104:9104
    networks:
      - 1panel-network

  redis-exporter: # Redis
    image: oliver006/redis_exporter:latest
    container_name: redis-exporter
    restart: unless-stopped
    ports:
      - 9121:9121
    environment:
      REDIS_ADDR: redis://redis.staging.homelab.lan:6379
      REDIS_PASSWORD: ${REDIS_PASSWORD}
    networks:
      - 1panel-network

  postgres-exporter: # PostgreSQL
    image: quay.io/prometheuscommunity/postgres-exporter:latest
    container_name: postgres-exporter
    restart: unless-stopped
    # host networking; cannot also join 1panel-network
    # (network_mode and networks are mutually exclusive in Compose)
    network_mode: host
    environment:
      DATA_SOURCE_URI: postgresql:5432/postgres?sslmode=disable
      DATA_SOURCE_USER: ${POSTGRE_USERNAME}
      DATA_SOURCE_PASS: ${POSTGRE_PASSWORD}
```

- my.cnf

```ini
[client]
user=exporter
password=your_password

[client.prod]
host=mysql.homelab.lan
port=3306

[client.staging]
host=mysql.staging.homelab.lan
port=3306
```

- .env

```
COMPOSE_PROJECT_NAME=monitoring
REDIS_PASSWORD=
POSTGRE_USERNAME=
POSTGRE_PASSWORD=
```

Configuration

```yaml
scrape_configs:
  - job_name: mysql
    scrape_interval: 30s
    metrics_path: /probe
    static_configs:
      - targets:
          - mysql.homelab.lan:3306
          - mysql.staging.homelab.lan:3306
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
        regex: '([^:]+)\.homelab\.lan:(\d+)'
        replacement: '${1}.homelab.lan:${2}'
      - source_labels: [__address__]
        target_label: instance
        regex: '([^:]+)\.homelab\.lan:\d+'
        replacement: '${1}'
      - target_label: __address__
        replacement: 'docker.homelab.lan:9104'
  - job_name: redis
    scrape_interval: 30s
    metrics_path: /scrape
    static_configs:
      - targets:
          - redis://redis.homelab.lan:6379
          - redis://redis.staging.homelab.lan:6379
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
        regex: 'redis://([^:]+\.homelab\.lan:\d+)'
        replacement: 'redis://${1}'
      - source_labels: [__param_target]
        target_label: instance
        regex: 'redis://([^:]+)\.homelab\.lan:\d+'
        replacement: '${1}'
      - target_label: __address__
        replacement: 'docker.homelab.lan:9121'
  - job_name: postgres
    scrape_interval: 30s
    metrics_path: /metrics
    static_configs:
      - targets:
          - postgres.homelab.lan:9187
          - postgres.staging.homelab.lan:9187
```

Dashboards
GPU
Deployment

```yaml
networks:
  monitoring:
    driver: bridge

services:
  dcgm-exporter: # DCGM Exporter - collects NVIDIA GPU metrics
    image: nvcr.io/nvidia/k8s/dcgm-exporter:4.4.1-4.6.0-ubuntu22.04
    container_name: dcgm-exporter
    restart: unless-stopped
    ports:
      - "9400:9400"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [utility]
              count: all
    cap_add:
      - SYS_ADMIN
    networks:
      - monitoring
```

Configuration

```yaml
scrape_configs:
  - job_name: 'dcgm-exporter'
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets:
          - 'server.homelab.lan:9400'
```

Dashboards

- 12239 Grafana Dashboard
