Containerized Prometheus deployment, with Grafana dashboards for node monitoring
I. Deployment environment
Hostname | IP address | Services |
---|---|---|
prometheus | 192.168.85.131 | prometheus, grafana |
cAdvisor | 192.168.85.132 | cAdvisor, docker |
II. Deploy Prometheus
Preparation
Change the hostname
[root@localhost ~]# hostnamectl set-hostname prometheus
[root@localhost ~]# bash
Disable the firewall
[root@prometheus ~]# systemctl disable --now firewalld
Disable SELinux
[root@prometheus ~]# vim /etc/selinux/config
[root@prometheus ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
[root@prometheus ~]# reboot
[root@prometheus ~]# getenforce
Disabled
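The SELinux change can also be made without opening vim. The following is a minimal bash sketch that assumes the standard /etc/selinux/config layout shown above. The function name disable_selinux_config is illustrative and not part of any tool.

```bash
#!/usr/bin/env bash
# disable_selinux_config (hypothetical helper): set SELINUX=disabled in the given
# file (default /etc/selinux/config). Only the line starting with "SELINUX=" is
# rewritten; SELINUXTYPE= and the comment lines are left untouched.
disable_selinux_config() {
  local cfg="${1:-/etc/selinux/config}"
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

The change only takes effect after the reboot shown above. If you cannot reboot right away, setenforce 0 switches the running system to permissive mode until the next boot.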
First, configure the yum repository
[root@prometheus ~]# wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-8.repo
Configure the Docker yum repository
[root@prometheus ~]# cd /etc/yum.repos.d/
[root@prometheus yum.repos.d]# curl -o docker-ce.repo https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/centos/docker-ce.repo
[root@prometheus yum.repos.d]# sed -i 's@https://download.docker.com@https://mirrors.tuna.tsinghua.edu.cn/docker-ce@g' docker-ce.repo
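One quick way to confirm that the sed substitution worked is to count the baseurl lines that now point at the TUNA mirror. Below is a hedged sketch. use_tuna_docker_mirror is a hypothetical name, and the default path assumes the repo file was saved in /etc/yum.repos.d/ as above.

```bash
#!/usr/bin/env bash
# use_tuna_docker_mirror (hypothetical helper): apply the same substitution as the
# sed command above, then print how many baseurl= lines now use the mirror.
use_tuna_docker_mirror() {
  local repo="${1:-/etc/yum.repos.d/docker-ce.repo}"
  sed -i 's@https://download.docker.com@https://mirrors.tuna.tsinghua.edu.cn/docker-ce@g' "$repo"
  # grep -c prints 0 and exits 1 when nothing matches; keep the count either way
  grep -c '^baseurl=https://mirrors.tuna.tsinghua.edu.cn/docker-ce' "$repo" || true
}
```

A result of 0 means the file still points at download.docker.com, or it was never downloaded.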
Install docker-ce and its components
[root@prometheus ~]# yum -y install docker-ce
Start Docker
[root@prometheus ~]# systemctl start docker
[root@prometheus ~]# systemctl enable docker
[root@prometheus ~]# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-12-30 23:35:18 CST; 20min ago
Docs: https://docs.docker.com
Main PID: 1035 (dockerd)
Tasks: 20
Memory: 125.4M
CGroup: /system.slice/docker.service
Configure an Alibaba Cloud registry mirror
[root@prometheus ~]# vim /etc/docker/daemon.json
[root@prometheus ~]# cat /etc/docker/daemon.json
{
"registry-mirrors": ["https://in3617d8.mirror.aliyuncs.com"]
}
[root@prometheus ~]# systemctl daemon-reload
[root@prometheus ~]# systemctl restart docker
[root@prometheus ~]# docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.7.1-docker)
scan: Docker Scan (Docker Inc., v0.12.0)
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 20.10.12
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
runc version: v1.0.2-0-g52b36a2
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 4.18.0-257.el8.x86_64
Operating System: CentOS Stream 8
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.622GiB
Name: prometheus
ID: Z3JR:D5TL:NVZ7:DJWE:7FL3:EC74:6DKV:6HOB:IPXV:FCQ3:MF4B:R3BG
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
https://in3617d8.mirror.aliyuncs.com/
Live Restore Enabled: false
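The daemon.json edit can also be scripted with a heredoc. The sketch below assumes you substitute your own accelerator URL, since the in3617d8 address is the author's. write_daemon_json is a hypothetical helper. It overwrites any existing daemon.json, and Docker will not start if the file is not valid JSON.

```bash
#!/usr/bin/env bash
# write_daemon_json (hypothetical helper): write a daemon.json that contains only
# a registry mirror. WARNING: this overwrites the target file.
write_daemon_json() {
  local mirror="$1" file="${2:-/etc/docker/daemon.json}"
  mkdir -p "$(dirname "$file")"
  cat > "$file" <<EOF
{
  "registry-mirrors": ["$mirror"]
}
EOF
}
```

Usage on the prometheus host:
[root@prometheus ~]# write_daemon_json https://in3617d8.mirror.aliyuncs.com
[root@prometheus ~]# systemctl restart docker
[root@prometheus ~]# docker info --format '{{.RegistryConfig.Mirrors}}'
The last command should print the same mirror that docker info lists under Registry Mirrors.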
Run the Prometheus container
Pull the official Prometheus image
[root@prometheus ~]# docker pull prom/prometheus
Using default tag: latest
latest: Pulling from prom/prometheus
3cb635b06aa2: Pull complete
34f699df6fe0: Pull complete
33d6c9635e0f: Pull complete
f2af7323bed8: Pull complete
c16675a6a294: Pull complete
827843f6afe6: Pull complete
3d272942eeaf: Pull complete
7e785cfa34da: Pull complete
05e324559e3b: Pull complete
170620261a59: Pull complete
ec35f5996032: Pull complete
5509173eb708: Pull complete
Digest: sha256:cb9817249c346d6cfadebe383ed3b3cd4c540f623db40c4ca00da2ada45259bb
Status: Downloaded newer image for prom/prometheus:latest
docker.io/prom/prometheus:latest
Create a directory for the Prometheus configuration file and provide a default configuration file
[root@prometheus ~]# mkdir -p /prometheus/config/
[root@prometheus ~]# cd /prometheus/config/
[root@prometheus config]# vim prometheus.yml
[root@prometheus config]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
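It can help to validate the file before starting the container. The minimal sketch below assumes that the prom/prometheus image ships promtool at /bin/promtool, which is the case for the official images as far as I know. write_prometheus_config is a hypothetical helper that writes the same scrape settings without the comments and the empty sections.

```bash
#!/usr/bin/env bash
# write_prometheus_config (hypothetical helper): write a minimal prometheus.yml
# equivalent to the default file above, minus comments and the empty
# alerting/rule_files sections.
write_prometheus_config() {
  local dir="${1:-/prometheus/config}"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yml" <<'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
EOF
}
```

Usage, checking the file with promtool from the same image:
[root@prometheus ~]# write_prometheus_config /prometheus/config
[root@prometheus ~]# docker run --rm -v /prometheus/config:/etc/prometheus --entrypoint /bin/promtool prom/prometheus:latest check config /etc/prometheus/prometheus.yml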
Create a container from the official Prometheus image
[root@prometheus ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
prom/prometheus latest a3d385fc29f9 11 days ago 201MB
[root@prometheus ~]# docker run --name prometheus -d --restart always -p 9090:9090 -v /prometheus/config/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus:latest
7c224924088bc66c18f11549225e92677646ee5bbf5010340d9f57ddbd407cfc
[root@prometheus ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7c224924088b prom/prometheus:latest "/bin/prometheus --c…" 5 seconds ago Up 4 seconds 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp prometheus
Check the listening ports
[root@prometheus ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:9090 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:9090 [::]:*
LISTEN 0 128 [::]:22 [::]:*
Open http://192.168.85.131:9090/targets in a browser (this host's IP address plus port 9090 and the /targets path)
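Besides the browser and ss checks, Prometheus exposes /-/ready and /-/healthy endpoints that return HTTP 200 once the server is up. The small polling sketch below is useful in scripts. wait_for_http is a hypothetical helper, and it assumes curl is installed.

```bash
#!/usr/bin/env bash
# wait_for_http (hypothetical helper): poll URL until it answers with a 2xx status,
# trying at most $2 times (default 30) with a 1-second pause between attempts.
wait_for_http() {
  local url="$1" tries="${2:-30}" i
  for ((i = 1; i <= tries; i++)); do
    # -f: treat HTTP errors as failures; -sS: quiet, but still print errors
    if curl -fsS -o /dev/null --max-time 2 "$url"; then
      echo "ready: $url"
      return 0
    fi
    if (( i < tries )); then
      sleep 1
    fi
  done
  echo "not ready after $tries attempts: $url" >&2
  return 1
}
```

Usage:
[root@prometheus ~]# wait_for_http http://192.168.85.131:9090/-/ready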
III. Deploy cAdvisor
Preparation
Change the hostname
[root@localhost ~]# hostnamectl set-hostname cAdvisor
[root@localhost ~]# bash
Disable the firewall
[root@cAdvisor ~]# systemctl disable --now firewalld
Disable SELinux
[root@cAdvisor ~]# vim /etc/selinux/config
[root@cAdvisor ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
[root@cAdvisor ~]# reboot
[root@cAdvisor ~]# getenforce
Disabled
First, configure the yum repository
[root@cAdvisor ~]# wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-8.repo
Configure the Docker yum repository
[root@cAdvisor ~]# cd /etc/yum.repos.d/
[root@cAdvisor yum.repos.d]# curl -o docker-ce.repo https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/centos/docker-ce.repo
[root@cAdvisor yum.repos.d]# sed -i 's@https://download.docker.com@https://mirrors.tuna.tsinghua.edu.cn/docker-ce@g' docker-ce.repo
Install docker-ce and its components
[root@cAdvisor ~]# yum -y install docker-ce
Start Docker
[root@cAdvisor ~]# systemctl start docker
[root@cAdvisor ~]# systemctl enable docker
[root@cAdvisor ~]# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-12-30 23:35:18 CST; 20min ago
Docs: https://docs.docker.com
Main PID: 1035 (dockerd)
Tasks: 20
Memory: 125.4M
CGroup: /system.slice/docker.service
Configure an Alibaba Cloud registry mirror
[root@cAdvisor ~]# vim /etc/docker/daemon.json
[root@cAdvisor ~]# cat /etc/docker/daemon.json
{
"registry-mirrors": ["https://in3617d8.mirror.aliyuncs.com"]
}
[root@cAdvisor ~]# systemctl daemon-reload
[root@cAdvisor ~]# systemctl restart docker
[root@cAdvisor ~]# docker info
On the client host (cAdvisor), pull the official google/cadvisor image
[root@cAdvisor ~]# docker pull google/cadvisor
Using default tag: latest
latest: Pulling from google/cadvisor
ff3a5c916c92: Pull complete
44a45bb65cdf: Pull complete
0bbe1a2fe2a6: Pull complete
Digest: sha256:815386ebbe9a3490f38785ab11bda34ec8dacf4634af77b8912832d4f85dca04
Status: Downloaded newer image for google/cadvisor:latest
docker.io/google/cadvisor:latest
On the client host, run the cAdvisor container from the official image, with the directory and port mappings below
[root@cAdvisor ~]# docker run --volume=/:/rootfs:ro \
--volume=/var/run:/var/run:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
--privileged \
--device=/dev/kmsg \
google/cadvisor
f2708f9435dbcc9cbac2133ab6660d3e566e47d2b4cddc0128c6209341834d32
Check the container status
[root@cAdvisor ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f2708f9435db google/cadvisor "/usr/bin/cadvisor -…" 3 minutes ago Up 3 minutes 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp cadvisor
Check the listening ports
[root@cAdvisor ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:8080 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:8080 [::]:*
LISTEN 0 128 [::]:22 [::]:*
Open http://192.168.85.132:8080 in a browser (the cAdvisor host's IP address plus port 8080)
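cAdvisor also serves Prometheus-format metrics at /metrics on the same port, and that endpoint is what Prometheus scrapes in the next section. Below is a sketch for a quick check from the shell. count_metric_prefix is a hypothetical helper, and container_cpu is just an example prefix from cAdvisor's container_* metric names. Note that newer cAdvisor releases are published as gcr.io/cadvisor/cadvisor rather than google/cadvisor.

```bash
#!/usr/bin/env bash
# count_metric_prefix (hypothetical helper): read Prometheus text-format metrics
# on stdin and print how many sample lines have a metric name starting with $1.
# "# HELP" / "# TYPE" comment lines are skipped.
count_metric_prefix() {
  awk -v p="$1" '!/^#/ && index($0, p) == 1 { n++ } END { print n + 0 }'
}
```

Usage:
[root@cAdvisor ~]# curl -s http://192.168.85.132:8080/metrics | count_metric_prefix container_cpu
A count of 0 suggests the container cannot read the mounted cgroup and Docker directories.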
IV. Add the node to Prometheus
Edit the Prometheus configuration file prometheus.yml in the /prometheus/config directory
[root@prometheus ~]# vim /prometheus/config/prometheus.yml
[root@prometheus ~]# cat /prometheus/config/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "cAdvisor" # job name for the new node
    static_configs:
      - targets: ["192.168.85.132:8080"] # cAdvisor host IP address and port
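Adding a job can also be scripted by appending to the file in place. The sketch below assumes that scrape_configs: is the last top-level key, as it is in the file above, so the appended list item is parsed as part of it. add_scrape_job is a hypothetical helper. Appending in place may also be safer than some editors here. The config is bind-mounted as a single file, and an editor that saves by replacing the file (new inode) can leave the container seeing the old version.

```bash
#!/usr/bin/env bash
# add_scrape_job (hypothetical helper): append a static scrape job to prometheus.yml.
# Assumes scrape_configs: is the last top-level key in the file.
add_scrape_job() {
  local file="$1" job="$2" target="$3"
  # Make sure the file ends with a newline before appending
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    echo >> "$file"
  fi
  cat >> "$file" <<EOF
  - job_name: "$job"
    static_configs:
      - targets: ["$target"]
EOF
}
```

Usage:
[root@prometheus ~]# add_scrape_job /prometheus/config/prometheus.yml cAdvisor 192.168.85.132:8080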
Restart the Prometheus container
[root@prometheus ~]# docker restart prometheus
prometheus
[root@prometheus ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7c224924088b prom/prometheus:latest "/bin/prometheus --c…" 55 minutes ago Up 7 seconds 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp prometheus
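Open http://192.168.85.131:9090/targets in a browser again (the prometheus host's IP address plus port 9090); the cAdvisor job is now listed next to the prometheus job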
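The same target list is available as JSON from the HTTP API at /api/v1/targets. The sketch below summarizes target health from the shell without depending on jq. target_health_summary is a hypothetical helper, and it only looks at the "health" keys in the response.

```bash
#!/usr/bin/env bash
# target_health_summary (hypothetical helper): read the JSON returned by
# GET /api/v1/targets on stdin and print "<health>: <count>" for each health value.
target_health_summary() {
  grep -o '"health": *"[a-z]*"' | cut -d '"' -f 4 | sort | uniq -c | awk '{ print $2 ": " $1 }'
}
```

Usage:
[root@prometheus ~]# curl -s http://192.168.85.131:9090/api/v1/targets | target_health_summary
With both jobs healthy, the output should be a single line, up: 2.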
V. Deploy Grafana for dashboards
Pull the official grafana/grafana image
[root@prometheus ~]# docker pull grafana/grafana
Using default tag: latest
latest: Pulling from grafana/grafana
97518928ae5f: Already exists
5b58818b7f48: Already exists
d9a64d9fd162: Already exists
4e368e1b924c: Already exists
867f7fdd92d9: Already exists
387c55415012: Already exists
07f94c8f51cd: Pull complete
ce8cf00ff6aa: Pull complete
e44858b5f948: Pull complete
4000fdbdd2a3: Pull complete
Digest: sha256:18d94ae734accd66bccf22daed7bdb20c6b99aa0f2c687eea3ce4275fe275062
Status: Downloaded newer image for grafana/grafana:latest
docker.io/grafana/grafana:latest
[root@prometheus ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
prom/prometheus latest a3d385fc29f9 11 days ago 201MB
grafana/grafana latest 9b957e098315 2 weeks ago 275MB
Run a container from the official Grafana image
[root@prometheus ~]# docker run -d --name grafan -p 3000:3000 grafana/grafana
39f8ffa8c45b8b57a61e4526118384c15b56539bd878154a0a129edf60393849
[root@prometheus ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
39f8ffa8c45b grafana/grafana "/run.sh" 5 seconds ago Up 4 seconds 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp grafan
7c224924088b prom/prometheus:latest "/bin/prometheus --c…" About an hour ago Up 20 minutes 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp prometheus
[root@prometheus ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:3000 0.0.0.0:*
LISTEN 0 128 0.0.0.0:9090 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:3000 [::]:*
LISTEN 0 128 [::]:9090 [::]:*
LISTEN 0 128 [::]:22 [::]:*
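Open http://192.168.85.131:3000 in a browser (the prometheus host's IP address plus port 3000)
The default username and password are both admin; after the first login you are prompted to change the password.
The Grafana home page appears.
Configure a data source: add a Prometheus data source with the URL http://192.168.85.131:9090. Use the host IP rather than localhost, because inside the Grafana container localhost refers to the container itself.
Import a dashboard template.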
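Instead of adding the data source by hand, you can have Grafana load it from a provisioning file at startup. This sketch rests on a few assumptions. The host directory /grafana/provisioning/datasources is an arbitrary choice. The image reads provisioning files from /etc/grafana/provisioning, which is its default. write_grafana_datasource is a hypothetical helper.

```bash
#!/usr/bin/env bash
# write_grafana_datasource (hypothetical helper): write a data source provisioning
# file that registers Prometheus at the given URL as the default data source.
write_grafana_datasource() {
  local dir="$1" url="$2"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}
```

Usage, recreating the container with the directory mounted:
[root@prometheus ~]# write_grafana_datasource /grafana/provisioning/datasources http://192.168.85.131:9090
[root@prometheus ~]# docker rm -f grafan
[root@prometheus ~]# docker run -d --name grafan -p 3000:3000 -v /grafana/provisioning/datasources:/etc/grafana/provisioning/datasources grafana/grafana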
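VI. Install Alertmanager
The commands in this section run on a host with the prompt localhost. On that host, Prometheus is installed from the release tarball in /usr/local/prometheus and managed by systemd, and node_exporter runs as a systemd service.
1. Download Alertmanager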
[root@localhost ~]# wget https://github.com/prometheus/alertmanager/releases/download/v0.20.0/alertmanager-0.20.0.linux-amd64.tar.gz
[root@localhost ~]# tar xf alertmanager-0.20.0.linux-amd64.tar.gz
[root@localhost ~]# mv alertmanager-0.20.0.linux-amd64 /usr/local/alertmanager
2. Create the systemd unit file
[root@localhost ~]# vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
Documentation=https://github.com/prometheus/alertmanager
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml --cluster.advertise-address=0.0.0.0:9093
Restart=on-failure
[Install]
WantedBy=multi-user.target
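You can also generate the unit file with a heredoc and check it before reloading systemd. In the sketch below, write_alertmanager_unit is a hypothetical helper whose paths default to the ones used above. systemd-analyze verify is the standard systemd checker.

```bash
#!/usr/bin/env bash
# write_alertmanager_unit (hypothetical helper): write the unit shown above.
# $1 = Alertmanager install dir, $2 = unit file path.
write_alertmanager_unit() {
  local home="${1:-/usr/local/alertmanager}"
  local unit="${2:-/usr/lib/systemd/system/alertmanager.service}"
  cat > "$unit" <<EOF
[Unit]
Description=alertmanager
Documentation=https://github.com/prometheus/alertmanager
After=network.target

[Service]
Type=simple
User=root
ExecStart=$home/alertmanager --config.file=$home/alertmanager.yml --cluster.advertise-address=0.0.0.0:9093
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

Usage:
[root@localhost ~]# write_alertmanager_unit
[root@localhost ~]# systemd-analyze verify /usr/lib/systemd/system/alertmanager.service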
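3. Configure alertmanager.yml
The Alertmanager installation directory contains a default alertmanager.yml. You can also create a new configuration file and point to it with --config.file at startup.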
[root@localhost ~]# cd /usr/local/alertmanager
[root@localhost alertmanager]# vim alertmanager.yml
global:
  resolve_timeout: 5m
  # SMTP settings for sending email
  smtp_smarthost: 'smtp.exmail.qq.com:25'
  smtp_from: 'service@yangxingzhen.com'
  smtp_auth_username: 'service@yangxingzhen.com'
  smtp_auth_password: '123456'
  smtp_require_tls: false
# Custom notification templates
templates:
  - '/usr/local/prometheus/alertmanager/template/email.tmpl'
# route defines how alerts are dispatched
route:
  # Label(s) used to group alerts
  group_by: ['alertname']
  # After the first alert of a new group arrives, wait 10s so that other alerts of the same group are sent together
  group_wait: 10s
  # Wait before notifying about new alerts added to a group that has already been notified
  group_interval: 10s
  # Interval before re-sending an unchanged alert, to reduce duplicate emails
  repeat_interval: 1h
  # Default receiver
  receiver: 'email'
  routes: # child routes can send specific groups to specific receivers
    - receiver: 'email'
      continue: true
      group_wait: 10s
receivers:
  - name: 'email'
    email_configs:
      - to: 'xingzhen.yang@yangxingzhen.com'
        html: '{{ template "email.to.html" . }}'
        headers: { Subject: "Prometheus [Warning] Alert Email" }
        send_resolved: true
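- smtp_smarthost: the SMTP server address and port of the mailbox that sends the email;
- smtp_auth_password: the authorization code of the sending mailbox, not its login password;
- smtp_require_tls: defaults to true if not set; with true, sending failed here with a STARTTLS error, so it is set to false for simplicity;
- templates: the path(s) of the email template files;
- html under receivers: the name of the template used for the email body, here "email.to.html", which is defined in the template file;
- headers: email headers, here the subject line.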
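Mistakes in alertmanager.yml, such as a wrong indentation level, otherwise only show up when Alertmanager starts. The release tarball also contains amtool, which can check the file beforehand. In the sketch below, check_alertmanager_config is a hypothetical wrapper, and it assumes the tarball layout from step 1.

```bash
#!/usr/bin/env bash
# check_alertmanager_config (hypothetical helper): validate alertmanager.yml with
# the amtool binary from the Alertmanager release tarball.
# AM_HOME defaults to the install directory used above.
check_alertmanager_config() {
  local home="${AM_HOME:-/usr/local/alertmanager}"
  local cfg="${1:-$home/alertmanager.yml}"
  if [ ! -x "$home/amtool" ]; then
    echo "amtool not found in $home" >&2
    return 1
  fi
  "$home/amtool" check-config "$cfg"
}
```

Usage:
[root@localhost alertmanager]# check_alertmanager_config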
4. Configure the alert template
[root@localhost alertmanager]# mkdir -p /usr/local/prometheus/alertmanager/template
[root@localhost alertmanager]# vim /usr/local/prometheus/alertmanager/template/email.tmpl
{{ define "email.to.html" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{ range .Alerts.Firing }}
=========start==========<br>
Alert source: prometheus_alert <br>
Severity: {{ .Labels.severity }} <br>
Alert name: {{ .Labels.alertname }} <br>
Instance: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
=========end==========<br>
{{ end }}{{ end -}}
{{- if gt (len .Alerts.Resolved) 0 -}}
{{ range .Alerts.Resolved }}
=========start==========<br>
Alert source: prometheus_alert <br>
Severity: {{ .Labels.severity }} <br>
Alert name: {{ .Labels.alertname }} <br>
Instance: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
Resolved at: {{ .EndsAt.Format "2006-01-02 15:04:05" }} <br>
=========end==========<br>
{{ end }}{{ end -}}
{{- end }}
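To test the template and SMTP settings without waiting for a real failure, you can post a hand-made alert to Alertmanager's v2 API. The sketch below assumes Alertmanager listens on localhost:9093 and serves /api/v2/alerts, which v0.20.0 does. build_test_alert and its label values are illustrative.

```bash
#!/usr/bin/env bash
# build_test_alert (hypothetical helper): print a JSON array holding one alert,
# in the shape accepted by POST /api/v2/alerts. The labels and annotations are the
# fields the email template reads (severity, alertname, instance, summary, description).
build_test_alert() {
  local name="${1:-TestAlert}" instance="${2:-test-instance}"
  printf '[{"labels":{"alertname":"%s","severity":"Warning","instance":"%s"},"annotations":{"summary":"test","description":"manual test alert"}}]\n' \
    "$name" "$instance"
}
```

Usage:
[root@localhost alertmanager]# curl -s -X POST -H 'Content-Type: application/json' -d "$(build_test_alert)" http://localhost:9093/api/v2/alerts
The email should arrive after group_wait (10s). The alert has no endsAt, so Alertmanager should treat it as resolved after resolve_timeout (5m), and because send_resolved is true, that should trigger a second email.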
5. Configure alert rules
[root@localhost alertmanager]# mkdir -p /usr/local/prometheus/rules
[root@localhost alertmanager]# cd /usr/local/prometheus/rules
[root@localhost rules]# vim node.yml
groups:
  - name: Node_exporter Down
    rules:
      - alert: NodeInstanceDown
        expr: up == 0
        for: 10s
        labels:
          user: root
          severity: Warning
        annotations:
          summary: "{{ $labels.job }}"
          address: "{{ $labels.instance }}"
          description: "Node_exporter target has been unreachable for at least 10 seconds."
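Prometheus checks the for: 10s clause only when it evaluates the rule. With evaluation_interval: 15s, the PENDING-to-FIRING transition therefore happens at the first evaluation at least 10 seconds after the alert became pending. The small arithmetic sketch below shows that rounding. firing_delay is a hypothetical helper, and it ignores the extra delay before a failed scrape sets up to 0.

```bash
#!/usr/bin/env bash
# firing_delay (hypothetical helper): seconds from the evaluation that first marks
# the alert PENDING to the evaluation that marks it FIRING.
# $1 = the rule's "for" in seconds, $2 = evaluation_interval in seconds.
# The alert fires once its pending time is >= "for", but this is only checked on
# evaluations, so the delay is "for" rounded up to a whole number of intervals.
firing_delay() {
  local hold="$1" interval="$2"
  echo $(( (hold + interval - 1) / interval * interval ))
}
```

For this rule, firing_delay 10 15 prints 15. A rule with for: 1m and the same interval gives 60.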
Reference node.yml from prometheus.yml
[root@localhost rules]# vim /usr/local/prometheus/prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
          # - localhost:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - 'rules/*.yml'
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9100']
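Before restarting, you can validate the edited file with promtool from the Prometheus release directory. promtool check config also loads the files matched by rule_files, so node.yml is checked in the same pass. In the sketch below, check_prometheus_config is a hypothetical wrapper, and it assumes the /usr/local/prometheus layout used in this section.

```bash
#!/usr/bin/env bash
# check_prometheus_config (hypothetical helper): run promtool from the Prometheus
# release directory against prometheus.yml (and, through rule_files, rules/*.yml).
# PROM_HOME defaults to the install directory used in this section.
check_prometheus_config() {
  local home="${PROM_HOME:-/usr/local/prometheus}"
  if [ ! -x "$home/promtool" ]; then
    echo "promtool not found in $home" >&2
    return 1
  fi
  "$home/promtool" check config "$home/prometheus.yml"
}
```

Usage:
[root@localhost rules]# check_prometheus_config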
6. Restart the Prometheus service
[root@localhost rules]# systemctl restart prometheus
7. Start Alertmanager
[root@localhost rules]# systemctl daemon-reload
[root@localhost rules]# systemctl start alertmanager
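8. Verify the result
At this point, the Alerts page of the Prometheus web UI lists the new alert rule.
9. Stop the node_exporter service and check the result again.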
[root@localhost rules]# systemctl stop node_exporter
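The Alerts page of the Prometheus UI shows the alert state:
- Green means normal (the rule is inactive).
- PENDING means the alert condition is true but the alert has not been sent to Alertmanager yet, because the rule sets for: 10s.
- Once the condition has held for at least 10 seconds, the state changes from PENDING to FIRING at the next rule evaluation (about 15 seconds later with evaluation_interval: 15s). Only then does Prometheus send the alert to Alertmanager, which now lists one alert.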
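The same states can be read from the Prometheus HTTP API at /api/v1/alerts, which is handy for scripted checks. alert_state_summary is a hypothetical helper. It uses grep and cut instead of jq and only looks at the "state" keys in the response.

```bash
#!/usr/bin/env bash
# alert_state_summary (hypothetical helper): read the JSON returned by
# GET /api/v1/alerts on stdin and print "<state>: <count>" (pending / firing).
alert_state_summary() {
  grep -o '"state": *"[a-z]*"' | cut -d '"' -f 4 | sort | uniq -c | awk '{ print $2 ": " $1 }'
}
```

Usage:
[root@localhost rules]# curl -s http://localhost:9090/api/v1/alerts | alert_state_summary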