
Monitoring Containers with Email Alerts (cAdvisor, AlertManager)

2021/12/31 1:14:53

Containerized Prometheus deployment, with the Grafana dashboard tool for node monitoring

I. Deployment environment

Hostname     IP address        Services
prometheus   192.168.85.131    prometheus, grafana
cAdvisor     192.168.85.132    cAdvisor, docker

II. Deploy Prometheus

Preparation

Set the hostname

[root@localhost ~]# hostnamectl  set-hostname prometheus
[root@localhost ~]# bash

Disable the firewall

[root@prometheus ~]# systemctl disable --now firewalld

Disable SELinux

[root@prometheus ~]# vim /etc/selinux/config 
[root@prometheus ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected. 
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted


[root@prometheus ~]# reboot
[root@prometheus ~]# getenforce 
Disabled
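Instead of editing the file in vim, the same change can be scripted. A minimal sketch, assuming GNU sed (as shipped with CentOS 8); the function name disable_selinux_config is hypothetical:

```shell
# disable_selinux_config [FILE]: hypothetical helper that sets SELINUX=disabled
# in an SELinux config file (default /etc/selinux/config). Requires GNU sed -i.
disable_selinux_config() {
    conf="${1:-/etc/selinux/config}"
    # Rewrite whichever mode is set (enforcing/permissive) to disabled;
    # the SELINUXTYPE= line does not match because '=' must follow SELINUX.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}
```

Run disable_selinux_config as root and reboot as above. If a reboot is inconvenient, setenforce 0 switches the running system to permissive mode immediately; the disabled setting itself only takes effect after the reboot.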

First, configure the yum repository

[root@prometheus ~]# wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-8.repo

Configure the Docker CE yum repository

[root@prometheus ~]# cd /etc/yum.repos.d/
[root@prometheus ~]# curl -o docker-ce.repo https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/centos/docker-ce.repo
[root@prometheus ~]# sed -i 's@https://download.docker.com@https://mirrors.tuna.tsinghua.edu.cn/docker-ce@g' docker-ce.repo

Install docker-ce and its components

[root@prometheus ~]# yum -y install docker-ce    

Start Docker

[root@prometheus ~]# systemctl start docker
[root@prometheus ~]# systemctl enable docker
[root@prometheus ~]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-12-30 23:35:18 CST; 20min ago
     Docs: https://docs.docker.com
 Main PID: 1035 (dockerd)
    Tasks: 20
   Memory: 125.4M
   CGroup: /system.slice/docker.service

Configure the Aliyun registry mirror

[root@prometheus ~]# vim /etc/docker/daemon.json
[root@prometheus ~]# cat /etc/docker/daemon.json 
{
  "registry-mirrors": ["https://in3617d8.mirror.aliyuncs.com"]
}
[root@prometheus ~]# systemctl daemon-reload
[root@prometheus ~]# systemctl restart docker
[root@prometheus ~]# docker info 
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.7.1-docker)
  scan: Docker Scan (Docker Inc., v0.12.0)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.18.0-257.el8.x86_64
 Operating System: CentOS Stream 8
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 3.622GiB
 Name: prometheus
 ID: Z3JR:D5TL:NVZ7:DJWE:7FL3:EC74:6DKV:6HOB:IPXV:FCQ3:MF4B:R3BG
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  https://in3617d8.mirror.aliyuncs.com/
 Live Restore Enabled: false
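The daemon.json edit can also be done non-interactively. A sketch assuming the file holds no other settings you want to keep, since it is overwritten; write_daemon_json is a hypothetical name:

```shell
# write_daemon_json FILE MIRROR_URL: hypothetical helper that (over)writes a
# Docker daemon.json containing a single registry mirror.
write_daemon_json() {
    file="$1"
    mirror="$2"
    # Create /etc/docker (or whatever parent directory) if it is missing
    mkdir -p "$(dirname "$file")"
    cat > "$file" <<EOF
{
  "registry-mirrors": ["${mirror}"]
}
EOF
}
```

For example, write_daemon_json /etc/docker/daemon.json https://in3617d8.mirror.aliyuncs.com, followed by systemctl daemon-reload and systemctl restart docker; docker info should then list the URL under Registry Mirrors.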

Run the Prometheus container

Pull the official Prometheus image

[root@prometheus ~]# docker pull prom/prometheus  
Using default tag: latest
latest: Pulling from prom/prometheus
3cb635b06aa2: Pull complete 
34f699df6fe0: Pull complete 
33d6c9635e0f: Pull complete 
f2af7323bed8: Pull complete 
c16675a6a294: Pull complete 
827843f6afe6: Pull complete 
3d272942eeaf: Pull complete 
7e785cfa34da: Pull complete 
05e324559e3b: Pull complete 
170620261a59: Pull complete 
ec35f5996032: Pull complete 
5509173eb708: Pull complete 
Digest: sha256:cb9817249c346d6cfadebe383ed3b3cd4c540f623db40c4ca00da2ada45259bb
Status: Downloaded newer image for prom/prometheus:latest
docker.io/prom/prometheus:latest

Create a directory for the Prometheus configuration and provide a default configuration file

[root@prometheus ~]# mkdir -p  /prometheus/config/
[root@prometheus ~]# cd /prometheus/config/
[root@prometheus config]# vim prometheus.yml 
[root@prometheus config]# cat prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ["localhost:9090"]

Create a container from the official Prometheus image

[root@prometheus ~]# docker images
REPOSITORY        TAG       IMAGE ID       CREATED       SIZE
prom/prometheus   latest    a3d385fc29f9   11 days ago   201MB
[root@prometheus ~]# docker run --name prometheus -d --restart always  -p 9090:9090 -v /prometheus/config/prometheus.yml:/etc/prometheus/prometheus.yml   prom/prometheus:latest
7c224924088bc66c18f11549225e92677646ee5bbf5010340d9f57ddbd407cfc
[root@prometheus ~]# docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED         STATUS         PORTS                                       NAMES
7c224924088b   prom/prometheus:latest   "/bin/prometheus --c…"   5 seconds ago   Up 4 seconds   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp   prometheus

Check the listening ports

[root@prometheus ~]# ss -antl
State         Recv-Q        Send-Q               Local Address:Port                Peer Address:Port        Process        
LISTEN        0             128                        0.0.0.0:9090                     0.0.0.0:*                          
LISTEN        0             128                        0.0.0.0:22                       0.0.0.0:*                          
LISTEN        0             128                           [::]:9090                        [::]:*                          
LISTEN        0             128                           [::]:22                          [::]:*                        



Open http://192.168.85.131:9090/targets (the local host IP plus port 9090 and the /targets path) in a browser.
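Before opening the page, a script can wait until Prometheus reports ready. Prometheus serves a readiness endpoint at /-/ready; the polling helper below (wait_for_ready, a hypothetical name) is a sketch that assumes curl is installed on the machine running it:

```shell
# wait_for_ready URL [TRIES]: hypothetical helper that polls URL once per
# second until the request succeeds; returns 0 on success, 1 on timeout.
wait_for_ready() {
    url="$1"
    tries="${2:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f: treat HTTP status >= 400 as failure; -sS: quiet but show errors
        if curl -fsS -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

Usage: wait_for_ready http://192.168.85.131:9090/-/ready && echo "Prometheus is ready".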

III. Deploy cAdvisor

Preparation

Set the hostname

[root@localhost ~]# hostnamectl  set-hostname cAdvisor
[root@localhost ~]# bash

Disable the firewall

[root@cAdvisor ~]# systemctl disable --now firewalld

Disable SELinux

[root@cAdvisor ~]# vim /etc/selinux/config 
[root@cAdvisor ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected. 
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted


[root@cAdvisor ~]# reboot
[root@cAdvisor ~]# getenforce 
Disabled

First, configure the yum repository

[root@cAdvisor ~]# wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-8.repo

Configure the Docker CE yum repository

[root@cAdvisor ~]# cd /etc/yum.repos.d/
[root@cAdvisor ~]# curl -o docker-ce.repo https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/centos/docker-ce.repo
[root@cAdvisor ~]# sed -i 's@https://download.docker.com@https://mirrors.tuna.tsinghua.edu.cn/docker-ce@g' docker-ce.repo

Install docker-ce and its components

[root@cAdvisor ~]# yum -y install docker-ce    

Start Docker

[root@cAdvisor ~]# systemctl start docker
[root@cAdvisor ~]# systemctl enable  docker
[root@cAdvisor ~]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-12-30 23:35:18 CST; 20min ago
     Docs: https://docs.docker.com
 Main PID: 1035 (dockerd)
    Tasks: 20
   Memory: 125.4M
   CGroup: /system.slice/docker.service

Configure the Aliyun registry mirror

[root@cAdvisor ~]# vim /etc/docker/daemon.json
[root@cAdvisor ~]# cat /etc/docker/daemon.json 
{
  "registry-mirrors": ["https://in3617d8.mirror.aliyuncs.com"]
}
[root@cAdvisor ~]# systemctl daemon-reload
[root@cAdvisor ~]# systemctl restart docker
[root@cAdvisor ~]# docker info 
(The output is identical to that on the prometheus host above; check that Registry Mirrors lists https://in3617d8.mirror.aliyuncs.com/.)

On the cAdvisor (client) host, pull the official google/cadvisor image

[root@cAdvisor ~]# docker pull google/cadvisor
Using default tag: latest
latest: Pulling from google/cadvisor
ff3a5c916c92: Pull complete 
44a45bb65cdf: Pull complete 
0bbe1a2fe2a6: Pull complete 
Digest: sha256:815386ebbe9a3490f38785ab11bda34ec8dacf4634af77b8912832d4f85dca04
Status: Downloaded newer image for google/cadvisor:latest
docker.io/google/cadvisor:latest

On the cAdvisor host, run the cadvisor container from the official image with the required directory and port mappings

[root@cAdvisor ~]# docker run --volume=/:/rootfs:ro \
--volume=/var/run:/var/run:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro  \
--publish=8080:8080  \
--detach=true \
--name=cadvisor \
--privileged \
--device=/dev/kmsg \
google/cadvisor 
f2708f9435dbcc9cbac2133ab6660d3e566e47d2b4cddc0128c6209341834d32

Check the container status

[root@cAdvisor ~]# docker ps
CONTAINER ID   IMAGE             COMMAND                  CREATED         STATUS         PORTS                                       NAMES
f2708f9435db   google/cadvisor   "/usr/bin/cadvisor -…"   3 minutes ago   Up 3 minutes   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   cadvisor

Check the listening ports

[root@cAdvisor ~]# ss -antl
State  Recv-Q Send-Q Local Address:Port   Peer Address:Port                      Process                      
LISTEN 0      128          0.0.0.0:8080        0.0.0.0:*                                                      
LISTEN 0      128          0.0.0.0:22          0.0.0.0:*                                                      
LISTEN 0      128             [::]:8080           [::]:*                                                      
LISTEN 0      128             [::]:22             [::]:*                                                   

Open http://192.168.85.132:8080 (the cAdvisor host IP plus port 8080) in a browser.
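Besides the web UI, cAdvisor exposes the same data in the Prometheus text format at /metrics on port 8080, which is what Prometheus scrapes in the next section. The filter below is a sketch (container_metric is a hypothetical helper) that picks one metric for one container by its name label:

```shell
# container_metric METRIC NAME: hypothetical filter that reads Prometheus text
# exposition (e.g. cAdvisor's /metrics) on stdin and prints the samples of
# METRIC whose name label equals NAME.
container_metric() {
    metric="$1"
    name="$2"
    # Sample lines start with the metric name followed by its label set;
    # [{,] anchors the label so e.g. xxx_name="..." labels are not matched
    grep "^${metric}{" | grep "[{,]name=\"${name}\""
}
```

Usage: curl -s http://192.168.85.132:8080/metrics | container_metric container_memory_usage_bytes cadvisor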

IV. Add the target to Prometheus

Edit the Prometheus configuration file prometheus.yml in /prometheus/config

[root@prometheus ~]# vim /prometheus/config/prometheus.yml 
[root@prometheus ~]# cat /prometheus/config/prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ["localhost:9090"]

  - job_name: "cAdvisor"  # job name for the new target
    static_configs:
    - targets: ["192.168.85.132:8080"]  # cAdvisor host IP and port
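The same job can be appended from a script. This sketch (add_scrape_job is a hypothetical helper) assumes, as in the file above, that scrape_configs: is the last top-level section, so lines appended at the end belong to it; it skips the append if a job with the same name already exists:

```shell
# add_scrape_job FILE JOB TARGET: hypothetical helper that appends a static
# scrape job to a prometheus.yml whose last top-level section is scrape_configs:.
add_scrape_job() {
    file="$1"
    job="$2"
    target="$3"
    # Do nothing if a job with this name is already configured
    if grep -q "job_name: \"${job}\"" "$file"; then
        return 0
    fi
    # Append in place (>>), using the same indentation as the existing jobs
    cat >> "$file" <<EOF

  - job_name: "${job}"
    static_configs:
    - targets: ["${target}"]
EOF
}
```

For example, add_scrape_job /prometheus/config/prometheus.yml cAdvisor 192.168.85.132:8080. Because >> modifies the file in place (same inode), the change is also visible through the single-file bind mount, and since Prometheus reloads its configuration on SIGHUP, docker kill --signal=HUP prometheus is an alternative to restarting the container.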

Restart the Prometheus container

[root@prometheus ~]# docker restart prometheus
prometheus
[root@prometheus ~]# docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED          STATUS         PORTS                                       NAMES
7c224924088b   prom/prometheus:latest   "/bin/prometheus --c…"   55 minutes ago   Up 7 seconds   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp   prometheus

Open http://192.168.85.131:9090/targets (the prometheus host IP plus port 9090) in a browser again; the cAdvisor job now appears alongside prometheus.
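The Targets page can also be checked from the command line through the HTTP API at /api/v1/targets. The sketch below (all_targets_up is a hypothetical helper) avoids jq and relies on grep, so it assumes the compact JSON Prometheus returns, with no spaces around the colons:

```shell
# all_targets_up: hypothetical helper that reads /api/v1/targets JSON on stdin
# and succeeds only if at least one target exists and every one is "up".
all_targets_up() {
    healths=$(grep -o '"health":"[a-z]*"' | cut -d'"' -f4)
    # No health fields at all means no targets were found
    [ -n "$healths" ] || return 1
    # Fail if any line is something other than "up" (down, unknown, ...)
    ! printf '%s\n' "$healths" | grep -qv '^up$'
}
```

Usage: curl -s http://192.168.85.131:9090/api/v1/targets | all_targets_up && echo "all targets up"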

V. Deploy the Grafana dashboard tool

Pull the official grafana/grafana image

[root@prometheus ~]# docker pull grafana/grafana
Using default tag: latest
latest: Pulling from grafana/grafana
97518928ae5f: Already exists 
5b58818b7f48: Already exists 
d9a64d9fd162: Already exists 
4e368e1b924c: Already exists 
867f7fdd92d9: Already exists 
387c55415012: Already exists 
07f94c8f51cd: Pull complete 
ce8cf00ff6aa: Pull complete 
e44858b5f948: Pull complete 
4000fdbdd2a3: Pull complete 
Digest: sha256:18d94ae734accd66bccf22daed7bdb20c6b99aa0f2c687eea3ce4275fe275062
Status: Downloaded newer image for grafana/grafana:latest
docker.io/grafana/grafana:latest
[root@prometheus ~]# docker images
REPOSITORY        TAG       IMAGE ID       CREATED       SIZE
prom/prometheus   latest    a3d385fc29f9   11 days ago   201MB
grafana/grafana   latest    9b957e098315   2 weeks ago   275MB

Run a container from the official Grafana image

[root@prometheus ~]# docker run -d --name grafana -p 3000:3000 grafana/grafana
39f8ffa8c45b8b57a61e4526118384c15b56539bd878154a0a129edf60393849
[root@prometheus ~]# docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED             STATUS          PORTS                                       NAMES
39f8ffa8c45b   grafana/grafana          "/run.sh"                5 seconds ago       Up 4 seconds    0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   grafana
7c224924088b   prom/prometheus:latest   "/bin/prometheus --c…"   About an hour ago   Up 20 minutes   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp   prometheus
[root@prometheus ~]# ss -antl
State         Recv-Q        Send-Q               Local Address:Port                Peer Address:Port        Process        
LISTEN        0             128                        0.0.0.0:3000                     0.0.0.0:*                          
LISTEN        0             128                        0.0.0.0:9090                     0.0.0.0:*                          
LISTEN        0             128                        0.0.0.0:22                       0.0.0.0:*                          
LISTEN        0             128                           [::]:3000                        [::]:*                          
LISTEN        0             128                           [::]:9090                        [::]:*                          
LISTEN        0             128                           [::]:22                          [::]:* 

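Open http://192.168.85.131:3000 (the prometheus host IP plus port 3000) in a browser.
The default username and password are both admin; after the first login you are prompted to change the password.
The home page is then displayed.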
Configure the Prometheus data source.
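As an alternative to clicking through the UI, Grafana can load data sources from provisioning files under /etc/grafana/provisioning/datasources in the container. A sketch with a hypothetical helper and host directory; the URL uses the host IP because localhost inside the Grafana container is the container itself:

```shell
# write_grafana_datasource DIR PROM_URL: hypothetical helper that writes a
# Grafana data source provisioning file registering Prometheus as default.
write_grafana_datasource() {
    dir="$1"
    url="$2"
    mkdir -p "$dir"
    cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}
```

For example, write_grafana_datasource /grafana/provisioning/datasources http://192.168.85.131:9090 (the /grafana directory is an arbitrary choice), then start the container with -v /grafana/provisioning/datasources:/etc/grafana/provisioning/datasources added to the docker run command.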
Import a dashboard template.

VI. Install Alertmanager

1. Download Alertmanager

[root@localhost ~]# wget https://github.com/prometheus/alertmanager/releases/download/v0.20.0/alertmanager-0.20.0.linux-amd64.tar.gz

[root@localhost ~]# tar xf alertmanager-0.20.0.linux-amd64.tar.gz

[root@localhost ~]# mv alertmanager-0.20.0.linux-amd64 /usr/local/alertmanager
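Prometheus project releases normally publish a sha256sums.txt next to the tarballs, so the download can be verified, ideally before the tar step. A sketch assuming GNU coreutils sha256sum and the usual "<hash>  <filename>" line format; verify_download is a hypothetical name:

```shell
# verify_download FILE SUMS_FILE: hypothetical helper that checks FILE against
# its entry in a sha256sums.txt-style list; returns non-zero on mismatch.
verify_download() {
    file="$1"
    sums="$2"
    dir=$(dirname "$file")
    base=$(basename "$file")
    # Pick the line for this file name only (two spaces precede the name)
    line=$(grep "  ${base}\$" "$sums") || return 1
    # sha256sum -c resolves the name relative to the current directory
    (cd "$dir" && printf '%s\n' "$line" | sha256sum -c - >/dev/null)
}
```

For example, download https://github.com/prometheus/alertmanager/releases/download/v0.20.0/sha256sums.txt into the same directory and run verify_download alertmanager-0.20.0.linux-amd64.tar.gz sha256sums.txt && echo OK.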

2. Create the systemd unit file

[root@localhost ~]# vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
Documentation=https://github.com/prometheus/alertmanager
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml --cluster.advertise-address=0.0.0.0:9093
Restart=on-failure
[Install]
WantedBy=multi-user.target

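3. Configure alertmanager.yml

The Alertmanager installation directory contains a default alertmanager.yml; you can also create a new configuration file and specify it with --config.file at startup.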

[root@localhost ~]# cd /usr/local/alertmanager
[root@localhost alertmanager]# vim alertmanager.yml
global:
  resolve_timeout: 5m
  # SMTP settings for outgoing mail
  smtp_smarthost: 'smtp.exmail.qq.com:25'
  smtp_from: 'service@yangxingzhen.com'
  smtp_auth_username: 'service@yangxingzhen.com'
  smtp_auth_password: '123456'
  smtp_require_tls: false
# Custom notification templates
templates:
  - '/usr/local/prometheus/alertmanager/template/email.tmpl'
# route defines how alerts are dispatched
route:
  # Label(s) used to group alerts
  group_by: ['alertname']
  # Wait 10s after the first alert of a group so that alerts of the same group are sent together
  group_wait: 10s
  # Wait before notifying about new alerts added to a group that was already notified
  group_interval: 10s
  # Wait before re-sending an alert that is still firing; reduces duplicate emails
  repeat_interval: 1h
  # Default receiver
  receiver: 'email'
  routes:   # Child routes: which receivers handle which alerts
  - receiver: 'email'
    continue: true
    group_wait: 10s
receivers:
- name: 'email'
  email_configs:
  - to: 'xingzhen.yang@yangxingzhen.com'
    html: '{{ template "email.to.html" . }}'
    headers: { Subject: "Prometheus [Warning] Alert Email" }
    send_resolved: true
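  • smtp_smarthost: the SMTP server address and port of the sending mailbox;
  • smtp_auth_password: the authorization code of the sending mailbox, not its login password;
  • smtp_require_tls: defaults to true if not set; with true a STARTTLS error may occur, so it is set to false here for simplicity;
  • templates: the path(s) of the email template files;
  • html under receivers: the name of the template used for the email body, here "email.to.html", which is defined in the template file;
  • headers: the email headers, here the Subject line.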

4. Configure the alert template

[root@localhost alertmanager]# mkdir -p /usr/local/prometheus/alertmanager/template
[root@localhost alertmanager]# vim /usr/local/prometheus/alertmanager/template/email.tmpl
{{ define "email.to.html" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{ range .Alerts.Firing }}
=========start==========<br>
Alert source: prometheus_alert <br>
Severity: {{ .Labels.severity }} <br>
Alert name: {{ .Labels.alertname }} <br>
Instance: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Description: {{ .Annotations.description }} <br>
Fired at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
=========end==========<br>
{{ end }}{{ end -}}

{{- if gt (len .Alerts.Resolved) 0 -}}
{{ range .Alerts.Resolved }}
=========start==========<br>
Alert source: prometheus_alert <br>
Severity: {{ .Labels.severity }} <br>
Alert name: {{ .Labels.alertname }} <br>
Instance: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Description: {{ .Annotations.description }} <br>
Fired at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
Resolved at: {{ .EndsAt.Format "2006-01-02 15:04:05" }} <br>
=========end==========<br>
{{ end }}{{ end -}}

{{- end }}

5. Configure the alert rules

[root@localhost alertmanager]# mkdir -p /usr/local/prometheus/rules
[root@localhost alertmanager]# cd /usr/local/prometheus/rules
[root@localhost rules]# vim node.yml
groups:
- name: Node_exporter Down
  rules:
  - alert: NodeInstanceDown
    expr: up == 0
    for: 10s
    labels:
      user: root
      severity: Warning
    annotations:
      summary: "{{ $labels.job }}"
      address: "{{ $labels.instance }}"
      description: "Node_exporter client has been unreachable for more than 10 seconds."

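Reference node.yml from prometheus.yml (relative rule_files paths are resolved against the directory of prometheus.yml). Note that steps 5 to 9 assume a Prometheus binary installed under /usr/local/prometheus, managed by systemd and scraping a node_exporter on port 9100. With the containerized Prometheus from section II, the rules directory must also be mounted into the container, the container is restarted with docker restart prometheus rather than systemctl, and localhost inside the container refers to the container itself, so the Alertmanager target should be the host address, e.g. 192.168.85.131:9093.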

[root@localhost rules]# vim /usr/local/prometheus/prometheus.yml

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
 
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']
      # - localhost:9093
 
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - 'rules/*.yml'
  # - "first_rules.yml"
  # - "second_rules.yml"
 
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
 
    static_configs:
    - targets: ['localhost:9100']
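For the containerized Prometheus, the localhost:9093 references have to point at the host instead. A small sed-based sketch (set_alertmanager_target is a hypothetical name; GNU sed assumed) that rewrites every localhost:9093 in a given prometheus.yml:

```shell
# set_alertmanager_target FILE HOST:PORT: hypothetical helper that replaces
# every "localhost:9093" in a prometheus.yml with the given address.
set_alertmanager_target() {
    file="$1"
    addr="$2"
    # '|' as the sed delimiter so it never clashes with the address
    sed -i "s|localhost:9093|${addr}|g" "$file"
}
```

For example, set_alertmanager_target /prometheus/config/prometheus.yml 192.168.85.131:9093. GNU sed -i writes a new file rather than editing in place, so a single-file bind mount keeps seeing the old file; restart the container with docker restart prometheus afterwards instead of relying on a SIGHUP reload.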

6. Restart the Prometheus service

[root@localhost rules]# systemctl restart prometheus

7. Start Alertmanager

[root@localhost rules]# systemctl daemon-reload
[root@localhost rules]# systemctl start alertmanager

8. Verify

The Prometheus web UI now shows the following:

[Figure: Prometheus email alert configuration]

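9. Stop the node_exporter service, then check the result again. (With the containerized setup from sections II and III, running docker stop cadvisor on 192.168.85.132 takes the cAdvisor target down in the same way.)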

[root@localhost rules]# systemctl stop node_exporter

The Alerts page of the Prometheus UI shows the alert state:

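  • Green (inactive) means normal.
  • Yellow, in state PENDING, means the alert condition is true but the alert has not yet been sent to Alertmanager, because the rule sets for: 10s.
  • After 10 seconds the state changes from PENDING to FIRING (red); only then does Prometheus send the alert to Alertmanager, where one alert is now visible.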
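The PENDING to FIRING transition can also be watched from the command line: the /api/v1/alerts endpoint lists active alerts with a state field of pending or firing. A grep-based sketch (alert_states is a hypothetical name) that assumes the compact JSON Prometheus returns:

```shell
# alert_states: hypothetical helper that reads the JSON returned by
# /api/v1/alerts on stdin and prints the state (pending/firing) of each alert.
alert_states() {
    grep -o '"state":"[a-z]*"' | cut -d'"' -f4
}
```

Running curl -s http://localhost:9090/api/v1/alerts | alert_states a few times after stopping the target should print pending during the first 10 seconds and firing afterwards.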

[Figure: Prometheus email alert configuration]