Configuring DingTalk alerts for Prometheus

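Prometheus's alertmanager cannot send alerts to DingTalk directly. To deliver alerts to DingTalk, a webhook plugin has to be deployed as an intermediate component.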

Installing the DingTalk alert webhook plugin

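My prometheus and alertmanager both run in kubernetes, so the DingTalk plugin is deployed on the k8s platform as well. For other installation methods, see the DingTalk plugin GitHub repository.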

  • vim dingtalk-alert-deployment.yml
    # extensions/v1beta1 Deployments were removed in Kubernetes 1.16; apps/v1 is the current API group.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: dingtalk-alert
      name: dingtalk-alert
      namespace: monitor
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: dingtalk-alert
      template:
        metadata:
          labels:
            app: dingtalk-alert
        spec:
          containers:
          - image: timonwong/prometheus-webhook-dingtalk
            command:
            - /bin/prometheus-webhook-dingtalk
            - --ding.profile=webhook1=https://oapi.dingtalk.com/robot/send?access_token=da1cc37cd155f73112bcbf4aa4be49c8c400786f1b38908a15fa1e9be0eee51
            - --template.file=/usr/share/prometheus-webhook-dingtalk/template/default.tmpl
            imagePullPolicy: IfNotPresent
            name: dingtalk-alert
            ports:
            # Must match targetPort in dingtalk-service.yml.
            - containerPort: 8060
              name: service
              protocol: TCP
            resources:
              limits:
                cpu: 100m
                memory: 512Mi
              requests:
                cpu: 50m
                memory: 128Mi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            # Mount only default.tmpl from the ConfigMap over the image's default template.
            - mountPath: /usr/share/prometheus-webhook-dingtalk/template/default.tmpl
              name: dingtalk-tmpl
              subPath: default.tmpl
          dnsPolicy: ClusterFirst
          volumes:
          - configMap:
              name: dingtalk-alert-configmap
            name: dingtalk-tmpl
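  • Note: --ding.profile sets the DingTalk robot URL. webhook1 is a user-chosen profile name; it is referenced again when the webhook is specified in the alertmanager configuration.
  • Note: --template.file specifies the template file used to render the alert message.
  • Note: the robot URL after --ding.profile must not be wrapped in quotes here, otherwise the request fails with a 404 error. Kubernetes passes the argument to the process verbatim, so the quotes would become part of the URL. When the plugin is started outside k8s from a shell, the quotes are required.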
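To make the notes about --ding.profile concrete, here is a minimal Python sketch of how a name=url value splits into a profile name and a robot URL, and which path alertmanager would then call. The helpers parse_profile and alertmanager_path are hypothetical names, TOKEN is a placeholder, and the /dingtalk/<name>/send path shape is assumed from the webhook URL used in the alertmanager configuration of this article; the plugin's own parsing may differ.

```python
def parse_profile(value: str) -> tuple:
    """Split a 'name=url' profile value at the first '=' only."""
    name, sep, url = value.partition("=")
    if not sep or not name or not url:
        raise ValueError(f"expected name=url, got {value!r}")
    return name, url


def alertmanager_path(name: str) -> str:
    """Path alertmanager posts to for a given profile (assumed shape)."""
    return f"/dingtalk/{name}/send"


# Placeholder token, for illustration only.
flag_value = "webhook1=https://oapi.dingtalk.com/robot/send?access_token=TOKEN"
name, url = parse_profile(flag_value)
print(name)                     # webhook1
print(url)                      # the robot URL, including its own '=' characters
print(alertmanager_path(name))  # /dingtalk/webhook1/send

# Quotes written inside the manifest argument are not stripped by any shell,
# so they would stay attached to the URL.
_, quoted = parse_profile(
    'webhook1="https://oapi.dingtalk.com/robot/send?access_token=TOKEN"'
)
print(quoted.startswith("https://"))  # False
```

Splitting at the first '=' only matters because the robot URL itself contains '=' in its query string.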
  • vim dingtalk-configmap.yml
    apiVersion: v1
    data:
      default.tmpl: |
        {{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
        {{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}

        {{ define "__text_alert_list" }}{{ range . }}
        **Alerts**
        {{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
        {{ end }}

        **Labels**
        {{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
        {{ end }}
        **Source:** **http://grafana.monitor.bj.univer.top/d/xLrfYirmk/a-li-yun-gao-jing?orgId=1**

        {{ end }}{{ end }}

        {{ define "ding.link.title" }}{{ template "__subject" . }}{{ end }}
        {{ define "ding.link.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
        {{ template "__text_alert_list" .Alerts.Firing }}
        {{ end }}
    kind: ConfigMap
    metadata:
      creationTimestamp: 2019-05-13T07:15:10Z
      labels:
        app: dingtalk-alert
      name: dingtalk-alert-configmap
      namespace: monitor
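The __subject template in the ConfigMap is dense, so the following Python sketch mirrors its logic to show what the message title looks like. It is a re-implementation for illustration, not the plugin's code: the function names are hypothetical, and the label ordering (sorted by name, with alertname first) is an assumption about how SortedPairs behaves.

```python
def sorted_values(labels: dict) -> list:
    """Values ordered by label name, 'alertname' first (assumed SortedPairs order)."""
    keys = sorted(labels, key=lambda k: (k != "alertname", k))
    return [labels[k] for k in keys]


def subject(status: str, firing_count: int, group_labels: dict, common_labels: dict) -> str:
    """Mirror of the __subject template: [STATUS:n] group values (extra common values)."""
    head = status.upper()
    if status == "firing":
        head += f":{firing_count}"
    # The template always emits a space after the group label values.
    text = f"[{head}] {' '.join(sorted_values(group_labels))} "
    if len(common_labels) > len(group_labels):
        # Common labels that are not already part of the grouping key.
        extra = {k: v for k, v in common_labels.items() if k not in group_labels}
        text += f"({' '.join(sorted_values(extra))})"
    return text


# Hypothetical alert group used only for this example.
title = subject(
    "firing",
    2,
    {"alertname": "HighCPU"},
    {"alertname": "HighCPU", "severity": "critical", "cluster": "bj"},
)
print(title)  # [FIRING:2] HighCPU (bj critical)
```

The firing count appears only while the group is firing; a resolved notification would start with [RESOLVED].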
  • vim dingtalk-service.yml
    # Server-assigned fields from the exported object (resourceVersion, selfLink, uid, clusterIP)
    # are left out so that the manifest can be applied to a cluster where the Service does not exist yet.
    apiVersion: v1
    kind: Service
    metadata:
      creationTimestamp: 2019-05-13T07:15:10Z
      labels:
        app: dingtalk-alert
      name: dingtalk-alert-svc
      namespace: monitor
    spec:
      externalTrafficPolicy: Cluster
      ports:
      - name: service
        nodePort: 31221
        port: 8060
        protocol: TCP
        targetPort: 8060
      selector:
        app: dingtalk-alert
      sessionAffinity: None
      type: NodePort

Modifying the alertmanager configuration

  • Add a webhook receiver to alertmanager that points to the DingTalk plugin
    alertmanager.yml: |
      global:
        resolve_timeout: 5m
      receivers:
      - name: webhook
        webhook_configs:
        # The path segment after /dingtalk/ is the profile name given to --ding.profile (webhook1).
        - url: http://dingtalk-alert-svc:8060/dingtalk/webhook1/send
          send_resolved: true
      route:
        group_interval: 5m
        group_wait: 10s
        receiver: webhook
        repeat_interval: 3h
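Before waiting for a real alert, the receiver can be exercised by hand. The sketch below builds a notification in the JSON layout that alertmanager sends to webhook receivers and offers a small helper to POST it. The function names, the alert name, and the example.* URLs are hypothetical; the target URL in the final comment assumes the Service name and port from this article and the profile name webhook1 from the Deployment.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_payload(alertname: str, summary: str) -> dict:
    """Build a minimal firing notification in the alertmanager webhook layout."""
    now = datetime.now(timezone.utc).isoformat()
    labels = {"alertname": alertname, "severity": "warning"}
    annotations = {"summary": summary}
    return {
        "version": "4",
        "groupKey": '{}:{alertname="' + alertname + '"}',
        "status": "firing",
        "receiver": "webhook",
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://alertmanager.example:9093",  # placeholder
        "alerts": [{
            "status": "firing",
            "labels": labels,
            "annotations": annotations,
            "startsAt": now,
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://prometheus.example:9090/graph",  # placeholder
        }],
    }


def send(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


payload = build_test_payload("TestAlert", "manual test from the sketch")
print(json.dumps(payload, indent=2))
# Run from a pod in the monitor namespace to actually deliver the message:
# send("http://dingtalk-alert-svc:8060/dingtalk/webhook1/send", payload)
```

The send call is left commented out so that the script only prints the payload unless it is run somewhere the Service name resolves.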

    The overall structure is as follows: