1 Install the agent
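Metric descriptions
| Name | Description |
| --- | --- |
| rocketmq_producer_tps | Messages produced per second, per topic |
| rocketmq_producer_message_size | Size of messages produced per second, per topic (bytes) |
| rocketmq_producer_offset | Current produce offset (progress) of a topic |
| rocketmq_consumer_tps | Messages consumed per second, per consumer group |
| rocketmq_consumer_message_size | Size of messages consumed per second, per consumer group (bytes) |
| rocketmq_consumer_offset | Current consume offset (progress) of a consumer group |
| rocketmq_group_get_latency | Consumer latency on a topic for a single queue |
| rocketmq_group_get_latency_by_storetime | Consumption delay of a consumer group, measured against message store time |
| rocketmq_message_accumulation | How far the consumer offset lags behind |
| rocketmq_client_consume_fail_msg_count | Messages that failed to be consumed within the last hour |
| rocketmq_client_consume_fail_msg_tps | Messages that failed to be consumed, per second |
| rocketmq_client_consume_ok_msg_tps | Messages consumed successfully, per second |
| rocketmq_client_consume_rt | Average time to consume one message |
| rocketmq_client_consumer_pull_rt | Average time to pull one message |
| rocketmq_client_consumer_pull_tps | Messages pulled by the client per second |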
1.1 Download the plugin
Apache RocketMQ Prometheus Exporter
1.2 Edit the configuration
vim src/main/resources/application.yml
    rocketmq:
      config:
        webTelemetryPath: /metrics
        rocketmqVersion: 4_8_0
        namesrvAddr: 127.0.0.1:9876
        enableCollect: true
        enableACL: false
        accessKey:
        secretKey:

    task:
      count: 5
      collectTopicOffset:
        cron: 30 0/1 * * * ?
      collectConsumerOffset:
        cron: 30 0/1 * * * ?
      collectBrokerStatsTopic:
        cron: 30 0/1 * * * ?
      collectBrokerStats:
        cron: 30 0/1 * * * ?
      collectBrokerRuntimeStats:
        cron: 30 0/1 * * * ?
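Each `cron` value in this configuration has six fields, which matches the Spring-style layout (second, minute, hour, day of month, month, day of week). Assuming the exporter schedules its collection tasks that way, `30 0/1 * * * ?` fires at second 30 of every minute. The sketch below models only that one pattern and is not a general cron parser; `next_fire_times` is a hypothetical helper, not part of the exporter.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def next_fire_times(now: datetime, count: int = 3, second: int = 30) -> list[datetime]:
    """Return the next `count` instants matching '<second> 0/1 * * * ?'.

    Only this single pattern is modelled: a fixed second, repeated every minute.
    """
    candidate = now.replace(second=second, microsecond=0)
    if candidate <= now:
        # This minute's slot has already passed (or is exactly now): use the next minute.
        candidate += timedelta(minutes=1)
    return [candidate + timedelta(minutes=i) for i in range(count)]
```

For example, `next_fire_times(datetime(2024, 1, 1, 12, 0, 45))` starts at 12:01:30, so under this reading each metric is refreshed once a minute no matter how often Prometheus scrapes.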
1.3 Build
mvn clean install
1.4 Start
nohup /opt/java8/bin/java -jar rocketmq-exporter-0.0.2-SNAPSHOT.jar &
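Once the exporter is running, its `/metrics` endpoint (the `webTelemetryPath` set in 1.2) should expose the metrics listed at the top of this section. A minimal sketch for checking that follows. The function names are hypothetical, and the address is an assumption: port 5557 is taken from the scrape target in 2.1, and the host depends on where the exporter runs.

```python
from __future__ import annotations

import urllib.request


def rocketmq_metric_names(exposition_text: str) -> list[str]:
    """Return the sorted, distinct rocketmq_* metric names in Prometheus text output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like: name{label="v",...} value
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith("rocketmq_"):
            names.add(name)
    return sorted(names)


def fetch_metric_names(url: str, timeout: float = 5.0) -> list[str]:
    """Fetch the exporter endpoint and list the RocketMQ metrics it serves."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return rocketmq_metric_names(resp.read().decode("utf-8"))
```

Calling `fetch_metric_names("http://127.0.0.1:5557/metrics")` on the exporter host should return names such as `rocketmq_producer_tps`. An empty list suggests the exporter cannot reach the name server configured in `namesrvAddr`.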
2 Collect data and monitor
kubectl edit configmap prometheus-server -n ops
2.1 Configure the Prometheus scrape job
    scrape_configs:
      - job_name: rocketmq
        static_configs:
          - targets:
              - 172.16.3.17:5557
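After Prometheus reloads this configuration, the `up` series for the `rocketmq` job shows whether the scrape works. The sketch below queries it through the Prometheus HTTP API (`/api/v1/query`). The helper names are hypothetical, and the Prometheus base URL is not given in this document, so the caller must supply it.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request


def up_query_url(prometheus_base: str, job: str = "rocketmq") -> str:
    """Build the instant-query URL for up{job="<job>"}."""
    query = f'up{{job="{job}"}}'
    return f"{prometheus_base.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def parse_up(response_body: str) -> dict[str, bool]:
    """Map each scraped instance to True (up) or False (down)."""
    doc = json.loads(response_body)
    if doc.get("status") != "success":
        raise ValueError(f"query failed: {doc.get('error', 'unknown error')}")
    return {
        sample["metric"].get("instance", ""): sample["value"][1] == "1"
        for sample in doc["data"]["result"]
    }


def check_targets(prometheus_base: str, job: str = "rocketmq") -> dict[str, bool]:
    """Query Prometheus and report the state of every target in the job."""
    with urllib.request.urlopen(up_query_url(prometheus_base, job), timeout=5) as resp:
        return parse_up(resp.read().decode("utf-8"))
```

With the target above healthy, `check_targets(...)` would be expected to return `{"172.16.3.17:5557": True}`.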
2.2 Configure alerting rules
    rules: |-
      groups:
        - name: rocketmq
          rules:
            - alert: RocketMQ Exporter is Down
              expr: up{job="rocketmq"} == 0
              for: 20s
              labels:
                severity: 'disaster'
              annotations:
                summary: RocketMQ {{ $labels.instance }} is down
            - alert: RocketMQ message backlog
              expr: (sum(irate(rocketmq_producer_offset[1m])) by (topic) - on(topic) group_right sum(irate(rocketmq_consumer_offset[1m])) by (group,topic)) > 5
              for: 5m
              labels:
                severity: 'warning'
              annotations:
                summary: RocketMQ (group={{ $labels.group }} topic={{ $labels.topic }}) backlog = {{ .Value }}
            - alert: GroupGetLatencyByStoretime consumer group consumption delay too high
              expr: rocketmq_group_get_latency_by_storetime/1000 > 5 and rate(rocketmq_group_get_latency_by_storetime[5m]) >0
              for: 3m
              labels:
                severity: warning
              annotations:
                description: 'consumer {{$labels.group}} on {{$labels.broker}}, {{$labels.topic}} consume time lag behind message store time and (behind value is {{$value}}).'
                summary: Consumer group consumption delay is too high
            - alert: RocketMQClusterProduceHigh cluster TPS > 20
              expr: sum(rocketmq_producer_tps) by (cluster) >= 20
              for: 3m
              labels:
                severity: warning
              annotations:
                description: '{{$labels.cluster}} Sending tps too high. now TPS = {{ .Value }}'
                summary: cluster send tps too high
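The backlog rule subtracts each consumer group's consume rate from the produce rate of the same topic (`on(topic) group_right`) and fires when the difference stays above 5 messages per second for 5 minutes. Read this way, it tracks how fast the gap is widening rather than its absolute size. The sketch below mimics the vector matching with plain dictionaries; `backlog_growth` and the sample numbers are illustrative only.

```python
from __future__ import annotations


def backlog_growth(
    produce_rate: dict[str, float],
    consume_rate: dict[tuple[str, str], float],
    threshold: float = 5.0,
) -> dict[tuple[str, str], float]:
    """Mimic: produce by (topic) - on(topic) group_right consume by (group, topic) > threshold.

    produce_rate maps topic -> messages/s; consume_rate maps (group, topic) -> messages/s.
    """
    result = {}
    for (group, topic), consumed in consume_rate.items():
        if topic not in produce_rate:
            continue  # no matching left-hand series, so PromQL returns nothing for it
        diff = produce_rate[topic] - consumed
        if diff > threshold:
            result[(group, topic)] = diff  # result keeps the right-hand labels (group, topic)
    return result
```

With a topic produced at 100 msg/s, a group consuming at 90 msg/s is reported with a value of 10, while a group consuming at 99 msg/s is not.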
2.3 Configure Grafana
Import dashboard template 10477.
3 Configure alert notifications
3.1 Download the plugin
prometheus-webhook-dingtalk
3.2 Configure the DingTalk plugin
cat config.yml
    templates:
      - /opt/prometheus-webhook-dingtalk/template.tmpl
    targets:
      webhook1:
        url: https://oapi.dingtalk.com/robot/send?access_token=ac5f4916af10804b1aeffe9f5f45574a9af8e7cdd8436bcf1dc2448a85116fba
        secret: SEC6f3e3e736f33a8f8692e3f4f9e1c0828ac41fc514c99c5215fd21659bxxxx
        mention:
          mobiles: ['1810133xxxx', '1871712xxxx']
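The `secret` value is used to sign requests to the DingTalk robot. The plugin does this itself, so the sketch below only helps when testing the robot by hand. It follows DingTalk's documented scheme for custom robots: HMAC-SHA256 over `timestamp + "\n" + secret`, keyed with the secret, then Base64 and URL encoding. The function names are hypothetical.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import time
import urllib.parse


def dingtalk_sign(secret: str, timestamp_ms: int) -> str:
    """Return the URL-encoded signature for a DingTalk custom robot request."""
    string_to_sign = f"{timestamp_ms}\n{secret}".encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    return urllib.parse.quote_plus(base64.b64encode(digest))


def signed_url(webhook_url: str, secret: str, timestamp_ms: int | None = None) -> str:
    """Append the timestamp and sign query parameters to a robot webhook URL.

    Assumes webhook_url already carries a query string (access_token=...).
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # DingTalk expects milliseconds
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={dingtalk_sign(secret, timestamp_ms)}"
```

Passing the `url` and `secret` from `config.yml` to `signed_url` yields an address that can be used for a manual POST to the robot.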
cat template.tmpl
    {{ define "ding.link.title" }}{{ template "legacy.title" . }}{{ end }}
    {{ define "ding.link.content" }}
    {{ if gt (len .Alerts.Firing) 0 -}}
    Firing alerts:
    {{ template "__text_alert_list" .Alerts.Firing }}
    {{- end }}
    {{ if gt (len .Alerts.Resolved) 0 -}}
    Resolved alerts:
    {{ template "__text_resolve_list" .Alerts.Resolved }}
    {{- end }}
    {{- end }}
3.3 Start prometheus-webhook-dingtalk
/opt/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --log.level=info > dingding.log 2>&1 &
3.3.1 Check the DingTalk plugin endpoint
3.4 Prometheus alerting configuration
kubectl edit configmap prometheus-alertmanager -n ops
    alertmanager.yml: |-
      global:
        resolve_timeout: 5m
      route:
        receiver: webhook
        group_wait: 30s
        group_interval: 1m
        repeat_interval: 4h
        group_by: [alertname]
        routes:
          - receiver: webhook
            group_wait: 10s
      receivers:
        - name: webhook
          webhook_configs:
            - url: http://172.16.3.1x:8060/dingtalk/webhook1/send
              send_resolved: true
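To test the path from the webhook to DingTalk without waiting for a real alert, you can POST a hand-built notification to the plugin URL configured under `webhook_configs`. The sketch below builds a minimal payload in the shape Alertmanager sends to webhook receivers (version 4). The alert name, labels and helper names are made up for illustration, and the caller supplies the URL because the address in this document is partly masked.

```python
from __future__ import annotations

import json
import urllib.request
from datetime import datetime, timezone


def build_test_payload(alertname: str = "RocketMQTestAlert", receiver: str = "webhook") -> dict:
    """Build a minimal payload shaped like an Alertmanager webhook notification."""
    labels = {"alertname": alertname, "severity": "warning"}
    annotations = {"summary": "Test alert sent by hand"}
    return {
        "version": "4",
        "groupKey": f'{{}}:{{alertname="{alertname}"}}',
        "status": "firing",
        "receiver": receiver,
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "",
        "alerts": [
            {
                "status": "firing",
                "labels": labels,
                "annotations": annotations,
                "startsAt": datetime.now(timezone.utc).isoformat(),
                "endsAt": "0001-01-01T00:00:00Z",  # zero time: alert not resolved yet
                "generatorURL": "",
            }
        ],
    }


def post_payload(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `post_payload(url, build_test_payload())` with the `webhook_configs` URL should produce a message in the DingTalk group if the plugin from 3.3 is running and its `webhook1` target is configured correctly.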