The exporter pod had previously been OOM killed; after the restart its memory usage grew rapidly and the logs reported timeout errors, yet manually hitting the URL showed no problem at all.
1 Consider a problem in the service itself
1.1 Check the events
Because this write-up was done after the fact, the event output was not saved; at the time the events contained an OOMKilled entry, which made us suspect a hardware resource problem.
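If the same problem recurs, the events and the container's last state can be pulled again with kubectl. This is only a sketch, reusing the pod name and namespace that appear later in this post:

```
# The pod's last state records the termination reason; an OOM kill shows up
# under "Last State: Terminated" with "Reason: OOMKilled"
kubectl describe pod test-exporter-57bb88f4b4-92b6c -n kube-system

# Namespace events filtered to this pod (events are only kept for a short time)
kubectl get events -n kube-system \
  --field-selector involvedObject.name=test-exporter-57bb88f4b4-92b6c
```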
1.2 Check the node's system logs
Searching for oom finds the error. The key fields are mems_allowed=0 and constraint=CONSTRAINT_MEMCG: the container exceeded its memory cgroup limit, which is why the pod was killed.
```
dmesg -T |grep oom
[Mon Jun 6 14:15:29 2022] test-exporter-57bb88f4b4-92b6c invoked oom-killer: gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=0, oom_score_adj=999
[Mon Jun 6 14:15:29 2022]  oom_kill_process.cold+0xb/0x10
[Mon Jun 6 14:15:29 2022] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Mon Jun 6 14:15:29 2022] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=2cc569cbfe3a5704262577c4981320162423466ca195a113062c49c633b719c6,mems_allowed=0,oom_memcg=/kubepods/burstable/pod772b413a-011a-4f9d-bd8d-037298a367df,task_memcg=/kubepods/burstable/pod772b413a-011a-4f9d-bd8d-037298a367df/2cc569cbfe3a5704262577c4981320162423466ca195a113062c49c633b719c6,task=test-exporter-57bb88f4b4-92b6c,pid=20620,uid=0
```
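Since CONSTRAINT_MEMCG means the kill came from the container's own memory cgroup limit rather than from the node running out of memory, it is worth confirming what limit the pod actually has. A hedged sketch (the jsonpath assumes the exporter is the first container in the pod):

```
# Print the requests/limits configured on the exporter container
kubectl get pod test-exporter-57bb88f4b4-92b6c -n kube-system \
  -o jsonpath='{.spec.containers[0].resources}'
```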
1.3 Check memory usage
```
kubectl top pod -n kube-system test-exporter-57bb88f4b4-92b6c --use-protocol-buffers
NAME                             CPU(cores)   MEMORY(bytes)
test-exporter-57bb88f4b4-92b6c   79m          3301Mi
```
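A single reading does not show the trend, so a simple loop can record how fast the usage grows after a restart. A rough sketch built from the same command as above:

```
# Append one memory sample every 30 seconds; useful to estimate how long
# the pod takes to approach its limit after a restart
while true; do
  echo "$(date '+%F %T') $(kubectl top pod -n kube-system \
    test-exporter-57bb88f4b4-92b6c --use-protocol-buffers | tail -n 1)"
  sleep 30
done | tee exporter-memory.log
```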
1.4 Test manual access
Manual testing shows no problem at all:
```
curl http://10.1.1.1:9100/metrics -I
HTTP/1.1 200 OK
Content-Length: 35049
Content-Type: text/plain; version=0.0.4; charset=utf-8
Date: Mon, 06 Jun 2022 05:53:13 GMT
```
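Since the logs complain about timeouts while a HEAD request returns 200, it also helps to time a full scrape of /metrics and compare it with the Prometheus scrape_timeout. A sketch against the same endpoint:

```
# Fetch the full metrics payload and report how long it took; if time_total
# approaches the scrape_timeout, Prometheus will abort the scrape
curl -o /dev/null -s \
  -w 'http_code=%{http_code} time_total=%{time_total}s size=%{size_download}\n' \
  http://10.1.1.1:9100/metrics
```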
Checking again shortly afterwards, memory usage has climbed even higher:

```
kubectl top pod -n kube-system test-exporter-57bb88f4b4-92b6c --use-protocol-buffers
NAME                             CPU(cores)   MEMORY(bytes)
test-exporter-57bb88f4b4-92b6c   120m         6378Mi
```
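Because memory keeps climbing between samples, the exporter's own runtime metrics can help narrow down where it goes. The metric names below are the standard Prometheus Go client metrics, which is an assumption about how this exporter is built:

```
# Compare the exporter's self-reported RSS, Go heap and goroutine count over time;
# a steadily rising go_goroutines often points at scrapes piling up behind timeouts
curl -s http://10.1.1.1:9100/metrics | \
  grep -E '^(process_resident_memory_bytes|go_memstats_heap_inuse_bytes|go_goroutines) '
```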