
Troubleshooting K8S Node Problems

Approach to troubleshooting node failures

References:

Check node status

kubectl describe nodes $nodename
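
When several nodes are involved, it can help to list the ones that are not Ready first and then describe each of them. The following is a minimal sketch rather than part of the original steps: the function name `not_ready_nodes` is hypothetical, and it assumes the default column layout of `kubectl get nodes --no-headers` (node name in column 1, status in column 2).

```shell
# Print the names of nodes whose STATUS column does not start with "Ready".
# Reads the output of `kubectl get nodes --no-headers` on stdin.
not_ready_nodes() {
    awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage (requires kubectl and access to the cluster):
#   kubectl get nodes --no-headers | not_ready_nodes |
#       while read -r nodename; do kubectl describe nodes "$nodename"; done
```

A status such as `Ready,SchedulingDisabled` is treated as Ready here, because the node is cordoned but otherwise healthy.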

Check NTP

systemctl status chronyd
systemctl restart chronyd
journalctl -u chronyd
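
Besides checking that chronyd is running, it may be useful to see how far the node clock is from NTP time. The sketch below is an illustration under assumptions: the function name `clock_offset_status` and the 0.5 second threshold are hypothetical, and it assumes the `System time : <seconds> seconds fast|slow of NTP time` line printed by `chronyc tracking`.

```shell
# Print "ok" if the offset reported on the "System time" line is at most
# the given number of seconds, otherwise print "drift".
# Reads the output of `chronyc tracking` on stdin.
clock_offset_status() {
    awk -v max="$1" '
        /^System time/ {
            # Field 4 is the absolute offset in seconds.
            if (($4 + 0) <= (max + 0)) print "ok"; else print "drift"
        }'
}

# Usage (requires chrony to be installed):
#   chronyc tracking | clock_offset_status 0.5
```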

Restart kubelet and docker

systemctl stop kubelet
systemctl stop docker
systemctl stop docker.socket
systemctl stop containerd
systemctl daemon-reload
systemctl start containerd
systemctl start docker
systemctl start kubelet
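
The sequence above stops the services from the top of the stack down and starts them again in the reverse order. It can be wrapped in a small function, sketched below. The function name `restart_node_services` and the `SYSTEMCTL` override variable are hypothetical; the override exists only so that the sequence can be previewed without touching any service.

```shell
# Stop kubelet, docker and containerd, reload unit files, then start the
# services again in the reverse order. Set SYSTEMCTL=echo for a dry run.
restart_node_services() {
    ctl="${SYSTEMCTL:-systemctl}"
    for svc in kubelet docker docker.socket containerd; do
        # A unit may already be stopped; carry on regardless.
        $ctl stop "$svc"
    done
    $ctl daemon-reload
    for svc in containerd docker kubelet; do
        # Abort if a service fails to start, so the failure is noticed.
        $ctl start "$svc" || return 1
    done
}

# Dry run, prints the systemctl arguments instead of running them:
#   SYSTEMCTL=echo restart_node_services
# Real run (as root):
#   restart_node_services
```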

docker fails to start

If docker hangs while starting, check the logs:

systemctl status docker -l
journalctl -ru docker

Error message:

Error (Unable to complete atomic operation, key modified) deleting object [endpoint 622bf1a499580702606742e5f5554ac99e7c0d61abcd5d9063881fc2da33d16f afdce62ce70de2cbe5a971b05521280940947e4968c163e48c3e5252919a4fae], retrying....
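
To confirm that a node is hitting this particular error rather than some other startup failure, the docker journal can be searched for the message text. A minimal sketch, assuming the message wording shown above; the function name `has_key_modified_error` is hypothetical.

```shell
# Succeed (exit status 0) if the log text on stdin contains the
# "key modified" error shown above, fail otherwise.
has_key_modified_error() {
    grep -q 'Unable to complete atomic operation, key modified'
}

# Usage:
#   journalctl -u docker --no-pager | has_key_modified_error && echo "error found"
```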

Solution:

ps -ef | grep docker
kill -9 xxx   # xxx is the PID of the leftover docker process listed above
systemctl stop containerd
systemctl start docker
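
Picking the PID out of the `ps -ef` listing by hand is error-prone, since the `grep docker` process itself also shows up in it. The sketch below extracts only the PIDs of `dockerd` processes. It is an illustration under assumptions: the function name `dockerd_pids` is hypothetical, and it assumes the usual `ps -ef` layout (PID in column 2, command in column 8) and that the leftover process is `dockerd`.

```shell
# Print the PIDs of processes whose command is dockerd (with or without
# a leading path). Reads the output of `ps -ef` on stdin.
dockerd_pids() {
    awk '$8 ~ "(^|/)dockerd$" { print $2 }'
}

# Usage:
#   ps -ef | dockerd_pids
#   ps -ef | dockerd_pids | xargs -r kill -9   # -r (GNU xargs): skip if empty
```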

Downgrade docker

docker version
yum list docker-ce --showduplicates | sort -r
systemctl stop kubelet
systemctl stop docker
systemctl stop docker.socket
systemctl stop containerd
version=19.03.15
yum downgrade --setopt=obsoletes=0 -y docker-ce-${version} docker-ce-cli-${version} docker-ce-selinux-${version} containerd.io
systemctl start containerd
systemctl start docker
systemctl start kubelet
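
Before stopping the services, it may be worth checking whether the installed version is actually newer than the target. A minimal sketch, assuming GNU `sort` with version ordering (`-V`) is available; the function name `needs_downgrade` is hypothetical.

```shell
# Succeed if the first version is strictly newer than the second.
#   $1: currently installed version, $2: target version
needs_downgrade() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Usage:
#   current=$(docker version --format '{{.Server.Version}}')
#   needs_downgrade "$current" 19.03.15 && echo "downgrade needed"
```

Note that `docker version` can only report the server version while the daemon is running, so the check has to happen before the stop commands.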

PLEG is not healthy

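The Pod Lifecycle Event Generator (PLEG) records the events in a Pod's lifecycle, such as containers starting and terminating. The "PLEG is not healthy" error is usually caused by an abnormal container runtime process on the node or by a defect in the node's systemd version.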
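
Given those two usual causes, a first check could be how often the kubelet has logged the message and which systemd version the node runs. The helpers below are a sketch under assumptions: the function names are hypothetical, and the first line of `systemctl --version` is assumed to have the form `systemd <version>`.

```shell
# Count the log lines on stdin that mention "PLEG is not healthy".
count_pleg_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -c 'PLEG is not healthy' || true
}

# Print the version number from the first line of `systemctl --version`.
systemd_version() {
    awk 'NR == 1 { print $2 }'
}

# Usage:
#   journalctl -u kubelet --no-pager | count_pleg_errors
#   systemctl --version | systemd_version
#   systemctl status docker containerd   # check the runtime processes
```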

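  • Author: 好好学习的郝
  • Article link: https://www.voidking.com/dev-k8s-node-problem/
  • Copyright notice: Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting! The source site updates its content and corrects errors promptly, and also offers a better reading experience. Feel free to share and bookmark.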