The k8s version in production was fairly old; a colleague had deployed it and it had never been upgraded since. After discussion we decided to upgrade to 1.9.11, but after simply swapping in the new kubelet binary, the containers on the node were restarted outright.
Who restarted the containers?
The kubelet log contains the restart records, and even gives the reason: container spec hash changed
I0114 17:57:42.715551 12945 kuberuntime_manager.go:550] Container "prometheus-node-exporter" ({"docker" "f59c4812a66d65572020efab38780c1271d671330b126642653390dc8b8d29f1"})
of pod prometheus-node-exporter-l7vzz_monitoring(4ec492d2-17de-11e9-9206-52540064c479):
Container spec hash changed (1559107639 vs 1428860573).. Container will be killed and recreated.
Did some ID change? The pod directory on the node, /var/lib/kubelet/pods/$podId, kept the same ID, but under /sys/fs/cgroup/cpu/kubepods/besteffort/pod$podId the container IDs were different: apart from pause, the other container had a new ID (because it had already been restarted).
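A quick way to double-check this observation is a small Go sketch (my own addition, not part of the original investigation) that lists the per-container cgroup directories for the pod. The paths assume cgroup v1 and a BestEffort pod, as above, and the podUID is taken from the log line:

// Minimal sketch: confirm the pod directory survives while the per-container
// cgroup directories (named after container IDs) change after the restart.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Pod UID from the kubelet log line above; adjust for your own node.
	podUID := "4ec492d2-17de-11e9-9206-52540064c479"

	// The kubelet-side pod directory is keyed by pod UID, so it does not change.
	podDir := filepath.Join("/var/lib/kubelet/pods", podUID)
	if _, err := os.Stat(podDir); err == nil {
		fmt.Println("pod dir still present:", podDir)
	}

	// Each child directory here is named after a container ID; compare the
	// listing before and after the kubelet upgrade to see which containers
	// were recreated (pause keeps its ID, the app container does not).
	cgroupDir := filepath.Join("/sys/fs/cgroup/cpu/kubepods/besteffort", "pod"+podUID)
	entries, err := os.ReadDir(cgroupDir)
	if err != nil {
		fmt.Println("read cgroup dir:", err)
		return
	}
	for _, e := range entries {
		if e.IsDir() {
			fmt.Println("container cgroup:", e.Name())
		}
	}
}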
Looking at kuberuntime_manager.go:550 in the corresponding version of the source:
glog.V(2).Infof("Container %q (%q) of pod %s: %s", container.Name, containerStatus.ID, format.Pod(pod), message)
From the surrounding context, the recreation is triggered by containerChanged(&container, containerStatus) at kuberuntime_manager.go:522 reporting that the container has changed:
	// The container is running, but kill the container if any of the following condition is met.
	reason := ""
	restart := shouldRestartOnFailure(pod)
	if expectedHash, actualHash, changed := containerChanged(&container, containerStatus); changed {
		reason = fmt.Sprintf("Container spec hash changed (%d vs %d).", actualHash, expectedHash)
		// Restart regardless of the restart policy because the container
		// spec changed.
		restart = true
	} else if liveness, found := m.livenessManager.Get(containerStatus.ID); found && liveness == proberesults.Failure {
		// If the container failed the liveness probe, we should kill it.
		reason = "Container failed liveness probe."
	} else {
		// Keep the container.
		keepCount += 1
		continue
	}
	// We need to kill the container, but if we also want to restart the
	// container afterwards, make the intent clear in the message. Also do
	// not kill the entire pod since we expect container to be running eventually.
	message := reason
	if restart {
		message = fmt.Sprintf("%s. Container will be killed and recreated.", message)
		changes.ContainersToStart = append(changes.ContainersToStart, idx)
	}
	changes.ContainersToKill[containerStatus.ID] = containerToKillInfo{
		name:      containerStatus.Name,
		container: &pod.Spec.Containers[idx],
		message:   message,
	}
	glog.V(2).Infof("Container %q (%q) of pod %s: %s", container.Name, containerStatus.ID, format.Pod(pod), message)
}
In other words, the trigger is containerStatus.Hash != expectedHash. Kubelet 1.7 performs exactly the same check:
	expectedHash := kubecontainer.HashContainer(&container)
	containerChanged := containerStatus.Hash != expectedHash
	if containerChanged {
		message := fmt.Sprintf("Pod %q container %q hash changed (%d vs %d), it will be killed and re-created.",
			pod.Name, container.Name, containerStatus.Hash, expectedHash)
Moreover, the hash algorithm itself has not changed, but the Container struct has: it changed from 1.7 to 1.8, and again from 1.8 to 1.9. So even upgrading one minor version at a time cannot avoid this kind of restart.
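To see why a struct change alone is enough, recall that kubecontainer.HashContainer feeds a deep dump of the entire Container struct into an FNV-32a hash (via the spew-based DeepHashObject helper). The stand-alone sketch below uses simplified stand-in structs and fmt's %+v instead of spew, but shows the same effect: a field added in a newer release changes the hash even when it is left empty.

// Illustration only, not the real kubelet code: hashing a textual dump of all
// fields means any change to the struct definition changes the hash.
package main

import (
	"fmt"
	"hash/fnv"
)

// Stand-in for the 1.7-era container spec (not the real v1.Container).
type containerV17 struct {
	Name  string
	Image string
}

// Stand-in for the 1.9-era spec: same data, plus a field that 1.7 did not have.
type containerV19 struct {
	Name          string
	Image         string
	VolumeDevices []string // newer field, left at its zero value
}

// hashOf mimics the shape of HashContainer: hash a dump of every field.
func hashOf(obj interface{}) uint32 {
	h := fnv.New32a()
	fmt.Fprintf(h, "%+v", obj) // stand-in for the spew-based DeepHashObject
	return h.Sum32()
}

func main() {
	old := containerV17{Name: "prometheus-node-exporter", Image: "prom/node-exporter"}
	cur := containerV19{Name: "prometheus-node-exporter", Image: "prom/node-exporter"}

	// The dump of cur contains "VolumeDevices:[]" even though nothing was set,
	// so the two hashes differ and the kubelet sees a "spec hash changed".
	fmt.Printf("1.7-style hash: %d\n", hashOf(old))
	fmt.Printf("1.9-style hash: %d\n", hashOf(cur))
}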
Solutions
- When computing the hash, strip out the fields newly added in 1.9 so the result stays consistent with 1.7 (see the sketch after this list). Drawback: every later upgrade still has to deal with the hash problem.
- Accept the restarts.
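A rough sketch of the first option, assuming the change lives next to HashContainer in pkg/kubelet/container and reuses the same DeepHashObject helper. VolumeDevices is only an example of a newer field; a real patch would have to clear every field added to v1.Container since 1.7.

package kubecontainer

import (
	"hash/fnv"

	"k8s.io/api/core/v1"
	hashutil "k8s.io/kubernetes/pkg/util/hash"
)

// hashContainerLegacy hashes a copy of the container with fields that did not
// exist in the 1.7 API cleared, so the value matches the hash a 1.7 kubelet
// recorded on the running container. Illustrative sketch only.
func hashContainerLegacy(container *v1.Container) uint64 {
	hash := fnv.New32a()
	legacy := *container
	legacy.VolumeDevices = nil // clear fields added after 1.7 before hashing
	hashutil.DeepHashObject(hash, legacy)
	return uint64(hash.Sum32())
}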
References: https://github.com/kubernetes/kubernetes/issues/53644. My colleague 佶澳's blog records the whole process in more detail: https://www.lijiaocn.com/%E9%97%AE%E9%A2%98/2019/01/14/kubelet-updates-container-restart.html