A problem encountered when upgrading k8s from 1.7 to 1.9

The k8s version in production was quite old; it had never been upgraded since a colleague originally deployed it. After some discussion we decided to upgrade to 1.9.11, but after directly replacing the kubelet binary, the containers on the node were immediately restarted.

Who restarted them?

The kubelet log turned out to contain records of the restarts, and even the reason: container spec hash changed.

I0114 17:57:42.715551   12945 kuberuntime_manager.go:550] Container "prometheus-node-exporter" ({"docker" "f59c4812a66d65572020efab38780c1271d671330b126642653390dc8b8d29f1"})
                                       of pod prometheus-node-exporter-l7vzz_monitoring(4ec492d2-17de-11e9-9206-52540064c479):
                                       Container spec hash changed (1559107639 vs 1428860573).. Container will be killed and recreated.

Did the id change? The pod's directory on the node, /var/lib/kubelet/pods/$podId, still had the same id, but under /sys/fs/cgroup/cpu/kubepods/besteffort/pod$podId only the pause container's id was unchanged; the other container's id had changed (because it had been restarted).
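
Listing the sub-directories of the pod's cgroup is a quick way to see which container ids are currently present. Below is a minimal sketch, assuming the cgroupfs cgroup driver and the besteffort QoS path shown above; the pod UID is passed on the command line:

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Lists the container ids under a pod's cpu cgroup. Assumes the cgroupfs
// cgroup driver and a besteffort pod, matching the path checked above.
// Usage: go run main.go 4ec492d2-17de-11e9-9206-52540064c479
func main() {
	podUID := os.Args[1]
	dir := filepath.Join("/sys/fs/cgroup/cpu/kubepods/besteffort", "pod"+podUID)

	entries, err := os.ReadDir(dir)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		if e.IsDir() {
			// Each sub-directory is named after a container id (pause included).
			fmt.Println(e.Name())
		}
	}
}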

Going into the code of the corresponding version at kuberuntime_manager.go:550:

glog.V(2).Infof("Container %q (%q) of pod %s: %s", container.Name, containerStatus.ID, format.Pod(pod), message)

Looking at the surrounding context, the recreation is triggered by containerChanged(&container, containerStatus) at kuberuntime_manager.go:522 reporting that the container has changed:

		// The container is running, but kill the container if any of the following condition is met.
		reason := ""
		restart := shouldRestartOnFailure(pod)
		if expectedHash, actualHash, changed := containerChanged(&container, containerStatus); changed {
			reason = fmt.Sprintf("Container spec hash changed (%d vs %d).", actualHash, expectedHash)
			// Restart regardless of the restart policy because the container
			// spec changed.
			restart = true
		} else if liveness, found := m.livenessManager.Get(containerStatus.ID); found && liveness == proberesults.Failure {
			// If the container failed the liveness probe, we should kill it.
			reason = "Container failed liveness probe."
		} else {
			// Keep the container.
			keepCount += 1
			continue
		}
		// We need to kill the container, but if we also want to restart the
		// container afterwards, make the intent clear in the message. Also do
		// not kill the entire pod since we expect container to be running eventually.
		message := reason
		if restart {
			message = fmt.Sprintf("%s. Container will be killed and recreated.", message)
			changes.ContainersToStart = append(changes.ContainersToStart, idx)
		}

		changes.ContainersToKill[containerStatus.ID] = containerToKillInfo{
			name:      containerStatus.Name,
			container: &pod.Spec.Containers[idx],
			message:   message,
		}
		glog.V(2).Infof("Container %q (%q) of pod %s: %s", container.Name, containerStatus.ID, format.Pod(pod), message)
	}

That is, containerStatus.Hash != expectedHash. 1.7 performs the same check:

expectedHash := kubecontainer.HashContainer(&container)
containerChanged := containerStatus.Hash != expectedHash
if containerChanged {
	message := fmt.Sprintf("Pod %q container %q hash changed (%d vs %d), it will be killed and re-created.",
		pod.Name, container.Name, containerStatus.Hash, expectedHash)
	...
}

Moreover, the hash calculation itself has not changed; what changed is the Container struct, which changed from 1.7 to 1.8 and again from 1.8 to 1.9. So even upgrading one minor version at a time cannot avoid this kind of restart.
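
Why a struct change alone is enough: HashContainer hashes the entire v1.Container. In this code base it is roughly an FNV-32a hash over a spew dump of the whole struct (kubecontainer.HashContainer calling hashutil.DeepHashObject), so a field added with its zero value still changes the output even though the pod spec on the apiserver never changed. The toy structs and values below are made up purely to demonstrate the effect and are not the kubelet's actual code:

package main

import (
	"fmt"
	"hash/fnv"

	"github.com/davecgh/go-spew/spew"
)

// Toy stand-ins for v1.Container as compiled into a 1.7 and a 1.9 kubelet:
// the newer definition only adds a field that the pod spec never sets.
type containerV17 struct {
	Name  string
	Image string
}

type containerV19 struct {
	Name     string
	Image    string
	NewField string // field added in a newer release, left at its zero value
}

// hashSpec mirrors the kubelet's HashContainer/DeepHashObject approach:
// spew-dump the whole struct and feed the dump into an FNV-32a hasher.
func hashSpec(spec interface{}) uint64 {
	hasher := fnv.New32a()
	printer := spew.ConfigState{
		Indent:         " ",
		SortKeys:       true,
		DisableMethods: true,
		SpewKeys:       true,
	}
	printer.Fprintf(hasher, "%#v", spec)
	return uint64(hasher.Sum32())
}

func main() {
	oldSpec := containerV17{Name: "prometheus-node-exporter", Image: "prom/node-exporter"}
	newSpec := containerV19{Name: "prometheus-node-exporter", Image: "prom/node-exporter"}

	// Same user-visible fields, different struct definitions -> different hashes,
	// the same kind of mismatch as "1559107639 vs 1428860573" in the log above.
	fmt.Println(hashSpec(oldSpec), hashSpec(newSpec))
}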

Solutions

  • When computing the hash, strip out the fields newly added in 1.9 so the result stays consistent with 1.7 (a patched kubelet; see the sketch after this list). Downside: every subsequent upgrade still has to deal with the same hash problem.
  • Accept the restarts.
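
A sketch of the first option, i.e. a patched HashContainer in pkg/kubelet/container that clears fields 1.7 did not know about before hashing. The field cleared here (VolumeDevices) is only an example; the real list would have to come from diffing v1.Container between the two releases:

package container // pkg/kubelet/container, where HashContainer lives

import (
	"hash/fnv"

	"k8s.io/api/core/v1"
	hashutil "k8s.io/kubernetes/pkg/util/hash"
)

// hashContainerCompat sketches the first option: zero out fields that 1.7
// did not have on a copy of the container, then hash, so a 1.9 kubelet gets
// the same value a 1.7 kubelet computed for an unchanged pod spec.
func hashContainerCompat(container *v1.Container) uint64 {
	c := container.DeepCopy()
	c.VolumeDevices = nil // example: not present in 1.7, drop it from the hash input

	hash := fnv.New32a()
	hashutil.DeepHashObject(hash, *c)
	return uint64(hash.Sum32())
}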

Reference: https://github.com/kubernetes/kubernetes/issues/53644. My colleague 佶澳's blog records the whole process in more detail: https://www.lijiaocn.com/%E9%97%AE%E9%A2%98/2019/01/14/kubelet-updates-container-restart.html