/var/log/messages
=================

 - Various RPC errors of the form:
    ... rpc error: code = # desc = xxx ...

 - PLEG is not healthy: pleg was last seen active 3m0.448988393s ago; threshold is 3m0s
   This is severe and indicates a communication problem (or at least high latency) with the docker
   daemon. As a result the node can be temporarily marked NotReady, causing eviction of all
   resident pods.

 - container kill failed because of 'container not found' or 'no such process':
    Cannot kill container ###: rpc error: code = 2 desc = no such process"
   Despite the error, the containers are actually killed and the pods destroyed. However, this
   error likely triggers the problem with rogue interfaces staying on the OpenVSwitch bridge
   (see /var/log/openvswitch/ovs-vswitchd.log below).

 - RunPodSandbox from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "kdb-server-testing-180-build_katrin" network: CNI request failed with status 400: 'failed to run IPAM for 4b56e403e2757d38dca67831ce09e10bc3b3f442b6699c20dcd89556763e2d5d: failed to run CNI IPAM ADD: no IP addresses available in network: openshift-sdn
   CreatePodSandbox for pod "kdb-server-testing-180-build_katrin(65640902-3bd6-11ea-bbd6-0cc47adef0e6)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "kdb-server-testing-180-build_katrin" network: CNI request failed with status 400: 'failed to run IPAM for 4b56e403e2757d38dca67831ce09e10bc3b3f442b6699c20dcd89556763e2d5d: failed to run CNI IPAM ADD: no IP addresses available in network: openshift-sdn
   Indicates exhaustion of the IP range of the pod network on the node. This also seems to be
   triggered by problems with resource management, and periodic manual clean-up is required (see
   the sketch below).
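   A minimal clean-up sketch for releasing stale IPAM leases. It assumes the host-local IPAM store
   of openshift-sdn lives under /var/lib/cni/networks/openshift-sdn and that each file is named
   after an allocated IP and holds the owning container ID; verify both on your release before
   deleting anything:

      # walk the assumed host-local IPAM store and drop leases whose container is gone
      cd /var/lib/cni/networks/openshift-sdn || exit 1
      for ip in $(ls | grep -E '^[0-9]+\.'); do
          id=$(head -n 1 "$ip")                        # container that holds this lease
          if ! docker inspect "$id" >/dev/null 2>&1; then
              echo "releasing stale IP $ip (container $id is gone)"
              rm -f "$ip"
          fi
      done

   If pods still fail to obtain addresses afterwards, restarting the node service (the unit name
   depends on the installation) is the usual next step.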
 - containerd: unable to save f7c3e6c02cdbb951670bc7ff925ddd7efd75a3bb5ed60669d4b182e5337dec23:d5b9394468235f7c9caca8ad4d97e7064cc49cd59cadd155eceae84545dc472a starttime: read /proc/81994/stat: no such process
   containerd: f7c3e6c02cdbb951670bc7ff925ddd7efd75a3bb5ed60669d4b182e5337dec23:d5b9394468235f7c9caca8ad4d97e7064cc49cd59cadd155eceae84545dc472a (pid 81994) has become an orphan, killing it
   Seems to be a bug in docker 1.12.x which is resolved in 1.13.0-rc2. No side effects according
   to the issue.
   https://github.com/moby/moby/issues/28336

 - W0625 03:49:34.231471 36511 docker_sandbox.go:337] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "...": Unexpected command output nsenter: cannot open /proc/63586/ns/net: No such file or directory
 - W0630 21:40:20.978177 5552 docker_sandbox.go:337] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "...": CNI failed to retrieve network namespace path: Cannot find network namespace for the terminated container "..."
   Probably covered by the following bug report and accordingly can be ignored.
   https://bugzilla.redhat.com/show_bug.cgi?id=1434950

 - E0630 14:05:40.304042 5552 glusterfs.go:148] glusterfs: failed to get endpoints adei-cfg[an empty namespace may not be set when a resource name is provided]
   E0630 14:05:40.304062 5552 reconciler.go:367] Could not construct volume information: MountVolume.NewMounter failed for volume "kubernetes.io/glusterfs/4
   Most likely a configuration issue: the GlusterFS endpoints are referenced without a namespace.
   Probably can be ignored.

 - kernel: SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
   There are no adverse effects to this. It is a potential kernel issue, but it can simply be
   ignored. Nothing is going to break.
   https://bugzilla.redhat.com/show_bug.cgi?id=1425278

 - E0625 03:59:52.438970 23953 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
   Seems fine and can be ignored.

 - E0926 09:29:50.744454 93115 mount_linux.go:172] Mount failed: exit status 1
   Output: Failed to start transient scope unit: Connection timed out
   Seems to be caused by too many parallel mounts (about 500 per node), which may make systemd
   hang. Details:
   https://github.com/kubernetes/kubernetes/issues/79194
   * Suggested workaround: use 'setsid' to mount volumes instead of 'systemd-run'.

/var/log/openvswitch/ovs-vswitchd.log
=====================================

 - bridge|WARN|could not open network device veth7d33a20f (No such device)
   Indicates a pod clean-up failure and may cause problems during pod scheduling; a sketch for
   removing such stale ports follows below.
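   A minimal sketch for dropping such rogue ports from the SDN bridge. It assumes the OpenShift SDN
   bridge is named br0 (check with 'ovs-vsctl show') and only touches veth ports, leaving tun0 and
   the vxlan port alone:

      # remove every veth port whose network device no longer exists on the host
      for port in $(ovs-vsctl list-ports br0 | grep '^veth'); do
          if ! ip link show dev "$port" >/dev/null 2>&1; then
              echo "removing stale OVS port $port"
              ovs-vsctl del-port br0 "$port"
          fi
      done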