Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
dockershim: fine-tune network-ready handling on sandbox teardown and removal
If sandbox teardown results in an error, GC will periodically attempt
to again remove the sandbox. Until the sandbox is removed, pod
sandbox status calls will attempt to enter the pod's namespace and
retrieve the pod IP, but the first teardown attempt may have already
removed the network namespace, resulting in a pointless log error
message that the network namespace doesn't exist, or that nsenter
can't find eth0.
The network-ready mechanism originally attempted to suppress those
messages by ensuring that pod sandbox status skipped network checks
when networking was already torn down, but unfortunately the ready
value was cleared too early.
Also, don't tear down the pod network multiple times if the first
time we tore it down, it succeeded.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
If sandbox teardown results in an error, GC will periodically attempt
to again remove the sandbox. Until the sandbox is removed, pod
sandbox status calls will attempt to enter the pod's namespace and
retrieve the pod IP, but the first teardown attempt may have already
removed the network namespace, resulting in a pointless log error
message that the network namespace doesn't exist, or that nsenter
can't find eth0.
The network-ready mechanism originally attempted to suppress those
messages by ensuring that pod sandbox status skipped network checks
when networking was already torn down, but unfortunately the ready
value was cleared too early.
Also, don't tear down the pod network multiple times if the first
time we tore it down, it succeeded.
To ensure kubelet doesn't attempt network teardown on HostNetwork
containers that no longer exist but are still checkpointed, make
sure we preserve the HostNetwork property in checkpoints. If
the checkpoint indicates the container was a HostNetwork one,
don't tear down the network since that would fail anyway.
Related: https://github.com/kubernetes/kubernetes/issues/44307#issuecomment-299548609
GenericPLEG's 1s relist() loop races against pod network setup. It
may be called after the infra container has started but before
network setup is done, since PLEG and the runtime's SyncPod() run
in different goroutines.
Track network setup status and don't bother trying to read the pod's
IP address if networking is not yet ready.
See also: https://bugzilla.redhat.com/show_bug.cgi?id=1434950
Mar 22 12:18:17 ip-172-31-43-89 atomic-openshift-node: E0322
12:18:17.651013 25624 docker_manager.go:378] NetworkPlugin
cni failed on the status hook for pod 'pausepods22' - Unexpected
command output Device "eth0" does not exist.
GenericPLEG's 1s relist() loop races against pod network setup. It
may be called after the infra container has started but before
network setup is done, since PLEG and the runtime's SyncPod() run
in different goroutines.
Track network setup status and don't bother trying to read the pod's
IP address if networking is not yet ready.
See also: https://bugzilla.redhat.com/show_bug.cgi?id=1434950
Mar 22 12:18:17 ip-172-31-43-89 atomic-openshift-node: E0322
12:18:17.651013 25624 docker_manager.go:378] NetworkPlugin
cni failed on the status hook for pod 'pausepods22' - Unexpected
command output Device "eth0" does not exist.
Automatic merge from submit-queue (batch tested with PRs 44326, 45768)
[CRI] Forcibly remove container
Forcibly remove the running containers in `RemoveContainer`. Since we should forcibly remove the running containers in `RemovePodSandbox`. See [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/api/v1alpha1/runtime/api.proto#L35).
cc @feiskyer @Random-Liu
Signed-off-by: Xianglin Gao <xlgao@zju.edu.cn>
Automatic merge from submit-queue (batch tested with PRs 42025, 44169, 43940)
[CRI] Remove all containers in the sandbox
Remove all containers in the sandbox, when we remove the sandbox.
/cc @feiskyer @Random-Liu
Signed-off-by: Xianglin Gao <xlgao@zju.edu.cn>
PR #29378 introduces ClusterFirstWithHostNet, but docker doesn't support
setting dns options togather with hostnetwork. This commit rewrites
resolv.conf same as dockertools.
In ListPodSandbox(), we
1. List all sandbox docker containers
2. List all sandbox checkpoints. If the checkpoint does not have a
corresponding container in (1), we return partial result based on
the checkpoint.
The problem is that new PodSandboxes can be created between step (1) and
(2). In those cases, we will see the checkpoints, but not the sandbox
containers. This leads to strange behavior because the partial result
from the checkpoint does not include some critical information. For
example, the creation timestamp'd be zero, and that would cause kubelet's
garbage collector to immediately remove the sandbox.
This change fixes that by getting the list of checkpoints before listing
all the containers (since in RunPodSandbox we create them in the reverse
order).
Automatic merge from submit-queue (batch tested with PRs 41980, 42192, 42223, 41822, 42048)
CRI: Make dockershim better implements CRI.
When thinking about CRI Validation test, I found that `PodSandboxStatus.Linux.Namespaces.Options.HostPid` and `PodSandboxStatus.Linux.Namespaces.Options.HostIpc` are not populated. Although they are not used by kuberuntime now, we should populate them to conform to CRI.
/cc @yujuhong @feiskyer
Automatic merge from submit-queue (batch tested with PRs 38212, 38792, 39641, 36390, 39005)
Set MemorySwap to zero on Windows
Fixes https://github.com/kubernetes/kubernetes/issues/39003
@dchen1107 @michmike @kubernetes/sig-node-misc
We have observed that, after failing to create a container due to "device or
resource busy", docker may end up having inconsistent internal state. One
symptom is that docker will not report the existence of the "failed to create"
container, but if kubelet tries to create a new container with the same name,
docker will error out with a naming conflict message.
To work around this, this commit parses the creation error message and if there
is a naming conflict, it would attempt to remove the existing container.
Automatic merge from submit-queue
CRI: Handle empty container name in dockershim.
Fixes https://github.com/kubernetes/kubernetes/issues/35924.
Dead container may have no name, we should handle this properly.
@yujuhong @bprashanth
The enum constants are not namespaced. The shorter, unspecifc names are likely
to cause naming conflicts in the future.
Also replace "SandBox" with "Sandbox" in the API.
Automatic merge from submit-queue
Add sysctls for dockershim
This PR adds sysctls support for dockershim. All sysctls e2e tests are passed in my local settings.
Note that sysctls runtimeAdmit is not included in this PR, it is addressed in #32803.
cc/ @yujuhong @Random-Liu
Automatic merge from submit-queue
CRI: Implement temporary ImageStats in kuberuntime_manager
For #33048 and #33189.
This PR:
1) Implement a temporary `ImageStats` in kuberuntime_manager.go
2) Add container name label on infra container to make the current summary api logic work with dockershim.
I run the summary api test locally and it passed for me. Notice that the original summary api test is not showing up on CRI testgrid because it was removed yesterday. It will be added back in https://github.com/kubernetes/kubernetes/pull/33779.
@yujuhong @feiskyer
Automatic merge from submit-queue
Add seccomp and apparmor support.
This PR adds seccomp and apparmor support in new CRI.
This a WIP because I'm still adding unit test for some of the functions. Sent this PR here for design discussion.
This PR is similar with https://github.com/kubernetes/kubernetes/pull/33450.
The differences are:
* This PR passes seccomp and apparmor configuration via annotations;
* This PR keeps the seccomp handling logic in docker shim because current seccomp implementation is very docker specific, and @timstclair told me that even the json seccomp profile file is defined by docker.
Notice that this PR almost passes related annotations in `api.Pod` to the runtime directly instead of introducing new CRI annotation.
@yujuhong @feiskyer @timstclair
Add a new docker integration with kubelet using the new runtime API.
This change adds the package with some skeletons, and implements some
of the basic operations.