When using a legacy cloud provider, if kubelet is passed a node address
via --node-ip it will use this address in preference to the addresses
returned by the cloud provider.
When using an external cloud provider, kubelet will annotate the Node
with the first --node-ip for use by the cloud provider. The cloud
provider validates this annotation but does not otherwise use it,
meaning that --node-ip has no effect.
This change moves the node address filtering code from kubelet to
component-helpers and updates both kubelet and cloud-provider to use it.
There is no functional change to kubelet, but cloud-provider now honours
kubelet's --node-ip.
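For illustration, a minimal sketch of what the shared filtering helper could look like (the package and function names here are illustrative, not the actual component-helpers API):

    package nodeaddress

    import (
        "net"

        v1 "k8s.io/api/core/v1"
    )

    // filterByNodeIP keeps only the addresses matching nodeIP (plus any
    // Hostname addresses), so a --node-ip override wins over the full list
    // reported by the cloud provider.
    func filterByNodeIP(addrs []v1.NodeAddress, nodeIP net.IP) []v1.NodeAddress {
        if nodeIP == nil {
            return addrs
        }
        out := make([]v1.NodeAddress, 0, len(addrs))
        for _, a := range addrs {
            if a.Type == v1.NodeHostName || nodeIP.Equal(net.ParseIP(a.Address)) {
                out = append(out, a)
            }
        }
        return out
    }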
Create an E2E test that creates a job that spawns a pod which should
succeed. The job reserves a fixed amount of CPU and has a large number
of completions and high parallelism. This is used to reproduce
github.com/kubernetes/kubernetes/issues/106884
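A rough sketch of the kind of Job the test creates; the exact values (name, parallelism, completions, CPU request, image) are illustrative rather than the ones used in the real test:

    package e2e

    import (
        batchv1 "k8s.io/api/batch/v1"
        v1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/resource"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/utils/pointer"
    )

    // largeSucceedingJob builds a Job whose pods all exit successfully while
    // reserving a fixed slice of CPU, with many completions run in parallel.
    func largeSucceedingJob() *batchv1.Job {
        return &batchv1.Job{
            ObjectMeta: metav1.ObjectMeta{Name: "repro-106884"},
            Spec: batchv1.JobSpec{
                Parallelism: pointer.Int32(50),
                Completions: pointer.Int32(200),
                Template: v1.PodTemplateSpec{
                    Spec: v1.PodSpec{
                        RestartPolicy: v1.RestartPolicyNever,
                        Containers: []v1.Container{{
                            Name:    "work",
                            Image:   "busybox",
                            Command: []string{"sh", "-c", "exit 0"},
                            Resources: v1.ResourceRequirements{
                                Requests: v1.ResourceList{
                                    v1.ResourceCPU: resource.MustParse("500m"),
                                },
                            },
                        }},
                    },
                },
            },
        }
    }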
Signed-off-by: David Porter <david@porter.me>
Other components must know when the Kubelet has released critical
resources for terminal pods. Do not set the phase in the apiserver
to terminal until all containers are stopped and cannot restart.
As a consequence of this change, the Kubelet must explicitly transition
a terminal pod to the terminating state in the pod worker, which is
handled by returning a new isTerminal boolean from syncPod.
Finally, if a pod with init containers hasn't been initialized yet,
don't default container statuses or not-yet-attempted init containers
to the unknown failure state.
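A toy sketch of the shape of the syncPod change (the real kubelet types and signature differ): the extra isTerminal return tells the pod worker that every container has stopped and cannot restart, so it can explicitly begin terminating the pod.

    package main

    import "fmt"

    type containerStatus struct {
        running    bool
        canRestart bool
    }

    // syncPod reports isTerminal=true only once no container is running or
    // able to restart; the pod worker then moves the pod into termination.
    func syncPod(statuses []containerStatus) (isTerminal bool, err error) {
        for _, s := range statuses {
            if s.running || s.canRestart {
                return false, nil
            }
        }
        return true, nil
    }

    func main() {
        term, _ := syncPod([]containerStatus{{running: false, canRestart: false}})
        fmt.Println("isTerminal:", term)
    }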
Previously, callers of `Exists()` could not tell why the cgroup did or
did not exist. At one call-site in particular, the `kubelet` would fail
to start entirely if the cgroup validation did not succeed. In these
cases we MUST explain what went wrong and pass that information clearly
to the caller. Previously, some but not all of the reasons for
invalidation were logged at a low log level instead. This led to poor
UX.
The original method was retained on the interface so as to make this
diff small.
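As an illustration only (the method name and types are stand-ins, not the actual kubelet interface), the idea is to keep the boolean Exists() for existing callers while exposing an error-returning variant that says exactly why validation failed:

    package cm

    import "fmt"

    type cgroupManager struct {
        paths map[string]string // controller -> cgroup path (simplified)
    }

    // Validate returns an error describing the first missing controller path
    // so the caller can report the reason instead of a bare "does not exist".
    func (m *cgroupManager) Validate(name string) error {
        for controller, path := range m.paths {
            if path == "" {
                return fmt.Errorf("cgroup %q: no path found for controller %q", name, controller)
            }
        }
        return nil
    }

    // Exists is retained on the interface and simply wraps Validate.
    func (m *cgroupManager) Exists(name string) bool {
        return m.Validate(name) == nil
    }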
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Instead of doing (almost) the same thing from the three different
methods (Create, Update, Destroy), move the functionality to
libctCgroupConfig, replacing updateSystemdCgroupInfo.
The needResources bool is there because Destroy does not need resources,
so we can skip the unneeded resource conversion in that case.
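A simplified sketch of the shared helper (the types here are stand-ins for the real kubelet and libcontainer ones; only the libctCgroupConfig name comes from the change itself):

    package cm

    import (
        libcontainerconfigs "github.com/opencontainers/runc/libcontainer/configs"
    )

    // Stand-in for the kubelet's ResourceConfig.
    type ResourceConfig struct {
        CPUShares *uint64
    }

    type CgroupConfig struct {
        Name               string
        ResourceParameters *ResourceConfig
    }

    type cgroupManagerImpl struct{}

    func (m *cgroupManagerImpl) toResources(rc *ResourceConfig) *libcontainerconfigs.Resources {
        res := &libcontainerconfigs.Resources{}
        if rc != nil && rc.CPUShares != nil {
            res.CpuShares = *rc.CPUShares
        }
        return res
    }

    // libctCgroupConfig builds the libcontainer cgroup config once for
    // Create, Update and Destroy; Destroy passes needResources=false and
    // skips the resource conversion it does not need.
    func (m *cgroupManagerImpl) libctCgroupConfig(in *CgroupConfig, needResources bool) *libcontainerconfigs.Cgroup {
        cg := &libcontainerconfigs.Cgroup{Name: in.Name}
        if needResources {
            cg.Resources = m.toResources(in.ResourceParameters)
        }
        return cg
    }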
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit 79be8be10e made hugetlb settings optional if cgroup v2 is used and
hugetlb is not available, fixing issue 92933. Note that at the time this was
only needed for v2, because for v1 the resources were set one-by-one, and only
for supported resources.
Commit d312ef7eb6 switched the code to using Set from runc/libcontainer
cgroups manager, and expanded the check to cgroup v1 as well.
Move this check earlier, into m.toResources, so that instead of
converting all hugetlb resources from ResourceConfig to libcontainer's
Resources.HugetlbLimit and then setting it to nil, we skip the
conversion entirely when hugetlb is not supported, avoiding work that
is not needed.
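A hedged sketch of the resulting shape of m.toResources (the simplified ResourceConfig and the hugetlbSupported helper are illustrative stand-ins, not the real kubelet code):

    package cm

    import (
        "os"

        libcontainerconfigs "github.com/opencontainers/runc/libcontainer/configs"
    )

    // Stand-in for the kubelet's ResourceConfig.
    type ResourceConfig struct {
        HugePageLimit map[string]uint64 // pagesize -> limit, simplified
    }

    type cgroupManagerImpl struct{}

    // hugetlbSupported is a stand-in for the real support check; the kernel
    // exposes the available hugepage sizes under /sys/kernel/mm/hugepages.
    func hugetlbSupported() bool {
        _, err := os.Stat("/sys/kernel/mm/hugepages")
        return err == nil
    }

    func (m *cgroupManagerImpl) toResources(rc *ResourceConfig) *libcontainerconfigs.Resources {
        res := &libcontainerconfigs.Resources{}
        // ... cpu, memory, pids conversions elided ...

        // Skip the hugetlb conversion entirely when the controller is not
        // available, instead of building limits that would be discarded later.
        if !hugetlbSupported() {
            return res
        }
        for pageSize, limit := range rc.HugePageLimit {
            res.HugetlbLimit = append(res.HugetlbLimit, &libcontainerconfigs.HugepageLimit{
                Pagesize: pageSize,
                Limit:    limit,
            })
        }
        return res
    }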
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit ecd6361f added setting PidsLimit to Create and Update.
Commit bce9d5f2 added setting PidsLimit to m.toResources.
Now, PidsLimit is assigned twice.
Remove the duplicate.
Fixes: bce9d5f2
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
There's no need to call m.Update (which would create another instance of
the libcontainer cgroup manager, convert all the resources, and then set
them). All of this is already done here, except for Set().
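For illustration (the function name is made up, and the exact runc manager API depends on the vendored version), the simplification amounts to calling Set directly on a manager built from the already-converted config:

    package cm

    import (
        "fmt"

        libcontainermanager "github.com/opencontainers/runc/libcontainer/cgroups/manager"
        libcontainerconfigs "github.com/opencontainers/runc/libcontainer/configs"
    )

    // setConfig applies the already-built cgroup config directly via Set,
    // rather than going through m.Update, which would construct another
    // libcontainer manager and reconvert all the resources first.
    func setConfig(cg *libcontainerconfigs.Cgroup) error {
        mgr, err := libcontainermanager.New(cg)
        if err != nil {
            return fmt.Errorf("failed to create cgroup manager: %w", err)
        }
        if err := mgr.Set(cg.Resources); err != nil {
            return fmt.Errorf("failed to set cgroup config: %w", err)
        }
        return nil
    }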
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>