This implements a drop-in configuration directory for the kubelet
by introducing a "--config-dir" flag. Users can provide individual
kubelet config snippets in separate files, formatted similarly to
kubelet.conf. The kubelet will process the files in alphanumeric order,
appending configurations if subfield(s) doesn't exist, overwriting them if
they do, and handling lists by overwriting instead of merging.
Co-authored-by: Yu Qi Zhang <jerzhang@redhat.com>
This patch modifies kubelet to get the cgroup driver setting from the
CRI runtime using the newly added RuntimeConfig rpc. The new code path
only takes place if the KubeletCgroupDriverFromCRI feature gate is
enabled. If the runtime returns a not-implemented error kubelet falls
back to using the cgroupDriver configuration option, with a log message
instructing the user to upgrade to w newer container runtime. Other rpc
errors cause kubelet to exit as is the case if the runtime returns an
unknown cgroup driver.
This patch refactors the kubelet startup code to initialize the runtime
service earlier in the startup sequence. We want this to be able to
query the cgroup driver setting from the CRI befure initializing the
cgroup manager.
* [API REVIEW] ValidatingAdmissionPolicyStatucController config.
worker count.
* ValidatingAdmissionPolicyStatus controller.
* remove CEL typechecking from API server.
* fix initializer tests.
* remove type checking integration tests
from API server integration tests.
* validatingadmissionpolicy-status options.
* grant access to VAP controller.
* add defaulting unit test.
* generated: ./hack/update-codegen.sh
* add OWNERS for VAP status controller.
* type checking test case.
When someone decides that a Pod should definitely run on a specific node, they
can create the Pod with spec.nodeName already set. Some custom scheduler might
do that. Then kubelet starts to check the pod and (if DRA is enabled) will
refuse to run it, either because the claims are still waiting for the first
consumer or the pod wasn't added to reservedFor. Both are things the scheduler
normally does.
Also, if a pod got scheduled while the DRA feature was off in the
kube-scheduler, a pod can reach the same state.
The resource claim controller can handle these two cases by taking over for the
kube-scheduler when nodeName is set. Triggering an allocation is simpler than
in the scheduler because all it takes is creating the right
PodSchedulingContext with spec.selectedNode set. There's no need to list nodes
because that choice was already made, permanently. Adding the pod to
reservedFor also isn't hard.
What's currently missing is triggering de-allocation of claims to re-allocate
them for the desired node. This is not important for claims that get created
for the pod from a template and then only get used once, but it might be
worthwhile to add de-allocation in the future.
TL;DR: we want to start failing the LB HC if a node is tainted with ToBeDeletedByClusterAutoscaler.
This field might need refinement, but currently is deemed our best way of understanding if
a node is about to get deleted. We want to do this only for eTP:Cluster services.
The goal is to connection draining terminating nodes