Prevent Kubelet from incorrectly interpreting "not yet started" pods as "ready to terminate pods" by unifying responsibility for pod lifecycle into pod worker
Consume in the static policy the cpu manager policy options from
the cpumanager instance.
Validate in the none policy if any option is given, and fail if so -
this is almost surely a configuration mistake.
Add new cpumanager.Options type to hold the options and translate from
user arguments to flags.
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
Introduce a new `admission` subpackage to factor out the responsability
to create `PodAdmitResult` objects. This enables resource manager
to report specific errors in Allocate() and to bubble up them
in the relevant fields of the `PodAdmitResult`.
To demonstrate the approach we refactor TopologyAffinityError as a
proper error.
Co-authored-by: Kevin Klues <kklues@nvidia.com>
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
The CPUManagerPolicyOptions received from the kubelet config/command line args
is propogated to the Container Manager.
We defer the consumption of the options to a later patch(set).
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
Files generate after running `make generated_files`.
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
In this patch we enhance the kubelet configuration to support
cpuManagerPolicyOptions.
In order to introduce SMT-awareness in CPU Manager, we introduce a
new flag in Kubelet to allow the user to specify an additional flag
called `cpumanager-policy-options` to allow the user to modify the
behaviour of static policy to strictly guarantee allocation of whole
core.
Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Through Job.status.uncountedPodUIDs and a Pod finalizer
An annotation marks if a job should be tracked with new behavior
A separate work queue is used to remove finalizers from orphan pods.
Change-Id: I1862e930257a9d1f7f1b2b0a526ed15bc8c248ad
For tracking Job Pods that have finished but are not yet counted as failed or succeeded
And feature gate JobTrackingWithFinalizers
Change-Id: I3e080f3ec090922640384b692e88eaf9a544d3b5
It is not uncommon for users to Create a Service and not specify things
like ClusterIP and NodePort, which we then allocate for them. They same
that YAML somewhere and later use it again in an Update, but then it
fails.
That's because we detected them trying to set a ClusterIP from a value
to "", which is not allowed. If it was just NodePort, they would
actually succeed and reallocate a new port.
After this change, we try to "patch" updates where the user did not
specify those values from the old object.
* set `endpoints.kubernetes.io/over-capacity` to "truncated" when
number of addresses has been truncated to a 1000
* ready addresses are prioritized over non-ready addresses
* addresses are proportionally truncated across subsets
As of now, we allow PDBs to be applied to pods via
selectors, so there can be unmanaged pods(pods that
don't have backing controllers) but still have PDBs associated.
Such pods are to be logged instead of immediately throwing
a sync error. This ensures disruption controller is
not frequently updating the status subresource and thus
preventing excessive and expensive writes to etcd.
oomwatcher.NewWatcher returns "open /dev/kmsg: operation not permitted" error,
when running with sysctl value `kernel.dmesg_restrict=1`.
The error is negligible for KubeletInUserNamespace.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Errors during setting the following sysctl values are ignored:
- vm.overcommit_memory
- vm.panic_on_oom
- kernel.panic
- kernel.panic_on_oops
- kernel.keys.root_maxkeys
- kernel.keys.root_maxbytes
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>