With Topology Manager enabled by default, we no longer need
`resourceAllocator`, as Topology Manager serves as the main
PodAdmitHandler and is fully responsible for the admission check
based on hints received from the hintProviders and for the
subsequent allocation of the corresponding resources to a
pod, as can be seen here:
https://github.com/kubernetes/kubernetes/blob/v1.26.0/pkg/kubelet/cm/topologymanager/scope.go#L150
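A much simplified sketch of the flow described above: a single admit handler collects hints from every provider, merges them, and only then asks each provider to allocate. The `HintProvider` interface, `mergeHints`, `Admit` and `fakeProvider` are hypothetical; the real Topology Manager also honours the `Preferred` flag on hints and the configured policy, both of which this sketch ignores.

```go
package main

import (
	"fmt"
	"math/bits"
)

// HintProvider is a hypothetical, simplified stand-in for the kubelet's hint
// providers. A NUMA affinity is a bitmask: bit i set means NUMA node i is usable.
type HintProvider interface {
	Name() string
	GetHints(pod string) []uint64
	Allocate(pod string, mask uint64) error
}

// mergeHints tries every combination of one hint per provider, ANDs the masks
// together and returns the narrowest non-empty result (0 if none exists).
func mergeHints(hints [][]uint64) uint64 {
	var best uint64
	var walk func(i int, acc uint64)
	walk = func(i int, acc uint64) {
		if acc == 0 {
			return // no NUMA node is common to this combination
		}
		if i == len(hints) {
			if best == 0 || bits.OnesCount64(acc) < bits.OnesCount64(best) {
				best = acc
			}
			return
		}
		for _, h := range hints[i] {
			walk(i+1, acc&h)
		}
	}
	walk(0, ^uint64(0))
	return best
}

// Admit rejects the pod when the hints cannot be merged; otherwise every
// provider allocates against the merged mask, so admission and allocation
// are driven from one place.
func Admit(pod string, providers []HintProvider) (bool, error) {
	hints := make([][]uint64, 0, len(providers))
	for _, p := range providers {
		hints = append(hints, p.GetHints(pod))
	}
	mask := mergeHints(hints)
	if mask == 0 {
		return false, nil
	}
	for _, p := range providers {
		if err := p.Allocate(pod, mask); err != nil {
			return false, fmt.Errorf("%s: %w", p.Name(), err)
		}
	}
	return true, nil
}

// fakeProvider records the mask it was asked to allocate against.
type fakeProvider struct {
	name  string
	hints []uint64
	got   uint64
}

func (f *fakeProvider) Name() string             { return f.name }
func (f *fakeProvider) GetHints(string) []uint64 { return f.hints }
func (f *fakeProvider) Allocate(_ string, m uint64) error {
	f.got = m
	return nil
}

func main() {
	cpu := &fakeProvider{name: "cpu", hints: []uint64{0b01, 0b11}}
	dev := &fakeProvider{name: "device", hints: []uint64{0b10, 0b11}}
	ok, err := Admit("pod-a", []HintProvider{cpu, dev})
	fmt.Println(ok, err, cpu.got) // true <nil> 1
}
```

When no combination of hints overlaps, `Admit` returns before any provider allocates, which is the property that makes a separate allocator step unnecessary in this model.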
With regard to DRA, passing `cm.draManager` into
resourceAllocator seems redundant, as no admission checks
(or allocation of resources handled by DRA) take place
in the `Admit` method of resourceAllocator. DRA has a completely
different model from the rest of the resource managers: a
pod is only scheduled on a node once resources are reserved
for it. Because of this, neither admission checks nor waiting for
resources to be provisioned after the pod has been scheduled
on the node is required.
Before making the above change, it was verified that the DRA Manager
is instantiated in `NewContainerManager`:
https://github.com/kubernetes/kubernetes/blob/v1.26.0/pkg/kubelet/cm/container_manager_linux.go#L318
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Since Topology Manager is graduating to GA, we remove the
`Experimental` prefix from internal configuration variable
names.
There is no expected change in behavior, only trivial
variable renaming.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In order to implement the `full-pcpus-only` cpumanager policy option,
we leverage the implementation of the algorithm which picks CPUs.
By design, CPUs are taken from the biggest chunk available (socket
or NUMA zone), then from physical cores, down to single CPUs (threads).
Leveraging this, if the requested CPU count is a multiple of the SMT
level (commonly 2), we're guaranteed that only full physical cores
will be taken.
The hidden assumption here is that this holds true by construction iff
the CPUs reserved by the user (if any) form full physical cores.
In other words, if the user, intentionally or mistakenly, reserved single threads
without also reserving their core siblings[1], then the simple check we implemented
is not sufficient.
An easy example can probably outline this better. With this setup:
cores: [(0, 4), (1, 5), (2, 6), (3, 8)] (in parens: thread siblings).
SMT level: 2 (each tuple is 2 elements)
Reserved CPUs: 0,1 (explicit pick using `--reserved-cpus`)
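A container then requests 6 CPUs. full-pcpus-only check: 6 % 2 == 0. Passed.
The CPU allocator will first take the full cores, (2,6) and (3,8), and will
then pick the remaining single CPUs, 4 and 5. The allocation will succeed, but
it's incorrect: 4 and 5 are lone threads whose siblings (0 and 1) are reserved,
so the container does not get full physical cores.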
We can fix this case with a stricter precheck.
We need to additionally consider all the core siblings of the reserved
CPUs as unavailable when computing the free CPUs, before starting the
actual allocation. Doing so, we restore the intended behavior, and
by construction every CPU allocation whose size is a multiple
of the SMT level is correct again.
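A minimal sketch of the stricter precheck, assuming every core has exactly SMT-level threads: any core that contains a reserved CPU is dropped entirely before counting, so the 6-CPU request from the example is rejected while a 4-CPU request still fits. `freeFullCores` and `checkFullPCPUs` are hypothetical names.

```go
package main

import "fmt"

// core lists the hardware threads (thread siblings) of one physical core.
type core []int

// freeFullCores counts physical cores none of whose threads is reserved,
// i.e. every sibling of a reserved CPU is treated as unavailable.
func freeFullCores(cores []core, reserved map[int]bool) int {
	n := 0
	for _, c := range cores {
		usable := true
		for _, cpu := range c {
			if reserved[cpu] {
				usable = false
				break
			}
		}
		if usable {
			n++
		}
	}
	return n
}

// checkFullPCPUs rejects requests that cannot be served by whole physical
// cores once reserved CPUs and their siblings are excluded.
func checkFullPCPUs(cores []core, reserved map[int]bool, request, smt int) error {
	if request%smt != 0 {
		return fmt.Errorf("request %d is not a multiple of SMT level %d", request, smt)
	}
	if avail := freeFullCores(cores, reserved) * smt; request > avail {
		return fmt.Errorf("request %d exceeds %d CPUs available as full physical cores", request, avail)
	}
	return nil
}

func main() {
	cores := []core{{0, 4}, {1, 5}, {2, 6}, {3, 8}}
	reserved := map[int]bool{0: true, 1: true}
	fmt.Println(checkFullPCPUs(cores, reserved, 6, 2)) // rejected: only 4 CPUs form full cores
	fmt.Println(checkFullPCPUs(cores, reserved, 4, 2)) // <nil>
}
```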
+++
[1] or thread siblings in Linux parlance; in any case:
hyperthread siblings of the same physical core
Signed-off-by: Francesco Romani <fromani@redhat.com>
Passing in a context instead of a stop channel has several advantages:
- ensures that client-go calls return as soon as the controller is asked to stop
- contextual logging can be used
By passing that context down to its own functions and checking it while
waiting, the lease controller also doesn't get stuck in backoffEnsureLease
anymore (https://github.com/kubernetes/kubernetes/issues/116196).
This improves the performance of text formatting and ktesting.
Because ktesting no longer buffers messages by default, one unit
test needs to ask for that explicitly.
1. Scheduler bug fix + scheduler-focused E2E tests
2. Add cgroup v2 support for in-place pod resize
3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG-related changes.
Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>
1. Core Kubelet changes to implement In-place Pod Vertical Scaling.
2. E2E tests for In-place Pod Vertical Scaling.
3. Refactor kubelet code and add missing tests (Derek's kubelet review)
4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature.
5. Fix corner case where resize A->B->A gets ignored
6. Add cgroup v2 support to pod resize E2E test.
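Item 4 can be illustrated by hashing a copy of the container spec with the resources cleared, so that a change to resources alone leaves that hash unchanged. The `Container` type and the JSON plus FNV-1a hashing scheme are assumptions chosen for brevity, not the kubelet's actual hashing code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// Container is a hypothetical, trimmed-down container spec.
type Container struct {
	Name      string            `json:"name"`
	Image     string            `json:"image"`
	Resources map[string]string `json:"resources,omitempty"`
}

// hashContainer returns an FNV-1a hash of the JSON encoding of c
// (encoding/json sorts map keys, so the encoding is deterministic).
func hashContainer(c Container) uint64 {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	h := fnv.New64a()
	h.Write(b)
	return h.Sum64()
}

// hashWithoutResources hashes a copy of c with Resources cleared, so a
// resources-only change does not alter the result.
func hashWithoutResources(c Container) uint64 {
	c.Resources = nil // c is a copy; the caller's value is untouched
	return hashContainer(c)
}

func main() {
	before := Container{Name: "app", Image: "nginx:1.25", Resources: map[string]string{"cpu": "500m"}}
	after := before
	after.Resources = map[string]string{"cpu": "1"}
	fmt.Println(hashContainer(before) == hashContainer(after))               // false
	fmt.Println(hashWithoutResources(before) == hashWithoutResources(after)) // true
}
```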
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources
Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>
Feedback from https://github.com/kubernetes/utils/pull/267 and related
reviews.
* Equality when insertion order is different
* UnsortedList contents
* Not-Subset cases
* Clone coverage
Currently, some unit tests fail on Windows for various reasons:
- On Windows, consecutive time.Now() calls may return the same timestamp, which causes
the TestFreeSpaceRemoveByLeastRecentlyUsed test to flake.
- Tests in kuberuntime_container_windows_test.go fail on Nodes that have fewer than 3 CPUs,
because they expect the CPU maximum to be set to more than 100% of the available CPUs, which is not possible.
- Calls in summary_windows_test.go are missing a context argument.
- filterTerminatedContainerInfoAndAssembleByPodCgroupKey filters and groups container
information by the Pod cgroup key, if it exists. However, there are no cgroups on Windows,
so we can't make the same assertions there.
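A sketch of the arithmetic behind the second item, using a hypothetical 3000m limit. It assumes the CPU maximum is a share of all node CPUs in units of 1/10000 and is clamped to 100%; `cpuMaximum` is a hypothetical helper, not the kubelet function. On a node with fewer than 3 CPUs, a 3-CPU limit would exceed 100% and gets capped, so a test expecting a larger value cannot pass there.

```go
package main

import "fmt"

// cpuMaximum converts a CPU limit in millicores into a share of the whole
// node in units of 1/10000 (10000 == 100% of all CPUs), clamped to
// [1, 10000]. The scale and clamp are assumptions; numCPU must be > 0.
func cpuMaximum(milliCPU, numCPU int64) int64 {
	share := 10 * milliCPU / numCPU // == milliCPU*10000 / (numCPU*1000)
	if share < 1 {
		share = 1
	} else if share > 10000 {
		share = 10000
	}
	return share
}

func main() {
	// A 3-CPU (3000m) limit is 75% of a 4-CPU node...
	fmt.Println(cpuMaximum(3000, 4)) // 7500
	// ...but would be 150% of a 2-CPU node, so it is capped at 100%.
	fmt.Println(cpuMaximum(3000, 2)) // 10000
}
```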