Commit Graph

1298 Commits

Author SHA1 Message Date
charles-chenzz
ba9ce3ab08 fix flaky test on dra TestPrepareResources/should_timeout
Co-authored-by: TommyStarK <thomasmilox@gmail.com>
2023-08-03 22:37:54 +08:00
Kevin Klues
0449cef8fd Increase timeout for DRA kubelet plugin client
The 10 second timeout was too low. Given that the retry loop for the
kubelet itself is 90s, increasing the timeout to half of this seems
reasonable. Ideally we would pull in the variable that sets the retry
timeout to 90s and then just set our local timeout to half of that.
Unfortunately, this is not exported, so we settle (for now) with just
explicitly setting it to 45s.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-07-18 22:45:01 +01:00
Ed Bartosh
0ec99fb0b2 Kubelet DRA: fix failing test cases 2023-07-18 19:06:33 +03:00
Ed Bartosh
f6431c6138 DRA: don't query claims from API server
When a pod is force-deleted UnprepareResources fails to get a claim
from an API server.
PrepareResources should cache claim info required by the
UnprepareResources so that UnprepareResources would get it from
the cache instead of querying API server.
2023-07-18 18:23:10 +03:00
Kubernetes Prow Robot
6d83e22ba4 Merge pull request #118711 from TommyStarK/tom/gh_118436
add unit test for dra/manager.go
2023-07-18 04:17:09 -07:00
charles-chenzz
0372e4b662 add unit test for dra/manager.go.
Co-Authored-By: charles-chenzz <Rekles666@gmail.com>
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-07-18 12:14:27 +02:00
Kubernetes Prow Robot
da2fdf8cc3 Merge pull request #118764 from iholder101/Swap/burstableQoS-impl
Add full cgroup v2 swap support with automatically calculated swap limit for LimitedSwap and Burstable QoS Pods
2023-07-17 19:49:07 -07:00
Kubernetes Prow Robot
bdcf812c95 Merge pull request #118254 from elezar/4009/add-cdi-devices-to-device-plugin
Add CDI devices to device plugin API
2023-07-17 05:21:08 -07:00
Evan Lezar
b57c7e2fe4 Add CDI devices to device plugin API
This change adds CDI device IDs to the ContainerAllocateResponse in the
device plugin API. This allows a device plugin to specify CDI devices
by their unique fully-qualified CDI device names using the related field
in the CRI specification.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-17 11:53:09 +02:00
Kubernetes Prow Robot
900237fada Merge pull request #118635 from ffromani/devmgr-check-pod-running
kubelet: devices: skip allocation for running pods
2023-07-15 05:43:16 -07:00
Kubernetes Prow Robot
cab65e2008 Merge pull request #118816 from PiotrProkop/topo-opts-to-beta
topologymanager: Promote support for improved multi-numa alignment in Topology Manager to beta
2023-07-14 16:55:08 -07:00
Itamar Holder
a30410d9ce LimitedSwap: Automatically configure swap limit for Burstable QoS Pods
After this commit, when LimitedSwap is enabled,
containers would get swap access limited with respect to
the container memory request, total physical memory
on the node, and the swap size on the node.

Pods of Best-Effort / Guaranteed QoS classes don't get
to swap. In addition, containers with memory requests
that are equal to their memory limits also don't get to
swap.

The swap limitation is calculated in the following way:
1. Calculate the container's memory proportionate to the node's memory:
- Divide the container's memory request by the node's total physical memory.
  Let's call this value ContainerMemoryProportion.

2. Multiply the container memory proportion by the available
swap memory for Pods:
Meaning: ContainerMemoryProportion * TotalPodsSwapAvailable.
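
The two steps above can be sketched in Go as shown below. This is an illustration under stated assumptions, not the kubelet implementation: `swapLimitBytes` and `eligibleForSwap` are hypothetical helpers, QoS classes are passed as plain strings, and floating-point arithmetic is used for simplicity.

```go
package main

import "fmt"

// eligibleForSwap applies the rules from the message: only Burstable pods may
// swap, and a container whose memory request equals its limit may not.
// The string-based QoS class is an assumption made for this sketch.
func eligibleForSwap(qosClass string, memRequest, memLimit uint64) bool {
	if qosClass != "Burstable" {
		return false
	}
	return memRequest != memLimit
}

// swapLimitBytes computes ContainerMemoryProportion * TotalPodsSwapAvailable.
func swapLimitBytes(memRequest, nodeMemory, podsSwapAvailable uint64) uint64 {
	if nodeMemory == 0 {
		return 0 // avoid dividing by zero on malformed input
	}
	// Step 1: the container's share of the node's physical memory.
	proportion := float64(memRequest) / float64(nodeMemory)
	// Step 2: the same share of the swap available to pods.
	return uint64(proportion * float64(podsSwapAvailable))
}

func main() {
	// 2 GiB request on an 8 GiB node with 4 GiB of swap available to pods.
	limit := swapLimitBytes(2<<30, 8<<30, 4<<30)
	fmt.Println(limit) // 1073741824 (1 GiB)
	fmt.Println(eligibleForSwap("Guaranteed", 2<<30, 2<<30)) // false
}
```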

Fore more information:
https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md

Signed-off-by: Itamar Holder <iholder@redhat.com>
2023-07-14 14:52:28 +03:00
Kubernetes Prow Robot
0086712926 Merge pull request #116922 from sourcelliu/checkpoint
Improve the performance of map usage
2023-07-12 17:59:30 -07:00
Kubernetes Prow Robot
047d040ce7 Merge pull request #119012 from pohly/dra-batch-node-prepare
kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
2023-07-12 10:57:37 -07:00
Kubernetes Prow Robot
be222f38f0 Merge pull request #119058 from TommyStarK/dra-state-checkpoint-unit-test
dynamic resource allocation: Improve code coverage of state checkpoint
2023-07-12 07:49:14 -07:00
Patrick Ohly
d743c50bb9 kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1alpha2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
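
A plugin that still handles claims one by one while reporting per-claim errors might look roughly like this. The sketch uses hypothetical types (`Claim`, `ClaimResult`) and a made-up `prepareOne` helper; it does not reproduce the actual v1alpha3 gRPC messages.

```go
package main

import (
	"errors"
	"fmt"
)

// Claim and ClaimResult are hypothetical types for this sketch only.
type Claim struct {
	UID  string
	Name string
}

type ClaimResult struct {
	CDIDevices []string
	Err        error
}

// prepareOne stands in for driver-specific work on a single claim.
func prepareOne(c Claim) ([]string, error) {
	if c.Name == "" {
		return nil, errors.New("claim has no name")
	}
	// The CDI device name below is an invented example.
	return []string{"example.com/device=" + c.Name}, nil
}

// prepareBatch receives all claims of a pod in one call and records a result
// per claim UID, so one failing claim does not hide the others' success.
func prepareBatch(claims []Claim) map[string]ClaimResult {
	results := make(map[string]ClaimResult, len(claims))
	for _, c := range claims {
		devices, err := prepareOne(c)
		results[c.UID] = ClaimResult{CDIDevices: devices, Err: err}
	}
	return results
}

func main() {
	results := prepareBatch([]Claim{
		{UID: "uid-1", Name: "gpu0"},
		{UID: "uid-2"}, // fails: no name
	})
	for uid, r := range results {
		fmt.Println(uid, r.CDIDevices, r.Err)
	}
}
```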
2023-07-12 14:50:30 +02:00
TommyStarK
f924bf95df dynamic resource allocation: Improve code coverage of state checkpoint
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-07-12 13:27:18 +02:00
Francesco Romani
c635a7e7d8 node: devicemgr: topomgr: add logs
One of the factors that make issues #118559 and #109595 hard to
debug and fix is that the devicemanager has very few logs in important
flows, so it's unnecessarily hard to reconstruct the state from logs.

We add minimal logs to be able to improve troubleshooting.
We add minimal logs to be backport-friendly, deferring a more
comprehensive review of logging to later PRs.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Francesco Romani
3bcf4220ec kubelet: devices: skip allocation for running pods
When kubelet initializes, it runs admission for pods and possibly
allocates requested resources. We need to distinguish between
node reboot (no containers running) versus kubelet restart (containers
potentially running).

Running pods should always survive kubelet restart.
This means that device allocation on admission should not be attempted,
because if a container requires devices and is still running when kubelet
is restarting, that container already has devices allocated and working.

Thus, we need to properly detect this scenario in the allocation step
and handle it explicitly. We need to inform
the devicemanager about which pods are already running.

Note that if container runtime is down when kubelet restarts, the
approach implemented here won't work. In this scenario, on kubelet
restart containers will again fail admission, hitting
https://github.com/kubernetes/kubernetes/issues/118559 again.
This scenario should however be pretty rare.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Kubernetes Prow Robot
e0dafe57a3 Merge pull request #117351 from pohly/dra-generated-resource-claim-names
DRA: generated resource claim names
2023-07-11 10:33:11 -07:00
PiotrProkop
f855a23b45 topologymanager: promote TopologyManagerPolicyOptions feature to beta
* Promote TopologyManagerPolicyOptions feature to beta
* Promote PreferClosestNUMANodes TopologyManagerPolicyOption to beta

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-07-11 15:06:57 +02:00
PiotrProkop
23833b9c81 topologymanager: Increase TopologyManager test coverage by adding negative test cases around NUMA topology discovery
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-07-11 15:04:32 +02:00
Patrick Ohly
444d23bd2f dra: generated name for ResourceClaim from template
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.

What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.

The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.

The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.

There's no immediate need for it, but just in case that it may become relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
2023-07-11 14:23:48 +02:00
Kubernetes Prow Robot
bc01306c98 Merge pull request #116738 from AxeZhan/TopologyManagerPolicy
When TopologyManagerPolicy is None, skip checks in NewManager.
2023-07-11 04:53:13 -07:00
Evan Lezar
cd14e97ea8 Add a builder for ContainerAllocateResponse objects
This change introduces a helper to construct ContainerAllocateResponse instances.
Test cases are updated to use a new constructor accepting functional options
allowing the response contents to be set based on the test requirements.

This can then be extended to also test additional fields in the device plugin API
such as annotations which are not currently covered or new fields.
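
The functional-options pattern described here can be sketched as follows. The struct is a trimmed, hypothetical stand-in for the device plugin API type, and the option names (`WithEnv`, `WithAnnotation`, `WithCDIDevices`) are invented for illustration; the helper in the actual change may differ.

```go
package main

import "fmt"

// ContainerAllocateResponse is a simplified stand-in with only a few fields.
type ContainerAllocateResponse struct {
	Envs        map[string]string
	Annotations map[string]string
	CDIDevices  []string
}

// Option mutates a response while it is being constructed.
type Option func(*ContainerAllocateResponse)

// WithEnv sets one environment variable on the response.
func WithEnv(key, value string) Option {
	return func(r *ContainerAllocateResponse) { r.Envs[key] = value }
}

// WithAnnotation sets one annotation on the response.
func WithAnnotation(key, value string) Option {
	return func(r *ContainerAllocateResponse) { r.Annotations[key] = value }
}

// WithCDIDevices appends fully-qualified CDI device names.
func WithCDIDevices(names ...string) Option {
	return func(r *ContainerAllocateResponse) {
		r.CDIDevices = append(r.CDIDevices, names...)
	}
}

// NewContainerAllocateResponse applies the options in order to an empty response.
func NewContainerAllocateResponse(opts ...Option) *ContainerAllocateResponse {
	r := &ContainerAllocateResponse{
		Envs:        map[string]string{},
		Annotations: map[string]string{},
	}
	for _, opt := range opts {
		opt(r)
	}
	return r
}

func main() {
	// Each test case sets only the fields it cares about.
	r := NewContainerAllocateResponse(
		WithEnv("KEY", "value"),
		WithCDIDevices("example.com/device=dev0"),
	)
	fmt.Println(r.Envs, r.Annotations, r.CDIDevices)
}
```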

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 11:48:26 +02:00
Evan Lezar
db2a1edbdd Generate empty cdi annotations
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 11:48:24 +02:00
Evan Lezar
f0e3c32fe5 Move CDI annotation code to utils package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 11:47:53 +02:00
Kubernetes Prow Robot
7581ae8123 Merge pull request #116739 from moshe010/clone-cdi-devices
kubelet dra: lock before getting claimInfo CDIDevices and annotations fields
2023-07-07 06:31:04 -07:00
Patrick Ohly
bde66bfb55 kubelet dra: restore skipping of unused resource claims
1aeec10efb removed iterating over containers in favor of iterating over pod
claims. This had the unintended consequence that NodePrepareResource gets
called unnecessarily when no container needs the claim. The more natural
behavior is to skip unused resources. This enables (theoretic, at this time)
use cases where some DRA driver relies on the controller part to influence
scheduling, but then doesn't use CDI with containers.
2023-06-27 16:02:31 +02:00
Patrick Ohly
874daa8b52 kubelet dra: fix checking of second pod which uses a claim
When a second pod wanted to use a claim, the obligatory sanity check whether
the pod is really allowed to use the claim ("reserved for") was skipped.
2023-06-27 16:01:11 +02:00
Kubernetes Prow Robot
299b72c587 Merge pull request #114760 from TommyStarK/unit-tests/pkg-kubelet-cm-containermap
kubelet/cm/containermap: Improving test coverage
2023-06-06 11:18:24 -07:00
Kubernetes Prow Robot
484645e817 Merge pull request #116659 from claudiubelu/skip-flaky-tests-2
unit tests: Skip flaky tests on Windows (part 2)
2023-05-23 20:04:48 -07:00
Ian K. Coolidge
cede96336a Depend on k8s.io/utils cpuset
Steps performed:

$ find . -name '*.go' -exec sed -i \
    's|k8s.io/kubernetes/pkg/kubelet/cm/cpuset|k8s.io/utils/cpuset|g' {} \;
$ ./hack/update-vendor.sh
$ ./hack/update-gofmt.sh
$ git rm -r pkg/kubelet/cm/cpuset/
2023-05-03 16:26:09 +00:00
Kubernetes Prow Robot
f5fff0f2bc Merge pull request #117105 from yoongon/feature/assert-order
Swap assert.Equal parameter order to follow convention
2023-05-01 22:34:22 -07:00
Kubernetes Prow Robot
1241ddc567 Merge pull request #116376 from swatisehgal/device-mgr-recovery-wip
node: device-mgr: Handle recovery flow by checking if healthy devices exist- attempt 2
2023-05-01 21:30:11 -07:00
Moshe Levi
04ad946e8f kubelet dra: lock before getting claimInfo CDIDevices and annotations fields
Currently claimInfo CDIDevices and annotations are accessed directly without RLock.
This can lead to concurrent read/write errors.

To avoid this, we take the RLock before getting the CDIDevices and annotations.

Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-05-01 15:09:43 +03:00
Swati Sehgal
dc1a592632 node: device-mgr: Handle recovery by checking if healthy devices exist
In case of node reboot/kubelet restart, the flow of events involves
obtaining the state from the checkpoint file followed by setting
the `healthyDevices`/`unhealthyDevices` to their zero values. This is
done to allow the device plugin to re-register itself so that
capacity can be updated appropriately.

During the allocation phase, we need to check if the resources requested
by the pod have been registered AND healthy devices are present on
the node to be allocated.

Also we need to move this check above `needed==0`, where needed is
required - devices allocated to the container (which is obtained from
the checkpoint file), because even in cases where no additional devices
have to be allocated (as they were pre-allocated), we still need to
make sure the devices that were previously allocated are healthy.
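
The ordering of the checks can be sketched as below. This is a hedged illustration, not the device manager code: `neededDevices` and its parameters are hypothetical, and the healthy-device state is modelled as a plain map from resource name to a set of device IDs, where a missing key means the plugin has not re-registered yet.

```go
package main

import "fmt"

// neededDevices returns how many additional devices a container needs.
// The registration and health checks run before the needed == 0 shortcut.
func neededDevices(resource string, required int, allocated []string,
	healthy map[string]map[string]bool) (int, error) {

	devices, registered := healthy[resource]
	if !registered {
		return 0, fmt.Errorf("resource %q is not registered", resource)
	}
	if len(devices) == 0 {
		return 0, fmt.Errorf("no healthy devices present for %q", resource)
	}
	// Devices restored from the checkpoint must still be healthy.
	for _, id := range allocated {
		if !devices[id] {
			return 0, fmt.Errorf("previously allocated device %q is not healthy", id)
		}
	}
	// Only now is it safe to conclude that nothing more has to be allocated.
	needed := required - len(allocated)
	if needed <= 0 {
		return 0, nil
	}
	return needed, nil
}

func main() {
	healthy := map[string]map[string]bool{
		"example.com/gpu": {"gpu0": true},
	}
	// Pre-allocated and healthy: nothing more to allocate, no error.
	fmt.Println(neededDevices("example.com/gpu", 1, []string{"gpu0"}, healthy))
	// Pre-allocated but no longer healthy: an error despite needed == 0.
	fmt.Println(neededDevices("example.com/gpu", 1, []string{"gpu1"}, healthy))
}
```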

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-04-28 14:41:30 +01:00
mantuliu
37ea51fd56 Improve the performance of map usage 2023-04-16 11:41:14 +08:00
kidddddddddddddddddddddd
4e928c96b5 skip checks when topologyPolicyName is PolicyNone 2023-04-14 22:44:24 +08:00
Claudiu Belu
0979d55443 unit tests: Skip flaky tests on Windows (part 2)
Some of the unit tests are currently flaky on Windows. This commit
skips them until they are resolved.
2023-04-13 12:07:18 +00:00
Kubernetes Prow Robot
006ad0576e Merge pull request #116560 from bart0sh/PR107-DRA-get-rid-of-extra-loops
DRA: get rid of unneeded loops over pod containers
2023-04-11 21:16:50 -07:00
Kubernetes Prow Robot
ce56fd7c8b Merge pull request #117152 from samuelkarp/godoc-typo
cpumanager: fix typo in godoc
2023-04-11 20:22:14 -07:00
Kubernetes Prow Robot
d0fc9d16ce Merge pull request #114800 from haoruan/feature-8976-spew-sprintf-refactor
Capture spew.Sprintf() with all our favorite config into a util func
2023-04-11 15:34:57 -07:00
Samuel Karp
ea74a2d877 cpumanager: fix typo in godoc
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2023-04-06 16:48:24 -07:00
Yoon Park
934516791c Swap assert.Equal parameter order to follow convention 2023-04-05 22:01:40 +09:00
Hao Ruan
f638e2849f replaced spew.Sprintf with a util pretty print function 2023-03-27 09:24:22 +08:00
mantuliu
838ed7feb5 Improve the performance of map usage
Signed-off-by: mantuliu <240951888@qq.com>
2023-03-25 17:08:36 +08:00
Ed Bartosh
1aeec10efb DRA: get rid of unneeded loops over pod containers 2023-03-15 09:41:30 +02:00
Kubernetes Prow Robot
74123a7341 Merge pull request #116621 from moshe010/dra-lock
kubelet dra: add lock to addCDIDevices
2023-03-14 19:27:28 -07:00
Kubernetes Prow Robot
815b1bf0d8 Merge pull request #116558 from klueska/update-dra-kubeletplugin-v1alpha2
Update kubeletplugin API for DRA to v1alpha2
2023-03-14 19:27:06 -07:00