Kubernetes Prow Robot
fad52aedfc
Merge pull request #125086 from oxxenix/exponential-backoff
add exponential backoff in NodeResourceSlices controller
2024-05-28 02:46:43 -07:00
Oksana Baranova
c4ec24890e
nodeResourceSlicesController: add exponential backoff
2024-05-27 23:12:53 +03:00
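A minimal sketch of capped exponential backoff in Go. It assumes the retry delay doubles after each consecutive failure, up to a ceiling. The helper `backoffDelay` and its parameters are hypothetical, and the log does not show the controller's actual code or constants.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns the delay before a retry after `failures` consecutive
// failures (0-based). It doubles from base and caps at maxDelay.
// Hypothetical helper for illustration only.
func backoffDelay(failures int, base, maxDelay time.Duration) time.Duration {
	d := base
	// Stop doubling once the ceiling is reached, which also avoids overflow.
	for i := 0; i < failures && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

func main() {
	for f := 0; f < 6; f++ {
		fmt.Println(f, backoffDelay(f, 100*time.Millisecond, 2*time.Second))
	}
}
```

Production controllers usually also add jitter and track failures per item. For example, client-go's workqueue package provides an item-based exponential failure rate limiter.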
Itamar Holder
a6b971f14b
Use kubelet-owned directories for mounting rather than /tmp
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-05-21 13:18:16 +03:00
Itamar Holder
74f29880bd
Replace log entry with a warning event
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-05-21 13:18:16 +03:00
Itamar Holder
29535c0463
Warn if swap is enabled on the OS and tmpfs noswap is not supported
When the --fail-swap-on=false kubelet CLI argument
is provided but tmpfs noswap is not supported
by the kernel, warn about the risk of memory-backed
volumes being swapped to disk.
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-05-21 13:18:16 +03:00
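The warning condition in the commit above can be sketched as follows. The sketch assumes swap status is read from `/proc/swaps`, which has a header line followed by one line per active swap area. How the kubelet actually detects tmpfs `noswap` support is not shown here, so that input is a plain boolean. `swapEnabled` and `shouldWarnSwap` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// swapEnabled reports whether /proc/swaps-formatted content lists at least
// one active swap area. The first line of /proc/swaps is a column header.
func swapEnabled(procSwaps string) bool {
	lines := strings.Split(strings.TrimSpace(procSwaps), "\n")
	for _, l := range lines[1:] {
		if strings.TrimSpace(l) != "" {
			return true
		}
	}
	return false
}

// shouldWarnSwap mirrors the condition in the commit message. It warns when swap
// is tolerated (--fail-swap-on=false), swap is active, and the kernel cannot
// mount tmpfs with noswap, so memory-backed volumes may be swapped to disk.
func shouldWarnSwap(failSwapOn, swapOn, tmpfsNoswapSupported bool) bool {
	return !failSwapOn && swapOn && !tmpfsNoswapSupported
}

func main() {
	procSwaps := "Filename\tType\tSize\tUsed\tPriority\n/dev/sda2\tpartition\t8388604\t0\t-2\n"
	fmt.Println(shouldWarnSwap(false, swapEnabled(procSwaps), false)) // true
}
```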
Kubernetes Prow Robot
8352c09592
Merge pull request #124323 from bart0sh/PR142-dra-fix-cache-integrity
kubelet: DRA: fix cache integrity
2024-05-13 09:54:02 -07:00
Alvaro Aleman
6d0ac8c561
Use the generic/typed workqueue throughout
This change makes us use the generic workqueue throughout the project in
order to improve type safety and readability of the code.
2024-05-04 14:33:12 -04:00
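Here is a minimal generic queue sketch that illustrates the type-safety argument. `TypedQueue` is a hypothetical type, not client-go's workqueue API. Items come back as `T`, so callers need no `interface{}` type assertions, and an item already waiting in the queue is not added twice.

```go
package main

import "fmt"

// TypedQueue is a minimal FIFO that ignores an item if an equal one is
// already pending. Hypothetical illustration; not the client-go workqueue.
type TypedQueue[T comparable] struct {
	items   []T
	pending map[T]struct{}
}

func NewTypedQueue[T comparable]() *TypedQueue[T] {
	return &TypedQueue[T]{pending: make(map[T]struct{})}
}

// Add enqueues item unless an equal item is already waiting.
func (q *TypedQueue[T]) Add(item T) {
	if _, ok := q.pending[item]; ok {
		return
	}
	q.pending[item] = struct{}{}
	q.items = append(q.items, item)
}

// Get pops the oldest item; ok is false when the queue is empty.
// The caller receives a T directly, so no type assertion is needed.
func (q *TypedQueue[T]) Get() (item T, ok bool) {
	if len(q.items) == 0 {
		return item, false
	}
	item = q.items[0]
	q.items = q.items[1:]
	delete(q.pending, item)
	return item, true
}

// Len returns the number of pending items.
func (q *TypedQueue[T]) Len() int { return len(q.items) }

func main() {
	q := NewTypedQueue[string]()
	q.Add("default/claim-a")
	q.Add("default/claim-a") // deduplicated while pending
	q.Add("default/claim-b")
	for q.Len() > 0 {
		key, _ := q.Get()
		fmt.Println(key)
	}
}
```

The sketch is not safe for concurrent use. A real workqueue also guards its state with a lock and tracks items that are currently being processed.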
Ed Bartosh
f24134d7b2
kubelet: DRA: add unit test for ClaimInfo and claimInfoCache
2024-05-03 13:30:31 +00:00
Ed Bartosh
6ce294558a
kubelet: DRA: add stress test
The test calls the PrepareResources and UnprepareResources APIs in
parallel to help discover race conditions.
2024-05-03 13:30:29 +00:00
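This is a minimal sketch of the stress-test idea. It assumes a mutex-protected cache and has many goroutines call prepare and unprepare for the same claims at once. Running it under Go's race detector (`go run -race` or `go test -race`) would be expected to flag unsynchronized access if the lock were removed. `claimCache` and `stress` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// claimCache is a hypothetical stand-in for a kubelet-side claim cache.
type claimCache struct {
	mu       sync.Mutex
	prepared map[string]int // claim key -> number of prepare calls
}

func (c *claimCache) Prepare(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.prepared[key]++
}

func (c *claimCache) Unprepare(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.prepared, key)
}

func (c *claimCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.prepared)
}

// stress calls Prepare and Unprepare concurrently for n claims and waits
// for all goroutines to finish. The final size depends on scheduling.
func stress(c *claimCache, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		key := fmt.Sprintf("default/claim-%d", i)
		wg.Add(2)
		go func() { defer wg.Done(); c.Prepare(key) }()
		go func() { defer wg.Done(); c.Unprepare(key) }()
	}
	wg.Wait()
}

func main() {
	c := &claimCache{prepared: make(map[string]int)}
	stress(c, 100)
	fmt.Println("done, remaining entries:", c.Len())
}
```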
Kevin Klues
86a18d5333
kubelet: DRA: update manager test to adhere to new claiminfo cache APIs
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-05-03 13:28:37 +00:00
Kevin Klues
805e7c3434
kubelet: DRA: remove check to set pluginName to DriverName if not in ResourceHandle
It has always been validated that a ResourceHandle MUST have DriverName set, so
this check is unnecessary.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-05-03 13:23:29 +00:00
Kevin Klues
f80be2728e
kubelet: DRA: change key of claimInfo cache to "namespace/claimname"
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-05-03 13:23:29 +00:00
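Here is a sketch of such a key, using hypothetical helper names. Claim names are only unique within a namespace, so the namespace belongs in the key. Namespace names cannot contain `/`, so splitting at the first `/` recovers both parts.

```go
package main

import (
	"fmt"
	"strings"
)

// claimKey builds a "namespace/claimname" cache key. Hypothetical helper.
func claimKey(namespace, claimName string) string {
	return namespace + "/" + claimName
}

// splitClaimKey reverses claimKey by cutting at the first "/".
// ok is false if the key contains no "/".
func splitClaimKey(key string) (namespace, claimName string, ok bool) {
	return strings.Cut(key, "/")
}

func main() {
	k := claimKey("team-a", "gpu-claim")
	ns, name, _ := splitClaimKey(k)
	fmt.Println(k, ns, name) // team-a/gpu-claim team-a gpu-claim
}
```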
Kevin Klues
639e887631
kubelet: DRA: add a reconcile loop to unprepare claims for deleted pods
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-05-03 13:23:29 +00:00
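The reconcile idea can be sketched as a pure function. The sketch assumes each cached claim records the UIDs of the pods using it. It drops pods that no longer exist, then reports claims with no remaining pods as candidates for unpreparing. `cachedClaim` and `claimsToUnprepare` are hypothetical, and the log does not describe the actual loop's inputs or period.

```go
package main

import (
	"fmt"
	"sort"
)

// cachedClaim is a hypothetical record of a prepared claim and the UIDs of
// the pods that reference it.
type cachedClaim struct {
	Key     string
	PodUIDs map[string]bool
}

// claimsToUnprepare removes deleted pods from each claim and returns, sorted,
// the keys of claims that no longer have any referencing pod.
func claimsToUnprepare(cache []*cachedClaim, activePods map[string]bool) []string {
	var stale []string
	for _, c := range cache {
		for uid := range c.PodUIDs {
			if !activePods[uid] {
				delete(c.PodUIDs, uid) // deleting during range is allowed in Go
			}
		}
		if len(c.PodUIDs) == 0 {
			stale = append(stale, c.Key)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	cache := []*cachedClaim{
		{Key: "default/a", PodUIDs: map[string]bool{"pod-1": true}},
		{Key: "default/b", PodUIDs: map[string]bool{"pod-1": true, "pod-2": true}},
	}
	active := map[string]bool{"pod-2": true} // pod-1 was deleted
	fmt.Println(claimsToUnprepare(cache, active)) // [default/a]
}
```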
Kevin Klues
a8931c6c25
kubelet: DRA: update locking/checkpoint semantics of the claimInfo cache
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-05-03 13:23:27 +00:00
Kubernetes Prow Robot
1fd835ce59
Merge pull request #123398 from ffromani/remove-legacy-checkpoint
node: devicemgr: remove obsolete pre-1.20 checkpoint file support
2024-04-29 14:46:53 -07:00
Marek Siarkowicz
3ee8178768
Cleanup defer from SetFeatureGateDuringTest function call
2024-04-24 20:25:29 +02:00
Patrick Ohly
77341f7595
DRA: remove support for v1alpha2 kubelet API
The v1alpha2 API is several releases old. No current drivers should still
depend on it.
2024-04-19 18:27:05 +02:00
Kubernetes Prow Robot
bbfd2145de
Merge pull request #124091 from bitoku/dra-nil-check
kubelet: add nil check for Node(Un)PrepareResources.
2024-04-18 10:46:05 -07:00
Kubernetes Prow Robot
528cff12f6
Merge pull request #120969 from skitt/uber-go-mock
Switch from golang/mock to uber-go/mock
2024-04-17 23:59:24 -07:00
Francesco Romani
181fb0da51
node: devicemgr: remove obsolete pre-1.20 checkpoint file support
In commit 2f426fdba6 we added
compatibility (and tests) to deal with pre-1.20 checkpoint files.
We are now well past the end of support for pre-1.20 kubelets,
so we can get rid of this code.
Signed-off-by: Francesco Romani <fromani@redhat.com>
2024-04-15 14:01:56 +02:00
Ayato Tokubi
d04f87abde
add nil check for Node(Un)PrepareResources.
Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
2024-04-04 23:24:25 +00:00
HirazawaUi
10b6319e64
fix slow dra unit test
2024-03-16 22:21:15 +08:00
Ed Bartosh
26881132bd
kubelet: assign Node as an owner for the ResourceSlice
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
2024-03-15 09:46:13 +02:00
Patrick Ohly
a0add8d2c7
dra api: NodeResourceModel -> ResourceModel
When renaming NodeResourceSlice to ResourceSlice, the embedded
[Node]ResourceModel also should have been renamed.
2024-03-14 18:07:36 +01:00
Kevin Klues
fc2134c84c
dra kubelet: fix error log
Previously we were returning the error string from 'err' (which is nil), when
we should have been returning it from result.Error. Without this it is hard to
debug issues with NodeUnprepareResources.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-03-11 13:51:29 +00:00
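This is a hedged sketch of the bug pattern described above. The call itself succeeds (`err == nil`) while the per-claim result carries the failure, so formatting `err` only yields `<nil>`. `unprepareResult` and `checkResult` are hypothetical stand-ins for the real types.

```go
package main

import (
	"errors"
	"fmt"
)

// unprepareResult is a hypothetical per-claim result. The RPC can succeed
// (err == nil) while an individual claim still reports a failure.
type unprepareResult struct {
	Error string
}

// checkResult shows the corrected pattern: the message comes from
// result.Error, not from err, which is nil at that point.
func checkResult(claimUID string, result unprepareResult, err error) error {
	if err != nil {
		return fmt.Errorf("NodeUnprepareResources failed: %w", err)
	}
	if result.Error != "" {
		// Buggy variant: formatting err here would print "<nil>".
		return fmt.Errorf("NodeUnprepareResources failed for claim %s: %s", claimUID, result.Error)
	}
	return nil
}

func main() {
	fmt.Println(checkResult("uid-1", unprepareResult{Error: "device busy"}, nil))
	// NodeUnprepareResources failed for claim uid-1: device busy
	fmt.Println(checkResult("uid-2", unprepareResult{}, errors.New("connection refused")))
	// NodeUnprepareResources failed: connection refused
}
```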
Kevin Klues
13a6dcc21c
dra kubelet: add StructuredResourceModel to UnprepareResources call
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-03-09 18:08:14 +00:00
Patrick Ohly
0b6a0d686a
dra api: rename NodeResourceSlice -> ResourceSlice
While currently those objects only get published by the kubelet for node-local
resources, this could change once we also support network-attached
resources. Dropping the "Node" prefix enables such a future extension.
The NodeName in ResourceSlice and StructuredResourceHandle then becomes
optional. The kubelet still needs to provide one and it must match its own node
name, otherwise it doesn't have permission to access ResourceSlice objects.
2024-03-07 22:22:55 +01:00
Patrick Ohly
d59676a545
dra kubelet: publish NodeResourceSlices
The information is received from the DRA driver plugin through a new gRPC
streaming interface. This is backwards compatible with old DRA driver kubelet
plugins, their gRPC server will return "not implemented" and that can be
handled by kubelet. Therefore no API break is needed.
However, DRA drivers need to be updated because the Go API changed. They can
return `status.New(codes.Unimplemented, "no node resource support").Err()`
if they don't support the new ListAndWatchResources method and
structured parameters.
The controller in kubelet then synchronizes this information from the driver
with NodeResourceSlice objects, creating, updating and deleting them as needed.
2024-03-07 22:22:13 +01:00
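The create/update/delete synchronization can be sketched as a diff between the desired state reported by the driver and the objects that already exist. This is a simplification. Slices are keyed by a hypothetical name and their content is reduced to a string, and the log does not say how the controller actually matches objects. `syncPlan` and `planSync` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// syncPlan lists the operations needed to make existing objects match the
// desired state. Keys identify a slice; values stand in for its content.
type syncPlan struct {
	Create, Update, Delete []string
}

func planSync(desired, existing map[string]string) syncPlan {
	var p syncPlan
	for name, want := range desired {
		have, ok := existing[name]
		switch {
		case !ok:
			p.Create = append(p.Create, name)
		case have != want:
			p.Update = append(p.Update, name)
		}
	}
	for name := range existing {
		if _, ok := desired[name]; !ok {
			p.Delete = append(p.Delete, name)
		}
	}
	// Map iteration order is random; sort for deterministic output.
	sort.Strings(p.Create)
	sort.Strings(p.Update)
	sort.Strings(p.Delete)
	return p
}

func main() {
	desired := map[string]string{"gpu-0": "v2", "gpu-1": "v1"}
	existing := map[string]string{"gpu-0": "v1", "gpu-2": "v1"}
	fmt.Printf("%+v\n", planSync(desired, existing))
	// {Create:[gpu-1] Update:[gpu-0] Delete:[gpu-2]}
}
```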
Patrick Ohly
6f1ddfcd2e
kubelet: support structured parameters for preparing resources
If the resource handle has data from a structured parameter model, then we need
to pass that to the DRA driver kubelet plugin. Because Kubernetes uses
gogo/protobuf, we cannot use "optional" for that new optional field and have to
resort to "repeated" with a single repetition if present.
This is a new, backwards-compatible field.
It is unfortunate that extending the resource.k8s.io API changes the checksum
of a kubelet checkpoint. Updating the test cases is a stop-gap measure; the actual
solution will have to be something else before beta.
2024-03-07 22:22:13 +01:00
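In Go code generated for a protobuf message, a `repeated` field becomes a slice. An optional value can therefore travel as a slice with zero or one element. The following is a minimal sketch of encoding and decoding that convention. `structuredHandle`, `toRepeated`, and `fromRepeated` are hypothetical, not the generated kubelet plugin types.

```go
package main

import "fmt"

// structuredHandle is a placeholder for the message carried in the
// repeated field; it is not the real generated type.
type structuredHandle struct {
	NodeName string
}

// toRepeated encodes an optional value as a repeated field with at most
// one element; nil means "absent".
func toRepeated(h *structuredHandle) []*structuredHandle {
	if h == nil {
		return nil
	}
	return []*structuredHandle{h}
}

// fromRepeated decodes the field again, rejecting more than one repetition.
func fromRepeated(hs []*structuredHandle) (*structuredHandle, error) {
	switch len(hs) {
	case 0:
		return nil, nil
	case 1:
		return hs[0], nil
	default:
		return nil, fmt.Errorf("expected at most one element, got %d", len(hs))
	}
}

func main() {
	h, err := fromRepeated(toRepeated(&structuredHandle{NodeName: "worker-1"}))
	fmt.Println(h.NodeName, err) // worker-1 <nil>
}
```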
Stephen Kitt
6bf667af06
Switch from golang/mock to uber-go/mock
See https://github.com/golang/mock#gomock : golang/mock is no longer
maintained, and should be replaced by go.uber.org/mock.
This allows golang/mock to be dropped from the status and vendored
fields in unwanted-dependencies.json.
Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-03-07 09:12:16 +01:00
Kubernetes Prow Robot
70383f3701
Merge pull request #119561 from payall4u/fix-kubelet-panic-when-allocate-device
Fix kubelet panic when allocating resources for a pod.
2024-02-29 03:06:54 -08:00
Kubernetes Prow Robot
0f7cc6fcaa
Merge pull request #121778 from Tal-or/mm_metrics
kubelet: memorymanager: metrics: add metrics about static allocation
2024-02-20 09:41:50 -08:00
Kubernetes Prow Robot
79e11fe563
Merge pull request #122703 from TommyStarK/fix/dra-manager-should-timeout
dra: increase timeout in setupFakeDRADriverGRPCServer to prevent tests from flaking
2024-02-13 09:33:17 -08:00
Kubernetes Prow Robot
244fbf94fd
Merge pull request #122698 from daniel-hutao/feat-1
Code Cleanup: Redundant String Conversions and Spelling/Grammar Corrections
2024-02-05 16:57:07 -08:00
Daniel Hu
1baf7d4586
Corrected some spelling and grammatical errors
Signed-off-by: Daniel Hu <farmer.hutao@outlook.com>
2024-01-27 10:10:25 +08:00
Kubernetes Prow Robot
3da22db11c
Merge pull request #121499 from matte21/add-comments-to-cpu-accumulator
Improve understandability of kubelet's cpu accumulator code
2024-01-26 00:56:21 +01:00
Daniel Hu
d652596e42
Remove redundant string conversions in print statements
Signed-off-by: Daniel Hu <farmer.hutao@outlook.com>
2024-01-15 09:57:35 +08:00
TommyStarK
6f021e99cf
dra: increase timeout in setupFakeDRADriverGRPCServer to prevent tests from flaking.
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2024-01-11 09:20:04 +01:00
Akihiro Suda
2e999fff02
Fix compiling e2e.test on macOS
Fix issue 122650 (regression in PR 122552)
```
$ make WHAT=test/e2e/e2e.test
+++ [0109 10:06:53] Building go targets for darwin/amd64
k8s.io/kubernetes/test/e2e/e2e.test (test)
package k8s.io/kubernetes/test/e2e
imports k8s.io/kubernetes/test/e2e/common
imports k8s.io/kubernetes/test/e2e/common/node
imports k8s.io/kubernetes/pkg/kubelet
imports github.com/opencontainers/runc/libcontainer/userns: C source files not allowed when not using cgo or SWIG: userns_maps.c
!!! [0109 10:06:54] Call tree:
!!! [0109 10:06:54] 1: /Users/suda/gopath/src/k8s.io/kubernetes/hack/lib/golang.sh:948 kube::golang::build_binaries_for_platform(...)
!!! [0109 10:06:54] 2: hack/make-rules/build.sh:27 kube::golang::build_binaries(...)
!!! [0109 10:06:54] Call tree:
!!! [0109 10:06:54] 1: hack/make-rules/build.sh:27 kube::golang::build_binaries(...)
!!! [0109 10:06:54] Call tree:
!!! [0109 10:06:54] 1: hack/make-rules/build.sh:27 kube::golang::build_binaries(...)
make: *** [all] Error 1
```
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2024-01-09 10:42:20 +09:00
Kubernetes Prow Robot
c96d7a5b5a
Merge pull request #121774 from charles-chenzz/increase_timeout_in_dra_shouldTimeOut
increase timeout in fakeDraDriverGrpcServer to fix flake
2024-01-04 17:59:12 +01:00
weilaaa
eb8f3f194f
use built-in max and min funcs instead of k8s.io/utils/integer funcs
2023-12-15 15:09:11 +08:00
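Go 1.21 added built-in `min` and `max` functions that accept one or more arguments of any ordered type, so type-specific helper functions are no longer needed. Here is a small sketch. `clampReplicas` is a hypothetical example function.

```go
package main

import "fmt"

// clampReplicas bounds n to [lo, hi] using the min and max built-ins
// (Go 1.21+) instead of type-specific helpers.
func clampReplicas(n, lo, hi int32) int32 {
	return min(max(n, lo), hi)
}

func main() {
	fmt.Println(clampReplicas(12, 1, 10))      // 10
	fmt.Println(max(3, 7, 5), min(2.5, 1.5)) // 7 1.5
}
```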
Kubernetes Prow Robot
3c1356bc9b
Merge pull request #119764 from linxiulei/reservedTypo
Fix error message for invalid resource reservation
2023-12-13 21:25:42 +01:00
Talor Itzhak
ddd60de3f3
memorymanager:metrics: add metrics
As part of the memory manager GA graduation effort, we should add
metrics in order to improve observability.
The metrics are also mentioned in PR https://github.com/kubernetes/enhancements/pull/4251 (not yet merged).
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-11-12 09:34:55 +02:00
payall4u
d6b8a660b0
Fix kubelet panic when allocating resources for a pod.
Signed-off-by: payall4u <payall4u@qq.com>
2023-11-12 10:54:05 +08:00
charles-chenzz
abaf7a800d
increase timeout in fakeDraDriverGrpcServer to fix flake in dra/manager_test
2023-11-07 19:38:27 +08:00
Kubernetes Prow Robot
960431407c
Merge pull request #120715 from gjkim42/do-not-reuse-memory-of-restartable-init-containers
Don't reuse memory of a restartable init container
2023-11-01 01:50:45 +01:00
Kubernetes Prow Robot
a5ff0324a9
Merge pull request #120461 from gjkim42/do-not-reuse-device-of-restartable-init-container
Don't reuse the device of a restartable init container
2023-10-31 19:15:53 +01:00
Kubernetes Prow Robot
bfeb3c2621
Merge pull request #119447 from gjkim42/do-not-reuse-cpu-set-of-restartable-init-container
Don't reuse CPU set of a restartable init container
2023-10-31 19:15:26 +01:00
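The three changes above share one rule, sketched here for CPUs under simplifying assumptions. A regular init container runs to completion before the app containers start, so its exclusively assigned resources can be handed on. A restartable init container (an init container with `restartPolicy: Always`, that is, a sidecar) keeps running, so its resources must not be reused. `initContainer` and `reusableCPUs` are hypothetical and ignore details such as container ordering.

```go
package main

import "fmt"

// initContainer is a hypothetical summary of an init container.
type initContainer struct {
	Name        string
	Restartable bool  // restartPolicy: Always (sidecar-style)
	CPUs        []int // exclusively assigned CPU IDs
}

// reusableCPUs returns the CPUs that later containers in the same pod may
// reuse: only those of non-restartable init containers, which have exited.
// Restartable init containers keep running, so their CPUs are skipped.
func reusableCPUs(inits []initContainer) []int {
	var cpus []int
	for _, c := range inits {
		if c.Restartable {
			continue
		}
		cpus = append(cpus, c.CPUs...)
	}
	return cpus
}

func main() {
	inits := []initContainer{
		{Name: "setup", CPUs: []int{2, 3}},
		{Name: "proxy-sidecar", Restartable: true, CPUs: []int{4}},
	}
	fmt.Println(reusableCPUs(inits)) // [2 3]
}
```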
Kubernetes Prow Robot
191abe34b8
Merge pull request #120550 from adrianchiris/fix-dra-node-reboot
DRA: call plugins for claims even if they exist in the cache
2023-10-26 10:26:59 +02:00
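This sketch shows the behavior named in this merge, using a hypothetical `driver` interface and cache. The prepare call goes to the plugin even on a cache hit. A driver that lost its state (the branch name points at node reboots) then gets a chance to prepare the claim again. This only works if the plugin's prepare call is idempotent. `manager`, `prepareClaim`, and `fakeDriver` are hypothetical.

```go
package main

import "fmt"

// driver is a hypothetical stand-in for a DRA kubelet plugin.
type driver interface {
	Prepare(claimUID string) ([]string, error) // returns device identifiers
}

type manager struct {
	cache  map[string][]string // claim UID -> devices, e.g. restored from a checkpoint
	plugin driver
}

// prepareClaim always calls the plugin, even on a cache hit, so a driver that
// lost its state can prepare the claim again. It then refreshes the cache.
func (m *manager) prepareClaim(uid string) ([]string, error) {
	devices, err := m.plugin.Prepare(uid)
	if err != nil {
		return nil, err
	}
	m.cache[uid] = devices
	return devices, nil
}

// fakeDriver counts calls so the behavior can be observed.
type fakeDriver struct{ calls int }

func (f *fakeDriver) Prepare(uid string) ([]string, error) {
	f.calls++
	return []string{"gpu-for-" + uid}, nil
}

func main() {
	fd := &fakeDriver{}
	m := &manager{cache: map[string][]string{"uid-1": {"gpu-for-uid-1"}}, plugin: fd}
	devs, _ := m.prepareClaim("uid-1") // cached, but the plugin is still called
	fmt.Println(devs, fd.calls)        // [gpu-for-uid-1] 1
}
```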
adrianc
3738111337
Add unit tests
adjust existing tests and add new test flows
to cover new DRA manager behaviour
Signed-off-by: adrianc <adrianc@nvidia.com>
2023-10-25 13:20:22 +03:00