kubernetes

Author	SHA1	Message	Date
Kubernetes Prow Robot	0f7cc6fcaa	Merge pull request #121778 from Tal-or/mm_metrics kubelet: memorymanager: metrics: add metrics about static allocation	2024-02-20 09:41:50 -08:00
Kubernetes Prow Robot	79e11fe563	Merge pull request #122703 from TommyStarK/fix/dra-manager-should-timeout dra: increase timeout in setupFakeDRADriverGRPCServer to prevent tests to flake	2024-02-13 09:33:17 -08:00
Kubernetes Prow Robot	244fbf94fd	Merge pull request #122698 from daniel-hutao/feat-1 Code Cleanup: Redundant String Conversions and Spelling/Grammar Corrections	2024-02-05 16:57:07 -08:00
Daniel Hu	1baf7d4586	Corrected some spelling and grammatical errors Signed-off-by: Daniel Hu <farmer.hutao@outlook.com>	2024-01-27 10:10:25 +08:00
Kubernetes Prow Robot	3da22db11c	Merge pull request #121499 from matte21/add-comments-to-cpu-accumulator Improve understandability of kubelet's cpu accumulator code	2024-01-26 00:56:21 +01:00
Daniel Hu	d652596e42	Remove redundant string conversions in print statements Signed-off-by: Daniel Hu <farmer.hutao@outlook.com>	2024-01-15 09:57:35 +08:00
TommyStarK	6f021e99cf	dra: increase timeout in setupFakeDRADriverGRPCServer to prevent tests to flake. Signed-off-by: TommyStarK <thomasmilox@gmail.com>	2024-01-11 09:20:04 +01:00
Akihiro Suda	2e999fff02	Fix compiling e2e.test on macOS Fix issue 122650 (regression in PR 122552) ``` $ make WHAT=test/e2e/e2e.test +++ [0109 10:06:53] Building go targets for darwin/amd64 k8s.io/kubernetes/test/e2e/e2e.test (test) package k8s.io/kubernetes/test/e2e imports k8s.io/kubernetes/test/e2e/common imports k8s.io/kubernetes/test/e2e/common/node imports k8s.io/kubernetes/pkg/kubelet imports github.com/opencontainers/runc/libcontainer/userns: C source files not allowed when not using cgo or SWIG: userns_maps.c !!! [0109 10:06:54] Call tree: !!! [0109 10:06:54] 1: /Users/suda/gopath/src/k8s.io/kubernetes/hack/lib/golang.sh:948 kube::golang::build_binaries_for_platform(...) !!! [0109 10:06:54] 2: hack/make-rules/build.sh:27 kube::golang::build_binaries(...) !!! [0109 10:06:54] Call tree: !!! [0109 10:06:54] 1: hack/make-rules/build.sh:27 kube::golang::build_binaries(...) !!! [0109 10:06:54] Call tree: !!! [0109 10:06:54] 1: hack/make-rules/build.sh:27 kube::golang::build_binaries(...) make: *** [all] Error 1 ``` Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2024-01-09 10:42:20 +09:00
Kubernetes Prow Robot	c96d7a5b5a	Merge pull request #121774 from charles-chenzz/increase_timeout_in_dra_shouldTimeOut increase timeout in fakeDraDriverGrpcServer to fix flake	2024-01-04 17:59:12 +01:00
weilaaa	eb8f3f194f	use built-in max and min funcs instead of k8s.io/utils/integer funcs	2023-12-15 15:09:11 +08:00
Kubernetes Prow Robot	3c1356bc9b	Merge pull request #119764 from linxiulei/reservedTypo Fix error message for invalid resource reservation	2023-12-13 21:25:42 +01:00
Talor Itzhak	ddd60de3f3	memorymanager:metrics: add metrics As part of the memory manager GA graduation effort, we should add metrics in order to improve observability. The metrics are also mentioned in the PR https://github.com/kubernetes/enhancements/pull/4251 (which was not merged yet) Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-11-12 09:34:55 +02:00
charles-chenzz	abaf7a800d	increase timeout in fakeDraDriverGrpcServer to fix flake in dra/manager_test	2023-11-07 19:38:27 +08:00
Kubernetes Prow Robot	960431407c	Merge pull request #120715 from gjkim42/do-not-reuse-memory-of-restartable-init-containers Don't reuse memory of a restartable init container	2023-11-01 01:50:45 +01:00
Kubernetes Prow Robot	a5ff0324a9	Merge pull request #120461 from gjkim42/do-not-reuse-device-of-restartable-init-container Don't reuse the device of a restartable init container	2023-10-31 19:15:53 +01:00
Kubernetes Prow Robot	bfeb3c2621	Merge pull request #119447 from gjkim42/do-not-reuse-cpu-set-of-restartable-init-container Don't reuse CPU set of a restartable init container	2023-10-31 19:15:26 +01:00
Kubernetes Prow Robot	191abe34b8	Merge pull request #120550 from adrianchiris/fix-dra-node-reboot DRA: call plugins for claims even if exist in cache	2023-10-26 10:26:59 +02:00
adrianc	3738111337	Add unit tests adjust existing tests and add new test flows to cover new DRA manager behaviour Signed-off-by: adrianc <adrianc@nvidia.com>	2023-10-25 13:20:22 +03:00
adrianc	08b942028f	DRA: call plugins for claims even if exist in cache Today, DRA manager does not call plugin NodePrepareResource for claims that it previously successfully handled, that is, if claims are present in cache (checkpoint) even if node rebooted. After node reboots, it is required to call DRA plugin for resource claims so that plugins may prepare them again in case the resources dont persist reboot. To achieve that, once kubelet is started, we call DRA plugins for claims once if a pod sandbox is required to be created during PodSync. Signed-off-by: adrianc <adrianc@nvidia.com>	2023-10-25 13:20:16 +03:00
matte21	4bba73a2bd	Add comments to cpu accumulator and minor renames The cpu accumulator logic (that select CPUs for containers) has some non-obvious code. This commit adds some comments to make that code easier to understand for new contributors. Some minor renames to improve readability are also performed.	2023-10-24 22:49:54 -04:00
Antonio Ojea	8e0be64b8f	remove data race on the devicemanager client plugin Change-Id: I45b85440a792e5ed2f75a344ec1f0332854d8d6d	2023-10-24 21:35:13 +00:00
Shiming Zhang	35f4d29d73	Fix unit test	2023-10-24 11:06:35 +08:00
Kubernetes Prow Robot	76fc18c528	Merge pull request #120099 from TommyStarK/gh_119469 dra: refactoring overall flow of prepare/unprepare resources	2023-10-23 19:51:53 +02:00
TommyStarK	55e3662b72	dra: refactoring overall flow of prepare/unprepare resources Signed-off-by: TommyStarK <thomasmilox@gmail.com>	2023-10-23 15:11:27 +02:00
Kubernetes Prow Robot	f41ede6241	Merge pull request #118534 from swatisehgal/sample-dp-register-by-default node: sample-device-plugin: register to kubelet by default and ensure re-registration to kubelet on kubelet restarts	2023-10-23 13:41:19 +02:00
Kubernetes Prow Robot	a7b8357a55	Merge pull request #118165 from champly/master kubelet: fix comment typo	2023-10-17 23:28:25 +02:00
Swati Sehgal	9a354fc9d0	node: sample-dp: Add retry to handle device plugin restart failure Add retry mechanism to handle cases where after kubelet restarts, the device plugin unix socket(s) were created but not ready to serve yet. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-10-17 12:19:10 +01:00
Swati Sehgal	d0d133298d	node: sample-dp: Use fsnotify for kubelet restart detection Add kubeletSocket file to fsnotify instead of polling and waiting for deletion of device plugin unix socket as a way of detecting kubelet restart. We need to ensure that the device plugin re-registers itself after kubelet restart depending on the configured registration mode (auto-registration or controller registration). Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-10-17 12:19:10 +01:00
Swati Sehgal	211d8cc80a	node: sample-dp: stubRegisterControlFunc for controlling registration If the user specifies the intent to control registration process, we rely on registration triggers (deletion of control file) to prompt registration. This behaviour is expected to be consistent across kubelet restarts and therefore across the watch calls where we watch for changes to the unix socket so we make this part of Stub object instead of a parameter. Co-authored-by: Francesco Romani <fromani@redhat.com> Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-10-17 12:19:10 +01:00
Swati Sehgal	c4c9d61d66	node: sample-dp: Handle re-registration for controlled registrations In case `REGISTER_CONTROL_FILE` is specified, we want to ensure that the registration is triggered by deletion of the control file. This is applicable both when the registration happens for the first time and subsequent ones because of kubelet restarts. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-10-17 12:19:07 +01:00
Swati Sehgal	6714e678d3	node: sample-dp: register by default and re-register on restarts In issue: 115107 we added an environment variable to control the registration of sample device plugin to kubelet. The intent of this patch is to ensure that the default behaviour of the plugin is to register to kubelet (in case no environment variable is specified). In addition to that, we want to ensure that the plugin registers itself not just once. It should re-register itself to kubelet in case of node reboot or kubelet restarts. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-10-17 12:14:09 +01:00
Gunju Kim	d2b803246a	Don't reuse the device allocated to the restartable init container	2023-10-17 18:28:29 +09:00
Kubernetes Prow Robot	c7d270302c	Merge pull request #121059 from matte21/improve_err_message_in_cpu_assignments Improve error message in Kubelet CPU assignment logic	2023-10-16 16:48:54 +02:00
Kubernetes Prow Robot	0de29e1d43	Merge pull request #120911 from gjkim42/devicemanager-remove-deprecated-sets-string pkg/kubelet/cm: Remove deprecated sets.String and sets.Int	2023-10-16 16:48:40 +02:00
matte21	d4a5a085a8	Improve error message in cpu assignment logic Include number of requested and available CPUs in the error message when the assignment of CPUs fails because there are less available CPUs than requested.	2023-10-09 13:31:37 -04:00
Gunju Kim	8b5f30ef09	Don't reuse CPU set of a restartable init container	2023-10-06 22:16:15 +09:00
matte21	a213edae2a	Add package-level godoc to pkg/kubelet/cm Add file doc.go with some rudimentary information to package kubelet/cm. This will make it easier for people approaching the kubelet codebase for the first time to quickly understand what's in the package, since its name is abbreviated and hostile to newcomers.	2023-10-05 14:20:51 -04:00
Gunju Kim	a0610a97b3	pkg/kubelet/cm: Remove deprecated sets.String and sets.Int This removes deprecated sets.String and sets.Int - replace sets.String with sets.Set[string] - replace sets.Int with sets.Set[int] - replace sets.NewString with sets.New[string] - replace sets.NewInt with sets.New[int] - replace sets.(OLD).List with sets.List(NEW)	2023-09-27 22:02:15 +09:00
Kubernetes Prow Robot	f9f00da6bc	Merge pull request #118761 from TommyStarK/gh_113831 move common logic of highestSupportedVersion to util package	2023-09-18 13:59:25 -07:00
TommyStarK	42356bfbb3	move common logic of highestSupportedVersion to util package Signed-off-by: TommyStarK <thomasmilox@gmail.com>	2023-09-18 21:25:29 +02:00
Kubernetes Prow Robot	82bca6304b	Merge pull request #119464 from TommyStarK/dra/cleanup-manager-unit-tests dra: cleanup manager unit tests	2023-09-18 07:08:43 -07:00
Gunju Kim	b4e5b868a8	Don't reuse memory of a restartable init container	2023-09-17 14:49:15 +09:00
Eric Lin	286628b030	Fix error message for invalid resource reservation Signed-off-by: Eric Lin <exlin@google.com>	2023-08-20 12:55:26 +00:00
Kubernetes Prow Robot	19deb04a90	Merge pull request #118619 from TommyStarK/gh_113832 dynamic resource allocation: reuse gRPC connection	2023-08-16 09:32:27 -07:00
charles-chenzz	ba9ce3ab08	fix flaky test on dra TestPrepareResources/should_timeout Co-authored-by: TommyStarK <thomasmilox@gmail.com>	2023-08-03 22:37:54 +08:00
TommyStarK	391c1a3ecc	dra: cleanup manager unit tests Signed-off-by: TommyStarK <thomasmilox@gmail.com>	2023-08-02 23:35:45 +02:00
TommyStarK	60a8bca507	dynamic resource allocation: add unit test to check the reuse of the gRPC connection Signed-off-by: TommyStarK <thomasmilox@gmail.com>	2023-07-20 19:22:25 +02:00
TommyStarK	7ffd3063ce	dynamic resource allocation: reuse gRPC connection Signed-off-by: TommyStarK <thomasmilox@gmail.com>	2023-07-19 10:12:52 +02:00
Kevin Klues	0449cef8fd	Increase timeout for DRA kubelet plugin client The 10 second timeout was too low. Given that the retry loop for the kubelet itself is 90s, increasing the timeout to half of this seems reasonable. Ideally we would pull in the variable that sets the retry timeout to 90s and then just set our local timeout to half of that. Unfortunately, this is not exported, so we settle (for now) with just explicitly setting it to 45s. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2023-07-18 22:45:01 +01:00
Ed Bartosh	0ec99fb0b2	Kubelet DRA: fix failing test cases	2023-07-18 19:06:33 +03:00

1346 Commits