kubernetes

Author	SHA1	Message	Date
Albert Sverdlov	a46bab6930	Fix a job quota related deadlock (#119776 ) * Fix a job quota related deadlock In case ResourceQuota is used and sets a max # of jobs, a CronJob may get trapped in a deadlock: 1. Job quota for a namespace is reached. 2. CronJob controller can't create a new job, because quota is reached. 3. Cleanup of jobs owned by a cronjob doesn't happen, because a control loop iteration is finished because of an error to create a job. To fix this we stop early quitting from a control loop iteration when cronjob reconciliation failed and always let old jobs to be cleaned up. * Dont reorder imports * Don't stop requeuing on reconciliation error Previous code only logged the reconciliation error inside jm.sync() and didn't return the reconciliation error to it's invoker processNextWorkItem(). Adding a copy-paste back to avoid this issue. * Remove copy-pasted cleanupFinishedJobs() Now we always call jm.cleanupFinishedJobs() first and then jm.syncCronJob(). We also extract cronJobCopy and updateStatus outside jm.syncCronJob function and pass pointers to them in both jm.syncCronJob and jm.cleanupFinishedJobs to make delayed updates handling more explicit and not dependent on the order in which cleanupFinishedJobs and syncCronJob are invoked. * Return updateStatus bool instead of changing the reference * Explicitly ignore err in tests to fix linter	2023-08-31 08:25:00 -07:00
Kubernetes Prow Robot	f852d7fead	Merge pull request #118653 from pohly/volume-resource-requirements Volume resource requirements	2023-08-21 14:08:05 -07:00
Kubernetes Prow Robot	6cbc5dfac6	Merge pull request #114095 from aimuz/fix-114083 scheduler: Fix field apiVersion is missing from events reported from taint manager	2023-08-21 07:03:23 -07:00
Patrick Ohly	2472291790	api: introduce separate VolumeResourceRequirements struct PVC and containers shared the same ResourceRequirements struct to define their API. When resource claims were added, that struct got extended, which accidentally also changed the PVC API. To avoid such a mistake from happening again, PVC now uses its own VolumeResourceRequirements struct. The `Claims` field gets removed because risk of breaking someone is low: theoretically, YAML files which have a claims field for volumes now get rejected when validating against the OpenAPI. Such files have never made sense and should be fixed. Code that uses the struct definitions needs to be updated.	2023-08-21 15:31:28 +02:00
Kubernetes Prow Robot	df493712e4	Merge pull request #119874 from kannon92/pod-replacement-policy-typos fix typos for pod replacement policy	2023-08-17 11:21:34 -07:00
git-jxj	a5b3a4b738	cleanup: Update deprecated FromInt to FromInt32 (#119858 ) * redo commit * apply suggestions from liggitt * update Parse function based on suggestions	2023-08-16 09:33:01 -07:00
Kubernetes Prow Robot	d5f2420309	Merge pull request #119914 from luohaha3123/job-feature Job: Change job controller methods receiver to pointer	2023-08-15 23:14:05 -07:00
Kubernetes Prow Robot	fa1fc7a9cb	Merge pull request #119904 from tenzen-y/replace-deprecated-workqueue-lib Job: Replace deprecated workqueue function with supported one	2023-08-15 23:13:52 -07:00
Kubernetes Prow Robot	5638fe5f33	Merge pull request #119214 from kaisoz/refactor-controller-utils-test Rewrite the tests to be table driven	2023-08-15 15:17:55 -07:00
Kubernetes Prow Robot	7407f36b4b	Merge pull request #117992 from liggitt/gc-discovery-flutter Fix duplicate GC event handlers getting added if discovery flutters	2023-08-15 15:16:50 -07:00
lhaha	947c9376f6	change struct methods receiver to pointer	2023-08-12 10:21:14 +08:00
Yuki Iwai	6f27733af8	Job: Replace deprecated workqueue library with supported one Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>	2023-08-11 20:35:36 +09:00
kannon92	f73c253acc	fix typos for pod replacement policy	2023-08-09 20:34:48 +00:00
Tomas Tormo	074d5b5329	Rewrite the tests to be table driven	2023-08-03 08:39:46 +00:00
Kubernetes Prow Robot	18f8cb8398	Merge pull request #118644 from alexzielenski/apiserver/policy/namespaceParamRef KEP-3488: Promote ValidatingAdmissionPolicy to Beta	2023-07-21 17:44:08 -07:00
Alexander Zielenski	ef8670c946	refactor: replace usage of v1alpha1 with v1beta1 v1alpha -> v1beta fill in DenyAction where there is no ParameterNotFoundAction	2023-07-21 13:41:24 -07:00
Kubernetes Prow Robot	a30f6b7922	Merge pull request #119506 from mimowo/fix-job-controller-flaky-test Fix the flaky TestJobApiBackoffReset test	2023-07-21 09:30:07 -07:00
Michal Wozniak	dbea279112	Fix the flaky TestJobApiBackoffReset test	2023-07-21 14:45:04 +02:00
kannon92	74fcf3e766	implementation of PodReplacementPolicy kep in the job controller	2023-07-21 00:44:53 +00:00
Michal Wozniak	35d0af9243	Include ignored pods when computing backoff delay for Job pod failures	2023-07-19 17:39:58 +02:00
Kubernetes Prow Robot	88c8bcbb4a	Merge pull request #115952 from pacoxu/cleanup-cronjob cronjob: return immediately when failed to create job for the namespace is terminating	2023-07-18 21:28:02 -07:00
Kubernetes Prow Robot	d1d86dafb7	Merge pull request #118772 from kannon92/terminating-pod-gc KEP-3939: pod gc changes for pod replacement policy kep	2023-07-18 16:46:03 -07:00
Michał Woźniak	a15c27661e	Job controller implementation of backoff limit per index (#118009 )	2023-07-18 13:44:11 -07:00
Daniel Vega-Myhre	7698fe7639	Add StatefulSet pod index as pod label (#119232 ) * add statefulset pod index as pod label * change statefulset pod index label name * check 3 pods * change label variable name	2023-07-17 12:47:10 -07:00
kannon92	e38ab6d367	Add PodGC changes for PodReplacementPolicy	2023-07-16 23:47:04 +00:00
Kubernetes Prow Robot	84a999923f	Merge pull request #119335 from mimowo/use-final-diff-for-job-pod-creation Ensure final diff is used for setting expectations for Job pod creation	2023-07-14 15:20:54 -07:00
Kubernetes Prow Robot	6f3856f953	Merge pull request #118883 from danielvegamyhre/kep-4017-job Add completion index as pod label for indexed jobs	2023-07-14 12:23:50 -07:00
Michal Wozniak	9564bdc39d	Ensure final diff is used for setting expectations for Job pod creation	2023-07-14 19:09:39 +02:00
Kubernetes Prow Robot	5c72df7281	Merge pull request #118953 from mskrocki/escLib Convert EndpointSlice Reconciler to a library in staging.	2023-07-13 17:13:34 -07:00
Kubernetes Prow Robot	be2cfc9697	Merge pull request #118228 from carlory/move-non-graceful-node-shutdown-to-GA move non-graceful node shutdown to GA	2023-07-13 15:47:37 -07:00
Daniel Vega-Myhre	037091284e	fix unit test bug	2023-07-13 22:38:21 +00:00
Kubernetes Prow Robot	bea27f82d3	Merge pull request #118209 from pohly/dra-pre-scheduled-pods dra: pre-scheduled pods	2023-07-13 14:43:37 -07:00
Daniel Vega-Myhre	a1a5f49bb9	remove statefulset label added to wrong branch	2023-07-13 21:07:17 +00:00
Daniel Vega-Myhre	1ae60c0ed1	use job completion index annotation as label	2023-07-13 21:04:37 +00:00
Jiahui Feng	049614f884	ValidatingAdmissionPolicy controller for Type Checking (#117377 ) * [API REVIEW] ValidatingAdmissionPolicyStatucController config. worker count. * ValidatingAdmissionPolicyStatus controller. * remove CEL typechecking from API server. * fix initializer tests. * remove type checking integration tests from API server integration tests. * validatingadmissionpolicy-status options. * grant access to VAP controller. * add defaulting unit test. * generated: ./hack/update-codegen.sh * add OWNERS for VAP status controller. * type checking test case.	2023-07-13 13:41:50 -07:00
Patrick Ohly	80ab8f0542	dra: handle scheduled pods in kube-controller-manager When someone decides that a Pod should definitely run on a specific node, they can create the Pod with spec.nodeName already set. Some custom scheduler might do that. Then kubelet starts to check the pod and (if DRA is enabled) will refuse to run it, either because the claims are still waiting for the first consumer or the pod wasn't added to reservedFor. Both are things the scheduler normally does. Also, if a pod got scheduled while the DRA feature was off in the kube-scheduler, a pod can reach the same state. The resource claim controller can handle these two cases by taking over for the kube-scheduler when nodeName is set. Triggering an allocation is simpler than in the scheduler because all it takes is creating the right PodSchedulingContext with spec.selectedNode set. There's no need to list nodes because that choice was already made, permanently. Adding the pod to reservedFor also isn't hard. What's currently missing is triggering de-allocation of claims to re-allocate them for the desired node. This is not important for claims that get created for the pod from a template and then only get used once, but it might be worthwhile to add de-allocation in the future.	2023-07-13 21:27:11 +02:00
Patrick Ohly	cffbb1f1b2	dra controller: enhance testing The allocation mode is relevant when clearing the reservedFor: for delayed allocation, deallocation gets requested, for immediate allocation not. Both should get tested. All pre-defined claims now use delayed allocation, just as they would if created normally.	2023-07-13 21:27:11 +02:00
Patrick Ohly	5cec6d798c	dra: revamp event handlers in kube-controller-manager Enabling logging is useful to track what the code is doing. There are some functional changes: - The pod handler checks for existence of claims. This avoids adding pods to the work queue in more cases when nothing needs to be done, at the cost of making the event handlers a bit slower. This will become more important when adding more work to the controller - The handler for deleted ResourceClaim did not check for cache.DeletedFinalStateUnknown.	2023-07-13 21:27:11 +02:00
Kubernetes Prow Robot	4fa97eae1b	Merge pull request #119291 from mimowo/use-jobctx-for-first-pending Pass Job context down to firstPendingIndexes	2023-07-13 09:57:05 -07:00
Michal Wozniak	7e3b53042b	Pass Job context down to firstPendingIndexes	2023-07-13 16:11:06 +02:00
Kubernetes Prow Robot	cd9215915b	Merge pull request #118480 from carlory/gc_metrics podgc metrics should count all pod deletion behaviors	2023-07-13 06:52:05 -07:00
Jordan Liggitt	12a874d227	Preserve resourcequota informers for groups with discovery resolution errors only	2023-07-12 12:29:33 -04:00
Jordan Liggitt	c9a084d59c	Fix duplicate GC event handlers getting added if discovery flutters	2023-07-12 12:29:31 -04:00
Patrick Ohly	98ba89d31d	resourceclaim controller: avoid caching deleted pod unnecessarily We don't need to remember that a pod got deleted when it had no resource claims because the code which checks the cached UIDs only checks for pods which have resource claims.	2023-07-12 16:57:17 +02:00
Patrick Ohly	7d064812bb	kube-controller-manager: finish conversion to contextual logging This removes all exceptions and fixes the remaining unconverted log calls.	2023-07-12 14:57:29 +02:00
Patrick Ohly	1b8ddf6b79	podgc controller: convert to contextual logging	2023-07-12 13:45:10 +02:00
Mengjiao Liu	19869478c1	Migrate /pkg/controller/disruption to structured and contextual logging	2023-07-12 11:30:45 +08:00
Maciej Skrocki	7c873327b6	Convert controller name to reconciler variable.	2023-07-11 18:08:25 +00:00
Maciej Skrocki	29fad383da	move endpointslice reconciler to staging endpointslice repo	2023-07-11 18:08:12 +00:00
Kubernetes Prow Robot	a6890b361d	Merge pull request #119193 from mimowo/sync-job-context Introduce syncJobContext to limit the number of function parameters	2023-07-11 10:33:30 -07:00

1 2 3 4 5 ...

6272 Commits