kubernetes/pkg/controller
Albert Sverdlov a46bab6930
Fix a job quota related deadlock (#119776)
* Fix a job quota related deadlock

When a ResourceQuota caps the number of jobs in a namespace, a CronJob
may get trapped in a deadlock:
  1. Job quota for a namespace is reached.
  2. CronJob controller can't create a new job, because quota is
     reached.
  3. Cleanup of jobs owned by the cronjob doesn't happen, because the
     control loop iteration ends early on the failure to create a
     job.

To fix this, we no longer exit a control loop iteration early when
cronjob reconciliation fails, so old jobs always get cleaned up.
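
The deadlock and the fix described above can be illustrated with a small,
self-contained toy model. This is only a sketch under stated assumptions, not
the controller's actual code: the namespace type, errQuotaExceeded, syncOld
and syncFixed are hypothetical names, and a created job is treated as
finished immediately to keep the model short.

```go
package main

import (
	"errors"
	"fmt"
)

// errQuotaExceeded stands in for the API server rejecting a job creation
// because the namespace quota is exhausted (hypothetical name).
var errQuotaExceeded = errors.New("exceeded quota for jobs")

// namespace is a toy model of a namespace with a job quota.
type namespace struct {
	maxJobs  int // quota on the number of stored jobs
	finished int // finished jobs still stored; they count against the quota
}

// createJob fails once the quota is reached.
func (n *namespace) createJob() error {
	if n.finished >= n.maxJobs {
		return errQuotaExceeded
	}
	n.finished++ // simplification: the new job finishes immediately
	return nil
}

// cleanupFinishedJobs trims finished jobs down to the history limit.
func (n *namespace) cleanupFinishedJobs(historyLimit int) {
	if n.finished > historyLimit {
		n.finished = historyLimit
	}
}

// syncOld models the old behavior: an error from job creation ends the
// iteration, so cleanup never runs and the quota is never freed.
func syncOld(n *namespace, historyLimit int) error {
	if err := n.createJob(); err != nil {
		return err
	}
	n.cleanupFinishedJobs(historyLimit)
	return nil
}

// syncFixed models the fix: cleanup runs regardless of the creation result,
// and the error is still returned so the caller can requeue.
func syncFixed(n *namespace, historyLimit int) error {
	err := n.createJob()
	n.cleanupFinishedJobs(historyLimit)
	return err
}

func main() {
	const historyLimit = 1
	old := &namespace{maxJobs: 3, finished: 3}
	fixed := &namespace{maxJobs: 3, finished: 3}
	for i := 1; i <= 2; i++ {
		fmt.Printf("iteration %d: old err=%v, fixed err=%v\n",
			i, syncOld(old, historyLimit), syncFixed(fixed, historyLimit))
	}
	fmt.Printf("finished jobs left: old=%d, fixed=%d\n", old.finished, fixed.finished)
}
```

In this model the old loop fails on every iteration, while the fixed loop
fails once, frees quota through cleanup, and succeeds on the next iteration.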

* Don't reorder imports

* Don't stop requeuing on reconciliation error

The previous code only logged the reconciliation error inside jm.sync()
and didn't return it to its caller, processNextWorkItem().

Add the copy-pasted cleanup call back to avoid this issue.

* Remove copy-pasted cleanupFinishedJobs()

Now we always call jm.cleanupFinishedJobs() first and then
jm.syncCronJob().

We also move cronJobCopy and updateStatus out of the jm.syncCronJob
function and pass pointers to them to both jm.syncCronJob and
jm.cleanupFinishedJobs. This makes the handling of delayed updates more
explicit and independent of the order in which cleanupFinishedJobs and
syncCronJob are invoked.

* Return updateStatus bool instead of changing the reference
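
A minimal sketch of the resulting shape, assuming simplified signatures: the
caller owns the copy, runs cleanup first and reconciliation second, combines
the returned updateStatus flags, and still returns the reconciliation error
for requeuing. The cronJob type, its status fields, syncOne and the
create/writeStatus callbacks are hypothetical and do not mirror the
controller's real signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// errQuota is a stand-in for a quota rejection (hypothetical).
var errQuota = errors.New("exceeded quota for jobs")

// cronJobStatus holds toy status fields (hypothetical).
type cronJobStatus struct {
	JobsRemoved int
	Scheduled   int
}

// cronJob contains only value fields, so assignment makes a full copy.
type cronJob struct {
	HistoryLimit int
	Status       cronJobStatus
}

// cleanupFinishedJobs drops the oldest finished jobs beyond the history
// limit and reports whether it changed the status of cj.
func cleanupFinishedJobs(cj *cronJob, finished *[]string) bool {
	extra := len(*finished) - cj.HistoryLimit
	if extra <= 0 {
		return false
	}
	*finished = (*finished)[extra:]
	cj.Status.JobsRemoved += extra
	return true
}

// syncCronJob tries to create the next job and reports whether it changed
// the status of cj, together with any creation error.
func syncCronJob(cj *cronJob, create func() error) (bool, error) {
	if err := create(); err != nil {
		return false, err
	}
	cj.Status.Scheduled++
	return true, nil
}

// syncOne works on one shared copy, writes status at most once, and returns
// the reconciliation error so that the work item can be requeued.
func syncOne(orig cronJob, finished *[]string, create func() error, writeStatus func(cronJob)) error {
	cjCopy := orig
	updatedByCleanup := cleanupFinishedJobs(&cjCopy, finished)
	updatedBySync, syncErr := syncCronJob(&cjCopy, create)
	if updatedByCleanup || updatedBySync {
		writeStatus(cjCopy)
	}
	return syncErr
}

func main() {
	finished := []string{"job-1", "job-2", "job-3"}
	err := syncOne(
		cronJob{HistoryLimit: 1},
		&finished,
		func() error { return errQuota }, // creation is rejected by the quota
		func(c cronJob) { fmt.Printf("status written: %+v\n", c.Status) },
	)
	fmt.Printf("remaining finished jobs: %v, err: %v\n", finished, err)
}
```

Returning the flag keeps each function free of shared mutable state beyond the
copy itself, so the caller decides in one place whether a status write is
needed.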

* Explicitly ignore err in tests to fix linter
2023-08-31 08:25:00 -07:00
..
apis/config ValidatingAdmissionPolicy controller for Type Checking (#117377) 2023-07-13 13:41:50 -07:00
bootstrap replace spew methods with dump methods 2023-04-14 08:05:53 +08:00
certificates Merge pull request #113994 from mengjiao-liu/contextual-logging-controller-certificates 2023-06-21 09:03:42 -07:00
clusterroleaggregation Replace uses of diff.ObjectDiff with cmp.Diff 2023-04-12 08:46:12 -07:00
cronjob Fix a job quota related deadlock (#119776) 2023-08-31 08:25:00 -07:00
daemon cleanup: Update deprecated FromInt to FromInt32 (#119858) 2023-08-16 09:33:01 -07:00
deployment cleanup: Update deprecated FromInt to FromInt32 (#119858) 2023-08-16 09:33:01 -07:00
disruption Migrate /pkg/controller/disruption to structured and contextual logging 2023-07-12 11:30:45 +08:00
endpoint move endpointslice reconciler to staging endpointslice repo 2023-07-11 18:08:12 +00:00
endpointslice Convert controller name to reconciler variable. 2023-07-11 18:08:25 +00:00
endpointslicemirroring Merge pull request #118953 from mskrocki/escLib 2023-07-13 17:13:34 -07:00
garbagecollector Fix duplicate GC event handlers getting added if discovery flutters 2023-07-12 12:29:31 -04:00
history api: introduce separate VolumeResourceRequirements struct 2023-08-21 15:31:28 +02:00
job Merge pull request #119874 from kannon92/pod-replacement-policy-typos 2023-08-17 11:21:34 -07:00
namespace namespace controller: use contextual logging 2023-03-13 14:59:17 +08:00
nodeipam Migrated pkg/controller/volume|util|replicaset|nodeipam to contextual logging 2023-07-06 07:39:52 +08:00
nodelifecycle Merge pull request #114095 from aimuz/fix-114083 2023-08-21 07:03:23 -07:00
podautoscaler Merge pull request #118173 from huiwq1990/feat-autoscale-variable 2023-07-02 23:00:50 -07:00
podgc Add PodGC changes for PodReplacementPolicy 2023-07-16 23:47:04 +00:00
replicaset kube-controller-manager: finish conversion to contextual logging 2023-07-12 14:57:29 +02:00
replication Migrated pkg/controller/volume|util|replicaset|nodeipam to contextual logging 2023-07-06 07:39:52 +08:00
resourceclaim dra: handle scheduled pods in kube-controller-manager 2023-07-13 21:27:11 +02:00
resourcequota Preserve resourcequota informers for groups with discovery resolution errors only 2023-07-12 12:29:33 -04:00
serviceaccount implement LegacyServiceAccountTokenCleanUp alpha 2023-05-24 23:20:17 +00:00
statefulset api: introduce separate VolumeResourceRequirements struct 2023-08-21 15:31:28 +02:00
storageversiongc Merge pull request #113986 from songxiao-wang87/runwxs-test2 2023-03-07 04:19:43 -08:00
testutil Merge pull request #114061 from Octopusjust/k8s-pr15 2023-07-05 08:38:57 -07:00
ttl Making a run test. 2023-01-28 03:14:57 +00:00
ttlafterfinished Make use of k8s.io/utils/pointer.Duration 2023-06-18 21:46:26 +03:00
util move endpointslice reconciler to staging endpointslice repo 2023-07-11 18:08:12 +00:00
validatingadmissionpolicystatus refactor: replace usage of v1alpha1 with v1beta1 2023-07-21 13:41:24 -07:00
volume api: introduce separate VolumeResourceRequirements struct 2023-08-21 15:31:28 +02:00
controller_ref_manager_test.go Merge pull request #101250 from evertrain/master 2021-11-10 09:19:26 -08:00
controller_ref_manager.go kube-controller-manager: finish conversion to contextual logging 2023-07-12 14:57:29 +02:00
controller_utils_test.go Merge pull request #119214 from kaisoz/refactor-controller-utils-test 2023-08-15 15:17:55 -07:00
controller_utils.go implementation of PodReplacementPolicy kep in the job controller 2023-07-21 00:44:53 +00:00
doc.go
OWNERS add myself as approver to pkg/controller 2022-01-12 19:33:02 -05:00