kubernetes/pkg/controller
Nikhita Raghunath fd8d92a29d pkg/controller/job: re-honor exponential backoff
This commit makes the job controller re-honor exponential backoff for
failed pods. Before this commit, the controller recreated failed pods
without any backoff. This is a regression: the controller previously
recreated pods with an exponential backoff delay (10s, 20s, 40s, ...).

The issue occurs only when the JobTrackingWithFinalizers feature is
enabled (it is currently enabled by default). With this feature, we
get an extra pod update event when the finalizer of a failed pod is
removed.

Note that the pod failure detection and new pod creation happen in the
same reconcile loop, so the 2nd pod is created immediately after the 1st
pod fails. The backoff is only applied on the 2nd pod failure, which
means that the 3rd pod is created 10s after the 2nd pod, the 4th pod is
created 20s after the 3rd pod, and so on.

This commit fixes a few bugs:

1. Right now, each time `uncounted != nil` and the job does not see a
_new_ failure, `forget` is set to true and the job is removed from the
queue. This condition is therefore also triggered each time the
finalizer for a failed pod is removed, at which point `NumRequeues` is
reset, which results in a backoff of 0s.

2. Updates `updatePod` to only apply backoff when we see a particular
pod failed for the first time. This is necessary to ensure that the
controller does not apply backoff when it sees a pod update event
for finalizer removal of a failed pod.

3. If `JobsReadyPods` feature is enabled and backoff is 0s, the job is
now enqueued after `podUpdateBatchPeriod` seconds, instead of 0s. The
unit test for this check also had a few bugs:
    - `DefaultJobBackOff` is overwritten to 0 in certain unit tests,
    which meant that `DefaultJobBackOff` was considered to be 0,
    effectively not running any meaningful checks.
    - `JobsReadyPods` was not enabled for test cases that required
    the feature gate to be enabled.
    - The check for expected and actual backoff had incorrect
    calculations.
2023-01-12 20:34:10 +05:30
apis/config refactor: remove deprecated flags 2022-04-22 20:28:12 +08:00
bootstrap remove rate limiter metric as it is not in use 2022-10-13 13:07:11 -07:00
certificates kubelet: add key encipherment usage only if it is rsa key 2022-12-27 16:04:25 +08:00
clusterroleaggregation Lock ServerSideApply feature to true 2022-09-27 13:48:28 +02:00
cronjob Fix indentation/spacing in comments to render correctly in godoc 2022-12-17 23:27:38 -05:00
daemon Update daemonSet status even if syncDaemonSet fails 2022-12-10 11:45:56 +09:00
deployment Fix indentation/spacing in comments to render correctly in godoc 2022-12-17 23:27:38 -05:00
disruption Fix clearing rate limiter in disruption controller 2023-01-03 15:06:06 +01:00
endpoint Merge pull request #111178 from lucming/cleanup 2022-12-16 19:17:52 -08:00
endpointslice Merge pull request #111178 from lucming/cleanup 2022-12-16 19:17:52 -08:00
endpointslicemirroring endpointslicemirroring handle endpoints with multiple subsets 2022-12-10 11:44:10 +00:00
garbagecollector pkg/controller: Replace deprecated func usage from the k8s.io/utils/pointer pkg 2022-11-23 17:40:23 +02:00
history convert int32 to pointer using library function 2022-07-01 14:58:26 +08:00
job pkg/controller/job: re-honor exponential backoff 2023-01-12 20:34:10 +05:30
namespace remove rate limiter metric as it is not in use 2022-10-13 13:07:11 -07:00
nodeipam add metric for max no. of CIDRs that can be allocated from MultiCIDRSet 2022-12-05 15:18:45 +00:00
nodelifecycle pkg/controller: Replace deprecated func usage from the k8s.io/utils/pointer pkg 2022-11-23 17:40:23 +02:00
podautoscaler spelling mistake rectified 2022-12-29 17:55:17 +00:00
podgc Enable the feature into beta 2022-11-09 09:02:40 +01:00
replicaset Merge pull request #110747 from harshanarayana/cleanup/GIT-110737/logging-improvements 2022-11-03 00:49:34 -07:00
replication Enable propagration of HasSynced 2022-12-14 18:43:33 +00:00
resourceclaim kube-controller-manager: add ResourceClaim controller 2022-11-10 20:23:50 +01:00
resourcequota quota: add an update filter 2022-07-08 18:39:55 -04:00
serviceaccount lock LegacyServiceAccountTokenNoAutoGeneration 2022-12-16 10:45:35 -08:00
statefulset Merge pull request #114870 from mattcary/mutation 2023-01-05 23:16:09 -08:00
storageversiongc pkg/controller/storageversiongc: add constructor function newKubeApiserverLease 2022-11-09 15:52:47 -05:00
testutil Wait for Pods to finish before considering Failed in Job (#113860) 2022-11-15 09:44:53 -08:00
ttl Reduce number of buckets in ttl controller for 2k+ nodes clusters 2022-05-05 12:26:36 +00:00
ttlafterfinished pkg/controller: Replace deprecated func usage from the k8s.io/utils/pointer pkg 2022-11-23 17:40:23 +02:00
util endpoints: remove obsolete ServiceSelectorCache 2022-12-12 08:00:48 -08:00
volume Fix indentation/spacing in comments to render correctly in godoc 2022-12-17 23:27:38 -05:00
controller_ref_manager_test.go Merge pull request #101250 from evertrain/master 2021-11-10 09:19:26 -08:00
controller_ref_manager.go Fix indentation/spacing in comments to render correctly in godoc 2022-12-17 23:27:38 -05:00
controller_utils_test.go NodeLifecycleController: Remove race condition 2022-10-24 19:36:58 +00:00
controller_utils.go Merge pull request #111683 from lucming/code-cleanup5 2022-12-09 15:42:21 -08:00
doc.go
lookup_cache.go
OWNERS add myself as approver to pkg/controller 2022-01-12 19:33:02 -05:00