kubernetes

Author	SHA1	Message	Date
Yuki Iwai	a85f587984	Job: Use built-in min function instead of integer package Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>	2023-11-17 14:10:00 +09:00
Dejan Pejchev	88c0a8be1b	feat: add job_pods_creation_total metric	2023-10-24 17:49:04 +02:00
Dejan Zele Pejchev	f8a4e343a1	Fix tracking of terminating Pods when nothing else changes (#121342) * cleanup: refactor pod replacement policy integration test into staged assertion * cleanup: remove typo in job_test.go * refactor PodReplacementPolicy test and remove test for defaulting the policy * fix issue with missing update in job controller for terminating status and refactor pod replacement policy integration test * use t.Cleanup instead of defer in PodReplacementPolicy integration tests * revert t.Cleanup to defer for reseting feature flag in PodReplacementPolicy integration tests	2023-10-24 15:04:46 +02:00
Kubernetes Prow Robot	8149ab3f3f	Merge pull request #121356 from mimowo/backoff-limit-per-index-beta Graduate BackoffLimitPerIndex to Beta	2023-10-23 18:39:58 +02:00
Michal Wozniak	b0d04d933b	Introduce the job_finished_indexes_total metric	2023-10-20 15:19:04 +02:00
Michal Wozniak	6dd0ad5c0f	Graduate BackoffLimitPerIndex to Beta	2023-10-19 12:18:36 +02:00
Kubernetes Prow Robot	6d70013af5	Merge pull request #121147 from kannon92/rm-at-least-no-terminating-count Remove terminating count from rmAtLeast	2023-10-18 00:44:51 +02:00
Kubernetes Prow Robot	27ff547a14	Merge pull request #121011 from kannon92/job-pod-replacement-policy-feature-on-but-api-specified Fix panic when enablement of pod replacement policy is skewed	2023-10-17 21:28:48 +02:00
Yuki Iwai	201c30fba8	Job: Handle error returned from AddEventHandler function (#119917) * Job: Handle error returned from AddEventHandler function Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com> * Use the error message the similar to CronJob Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com> * Clean up error messages Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com> * Put the tesing.T on the second place in the args for the newControllerFromClient function Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com> * Put the testing.T on the second place in the args for the newControllerFromClientWithClock function Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com> * Call t.Helper() Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com> * Put the testing.TB on the second place in the args for the createJobControllerWithSharedInformers function and call tb.Helper() there Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com> * Put the testing.TB on the second place in the args for the startJobControllerAndWaitForCaches function and call tb.Helper() there Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com> * Adapt TestFinializerCleanup to the eventhandler error Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com> --------- Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>	2023-10-17 21:28:34 +02:00
Kevin Hannon	7a1ac18bc8	Fix panic if there are more terminating pods than active pods Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>	2023-10-17 14:50:38 -04:00
Kevin Hannon	d7ee6b9d1b	fix possible panic if pod replacement policy is turned on and jobs do not set pod replacement policy	2023-10-11 08:37:50 -04:00
Kevin Hannon	b96a074bcd	convert pointer to ptr for job controller	2023-10-05 09:30:01 -04:00
Kevin Hannon	a62eb45ae2	Rename job reasons to JobReasons as part of api review	2023-09-19 13:10:22 -04:00
Kevin Hannon	c6e9fba79b	move reasons to api package for job controller	2023-09-14 13:24:29 -04:00
Sharpz7	43fc6b5bdb	Added suggests changes	2023-09-06 03:05:14 +00:00
Sharpz7	e9be1d7438	Test now has coverage!	2023-08-27 05:06:53 +00:00
Sharpz7	cf32ae9453	Initial Commit	2023-08-25 10:35:58 +00:00
Sharpz7	297f04b74a	Added function to remove finalizers as backup	2023-08-25 10:35:57 +00:00
Kubernetes Prow Robot	df493712e4	Merge pull request #119874 from kannon92/pod-replacement-policy-typos fix typos for pod replacement policy	2023-08-17 11:21:34 -07:00
Kubernetes Prow Robot	d5f2420309	Merge pull request #119914 from luohaha3123/job-feature Job: Change job controller methods receiver to pointer	2023-08-15 23:14:05 -07:00
lhaha	947c9376f6	change struct methods receiver to pointer	2023-08-12 10:21:14 +08:00
Yuki Iwai	6f27733af8	Job: Replace deprecated workqueue library with supported one Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>	2023-08-11 20:35:36 +09:00
kannon92	f73c253acc	fix typos for pod replacement policy	2023-08-09 20:34:48 +00:00
kannon92	74fcf3e766	implementation of PodReplacementPolicy kep in the job controller	2023-07-21 00:44:53 +00:00
Michal Wozniak	35d0af9243	Include ignored pods when computing backoff delay for Job pod failures	2023-07-19 17:39:58 +02:00
Michał Woźniak	a15c27661e	Job controller implementation of backoff limit per index (#118009)	2023-07-18 13:44:11 -07:00
Kubernetes Prow Robot	84a999923f	Merge pull request #119335 from mimowo/use-final-diff-for-job-pod-creation Ensure final diff is used for setting expectations for Job pod creation	2023-07-14 15:20:54 -07:00
Kubernetes Prow Robot	6f3856f953	Merge pull request #118883 from danielvegamyhre/kep-4017-job Add completion index as pod label for indexed jobs	2023-07-14 12:23:50 -07:00
Michal Wozniak	9564bdc39d	Ensure final diff is used for setting expectations for Job pod creation	2023-07-14 19:09:39 +02:00
Michal Wozniak	7e3b53042b	Pass Job context down to firstPendingIndexes	2023-07-13 16:11:06 +02:00
Patrick Ohly	7d064812bb	kube-controller-manager: finish conversion to contextual logging This removes all exceptions and fixes the remaining unconverted log calls.	2023-07-12 14:57:29 +02:00
Michal Wozniak	bf48165232	Remarks to syncJobCtx	2023-07-11 09:44:08 +02:00
Michal Wozniak	990339d4c3	Introduce syncJobContext to limit the number of function parameters	2023-07-11 09:27:21 +02:00
Aldo Culquicondor	f7a1fb76f4	Only declare job as finished after removing all finalizers Change-Id: Id4b01b0e6fabe24134e57e687356e0fc613cead4	2023-07-07 14:08:19 -04:00
kannon92	921b7e6e8f	remove equalReady and replace with k8 util function	2023-07-05 20:11:48 +00:00
Daniel Vega-Myhre	a9afaa1eee	add feature gate	2023-06-27 18:07:17 +00:00
Daniel Vega-Myhre	2176053415	add completion index as pod label	2023-06-26 19:53:14 +00:00
Michal Wozniak	8ed23558b4	Do not set jm.syncJobBatchPeriod=0 if not needed	2023-06-22 11:10:53 +02:00
Michal Wozniak	784a309b91	Do not error in Job controller sync when there are pod failures	2023-06-20 11:31:24 +02:00
Michal Wozniak	74c5ff97f1	Lower the constants for the rate limiter in Job controller	2023-06-16 17:00:04 +02:00
Michal Wozniak	c51a422d78	Cleanup job controller handling of backoff	2023-06-16 14:53:27 +02:00
Ziqi Zhao	7bc449d7e0	add contextual logging to job-controller Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>	2023-06-14 13:40:02 +08:00
Michal Wozniak	2f6b1d3c0f	Ensure Job sync invocations are batched by 1s periods	2023-06-07 17:32:46 +02:00
Michal Wozniak	70d3bb43e5	Adjust the algorithm for computing the pod finish time Change-Id: Ic282a57169cab8dc498574f08b081914218a1039	2023-06-05 10:06:56 +02:00
Michal Wozniak	0fe27a06f9	Cleanup the Job controller handling of terminating pods	2023-05-19 09:52:08 +02:00
Yuki Iwai	e4340f0d9b	Job: Use generic Set in controller Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>	2023-05-08 15:02:23 +09:00
Sathyanarayanan Saravanamuthu	c84c8add70	Decouple batch/job back-off logic from workqueues (#114768) * batch/job: decouple backoff from workqueue Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com> * Resolving review comments * Resolving more review comments * Resolving review comments Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com> * Computing finish time to now when FinishedAt is unix epoch * Addressing review comments Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com> --------- Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>	2023-03-16 10:15:21 -07:00
kannon92	32ac4a9581	left over uncounted from tracking cleanup	2023-02-22 16:45:53 +00:00
Daniel Vega-Myhre	2a81337e7c	update prev succeeded indexes for indexed jobs unconditionally	2023-01-31 19:15:53 +00:00
Nikhita Raghunath	fd8d92a29d	pkg/controller/job: re-honor exponential backoff This commit makes the job controller re-honor exponential backoff for failed pods. Before this commit, the controller created pods without any backoff. This is a regression because the controller used to create pods with an exponential backoff delay before (10s, 20s, 40s ...). The issue occurs only when the JobTrackingWithFinalizers feature is enabled (which is enabled by default right now). With this feature, we get an extra pod update event when the finalizer of a failed pod is removed. Note that the pod failure detection and new pod creation happen in the same reconcile loop so the 2nd pod is created immediately after the 1st pod fails. The backoff is only applied on 2nd pod failure, which means that the 3rd pod created 10s after the 2nd pod, 4th pod is created 20s after the 3rd pod and so on. This commit fixes a few bugs: 1. Right now, each time `uncounted != nil` and the job does not see a _new_ failure, `forget` is set to true and the job is removed from the queue. Which means that this condition is also triggered each time the finalizer for a failed pod is removed and `NumRequeues` is reset, which results in a backoff of 0s. 2. Updates `updatePod` to only apply backoff when we see a particular pod failed for the first time. This is necessary to ensure that the controller does not apply backoff when it sees a pod update event for finalizer removal of a failed pod. 3. If `JobsReadyPods` feature is enabled and backoff is 0s, the job is now enqueued after `podUpdateBatchPeriod` seconds, instead of 0s. The unit test for this check also had a few bugs: - `DefaultJobBackOff` is overwritten to 0 in certain unit tests, which meant that `DefaultJobBackOff` was considered to be 0, effectively not running any meaningful checks. - `JobsReadyPods` was not enabled for test cases that ran tests which required the feature gate to be enabled. - The check for expected and actual backoff had incorrect calculations.	2023-01-12 20:34:10 +05:30


166 Commits