kubernetes

Author	SHA1	Message	Date
Kubernetes Prow Robot	76c0573ff4	Merge pull request #105181 from alculquicondor/revert Revert #104739	2021-09-21 16:54:00 -07:00
Aldo Culquicondor	7868fbbe64	Revert "Add metric job_pod_finished" This reverts commit `a0e7a567c5`.	2021-09-21 15:16:54 -04:00
Aldo Culquicondor	8bcb780808	Revert "Limit number of Pods counted in a single Job sync" This reverts commit `7d9cb88fed`.	2021-09-21 15:16:50 -04:00
Kubernetes Prow Robot	f55101913f	Merge pull request #105098 from Karthik-K-N/fix-error-format Fix incorrect format specifier in test files	2021-09-20 08:56:09 -07:00
Karthik K N	c651d50202	Fix incorrect format specifier in test files	2021-09-17 16:27:53 +05:30
Aldo Culquicondor	a0e7a567c5	Add metric job_pod_finished To count the number of pods that the job controller successfully tracked with the JobTrackingWithFinalizers feature gate.	2021-09-15 11:19:47 -04:00
Aldo Culquicondor	7d9cb88fed	Limit number of Pods counted in a single Job sync This prevents big Jobs from starving smaller ones.	2021-09-10 10:32:04 -04:00
Aldo Culquicondor	23ea5d80d6	Fix Job tracking with finalizers for more than 500 pods When doing partial updates for uncountedTerminatedPods, the controller might have removed UIDs for Pods which still had finalizers. Also make more space by removing UIDs that don't have finalizers at the beginning of the sync.	2021-09-01 16:19:04 -04:00
Aldo Culquicondor	5e1b5ec398	Revert counting deleted pods as failures for Job When JobTrackingWithFinalizers is disabled. To preserve existing behavior. Change-Id: Id1752f96feed322911712fe9e918e91e42eca809	2021-07-14 10:03:20 -04:00
Aldo Culquicondor	2dd2622188	Track Job Pods completion in status Through Job.status.uncountedPodUIDs and a Pod finalizer An annotation marks if a job should be tracked with new behavior A separate work queue is used to remove finalizers from orphan pods. Change-Id: I1862e930257a9d1f7f1b2b0a526ed15bc8c248ad	2021-07-08 17:48:05 +00:00
Adhityaa Chandrasekar	ba708e5fc9	graduate SuspendJob to beta Also adds a label to two existing Job metrics. Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>	2021-06-03 18:48:32 +00:00
Mengxue Zhang	e64e34e029	specify pod name and hostname in indexed job	2021-05-19 15:30:13 +00:00
Kubernetes Prow Robot	548fb43643	Merge pull request #101292 from AliceZhang2016/job_controller_metrics Graduate indexed job to beta	2021-05-07 13:31:44 -07:00
Mengxue Zhang	2d2ee6bc3a	change default feature gate value of IndexedJob	2021-04-30 14:36:15 +00:00
Mengxue Zhang	4cf7e75841	indexed job: remove pods with invalid index	2021-04-19 14:07:07 +00:00
Kubernetes Prow Robot	0172cbf56c	Merge pull request #99963 from alculquicondor/job_complete_active Remove active pods past completions	2021-04-08 17:10:10 -07:00
Aldo Culquicondor	e6c3d7b34d	Only default Job fields when feature gates are enabled Also use pointer for completionMode enum	2021-03-12 20:46:52 +00:00
Aldo Culquicondor	4af432bab3	Remove active pods past completions	2021-03-10 14:55:40 +00:00
Aldo Culquicondor	8ae0ad2b2f	Fix completed indexed job with repeated indexes	2021-03-09 19:22:45 +00:00
Adhityaa Chandrasekar	a0844da8f7	batch: add suspended job Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>	2021-03-08 20:08:21 +00:00
Kubernetes Prow Robot	170c6a9833	Merge pull request #99806 from alculquicondor/job-adoption-unit Merge tests for getPodsForJob	2021-03-06 12:50:29 -08:00
Aldo Culquicondor	f0f9f1d540	Merge tests for getPodsForJob	2021-03-04 21:09:33 +00:00
Aldo Culquicondor	2dd0c73056	Test for removal of invalid and repeated indexes in Indexed Job	2021-03-04 16:39:34 +00:00
Aldo Culquicondor	8812531b8c	Add completion index to Job Pods When .spec.completionMode="Indexed"	2021-03-03 22:45:53 +00:00
Aldo Culquicondor	609116b147	Test failed pod recreation Change-Id: I31a2e667e9d96c385a921e25347ebeb5a8424e62	2021-02-01 13:20:03 -05:00
Aldo Culquicondor	dbf9e3b2d3	Make sync Job test tables more readable And use t.Run to improve debugging experience Change-Id: Ia91adbfe9c419cc640abe0efe287f5b9ab715e87	2021-01-27 16:56:41 -05:00
yodarshafrir1	24010022ef	Number of failed jobs should exceed the backoff limit and not big equal. Remove patch in e2e test of backoff limit due to usage of NumRequeues	2020-08-11 11:06:09 +03:00
yodarshafrir1	ca420ddada	Fix job's backoff limit for restart policy Never, rely on number of failures instead of number of NumRequeues	2020-08-07 14:22:40 +03:00
Kubernetes Prow Robot	b17ddac4df	Merge pull request #78944 from avorima/golint_fix_job Fix golint errors in pkg/controller/job	2020-04-12 21:57:47 -07:00
taesun_lee	79680b5d9b	Fix pkg/controller typos in some error messages, comments etc - applied review results by LuisSanchez - Co-Authored-By: Luis Sanchez <sanchezl@redhat.com> genernal -> general iniital -> initial initalObjects -> initialObjects intentionaly -> intentionally inforer -> informer anotother -> another triger -> trigger mutli -> multi Verifyies -> Verifies valume -> volume unexpect -> unexpected unfulfiled -> unfulfilled implenets -> implements assignement -> assignment expectataions -> expectations nexpected -> unexpected boundSatsified -> boundSatisfied externel -> external calcuates -> calculates workes -> workers unitialized -> uninitialized afater -> after Espected -> Expected nodeMontiorGracePeriod -> NodeMonitorGracePeriod estimateGrracefulTermination -> estimateGracefulTermination secondrary -> secondary ShouldRunDaemonPodOnUnscheduableNode -> ShouldRunDaemonPodOnUnschedulableNode rrror -> error expectatitons -> expectations foud -> found epackage -> package succesfulJobs -> successfulJobs namesapce -> namespace ConfigMapResynce -> ConfigMapResync	2020-02-27 00:15:33 +09:00
David Xia	fabfd950b1	cleanup: fix some log and error capitalizations Part of https://github.com/kubernetes/kubernetes/issues/15863	2019-07-20 18:26:16 -04:00
Mario Valderrama	dbbe68601f	Fix golint errors in pkg/controller/job	2019-06-12 20:09:57 +02:00
Fei Xu	9feb0df370	Add pending status for pastBackoffLimitOnFailure	2019-05-21 09:45:29 +08:00
goodluckbot	53c3e103d1	Fix pastBackoffLimitOnFailure when backoffLimit is zero	2018-10-11 17:29:11 +08:00
Kubernetes Submit Queue	65819a8f92	Merge pull request #63744 from krmayankk/changelog Automatic merge from submit-queue (batch tested with PRs 63580, 63744, 64541, 64502, 64100). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. remove redundant getKey functions from controller tests ```release-note None ```	2018-06-20 01:27:32 -07:00
Maciej Szulik	d80ed537e5	Rate limit only when an actual error happens, not on update conflicts	2018-06-05 22:53:09 +02:00
Mayank Kumar	a1cd3a4bcc	remove redundant getKey functions from tests	2018-05-30 22:15:06 -07:00
David Eads	94e3d94d67	update tests to be specific about the versions they are testing instead of floating	2018-05-01 13:18:41 -04:00
David Eads	a89291a5de	stop duplicating preferred version order	2018-04-26 10:03:36 -04:00
Kubernetes Submit Queue	139309f798	Merge pull request #58972 from soltysh/issue54870 Automatic merge from submit-queue (batch tested with PRs 61962, 58972, 62509, 62606). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Fix job's backoff limit for restart policy OnFailure Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #54870 Release note: ```release-note NONE ``` /assign janetkuo	2018-04-19 16:47:18 -07:00
yue9944882	c9962b9644	fixes failing job back off test	2018-04-12 15:58:09 +08:00
Maciej Szulik	5ff7e977bc	Fix job's backoff limit for restart policy OnFailure	2018-03-19 17:40:29 +01:00
Maciej Szulik	1266252dc2	Backoff only when failed pod shows up	2018-03-14 11:49:13 +01:00
cedric lamoriniere	c6e8bd62ad	Improves backoff policy in JobController issues: https://github.com/kubernetes/kubernetes/issues/56853 Add check if the number of pods succeeded increased since the last check. If yes the backoff delay is cleared. This logic improves the Job backoff policy when parallelism > 1 and few pods's Job failed but others succeed.	2018-02-22 10:24:23 +01:00
Maciej Szulik	f760e00af7	Add job controller test verifying if backoff is reseted on success	2017-12-01 15:14:58 +01:00
Dr. Stefan Schimanski	012b085ac8	pkg/apis/core: mechanical import fixes in dependencies	2017-11-09 12:14:08 +01:00
Dr. Stefan Schimanski	7773a30f67	pkg/api/legacyscheme: fixup imports	2017-10-18 17:23:55 +02:00
cedric lamoriniere	48116da0ec	Improve how JobController use queue for backoff Centralize the key "forget" and "requeue" process in only on method. Change the signature of the syncJob method in order to return the information if it is necessary to forget the backoff delay for a given key.	2017-09-07 17:14:47 +02:00
cedric lamoriniere	1dbef2f113	Job failure policy support in JobController Job failure policy integration in JobController. From the JobSpec.BackoffLimit the JobController will define the backoff duration between Job retry. It use the `workqueue.RateLimitingInterface` to store the number of "retry" as "requeue" and the default Job backoff initial duration is set during the initialization of the `workqueue.RateLimiter`. Since the number of retry for each job is store in a local structure "JobController.queue" if the JobController restarts the number of retries will be lost and the backoff duration will be reset to 0. Add e2e test for Job backoff failure policy	2017-09-03 12:07:12 +02:00
Joel Smith	1889a6ef52	Slow-start batch pod creation of rs, rc, ds, jobs Prevent too-large replicas from generating enormous numbers of events by creating only a few pods at a time, then increasing the batch size when pod creations succeed. Stop creating batches of pods when any pod creation errors are encountered.	2017-09-01 09:23:43 -06:00

55 Commits