* Wait for Pods to finish before considering Failed
Limit the behavior to the PodDisruptionConditions and JobPodFailurePolicy
feature gates and to Jobs with a podFailurePolicy.
Change-Id: I926391cc2521b389c8e52962afb0d4a6a845ab8f
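A minimal sketch of the gating check, assuming the in-tree feature gate helpers (the function name is illustrative):

```go
package job

import (
	batch "k8s.io/api/batch/v1"
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/kubernetes/pkg/features"
)

// podFailurePolicyInUse reports whether the new behavior applies to this Job:
// both feature gates must be enabled and the Job must define a podFailurePolicy.
func podFailurePolicyInUse(job *batch.Job) bool {
	return utilfeature.DefaultFeatureGate.Enabled(features.PodDisruptionConditions) &&
		utilfeature.DefaultFeatureGate.Enabled(features.JobPodFailurePolicy) &&
		job.Spec.PodFailurePolicy != nil
}
```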
* Remove check for unscheduled terminating pod
Change-Id: I3dc05bb4ea3738604f01bf8cb5fc8cc0f6ea54ec
The reason for the issue is that the metrics were bumped before the
final Job status update. If the update failed, the path was repeated
by the next syncJob, leading to double-counting of the metrics.
The solution is to delay recording metrics and broadcasting events
until after the Job status update succeeds.
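A minimal sketch of the ordering (the type and both fields are stand-ins, not the controller's real names):

```go
package job

import batch "k8s.io/api/batch/v1"

// jobFinisher is a simplified stand-in for the Job controller.
type jobFinisher struct {
	updateStatus   func(*batch.Job) error // persists Job.status
	recordFinished func(*batch.Job)       // bumps metrics and emits events
}

// finishJob persists the terminal status first; metrics and events are
// recorded only after the update succeeds. If the update fails, the next
// syncJob repeats this path and nothing has been counted twice.
func (f *jobFinisher) finishJob(job *batch.Job) error {
	if err := f.updateStatus(job); err != nil {
		return err // retried through the sync backoff
	}
	f.recordFinished(job)
	return nil
}
```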
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Update a pre-existing failed condition in case its status is False or Unknown.
If the status of the pre-existing condition is True, ignore the new condition.
If there is no pre-existing failed condition, append the new failed condition
as before.
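A sketch of the intended merge logic over the Job conditions (the helper name is illustrative):

```go
package job

import (
	batch "k8s.io/api/batch/v1"
	v1 "k8s.io/api/core/v1"
)

// mergeFailedCondition returns the condition list with newCond applied:
// a pre-existing Failed condition with status False or Unknown is updated,
// one with status True wins and the new condition is dropped, and if no
// Failed condition exists yet the new one is appended.
func mergeFailedCondition(conds []batch.JobCondition, newCond batch.JobCondition) []batch.JobCondition {
	for i := range conds {
		if conds[i].Type != batch.JobFailed {
			continue
		}
		if conds[i].Status == v1.ConditionTrue {
			return conds // keep the existing True condition, ignore the new one
		}
		conds[i] = newCond // update the False/Unknown condition
		return conds
	}
	return append(conds, newCond)
}
```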
Also, make the condition comparisons less hacky by ignoring timestamp fields
in tests.
In some rare race conditions, the job controller might create new pods after the job is declared finished.
Change-Id: I8a00429c8845463259cd7f82bb3c241d0011583c
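A sketch of the guard that closes this race: syncJob bails out before creating Pods once a terminal condition is set.

```go
package job

import (
	batch "k8s.io/api/batch/v1"
	v1 "k8s.io/api/core/v1"
)

// isJobFinished reports whether the Job already has a terminal condition;
// syncJob should not create Pods once this returns true.
func isJobFinished(job *batch.Job) bool {
	for _, c := range job.Status.Conditions {
		if (c.Type == batch.JobComplete || c.Type == batch.JobFailed) && c.Status == v1.ConditionTrue {
			return true
		}
	}
	return false
}
```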
- Lock the feature gate to true and schedule it for removal in 1.26 (see the sketch below)
- Remove checks on the feature gate
- Graduate the E2E test to Conformance
Change-Id: I6814819d318edaed5c86dae4055f4b050a4d39fd
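A sketch of what the lock looks like in the feature gate table (the gate name here is hypothetical; only the FeatureSpec shape matters):

```go
package features

import "k8s.io/component-base/featuregate"

// SomeGraduatedFeature is a placeholder name for the gate being graduated.
const SomeGraduatedFeature featuregate.Feature = "SomeGraduatedFeature"

var lockedGates = map[featuregate.Feature]featuregate.FeatureSpec{
	// GA: locked to true; the gate itself can be removed in 1.26.
	SomeGraduatedFeature: {Default: true, PreRelease: featuregate.GA, LockToDefault: true},
}
```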
Add the UIDs of Pods for which we are removing finalizers to an in-memory cache.
The controller removes UIDs from the cache as Pod updates or deletes come in.
This avoids double counting finished Pods when Pod updates arrive after Job status updates.
https://github.com/kubernetes/kubernetes/issues/105200
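A minimal sketch of such a cache (simplified relative to the controller's real expectations structure):

```go
package job

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

// uidTrackingCache remembers the UIDs of Pods whose finalizers were removed
// as part of a Job status update, so that a later Pod update for the same
// Pod is not counted a second time.
type uidTrackingCache struct {
	mu   sync.Mutex
	uids map[types.UID]struct{}
}

func newUIDTrackingCache() *uidTrackingCache {
	return &uidTrackingCache{uids: map[types.UID]struct{}{}}
}

// expectFinalizersRemoved is called after the Job status update that cleared
// the finalizers for these Pods.
func (c *uidTrackingCache) expectFinalizersRemoved(uids ...types.UID) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, uid := range uids {
		c.uids[uid] = struct{}{}
	}
}

// observe is called from the Pod update/delete handlers; it reports whether
// the Pod was already counted and drops it from the cache.
func (c *uidTrackingCache) observe(uid types.UID) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.uids[uid]; ok {
		delete(c.uids, uid)
		return true
	}
	return false
}
```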
Doing a GET right before retrying has two problems:
- It can mask conflicts
- It adds an additional delay
As for retries, we are better off going through the sync backoff.
In the case of a conflict, we know that there was a Job update that would trigger another sync, so there is no need for a rate-limited requeue.
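A sketch of the resulting retry handling (queue handling simplified):

```go
package job

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/util/workqueue"
)

// handleSyncError decides how to retry after a failed status update, without
// doing a fresh GET first.
func handleSyncError(queue workqueue.RateLimitingInterface, key string, err error) {
	if err == nil {
		queue.Forget(key)
		return
	}
	if apierrors.IsConflict(err) {
		// The conflicting Job update already queued another sync; no
		// rate-limited requeue is needed.
		queue.Forget(key)
		return
	}
	// Everything else goes through the regular sync backoff.
	queue.AddRateLimited(key)
}
```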
When doing partial updates for uncountedTerminatedPods, the controller might have removed UIDs for Pods that still had finalizers.
Also, make more space by removing UIDs that no longer have finalizers at the beginning of the sync.
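A sketch of that cleanup step (the helper name is illustrative; the finalizer constant matches the in-tree tracking finalizer):

```go
package job

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

const podTrackingFinalizer = "batch.kubernetes.io/job-tracking"

// uidsStillWithFinalizer keeps only the UIDs from uncounted whose Pods still
// carry the tracking finalizer; the others were fully processed already and
// only take up space in the status.
func uidsStillWithFinalizer(uncounted []types.UID, pods []*v1.Pod) []types.UID {
	hasFinalizer := map[types.UID]bool{}
	for _, p := range pods {
		for _, f := range p.Finalizers {
			if f == podTrackingFinalizer {
				hasFinalizer[p.UID] = true
				break
			}
		}
	}
	kept := uncounted[:0]
	for _, uid := range uncounted {
		if hasFinalizer[uid] {
			kept = append(kept, uid)
		}
	}
	return kept
}
```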
Job completion is tracked through Job.status.uncountedPodUIDs and a Pod finalizer.
An annotation marks whether a Job should be tracked with the new behavior.
A separate work queue is used to remove finalizers from orphan Pods.
Change-Id: I1862e930257a9d1f7f1b2b0a526ed15bc8c248ad
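A sketch of the dedicated queue for orphan Pods (names illustrative):

```go
package job

import "k8s.io/client-go/util/workqueue"

// orphanPods is a simplified stand-in for the controller's dedicated queue;
// its items are keys of Pods whose Job no longer exists but which still carry
// the tracking finalizer.
type orphanPods struct {
	queue workqueue.RateLimitingInterface
}

func newOrphanPods() *orphanPods {
	return &orphanPods{
		queue: workqueue.NewNamedRateLimitingQueue(
			workqueue.DefaultControllerRateLimiter(), "job_orphan_pod"),
	}
}

// enqueue is called from the Pod handlers; a worker later removes the
// tracking finalizer so the orphan Pod can be garbage collected.
func (o *orphanPods) enqueue(podKey string) {
	o.queue.Add(podKey)
}
```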