Michal Wozniak
c803892bd8
Enable the feature into beta
2022-11-09 09:02:40 +01:00
Aldo Culquicondor
4948918155
Graduate JobTrackingWithFinalizers to stable
...
Change-Id: Ifc749a85b1270c0155ac511b91d4681d53236820
2022-11-04 17:05:53 -04:00
Michal Wozniak
bf9ce70de3
Support handling of pod failures with respect to the specified rules
2022-08-04 18:39:08 +02:00
Aldo Culquicondor
ca8cebe5ba
Fix JobTrackingWithFinalizers when a pod succeeds after the job fails
...
Change-Id: I3be351fb3b53216948a37b1d58224f8fbbf22b47
2022-08-02 19:33:06 -04:00
Aldo Culquicondor
b492f49c9f
Do not skip job requeue in conflict error
...
Change-Id: Ie97977887a1cc3de58922d73dce92ae1965965bf
2022-07-08 16:14:32 +00:00
Aldo Culquicondor
62a25920e6
Wait for cache sync in TestSyncPastDeadlineJobFinished
...
Change-Id: I6f023ca6999108f4f86a0f57831d47704cdbb42b
2022-06-24 09:22:59 -04:00
Aldo Culquicondor
817c8bbf59
Increase timeout for TestSyncPastDeadlineJobFinished
...
To mitigate flakiness
Change-Id: I1d0286d16d2b7dd3a605690e9a2d4d2f954701ff
2022-06-21 14:49:10 -04:00
Harsha Narayana
eea7dca085
GIT-110239: fix activeDeadlineSeconds enforcement bug
...
GIT-110239: add additional tests with preset Status.StartTime
2022-06-13 20:06:44 +05:30
Kubernetes Prow Robot
6cd258f9f5
Merge pull request #110292 from mimowo/109904-avoid-duplicate-conditions
...
Avoid duplicate Failed conditions in job status
2022-06-09 14:01:45 -07:00
Michal Wozniak
e298649b6c
Avoid duplicate conditions by updating the pre-existing failed condition
...
in case its status is False or Unknown.
In case the status of the pre-existing condition is true we ignore the new
condition. If there is no pre-existing failed condition, then append
the new failed condition as before.
Also, make the condition comparisons less hacky by ignoring timestamp fields
in tests.
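The rule described in this commit message can be sketched as follows. This is a minimal illustration, not the controller's code: the Condition type is a simplified stand-in for the Kubernetes API type, and ensureCondition is a hypothetical helper name.

```go
package main

import "fmt"

// ConditionStatus and Condition are simplified stand-ins for the
// Kubernetes API types (assumption: only the fields needed here).
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

type Condition struct {
	Type   string
	Status ConditionStatus
	Reason string
}

// ensureCondition (hypothetical name) applies the rule from the commit
// message and reports whether the list changed:
//   - an existing condition of the same type with status True wins,
//     and the new condition is ignored;
//   - an existing condition with status False or Unknown is updated in place;
//   - if no condition of that type exists, the new one is appended.
func ensureCondition(list []Condition, newCond Condition) ([]Condition, bool) {
	for i := range list {
		if list[i].Type != newCond.Type {
			continue
		}
		if list[i].Status == ConditionTrue {
			return list, false
		}
		list[i] = newCond
		return list, true
	}
	return append(list, newCond), true
}

func main() {
	conds := []Condition{{Type: "Failed", Status: ConditionFalse}}
	conds, changed := ensureCondition(conds, Condition{
		Type: "Failed", Status: ConditionTrue, Reason: "BackoffLimitExceeded",
	})
	fmt.Println(len(conds), changed, conds[0].Status)
}
```

Either branch leaves at most one condition per type in the list, which is what removes the duplicates.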
2022-06-01 19:32:53 +02:00
Aldo Culquicondor
a5f5eab5fd
Wait for cache to sync in job's TestWatchOrphanPods
...
Otherwise the event handler might not be called.
Change-Id: I23c93c2251b411430a0f2469686db6355d84af2f
2022-05-10 14:18:21 -04:00
Aldo Culquicondor
09caa36718
Fix removing finalizer from finished jobs
...
In some rare race conditions, the job controller might create new pods after the job is declared finished.
Change-Id: I8a00429c8845463259cd7f82bb3c241d0011583c
2022-04-20 16:39:10 -04:00
Aldo Culquicondor
53aa05df3a
Don't mark job as failed until expectations are satisfied
...
Change-Id: I99206f35f6f145054c005ab362c792e71b9b15f4
2022-04-20 16:39:10 -04:00
Aldo Culquicondor
8c00f510ef
Graduate JobReadyPods to beta
...
Set podUpdateBatchPeriod to 1s
Change-Id: I8a10fd8f8559adad9df179b664b8c82851607855
2022-03-29 10:07:41 -04:00
Aldo Culquicondor
2c5d0a273c
Graduate IndexedJob to stable
...
- Lock feature gate to true and schedule for deletion in 1.26
- Remove checks on feature gate
- Graduate E2E test to Conformance
Change-Id: I6814819d318edaed5c86dae4055f4b050a4d39fd
2022-03-15 13:41:06 -04:00
Abdullah Gharaibeh
b2d2ec9e76
Graduate SuspendJob to GA
2022-02-15 10:46:13 -05:00
Mike Dame
80c01707e0
Wire contexts to Batch controllers (#105491)
...
* Wire contexts to Batch controllers
* (hold) feedback + updates that overlap with Apps controllers
* fixup errors
2021-11-10 14:56:46 -08:00
Aldo Culquicondor
60fc90967b
Count ready pods in job controller
...
When the feature gate JobReadyPods is enabled.
Change-Id: I86f93914568de6a7029f9ae92ee7b749686fbf97
2021-10-19 15:18:37 -04:00
Aldo Culquicondor
4ef9d18abe
Fix name for Pods of NonIndexed Jobs
...
Change-Id: I0ea4685a82f4cdec0caab362d52144476652f95a
2021-10-14 10:55:46 -04:00
Aldo Culquicondor
5929ccd391
Track expected removals of Pod finalizers
...
Add the UIDs of Pods for which we are removing finalizers to an in-memory cache.
The controller removes UIDs from the cache as Pod updates or deletes come in.
This avoids double counting finished Pods when Pod updates arrive after Job status updates.
https://github.com/kubernetes/kubernetes/issues/105200
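A rough sketch of such an in-memory cache, assuming a plain mutex-guarded map keyed by Job. The type and method names (uidTracker, expectRemoval, observe, pending) are hypothetical and are not taken from the controller.

```go
package main

import (
	"fmt"
	"sync"
)

// uidTracker (hypothetical) remembers, per Job key, the UIDs of Pods whose
// finalizer removal has been requested but not yet observed.
type uidTracker struct {
	mu    sync.Mutex
	byJob map[string]map[string]struct{}
}

func newUIDTracker() *uidTracker {
	return &uidTracker{byJob: make(map[string]map[string]struct{})}
}

// expectRemoval records UIDs right before the finalizers are removed.
func (t *uidTracker) expectRemoval(jobKey string, uids ...string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	set, ok := t.byJob[jobKey]
	if !ok {
		set = make(map[string]struct{})
		t.byJob[jobKey] = set
	}
	for _, uid := range uids {
		set[uid] = struct{}{}
	}
}

// observe is called when a Pod update or delete arrives; it clears the UID.
func (t *uidTracker) observe(jobKey, uid string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	set := t.byJob[jobKey]
	delete(set, uid) // deleting from a nil map is a no-op
	if len(set) == 0 {
		delete(t.byJob, jobKey)
	}
}

// pending reports whether a removal is still in flight for this UID, in
// which case the Pod should not be counted again.
func (t *uidTracker) pending(jobKey, uid string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	_, ok := t.byJob[jobKey][uid]
	return ok
}

func main() {
	tr := newUIDTracker()
	tr.expectRemoval("default/job-a", "uid-1", "uid-2")
	fmt.Println(tr.pending("default/job-a", "uid-1"))
	tr.observe("default/job-a", "uid-1")
	fmt.Println(tr.pending("default/job-a", "uid-1"))
}
```

During a sync, a finished Pod whose UID is still pending would be skipped, since it was already accounted for in the Job status.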
2021-10-04 16:09:58 -04:00
Aldo Culquicondor
a438f16741
Revert "Revert "Add metric job_pod_finished""
...
This reverts commit 7868fbbe64.
2021-09-23 12:56:29 -04:00
Aldo Culquicondor
47a957d163
Revert "Revert "Limit number of Pods counted in a single Job sync""
...
This reverts commit 8bcb780808.
2021-09-23 12:56:29 -04:00
Aldo Culquicondor
eebd678cda
Remove GET job and retries for status updates.
...
Doing a GET right before retrying has 2 problems:
- It can mask conflicts
- It adds an additional delay
As for retries, we are better off going through the sync backoff.
In the case of a conflict, we know that there was a Job update that would trigger another sync, so there is no need to do a rate-limited requeue.
2021-09-23 11:48:34 -04:00
Kubernetes Prow Robot
76c0573ff4
Merge pull request #105181 from alculquicondor/revert
...
Revert #104739
2021-09-21 16:54:00 -07:00
Aldo Culquicondor
7868fbbe64
Revert "Add metric job_pod_finished"
...
This reverts commit a0e7a567c5.
2021-09-21 15:16:54 -04:00
Aldo Culquicondor
8bcb780808
Revert "Limit number of Pods counted in a single Job sync"
...
This reverts commit 7d9cb88fed.
2021-09-21 15:16:50 -04:00
Kubernetes Prow Robot
f55101913f
Merge pull request #105098 from Karthik-K-N/fix-error-format
...
Fix incorrect format specifier in test files
2021-09-20 08:56:09 -07:00
Karthik K N
c651d50202
Fix incorrect format specifier in test files
2021-09-17 16:27:53 +05:30
Aldo Culquicondor
a0e7a567c5
Add metric job_pod_finished
...
To count the number of pods that the job controller successfully tracked with the JobTrackingWithFinalizers feature gate.
2021-09-15 11:19:47 -04:00
Aldo Culquicondor
7d9cb88fed
Limit number of Pods counted in a single Job sync
...
This prevents big Jobs from starving smaller ones.
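The idea of capping the work done per sync can be sketched as below. The cap value and the names maxPodsPerSync and takeBatch are illustrative assumptions; the commit message does not state the actual limit or how the controller applies it.

```go
package main

import "fmt"

// maxPodsPerSync is a hypothetical cap chosen for this example only.
const maxPodsPerSync = 3

// takeBatch returns at most limit UIDs to account for in the current sync
// and reports whether some were left over for a later sync.
func takeBatch(uids []string, limit int) (batch []string, more bool) {
	if limit < 0 {
		limit = 0
	}
	if len(uids) <= limit {
		return uids, false
	}
	return uids[:limit], true
}

func main() {
	finished := []string{"a", "b", "c", "d", "e"}
	batch, more := takeBatch(finished, maxPodsPerSync)
	// When more is true, the Job would be requeued so that other Jobs
	// get a turn before the remaining Pods are counted.
	fmt.Println(batch, more)
}
```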
2021-09-10 10:32:04 -04:00
Aldo Culquicondor
23ea5d80d6
Fix Job tracking with finalizers for more than 500 pods
...
When doing partial updates for uncountedTerminatedPods, the controller might have removed UIDs for Pods which still had finalizers.
Also make more space by removing UIDs that don't have finalizers at the beginning of the sync.
2021-09-01 16:19:04 -04:00
Aldo Culquicondor
5e1b5ec398
Revert counting deleted pods as failures for Job
...
When JobTrackingWithFinalizers is disabled, to preserve existing behavior.
Change-Id: Id1752f96feed322911712fe9e918e91e42eca809
2021-07-14 10:03:20 -04:00
Aldo Culquicondor
2dd2622188
Track Job Pods completion in status
...
Through Job.status.uncountedPodUIDs and a Pod finalizer.
An annotation marks whether a Job should be tracked with the new behavior.
A separate work queue is used to remove finalizers from orphan pods.
Change-Id: I1862e930257a9d1f7f1b2b0a526ed15bc8c248ad
2021-07-08 17:48:05 +00:00
Adhityaa Chandrasekar
ba708e5fc9
graduate SuspendJob to beta
...
Also adds a label to two existing Job metrics.
Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>
2021-06-03 18:48:32 +00:00
Mengxue Zhang
e64e34e029
specify pod name and hostname in indexed job
2021-05-19 15:30:13 +00:00
Kubernetes Prow Robot
548fb43643
Merge pull request #101292 from AliceZhang2016/job_controller_metrics
...
Graduate indexed job to beta
2021-05-07 13:31:44 -07:00
Mengxue Zhang
2d2ee6bc3a
change default feature gate value of IndexedJob
2021-04-30 14:36:15 +00:00
Mengxue Zhang
4cf7e75841
indexed job: remove pods with invalid index
2021-04-19 14:07:07 +00:00
Kubernetes Prow Robot
0172cbf56c
Merge pull request #99963 from alculquicondor/job_complete_active
...
Remove active pods past completions
2021-04-08 17:10:10 -07:00
Aldo Culquicondor
e6c3d7b34d
Only default Job fields when feature gates are enabled
...
Also use pointer for completionMode enum
2021-03-12 20:46:52 +00:00
Aldo Culquicondor
4af432bab3
Remove active pods past completions
2021-03-10 14:55:40 +00:00
Aldo Culquicondor
8ae0ad2b2f
Fix completed indexed job with repeated indexes
2021-03-09 19:22:45 +00:00
Adhityaa Chandrasekar
a0844da8f7
batch: add suspended job
...
Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>
2021-03-08 20:08:21 +00:00
Kubernetes Prow Robot
170c6a9833
Merge pull request #99806 from alculquicondor/job-adoption-unit
...
Merge tests for getPodsForJob
2021-03-06 12:50:29 -08:00
Aldo Culquicondor
f0f9f1d540
Merge tests for getPodsForJob
2021-03-04 21:09:33 +00:00
Aldo Culquicondor
2dd0c73056
Test for removal of invalid and repeated indexes
...
in Indexed Job
2021-03-04 16:39:34 +00:00
Aldo Culquicondor
8812531b8c
Add completion index to Job Pods
...
When .spec.completionMode="Indexed"
2021-03-03 22:45:53 +00:00
Aldo Culquicondor
609116b147
Test failed pod recreation
...
Change-Id: I31a2e667e9d96c385a921e25347ebeb5a8424e62
2021-02-01 13:20:03 -05:00
Aldo Culquicondor
dbf9e3b2d3
Make sync Job test tables more readable
...
And use t.Run to improve debugging experience
Change-Id: Ia91adbfe9c419cc640abe0efe287f5b9ab715e87
2021-01-27 16:56:41 -05:00
yodarshafrir1
24010022ef
Number of failed jobs should exceed the backoff limit, not be greater than or equal to it.
...
Remove patch in e2e test of backoff limit due to usage of NumRequeues
2020-08-11 11:06:09 +03:00