Aldo Culquicondor
8776931abb
Remove finalizer when orphaned
...
Change-Id: Id88a28755660812a274dffab2693cb8a0ef4235c
2022-03-24 11:57:51 -04:00
Aldo Culquicondor
211e33d93f
Fix: Clean job tracking finalizer from orphan pods
...
Change-Id: I04cd70725fd1830be8daf2dca53f67bc10a379b7
2022-03-24 11:57:51 -04:00
Aldo Culquicondor
2c5d0a273c
Graduate IndexedJob to stable
...
- Lock feature gate to true and schedule for deletion in 1.26
- Remove checks on feature gate
- Graduate E2E test to Conformance
Change-Id: I6814819d318edaed5c86dae4055f4b050a4d39fd
2022-03-15 13:41:06 -04:00
Abdullah Gharaibeh
b2d2ec9e76
Graduate SuspendJob to GA
2022-02-15 10:46:13 -05:00
Davanum Srinivas
9682b7248f
OWNERS cleanup - Jan 2021 Week 1
...
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-01-10 08:14:29 -05:00
Davanum Srinivas
9405e9b55e
Check in OWNERS modified by update-yamlfmt.sh
...
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-09 21:31:26 -05:00
Mike Dame
80c01707e0
Wire contexts to Batch controllers (#105491)
...
* Wire contexts to Batch controllers
* (hold) feedback + updates that overlap with Apps controllers
* fixup errors
2021-11-10 14:56:46 -08:00
Kubernetes Prow Robot
8e37a3b324
Merge pull request #103868 from qingsenLi/210723-forget
...
Merge conditional assignment into variable declaration
2021-10-28 16:32:50 -07:00
Aldo Culquicondor
60fc90967b
Count ready pods in job controller
...
When the feature gate JobReadyPods is enabled.
Change-Id: I86f93914568de6a7029f9ae92ee7b749686fbf97
2021-10-19 15:18:37 -04:00
Kubernetes Prow Robot
0bfa37dfcc
Merge pull request #105676 from alculquicondor/job-name
...
Fix name for Pods of NonIndexed Jobs
2021-10-14 10:50:12 -07:00
Aldo Culquicondor
4ef9d18abe
Fix name for Pods of NonIndexed Jobs
...
Change-Id: I0ea4685a82f4cdec0caab362d52144476652f95a
2021-10-14 10:55:46 -04:00
Kubernetes Prow Robot
f27e4714ba
Merge pull request #105377 from damemi/wire-contexts-apps
...
Wire contexts to Apps controllers
2021-10-14 06:59:19 -07:00
Mike Dame
41fcb95f2f
Wire contexts to Apps controllers
2021-10-13 16:32:13 -04:00
Aldo Culquicondor
5929ccd391
Track expected removals of Pod finalizers
...
Add the UIDs of Pods for which we are removing finalizers to an in-memory cache.
The controller removes UIDs from the cache as Pod updates or deletes come in.
This avoids double counting finished Pods when Pod updates arrive after Job status updates.
https://github.com/kubernetes/kubernetes/issues/105200
2021-10-04 16:09:58 -04:00
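The in-memory cache described in the commit above can be pictured with a minimal Go sketch. This is not the controller's actual code: the type `uidTracker` and its methods `expectRemoval`, `observeRemoval`, and `pending` are hypothetical names chosen for illustration, and the Job key format is an assumption.

```go
package main

import "sync"

// uidTracker is a hypothetical, simplified cache of Pod UIDs for which a
// finalizer removal has been requested but not yet observed via a Pod event.
type uidTracker struct {
	mu   sync.Mutex
	uids map[string]map[string]struct{} // Job key -> set of Pod UIDs
}

func newUIDTracker() *uidTracker {
	return &uidTracker{uids: make(map[string]map[string]struct{})}
}

// expectRemoval records UIDs right before the finalizers are removed.
func (t *uidTracker) expectRemoval(jobKey string, podUIDs ...string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	set, ok := t.uids[jobKey]
	if !ok {
		set = make(map[string]struct{})
		t.uids[jobKey] = set
	}
	for _, uid := range podUIDs {
		set[uid] = struct{}{}
	}
}

// observeRemoval drops a UID once a Pod update or delete event arrives.
func (t *uidTracker) observeRemoval(jobKey, podUID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	set := t.uids[jobKey]
	delete(set, podUID) // deleting from a nil map is a no-op
	if len(set) == 0 {
		delete(t.uids, jobKey)
	}
}

// pending reports whether a Pod was already accounted for, so that a sync
// working from a stale Pod cache can skip it instead of counting it twice.
func (t *uidTracker) pending(jobKey, podUID string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	_, ok := t.uids[jobKey][podUID]
	return ok
}
```

In this sketch, a sync would consult `pending` before counting a finished Pod that still appears to carry the finalizer in the informer cache.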
Aldo Culquicondor
95c2a8024c
Parallelize pod updates in job test
...
To potentially reduce the number of job controller syncs.
Also reduce the maximum number of pods to sync in tests.
2021-10-01 09:55:53 -04:00
Aldo Culquicondor
a438f16741
Revert "Revert "Add metric job_pod_finished""
...
This reverts commit 7868fbbe64.
2021-09-23 12:56:29 -04:00
Aldo Culquicondor
47a957d163
Revert "Revert "Limit number of Pods counted in a single Job sync""
...
This reverts commit 8bcb780808.
2021-09-23 12:56:29 -04:00
Aldo Culquicondor
01f27cd93e
Fix log line for target number of running pods
2021-09-23 12:56:29 -04:00
Aldo Culquicondor
eebd678cda
Remove GET job and retries for status updates.
...
Doing a GET right before retrying has 2 problems:
- It can masquerade conflicts
- It adds an additional delay
As for retries, we are better off going through the sync backoff.
In the case of conflict, we know that there was a Job update that would trigger another sync, so there is no need to do a rate limited requeue.
2021-09-23 11:48:34 -04:00
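The requeue reasoning in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the controller's code: `errConflict` and `errTransient` stand in for real API errors, and `nextStep` and the `requeueAction` values are hypothetical names.

```go
package main

import "errors"

// Hypothetical sentinel errors standing in for API server responses.
var (
	errConflict  = errors.New("conflict: object was modified")
	errTransient = errors.New("transient API error")
)

type requeueAction int

const (
	// forget: do not requeue; reset any backoff state for the key.
	forget requeueAction = iota
	// requeueRateLimited: requeue the key through the sync backoff.
	requeueRateLimited
)

// nextStep decides what to do with a Job key after a sync attempt.
func nextStep(syncErr error) requeueAction {
	switch {
	case syncErr == nil:
		return forget
	case errors.Is(syncErr, errConflict):
		// A conflict implies the Job was updated elsewhere; that update
		// event triggers another sync, so no rate-limited requeue is needed.
		return forget
	default:
		// Any other error goes through the regular sync backoff instead of
		// being retried inline after a fresh GET.
		return requeueRateLimited
	}
}
```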
Kubernetes Prow Robot
76c0573ff4
Merge pull request #105181 from alculquicondor/revert
...
Revert #104739
2021-09-21 16:54:00 -07:00
Aldo Culquicondor
7868fbbe64
Revert "Add metric job_pod_finished"
...
This reverts commit a0e7a567c5.
2021-09-21 15:16:54 -04:00
Aldo Culquicondor
8bcb780808
Revert "Limit number of Pods counted in a single Job sync"
...
This reverts commit 7d9cb88fed.
2021-09-21 15:16:50 -04:00
Kubernetes Prow Robot
f55101913f
Merge pull request #105098 from Karthik-K-N/fix-error-format
...
Fix incorrect format specifier in test files
2021-09-20 08:56:09 -07:00
Karthik K N
c651d50202
Fix incorrect format specifier in test files
2021-09-17 16:27:53 +05:30
Aldo Culquicondor
a0e7a567c5
Add metric job_pod_finished
...
To count the number of pods that the job controller successfully tracked with the JobTrackingWithFinalizers feature gate.
2021-09-15 11:19:47 -04:00
Aldo Culquicondor
7d9cb88fed
Limit number of Pods counted in a single Job sync
...
This prevents big Jobs from starving smaller ones.
2021-09-10 10:32:04 -04:00
Aldo Culquicondor
23ea5d80d6
Fix Job tracking with finalizers for more than 500 pods
...
When doing partial updates for uncountedTerminatedPods, the controller might have removed UIDs for Pods which still had finalizers.
Also make more space by removing UIDs that don't have finalizers at the beginning of the sync.
2021-09-01 16:19:04 -04:00
Stephen Augustus
481cf6fbe7
generated: Run hack/update-gofmt.sh
...
Signed-off-by: Stephen Augustus <foo@auggie.dev>
2021-08-24 15:47:49 -04:00
10177505
2740965dc9
Merge conditional assignment into variable declaration
2021-07-23 17:02:19 +08:00
Aldo Culquicondor
5e1b5ec398
Revert counting deleted pods as failures for Job
...
When JobTrackingWithFinalizers is disabled. To preserve existing behavior.
Change-Id: Id1752f96feed322911712fe9e918e91e42eca809
2021-07-14 10:03:20 -04:00
Aldo Culquicondor
2dd2622188
Track Job Pods completion in status
...
Through Job.status.uncountedPodUIDs and a Pod finalizer
An annotation marks if a job should be tracked with new behavior
A separate work queue is used to remove finalizers from orphan pods.
Change-Id: I1862e930257a9d1f7f1b2b0a526ed15bc8c248ad
2021-07-08 17:48:05 +00:00
Adhityaa Chandrasekar
ba708e5fc9
graduate SuspendJob to beta
...
Also adds a label to two existing Job metrics.
Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>
2021-06-03 18:48:32 +00:00
Aldo Culquicondor
d8aad7944c
Remove unused util CreatePods
...
And rename CreatePodsWithControllerRef to simply CreatePods
2021-05-20 20:27:21 +00:00
Mengxue Zhang
e64e34e029
specify pod name and hostname in indexed job
2021-05-19 15:30:13 +00:00
Kubernetes Prow Robot
548fb43643
Merge pull request #101292 from AliceZhang2016/job_controller_metrics
...
Graduate indexed job to beta
2021-05-07 13:31:44 -07:00
Mengxue Zhang
2d2ee6bc3a
change default feature gate value of IndexedJob
2021-04-30 14:36:15 +00:00
Mengxue Zhang
5fd4ab3dc3
add pod create/delete operation limitations per job sync
2021-04-27 18:51:38 +00:00
Mengxue Zhang
cda503fcc9
indexed job: add three metrics to job controller
2021-04-27 18:32:53 +00:00
Mengxue Zhang
4cf7e75841
indexed job: remove pods with invalid index
2021-04-19 14:07:07 +00:00
Kubernetes Prow Robot
0172cbf56c
Merge pull request #99963 from alculquicondor/job_complete_active
...
Remove active pods past completions
2021-04-08 17:10:10 -07:00
Adhityaa Chandrasekar
0a21157c96
job controller: don't mutate shared cache object
...
Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>
2021-03-25 06:36:15 +00:00
Aldo Culquicondor
e6c3d7b34d
Only default Job fields when feature gates are enabled
...
Also use pointer for completionMode enum
2021-03-12 20:46:52 +00:00
Aldo Culquicondor
4af432bab3
Remove active pods past completions
2021-03-10 14:55:40 +00:00
Aldo Culquicondor
8ae0ad2b2f
Fix completed indexed job with repeated indexes
2021-03-09 19:22:45 +00:00
Adhityaa Chandrasekar
a0844da8f7
batch: add suspended job
...
Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>
2021-03-08 20:08:21 +00:00
Kubernetes Prow Robot
170c6a9833
Merge pull request #99806 from alculquicondor/job-adoption-unit
...
Merge tests for getPodsForJob
2021-03-06 12:50:29 -08:00
Aldo Culquicondor
f0f9f1d540
Merge tests for getPodsForJob
2021-03-04 21:09:33 +00:00
Aldo Culquicondor
2dd0c73056
Test for removal of invalid and repeated indexes
...
in Indexed Job
2021-03-04 16:39:34 +00:00
Aldo Culquicondor
8812531b8c
Add completion index to Job Pods
...
When .spec.completionMode="Indexed"
2021-03-03 22:45:53 +00:00
Benjamin Elder
56e092e382
hack/update-bazel.sh
2021-02-28 15:17:29 -08:00
Aldo Culquicondor
609116b147
Test failed pod recreation
...
Change-Id: I31a2e667e9d96c385a921e25347ebeb5a8424e62
2021-02-01 13:20:03 -05:00
Aldo Culquicondor
dbf9e3b2d3
Make sync Job test tables more readable
...
And use t.Run to improve debugging experience
Change-Id: Ia91adbfe9c419cc640abe0efe287f5b9ab715e87
2021-01-27 16:56:41 -05:00
KeZhang
67b40a50c6
Optimize log output
2020-12-08 11:20:24 +08:00
yodarshafrir1
24010022ef
Number of failed jobs should exceed the backoff limit and not big equal.
...
Remove patch in e2e test of backoff limit due to usage of NumRequeues
2020-08-11 11:06:09 +03:00
yodarshafrir1
ca420ddada
Fix job's backoff limit for restart policy Never, rely on number of failures instead of number of NumRequeues
2020-08-07 14:22:40 +03:00
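Read together, the two backoff-limit commits above suggest a check based on the count of failures, using a strict comparison. A minimal sketch of that comparison, with `exceedsBackoffLimit` as a hypothetical helper name and the parameters as assumptions rather than the controller's actual fields:

```go
package main

// exceedsBackoffLimit reports whether the number of observed failures is
// strictly greater than the configured limit. Being equal to the limit is
// not enough to mark the Job as failed in this sketch.
func exceedsBackoffLimit(failures, backoffLimit int32) bool {
	return failures > backoffLimit
}
```

Under this reading, a limit of 3 tolerates three failures and trips on the fourth, and a limit of 0 trips on the first failure.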
Kubernetes Prow Robot
00d6255f44
Merge pull request #91712 from KobayashiD27/structured-logging-in-event
...
Migrate log to klog.InfoS for staging/src/k8s.io/client-go
2020-06-22 23:53:40 -07:00
Kubernetes Prow Robot
be31023a95
Merge pull request #87155 from kolorful/patch-3
...
Fix a comment in job_controller
2020-06-19 08:51:58 -07:00
Kobayashi Daisuke
4ae11dac2e
Replace StartLogging(klog.Infof) with StartStructuredLogging(0)
2020-06-15 17:48:35 +09:00
KeZhang
884f94ad92
Do not swallow NotFound error for DeletePod in dsc.manage
2020-06-04 16:41:38 +08:00
Zhou Peng
bc9bff0d9e
[pkg/controller/job]: fix comment typo
...
Signed-off-by: Zhou Peng <p@ctriple.cn>
2020-05-30 23:09:10 +08:00
Davanum Srinivas
07d88617e5
Run hack/update-vendor.sh
...
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
...
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Kubernetes Prow Robot
b17ddac4df
Merge pull request #78944 from avorima/golint_fix_job
...
Fix golint errors in pkg/controller/job
2020-04-12 21:57:47 -07:00
taesun_lee
79680b5d9b
Fix pkg/controller typos in some error messages, comments etc
...
- applied review results by LuisSanchez
- Co-Authored-By: Luis Sanchez <sanchezl@redhat.com>
genernal -> general
iniital -> initial
initalObjects -> initialObjects
intentionaly -> intentionally
inforer -> informer
anotother -> another
triger -> trigger
mutli -> multi
Verifyies -> Verifies
valume -> volume
unexpect -> unexpected
unfulfiled -> unfulfilled
implenets -> implements
assignement -> assignment
expectataions -> expectations
nexpected -> unexpected
boundSatsified -> boundSatisfied
externel -> external
calcuates -> calculates
workes -> workers
unitialized -> uninitialized
afater -> after
Espected -> Expected
nodeMontiorGracePeriod -> NodeMonitorGracePeriod
estimateGrracefulTermination -> estimateGracefulTermination
secondrary -> secondary
ShouldRunDaemonPodOnUnscheduableNode -> ShouldRunDaemonPodOnUnschedulableNode
rrror -> error
expectatitons -> expectations
foud -> found
epackage -> package
succesfulJobs -> successfulJobs
namesapce -> namespace
ConfigMapResynce -> ConfigMapResync
2020-02-27 00:15:33 +09:00
Mike Danese
25651408ae
generated: run refactor
2020-02-08 12:30:21 -05:00
Mike Danese
3aa59f7f30
generated: run refactor
2020-02-07 18:16:47 -08:00
Kubernetes Prow Robot
e4926e2d70
Merge pull request #85421 from terrytangyuan/patch-1
...
Fix grammar: have -> has
2020-01-22 08:40:58 -08:00
Kewei Ma
34fce9faee
Fix a comment in job_controller
2020-01-13 10:09:06 -06:00
Kubernetes Prow Robot
42fe74cd2c
Merge pull request #86142 from raz-bn/add-complete-event
...
Adding new job completed event
2019-12-16 23:43:58 -08:00
raz-bn
0224c48120
Job completed event added
2019-12-16 21:41:15 +00:00
Ted Yu
9cff345770
Do not swallow timeout in manageReplicas
2019-12-12 11:27:36 -08:00
Yuan Tang
dd308ca576
Fix grammar: have -> has
2019-11-18 11:17:58 -05:00
Kubernetes Prow Robot
6a19261e96
Merge pull request #84123 from smarterclayton/terminating_cause
...
Handle namespace deletion more gracefully in built-in controllers
2019-11-04 07:55:41 -08:00
wojtekt
7b6bcdf780
Autogenerated code
2019-10-24 20:21:00 +02:00
Clayton Coleman
c6e34e58c5
job: Ignore namespace termination errors when creating pods or jobs
...
Instead of reporting an event or displaying an error, simply exit
when the namespace is being terminated. This reduces the amount of
controller churn on namespace shutdown. While we could technically
exit the entire processing loop early for very large jobs,
we should wait for more evidence that is an issue before changing
that logic substantially.
2019-10-20 18:39:01 -04:00
Yassine TIJANI
c1487840bc
move util/metrics to component-base
...
Signed-off-by: Yassine TIJANI <ytijani@vmware.com>
2019-10-08 14:42:31 +02:00
Yassine TIJANI
7e4c3096fe
move WaitForCacheSync to the sharedInformer package
...
Signed-off-by: Yassine TIJANI <ytijani@vmware.com>
2019-08-22 16:13:41 +01:00
David Xia
fabfd950b1
cleanup: fix some log and error capitalizations
...
Part of https://github.com/kubernetes/kubernetes/issues/15863
2019-07-20 18:26:16 -04:00
Ted Yu
898f099346
Skip unnecessary operations if diff is less than 0
2019-07-17 14:03:08 -07:00
Mario Valderrama
6ac7421535
Update comments
2019-06-14 14:23:13 +02:00
Mario Valderrama
dbbe68601f
Fix golint errors in pkg/controller/job
2019-06-12 20:09:57 +02:00
Fei Xu
9feb0df370
Add pending status for pastBackoffLimitOnFailure
2019-05-21 09:45:29 +08:00
Maciej Szulik
3cc85a1c09
Updates OWNERS files in job controller
2019-04-19 10:35:16 +02:00
stewart-yu
ecbd5427e7
auto-generated file
2019-03-02 12:55:26 +08:00
stewart-yu
e01ff1641c
move config local to every controllers in kube-controller-manager
2019-03-02 12:54:33 +08:00
Kubernetes Prow Robot
808f2cf0ef
Merge pull request #72525 from justinsb/owners_should_not_be_executable
...
Remove executable file permission from OWNERS files
2019-02-14 23:55:45 -08:00
Roy Lenferink
b43c04452f
Updated OWNERS files to include link to docs
2019-02-04 22:33:12 +01:00
Andrew Kim
0bc5508aca
replace client-go/util/integer with k8s.io/utils/integer
2019-01-24 15:34:21 -05:00
Justin SB
dd19b923b7
Remove executable file permission from OWNERS files
2019-01-11 16:42:59 -08:00
Davanum Srinivas
954996e231
Move from glog to klog
...
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
k8s-ci-robot
e6c5fb4666
Merge pull request #67859 from goodluckbot/job-controller-backoffLimit
...
Fix pastBackoffLimitOnFailure in job controller
2018-10-11 05:49:30 -07:00
goodluckbot
53c3e103d1
Fix pastBackoffLimitOnFailure when backoffLimit is zero
2018-10-11 17:29:11 +08:00
Kubernetes Submit Queue
d744c6ea61
Merge pull request #66085 from liggitt/updatejob
...
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md
fix updateJob scheduling of resync
fixes #66071
```release-note
NONE
```
2018-08-27 17:40:54 -07:00
Davanum Srinivas
9b43d97cd4
Add Labels to various OWNERS files
...
Will reduce the burden of manually adding labels. Information pulled
from:
https://github.com/kubernetes/community/blob/master/sigs.yaml
Change-Id: I17e661e37719f0bccf63e41347b628269cef7c8b
2018-08-21 13:59:08 -04:00
Da K. Ma
a56121c191
Removed unused functions.
...
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-07-22 20:56:53 +08:00
Jordan Liggitt
6d6842da0b
fix updateJob scheduling of resync
2018-07-11 17:10:10 -04:00
Jeff Grafton
23ceebac22
Run hack/update-bazel.sh
2018-06-22 16:22:57 -07:00
Kubernetes Submit Queue
65819a8f92
Merge pull request #63744 from krmayankk/changelog
...
Automatic merge from submit-queue (batch tested with PRs 63580, 63744, 64541, 64502, 64100). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md
remove redundant getKey functions from controller tests
```release-note
None
```
2018-06-20 01:27:32 -07:00
Maciej Szulik
d80ed537e5
Rate limit only when an actual error happens, not on update conflicts
2018-06-05 22:53:09 +02:00
Maciej Szulik
5df2755399
Never clean backoff in job controller
2018-06-04 19:28:58 +02:00