
54 Commits

Author SHA1 Message Date
yodarshafrir1
24010022ef The number of failed jobs must exceed the backoff limit (>), not merely reach it (>=).
Remove the patch in the e2e test of the backoff limit that was needed because of the use of NumRequeues
2020-08-11 11:06:09 +03:00
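The change in 24010022ef comes down to the comparison operator used for the backoff limit. A minimal sketch, assuming a hypothetical helper `exceedsBackoffLimit` that receives the failure count and `spec.backoffLimit` as plain integers (the controller's own code is not reproduced here):

```go
package main

import "fmt"

// exceedsBackoffLimit is a hypothetical helper illustrating the comparison
// described in 24010022ef: the limit is exceeded only once the number of
// failures is strictly greater than backoffLimit, so a limit of N tolerates
// N failures and the (N+1)th one marks the Job as failed.
func exceedsBackoffLimit(failed, backoffLimit int32) bool {
	return failed > backoffLimit
}

func main() {
	const backoffLimit int32 = 6 // 6 is the Job API default for spec.backoffLimit
	for _, failed := range []int32{5, 6, 7} {
		fmt.Printf("failed=%d exceeded=%v\n", failed, exceedsBackoffLimit(failed, backoffLimit))
	}
}
```

With the default limit of 6, the sketch tolerates six failures and reports the limit as exceeded on the seventh; a `>=` comparison would already trip on the sixth.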
yodarshafrir1
ca420ddada Fix job's backoff limit for restart policy Never: rely on the number of failures instead of NumRequeues 2020-08-07 14:22:40 +03:00
Kubernetes Prow Robot
00d6255f44
Merge pull request #91712 from KobayashiD27/structured-logging-in-event
Migrate log to klog.InfoS for staging/src/k8s.io/client-go
2020-06-22 23:53:40 -07:00
Kubernetes Prow Robot
be31023a95
Merge pull request #87155 from kolorful/patch-3
Fix a comment in job_controller
2020-06-19 08:51:58 -07:00
Kobayashi Daisuke
4ae11dac2e Replace StartLogging(klog.Infof) with StartStructuredLogging(0) 2020-06-15 17:48:35 +09:00
KeZhang
884f94ad92 Do not swallow NotFound error for DeletePod in dsc.manage 2020-06-04 16:41:38 +08:00
Zhou Peng
bc9bff0d9e [pkg/controller/job]: fix comment typo
Signed-off-by: Zhou Peng <p@ctriple.cn>
2020-05-30 23:09:10 +08:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Kubernetes Prow Robot
b17ddac4df
Merge pull request #78944 from avorima/golint_fix_job
Fix golint errors in pkg/controller/job
2020-04-12 21:57:47 -07:00
Mike Danese
25651408ae generated: run refactor 2020-02-08 12:30:21 -05:00
Mike Danese
3aa59f7f30 generated: run refactor 2020-02-07 18:16:47 -08:00
Kubernetes Prow Robot
e4926e2d70
Merge pull request #85421 from terrytangyuan/patch-1
Fix grammar: have -> has
2020-01-22 08:40:58 -08:00
Kewei Ma
34fce9faee
Fix a comment in job_controller 2020-01-13 10:09:06 -06:00
Kubernetes Prow Robot
42fe74cd2c
Merge pull request #86142 from raz-bn/add-complete-event
Adding new job completed event
2019-12-16 23:43:58 -08:00
raz-bn
0224c48120 Job completed event added 2019-12-16 21:41:15 +00:00
Ted Yu
9cff345770 Do not swallow timeout in manageReplicas 2019-12-12 11:27:36 -08:00
Yuan Tang
dd308ca576
Fix grammar: have -> has 2019-11-18 11:17:58 -05:00
Clayton Coleman
c6e34e58c5
job: Ignore namespace termination errors when creating pods or jobs
Instead of reporting an event or displaying an error, simply exit
when the namespace is being terminated. This reduces the amount of
controller churn on namespace shutdown. While we could technically
exit the entire processing loop early for very large jobs,
we should wait for more evidence that this is an issue before changing
that logic substantially.
2019-10-20 18:39:01 -04:00
Yassine TIJANI
c1487840bc move util/metrics to component-base
Signed-off-by: Yassine TIJANI <ytijani@vmware.com>
2019-10-08 14:42:31 +02:00
Yassine TIJANI
7e4c3096fe move WaitForCacheSync to the sharedInformer package
Signed-off-by: Yassine TIJANI <ytijani@vmware.com>
2019-08-22 16:13:41 +01:00
Ted Yu
898f099346 Skip unnecessary operations if diff is less than 0 2019-07-17 14:03:08 -07:00
Mario Valderrama
6ac7421535 Update comments 2019-06-14 14:23:13 +02:00
Mario Valderrama
dbbe68601f Fix golint errors in pkg/controller/job 2019-06-12 20:09:57 +02:00
Fei Xu
9feb0df370 Add pending status for pastBackoffLimitOnFailure 2019-05-21 09:45:29 +08:00
Andrew Kim
0bc5508aca replace client-go/util/integer with k8s.io/utils/integer 2019-01-24 15:34:21 -05:00
Davanum Srinivas
954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has explicit InitFlags(), so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by calling InitFlags explicitly in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
k8s-ci-robot
e6c5fb4666
Merge pull request #67859 from goodluckbot/job-controller-backoffLimit
Fix pastBackoffLimitOnFailure in job controller
2018-10-11 05:49:30 -07:00
goodluckbot
53c3e103d1 Fix pastBackoffLimitOnFailure when backoffLimit is zero 2018-10-11 17:29:11 +08:00
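With restart policy OnFailure, failed containers are restarted inside the same pod, so failures show up as container restart counts rather than as failed pods. The sketch below models the check as described by 53c3e103d1 and 9feb0df370 (pending pods are counted, a zero limit is special-cased). The `pod` type is a hypothetical simplification of `v1.Pod`, and the sketch assumes the pre-fix comparison was `restarts >= backoffLimit`, which is already true when both values are zero.

```go
package main

import "fmt"

// podPhase and pod are simplified, hypothetical stand-ins for the
// Kubernetes v1.Pod types; only the fields this check reads are modeled.
type podPhase string

const (
	podPending   podPhase = "Pending"
	podRunning   podPhase = "Running"
	podSucceeded podPhase = "Succeeded"
)

type pod struct {
	phase podPhase
	// restartCounts holds the RestartCount of every init and app container.
	restartCounts []int32
}

// pastBackoffLimitOnFailure sums container restarts in running or pending
// pods. A limit of zero is special-cased so that any single restart trips
// it, instead of 0 >= 0 being true before anything has failed.
func pastBackoffLimitOnFailure(backoffLimit int32, pods []pod) bool {
	var restarts int32
	for _, p := range pods {
		if p.phase == podRunning || p.phase == podPending {
			for _, c := range p.restartCounts {
				restarts += c
			}
		}
	}
	if backoffLimit == 0 {
		return restarts > 0
	}
	return restarts >= backoffLimit
}

func main() {
	pods := []pod{{phase: podRunning, restartCounts: []int32{0}}}
	fmt.Println(pastBackoffLimitOnFailure(0, pods)) // false: no restarts yet
}
```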
Kubernetes Submit Queue
d744c6ea61
Merge pull request #66085 from liggitt/updatejob
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

fix updateJob scheduling of resync

fixes #66071 

```release-note
NONE
```
2018-08-27 17:40:54 -07:00
Da K. Ma
a56121c191 Removed unused functions.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-07-22 20:56:53 +08:00
Jordan Liggitt
6d6842da0b
fix updateJob scheduling of resync 2018-07-11 17:10:10 -04:00
Maciej Szulik
d80ed537e5
Rate limit only when an actual error happens, not on update conflicts 2018-06-05 22:53:09 +02:00
Maciej Szulik
5df2755399
Never clean backoff in job controller 2018-06-04 19:28:58 +02:00
Kubernetes Submit Queue
7eb88f11d2
Merge pull request #59727 from wgliang/master.time
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

should use time.Since instead of time.Now().Sub

**What this PR does / why we need it**:
should use time.Since instead of time.Now().Sub
2018-05-10 20:29:40 -07:00
Kubernetes Submit Queue
139309f798
Merge pull request #58972 from soltysh/issue54870
Automatic merge from submit-queue (batch tested with PRs 61962, 58972, 62509, 62606). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Fix job's backoff limit for restart policy OnFailure

**Which issue(s) this PR fixes**:
Fixes #54870

**Release note**:
```release-note
NONE
```

/assign janetkuo
2018-04-19 16:47:18 -07:00
Wang Guoliang
89669283fe should use time.Since instead of time.Now().Sub 2018-04-10 12:05:51 +08:00
Mikhail Mazurskiy
468655b76a
Use typed events client directly 2018-04-01 18:57:29 +10:00
Maciej Szulik
5ff7e977bc
Fix job's backoff limit for restart policy OnFailure 2018-03-19 17:40:29 +01:00
Maciej Szulik
1266252dc2
Backoff only when failed pod shows up 2018-03-14 11:49:13 +01:00
cedric lamoriniere
c6e8bd62ad Improves backoff policy in JobController
issues: https://github.com/kubernetes/kubernetes/issues/56853

Add a check for whether the number of succeeded pods has increased since
the last check; if so, the backoff delay is cleared. This logic improves the
Job backoff policy when parallelism > 1 and some of the Job's pods fail
while others succeed.
2018-02-22 10:24:23 +01:00
supereagle
b694d51842 use versioned group clients from client-go 2017-11-07 14:47:22 +08:00
cedric lamoriniere
48116da0ec Improve how JobController uses the queue for backoff
Centralize the key "forget" and "requeue" process in a single method.
Change the signature of the syncJob method so that it also returns
whether the backoff delay for a given key should be forgotten.
2017-09-07 17:14:47 +02:00
cedric lamoriniere
1dbef2f113
Job failure policy support in JobController
Job failure policy integration in JobController. Based on
JobSpec.BackoffLimit, the JobController defines the backoff
duration between Job retries.

It uses the `workqueue.RateLimitingInterface` to store the number of
"retries" as "requeues", and the default Job backoff initial duration is set
during the initialization of the `workqueue.RateLimiter`.

Since the number of retries for each job is stored in a local structure,
"JobController.queue", the number of retries is lost if the JobController
restarts, and the backoff duration is reset to 0.

Add e2e test for Job backoff failure policy
2017-09-03 12:07:12 +02:00
Joel Smith
1889a6ef52 Slow-start batch pod creation of rs, rc, ds, jobs
Prevent too-large replica counts from generating enormous numbers
of events by creating only a few pods at a time, then increasing
the batch size when pod creations succeed. Stop creating batches
of pods when any pod creation errors are encountered.
2017-09-01 09:23:43 -06:00
Kubernetes Submit Queue
25da6e64e2 Merge pull request #48454 from weiwei04/check-job-activeDeadlineSeconds
Automatic merge from submit-queue (batch tested with PRs 44719, 48454)

check job ActiveDeadlineSeconds

**What this PR does / why we need it**:

enqueue a sync task after ActiveDeadlineSeconds

**Which issue this PR fixes**:

fixes #32149

**Release note**:

```release-note
enqueue a sync task to wake up jobcontroller to check job ActiveDeadlineSeconds in time
```
2017-08-29 08:25:06 -07:00
Wei Wei
46239ea30b check job ActiveDeadlineSeconds 2017-08-29 20:15:11 +08:00
gmarek
0504cfbc25 Make metav1.(Micro)?Time functions take pointers 2017-08-17 11:24:28 +02:00
Mikhail Mazurskiy
042b5642b9
Migrate to NewControllerRef from meta/v1 package 2017-08-06 22:43:46 +10:00
Mikhail Mazurskiy
b28a83a4cf
Migrate to GetControllerOf from meta/v1 package 2017-08-06 22:41:58 +10:00
Chao Xu
97e07e5b52 Let controllers ignore initialization timeout error when creating a pod. 2017-08-03 15:28:08 -07:00