kubernetes/plugin/pkg/scheduler
Kubernetes Submit Queue bc7ccfe93b Merge pull request #50106 from julia-stripe/improve-scheduler-error-handling
Automatic merge from submit-queue

Retry scheduling pods after errors more consistently in scheduler

**What this PR does / why we need it**:

This fixes two places in the scheduler where pods can get stuck in Pending forever. In both places, an error occurs and `sched.config.Error` is not called afterwards. This is a problem because `sched.config.Error` is responsible for requeuing pods to retry scheduling when there are issues (see `plugin/pkg/scheduler/factory/factory.go`, L958, at commit 2540b333b2), so if we don't call `sched.config.Error`, the pod will never get scheduled (unless the scheduler is restarted).

One of these (returning when `ForgetPod` fails instead of continuing and reporting an error) is a regression from [this refactor](https://github.com/kubernetes/kubernetes/commit/ecb962e6585#diff-67f2b61521299ca8d8687b0933bbfb19L234). With the old behavior (`plugin/pkg/scheduler/scheduler.go`, L233-L237, at commit 80f26fa8a8), the error was reported correctly. As far as I can tell, changing the error handling in that refactor wasn't intentional.

When `AssumePod` fails, no error has ever been reported, but I think adding this will help the scheduler recover when something goes wrong instead of letting pods possibly never get scheduled.

This will help prevent issues like https://github.com/kubernetes/kubernetes/issues/49314 in the future.
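
The control flow described above can be sketched roughly as follows. This is a simplified illustration, not the actual scheduler code: the `pod`, `config`, and `scheduler` types, the hook names `assume`, `bind`, `forget`, and `onError`, and the `runDemo` helper are all hypothetical stand-ins for `AssumePod`, binding, `ForgetPod`, and `sched.config.Error`. The point is only that every failure path ends in the error hook, so the pod is requeued.

```go
package main

import (
	"errors"
	"fmt"
)

// pod is a hypothetical stand-in for a Kubernetes pod; only the name matters here.
type pod struct{ name string }

// config is a hypothetical, simplified analogue of sched.config:
// a set of hooks, including the error hook that requeues a pod.
type config struct {
	assume  func(p *pod, host string) error
	bind    func(p *pod, host string) error
	forget  func(p *pod) error
	onError func(p *pod, err error) // plays the role of sched.config.Error
}

type scheduler struct{ cfg config }

// scheduleOne calls the error hook on every failure path.
func (s *scheduler) scheduleOne(p *pod, host string) {
	if err := s.cfg.assume(p, host); err != nil {
		// Assume failed: report it so the pod is retried.
		s.cfg.onError(p, err)
		return
	}
	if err := s.cfg.bind(p, host); err != nil {
		if ferr := s.cfg.forget(p); ferr != nil {
			// Note the failure, but keep going instead of returning early.
			fmt.Printf("forget failed for %s: %v\n", p.name, ferr)
		}
		// Always report the bind error, even if forget also failed.
		s.cfg.onError(p, err)
	}
}

// failIf returns an error with the given message when fail is true.
func failIf(fail bool, msg string) error {
	if fail {
		return errors.New(msg)
	}
	return nil
}

// runDemo schedules one pod with the requested failures injected and
// returns the names of the pods that were handed to the error hook.
func runDemo(assumeFails, bindFails, forgetFails bool) []string {
	requeued := []string{}
	s := &scheduler{cfg: config{
		assume:  func(*pod, string) error { return failIf(assumeFails, "assume failed") },
		bind:    func(*pod, string) error { return failIf(bindFails, "bind failed") },
		forget:  func(*pod) error { return failIf(forgetFails, "forget failed") },
		onError: func(p *pod, _ error) { requeued = append(requeued, p.name) },
	}}
	s.scheduleOne(&pod{name: "demo-pod"}, "node-1")
	return requeued
}

func main() {
	fmt.Println(runDemo(true, false, false))  // assume fails
	fmt.Println(runDemo(false, true, true))   // bind and forget both fail
	fmt.Println(runDemo(false, false, false)) // everything succeeds
}
```

In this sketch, returning early inside the `forget` failure branch would reproduce the regression: the pod would never reach the error hook and would never be retried.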

**Release note**:

```release-note
Fix incorrect retry logic in scheduler
```
2017-08-07 01:35:17 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| .. | | |
| algorithm | Merge pull request #49547 from k82cn/k8s_42001_0 | 2017-08-04 14:29:42 -07:00 |
| algorithmprovider | Update generated files | 2017-08-03 23:03:52 +08:00 |
| api | update generated deepcopy code | 2017-07-31 22:33:00 +08:00 |
| core | Merge pull request #47408 from shiywang/follow-go-code-style | 2017-08-05 03:22:54 -07:00 |
| factory | Update generated bazel | 2017-07-18 23:58:32 +08:00 |
| metrics | autogenerated | 2017-04-14 10:40:57 -07:00 |
| schedulercache | Enhance scheduler cache unit tests to cover OIR in pod spec | 2017-07-25 06:35:23 -04:00 |
| testing | Fix incorrect call to 'bind' in scheduler | 2017-08-03 13:55:00 -07:00 |
| util | Merge pull request #47309 from xiang90/util | 2017-07-16 20:00:54 -07:00 |
| BUILD | Scripted migration from clientset_generated to client-go. | 2017-07-17 15:05:37 -07:00 |
| OWNERS | Updated OWNERS_ALIASES for scheduler, and added scheduler integration test owners. | 2017-07-01 09:28:52 +08:00 |
| scheduler_test.go | Fix incorrect call to 'bind' in scheduler | 2017-08-03 13:55:00 -07:00 |
| scheduler.go | Handle errors more consistently in scheduler | 2017-08-04 12:00:22 -07:00 |
| testutil.go | Scripted migration from clientset_generated to client-go. | 2017-07-17 15:05:37 -07:00 |