Pods which are evicted by the nodecontroller due to network
malfunction, or unresponsive kubelet should be differentiated
from termination initiated by other sources. The reason/message
are consumed by kubectl to provide a better summary using get/describe.
Automatic merge from submit-queue
Node controller to not force delete pods
Fixes https://github.com/kubernetes/kubernetes/issues/35145
- [x] e2e tests to test Petset, RC, Job.
- [x] Remove and cover other locations where we force-delete pods within the NodeController.
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
``` release-note
Node controller no longer force-deletes pods from the api-server.
* For StatefulSet (previously PetSet), this change means creation of replacement pods is blocked until old pods are definitely not running (indicated either by the kubelet returning from partitioned state, or deletion of the Node object, or deletion of the instance in the cloud provider, or force deletion of the pod from the api-server). This has the desirable outcome of "fencing" to prevent "split brain" scenarios.
* For all other existing controllers except StatefulSet , this has no effect on the ability of the controller to replace pods because the controllers do not reuse pod names (they use generate-name).
* User-written controllers that reuse names of pod objects should evaluate this change.
```
We had another bug where we confused the hostname with the NodeName.
To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
Automatic merge from submit-queue
Fix FakeNodeHandler Update behaviour
Two problems:
1. Get is always using Existing nodes slice, and you will for sure miss any updated data
2. Each Update adds a duplicate node entry to UpdatedNodes slice
For the 1st, we will try to find a node in UpdatedNodes slice (same as for the List).
2nd - append only if there is no node with same name as updated, if there is we will replace object in UpdatedNodes slice.
Two problems:
1. Get is always using Existing nodes slice, and you will for sure miss any
updated data
2. Each Update duplicates node entry in UpdatedNodes slice
For the 1st, try to find a node in UpdatedNodes slice (same as for the List).
2nd - append only if there is no node with same name as updated, if there is
just replace object.
Change-Id: I9ef1cca2788ba946eee37fa1b037c124ad76074c
Automatic merge from submit-queue
Sleep between NodeStatus update retries
Just a thing I found when looking into other problems.
This is pretty much no-risk change fixing wrong behavior. Do you think it should go in 1.4? @pwittrock
Node controller's internalPodInformer will block main thread
if it is not started as a go routine. This patch fixed this
by runing internalPodInformer as a go routine.
Automatic merge from submit-queue
Node controller deletePod return true if there are pods pending deletion
Fixes https://github.com/kubernetes/kubernetes/issues/30536
If a node had a single pod in terminating state, and that node no longer reported healthy, the pod was never deleted by the node controller because it believed there were no pods remaining.
@smarterclayton @ncdc
Automatic merge from submit-queue
Expose flags for new NodeEviction logic in NodeController
Fix#28832
Last PR from the NodeController NodeEviction logic series.
cc @davidopp @lavalamp @mml
Automatic merge from submit-queue
Run goimport for the whole repo
While removing GOMAXPROC and running goimports, I noticed quite a lot of other files also needed a goimport format. Didn't commit `*.generated.go`, `*.deepcopy.go` or files in `vendor`
This is more for testing if it builds.
The only strange thing here is the gopkg.in/gcfg.v1 => github.com/scalingdata/gcfg replace.
cc @jfrazelle @thockin