This commit adds support for failing deployments based on a timeout
parameter defined in the spec. If there is no progress for the amount
of time defined as progressDeadlineSeconds then the deployment will be
marked as failed by adding a condition with a ProgressDeadlineExceeded
reason in it. Progress in the context of a deployment means the creation
or adoption of a new replica set, scaling up new pods, and scaling down
old pods.
Automatic merge from submit-queue
Fix how we iterate over active jobs when removing them for Replace policy
When fixing the Replace Active removal I used wrong for loop construct which panics :/ This PR fixes that by using for range.
@janetkuo ptal
@jessfraz this will also be a cherry-pick candidate for 1.4, I remember we've picked the aforementioned fix as well
Automatic merge from submit-queue
Set reason and message on Pod during nodecontroller eviction
**What this PR does / why we need it**: Pods which are evicted by the nodecontroller due to network partition, or unresponsive kubelet should be differentiated from termination initiated by other sources. The reason/message are consumed by kubectl to provide a better summary using get/describe.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#35725
**Release note**:
```release-note
Pods that are terminating due to eviction by the nodecontroller (typically due to unresponsive kubelet, or network partition) now surface in `kubectl get` output
as being in state "Unknown", along with a longer description in `kubectl describe` output.
```
Pods which are evicted by the nodecontroller due to network
malfunction, or unresponsive kubelet should be differentiated
from termination initiated by other sources. The reason/message
are consumed by kubectl to provide a better summary using get/describe.
When looking for overlapping deployments, we should also find other deployments that select current deployment's pods,
not just the ones whose pods are selected by current deployment.
Automatic merge from submit-queue
Making the pod.alpha.kubernetes.io/initialized annotation optional in PetSet pods
**What this PR does / why we need it**: As of now, the absence of the annotation `pod.alpha.kubernetes.io/initialized` in PetSets causes the PetSet controller to effectively "pause". Being a debug hook, users expect that its absence has no effect on the working of a PetSet. This PR inverts the logic so that we let the PetSet controller operate as expected in the absence of the annotation.
Letting the annotation remain alpha seems ok. Renaming it to something more meaningful needs further discussion.
**Which issue this PR fixes** _(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)_: fixes https://github.com/kubernetes/kubernetes/issues/35498
**Special notes for your reviewer**:
**Release note**:
``` release-note
The annotation "pod.alpha.kubernetes.io/initialized" on StatefulSets (formerly PetSets) is now optional and only encouraged for debug use.
```
cc @erictune @smarterclayton @bprashanth @kubernetes/sig-apps
@kow3ns The examples will need to be cleaned up as well I think later on to remove them.
Automatic merge from submit-queue
Node controller to not force delete pods
Fixes https://github.com/kubernetes/kubernetes/issues/35145
- [x] e2e tests to test Petset, RC, Job.
- [x] Remove and cover other locations where we force-delete pods within the NodeController.
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
``` release-note
Node controller no longer force-deletes pods from the api-server.
* For StatefulSet (previously PetSet), this change means creation of replacement pods is blocked until old pods are definitely not running (indicated either by the kubelet returning from partitioned state, or deletion of the Node object, or deletion of the instance in the cloud provider, or force deletion of the pod from the api-server). This has the desirable outcome of "fencing" to prevent "split brain" scenarios.
* For all other existing controllers except StatefulSet , this has no effect on the ability of the controller to replace pods because the controllers do not reuse pod names (they use generate-name).
* User-written controllers that reuse names of pod objects should evaluate this change.
```
Automatic merge from submit-queue
Add tooling to generate listers
Add lister-gen tool to auto-generate listers. So far this PR only demonstrates replacing the manually-written `StoreToLimitRangeLister` with the generated `LimitRangeLister`, as it's a small and easy swap.
cc @deads2k @liggitt @sttts @nikhiljindal @lavalamp @smarterclayton @derekwaynecarr @kubernetes/sig-api-machinery @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
quota controller uses informers if available for pod calculation
This PR does the following:
1. plumb informer factory into quota registry and evaluators
2. pod quota evaluator uses informers for determining aggregrate usage instead of making direct calls
3. admission code path does not use informers because
1. we do not want to add new watches in apiserver
2. admission code path does not require aggregate usage calculation
As a result, quota controller is much faster in re-calculating quota usage when it observes a pod deletion.
Follow-on PRs will make similar changes for other informer backed resources (pvcs next).
/cc @deads2k @mfojtik @smarterclayton @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Remove Job also from .status.active for Replace strategy
When iterating over list of Jobs we're removing each of them when strategy is replace. Unfortunately, the job reference was not removed from `.status.active` which cause the controller trying to remove it once again during next run and failed removing what was already removed during previous run. This was cause by not removing the reference previously. This PR fixes that and cleans logs a bit, in that controller.
@erictune fyi
@janetkuo ptal
Automatic merge from submit-queue
Let release_1_5 clientset include multiple versions of a group
Fix#35237
This PR make versioned clientset to include multiple versions of a group. Currently only `batch` has `v1` and `v2alpha1`. The clientset interface now looks like:
```go
BatchV2alpha1() v2alpha1batch.BatchV2alpha1Interface
BatchV1() v1batch.BatchV1Interface
// Deprecated: please explicitly pick a version if possible.
Batch() v1batch.BatchV1Interface
```
Commit "update client-gen to say internalversion rather than unversioned" fixes https://github.com/kubernetes/kubernetes/issues/24481.
cc @kubernetes/sig-api-machinery @soltysh @deads2k @nikhiljindal
```release-note
release_1_5 clientset supports multiple versions of a group.
```
Automatic merge from submit-queue
Make overlapping deployments deletable
@kubernetes/deployment ptal
Fixes https://github.com/kubernetes/kubernetes/issues/34466 by 1) not adding the overlapping annotation in the working deployment, 2) updates observedGeneration for overlapping deployments, and 3) updates the kubectl deployment reaper to do non-cascading deletion for deployments with the overlapping annotation.
Automatic merge from submit-queue
convert SA controller to shared informers
convert the SA controller to shared informer + workqueue.
I think one of @derekwaynecarr @ncdc or @liggitt
Automatic merge from submit-queue
Simplify negotiation in server in preparation for multi version support
This is a pre-factor for #33900 to simplify runtime.NegotiatedSerializer, tighten up a few abstractions that may break when clients can request different client versions, and pave the way for better negotiation.
View this as pure simplification.
Automatic merge from submit-queue
Add sync state loop in master's volume reconciler
At master volume reconciler, the information about which volumes are
attached to nodes is cached in actual state of world. However, this
information might be out of date in case that node is terminated (volume
is detached automatically). In this situation, reconciler assume volume
is still attached and will not issue attach operation when node comes
back. Pods created on those nodes will fail to mount.
This PR adds the logic to periodically sync up the truth for attached
volumes kept in
the actual state cache. If the volume is no longer attached to the node,
the actual state will be updated to reflect the truth. In turn,
reconciler will take actions if needed.
To avoid issuing many concurrent operations on cloud provider, this PR
tries to add batch operation to check whether a list of volumes are
attached to the node instead of one request per volume.
Automatic merge from submit-queue
Moving some force deletion logic from the NC into the PodGC
**What this PR does / why we need it**: Moves some pod force-deletion behavior into the PodGC, which is a better place for these.
This should be a NOP because we're just moving functionality
around and thanks to #35476, the podGC controller should always
run.
Related: https://github.com/kubernetes/kubernetes/pull/34160, https://github.com/kubernetes/kubernetes/issues/35145
cc @gmarek @kubernetes/sig-apps
At master volume reconciler, the information about which volumes are
attached to nodes is cached in actual state of world. However, this
information might be out of date in case that node is terminated (volume
is detached automatically). In this situation, reconciler assume volume
is still attached and will not issue attach operation when node comes
back. Pods created on those nodes will fail to mount.
This PR adds the logic to periodically sync up the truth for attached volumes kept in the actual state cache. If the volume is no longer attached to the node, the actual state will be updated to reflect the truth. In turn, reconciler will take actions if needed.
To avoid issuing many concurrent operations on cloud provider, this PR
tries to add batch operation to check whether a list of volumes are
attached to the node instead of one request per volume.
More details are explained in PR #33760
Alter how runtime.SerializeInfo is represented to simplify negotiation
and reduce the need to allocate during negotiation. Simplify the dynamic
client's logic around negotiating type. Add more tests for media type
handling where necessary.
Automatic merge from submit-queue
Fix potential panic in namespace controller when rapidly create/delet…
Fixes https://github.com/kubernetes/kubernetes/issues/33676
The theory is this could occur in either of the following scenarios:
1. HA environment where a GET to a different API server than what the WATCH was read from
1. In a many controller scenario (i.e. where multiple finalizers participate), a namespace that is created and deleted with the same name could trip up the other namespace controller to see a namespace with the same name that was not actually in a delete state. Added checks to verify uid matches across retry operations.
/cc @liggitt @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Node status updater should SetNodeStatusUpdateNeeded if it fails to
update status
When volume controller tries to update the node status, if it fails to
update the nodes status, it should call SetNodeStatusUpdateNeeded so
that the volume list could be updated next time.
Objects from shared informer must not be changed, they are shared among all
controllers.
This fixes CacheMutationDetector panic with this output:
CACHE *api.Node[5] ALTERED!
{"metadata":{"name":"ip-172-18-8-71.ec2.internal","selfLink":"/api/v1/nodes/ip-172-18-8-71.ec2.internal","uid":"73d07d16-976e-11e6-8225-0e2f14b56070","resourceVersion":"136","creationTimestamp":"2016-10-21T09:12:12Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/instance-type":"t2.medium","beta.kubernetes.io/os":"linux","failure-domain.beta.kubernetes.io/region":"us-east-1","failure-domain.beta.kubernetes.io/zone":"us-east-1d","kubernetes.io/hostname":"ip-172-18-8-71.ec2.internal"},"annotations":{"volumes.kubernetes.io/controller-managed-attach-detach":"true"}},"spec":{"externalID":"i-9cb6180f","providerID":"aws:///us-east-1d/i-9cb6180f"},"status":{"capacity":{"alpha.kubernetes.io/nvidia-gpu":"0","cpu":"2","memory":"4045568Ki","pods":"110"},"allocatable":{"alpha.kubernetes.io/nvidia-gpu":"0","cpu":"2","memory":"4045568Ki","pods":"110"},"conditions":[{"type":"OutOfDisk","status":"False","lastHeartbeatTime":"2016-10-21T09:12:52Z","lastTransitionTime":"2016-10-21T09:12:12Z","reason":"KubeletHasSufficientDisk","message":"kubelet has sufficient disk space available"},{"type":"MemoryPressure","status":"False","lastHeartbeatTime":"2016-10-21T09:12:52Z","lastTransitionTime":"2016-10-21T09:12:12Z","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},{"type":"DiskPressure","status":"False","lastHeartbeatTime":"2016-10-21T09:12:52Z","lastTransitionTime":"2016-10-21T09:12:12Z","reason":"KubeletHasNoDiskPressure","message":"kubelet has no disk pressure"},{"type":"InodePressure","status":"False","lastHeartbeatTime":"2016-10-21T09:12:52Z","lastTransitionTime":"2016-10-21T09:12:12Z","reason":"KubeletHasNoInodePressure","message":"kubelet has no inode pressure"},{"type":"Ready","status":"True","lastHeartbeatTime":"2016-10-21T09:12:52Z","lastTransitionTime":"2016-10-21T09:12:22Z","reason":"KubeletReady","message":"kubelet is posting ready status"}],"addresses":[{"type":"InternalIP","address":"172.18.8.71"},{"type":"LegacyHostIP","address":"172.18.8.71"},{"type":"ExternalIP","address":"54.85.104.236"}],"daemonEndpoints":{"kubeletEndpoint":{"Port":10250}},"nodeInfo":{"machineID":"78a79498db8e4fdc9ac24b5e436a982c","systemUUID":"EC2BB406-5467-4ABE-B54D-D9993C45714F","bootID":"2553d6b8-1ddb-4ef0-902a-d09a807b89ba","kernelVersion":"4.6.7-300.fc24.x86_64","osImage":"Fedora 24 (Cloud Edition)","containerRuntimeVersion":"docker://1.10.3","kubeletVersion":"v1.5.0-alpha.1.726+5aac5eddb809e4","kubeProxyVersion":"v1.5.0-alpha.1.726+5aac5eddb809e4","operatingSystem":"linux","architecture":"amd64"},"images":[{"names":["openshift/origin-release:latest"],"sizeBytes":714569002},{"names":["openshift/origin-haproxy-router-base:latest"],"sizeBytes":294417608},{"names":["openshift/origin-base:latest"],"sizeBytes":275310761},{"names":["docker.io/centos@sha256:2ae0d2c881c7123870114fb9cc7afabd1e31f9888dac8286884f6cf59373ed9b","docker.io/centos:centos7"],"sizeBytes":196744353},{"names":["gcr.io/google_containers/busybox@sha256:4bdd623e848417d96127e16037743f0cd8b528c026e9175e22a84f639eca58ff","gcr.io/google_containers/busybox:1.24"],"sizeBytes":1113554},{"names":["gcr.io/google_containers/pause-amd64@sha256:163ac025575b775d1c0f9bf0bdd0f086883171eb475b5068e7defa4ca9e76516","gcr.io/google_containers/pause-amd64:3.0"],"sizeBytes":746888}],"volumesInUse":["kubernetes.io/aws-ebs/aws://us-east-1d/vol-f4bd0352"]
A: ,"volumesAttached":[{"name":"kubernetes.io/aws-ebs/aws://us-east-1d/vol-f4bd0352","devicePath":"/dev/xvdba"}]}}
B: }}
update status
When volume controller tries to update the node status, if it fails to
update the nodes status, it should call SetNodeStatusUpdateNeeded so
that the volume list could be updated next time.
Restoring the code which was removed in 528bf7a. When there are no sufficient
resources on node or there is a conflicting host port event is emited.
It helps with debugging.
Fixes#31369
Automatic merge from submit-queue
Create restclient interface
Refactoring of code to allow replace *restclient.RESTClient with any RESTClient implementation that implements restclient.RESTClientInterface interface.
Automatic merge from submit-queue
Add an informer for StorageClass
Add an informer for `StorageClass` for later consumption in quota.
/cc @eparis @deads2k @erinboyd
Automatic merge from submit-queue
PVC informer lister supports listing
This will be used in follow-on PRs for quota evaluation backed by informers for pvcs.
/cc @deads2k @eparis @kubernetes/sig-storage
Automatic merge from submit-queue
Adding default StorageClass annotation printout for resource_printer and describer and some refactoring
adding ISDEFAULT for _kubectl get storageclass_ output
```
[root@screeley-sc1 gce]# kubectl get storageclass
NAME TYPE ISDEFAULT
another-class kubernetes.io/gce-pd NO
generic1-slow kubernetes.io/gce-pd YES
generic2-fast kubernetes.io/gce-pd YES
```
```release-note
Add ISDEFAULT to kubectl get storageClass output
```
@kubernetes/sig-storage
Automatic merge from submit-queue
fix more RS controller flakes
I saw another flake:
```
panic: Fail in goroutine after TestUpdatePods has completed
/usr/local/go/src/runtime/panic.go:500 +0x1ae
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:56 +0x17d
/usr/local/go/src/runtime/panic.go:458 +0x271
/usr/local/go/src/testing/testing.go:412 +0x182
/usr/local/go/src/testing/testing.go:484 +0x95
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/controller/replicaset/replica_set_test.go:619 +0x1d2
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/controller/replicaset/replica_set.go:414 +0x191
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/controller/replicaset/replica_set.go:403 +0x39
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/controller/replicaset/replica_set.go:169 +0x42
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:87 +0x70
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:88 +0xbe
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:49 +0x5b
/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/controller/replicaset/replica_set_test.go:625 +0x369
```
This resolves that by separating the listers from the watch like it used to be in this set of tests. The tests were like this before the refactor. I think they limit utility, but I'm not prepared to re-write them all.
@kargakis
Automatic merge from submit-queue
+optional tag for OpenAPI spec
OpenAPI rely on "omitempty" json tag to determine if a field is optional or not. This change will add "+optional" tag to all fields with "omitempty" json tag and support the tag in OpenAPI spec generator.
Automatic merge from submit-queue
controller: set minReadySeconds in deployment's replica sets
* Estimate available pods for a deployment by using minReadySeconds on
the replica set.
* Stop requeueing deployments on pod events, superseded by following the
replica set status.
* Cleanup redundant deployment utilities
Fixes https://github.com/kubernetes/kubernetes/issues/26079
@kubernetes/deployment ptal
Automatic merge from submit-queue
Pass whole PVC to provisioner plugins
Gluster provisioner is interested in namespace of PVCs that are being provisioned and I don't want to add at as a new field in `volume.VolumeOptions` - it would contain almost whole PVC.
Let's rework `VolumeOptions` and pass direct reference to PVC there instead of some "interesting" fields and let the provisioner to pick information it is interested in.
There was lot of refactoring in volume plugins to apply this change (too many plugins), however the logic is simple and it's all the same in all plugins.
@rootfs @humblec
Automatic merge from submit-queue
NodeController waits for informer sync before doing anything
cc @lavalamp @davidopp
```release-note
NodeController waits for full sync of all it's informers before taking any action.
```
Automatic merge from submit-queue
Run rbac authorizer from cache
RBAC authorization can be run very effectively out of a cache. The cache is a normal reflector backed cache (shared informer).
I've split this into three parts:
1. slim down the authorizer interfaces
1. boilerplate for adding rbac shared informers and associated listers which conform to the new interfaces
1. wiring
@liggitt @ericchiang @kubernetes/sig-auth
Automatic merge from submit-queue
Handle DeletedFinalStateUnknown in NodeController
Fix#34692
```release-note
Fix panic in NodeController caused by receiving DeletedFinalStateUnknown object from the cache.
```
cc @davidopp
* Estimate available pods for a deployment by using minReadySeconds on
the replica set.
* Stop requeueing deployments on pod events, superseded by following the
replica set status.
* Cleanup redundant deployment utilities
Automatic merge from submit-queue
Revert "Error out when any RS has more available pods then its spec r…
Reverts https://github.com/kubernetes/kubernetes/pull/29808
The PR is wrong because we can have more available pods than desired every time we scale down.
@kubernetes/deployment ptal
Automatic merge from submit-queue
Fix missleading comment
**What this PR does / why we need it**: It just fixes misleading comment. It took me some time to figure out real behavior.
Gluster provisioner is interested in pvc.Namespace and I don't want to add
at as a new field in VolumeOptions - it would contain almost whole PVC.
Let's pass direct reference to PVC instead and let the provisioner to pick
information it is interested in.
Automatic merge from submit-queue
Copy finalizers from template spec to pod.
**What this PR does / why we need it**: The PodTemplateSpec has a finalizers field whose contents are not copied over to a pod during creation.