Automatic merge from submit-queue (batch tested with PRs 38608, 38299)
controller: set unavailableReplicas correctly when scaling down
```
deployment_controller.go:299] Error syncing deployment
e2e-tests-kubectl-2l7xx/e2e-test-nginx-deployment:
Deployment.extensions "e2e-test-nginx-deployment" is invalid:
status.unavailableReplicas: Invalid value: -1:
must be greater than or equal to 0
```
The validation error above occurs usually when a Deployment is
scaled down. In such a case we should default unavailableReplicas
to 0 instead of making an invalid api call.
@kubernetes/deployment
Automatic merge from submit-queue
Remove json serialization annotations from internal types
fixes#3933
Internal types should never be serialized, and including json serialization tags on them makes it possible to accidentally do that without realizing it.
fixes in this PR:
* types
* [x] remove json tags from internal types
* [x] fix references from serialized types to internal ObjectMeta
* generation
* [x] remove generated json codecs for internal types (they should never be used)
* kubectl
* [x] fix `apply` to operate on versioned object
* [x] fix sorting by field to operate on versioned object
* [x] fix `--record` to build annotation patch using versioned object
* hpa
* [x] fix unmarshaling to internal CustomMetricTargetList in validation
* thirdpartyresources
* [x] fix encoding API responses using internal ObjectMeta
* tests
* [x] fix tests to use versioned objects when checking encoded content
* [x] fix tests passing internal objects to generic printers
follow ups (will open tracking issues or additional PRs):
- [ ] remove json tags from internal kubeconfig types (`kubectl config set` pathfinding needs to work against external type)
- [ ] HPA should version CustomMetricTargetList serialization in annotations
- [ ] revisit how TPR resthandlers encoding objects
- [ ] audit and add tests for printer use (human-readable printer requires internal versions, generic printers require external versions)
- [ ] add static analysis tests preventing new internal types from adding tags
- [ ] add static analysis tests requiring json tags on external types (and enforcing lower-case first letter)
- [ ] add more tests for `kubectl get` exercising known and unknown types with all output options
Automatic merge from submit-queue (batch tested with PRs 38354, 38371)
Add GetOptions parameter to Get() calls in client library
Ref #37473
This PR is super mechanical - the non trivial commits are:
- Update client generator
- Register GetOptions in batch/v2alpha1 group
Automatic merge from submit-queue (batch tested with PRs 38278, 37770)
Refactor REST storage to use generic defaults
This removes the repetition in the REST storage builders by moving the logic to `restoptions.ApplyOptions`. `registry.StorageWithCacher`/`generic.StorageDecorator` no longer assume that they can build the `keyFunc` for arbitrary objects. `restoptions.ApplyOptions` uses the `registry.Store`'s `KeyFunc` for its call to `generic.StorageDecorator`.
```release-note
Cluster federation servers have changed the location in etcd where federated services are stored, so existing federated services must be deleted and recreated. Before upgrading, export all federated services from the federation server and delete the services. After upgrading the cluster, recreate the federated services from the exported data.
```
Automatic merge from submit-queue (batch tested with PRs 38377, 36365, 36648, 37691, 38339)
Exponential back off when volume delete fails
**What this PR does / why we need it**:
This PR implements ability in pv_controller to back off when deleting a volume fails from plugin API.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Partly fixes#38295 , but I think volume delete is most problematic thing happening in pv_controller without any sort of backoff.
After this change the attempts of volume deletion look like:
```
controller : I1208 00:18:35.532061 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:20:50.578325 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:23:05.563488 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:25:20.599158 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:27:35.560009 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:29:50.594967 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:32:05.539168 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
controller : I1208 00:34:20.581665 16388 aws_util.go:55] Error deleting EBS Disk volume aws://us-east-1d/vol-abcdefg: VolumeInUse: Volume vol-abcdefg is currently attached to i-1234567
```
Automatic merge from submit-queue (batch tested with PRs 38377, 36365, 36648, 37691, 38339)
controller: sync stuck deployments in a secondary queue
@kubernetes/deployment this makes Deployments not depend on a tight resync interval in order to estimate progress.
This implements pv_controller to exponentially backoff
when deleting a volume fails in Cloud API. It ensures that
we aren't making too many calls to Cloud API
deployment_controller.go:299] Error syncing deployment
e2e-tests-kubectl-2l7xx/e2e-test-nginx-deployment:
Deployment.extensions "e2e-test-nginx-deployment" is invalid:
status.unavailableReplicas: Invalid value: -1:
must be greater than or equal to 0
The validation error above occurs usually when a Deployment is
scaled down. In such a case we should default unavailableReplicas
to 0 instead of making an invalid api call.
Automatic merge from submit-queue
Add integration tests for desire state of world populator
Add integration tests for desire state of world populator
This adds tests for code introduced here :
https://github.com/kubernetes/kubernetes/issues/26994
Via integration test we can now verify that if pod delete
event is somehow missed by AttachDetach controller - it still
get cleaned up by Desired State of World populator.
This adds tests for code introduced here :
https://github.com/kubernetes/kubernetes/issues/26994
Via integration test we can now verify that if pod delete
event is somehow missed by AttachDetach controller - it still
get cleaned up by Desired State of World populator.
Automatic merge from submit-queue (batch tested with PRs 37608, 37103, 37320, 37607, 37678)
Fix issue #37377: Report an event on successful PVC provisioning
This is a simple patch to fix the issue #37377: On a successful PVC provisioning an event is emitted so it's clear the provisioning actually succeeded.
cc: @jsafrane
Automatic merge from submit-queue (batch tested with PRs 37945, 37498, 37391, 37209, 37169)
Refactor certificate controller to make approval an interface
@mikedanese
Automatic merge from submit-queue
SetNodeUpdateStatusNeeded whenever nodeAdd event is received
**What this PR does / why we need it**:
Bug fix and SetNodeStatusUpdateNeeded for a node whenever its api object is added. This is to ensure that we don't lose the attached list of volumes in the node when its api object is deleted and recreated.
fixes https://github.com/kubernetes/kubernetes/issues/37586https://github.com/kubernetes/kubernetes/issues/37585
**Special notes for your reviewer**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
Automatic merge from submit-queue
rescale immediately if the basic constraints are not satisfied
refactor reconcileAutoscaler.
If the basic constraints are not satisfied, we should rescale the target ref immediately.
Automatic merge from submit-queue
fix rbac informer. it's listers are all internal
Fixes https://github.com/kubernetes/kubernetes/issues/37615
The rbac informer still uses internal types in its listers, which means it must use internal clients for evaluation. Since its running inside the API server, this seems ok for now and we can/should fix it when generated informers come along. This just patches us to keep RBAC working.
@kubernetes/sig-auth @sttts @liggitt this is broken in master, let's get it sorted quickly.
Automatic merge from submit-queue
Add namespace index for limit ranger
Without this PR I'm seeing a huge number of lines like this:
```
Index with name namespace does not exist
```
Those are coming from LimitRanger admission controller - this PR fixes those.
Automatic merge from submit-queue
controller manager refactors
The controller manager needs some significant cleanup. This starts us down the patch by respecting parameters like `stopCh`, simplifying discovery checks, removing unnecessary parameters, preventing unncessary fatals, and using our client builder.
@sttts @ncdc
Automatic merge from submit-queue
Fix package aliases to follow golang convention
Some package aliases are not not align with golang convention https://blog.golang.org/package-names. This PR fixes them. Also adds a verify script and presubmit checks.
Fixes#35070.
cc/ @timstclair @Random-Liu
Many providers need to do some sort of node name -> IP or instanceID
lookup before they can use the list of hostnames passed to
EnsureLoadBalancer/UpdateLoadBalancer.
This change just passes the full Node object instead of simply the node
name, allowing providers to use the node's provider ID and cached
addresses without additional lookups. Using `node.Name` reproduces the
old behaviour.
Automatic merge from submit-queue
hold namespaces briefly before processing deletion
possible fix for #36891
in HA scenarios (either HA apiserver or HA etcd), it is possible for deletion of resources from namespace cleanup to race with creation of objects in the terminating namespace
HA master timeline:
1. "delete namespace n" API call goes to apiserver 1, deletion timestamp is set in etcd
2. namespace controller observes namespace deletion, starts cleaning up resources, lists deployments
3. "create deployment d" API call goes to apiserver 2, gets persisted to etcd
4. apiserver 2 observes namespace deletion, stops allowing new objects to be created
5. namespace controller finishes deleting listed deployments, deletes namespace
HA etcd timeline:
1. "create deployment d" API call goes to apiserver, gets persisted to etcd
2. "delete namespace n" API call goes to apiserver, deletion timestamp is set in etcd
3. namespace controller observes namespace deletion, starts cleaning up resources, lists deployments
4. list call goes to non-leader etcd member that hasn't observed the new deployment or the deleted namespace yet
5. namespace controller finishes deleting the listed deployments, deletes namespace
In both cases, simply waiting to clean up the namespace (either for etcd members to observe objects created at the last second in the namespace, or for other apiservers to observe the namespace move to terminating phase and disallow additional creations) resolves the issue
Possible other fixes:
* do a second sweep of objects before deleting the namespace
* have the namespace controller check for and clean up objects in namespaces that no longer exist
* ...?
Automatic merge from submit-queue
Add more logging around Pod deletion
After this PR we'll have at least V(2) level log near all Pod deletions.
@saad-ali - this is required by GKE to help with diagnosing possible problem.
cc @dchen1107 @wojtek-t
Automatic merge from submit-queue
Add logs near force deletions of Pods
We should always log something when control plane force deletes the Pod.
@davidopp I think that logging force deletions is enough, or do you think we should log soft deletions as well?
cc @deads2k
Automatic merge from submit-queue
Fix kubectl Stratigic Merge Patch compatibility
As @smarterclayton pointed out in [comment1](https://github.com/kubernetes/kubernetes/pull/35647#pullrequestreview-8290820) and [comment2](https://github.com/kubernetes/kubernetes/pull/35647#pullrequestreview-8290847) in PR #35647,
we cannot assume the API servers publish version and they shares the same version.
This PR removes all the calls of GetServerSupportedSMPatchVersion().
Change the behavior of `apply` and `edit` to:
Retrying with the old patch version, if the new version fails.
Default other usage of SMPatch to the new version, since they don't update list of primitives.
fixes#36916
cc: @pwittrock @smarterclayton
This PR is to fix the issue in converting aws volume id from mount
paths. Currently there are three aws volume id formats supported. The
following lists example of those three formats and their corresponding
global mount paths:
1. aws:///vol-123456
(/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/vol-123456)
2. aws://us-east-1/vol-123456
(/var/lib/kubelet/plugins/kubernetes.io/mounts/aws/us-est-1/vol-123455)
3. vol-123456
(/var/lib/kubelet/plugins/kubernetes.io/mounts/aws/us-est-1/vol-123455)
For the first two cases, we need to check the mount path and convert
them back to the original format.
Automatic merge from submit-queue
Restore event messages for replica sets in the deployment controller
Needed to unblock release upgrade tests (see https://github.com/kubernetes/kubernetes/issues/36453)
@kubernetes/deployment ptal
Automatic merge from submit-queue
Fix strategic patch for list of primitive type with merge sementic
Fix strategic patch for list of primitive type when the patch strategy is `merge`.
Before: we cannot replace or delete an item in a list of primitive, e.g. string, when the patch strategy is `merge`. It will always append new items to the list.
This patch will generate a map to update the list of primitive type.
The server with this patch will accept either a new patch or an old patch.
The client will found out the APIserver version before generate the patch.
Fixes#35163, #32398
cc: @pwittrock @fabianofranz
``` release-note
Fix strategic patch for list of primitive type when patch strategy is `merge` to remove deleted objects.
```
Automatic merge from submit-queue
Do not handle AlreadyExists errors yet
Until we fix https://github.com/kubernetes/kubernetes/issues/29735 (use a new hashing algo) we should not handle AlreadyExists (was added recently in the perma-failed PR).
@kubernetes/deployment
Automatic merge from submit-queue
Better messaging for missing volume binaries on host
**What this PR does / why we need it**:
When mount binaries are not present on a host, the error returned is a generic one.
This change is to check the mount binaries before the mount and return a user-friendly error message.
This change is specific to GCI and the flag is experimental now.
https://github.com/kubernetes/kubernetes/issues/36098
**Release note**:
Introduces a flag `check-node-capabilities-before-mount` which if set, enables a check (`CanMount()`) prior to mount operations to verify that the required components (binaries, etc.) to mount the volume are available on the underlying node. If the check is enabled and `CanMount()` returns an error, the mount operation fails. Implements the `CanMount()` check for NFS.
Sample output post change :
rkouj@rkouj0:~/go/src/k8s.io/kubernetes$ kubectl describe pods
Name: sleepyrc-fzhyl
Namespace: default
Node: e2e-test-rkouj-minion-group-oxxa/10.240.0.3
Start Time: Mon, 07 Nov 2016 21:28:36 -0800
Labels: name=sleepy
Status: Pending
IP:
Controllers: ReplicationController/sleepyrc
Containers:
sleepycontainer1:
Container ID:
Image: gcr.io/google_containers/busybox
Image ID:
Port:
Command:
sleep
6000
QoS Tier:
cpu: Burstable
memory: BestEffort
Requests:
cpu: 100m
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
data:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 127.0.0.1
Path: /export
ReadOnly: false
default-token-d13tj:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-d13tj
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
7s 7s 1 {default-scheduler } Normal Scheduled Successfully assigned sleepyrc-fzhyl to e2e-test-rkouj-minion-group-oxxa
6s 3s 4 {kubelet e2e-test-rkouj-minion-group-oxxa} Warning FailedMount Unable to mount volume kubernetes.io/nfs/32c7ef16-a574-11e6-813d-42010af00002-data (spec.Name: data) on pod sleepyrc-fzhyl (UID: 32c7ef16-a574-11e6-813d-42010af00002). Verify that your node machine has the required components before attempting to mount this volume type. Required binary /sbin/mount.nfs is missing
Automatic merge from submit-queue
Implement external provisioning proposal
In other words, add "provisioned-by" annotation to all PVCs that should be provisioned dynamically.
Most of the changes are actually in tests.
@kubernetes/sig-storage
Automatic merge from submit-queue
Use generation in pod disruption budget
Fixes#35324
Previously it was possible to use allowedDirsruptions calculated for the previous spec with the current spec. With generation check API servers always make sure that allowedDisruptions were calculated for the current spec.
At the same time I set the registry policy to only accept updates if the version based on which the update was made matches to the current version in etcd. That ensures that parallel eviction executions don't use the same allowed disruption.
cc: @davidopp @kargakis @wojtek-t
Automatic merge from submit-queue
Use available informers in quota replenishment
more iteration on the goal to use informers where available in quota system. this time adding persistent volume claims so the same informer is used here and https://github.com/kubernetes/kubernetes/pull/36316
Currently, the HPA considers unready pods the same as ready pods when
looking at their CPU and custom metric usage. However, pods frequently
use extra CPU during initialization, so we want to consider them
separately.
This commit causes the HPA to consider unready pods as having 0 CPU
usage when scaling up, and ignores them when scaling down. If, when
scaling up, factoring the unready pods as having 0 CPU would cause a
downscale instead, we simply choose not to scale. Otherwise, we simply
scale up at the reduced amount caculated by factoring the pods in at
zero CPU usage.
The effect is that unready pods cause the autoscaler to be a bit more
conservative -- large increases in CPU usage can still cause scales,
even with unready pods in the mix, but will not cause the scale factors
to be as large, in anticipation of the new pods later becoming ready and
handling load.
Similarly, if there are pods for which no metrics have been retrieved,
these pods are treated as having 100% of the requested metric when
scaling down, and 0% when scaling up. As above, this cannot change the
direction of the scale.
This commit also changes the HPA to ignore superfluous metrics -- as
long as metrics for all ready pods are present, the HPA we make scaling
decisions. Currently, this only works for CPU. For custom metrics, we
cannot identify which metrics go to which pods if we get superfluous
metrics, so we abort the scale.
Automatic merge from submit-queue
Rename ScheduledJobs to CronJobs
I went with @smarterclayton idea of registering named types in schema. This way we can support both the new (CronJobs) and old (ScheduledJobs) resource name. Fixes#32150.
fyi @erictune @caesarxuchao @janetkuo
Not ready yet, but getting close there...
**Release note**:
```release-note
Rename ScheduledJobs to CronJobs.
```
Automatic merge from submit-queue
Add more events to disruption controller
To provide users with information that their PDB may not be working as intended.
cc: @davidopp
Automatic merge from submit-queue
Fix possible race in operationNotSupportedCache
Because we can run multiple workers to delete namespaces simultaneously, the
operationNotSupportedCache needs to be guarded with a mutex to avoid concurrent
map read/write errors.
Automatic merge from submit-queue
lister-gen updates
- Remove "zz_generated." prefix from generated lister file names
- Add support for expansion interfaces
- Switch to new generated JobLister
@deads2k @liggitt @sttts @mikedanese @caesarxuchao for the lister-gen changes
@soltysh @deads2k for the informer / job controller changes
Automatic merge from submit-queue
Remove GetRootContext method from VolumeHost interface
Remove the `GetRootContext` call from the `VolumeHost` interface, since Kubernetes no longer needs to know the SELinux context of the Kubelet directory.
Per #33951 and #35127.
Depends on #33663; only the last commit is relevant to this PR.
Automatic merge from submit-queue
Update how we detect overlapping deployments
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#24152
**Special notes for your reviewer**: cc @kubernetes/deployment
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
NONE
```
When looking for overlapping deployments, we should also find other deployments that select current deployment's pods,
not just the ones whose pods are selected by current deployment.
Automatic merge from submit-queue
Controller changes for perma failed deployments
This PR adds support for reporting failed deployments based on a timeout
parameter defined in the spec. If there is no progress for the amount
of time defined as progressDeadlineSeconds then the deployment will be
marked as failed by a Progressing condition with a ProgressDeadlineExceeded
reason.
Follow-up to https://github.com/kubernetes/kubernetes/pull/19343
Docs at kubernetes/kubernetes.github.io#1337
Fixes https://github.com/kubernetes/kubernetes/issues/14519
@kubernetes/deployment @smarterclayton
Because we can run multiple workers to delete namespaces simultaneously, the
operationNotSupportedCache needs to be guarded with a mutex to avoid concurrent
map read/write errors.
This commit adds support for failing deployments based on a timeout
parameter defined in the spec. If there is no progress for the amount
of time defined as progressDeadlineSeconds then the deployment will be
marked as failed by adding a condition with a ProgressDeadlineExceeded
reason in it. Progress in the context of a deployment means the creation
or adoption of a new replica set, scaling up new pods, and scaling down
old pods.
Automatic merge from submit-queue
Fix how we iterate over active jobs when removing them for Replace policy
When fixing the Replace Active removal I used wrong for loop construct which panics :/ This PR fixes that by using for range.
@janetkuo ptal
@jessfraz this will also be a cherry-pick candidate for 1.4, I remember we've picked the aforementioned fix as well
Automatic merge from submit-queue
Set reason and message on Pod during nodecontroller eviction
**What this PR does / why we need it**: Pods which are evicted by the nodecontroller due to network partition, or unresponsive kubelet should be differentiated from termination initiated by other sources. The reason/message are consumed by kubectl to provide a better summary using get/describe.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#35725
**Release note**:
```release-note
Pods that are terminating due to eviction by the nodecontroller (typically due to unresponsive kubelet, or network partition) now surface in `kubectl get` output
as being in state "Unknown", along with a longer description in `kubectl describe` output.
```
Pods which are evicted by the nodecontroller due to network
malfunction, or unresponsive kubelet should be differentiated
from termination initiated by other sources. The reason/message
are consumed by kubectl to provide a better summary using get/describe.
When looking for overlapping deployments, we should also find other deployments that select current deployment's pods,
not just the ones whose pods are selected by current deployment.
Automatic merge from submit-queue
Making the pod.alpha.kubernetes.io/initialized annotation optional in PetSet pods
**What this PR does / why we need it**: As of now, the absence of the annotation `pod.alpha.kubernetes.io/initialized` in PetSets causes the PetSet controller to effectively "pause". Being a debug hook, users expect that its absence has no effect on the working of a PetSet. This PR inverts the logic so that we let the PetSet controller operate as expected in the absence of the annotation.
Letting the annotation remain alpha seems ok. Renaming it to something more meaningful needs further discussion.
**Which issue this PR fixes** _(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)_: fixes https://github.com/kubernetes/kubernetes/issues/35498
**Special notes for your reviewer**:
**Release note**:
``` release-note
The annotation "pod.alpha.kubernetes.io/initialized" on StatefulSets (formerly PetSets) is now optional and only encouraged for debug use.
```
cc @erictune @smarterclayton @bprashanth @kubernetes/sig-apps
@kow3ns The examples will need to be cleaned up as well I think later on to remove them.
Automatic merge from submit-queue
Node controller to not force delete pods
Fixes https://github.com/kubernetes/kubernetes/issues/35145
- [x] e2e tests to test Petset, RC, Job.
- [x] Remove and cover other locations where we force-delete pods within the NodeController.
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
``` release-note
Node controller no longer force-deletes pods from the api-server.
* For StatefulSet (previously PetSet), this change means creation of replacement pods is blocked until old pods are definitely not running (indicated either by the kubelet returning from partitioned state, or deletion of the Node object, or deletion of the instance in the cloud provider, or force deletion of the pod from the api-server). This has the desirable outcome of "fencing" to prevent "split brain" scenarios.
* For all other existing controllers except StatefulSet , this has no effect on the ability of the controller to replace pods because the controllers do not reuse pod names (they use generate-name).
* User-written controllers that reuse names of pod objects should evaluate this change.
```
Automatic merge from submit-queue
Add tooling to generate listers
Add lister-gen tool to auto-generate listers. So far this PR only demonstrates replacing the manually-written `StoreToLimitRangeLister` with the generated `LimitRangeLister`, as it's a small and easy swap.
cc @deads2k @liggitt @sttts @nikhiljindal @lavalamp @smarterclayton @derekwaynecarr @kubernetes/sig-api-machinery @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
quota controller uses informers if available for pod calculation
This PR does the following:
1. plumb informer factory into quota registry and evaluators
2. pod quota evaluator uses informers for determining aggregrate usage instead of making direct calls
3. admission code path does not use informers because
1. we do not want to add new watches in apiserver
2. admission code path does not require aggregate usage calculation
As a result, quota controller is much faster in re-calculating quota usage when it observes a pod deletion.
Follow-on PRs will make similar changes for other informer backed resources (pvcs next).
/cc @deads2k @mfojtik @smarterclayton @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Remove Job also from .status.active for Replace strategy
When iterating over list of Jobs we're removing each of them when strategy is replace. Unfortunately, the job reference was not removed from `.status.active` which cause the controller trying to remove it once again during next run and failed removing what was already removed during previous run. This was cause by not removing the reference previously. This PR fixes that and cleans logs a bit, in that controller.
@erictune fyi
@janetkuo ptal
Automatic merge from submit-queue
Let release_1_5 clientset include multiple versions of a group
Fix#35237
This PR make versioned clientset to include multiple versions of a group. Currently only `batch` has `v1` and `v2alpha1`. The clientset interface now looks like:
```go
BatchV2alpha1() v2alpha1batch.BatchV2alpha1Interface
BatchV1() v1batch.BatchV1Interface
// Deprecated: please explicitly pick a version if possible.
Batch() v1batch.BatchV1Interface
```
Commit "update client-gen to say internalversion rather than unversioned" fixes https://github.com/kubernetes/kubernetes/issues/24481.
cc @kubernetes/sig-api-machinery @soltysh @deads2k @nikhiljindal
```release-note
release_1_5 clientset supports multiple versions of a group.
```
Automatic merge from submit-queue
Make overlapping deployments deletable
@kubernetes/deployment ptal
Fixes https://github.com/kubernetes/kubernetes/issues/34466 by 1) not adding the overlapping annotation in the working deployment, 2) updates observedGeneration for overlapping deployments, and 3) updates the kubectl deployment reaper to do non-cascading deletion for deployments with the overlapping annotation.
Automatic merge from submit-queue
convert SA controller to shared informers
convert the SA controller to shared informer + workqueue.
I think one of @derekwaynecarr @ncdc or @liggitt
Automatic merge from submit-queue
Simplify negotiation in server in preparation for multi version support
This is a pre-factor for #33900 to simplify runtime.NegotiatedSerializer, tighten up a few abstractions that may break when clients can request different client versions, and pave the way for better negotiation.
View this as pure simplification.
Automatic merge from submit-queue
Add sync state loop in master's volume reconciler
At master volume reconciler, the information about which volumes are
attached to nodes is cached in actual state of world. However, this
information might be out of date in case that node is terminated (volume
is detached automatically). In this situation, reconciler assume volume
is still attached and will not issue attach operation when node comes
back. Pods created on those nodes will fail to mount.
This PR adds the logic to periodically sync up the truth for attached
volumes kept in
the actual state cache. If the volume is no longer attached to the node,
the actual state will be updated to reflect the truth. In turn,
reconciler will take actions if needed.
To avoid issuing many concurrent operations on cloud provider, this PR
tries to add batch operation to check whether a list of volumes are
attached to the node instead of one request per volume.
Automatic merge from submit-queue
Moving some force deletion logic from the NC into the PodGC
**What this PR does / why we need it**: Moves some pod force-deletion behavior into the PodGC, which is a better place for these.
This should be a NOP because we're just moving functionality
around and thanks to #35476, the podGC controller should always
run.
Related: https://github.com/kubernetes/kubernetes/pull/34160, https://github.com/kubernetes/kubernetes/issues/35145
cc @gmarek @kubernetes/sig-apps
At master volume reconciler, the information about which volumes are
attached to nodes is cached in actual state of world. However, this
information might be out of date in case that node is terminated (volume
is detached automatically). In this situation, reconciler assume volume
is still attached and will not issue attach operation when node comes
back. Pods created on those nodes will fail to mount.
This PR adds the logic to periodically sync up the truth for attached volumes kept in the actual state cache. If the volume is no longer attached to the node, the actual state will be updated to reflect the truth. In turn, reconciler will take actions if needed.
To avoid issuing many concurrent operations on cloud provider, this PR
tries to add batch operation to check whether a list of volumes are
attached to the node instead of one request per volume.
More details are explained in PR #33760
Alter how runtime.SerializeInfo is represented to simplify negotiation
and reduce the need to allocate during negotiation. Simplify the dynamic
client's logic around negotiating type. Add more tests for media type
handling where necessary.
Automatic merge from submit-queue
Fix potential panic in namespace controller when rapidly create/delet…
Fixes https://github.com/kubernetes/kubernetes/issues/33676
The theory is this could occur in either of the following scenarios:
1. HA environment where a GET to a different API server than what the WATCH was read from
1. In a many controller scenario (i.e. where multiple finalizers participate), a namespace that is created and deleted with the same name could trip up the other namespace controller to see a namespace with the same name that was not actually in a delete state. Added checks to verify uid matches across retry operations.
/cc @liggitt @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Node status updater should SetNodeStatusUpdateNeeded if it fails to
update status
When volume controller tries to update the node status, if it fails to
update the nodes status, it should call SetNodeStatusUpdateNeeded so
that the volume list could be updated next time.
Objects from shared informer must not be changed, they are shared among all
controllers.
This fixes CacheMutationDetector panic with this output:
CACHE *api.Node[5] ALTERED!
{"metadata":{"name":"ip-172-18-8-71.ec2.internal","selfLink":"/api/v1/nodes/ip-172-18-8-71.ec2.internal","uid":"73d07d16-976e-11e6-8225-0e2f14b56070","resourceVersion":"136","creationTimestamp":"2016-10-21T09:12:12Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/instance-type":"t2.medium","beta.kubernetes.io/os":"linux","failure-domain.beta.kubernetes.io/region":"us-east-1","failure-domain.beta.kubernetes.io/zone":"us-east-1d","kubernetes.io/hostname":"ip-172-18-8-71.ec2.internal"},"annotations":{"volumes.kubernetes.io/controller-managed-attach-detach":"true"}},"spec":{"externalID":"i-9cb6180f","providerID":"aws:///us-east-1d/i-9cb6180f"},"status":{"capacity":{"alpha.kubernetes.io/nvidia-gpu":"0","cpu":"2","memory":"4045568Ki","pods":"110"},"allocatable":{"alpha.kubernetes.io/nvidia-gpu":"0","cpu":"2","memory":"4045568Ki","pods":"110"},"conditions":[{"type":"OutOfDisk","status":"False","lastHeartbeatTime":"2016-10-21T09:12:52Z","lastTransitionTime":"2016-10-21T09:12:12Z","reason":"KubeletHasSufficientDisk","message":"kubelet has sufficient disk space available"},{"type":"MemoryPressure","status":"False","lastHeartbeatTime":"2016-10-21T09:12:52Z","lastTransitionTime":"2016-10-21T09:12:12Z","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},{"type":"DiskPressure","status":"False","lastHeartbeatTime":"2016-10-21T09:12:52Z","lastTransitionTime":"2016-10-21T09:12:12Z","reason":"KubeletHasNoDiskPressure","message":"kubelet has no disk pressure"},{"type":"InodePressure","status":"False","lastHeartbeatTime":"2016-10-21T09:12:52Z","lastTransitionTime":"2016-10-21T09:12:12Z","reason":"KubeletHasNoInodePressure","message":"kubelet has no inode pressure"},{"type":"Ready","status":"True","lastHeartbeatTime":"2016-10-21T09:12:52Z","lastTransitionTime":"2016-10-21T09:12:22Z","reason":"KubeletReady","message":"kubelet is posting ready status"}],"addresses":[{"type":"InternalIP","address":"172.18.8.71"},{"type":"LegacyHostIP","address":"172.18.8.71"},{"type":"ExternalIP","address":"54.85.104.236"}],"daemonEndpoints":{"kubeletEndpoint":{"Port":10250}},"nodeInfo":{"machineID":"78a79498db8e4fdc9ac24b5e436a982c","systemUUID":"EC2BB406-5467-4ABE-B54D-D9993C45714F","bootID":"2553d6b8-1ddb-4ef0-902a-d09a807b89ba","kernelVersion":"4.6.7-300.fc24.x86_64","osImage":"Fedora 24 (Cloud Edition)","containerRuntimeVersion":"docker://1.10.3","kubeletVersion":"v1.5.0-alpha.1.726+5aac5eddb809e4","kubeProxyVersion":"v1.5.0-alpha.1.726+5aac5eddb809e4","operatingSystem":"linux","architecture":"amd64"},"images":[{"names":["openshift/origin-release:latest"],"sizeBytes":714569002},{"names":["openshift/origin-haproxy-router-base:latest"],"sizeBytes":294417608},{"names":["openshift/origin-base:latest"],"sizeBytes":275310761},{"names":["docker.io/centos@sha256:2ae0d2c881c7123870114fb9cc7afabd1e31f9888dac8286884f6cf59373ed9b","docker.io/centos:centos7"],"sizeBytes":196744353},{"names":["gcr.io/google_containers/busybox@sha256:4bdd623e848417d96127e16037743f0cd8b528c026e9175e22a84f639eca58ff","gcr.io/google_containers/busybox:1.24"],"sizeBytes":1113554},{"names":["gcr.io/google_containers/pause-amd64@sha256:163ac025575b775d1c0f9bf0bdd0f086883171eb475b5068e7defa4ca9e76516","gcr.io/google_containers/pause-amd64:3.0"],"sizeBytes":746888}],"volumesInUse":["kubernetes.io/aws-ebs/aws://us-east-1d/vol-f4bd0352"]
A: ,"volumesAttached":[{"name":"kubernetes.io/aws-ebs/aws://us-east-1d/vol-f4bd0352","devicePath":"/dev/xvdba"}]}}
B: }}
update status
When volume controller tries to update the node status, if it fails to
update the nodes status, it should call SetNodeStatusUpdateNeeded so
that the volume list could be updated next time.