Automatic merge from submit-queue
add metrics for workqueues
Adds prometheus metrics to work queues and enables them for the resourcequota controller. It would be easy to add this to all other workqueue based controllers and gather basic responsiveness metrics.
@kubernetes/rh-cluster-infra helps debug quota controller responsiveness problems.
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/30296)
<!-- Reviewable:end -->
Automatic merge from submit-queue
pkg/controller/garbagecollector: simplify mutexes.
pkg/controller/garbagecollector: simplified synchronization and made idiomatic.
Similar to #29598, we can rely on the zero-value construction behavior
to embed `sync.Mutex` into parent structs.
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/29898)
<!-- Reviewable:end -->
Automatic merge from submit-queue
HPA: ignore scale targets whose replica count is 0
Disable HPA when the user (or another component) explicitly sets the replicas to 0.
Fixes#28603
@kubernetes/autoscaling @fgrzadkowski @kubernetes/rh-cluster-infra @smarterclayton @ncdc
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/29212)
<!-- Reviewable:end -->
Also, fix unit tests to have the same claim and volume sizes in most of the
tests where we don't test matching based on size and test for a specific size
when we do actually test the matching.
Automatic merge from submit-queue
Run goimport for the whole repo
While removing GOMAXPROC and running goimports, I noticed quite a lot of other files also needed a goimport format. Didn't commit `*.generated.go`, `*.deepcopy.go` or files in `vendor`
This is more for testing if it builds.
The only strange thing here is the gopkg.in/gcfg.v1 => github.com/scalingdata/gcfg replace.
cc @jfrazelle @thockin
Automatic merge from submit-queue
optimize conditions of ServiceReplenishmentUpdateFunc to replenish service
Originally, the replenishQuota method didn't focus on the third parameter object even if others transfered to it, i think the function is not efficient and perfect. then i use the third param to get MatchResources, it will be more exact. for example, if the old pod was quota tracked and the new was not, the replenishQuota only focus on usage resource of the old pod, still if the third parameter object is nil, the process will be same as before
Automatic merge from submit-queue
refactor quota calculation for re-use
Refactors quota calculation to allow reuse. This will allow us to do "punch through" calculation inside of admission if a particular quota needs usage stats and allows downstream re-use by moving calculation closer to the evaluators and separating "needs calculation" logic from "do calculation".
@derekwaynecarr
Automatic merge from submit-queue
Rewrite service controller to apply best controller pattern
This PR is a long term solution for #21625:
We apply the same pattern like replication controller to service controller to avoid the potential process order messes in service controller, the change includes:
1. introduce informer controller to watch service changes from kube-apiserver, so that every changes on same service will be kept in serviceStore as the only element.
2. put the service name to be processed to working queue
3. when process service, always get info from serviceStore to ensure the info is up-to-date
4. keep the retry mechanism, sleep for certain interval and add it back to queue.
5. remote the logic of reading last service info from kube-apiserver before processing the LB info as we trust the info from serviceStore.
The UT has been passed, manual test passed after I hardcode the cloud provider as FakeCloud, however I am not able to boot a k8s cluster with any available cloudprovider, so e2e test is not done.
Submit this PR first for review and for triggering a e2e test.
Automatic merge from submit-queue
Fix deployment e2e test: waitDeploymentStatus should error when entering an invalid state
Follow up #28162
1. We should check that max unavailable and max surge aren't violated at all times in e2e tests (didn't check this in deployment scaled rollout yet, but we should wait for it to become valid and then continue do the check until it finishes)
2. Fix some minor bugs in e2e tests
@kubernetes/deployment
Automatic merge from submit-queue
Stabilize volume unit tests by waiting for exact state
Wait for specific final state instead of waiting for specific number of
operations in controller unit tests. The tests are more readable and will survive
random goroutine ordering (PV and PVC controller have both their own
goroutine).
@kubernetes/sig-storage
Automatic merge from submit-queue
add union registry for quota
Adds the ability to combine multiple quota registries together. Kube needs this for other types.
@derekwaynecarr
Automatic merge from submit-queue
Error out when any RS has more available pods then its spec replicas
Fixes#29559 (hopefully, if not the bot will open new issues for us)
@kubernetes/deployment
Automatic merge from submit-queue
pkg/various: plug leaky time.New{Timer,Ticker}s
According to the documentation for Go package time, `time.Ticker` and
`time.Timer` are uncollectable by garbage collector finalizers. They
leak until otherwise stopped. This commit ensures that all remaining
instances are stopped upon departure from their relative scopes.
Similar efforts were incrementally done in #29439 and #29114.
```release-note
* pkg/various: plugged various time.Ticker and time.Timer leaks.
```
Automatic merge from submit-queue
pkg/controller/node/nodecontroller: simplify mutex
Similar to #29598, we can rely on the zero-value construction behavior
to embed `sync.Mutex` into parent structs.
/CC: @saad-ali
Automatic merge from submit-queue
make the resource prefix in etcd configurable for cohabitation
This looks big, its not as bad as it seems.
When you have different resources cohabiting, the resource name used for the etcd directory needs to be configurable. HPA in two different groups worked fine before. Now we're looking at something like RC<->RS. They normally store into two different etcd directories. This code allows them to be configured to store into the same location.
To maintain consistency across all resources, I allowed the `StorageFactory` to indicate which `ResourcePrefix` should be used inside `RESTOptions` which already contains storage information.
@lavalamp affects cohabitation.
@smarterclayton @mfojtik prereq for our rc<->rs and d<->dc story.
For non-attachable volumes, do not call GetVolumeName on the plugin and instead
generate a unique name based on the identity of the pod and the name of the volume
within the pod.
Automatic merge from submit-queue
Add an Azure CloudProvider Implementation
This PR adds `Azure` as a cloudprovider provider for Kubernetes. It specifically adds support for native pod networking (via Azure User Defined Routes) and L4 Load Balancing (via Azure Load Balancers).
I did have to add `clusterName` as a parameter to the `LoadBalancers` methods. This is because Azure only allows one "LoadBalancer" object per set of backend machines. This means a single "LoadBalancer" object must be shared across the cluster. The "LoadBalancer" is named via the `cluster-name` parameter passed to `kube-controller-manager` so as to enable multiple clusters per resource group if the user desires such a configuration.
There are few things that I'm a bit unsure about:
1. The implementation of the `Instances` interface. It's not extensively documented, it's not really clear what the different functions are used for, and my questions on the ML didn't get an answer.
2. Counter to the comments on the `LoadBalancers` Interface, I modify the `api.Service` object in `EnsureLoadBalancerDeleted`, but not with the intention of affecting Kube's view of the Service. I simply do it so that I can remove the `Port`s on the `Service` object and then re-use my reconciliation logic that can handle removing stale/deleted Ports.
3. The logging is a bit verbose. I'm looking for guidance on the appropriate log level to use for the chattier bits.
Due to the (current) lack of Instance Metadata Service and lack of Virtual Machine Identity in Azure, the user is required to do a few things to opt-in to this provider. These things are called-out as they are in contrast to AWS/GCE:
1. The user must provision an Azure Active Directory ServicePrincipal with `Contributor` level access to the resource group that the cluster is deployed in. This creation process is documented [by Hashicorp](https://www.packer.io/docs/builders/azure-setup.html) or [on the MSDN Blog](https://blogs.msdn.microsoft.com/arsen/2016/05/11/how-to-create-and-test-azure-service-principal-using-azure-cli/).
2. The user must place a JSON file somewhere on each Node that conforms to the `AzureConfig` struct defined in `azure.go`. (This is automatically done in the Azure flavor of [Kubernetes-Anywhere](https://github.com/kubernetes/kubernetes-anywhere).)
3. The user must specify `--cloud-config=/path/to/azure.json` as an option to `kube-apiserver` and `kube-controller-manager` similarly to how the user would need to pass `--cloud-provider=azure`.
I've been running approximately this code for a month and a half. I only encountered one bug which has since been fixed and covered by a unit test. I've just deployed a new cluster (and a Type=LoadBalancer nginx Service) using this code (via `kubernetes-anywhere`) and have posted [the `kube-controller-manager` logs](https://gist.github.com/colemickens/1bf6a26e7ef9484a72a30b1fcf9fc3cb) for anyone who is interested in seeing the logs of the logic.
If you're interested in this PR, you can use the instructions in my [`azure-kubernetes-demo` repository](https://github.com/colemickens/azure-kubernetes-demo) to deploy a cluster with minimal effort via [`kubernetes-anywhere`](https://github.com/kubernetes/kubernetes-anywhere). (There is currently [a pending PR in `kubernetes-anywhere` that is needed](https://github.com/kubernetes/kubernetes-anywhere/pull/172) in conjuncture with this PR). I also have a pre-built `hyperkube` image: `docker.io/colemickens/hyperkube-amd64:v1.4.0-alpha.0-azure`, which will be kept in sync with the branch this PR stems from.
I'm hoping this can land in the Kubernetes 1.4 timeframe.
CC (potential code reviewers from Azure): @ahmetalpbalkan @brendandixon @paulmey
CC (other interested Azure folk): @brendandburns @johngossman @anandramakrishna @jmspring @jimzim
CC (others who've expressed interest): @codefx9 @edevil @thockin @rootfs
According to the documentation for Go package time, `time.Ticker` and
`time.Timer` are uncollectable by garbage collector finalizers. They
leak until otherwise stopped. This commit ensures that all remaining
instances are stopped upon departure from their relative scopes.
Automatic merge from submit-queue
To break the loop when object found in removeOrphanFinalizer()
To break the loop when object found in removeOrphanFinalizer()
Automatic merge from submit-queue
Allow shareable resources for admission control plugins.
Changes allow admission control plugins to share resources. This is done via new PluginInitialization structure. The structure can be extended for other resources, for now it is an shared informer for namespace plugins (NamespiceLifecycle, NamespaceAutoProvisioning, NamespaceExists).
If a plugins needs some kind of shared resource e.g. client, the client shall be added to PluginInitializer and Wants methods implemented to every plugin which will use it.
Automatic merge from submit-queue
controller/service: minor cleanup
1. always handle short case first for if statement
2. do not capitalize error message
3. put the mutex before the fields it protects
4. prefer switch over if elseif.
Automatic merge from submit-queue
use a separate queue for initial quota calculation
When the quota controller gets backed up on resyncs, it can take a long time to observe the first usage stats which are needed by the admission plugin. This creates a second queue to prioritize the initial calculation.
Automatic merge from submit-queue
Fix "PVC Volume not detached if pod deleted via namespace deletion" issue
Fixes#29051: "PVC Volume not detached if pod deleted via namespace deletion"
This PR:
* Fixes a bug in `desired_state_of_the_world_populator.go` to check the value of `exists` returned by the `podInformer` so that it can delete pods even if the delete event is missed (or fails).
* Reduces the desired state of the world populators sleep period from 5 min to 1 min (reducing the amount of time a volume would remain attached if a volume delete event is missed or fails).
Automatic merge from submit-queue
Allow mounts to run in parallel for non-attachable volumes
This PR:
* Fixes https://github.com/kubernetes/kubernetes/issues/28616
* Enables mount volume operations to run in parallel for non-attachable volume plugins.
* Enables unmount volume operations to run in parallel for all volume plugins.
* Renames `GoRoutineMap` to `GoroutineMap`, resolving a long outstanding request from @thockin: `"Goroutine" is a noun`
When a new rollout with a different size than the previous size of the
deployment is initiated then only the new replica set will notice the
new size. Old replica sets are not updated by the rollout path.
Allow mount volume operations to run in parallel for non-attachable
volume plugins.
Allow unmount volume operations to run in parallel for all volume
plugins.
Automatic merge from submit-queue
Certificate signing controller for TLS bootstrap (alpha)
The controller handles generating and signing certificates when a CertificateSigningRequest has the "Approved" condition. Uses cfssl to support a wide set of possible keys and algorithms. Depends on PR #25562, only the last two commits are relevant to this PR.
cc @mikedanese
[]()
Automatic merge from submit-queue
Improve quota controller performance by eliminating unneeded list calls
Previously, when syncing quota usage, we asked each registered `Evaluator` to determine the usage it knows to track associated with a `GroupKind` even if that particular `GroupKind` had no associated resources under quota.
This fix makes it that when we sync a quota that just had only `Pod` related compute resources, we do not also calculate the usage stats for things like `ConfigMap`, `Secret`, etc. per quota.
This should be a significant performance gain when running large numbers of `Namespace`'s each with `ResourceQuota` that tracks a subset of resources.
/cc @deads2k @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
[GarbageCollector] Let the RC manager set/remove ControllerRef
What's done:
* RC manager sets Controller Ref when creating new pods
* RC manager sets Controller Ref when adopting pods with matching labels but having no controller
* RC manager clears Controller Ref when pod labels change
* RC manager clears pods' Controller Ref when rc's selector changes
* RC manager stops adoption/creating/deleting pods when rc's DeletionTimestamp is set
* RC manager bumps up ObservedGeneration: The [original code](https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replication/replication_controller_utils.go#L36) will do this.
* Integration tests:
* verifies that changing RC's selector or Pod's Labels triggers adoption/abandoning
* e2e tests (separated to #27151):
* verifies GC deletes the pods created by RC if DeleteOptions.OrphanDependents=false, and orphans the pods if DeleteOptions.OrphanDependents=true.
TODO:
- [x] we need to be able to select Pods that have a specific ControllerRef. Then each time we sync the RC, we will iterate through all the Pods that has a controllerRef pointing the RC, event if the labels of the Pod doesn't match the selector of RC anymore. This will prevent a Pod from stuck with a stale controllerRef, which could be caused by the race between abandoner (the goroutine that removes controllerRef) and worker the goroutine that add controllerRef to pods).
- [ ] use controllerRef instead of calling `getPodController`. This might be carried out by the control-plane team.
- [ ] according to the controllerRef proposal (#25256): "For debugging purposes we want to add an adoptionTime annotation prefixed with kubernetes.io/ which will keep the time of last controller ownership transfer." This might be carried out by the control-plane team.
cc @lavalamp @gmarek
Automatic merge from submit-queue
[garbage collector] add e2e test
This PR also includes some changes to plumb controller-manager's `--enable_garbage_collector` from the environment variable.
The e2e test will not be run by the core suite because it's marked `[Feature:GarbageCollector]`.
The corresponding jenkins job configuration PR is https://github.com/kubernetes/test-infra/pull/132.
Automatic merge from submit-queue
controller: wait for synced old replica sets on Recreate
Partially fixes https://github.com/kubernetes/kubernetes/issues/27362
Any other work on it should be handled in the replica set level (and/or kubelet if it's required)
@kubernetes/deployment PTAL
Automatic merge from submit-queue
Controllers doesn't take any actions when being deleted.
I started doing it for other controllers but it's not always clear to me how it should work. I'll be adding other ones as separate commits to this PR.
cc @caesarxuchao @lavalamp
Automatic merge from submit-queue
Add hooks for cluster health detection
Separate a function that decides if zone is healthy. First real commit for preventing massive pod eviction.
Ref. #28832
cc @davidopp
Automatic merge from submit-queue
Change storeToNodeConditionLister to return []*api.Node instead of api.NodeList for performance
Currently copies that are made while copying/creating api.NodeList are significant part of scheduler profile, and a bunch of them are made in places, that are not-parallelizable.
Ref #28590
Automatic merge from submit-queue
controller/volume: simplify sync logic in syncUnboundClaim
Remove all unnecessary branching logic. No actual logic changes. Code is more readable now.
Wait for specific final state instead of waiting for specific number of
operations in volume unit tests. The tests are more readable and will survive
random goroutine ordering (PV and PVC controller have both their own
goroutine).
Automatic merge from submit-queue
allow lock acquisition injection for quota admission
Allows for custom lock acquisition when composing the quota admission controller.
@derekwaynecarr I'm still experimenting to make sure this satisfies the need downstream, but looking for agreement in principle
Changes:
* moved waiting for synced caches before starting any work
* refactored worker() to really quit on quit
* changed queue to a ratelimiting queue and added retries on errors
* deep-copy deployments before mutating - we still need to deep-copy
replica sets and pods
Automatic merge from submit-queue
Move deployment functions to deployment/util.go not widely used
If the function is not used in multiple areas, move it to deployment/util.go
fixes#26750
Automatic merge from submit-queue
Some scheduler optimizations
Ref #28590
This PR doesn't do anything fancy - it is just reducing amount of memory allocations in scheduler, which in turn significantly speeds up scheduler.
Automatic merge from submit-queue
allow handler to join after the informer has started
This allows an event handler to join after a SharedInformer has started. It can't add any indexes, but it can add its reaction functions.
This works by
1. stopping the flow of events from the reflector (thus stopping updates to our store)
1. registering the new handler
1. sending synthetic "add" events to the new handler only
1. unblocking the flow of events
It would be possible to
1. block
1. list
1. add recorder
1. unblock
1. play list to as-yet unregistered handler
1. block
1. remove recorder
1. play recording
1. add new handler
1. unblock
But that is considerably more complicated. I'd rather not start there since this ought to be the exception rather than the rule.
@wojtek-t who requested this power in the initial review
@smarterclayton @liggitt I think this resolves our all-in-one ordering problem.
@hongchaodeng since this came up on the call
* rolling.go (has all the logic for rolling deployments)
* recreate.go (has all the logic for recreate deployments)
* sync.go (has all the logic for getting and scaling replica sets)
* rollback.go (has all the logic for rolling back a deployment)
* util.go (contains all the utilities used throughout the controller)
Leave back at deployment_controller.go all the necessary bits for
creating, setting up, and running the controller loop.
Also add package documentation.
Automatic merge from submit-queue
Use `CreatedByAnnotation` constant
A nit but didn't want the strings to get out of sync.
Signed-off-by: Doug Davis <dug@us.ibm.com>
Automatic merge from submit-queue
Track object modifications in fake clientset
Fake clientset is used by unit tests extensively but it has some
shortcomings:
- no filtering on namespace and name: tests that want to test objects in
multiple namespaces end up getting all objects from this clientset,
as it doesn't perform any filtering based on name and namespace;
- updates and deletes don't modify the clientset state, so some tests
can get unexpected results if they modify/delete objects using the
clientset;
- it's possible to insert multiple objects with the same
kind/name/namespace, this leads to confusing behavior, as retrieval is
based on the insertion order, but anchors on the last added object as
long as no more objects are added.
This change changes core.ObjectRetriever implementation to track object
adds, updates and deletes.
Some unit tests were depending on the previous (and somewhat incorrect)
behavior. These are fixed in the following few commits.
Automatic merge from submit-queue
Kubelet should mark VolumeInUse before checking if it is Attached
Kubelet should mark VolumeInUse before checking if it is Attached.
Controller should fetch fresh copy of node object before detach instead of relying on node informer cache.
Fixes#27836
Ensure that kublet marks VolumeInUse before checking if it is Attached.
Also ensures that the attach/detach controller always fetches a fresh
copy of the node object before detach (instead ofKubelet relying on node
informer cache).
Using new fake clientset registry exposes the actual flow on pending
namespace finalization: get namespace, create finalizer, list pods and
delete namespace if there are no pods.
Since fake clientset now correctly tracks objects created by deployment
controller, it triggers different controller behavior: controller only
creates replica set once and updates deployment once.
Fake clientset no longer needs to be prepopulated with records: keeping
them in leads to the name conflict on creates. Also, since fake
clientset now respects namespaces, we need to correctly populate them.
Automatic merge from submit-queue
Convert service account token controller to use a work queue
Converts the service account token controller to use a work queue. This allows parallelization of token generation (useful when there are several simultaneous namespaces or service accounts being created). It also lets us requeue failures to be retried sooned than the next sync period (which can be very long).
Fixes an issue seen when a namespace is created with secrets quotaed, and the token controller tries to create a token secret prior to the quota status having been initialized. In that case, the secret is rejected at admission, and the token controller wasn't retrying until the resync period.
Automatic merge from submit-queue
Fix initialization of volume controller caches.
Fix `PersistentVolumeController.initializeCaches()` to pass pointers to volume or claim to `storeObjectUpdate()` and add extra functions to enforce that the right types are checked in the future.
Fixes#28076
Fix PersistentVolumeController.initializeCaches() to pass pointers to volume
or claim to storeObjectUpdate() and add extra functions to enforce that the
right types are checked in the future.
Fixes#28076
Automatic merge from submit-queue
[client-gen]Add Patch to clientset
* add the Patch() method to the clientset.
* I have to rename the existing Patch() method of `Event` to PatchWithEventNamespace() to avoid overriding.
* some minor changes to the fake Patch action.
cc @Random-Liu since he asked for the method
@kubernetes/sig-api-machinery
ref #26580
```release-note
Add the Patch method to the generated clientset.
```
Automatic merge from submit-queue
Fix startup type error in initializeCaches
The following error was getting logged:
PersistentVolumeController can't initialize caches, expected list of volumes, got:
&{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink:/api/v1/persistentvolumes ResourceVersion:11} Items:[]}
The tests make extensive use of NewFakeControllerSource which uses api.List
instead of api.PersistentVolumeList. So use reflect to help iterate over the
items then assert the item type.
fixes#27757
Automatic merge from submit-queue
daemon/controller.go: refactor worker
1. function name is better to be verb or verb+noun
2. remove unnecessary func call
Automatic merge from submit-queue
Proportionally scale paused and rolling deployments
Enable paused and rolling deployments to be proportionally scaled.
Also have cleanup policy work for paused deployments.
Fixes#20853Fixes#20966Fixes#20754
@bgrant0607 @janetkuo @ironcladlou @nikhiljindal
<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/20273)
<!-- Reviewable:end -->