In addition create a similar method that doesn't copy objects.
Benchmark for the new no-copy method vs the old one:
```
benchmark                      old ns/op     new ns/op     delta
BenchmarkGetControllerOf-12    214           14.8          -93.08%
benchmark                      old allocs    new allocs    delta
BenchmarkGetControllerOf-12    1             0             -100.00%
benchmark                      old bytes     new bytes     delta
BenchmarkGetControllerOf-12    80            0             -100.00%
```
Benchmark for the new (copy) method vs the old one:
```
benchmark                      old ns/op     new ns/op     delta
BenchmarkGetControllerOf-12    128           114           -10.94%
benchmark                      old allocs    new allocs    delta
BenchmarkGetControllerOf-12    1             1             +0.00%
benchmark                      old bytes     new bytes     delta
BenchmarkGetControllerOf-12    80            80            +0.00%
```
Overall there is a roughly 10% improvement for the new (copy) method over
the old one and a huge (roughly 10x) improvement for the new (no-copy) method.
I changed the IsControlledBy and a few other methods to use the new (no-copy) method.
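A minimal sketch of what the no-copy variant can look like (the function name getControllerOfNoCopy is illustrative): it returns a pointer into the object's existing OwnerReferences slice instead of a copied value, which avoids the allocation seen in the benchmark.
```go
package controllerref

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// getControllerOfNoCopy returns a pointer to the controlling OwnerReference
// of obj, or nil if there is none, without copying the struct.
func getControllerOfNoCopy(obj metav1.Object) *metav1.OwnerReference {
	refs := obj.GetOwnerReferences()
	for i := range refs {
		if refs[i].Controller != nil && *refs[i].Controller {
			return &refs[i]
		}
	}
	return nil
}
```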
syncService shouldn't return an error if the service doesn't exist, which
means the sync was triggered by service deletion; otherwise the service would
be enqueued repeatedly even though its cleanup has already completed
successfully.
This patch makes syncService return nil if the error is NotFound when
getting the service, like the other controllers do.
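A minimal sketch of that NotFound handling, with the controller type and lister fields trimmed down for illustration:
```go
package servicecontroller

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	corelisters "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/tools/cache"
)

// Controller is a trimmed-down stand-in for the real controller type.
type Controller struct {
	serviceLister corelisters.ServiceLister
}

func (c *Controller) syncService(key string) error {
	namespace, name, err := cache.SplitMetaNamespaceKey(key)
	if err != nil {
		return err
	}
	svc, err := c.serviceLister.Services(namespace).Get(name)
	if apierrors.IsNotFound(err) {
		// Triggered by service deletion: cleanup already ran, so returning
		// nil stops the key from being requeued forever.
		return nil
	}
	if err != nil {
		return err
	}
	// ... reconcile the existing service ...
	_ = svc
	return nil
}
```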
This should fix a bug that could break masters when the EndpointSlice
feature gate was enabled. This was all tied to how the apiserver creates
and manages its own services and endpoints (or in this case endpoint
slices). Consumers of endpoint slices also need to know about the
corresponding service. Previously we were trying to set an owner
reference here for this purpose, but that came with potential downsides
and increased complexity. This commit changes behavior of the apiserver
endpointslice integration to set the service name label instead of owner
references, and simplifies consumer logic to reference that (both are
set by the EndpointSlice controller).
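As an illustration of the simplified consumer logic, the lookup can be as small as the sketch below; the helper name is hypothetical, while the label key is the one set by the EndpointSlice controller.
```go
package endpointslicehelpers

import (
	discovery "k8s.io/api/discovery/v1beta1"
)

// serviceNameForSlice returns the name of the Service that owns the given
// EndpointSlice, taken from the well-known service name label rather than
// from an owner reference.
func serviceNameForSlice(slice *discovery.EndpointSlice) (string, bool) {
	name, ok := slice.Labels[discovery.LabelServiceName]
	return name, ok
}
```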
Additionally, this should fix a bug with the EndpointSlice GenerateName
value that had previously been set with a "." as a suffix.
This tests the PDB status update path in DisruptionController and
asserts that conflicting writes (with the eviction handler) are handled
gracefully.
This adds the client-go fake.Clientset into our tests, because that is
the layer required for injecting update failures.
This also adds a TestMain so that DisruptionController logs can be
enabled during tests, e.g.:
go test ./pkg/controller/disruption -v -args -v=4
This changes the retry logic in DisruptionController so that it
reconciles update conflicts. In the old behavior, any PDB status update
failure was retried with the same status, regardless of the error.
Now there is no retry logic around the status update. The error is passed
up the stack where the PDB can be requeued for processing.
If the PDB status update error is a conflict error, there are some new
special cases:
- failSafe is not triggered, since this is considered a retryable error
- the PDB is requeued immediately (ignoring the rate limiter) because we
assume that conflict can be resolved by getting the latest version
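A minimal sketch of this error handling, assuming the usual workqueue-based sync loop (the type and field names are illustrative):
```go
package disruption

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/util/workqueue"
)

// DisruptionController is a trimmed-down stand-in for the real controller.
type DisruptionController struct {
	queue workqueue.RateLimitingInterface
}

// processSyncError decides how to requeue a PDB key after a sync attempt.
func (dc *DisruptionController) processSyncError(key string, err error) {
	if err == nil {
		dc.queue.Forget(key)
		return
	}
	if apierrors.IsConflict(err) {
		// A status-update conflict is retryable: do not trip failSafe, and
		// requeue immediately so the retry can use the latest PDB version.
		dc.queue.Add(key)
		return
	}
	dc.queue.AddRateLimited(key)
}
```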
The current mechanism for excluding "master" nodes based on node names
is fragile and should be fixed by using a label exclusion similar to
service load balancers. The legacy code path is preserved behind a
defaulted-on gate and will be removed in the future.
The service load balancer controller should honor the
LegacyNodeRoleBehavior feature gate for checks that use node roles,
switch to using a non-alpha annotation behind the gate, and prepare
to graduate to beta.
Add a live list of pods to the PVC protection controller, as opposed to doing
only a cache-based list through the informer. Both lists are performed
while processing a PVC with deletionTimestamp set to check whether Pods
using the PVC exist and remove the finalizer to enable deletion of the
PVC if that's not the case. Prior to this commit only the cache-based
list was done but that's unreliable because a pod using the PVC might
exist but not be in the cache just yet. On the other hand, the live
list is 100% reliable.
Note that it would be enough to do only the live list. Instead, this
commit adds it after the cache-based list and performs it only if the
latter finds no Pod blocking deletion of the PVC being processed. The
rationale is that live lists are expensive and it's desirable to
minimize them. The drawback is that if, at the time of the cache-based
list, the cache has not yet been notified of the deletion of a Pod using
the PVC, the PVC is kept. Correctness is not compromised because the
finalizer will be removed when the Pod deletion notification is
received, but this means PVC deletion is delayed. Reducing live lists
was valued more than deleting PVCs slightly faster.
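A minimal sketch of the two-step check, assuming a recent client-go; the field names and helpers are illustrative:
```go
package pvcprotection

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	clientset "k8s.io/client-go/kubernetes"
	corelisters "k8s.io/client-go/listers/core/v1"
)

// Controller is a trimmed-down stand-in for the real controller type.
type Controller struct {
	client    clientset.Interface
	podLister corelisters.PodLister
}

// isBeingUsed consults the informer cache first and falls back to an
// expensive live list only when the cache shows no pod using the PVC.
func (c *Controller) isBeingUsed(pvc *v1.PersistentVolumeClaim) (bool, error) {
	cached, err := c.podLister.Pods(pvc.Namespace).List(labels.Everything())
	if err != nil {
		return false, err
	}
	if podUsesPVC(cached, pvc.Name) {
		return true, nil
	}
	// The cache shows no user, but a pod using the PVC may exist and simply
	// not be in the cache yet, so confirm with a live list.
	live, err := c.client.CoreV1().Pods(pvc.Namespace).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	for i := range live.Items {
		if podUsesPVC([]*v1.Pod{&live.Items[i]}, pvc.Name) {
			return true, nil
		}
	}
	return false, nil
}

// podUsesPVC reports whether any of the pods mounts the named claim.
func podUsesPVC(pods []*v1.Pod, claimName string) bool {
	for _, pod := range pods {
		for _, vol := range pod.Spec.Volumes {
			if vol.PersistentVolumeClaim != nil && vol.PersistentVolumeClaim.ClaimName == claimName {
				return true
			}
		}
	}
	return false
}
```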
Also, add a unit test that fails without the change introduced by this
commit and revamp old unit tests. The latter is needed because expected
behavior is described in terms of API calls the controller makes, and
this commit introduces new API calls (the live lists).
Before this change, an object would be added to the `attemptToDelete` queue whenever the GC detected the deletion transition, simply by checking whether `deletionTimestamp` was set. After the change, the GC also checks whether the `foregroundDeletion` finalizer has been set before adding the item to the queue.
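A minimal sketch of that check; the helper name is illustrative, while the finalizer is the standard metav1.FinalizerDeleteDependents ("foregroundDeletion"):
```go
package garbagecollector

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// beingDeletedInForeground reports whether the object is being deleted with
// foreground semantics, i.e. it has a deletionTimestamp and carries the
// foregroundDeletion finalizer.
func beingDeletedInForeground(obj metav1.Object) bool {
	if obj.GetDeletionTimestamp() == nil {
		return false
	}
	for _, f := range obj.GetFinalizers() {
		if f == metav1.FinalizerDeleteDependents {
			return true
		}
	}
	return false
}
```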
During a Deployment update there may be more Pods in the scale target
ref status than in the spec. This test verifies that we do not scale
to the status value. Instead we should stay at the spec value.
Fails before #79035 and passes after.
This is used when the cloud provider layer does not implement the load balancer service.
The implementation will live in a different controller running on the master.
Make the PVC protection controller robust to cases where a Pod X is deleted,
then a Pod Y with the same namespaced name is created and the two events are
delivered via a single update notification. Both pods should be processed,
because X might be blocking deletion of a PVC which is not referenced by Y.
Prior to this commit only the newer pod was processed, which means that it
was possible to leak PVCs.
Also, add unit tests to reflect the change.
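A minimal sketch of the update handler, with the controller type and enqueue helper reduced to illustrative stubs:
```go
package pvcprotection

import (
	v1 "k8s.io/api/core/v1"
)

// Controller is a trimmed-down stand-in for the real controller type.
type Controller struct{}

// enqueuePVCsForPod is a stand-in for the real enqueue logic.
func (c *Controller) enqueuePVCsForPod(pod *v1.Pod) { /* ... */ }

// podUpdated handles informer update notifications.
func (c *Controller) podUpdated(oldObj, newObj interface{}) {
	oldPod, _ := oldObj.(*v1.Pod)
	newPod, _ := newObj.(*v1.Pod)
	// A deletion of pod X and a creation of pod Y with the same namespaced
	// name can be delivered as a single update; process both, because X may
	// have been blocking a PVC that Y does not reference.
	if oldPod != nil && newPod != nil && oldPod.UID != newPod.UID {
		c.enqueuePVCsForPod(oldPod)
	}
	if newPod != nil {
		c.enqueuePVCsForPod(newPod)
	}
}
```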
Add support for scaling to zero pods:
* minReplicas is allowed to be zero
* condition is set once
* Based on https://github.com/kubernetes/kubernetes/pull/61423
* set original valid condition
* add scale to/from zero and invalid metric tests
* Scaling up from zero pods ignores tolerance
* validate metrics when minReplicas is 0
* Document HPA behaviour when minReplicas is 0
* Documented minReplicas field in autoscaling APIs
All controllers in controller-manager that deal with objects generically
work with those objects without needing the full object. Update the GC
and quota controllers to use PartialObjectMetadata input objects, which
is faster and more efficient.
The metadata client uses protobuf and returns only a subset of object
data (the metadata) which allows operations that act only on objects
generically to work much faster. Use the metadata client in the
namespace controller to reduce the amount of work the namespace controller
has to do in large namespaces.
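A minimal sketch of using the metadata client, assuming a recent client-go; the function and resource chosen here are illustrative:
```go
package metadataexample

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/metadata"
	"k8s.io/client-go/rest"
)

// listPodMetadata lists pods in a namespace using the metadata client, so
// only PartialObjectMetadata (name, labels, owner references, ...) is
// transferred instead of full objects.
func listPodMetadata(cfg *rest.Config, namespace string) error {
	client, err := metadata.NewForConfig(cfg)
	if err != nil {
		return err
	}
	gvr := schema.GroupVersionResource{Version: "v1", Resource: "pods"}
	list, err := client.Resource(gvr).Namespace(namespace).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, item := range list.Items {
		fmt.Println(item.GetName())
	}
	return nil
}
```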
This change adds pending pods to the ignored set first before
selecting pods missing metrics. Pending pods are always ignored when
calculating scale.
When the HPA decides which pods and metric values to take into account
when scaling, it divides the pods into three disjoint subsets: 1)
ready 2) missing metrics and 3) ignored. First the HPA selects pods
which are missing metrics. Then it selects pods that should be ignored
because they are not ready yet, or are still consuming CPU during
initialization. All the remaining pods go into the ready set. After
the HPA has decided what direction it wants to scale based on the
ready pods, it considers what might have happened if it had the
missing metrics. It makes a conservative guess about what the missing
metrics might have been: 0% if it wants to scale up, 100% if it wants
to scale down. This is a good thing when scaling up, because newly
added pods will likely help reduce the usage ratio, even though their
metrics are missing at the moment. The HPA should wait to see the
results of its previous scale decision before it makes another
one. However when scaling down, it means that many missing metrics can
pin the HPA at high scale, even when load is completely removed. In
particular, when there are many unschedulable pods due to insufficient
cluster capacity, the many missing metrics (assumed to be 100%) can
cause the HPA to avoid scaling down indefinitely.
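A minimal sketch of that conservative guess; the names and the percentage representation are illustrative, not the controller's real code:
```go
package podautoscaler

// assumeMissingMetrics fills in a conservative utilization for pods with
// missing metrics: 0% when the tentative decision is a scale up (so new pods
// do not look busier than they are) and 100% when it is a scale down (so we
// do not scale down on incomplete data).
func assumeMissingMetrics(utilization map[string]int64, missing []string, scalingUp bool) {
	for _, podName := range missing {
		if scalingUp {
			utilization[podName] = 0
		} else {
			utilization[podName] = 100
		}
	}
}
```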
As the benchmark shows, it speeds up the method roughly 4x and reduces
memory consumption roughly 20x.
```
benchmark                             old ns/op     new ns/op     delta
BenchmarkGetPodMapForDeployment-12    276121        72591         -73.71%
benchmark                             old allocs    new allocs    delta
BenchmarkGetPodMapForDeployment-12    241           238           -1.24%
benchmark                             old bytes     new bytes     delta
BenchmarkGetPodMapForDeployment-12    554025        28956         -94.77%
```
There are several cases where the ReplicaCalculator decides not to change
the current scale. Two important ones are when missing metrics might
change the direction of scaling, and when the recommended scale is
within tolerance of the current scale.
The way that ReplicaCalculator signals its desire to not change the
current scale is by returning the current scale. However the current
scale is from scale.Status.Replicas and can be larger than
scale.Spec.Replicas (e.g. during Deployment rollout with configured
surge). This causes a positive feedback loop because
scale.Status.Replicas is written back into scale.Spec.Replicas,
further increasing the current scale.
This PR fixes the feedback loop by plumbing the replica count from
spec through horizontal.go and replica_calculator.go so the calculator
can punt with the right value.
Its only caller lives in the corresponding test file. Hence, we move the
method called by loadBalancerName over to the test and remove
loadBalancerName.
Handle a case in the Horizontal Pod Autoscaler controller when scaling
on multiple metrics and one or more is missing or invalid.
If all metrics are missing, return an error and leave the isScalingActive
condition as that for the last invalid metric.
If some metrics are missing/invalid and some are valid and found: if a
scale up would be triggered by the valid metrics, ignore the missing
metrics and scale up; if a scale down would be triggered, return an error
and leave the isScalingActive condition as that for the last invalid metric.
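A minimal sketch of that decision; the signature is illustrative rather than the controller's real one:
```go
package podautoscaler

import "fmt"

// chooseReplicas picks the largest proposal from the valid metrics. Invalid
// or missing metrics are only tolerated when the result would be a scale up.
func chooseReplicas(currentReplicas int32, validProposals []int32, invalidErr error) (int32, error) {
	if len(validProposals) == 0 {
		// Every metric was missing or invalid.
		return 0, fmt.Errorf("no valid metrics: %v", invalidErr)
	}
	proposal := validProposals[0]
	for _, p := range validProposals[1:] {
		if p > proposal {
			proposal = p
		}
	}
	if invalidErr != nil && proposal < currentReplicas {
		// A scale down based on partial data is unsafe; surface the error.
		return 0, invalidErr
	}
	return proposal, nil
}
```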
Add three tests for handling invalid metrics when scaling on
multiple metrics - one for scaling up successfully (new behaviour)
and two for ensuring we don't scale down (existing behaviour).
If node events are received at a faster rate than they
can be processed then initialization for some nodes will
be delayed. Once they are eventually processed their cloud taint
is removed, but there may already be several update events
for those nodes, with the cloud taint still on them, sitting
in the event queue. To avoid re-initializing those nodes,
the cloud taint is checked for again after requesting
the current state of the node. If the cloud taint is no
longer on the node then nil is returned from the
RetryOnConflict, as an error does not need to be logged.
The logging for a successful initialization is also
moved inside the RetryOnConflict so that the early nil
return does not cause the aborted initialization to be
logged as a success.
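A minimal sketch of the re-check inside RetryOnConflict, assuming the standard cloud-provider taint key; the client plumbing and helper names are illustrative:
```go
package cloudnodecontroller

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
	"k8s.io/klog/v2"
)

const cloudTaintKey = "node.cloudprovider.kubernetes.io/uninitialized"

// initializeNode initializes a node, skipping the work (and the success log)
// when the cloud taint has already been removed by an earlier event.
func initializeNode(kubeClient clientset.Interface, name string) error {
	return retry.RetryOnConflict(retry.DefaultBackoff, func() error {
		node, err := kubeClient.CoreV1().Nodes().Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		// The node may already have been initialized by an earlier queued
		// event; if the cloud taint is gone, there is nothing left to do.
		if !hasCloudTaint(node) {
			return nil
		}
		// ... remove the taint, fill in cloud-specific fields, update ...
		klog.Infof("Successfully initialized node %s", name)
		return nil
	})
}

func hasCloudTaint(node *v1.Node) bool {
	for _, t := range node.Spec.Taints {
		if t.Key == cloudTaintKey {
			return true
		}
	}
	return false
}
```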
**What type of PR is this?**
/kind cleanup
**What this PR does / why we need it**:
Staging the GCE Cloud Provider as part of KEP [20190125-removing-in-tree-providers](https://github.com/kubernetes/enhancements/blob/master/keps/sig-cloud-provider/20190125-removing-in-tree-providers.md). Staging repo setup here https://github.com/kubernetes/legacy-cloud-providers
Moves the GCE cloud provider implementation to staging.
This is in preparation for moving the cloud provider code out of tree entirely.
However we need it in staging while the code needs to be consumed both in/out of tree.
**Which issue(s) this PR fixes**:
Fixes #
**Special notes for your reviewer**:
**Does this PR introduce a user-facing change?**:
```
NONE
```
Updated import dependency tracking.
Factored in the cleanup from #77412
Minor fix to go.mod.
The reactor in runTest is set to catch all actions, but it ends up handling
only CreateAction without checking the action type, which can occasionally
fail when a Patch action arrives. This fix ensures we handle only the
CreateAction.
Fix a TODO in GC controller that blocks using the new
PartialObjectMetadata flow (because we don't support PUT of
PartialObjectMetadata). This patch is equivalent to the previous
mutation.
This change handles the case where the ith cronjob may have its start
time set to nil.
Previously, the Less method could panic when the ith cronjob
had its start time set to nil but the jth cronjob did not: it
would panic when calling Before on a nil StartTime.
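A minimal sketch of a nil-safe comparison; the sort type mirrors the controller's byJobStartTime but is illustrative:
```go
package cronjob

import (
	batchv1 "k8s.io/api/batch/v1"
)

// byJobStartTime sorts jobs by status.startTime, treating a nil start time
// as earliest and falling back to the name for ties.
type byJobStartTime []batchv1.Job

func (o byJobStartTime) Len() int      { return len(o) }
func (o byJobStartTime) Swap(i, j int) { o[i], o[j] = o[j], o[i] }

func (o byJobStartTime) Less(i, j int) bool {
	// Handle nil start times explicitly so we never call Before through an
	// unexpected nil pointer.
	if o[i].Status.StartTime == nil && o[j].Status.StartTime != nil {
		return true
	}
	if o[i].Status.StartTime != nil && o[j].Status.StartTime == nil {
		return false
	}
	if o[i].Status.StartTime.Equal(o[j].Status.StartTime) {
		return o[i].Name < o[j].Name
	}
	return o[i].Status.StartTime.Before(o[j].Status.StartTime)
}
```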
This is the 2nd PR to move CSINodeInfo/CSIDriver APIs to
v1beta1 core storage APIs. It includes controller side changes.
It depends on the PR with API changes:
https://github.com/kubernetes/kubernetes/pull/73883
* merge pkg/api/v1/node with pkg/util/node
* update test case for utilnode
* remove package pkg/api/v1/node
* move isNodeReady to internal func
* Split GetNodeCondition into e2e and controller pkg
* fix import errors
We should clean the EndpointsLastChangeTriggerTime annotation when there
is no new trigger time to be exported. Not clearing it may result in
kube-proxy exporting the wrong Network Programming Latency SLI.
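A minimal sketch of that cleanup; the helper name is illustrative, while the annotation key is the standard v1.EndpointsLastChangeTriggerTime:
```go
package endpointutil

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// setTriggerTimeAnnotation records the trigger time on the Endpoints object,
// or removes a stale value when there is no new trigger time to export.
func setTriggerTimeAnnotation(endpoints *v1.Endpoints, triggerTime time.Time) {
	if triggerTime.IsZero() {
		// No new trigger time: clear the annotation so kube-proxy does not
		// export a wrong network programming latency.
		delete(endpoints.Annotations, v1.EndpointsLastChangeTriggerTime)
		return
	}
	if endpoints.Annotations == nil {
		endpoints.Annotations = map[string]string{}
	}
	endpoints.Annotations[v1.EndpointsLastChangeTriggerTime] = triggerTime.UTC().Format(time.RFC3339Nano)
}
```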
clusterCIDR is passed down from the kube-env CLUSTER_IP_RANGE to the flag --cluster-cidr.
We plan not to let GKE pass down CLUSTER_IP_RANGE to kube-env for the master node.
Found using script:
https://gist.github.com/dims/384dea60754042f61d79233603034038
Just run using:
`find . -name .import-restrictions | xargs python ~/junk/sanitize-import-boss.py`
The removed entries are packages that were moved, renamed, or deleted
but were never cleaned up from the .import-restrictions files.
Change-Id: I92c400f74e6f012cc75539311ed4de280e25e918
* Remove multiple spaces after full stop
* Include a single space after a comment
* Fixed a typo
```diff
- eixst
+ exist
```
* Make comment and function name the same
Fixes #71730
0 indicates standby, 1 indicates master, and the label indicates which lease.
Tweaked name and documentation.
Factored in Mike Danese's feedback.
Removed dependency on prometheus from client-go using adapter.
Centralized adapter import.
Fixed godeps
Fixed boilerplate.
Put in fixes for caesarxuchao
This PR fixes issue #32727.
When an attach operation fails, it is still possible that the volume
will be attached to the node later. This PR adds the logic to record the
volume-to-node attachment state no matter whether the operation
succeeded or not. If the operation fails, mark the attached state as
false. If the operation succeeds, mark the attached state as true. The
reconciler will still issue the attach operation until it returns
successfully. If the pod is removed in the meantime, the reconciler
will issue detach operations for all the volumes regardless of the
attached state.
In the situation where a PVC is deleted and a new one with the same name
is bound to a different PV, the "old" PV may fail to recycle since it is
associated with a PVC that is detected as being in use. This may cause
the recycler processes to hang.
Signed-off-by: PingWang <wang.ping5@zte.com.cn>
add test condition and remove TODO
Signed-off-by: PingWang <wang.ping5@zte.com.cn>
update test
Signed-off-by: PingWang <wang.ping5@zte.com.cn>
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has an explicit InitFlags(), so we add the flags as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
This patch introduces glusterfsPersistentVolumeSource in addition
to glusterfsVolumeSource. All fields remain the same as in glusterfsVolumeSource,
with the addition of a new field called `EndpointsNamespace` to define the
namespace of the endpoint in the spec.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Individual implementations are not yet being moved.
Fixed all dependencies which call the interface.
Fixed golint exceptions to reflect the move.
Added project info as per @dims and
https://github.com/kubernetes/kubernetes-template-project.
Added dims to the security contacts.
Fixed minor issues.
Added missing template files.
Copied ControllerClientBuilder interface to cp.
This allows us to break the only dependency on K8s/K8s.
Added TODO to ControllerClientBuilder.
Fixed GoDeps.
Factored in feedback from JustinSB.
Since we are going to treat both node status and node lease as node
heartbeat/health signals, this PR makes the rename changes, so that the
follow-up PRs are easier to review.
HPA will treat the initial size of the autoscalee as a recommendation, to
avoid hastily overriding recommendations made by HPA (if HPA set the size and
then was restarted) or by the user (the initial size should be treated as a
human-generated recommendation).
- Use VolumeAvailable instead of empty or pending phase in tests
- Add a test case to verify findMatchingVolume will not choose
non-available PVs if it's not pre-bind
- Add a test case to verify syncClaim will not choose non-available PVs
if it's not pre-bind
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Added unschedulable and network-unavailable tolerations.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #61312
fixes: https://github.com/kubernetes/kubernetes/issues/67606
**Release note**:
```release-note
If `TaintNodesByCondition` is enabled, add `node.kubernetes.io/unschedulable` and
`node.kubernetes.io/network-unavailable` automatically to DaemonSet pods.
```
Automatic merge from submit-queue (batch tested with PRs 65250, 68241). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Use informer cache instead of active pod gets in HPA controller.
**What this PR does / why we need it**:
Use informer cache instead of active pod gets in HPA controller.
**Which issue(s) this PR fixes**:
Fixes #68217
**Release note**:
```release-note
kube-controller-manager: use informer cache instead of active pod gets in HPA controller
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
CSI Node info registration in kubelet
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #67683
**Special notes for your reviewer**:
Feature issue: https://github.com/kubernetes/features/issues/557
Design doc: https://github.com/kubernetes/community/pull/2034
Missing pieces:
* CSI client retry and exponential backoff logic.
* CSINodeInfo object validation
* e2e test with all the CSI machinery.
An RBAC rule is also added to support external-provisioner topology updates.
**Release note**:
```release-note
Registers volume topology information reported by a node-level Container Storage Interface (CSI) driver. This enables Kubernetes support of CSI topology mechanisms.
```
Automatic merge from submit-queue (batch tested with PRs 67950, 68195). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Consolidate componentconfig code standards
**What this PR does / why we need it**:
This PR fixes a bunch of very small misalignments in ComponentConfig packages:
- Add sane comments to all functions/variables in componentconfig `register.go` files
- Make the `register.go` files of componentconfig pkgs follow the same pattern and not differ from each other like they do today.
- Register the `openapi-gen` tag in all `doc.go` files where the pkg contains _external_ types.
- Add the `groupName` tag where missing
- Fix cases where `addKnownTypes` was registered twice in the `SchemeBuilder`
- Add `Readme` and `OWNERS` files to `Godeps` directories if missing.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @sttts @thockin
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Let the service controller retry when persistUpdate returns a conflict error
**What this PR does / why we need it**:
If a load balancer is changed while provisioning, it will fall into an error state and will not self-recover.
This PR picks up the conflict error and lets the serviceController retry in order to get the load balancer out of the error state.
**Special notes for your reviewer**:
/assign @MrHohn @rramkumar1
**Release note**:
```release-note
Let service controller retry creating load balancer when persistUpdate failed due to conflict.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Implement semantic comparison of VolumeNodeAffinity for unit tests
**What this PR does / why we need it**:
Implements a semantic comparison function for VolumeNodeAffinity that is not sensitive to the ordering of various members. The previous reflect.DeepEqual comparison was sensitive to ordering, causing it to be flaky.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes # https://github.com/kubernetes/kubernetes/issues/67852
**Special notes for your reviewer**:
We want to be able to successfully match `VolumeNodeAffinity{Required:&NodeSelector{NodeSelectorTerms:[{[{a In [1]} {b In [2 3]}] []}],},}` and `VolumeNodeAffinity{Required:&NodeSelector{NodeSelectorTerms:[{[{b In [3 2]} {a In [1]}] []}],},}` without being sensitive to the ordering of requirements with keys `a` and `b` or the order of values for key `b`. This fix enables such a semantic comparison of VolumeNodeAffinity.
We can move `volumeNodeAffinitiesEqual` to volume/utils post code freeze
**Release note**:
```release-note
NONE
```
/sig storage
cc @msau42
Automatic merge from submit-queue (batch tested with PRs 67709, 67556). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Fix volume scheduling issue with pod affinity and anti-affinity
**What this PR does / why we need it**:
The previous design of the volume scheduler had volume assume + bind done before pod assume + bind. This causes issues when trying to evaluate future pods with pod affinity/anti-affinity because the pod has not been assumed while the volumes have been decided.
This PR changes the design so that volume and pod are assumed first, followed by volume and pod binding. Volume binding waits (asynchronously) for the operations to complete or error. This eliminates the subsequent passes through the scheduler to wait for volume binding to complete (although pod events or resyncs may still cause the pod to run through scheduling while binding is still in progress). This design also aligns better with the scheduler framework design, so will make it easier to migrate in the future.
Many changes had to be made in the volume scheduler to handle this new design, mostly around:
* How we cache pending binding operations. Now, any delayed binding PVC that is not fully bound must have a cached binding operation. This also means bind API updates may be repeated.
* Waiting for the bind operation to fully complete, and detecting failure conditions to abort the bind and retry scheduling.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #65131
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixes issue where pod scheduling may fail when using local PVs and pod affinity and anti-affinity without the default StatefulSet OrderedReady pod management policy
```
Automatic merge from submit-queue (batch tested with PRs 66840, 68159). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
CSI Cluster Registry and Node Info CRDs Improvements
**What this PR does / why we need it**:
https://github.com/kubernetes/kubernetes/pull/67803 merged before I could address @lavalamp's feedback. This PR addresses his feedback
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Follow up on PR https://github.com/kubernetes/kubernetes/pull/67803
**Special notes for your reviewer**:
**Release note**:
```release-note
```
/assign @lavalamp
/assign @thockin
CC @jsafrane @vladimirvivien @verult @gnufied @childsb
* FindPodVolumes
* Prebound PVCs are treated like unbound immediate PVCs and will error
* Always check for fully bound PVCs and cache bindings for not fully
bound PVCs
* BindPodVolumes
* Retry API updates for not fully bound PVCs even if the assume cache
already marked it
* Wait for PVCs to be fully bound after making the API updates
* Error when detecting binding/provisioning failure conditions
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Replace scale down window
**What this PR does / why we need it**:
Replace scale down forbidden window with scale down stabilization window.
This allows scale down based on more than one sample, to avoid rapidly changing size up and down for controllers with fluctuating load.
A bit more in https://docs.google.com/document/d/1IdG3sqgCEaRV3urPLA29IDudCufD89RYCohfBPNeWIM
This PR is copy of #67771 with resolved comments.
**Release note**:
```release-note
Replace scale down forbidden window with scale down stabilization window. Rather than waiting a fixed period of time between scale downs, HPA now scales down to the highest recommendation it made during the scale down stabilization window.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Add gnufied as approver for attach/detach controller
Hopefully has reviewed and made enough fixes in this
area to understand the code thoroughly.
```release-note
None
```
/assign @saad-ali @jsafrane
Automatic merge from submit-queue (batch tested with PRs 64283, 67910, 67803, 68100). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
CSI Cluster Registry and Node Info CRDs
**What this PR does / why we need it**:
Introduces the new `CSIDriver` and `CSINodeInfo` API Object as proposed in https://github.com/kubernetes/community/pull/2514 and https://github.com/kubernetes/community/pull/2034
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/features/issues/594
**Special notes for your reviewer**:
Per the discussion in https://groups.google.com/d/msg/kubernetes-sig-storage-wg-csi/x5CchIP9qiI/D_TyOrn2CwAJ the API is being added to the staging directory of the `kubernetes/kubernetes` repo because the consumers will be attach/detach controller and possibly kubelet, but it will be installed as a CRD (because we want to move in the direction where the API server is Kubernetes agnostic, and all Kubernetes specific types are installed).
**Release note**:
```release-note
Introduce CSI Cluster Registration mechanism to ease CSI plugin discovery and allow CSI drivers to customize Kubernetes' interaction with them.
```
CC @jsafrane
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Change CPU sample sanitization in HPA.
**What this PR does / why we need it**:
Change CPU sample sanitization in HPA.
Ignore samples if:
- Pod is being initialized - 5 minutes from start defined by flag
- pod is unready
- pod is ready but the full window of metric hasn't been collected since
transition
- Pod is initialized - 5 minutes from start defined by flag:
- Pod has never been ready after initial readiness period.
**Release notes:**
```release-note
Improve CPU sample sanitization in HPA by taking metric's freshness into account.
```
Automatic merge from submit-queue (batch tested with PRs 67745, 67432, 67569, 67825, 67943). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Fix VMWare VM freezing bug by reverting #51066
**What this PR does / why we need it**: kube-controller-manager, VSphere specific: When the controller tries to attach a Volume to Node A that is already attached to Node B, Node A freezes until the volume is attached. Kubernetes continues to try to attach the volume as it thinks that it's 'multi-attachable' when it's not. #51066 is the culprit.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/vmware/kubernetes/issues/500 / https://github.com/vmware/kubernetes/issues/502 (same issue)
**Special notes for your reviewer**:
- Repro:
Vsphere installation, any k8s version from 1.8 and above, pod with attached PV/PVC/VMDK:
1. cordon the node which the pod is in
2. `kubectl delete po/[pod] --force --grace-period=0`
3. the pod is immediately rescheduled to a new node. Grab the new node from a `kubectl describe [pod]` and attempt to Ping it or SSH into it.
4. you can see that pings/ssh fail to reach the new node. `kubectl get node` shows it as 'NotReady'. New node is frozen until the volume is attached - usually 1 minute freeze for 1 volume in a low-load cluster, and many minutes more with higher loads and more volumes involved.
- Patch verification:
Tested a custom patched 1.9.10 kube-controller-manager with #51066 reverted and the above bug is resolved - can't repro it anymore. New node doesn't freeze at all, and attaching happens quite quickly, in a few seconds.
**Release note**:
```
Fix VSphere VM Freezing bug by reverting #51066
```
Automatic merge from submit-queue (batch tested with PRs 67067, 67947). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Do not count soft-deleted pods for scaling purposes in HPA controller
**What this PR does / why we need it**:
The metrics of "soft-deleted" pods in general to be deleted should probably not matter for scaling purposes, since they'll be gone "soon", whether they're nodelost or just normally delete.
As long as soft-deleted pods still exist, they prevent normal scale up.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/62845
**Special notes for your reviewer**:
**Release note**:
```release-note
Stop counting soft-deleted pods for scaling purposes in HPA controller to avoid soft-deleted pods incorrectly affecting scale up replica count calculation.
```
Automatic merge from submit-queue (batch tested with PRs 67938, 66719, 67883). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove incorrect glog error from Horizontal Pod Autoscaler Controller.
**What this PR does / why we need it**:
This PR removes an incorrect glog error from the Horizontal Pod Autoscaler Controller.
**Release note:**
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 67694, 64973, 67902). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
SCTP support implementation for Kubernetes
**What this PR does / why we need it**: This PR adds SCTP support to Kubernetes, including Service, Endpoint, and NetworkPolicy.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #44485
**Special notes for your reviewer**:
**Release note**:
```release-note
SCTP is now supported as additional protocol (alpha) alongside TCP and UDP in Pod, Service, Endpoint, and NetworkPolicy.
```
Automatic merge from submit-queue (batch tested with PRs 64597, 67854, 67734, 67917, 67688). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix an issue where scheduling doesn't respect the NodeLost status of a node
**What this PR does / why we need it**:
- if a Node is in Unknown status, apply the unreachable taint with NoSchedule effect
- some internal data structure refactoring
- update unit test
**Which issue(s) this PR fixes**:
Fixes #67733, and very likely #67536
**Special notes for your reviewer**:
See detailed reproducing steps in #67733.
**Release note**:
```release-note
Apply unreachable taint to a node when it loses its network connection.
```
The requested Service Protocol is checked against the supported protocols of GCE Internal LB. The supported protocols are TCP and UDP.
SCTP is not supported by OpenStack LBaaS. If SCTP is requested in a Service with type=LoadBalancer, the request is rejected. Comment style is also corrected.
SCTP is not allowed for LoadBalancer Service and for HostPort. Kube-proxy can be configured not to start listening on the host port for SCTP: see the new SCTPUserSpaceNode parameter
Changed the vendored github.com/nokia/sctp to github.com/ishidawataru/sctp, i.e. from now on we use the upstream version.
netexec.go compilation fixed. Various test cases fixed
SCTP-related conformance tests removed. Netexec's pod definition and Dockerfile are updated to expose the new SCTP port (8082).
SCTP-related e2e test cases are removed as the e2e test systems do not support SCTP.
SCTP-related firewall config is removed from cluster/gce/util.sh. The variable name sctp_addr is corrected to sctpAddr in pkg/proxy/ipvs/proxier.go.
cluster/gce/util.sh is copied from master
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
controller expectations for deletion can be met by 404
A controller asks pod control to delete a pod because it wants the pod to be gone. It doesn't really care if the imperative delete action itself succeeds. When the pod is already gone (404), then the desire of the controller is met.
Since the pods themselves are cache driven, you can hit this condition more than you may like. See https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replicaset/replica_set.go#L582 as an example.
@kubernetes/sig-apps-bugs
/assign @janetkuo @tnozicka
```release-note
latent controller caches no longer cause repeating deletion messages for deleted pods
```
Automatic merge from submit-queue (batch tested with PRs 66916, 67252, 67794, 67619, 67328). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix HPA sample sanitization
**What this PR does / why we need it**: @mwielgus pointed out a case when HPA fails as a result of my changes to HPA algorithm:
- Have pods that use a lot of CPU during initialization and become ready right after they initialize,
- Trigger a scale up,
- When the new pods become ready we will count their usage (even though it's not related to any work that needs doing),
- This triggers another scale up, even though the existing pods can handle the work, no problem.
The fix is:
- Use all samples for non-cpu metrics.
- Only use CPU samples if:
- Pod is ready and was started more than 2 minutes ago, or
- Pod is unready and last readiness change happened more than 10s after it was started.
Reasoning behind this in: https://docs.google.com/document/d/1UdtYedhmCxjaJIQi6hwJMY0eHQQKxlVD8lSHZC1BPOA/edit
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
Replace scale up forbidden window with disregarding CPU samples collected when pod was initializing.
```
The duration of the initialization taint on CPU and the window of initial
readiness are controlled by flags.
Adding API violation exceptions following example of e50340ee23
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove incorrect comment
**What this PR does / why we need it**:
This code does not update the revision's labels, so the comment is incorrect:
```
// Update the revisions name and labels
clone.Name = ControllerRevisionName(parent.GetName(), hash)
ns := parent.GetNamespace()
created, err := rh.client.AppsV1().ControllerRevisions(ns).Create(clone)
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
NONE
**Release note**:
```
NONE
```
/kind cleanup
/release-note-none
/sig apps