The HPA controller keeps a flat history of recommendations for
stabilization. However when both up and down scale stabilization are
configured, the interpretation of the history changes depending on the
direction of movement. What we want is to keep the stabilized
recommendation within the envelope of the minimum and maximum over
configured stabilization windows. We should only move when the
envelope forces a move.
When a pod is deleted, it is given a deletion timestamp. However the
pod might still run for some time during graceful shutdown. During
this time it might still produce CPU utilization metrics and be in a
Running phase.
Currently the HPA replica calculator attempts to ignore deleted pods
by skipping over them. However by not adding them to the ignoredPods
set, their metrics are not removed from the average utilization
calculation. This allows pods in the process of shutting down to drag
down the recommmended number of replicas by producing near 0%
utilization metrics.
In fact the ignoredPods set is misnomer. Those pods are not fully
ignored. When the replica calculator recommends to scale up, 0%
utilization metrics are filled in for those pods to limit the scale
up. This prevents overscaling when pods take some time to startup. In
fact, there should be 4 sets considered (readyPods, unreadyPods,
missingPods, ignoredPods) not just 3.
This change renames ignoredPods as unreadyPods and leaves the scaleup
limiting semantics. Another set (actually) ignoredPods is added to
which delete pods are added instead of being skipped during
grouping. Both ignoredPods and unreadyPods have their metrics removed
from consideration. But only unreadyPods have 0% utilization metrics
filled in upon scaleup.
During a Deployment update there may be more Pods in the scale target
ref status than in the spec. This test verifies that we do not scale
to the status value. Instead we should stay at the spec value.
Fails before #79035 and passes after.
Add support for scaling to zero pods
minReplicas is allowed to be zero
condition is set once
Based on https://github.com/kubernetes/kubernetes/pull/61423
set original valid condition
add scale to/from zero and invalid metric tests
Scaling up from zero pods ignores tolerance
validate metrics when minReplicas is 0
Document HPA behaviour when minReplicas is 0
Documented minReplicas field in autoscaling APIs
This change adds pending pods to the ignored set first before
selecting pods missing metrics. Pending pods are always ignored when
calculating scale.
When the HPA decides which pods and metric values to take into account
when scaling, it divides the pods into three disjoint subsets: 1)
ready 2) missing metrics and 3) ignored. First the HPA selects pods
which are missing metrics. Then it selects pods should be ignored
because they are not ready yet, or are still consuming CPU during
initialization. All the remaining pods go into the ready set. After
the HPA has decided what direction it wants to scale based on the
ready pods, it considers what might have happened if it had the
missing metrics. It makes a conservative guess about what the missing
metrics might have been, 0% if it wants to scale up--100% if it wants
to scale down. This is a good thing when scaling up, because newly
added pods will likely help reduce the usage ratio, even though their
metrics are missing at the moment. The HPA should wait to see the
results of its previous scale decision before it makes another
one. However when scaling down, it means that many missing metrics can
pin the HPA at high scale, even when load is completely removed. In
particular, when there are many unschedulable pods due to insufficient
cluster capacity, the many missing metrics (assumed to be 100%) can
cause the HPA to avoid scaling down indefinitely.
current scale. Two important ones are when missing metrics might
change the direction of scaling, and when the recommended scale is
within tolerance of the current scale.
The way that ReplicaCalculator signals it's desire to not change the
current scale is by returning the current scale. However the current
scale is from scale.Status.Replicas and can be larger than
scale.Spec.Replicas (e.g. during Deployment rollout with configured
surge). This causes a positive feedback loop because
scale.Status.Replicas is written back into scale.Spec.Replicas,
further increasing the current scale.
This PR fixes the feedback loop by plumbing the replica count from
spec through horizontal.go and replica_calculator.go so the calculator
can punt with the right value.
Handle a case in the Horizontal Pod Autoscaler Controller when scaling
on multiple metrics and one or more is missing or invalid.
If all metrics are missing - return an error and leave the isScalingActive
condition as that for the last invalid metric.
If some metrics are missing/invalid and some are valid and found -
if a scale up would be triggered by the valid metrics ignore the missing
metrics and scale up, if a scale down would be triggered, return an error
and leave the isScalingActive condition as that for the last invalid metric.
Add three tests for handling invalid metrics when scaling on
multiple metrics - one for scaling up successfully (new behaviour)
and two for ensuring we don't scale down (existing behaviour).
The reactor in runTest is set to catch all actions, but eventually it
only handles CreateAction without checking action type which might fail
sometimes when Patch arrives. This fix ensures we handle only the
CreateAction.
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
HPA will treat initial size of autoscalee to avoid hastily overriding
recomendations made by HPA (if HPA set size and then was restarted) or by user
(initial size should be treated as human-generated recommendation).
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Change CPU sample sanitization in HPA.
**What this PR does / why we need it**:
Change CPU sample sanitization in HPA.
Ignore samples if:
- Pod is beeing initalized - 5 minutes from start defined by flag
- pod is unready
- pod is ready but full window of metric hasn't been colected since
transition
- Pod is initialized - 5 minutes from start defined by flag:
- Pod has never been ready after initial readiness period.
**Release notes:**
```release-note
Improve CPU sample sanitization in HPA by taking metric's freshness into account.
```
Ignore samples if:
- Pod is beeing initalized - 5 minutes from start defined by flag
- pod is unready
- pod is ready but full window of metric hasn't been colected since
transition
- Pod is initialized - 5 minutes from start defined by flag:
- Pod has never been ready after initial readiness period.
Automatic merge from submit-queue (batch tested with PRs 67067, 67947). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Do not count soft-deleted pods for scaling purposes in HPA controller
**What this PR does / why we need it**:
The metrics of "soft-deleted" pods in general to be deleted should probably not matter for scaling purposes, since they'll be gone "soon", whether they're nodelost or just normally delete.
As long as soft-deleted pods still exist, they prevent normal scale up.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/62845
**Special notes for your reviewer**:
**Release note**:
```release-note
Stop counting soft-deleted pods for scaling purposes in HPA controller to avoid soft-deleted pods incorrectly affecting scale up replica count calculation.
```
Automatic merge from submit-queue (batch tested with PRs 66916, 67252, 67794, 67619, 67328). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix HPA sample sanitization
**What this PR does / why we need it**: @mwielgus pointed out a case when HPA fails as a result of my changes to HPA algorithm:
- Have pods that use a lot of CPU during initilization, become ready right after they initialize,
- Trigger a scale up,
- When new pods become ready will will count their usage (even though it's not related to any work that needs doing),
- This triggers another scale up, even though existing pods can handle work, no problem.
The fix is:
- Use all samples for non-cpu metrics.
- Only use CPU samples if:
- Pod is ready and was started more than 2 minutes ago, or
- Pod is unready and last readiness change happened more than 10s after it was started.
Reasoning behind this in: https://docs.google.com/document/d/1UdtYedhmCxjaJIQi6hwJMY0eHQQKxlVD8lSHZC1BPOA/edit
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
Replace scale up forbidden window with disregarding CPU samples collected when pod was initializing.
```
Duration of initialization taint on CPU and window of initial readiness
setting controlled by flags.
Adding API violation exceptions following example of e50340ee23
After my previous changes HPA wasn't behaving correctly in the following
situation:
- Pods use a lot of CPU during initilization, become ready right after they initialize,
- Scale up triggers,
- When new pods become ready HPA counts their usage (even though it's not related to any work that needs doing),
- Another scale up, even though existing pods can handle work, no problem.
Instead discard metric values for pods that are unready and have never
been ready (they may report misleading values, the original reason for
introducing scale up forbidden window).
Use per pod metric when pod is:
- Ready, or
- Not ready but creation timestamp and last readiness change are more
than 10s apart.
In the latter case we asume the pod was ready but later became unready.
We want to use metrics for such pods because sometimes such pods are
unready because they were getting too much load.
Automatic merge from submit-queue (batch tested with PRs 64681, 65907). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Make runTest easier to understand
Fewer nested conditions, more checking for incorrect looking test cases.
**What this PR does / why we need it**: Make HPA tests easier to understand.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66175, 66324, 65828, 65901, 66350). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Hpa improv refactor replica calc test
**What this PR does / why we need it**: prepareTestClient generates 4 fake clients, using replicaCalcTestCase object. This PR extracts a separate helper for generating each fake independently.
**Which issue(s) this PR fixes**
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```