Duration of initialization taint on CPU and window of initial readiness
setting controlled by flags.
Adding API violation exceptions following example of e50340ee23
After my previous changes HPA wasn't behaving correctly in the following
situation:
- Pods use a lot of CPU during initilization, become ready right after they initialize,
- Scale up triggers,
- When new pods become ready HPA counts their usage (even though it's not related to any work that needs doing),
- Another scale up, even though existing pods can handle work, no problem.
Instead discard metric values for pods that are unready and have never
been ready (they may report misleading values, the original reason for
introducing scale up forbidden window).
Use per pod metric when pod is:
- Ready, or
- Not ready but creation timestamp and last readiness change are more
than 10s apart.
In the latter case we asume the pod was ready but later became unready.
We want to use metrics for such pods because sometimes such pods are
unready because they were getting too much load.
Automatic merge from submit-queue (batch tested with PRs 64681, 65907). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Make runTest easier to understand
Fewer nested conditions, more checking for incorrect looking test cases.
**What this PR does / why we need it**: Make HPA tests easier to understand.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Similar to the change we made for `GetObjectMetricReplicas` in the
previous commit. Ensure that `GetExternalMetricReplicas` does not
include unready pods when its determining how many replica it desires.
Including unready pods can lead to over-scaling.
We did not change the behavior of `GetExternalPerPodMetricReplicas`, as
it is slightly less clear what is the desired behavior. We did make some
small naming refactorings to this method, which will make it easier to
ignore unready pods if we decide we want to.
Previously, when `GetObjectMetricReplicas` calculated the desired
replica count, it multiplied the usage ratio by the current number of replicas.
This method caused over-scaling when there were pods that were not ready
for a long period of time. For example, if there were pods A, B, and C,
and only pod A was ready, and the usage ratio was 500%, we would
previously specify 15 pods as the desired replicas (even though really
only one pod was handling the load).
After this change, we now multiple the usage
ratio by the number of ready pods for `GetObjectMetricReplicas`.
In the example above, we'd only desire 5 replica pods.
This change gives `GetObjectMetricReplicas` the same behavior as the
other replica calculator methods. Only `GetExternalMetricReplicas` and
`GetExternalPerPodMetricRepliacs` still allow unready pods to impact the
number of desired replicas. I will fix this issue in the following
commit.
Currently, when performing a scale up, any failed pods (which can be present for example in case of evictions performed by kubelet) will be treated as unready. Unready pods are treated as if they had 0% utilization which will slow down or even block scale up.
After this change, failed pods are ignored in all calculations. This way they do not influence neither scale up nor scale down replica calculations.
With the introduction of the RESTMetrics client, there are two ways to
fetch metrics for auto-scaling. However, they previously shared error
messages. This could be misleading. Make the error message more clearly
show which method is in use.
Fix#18155
Make HPA tolerance configurable as a flag. This change allows us to use
different tolerance values in production/testing.
Signed-off-by: mattjmcnaughton <mattjmcnaughton@gmail.com>
Address `golint` errors in `pkg/controller/podautoscaler`. Note,
I did not address issues around exported types/functions missing
comments, because I'm not sure what the convention within the k8s project is.
Signed-off-by: mattjmcnaughton <mattjmcnaughton@gmail.com>
Automatic merge from submit-queue (batch tested with PRs 51956, 50708)
Move autoscaling/v2 from alpha1 to beta1
This graduates autoscaling/v2alpha1 to autoscaling/v2beta1. The move is more-or-less just a straightforward rename.
Part of kubernetes/features#117
```release-note
v2 of the autoscaling API group, including improvements to the HorizontalPodAutoscaler, has moved from alpha1 to beta1.
```
The new fake client properly represents the resource of `PodMetrics` as
"pods" and the resource of `NodeMetrics` as "nodes". Previously, it
used "podmetricses" and "nodemetrics", respectively.
This fixes up `horizontal_test.go` and `replica_calc_test.go` to use the
new names.
This commit switches over the HPA controller to use the custom metrics
API. It also converts the HPA controller to use the generated client
in k8s.io/metrics for the resource metrics API.
In order to enable support, you must enable
`--horizontal-pod-autoscaler-use-rest-clients` on the
controller-manager, which will switch the HPA controller's MetricsClient
implementation over to use the standard rest clients for both custom
metrics and resource metrics. This requires that at the least resource
metrics API is registered with kube-aggregator, and that the controller
manager is pointed at kube-aggregator. For this to work, Heapster
must be serving the new-style API server (`--api-server=true`).
This commit converts the HPA controller over to using the new version of
the HorizontalPodAutoscaler object found in autoscaling/v2alpha1. Note
that while the autoscaler will accept requests for object metrics, the
scale client will return an error on attempts to get object metrics
(since that requires the new custom metrics API, which is not yet
implemented).
This also enables the HPA object in v2alpha1 as a retrievable API
version by default.
In certain conditions in which the set of metrics returned by Heapster
is completely disjoint from the set of pods returned by the API server,
we can have a request sum of zero, which can cause a panic (due to
division by zero). This checks for that condition.
Fixes#39680
Currently, the HPA considers unready pods the same as ready pods when
looking at their CPU and custom metric usage. However, pods frequently
use extra CPU during initialization, so we want to consider them
separately.
This commit causes the HPA to consider unready pods as having 0 CPU
usage when scaling up, and ignores them when scaling down. If, when
scaling up, factoring the unready pods as having 0 CPU would cause a
downscale instead, we simply choose not to scale. Otherwise, we simply
scale up at the reduced amount caculated by factoring the pods in at
zero CPU usage.
The effect is that unready pods cause the autoscaler to be a bit more
conservative -- large increases in CPU usage can still cause scales,
even with unready pods in the mix, but will not cause the scale factors
to be as large, in anticipation of the new pods later becoming ready and
handling load.
Similarly, if there are pods for which no metrics have been retrieved,
these pods are treated as having 100% of the requested metric when
scaling down, and 0% when scaling up. As above, this cannot change the
direction of the scale.
This commit also changes the HPA to ignore superfluous metrics -- as
long as metrics for all ready pods are present, the HPA we make scaling
decisions. Currently, this only works for CPU. For custom metrics, we
cannot identify which metrics go to which pods if we get superfluous
metrics, so we abort the scale.