Handle failure cases on startup gracefully to avoid causing cascading
errors and poor initialization in other components. Initial errors from
config load cause the initializer to pause and hold requests. Return
typed errors to better communicate failures to clients.
Add code to handle two specific cases: an admin wants to bypass
initialization defaulting, and mirror pods want to bypass
initialization because the kubelet owns their lifecycle.
- leverage the Config struct (`--cloud-config`) to store the backoff and rate-limit on/off switches and performance settings (see the sketch below)
- add additional error logging
- enable backoff for VM GET requests
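A rough sketch of the kind of cloud-config this describes; the key names below are hypothetical placeholders for illustration, not the actual fields added by this change.
```yaml
# Hypothetical cloud-config fragment (key names are assumptions, illustration only).
cloudProviderBackoff: true        # toggle exponential backoff for cloud API calls such as VM GETs
cloudProviderBackoffRetries: 6    # example performance tuning knob
cloudProviderBackoffDuration: 5   # seconds between retries (illustrative)
cloudProviderRateLimit: true      # toggle client-side rate limiting
cloudProviderRateLimitQPS: 3      # example rate-limit setting
```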
This PR adds two features:
1. Add support for isolating emptyDir volume usage. If the user sets a size limit for an emptyDir volume, the kubelet's eviction manager monitors its usage and evicts the pod if the usage exceeds the limit.
2. Add support for isolating the local storage used by the container overlay. If a container's overlay usage exceeds the limit defined in the container spec, the eviction manager evicts the pod (both limits are sketched below).
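A minimal sketch of what the two limits might look like in a pod spec. The `sizeLimit` field on the emptyDir volume and the `ephemeral-storage` resource name are assumptions about the alpha API surface and may not match the exact names used by this feature.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: storage-limited            # illustrative name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    resources:
      limits:
        ephemeral-storage: "1Gi"   # assumed resource name for container overlay usage
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: "512Mi"           # assumed field; usage above this triggers eviction
```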
An annotation in the pod spec of the form:
podpreset.admission.kubernetes.io/exclude: "true"
will cause the admission controller to skip manipulating the pod spec,
regardless of the pod's labels.
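For example, a pod that opts out of injection might carry the annotation like this (the pod and container names are illustrative):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: opted-out                  # illustrative name
  annotations:
    podpreset.admission.kubernetes.io/exclude: "true"   # admission controller skips this pod
spec:
  containers:
  - name: app
    image: busybox
```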
The annotation for a podpreset acting on a pod has also been slightly
modified to contain a podpreset prefix:
podpreset.admission.kubernetes.io/podpreset-{name} = resource version
Fixes #44161
This commit updates `kubectl describe` to display the new HPA
status conditions. This should make it easier for users to discern
the current state of the HPA.
This commit causes the HPA controller to set a variety of status
conditions using the new `Status.Conditions` field of
autoscaling/v2alpha1. These provide insight into the current state
of the HPA, and generally correspond to similar events being emitted.
This commit adds the new status conditions to the API types.
The field is added to autoscaling/v2alpha1 and is
round-tripped through an annotation in autoscaling/v1.
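A sketch of how such conditions might surface in the v2alpha1 status; the condition types and messages shown here are illustrative assumptions following the usual Kubernetes condition shape, not values quoted from this change.
```yaml
# Illustrative autoscaling/v2alpha1 status fragment (condition names/messages are assumptions).
status:
  currentReplicas: 3
  desiredReplicas: 3
  conditions:
  - type: AbleToScale              # assumed condition type
    status: "True"
    reason: ReadyForNewScale
    message: the HPA controller is able to scale if necessary
  - type: ScalingActive            # assumed condition type
    status: "True"
    reason: ValidMetricFound
    message: the HPA was able to successfully calculate a replica count
```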
Automatic merge from submit-queue (batch tested with PRs 46734, 46810, 46759, 46259, 46771)
Improve code coverage for pkg/kubelet/images/image_gc_manager
**What this PR does / why we need it**:
#39559 #40780
Code coverage improves from 74.5% to 77.4%.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46734, 46810, 46759, 46259, 46771)
OpenAPI aggregation for kube-aggregator
This PR implements an OpenAPI aggregation layer for kube-aggregator. On each API registration, it tries to download the swagger spec of the user API server. On failure it will try again next time (either on another add, or on a GET of /swagger.* on the aggregator server), up to five times. To merge specs, it first removes all unrelated paths from the downloaded spec (anything other than the group/version of the API service) and then removes all unused definitions. Adding paths is straightforward since they won't have any conflicts, but definitions will most likely conflict. To resolve that, we reuse any definition that is unchanged (documentation changes are fine) and rename the definition otherwise.
To use this PR, kube-aggregator needs access to nonResourceURLs (for the get verb) on the user API server.
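A rough sketch of the kind of RBAC rule this implies, assuming delegated authorization; the role name and the exact spec path are illustrative assumptions.
```yaml
# Illustrative only: names and paths are assumptions.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: aggregator-openapi-reader      # hypothetical name
rules:
- nonResourceURLs: ["/swagger.json"]   # assumed spec endpoint fetched by the aggregator
  verbs: ["get"]
# This role would then be bound to the identity kube-aggregator uses against the user apiserver.
```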
```release-note
Support OpenAPI spec aggregation for kube-aggregator
```
fixes: #43717
Automatic merge from submit-queue (batch tested with PRs 45871, 46498, 46729, 46144, 46804)
Implement kubectl rollout undo and history for DaemonSet
~Depends on #45924, only the 2nd commit needs review~ (merged)
Ref https://github.com/kubernetes/community/pull/527/
TODOs:
- [x] kubectl rollout history
- [x] sort controller history, print overview (with revision number and change cause)
- [x] print detail view (content of a history)
- [x] print template
- [x] ~(do we need to?) print labels and annotations~
- [x] kubectl rollout undo:
- [x] list controller history, figure out which revision to rollback to
- if toRevision == 0, rollback to the latest revision, otherwise choose the history with matching revision
- [x] update the ds using the history to rollback to
- [x] replace the ds template with history's
- [x] ~(do we need to?) replace the ds labels and annotations with history's~
- [x] test-cmd.sh
@kubernetes/sig-apps-pr-reviews @erictune @kow3ns @lukaszo @kargakis @kubernetes/sig-cli-maintainers
---
**Release note**:
```release-note
```
Automatic merge from submit-queue
Add SuccessfulMountVolume message to the events of pod
**What this PR does / why we need it**:
When creating a pod with a volume, the volume mount may fail at first but eventually succeed after several retries. `kubectl describe pod` only shows the failure messages, so it is better to add the SuccessfulMountVolume message to the pod events too.
Fixes #42867
Automatic merge from submit-queue
Delete all dead containers and sandboxes when under disk pressure.
This PR modifies the eviction manager to add dead container and sandbox garbage collection as a resource reclaim function for disk. It also modifies the container GC logic to allow pods that are terminated, but not deleted to be removed.
It still does not delete containers that are younger than minGcAge. This should prevent nodes from entering a permanently bad state if the entire disk is occupied by pods that are terminated (Failed or Succeeded) but not deleted.
There are two improvements we should consider making in the future:
- Track the disk space and inodes reclaimed by deleting containers. We currently do not track this, and it prevents us from determining if deleting containers resolves disk pressure. So we may still evict a pod even if we are able to free disk space by deleting dead containers.
- Once we can track disk space and inodes reclaimed, we should consider only deleting the containers we need to in order to relieve disk pressure. This should help avoid a scenario where we try to delete a massive number of containers all at once and overwhelm the runtime.
/assign @vishh
cc @derekwaynecarr
```release-note
Disk Pressure triggers the deletion of terminated containers on the node.
```
Automatic merge from submit-queue
Delete meaningless check
**What this PR does / why we need it**:
The check deleted here is redundant.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46681, 46786, 46264, 46680, 46805)
Add annotation for image policy webhook fail open.
**What this PR does / why we need it**: there's no good way to audit-log when binary verification fails open. Adding an annotation can solve that, and provides a useful tool to audit [non-malicious] containers.
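For example, a pod admitted while the webhook was failing open would carry the annotation (pod and container names are illustrative):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod                # illustrative name
  annotations:
    alpha.image-policy.k8s.io/failed-open: "true"   # set because the webhook errored and failed open
spec:
  containers:
  - name: app
    image: busybox
```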
**Release note**: add the annotation "alpha.image-policy.k8s.io/failed-open=true" to pods created when the image policy webhook fails open.
```release-note
Add the `alpha.image-policy.k8s.io/failed-open=true` annotation when the image policy webhook encounters an error and fails open.
```
Automatic merge from submit-queue (batch tested with PRs 46681, 46786, 46264, 46680, 46805)
Fix for-loop and err definition
**What this PR does / why we need it**:
We can use `j` directly; it's odd to use `i` and then derive `j` from `i`.
We can also move the `err` definition into the `if` block, since it is only used there.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 40760, 46706, 46783, 46742, 46751)
Fix unit test for kubectl create role
When the expected error is not nil but no error actually occurs, the unit test should report an error.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Implement Daemonset history
~Depends on #45867 (the 1st commit, ignore it when reviewing)~ (already merged)
Ref https://github.com/kubernetes/community/pull/527/ and https://github.com/kubernetes/community/pull/594
@kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews @erictune @kow3ns @lukaszo @kargakis
---
TODOs:
- [x] API changes
- [x] (maybe) Remove rollback subresource if we decide to do client-side rollback
- [x] deployment controller
- [x] controller revision
- [x] owner ref (claim & adoption)
- [x] history reconstruct (put revision number, hash collision avoidance)
- [x] de-dup history and relabel pods
- [x] compare ds template with history
- [x] hash labels (put it in controller revision, pods, and maybe deployment)
- [x] clean up old history
- [x] Rename status.uniquifier when we reach consensus in #44774
- [x] e2e tests
- [x] unit tests
- [x] daemoncontroller_test.go
- [x] update_test.go
- [x] ~(maybe) storage_test.go // if we do server side rollback~
kubectl part is in #46144
---
**Release note**:
```release-note
```
Automatic merge from submit-queue
reset resultRun on pod restart
xref https://bugzilla.redhat.com/show_bug.cgi?id=1455056
There is currently an issue where, if the pod is restarted due to liveness probe failures exceeding failureThreshold, the failure count is not reset on the probe worker. When the pod restarts, if the liveness probe fails even once, the pod is restarted again, not honoring failureThreshold on the restart.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox
    command:
    - sleep
    - "3600"
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 3
      timeoutSeconds: 1
      periodSeconds: 3
      successThreshold: 1
      failureThreshold: 5
  terminationGracePeriodSeconds: 0
```
Before this PR:
```
$ kubectl create -f busybox-probe-fail.yaml
pod "busybox" created
$ kubectl get pod -w
NAME      READY     STATUS             RESTARTS   AGE
busybox   1/1       Running            0          4s
busybox   1/1       Running            1          24s
busybox   1/1       Running            2          33s
busybox   0/1       CrashLoopBackOff   2          39s
```
After this PR:
```
$ kubectl create -f busybox-probe-fail.yaml
$ kubectl get pod -w
NAME      READY     STATUS              RESTARTS   AGE
busybox   0/1       ContainerCreating   0          2s
busybox   1/1       Running             0          4s
busybox   1/1       Running             1          27s
busybox   1/1       Running             2          45s
```
```release-note
Fix kubelet reset liveness probe failure count across pod restart boundaries
```
Restarts now happen at even intervals.
@derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 46782, 46719, 46339, 46609, 46494)
Do not log the content of pod manifest if parsing fails.
**What this PR does / why we need it**:
- ~~only accepts text/plain config file~~
- ~~not log config file content when it's invalid~~
Do not log the content of pod manifest if parsing fails.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #46493
**Special notes for your reviewer**:
/cc @yujuhong
@sig-node-reviewers
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46782, 46719, 46339, 46609, 46494)
Fix inconsistency in finding cni binaries
Fixes #46476
Signed-off-by: Abhinav Dahiya <abhinav.dahiya@coreos.com>
**What this PR does / why we need it**:
This fixes the inconsistency in finding the appropriate CNI binaries.
Currently the `lo` cniNetwork follows vendorCniDir > binDir, whereas the default for all others is binDir > vendorCniDir. This PR makes vendorCniDir > binDir the default behavior.
**Why we need it**:
Hyperkube currently ships CNI binaries in /opt/cni/bin.
To use the latest version of Calico you need to override the kubelet's /opt/cni/bin from the host, which means all other CNI plugins (flannel, loopback, etc.) have to be mounted from the host too. Keeping the vendor dir at higher precedence allows easy installation of newer plugin versions.
Automatic merge from submit-queue (batch tested with PRs 46782, 46719, 46339, 46609, 46494)
update default translation of annotations
**What this PR does / why we need it**:
Using the local cluster, the help output of kubectl is not correct:
```
# ./cluster/kubectl.sh
.......
Settings Commands:
label Update the labels on a resource
annotate Update the annotations on a resourcewatch is only supported on individual resources and resource
collections - %d resources were found
completion Output shell completion code for the specified shell (bash or zsh)
Other Commands:
api-versions Print the supported API versions on the server, in the form of "group/version"
config Modify kubeconfig files
help Help about any command
plugin Runs a command-line plugin
version Print the client and server version information
Use "kubectl <command> --help" for more information about a given command.
Use "kubectl options" for a list of global command-line options (applies to all commands).
```
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46620, 46732, 46773, 46772, 46725)
Improving test coverage for kubelet/kuberuntime.
**What this PR does / why we need it**:
Increases test coverage for kubelet/kuberuntime
https://github.com/kubernetes/kubernetes/issues/46123
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 36721, 46483, 45500, 46724, 46036)
retry clientCA post start hook on transient failures
@smarterclayton retries the poststarthook you saw failing.
Having looked through, it seems that I didn't kill the server on the failure.
Automatic merge from submit-queue (batch tested with PRs 36721, 46483, 45500, 46724, 46036)
AWS: Allow configuration of a single security group for ELBs
**What this PR does / why we need it**:
AWS has a hard limit on the number of Security Groups (500). Right now, every time an ELB is created, Kubernetes creates a new Security Group. This change allows specifying a single Security Group to use for all ELBs.
**Special notes for your reviewer**:
For some reason the Diff tool makes this look like it was way more changes than it really was.
**Release note**:
```release-note
```