Automatic merge from submit-queue
Ensure that init containers are preserved during pruning
Pods with multiple init containers were getting the wrong containers
pruned. Fix an error message and add a test.
Fixes #26131
This is needed for the /etc/hosts mount and the downward API to work.
Furthermore, this is required for the reported `PodStatus` to be
correct.
The `Status` bit mostly worked prior to #25062, and this restores that
functionality in addition to the new functionality.
Automatic merge from submit-queue
rkt: Support alternate stage1's via annotation
This provides a basic implementation for setting a stage1 on a per-pod
basis via an annotation. See discussion here for how this approach was arrived at: https://github.com/kubernetes/kubernetes/issues/23944#issuecomment-212653776
It's possible this feature should be gated behind additional knobs, such
as a kubelet flag to filter allowed stage1s, or a check akin to what
privileged gets in the apiserver.
Currently, it checks `AllowPrivileged`, as a means to let people disable
this feature, though overloading it as stage1 and privileged isn't
ideal.
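A minimal sketch of the gating described above, not the actual kubelet code. The annotation key and the error wording are taken from the testing output in this change; `stage1Override` and its `allowPrivileged` parameter are hypothetical names standing in for the real rkt runtime code and the kubelet's `AllowPrivileged` setting.

```go
package main

import (
	"errors"
	"fmt"
)

// Annotation key as used by the example pod in this change.
const stage1NameOverrideKey = "rkt.alpha.kubernetes.io/stage1-name-override"

// errStage1NotAllowed mirrors the wording of the kubelet event in the testing output.
var errStage1NotAllowed = errors.New("running a custom stage1 requires a privileged security context")

// stage1Override is a hypothetical helper. It returns the stage1 image named by
// the pod's annotations, or "" when no override is requested.
// allowPrivileged stands in for the kubelet's AllowPrivileged setting.
func stage1Override(annotations map[string]string, allowPrivileged bool) (string, error) {
	name, ok := annotations[stage1NameOverrideKey]
	if !ok || name == "" {
		return "", nil // no override requested, use the default stage1
	}
	if !allowPrivileged {
		return "", errStage1NotAllowed
	}
	return name, nil
}

func main() {
	ann := map[string]string{stage1NameOverrideKey: "coreos.com/rkt/stage1-fly:1.3.0"}
	name, err := stage1Override(ann, false)
	fmt.Printf("%q %v\n", name, err)
}
```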
Fixes #23944
Testing done (note, unfortunately done with some additional ./cluster changes merged in):
```
$ cat examples/stage1-fly/fly-me-to-the-moon.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    name: exit
  name: exit-fast
  annotations: {"rkt.alpha.kubernetes.io/stage1-name-override": "coreos.com/rkt/stage1-fly:1.3.0"}
spec:
  restartPolicy: Never
  containers:
  - name: exit
    image: busybox
    command: ["sh", "-c", "ps aux"]
$ kubectl create -f examples/stage1-fly
$ ssh core@minion systemctl status -l --no-pager k8s_2f169b2e-c32a-49e9-a5fb-29ae1f6b4783.service
...
failed
...
May 04 23:33:03 minion rkt[2525]: stage0: error writing /etc/rkt-resolv.conf: open /var/lib/rkt/pods/run/2f169b2e-c32a-49e9-a5fb-29ae1f6b4783/stage1/rootfs/etc/rkt-resolv.conf: no such file or directory
...
# Restart kubelet with allow-privileged=false
$ kubectl create -f examples/stage1-fly
$ kubectl describe exit-fast
...
1m 19s 5 {kubelet euank-e2e-test-minion-dv3u} spec.containers{exit} Warning Failed Failed to create rkt container with error: cannot make "exit-fast_default(17050ce9-1252-11e6-a52a-42010af00002)": running a custom stage1 requires a privileged security context
....
```
Note as well that the "success" here is rkt spitting out an [error message](https://github.com/coreos/rkt/issues/2141) which indicates that the right stage1 was being used at least.
cc @yifan-gu @aaronlevy
Automatic merge from submit-queue
Downward API implementation for resource limits and requests
This is an implementation of the Downward API for resource limits and requests. It works with both environment variables and the volume plugin.
This is based on the proposal in https://github.com/kubernetes/kubernetes/pull/24051. The implementation follows the "API with magic keys" approach discussed in the proposal.
@kubernetes/rh-cluster-infra
Since the containerInfo has the LogPath in it, let's use that and
not manually construct the path ourselves. This also makes the code
less prone to breaking if docker changes this path.
Fixes #23695
Automatic merge from submit-queue
kubelet/cadvisor: Refactor cadvisor disk stat/usage interfaces.
In short:
1) the cadvisor struct knows which runtime the kubelet uses, passed in via an additional argument to New()
2) rename the cadvisor wrapper function from DockerImagesFsInfo() to ImagesFsInfo() and have the linux implementation choose a label based on the runtime stored in the cadvisor struct
2a) mock/fake/unsupported implementations modified to take the same additional argument in New()
3) the kubelet's wrapper for the cadvisor wrapper is renamed in parallel
4) make all tests use the new interface
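A rough sketch of steps 1 and 2, for illustration only. `cadvisorClient`, `newClient`, and `imagesFsLabel` are hypothetical names, and the label strings are assumptions rather than values copied from cadvisor.

```go
package main

import "fmt"

// Label values are assumptions for illustration; the real constants live in cadvisor.
const (
	labelDockerImages = "docker-images"
	labelRktImages    = "rkt-images"
)

// cadvisorClient is a hypothetical stand-in for the kubelet's cadvisor wrapper.
type cadvisorClient struct {
	runtime string // container runtime name passed to newClient
}

// newClient takes the runtime as an additional argument (step 1).
func newClient(runtime string) *cadvisorClient {
	return &cadvisorClient{runtime: runtime}
}

// imagesFsLabel picks the images filesystem label from the stored runtime (step 2).
func (c *cadvisorClient) imagesFsLabel() (string, error) {
	switch c.runtime {
	case "docker":
		return labelDockerImages, nil
	case "rkt":
		return labelRktImages, nil
	default:
		return "", fmt.Errorf("unknown runtime: %q", c.runtime)
	}
}

func main() {
	label, err := newClient("rkt").imagesFsLabel()
	fmt.Println(label, err)
}
```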
Automatic merge from submit-queue
Fix detection of docker cgroup on RHEL
Check docker's pid file, then fall back to pidof when trying to determine the pid for docker. The
latest docker RPM for RHEL changes /usr/bin/docker from an executable to a shell script (to support
/usr/bin/docker-current and /usr/bin/docker-latest). The pidof check for docker fails in this case,
so we check /var/run/docker.pid first (the default location), and fall back to pidof if that fails.
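The lookup order can be sketched roughly as follows. This is an illustration under assumptions, not the kubelet's code: `dockerPid`, `pidofCommand`, and `errDockerNotFound` are hypothetical names, and the file reader and pidof lookup are injected so the ordering can be exercised without a running daemon.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// errDockerNotFound is returned when neither source yields a pid.
var errDockerNotFound = errors.New("docker process not found")

// dockerPid is a hypothetical helper: pid file first, then pidof.
func dockerPid(pidFile string, readFile func(string) ([]byte, error), pidof func(string) ([]int, error)) (int, error) {
	// Preferred source: the pid file written by the daemon.
	if data, err := readFile(pidFile); err == nil {
		if pid, err := strconv.Atoi(strings.TrimSpace(string(data))); err == nil {
			return pid, nil
		}
	}
	// Fallback: look the process up by name.
	pids, err := pidof("docker")
	if err != nil {
		return 0, err
	}
	if len(pids) == 0 {
		return 0, errDockerNotFound
	}
	return pids[0], nil
}

// pidofCommand shells out to pidof and parses its space-separated output.
func pidofCommand(name string) ([]int, error) {
	out, err := exec.Command("pidof", name).Output()
	if err != nil {
		return nil, err
	}
	var pids []int
	for _, field := range strings.Fields(string(out)) {
		pid, err := strconv.Atoi(field)
		if err != nil {
			return nil, err
		}
		pids = append(pids, pid)
	}
	return pids, nil
}

func main() {
	pid, err := dockerPid("/var/run/docker.pid", os.ReadFile, pidofCommand)
	fmt.Println(pid, err)
}
```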
@kubernetes/sig-node @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Add support for limiting grace period during soft eviction
Adds eviction manager support in the kubelet for a maximum pod graceful termination period that applies when a soft eviction threshold is met.
```release-note
Kubelet evicts pods when available memory falls below configured eviction thresholds
```
/cc @vishh
Automatic merge from submit-queue
Add support for PersistentVolumeClaim in Attacher/Detacher interface
The attach/detach interface does not support volumes that are referenced through PVCs. This PR adds that support.
Automatic merge from submit-queue
Only expose top N images in `NodeStatus`
Fix #25209
Sort the images by size and report only the 50 largest in the node status.
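A small sketch of the selection, assuming a simplified image record. The `image` type and `topImages` are hypothetical and not the types used in `NodeStatus`.

```go
package main

import (
	"fmt"
	"sort"
)

// image is a simplified, hypothetical stand-in for an image entry in the node status.
type image struct {
	Name      string
	SizeBytes int64
}

// topImages returns at most n images, largest first. The input slice is not modified.
func topImages(images []image, n int) []image {
	if n < 0 {
		n = 0
	}
	sorted := make([]image, len(images))
	copy(sorted, images)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].SizeBytes > sorted[j].SizeBytes
	})
	if len(sorted) > n {
		sorted = sorted[:n]
	}
	return sorted
}

func main() {
	images := []image{{"a", 10}, {"b", 300}, {"c", 20}}
	for _, img := range topImages(images, 2) {
		fmt.Println(img.Name, img.SizeBytes)
	}
}
```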
cc @vishh
Automatic merge from submit-queue
Updating QoS policy to be at the pod level
Quality of Service will be derived from an entire Pod Spec, instead of being derived from resource specifications of individual resources per-container.
A Pod is `Guaranteed` iff all its containers have limits == requests for all the first-class resources (cpu, memory as of now).
A Pod is `BestEffort` iff requests & limits are not specified for any resource across all containers.
A Pod is `Burstable` otherwise.
Note: Existing pods might be more susceptible to OOM Kills on the node due to this PR! To protect pods from being OOM killed on the node, set `limits` for all resources across all containers in a pod.
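The three rules above can be sketched as follows. This is a simplified illustration, not the code in this PR: `container` and `podQoS` are hypothetical, quantities are plain integers, and a missing request is assumed to default to the limit.

```go
package main

import "fmt"

// container is a simplified, hypothetical resource spec:
// resource name ("cpu", "memory") mapped to a quantity in arbitrary integer units.
type container struct {
	Requests map[string]int64
	Limits   map[string]int64
}

// firstClass lists the first-class resources named in the description.
var firstClass = []string{"cpu", "memory"}

// podQoS classifies a pod from the resource specs of all its containers.
func podQoS(containers []container) string {
	guaranteed := true
	bestEffort := true
	for _, c := range containers {
		if len(c.Requests) > 0 || len(c.Limits) > 0 {
			bestEffort = false
		}
		for _, r := range firstClass {
			limit, hasLimit := c.Limits[r]
			request, hasRequest := c.Requests[r]
			// Assumption: an unset request defaults to the limit.
			if !hasLimit || (hasRequest && request != limit) {
				guaranteed = false
			}
		}
	}
	switch {
	case bestEffort:
		return "BestEffort"
	case guaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	full := map[string]int64{"cpu": 100, "memory": 200}
	pod := []container{{Requests: full, Limits: full}, {}}
	fmt.Println(podQoS(pod))
}
```

In this sketch a single container without limits is enough to move an otherwise `Guaranteed` pod to `Burstable`, which is the pod-level behavior the note above warns about.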
Automatic merge from submit-queue
kubelet: Don't attempt to apply the oom score if container exited already
Containers could terminate before kubelet applies the oom score. This is normal
and the function should not error out.
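One way to tolerate the race, sketched under assumptions: `applyOOMScoreAdj` is a hypothetical helper that writes `oom_score_adj` under a configurable proc root, and it treats a missing proc entry or an `ESRCH` on write as "process already exited". The kubelet's actual implementation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"syscall"
)

// applyOOMScoreAdj is a hypothetical helper. procRoot is normally "/proc".
// It returns nil if the process has already exited.
func applyOOMScoreAdj(procRoot string, pid, score int) error {
	path := filepath.Join(procRoot, strconv.Itoa(pid), "oom_score_adj")
	// Open without O_CREATE so a vanished process yields a not-exist error.
	f, err := os.OpenFile(path, os.O_WRONLY, 0)
	if err != nil {
		if os.IsNotExist(err) {
			return nil // process is gone, nothing to adjust
		}
		return err
	}
	defer f.Close()
	if _, err := f.WriteString(strconv.Itoa(score)); err != nil {
		if errors.Is(err, syscall.ESRCH) {
			return nil // process exited between open and write
		}
		return err
	}
	return nil
}

func main() {
	// 1<<30 is used here as a pid that is not expected to exist.
	fmt.Println(applyOOMScoreAdj("/proc", 1<<30, -999))
}
```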
This addresses #25844 partially.
/cc @smarterclayton @Random-Liu