Automatic merge from submit-queue (batch tested with PRs 62266, 64351, 64366, 64235, 64560). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding CSI driver registration with plugin watcher
Adding CSI driver registration bits. The registration process will leverage driver-registrar side which will open the `registration` socket and will listen for pluginwatcher's GetInfo calls.
```release-note
Adding CSI driver registration code.
```
/sig sig-storage
Automatic merge from submit-queue (batch tested with PRs 62266, 64351, 64366, 64235, 64560). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bind mount subpath with same read/write settings as underlying volume
**What this PR does / why we need it**:
https://github.com/kubernetes/kubernetes/pull/63045 broke two scenarios:
* If volumeMount path already exists in container image, container runtime will try to chown the volume
* In SELinux system, we will try to set SELinux labels when starting the container
This fix makes it so that the subpath bind mount will inherit the read/write settings of the underlying volume mount. It does this by using the "bind,remount" mount options when doing the bind mount.
The underlying volume mount is ro when the volumeSource.readOnly flag is set. This is for persistent volume types like PVC, GCE PD, NFS, etc. When this is set, we won't try to configure SELinux labels. Also in this mode, subpaths have to already exist in the volume, we cannot make new directories on a read only volume.
When volumeMount.readOnly is set, the container runtime is in charge of making the volume in the container readOnly, but the underlying volume mount on the host can be writable. This can be set for any volume type, and is permanently set for atomic volume types like configmaps, secrets. In this case, SELinux labels will be applied before the container runtime makes the volume readOnly. And subpaths don't have to exist.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64120
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixes issue for readOnly subpath mounts for SELinux systems and when the volume mountPath already existed in the container image.
```
Automatic merge from submit-queue (batch tested with PRs 62266, 64351, 64366, 64235, 64560). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add log and fs stats for Windows containers
**What this PR does / why we need it**:
Add log and fs stats for Windows containers.
Without this, kubelet will report errors continuously:
```
Unable to fetch container log stats for path \var\log\pods\2a70ed65-37ae-11e8-8730-000d3a14b1a0\echo: Du not supported for this build.
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60180#62047
**Special notes for your reviewer**:
**Release note**:
```release-note
Add log and fs stats for Windows containers
```
Automatic merge from submit-queue (batch tested with PRs 63453, 64592, 64482, 64618, 64661). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert "Remove rescheduler and corresponding tests from master"
Reverts kubernetes/kubernetes#64364
After discussing with @bsalamat on how DS controllers(ref: https://github.com/kubernetes/kubernetes/pull/63223#discussion_r192277527) cannot create pods if the cluster is at capacity and they have to rely on rescheduler for making some space, we thought it is better to
- Bring rescheduler back.
- Make rescheduler priority aware.
- If cluster is full and if **only** DS controller is not able to create pods, let rescheduler be run and let it evict some pods which have less priority.
- The DS controller pods will be scheduled now.
So, I am reverting this PR now. Step 2, 3 above are going to be in rescheduler.
/cc @bsalamat @aveshagarwal @k82cn
Please let me know your thoughts on this.
```release-note
Revert #64364 to resurrect rescheduler. More info https://github.com/kubernetes/kubernetes/issues/64725 :)
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix Windows CNI for the sandbox case
**What this PR does / why we need it**:
Windows supports both sandbox and non-sandbox cases. The non-sandbox
case is for Windows Server 2016 and for Windows Server version greater
than 1709 which use Hyper-V containers.
Currently, the CNI on Windows fetches the IP from the containers
within the pods regardless of the mode. This should be done only
in the non-sandbox mode where the IP of the actual container
will be different than the IP of the sandbox container.
In the case where the sandbox container is supported, all the containers
from the same pod will share the network details of the sandbox container.
This patch updates the CNI to fetch the IP from the sandbox container
when this mode is supported.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64188
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 63348, 63839, 63143, 64447, 64567). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Containerized subpath
**What this PR does / why we need it**:
Containerized kubelet needs a different implementation of `PrepareSafeSubpath` than kubelet running directly on the host.
On the host we safely open the subpath and then bind-mount `/proc/<pidof kubelet>/fd/<descriptor of opened subpath>`.
With kubelet running in a container, `/proc/xxx/fd/yy` on the host contains path that works only inside the container, i.e. `/rootfs/path/to/subpath` and thus any bind-mount on the host fails.
Solution:
- safely open the subpath and gets its device ID and inode number
- blindly bind-mount the subpath to `/var/lib/kubelet/pods/<uid>/volume-subpaths/<name of container>/<id of mount>`. This is potentially unsafe, because user can change the subpath source to a link to a bad place (say `/run/docker.sock`) just before the bind-mount.
- get device ID and inode number of the destination. Typical users can't modify this file, as it lies on /var/lib/kubelet on the host.
- compare these device IDs and inode numbers.
**Which issue(s) this PR fixes**
Fixes#61456
**Special notes for your reviewer**:
The PR contains some refactoring of `doBindSubPath` to extract the common code. New `doNsEnterBindSubPath` is added for the nsenter related parts.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 63348, 63839, 63143, 64447, 64567). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move pkg/scheduler/schedulercache -> pkg/scheduler/cache
**What this PR does / why we need it**:
Move pkg/scheduler/schedulercache -> pkg/scheduler/cache
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63813
**Special notes for your reviewer**:
In order to prevent name conflicts still rename the `cache` to `schedulercache`.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add proxy for container streaming in kubelet for streaming auth.
For https://github.com/kubernetes/kubernetes/issues/36666, option 2 of https://github.com/kubernetes/kubernetes/issues/36666#issuecomment-378440458.
This PR:
1. Removed the `DirectStreamingRuntime`, and changed `IndirectStreamingRuntime` to `StreamingRuntime`. All `DirectStreamingRuntime`s, `dockertools` and `rkt`, were removed.
2. Proxy container streaming in kubelet instead of returning redirect to apiserver. This solves the container runtime authentication issue, which is what we agreed on in https://github.com/kubernetes/kubernetes/issues/36666.
Please note that, this PR replaced the redirect with proxy directly instead of adding a knob to switch between the 2 behaviors. For existing CRI runtimes like containerd and cri-o, they should change to serve container streaming on localhost, so as to make the whole container streaming connection secure.
If a general authentication mechanism proposed in https://github.com/kubernetes/kubernetes/issues/62747 is ready, we can switch back to redirect, and all code can be found in github history.
Please also note that this added some overhead in kubelet when there are container streaming connections. However, the actual bottleneck is in the apiserver anyway, because it does proxy for all container streaming happens in the cluster. So it seems fine to get security and simplicity with this overhead. @derekwaynecarr @mrunalp Are you ok with this? Or do you prefer a knob?
@yujuhong @timstclair @dchen1107 @mikebrow @feiskyer
/cc @kubernetes/sig-node-pr-reviews
**Release note**:
```release-note
Kubelet now proxies container streaming between apiserver and container runtime. The connection between kubelet and apiserver is authenticated. Container runtime should change streaming server to serve on localhost, to make the connection between kubelet and container runtime local.
In this way, the whole container streaming connection is secure. To switch back to the old behavior, set `--redirect-container-streaming=true` flag.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove rescheduler and corresponding tests from master
**What this PR does / why we need it**:
This is to remove rescheduler from master branch as we are promoting priority and preemption to beta.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Part of #57471
**Special notes for your reviewer**:
/cc @bsalamat @aveshagarwal
**Release note**:
```release-note
Remove rescheduler from master.
```
Automatic merge from submit-queue (batch tested with PRs 64338, 64219, 64486, 64495, 64347). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove unused status per TODO
This should have been deleted in #63221, as it is now unused.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 57082, 64325, 64016, 64443, 64403). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
small nit in the annotations of pkg/kubelet/container/os.go
**What this PR does / why we need it**:
just a small nit in the annotations of container/os.go, but, it looks quite uncomfortable cause others all get right.
Automatic merge from submit-queue (batch tested with PRs 61803, 64305, 64170, 64361, 64339). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add a flag to control the cap on images reported in node status
While I normally try to avoid adding flags, this is a short term
scalability fix for v1.11, and there are other long-term solutions in
the works, so we shouldn't commit to this in the v1beta1 Kubelet config.
Flags are our escape hatch here.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 58920, 58327, 60577, 49388, 62306). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Dynamic env in subpath - Fixes Issue 48677
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48677
**Special notes for your reviewer**:
**Release note**:
```release-note
Adds the VolumeSubpathEnvExpansion alpha feature to support environment variable expansion
Sub-paths cannot be mounted with a dynamic volume mount name.
This fix provides environment variable expansion to sub paths
This reduces the need to manage symbolic linking within sidecar init containers to achieve the same goal
```
While I normally try to avoid adding flags, this is a short term
scalability fix for v1.11, and there are other long-term solutions in
the works, so we shouldn't commit to this in the v1beta1 Kubelet config.
Flags are our escape hatch.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add metadata to kubelet eviction event annotations
**What this PR does / why we need it**:
Add annotations to kubelet eviction events. Annotations include
"offending_containers" : comma-seperated list of containers.
"offending_containers_usage": comma-seperated list of usage.
"starved_resource": v1.ResourceName of the starved resource
**Special notes for your reviewer**:
Adding annotations to events required changing the `EventRecorder` interface to add a `AnnotatedEventf` function, which can add annotations to an event.
**Release note**:
```release-note
NONE
```
/assign @dchen1107
cc @mwielgus @schylek @kgrygiel
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
improve test: verify kubelet.config.Restore only happen once
**What this PR does / why we need it**:
This patch is to add additional test coverage of pod config restore,
it verifies that restore can only happen once.
in the second restore attempt, we should expect no error and no channel update.
**Which issue(s) this PR fixes**:
this is a test improvement based on test been added at https://github.com/kubernetes/kubernetes/pull/63553
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
/sig node
/cc @rphillips @jiayingz @vikaschoudhary16 @anfernee @Random-Liu @dchen1107 @derekwaynecarr
@vishh @yujuhong @tallclair
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Validate cgroups-per-qos for Windows
**What this PR does / why we need it**:
cgroups-per-qos and enforce-node-allocatable is not supported on Windows, but kubelet allows it on Windows. And then Pods may stuck in terminating state because of it. Refer #61716.
This PR adds validation for them and make kubelet refusing to start in this case.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#61716
**Special notes for your reviewer**:
**Release note**:
```release-note
Fail fast if cgroups-per-qos is set on Windows
```
The goal of this change is to remove the registration of signal
handling from pkg/kubelet. We now pass in a stop channel.
If you register a signal handler in `main()` to aid in a controlled
and deliberate exit then the handler registered in `pkg/kubelet` often
wins and the process exits immediately. This means all other signal
handler registrations are currently racy if `DockerServer.Start()` is
directly or indirectly invoked.
This change also removes another signal handler registration from
`NewAPIServerCommand()`; a stop channel is now passed to this
function.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix typo in volume_stats.go
**What this PR does / why we need it**:
While reviewing the implementation details I came across a typo in volume_stats.go
sed/volumeStatsCollecotr/volumeStatsCollector/
**Release note**:
```release-note
NONE
```
Windows supports both sandbox and non-sandbox cases. The non-sandbox
case is for Windows Server 2016 and for Windows Server version greater
than 1709 which use Hyper-V containers.
Currently, the CNI on Windows fetches the IP from the containers
within the pods regardless of the mode. This should be done only
in the non-sandbox mode where the IP of the actual container
will be different than the IP of the sandbox container.
In the case where the sandbox container is supported, all the containers
from the same pod will share the network details of the sandbox container.
This patch updates the CNI to fetch the IP from the sandbox container
when this mode is supported.
Signed-off-by: Alin Balutoiu <abalutoiu@cloudbasesolutions.com>
Automatic merge from submit-queue (batch tested with PRs 64013, 63896, 64139, 57527, 62102). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add dynamic config metrics
This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.
https://github.com/kubernetes/features/issues/281
```release-note
The Kubelet now exports metrics that report the assigned (node_config_assigned), last-known-good (node_config_last_known_good), and active (node_config_active) config sources, and a metric indicating whether the node is experiencing a config-related error (node_config_error). The config source metrics always report the value 1, and carry the node_config_name, node_config_uid, node_config_resource_version, and node_config_kubelet_key labels, which identify the config version. The error metric reports 1 if there is an error, 0 otherwise.
```