Clarifies that requesting no conversion is part of the codec factory, and
future refactors will make the codec factory less opionated about conversion.
If the container is not found, do not stop reading the log file
immediately but wait until we reach again EOF.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
it seems fsnotify can miss some read events, blocking the kubelet to
receive more data from the log file.
If we end up waiting for events with fsnotify, force a read from the
log file every second so that are sure to not miss new data for longer
than that.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if the runtime is configured to rotate the log file, we might end up
watching the old fd where there are no more writes.
When a fsnotify event other than Write is received, reopen the log
file and recreate the watcher.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
manager
This PR fixes the issue #75345. This fix modified the checking volume in
actual state when validating whether volume can be removed from desired state or not. Only if volume status is already mounted in actual state, it can be removed from desired state.
For the case of mounting fails always, it can still work because the
check also validate whether pod still exist in pod manager. In case of
mount fails, pod should be able to removed from pod manager so that
volume can also be removed from desired state.
Minor optimization in the code that attempts to assign whole
sockets/cores in case the number of CPUs requested is higher
than CPUs-per-socket/core: check if the number of requested
CPUs is higher than CPUs-per-socket/core before retrieving
and iterating the free sockets/cores, and break the loops
when that is no longer the case.
Signed-off-by: Arik Hadas <ahadas@redhat.com>
CRI runtimes do not supply cpu nano core usage as it is not part of CRI
stats. However, there are upstream components that still rely on such
stats to function. The previous fix was faulty because the multiple
callers could compete and update the stats, causing
inconsistent/incoherent metrics. This change, instead, creates a
separate call for updating the usage, and rely on eviction manager,
which runs periodically, to trigger the updates. The caveat is that if
eviction manager is completley turned off, no one would compute the
usage.
This is the 2nd PR to move CSINodeInfo/CSIDriver APIs to
v1beta1 core storage APIs. It includes controller side changes.
It depends on the PR with API changes:
https://github.com/kubernetes/kubernetes/pull/73883
This change explicits the restart policy, as on some docker version
(e.g. 11.07-ce) the default for this field is "". which seems to be not
respected by dockerd
A previous PR (https://github.com/kubernetes/kubernetes/pull/73726)
added GMSA support to the dockershim. Unfortunately, there was a
bug in there: the registry keys used to pass the cred specs down
to Docker were being cleaned up too early, right after the containers'
creation - before Docker would ever try to read them, when trying to
actually start the container.
This patch fixes this.
An e2e test is also provided in a separate PR.
Signed-off-by: Jean Rouge <rougej+github@gmail.com>
1) Add suffix (`seconds` or `total`) to metric name
2) Switch Summary metric to Histogram metric (Summary metrics are not
supported completely by prometheus-to-sd and can't be aggregated.)
This package contains public/private key utilities copied directly from
client-go/util/cert. All imports were updated.
Future PRs will actually refactor the libraries.
Updates #71004
1. Find the minimal thread number within a core using a
single loop rather than by sorting the thread numbers.
2. Inline getUniqueCoreID#err and Discover#numCPUs variables.
3. Narrow the scope of Discover#coreID and Discover#err variables.
Signed-off-by: Arik Hadas <ahadas@redhat.com>
The messages for container lifecycle events are subtly inconsistent
and should be unified.
First, the field format for containers is hard to parse for a human,
so include the container name directly in the message for create
and start, and for kill remove the container runtime prefix.
Second, the pulling image event has inconsistent capitalization, fix
that to be sentence without punctuation.
Third, the kill container event was unnecessarily wordy and inconsistent
with the create and start events. Make the following changes:
* Use 'Stopping' instead of 'Killing' since kill is usually reserved for
when we decide to hard stop a container
* Send the event before we dispatch the prestop hook, since this is an
"in-progress" style event vs a "already completed" type event
* Remove the 'cri-o://' / 'docker://' prefix by printing the container
name instead of id (we already do that replacement at the lower level
to prevent high cardinality events)
* Use 'message' instead of 'reason' as the argument name since this is a
string for humans field, not a string for machines field
* Remove the hash values on the container spec changed event because no
human will ever be able to do anything with the hash value
* Use 'Stopping container %s(, explanation)?' form without periods to
follow event conventions
The end result is a more pleasant message for humans:
```
35m Normal Created Pod Created container
35m Normal Started Pod Started container
10m Normal Killing Pod Killing container cri-o://installer:Need to kill Pod
10m Normal Pulling Pod pulling image "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-10-172026@sha256:3da5303d4384d24691721c1cf2333584ba60e8f82c9e782f593623ce8f83ddc5"
```
becomes
```
35m Normal Created Pod Created container installer
35m Normal Started Pod Started container installer
10m Normal Killing Pod Stopping container installer
10m Normal Pulling Pod Pulling image "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-10-172026@sha256:3da5303d4384d24691721c1cf2333584ba60e8f82c9e782f593623ce8f83ddc5"
```
that requests any device plugin resource. If not, re-issue Allocate
grpc calls. This allows us to handle the edge case that a pod got
assigned to a node even before it populates its extended resource
capacity.
* value names are now purely random
* cleaning up leaked registry keys at Kubelet init
* fixing a small bug masking create errors
Signed-off-by: Jean Rouge <rougej+github@gmail.com>
This patch comprises the kubelet changes outlined in the GMSA KEP
(https://github.com/kubernetes/enhancements/blob/master/keps/sig-windows/20181221-windows-group-managed-service-accounts-for-container-identity.md)
to add GMSA support to Windows workloads.
More precisely, it includes the logic proposed in the KEP to resolve
which GMSA spec should be applied to which containers, and changes
`dockershim` to copy the relevant GMSA credential specs to Windows
registry values prior to creating the container, passing them down
to docker itself, and finally removing the values from the registry
afterwards; both these changes need to be activated with the `WindowsGMSA`
feature gate.
Includes unit tests.
Signed-off-by: Jean Rouge <rougej+github@gmail.com>
When upgrading to 1.13, pods that were created prior to the upgrade have
no pod.Spec.EnableServiceLinks set. This causes a segfault and prevents
the pod from ever starting.
Check and set to the default if nil.
Fixes#71749
For the next release, we include both sets of labels for pods and
containers: "container_name" and "container", "pod_name" and "pod".
In future releases, the "*_name" metrics will be deprecated.
We've changed the Ephemeral Containers API, and container type will no
longer be required. Since this is the only feature using it, remove it.
This reverts commit ba6f31a6c6.
Currently, Docker make IPC of every container shareable by default,
which means other containers can join it's IPC namespace. This is
implemented by creating a tmpfs mount on the host, and then
bind-mounting it to a container's /dev/shm. Other containers
that want to share the same IPC (and the same /dev/shm) can also
bind-mount the very same host's mount.
Now, since https://github.com/moby/moby/commit/7120976d7
(https://github.com/moby/moby/pull/34087) there is a possiblity
to have per-daemon default of having "private" IPC mode,
meaning all the containers created will have non-shareable
/dev/shm.
For shared IPC to work in the above scenario, we need to
explicitly make the "pause" container's IPC mode as "shareable",
which is what this commit does.
To test: add "default-ipc-mode: private" to /etc/docker/daemon.json,
try using kube as usual, there should be no errors.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Modify kubelet plugin watcher to support older CSI drivers that use an
the old plugins directory for socket registration.
Also modify CSI plugin registration to support multiple versions of CSI
registering with the same name.
kubeadm uses certificate rotation to replace the initial high-power
cert provided in --kubeconfig with a less powerful certificate on
the masters. This requires that we pass the contents of the client
config certData and keyData down into the cert store to populate
the initial client.
Add better comments to describe why the flow is required. Add a test
that verifies initial cert contents are written to disk. Change
the cert manager to not use MustRegister for prometheus so that
it can be tested.
Expose both a Stop() method (for cleanup) and a method to force
cert rotation, but only expose Stop() on the interface.
Verify that we choose the correct client.
Ensure that bootstrap+clientcert-rotation in the Kubelet can:
1. happen in the background so that static pods aren't blocked by bootstrap
2. collapse down to a single call path for requesting a CSR
3. reorganize the code to allow future flexibility in retrieving bootstrap creds
Fetching the first certificate and later certificates when the kubelet
is using client rotation and bootstrapping should share the same code
path. We also want to start the Kubelet static pod loop before
bootstrapping completes. Finally, we want to take an incremental step
towards improving how the bootstrap credentials are loaded from disk
(potentially allowing for a CLI call to get credentials, or a remote
plugin that better integrates with cloud providers or KSMs).
Reorganize how the kubelet client config is determined. If rotation is
off, simplify the code path. If rotation is on, load the config
from disk, and then pass that into the cert manager. The cert manager
creates a client each time it tries to request a new cert.
Preserve existing behavior where:
1. bootstrap kubeconfig is used if the current kubeconfig is invalid/expired
2. we create the kubeconfig file based on the bootstrap kubeconfig, pointing to
the location that new client certs will be placed
3. the newest client cert is used once it has been loaded
UnmountDevice must not clear devicepath, because such devicePath
may come from node.status (e.g. on AWS) and subsequent MountDevice
operation (that may be already enqueued) needs it.
**What type of PR is this?**
/kind cleanup
**What this PR does / why we need it**:
Fix typos for stats_provider_test.go
**Which issue(s) this PR fixes** *(optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
The reason for the bump is the new functionality of the
k8s.io/utils/exec package which allows
- to get a hold of the process' std{out,err} as `io.Reader`s
- to `Start` a process and `Wait` for it
This should help on addressing #70890 by allowing to wrap std{out,err}
of the process to be wrapped with a `io.limitedReader`.
It also updates
- k8s.io/kubernetes/pkg/probe/exec.FakeCmd
- k8s.io/kubernetes/pkg/kubelet/prober.execInContainer
- k8s.io/kubernetes/cmd/kubeadm/app/phases/kubelet.fakeCmd
to implement the changed interface.
The dependency on 'k8s.io/utils/pointer' to the new version has also
been bumped in some staging repos:
- apiserver
- kube-controller-manager
- kube-scheduler
The inotify code was removed from golang.org/x/exp several years ago. Therefore
importing it from that path prevents downstream consumers from using any module
that makes use of more recent features of golang.org/x/exp.
This change is a followup to google/cadvisor#2060 which was merged with #70889
This fixes#68478
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
This previously caused a panic when moving lastKnownGood between two
non-nil values, because we were comparing the interface wrapper instead
of comparing the NodeConfigSources. The case of moving from one non-nil
lastKnownGood config to another doesn't appear to be tested by the e2e
node tests. I added a unit test and an e2e node test to help catch bugs
with this case in the future.
When node lease feature is enabled, kubelet reports node status to api server
only if there is some change or it didn't report over last report interval.
Individual implementations are not yet being moved.
Fixed all dependencies which call the interface.
Fixed golint exceptions to reflect the move.
Added project info as per @dims and
https://github.com/kubernetes/kubernetes-template-project.
Added dims to the security contacts.
Fixed minor issues.
Added missing template files.
Copied ControllerClientBuilder interface to cp.
This allows us to break the only dependency on K8s/K8s.
Added TODO to ControllerClientBuilder.
Fixed GoDeps.
Factored in feedback from JustinSB.
In the recent PR on adding ProcMount, we introduced a regression when
pods are privileged. This shows up in 18.06 docker with kubeadm in the
kube-proxy container.
The kube-proxy container is privilged, but we end up setting the
`/proc/sys` to Read-Only which causes failures when running kube-proxy
as a pod. This shows up as a failure when using sysctl to set various
network things.
Change-Id: Ic61c4c9c961843a4e064e783fab0b54350762a8d
This change adds comments to exported things and renames the tcp,
http, and exec probe interfaces to just be Prober within their
namespace.
Issue #68026
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
add missing LastTransitionTime of ContainerReady condition
**What this PR does / why we need it**:
add missing LastTransitionTime of ContainerReady condition
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
xref #64646
**Special notes for your reviewer**:
/cc freehan yujuhong
**Release note**:
```release-note
add missing LastTransitionTime of ContainerReady condition
```
We are removing dependencies on docker types where possible in the core
libraries. credentialprovider is generic to Docker and uses a public API
(the config file format) that must remain stable. Create an equivalent type
and use a type cast (which would error if we ever change the type) in the
dockershim. We already perform a transformation like this for CRI and so
we aren't changing much.
Automatic merge from submit-queue (batch tested with PRs 67950, 68195). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Consolidate componentconfig code standards
**What this PR does / why we need it**:
This PR fixes a bunch of very small misalignments in ComponentConfig packages:
- Add sane comments to all functions/variables in componentconfig `register.go` files
- Make the `register.go` files of componentconfig pkgs follow the same pattern and not differ from each other like they do today.
- Register the `openapi-gen` tag in all `doc.go` files where the pkg contains _external_ types.
- Add the `groupName` tag where missing
- Fix cases where `addKnownTypes` was registered twice in the `SchemeBuilder`
- Add `Readme` and `OWNERS` files to `Godeps` directories if missing.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @sttts @thockin
Automatic merge from submit-queue (batch tested with PRs 68119, 68191). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
fix token controller keyFunc bug
Currently, token manager use keyFunc like: `fmt.Sprintf("%q/%q/%#v", name, namespace, tr.Spec)`.
Since tr.Spec contains point fields, new token request would not reuse the cache at all.
This patch fix this, also adds unit test.
```release-note
NONE
```
Since tr.Spec contains point fields, new token request would not reuse
the cache at all. This patch fix this, also adds unit test.
Signed-off-by: Mike Danese <mikedanese@google.com>
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Kubelet: only sync iptables on linux
**What this PR does / why we need it**:
Iptables is only supported on Linux, kubelet should only sync NAT rules on Linux.
Without this PR, Kubelet on Windows would logs following errors on each `syncNetworkUtil()`:
```
kubelet.err.log:4692:E0711 22:03:42.103939 2872 kubelet_network.go:102] Failed to ensure that nat chain KUBE-MARK-DROP exists: error creating chain "KUBE-MARK-DROP": executable file
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#65713
**Special notes for your reviewer**:
**Release note**:
```release-note
Kubelet now only sync iptables on Linux.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Kubelet: only apply default hard evictions of nodefs.inodesFree on Linux
**What this PR does / why we need it**:
Kubelet sets default hard evictions of `nodefs.inodesFree ` for all platforms today. This will cause errors on Windows and a lot `no observation found for eviction signal nodefs.inodesFree` errors will be logs for kubelet.
```
kubelet.err.log:4961:W0711 22:21:12.378789 2872 helpers.go:808] eviction manager: no observation found for eviction signal nodefs.inodesFree
kubelet.err.log:4967:W0711 22:21:30.411371 2872 helpers.go:808] eviction manager: no observation found for eviction signal nodefs.inodesFree
kubelet.err.log:4974:W0711 22:21:48.446456 2872 helpers.go:808] eviction manager: no observation found for eviction signal nodefs.inodesFree
kubelet.err.log:4978:W0711 22:22:06.482441 2872 helpers.go:808] eviction manager: no observation found for eviction signal nodefs.inodesFree
```
This PR updates the default hard eviction value and only apply nodefs.inodesFree on Linux.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#66088
**Special notes for your reviewer**:
**Release note**:
```release-note
Kubelet only applies default hard evictions of nodefs.inodesFree on Linux
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Add kubelet stats for windows system container "pods"
**What this PR does / why we need it**:
This PR adds kubelet stats for windows system container "pods". Without this, kubelet will always logs error:
```
kubelet.err.log:4832:E0711 22:12:49.241358 2872 helpers.go:735] eviction manager: failed to construct signal: "allocatableMemory.available" error: system container "pods" not found
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#66087
**Special notes for your reviewer**:
/sig windows
/sig node
**Release note**:
```release-note
Add kubelet stats for windows system container "pods"
```
Automatic merge from submit-queue (batch tested with PRs 65251, 67255, 67224, 67297, 68105). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Promote mount propagation to GA
**What this PR does / why we need it**:
This PR promotes mount propagation to GA.
Website PR: https://github.com/kubernetes/website/pull/9823
**Release note**:
```release-note
Mount propagation has promoted to GA. The `MountPropagation` feature gate is deprecated and will be removed in 1.13.
```
Automatic merge from submit-queue (batch tested with PRs 64283, 67910, 67803, 68100). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
CSI Cluster Registry and Node Info CRDs
**What this PR does / why we need it**:
Introduces the new `CSIDriver` and `CSINodeInfo` API Object as proposed in https://github.com/kubernetes/community/pull/2514 and https://github.com/kubernetes/community/pull/2034
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/features/issues/594
**Special notes for your reviewer**:
Per the discussion in https://groups.google.com/d/msg/kubernetes-sig-storage-wg-csi/x5CchIP9qiI/D_TyOrn2CwAJ the API is being added to the staging directory of the `kubernetes/kubernetes` repo because the consumers will be attach/detach controller and possibly kubelet, but it will be installed as a CRD (because we want to move in the direction where the API server is Kubernetes agnostic, and all Kubernetes specific types are installed).
**Release note**:
```release-note
Introduce CSI Cluster Registration mechanism to ease CSI plugin discovery and allow CSI drivers to customize Kubernetes' interaction with them.
```
CC @jsafrane
Automatic merge from submit-queue (batch tested with PRs 64283, 67910, 67803, 68100). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Add a ProcMount option to the SecurityContext & AllowedProcMountTypes to PodSecurityPolicy
So there is a bit of a chicken and egg problem here in that the CRI runtimes will need to implement this for there to be any sort of e2e testing.
**What this PR does / why we need it**: This PR implements design proposal https://github.com/kubernetes/community/pull/1934. This adds a ProcMount option to the SecurityContext and AllowedProcMountTypes to PodSecurityPolicy
Relies on https://github.com/google/cadvisor/pull/1967
**Release note**:
```release-note
ProcMount added to SecurityContext and AllowedProcMounts added to PodSecurityPolicy to allow paths in the container's /proc to not be masked.
```
cc @Random-Liu @mrunalp
Automatic merge from submit-queue (batch tested with PRs 67349, 66056). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
wait until apiserver connection before starting kubelet tls bootstrap
I wonder if this helps with sometimes slow network programming
cc @mwielgus @awly
This vendor change was purely for the changes in docker to allow for
setting the Masked and Read-only paths.
See: moby/moby#36644
But because of the docker dep update it also needed cadvisor to be
updated and winterm due to changes in pkg/tlsconfig in docker
See: google/cadvisor#1967
Signed-off-by: Jess Frazelle <acidburn@microsoft.com>
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
should not event directly
**What this PR does / why we need it**:
should not event directly, using recordContainerEvent() to generate ref and deduplicate events instead.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 67739, 65222). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Honor --hostname-override, report compatible hostname addresses with cloud provider
xref #677147828e5d made cloud providers authoritative for the addresses reported on Node objects, so that the addresses used by the node (and requested as SANs in serving certs) could be verified via cloud provider metadata.
This had the effect of no longer reporting addresses of type Hostname for Node objects for some cloud providers. Cloud providers that have the instance hostname available in metadata should add a `type: Hostname` address to node status. This is being tracked in #67714
This PR does a couple other things to ease the transition to authoritative cloud providers:
* if `--hostname-override` is set on the kubelet, make the kubelet report that `Hostname` address. if it can't be verified via cloud-provider metadata (for cert approval, etc), the kubelet deployer is responsible for fixing the situation by adjusting the kubelet configuration (as they were in 1.11 and previously)
* if `--hostname-override` is not set, *and* the cloud provider didn't report a Hostname address, *and* the auto-detected hostname matches one of the addresses the cloud provider *did* report, make the kubelet report that as a Hostname address. That lets the addresses remain verifiable via cloud provider metadata, while still including a `Hostname` address whenever possible.
/sig node
/sig cloud-provider
/cc @mikedanese
fyi @hh
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 67694, 64973, 67902). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
SCTP support implementation for Kubernetes
**What this PR does / why we need it**: This PR adds SCTP support to Kubernetes, including Service, Endpoint, and NetworkPolicy.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#44485
**Special notes for your reviewer**:
**Release note**:
```release-note
SCTP is now supported as additional protocol (alpha) alongside TCP and UDP in Pod, Service, Endpoint, and NetworkPolicy.
```
The requested Service Protocol is checked against the supported protocols of GCE Internal LB. The supported protocols are TCP and UDP.
SCTP is not supported by OpenStack LBaaS. If SCTP is requested in a Service with type=LoadBalancer, the request is rejected. Comment style is also corrected.
SCTP is not allowed for LoadBalancer Service and for HostPort. Kube-proxy can be configured not to start listening on the host port for SCTP: see the new SCTPUserSpaceNode parameter
changed the vendor github.com/nokia/sctp to github.com/ishidawataru/sctp. I.e. from now on we use the upstream version.
netexec.go compilation fixed. Various test cases fixed
SCTP related conformance tests removed. Netexec's pod definition and Dockerfile are updated to expose the new SCTP port(8082)
SCTP related e2e test cases are removed as the e2e test systems do not support SCTP
sctp related firewall config is removed from cluster/gce/util.sh. Variable name sctp_addr is corrected to sctpAddr in pkg/proxy/ipvs/proxier.go
cluster/gce/util.sh is copied from master
This extends the Kubelet to create and periodically update leases in a
new kube-node-lease namespace. Based on [KEP-0009](https://github.com/kubernetes/community/blob/master/keps/sig-node/0009-node-heartbeat.md),
these leases can be used as a node health signal, and will allow us to
reduce the load caused by over-frequent node status reporting.
- add NodeLease feature gate
- add kube-node-lease system namespace for node leases
- add Kubelet option for lease duration
- add Kubelet-internal lease controller to create and update lease
- add e2e test for NodeLease feature
- modify node authorizer and node restriction admission controller
to allow Kubelets access to corresponding leases
Automatic merge from submit-queue (batch tested with PRs 65247, 63633, 67425). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix kubelet iptclient in ipv6 cluster
**What this PR does / why we need it**:
Kubelet uses "iptables" instead of "ip6tables" in an ipv6-only cluster. This causes failed traffic for type: LoadBalancer services (and probably a lot of other problems).
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#67398
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove rescheduler since scheduling DS pods by default scheduler is moving to beta
**What this PR does / why we need it**:
remove rescheduler since scheduling DS pods by default scheduler is moving to beta
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64725
**Special notes for your reviewer**:
**Release note**:
```release-note
Remove rescheduler since scheduling DS pods by default scheduler is moving to beta.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Reduce latency to node ready after CIDR is assigned.
This adds code to execute an immediate runtime and node status update when the Kubelet sees that it has a CIDR, which significantly decreases the latency to node ready.
```release-note
Speed up kubelet start time by executing an immediate runtime and node status update when the Kubelet sees that it has a CIDR.
```
Automatic merge from submit-queue (batch tested with PRs 67430, 67550). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
cpumanager: rollback state if updateContainerCPUSet failed
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63018
If `updateContainerCPUSet` failed, the container will start failed. We should rollback the state to avoid CPU leak.
**Special notes for your reviewer**:
**Release note**:
```release-note
cpumanager: rollback state if updateContainerCPUSet failed
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add labels to kubelet OWNERS files
**What this PR does / why we need it**:
This change makes it possible to automatically add the two labels: `area/kubelet` to PRs that touch the paths in question.
this already exists for kubeadm:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/OWNERS#L17-L19
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
refs https://github.com/kubernetes/community/issues/1808
**Special notes for your reviewer**:
none
**Release note**:
```release-note
NONE
```
/area kubelet
@kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add Labels to various OWNERS files
**What this PR does / why we need it**:
Will reduce the burden of manually adding labels. Information pulled
from:
https://github.com/kubernetes/community/blob/master/sigs.yaml
Change-Id: I17e661e37719f0bccf63e41347b628269cef7c8b
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 67661, 67497, 66523, 67622, 67632). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Reduce verbose logs of node addresses requesting
**What this PR does / why we need it**:
Kubelet build from the master branch is flushing node addresses requesting logs, which is too verbose:
```sh
Aug 16 10:09:40 node-1 kubelet[24217]: I0816 10:09:40.658479 24217 cloud_request_manager.go:97] Requesting node addresses from cloud provider for node "node-1"
Aug 16 10:09:40 node-1 kubelet[24217]: I0816 10:09:40.666114 24217 cloud_request_manager.go:116] Node addresses from cloud provider for node "node-1" collected
Aug 16 10:09:50 node-1 kubelet[24217]: I0816 10:09:50.666357 24217 cloud_request_manager.go:97] Requesting node addresses from cloud provider for node "node-1"
Aug 16 10:09:50 node-1 kubelet[24217]: I0816 10:09:50.674322 24217 cloud_request_manager.go:116] Node addresses from cloud provider for node "node-1" collected
Aug 16 10:10:01 node-1 kubelet[24217]: I0816 10:10:00.674644 24217 cloud_request_manager.go:97] Requesting node addresses from cloud provider for node "node-1"
Aug 16 10:10:01 node-1 kubelet[24217]: I0816 10:10:00.682794 24217 cloud_request_manager.go:116] Node addresses from cloud provider for node "node-1" collected
Aug 16 10:10:10 node-1 kubelet[24217]: I0816 10:10:10.683002 24217 cloud_request_manager.go:97] Requesting node addresses from cloud provider for node "node-1"
Aug 16 10:10:10 node-1 kubelet[24217]: I0816 10:10:10.689641 24217 cloud_request_manager.go:116] Node addresses from cloud provider for node "node-1" collected
Aug 16 10:10:20 node-1 kubelet[24217]: I0816 10:10:20.690006 24217 cloud_request_manager.go:97] Requesting node addresses from cloud provider for node "node-1"
Aug 16 10:10:20 node-1 kubelet[24217]: I0816 10:10:20.696545 24217 cloud_request_manager.go:116] Node addresses from cloud provider for node "node-1" collected
```
This PR sets them to level 5.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/cc @ingvagabund
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fail container start if its requested device plugin resource is unknown.
With the change, Kubelet device manager now checks whether it has cached option state for the requested device plugin resource to make sure the resource is in ready state when we start the container.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/67107
**Special notes for your reviewer**:
**Release note**:
```release-note
Fail container start if its requested device plugin resource hasn't registered after Kubelet restart.
```
Automatic merge from submit-queue (batch tested with PRs 66209, 67380, 67499, 67437, 67498). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
nits in manager.go
**What this PR does / why we need it**:
just found some nits in the manager.go
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 64445, 67459, 67434). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
dockershim/network: pass ipRange CNI capabilities
**What this PR does / why we need it**:
Updates the dynamic (capability args) passed from Kubernetes to the CNI plugin. This means CNI plugin authors can offer more features and / or reduce their dependency on the APIServer.
Currently, we only pass the `portMappings` capability. CNI now supports `bandwidth` for bandwidth limiting and `ipRanges` for preferred IP blocks. This PR adds support for these two new capabilities.
Bandwidth limits are provided - as implemented in kubenet - via the pod annotations `kubernetes.io/ingress-bandwidth` and `kubernetes.io/egress-bandwidth`.
The ipRanges field simply passes the PodCIDR. This does mean that we need to change the NodeReady algorithm. Previously, we would only set NodeNotReady on missing PodCIDR when using Kubenet. Now, if the CNI configuration includes the `ipRanges` capability, we need to do the same.
**Which issue(s) this PR fixes**:
Fixes#64393
**Release note**:
```release-note
The dockershim now sets the "bandwidth" and "ipRanges" CNI capabilities (dynamic parameters). Plugin authors and administrators can now take advantage of this by updating their CNI configuration file. For more information, see the [CNI docs](https://github.com/containernetworking/cni/blob/master/CONVENTIONS.md#dynamic-plugin-specific-fields-capabilities--runtime-configuration)
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Other components support set log level dynamically
**What this PR does / why we need it**:
#63777 introduced a way to set glog.logging.verbosity dynamically.
We should enable this for all other components, which is specially useful in debugging.
**Release note**:
```release-note
Expose `/debug/flags/v` to allow kubelet dynamically set glog logging level. If want to change glog level to 3, you only have to send a PUT request like `curl -X PUT http://127.0.0.1:8080/debug/flags/v -d "3"`.
```
Automatic merge from submit-queue (batch tested with PRs 65561, 67109, 67450, 67456, 67402). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
error text refers to wrong stream type
**What this PR does / why we need it**:
clarify error text
**Special notes for your reviewer**:
I think this was a copy and paste error.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65561, 67109, 67450, 67456, 67402). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Compared preemption by priority in Kubelet
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#65372
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Attacher/Detacher refactor for local storage
Proposal link: https://github.com/kubernetes/community/pull/2438
**What this PR does / why we need it**:
Attacher/Detacher refactor for the plugins which just need to mount device, but do not need to attach, such as local storage plugin.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
```release-note
Attacher/Detacher refactor for local storage
```
/sig storage
/kind feature
Automatic merge from submit-queue (batch tested with PRs 66177, 66185, 67136, 67157, 65065). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: reduce logging for backoff situations
xref https://bugzilla.redhat.com/show_bug.cgi?id=1555057#c6
Pods that are in `ImagePullBackOff` or `CrashLoopBackOff` currently generate a lot of logging at the `glog.Info()` level. This PR moves some of that logging to `V(3)` and avoids logging in situations where the `SyncPod` only fails because pod are in a BackOff error condition.
@derekwaynecarr @liggitt
Automatic merge from submit-queue (batch tested with PRs 66491, 66587, 66856, 66657, 66923). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert #63905: Setup dns servers and search domains for Windows Pods
**What this PR does / why we need it**:
From https://github.com/kubernetes/kubernetes/pull/63905#issuecomment-396709775:
> I don't think this change does anything on Windows. On windows, the network endpoint configuration is taken care of completely by CNI. If you would like to pass on the custom dns polices from the pod spec, it should be dynamically going to the cni configuration that gets passed to CNI. From there, it would be passed down to platform and would be taken care of appropriately by HNS.
> etc\resolve.conf is very specific to linux and that should remain linux speicfic implementation. We should be trying to move away from platform specific code in Kubelet.
Docker is not managing the networking here for windows. So it doens't really care about any network settings. So passing it to docker shim's hostconfig also doens;t make sense here.
DNS for Windows containers will be set by CNI plugins. And this change also introduced two endpoints for sandbox container. So this PR reverts #63905 .
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
The PR should also be cherry-picked to release-1.11.
Also, https://github.com/kubernetes/kubernetes/issues/66588 is opened to track the process of pushing this to CNI.
**Release note**:
```release-note
Revert #63905: Setup dns servers and search domains for Windows Pods. DNS for Windows containers will be set by CNI plugins.
```
/sig windows
/sig node
/kind bug
This allows kubelets to stop the necessary work when the context has
been canceled (e.g., connection closed), and not leaking a goroutine
and inotify watcher waiting indefinitely.
Automatic merge from submit-queue (batch tested with PRs 67177, 53042). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding unit tests to methods of pod's format
What this PR does / why we need it:
Add unit test cases, thank you!
Automatic merge from submit-queue (batch tested with PRs 66512, 66946, 66083). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet/cm/cpumanager: Fix unused variable "skipIfPermissionsError"
The variable "skipIfPermissionsError" is not needed even when
permission error happened.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
clean up useless variables in deviceplugin/types.go
**What this PR does / why we need it**:
some variables is useless for reasons, I think we need a clean up.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```NONE
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
error out empty hostname
**What this PR does / why we need it**:
For linux, the hostname is read from file `/proc/sys/kernel/hostname` directly, which can be overwritten with whitespaces.
Should error out such invalid hostnames.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixeskubernetes/kubeadm#835
**Special notes for your reviewer**:
/cc luxas timothysc
**Release note**:
```release-note
nodes: improve handling of erroneous host names
```
Automatic merge from submit-queue (batch tested with PRs 62901, 66562, 66938, 66927, 66926). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: volumemanager: poll immediate when waiting for volume attachment
Currently, `WaitForAttachAndMount()` introduces a 300ms minimum delay by using `wait.Poll()` rather than `wait.PollImmediate()`. This wait constitutes >99% of the total processing time for `syncPod()`. Changing this reduced `syncPod()` processing time for a simple busybox pod with one emptyDir volume from 302ms to 2ms.
@derekwaynecarr @pmorie @smarterclayton @jsafrane
/sig node
/release-note-none
Automatic merge from submit-queue (batch tested with PRs 66190, 66871, 66617, 66293, 66891). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Do not set cgroup parent when --cgroups-per-qos is disabled
When --cgroups-per-qos=false (default is true), kubelet sets pod
container management to podContainerManagerNoop implementation and
GetPodContainerName() returns '/' as cgroup parent (default cgroup root).
(1) In case of 'systemd' cgroup driver, '/' is invalid parent as
docker daemon expects '.slice' suffix and throws this error:
'cgroup-parent for systemd cgroup should be a valid slice named as \"xxx.slice\"'
(5fc12449d8/daemon/daemon_unix.go (L618))
'/' corresponds to '-.slice' (root slice) in systemd but I don't think
we want to assign root slice instead of runtime specific default value.
In case of docker runtime, this will be 'system.slice'
(e2593239d9/daemon/oci_linux.go (L698))
(2) In case of 'cgroupfs' cgroup driver, '/' is valid parent but I don't
think we want to assign root instead of runtime specific default value.
In case of docker runtime, this will be '/docker'
(e2593239d9/daemon/oci_linux.go (L695))
Current fix will not set the cgroup parent when --cgroups-per-qos is disabled.
```release-note
Fix pod launch by kubelet when --cgroups-per-qos=false and --cgroup-driver="systemd"
```
Automatic merge from submit-queue (batch tested with PRs 66190, 66871, 66617, 66293, 66891). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix nil pointer dereference in node_container_manager#enforceExisting
**What this PR does / why we need it**:
fix nil pointer dereference in node_container_manager#enforceExisting
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#66189
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
kubelet: fix nil pointer dereference while enforce-node-allocatable flag is not config properly
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Skip checking when failSwapOn=false
**What this PR does / why we need it**:
Skip checking when failSwapOn=false
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
NONE
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66850, 66902, 66779, 66864, 66912). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add methods to apimachinery to easy unit testing
When unit testing, you often want a selective scheme and codec factory. Rather than writing the vars and the init function and the error handling, you can simply do
`scheme, codecs := testing.SchemeForInstallOrDie(install.Install)`
@kubernetes/sig-api-machinery-misc
@sttts
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65730, 66615, 66684, 66519, 66510). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: add image-gc low/high validation check
Currently, there is no protection against setting the high watermark <= the low watermark for image GC
This PR adds a validation rule for that.
@smarterclayton
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
move feature gate checks inside IsCriticalPod
Currently `IsCriticalPod()` calls throughout the code are protected by `utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation)`.
However, with Pod Priority, this gate could be disabled which skips the priority check inside IsCriticalPod().
This PR moves the feature gate checking inside `IsCriticalPod()` and handles both situations properly.
@aveshagarwal @ravisantoshgudimetla @derekwaynecarr
/sig node
/sig scheduling
/king bug
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
delete unused events
**What this PR does / why we need it**:
events (HostNetworkNotSupported, UndefinedShaper) is unused since #47058
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66623, 66718). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
cpumanager: validate topology in static policy
**What this PR does / why we need it**:
This patch adds a check for the static policy state validation. The check fails if the CPU topology obtained from cadvisor doesn't match with the current topology in the state file.
If the CPU topology has changed in a node, cpumanager static policy might try to assign non-present cores to containers.
For example in my test case, static policy had the default CPU set of `0-1,4-7`. Then kubelet was shut down and CPU 7 was offlined. After restarting the kubelet, CPU manager tries to assign the non-existent CPU 7 to containers which don't have exclusive allocations assigned to them:
Error response from daemon: Requested CPUs are not available - requested 0-1,4-7, available: 0-6)
This breaks the exclusivity, since the CPUs from the shared pool don't get assigned to non-exclusive containers, meaning that they can execute on the exclusive CPUs.
**Release note**:
```release-note
Added CPU Manager state validation in case of changed CPU topology.
```
Test the cases where the number of CPUs available in the system is
smaller or larger than the number of CPUs known in the state, which
should lead to a panic. This covers both CPU onlining and offlining. The
case where the number of CPUs matches is already covered by the
"non-corrupted state" test.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move the` k8s.io/kubernetes/pkg/util/pointer` package to` k8s.io/utils/pointer`
**What this PR does / why we need it**:
Move `k8s.io/kubernetes/pkg/util/pointer` to `shared utils` directory, so that we can use it easily.
Close#66010 accidentally, and can't reopen it, so the same as #66010
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
This patch adds a check for the static policy state validation. The
check fails if the CPU topology obtained from cadvisor doesn't match
with the current topology in the state file.
If the CPU topology has changed in a node, cpu manager static policy
might try to assign non-present cores to containers.
For example in my test case, static policy had the default CPU set of
0-1,4-7. Then kubelet was shut down and CPU 7 was offlined. After
restarting the kubelet, CPU manager tries to assign the non-existent CPU
7 to containers which don't have exclusive allocations assigned to them:
Error response from daemon: Requested CPUs are not available - requested 0-1,4-7, available: 0-6)
This breaks the exclusivity, since the CPUs from the shared pool don't
get assigned to non-exclusive containers, meaning that they can execute
on the exclusive CPUs.
the caching layer on endpoint is redundant.
Here are the 3 related objects in picture:
devicemanager <-> endpoint <-> plugin
Plugin is the source of truth for devices
and device health status.
devicemanager maintain healthyDevices,
unhealthyDevices, allocatedDevices based on updates
from plugin.
So there is no point for endpoint caching devices,
this patch is removing this caching layer on endpoint,
Also removing the Manager.Devices() since i didn't
find any caller of this other than test, i am adding a
notification channel to facilitate testing,
If we need to get all devices from manager in future,
it just need to return healthyDevices + unhealthyDevices,
we don't have to call endpoint after all.
This patch makes code more readable, data model been simplified.
Automatic merge from submit-queue (batch tested with PRs 66593, 66727, 66558). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove the outdate comments in tryRegisterWithAPIServer
**What this PR does / why we need it**:
some judgement about ExternalID removed in #61877, so remove the outdate comments in tryRegisterWithAPIServer
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 58755, 66414). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use probe based plugin watcher mechanism in Device Manager
**What this PR does / why we need it**:
Uses this probe based utility in the device plugin manager.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56944
**Notes For Reviewers**:
Changes are backward compatible and existing device plugins will continue to work. At the same time, any new plugins that has required support for probing model (Identity service implementation), will also work.
**Release note**
```release-note
Add support kubelet plugin watcher in device manager.
```
/sig node
/area hw-accelerators
/cc /cc @jiayingz @RenaudWasTaken @vishh @ScorpioCPH @sjenning @derekwaynecarr @jeremyeder @lichuqiang @tengqm @saad-ali @chakri-nelluri @ConnorDoyle
Automatic merge from submit-queue (batch tested with PRs 66665, 66707, 66596). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix kubelet npe panic on device plugin return zero container
Signed-off-by: bingshen.wbs <bingshen.wbs@alibaba-inc.com>
**What this PR does / why we need it**:
Fix kubelet panic when device plugin return zero containers. Panic logs like follows:
```
Jul 17 12:50:24 iZwz9bqgzuo4i8qu435zk8Z kubelet[25815]: /workspace/anago-v1.10.4-beta.0.68+5ca598b4ba5abb/src/k8s.io/kubernetes/_output/dockerized/go/src/
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
Jul 17 12:50:24 iZwz9bqgzuo4i8qu435zk8Z kubelet[25815]: /workspace/anago-v1.10.4-beta.0.68+5ca598b4ba5abb/src/k8s.io/kubernetes/_output/dockerized/go/src/
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
Jul 17 12:50:24 iZwz9bqgzuo4i8qu435zk8Z kubelet[25815]: /workspace/anago-v1.10.4-beta.0.68+5ca598b4ba5abb/src/k8s.io/kubernetes/_output/dockerized/go/src/
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
Jul 17 12:50:24 iZwz9bqgzuo4i8qu435zk8Z kubelet[25815]: E0717 12:50:24.726856 25815 runtime.go:66] Observed a panic: "index out of range" (runtime error
: index out of range)
```
**Release note**:
```
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Taint node when initializing node.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63897
**Release note**:
```release-note
If `TaintNodesByCondition` enabled, taint node with `TaintNodeUnschedulable` when
initializing node to avoid race condition.
```
The --docker-disable-shared-pid flag has been deprecated since 1.10 and
has been superceded by ShareProcessNamespace in the pod API, which is
scheduled for beta in 1.12.
Automatic merge from submit-queue (batch tested with PRs 66152, 66406, 66218, 66278, 65660). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Refactor kubelet node status setters, add test coverage
This internal refactor moves the node status setters to a new package, explicitly injects dependencies to facilitate unit testing, and adds individual unit tests for the setters.
I gave each setter a distinct commit to facilitate review.
Non-goals:
- I intentionally excluded the class of setters that return a "modified" boolean, as I want to think more carefully about how to cleanly handle the behavior, and this PR is already rather large.
- I would like to clean up the status update control loops as well, but that belongs in a separate PR.
```release-note
NONE
```
When --cgroups-per-qos=false (default is true), kubelet sets pod
container management to podContainerManagerNoop implementation and
GetPodContainerName() returns '/' as cgroup parent (default cgroup root).
(1) In case of 'systemd' cgroup driver, '/' is invalid parent as
docker daemon expects '.slice' suffix and throws this error:
'cgroup-parent for systemd cgroup should be a valid slice named as \"xxx.slice\"'
(5fc12449d8/daemon/daemon_unix.go (L618))
'/' corresponds to '-.slice' (root slice) in systemd but I don't think
we want to assign root slice instead of runtime specific default value.
In case of docker runtime, this will be 'system.slice'
(e2593239d9/daemon/oci_linux.go (L698))
(2) In case of 'cgroupfs' cgroup driver, '/' is valid parent but I don't
think we want to assign root instead of runtime specific default value.
In case of docker runtime, this will be '/docker'
(e2593239d9/daemon/oci_linux.go (L695))
Current fix will not set the cgroup parent when --cgroups-per-qos is disabled.
Automatic merge from submit-queue (batch tested with PRs 65771, 65849). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add a new conversion path to replace GenericConversionFunc
reflect.Call is very expensive. We currently use a switch block as part of AddGenericConversionFunc to avoid the bulk of top level a->b conversion for our primary types which is hand-written. Instead of having these be handwritten, we should generate them.
The pattern for generating them looks like:
```
scheme.AddConversionFunc(&v1.Type{}, &internal.Type{}, func(a, b interface{}, scope conversion.Scope) error {
return Convert_v1_Type_to_internal_Type(a.(*v1.Type), b.(*internal.Type), scope)
})
```
which matches AddDefaultObjectFunc (which proved out the approach last year). The
conversion machinery should then do a simple map lookup based on the incoming types and invoke the function. Like defaulting, it's up to the caller to match the types to arguments, which we do by generating this code. This bypasses reflect.Call and in the future allows Golang mid-stack inlining to optimize this code.
As part of this change I strengthened registration of custom functions to be generated instead of hand registered, and also strengthened error checking of the generator when it sees a manual conversion to error out. Since custom functions are automatically used by the generator, we don't really have a case for not registering the functions.
Once this is fully tested out, we can remove the reflection based path and the old registration methods, and all conversion will work from point to point methods (whether generated or custom).
Much of the need for the reflection path has been removed by changes to generation (to omit fields) and changes to Go (to make assigning equivalent structs easy).
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66175, 66324, 65828, 65901, 66350). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Start cloudResourceSyncsManager before getNodeAnyWay (initializeModules) to avoid kubelet getting stuck in retrieving node addresses from a cloudprovider.
**What this PR does / why we need it**:
This PR starts cloudResourceSyncsManager before getNodeAnyWay (initializeModules) otherwise kubelet gets stuck in setNodeAddress->kl.cloudResourceSyncManager.NodeAddresses() (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_node_status.go#L470) forever retrieving node addresses from a cloud provider, and due to this cloudResourceSyncsManager will not be started at all.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
@ingvagabund @derekwaynecarr @sjenning @kubernetes/sig-node-bugs
Instead of hiding these behind a helper, we just register them in a
uniform way. We are careful to keep the call-order of the setters the
same, though we can consider re-ordering in a future PR to achieve
fewer appends.
Automatic merge from submit-queue (batch tested with PRs 65052, 65594). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Derive kubelet serving certificate CSR template from node status addresses
xref https://github.com/kubernetes/features/issues/267fixes#55633
Builds on https://github.com/kubernetes/kubernetes/pull/65587
* Makes the cloud provider authoritative when recording node status addresses
* Makes the node status addresses authoritative for the kube-apiserver determining how to speak to a kubelet (stops paying attention to the hostname label when determining how to reach a kubelet, which was only done to support kubelets < 1.5)
* Updates kubelet certificate rotation to be driven from node status
* Avoids needing to compute node addresses a second time, and differently, in order to request serving certificates.
* Allows the kubelet to react to changes in its status addresses by updating its serving certificate
* Allows the kubelet to be driven by external cloud providers recording node addresses on the node status
test procedure:
```sh
# setup
export FEATURE_GATES=RotateKubeletServerCertificate=true
export KUBELET_FLAGS="--rotate-server-certificates=true --cloud-provider=external"
# cleanup from previous runs
sudo rm -fr /var/lib/kubelet/pki/
# startup
hack/local-up-cluster.sh
# wait for a node to register, verify it didn't set addresses
kubectl get nodes
kubectl get node/127.0.0.1 -o jsonpath={.status.addresses}
# verify the kubelet server isn't available, and that it didn't populate a serving certificate
curl --cacert _output/certs/server-ca.crt -v https://localhost:10250/pods
ls -la /var/lib/kubelet/pki
# set an address on the node
curl -X PATCH http://localhost:8080/api/v1/nodes/127.0.0.1/status \
-H "Content-Type: application/merge-patch+json" \
--data '{"status":{"addresses":[{"type":"Hostname","address":"localhost"}]}}'
# verify a csr was submitted with the right SAN, and approve it
kubectl describe csr
kubectl certificate approve csr-...
# verify the kubelet connection uses a cert that is properly signed and valid for the specified hostname, but NOT the IP
curl --cacert _output/certs/server-ca.crt -v https://localhost:10250/pods
curl --cacert _output/certs/server-ca.crt -v https://127.0.0.1:10250/pods
ls -la /var/lib/kubelet/pki
# set an hostname and IP address on the node
curl -X PATCH http://localhost:8080/api/v1/nodes/127.0.0.1/status \
-H "Content-Type: application/merge-patch+json" \
--data '{"status":{"addresses":[{"type":"Hostname","address":"localhost"},{"type":"InternalIP","address":"127.0.0.1"}]}}'
# verify a csr was submitted with the right SAN, and approve it
kubectl describe csr
kubectl certificate approve csr-...
# verify the kubelet connection uses a cert that is properly signed and valid for the specified hostname AND IP
curl --cacert _output/certs/server-ca.crt -v https://localhost:10250/pods
curl --cacert _output/certs/server-ca.crt -v https://127.0.0.1:10250/pods
ls -la /var/lib/kubelet/pki
```
```release-note
* kubelets that specify `--cloud-provider` now only report addresses in Node status as determined by the cloud provider
* kubelet serving certificate rotation now reacts to changes in reported node addresses, and will request certificates for addresses set by an external cloud provider
```
Automatic merge from submit-queue (batch tested with PRs 66076, 65792, 65649). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubernetes: fix printf format errors
These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.
```release-note
NONE
```
These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.
Lubomir I. Ivanov <neolit123@gmail.com>
applied ammend for:
pkg/cloudprovider/provivers/vsphere/nodemanager.go
Automatic merge from submit-queue (batch tested with PRs 65987, 65962). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix pod worker deadlock.
Preemption will stuck forever if `killPodNow` timeout once. The sequence is:
* `killPodNow` create the response channel (size 0) and send it to pod worker.
* `killPodNow` timeout and return.
* Pod worker finishes killing the pod, and tries to send back response via the channel.
However, because the channel size is 0, and the receiver has exited, the pod worker will stuck forever.
In @jingxu97's case, this causes a critical system pod (apiserver) unable to come up, because the csi pod can't be preempted.
I checked the history, and the bug was introduced 2 years ago 6fefb428c1.
I think we should at least cherrypick this to `1.11` since preemption is beta and enabled by default in 1.11.
@kubernetes/sig-node-bugs @derekwaynecarr @dashpole @yujuhong
Signed-off-by: Lantao Liu <lantaol@google.com>
```release-note
none
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Store the latest cloud provider node addresses
**What this PR does / why we need it**:
Buffer the recently retrieved node address so they can be used as soon as the next node status update is run.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#65814
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding traffic shaping support for CNI network driver
**What this PR does / why we need it**:
Adding traffic shaping support for CNI network driver - it's also a sub-task of kubenet deprecation work.
Design document is available here: https://github.com/kubernetes/community/pull/1893
**Which issue(s) this PR fixes**:
Fixes #
**Special notes for your reviewer**:
/cc @freehan @jingax10 @caseydavenport @dcbw
/sig network
/sig node
**Release note**:
```release-note
Support traffic shaping for CNI network driver
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove --cadvisor-port - has been deprecated since v1.10
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56523
**Special notes for your reviewer**:
- Deprecated in https://github.com/kubernetes/kubernetes/pull/59827 (v1.10)
- Disabled in https://github.com/kubernetes/kubernetes/pull/63881 (v1.11)
**Release note**:
```release-note
[action required] The formerly publicly-available cAdvisor web UI that the kubelet started using `--cadvisor-port` is now entirely removed in 1.12. The recommended way to run cAdvisor if you still need it, is via a DaemonSet.
```
Automatic merge from submit-queue (batch tested with PRs 65582, 65480, 65310, 65644, 65645). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix test failure of truncated time
**What this PR does / why we need it**:
The test of `TestFsStoreAssignedModified` in `pkg/kubelet/kubeletconfig/checkpoint/store` fails in my environment like below.
```
$ make test WHAT=./pkg/kubelet/kubeletconfig/checkpoint/store/
Running tests for APIVersion: v1,admissionregistration.k8s.io/v1alpha1,admissionregistration.k8s.io/v1beta1,admission.k8s.io/v1beta1,apps/v1beta1,apps/v1beta2,apps/v1,authentication.k8s.io/v1,authentication.k8s.io/v1beta1,authorization.k8s.io/v1,authorization.k8s.io/v1beta1,autoscaling/v1,autoscaling/v2beta1,batch/v1,batch/v1beta1,batch/v2alpha1,certificates.k8s.io/v1beta1,coordination.k8s.io/v1beta1,extensions/v1beta1,events.k8s.io/v1beta1,imagepolicy.k8s.io/v1alpha1,networking.k8s.io/v1,policy/v1beta1,rbac.authorization.k8s.io/v1,rbac.authorization.k8s.io/v1beta1,rbac.authorization.k8s.io/v1alpha1,scheduling.k8s.io/v1alpha1,scheduling.k8s.io/v1beta1,settings.k8s.io/v1alpha1,storage.k8s.io/v1beta1,storage.k8s.io/v1,storage.k8s.io/v1alpha1,
+++ [0628 22:53:39] Running tests without code coverage
--- FAIL: TestFsStoreAssignedModified (0.00s)
fsstore_test.go:316: expect "2018-06-28T22:53:43+09:00" but got "2018-06-28T22:53:43+09:00"
FAIL
FAIL k8s.io/kubernetes/pkg/kubelet/kubeletconfig/checkpoint/store 0.236s
make: *** [test] Error 1
```
My environment is
OS: macOS Sierra Version 10.12.6
File System: Journaled HFS+
The error message confused me because the comparing times looked the same in the error log. If we know certain systems truncate times, I think we can just compare less precise times to avoid confusions in tests.
**Special notes for your reviewer**:
N/A
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60150, 65467, 65487, 65595, 65374). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: feature gate LSI capacity calculation
Currently if `cm.cadvisorInterface.RootFsInfo()` fails, the whole kubelet bails. If `/var/lib/kubelet` is on a tmpfs or bindmount, this can happen (this is the case for some of our CI envs https://github.com/openshift/origin/issues/19948).
We would be able to workaround this, in the short term, by disabling the LSI feature gate if the capacity calculate was protected by the gate, but currently it isn't.
This PR adds the gate check around setting the ephemeral storage capacity.
@liggitt @derekwaynecarr @dashpole
It might be a different discussion about whether or not this should be fatal. If it isn't fatal, seems that it would just prevent pods that had a ephemeral storage request from being scheduled.
/sig node
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert "certs: only append locally discovered addresses when we got none from the cloudprovider"
This reverts commit 7354bbe5ac.
https://github.com/kubernetes/kubernetes/pull/61869 caused a mismatch between the requested CSR and the addresses in node status.
Instead of computing addresses in two places, the cert manager should derive its CSR request from the addresses in node status. This would enable the kubelet to react to address changes, as well as be driven by an external cloud provider.
/cc @mikedanese
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add support for plugin directory hierarchy
**What this PR does / why we need it**:
Add hierarchy support for plugin directory, it traverses and
watch plugin directory and its sub directory recursively.
plugin socket file only need be unique within one directory,
```
plugin socket directory
|
---->sub directory 1
| |
| -----> socket1, socket2 ...
----->sub directory 2
|
------> socket1, socket2 ...
```
the design itself allow sub directory be anything,
but in practical, each plugin type could just use one sub directory.
**Which issue(s) this PR fixes**:
Fixes#64003
**Special notes for your reviewer**:
twos bonus changes added as below
1) propose to let pluginWatcher bookkeeping registered plugins,
to make sure plugin name is unique within one plugin type.
arguably, we could let each handler do the same work, but it requires
every handler repeat the same thing.
2) extract example handler out from test, it is easier to read the code with the
seperation.
**Release note**:
```release-note
N/A
```
/sig node
/cc @vikaschoudhary16 @jiayingz @RenaudWasTaken @vishh @derekwaynecarr @saad-ali @vladimirvivien @dchen1107 @yujuhong @tallclair @Random-Liu @anfernee @akutz
Automatic merge from submit-queue (batch tested with PRs 65453, 65523, 65513, 65560). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Cleanup verbose cAdvisor mocking in Kubelet unit tests
These tests had a lot of duplicate code to set up the cAdvisor mock, but weren't really depending on the mock functionality. By moving the tests to use the fake cAdvisor, most of the setup can be cleaned up.
/kind cleanup
/sig node
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59214, 65330). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Migrate cpumanager to use checkpointing manager
**What this PR does / why we need it**:
This PR migrates `cpumanager` to use new kubelet level node checkpointing feature (#56040) to decrease code redundancy and improve consistency.
**Which issue(s) this PR fixes**:
Fixes#58339
**Notes**:
At point of submitting PR the most straightforward approach was used - `state_checkpoint` implementation of `State` interface was added. However, with checkpointing implementation there might be no point to keep `State` interface and just use single implementation with checkpoint backend and in case of different backend than filestore needed just supply `cpumanager` with custom `CheckpointManager` implementation.
/kind feature
/sig node
cc @flyingcougar @ConnorDoyle
it traverses and watch plugin directory and its sub directory recursively,
plugin socket file only need be unique within one directory,
- plugin socket directory
- |
- ---->sub directory 1
- | |
- | -----> socket1, socket2 ...
- ----->sub directory 2
- |
- ------> socket1, socket2 ...
the design itself allow sub directory be anything,
but in practical, each plugin type could just use one sub directory.
four bonus changes added as below
1. extract example handler out from test, it is easier to read the code
with the seperation.
2. there are two variables here: "Watcher" and "watcher".
"Watcher" is the plugin watcher, and "watcher" is the fsnotify watcher.
so rename the "watcher" to "fsWatcher" to make code easier to
understand.
3. change RegisterCallbackFn() return value order, it is
conventional to return error last, after this change,
the pkg/volume/csi is compliance with golint, so remove it
from hack/.golint_failures
4. refactor errors handling at invokeRegistrationCallbackAtHandler()
to make error message more clear.
Automatic merge from submit-queue (batch tested with PRs 61330, 64793, 64675, 65059, 65368). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixes data races for pkg/kubelet/config/file_linux_test.go
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64655
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65290, 65326, 65289, 65334, 64860). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
checkLimitsForResolvConf for the pod create and update events instead of checking period
**What this PR does / why we need it**:
- Check for the same at pod create and update events instead of checking continuously for every 30 seconds.
- Increase the logging level to 4 or higher since the event is not catastrophic to cluster health .
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64849
**Special notes for your reviewer**:
@ravisantoshgudimetla
**Release note**:
```release-note
checkLimitsForResolvConf for the pod create and update events instead of checking period
```
Automatic merge from submit-queue (batch tested with PRs 65187, 65206, 65223, 64752, 65238). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Kubelet watches necessary secrets/configmaps instead of periodic polling
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
move oldNodeUnschedulable pkg var to kubelet struct
**What this PR does / why we need it**:
move oldNodeUnschedulable pkg var to kubelet struct
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 58690, 64773, 64880, 64915, 64831). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
ignore not found file error when watching manifests
**What this PR does / why we need it**:
An alternative of #63910.
When using vim to create a new file in manifest folder, a temporary file, with an arbitrary number (like 4913) as its name, will be created to check if a directory is writable and see the resulting ACL.
These temporary files will be deleted later, which should by ignored when watching the manifest folder.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#55928, #59009, #48219
**Special notes for your reviewer**:
/cc dims luxas yujuhong liggitt tallclair
**Release note**:
```release-note
ignore not found file error when watching manifests
```
Automatic merge from submit-queue (batch tested with PRs 64688, 64451, 64504, 64506, 56358). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
cleanup some dead kubelet code
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65032, 63471, 64104, 64672, 64427). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
pkg: kubelet: remote: increase grpc client default size to 16MiB
**What this PR does / why we need it**:
Increase the gRPC max message size to 16MB in the remote container runtime. I've seen sizes over 8MB in clusters with big (256GB RAM) nodes.
**Release note**:
```release-note
Increase the gRPC max message size to 16MB in the remote container runtime.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
bind alpha feature network plugin flags correctly
**What this PR does / why we need it**:
When working #63542, I found the flags, like `--cni-conf-dir` and `cni-bin-dir`, were not correctly bound.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
/cc kubernetes/sig-node-pr-reviews
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 64142, 64426, 62910, 63942, 64548). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Clean up fake mounters.
**What this PR does / why we need it**:
Fixes https://github.com/kubernetes/kubernetes/issues/61502
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
list of fake mounters:
- (keep) pkg/util/mount.FakeMounter
- (removed) pkg/kubelet/cm.fakeMountInterface:
- (inherit from mount.FakeMounter) pkg/util/mount.fakeMounter
- (inherit from mount.FakeMounter) pkg/util/removeall.fakeMounter
- (removed) pkg/volume/host_path.fakeFileTypeChecker
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65230, 57355, 59174, 63698, 63659). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
TODO has already been implemented
**What this PR does / why we need it**:
TODO has already been implemented, remove the TODO tag.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```NONE
If we create a new key on each CSR, if CSR fails the next attempt will
create a new one instead of reusing previous CSR.
If approver/signer don't handle CSRs as quickly as new nodes come up,
they can pile up and approver would keep handling old abandoned CSRs and
Nodes would keep timing out on startup.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
dockershim/network: add dcbw to OWNERS as an approver
I've been involved with the kubelet network code, including most
of this code, for a couple years and contributed a good number
of PRs for these directories. I've also been a SIG Network
co-lead for couple years.
I've also been on the CNI maintainers team for a couple years.
```release-note
NONE
```
@freehan @thockin @kubernetes/sig-network-pr-reviews
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert #64189: Fix Windows CNI for the sandbox case
**What this PR does / why we need it**:
This reverts PR #64189, which breaks DNS for Windows containers.
Refer https://github.com/kubernetes/kubernetes/pull/64189#issuecomment-395248704
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64861
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
cc @madhanrm @PatrickLang @alinbalutoiu @dineshgovindasamy
Automatic merge from submit-queue (batch tested with PRs 64503, 64903, 64643, 64987). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use unix.EpollWait to determine when memcg events are available to be Read
**What this PR does / why we need it**:
This fixes a file descriptor leak introduced in https://github.com/kubernetes/kubernetes/pull/60531 when the `--experimental-kernel-memcg-notification` kubelet flag is enabled. The root of the issue is that `unix.Read` blocks indefinitely when reading from an event file descriptor and there is nothing to read. Since we refresh the memcg notifications, these reads accumulate until the memcg threshold is crossed, at which time all reads complete. However, if the node never comes under memory pressure, the node can run out of file descriptors.
This PR changes the eviction manager to use `unix.EpollWait` to wait, with a 10 second timeout, for events to be available on the eventfd. We only read from the eventfd when there is an event available to be read, preventing an accumulation of `unix.Read` threads, and allowing the event file descriptors to be reclaimed by the kernel.
This PR also breaks the creation, and updating of the memcg threshold into separate portions, and performs creation before starting the periodic synchronize calls. It also moves the logic of configuring memory thresholds into memory_threshold_notifier into a separate file.
This also reverts https://github.com/kubernetes/kubernetes/pull/64582, as the underlying leak that caused us to disable it for testing is fixed here.
Fixes#62808
**Release note**:
```release-note
NONE
```
/sig node
/kind bug
/priority critical-urgent
I've been involved with the kubelet network code, including most
of this code, for a couple years and contributed a good number
of PRs for these directories. I've also been a SIG Network
co-lead for couple years.
I've also been on the CNI maintainers team for a couple years.
Automatic merge from submit-queue (batch tested with PRs 63905, 64855). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Setup dns servers and search domains for Windows Pods
**What this PR does / why we need it**:
Kubelet is depending on docker container's ResolvConfPath (e.g. /var/lib/docker/containers/439efe31d70fc17485fb6810730679404bb5a6d721b10035c3784157966c7e17/resolv.conf) to setup dns servers and search domains. While this is ok for Linux containers, ResolvConfPath is always an empty string for windows containers. So that the DNS setting for windows containers is always not set.
This PR setups DNS for Windows sandboxes. In this way, Windows Pods could also use kubernetes dns policies.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#61579
**Special notes for your reviewer**:
Requires Docker EE version >= 17.10.0.
**Release note**:
```release-note
Setup dns servers and search domains for Windows Pods in dockershim. Docker EE version >= 17.10.0 is required for propagating DNS to containers.
```
/cc @PatrickLang @taylorb-microsoft @michmike @JiangtianLi
Automatic merge from submit-queue (batch tested with PRs 63717, 64646, 64792, 64784, 64800). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update cadvisor godeps to v0.30.0
**What this PR does / why we need it**:
cAdvisor godep update corresponding to 1.11
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63204
**Release note**:
```release-note
Use IONice to reduce IO priority of du and find
cAdvisor ContainerReference no longer contains Labels. Use ContainerSpec instead.
Fix a bug where cadvisor failed to discover a sub-cgroup that was created soon after the parent cgroup.
```
/sig node
/kind bug
/priority critical-urgent
/assign @dchen1107
Automatic merge from submit-queue (batch tested with PRs 63717, 64646, 64792, 64784, 64800). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Reconcile extended resource capacity after kubelet restart.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/64632
**Special notes for your reviewer**:
**Release note**:
```release-note
Kubelet will set extended resource capacity to zero after it restarts. If the extended resource is exported by a device plugin, its capacity will change to a valid value after the device plugin re-connects with the Kubelet. If the extended resource is exported by an external component through direct node status capacity patching, the component should repatch the field after kubelet becomes ready again. During the time gap, pods previously assigned with such resources may fail kubelet admission but their controller should create new pods in response to such failures.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Promote sysctl annotations to fields
#
**What this PR does / why we need it**:
Promoting experimental sysctl feature from annotations to API fields.
**Special notes for your reviewer**:
Following sysctl KEP: https://github.com/kubernetes/community/pull/2093
**Release note**:
```release-note
The Sysctls experimental feature has been promoted to beta (enabled by default via the `Sysctls` feature flag). PodSecurityPolicy and Pod objects now have fields for specifying and controlling sysctls. Alpha sysctl annotations will be ignored by 1.11+ kubelets. All alpha sysctl annotations in existing deployments must be converted to API fields to be effective.
```
**TODO**:
* [x] - Promote sysctl annotation in Pod spec
* [x] - Promote sysctl annotation in PodSecuritySpec spec
* [x] - Feature gate the sysctl
* [x] - Promote from alpha to beta
* [x] - docs PR - https://github.com/kubernetes/website/pull/8804
Automatic merge from submit-queue (batch tested with PRs 64009, 64780, 64354, 64727, 63650). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
per-field dynamic config advice
Dynamic Kubelet config gives cluster admins and k8s-as-a-service providers a lot of flexibility around reconfiguring the Kubelet in live environments. With great power comes great responsibility. These comments intend to provide more nuanced guidance around using dynamic Kubelet config by adding items to consider when changing various fields and pointing out where cluster admins and k8s-as-service providers should maintain extra caution.
@kubernetes/sig-node-pr-reviews PLEASE provide feedback and help fill in the blanks here, I don't have domain expertise in all of these features.
https://github.com/kubernetes/features/issues/281
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 64009, 64780, 64354, 64727, 63650). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Kubelet: Add security context for Windows containers
**What this PR does / why we need it**:
This PR adds windows containers to Kubelet CRI and also implements security context setting for docker containers.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
RunAsUser from Kubernetes API only accept int64 today, which is not supported on Windows. It should be changed to intstr for working with both Windows and Linux containers in a separate PR.
**Release note**:
```release-note
Kubelet: Add security context for Windows containers
```
/cc @PatrickLang @taylorb-microsoft @michmike @JiangtianLi @yujuhong @dchen1107
Automatic merge from submit-queue (batch tested with PRs 64276, 64094, 64719, 64766, 64750). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Delegate map operation to BlockVolumeMapper plugin
**What this PR does / why we need it**:
This PR refactors the volume controller's operation generator, for block mapping, to delegate core block mounting sequence to the `volume.BlockVolumeMapper` plugin instead of living in the operation generator. This is to ensure better customization of block volume logic for existing internal volume plugins.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64093
```release-note
NONE
```
/sig storage
Automatic merge from submit-queue (batch tested with PRs 62266, 64351, 64366, 64235, 64560). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding CSI driver registration with plugin watcher
Adding CSI driver registration bits. The registration process will leverage driver-registrar side which will open the `registration` socket and will listen for pluginwatcher's GetInfo calls.
```release-note
Adding CSI driver registration code.
```
/sig sig-storage
Automatic merge from submit-queue (batch tested with PRs 62266, 64351, 64366, 64235, 64560). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bind mount subpath with same read/write settings as underlying volume
**What this PR does / why we need it**:
https://github.com/kubernetes/kubernetes/pull/63045 broke two scenarios:
* If volumeMount path already exists in container image, container runtime will try to chown the volume
* In SELinux system, we will try to set SELinux labels when starting the container
This fix makes it so that the subpath bind mount will inherit the read/write settings of the underlying volume mount. It does this by using the "bind,remount" mount options when doing the bind mount.
The underlying volume mount is ro when the volumeSource.readOnly flag is set. This is for persistent volume types like PVC, GCE PD, NFS, etc. When this is set, we won't try to configure SELinux labels. Also in this mode, subpaths have to already exist in the volume, we cannot make new directories on a read only volume.
When volumeMount.readOnly is set, the container runtime is in charge of making the volume in the container readOnly, but the underlying volume mount on the host can be writable. This can be set for any volume type, and is permanently set for atomic volume types like configmaps, secrets. In this case, SELinux labels will be applied before the container runtime makes the volume readOnly. And subpaths don't have to exist.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64120
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixes issue for readOnly subpath mounts for SELinux systems and when the volume mountPath already existed in the container image.
```
Automatic merge from submit-queue (batch tested with PRs 62266, 64351, 64366, 64235, 64560). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add log and fs stats for Windows containers
**What this PR does / why we need it**:
Add log and fs stats for Windows containers.
Without this, kubelet will report errors continuously:
```
Unable to fetch container log stats for path \var\log\pods\2a70ed65-37ae-11e8-8730-000d3a14b1a0\echo: Du not supported for this build.
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60180#62047
**Special notes for your reviewer**:
**Release note**:
```release-note
Add log and fs stats for Windows containers
```
Automatic merge from submit-queue (batch tested with PRs 63453, 64592, 64482, 64618, 64661). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert "Remove rescheduler and corresponding tests from master"
Reverts kubernetes/kubernetes#64364
After discussing with @bsalamat on how DS controllers(ref: https://github.com/kubernetes/kubernetes/pull/63223#discussion_r192277527) cannot create pods if the cluster is at capacity and they have to rely on rescheduler for making some space, we thought it is better to
- Bring rescheduler back.
- Make rescheduler priority aware.
- If cluster is full and if **only** DS controller is not able to create pods, let rescheduler be run and let it evict some pods which have less priority.
- The DS controller pods will be scheduled now.
So, I am reverting this PR now. Step 2, 3 above are going to be in rescheduler.
/cc @bsalamat @aveshagarwal @k82cn
Please let me know your thoughts on this.
```release-note
Revert #64364 to resurrect rescheduler. More info https://github.com/kubernetes/kubernetes/issues/64725 :)
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Address comments in #64006.
Address comments in #64006
@tallclair @yujuhong
@kubernetes/sig-node-pr-reviews
Signed-off-by: Lantao Liu <lantaol@google.com>
**Release note**:
```release-note
none
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix Windows CNI for the sandbox case
**What this PR does / why we need it**:
Windows supports both sandbox and non-sandbox cases. The non-sandbox
case is for Windows Server 2016 and for Windows Server version greater
than 1709 which use Hyper-V containers.
Currently, the CNI on Windows fetches the IP from the containers
within the pods regardless of the mode. This should be done only
in the non-sandbox mode where the IP of the actual container
will be different than the IP of the sandbox container.
In the case where the sandbox container is supported, all the containers
from the same pod will share the network details of the sandbox container.
This patch updates the CNI to fetch the IP from the sandbox container
when this mode is supported.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64188
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 63348, 63839, 63143, 64447, 64567). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Containerized subpath
**What this PR does / why we need it**:
Containerized kubelet needs a different implementation of `PrepareSafeSubpath` than kubelet running directly on the host.
On the host we safely open the subpath and then bind-mount `/proc/<pidof kubelet>/fd/<descriptor of opened subpath>`.
With kubelet running in a container, `/proc/xxx/fd/yy` on the host contains path that works only inside the container, i.e. `/rootfs/path/to/subpath` and thus any bind-mount on the host fails.
Solution:
- safely open the subpath and gets its device ID and inode number
- blindly bind-mount the subpath to `/var/lib/kubelet/pods/<uid>/volume-subpaths/<name of container>/<id of mount>`. This is potentially unsafe, because user can change the subpath source to a link to a bad place (say `/run/docker.sock`) just before the bind-mount.
- get device ID and inode number of the destination. Typical users can't modify this file, as it lies on /var/lib/kubelet on the host.
- compare these device IDs and inode numbers.
**Which issue(s) this PR fixes**
Fixes#61456
**Special notes for your reviewer**:
The PR contains some refactoring of `doBindSubPath` to extract the common code. New `doNsEnterBindSubPath` is added for the nsenter related parts.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 63348, 63839, 63143, 64447, 64567). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move pkg/scheduler/schedulercache -> pkg/scheduler/cache
**What this PR does / why we need it**:
Move pkg/scheduler/schedulercache -> pkg/scheduler/cache
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63813
**Special notes for your reviewer**:
In order to prevent name conflicts still rename the `cache` to `schedulercache`.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add proxy for container streaming in kubelet for streaming auth.
For https://github.com/kubernetes/kubernetes/issues/36666, option 2 of https://github.com/kubernetes/kubernetes/issues/36666#issuecomment-378440458.
This PR:
1. Removed the `DirectStreamingRuntime`, and changed `IndirectStreamingRuntime` to `StreamingRuntime`. All `DirectStreamingRuntime`s, `dockertools` and `rkt`, were removed.
2. Proxy container streaming in kubelet instead of returning redirect to apiserver. This solves the container runtime authentication issue, which is what we agreed on in https://github.com/kubernetes/kubernetes/issues/36666.
Please note that, this PR replaced the redirect with proxy directly instead of adding a knob to switch between the 2 behaviors. For existing CRI runtimes like containerd and cri-o, they should change to serve container streaming on localhost, so as to make the whole container streaming connection secure.
If a general authentication mechanism proposed in https://github.com/kubernetes/kubernetes/issues/62747 is ready, we can switch back to redirect, and all code can be found in github history.
Please also note that this added some overhead in kubelet when there are container streaming connections. However, the actual bottleneck is in the apiserver anyway, because it does proxy for all container streaming happens in the cluster. So it seems fine to get security and simplicity with this overhead. @derekwaynecarr @mrunalp Are you ok with this? Or do you prefer a knob?
@yujuhong @timstclair @dchen1107 @mikebrow @feiskyer
/cc @kubernetes/sig-node-pr-reviews
**Release note**:
```release-note
Kubelet now proxies container streaming between apiserver and container runtime. The connection between kubelet and apiserver is authenticated. Container runtime should change streaming server to serve on localhost, to make the connection between kubelet and container runtime local.
In this way, the whole container streaming connection is secure. To switch back to the old behavior, set `--redirect-container-streaming=true` flag.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove rescheduler and corresponding tests from master
**What this PR does / why we need it**:
This is to remove rescheduler from master branch as we are promoting priority and preemption to beta.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Part of #57471
**Special notes for your reviewer**:
/cc @bsalamat @aveshagarwal
**Release note**:
```release-note
Remove rescheduler from master.
```
Automatic merge from submit-queue (batch tested with PRs 64338, 64219, 64486, 64495, 64347). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove unused status per TODO
This should have been deleted in #63221, as it is now unused.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 57082, 64325, 64016, 64443, 64403). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
small nit in the annotations of pkg/kubelet/container/os.go
**What this PR does / why we need it**:
just a small nit in the annotations of container/os.go, but, it looks quite uncomfortable cause others all get right.
Automatic merge from submit-queue (batch tested with PRs 61803, 64305, 64170, 64361, 64339). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add a flag to control the cap on images reported in node status
While I normally try to avoid adding flags, this is a short term
scalability fix for v1.11, and there are other long-term solutions in
the works, so we shouldn't commit to this in the v1beta1 Kubelet config.
Flags are our escape hatch here.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 58920, 58327, 60577, 49388, 62306). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Dynamic env in subpath - Fixes Issue 48677
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48677
**Special notes for your reviewer**:
**Release note**:
```release-note
Adds the VolumeSubpathEnvExpansion alpha feature to support environment variable expansion
Sub-paths cannot be mounted with a dynamic volume mount name.
This fix provides environment variable expansion to sub paths
This reduces the need to manage symbolic linking within sidecar init containers to achieve the same goal
```
While I normally try to avoid adding flags, this is a short term
scalability fix for v1.11, and there are other long-term solutions in
the works, so we shouldn't commit to this in the v1beta1 Kubelet config.
Flags are our escape hatch.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add metadata to kubelet eviction event annotations
**What this PR does / why we need it**:
Add annotations to kubelet eviction events. Annotations include
"offending_containers" : comma-seperated list of containers.
"offending_containers_usage": comma-seperated list of usage.
"starved_resource": v1.ResourceName of the starved resource
**Special notes for your reviewer**:
Adding annotations to events required changing the `EventRecorder` interface to add a `AnnotatedEventf` function, which can add annotations to an event.
**Release note**:
```release-note
NONE
```
/assign @dchen1107
cc @mwielgus @schylek @kgrygiel
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
improve test: verify kubelet.config.Restore only happen once
**What this PR does / why we need it**:
This patch is to add additional test coverage of pod config restore,
it verifies that restore can only happen once.
in the second restore attempt, we should expect no error and no channel update.
**Which issue(s) this PR fixes**:
this is a test improvement based on test been added at https://github.com/kubernetes/kubernetes/pull/63553
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
/sig node
/cc @rphillips @jiayingz @vikaschoudhary16 @anfernee @Random-Liu @dchen1107 @derekwaynecarr
@vishh @yujuhong @tallclair
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Validate cgroups-per-qos for Windows
**What this PR does / why we need it**:
cgroups-per-qos and enforce-node-allocatable is not supported on Windows, but kubelet allows it on Windows. And then Pods may stuck in terminating state because of it. Refer #61716.
This PR adds validation for them and make kubelet refusing to start in this case.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#61716
**Special notes for your reviewer**:
**Release note**:
```release-note
Fail fast if cgroups-per-qos is set on Windows
```
The goal of this change is to remove the registration of signal
handling from pkg/kubelet. We now pass in a stop channel.
If you register a signal handler in `main()` to aid in a controlled
and deliberate exit then the handler registered in `pkg/kubelet` often
wins and the process exits immediately. This means all other signal
handler registrations are currently racy if `DockerServer.Start()` is
directly or indirectly invoked.
This change also removes another signal handler registration from
`NewAPIServerCommand()`; a stop channel is now passed to this
function.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix typo in volume_stats.go
**What this PR does / why we need it**:
While reviewing the implementation details I came across a typo in volume_stats.go
sed/volumeStatsCollecotr/volumeStatsCollector/
**Release note**:
```release-note
NONE
```
Windows supports both sandbox and non-sandbox cases. The non-sandbox
case is for Windows Server 2016 and for Windows Server version greater
than 1709 which use Hyper-V containers.
Currently, the CNI on Windows fetches the IP from the containers
within the pods regardless of the mode. This should be done only
in the non-sandbox mode where the IP of the actual container
will be different than the IP of the sandbox container.
In the case where the sandbox container is supported, all the containers
from the same pod will share the network details of the sandbox container.
This patch updates the CNI to fetch the IP from the sandbox container
when this mode is supported.
Signed-off-by: Alin Balutoiu <abalutoiu@cloudbasesolutions.com>
Automatic merge from submit-queue (batch tested with PRs 64013, 63896, 64139, 57527, 62102). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add dynamic config metrics
This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.
https://github.com/kubernetes/features/issues/281
```release-note
The Kubelet now exports metrics that report the assigned (node_config_assigned), last-known-good (node_config_last_known_good), and active (node_config_active) config sources, and a metric indicating whether the node is experiencing a config-related error (node_config_error). The config source metrics always report the value 1, and carry the node_config_name, node_config_uid, node_config_resource_version, and node_config_kubelet_key labels, which identify the config version. The error metric reports 1 if there is an error, 0 otherwise.
```
Automatic merge from submit-queue (batch tested with PRs 59851, 64114, 63912, 64156, 64191). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: Move RotateCertificates to the KubeletConfiguration struct
**What this PR does / why we need it**:
Moves `.RotateCertificates` to the `KubeletConfiguration` struct, so it can be configured via the Config file smoothly.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/63878
Fixes https://github.com/kubernetes/kubernetes/issues/61653
**Special notes for your reviewer**:
Pretty similar to https://github.com/kubernetes/kubernetes/pull/62352
**Release note**:
```release-note
The kubelet certificate rotation feature can now be enabled via the `.RotateCertificates` field in the kubelet's config file. The `--rotate-certificates` flag is now deprecated, and will be removed in a future release.
```
@kubernetes/sig-node-pr-reviews @kubernetes/sig-cluster-lifecycle-pr-reviews
Kubelet should not resolve symlinks outside of mounter interface.
Only mounter interface knows, how to resolve them properly on the host.
As consequence, declaration of SafeMakeDir changes to simplify the
implementation:
from SafeMakeDir(fullPath string, base string, perm os.FileMode)
to SafeMakeDir(subdirectoryInBase string, base string, perm os.FileMode)
Automatic merge from submit-queue (batch tested with PRs 63914, 63887, 64116, 64026, 62933). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Enable SELinux relabeling in CSI volumes
**What this PR does / why we need it**:
CSI volume plugin should provide correct information in `GetAttributes` call so kubelet can ask container runtime to relabel the volume. Therefore CSI volume plugin needs to check if a random volume mounted by a CSI driver supports SELinux or not by checking for "seclabel" mount or superblock option.
**Which issue(s) this PR fixes**
Fixes#63965
**Release note**:
```release-note
NONE
```
@saad-ali @vladimirvivien @davidz627
@cofyc, FYI, I'm changing `struct mountInfo`.
This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.
Automatic merge from submit-queue (batch tested with PRs 63151, 63795, 63553, 64068, 64113). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: fix checkpoint manager logic bug on restore
**What this PR does / why we need it**:
I am testing the new checkpoint logic within the kubelet and ran across a logic bug on API server restores.
Initial PR: https://github.com/kubernetes/kubernetes/pull/56040
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
/cc @vikaschoudhary16
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 63151, 63795, 63553, 64068, 64113). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Implement watch-based secret manager
Initial experiments on 5000-node Kubemark show that apiserver is handling those with no real issues.
That said, we shouldn't enable it in prod without much more extensive scalability tests (so most probably not in 1.11), but having that in would enable easier testing.
@liggitt
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix typo: peirodically->periodically
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 63881, 64046, 63409, 63402, 63221). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Kubelet responds to ConfigMap mutations for dynamic Kubelet config
This PR makes dynamic Kubelet config easier to reason about by leaving less room for silent skew scenarios. The new behavior is as follows:
- ConfigMap does not exist: Kubelet reports error status due to missing source
- ConfigMap is created: Kubelet starts using it
- ConfigMap is updated: Kubelet respects the update (but we discourage this pattern, in favor of incrementally migrating to a new ConfigMap)
- ConfigMap is deleted: Kubelet keeps using the config (non-disruptive), but reports error status due to missing source
- ConfigMap is recreated: Kubelet respects any updates (but, again, we discourage this pattern)
This PR also makes a small change to the config checkpoint file tree structure, because ResourceVersion is now taken into account when saving checkpoints. The new structure is as follows:
```
- dir named by --dynamic-config-dir (root for managing dynamic config)
| - meta
| - assigned (encoded kubeletconfig/v1beta1.SerializedNodeConfigSource object, indicating the assigned config)
| - last-known-good (encoded kubeletconfig/v1beta1.SerializedNodeConfigSource object, indicating the last-known-good config)
| - checkpoints
| - uid1 (dir for versions of object identified by uid1)
| - resourceVersion1 (dir for unpacked files from resourceVersion1)
| - ...
| - ...
```
fixes: #61643
```release-note
The dynamic Kubelet config feature will now update config in the event of a ConfigMap mutation, which reduces the chance for silent config skew. Only name, namespace, and kubeletConfigKey may now be set in Node.Spec.ConfigSource.ConfigMap. The least disruptive pattern for config management is still to create a new ConfigMap and incrementally roll out a new Node.Spec.ConfigSource.
```
Automatic merge from submit-queue (batch tested with PRs 63881, 64046, 63409, 63402, 63221). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Kubelet config: Validate new config against future feature gates
This fixes an issue with KubeletConfiguration validation, where the
feature gates set by the new config were not taken into account.
Also fixes a validation issue with dynamic Kubelet config, where flag
precedence was not enforced prior to dynamic config validation in the
controller; this prevented rejection of dynamic configs that don't merge
well with values set via legacy flags.
Fixes#63305
```release-note
NONE
```
This fixes an issue with KubeletConfiguration validation, where the
feature gates set by the new config were not taken into account.
Also fixes a validation issue with dynamic Kubelet config, where flag
precedence was not enforced prior to dynamic config validation in the
controller; this prevented rejection of dynamic configs that don't merge
well with values set via legacy flags.
Automatic merge from submit-queue (batch tested with PRs 60012, 63692, 63977, 63960, 64008). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
pkg: kubelet: remote: increase grpc client default size
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
**What this PR does / why we need it**:
when running lots and lots of containers and having tons of images on a given node, we started seeing this in the logs (with docker):
```
Unable to retrieve pods: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4208374 vs. 4194304)
```
That's because the grpc client is defaulting to a 4MB response size.
This patch increases the resp size to 8MB to avoid such issue.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
increase grpc client default response size
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix e2e "When checkpoint file is corrupted should complete pod sandbo…
…x clean up"
**What this PR does / why we need it**:
This PR fixes the e2e-node test, "When checkpoint file is corrupted should complete pod sandbox clean up"
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62738
Related #62937
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
/cc @dashpole @derekwaynecarr
/sig node
Automatic merge from submit-queue (batch tested with PRs 63886, 63857, 63824). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Refactor cache based manager
This is support to be no-op refactoring. It will only allow to share code between secret and configmap managers.
Automatic merge from submit-queue (batch tested with PRs 63865, 57849, 63932, 63930, 63936). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Extract connection rotating dialer into a package
**What this PR does / why we need it**: This will be re-used for exec auth plugin to rotate connections on
credential change.
**Special notes for your reviewer**: this was split from https://github.com/kubernetes/kubernetes/pull/61803 to simplify review
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 63865, 57849, 63932, 63930, 63936). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Eviction Node e2e test checks for eviction reason
**What this PR does / why we need it**:
Currently, the eviction test simply ensures that pods are marked `Failed`. However, this could occur because of an OOM, rather than an eviction.
To ensure that pods are actually being evicted, check for the Reason in the pod status to ensure it is evicted.
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix formatting for kubelet memcg notification threshold
/kind bug
**What this PR does / why we need it**:
This fixes the following errors (found in [this node_e2e serial test log](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-serial/4118/artifacts/tmp-node-e2e-49baaf8a-cos-stable-63-10032-71-0/kubelet.log)):
`eviction_manager.go:256] eviction manager attempting to integrate with kernel memcg notification api`
`threshold_notifier_linux.go:70] eviction: setting notification threshold to 4828488Ki`
`eviction_manager.go:272] eviction manager: failed to create hard memory threshold notifier: invalid argument`
**Special notes for your reviewer**:
This needs to be cherrypicked back to 1.10.
This regression was added in https://github.com/kubernetes/kubernetes/pull/60531, because the `quantity` being used was changed from a DecimalSI to BinarySI, which changes how it is printed out in the String() method. To make it more explicit that we want the value, just convert Value() to a string.
**Release note**:
```release-note
Fix memory cgroup notifications, and reduce associated log spam.
```
Automatic merge from submit-queue (batch tested with PRs 63314, 63884, 63799, 63521, 62242). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Check CIDR before updating node status
**What this PR does / why we need it**:
Check CIDR before updating node status. See #62164.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62164
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 63314, 63884, 63799, 63521, 62242). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add memcg notifications for allocatable cgroup
**What this PR does / why we need it**:
Use memory cgroup notifications to trigger the eviction manager when the allocatable eviction threshold is crossed. This allows the eviction manager to respond more quickly when the allocatable cgroup's available memory becomes low. Evictions are preferable to OOMs in the cgroup since the kubelet can enforce its priorities on which pod is killed.
**Which issue(s) this PR fixes**:
Fixes https://github.com/kubernetes/kubernetes/issues/57901
**Special notes for your reviewer**:
This adds the alloctable cgroup from the container manager to the eviction config.
**Release note**:
```release-note
NONE
```
/sig node
/priority important-soon
/kind feature
I would like this to be included in the 1.11 release.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move to a structured status for dynamic kubelet config
This PR updates dynamic Kubelet config to use a structured status, rather than a node condition. This makes the status machine-readable, and thus more useful for config orchestration.
Fixes: #56896
```release-note
The status of dynamic Kubelet config is now reported via Node.Status.Config, rather than the KubeletConfigOk node condition.
```
Updates dynamic Kubelet config to use a structured status, rather than a
node condition. This makes the status machine-readable, and thus more
useful for config orchestration.
Fixes: #56896
Automatic merge from submit-queue (batch tested with PRs 63603, 63557, 62015). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
CRI: update documents for container logpath
**What this PR does / why we need it**:
The container log path has been changed from `containername_attempt#.log` to `containername/attempt#.log` in #59906. This PR updates CRI documents for it.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
CRI: update documents for container logpath. The container log path has been changed from containername_attempt#.log to containername/attempt#.log
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
track/close kubelet->API connections on heartbeat failure
xref #48638
xref https://github.com/kubernetes-incubator/kube-aws/issues/598
we're already typically tracking kubelet -> API connections and have the ability to force close them as part of client cert rotation. if we do that tracking unconditionally, we gain the ability to also force close connections on heartbeat failure as well. it's a big hammer (means reestablishing pod watches, etc), but so is having all your pods evicted because you didn't heartbeat.
this intentionally does minimal refactoring/extraction of the cert connection tracking transport in case we want to backport this
* first commit unconditionally sets up the connection-tracking dialer, and moves all the cert management logic inside an if-block that gets skipped if no certificate manager is provided (view with whitespace ignored to see what actually changed)
* second commit plumbs the connection-closing function to the heartbeat loop and calls it on repeated failures
follow-ups:
* consider backporting this to 1.10, 1.9, 1.8
* refactor the connection managing dialer to not be so tightly bound to the client certificate management
/sig node
/sig api-machinery
```release-note
kubelet: fix hangs in updating Node status after network interruptions/changes between the kubelet and API server
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Report node DNS info with --node-ip
**What this PR does / why we need it**:
This PR adds `ExternalDNS`, `InternalDNS`, and `ExternalIP` info for kubelets with the `--nodeip` flag enabled.
**Which issue(s) this PR fixes**
Fixes#63158
**Special notes for your reviewer**:
I added a field to the Kubelet to make IP validation more testable (`validateNodeIP` relies on the `net` package and the IP address of the host that is executing the test.) I also converted the test to use a table so new cases could be added more easily.
**Release Notes**
```release-note
Report node DNS info with --node-ip flag
```
@andrewsykim
@nckturner
/sig node
/sig network
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Correct kill logic for pod processes
Correct the kill logic for processes in the pod's cgroup. os.FindProcess() does not check whether the process exists on POSIX systems.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
should use time.Since instead of time.Now().Sub
**What this PR does / why we need it**:
should use time.Since instead of time.Now().Sub
**Special notes for your reviewer**:
Automatic merge from submit-queue (batch tested with PRs 60200, 63623, 63406). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Apply pod name and namespace labels for pod cgroup for cadvisor metrics
**What this PR does / why we need it**:
1. Enable Prometheus users to determine usage by pod name and namespace for pod cgroup sandbox.
1. Label cAdvisor metrics for pod cgroups by pod name and namespace.
1. Aligns with kubelet stats summary endpoint pod cpu and memory stats.
**Special notes for your reviewer**:
This provides parity with the summary API enhancements done here:
https://github.com/kubernetes/kubernetes/pull/55969
**Release note**:
```release-note
Apply pod name and namespace labels to pod cgroup in cAdvisor metrics
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Stop() for Ticker to enable leak-free code
**What this PR does / why we need it**:
I wanted to use the clock package but the `Ticker` without a `Stop()` method is a deal breaker for me.
**Release note**:
```release-note
NONE
```
/kind enhancement
/sig api-machinery
Automatic merge from submit-queue (batch tested with PRs 63624, 59847). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
explicit kubelet config key in Node.Spec.ConfigSource.ConfigMap
This makes the Kubelet config key in the ConfigMap an explicit part of
the API, so we can stop using magic key names.
As part of this change, we are retiring ConfigMapRef for ConfigMap.
```release-note
You must now specify Node.Spec.ConfigSource.ConfigMap.KubeletConfigKey when using dynamic Kubelet config to tell the Kubelet which key of the ConfigMap identifies its config file.
```
Automatic merge from submit-queue (batch tested with PRs 63593, 63539). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Refactor cachingSecretManager
I have a POC of watch-based implementation of SecretManager in https://github.com/kubernetes/kubernetes/pull/63461
This is an initial refactoring that would make that change easier.
@yujuhong - if you're fine with this PR, I will do the same for configmaps in the follow up PR.
Automatic merge from submit-queue (batch tested with PRs 58580, 63120). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Admit BestEffort if it tolerates memory pressure.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58505
**Release note**:
```release-note
None
```
This makes the Kubelet config key in the ConfigMap an explicit part of
the API, so we can stop using magic key names.
As part of this change, we are retiring ConfigMapRef for ConfigMap.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Clean up kubelet eviction events
**What this PR does / why we need it**:
This makes eviction events better.
* Exceeding container disk limits no longer says "node was low on X", since the node isn't actually low on a resource. The container limit was just exceeded. Same for pods and volumes.
* Eviction message now lists containers which were exceeding their requests. This is an event from a container evicted while under memory pressure:
`reason: 'Evicted' The node was low on resource: memory. Container high-priority-memory-hog was using 166088Ki, which exceeds its request of 10Mi.`
* Eviction messages now displays real resources, when they exist. Rather than `The node was low on resource: nodefs`, it will now show `The node was low on resource: ephemeral-storage`.
This also cleans up eviction code in order to accomplish this. We previously had a resource for each signal: e.g. `SignalNodeFsAvailable` mapped to the resource`nodefs`, and `nodefs` maps to reclaim functions, and ranking functions. Now, signals map directly to reclaim and ranking functions, and signals map to real resources: e.g. `SignalNodeFsAvailable` maps to the resource `ephemeral-storage`, which is what we use in events.
This also cleans up duplicated code by reusing the `evictPod` function. It also removes the unused signal `SignalAllocatableNodeFsAvailable`.
**Release note**:
```release-note
NONE
```
/sig node
/priority important-longterm
/assign @dchen1107 @jingxu97
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
passthrough readOnly to subpath
**What this PR does / why we need it**:
If a volume is mounted as readonly, or subpath volumeMount is configured as readonly, then the subpath bind mount should be readonly.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62752
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixes issue where subpath readOnly mounts failed
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixed hack/verify-golint.sh reported errors
**What this PR does / why we need it**:
Fixes errors reported by `hack/verify-golint.sh` while running `make verify`
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63065
**Special notes for your reviewer**:
Fixing `hack/verify-gofmt.sh` reported error for `k8s.io/code-generator/cmd/client-gen/generators/client_generator.go` mentioned in #63065 throws error. So didn't modify that file.
**Release note**:
```release-note
NONE
```
/sig-testing-bugs
Automatic merge from submit-queue (batch tested with PRs 63488, 63496). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve test coverage of Kubelet file utils
Improves from 30.9% to 77.8%.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 62914, 63431). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: fix flake in TestUpdateExistingNodeStatusTimeout
xref https://github.com/openshift/origin/issues/19443
There are cases where some process, outside the test, attempts to connect to the port we are using to do the test, leading to a attempt count greater than what we expect.
To deal with this, just ensure that we have seen *at least* the number of connection attempts we expect.
@liggitt
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 63421, 63432, 63333). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
update tests to be specific about the versions they are testing
When setting up tests, you want to rely on your own scheme. This eliminates coupling to floating versions which gives unnecessary flexibility in most cases and prevents testing all the versions you need.
@liggitt scrubs unnecessary deps.
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet - Remove unused code
**What this PR does / why we need it**:
Looks like we have a bunch of unused methods. Let's clean them up
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add support for CNI on Windows Server 2016 RTM
**What this PR does / why we need it**:
Windows Server 2016 RTM has limited CNI support. This PR makes it possible for the CNI plugin to be used to setup POD networking on Windows Server 2016 RTM (build number 14393).
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#61939
**Special notes for your reviewer**:
The old mode is not supported and tested on Windows Server 2016 RTM. This change allows the CNI plugin to be used on Windows Server 2016 RTM to retrieve the container IP instead of using workarounds (docker inspect).
CNI support has been added for Windows Server 2016 version 1709 (build number 16299), this patch will just allow the same support for older build numbers.
Windows Server 2016 RTM has a longer lifecycle (LTS) than Windows Server 2016 version 1709.
https://support.microsoft.com/en-us/lifecycle/search/19761 vs https://support.microsoft.com/en-us/lifecycle/search/20311
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use a []string for CgroupName, which is a more accurate internal representation
**What this PR does / why we need it**:
This is purely a refactoring and should bring no essential change in behavior.
It does clarify the cgroup handling code quite a bit.
It is preparation for further changes we might want to do in the cgroup hierarchy. (But it's useful on its own, so even if we don't do any, it should still be considered.)
**Special notes for your reviewer**:
The slice of strings more precisely captures the hierarchic nature of the cgroup paths we use to represent pods and their groupings.
It also ensures we're reducing the chances of passing an incorrect path format to a cgroup driver that requires a different path naming, since now explicit conversions are always needed.
The new constructor `NewCgroupName` starts from an existing `CgroupName`, which enforces a hierarchy where a root is always needed. It also performs checking on the component names to ensure invalid characters ("/" and "_") are not in use.
A `RootCgroupName` for the top of the cgroup hierarchy tree is introduced.
This refactor results in a net reduction of around 30 lines of code,
mainly with the demise of ConvertCgroupNameToSystemd which had fairly
complicated logic in it and was doing just too many things.
There's a small TODO in a helper `updateSystemdCgroupInfo` that was introduced to make this commit possible. That logic really belongs in libcontainer, I'm planning to send a PR there to include it there. (The API already takes a field with that information, only that field is only processed in cgroupfs and not systemd driver, we should fix that.)
Tested: By running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs driver) and CentOS 7 (with systemd driver.)
**NOTE**: I only tested this with dockershim, we should double-check that this works with the CRI endpoints too, both in cgroupfs and systemd modes.
/assign @derekwaynecarr
/assign @dashpole
/assign @Random-Liu
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
refactor device plugin grpc dial with dialcontext
**What this PR does / why we need it**:
Refactor grpc `dial` with `dialContext` as `grpc.WithTimeout` has been deprecated by:
> use DialContext and context.WithTimeout instead.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 62657, 63278, 62903, 63375). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add more volume types in e2e and fix part of them.
**What this PR does / why we need it**:
- Add dir-link/dir-bindmounted/dir-link-bindmounted/bockfs volume types for e2e tests.
- Fix fsGroup related e2e tests partially.
- Return error if we cannot resolve volume path.
- Because we should not fallback to volume path, if it's a symbolic link, we may get wrong results.
To safely set fsGroup on local volume, we need to implement these two methods correctly for all volume types both on the host and in container:
- get volume path kubelet can access
- paths on the host and in container are different
- get mount references
- for directories, we cannot use its mount source (device field) to identify mount references, because directories on same filesystem have same mount source (e.g. tmpfs), we need to check filesystem's major:minor and directory root path on it
Here is current status:
| | (A) volume-path (host) | (B) volume-path (container) | (C) mount-refs (host) | (D) mount-refs (container) |
| --- | --- | --- | --- | --- |
| (1) dir | OK | FAIL | FAIL | FAIL |
| (2) dir-link | OK | FAIL | FAIL | FAIL |
| (3) dir-bindmounted | OK | FAIL | FAIL | FAIL |
| (4) dir-link-bindmounted | OK | FAIL | FAIL | FAIL |
| (5) tmpfs| OK | FAIL | FAIL | FAIL |
| (6) blockfs| OK | FAIL | OK | FAIL |
| (7) block| NOTNEEDED | NOTNEEDED | NOTNEEDED | NOTNEEDED |
| (8) gce-localssd-scsi-fs| NOTTESTED | NOTTESTED | NOTTESTED | NOTTESTED |
- This PR uses `nsenter ... readlink` to resolve path in container as @msau42 @jsafrane [suggested](https://github.com/kubernetes/kubernetes/pull/61489#pullrequestreview-110032850). This fixes B1:B6 and D6, , the rest will be addressed in https://github.com/kubernetes/kubernetes/pull/62102.
- C5:D5 marked `FAIL` because `tmpfs` filesystems can share same mount source, we cannot rely on it to check mount references. e2e tests passes due to we use unique mount source string in tests.
- A7:D7 marked `NOTNEEDED` because we don't set fsGroup on block devices in local plugin. (TODO: Should we set fsGroup on block device?)
- A8:D8 marked `NOTTESTED` because I didn't test it, I leave it to `pull-kubernetes-e2e-gce`. I think it should be same as `blockfs`.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update all script shebangs to use /usr/bin/env interpreter instead of /bin/interpreter
This is required to support systems where bash doesn't reside in /bin (such as NixOS, or the *BSD family) and allow users to specify a different interpreter version through $PATH manipulation.
https://www.cyberciti.biz/tips/finding-bash-perl-python-portably-using-env.html
```release-note
Use /usr/bin/env in all script shebangs to increase portability.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: force filterContainerID to empty string when removeAll is true
fixes https://github.com/kubernetes/kubernetes/issues/57865
alternative to https://github.com/kubernetes/kubernetes/pull/62170
There is a bug in the container deletor where if `removeAll` is `true` in `deleteContainersInPod()` the `filterContainerID` is still used to filter results in `getContainersToDeleteInPod()`. If the filter container is not found, no containers are returned for deletion.
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod_container_deletor.go#L74-L77
This is the case for the delayed deletion a pod in `CrashLoopBackoff` as the death of the infra container in response to the `DELETE` is detected by PLEG and triggers an attempt to clean up all containers but uses the infra container id as a filter with `removeAll` set to `true`. The infra container is immediately deleted and thus can not be found when `getContainersToDeleteInPod()` tries to find it. Thus the dead app container from the previous restart attempt still exists.
`canBeDeleted()` in the status manager will return `false` until all the pod containers are deleted, delaying the deletion of the pod on the API server.
The removal of the containers is eventually forced by the API server `REMOVE` after the grace period.
The slice of strings more precisely captures the hierarchic nature of
the cgroup paths we use to represent pods and their groupings.
It also ensures we're reducing the chances of passing an incorrect path
format to a cgroup driver that requires a different path naming, since
now explicit conversions are always needed.
The new constructor NewCgroupName starts from an existing CgroupName,
which enforces a hierarchy where a root is always needed. It also
performs checking on the component names to ensure invalid characters
("/" and "_") are not in use.
A RootCgroupName for the top of the cgroup hierarchy tree is introduced.
This refactor results in a net reduction of around 30 lines of code,
mainly with the demise of ConvertCgroupNameToSystemd which had fairly
complicated logic in it and was doing just too many things.
There's a small TODO in a helper updateSystemdCgroupInfo that was
introduced to make this commit possible. That logic really belongs in
libcontainer, I'm planning to send a PR there to include it there.
(The API already takes a field with that information, only that field is
only processed in cgroupfs and not systemd driver, we should fix that.)
Tested by running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs
driver) and CentOS 7 (with systemd driver.)
Automatic merge from submit-queue (batch tested with PRs 58474, 60034, 62101, 63198). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
avoid race condition in device manager and plugin startup/shutdown: wait for goroutines
**What this PR does / why we need it**:
Commit 1325c2f worked around issue #59488, but it is still worthwhile to fix the underlying root cause properly.
**Which issue(s) this PR fixes**:
Fixes#59488
**Special notes for your reviewer**:
This is an alternative to PR #59861, which used a different approach. Personally I tend to prefer this one now.
**Release note**:
```release-note
NONE
```
/sig node
/area hw-accelerators
/assign vikaschoudhary16
Automatic merge from submit-queue (batch tested with PRs 63252, 63160). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: logs: do not wait when following terminated container
Currently, a `kubectl logs -f` on a terminated container will output the logs, wait 5 seconds (`stateCheckPeriod`), then return. The 5 seconds delay should not occur as the container is terminated and unable to generate additional log messages.
This PR puts a check at the beginning of `waitLogs()` to avoid doing the wait when the container is not running.
@derekwaynecarr @smarterclayton
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix dockershim e2e
**What this PR does / why we need it**:
Delete checkpoint file when GetCheckpoint fails due to corrupt checkpoint. Earlier, before checkpointmanager, [`GetCheckpoint` in dockershim was deleting corrupt checkpoint file implicitly](https://github.com/kubernetes/kubernetes/pull/56040/files#diff-9a174fa21408b7faeed35309742cc631L116). In checkpointmanager's `GetCheckpoint` this implicit deletion of corrupt checkpoint is not happening. Because of this few e2e tests are failing because these tests are testing this deletion.
Changes are being added to delete checkpoint file if found corrupted.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62738
**Special notes for your reviewer**:
No new behavior is being introduced. Implicit deletion of corrupt checkpoint is being done explicitly.
**Release note**:
```release-note
None
```
/cc @dashpole @sjenning @derekwaynecarr
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Change seccomp annotation from "docker/default" to "runtime/default"
**What this PR does / why we need it**:
This PR changes seccomp annotation from "docker/default" to "runtime/default", so that it is can be applied to all kinds of container runtimes. This PR is a followup of [#1963](https://github.com/kubernetes/community/pull/1963).
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#39845
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 62982, 63075, 63067, 62877, 63141). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
eliminate indirection from type registration
Some years back there was a partial attempt to revamp api type registration, but the effort was never completed and this was before we started splitting schemes. With separate schemes, the idea of partial registration no longer makes sense. This pull starts removing cruft from the registration process and pulls out a layer of indirection that isn't needed.
@kubernetes/sig-api-machinery-pr-reviews
@lavalamp @cheftako @sttts @smarterclayton
Rebase cost is fairly high, so I'd like to avoid this lingering.
/assign @sttts
/assign @cheftako
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 62982, 63075, 63067, 62877, 63141). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add warnings on using pod-infra-container-image for remote container runtime
**What this PR does / why we need it**:
We should warn on using `--pod-infra-container-image` to avoid confusions, when users are using remote container runtime.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #55676,#62388,#62732
**Special notes for your reviewer**:
/cc @kubernetes/sig-node-pr-reviews
**Release note**:
```release-note
add warnings on using pod-infra-container-image for remote container runtime
```
Automatic merge from submit-queue (batch tested with PRs 62951, 57460, 63118). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix device plugin re-registration
**What this PR does / why we need it**:
While registering a new endpoint, device manager copies all the devices from the old endpoint for the same resource and then it stops the old endpoint and starts the new endpoint.
There is no sync between stopping the old and starting the new. While stopping the old, manager marks devices(which are copied to new endpoint as well) as "Unhealthy".
In the endpoint.go, when after restart, plugin reports devices healthy, same health state (healthy) is found in the endpoint database and endpoint module does not update manager database.
Solution in the PR is to mark devices as unhealthy before copying to new endpoint.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62773
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
/cc @jiayingz @vishh @RenaudWasTaken @derekwaynecarr
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
dockershim/sandbox: clean up pod network even if SetUpPod() failed
If the CNI network plugin completes successfully, but something fails
between that success and dockerhsim's sandbox setup code, plugin resources
may not be cleaned up. A non-trivial amount of code runs after the
plugin itself exits and the CNI driver's SetUpPod() returns, and any error
condition recognized by that code would cause this leakage.
The Kubernetes CRI RunPodSandbox() request does not attempt to clean
up on errors, since it cannot know how much (if any) networking
was actually set up. It depends on the CRI implementation to do
that cleanup for it.
In the dockershim case, a SetUpPod() failure means networkReady is
FALSE for the sandbox, and TearDownPod() will not be called later by
garbage collection even though networking was configured, because
dockershim can't know how far SetUpPod() got.
Concrete examples include if the sandbox's container is somehow
removed during during that time, or another OS error is encountered,
or the plugin returns a malformed result to the CNI driver.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1532965
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59220, 62927, 63084, 63090, 62284). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix qosReserved json tag (lowercase qos, instead of uppercase QOS)
The API conventions specify that json keys should start with a lowercase
character, and if the key starts with an initialism, all characters in
the initialism should be lowercase. See `tlsCipherSuites` as an example.
API Conventions:
https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md
>All letters in the acronym should have the same case, using the
>appropriate case for the situation. For example, at the beginning
>of a field name, the acronym should be all lowercase, such as "httpGet".
Follow up to: https://github.com/kubernetes/kubernetes/pull/62925
```release-note
NONE
```
@sjenning @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 62655, 61711, 59122, 62853, 62390). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
reset resultRun to 0 on pod restart
**What this PR does / why we need it**:
The resultRun should be reset to 0 on pod restart, so that resultRun on the first failure of the new container will be 1, which is correct. Otherwise, the actual FailureThreshold after restarting will be `FailureThreshold - 1`.
**Which issue(s) this PR fixes**:
This PR is related to https://github.com/kubernetes/kubernetes/issues/53530. https://github.com/kubernetes/kubernetes/pull/46371 fixed that issue but there's still a little problem like what I said above.
**Special notes for your reviewer**:
**Release note**:
```release-note
fix resultRun by resetting it to 0 on pod restart
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
unpack dynamic kubelet config payloads to files
This PR unpacks the downloaded ConfigMap to a set of files on the node.
This enables other config files to ride alongside the
KubeletConfiguration, and the KubeletConfiguration to refer to these
cohabitants with relative paths.
This PR also stops storing dynamic config metadata (e.g. current,
last-known-good config records) in the same directory as config
checkpoints. Instead, it splits the storage into `meta` and
`checkpoints` dirs.
The current store dir structure is as follows:
```
- dir named by --dynamic-config-dir (root for managing dynamic config)
| - meta (dir for metadata, e.g. which config source is currently assigned, last-known-good)
| - current (a serialized v1 NodeConfigSource object, indicating the assigned config)
| - last-known-good (a serialized v1 NodeConfigSource object, indicating the last-known-good config)
| - checkpoints (dir for config checkpoints)
| - uid1 (dir for unpacked config, identified by uid1)
| - file1
| - file2
| - ...
| - uid2
| - ...
```
There are some likely changes to the above structure before dynamic config goes beta, such as renaming "current" to "assigned" for clarity, and extending the checkpoint identifier to include a resource version, as part of resolving #61643.
```release-note
NONE
```
/cc @luxas @smarterclayton
If the CNI network plugin completes successfully, but something fails
between that success and dockerhsim's sandbox setup code, plugin resources
may not be cleaned up. A non-trivial amount of code runs after the
plugin itself exits and the CNI driver's SetUpPod() returns, and any error
condition recognized by that code would cause this leakage.
The Kubernetes CRI RunPodSandbox() request does not attempt to clean
up on errors, since it cannot know how much (if any) networking
was actually set up. It depends on the CRI implementation to do
that cleanup for it.
In the dockershim case, a SetUpPod() failure means networkReady is
FALSE for the sandbox, and TearDownPod() will not be called later by
garbage collection even though networking was configured, because
dockershim can't know how far SetUpPod() got.
Concrete examples include if the sandbox's container is somehow
removed during during that time, or another OS error is encountered,
or the plugin returns a malformed result to the CNI driver.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1532965
The API conventions specify that json keys should start with a lowercase
character, and if the key starts with an initialism, all characters in
the initialism should be lowercase. See `tlsCipherSuites` as an example.
API Conventions:
https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md
>All letters in the acronym should have the same case, using the
>appropriate case for the situation. For example, at the beginning
>of a field name, the acronym should be all lowercase, such as "httpGet".
Automatic merge from submit-queue (batch tested with PRs 63001, 62152, 61950). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
When bootstrapping a client cert, store it with other client certs
The kubelet uses two different locations to store certificates on
initial bootstrap and then on subsequent rotation:
* bootstrap: certDir/kubelet-client.(crt|key)
* rotation: certDir/kubelet-client-(DATE|current).pem
Bootstrap also creates an initial node.kubeconfig that points to the
certs. Unfortunately, with short rotation the node.kubeconfig then
becomes out of date because it points to the initial cert/key, not the
rotated cert key.
Alter the bootstrap code to store client certs exactly as if they would
be rotated (using the same cert Store code), and reference the PEM file
containing cert/key from node.kubeconfig, which is supported by kubectl
and other Go tooling. This ensures that the node.kubeconfig continues to
be valid past the first expiration.
Example:
```
bootstrap:
writes to certDir/kubelet-client-DATE.pem and symlinks to certDir/kubelet-client-current.pem
writes node.kubeconfig pointing to certDir/kubelet-client-current.pem
rotation:
writes to certDir/kubelet-client-DATE.pem and symlinks to certDir/kubelet-client-current.pem
```
This will also allow us to remove the wierd "init store with bootstrap cert" stuff, although I'd prefer to do that in a follow up.
@mikedanese @liggitt as per discussion on Slack today
```release-note
The `--bootstrap-kubeconfig` argument to Kubelet previously created the first bootstrap client credentials in the certificates directory as `kubelet-client.key` and `kubelet-client.crt`. Subsequent certificates created by cert rotation were created in a combined PEM file that was atomically rotated as `kubelet-client-DATE.pem` in that directory, which meant clients relying on the `node.kubeconfig` generated by bootstrapping would never use a rotated cert. The initial bootstrap certificate is now generated into the cert directory as a PEM file and symlinked to `kubelet-client-current.pem` so that the generated kubeconfig remains valid after rotation.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Timeout on instances.NodeAddresses cloud provider request
**What this PR does / why we need it**:
In cases the cloud provider does not respond before the node gets evicted.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
stop kubelet to cloud provider integration potentially wedging kubelet sync loop
```
The kubelet uses two different locations to store certificates on
initial bootstrap and then on subsequent rotation:
* bootstrap: certDir/kubelet-client.(crt|key)
* rotation: certDir/kubelet-client-(DATE|current).pem
Bootstrap also creates an initial node.kubeconfig that points to the
certs. Unfortunately, with short rotation the node.kubeconfig then
becomes out of date because it points to the initial cert/key, not the
rotated cert key.
Alter the bootstrap code to store client certs exactly as if they would
be rotated (using the same cert Store code), and reference the PEM file
containing cert/key from node.kubeconfig, which is supported by kubectl
and other Go tooling. This ensures that the node.kubeconfig continues to
be valid past the first expiration.
Automatic merge from submit-queue (batch tested with PRs 62780, 62886). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Only count local mounts that are from other pods
**What this PR does / why we need it**:
In GCE, we mount the same local SSD in two different paths (for backwards compatability). This makes the fsGroup conflict check fail because it thinks the 2nd mount is from another pod. For the fsgroup check, we only want to detect if other pods are mounting the same volume, so this PR filters the mount list to only those mounts under "/var/lib/kubelet".
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62867
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 62780, 62886). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Change Capacity log verbosity in status update
*What this PR does / why we need it:*
While in production we noticed that the log verbosity for the Capacity field in the node status was to high.
This log message is called for every device plugin resource at every update.
A proposed solution is to tune it down from V(2) to V(5). In a normal setting you'll be able to see the effect by looking at the node status.
Release note:
```
NONE
```
/sig node
/area hw-accelerators
/assign @vikaschoudhary16 @jiayingz @vishh
Automatic merge from submit-queue (batch tested with PRs 61962, 58972, 62509, 62606). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: move QOSReserved from experimental to alpha feature gate
Fixes https://github.com/kubernetes/kubernetes/issues/61665
**Release note**:
```release-note
The --experimental-qos-reserve kubelet flags is replaced by the alpha level --qos-reserved flag or QOSReserved field in the kubeletconfig and requires the QOSReserved feature gate to be enabled.
```
/sig node
/assign @derekwaynecarr
/cc @mtaufen
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add ut for kuberuntime-gc
**What this PR does / why we need it**:
Add ut for kuberuntime-gc to cover more situations:
1) Add two uncovered cases to test sandbox-gc
(1) When there are more than one exited sandboxes,the older exited sandboxes without containers for existing pods should be garbage collected;
(2) Even though there are more than one exited sandboxes,the older exited sandboxes with containers for existing pods should not be garbage collected.
2) Add one uncovered case to test container-gc
(1) To cover the situation when allSourcesReady is set false;
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
"NONE"
```
This PR unpacks the downloaded ConfigMap to a set of files on the node.
This enables other config files to ride alongside the
KubeletConfiguration, and the KubeletConfiguration to refer to these
cohabitants with relative paths.
This PR also stops storing dynamic config metadata (e.g. current,
last-known-good config records) in the same directory as config
checkpoints. Instead, it splits the storage into `meta` and
`checkpoints` dirs.
Automatic merge from submit-queue (batch tested with PRs 62694, 62569, 62646, 61633, 62433). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Report events to apiserver in local volume plugin.
**What this PR does / why we need it**:
See https://github.com/kubernetes/kubernetes/pull/62102#discussion_r179238429.
Report events to apiserver in local volume plugin.
- Add VolumeHost.GetEventRecorder() method
- Add related e2e tests
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62248
**Special notes for your reviewer**:
Example output of `kubectl describe pods`:
```
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7s default-scheduler Successfully assigned e2e-tests-persistent-local-volumes-test-x4h5x/security-context-670da435-4174-11e8-9098-000c29bb0377 to 127.0.0.1
Warning AlreadyMountedVolume 7s kubelet, 127.0.0.1 The requested fsGroup is 4321, but the volume local-pvfbb76 has GID 1234. The volume may not be shareable.
Normal SuccessfulMountVolume 7s kubelet, 127.0.0.1 MountVolume.SetUp succeeded for volume "default-token-996xr"
Normal SuccessfulMountVolume 7s kubelet, 127.0.0.1 MountVolume.SetUp succeeded for volume "local-pvfbb76"
Normal Pulled 6s kubelet, 127.0.0.1 Container image "k8s.gcr.io/busybox:1.24" already present on machine
Normal Created 6s kubelet, 127.0.0.1 Created container
Normal Started 6s kubelet, 127.0.0.1 Started container
```
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 62448, 59317, 59947, 62418, 62352). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: add configuration to optionally enable server tls bootstrap
right now if the RotateKubeletServerCertificate feature is enabled,
kubelet will bootstrap server tls. this is undesirable if the deployment
is not or cannot run an approver to handle these certificate signing
requests.
Fixes https://github.com/kubernetes/kubernetes/issues/62077
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 62448, 59317, 59947, 62418, 62352). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix assert.Equal argument order
Reference:
https://godoc.org/github.com/stretchr/testify/assert#Equal
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
right now if the RotateKubeletServerCertificate feature is enabled,
kubelet will bootstrap server tls. this is undesirable if the deployment
is not or cannot run an approver to handle these certificate signing
requests.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use glog.Infof instead of glog.Info in volumn_host
**What this PR does / why we need it**:
use glog.Infof instead of glog.Info
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60476, 62462, 61391, 62535, 62394). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Private mount propagation
This PR changes the default mount propagation from "rslave" (newly added in 1.10) to "private" (default in 1.9 and before). "rslave" as default causes regressions, see below.
Value `"None"` has to be added to `MountPropagationMode` enum in API ("I don't want any propagation at all"), which translates to "private" on Linux. [We did not have use cases for it](https://github.com/kubernetes/community/pull/659#discussion_r131454319), but we have them now.
**Which issue(s) this PR fixes**
Fixes#62397, fixes#62396
**Special notes for your reviewer**:
CRI already has an option for private mount propagation in volumes, however it's called "PRIVATE", while Kubernetes API value is "None". I did not change PRIVATE to NONE to keep the interface stable. See `kubelet_pods.go`.
**Release note**:
```release-note
Default mount propagation has changed from "HostToContainer" ("rslave" in Linux terminology) to "None" ("private") to match the behavior in 1.9 and earlier releases. "HostToContainer" as a default caused regressions in some pods.
```
/sig storage
/sig node
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
PSP: move internal types to policy API group
**What this PR does / why we need it**:
This is a part of the PSP migration from extensions to policy API group. This PR moves internal types to the its final destination.
**Which issue(s) this PR fixes**:
Addressed to https://github.com/kubernetes/features/issues/5
A flaky test exposed a race condition where shutting down one server
instance broke the startup of the next instance when using the same
socket path. Commit 1325c2f8be removed the reuse of the same socket
path and thus avoided the issue.
But the real fix is to ensure that the listening socket is really
closed once Stop returns. Two solutions were proposed in
https://github.com/grpc/grpc-go/issues/1861:
- waiting for the goroutine to complete
- closing the socket
The former is done here because it's cleaner to not keep lingering
goroutines. While at it, the Stop methods are made idempotent (similar
to e.g. Close on a socket) and no longer crash when called without
prior Start.
Fixes https://github.com/kubernetes/kubernetes/issues/59488
Automatic merge from submit-queue (batch tested with PRs 62455, 62465, 62427, 62416, 62411). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kuberuntime: logs: reduce logging level on waitLogs msg
Lots of occurrences of this msg coming from `waitLogs()`:
```
E0411 13:17:04.589338 7645 logs.go:383] Container "4fbf541ed1900c4670216a6a1ecf752cd07ac430f5547c5497fbc4b78e564b78" is not running (state="CONTAINER_EXITED")
E0411 14:02:18.168502 7645 logs.go:383] Container "dba4c535666d05310889965418592727047320743a233e226e2266b399836150" is not running (state="CONTAINER_EXITED")
E0411 14:02:41.342645 7645 logs.go:383] Container "a946289b36fe3c375c29dce020005424f3b980237892253d42b8bd8bfb595756" is not running (state="CONTAINER_EXITED")
E0411 14:02:49.907317 7645 logs.go:383] Container "e1d6014330e7422c03ae6db501d4fb296a4501355517cb60e2f910f54741361d" is not running (state="CONTAINER_EXITED")
```
Added in https://github.com/kubernetes/kubernetes/pull/55140
This message prints whenever something is watching the log when the container dies.
The comment right after the error msg say "this is normal" and thus should not be logged at Error level.
@derekwaynecarr @feiskyer @Random-Liu
In order to make it compilable I had to remove these files manually:
pkg/client/listers/extensions/internalversion/podsecuritypolicy.go
pkg/client/informers/informers_generated/internalversion/extensions/internalversion/podsecuritypolicy.go
pkg/client/clientset_generated/internalclientset/typed/extensions/internalversion/podsecuritypolicy.go
pkg/client/clientset_generated/internalclientset/typed/extensions/internalversion/fake/fake_podsecuritypolicy.go
With CRI, kubelet no longer sets up networking for the pods. The
dockershim package is the rightful owner and the only user of the
newtork package. This change moves the package into dockershim to make
the distinction obvious, and untangles the codebase.
The`network/dns`is kept in the original package since it is only used by
kubelet.
The code was added to support rktnetes and non-CRI docker integrations.
These legacy integrations have already been removed from the codebase.
This change removes the compatibility code existing soley for the
legacy integrations.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Extract validateNodeIP test to node status test file.
The function of `validateNodeIP` is belong to kubelet_node_status,
so the unit test of this function should be in node status test file.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 62192, 61866, 62206, 62360). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove rkt references in the codebase
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 62192, 61866, 62206, 62360). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Make the test TestCRIListPodStats pass for Darwin and Windows
GetPodCgroupNameSuffix is only implemented for Linux, which mean
that CPU and Memory stats are only available on Linux.
My fix to make the test pass on other OS:es than Linux
is to just check CPU and Memory stats on Linux.
(This is similar to #57637 which fixed the same problem for the
test TestCadvisorListPodStats.)
**What this PR does / why we need it**:
To make all unit tests pass on macOS/Darwin
**Which issue(s) this PR fixes**:
Fixes#62177
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59027, 62333, 57661, 62086, 61584). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add UT case to cover the func legacyLogSymlink in legacy.go
**What this PR does / why we need it**:
Add UT case to cover the func legacyLogSymlink in legacy.go.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
"NONE"
```
Automatic merge from submit-queue (batch tested with PRs 61549, 62230, 62055, 61082, 62212). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Setup default cni dir correctly
**What this PR does / why we need it**:
Kubelet failed to set up pod with error: failed to find plugin "loopback" in path []. This only happens when kubelet's --cni-bin-dir not set. This PR fixes this.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62054
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add volume spec to mountedPod in actual state of world
Add volume spec into mountedPod data struct in the actual state of the
world.
Fixes issue #61248
Automatic merge from submit-queue (batch tested with PRs 61147, 62236, 62018). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix local volume absolute path issue on Windows
**What this PR does / why we need it**:
remove IsAbs validation on local volume since it does not work on windows cluster, Windows absolute path `D:` is not allowed in local volume, the [validation](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/validation/validation.go#L1386) happens on both master and agent node, while for windows cluster, the master is Linux and agent is Windows, so `path.IsAbs()` func will not work all in both nodes.
**Instead**, this PR use `MakeAbsolutePath` func to convert `local.path` value in kubelet, it supports both linux and windows styple.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62016
**Special notes for your reviewer**:
**Release note**:
```
fix local volume absolute path issue on Windows
```
/sig storage
/sig windows
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove the workaround of heapster panic
**What this PR does / why we need it**:
In #55213, we merged a work around for heapster panic #54962. Heapster has been upgraded to v1.5.2 in #61396, this PR removes the workaroud.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#55280
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Cleanup the use of ExternalID as it is deprecated
The patch removes ExternalID usage from node_controller
and node_lifecycle_oontroller. The code instead uses InstanceID
which returns the cloud provider ID as well.
fixes#60466
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Node status be more verbose
**What this PR does / why we need it**:
Improve logging ability of node status so it is easier to debug update of a node status
```release-note
NONE
```
use MakeAbsolutePath to convert path in Windows
fix test error: allow relative path for local volume
fix comments
fix comments and add windows unit tests
GetPodCgroupNameSuffix is only implemented for Linux, which mean
that CPU and Memory stats are only available on Linux.
My fix to make the test pass on other OS:es than Linux
is to just check CPU and Memory stats on Linux.
(This is similar to #57637 which fixed the same problem for the
test TestCadvisorListPodStats.)
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Ensure /etc/hosts has a header always - Fix conformance test
**What this PR does / why we need it**:
We need to be able to tell if an /etc/hosts in a container has been touched by kubernetes or not (whether we use the host network or not).
We have 2 scenarios where we copy /etc/hosts
- with host network (we just copy the /etc/hosts from node)
- without host network (create a fresh /etc/hosts from pod info)
We are having trouble figuring out whether a /etc/hosts in a
pod/container has been "fixed-up" or not. And whether we used
host network or a fresh /etc/hosts in the various ways we start
up the tests which are:
- VM/box against a remote cluster
- As a container inside the k8s cluster
- DIND scenario in CI where test runs inside a managed container
Please see previous mis-guided attempt to fix this problem at
ba20e63446 In this commit we revert
the code from there as well.
So we should make sure:
- we always add a header if we touched the file
- we add slightly different headers so we can figure out if we used the
host network or not.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60938
**Special notes for your reviewer**:
Also see
- https://github.com/kubernetes/kubernetes/pull/61405
- https://github.com/kubernetes/kubernetes/pull/60939
- https://github.com/kubernetes/kubernetes/issues/60938
**Release note**:
```release-note
NONE
```
We have 2 scenarios where we copy /etc/hosts
- with host network (we just copy the /etc/hosts from node)
- without host network (create a fresh /etc/hosts from pod info)
We are having trouble figuring out whether a /etc/hosts in a
pod/container has been "fixed-up" or not. And whether we used
host network or a fresh /etc/hosts in the various ways we start
up the tests which are:
- VM/box against a remote cluster
- As a container inside the k8s cluster
- DIND scenario in CI where test runs inside a managed container
Please see previous mis-guided attempt to fix this problem at
ba20e63446 In this commit we revert
the code from there as well.
So we should make sure:
- we always add a header if we touched the file
- we add slightly different headers so we can figure out if we used the
host network or not.
Update the test case to inject /etc/hosts from node to another path
(/etc/hosts-original) as well and use that to compare.
Automatic merge from submit-queue (batch tested with PRs 61498, 62030). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Delete in-tree support for NVIDIA GPUs.
This removes the alpha Accelerators feature gate which was deprecated in 1.10 (#57384).
The alternative feature DevicePlugins went beta in 1.10 (#60170).
Fixes#54012
```release-note
Support for "alpha.kubernetes.io/nvidia-gpu" resource which was deprecated in 1.10 is removed. Please use the resource exposed by DevicePlugins instead ("nvidia.com/gpu").
```
Automatic merge from submit-queue (batch tested with PRs 60599, 61819). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix format
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
The patch removes ExternalID usage from node_controller
and node_lifecycle_oontroller. The code instead uses InstanceID
which returns the cloud provider ID as well.
Automatic merge from submit-queue (batch tested with PRs 61929, 61965). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix dockershim CreateContainer error handling.
Found this bug in CRI validation test https://github.com/kubernetes-incubator/cri-tools/pull/282.
In https://github.com/kubernetes/kubernetes/pull/52077, we expect container creation to return error if `RunAsGroup` is specified without `RunAsUser` or `RunAsUsername`. However, the error returned is not handled.
@krmayankk This is only a corner case. Does this worth cherry-pick into 1.10?
@kubernetes/sig-node-bugs
Signed-off-by: Lantao Liu <lantaol@google.com>
**Release note**:
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 61894, 61369). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Expose kubelet health checks using new prometheus endpoint
**What this PR does / why we need it**:
Expose the results of kubelet liveness and readiness probes through a new endpoint on the kubelet called /containerHealth. This endpoint will expose a Prometheus metric. Below is a snippet of output when that endpoint is queried.
```
rramkumar@e2e-test-rramkumar-master ~ $ curl localhost:10255/metrics/probes
# HELP prober_probe_result The result of a liveness or readiness probe for a container.
# TYPE prober_probe_result gauge
prober_probe_result{container_name="kube-apiserver",namespace="kube-system",pod_name="kube-apiserver-e2e-test-rramkumar-master",pod_uid="949e11ad296ad9e3c842fd900f8cc723",probe_type="Liveness"} 0
prober_probe_result{container_name="kube-controller-manager",namespace="kube-system",pod_name="kube-controller-manager-e2e-test-rramkumar-master",pod_uid="0abfc37840bba279706ec39ae53a924c",probe_type="Liveness"} 0
prober_probe_result{container_name="kube-scheduler",namespace="kube-system",pod_name="kube-scheduler-e2e-test-rramkumar-master",pod_uid="0cd4171f9c806808291e6e24f99f0454",probe_type="Liveness"} 0
prober_probe_result{container_name="l7-lb-controller",namespace="kube-system",pod_name="l7-lb-controller-v0.9.8-alpha.2-e2e-test-rramkumar-master",pod_uid="968c792f4c1772566c71403dca2407f9",probe_type="Liveness"} 0
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58235
**Release note**:
```release-note
Kubelet now exposes a new endpoint /metrics/probes which exposes a Prometheus metric containing the liveness and/or readiness probe results for a container.
```
Automatic merge from submit-queue (batch tested with PRs 61894, 61369). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use range in loops; misc fixes
**What this PR does / why we need it**:
It is cleaner to use `range` in for loops to iterate over channel until it is closed.
**Release note**:
```release-note
NONE
```
/kind cleanup
Automatic merge from submit-queue (batch tested with PRs 54997, 61869, 61816, 61909, 60525). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
certs: only append locally discovered addresses when we get none from the cloudprovider
The cloudprovider is right, and only cloudprovider addresses can be verified centrally, so don't add any extra when we have them.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
CRI: define the mount behavior when host path does not exist
**What this PR does / why we need it**:
This PR defines the mounting behavior when host path does not exist in CRI. Specifically,
- If the hostPath doesn't exist (e.g. hostPath volume), runtimes should report errors
- If the specified hostPath is a symlink, runtimes should follow the symlink and mount the real destination to the container
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#52318
**Special notes for your reviewer**:
**Release note**:
```release-note
CRI: define the mount behavior when host path does not exist: runtime should report error if the host path doesn't exist
```
Automatic merge from submit-queue (batch tested with PRs 57658, 61304, 61560, 61859, 61870). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
certs: exclude more nonsensical addresses from SANs
I noticed this when I saw 169.254.* SANs using server TLS bootstrap.
This change excludes more nonsensical addresses from being requested as
SANs in that flow.
Automatic merge from submit-queue (batch tested with PRs 61904, 61565, 61401, 61432, 61772). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove rktnetes code
**What this PR does / why we need it**:
rktnetes is scheduled to be deprecated in 1.10 (#53601). According to the deprecation policy for beta CLI and flags, we can remove the feature in 1.11.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58721
**Special notes for your reviewer**:
**Release note**:
```release-note
Removed rknetes code, which was deprecated in 1.10.
```
/assign @yujuhong
/hold
Hold until the end of the freeze.
Currently the Windows Server 2016 RTM has no CNI support.
With this commit, the Windows Server 2016 RTM will be able
to use the CNI plugin for networking setup.
This commit also moves some comments to the right place.
Signed-off-by: Alin Balutoiu <abalutoiu@cloudbasesolutions.com>
I noticed this when I saw 169.254.* SANs using server TLS bootstrap.
This change excludes more nonsensical addresses from being requested as
SANs in that flow.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Critical pods shouldn't be restricted to kube-system
**What this PR does / why we need it**:
To make sure that critical pods are not restricted to kube-system namespace.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60596
**Special notes for your reviewer**:
@bsalamat @liggitt @aveshagarwal - Can we hold this till we merge quota restriction PR #57963.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60166, 61706, 61769). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use status.Errorf instead of Deprecated func grpc.Errorf
**What this PR does / why we need it**:
```
// Deprecated; use status.Errorf instead.
func Errorf(c codes.Code, format string, a ...interface{}) error {
return status.Errorf(c, format, a...)
}
```
func grpc.Errorf will be deprecated
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
use status.Errorf instead of Deprecated func grpc.Errorf
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
rktnetes is scheduled to be deprecated in 1.10 (#53601). According to
the deprecation policy for beta CLI and flags, we can remove the feature
in 1.11.
Fixes#58721
Automatic merge from submit-queue (batch tested with PRs 60519, 61099, 61218, 61166, 61714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Automatically add system critical priority classes at cluster boostrapping
**What this PR does / why we need it**:
We had two PriorityClasses that were hardcoded and special cased in our code base. These two priority classes never existed in API server. Priority admission controller had code to resolve these two names. This PR removes the hardcoded PriorityClasses and adds code to create these PriorityClasses automatically when API server starts.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60178
ref/ #57471
**Special notes for your reviewer**:
**Release note**:
```release-note
Automatically add system critical priority classes at cluster boostrapping.
```
/sig scheduling
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
move the const to the place it should be
**What this PR does / why we need it**:
move the const to the place it should be
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix `PodScheduled` bug for static pod.
Fixes https://github.com/kubernetes/kubernetes/issues/60589.
This is an implementation of option 2 in https://github.com/kubernetes/kubernetes/issues/60589#issuecomment-375103979.
I've validated this in my own cluster, and there won't be continuously status update for static pod any more.
Signed-off-by: Lantao Liu <lantaol@google.com>
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
none
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Replace package "golang.org/x/net/context" with "context"
**What this PR does / why we need it**:
Replace package "golang.org/x/net/context" with "context"
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60560
**Special notes for your reviewer**:
As of Go 1.7 this package(golang.org/x/net/context) is available in the standard library under the name context. see (https://godoc.org/golang.org/x/net/context)
It is almost machinery replace.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61453, 61393, 61379, 61373, 61494). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use inner volume name instead of outer volume name for subpath directory
**What this PR does / why we need it**:
Fixes volume reconstruction for PVCs with subpath
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#61372
**Special notes for your reviewer**:
**Release note**:
```release-note
ACTION REQUIRED: In-place node upgrades to this release from versions 1.7.14, 1.8.9, and 1.9.4 are not supported if using subpath volumes with PVCs. Such pods should be drained from the node first.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix some unhandled errors, ineffectual assignments, and misspellings
What this PR does / why we need it:
When I browsed the source code under the package, i found some variables have been defined, but not be used, so i changed it! At the same time, a spelling mistake has been found, thank you!
Automatic merge from submit-queue (batch tested with PRs 61487, 58353, 61078, 61219, 60792). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove dead code in kubelet
clean up dead code
/kind cleanup
/sig node
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove YEAR field of all generated files and fix kubernetes boilerplate checker
**What this PR does / why we need it**:
Remove YEAR field of all generated files and fix kubernetes boilerplate checker
xref: [remove YEAR fileds in gengo #91](https://github.com/kubernetes/gengo/pull/91)
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes [#gengo/issues/24](https://github.com/kubernetes/gengo/issues/24)
**Special notes for your reviewer**:
/cc @thockin @lavalamp @sttts
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61124, 59537, 61235, 61258, 61114). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix todo:add function getFailContainer to report which containers failed the pod
**What this PR does / why we need it**:
fix todo:add function getFailContainer to report which containers failed the pod in runonce.go
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60898, 60912, 60753, 61002, 60796). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix TODO: move openHostPorts and closeHostPorts into a common struct and add UTs
**What this PR does / why we need it**:
* Fix [TODO](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/network/hostport/hostport.go#L132): move openHostPorts and closeHostPorts into a common struct, and eliminate the `hostportOpener` parameter in openHostPorts(), to make them looks more consistent.
* Add UTs for closeHostPorts.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60759, 60531, 60923, 60851, 58717). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Do not create dangling legacy symlink
Do not create dangling legacy symlink if the new symlink to container logs does not exist.
These dangling legacy symlink are later removed by kube runtime gc, so it's better if we do not
create them in the first place to avoid unnecessary work from kube runtime gc. This situation occurs when docker uses journald logging driver.
**What this PR does / why we need it**:
This PR fixes an issue where dangling symlink are being created.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None.
```
@derekwaynecarr @sjenning @dashpole @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 60759, 60531, 60923, 60851, 58717). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Subtract inactive_file from usage when setting memcg threshold
**What this PR does / why we need it**:
Implements solution for #51745, proposed in https://github.com/kubernetes/community/pull/1451.
This is a prerequisite to fixing https://github.com/kubernetes/kubernetes/issues/57901.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#51745
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/sig node
/priority important-longterm
/kind bug
/assign @sjenning @derekwaynecarr @dchen1107
Automatic merge from submit-queue (batch tested with PRs 60574, 60666, 60831, 60877, 60357). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix visible typo
fix visible typo
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60363, 59208, 59465, 60581, 60702). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
log an error message when imageToRuntimeAPIImage failed
**What this PR does / why we need it**:
fix todo: log an error message when imageToRuntimeAPIImage failed
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59740, 59728, 60080, 60086, 58714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: make --cni-bin-dir accept a comma-separated list of CNI plugin directories
Allow CNI-related network plugin drivers (kubenet, cni) to search a list of
directories for plugin binaries instead of just one. This allows using an
administrator-provided path and fallbacks to others (like the previous default
of /opt/cni/bin) for backwards compatibility.
```release-note
kubelet's --cni-bin-dir option now accepts multiple comma-separated CNI binary directory paths, which are search for CNI plugins in the given order.
```
@kubernetes/rh-networking @kubernetes/sig-network-misc @freehan @pecameron @rajatchopra
Automatic merge from submit-queue (batch tested with PRs 59740, 59728, 60080, 60086, 58714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fixes document grammar
**What this PR does / why we need it**:
ATT
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 51423, 53880). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Disable ImageGC when high threshold is set to 100
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes#51268
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61203, 61071). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix issue with race condition during pod deletion
This PR fixes two issues
1. When desired_state_populator removes podvolume state, it should check
whether the actual state already has the volume before deleting it to
make sure actual state has a chance to add the volume into the state
2. When checking podVolume still exists, it not only checks the actual
state, but also the volume disk directory because actual state might not
reflect the real world when kubelet starts.
fixes issue #60645
This PR fixes two issues
1. When desired_state_populator removes podvolume state, it should check
whether the actual state already has the volume before deleting it to
make sure actual state has a chance to add the volume into the state
2. When checking podVolume still exists, it not only checks the actual
state, but also the volume disk directory because actual state might not
reflect the real world when kubelet starts.
Automatic merge from submit-queue (batch tested with PRs 60888, 61225). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Mark reconstructed volumes as reported InUse
When a newly started kubelet finds a directory where a volume should be,
it can be fairly confident that the volume was mounted by previous kubelet
and therefore the volume must have been in node.status.volumesInUse.
Therefore we can mark reconstructed volumes as already reported so
subsequent reconcile() can fix the directory and put the mounted volume
into actual state of world.
Fixes: #60645
**Release note**:
```release-note
NONE
```
/sig storage
/sig node
cc: @gnufied @jingxu97
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add missing container-runtime "remote" option
**What this PR does / why we need it**:
Added the "remote" option to the auto-generated documentation for the
`--container-runtime` flag.
The kubelet flag `--container-runtime` lists the possible values as part of the auto-generated documentation but is missing the "remote" possibility.
**Which issue(s) this PR fixes** :
Fixes#60992
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixes the races around devicemanager Allocate() and endpoint deletion.
There is a race in predicateAdmitHandler Admit() that getNodeAnyWayFunc()
could get Node with non-zero deviceplugin resource allocatable for a
non-existing endpoint. That race can happen when a device plugin fails,
but is more likely when kubelet restarts as with the current registration
model, there is a time gap between kubelet restart and device plugin
re-registration. During this time window, even though devicemanager could
have removed the resource initially during GetCapacity() call, Kubelet
may overwrite the device plugin resource capacity/allocatable with the
old value when node update from the API server comes in later. This
could cause a pod to be started without proper device runtime config set.
To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint but set stopTime in
its endpoint. During kubelet restart, create endpoints with stopTime set
for any checkpointed registered resource. The endpoint is considered to be
in stopGracePeriod if its stoptime is set. This allows us to track what
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through API server) if there is such
use case. Currently endpointStopGracePeriod is set as 5 minutes.
Given that an endpoint is no longer immediately removed upon disconnection,
mark all its devices unhealthy so that we can signal the resource allocatable
change to the scheduler to avoid scheduling more pods to the node.
When a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail admission handler.
Tested:
Ran GPUDevicePlugin e2e_node test 100 times and all passed now.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/60176
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixes the races around devicemanager Allocate() and endpoint deletion.
```
There is a race in predicateAdmitHandler Admit() that getNodeAnyWayFunc()
could get Node with non-zero deviceplugin resource allocatable for a
non-existing endpoint. That race can happen when a device plugin fails,
but is more likely when kubelet restarts as with the current registration
model, there is a time gap between kubelet restart and device plugin
re-registration. During this time window, even though devicemanager could
have removed the resource initially during GetCapacity() call, Kubelet
may overwrite the device plugin resource capacity/allocatable with the
old value when node update from the API server comes in later. This
could cause a pod to be started without proper device runtime config set.
To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint but set stopTime in
its endpoint. During kubelet restart, create endpoints with stopTime set
for any checkpointed registered resource. The endpoint is considered to be
in stopGracePeriod if its stoptime is set. This allows us to track what
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through API server) if there is such
use case. Currently endpointStopGracePeriod is set as 5 minutes.
Given that an endpoint is no longer immediately removed upon disconnection,
mark all its devices unhealthy so that we can signal the resource allocatable
change to the scheduler to avoid scheduling more pods to the node.
When a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail admission handler.
When a newly started kubelet finds a directory where a volume should be,
it can be fairly confident that the volume was mounted by previous kubelet
and therefore the volume must have been in node.status.volumesInUse.
Therefore we can mark reconstructed volumes as already reported so
subsequent reconcile() can fix the directory and put the mounted volume
into actual state of world.
These dangling legacy symlink are removed by kube runtime gc, so it's better if we do not
create them in the first place to avoid unnecessary work from kube runtime gc.
Users must not be allowed to step outside the volume with subPath.
Therefore the final subPath directory must be "locked" somehow
and checked if it's inside volume.
On Windows, we lock the directories. On Linux, we bind-mount the final
subPath into /var/lib/kubelet/pods/<uid>/volume-subpaths/<container name>/<subPathName>,
it can't be changed to symlink user once it's bind-mounted.
Automatic merge from submit-queue (batch tested with PRs 60159, 60731, 60720, 60736, 60740). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Promote LocalStorageCapacityIsolation feature to beta
The LocalStorageCapacityIsolation feature added a new resource type ResourceEphemeralStorage "ephemeral-storage" so that this resource can be allocated, limited, and consumed as the same way as CPU/memory. All the features related to resource management (resource request/limit, quota, limitrange) are available for local ephemeral storage.
This local ephemeral storage represents the storage for root file system, which will be consumed by containers' writtable layer and logs. Some volumes such as emptyDir might also consume this storage.
Fixes issue #60160
This PR also fixes data race issues discovered after open the feature gate. Basically setNodeStatus function in kubelet could be called by multiple threads so the data needs lock protection. Put the fix with this PR for easy testing.
**Release note**:
```release-note
ACTION REQUIRED: LocalStorageCapacityIsolation feature is beta and enabled by default.
```
The LocalStorageCapacityIsolation feature added a new resource type
ResourceEphemeralStorage "ephemeral-storage" so that this resource can
be allocated, limited, and consumed as the same way as CPU/memory. All
the features related to resource management (resource request/limit, quota, limitrange) are avaiable for local ephemeral storage.
This local ephemeral storage represents the storage for root file system, which will be consumed by containers' writtable layer and logs. Some volumes such as emptyDir might also consume this storage.
Allow CNI-related network plugin drivers (kubenet, cni) to search a list of
directories for plugin binaries instead of just one. This allows using an
administrator-provided path and fallbacks to others (like the previous default
of /opt/cni/bin) for backwards compatibility.
It's only used for the test code and after talking with Rajat, the
vendor stuff was never really used anyway. So convert the vendor
code into a plain array of plugin binary search paths, which is all
the vendor code was doing anyway.
Automatic merge from submit-queue (batch tested with PRs 58171, 58036, 60540). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
dockershim: Return Labels as Info in ImageStatus.
c6ddc749e8 added an Info field to
ImageStatusResponse when Verbose is true. This makes the image's
Labels available in that field, rather than unconditionally returning
an empty map.
**What this PR does / why we need it**:
This PR exposes an image's `Labels` through the CRI. In particular, I want this so I can write an `ImageService` wrapper that delegates all operations to a real `ImageService` but also, when the right `Labels`, ensures any needed [nix store](https://nixos.org/nix/) paths are present on the system when an image is pulled, enabling users to use nix for package distribution while still using containers for isolation and kubernetes for orchestration. In general, though, this should be useful for anything that wants to know about an image's `Labels`
**Special notes for your reviewer**:
I'd prefer to put this change into the `Image` protobuf type instead of putting it into `Info` (gated by `Verbose` or not, available in other requests like `ListImages` or not), but that would be a change to the protocol and it seems `Info` was introduced exactly for this purpose. If it's acceptable to put this into `Image`, I'll rework this.
If this change is acceptable, I will also do the work for `cri-o`, `rktlet`, `frakti`, and `cri-containerd` where applicable.
I have started the process for my employer to sign on to the CLA. I don't have reason to expect it to take long, but because there is more work to do if this change is desired I'd prefer if we can start review before that is completed.
**Release note**:
```release-note
dockershim now makes an Image's Labels available in the Info field of ImageStatusResponse
```
Automatic merge from submit-queue (batch tested with PRs 60342, 60505, 59218, 52900, 60486). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixed log calls in VolumeManager
Use glog.Infof() instead of glog.Info()
**Release note**:
```release-note
NONE
```
/sig storage
/sig node
Automatic merge from submit-queue (batch tested with PRs 60376, 55584, 60358, 54631, 60291). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix "make test"
Before this pr, we get this in linux:
```
$ make test
Running tests for APIVersion: v1,admissionregistration.k8s.io/v1alpha1,admissionregistration.k8s.io/v1beta1,admission.k8s.io/v1beta1,apps/v1beta1,apps/v1beta2,apps/v1,authentication.k8s.io/v1,authentication.k8s.io/v1beta1,authorization.k8s.io/v1,authorization.k8s.io/v1beta1,autoscaling/v1,autoscaling/v2beta1,batch/v1,batch/v1beta1,batch/v2alpha1,certificates.k8s.io/v1beta1,extensions/v1beta1,events.k8s.io/v1beta1,imagepolicy.k8s.io/v1alpha1,networking.k8s.io/v1,policy/v1beta1,rbac.authorization.k8s.io/v1,rbac.authorization.k8s.io/v1beta1,rbac.authorization.k8s.io/v1alpha1,scheduling.k8s.io/v1alpha1,settings.k8s.io/v1alpha1,storage.k8s.io/v1beta1,storage.k8s.io/v1,storage.k8s.io/v1alpha1,
+++ [0224 16:10:13] Running tests without code coverage
can't load package: package k8s.io/kubernetes/pkg/kubelet/winstats: build constraints exclude all Go files in /home/fujitsu/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/winstats
!!! [0224 16:10:15] Call tree:
!!! [0224 16:10:15] 1: hack/make-rules/test.sh:402 runTests(...)
Makefile:182: recipe for target 'test' failed
make: *** [test] Error 1
```
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60376, 55584, 60358, 54631, 60291). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
move pod-check forward
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60236, 60332, 57375, 60451, 57408). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
improve code comment
**What this PR does / why we need it**:
improve code comment
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```NONE
```
Automatic merge from submit-queue (batch tested with PRs 60236, 60332, 57375, 60451, 57408). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kube-scheduler: Support extender managed extended resources in kube-scheduler
**What this PR does / why we need it**:
This is the continuation of https://github.com/kubernetes/kubernetes/pull/58851.
- This PR adds new extender configurations in scheduler policy config.
- A set of extended resource names can be specified in an extender config. They are the resources that are managed by the extender. The scheduler will only send pods to the extender if the pod requests at least one of the extended resources in the set.
- An `IgnoredByScheduler` flag can be set along with each of such resources. If this flag is set to true, the scheduler will not check the resource in the `PodFitsResources` predicate.
- This PR also changes the default behavior of the `PodFitsResources` predicate. Now, by default, `PodFitsResources` will ignore the extended resources that are not in node status. This is required to support extender managed extended resources (including cluster-level resources) on node. Note that in kube-scheduler we override the default behavior by not ignoring such missing extended resources.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/53616https://github.com/kubernetes/kubernetes/issues/58850
**Special notes for your reviewer**:
**Release note**:
```
Support extender managed extended resources in kube-scheduler
```
Automatic merge from submit-queue (batch tested with PRs 53689, 56880, 55856, 59289, 60249). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add test/typecheck, a fast typecheck for all build platforms.
Add test/typecheck, a fast typecheck for all build platforms.
Most of the time spent compiling is spent optimizing and linking
binary code. Most errors occur at the syntax or semantic (type) layers.
Go's compiler is importable as a normal package, so we can do fast
syntax and type checking for the 10 platforms we build on.
This currently takes ~6 minutes of CPU time (parallelized).
This makes presubmit cross builds superfluous, since it should catch
most cross-build breaks (generally Unix and 64-bit assumptions).
Example output:
```$ time go run test/typecheck/main.go
type-checking: linux/amd64, windows/386, darwin/amd64, linux/arm,
linux/386, windows/amd64, linux/arm64, linux/ppc64le, linux/s390x, darwin/386
ERROR(windows/amd64) pkg/proxy/ipvs/proxier.go:1708:27: ENXIO not declared by package unix
ERROR(windows/386) pkg/proxy/ipvs/proxier.go:1708:27: ENXIO not declared by package unix
real 0m45.083s
user 6m15.504s
sys 1m14.000s
```
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 53689, 56880, 55856, 59289, 60249). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Harden kube-proxy for unmatched IP versions
**What this PR does / why we need it**:
This PR makes kube-proxy omits & logs & emits event for unmatched IP versions configuration (IPv6 address in IPv4 mode or IPv4 address in IPv6 mode).
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#57219
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix the issue in kube-proxy iptables/ipvs mode to properly handle incorrect IP version.
```
Automatic merge from submit-queue (batch tested with PRs 53689, 56880, 55856, 59289, 60249). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
simplify defer statement
**What this PR does / why we need it**:
simplify defer statement
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: setup WindowsContainerResources for windows containers
**What this PR does / why we need it**:
This PR setups WindowsContainerResources for windows containers. It implements proposal here: https://github.com/kubernetes/community/pull/1510.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56734
**Special notes for your reviewer**:
**Release note**:
```release-note
WindowsContainerResources is set now for windows containers
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add FailedPostStartHook error message.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#54671
**Special notes for your reviewer**:
/cc @derekwaynecarr
cc @lovejoy @OJezu
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60157, 60337, 60246, 59714, 60467). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
backoff runtime errors in kubelet sync loop
The runtime health check can race with PLEG's first relist, and this
often results in an unnecessary 5 second wait during Kubelet bootstrap.
This change aims to improve the performance.
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
code-gen: output golint compliant 'Generated by' comment
New PR instead of reopening #58115 because /reopen did not work.
This won't be ready to merge until the upstream https://github.com/kubernetes/gengo/pull/94 merges. Once that merges, the second commit will be changed to godep-save.sh and update-staging-godeps.sh, and the last commit will be changed to update-all.sh
The failing test is due to the upstream changes not being merged yet
```devel-release-note
Go code generated by the code generators will now have a comment which allows them to be easily identified by golint
```
Fixes#56489
Automatic merge from submit-queue (batch tested with PRs 60011, 59256, 59293, 60328, 60367). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add CPU/Memory pod stats for CRI stats.
For https://github.com/kubernetes/features/issues/286.
Add CPU and memory stats for pod.
@kubernetes/sig-node-pr-reviews
/cc @dashpole @yujuhong @abhi @yguo0905
Signed-off-by: Lantao Liu <lantaol@google.com>
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Summary API will include pod CPU and Memory stats for CRI container runtime.
```
Automatic merge from submit-queue (batch tested with PRs 50724, 59025, 59710, 59404, 59958). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
some typo
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 50724, 59025, 59710, 59404, 59958). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add detailed err in ensure docker process error
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
**What this PR does / why we need it**:
Add detailed error.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use GetUniqueVolumeNameFromSpec instead of implementing it manually for kubelet volume manager
**What this PR does / why we need it**:
`volumeName` is only used for attachable plugin, so we should resolve it inside the `if` statement. Besides, we can use the already exist `GetUniqueVolumeNameFromSpec` mthod instead of invoking `GetVolumeName` and `GetUniqueVolumeName` manually.
**Release note**:
```release-note
NONE
```
/sig storage
/kind cleanup
Automatic merge from submit-queue (batch tested with PRs 60435, 60334, 60458, 59301, 60125). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
dockershim: don't check pod IP in StopPodSandbox
We're about to tear the container down, there's no point. It also suppresses
an annoying error message due to kubelet stupidity that causes multiple
parallel calls to StopPodSandbox for the same sandbox.
docker_sandbox.go:355] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "docker-registry-1-deploy_default": Unexpected command output nsenter: cannot open /proc/22646/ns/net: No such file or directory
1) A first StopPodSandbox() request triggered by SyncLoop(PLEG) for
a ContainerDied event calls into TearDownPod() and thus the network
plugin. Until this completes, networkReady=true for the
sandbox.
2) A second StopPodSandbox() request triggered by SyncLoop(REMOVE)
calls PodSandboxStatus() and calls into the network plugin to read
the IP address because networkReady=true
3) The first request exits the network plugin, sets networReady=false,
and calls StopContainer() on the sandbox. This destroys the network
namespace.
4) The second request finally gets around to running nsenter but
the network namespace is already destroyed. It returns an error
which is logged by getIP().
```release-note
NONE
```
@yujuhong @freehan
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
lowercase node name in generated static pod name
**What this PR does / why we need it**:
Cast appended node name to lowercase when generating static pod name on kubelet starting.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59801
**Special notes for your reviewer**:
Not sure about how to deal with other illegal node names e.g. containing invalid no-alphabetic characters. Maybe just let it fail-hard is not a bad idea.
But considering that containing uppercase letter in the hostname is somehow a usual case even in the production environment of some companies, tolerating uppercase and cast it implicitly should be good.
**Release note**:
```release-note
force node name lowercase on static pod name generating
```
Automatic merge from submit-queue (batch tested with PRs 60346, 60135, 60289, 59643, 52640). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Correct TestUpdatePod comment
Signed-off-by: mYmNeo <thomassong2012@gmail.com>
**What this PR does / why we need it**:
Correct TestUpdatePod comment
**Which issue this PR fixes**
The original one wants to check whether all updates has been caught by podWorker, but podWorker can guarantee only the first event and the last one will processed. Correct the comment if others misunderstand the unit test.
Automatic merge from submit-queue (batch tested with PRs 60346, 60135, 60289, 59643, 52640). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
clean up sysctl code
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60346, 60135, 60289, 59643, 52640). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix freespace for image GC
**What this PR does / why we need it**:
use 'continue' in the loop instead of 'break'
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59159, 60318, 60079, 59371, 57415). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Made a couple API changes to deviceplugin/v1beta1 to avoid future
incompatible API changes:
- Add GetDevicePluginOptions rpc call. This is needed when we switch
from Registration service to probe-based plugin watcher.
- Change AllocateRequest and AllocateResponse to allow device requests
from multiple containers in a pod. Currently only made mechanical
change on the devicemanager and test code to cope with the API but
still issues an Allocate call per container. We can modify the
devicemanager in 1.11 to issue a single Allocate call per pod.
The change will also facilitate incremental API change to communicate
pod level information through Allocate rpc if there is such future
need.
**What this PR does / why we need it**:
Made a couple API changes to deviceplugin/v1beta1 to avoid future incompatible API changes.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/59370
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 60324, 60269, 59771, 60314, 59941). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
expunge the word 'manifest' from Kubelet's config API
The word 'manifest' technically refers to a container-group specification
that predated the Pod abstraction. We should avoid using this legacy
terminology where possible. Fortunately, the Kubelet's config API will
be beta in 1.10 for the first time, so we still had the chance to make
this change.
I left the flags alone, since they're deprecated anyway.
I changed a few var names in files I touched too, but this PR is the
just the first shot, not the whole campaign
(`git grep -i manifest | wc -l -> 1248`).
```release-note
Some field names in the Kubelet's now v1beta1 config API differ from the v1alpha1 API: PodManifestPath is renamed to PodPath, ManifestURL is renamed to PodURL, ManifestURLHeader is renamed to PodURLHeader.
```
Automatic merge from submit-queue (batch tested with PRs 60324, 60269, 59771, 60314, 59941). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Promote configurable pod resolv.conf to Beta and add an e2e test
**What this PR does / why we need it**:
Feature issue: https://github.com/kubernetes/features/issues/504
There is no semantic changes. `CustomPodDNS` feature gate will be turned on by default.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56521
**Special notes for your reviewer**:
/assign @bowei @thockin
**Release note**:
```release-note
Adds BETA support for `DNSConfig` field in PodSpec and `DNSPolicy=None`.
```
Before this pr, we get this in linux:
```
$ make test
Running tests for APIVersion: v1,admissionregistration.k8s.io/v1alpha1,admissionregistration.k8s.io/v1beta1,admission.k8s.io/v1beta1,apps/v1beta1,apps/v1beta2,apps/v1,authentication.k8s.io/v1,authentication.k8s.io/v1beta1,authorization.k8s.io/v1,authorization.k8s.io/v1beta1,autoscaling/v1,autoscaling/v2beta1,batch/v1,batch/v1beta1,batch/v2alpha1,certificates.k8s.io/v1beta1,extensions/v1beta1,events.k8s.io/v1beta1,imagepolicy.k8s.io/v1alpha1,networking.k8s.io/v1,policy/v1beta1,rbac.authorization.k8s.io/v1,rbac.authorization.k8s.io/v1beta1,rbac.authorization.k8s.io/v1alpha1,scheduling.k8s.io/v1alpha1,settings.k8s.io/v1alpha1,storage.k8s.io/v1beta1,storage.k8s.io/v1,storage.k8s.io/v1alpha1,
+++ [0224 16:10:13] Running tests without code coverage
can't load package: package k8s.io/kubernetes/pkg/kubelet/winstats: build constraints exclude all Go files in /home/fujitsu/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/winstats
!!! [0224 16:10:15] Call tree:
!!! [0224 16:10:15] 1: hack/make-rules/test.sh:402 runTests(...)
Makefile:182: recipe for target 'test' failed
make: *** [test] Error 1
```
Automatic merge from submit-queue (batch tested with PRs 60054, 60202, 60219, 58090, 60275). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Disable mount propagation for windows containers
**What this PR does / why we need it**:
Windows containers don't support mount propagation. This PR disables it for windows containers.
Without this PR, windows containers creation would fail with error:
Error: Error response from daemon: invalid bind mount spec "c:\\var\\lib\\kubelet\\pods\\a260a7c4-1852-11e8-bb1d-000d3a19c1da\\volumes\\kubernetes.io~secret\\default-token-rj7qv:c:/var/run/secrets/kubernetes.io/serviceaccount:ro,rslave": invalid volume specification: 'c:\var\lib\kubelet\pods\a260a7c4-1852-11e8-bb1d-000d3a19c1da\volumes\kubernetes.io~secret\default-token-rj7qv:c:\var\run\secrets\kubernetes.io\serviceaccount:ro,rslave'
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60274
**Special notes for your reviewer**:
**Release note**:
```release-note
Disable mount propagation for windows containers.
```
Automatic merge from submit-queue (batch tested with PRs 59286, 59743, 59883, 60190, 60165). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix image file system stats for windows nodes
**What this PR does / why we need it**:
Kubelet is reporting `invalid capacity 0 on image filesystem` on windows nodes and image GC always fails.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59742
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix image file system stats for windows nodes
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Delete stale UDP conntrack entries that use hostPort
**What this PR does / why we need it**:
This PR introduces a change to delete stale conntrack entries for UDP connections, specifically for udp connections that use hostPort. When the pod listening on that udp port get updated/restarted(and gets a new ip address), these entries need to be flushed so that ongoing udp connections can recover once the pod is back and the new iptables rules have been installed.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59033
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
incompatible changes:
- Add GetDevicePluginOptions rpc call. This is needed when we switch
from Registration service to probe-based plugin watcher.
- Change AllocateRequest and AllocateResponse to allow device requests
from multiple containers in a pod. Currently only made mechanical
change on the devicemanager and test code to cope with the API but
still issues an Allocate call per container. We can modify the
devicemanager in 1.11 to issue a single Allocate call per pod.
The change will also facilitate incremental API change to communicate
pod level information through Allocate rpc if there is such future
need.
The word 'manifest' technically refers to a container-group specification
that predated the Pod abstraction. We should avoid using this legacy
terminology where possible. Fortunately, the Kubelet's config API will
be beta in 1.10 for the first time, so we still had the chance to make
this change.
I left the flags alone, since they're deprecated anyway.
I changed a few var names in files I touched too, but this PR is the
just the first shot, not the whole campaign
(`git grep -i manifest | wc -l -> 1248`).
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Critical pod priorityClass addition
**What this PR does / why we need it**:
@bsalamat - Apologies for the delay. This PR is to ensure that all pods with priorityClassName `system-node-critical` and `system-cluster-critical` will be critical pods while preserving backwards compatibility.
**Special notes for your reviewer**:
- Moved some constants and other data structures to scheduler/api/types.go where other constants are present.
- An automatic assignment of critical priorities to pods based on critical pod annotation for backwards compatibility including some unit tests.
xref: https://github.com/kubernetes/kubernetes/issues/57471
**Release note**:
```release-note
Critical pods to use priorityClasses.
```
c6ddc749e8 added an Info field to
ImageStatusResponse when Verbose is true. This makes the image's
Labels available in that field, rather than unconditionally returning
an empty map.
Automatic merge from submit-queue (batch tested with PRs 60106, 59510, 60263, 60063, 59088). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
CodeClean, merge Logf And FailNow to Fatalf
**What this PR does / why we need it**:
Trivial changes to clean code, merge Logf And FailNow to Fatalf.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
"NONE"
```
Automatic merge from submit-queue (batch tested with PRs 60106, 59510, 60263, 60063, 59088). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
clean up KubeletConfigOk condition construction
This PR cleans up the construction of the node condition and also fixes
a small bug where the last transition time could be updated incorrectly
when the sync failure overlay was present.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60106, 59510, 60263, 60063, 59088). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update cadvisor godeps to v0.29.0 and ignore per-cpu metrics
**What this PR does / why we need it**:
Updates the cAdvisor dependency to the cAdvisor release associated with the kubernetes 1.10 release.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60052
**Special notes for your reviewer**:
This PR also adds per-cpu metrics to the ignoreMetrics list. This is a new metric that can be ignored in the most recent cAdvisor release.
The reason for not collecting per-cpu metrics is that it can cause severe scalability issues.
For example, if using a 128 core machine, and running 100 containers, we have 12800 different streams of metrics just for per-cpu metrics which cAdvisor needs to process and transmit.
Additionally, per-cpu metrics are not used by any kubernetes components, and if a user needs these metrics, they can run cAdvisor as a daemonset.
**Release note**:
```release-note
Disable per-cpu metrics by default for scalability.
Fix inaccurate disk usage monitoring of overlayFs.
Retry docker connection on startup timeout to avoid permanent loss of metrics.
```
/assign @dchen1107
Automatic merge from submit-queue (batch tested with PRs 59463, 59719, 60181, 58283, 59966). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Set shared PID namespace mode based on PodSpec
**What this PR does / why we need it**: This PR enables pod process namespace sharing as an alpha feature, as described in [Shared PID Namespace Proposal](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/pod-pid-namespace.md).
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
WIP #1615
**Special notes for your reviewer**:
/assign @dchen1107
**Release note**:
```release-note
When the `PodShareProcessNamespace` alpha feature is enabled, setting `pod.Spec.ShareProcessNamespace` to `true` will cause a single process namespace to be shared between all containers in a pod.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add spelling checking script
**What this PR does / why we need it**:
Add spell checking script to avoid involving any typos.
Currently many small PRs are fixing those annoying typos, which is time-consuming and low efficient. We should add such a preflight check before a PR gets merged.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
/sig testing
/area test-infra
/sig release
/cc @ixdy
/assign @liggitt @smarterclayton
**Release note**:
```release-note
add spelling checking script
```
Automatic merge from submit-queue (batch tested with PRs 60208, 60084, 60183, 59713, 60096). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use SeekStart, SeekCurrent, and SeekEnd repalace of deprecated constant
**What this PR does / why we need it**:
Use SeekStart, SeekCurrent, and SeekEnd repalace of deprecated constant.
'''
// Deprecated: Use io.SeekStart, io.SeekCurrent, and io.SeekEnd.
const (
SEEK_SET int = 0 // seek relative to the origin of the file
SEEK_CUR int = 1 // seek relative to the current offset
SEEK_END int = 2 // seek relative to the end
)
'''
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
This PR cleans up the construction of the node condition and also fixes
a small bug where the last transition time could be updated incorrectly
when the sync failure overlay was present.
The runtime health check can race with PLEG's first relist, and this
often results in an unnecessary 5 second wait during Kubelet bootstrap.
This change aims to improve the performance.
Automatic merge from submit-queue (batch tested with PRs 54191, 59374, 59824, 55032, 59906). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding per container stats for CRI runtimes
**What this PR does / why we need it**
This commit aims to collect per container log stats. The change was proposed as a part of #55905. The change includes change the log path from /var/pod/<pod uid>/containername_attempt.log to /var/pod/<pod uid>/containername/containername_attempt.log. The logs are collected by reusing volume package to collect metrics from the log path.
Fixes#55905
**Special notes for your reviewer:**
cc @Random-Liu
**Release note:**
```
Adding container log stats for CRI runtimes.
```
Automatic merge from submit-queue (batch tested with PRs 58716, 59977, 59316, 59884, 60117). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Cap how long the kubelet waits when it has no client cert
If we go a certain amount of time without being able to create a client
cert and we have no current client cert from the store, exit. This
prevents a corrupted local copy of the cert from leaving the Kubelet in a
zombie state forever. Exiting allows a config loop outside the Kubelet
to clean up the file or the bootstrap client cert to get another client
cert.
Five minutes is a totally arbitary timeout, judged to give enough time for really slow static pods to boot.
@mikedanese
```release-note
Set an upper bound (5 minutes) on how long the Kubelet will wait before exiting when the client cert from disk is missing or invalid. This prevents the Kubelet from waiting forever without attempting to bootstrap a new client credentials.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Invoke preStart RPC call before container start, if desired by plugin
**What this PR does / why we need it**:
1. Adds a new RPC `preStart` to device plugin API
2. Update `Register` RPC handling to receive a flag from the Device plugins as an indicator if kubelet should invoke `preStart` RPC before starting container.
3. Changes in device manager to invoke `preStart` before container start
4. Test case updates
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56943#56307
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
/sig node
/area hw-accelerators
/cc @jiayingz @RenaudWasTaken @vishh @ScorpioCPH @sjenning @derekwaynecarr @jeremyeder @lichuqiang @tengqm
Automatic merge from submit-queue (batch tested with PRs 59901, 59302, 59928). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Rename StorageProtection to StorageObjectInUseProtection
Rename StorageProtection to StorageObjectInUseProtection
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59639
**Special notes for your reviewer**:
**Release note**:
```release-note
Rename StorageProtection to StorageObjectInUseProtection
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump runc to latest and modify test cases for linux cgroup manager.
**What this PR does / why we need it**:
This PR has 2 commits
- Bumps runc to latest and fixes trailing "/" problem in ExpandSlice of runc
- Fixes the cgroup_manager_linux_tests.go test cases to have "/" as prefix.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #59993
**Special notes for your reviewer**:
cc @sjenning @derekwaynecarr
**Release note**:
```release-note
NONE
```
This commit aims to collect per container log stats. The
change was proposed as a part of #55905. The change includes
change of the log path from /var/pod/<pod uid>/containername_attempt.log
to /var/pod/<pod uid>/containername/containername_attempt.log.
The logs are collected by reusing volume package to collect
metrics from the log path.
Signed-off-by: abhi <abhi@docker.com>
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update bazelbuild/rules_go, kubernetes/repo-infra, and gazelle dependencies
**What this PR does / why we need it**: updates our bazelbuild/rules_go dependency in order to bump everything to go1.9.4. I'm separating this effort into two separate PRs, since updating rules_go requires a large cleanup, removing an attribute from most build rules.
**Release note**:
```release-note
NONE
```
According to docker docs, setting MemorySwap equals to Memory can
prevent docker containers from using any swap, instead of setting
MemorySwap to zero.
Automatic merge from submit-queue (batch tested with PRs 59683, 59964, 59841, 59936, 59686). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Reevaluate eviction thresholds after reclaim functions
**What this PR does / why we need it**:
When the node comes under `DiskPressure` due to inodes or disk space, the eviction manager runs garbage collection functions to clean up dead containers and unused images.
Currently, we use the strategy of trying to measure the disk space and inodes freed by garbage collection. However, as #46789 and #56573 point out, there are gaps in the implementation that can cause extra evictions even when they are not required. Furthermore, for nodes which frequently cycle through images, it results in a large number of evictions, as running out of inodes always causes an eviction.
This PR changes this strategy to call the garbage collection functions and ignore the results. Then, it triggers another collection of node-level metrics, and sees if the node is still under DiskPressure.
This way, we can simply observe the decrease in disk or inode usage, rather than trying to measure how much is freed.
**Which issue(s) this PR fixes**:
Fixes#46789Fixes#56573
Related PR #56575
**Special notes for your reviewer**:
This will look cleaner after #57802 removes arguments from [makeSignalObservations](https://github.com/kubernetes/kubernetes/pull/57802/files#diff-9e5246d8c78d50ce4ba440f98663f3e9R719).
**Release note**:
```release-note
NONE
```
/sig node
/kind bug
/priority important-soon
cc @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve comments for kubelet
**What this PR does / why we need it**:
Improve comments and fix typos for kubelet.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Collect ephemeral storage capacity on initialization
**What this PR does / why we need it**:
We have had some node e2e flakes where a pod can be rejected if it requests ephemeral storage. This is because we don't set capacity and allocatable for ephemeral storage on initialization.
This PR causes cAdvisor to do one round of stats collection during initialization, which will allow it to get the disk capacity when it first sets the node status.
It also sets the node to NotReady if capacities have not been initialized yet.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @jingxu97 @Random-Liu
/sig node
/kind bug
/priority important-soon
Automatic merge from submit-queue (batch tested with PRs 57136, 59920). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Updated PID pressure node condition.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #54313
**Release note**:
```release-note
Updated PID pressure node condition
```
Automatic merge from submit-queue (batch tested with PRs 59873, 59933, 59923, 59944, 59953). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix pod scheduled.
Fix `PodScheduled` condition.
The test `[k8s.io] EquivalenceCache [Serial] validates pod affinity works properly when new replica pod is scheduled` for cri-containerd is flaky.
The reason is that it assume all existing pods should have `PodScheduled` condition, but it is not the case:
```
Feb 15 15:31:01.359: INFO: with-label-390d246e-1265-11e8-beb8-0a580a3c7b55 bootstrap-e2e-minion-group-l6qw Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 15:30:59 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 15:31:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 15:30:59 +0000 UTC }]
Feb 15 15:31:01.359: INFO: calico-node-7mzxc bootstrap-e2e-minion-group-hztx Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 14:17:05 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 14:17:59 +0000 UTC }]
Feb 15 15:31:01.359: INFO: calico-node-kvrsx bootstrap-e2e-minion-group-l6qw Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 15:24:54 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-02-15 15:25:20 +0000 UTC }]
Feb 15 15:31:01.359: INFO: calico-node-llwjh
```
I'm not sure why this doesn't happen to docker. One theory is that we don't prepull image in cri-containerd, and we do start pod a bit faster for cri-containerd, and that exposes the race condition.
/cc @kubernetes/sig-node-bugs
Signed-off-by: Lantao Liu <lantaol@google.com>
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 59873, 59933, 59923, 59944, 59953). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Rework volume manager log levels
- all normal logs to go to level 4
- too frequent / duplicate logs go to level 5 (e.g. when something else logged similar message not too far away).
I checked that there is no excessive spam in the log - reconciler runs every 100ms, but it does not log anything if there is nothing to do.
**What this PR does / why we need it**:
This will help us debug flakes. E2e tests do not log levels 10-12 used in volume manager
**Release note**:
```release-note
NONE
```
/sig storage
/sig node
cc: @jingxu97 @sjenning
Automatic merge from submit-queue (batch tested with PRs 59873, 59933, 59923, 59944, 59953). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix DownwardAPI refresh race.
WaitForAttachAndMount should mark only pod in DesiredStateOfWorldPopulator (DSWP) and DSWP should mark the volume to be remounted only when the new pod has been processed.
Otherwise DSWP and reconciler race who gets the new pod first. If it's reconciler, then DownwardAPI and Projected volumes of the pod are not refreshed with new content and they are updated after the next periodic sync (60-90 seconds).
Fixes#59813
/assign @jingxu97 @saad-ali
/sig storage
/sig node
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix kubelet PVC stale metrics
**What this PR does / why we need it**:
Volumes on each node changes, we should not only add PVC metrics into
gauge vector. It's better use a collector to collector metrics from internal
stats.
Currently, if a PV (bound to a PVC `testpv`) is attached and used by node A, then migrated to node B or just deleted from node A later. `testpvc` metrics will not disappear from kubelet on node A. After a long running time, `kubelet` process will keep a lot of stale volume metrics in memory.
For these dynamic metrics, it's better to use a collector to collect metrics from a data source (`StatsProvider` here), like [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) scraping metrics from kube-apiserver.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/57686
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix kubelet PVC stale metrics
```
Automatic merge from submit-queue (batch tested with PRs 59353, 59905, 53833). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Graduate kubeletconfig API group to beta
Regarding https://github.com/kubernetes/features/issues/281, this PR moves the kubeletconfig API group to beta.
After #53088, the KubeletConfiguration type should not contain any deprecated or experimental fields, and we should not have to remove any more fields from the type before graduating it to beta.
We need the community to double check for two things, however:
1. Are there any fields currently in the KubeletConfiguration type that you were going to mark deprecated this quarter, but haven't yet?
2. Are there any fields currently in the KubeletConfiguration type that are experimental or alpha, but were not explicitly denoted as such?
Please comment on this PR if you can answer "yes" to either of those two questions. Please cc anyone with a stake in the kubeletconfig API, so we get as much coverage as possible.
/cc @thockin @dchen1107 @Random-Liu @yujuhong @dashpole @tallclair @vishh @abw @freehan @dnardo @bowei @MrHohn @luxas @liggitt @ncdc @derekwaynecarr @mikedanese
@kubernetes/sig-network-pr-reviews, @kubernetes/sig-node-pr-reviews
```release-note
action required: The `kubeletconfig` API group has graduated from alpha to beta, and the name has changed to `kubelet.config.k8s.io`. Please use `kubelet.config.k8s.io/v1beta1`, as `kubeletconfig/v1alpha1` is no longer available.
```
**TODO:**
- [x] Move experimental/non-gated-alpha/soon-to-be-deprecated fields to `KubeletFlags`
- [x] #53088
- [x] #54154
- [x] #54160
- [x] #55562
- [x] #55983
- [x] #57851
- [x] Lift embedded structure out of strings
- [x] #53025
- [x] #54643
- [x] #54823
- [x] #55254
- [x] Resolve relative paths against the location config files are loaded from
- [x] #55648
- [x] Rename to `kubelet.config.k8s.io`
- [x] Comments
- [x] Make sure existing comments at least read sensibly.
- [x] Note default values in comments on the versioned struct.
- [x] Remove any reference to default values in comments on the internal struct.
- [x] Most fields should be `+optional` and `omitempty`. Add where necessary. ~Where omitted, explicitly comment.~ Edit: We should not distinguish between nil and empty, see below items.
- [x] Ensure defaults are specified via `pkg/kubelet/apis/kubelet.config.k8s.io/v1beta1/defaults.go`, not `cmd/kubelet/app/options/options.go`.
- [x] #57770
- [x] Ensure kubeadm does not persist v1alpha1 KubeletConfiguration objects (or feature-gates this functionality)
- [x] Don't make a distinction between empty and nil, because of #43203.
- [x] #59515
- [x] #59681
- [x] Take the opportunity to fix insecure Kubelet defaults @tallclair
- [x] #59666
- [x] Remove CAdvisorPort from KubeletConfiguration wrt #56523.
- [x] #59580
- [x] Hide `ConfigTrialDuration` until we're more sure what to do with it.
- [x] #59628
- [x] Fix `// default: x` comments after rebasing on recent changes.
Automatic merge from submit-queue (batch tested with PRs 59353, 59905, 53833). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Rename ConfigOK to KubeletConfigOk
This is a more accurate name for the condition, as it describes the
status of the Kubelet's configuration.
Also cleans up capitalization of internal names.
```release-note
The ConfigOK node condition has been renamed to KubeletConfigOk.
```
Automatic merge from submit-queue (batch tested with PRs 59877, 59886, 59892). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: revert the status HostIP behavior
**What this PR does / why we need it**:
This PR partially revert #57106 to fix#59889.
The PR #57106 changed the behavior of `generateAPIPodStatus` when a **kubeClient** is nil.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Create pkg/kubelet/apis/deviceplugin/v1beta1 directory.
The proto stays the same as v1alpha. Only changes Version in
constants.go to "v1beta1" and the BUILD file to pick up the new dir.
```release-note
Adding pkg/kubelet/apis/deviceplugin/v1beta1 API.
```
This is a more accurate name for the condition, as it describes the
status of the Kubelet's configuration.
Also cleans up capitalization of internal names.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Secure Kubelet's componentconfig defaults while maintaining CLI compatibility
This updates the Kubelet's componentconfig defaults, while applying the legacy defaults to values from options.NewKubeletConfiguration(). This keeps defaults the same for the command line and improves the security of defaults when you load config from a file.
See: https://github.com/kubernetes/kubernetes/issues/53618
See: https://github.com/kubernetes/kubernetes/pull/53833#discussion_r166669931
Also moves EnableServer to KubeletFlags, per @tallclair's comments on #53833.
We should find way of generating documentation for config file defaults, so that people can easily look up what's different from flags.
```release-note
Action required: Default values differ between the Kubelet's componentconfig (config file) API and the Kubelet's command line. Be sure to review the default values when migrating to using a config file.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix typos
**What this PR does / why we need it**:
To fix some typos
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
WaitForAttachAndMount should mark only pod in DesiredStateOfWorldPopulator (DSWP)
and DSWP should mark the volume to be remounted only when the new pod has been
processed.
Otherwise DSWP and reconciler race who gets the new pod first. If it's reconciler,
then DownwardAPI and Projected volumes of the pod are not refreshed with new
content and they are updated after the next periodic sync (60-90 seconds).
Automatic merge from submit-queue (batch tested with PRs 59489, 59716). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
devicemanager testing: dynamically choose tmp dir
This avoids the test issue #59488 that I was running into.
I believe I have a reasonable explanation for the race condition in that issue (TLDR: it's probably part of the gRPC API and k8s can only avoid the issue until a proper solution gets worked out together with gRPC), therefore I suggest to merge this PR now both because it avoids the issue and because using fixed tmp directories is something that should be avoided anyway.
/assign @jiayingz
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Ignore 0% and 100% eviction thresholds
Primarily, this gives a way to explicitly disable eviction, which is
necessary to use omitempty on EvictionHard.
See: https://github.com/kubernetes/kubernetes/pull/53833#discussion_r166672137
As justification for this approach, neither 0% nor 100% make sense as
eviction thresholds; in the "less-than" case, you can't have less than
0% of a resource and 100% perpetually evicts; in the
"greater-than" case (assuming we ever add a resource with this
semantic), the reasoning is the reverse (not more than 100%, 0%
perpetually evicts).
```release-note
Eviction thresholds set to 0% or 100% are now ignored.
```
Automatic merge from submit-queue (batch tested with PRs 59767, 56454, 59237, 59730, 55479). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Block Volume: Refactor volumehandler in operationexecutor
**What this PR does / why we need it**:
Based on discussion with @saad-ali at #51494, we need refactor volumehandler in operationexecutor
for Block Volume feature. We don't need to add volumehandler as separated object.
```
VolumeHandler does not need to be a separate object that is constructed inline like this.
You can create a new operation, e.g. UnmountOperation to which you pass the spec,
and it can return either a UnmountVolume or UnmapVolume.
```
**Which issue(s) this PR fixes** : no related issue.
**Special notes for your reviewer**:
@saad-ali @msau42
**Release note**:
```release-note
NONE
```
Primarily, this gives a way to explicitly disable eviction, which is
necessary to use omitempty on EvictionHard.
See: https://github.com/kubernetes/kubernetes/pull/53833#discussion_r166672137
As justification for this approach, neither 0% nor 100% make sense as
eviction thresholds; in the "less-than" case, you can't have less than
0% of a resource and 100% perpetually evicts; in the
"greater-than" case (assuming we ever add a resource with this
semantic), the reasoning is the reverse (not more than 100%, 0%
perpetually evicts).
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix all the typos across the project
**What this PR does / why we need it**:
There are lots of typos across the project. We should avoid small PRs on fixing those annoying typos, which is time-consuming and low efficient.
This PR does fix all the typos across the project currently. And with #59463, typos could be avoided when a new PR gets merged.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
/sig testing
/area test-infra
/sig release
/cc @ixdy
/assign @fejta
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bury KubeletConfiguration.ConfigTrialDuration for now
Based on discussion in https://github.com/kubernetes/kubernetes/pull/53833/files#r166669046, this PR chooses not to expose a knob for the trial duration yet. It is unclear exactly which shape this functionality should take in the API.
```release-note
The alpha KubeletConfiguration.ConfigTrialDuration field is no longer available.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
devicemanager: increase code coverege of endpoint's unit test
Particularly cover the code path when an unhealthy device
becomes healthy.
Hard-coding the tests to use /tmp/device_plugin for sockets is
problematic because it prevents running tests in parallel on the same
machine (perhaps because there are multiple developers, perhaps
because testing is done independently on different code checkouts).
/tmp/device_plugin also was not removed after testing.
This is probably not that relevant. But more importantly, this change
also fixes https://github.com/kubernetes/kubernetes/issues/59488.
"make test" failed in TestDevicePluginReRegistration because something
removed /tmp/device_plugin/device-plugin.sock while something else
tried to connect to it:
2018/02/07 14:34:39 Starting to serve on /tmp/device_plugin/device-plugin.sock
[pid 29568] connect(14, {sa_family=AF_UNIX, sun_path="/tmp/device_plugin/server.sock"}, 33) = 0
[pid 29568] unlinkat(AT_FDCWD, "/tmp/device_plugin/server.sock", 0) = 0
[pid 29568] unlinkat(AT_FDCWD, "/tmp/device_plugin/device-plugin.sock", 0) = 0
[pid 29568] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=29568, si_uid=1000} ---
[pid 29568] connect(6, {sa_family=AF_UNIX, sun_path="/tmp/device_plugin/device-plugin.sock"}, 40) = -1 ENOENT (No such file or directory)
E0207 14:34:39.961321 29568 endpoint.go:117] listAndWatch ended unexpectedly for device plugin mock with error rpc error: code = Unavailable desc = transport is closing
strace: Process 29623 attached
[pid 29574] connect(3, {sa_family=AF_UNIX, sun_path="/tmp/device_plugin/device-plugin.sock"}, 40) = -1 ENOENT (No such file or directory)
[pid 29623] connect(3, {sa_family=AF_UNIX, sun_path="/tmp/device_plugin/device-plugin.sock"}, 40) = -1 ENOENT (No such file or directory)
[pid 29574] connect(3, {sa_family=AF_UNIX, sun_path="/tmp/device_plugin/device-plugin.sock"}, 40) = -1 ENOENT (No such file or directory)
E0207 14:34:49.961324 29568 endpoint.go:60] Can't create new endpoint with path /tmp/device_plugin/device-plugin.sock err failed to dial device plugin: context deadline exceeded
E0207 14:34:49.961390 29568 manager.go:340] Failed to dial device plugin with request &RegisterRequest{Version:v1alpha2,Endpoint:device-plugin.sock,ResourceName:fake-domain/resource,}: failed to dial device plugin: context deadline exceeded
panic: test timed out after 2m0s
It's not entirely certain which code was to blame for this unlinkat()
calls (perhaps some cleanup code from a previous test running in a
goroutine?) but this no longer happened after switching to per-test
socket directories.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add mountpoint as CRI image filesystem storage identifier.
Fixes https://github.com/kubernetes/kubernetes/issues/57356.
This PR changes CRI to use mountpoint as storage identifier. See https://github.com/kubernetes/kubernetes/issues/57356#issuecomment-363608733.
Note that:
1) This doesn't work with devicemapper for now. Please feel free to propose change for device mapper, we can discuss more about this after this first version is merged. @mrunalp @runcom
2) `mountpoint` is added as new field in `StorageIdentifier` now. After https://github.com/kubernetes/kubernetes/pull/58973 is merged, we can remove the UUID field in `v1alpha2`.
/cc @yujuhong @feiskyer @yguo0905 @dashpole @mikebrow @abhi @kubernetes/sig-node-api-reviews
**Release note**:
```release-note
CRI starts using moutpoint as image filesystem identifier instead of UUID.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Redesign and implement volume reconstruction work
This PR is the first part of redesign of volume reconstruction work. The detailed design information is https://github.com/kubernetes/community/pull/1601
The changes include
1. Remove dependency on volume spec stored in actual state for volume
cleanup process (UnmountVolume and UnmountDevice)
Modify AttachedVolume struct to add DeviceMountPath so that volume
unmount operation can use this information instead of constructing from
volume spec
2. Modify reconciler's volume reconstruction process (syncState). Currently workflow
is when kubelet restarts, syncState() is only called once before
reconciler starts its loop.
a. If volume plugin supports reconstruction, it will use the
reconstructed volume spec information to update actual state as before.
b. If volume plugin cannot support reconstruction, it will use the
scanned mount path information to clean up the mounts.
In this PR, all the plugins still support reconstruction (except
glusterfs), so reconstruction of some plugins will still have issues.
The next PR will modify those plugins that cannot support reconstruction
well.
This PR addresses issue #52683
Automatic merge from submit-queue (batch tested with PRs 59344, 59595, 59598). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove unnecessary summary api call.
Summary API call is not as cheap as we think. Especially for CRI container runtime, it means:
1) Extra cgroups parsing (because cpu/memory are considered to be on demand);
2) Extra grpc encoding/decoding `ListPodSandboxes`, `ListContainers`, `ListContainerStats`;
I don't think it is necessary to call summary twice inside the same function.
/cc @kubernetes/sig-node-pr-reviews @dashpole @jingxu97
Signed-off-by: Lantao Liu <lantaol@google.com>
**Release note**:
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 59054, 59515, 59577). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add 'none' option to EnforceNodeAllocatable
This lets us use omitempty on `EnforceNodeAllocatable`. We shouldn't treat
`nil` as different from `[]T{}`, because this can play havoc with
serializers (a-la #43203).
See: https://github.com/kubernetes/kubernetes/pull/53833#discussion_r166672137
```release-note
'none' can now be specified in KubeletConfiguration.EnforceNodeAllocatable (--enforce-node-allocatable) to explicitly disable enforcement.
```
We're about to tear the container down, there's no point. It also suppresses
an annoying error message due to kubelet stupidity that causes multiple
parallel calls to StopPodSandbox for the same sandbox.
docker_sandbox.go:355] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "docker-registry-1-deploy_default": Unexpected command output nsenter: cannot open /proc/22646/ns/net: No such file or directory
1) A first StopPodSandbox() request triggered by SyncLoop(PLEG) for
a ContainerDied event calls into TearDownPod() and thus the network
plugin. Until this completes, networkReady=true for the
sandbox.
2) A second StopPodSandbox() request triggered by SyncLoop(REMOVE)
calls PodSandboxStatus() and calls into the network plugin to read
the IP address because networkReady=true
3) The first request exits the network plugin, sets networReady=false,
and calls StopContainer() on the sandbox. This destroys the network
namespace.
4) The second request finally gets around to running nsenter but
the network namespace is already destroyed. It returns an error
which is logged by getIP().
Automatic merge from submit-queue (batch tested with PRs 57824, 58806, 59410, 59280). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
2nd try at using a vanity GCR name
The 2nd commit here is the changes relative to the reverted PR. Please focus review attention on that.
This is the 2nd attempt. The previous try (#57573) was reverted while we
figured out the regional mirrors (oops).
New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest. To publish
an image, push k8s-staging.gcr.io and it will be synced to the regionals
automatically (similar to today). For now the staging is an alias to
gcr.io/google_containers (the legacy URL).
When we move off of google-owned projects (working on it), then we just
do a one-time sync, and change the google-internal config, and nobody
outside should notice.
We can, in parallel, change the auto-sync into a manual sync - send a PR
to "promote" something from staging, and a bot activates it. Nice and
visible, easy to keep track of.
xref https://github.com/kubernetes/release/issues/281
TL;DR:
* The new `staging-k8s.gcr.io` is where we push images. It is literally an alias to `gcr.io/google_containers` (the existing repo) and is hosted in the US.
* The contents of `staging-k8s.gcr.io` are automatically synced to `{asia,eu,us)-k8s.gcr.io`.
* The new `k8s.gcr.io` will be a read-only alias to whichever regional repo is closest to you.
* In the future, images will be promoted from `staging` to regional "prod" more explicitly and auditably.
```release-note
Use "k8s.gcr.io" for pulling container images rather than "gcr.io/google_containers". Images are already synced, so this should not impact anyone materially.
Documentation and tools should all convert to the new name. Users should take note of this in case they see this new name in the system.
```
This is the 2nd attempt. The previous was reverted while we figured out
the regional mirrors (oops).
New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest. To publish
an image, push k8s-staging.gcr.io and it will be synced to the regionals
automatically (similar to today). For now the staging is an alias to
gcr.io/google_containers (the legacy URL).
When we move off of google-owned projects (working on it), then we just
do a one-time sync, and change the google-internal config, and nobody
outside should notice.
We can, in parallel, change the auto-sync into a manual sync - send a PR
to "promote" something from staging, and a bot activates it. Nice and
visible, easy to keep track of.
Automatic merge from submit-queue (batch tested with PRs 59010, 59212, 59281, 59014, 59297). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve error returned when fetching container logs during pod termination
**What this PR does / why we need it**:
This change better handles fetching of logs when a container is in a crash loop backoff state. In cases where it is unable to fetch the logs, it gives a helpful error message back to a user who has requested logs of a container from a terminated pod. Rather than attempting to get logs for a container using an empty container ID, it returns a useful error message.
In cases where the container runtime gets an error, log the error but don't leak it back through the API to the user.
**Which issue(s) this PR fixes**:
Fixes#59296
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59276, 51042, 58973, 59377, 59472). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
clean up unused function GetKubeletDockerContainers
**What this PR does / why we need it**:
fix todo: function GetKubeletDockerContainers is not unused,it has been migrated off in test/e2e_node/garbage_collector_test.go in [#57976](https://github.com/kubernetes/kubernetes/pull/57976/files)
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59276, 51042, 58973, 59377, 59472). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update Container Runtime Interface to use enumerated namespace modes
**What this PR does / why we need it**: This updates the CRI as described in the [Shared PID Namespace](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/pod-pid-namespace.md#container-runtime-interface-changes) proposal. This change to the alpha API is not backwards compatible: implementations of the CRI will need to update to the new API version.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
WIP #1615
**Special notes for your reviewer**:
/assign @yujuhong
**Release note**:
```release-note
[action-required] The Container Runtime Interface (CRI) version has increased from v1alpha1 to v1alpha2. Runtimes implementing the CRI will need to update to the new version, which configures container namespaces using an enumeration rather than booleans.
```
Automatic merge from submit-queue (batch tested with PRs 59276, 51042, 58973, 59377, 59472). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: only register api source when connecting
**What this PR does / why we need it**:
before this change, an api source was always registered, even when there
was no kubeclient. this lead to some operations blocking waiting for
podConfig.SeenAllSources to pass, which it never would.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59275
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
This also incorporates the version string into the package name so
that incompatibile versions will fail to connect.
Arbitrary choices:
- The proto3 package name is runtime.v1alpha2. The proto compiler
normally translates this to a go package of "runtime_v1alpha2", but
I renamed it to "v1alpha2" for consistency with existing packages.
- kubelet/apis/cri is used as "internalapi". I left it alone and put the
public "runtimeapi" in kubelet/apis/cri/runtime.
Automatic merge from submit-queue (batch tested with PRs 59441, 58264, 59287, 59396, 59439). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add context to all relevant cloud APIs
**What this PR does / why we need it**:
This adds context to all the relevant cloud provider interface signatures.
Callers of those APIs are currently satisfied using context.TODO().
There will be follow on PRs to push the context through the stack.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#815
**Special notes for your reviewer**:
For an idea of the full scope of this change please look at PR #58532.
**Release note**:
```release-note
Implementers of the cloud provider interface will note the addition of a context to this interface. Trivial code modification will be necessary for a cloud provider to continue to compile.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add unit test for endpoint allocate
**What this PR does / why we need it**:
Adds a unit test for covering `allocate` function at endpoint.
**Release note**:
```release-note
None
```
/kind testing
/area hw-accelerators
/cc @jiayingz @vishh @derekwaynecarr @RenaudWasTaken @resouer @ConnorDoyle
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use GlobalMemoryStatusEx to get total physical memory on Windows node
**What this PR does / why we need it**:
This PR fixes issue #57110 due to failure in getting total physical memory on some Windows VM such as in VMWare Fusion or Virtualbox. This change uses GlobalMemoryStatusEx instead of GetPhysicallyInstalledSystemMemory to retrieve total physical memory on Windows node. The amount obtained this way is also closer in parity with reading MemTotal from /proc/meminfo on Linux node.
(thanks to @martinivanov and @marono for the help)
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#57110
**Special notes for your reviewer**:
**Release note**:
```release-note
```
This adds context to all the relevant cloud provider interface signatures.
Callers of those APIs are currently satisfied using context.TODO().
There will be follow on PRs to push the context through the stack.
For an idea of the full scope of this change please look at PR #58532.
Automatic merge from submit-queue (batch tested with PRs 52942, 58415). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve messaging on volume expansion
- we now provide clear message to user what to do when cloudprovider resizing is finished
and file system resizing is needed.
- add a event when resizing is successful
- Use PATCH both in controller-manager and kubelet for updating PVC status
- Remove code duplication between controller-manager and kubelet for updating PVC status
- Only remove conditions that are managed by resize controller
```release-note
Improve messages user gets during and after volume resizing is done.
```
Automatic merge from submit-queue (batch tested with PRs 58184, 59307, 58172). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add annotations to the device plugin API
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** : Related to #56649 but does not fix it
This adds the ability for the device plugins to annotate containers.
Product wise, this allows the NVIDIA device plugin to support CRI-O (which allows hooks through container annotations).
**Special notes for your reviewer**:
/area hw-accelerators
/cc @vishh @jiayingz @vikaschoudhary16
I'm wondering if it would make sense to fire a blank call to `newContainerAnnotations` at the start of the deviceplugin to get Annotations that are forbidden.
Current behavior is that any Annotations that conflicts with Kubelet will be overwritten by Kubelet.
**Release note**:
```release-note
NONE
```
This PR is the first part of redesign of volume reconstruction work. The
changes include
1. Remove dependency on volume spec stored in actual state for volume
cleanup process (UnmountVolume and UnmountDevice)
Modify AttachedVolume struct to add DeviceMountPath so that volume
unmount operation can use this information instead of constructing from
volume spec
2. Modify reconciler's volume reconstruction process (syncState). Currently workflow
is when kubelet restarts, syncState() is only called once before
reconciler starts its loop.
a. If volume plugin supports reconstruction, it will use the
reconstructed volume spec information to update actual state as before.
b. If volume plugin cannot support reconstruction, it will use the
scanned mount path information to clean up the mounts.
In this PR, all the plugins still support reconstruction (except
glusterfs), so reconstruction of some plugins will still have issues.
The next PR will modify those plugins that cannot support reconstruction
well.
This PR addresses issue #52683, #54108 (This PR includes the changes to
update devicePath after local attach finishes)
If we go a certain amount of time without being able to create a client
cert and we have no current client cert from the store, exit. This
prevents a corrupted local copy of the cert from leaving the Kubelet in a
zombie state forever. Exiting allows a config loop outside the Kubelet
to clean up the file or the bootstrap client cert to get another client
cert.
Automatic merge from submit-queue (batch tested with PRs 59097, 57076, 59295). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add windows config to Kubelet CRI
**What this PR does / why we need it**:
Currently Container Runtime Interface (CRI) only supports LinuxContainerConfig and therefore LinuxContainerResources in ContainerConfig. Windows resource config is different from Linux's, although it shares some common properties.
This PR adds windows config to CRI. Add newly added WindowsContainerResources is original from OCI spec (see https://github.com/opencontainers/runtime-spec/blob/master/specs-go/config.go#L437).
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
First part of #56734. A further PR is needed to fill the values after we have agreement on the spec.
**Special notes for your reviewer**:
**Release note**:
```release-note
Add windows config to Kubelet CRI
```
/assign @yujuhong @brendandburns
/cc @taylorb-microsoft @JiangtianLi @dchen1107
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Ensure that the runtime mounts RO volumes read-only
**What this PR does / why we need it**:
This change makes it so that containers cannot write to secret, configMap, downwardAPI and projected volumes since the runtime will now mount them read-only. This change makes things less confusing for a user since any attempt to update a secret volume will result in an error rather than a successful change followed by a revert by the kubelet when the volume next syncs.
It also adds a feature gate `ReadOnlyAPIDataVolumes` to a provide a way to disable the new behavior in 1.10, but for 1.11, the new behavior will become non-optional.
Also, E2E tests for downwardAPI and projected volumes are updated to mount the volumes somewhere other than /etc.
**Which issue(s) this PR fixes**
Fixes#58719
**Release note**:
```release-note
Containers now mount secret, configMap, downwardAPI and projected volumes read-only. Previously,
container modifications to files in these types of volumes were temporary and reverted by the kubelet
during volume sync. Until version 1.11, setting the feature gate ReadOnlyAPIDataVolumes=false will
preserve the old behavior.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Postpone PV deletion with finalizer when it is being used
Postpone PV deletion if it is bound to a PVC
xref: https://github.com/kubernetes/community/pull/1608
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#33355
**Special notes for your reviewer**:
**Release note**:
```release-note
Postpone PV deletion when it is being bound to a PVC
```
WIP, assign to myself first
/assign @NickrenREN
before this change, an api source was always registered, even when there
was no kubeclient. this lead to some operations blocking waiting for
podConfig.SeenAllSources to pass, which it never would.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Only rotate certificates in the background
Change the Kubelet to not block until the first certs have rotated (we didn't act on it anyway) and fall back to the bootstrap cert if the most recent rotated cert is expired on startup.
The certificate manager originally had a "block on startup" rotation behavior to ensure at least one rotation happened on startup. However, since rotation may not succeed within the first time window the code was changed to simply print the error rather than return it. This meant that the blocking rotation has no purpose - it cannot cause the kubelet to fail, and it *does* block the kubelet from starting static pods before the api server becomes available.
The current block behavior causes a bootstrapped kubelet that is also set to run static pods to wait several minutes before actually launching the static pods, which means self-hosted masters using static pods have a pointless delay on startup.
Since blocking rotation has no benefit and can't actually fail startup, this commit removes the blocking behavior and simplifies the code at the same time. The goroutine for rotation now completely owns the deadline, the shouldRotate() method is removed, and the method that sets rotationDeadline now returns it. We also explicitly guard against a negative sleep interval and omit the message.
Should have no impact on bootstrapping except the removal of a long delay on startup before static pods start.
The other change is that an expired certificate from the cert manager is *not* considered a valid cert, which triggers an immediate rotation. This causes the cert manager to fall back to the original bootstrap certificate until a new certificate is issued. This allows the bootstrap certificate on masters to be "higher powered" and allow the node to function prior to initial approval, which means someone configuring the masters with a pre-generated client cert can be guaranteed that the kubelet will be able to communicate to report self-hosted static pod status, even if the first client rotation hasn't happened. This makes master self-hosting more predictable for static configuration environments.
```release-note
When using client or server certificate rotation, the Kubelet will no longer wait until the initial rotation succeeds or fails before starting static pods. This makes running self-hosted masters with rotation more predictable.
```
Add a feature gate ReadOnlyAPIDataVolumes to a provide a way to
disable the new behavior in 1.10, but for 1.11, the new
behavior will become non-optional.
Also, update E2E tests for downwardAPI and projected volumes
to mount the volumes somewhere other than /etc.
Automatic merge from submit-queue (batch tested with PRs 59106, 58985, 59068, 59120, 59126). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix cross-build breakage after #58174
**What this PR does / why we need it**:
Fix cross-build breakage after #58174
@cblecker
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59121
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix PodPidsLimit and ConfigTrialDuration on internal KubeletConfig type
They should both follow the convention of not being a pointer on the internal type.
This required adding a conversion function between `int64` and `*int64`. A side effect is this removes a warning in the generated code for the apps API group.
@dims
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add support of hyperv isolation for windows containers
**What this PR does / why we need it**:
Add support of hyperv isolation for windows containers.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58750
**Special notes for your reviewer**:
Only one container per pod is supported yet.
**Release note**:
```release-note
Windows containers now support experimental Hyper-V isolation by setting annotation `experimental.windows.kubernetes.io/isolation-type=hyperv` and feature gates HyperVContainer. Only one container per pod is supported yet.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixes for HostIPC tests to work when Docker has SELinux support enabled.
**What this PR does / why we need it**:
Fixes for HostIPC tests to work when Docker has SELinux support enabled.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
N/A
**Special notes for your reviewer**:
The core of the matter is to use `ipcs` from util-linux rather than the one from busybox. The typical SELinux policy has enough to allow Docker containers (running under svirt_lxc_net_t SELinux type) to access IPC information by reading the contents of the files under /proc/sysvipc/, but not by using the shmctl etc. syscalls.
The `ipcs` implementation in busybox will use `shmctl(0, SHM_INFO, ...)` to detect whether it can read IPC info (see source code [here](https://git.busybox.net/busybox/tree/util-linux/ipcs.c?h=1_28_0#n138)), while the one in util-linux will prefer to read from the /proc files directly if they are available (see source code [here](https://github.com/karelzak/util-linux/blob/v2.27.1/sys-utils/ipcutils.c#L108)).
It turns out the SELinux policy doesn't allow the shmctl syscalls in an unprivileged container, while access to it through the /proc interface is fine. (One could argue this is a bug in the SELinux policy, but getting it fixed on stable OSs is hard, and it's not that hard for us to test it with an util-linux `ipcs`, so I propose we do so.)
This PR also contains a refactor of the code setting IpcMode, since setting it in the "common options" function is misleading, as on containers other than the sandbox, it ends up always getting overwritten, so let's only set it to "host" in the Sandbox.
It also has a minor fix for the `ipcmk` call, since support for size suffix was only introduced in recent versions of it.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove setInitError.
**What this PR does / why we need it**:
Removes setInitError, it's not sure it was ever really used, and it causes the kubelet to hang and get wedged.
**Which issue(s) this PR fixes**
Fixes#46086
**Special notes for your reviewer**:
If `initializeModules()` in `kubelet.go` encounters an error, it calls `runtimeState.setInitError(...)`
47d61ef472/pkg/kubelet/kubelet.go (L1339)
The trouble with this is that `initError` is never cleared, which means that `runtimeState.runtimeErrors()` always returns this `initError`, and thus pods never start sync-ing.
In normal operation, this is expected and desired because eventually the runtime is expected to become healthy, but in this case, `initError` is never updated, and so the system just gets wedged.
47d61ef472/pkg/kubelet/kubelet.go (L1751)
We could add some retry to `initializeModules()` but that seems unnecessary, as eventually we'd want to just die anyway. Instead, just log fatal and die, a supervisor will restart us.
Note, I'm happy to add some retry here too, if that makes reviewers happier.
**Release note**:
```release-note
Prevent kubelet from getting wedged if initialization of modules returns an error.
```
@feiskyer @dchen1107 @janetkuo
@kubernetes/sig-node-bugs
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Make eviction manager work with CRI container runtime.
Previously, eviction manager uses a function `HasDedicatedImageFs` in `pkg/kubelet/cadvisor` to detect whether image fs and root fs are on the same device.
However, it doesn't work with CRI container runtime which provides container/image stats through CRI. Thus all eviction tests for containerd are failing now. https://k8s-testgrid.appspot.com/sig-node-containerd#node-e2e-flaky
This PR makes it work with CRI container runtime.
@kubernetes/sig-node-pr-reviews
@yujuhong @yguo0905 @feiskyer @mrunalp @abhi @dashpole
Signed-off-by: Lantao Liu <lantaol@google.com>
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
none
```
They should both follow the convention of not being a pointer on the
internal type. This required adding a conversion function between
`int64` and `*int64`.
A side effect is this removes a warning in the generated code for the
apps API group.
Automatic merge from submit-queue (batch tested with PRs 58899, 58980). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
CRI: Add a call to reopen log file for a container
This allows a daemon external to the container runtime to rotate the log
file, and then ask the runtime to reopen the files.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58823
**Release note**:
```release-note
CRI: Add a call to reopen log file for a container.
```
Automatic merge from submit-queue (batch tested with PRs 58777, 58978, 58977, 58775). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix pod sandbox privilege.
Fixes https://github.com/kubernetes/kubernetes/issues/58979.
In cri-containerd, we start enforcing that a privileged container can't be created in privileged sandbox in https://github.com/containerd/cri-containerd/pull/577.
However, after that the e2e-gci-device-plugin-gpu test starts failing. https://k8s-testgrid.appspot.com/sig-node-containerd#e2e-gci-device-plugin-gpu
```
I0128 06:49:09.117] Jan 28 06:49:09.086: INFO: At 2018-01-28 06:41:10 +0000 UTC - event for nvidia-driver-installer-5kkrz: {kubelet bootstrap-e2e-minion-group-7s2v} Failed: (combined from similar events): Error: failed to generate container "cfb9f4f01fc2685db6469d3f6348077b94d4aa577e2e6345bf890f8871ec80dd" spec: no privileged container allowed in sandbox
```
The reason is that kubelet doesn't check init container when setting sandbox privilege.
Signed-off-by: Lantao Liu <lantaol@google.com>
**Release note**:
```release-note
none.
```
@kubernetes/sig-node-bugs @yujuhong @feiskyer @mrunalp
Automatic merge from submit-queue (batch tested with PRs 58955, 58968, 58971, 58963, 58298). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
pkg: kubelet: do not assume anything about images names
This patch fixes a regression introduced by
https://github.com/kubernetes/kubernetes/pull/51751 in the CRI
interface.
That commit actually changed a unit test where we were previously *not*
assuming anything about an image name.
Before that commit, if you send the image "busybox" through the CRI,
the container runtime receives "busybox". After that patch the
container runtime gets "docker.io/library/busybox".
While that may be correct for the internal kube dockershim, in the CRI
we must not assume anything about image names. The ImageSpec is not
providing any spec around the image so the container runtime should
just get the raw image name from the pod spec. Every container runtime
can handle image names the way it wants. The "docker.io" namespace is
not at all "standard", CRI-O is not following what the docker UI say
since that's the docker UI. We should not focus the CRI on wrong UI
design, especially around a default namespace.
Image name normalization is a Docker implementation detail around short images names, not the CRI.
ImageSpec is not standardized yet:
https://github.com/kubernetes/kubernetes/issues/46255 and
https://github.com/kubernetes/kubernetes/issues/7203
This is something which should land in 1.9 as well since the regression
is from 1.8.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix regression in the CRI: do not add a default hostname on short image names
```
Automatic merge from submit-queue (batch tested with PRs 56995, 58498, 57426, 58902, 58863). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: remove the rktshim directory
This package contains only placeholders without actual implementation.
Since it is not currently under active development, remove it to avoid
unnecessary change needed whenever the interface is changed.
Automatic merge from submit-queue (batch tested with PRs 56995, 58498, 57426, 58902, 58863). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Get windows kernel version directly from registry
**What this PR does / why we need it**:
kubernetes/kubernetes#55143 gets windows kernel version by calling windows.GetVersion(), but it doesn't work on windows 10. From https://msdn.microsoft.com/en-us/library/windows/desktop/ms724439(v=vs.85).aspx, GetVersion requires app to be manifested.
Applications not manifested for Windows 8.1 or Windows 10 will return the Windows 8 OS version value (6.2). I tried a toy go program using GetVersion on Windows 10 and it returns 0x23f00206.
Given the limited win32 functions in golang, we should read from registry directly.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58497
**Special notes for your reviewer**:
Should also cherry-pick to v1.9.
**Release note**:
```release-note
Get windows kernel version directly from registry
```
/cc @JiangtianLi @taylorb-microsoft
Automatic merge from submit-queue (batch tested with PRs 56995, 58498, 57426, 58902, 58863). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
flag precedence redo
Changes the Kubelet configuration flag precedence order so that flags take precedence over config from files/ConfigMaps.
This should fix the re-parse issue with #56097 that led to revert.
Fixes#56171.
In order to prevent global flags (registered in 3rd party libs, etc.) from leaking into the command's help text, this PR turns off Cobra's flag parsing in the `kubelet` command and re-implements help and usage funcs for the Kubelet. Cobra's default funcs automatically merge all global flags into the command's flagset, which results in incorrect help text. I tried to keep the formatting as close as possible to the what the Kubelet currently produces.
Diff between Kubelet's help text on `upstream/master` vs `mtaufen/kc-flags-precedence-redo`, which shows a leaked flag being removed, but no change to the formatting:
```
diff --git a/upstream.master.help b/mtaufen.kc-flags-precedence-redo.help
index 798a030..0797869 100644
--- a/upstream.master.help
+++ b/mtaufen.kc-flags-precedence-redo.help
@@ -30,7 +30,6 @@ Flags:
--authorization-mode string Authorization mode for Kubelet server. Valid options are AlwaysAllow or Webhook. Webhook mode uses the SubjectAccessReview API to determine authorization. (default "AlwaysAllow")
--authorization-webhook-cache-authorized-ttl duration The duration to cache 'authorized' responses from the webhook authorizer. (default 5m0s)
--authorization-webhook-cache-unauthorized-ttl duration The duration to cache 'unauthorized' responses from the webhook authorizer. (default 30s)
- --azure-container-registry-config string Path to the file containing Azure container registry configuration information.
--bootstrap-checkpoint-path string <Warning: Alpha feature> Path to to the directory where the checkpoints are stored
--bootstrap-kubeconfig string Path to a kubeconfig file that will be used to get client certificate for kubelet. If the file specified by --kubeconfig does not exist, the bootstrap kubeconfig is used to request a client certificate from the API server. On success, a kubeconfig file referencing the generated client certificate and key is written to the path specified by --kubeconfig. The client certificate and key file will be stored in the directory pointed by --cert-dir.
--cadvisor-port int32 The port of the localhost cAdvisor endpoint (set to 0 to disable) (default 4194)
```
Ultimately, I think we should implement a common lib that K8s components can use to generate clean help text, as the global flag leakage problem affects all core k8s binaries. I would like to do so in a future PR, to keep this PR simple. We could base the help text format on the default values returned from `Command.HelpTemplate` and `Command.UsageTemplate`. Unfortunately, the template funcs used to process these defaults are private to Cobra, so we'd have to re-implement these, or avoid using them.
```release-note
NONE
```
- we now provide clear message to user what to do when cloudprovider resizing is finished
and file system resizing is needed.
- add a event when resizing is successful.
- Use Patch for updating PVCs in both kubelet and controller-manager
- Extract updating pvc util function in one place.
- Only update resize conditions on progress
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
clean up unused const
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
This patch fixes a regression introduced by
https://github.com/kubernetes/kubernetes/pull/51751 in the CRI
interface.
That commit actually changed a unit test where we were previously *not*
assuming anything about an image name.
Before that commit, if you send the image "busybox" through the CRI,
the container runtime receives "busybox". After that patch the
container runtime gets "docker.io/library/busybox".
While that may be correct for the internal kube dockershim, in the CRI
we must not assume anything about image names. The ImageSpec is not
providing any spec around the image so the container runtime should
just get the raw image name from the pod spec. Every container runtime
can handle image names the way it wants. The "docker.io" namespace is
not at all "standard", CRI-O is not following what the docker UI say
since that's the docker UI. We should not focus the CRI on wrong UI
design, especially around a default namespace.
ImageSpec is not standardized yet:
https://github.com/kubernetes/kubernetes/issues/46255 and
https://github.com/kubernetes/kubernetes/issues/7203
This is something which should land in 1.9 as well since the regression
is from 1.8.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
This changes the Kubelet configuration flag precedence order so that
flags take precedence over config from files/ConfigMaps.
See #56171 for rationale.
Note: Feature gates accumulate with the following
precedence (greater number overrides lesser number):
1. file-based config
2. dynamic cofig
3. flag-based config
The certificate manager originally had a "block on startup" rotation
behavior to ensure at least one rotation happened on startup. However,
since rotation may not succeed within the first time window the code was
changed to simply print the error rather than return it. This meant that
the blocking rotation has no purpose - it cannot cause the kubelet to
fail, and it *does* block the kubelet from starting static pods before
the api server becomes available.
The current block behavior causes a bootstrapped kubelet that is also
set to run static pods to wait several minutes before actually launching
the static pods, which means self-hosted masters using static pods have
a pointless delay on startup.
Since blocking rotation has no benefit and can't actually fail startup,
this commit removes the blocking behavior and simplifies the code at the
same time. The goroutine for rotation now completely owns the deadline,
the shouldRotate() method is removed, and the method that sets
rotationDeadline now returns it. We also explicitly guard against a
negative sleep interval and omit the message.
Should have no impact on bootstrapping except the removal of a long
delay on startup before static pods start.
Also add a guard condition where if the current cert in the store is
expired, we fall back to the bootstrap cert initially (we use the
bootstrap cert to communicate with the server). This is consistent with
when we don't have a cert yet.
This package contains only placeholders without actual implementation.
Since it is not currently under active development, remove it to avoid
unnecessary change needed whenever the interface is changed.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fixing array out of bound by checking initContainers instead of containers
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** : Fixes#58541
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 57973, 57990). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Set pids limit at pod level
**What this PR does / why we need it**:
Add a new Alpha Feature to set a maximum number of pids per Pod.
This is to allow the use case where cluster administrators wish
to limit the pids consumed per pod (example when running a CI system).
By default, we do not set any maximum limit, If an administrator wants
to enable this, they should enable `SupportPodPidsLimit=true` in the
`--feature-gates=` parameter to kubelet and specify the limit using the
`--pod-max-pids` parameter.
The limit set is the total count of all processes running in all
containers in the pod.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#43783
**Special notes for your reviewer**:
**Release note**:
```release-note
New alpha feature to limit the number of processes running in a pod. Cluster administrators will be able to place limits by using the new kubelet command line parameter --pod-max-pids. Note that since this is a alpha feature they will need to enable the "SupportPodPidsLimit" feature.
```
Having the field set in modifyCommonNamespaceOptions is misleading,
since for the actual container it is later unconditionally overwritten
to point to the sandbox container.
So let's move its setting to modifyHostOptionsForSandbox (renamed from
modifyHostNetworkOptionForSandbox as it's not about network only), since
that reflects what actually happens in practice.
This commit is purely a refactor, it doesn't change any behavior.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Rename package deviceplugin => devicemanager.
**What this PR does / why we need it**:
Fixes#58795
/kind cleanup
Rename package `deviceplugin` to `devicemanager` for consistency.
We already have components named Container manager and CPU manager. The device plugin package similarly contains an interface called `Manager`. The fact that the manager has plugins is somewhat incidental to the purpose of the package itself.
Note that this rename only affects internal API. The external gRPC interface still exports a package called deviceplugin.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet/get-pod-from-path: unused param should be removed
**What this PR does / why we need it**:
I'm sorry that i have not notice this PR has been closed because of the error of test. And, i found it can't reopen again, so i open the other one, thank you!
https://github.com/kubernetes/kubernetes/pull/38184
I am so sorry for trouble with you, PTAL, thank you!
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
testcase to pkg/kubelet/cadvisor/util.go
**What this PR does / why we need it**:
testcase to pkg/kubelet/cadvisor/util.go
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add a container type to the runtime manager's container status
**What this PR does / why we need it**:
This is Step 1 of the "Debug Containers" feature proposed in #35584 and is hidden behind a feature gate. Debug containers exist as container status with no associated spec, so this new runtime label allows the kubelet to treat containers differently without relying on spec.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: cc #27140
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
**Integrating feedback**:
- [x] Remove Type field in favor of a help method
**Dependencies:**
- [x] #46261 Feature gate for Debug Containers
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add deprecation warnings for rktnetes flags
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#53601
**Special notes for your reviewer**:
**Release note**:
```release-note
rktnetes has been deprecated in favor of rktlet. Please see https://github.com/kubernetes-incubator/rktlet for more information.
```
This is part of the "Debug Containers" feature and is hidden behind
a feature gate. Debug containers have no stored spec, so this new
runtime label allows the kubelet to treat containers differently
without relying on spec.
Automatic merge from submit-queue (batch tested with PRs 58547, 57228, 58528, 58499, 58618). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add TLS min version flag
Adds a flag for controlling the minimum TLS level allowed.
/assign liggitt
@kubernetes/sig-node-pr-reviews @k8s-mirror-api-machinery-pr-reviews
```release-note
--tls-min-version on kubelet and kube-apiserver allow for configuring minimum TLS versions
```
We let dockershim implement the kubelet's internal (CRI) API as an
intermediary step before transitioning fully to communicate using gRPC.
Now that kubelet has been communicating to the runtime over gRPC for
multiple releases, we can safely retire the extra interface in
dockershim.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
typo of errUnsuportedVersion
**What this PR does / why we need it**:
typo of errUnsuportedVersion in pkg/kubelet/cm/deviceplugin/types.go
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```NONE
Automatic merge from submit-queue (batch tested with PRs 58422, 58229, 58421, 58435, 58475). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: imagegc: exempt sandbox image
The image GC logic currently does not consider the sandbox image to be in-use by pods, since it isn't explicitly listed in the pod spec. However, it is trivially in-use if there are any pods running on the node.
This change adds logic to exempt the sandbox image from GC by always considering it as in-use.
**Reviewer Note**
I am changing `(m *kubeGenericRuntimeManager) GetImageRef` to return the ID always rather than the first tag if it exists. Seemed ok to me. Makes some error messages a little less readable in that the ID will be printed and not the tag. Just wanted to see what reviewers think about this.
@derekwaynecarr @dashpole
Automatic merge from submit-queue (batch tested with PRs 53631, 56960). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove unused code in UT files in pkg/
**What this PR does / why we need it**:
Remove unused code in UT files in pkg/ .
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Support for custom tls cipher suites in api server and kubelet
**What this PR does / why we need it**:
This pull request aims to solve the problem of users not able to set custom cipher suites in the api server.
Several users have requested this given that some default ciphers are vulnerable.
There is a discussion in #41038 of how to implement this. The options are:
- Setting a fixed list of ciphers, but users will have different requirements so a fixed list would be problematic.
- Letting the user set them by parameter, this requires adding a new parameter that could be pretty long with the list of all the ciphers.
I implemented the second option, if the ciphers are not passed by parameter, the Go default ones will be used (same behavior as now).
**Which issue this PR fixes**
fixes#41038
**Special notes for your reviewer**:
The ciphers in Go tls config are constants and the ones passed by parameters are a comma-separated list. I needed to create the `type CipherSuitesFlag` to support that conversion/mapping, because i couldn't find any way to do this type of reflection in Go.
If you think there is another way to implement this, let me know.
If you want to test it out, this is a ciphers combination i tested without the weak ones:
```
TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
```
If this is merged i will implement the same for the Kubelet.
**Release note**:
```release-note
kube-apiserver and kubelet now support customizing TLS ciphers via a `--tls-cipher-suites` flag
```
Automatic merge from submit-queue (batch tested with PRs 58216, 58193, 53033, 58219, 55921). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix device plugin endpoint UT
**What this PR does / why we need it**:
Fix some issues in device plugin endpoint UT.
**Which issue(s) this PR fixes**:
Fixes#55920
**Special notes for your reviewer**:
@jiayingz @RenaudWasTaken @lichuqiang PTAL.
/sig node
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Handle Unhealthy devices
Update node capacity with sum of both healthy and unhealthy devices.
Node allocatable reflect only healthy devices.
**What this PR does / why we need it**:
Currently node capacity only reflects healthy devices. Unhealthy devices are ignored totally while updating node status. This PR handles unhealthy devices while updating node status.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#57241
**Special notes for your reviewer**:
**Release note**:
<!-- Write your release note:
Handle Unhealthy devices
```release-note
Handle Unhealthy devices
```
/cc @tengqm @ConnorDoyle @jiayingz @vishh @jeremyeder @sjenning @resouer @ScorpioCPH @lichuqiang @RenaudWasTaken @balajismaniam
/sig node
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Enable on-demand collection of node metrics
**What this PR does / why we need it**:
This PR enables collecting node-level metrics on-demand. This is useful because it allows the kubelet to respond to resource pressure more quickly.
**Which issue(s) this PR fixes**:
Ref: #51745
**Release note**:
```release-note
NONE
```
/sig node
/priority important-soon
/kind bug
/assign @vishh @derekwaynecarr
cc @tallclair
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fixed the some typo in eviction_manager
**What this PR does / why we need it**:
fixed some wrong typo in `eviction_manager.go`
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add stub device plugin for conformance e2e test
**What this PR does / why we need it**:
Add stub device plugin for conformance e2e test
- extend [device_plugin_stub](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/deviceplugin/device_plugin_stub.go) to support e2e test
- add test suite with this device-plugin-stub
- simulate more use cases by deploying some pods to request these resources
**Which issue this PR fixes**:
fixes#52861
**Special notes for your reviewer**:
@vishh @jiayingz PTAL.
**Release note**:
```release-note
None
```
Add a new Alpha Feature to set a maximum number of pids per Pod.
This is to allow the use case where cluster administrators wish
to limit the pids consumed per pod (example when running a CI system).
By default, we do not set any maximum limit, If an administrator wants
to enable this, they should enable `SupportPodPidsLimit=true` in the
`--feature-gates=` parameter to kubelet and specify the limit using the
`--pod-max-pids` parameter.
The limit set is the total count of all processes running in all
containers in the pod.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix TestCadvisorListPodStats failure under mac/darwin
**What this PR does / why we need it**:
GetPodCgroupNameSuffix is not really implemented under darwin
(or windows for that matter). So let's just skip over the check
for CPU and Memory if that is not set.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#57636
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 57823, 58091, 58093, 58096, 57020). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
ignore images in used by running containers when GC
**What this PR does / why we need it**:
Let kubelet not attempt to remove images being used by running containers.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#57006
**Special notes for your reviewer**:
@kubernetes/sig-node-pr-reviews
**Release note**:
```release-note
ignore images in used by running containers when GC
```
The `--docker-disable-shared-pid` flag will be removed once per-pod
configurable process namespace sharing becomes available. Mark it
deprecated to notify cluster admins.
Automatic merge from submit-queue (batch tested with PRs 57733, 57613, 57953). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
[eviction manager]fix type error
**What this PR does / why we need it**:
It should not wrong hint messages when create memory threshold notifier failed
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
dockershim: bump the minimum supported docker version to 1.11
Drop the 1.10 compatibilty code.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove duplicate function getBuggyHostportChain
**What this PR does / why we need it**:
remove `TODO remove this after release 1.9, please refer https://github.com/kubernetes/kubernetes/pull/55153`
function `getBuggyHostportChain` does bad conversion on HostPort from int32 to string, now that `getHostportChain` does right, we remove function `getBuggyHostportChain` .
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix the wrong code comment
**What this PR does / why we need it**:
Fix the wrong code comment
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#55608
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move scheduler out of plugin directory
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
This is but one step toward resolving the referenced issue.
/ref #57579
**Special notes for your reviewer**:
**Release note**:
```release-note
Default scheduler code is moved out of the plugin directory.
plugin/pkg/scheduler -> pkg/scheduler
plugin/cmd/kube-scheduler -> cmd/kube-scheduler
```
/sig scheduling
This moves plugin/pkg/scheduler to pkg/scheduler and
plugin/cmd/kube-scheduler to cmd/kube-scheduler.
Bulk of the work was done with gomvpkg, except for kube-scheduler main
package.
Credential provider is useful without the v1 API, move the only
dependency out so that we can more easily move credential provider to a
utility library in the future (other callers besides Kubelet may need to
load pull secrets like Docker).
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump pause container used by kubelet and tests to 3.1
This updates the version of the pause container used by the kubelet and
various test utilities to 3.1.
**What this PR does / why we need it**: The pause container hasn't been rebuilt in quite a while and needs an update to reap zombies (#50865) and for schema2 manifest (#56253).
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#50865, Fixes#56253
**Special notes for your reviewer**:
**Release note**:
```release-note
The kubelet uses a new release 3.1 of the pause container with the Docker runtime. This version will clean up orphaned zombie processes that it inherits.
```
Automatic merge from submit-queue (batch tested with PRs 57533, 57524). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Make ConfigOK status messages more human readable
This makes the ConfigOK status messages for dynamic config more human readable by including the path (e.g. SelfLink) to the object. The messages used to include the UID, but this was kind of useless, because there's no way to GET an object by UID.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 57533, 57524). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
periodically check whether assigned kubelet config should become last-known-good
Fixes#57808
Previously, the last-known-good was only updated on Kubelet restart. This has been on my todo list for a while, good to finally have a PR up.
Previously we could have this scenario, which is fixed by this PR:
- lkg is set to local
- we set config A
- config A passes trial period, but nothing caused Kubelet to restart
- we set config B, which turns out to be invalid
- Kubelet will fall back to local, because lkg was never updated
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update gengo version to include goimports formatter
Update gengo which now uses goimports to format code and organize imports.
Fixes#55542
**Special notes for your reviewer**:
Updates version of k8s.io/gengo
Takes new dependency on golang.org/x/tools/imports and golang.org/x/tools/go/ast/astutil
**Release Notes**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 57572, 57512, 57770). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
More default fixups for Kubelet flags
Similar to #57621, this fixes some other Kubelet flags that were
defaulted wrong.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 57746, 57621, 56839, 57464). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
security_context_test.go(TestVerifyRunAsNonRoot): add more test cases
**What this PR does / why we need it**:
In #56503 we modified `VerifyRunAsNonRoot` function add add one more argument. As [was requested](https://github.com/kubernetes/kubernetes/pull/56503#discussion_r153870821) by @simo5, this change should have a unit test.
This PR adds this test and also some more to cover more execution paths.
**Release note**:
```release-note
NONE
```
PTAL @pweil- @liggitt
CC @simo5
Automatic merge from submit-queue (batch tested with PRs 57651, 56411, 56779, 57523, 57624). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Replace --init-config-dir with --config
Rather than a directory with magic names, just give the Kubelet a file path.
Was originally in #55718, but I'm splitting it out for clarity.
Fixes#57763
```release-note
The alpha `--init-config-dir` flag has been removed. Instead, use the `--config` flag to reference a kubelet configuration file directly.
```
Automatic merge from submit-queue (batch tested with PRs 49856, 56257, 57027, 57695, 57432). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Change to pkg/util/node.UpdateNodeStatus
**What this PR does / why we need it**:
> // TODO: Change to pkg/util/node.UpdateNodeStatus.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
/cc @brendandburns @dchen1107 @lavalamp
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
new testcase to cgroup_manager_linux.go
a new test case to adaptName(), for testing "cgroupManagerType != libcontainerSystemd"