Automatic merge from submit-queue (batch tested with PRs 54040, 52503). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Get fallback termination msg from docker when using journald log driver
**What this PR does / why we need it**:
When using the legacy docker container runtime and when a container has `terminationMessagePolicy=FallbackToLogsOnError` and when docker is configured with a log driver other than `json-log` (such as `journald`), the kubelet should not try to get the container's log from the json log file (since it's not there) but should instead ask docker for the logs.
**Which issue this PR fixes** fixes#52502
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixed log fallback termination messages when using docker with journald log driver
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix pkg/ depends on cmd/ problems
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Partial fix for https://github.com/kubernetes/kubernetes/issues/53341
**Special notes for your reviewer**:
No logic changes, Just moving things around
**Release note**:
```release-note
NONE
```
Revert "Merge pull request #51857 from kubernetes/revert-51307-kc-type-refactor"
This reverts commit 9d27d92420, reversing
changes made to 2e69d4e625.
See original: #51307
We punted this from 1.8 so it could go through an API review. The point
of this PR is that we are trying to stabilize the kubeletconfig API so
that we can move it out of alpha, and unblock features like Dynamic
Kubelet Config, Kubelet loading its initial config from a file instead
of flags, kubeadm and other install tools having a versioned API to rely
on, etc.
We shouldn't rev the version without both removing all the deprecated
junk from the KubeletConfiguration struct, and without (at least
temporarily) removing all of the fields that have "Experimental" in
their names. It wouldn't make sense to lock in to deprecated fields.
"Experimental" fields can be audited on a 1-by-1 basis after this PR,
and if found to be stable (or sufficiently alpha-gated), can be restored
to the KubeletConfiguration without the "Experimental" prefix.
Automatic merge from submit-queue (batch tested with PRs 53403, 53233). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove containers from deleted pods once containers have exited
Issue #51899
Since container deletion is currently done through periodic garbage collection every 30 seconds, it takes a long time for pods to be deleted, and causes the kubelet to send all delete pod requests at the same time, which has performance issues. This PR makes the kubelet actively remove containers of deleted pods rather than wait for them to be removed in periodic garbage collection.
/release-note-none
Automatic merge from submit-queue (batch tested with PRs 51021, 53225, 53094, 53219). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Change ImageGCManage to consume ImageFS stats from StatsProvider
Fixes#53083.
**Release note**:
```
Change ImageGCManage to consume ImageFS stats from StatsProvider
```
/assign @Random-Liu
When using the legacy docker container runtime and when a container has
terminationMessagePolicy=FallbackToLogsOnError and when docker is
configured with a log driver other than json-log (such as journald),
the kubelet should not try to get the container's log from the
json log file (since it's not there) but should instead ask docker for
the logs.
Stop supporting the "nsenter" exec handler. Only the Docker native exec
handler is supported.
The flag was deprecated in Kubernetes 1.6 and is safe to remove
in Kubernetes 1.9 according to the deprecation policy.
Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Eliminate hangs/throttling of node heartbeat
Fixes https://github.com/kubernetes/kubernetes/issues/48638Fixes#50304
Stops kubelet from wedging when updating node status if unable to establish tcp connection.
Notes that this only affects the node status loop. The pod sync loop would still hang until the dead TCP connections timed out, so more work is needed to keep the sync loop responsive in the face of network issues, but this change lets existing pods coast without the node controller trying to evict them
```release-note
kubelet to master communication when doing node status updates now has a timeout to prevent indefinite hangs
```
Automatic merge from submit-queue (batch tested with PRs 51186, 50350, 51751, 51645, 51837)
Set up DNS server in containerized mounter path
During NFS/GlusterFS mount, it requires to have DNS server to be able to
resolve service name. This PR gets the DNS server ip from kubelet and
add it to the containerized mounter path. So if containerized mounter is
used, service name could be resolved during mount
**Release note**:
```release-note
Allow DNS resolution of service name for COS using containerized mounter. It fixed the issue with DNS resolution of NFS and Gluster services.
```
Automatic merge from submit-queue (batch tested with PRs 51186, 50350, 51751, 51645, 51837)
Wait for container cleanup before deletion
We should wait to delete pod API objects until the pod's containers have been cleaned up. See issue: #50268 for background.
This changes the kubelet container gc, which deletes containers belonging to pods considered "deleted".
It adds two conditions under which a pod is considered "deleted", allowing containers to be deleted:
Pods where deletionTimestamp is set, and containers are not running
Pods that are evicted
This PR also changes the function PodResourcesAreReclaimed by making it return false if containers still exist.
The eviction manager will wait for containers of previous evicted pod to be deleted before evicting another pod.
The status manager will wait for containers to be deleted before removing the pod API object.
/assign @vishh
During NFS/GlusterFS mount, it requires to have DNS server to be able to
resolve service name. This PR gets the DNS server ip from kubelet and
add it to the containerized mounter path. So if containerized mounter is
used, service name could be resolved during mount
Kubelet makes sure that /var/lib/kubelet is rshared when it starts.
If not, it bind-mounts it with rshared propagation to containers
that mount volumes to /var/lib/kubelet can benefit from mount propagation.
Automatic merge from submit-queue (batch tested with PRs 50932, 49610, 51312, 51415, 50705)
Deprecation warnings for auto detecting cloud providers
**What this PR does / why we need it**:
Adds deprecation warnings for auto detecting cloud providers. As part of the initiative for out-of-tree cloud providers, this feature is conflicting since we're shifting the dependency of kubernetes core into cAdvisor. In the future kubelets should be using `--cloud-provider=external` or no cloud provider at all.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50986
**Special notes for your reviewer**:
NOTE: I still have to coordinate with sig-node and kubernetes-dev to get approval for this deprecation, I'm only opening this PR since we're close to code freeze and it's something presentable.
**Release note**:
```release-note
Deprecate auto detecting cloud providers in kubelet. Auto detecting cloud providers go against the initiative for out-of-tree cloud providers as we'll now depend on cAdvisor integrations with cloud providers instead of the core repo. In the near future, `--cloud-provider` for kubelet will either be an empty string or `external`.
```
Automatic merge from submit-queue (batch tested with PRs 50932, 49610, 51312, 51415, 50705)
Implement StatsProvider interface using cadvisor
Ref: https://github.com/kubernetes/kubernetes/issues/46984
- This PR changes the `StatsProvider` interface in `pkg/kubelet/server/stats` so that it can provide container stats from either cadvisor or CRI, and the summary API can consume the stats without knowing how they are provided.
- The `StatsProvider` struct in the newly added package `pkg/kubelet/stats` implements part of the `StatsProvider` interface in `pkg/kubelet/server/stats`.
- In `pkg/kubelet/stats`,
- `stats_provider.go`: implements the node level stats and provides the entry point for this package.
- `cadvisor_stats_provider.go`: implements the container level stats using cadvisor.
- `cri_stats_provider.go`: implements the container level stats using CRI.
- `helper.go`: utility functions shared by the above three components.
- There should be no user visible behaviors change in this PR.
- A follow up PR will implement the StatsProvider interface using CRI.
**Release note**:
```
None
```
/assign @yujuhong
/assign @WIZARD-CXY
Automatic merge from submit-queue (batch tested with PRs 49284, 49555, 47639, 49526, 49724)
skip WaitForAttachAndMount for terminated pods in syncPod
Fixes https://github.com/kubernetes/kubernetes/issues/49663
I tried to tread lightly with a small localized change because this needs to be picked to 1.7 and 1.6 as well.
I suspect this has been as issue since we started unmounting volumes on pod termination https://github.com/kubernetes/kubernetes/pull/37228
xref openshift/origin#14383
@derekwaynecarr @eparis @smarterclayton @saad-ali @jwforres
/release-note-none
Automatic merge from submit-queue (batch tested with PRs 49651, 49707, 49662, 47019, 49747)
Add support for `no_new_privs` via AllowPrivilegeEscalation
**What this PR does / why we need it**:
Implements kubernetes/community#639
Fixes#38417
Adds `AllowPrivilegeEscalation` and `DefaultAllowPrivilegeEscalation` to `PodSecurityPolicy`.
Adds `AllowPrivilegeEscalation` to container `SecurityContext`.
Adds the proposed behavior to `kuberuntime`, `dockershim`, and `rkt`. Adds a bunch of unit tests to ensure the desired default behavior and that when `DefaultAllowPrivilegeEscalation` is explicitly set.
Tests pass locally with docker and rkt runtimes. There are also a few integration tests with a `setuid` binary for sanity.
**Release note**:
```release-note
Adds AllowPrivilegeEscalation to control whether a process can gain more privileges than it's parent process
```
Automatic merge from submit-queue (batch tested with PRs 48976, 49474, 40050, 49426, 49430)
Use presence of kubeconfig file to toggle standalone mode
Fixes#40049
```release-note
The deprecated --api-servers flag has been removed. Use --kubeconfig to provide API server connection information instead. The --require-kubeconfig flag is now deprecated. The default kubeconfig path is also deprecated. Both --require-kubeconfig and the default kubeconfig path will be removed in Kubernetes v1.10.0.
```
/cc @kubernetes/sig-cluster-lifecycle-misc @kubernetes/sig-node-misc
Automatic merge from submit-queue (batch tested with PRs 48976, 49474, 40050, 49426, 49430)
Remove duplicated import and wrong alias name of api package
**What this PR does / why we need it**:
**Which issue this PR fixes**: fixes#48975
**Special notes for your reviewer**:
/assign @caesarxuchao
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Remove flags low-diskspace-threshold-mb and outofdisk-transition-frequency
issue: #48843
This removes two flags replaced by the eviction manager. These have been depreciated for two releases, which I believe correctly follows the kubernetes depreciation guidelines.
```release-note
Remove depreciated flags: --low-diskspace-threshold-mb and --outofdisk-transition-frequency, which are replaced by --eviction-hard
```
cc @mtaufen since I am changing kubelet flags
cc @vishh @derekwaynecarr
/sig node
Automatic merge from submit-queue (batch tested with PRs 48636, 49088, 49251, 49417, 49494)
Fix issues for local storage allocatable feature
This PR fixes the following issues:
1. Use ResourceStorageScratch instead of ResourceStorage API to represent
local storage capacity
2. In eviction manager, use container manager instead of node provider
(kubelet) to retrieve the node capacity and reserved resources. Node
provider (kubelet) has a feature gate so that storagescratch information
may not be exposed if feature gate is not set. On the other hand,
container manager has all the capacity and allocatable resource
information.
This PR fixes issue #47809
Replaces use of --api-servers with --kubeconfig in Kubelet args across
the turnup scripts. In many cases this involves generating a kubeconfig
file for the Kubelet and placing it in the correct location on the node.
Automatic merge from submit-queue (batch tested with PRs 48981, 47316, 49180)
Added golint check for pkg/kubelet.
**What this PR does / why we need it**:
Added golint check for pkg/kubelet, and make golint happy.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47315
**Release note**:
```release-note-none
```
This PR fixes the following issues:
1. Use ResourceStorageScratch instead of ResourceStorage API to represent
local storage capacity
2. In eviction manager, use container manager instead of node provider
(kubelet) to retrieve the node capacity and reserved resources. Node
provider (kubelet) has a feature gate so that storagescratch information
may not be exposed if feature gate is not set. On the other hand,
container manager has all the capacity and allocatable resource
information.
Since v1.5 and the removal of --configure-cbr0:
0800df74ab "Remove the legacy networking mode --configure-cbr0"
kubelet hasn't done any shaping operations internally. They
have all been delegated to network plugins like kubenet or
external CNI plugins. But some shaping code was still left
in kubelet, so remove it now that it's unused.
Centralize Capacity discovery of standard resources in Container manager.
Have storage derive node capacity from container manager.
Move certain cAdvisor interfaces to the cAdvisor package in the process.
This patch fixes a bug in container manager where it was writing to a map without synchronization.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Automatic merge from submit-queue (batch tested with PRs 47694, 47772, 47783, 47803, 47673)
Make different container runtimes constant
Make different container runtimes constant to avoid hardcode
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47000, 47188, 47094, 47323, 47124)
fix sync loop health check
This PR will do error logging about the fall behind sync for kubelet instead of sync loop healthz checking.
The reason is kubelet can not do sync loop and therefore can not update sync loop time when there is any runtime error, such as docker hung.
When there is any runtime error, according to current implementation, kubelet will not do sync operation and thus kubelet's sync loop time will not be updated. This will make when there is any runtime error, kubelet will also return non 200 response status code when accessing healthz endpoint. This is contrary with #37865 which prevents kubelet from being killed when docker hangs.
**Release note**:
```release-note
fix sync loop health check with seperating runtime errors
```
/cc @yujuhong @Random-Liu @dchen1107
This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.
Automatic merge from submit-queue (batch tested with PRs 46635, 45619, 46637, 45059, 46415)
Certificate rotation for kubelet server certs.
Replaces the current kubelet server side self signed certs with certs signed by
the Certificate Request Signing API on the API server. Also renews expiring
kubelet server certs as expiration approaches.
Two Points:
1. With `--feature-gates=RotateKubeletServerCertificate=true` set, the kubelet will
request a certificate during the boot cycle and pause waiting for the request to
be satisfied.
2. In order to have the kubelet's certificate signing request auto approved,
`--insecure-experimental-approve-all-kubelet-csrs-for-group=` must be set on
the cluster controller manager. There is an improved mechanism for auto
approval [proposed](https://github.com/kubernetes/kubernetes/issues/45030).
**Release note**:
```release-note
With `--feature-gates=RotateKubeletServerCertificate=true` set, the kubelet will
request a server certificate from the API server during the boot cycle and pause
waiting for the request to be satisfied. It will continually refresh the certificate as
the certificates expiration approaches.
```
Replaces the current kubelet server side self signed certs with certs
signed by the Certificate Request Signing API on the API server. Also
renews expiring kubelet server certs as expiration approaches.
Automatic merge from submit-queue (batch tested with PRs 46124, 46434, 46089, 45589, 46045)
Support TCP type runtime endpoint for kubelet
**What this PR does / why we need it**:
Currently the grpc server for kubelet and dockershim has a hardcoded endpoint: unix socket '/var/run/dockershim.sock', which is not applicable on non-unix OS.
This PR is to support TCP endpoint type besides unix socket.
**Which issue this PR fixes**
This is a first attempt to address issue https://github.com/kubernetes/kubernetes/issues/45927
**Special notes for your reviewer**:
Before this change, running on Windows node results in:
```
Container Manager is unsupported in this build
```
After adding the cm stub, error becomes:
```
listen unix /var/run/dockershim.sock: socket: An address incompatible with the requested protocol was used.
```
This PR is to fix those two issues.
After this change, still meets 'seccomp' related issue when running on Windows node, needs more updates later.
**Release note**:
Automatic merge from submit-queue
Fix some typo of comment in kubelet.go
**What this PR does / why we need it**:
The PR is to fix some typo in kubelet.go
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
N/A
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 46022, 46055, 45308, 46209, 43590)
Eviction does not evict unless the previous pod has been cleaned up
Addresses #43166
This PR makes two main changes:
First, it makes the eviction loop re-trigger immediately if there may still be pressure. This way, if we already waited 10 seconds to delete a pod, we dont need to wait another 10 seconds for the next synchronize call.
Second, it waits for the pod to be cleaned up (including volumes, cgroups, etc), before moving on to the next synchronize call. It has a timeout for this operation currently set to 30 seconds.
Automatic merge from submit-queue
Reorganize kubelet tree so apis can be independently versioned
@yujuhong @lavalamp @thockin @bgrant0607
This is an example of how we might reorganize `pkg/kubelet` so the apis it exposes can be independently versioned. This would also provide a logical place to put the `KubeletConfiguration` type, which currently lives in `pkg/apis/componentconfig`; it could live in e.g. `pkg/kubelet/apis/config` instead.
Take a look when you have a chance and let me know what you think. The most significant change in this PR is reorganizing `pkg/kubelet/api` to `pkg/kubelet/apis`, the rest is pretty much updating import paths and `BUILD` files.
Automatic merge from submit-queue (batch tested with PRs 45685, 45572, 45624, 45723, 45733)
Remove unused fields from Kubelet struct
Just a small attempt to clean up some unused fields in the kubelet struct. This doesn't make any actual code changes.
/assign @mtaufen
Automatic merge from submit-queue
Enable shared PID namespace by default for docker pods
**What this PR does / why we need it**: This PR enables PID namespace sharing for docker pods by default, bringing the behavior of docker in line with the other CRI runtimes when used with docker >= 1.13.1.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: ref #1615
**Special notes for your reviewer**: cc @dchen1107 @yujuhong
**Release note**:
```release-note
Kubernetes now shares a single PID namespace among all containers in a pod when running with docker >= 1.13.1. This means processes can now signal processes in other containers in a pod, but it also means that the `kubectl exec {pod} kill 1` pattern will cause the pod to be restarted rather than a single container.
```
Automatic merge from submit-queue (batch tested with PRs 45453, 45307, 44987)
Migrate the docker client code from dockertools to dockershim
Move docker client code from dockertools to dockershim/libdocker. This includes
DockerInterface (renamed to Interface), FakeDockerClient, etc.
This is part of #43234
Automatic merge from submit-queue (batch tested with PRs 45508, 44258, 44126, 45441, 45320)
cloud initialize node in external cloud controller
@thockin This PR adds support in the `cloud-controller-manager` to initialize nodes (instead of kubelet, which did it previously)
This also adds support in the kubelet to skip node cloud initialization when `--cloud-provider=external`
Specifically,
Kubelet
1. The kubelet has a new flag called `--provider-id` which uniquely identifies a node in an external DB
2. The kubelet sets a node taint - called "ExternalCloudProvider=true:NoSchedule" if cloudprovider == "external"
Cloud-Controller-Manager
1. The cloud-controller-manager listens on "AddNode" events, and then processes nodes that starts with that above taint. It performs the cloud node initialization steps that were previously being done by the kubelet.
2. On addition of node, it figures out the zone, region, instance-type, removes the above taint and updates the node.
3. Then periodically queries the cloudprovider for node addresses (which was previously done by the kubelet) and updates the node if there are new addresses
```release-note
NONE
```
Automatic merge from submit-queue
adds log when gpuManager.start() failed
If gpuManager.start() returns error, there is no log.
We confused with scheduler do not schedule any pod(with gpu) to one node.
kubectl describe node xxx shows there is no gpu on that node, because the gpu driver do not work on that node, gpuManager.start() failed, but we can not see anything in log.
Automatic merge from submit-queue (batch tested with PRs 45316, 45341)
Pass NoOpLegacyHost to dockershim in --experimental-dockershim mode
This allows dockershim to use network plugins, if needed.
/cc @Random-Liu