This PR fixes the following issues:
1. Use ResourceStorageScratch instead of ResourceStorage API to represent
local storage capacity
2. In eviction manager, use container manager instead of node provider
(kubelet) to retrieve the node capacity and reserved resources. Node
provider (kubelet) has a feature gate so that storagescratch information
may not be exposed if feature gate is not set. On the other hand,
container manager has all the capacity and allocatable resource
information.
This is used by integrators that want to perform partial overrides of
key interfaces. Refactors the run() method to fit the existing style and
preserve the existing behavior, but allow (for instance) client
bootstrap and cert refresh even when some dependencies are injected.
Automatic merge from submit-queue (batch tested with PRs 47694, 47772, 47783, 47803, 47673)
Make different container runtimes constant
Make different container runtimes constant to avoid hardcode
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47915, 47856, 44086, 47575, 47475)
kubelet should resume csr bootstrap
Right now the kubelet creates a new csr object with the same key every
time it restarts during the bootstrap process. It should resume with the
old csr object if it exists. To do this the name of the csr object must
be stable.
Issue https://github.com/kubernetes/kubernetes/issues/47855
Automatic merge from submit-queue (batch tested with PRs 47922, 47195, 47241, 47095, 47401)
Run cAdvisor on the same interface as kubelet
**What this PR does / why we need it**:
cAdvisor currently binds to all interfaces. Currently the only
solution is to use iptables to block access to the port. We
are better off making cAdvisor to bind to the interface that
kubelet uses for better security.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes#11710
**Special notes for your reviewer**:
**Release note**:
```release-note
cAdvisor binds only to the interface that kubelet is running on instead of all interfaces.
```
Right now the kubelet creates a new csr object with the same key every
time it restarts during the bootstrap process. It should resume with the
old csr object if it exists. To do this the name of the csr object must
be stable. Also using a list watch here eliminates a race condition
where a watch event is missed and the kubelet stalls.
Automatic merge from submit-queue (batch tested with PRs 46327, 47166)
Fixed typo in comments.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes # N/A
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46327, 47166)
mark --network-plugin-dir deprecated for kubelet
**What this PR does / why we need it**:
**Which issue this PR fixes** : fixes#43967
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
It was discussed and agreed by sig-storage that
this flag causes unnecessary confusion and is hard to keep
synchornized with controller's attach/detach functionality.
cAdvisor currently binds to all interfaces. Currently the only
solution is to use iptables to block access to the port. We
are better off making cAdvisor to bind to the interface that
kubelet uses for better security.
Fixes#11710
Automatic merge from submit-queue (batch tested with PRs 43005, 46660, 46385, 46991, 47103)
Consolidate sysctl commands for kubelet
**What this PR does / why we need it**:
These commands are important enough to be in the Kubelet itself.
By default, Ubuntu 14.04 and Debian Jessie have these set to 200 and
20000. Without this setting, nodes are limited in the number of
containers that they can start.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#26005
**Special notes for your reviewer**:
I had a difficult time writing tests for this. It is trivial to create a fake sysctl for testing, but the Kubelet does not have any tests for the prior settings.
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 46972, 42829, 46799, 46802, 46844)
promote tls-bootstrap to beta
last commit of this PR.
Towards https://github.com/kubernetes/kubernetes/issues/46999
```release-note
Promote kubelet tls bootstrap to beta. Add a non-experimental flag to use it and deprecate the old flag.
```
Automatic merge from submit-queue
Add local storage (scratch space) allocatable support
This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.
This feature is part of local storage capacity isolation and described in the proposal https://github.com/kubernetes/community/pull/306
**Release note**:
```release-note
This feature exposes local storage capacity for the primary partitions, and supports & enforces storage reservation in Node Allocatable
```
Automatic merge from submit-queue (batch tested with PRs 46726, 41912, 46695, 46034, 46551)
Rotate kubelet client certificate.
Changes the kubelet so it bootstraps off the cert/key specified in the
config file and uses those to request new cert/key pairs from the
Certificate Signing Request API, as well as rotating client certificates
when they approach expiration.
Default behavior is for client certificate rotation to be disabled. If enabled
using a command line flag, the kubelet exits each time the certificate is
rotated. I tried to use `GetCertificate` in [tls.Config](https://golang.org/pkg/crypto/tls/#Config) but it is only called
on the server side of connections. Then I tried `GetClientCertificate`,
but it is new in 1.8.
**Release note**
```release-note
With --feature-gates=RotateKubeletClientCertificate=true set, the kubelet will
request a client certificate from the API server during the boot cycle and pause
waiting for the request to be satisfied. It will continually refresh the certificate
as the certificates expiration approaches.
```
Automatic merge from submit-queue (batch tested with PRs 46801, 45184, 45930, 46192, 45563)
adds log when --kubeconfig with wrong config
**What this PR does / why we need it**:
easy for troubleshooting
I have set --kubeconfig==/etc/kubernetes/kubelet.conf when copy & paste(the file path is wrong “==/etc/kubernetes/kubelet.conf”), but kubelet start with no error log. I don't know what happend.
**Release note**:
```release-note
NONE
```
This PR adds the check for local storage request when admitting pods. If
the local storage request exceeds the available resource, pod will be
rejected.
This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.
Changes the kubelet so it bootstraps off the cert/key specified in the
config file and uses those to request new cert/key pairs from the
Certificate Signing Request API, as well as rotating client certificates
when they approach expiration.
Automatic merge from submit-queue (batch tested with PRs 46076, 43879, 44897, 46556, 46654)
Local storage plugin
**What this PR does / why we need it**:
Volume plugin implementation for local persistent volumes. Scheduler predicate will direct already-bound PVCs to the node that the local PV is at. PVC binding still happens independently.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Part of #43640
**Release note**:
```
Alpha feature: Local volume plugin allows local directories to be created and consumed as a Persistent Volume. These volumes have node affinity and pods will only be scheduled to the node that the volume is at.
```
Automatic merge from submit-queue (batch tested with PRs 46635, 45619, 46637, 45059, 46415)
Certificate rotation for kubelet server certs.
Replaces the current kubelet server side self signed certs with certs signed by
the Certificate Request Signing API on the API server. Also renews expiring
kubelet server certs as expiration approaches.
Two Points:
1. With `--feature-gates=RotateKubeletServerCertificate=true` set, the kubelet will
request a certificate during the boot cycle and pause waiting for the request to
be satisfied.
2. In order to have the kubelet's certificate signing request auto approved,
`--insecure-experimental-approve-all-kubelet-csrs-for-group=` must be set on
the cluster controller manager. There is an improved mechanism for auto
approval [proposed](https://github.com/kubernetes/kubernetes/issues/45030).
**Release note**:
```release-note
With `--feature-gates=RotateKubeletServerCertificate=true` set, the kubelet will
request a server certificate from the API server during the boot cycle and pause
waiting for the request to be satisfied. It will continually refresh the certificate as
the certificates expiration approaches.
```
Automatic merge from submit-queue
kubelet: group all container-runtime-specific flags/options into a separate struct
They don't belong in the KubeletConfig.
This addresses #43253
Automatic merge from submit-queue
fixtypo
**What this PR does / why we need it**:
fix typo seperated -> separated
**Release note**:
```release-note
None
```
Replaces the current kubelet server side self signed certs with certs
signed by the Certificate Request Signing API on the API server. Also
renews expiring kubelet server certs as expiration approaches.
Automatic merge from submit-queue
Enable shared PID namespace by default for docker pods
**What this PR does / why we need it**: This PR enables PID namespace sharing for docker pods by default, bringing the behavior of docker in line with the other CRI runtimes when used with docker >= 1.13.1.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: ref #1615
**Special notes for your reviewer**: cc @dchen1107 @yujuhong
**Release note**:
```release-note
Kubernetes now shares a single PID namespace among all containers in a pod when running with docker >= 1.13.1. This means processes can now signal processes in other containers in a pod, but it also means that the `kubectl exec {pod} kill 1` pattern will cause the pod to be restarted rather than a single container.
```
Automatic merge from submit-queue (batch tested with PRs 45453, 45307, 44987)
Migrate the docker client code from dockertools to dockershim
Move docker client code from dockertools to dockershim/libdocker. This includes
DockerInterface (renamed to Interface), FakeDockerClient, etc.
This is part of #43234
Automatic merge from submit-queue (batch tested with PRs 45362, 45159, 45321, 45238)
expose kubelet authentication and authorization builders
The kubelet authentication and authorization builder methods are useful for consumers.
@liggitt
Automatic merge from submit-queue
kubelet/get-pods-from-path: correct description of implemention
**What this PR does / why we need it**:
I find this description does not follow the current implementation, it should be describe like this according to my understanding of the source code.
These commands are important enough to be in the Kubelet itself.
By default, Ubuntu 14.04 and Debian Jessie have these set to 200 and
20000. Without this setting, nodes are limited in the number of
containers that they can start.
Automatic merge from submit-queue
Delete "hard-coded" default value in flags usage.
**What this PR does / why we need it**:
Some flags of kubernetes components have "hard-coded" default values in their usage info. In fact, [pflag pkg](https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/spf13/pflag/flag.go#L602-L608) has already added a string `(default value)` automatically in the usage info if the flag is initialized. Then we don't need to hard-code the default value in usage info. After this PR, if we want to update the default value of a flag, we only need to update the flag where it is initialized. `pflag` will update the usage info for us. This will avoid inconsistency.
For example:
Before
```
kubelet -h
...
--node-status-update-frequency duration Specifies how often kubelet posts node status to master. Note: be cautious when changing the constant, it must work with nodeMonitorGracePeriod in nodecontroller. Default: 10s (default 10s)
...
```
After
```
kubelet -h
...
--node-status-update-frequency duration Specifies how often kubelet posts node status to master. Note: be cautious when changing the constant, it must work with nodeMonitorGracePeriod in nodecontroller. (default 10s)
...
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
This PR doesn't delete some "hard-coded" default values because they are not explicitly initialized. We still need to hard-code them to give users friendly info.
```
--allow-privileged If true, allow containers to request privileged mode. [default=false]
```
**Release note**:
```release-note
None
```
Automatic merge from submit-queue
Add dockershim only mode
This PR added a `experimental-dockershim` hidden flag in kubelet to run dockershim only.
We introduce this flag mainly for cri validation test. In the future we should compile dockershim into another binary.
@yujuhong @feiskyer @xlgao-zju
/cc @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue
add rancher credential provider
This adds rancher as a credential provider in kubernetes.
@erictune This might be a good opportunity to discuss adding a provision for people to have their own credential providers that is similar to the new cloud provider changes (https://github.com/kubernetes/community/pull/128). WDYT?
```
release-note
Added Rancher Credential Provider to use Rancher Registry credentials when running in a Rancher cluster
```
Kubelet flags are not necessarily appropriate for the KubeletConfiguration
object. For example, this PR also removes HostnameOverride and NodeIP
from KubeletConfiguration. This is a preleminary step to enabling Nodes
to share configurations, as part of the dynamic Kubelet configuration
feature (#29459). Fields that must be unique for each node inhibit
sharing, because their values, by definition, cannot be shared.
Automatic merge from submit-queue
kubelet: change image-gc-high-threshold below docker dm.min_free_space
docker dm.min_free_space defaults to 10%, which "specifies the min free space percent in a thin pool require for new device creation to succeed....Whenever a new a thin pool device is created (during docker pull or during container creation), the Engine checks if the minimum free space is available. If sufficient space is unavailable, then device creation fails and any relevant docker operation fails." [1]
This setting is preventing the storage usage to cross the 90% limit. However, image GC is expected to kick in only beyond image-gc-high-threshold. The image-gc-high-threshold has a default value of 90%, and hence GC never triggers. If image-gc-high-threshold is set to a value lower than (100 - dm.min_free_space)%, GC triggers.
xref https://bugzilla.redhat.com/show_bug.cgi?id=1408309
```release-note
changed kubelet default image-gc-high-threshold to 85% to resolve a conflict with default settings in docker that prevented image garbage collection from resolving low disk space situations when using devicemapper storage.
```
@derekwaynecarr @sdodson @rhvgoyal
Automatic merge from submit-queue
Use realistic value for the memory example of kube-reserved and system-reserved
Use realistic value for the memory example of kube-reserved and system-reserved
Currently, kublet help shows the memory example of
kube-reserved and system-reserved as 150G. This 150G is not realistic
value and it leads misconfiguration or confusion. This patch changes
to example value as 500Mi.
Before(same with system-reserved):
```
--kube-reserved value A set of ResourceName=ResourceQuantity (e.g. cpu=200m,memory=150G) pairs that describe resources reserved for kubernetes system components. Currently only cpu and memory are supported. See http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md for more detail. [default=none]
```
After(same with system-reserved):
```
--kube-reserved value A set of ResourceName=ResourceQuantity (e.g. cpu=200m,memory=500Mi) pairs that describe resources reserved for kubernetes system components. Currently only cpu and memory are supported. See http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md for more detail. [default=none]
```
Automatic merge from submit-queue
Eviction Manager Enforces Allocatable Thresholds
This PR modifies the eviction manager to enforce node allocatable thresholds for memory as described in kubernetes/community#348.
This PR should be merged after #41234.
cc @kubernetes/sig-node-pr-reviews @kubernetes/sig-node-feature-requests @vishh
** Why is this a bug/regression**
Kubelet uses `oom_score_adj` to enforce QoS policies. But the `oom_score_adj` is based on overall memory requested, which means that a Burstable pod that requested a lot of memory can lead to OOM kills for Guaranteed pods, which violates QoS. Even worse, we have observed system daemons like kubelet or kube-proxy being killed by the OOM killer.
Without this PR, v1.6 will have node stability issues and regressions in an existing GA feature `out of Resource` handling.
Automatic merge from submit-queue (batch tested with PRs 42443, 38924, 42367, 42391, 42310)
Dell EMC ScaleIO Volume Plugin
**What this PR does / why we need it**
This PR implements the Kubernetes volume plugin to allow pods to seamlessly access and use data stored on ScaleIO volumes. [ScaleIO](https://www.emc.com/storage/scaleio/index.htm) is a software-based storage platform that creates a pool of distributed block storage using locally attached disks on every server. The code for this PR supports persistent volumes using PVs, PVCs, and dynamic provisioning.
You can find examples of how to use and configure the ScaleIO Kubernetes volume plugin in [examples/volumes/scaleio/README.md](examples/volumes/scaleio/README.md).
**Special notes for your reviewer**:
To facilitate code review, commits for source code implementation are separated from other artifacts such as generated, docs, and vendored sources.
```release-note
ScaleIO Kubernetes Volume Plugin added enabling pods to seamlessly access and use data stored on ScaleIO volumes.
```
Automatic merge from submit-queue (batch tested with PRs 41919, 41149, 42350, 42351, 42285)
enable cgroups tiers and node allocatable enforcement on pods by default.
```release-note
Pods are launched in a separate cgroup hierarchy than system services.
```
Depends on #41753
cc @derekwaynecarr
Automatic merge from submit-queue
Extend experimental support to multiple Nvidia GPUs
Extended from #28216
```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with support for multiple Nvidia GPUs.
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```
1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```
TODO:
- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package
- Add a new type PortworxVolumeSource
- Implement the kubernetes volume plugin for Portworx Volumes under pkg/volume/portworx
- The Portworx Volume Driver uses the libopenstorage/openstorage specifications and apis for volume operations.
Changes for k8s configuration and examples for portworx volumes.
- Add PortworxVolume hooks in kubectl, kube-controller-manager and validation.
- Add a README for PortworxVolume usage as PVs, PVCs and StorageClass.
- Add example spec files
Handle code review comments.
- Modified READMEs to incorporate to suggestions.
- Add a test for ReadWriteMany access mode.
- Use util.UnmountPath in TearDown.
- Add ReadOnly flag to PortworxVolumeSource
- Use hostname:port instead of unix sockets
- Delete the mount dir in TearDown.
- Fix link issue in persistentvolumes README
- In unit test check for mountpath after Setup is done.
- Add PVC Claim Name as a Portworx Volume Label
Generated code and documentation.
- Updated swagger spec
- Updated api-reference docs
- Updated generated code under pkg/api/v1
Godeps update for Portworx Volume Driver
- Adds github.com/libopenstorage/openstorage
- Adds go.pedge.io/pb/go/google/protobuf
- Updates Godep Licenses