Introduced chages:
1. Re-writing of the resolv.conf file generated by docker.
Cluster dns settings aren't passed anymore to docker api in all cases, not only for pods with host network:
the resolver conf will be overwritten after infra-container creation to override docker's behaviour.
2. Added new one dnsPolicy - 'ClusterFirstWithHostNet', so now there are:
- ClusterFirstWithHostNet - use dns settings in all cases, i.e. with hostNet=true as well
- ClusterFirst - use dns settings unless hostNetwork is true
- Default
Fixes#17406
Automatic merge from submit-queue
Extend experimental support to multiple Nvidia GPUs
Extended from #28216
```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with support for multiple Nvidia GPUs.
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```
1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```
TODO:
- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package
Automatic merge from submit-queue (batch tested with PRs 41921, 41695, 42139, 42090, 41949)
Unify fake runtime helper in kuberuntime, rkt and dockertools.
Addresses https://github.com/kubernetes/kubernetes/pull/42081#issuecomment-282429775.
Add `pkg/kubelet/container/testing/fake_runtime_helper.go`, and change `kuberuntime`, `rkt` and `dockertools` to use it.
@yujuhong This is a small unit test refactoring PR. Could you help me review it?
Automatic merge from submit-queue (batch tested with PRs 38676, 41765, 42103, 41833, 41702)
kubelet: cm: refactor QoS logic into seperate interface
This commit has no functional change. It refactors the QoS cgroup logic into a new `QOSContainerManager` interface to allow for better isolation for QoS cgroup features coming down the pike.
This is a breakout of the refactoring component of my QoS memory limits PR https://github.com/kubernetes/kubernetes/pull/41149 which will need to be rebased on top of this.
@vishh @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 42162, 41973, 42015, 42115, 41923)
Increase Min Timeout for kill pod
Should mitigate #41347, which describes flakes in the inode eviction test due to "GracePeriodExceeded" errors.
When we use gracePeriod == 0, as we do in eviction, the pod worker currently sets a timeout of 2 seconds to kill a pod.
We are hitting this timeout fairly often during eviction tests, causing extra pods to be evicted (since the eviction manager "fails" to evict that pod, and kills the next one).
This PR increases the timeout from 2 seconds to 4, although we could increase it even more if we think that would be appropriate.
cc @yujuhong @vishh @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 41116, 41804, 42104, 42111, 42120)
Remove SandboxReceived event
This PR removes SandboxReceived event in sync pod.
> This event seems somewhat meaningless, and clouds the event records for a pod. Do we actually need it? Pulling and pod received on the node are very relevant, this seems much less so. Would suggest we either remove it, or turn it into a message that clearly indicates why it has value.
Refer d65309399a (commitcomment-21052453).
cc @smarterclayton @yujuhong
Automatic merge from submit-queue (batch tested with PRs 41116, 41804, 42104, 42111, 42120)
Add support for attacher/detacher interface in Flex volume
Add support for attacher/detacher interface in Flex volume
This change breaks backward compatibility and requires to be release noted.
```release-note
Flex volume plugin is updated to support attach/detach interfaces. It broke backward compatibility. Please update your drivers and implement the new callouts.
```
Automatic merge from submit-queue (batch tested with PRs 41962, 42055, 42062, 42019, 42054)
dockershim puts pause container in pod cgroup
**What this PR does / why we need it**:
The CRI was not launching the pause container in the pod level cgroup. The non-CRI code path was.
Automatic merge from submit-queue
Admit critical pods under resource pressure
And evict critical pods that are not static.
Depends on #40952.
For #40573
Automatic merge from submit-queue (batch tested with PRs 41994, 41969, 41997, 40952, 40576)
Guaranteed admission for Critical Pods
This is the first step in implementing node-level preemption for critical pods.
It defines the AdmissionFailureHandler interface, which allows callers, like the kubelet, to define how failed predicates are handled, and take steps to correct failures if necessary.
In the kubelet's implementation, it triggers preemption if the pod being admitted is critical, and if the only failed predicates are InsufficientResourceErrors, then it prempts (not yet implemented) other other pods to allow admission of the critical pod.
cc: @vishh
Automatic merge from submit-queue (batch tested with PRs 41814, 41922, 41957, 41406, 41077)
Use consistent helper for getting secret names from pod
Kubelet secret-manager and mirror-pod admission both need to know what secrets a pod spec references. Eventually, a node authorizer will also need to know the list of secrets.
This creates a single (well, double, because api versions) helper that can be used to traverse the secret names referenced from a pod, optionally short-circuiting (for places that are just looking to see if any secrets are referenced, like admission, or are looking for a particular secret ref, like authorization)
Fixes:
* secret manager not handling secrets used by env/envFrom in initcontainers
* admission allowing mirror pods with secret references
@smarterclayton @wojtek-t
Automatic merge from submit-queue (batch tested with PRs 41621, 41946, 41941, 41250, 41729)
bug fix for hostport-syncer
fix a bug introduced by the previous refactoring of hostport-syncer. https://github.com/kubernetes/kubernetes/pull/39443
and fix some nits
Automatic merge from submit-queue
BestEffort QoS class has min cpu shares
**What this PR does / why we need it**:
BestEffort QoS class is given the minimum amount of CPU shares per the QoS design.
Automatic merge from submit-queue (batch tested with PRs 41714, 41510, 42052, 41918, 31515)
Switch scheduler to use generated listers/informers
Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).
I think this can wait until master is open for 1.7 pulls, given that we're close to the 1.6 freeze.
After this and #41482 go in, the only code left that references legacylisters will be federation, and 1 bit in a stateful set unit test (which I'll clean up in a follow-up).
@resouer I imagine this will conflict with your equivalence class work, so one of us will be doing some rebasing 😄
cc @wojtek-t @gmarek @timothysc @jayunit100 @smarterclayton @deads2k @liggitt @sttts @derekwaynecarr @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-scalability-pr-reviews
Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).
Automatic merge from submit-queue (batch tested with PRs 41812, 41665, 40007, 41281, 41771)
Kubelet-rkt: Add useful informations for Ops on the Kubelet Host
Create a Systemd SyslogIdentifier inside the [Service]
Create a Systemd Description inside the [Unit]
**What this PR does / why we need it**:
#### Overview
Logged against the host, it's difficult to identify who's who.
This PR add useful information to quickly get straight to the point with the **DESCRIPTION** field:
```
systemctl list-units "k8s*"
UNIT LOAD ACTIVE SUB DESCRIPTION
k8s_b5a9bdf7-e396-4989-8df0-30a5fda7f94c.service loaded active running kube-controller-manager-172.20.0.206
k8s_bec0d8a1-dc15-4b47-a850-e09cf098646a.service loaded active running nginx-daemonset-gxm4s
k8s_d2981e9c-2845-4aa2-a0de-46e828f0c91b.service loaded active running kube-apiserver-172.20.0.206
k8s_fde4b0ab-87f8-4fd1-b5d2-3154918f6c89.service loaded active running kube-scheduler-172.20.0.206
```
#### Overview and Journal
Always on the host, to easily retrieve the pods logs, this PR add a SyslogIdentifier named as the PodBaseName.
```
# A DaemonSet prometheus-node-exporter is running on the Kubernetes Cluster
systemctl list-units "k8s*" | grep prometheus-node-exporter
k8s_c60a4b1a-387d-4fce-afa1-642d6f5716c1.service loaded active running prometheus-node-exporter-85cpp
# Get the logs from the prometheus-node-exporter DaemonSet
journalctl -t prometheus-node-exporter | wc -l
278
```
Sadly the `journalctl` flag `-t` / `--identifier` doesn't allow a pattern to catch the logs.
Also this field improve any queries made by any tools who exports the Journal (E.g: ES, Kibana):
```
{
"__CURSOR" : "s=86fd390d123b47af89bb15f41feb9863;i=164b2c27;b=7709deb3400841009e0acc2fec1ebe0e;m=1fe822ca4;t=54635e6a62285;x=b2d321019d70f36f",
"__REALTIME_TIMESTAMP" : "1484572200411781",
"__MONOTONIC_TIMESTAMP" : "8564911268",
"_BOOT_ID" : "7709deb3400841009e0acc2fec1ebe0e",
"PRIORITY" : "6",
"_UID" : "0",
"_GID" : "0",
"_SYSTEMD_SLICE" : "system.slice",
"_SELINUX_CONTEXT" : "system_u:system_r:kernel_t:s0",
"_MACHINE_ID" : "7bbb4401667243da81671e23fd8a2246",
"_HOSTNAME" : "Kubelet-Host",
"_TRANSPORT" : "stdout",
"SYSLOG_FACILITY" : "3",
"_COMM" : "ld-linux-x86-64",
"_CAP_EFFECTIVE" : "3fffffffff",
"SYSLOG_IDENTIFIER" : "prometheus-node-exporter",
"_PID" : "88827",
"_EXE" : "/var/lib/rkt/pods/run/c60a4b1a-387d-4fce-afa1-642d6f5716c1/stage1/rootfs/usr/lib64/ld-2.21.so",
"_CMDLINE" : "stage1/rootfs/usr/lib/ld-linux-x86-64.so.2 stage1/rootfs/usr/bin/systemd-nspawn [....]",
"_SYSTEMD_CGROUP" : "/system.slice/k8s_c60a4b1a-387d-4fce-afa1-642d6f5716c1.service",
"_SYSTEMD_UNIT" : "k8s_c60a4b1a-387d-4fce-afa1-642d6f5716c1.service",
"MESSAGE" : "[ 8564.909237] prometheus-node-exporter[115]: time=\"2017-01-16T13:10:00Z\" level=info msg=\" - time\" source=\"node_exporter.go:157\""
}
```
Automatic merge from submit-queue (batch tested with PRs 41146, 41486, 41482, 41538, 41784)
fix issue #41746
**What this PR does / why we need it**:
**Which issue this PR fixes** : fixes#41746
**Special notes for your reviewer**:
cc @feiskyer
Automatic merge from submit-queue (batch tested with PRs 38957, 41819, 41851, 40667, 41373)
Change taints/tolerations to api fields
This PR changes current implementation of taints and tolerations from annotations to API fields. Taint and toleration are now part of `NodeSpec` and `PodSpec`, respectively. The annotation keys: `scheduler.alpha.kubernetes.io/tolerations` and `scheduler.alpha.kubernetes.io/taints` have been removed.
**Release note**:
Pod tolerations and node taints have moved from annotations to API fields in the PodSpec and NodeSpec, respectively. Pod tolerations and node taints that are defined in the annotations will be ignored. The annotation keys: `scheduler.alpha.kubernetes.io/tolerations` and `scheduler.alpha.kubernetes.io/taints` have been removed.
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)
Enable pod level cgroups by default
**What this PR does / why we need it**:
It enables pod level cgroups by default.
**Special notes for your reviewer**:
This is intended to be enabled by default on 2/14/2017 per the plan outlined here:
https://github.com/kubernetes/community/pull/314
**Release note**:
```release-note
Each pod has its own associated cgroup by default.
```