Automatic merge from submit-queue (batch tested with PRs 54826, 53576, 55591, 54946, 54825). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached
- Instead of the old `Accelerators` feature that added `alpha.kubernetes.io/nvidia-gpu` resource, use the new `DevicePlugins` feature that adds vendor specific resources. (In case of nvidia GPUs it will
add `nvidia.com/gpu` resource.)
- Add node label to GCE nodes with accelerators attached. This node label is the same as what GKE attaches to node pools with accelerators attached. (For example, for nvidia-tesla-p100 GPU, the label would be `cloud.google.com/gke-accelerator=nvidia-tesla-p100`) This will help us target accelerator specific
daemonsets etc. to these nodes.
- Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached.
- Some minor documentation improvements in addon manager.
**Release note**:
```release-note
GCE nodes with NVIDIA GPUs attached now expose `nvidia.com/gpu` as a resource instead of `alpha.kubernetes.io/nvidia-gpu`.
```
/sig cluster-lifecycle
/sig scheduling
/area hw-accelerators
https://github.com/kubernetes/features/issues/368
Automatic merge from submit-queue (batch tested with PRs 54460, 55258, 54858, 55506, 55510). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix CRI fluentd config.
This should fix the cri-containerd stackdriver test failure:
```
Cluster level logging implemented by Stackdriver should ingest logs
```
I copied the pattern from a comment previously. However, it doesn't actually work properly. `\b` only matches word boundary, and seems to match the boundary of previous word in our case.
That's why we get the log with a leading space:
```
Nov 10 18:39:11.661: INFO: Unexpected error occurred: log entry ingested incorrectly, got --> <--I0101 00:00:00.000000 1 main.go:1] Text, want Text
```
@kubernetes/sig-node-bugs @kubernetes/sig-instrumentation-bugs
Signed-off-by: Lantao Liu <lantaol@google.com>
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 54987, 55221, 54099, 55144, 54215). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
extracted elasticsearch-logging service name as environment variable
**What this PR does / why we need it**:
Deploying the cluster-addon fluentd-elasticsearch with customized resource definitions can cause elasticsearch discovery to fail because the service name `elasticsearch-logging` is hard-coded in cluster/addons/fluentd-elasticsearch/es-image/elasticsearch_logging_discovery.go
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
-> none yet
**Special notes for your reviewer**:
The name of the environment variable is ELASTICSEARCH_SERVICE_NAME. When non is given the fallback service-name fallback is `elasticsearch-logging`
```release-note
[fluentd-elasticsearch addon] Elasticsearch service name can be overridden via env variable ELASTICSEARCH_SERVICE_NAME
```
Automatic merge from submit-queue (batch tested with PRs 53866, 54852, 55178, 55185, 55130). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adjust resources for Metrics Server
**What this PR does / why we need it**:
This PR adjusts resources set for Metrics Server by Pod Nanny to reduce resources usage by core Kubernetes components when enabling Metrics Server. In Kubernetes 1.8 Metrics Server is used only by HPAv2, other use-cases are covered by Heapster.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 53645, 54734, 54586, 55015, 54688). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Disable the grace termination period for the calico/node pod
**What this PR does / why we need it**:
Disable the termination grace period for the calico/node add-on DaemonSet. The grace period is unnecessary for calico/node and it delays restart of a new calico/node pod to take over routing and policy updates.
Setting the grace period to 0 has the special meaning of doing a force deletion, which avoids a slow round-trip through the kubelet and API server.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#55013
**Special notes for your reviewer**:
**Release note**:
```release-note
Disable the termination grace period for the calico/node add-on DaemonSet to reduce downtime during a rolling upgrade or deletion.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Reduce metadata-proxy cpu requests to 30m
After the recent change enabling metadata-proxy in tests (https://github.com/kubernetes/kubernetes/pull/54150) we started seeing problems with scheduling cluster autoscaler on master. Metadata-proxy eats all of the available space leaving nothing for CA to run on.
This PR reduces the cpu requests for metadata-proxy allowing other components to fit in.
cc: @kubernetes/sig-autoscaling-bugs
The grace period is unneccessary for calico/node and it delays restart of
a new calico/node pod to take over routing and policy updates.
Setting the grace period to 0 has the special meaning of doing a force deletion,
which avoids a slow round-trip through the kubelet and API server.
Fixes#55013
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
update rbac apiversion
**What this PR does / why we need it**:
update rbac apiversion to v1
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```NONE
```
Automatic merge from submit-queue (batch tested with PRs 54635, 54250, 54657, 54696, 54700). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump version of prometheus-to-sd to 0.2.2.
Bump version of prometheus-to-sd to improve logging, add pod_name and
pod_namespace flags and remove deprecated flags.
Fixes#54583
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50776, 54395). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move fluentd-gcp out of host network
Since metadata proxy doesn't filter service account after all, make fluentd-gcp addon run in its own network
This will mitigate the problem with port collision
```release-note
[fluentd-gcp addon] Fluentd now runs in its own network, not in the host one.
```
Automatic merge from submit-queue (batch tested with PRs 54419, 53545). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Updating Calico to v2.6.1
**What this PR does / why we need it**:
Updating Calico to the most recent release v2.6.1.
[Release page](https://docs.projectcalico.org/v2.6/releases/) and [blog post](https://www.projectcalico.org/project-calico-2-6-released/)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 53946, 53993, 54315, 54143, 54532). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
RBAC for Calico Typha Horizontal Autoscaler
**What this PR does / why we need it**:
On v1.8.0-gke.1 I noticed a number of RBAC failures for `default` in kube-system. Turns out the only container missing the serviceAccountName was the typha-horizontal-autoscaler.
**Special notes for your reviewer**:
cc @caseydavenport seems like this is up your alley
**Release note**:
```release-note
NONE
```
- Use a dedicated service account to run the fluentd-gcp DS
- Update prometheus-to-sd from v0.1.3 to v0.2.1
- Use the certificates in the prometheus-to-sd image rather than mounting the host certs
Automatic merge from submit-queue (batch tested with PRs 52003, 54559, 54518). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Load kernel modules automatically inside a kube-proxy pod
**What this PR does / why we need it**:
This change will mount `/lib/modules` on host to the kube-proxy pod,
so that a kube-proxy pod can load kernel modules by need
or when `modprobe <kmod>` is run inside the pod.
This will be convenient for kube-proxy running in IPVS mode.
Users will don't have to run `modprobe ip_vs` on nodes before starting
a kube-proxy pod.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
The kube-proxy IPVS proxier will check if the kernel supports IPVS, or it will fallback to iptables or userspace modes. There is a false negative condition in the check, #51874 addressed that issue.
**Release note**:
```release-note
Load kernel modules automatically inside a kube-proxy pod
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
[addon/storage-class] update storageclass groupversion in storage-class
**What this PR does / why we need it**:
[addon/storage-class] update storageclass groupversion in storage-class
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```NONE
```
```release-notes
* Logging cleanups
* Updates kube-dns to use client-go 3
* Updates containers to use alpine as the base image on all platforms
* Adds support for IPv6
```
Automatic merge from submit-queue (batch tested with PRs 53106, 52193, 51250, 52449, 53861). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kube-dns-anti-affinity: kube-dns never-co-located-in-the-same-node
**What this PR does / why we need it**:
This is upstreaming the kubernetes/kops#2705 pull request by @jamesbucher that was originally against [kops](github.com/kubernetes/kops).
Please see kubernetes/kops#2705 for more details, including a lengthy discussion.
Briefly, given the constraints of how the system works today:
+ if you need multiple DNS pods primarily for availability, then requiredDuringSchedulingIgnoredDuringExecution makes sense because putting more than one DNS pod on the same node isn't useful
+ if you need multiple DNS pods primarily for performance, then
preferredDuringScheduling IgnoredDuringExecution makes sense because it will allow the DNS pods to schedule even if they can't be spread across nodes
**Which issue this PR fixes**
fixeskubernetes/kops#2693
**Release note**:
```release-note
Improve resilience by annotating kube-dns addon with podAntiAffinity to prefer scheduling on different nodes.
```
Automatic merge from submit-queue (batch tested with PRs 52883, 52183, 53915, 53848). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
[GCE kube-up] Don't provision kubeconfig file for kube-proxy service account
**What this PR does / why we need it**:
Offloading the burden of provisioning kubeconfig file for kube-proxy service account from GCE startup scripts. This also helps us decoupling kube-proxy daemonset upgrade from node upgrade.
Previous attempt on https://github.com/kubernetes/kubernetes/pull/51172, using InClusterConfig for kube-proxy based on discussions on https://github.com/kubernetes/client-go/issues/281.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE
**Special notes for your reviewer**:
/assign @bowei @thockin
cc @luxas @murali-reddy
**Release note**:
```release-note
NONE
```
This change will mount `/lib/modules` on host to the kube-proxy pod,
so that a kube-proxy pod can load kernel modules by need
or when `modprobe <kmod>` is run inside the pod.
This will be convenient for kube-proxy running in IPVS mode.
Users will don't have to run `modprobe ip_vs` on nodes before starting
a kube-proxy pod.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use `--oom-score-adj` flag for kube-proxy
**What this PR does / why we need it**:
Replace `echo -998 > /proc/$$$/oom_score_adj` with `--oom-score-adj` flag for kube-proxy.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51083
**Special notes for your reviewer**:
/assign @justinsb @vishh
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fluentd-elasticsearch add-on: Rename Docker image tag
As @crassirostris requested in #53307 - rename tag of Docker image gcr.io/google-containers/elasticsearch to drop -1 suffix.