Automatic merge from submit-queue (batch tested with PRs 53507, 53772, 52903, 53543). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Removes TTY flag from etcd image build process
**What this PR does / why we need it**:
etcd image building fails when running without TTY with `the input device is not a TTY`
Related:
- https://stackoverflow.com/q/43099116/5285457
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Ensure base image includes the modprobe binary
**What this PR does / why we need it**:
Includes the kmod package so that "modprobe" is available for kubelet and kube-proxy.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#53396
Automatic merge from submit-queue (batch tested with PRs 47039, 53681, 53303, 53181, 53781). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use bazel to build/push kubemark image
try to get some proof of concept, kubemark image is probably simple enough to get converted to bazel. (me bazel noob still trying it out locally)
cc @BenTheElder @ixdy @shyamjvs
/release-note-none
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
[GCE kube-up] Allow creating/deleting custom network
**What this PR does / why we need it**:
From https://github.com/kubernetes/test-infra/issues/4472.
This is the first step to make PR jobs use custom network instead of auto network (so that we will be less likely hitting subnetwork quota issue).
The last commit is purely for testing out the changes on PR jobs. It will be removed after review.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE.
**Special notes for your reviewer**:
/assign @bowei @nicksardo
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump kube-dns version used in e2e
**What this PR does / why we need it**: Updates the version of kube-dns used in the e2e network tests.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: ref #53153
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50223, 53205). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Create e2e tests for Custom Metrics - Stackdriver Adapter and HPA based on custom metrics from Stackdriver
**What this PR does / why we need it**:
- Add e2e test for Custom Metrics - Stackdriver Adapter
- Add 2e2 test for HPA based on custom metrics from Stackdriver
- Enable HorizontalPodAutoscalerUseRESTClients option
**Release note**:
```release-note
Horizontal pod autoscaler uses REST clients through the kube-aggregator instead of the legacy client through the API server proxy.
```
Automatic merge from submit-queue (batch tested with PRs 52520, 52033, 53626, 50478). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
GCE kube-down: Delete all remaining firewall rules when DELETE_NETWORK is set
**What this PR does / why we need it**: From https://github.com/kubernetes/kubernetes/issues/52347#issuecomment-335245693, we think it'd be reasonable to cleanup firewall resources as well during GCE kube-down.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE
**Special notes for your reviewer**:
/assign @shyamjvs @bowei
**Release note**:
```release-note
NONE
```
This allows the etcd docker registry that is currently hard coded to
`gcr.io/google_containers/etcd` in the `etcd.manifest` template to be
overridden. This can be used to test new versions of etcd with
kubernetes that have not yet been published to
`gcr.io/google_containers/etcd` and also enables cluster operators to
manage the etcd images used by their cluster in an internal
repository.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add launching Cluster Autoscaler in Kubemark
**What this PR does / why we need it**:
Allows to launch Cluster Autoscaler in Kubemark.
To do it, set ENABLE_KUBEMARK_CLUSTER_AUTOSCALER flag to true. This currently only works with one nodegroup, for which you can specify minimum and maximum number of nodes and name. (KUBEMARK_AUTOSCALER_MIN_NODES, KUBEMARK_AUTOSCALER_MAX_NODES, KUBEMARK_AUTOSCALER_MIG_NAME).
Is is important to note that NUM_NODES has a different meaning when launching Cluster Autoscaler - we always start with only one node, but NUM_NODES is used to calculate the size of Kubemark master and addon components.
There are no changes to the current setup if ENABLE_KUBEMARK_CLUSTER_AUTOSCALER is set to false.
**Release note**:
```
NONE
```
This change will mount `/lib/modules` on host to the kube-proxy pod,
so that a kube-proxy pod can load kernel modules by need
or when `modprobe <kmod>` is run inside the pod.
This will be convenient for kube-proxy running in IPVS mode.
Users will don't have to run `modprobe ip_vs` on nodes before starting
a kube-proxy pod.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Build hyperkube image using Bazel
**What this PR does / why we need it**: Before we had the hyperkube base image, it was difficult to build the hyperkube with Bazel. Now that we have the base image with all the necessary dependencies, this has become trivial.
This will enable federation jobs etc on prow.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @BenTheElder @mikedanese @spxtr
cc @luxas @pipejakob
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
container-vm is deprecated, so don't use it for GCE test clusters
**What this PR does / why we need it**: container-vm is deprecated. We shouldn't start test clusters using it for nodes.
**Release note**:
```release-note
NONE
```
x-ref #48279 which started this work
Automatic merge from submit-queue (batch tested with PRs 53044, 52956, 53512, 53028). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add ipvs sync period parameters - align to iptables proxier
**What this PR does / why we need it**:
Add ipvs sync period parameters - align to iptables proxier
**Which issue this PR fixes**:
fixes#52957
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 53454, 53446, 52935, 53443, 52917). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump default snap channel to 1.8/stable in juju charms
**What this PR does / why we need it**:
This updates the Juju charms to deploy Kubernetes 1.8 by default.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use `--oom-score-adj` flag for kube-proxy
**What this PR does / why we need it**:
Replace `echo -998 > /proc/$$$/oom_score_adj` with `--oom-score-adj` flag for kube-proxy.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51083
**Special notes for your reviewer**:
/assign @justinsb @vishh
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fluentd-elasticsearch add-on: Rename Docker image tag
As @crassirostris requested in #53307 - rename tag of Docker image gcr.io/google-containers/elasticsearch to drop -1 suffix.
Automatic merge from submit-queue (batch tested with PRs 53280, 53330). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add permisions for Metrics Server to read resources on cluster level
**What this PR does / why we need it**:
Add permisions for Metrics Server to read resources on cluster level.
**Which issue this PR fixes**:
fixes https://github.com/kubernetes-incubator/metrics-server/issues/16
**Release note**:
```release-note
Fix permissions for Metrics Server.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Merge kube-dns templates into a single file
**What this PR does / why we need it**: Merge all of the kube-dns cluster yamls into a single file.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#42832
**Special notes for your reviewer**:
/assign @bowei @shashidharatd
cc @kevin-wangzefeng @euank @lhuard1A
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
default fail-swap-on to false for kubelet on kubernetes-worker charm
**What this PR does / why we need it**: default fail-swap-on to false for kubelet on kubernetes-worker charm
**Release note**:
```release-note
default fail-swap-on to false for kubelet on kubernetes-worker charm
```
Automatic merge from submit-queue (batch tested with PRs 51765, 53053, 52771, 52860, 53284). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix failing import in juju master namespace actions.
**What this PR does / why we need it**: The import of the templating render method is failing.This is to address this issue.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Change RBAC storage version to v1 for 1.9
v1 was introduced in 1.8, but storage version remained at v1beta1 to accommodate HA rolling upgrades. in 1.9, we can change the persisted and preferred version to v1
```release-note
RBAC objects are now stored in etcd in v1 format. After completing an upgrade to 1.9, RBAC objects (Roles, RoleBindings, ClusterRoles, ClusterRoleBindings) should be migrated to ensure all persisted objects are written in `v1` format, prior to `v1alpha1` support being removed in a future release.
```
Automatic merge from submit-queue (batch tested with PRs 52685, 53344). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Don't referece not-exist addon manager manifests in comment
**What this PR does / why we need it**:
`addon-manager-multinode.json` and `addon-manager-singlenode.json` have been removed by b814b62447 (diff-89347a70de188b3c15f5ee15323658d2).
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Upgrade fluentd-elasticsearch addon to Elasticsearch/Kibana 5.6.2
Upgrade Elasticsearch and Kibana to version 5.6.2. I also upgrade some API versions of manifests to correspond to Kubernetes 1.8, I hope the latter is uncontroversial?
```release-notes
```
Automatic merge from submit-queue (batch tested with PRs 50280, 52529, 53093, 53108, 53168). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove touch-lock init container from kube-proxy
**What this PR does / why we need it**: Ack https://github.com/kubernetes/kubeadm/issues/298, touch-lock init container is no longer needed after we have https://github.com/kubernetes/kubernetes/pull/46597.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE
**Special notes for your reviewer**:
/assign @bowei @cmluciano
cc @dixudx
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49249, 53203, 53209, 53208, 53177). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix lint error on kubernetes-worker
**What this PR does / why we need it**:
This fixes a lint error on kubernetes-worker that's causing problems in our CI builds.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix kube-proxy addon OWNERS file
**What this PR does / why we need it**: Sorry for the typo :(
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE
**Special notes for your reviewer**:
/assign @thockin @bowei
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50988, 50509, 52660, 52663, 52250). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
s390x ingress controller support
**What this PR does / why we need it**: Adds support for an s390x ingress image to the juju kubernetes-worker charm.
**Release note**:
```
Adds support for an s390x ingress image to the juju kubernetes-worker charm.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Mark Cluster Autoscaler as GA (1.0.0)
This is basically the same version as 0.7.0(-beta2). However to reduce confusion among users we decided to name the first GA version of CA as 1.0.0.
```release-note
Cluster Autoscaler 1.0.0
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Add fallback function 'detect-project' in skeleton util
**What this PR does / why we need it**:
detect-project is not implemented by default:
When use ./hack/ginkgo-e2e.sh to run e2e test with custom providers, it will prompt
```
log-dump.sh: line 70: detect-project: command not found
```
And script exits with code 127.
**Which issue this PR fixes**
**Special notes for your reviewer**:
**Release note**:
`NONE`
@shyamjvs
Automatic merge from submit-queue (batch tested with PRs 52880, 52855, 52761, 52885, 52929). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Add cos as an alias for gci in the upgrade script
This was causing some issues when upgrading from a GCI image. This is the same conversion happening in config-defaults.sh.
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-stable2-stable1-upgrade-cluster-new/205
The node image was being left at COS, and when we went to build the kube-env, we only check against "gci". This caused us to not fully construct the environment for nodes and then they couldn't fully come up after an upgrade.
I've already fixed the CI test suites to explicitly specify "gci", but this auto-detection logic should be fixed too.
Fixes: #52930
Automatic merge from submit-queue (batch tested with PRs 52880, 52855, 52761, 52885, 52929). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Remove cloud provider rackspace
**What this PR does / why we need it**:
For now, we have to implement functions in both `rackspace` and `openstack` packages if we want to add function for cinder, for example [resize for cinder](https://github.com/kubernetes/kubernetes/pull/51498). Since openstack has implemented all the functions rackspace has, and rackspace is considered deprecated for a long time, [rackspace deprecated](https://github.com/rackspace/gophercloud/issues/592) ,
after talking with @mikedanese and @jamiehannaford offline , i sent this PR to remove `rackspace` in favor of `openstack`
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52854
**Special notes for your reviewer**:
**Release note**:
```release-note
The Rackspace cloud provider has been removed after a long deprecation period. It was deprecated because it duplicates a lot of the OpenStack logic and can no longer be maintained. Please use the OpenStack cloud provider instead.
```
Automatic merge from submit-queue (batch tested with PRs 52355, 52537, 52551, 52403, 50673). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Add s390x to juju kubernetes
**What this PR does / why we need it**: With this PR we add support for s390x to juju kubernetes worker
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```Kubernetes deployments to s390x via Juju
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fix kubernetes charms not restarting services properly after host reboot on LXD
**What this PR does / why we need it**:
This fixes an issue when running the Kubernetes charms on LXD where the services don't restart properly after a reboot of the host machine.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/357
**Special notes for your reviewer**:
See https://github.com/juju-solutions/layer-cdk-service-kicker
**Release note**:
```release-note
Fix kubernetes charms not restarting services properly after host reboot on LXD
```
Automatic merge from submit-queue (batch tested with PRs 51438, 52182, 51607, 47912, 51595). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Add api-extra-args support to the kubernetes-master juju layer
**What this PR does / why we need it**: It adds api-extra-args config option to the kubernetes-master juju layer
**Which issue this PR fixes**: fixes#46778
**Special notes for your reviewer**:
```release-note
Add api-extra-args support to the kubernetes-master juju layer
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fix url for Saltstack administration document
Got an 404 not found error on `https://kubernetes.io/docs/admin/salt.md`
**What this PR does / why we need it**:
Fixed a wrong url in document
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
[fluentd-gcp addon] Update Stackdriver plugin to version 0.6.7
A new gem among all fixes Java logging severity parsing and string timestamp parsing
Also sync the buffer size with the gem guidelines, making it 1M instead of 2M.
/cc @igorpeshansky
Automatic merge from submit-queue (batch tested with PRs 51081, 52725). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fix mistype that causes breakage of e2e test.
**What this PR does / why we need it**:
Mistype in the configuration that breaks configuration with special heapster node.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes#52496.
Automatic merge from submit-queue (batch tested with PRs 48970, 52497, 51367, 52549, 52541). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Default ABAC to off in GCE (for new clusters).
**What this PR does / why we need it**:
Disables the legacy ABAC authorizer by default on GCE/GKE clusters using kube-up.sh. Existing clusters upgrading to 1.8 will keep their existing configuration.
**Release note**:
```release-note
New GCE or GKE clusters created with `cluster/kube-up.sh` will not enable the legacy ABAC authorizer by default. If you would like to enable the legacy ABAC authorizer, export ENABLE_LEGACY_ABAC=true before running `cluster/kube-up.sh`.
```
Automatic merge from submit-queue (batch tested with PRs 52488, 52548). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Enable overriding Heapster resource requirements in GCP
This PR enables to override Heapster resource requirements in GCP.
**Release note:**
```release-note
```
Add CLUSTER_SIGNING_DURATION environment variable to cluster
configuration scripts to allow configuration of signing duration of
certificates issued via the Certificate Signing Request API.
Automatic merge from submit-queue (batch tested with PRs 52452, 52115, 52260, 52290)
Add env var to enable kubelet rotation in kube-up.sh.
Fixes https://github.com/kubernetes/kubernetes/issues/52114
```release-note
Adds ROTATE_CERTIFICATES environment variable to kube-up.sh script for GCE
clusters. When that var is set to true, the command line flag enabling kubelet
client certificate rotation will be added to the kubelet command line.
```
Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237)
Allow metadata firewall & proxy on in GCE, off by default
**What this PR does / why we need it**: Add necessary variables in kube-env to allow a user to turn on metadata firewall and proxy for K8s on GCE.
Ref #8867.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
GCE users can enable the metadata firewall and metadata proxy with KUBE_FIREWALL_METADATA_SERVER and ENABLE_METADATA_PROXY, respectively.
```
Automatic merge from submit-queue (batch tested with PRs 52376, 52439, 52382, 52358, 52372)
Add new api groups to the GCE advanced audit policy
Fixes https://github.com/kubernetes/kubernetes/issues/52265
It introduces the missing api groups, that were introduced in 1.8 release.
@piosz there's also the 'metrics' api group, should we audit it?
Automatic merge from submit-queue (batch tested with PRs 51601, 52153, 52364, 52362, 52342)
Make advanced audit policy on GCP configurable
Related to https://github.com/kubernetes/kubernetes/issues/52265
Make GCP audit policy configurable
/cc @tallclair
Automatic merge from submit-queue (batch tested with PRs 52316, 52289, 52375)
[fluentd-gcp addon] Trim too long log entries due to Stackdriver limitations
Stackdriver doesn't support log entries bigger than 100KB, so by default fluentd plugin just drops such entries. To avoid that and increase the visibility of this problem it's suggested to trim long lines instead.
/cc @igorpeshansky
```release-note
[fluentd-gcp addon] Fluentd will trim lines exceeding 100KB instead of dropping them.
```
Automatic merge from submit-queue (batch tested with PRs 52316, 52289, 52375)
Small fix in salt manifest for kube-apiserver for request-timeout flag
**What this PR does / why we need it**:
Fixes a minor bug in salt manifest (typo from #51480)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
xref: #51355
Automatic merge from submit-queue (batch tested with PRs 52339, 52343, 52125, 52360, 52301)
Make log-dump use 'gcloud ssh' for GKE also
Fixes https://github.com/kubernetes/test-infra/issues/4323
I tested it locally (with some hacking for mimicking gke's DumpClusterLogs function in kubetest) and it worked.
cc @ericchiang
Automatic merge from submit-queue (batch tested with PRs 52339, 52343, 52125, 52360, 52301)
Switch default audit policy to beta and omit RequestReceived stage
Related to https://github.com/kubernetes/kubernetes/issues/52265
```release-note
By default, clusters on GCE no longer sends RequestReceived audit event, if advanced audit is configured.
```
Automatic merge from submit-queue
[GCE kube-up] Add a warning for kube-proxy DaemonSet option
**What this PR does / why we need it**:
Add a warning for kube-proxy DaemonSet option for GCE kube-up so that user will be aware of the risks.
Ref: https://github.com/kubernetes/kubernetes/issues/23225
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE
**Special notes for your reviewer**:
/assign @bowei
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51041, 52297, 52296, 52335, 52338)
[fluentd-gcp addon] Restore the metric for the number of read log entries
This metric, previously removed, will allow to monitor the number of log entries, that were read, but weren't sent by the output plugin because of liveness probe removing the data.
Automatic merge from submit-queue (batch tested with PRs 52007, 52196, 52169, 52263, 52291)
[fluentd-gcp addon] Update event-exporter to address metrics problem
Follow-up of https://github.com/GoogleCloudPlatform/k8s-stackdriver/pull/37:
```
In the clusters with CA, the number of metric streams will continuously grow if the host is included.
```
Name is updated b/c otherwise addon manager will not be able to pick up the change.
Automatic merge from submit-queue (batch tested with PRs 52227, 52120)
Use COS for nodes in testing clusters by default, and bump COS.
Addresses part of issue #51487. May assist with #51961 and #50695.
CVM is being deprecated, and falls out of support on 2017/10/01. We shouldn't run test jobs on it. So start using COS for all test jobs.
The default value of `KUBE_NODE_OS_DISTRIBUTION` for clusters created for testing will now be gci. Testjobs that do not specify this value will now run on clusters using COS (aka GCI) as the node OS, instead of CVM, the previous default.
This change only affects testing; non-testing clusters already use COS by default.
In addition, bump the version of COS from `cos-stable-60-9592-84-0` to `cos-stable-60-9592-90-0`.
```release-note
NONE
```
/cc @yujuhong, @mtaufen, @fejta, @krzyzacy
Automatic merge from submit-queue
Add cluster up configuration for certificate signing duration.
```release-note
Add CLUSTER_SIGNING_DURATION environment variable to cluster configuration scripts
to allow configuration of signing duration of certificates issued via the Certificate
Signing Request API.
```
Addresses part of issue #51487.
This is a big change for testing; any testjobs that do not
set an explicit KUBE_NODE_OS_DISTRIBUTION will have been running
on CVM, but after this PR will start running COS.
CVM is being deprecated, and falls out of support on 2018/10/01.
In addition, bump the patch version of COS from
cos-stable-60-9592-84-0 to cos-stable-60-9592-90-0.
Automatic merge from submit-queue (batch tested with PRs 51921, 51829, 51968, 51988, 51986)
COS/GCE: bump the max pids for the docker service
**What this PR does / why we need it**:
TasksMax limits how many threads/processes docker can create. Insufficient limit affects container starts.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes#51977
**Special notes for your reviewer**:
**Release note**:
```release-note
Ensure TasksMax is sufficient for docker
```
Automatic merge from submit-queue (batch tested with PRs 51921, 51829, 51968, 51988, 51986)
Fix unbound variable in configure-helper.sh
This isn't plumbed yet on GKE, so results in an unbound variable.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51186, 50350, 51751, 51645, 51837)
Set up DNS server in containerized mounter path
During NFS/GlusterFS mount, it requires to have DNS server to be able to
resolve service name. This PR gets the DNS server ip from kubelet and
add it to the containerized mounter path. So if containerized mounter is
used, service name could be resolved during mount
**Release note**:
```release-note
Allow DNS resolution of service name for COS using containerized mounter. It fixed the issue with DNS resolution of NFS and Gluster services.
```
During NFS/GlusterFS mount, it requires to have DNS server to be able to
resolve service name. This PR gets the DNS server ip from kubelet and
add it to the containerized mounter path. So if containerized mounter is
used, service name could be resolved during mount
Automatic merge from submit-queue (batch tested with PRs 51739, 51762)
GCE: Separate the network's project from the rest of the project
**What this PR does / why we need it**:
PR allows the user to specify a different project for network resources during cluster turn-up.
Depends on #51725Fixes#51846
/assign @bowei
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Bump gce metadata-proxy from 0.1.2 to 0.1.3
**What this PR does / why we need it**: Bump metadata-proxy from 0.1.2 to 0.1.3 to incorporate fix for CVE 2016-9063, xref https://github.com/kubernetes/contrib/pull/2720
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49727, 51792)
Introducing metrics-server
ref https://github.com/kubernetes/features/issues/271
There is still some work blocked on problems with repo synchronization:
- migrate to `v1beta1` introduced in #51653
- bump deps to HEAD
Will do it in a follow up PRs once the issue is resolved.
```release-note
Introduced Metrics Server
```
Automatic merge from submit-queue
Add RBAC, healthchecks, autoscalers and update Calico to v2.5.1
**What this PR does / why we need it**:
- Updates Calico to `v2.5`
- Calico/node to `v2.5.1`
- Calico CNI to `v1.10.0`
- Typha to `v0.4.1`
- Enable health check endpoints
- Add Readiness probe for calico-node and Typha
- Add Liveness probe for calico-node and Typha
- Add RBAC manifest
- With calico ClusterRole, ServiceAccount and ClusterRoleBinding
- Add Calico CRDs in the Calico manifest (only works for k8s v1.7+)
- Add vertical autoscaler for calico-node and Typha
- Add horizontal autoscaler for Typha
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50602, 51561, 51703, 51748, 49142)
Use arm32v7|arm64v8 images instead of the deprecated armhf|aarch64 image organizations
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50601
**Special notes for your reviewer**:
/assign @ixdy @jbeda @zmerlynn
**Release note**:
```release-note
Use arm32v7|arm64v8 images instead of the deprecated armhf|aarch64 image organizations
```
Automatic merge from submit-queue (batch tested with PRs 51553, 51538, 51663, 51069, 51737)
Edit owner files for kube-proxy manifests
**What this PR does / why we need it**: We should have owner file for kube-proxy daemonset manifest.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE
**Special notes for your reviewer**:
/assign @bowei @thockin
cc @dnardo @freehan @nicksardo
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51553, 51538, 51663, 51069, 51737)
Allow enable pod priority feature gate for GCE and configure priority for kube-proxy
**What this PR does / why we need it**:
From #23225, this PR adds an option for user to enable pod priority feature gate using GCE startup scripts, and configure pod priority for kube-proxy when enabled.
The setup `priorityClassName: system` derives from: ce1485c626/staging/src/k8s.io/api/core/v1/types.go (L2536-L2542)
The plan is to configure pod priority for kube-proxy daemonset (https://github.com/kubernetes/kubernetes/pull/50705) in the same way.
**Special notes for your reviewer**:
cc @bsalamat @davidopp @thockin
**Release note**:
```release-note
When using kube-up.sh on GCE, user could set env `ENABLE_POD_PRIORITY=true` to enable pod priority feature gate.
```
Automatic merge from submit-queue (batch tested with PRs 51583, 51283, 51374, 51690, 51716)
Create a secondary range for the services instead of a subnetwork
GCE now supports >1 secondary ranges / subnetwork.
Fixes#51774
```release-note
When using IP aliases, use a secondary range rather than subnetwork to reserve cluster IPs.
```
Automatic merge from submit-queue (batch tested with PRs 51590, 48217, 51209, 51575, 48627)
FlexVolume setup script for COS instance using mounting utility image in GCR.
**What this PR does / why we need it**: This scripts automates FlexVolume installation for a single COS instance. Users need to pre-pack their drivers and mount utilities in a Docker image and upload it to GCR.
For each FlexVolume plugin, the script places a driver wrapper in a writable and executable location. The wrapper calls commands from the actual driver but in a chroot environment, so that mount utilities from the image can be used.
I'm working on a script that automatically executes this on all instances. Will be in a separate PR.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48626
```release-note
NONE
```
/cc @gmarek @chakri-nelluri
/assign @saad-ali @msau42
/sig storage
/release-note-none
Automatic merge from submit-queue
Adding Flexvolume plugin dir piping for controller manager on COS
**What this PR does / why we need it**: Sets the default Flexvolume plugin directory correctly for controller manager running on COS images.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51563
```release-note
NONE
```
/release-note-none
/sig storage
/assign @msau42
/cc @wongma7
Automatic merge from submit-queue (batch tested with PRs 49971, 51357, 51616, 51649, 51372)
Add some initial shell parsing tests.
These just test to see if there is a bash syntax error in these shell
libraries.
For #51642
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51480, 49616, 50123, 50846, 50404)
Add KUBE_APISERVER_REQUEST_TIMEOUT_SEC env var.
Cluster startup support for the flag added by #51415. I won't merge until that PR merges.
Bug: #51355
cc @jpbetz
Automatic merge from submit-queue
fix some bad URL in the /cluster/uju/layers/kubernetes-e2e/README.md
**What this PR does / why we need it**:
There are some bad URL when I read the file and I have fix it.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
Automatic merge from submit-queue
Retry master instance creation in case of retriable error (with sleep)
To help with our 5k-node CI tests failing to startup the cluster.
And also towards the greater goal - https://github.com/kubernetes/kubernetes/issues/43140
cc @kubernetes/sig-scalability-misc @kubernetes/sig-cluster-lifecycle-misc
Automatic merge from submit-queue (batch tested with PRs 47054, 50398, 51541, 51535, 51545)
Switch away from gcloud deprecated flags in compute resource listings
**What is fixed**
Remove deprecated `gcloud compute` flags, see linked issue.
**Which issue this PR fixes**:
fixes#49673
**Special notes for your reviewer**:
The change in `gcloudComputeResourceList` in `test/e2e/framework/ingress_utils.go` isn't strictly needed as currently no affected resources are called on within that file, however the function has the _potential_ to access affected resources so I covered it as well. Happy to change if deemed unnecessary.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add Google cloud KMS service for envelope encryption transformer
This adds the required pieces which will allow addition of KMS based encryption providers (envelope transformer).
For now, we will be implementing it using Google Cloud KMS, but the code should make it easy to add support for any other such provider which can expose Decrypt and Encrypt calls.
Writing tests for Google Cloud KMS Service may cause a significant overhead to the testing framework. It has been tested locally and on GKE though.
Upcoming after this PR:
* Complete implementation of the envelope transformer, which uses LRU cache to maintain decrypted DEKs in memory.
* Track key version to assist in data re-encryption after a KEK rotation.
Development branch containing the changes described above: https://github.com/sakshamsharma/kubernetes/pull/4
Envelope transformer used by this PR was merged in #49350
Concerns #48522
Planned configuration:
```
kind: EncryptionConfig
apiVersion: v1
resources:
- resources:
- secrets
providers:
- kms:
cachesize: 100
configfile: gcp-cloudkms.conf
name: gcp-cloudkms
- identity: {}
```
gcp-cloudkms.conf:
```
[GoogleCloudKMS]
kms-location: global
kms-keyring: google-container-engine
kms-cryptokey: example-key
```
Automatic merge from submit-queue (batch tested with PRs 51471, 50561, 50435, 51473, 51436)
Fix `gcloud compute instance-groups managed list` call
**What this PR does / why we need it**: gcloud 168.0.0 makes the `gcloud compute instance-groups managed list --format='value(instanceGroup)'` call return a URL instead of just the name, which is causing `list-instances` to fail. Switching to `--format='value(name)'` seems to restore the old behavior.
x-ref #49673
**Release note**:
```release-note
NONE
```
/cc @wojtek-t @mwielgus @shyamjvs @jiayingz @mindprince
Automatic merge from submit-queue (batch tested with PRs 50932, 49610, 51312, 51415, 50705)
Allow running kube-proxy as a DaemonSet when using kube-up.sh on GCE
**What this PR does / why we need it**:
From #23225, this PR adds an option for user to run kube-proxy as a DaemonSet instead of static pods using GCE startup scripts. By default, kube-proxy will run as static pods.
This is the first step for moving kube-proxy into a DaemonSet in GCE, remaining tasks will be tracked on #23225.
**Special notes for your reviewer**:
The last commit are purely for testing out kube-proxy as daemonset via CIs.
cc @kubernetes/sig-network-misc @kubernetes/sig-cluster-lifecycle-misc
**Release note**:
```release-note
When using kube-up.sh on GCE, user could set env `KUBE_PROXY_DAEMONSET=true` to run kube-proxy as a DaemonSet. kube-proxy is run as static pods by default.
```
Automatic merge from submit-queue (batch tested with PRs 51038, 50063, 51257, 47171, 51143)
update related manifest files to use hostpath type
**What this PR does / why we need it**:
Per [discussion in #46597](https://github.com/kubernetes/kubernetes/pull/46597#pullrequestreview-53568947)
Dependes on #46597
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes: https://github.com/kubernetes/kubeadm/issues/298
**Special notes for your reviewer**:
/cc @euank @thockin @tallclair @Random-Liu
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 50033, 49988, 51132, 49674, 51207)
Update cos image to cos-stable-60-9592-84-0
cos-m60 has been stable for a long time. This image contains a docker upgrade, which has been validated in https://github.com/kubernetes/kubernetes/issues/42926.
**Release note**:
```
None
```
/assign @yujuhong
/cc @dchen1107
Automatic merge from submit-queue (batch tested with PRs 50713, 47660, 51198, 51159, 51195)
Dump installation and configuration logs for master
**What this PR does / why we need it**:
We are dumping out empty configuration and installation logs on master, see `kube-node-configuration.log` and `kube-node-installation.log` on http://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/12818/artifacts/bootstrap-e2e-master/.
I guess it is just because [we name the services on master differently](https://github.com/kubernetes/kubernetes/blob/v1.7.3/cluster/gce/gci/master.yaml#L4-L40)?
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Fix invalid url link in cluster/addons/registry/auth/README.md
**What this PR does / why we need it**:
Fix invalid url link in `cluster/addons/registry/auth/README.md`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47115, 51196, 51204, 51208, 51206)
Removing push_api_data on kube-api.connected seems to be dead code
**What this PR does / why we need it**: Removing dead code is always good :)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**: There is no kube-api relation. This method was replace probably at some point by push_service_data firing when kube-api-endpoint.available
**Release note**:
```
```
Automatic merge from submit-queue (batch tested with PRs 51193, 51154, 42689, 51189, 51200)
Revert default service-cidr config on kubernetes-master charm
**What this PR does / why we need it**:
This reverts the default service-cidr config in the kubernetes-master charm.
A while back, we changed the default service-cidr in the kubernetes-master charm from `10.152.183.0/24` to `10.152.0.0/16`. In testing, we have found that the charms don't handle this change well, so we are reverting it until we can make the change more safely.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51193, 51154, 42689, 51189, 51200)
Include $USER in network name to not clash for different users' cl…
Automatic merge from submit-queue (batch tested with PRs 51108, 51035, 50539, 51160, 50947)
Auto-calculate CLUSTER_IP_RANGE based on cluster size
In preparation for eliminating CLUSTER_IP_RANGE env var from job configs, making it less error prone while folks try to start their own large cluster tests (https://github.com/kubernetes/kubernetes/issues/50907).
/cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 51108, 51035, 50539, 51160, 50947)
Set GCE_ALPHA_FEATURES environment variable in gce.conf
This allows us to gate alpha features in the pkg/cloudprovider/providers/gce.
Automatic merge from submit-queue (batch tested with PRs 51113, 46597, 50397, 51052, 51166)
Changing Flexvolume plugin directory to a location reachable by containerized k8s components.
**What this PR does / why we need it**: Testing Flexvolume requires plugins to be installed at a location which is accessible by containerized k8s components (such as controller-manager).
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#51123
```release-note
NONE
```
/assign @wongma7 @msau42
/release-note-none
/sig storage
Automatic merge from submit-queue (batch tested with PRs 50489, 51070, 51011, 51022, 51141)
Run multiarch/qemu-user-static:register before building cross-arch images
**What this PR does / why we need it**: #48365 inadvertently broke building non-x86 hyperkube images for developers who'd not built non-x86 images before and thus hadn't yet run `multiarch/qemu-user-static:register`. This PR restores that step.
**Release note**:
```release-note
NONE
```
/assign @david-mcmahon @mbohlool @luxas
Automatic merge from submit-queue (batch tested with PRs 50489, 51070, 51011, 51022, 51141)
update to rbac v1 in yaml file
**What this PR does / why we need it**:
ref to https://github.com/kubernetes/kubernetes/pull/49642
ref https://github.com/kubernetes/features/issues/2
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
cc @liggitt
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51039, 50512, 50546, 50965, 50467)
Add flags for prometheus-to-sd components.
Configure prometheus-to-sd-endpoint and prometheus-to-sd-prefix base on
the environment.
**Release note**:
NONE
Automatic merge from submit-queue (batch tested with PRs 50531, 50853, 49976, 50939, 50607)
Fix duplicate field in kubeconfig
The server field was accidentally duplicated during a rebase of #40050.
```release-note
NONE
```
Automatic merge from submit-queue
Update OWNERS files for networking components
This will reduce the approval load for the top level tree owners
```release-note
NONE
```
**What this PR does / why we need it**:
Makes functions in validation/schema.go private to kubectl,
further isolating kubectl.
**Which issue this PR fixes**
Part of a series of PRs to address kubernetes/community#598
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add Priority admission controller
**What this PR does / why we need it**: Add Priority admission controller. This admission controller checks creation and update of PriorityClasses. It also resolves a PriorityClass name of a pod to its integer value.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Add Priority admission controller for monitoring and resolving PriorityClasses.
```
ref/ #47604
ref/ #48646
Automatic merge from submit-queue (batch tested with PRs 50302, 50573, 50500, 50633, 50617)
Fix kubernetes-worker charm hook failure when applying labels
**What this PR does / why we need it**:
This fixes a failure that can occur in the kubernetes-worker charm when trying to apply node labels.
The failure is rare, and can occur in two situations that I've seen:
1. kube-apiserver is not up yet
2. kubelet has not registered itself as a node yet
Rather than give up right away, let's give the services a minute to come up.
**Release note**:
```release-note
Fix kubernetes-worker charm hook failure when applying labels
```
Automatic merge from submit-queue
Increase kibana CPU limit to sped up the startup
Similarly to Elasticsearch, Kibana requires some additional CPU during startup to build caches.
Fixes https://github.com/kubernetes/kubernetes/issues/50610
/cc @piosz @coffeepac @aknuds1
Automatic merge from submit-queue
Add variables for passing test args to kubemark master components
cc @msau42 - This change will enable us to turn on extender in the scheduler in kubemark-scale job
Automatic merge from submit-queue (batch tested with PRs 50485, 49951, 50508, 50511, 50506)
Pass config to external Kubemark cluster in e2e tests
When cluster autoscaler is used in kubemark tests,
pass default kubeconfig as external cluster config.
@shyamjvs @gmarek
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50386, 50374, 50444, 50382)
Add explicit API kind and version to the audit policy file on GCE
Adds an explicit API version and kind to the audit policy file in GCE configuration scripts. It's a prerequisite for https://github.com/kubernetes/kubernetes/pull/49115
/cc @tallclair @piosz
Automatic merge from submit-queue (batch tested with PRs 49725, 50367, 50391, 48857, 50181)
New get-kube.sh option: KUBERNETES_SKIP_RELEASE_VALIDATION
**What this PR does / why we need it**:
This is an alternative solution to https://github.com/kubernetes/kubernetes/pull/49884. The goal is to be able to pull releases that were built by bazel jobs (both presubmit and postsubmit builds), which currently fail our regex validation against the version string.
This implementation is a simple "I know what I'm doing" breakglass option to turn regex validation off, whereas https://github.com/kubernetes/kubernetes/pull/49884 was to extend our validation to support the new formats of bazel build jobs. I'm testing the waters to see if this is a more palatable solution.
**Release note**:
```release-note
New get-kube.sh option: KUBERNETES_SKIP_RELEASE_VALIDATION
```
CC @BenTheElder @fejta @ixdy
Automatic merge from submit-queue (batch tested with PRs 50300, 50328, 50368, 50370, 50372)
Bugfix: set resources only for fluentd-gcp container.
There is more than one container in fluentd-gcp deployment. Previous
implementation was setting resources for all containers, not just
the fluent-gcp one.
**What this PR does / why we need it**:
Bugfix; https://github.com/kubernetes/kubernetes/pull/49009 without this is eating more resources.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50366
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
GKE deployment: Kill cluster/gke
kubernetes/test-infra#3983 migrated the remaining GKE jobs using the bash deployment (cluster/gke). All jobs are now on the gke `deployer` in `kubetest`.
Fixeskubernetes/test-infra#3307
```release-note
`cluster/gke` has been removed. GKE end-to-end testing should be done using `kubetest --deployment=gke`
```
There is more than one container in fluentd-gcp deployment. Previous
implementation was setting resources for multiple containers, not just
the fluent-gcp one.
Automatic merge from submit-queue
Ensure that pricing expander is used by default in Cluster Autoscaler
Pricing expander was set as the default one for GCP, however on some occasion it was possible that AUTOSCALER_EXPANDER_CONFIG variable was not set resulting in using the the random expander.
Automatic merge from submit-queue (batch tested with PRs 48532, 50054, 50082)
Refactored the fluentd-es addon
Refactor fluentd-elasticsearch addon:
- Decrease the number of files by moving RBAC-related objects in the same files where they're used
- Move the fluentd configuration out of the image
- Don't use PVC to avoid leaking resources in e2e tests
- Fluentd now ingest docker and kubelet logs that are written to journald
- Disable X-Pack, because it's not free
Fixes https://github.com/kubernetes/kubernetes/issues/41462
Fixes https://github.com/kubernetes/kubernetes/issues/49816
Fixes https://github.com/kubernetes/kubernetes/issues/48973
Fixes https://github.com/kubernetes/kubernetes/issues/49450
@aknuds1 @coffeepac Could you please take a look?
```release-note
Fluentd DaemonSet in the fluentd-elasticsearch addon is configured via ConfigMap and includes journald plugin
Elasticsearch StatefulSet in the fluentd-elasticsearch addon uses local storage instead of PVC by default
```
Automatic merge from submit-queue (batch tested with PRs 48487, 49009, 49862, 49843, 49700)
Enable overriding fluentd resources in GCP
**What this PR does / why we need it**: This enables overriding fluentd resources in GCP, when there is a need for custom ones.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50119, 48366, 47181, 41611, 49547)
Add basic install and mount flexvolumes e2e tests
fixes https://github.com/kubernetes/kubernetes/issues/47010
These two tests install a skeleton "dummy" flex driver, attachable and non-attachable respectively, then test that a pod can successfully use the flex driver. They are labeled disruptive because kubelet and controller-manager get restarted as part of the flex install. IMO it's important to keep this install procedure as part of the test to isolate any bugs with the startup plugin probe code.
There is a bit of an ugly dependency on cluster/gce/config-test.sh because --flex-volume-plugin-dir must be set to a dir that's readable from controller-manager container and writable by the flex e2e test. The default path is not writable on GCE masters with read-only root so I picked a location that looks okay.
In the "dummy" drivers I trick kubelet into thinking there is a mount point by doing "mount -t tmpfs none ${MNTPATH} >/dev/null 2>&1", hope that is okay.
I have only tested on GCE and theoretically they may work on AWS but I don't think there is a need to test on multiple cloudproviders.
-->
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46685, 49863, 50098, 50070, 50096)
GCE: Bump GLBC version to 0.9.6
Closes#50095
**Release note**:
```release-note
GCE: Bump GLBC version to 0.9.6
```
Automatic merge from submit-queue (batch tested with PRs 50103, 49677, 49449, 43586, 48969)
Run kazel on the entire tree
**What this PR does / why we need it**: part of #47558: auto-generate `BUILD` files on the entire tree, since this is what `gazelle` does, and it'll make subsequent reviews easier if less is changing.
**Release note**:
```release-note
NONE
```
/assign
/release-note-none
Automatic merge from submit-queue (batch tested with PRs 48365, 49902, 49808, 48722, 47045)
Upgrade fluentd-elasticsearch addon to Elasticsearch/Kibana 5.5
This is a patch to upgrade the fluentd-elasticsearch addon to Elasticsearch/Kibana 5.5. Please provide feedback!
```release-notes
* Upgrade Elasticsearch/Kibana to 5.5.1 in fluentd-elasticsearch addon
* Switch to basing our image of Elasticsearch in fluentd-elasticsearch addon off the official one
* Switch to the official image of Kibana in fluentd-elasticsearch addon
* Use StatefulSet for Elasticsearch instead of ReplicationController, with persistent volume claims
* Require authenticating towards Elasticsearch, as Elasticsearch 5.5 by default requires basic authentication
```
Automatic merge from submit-queue (batch tested with PRs 48365, 49902, 49808, 48722, 47045)
Rebase hyperkube image on debian-hyperkube-base, based on debian-base.
**What this PR does / why we need it**: saves all of the hyperkube image dependencies in a cacheable base image, rather than downloading them for every build (which is slow and flaky).
This way, at build time, we only need to pull down the hyperkube base image and add the hyperkube binary.
I've additionally based the base image on `debian-base` instead of `debian`, though we amusing end up reinstalling a bunch of the things we removed in `debian-base`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#35058, at least partially
**Special notes for your reviewer**: I'm increasingly convinced that the hyperkube image is a bad pattern, as this image carries the superset of dependencies anyone might need, rather than the limited set of dependencies one needs. hyperkube really needs a proper owner.
**Release note**:
```release-note
```
/assign @timstclair @luxas @philips @nikhiljindal
cc @kubernetes/sig-release-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 49989, 49806, 49649, 49412, 49512)
Use existing k8s binaries and images on disk when they are preloaded to gce cos image.
**What this PR does / why we need it**:
This change is to accelerate K8S startup time on gce when k8s tarballs and images are already preloaded in VM image, by skipping the downloading, extracting and file transfer steps.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 50029, 48517, 49739, 49866, 49782)
fix spelling
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Add parallelism to GCE cluster upgrade
Fixes https://github.com/kubernetes/kubernetes/issues/48373
Should allow upgrading 500-node cluster (1.6->1.7) in < 1 hr. It currently takes ~1.5 day.
Though it is the duty of the upgrader to choose the right parallelism in order to avoid disrupting too many pods.
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews @kubernetes/sig-scalability-misc @mikedanese @gmarek
Automatic merge from submit-queue
[addon-manager] Remove unneeded annotation codes
**What this PR does / why we need it**:
Clean up addon-manager codes to make it less confusing. The annotation logics is only needed for 1.4->1.5 upgrade.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49898, 49897, 49919, 48860, 49491)
gce: make append_or_replace.. atomic
Before this change,
* the final echo is not atomically written to the target file
* two concurrent callers will use the same tempfile
Helps with https://github.com/kubernetes/kubernetes/issues/49895
cc @miekg
Automatic merge from submit-queue (batch tested with PRs 49898, 49897, 49919, 48860, 49491)
gce: extend CLOBBER_CONFIG to support known_tokens.csv
Helps with #49895
Automatic merge from submit-queue
Reduce kubectl calls from O(#nodes) to O(1) in cluster logdump
Ref https://github.com/kubernetes/kubernetes/issues/48513
Each node's logexporter is made to write a file to a GCS directory on success (https://github.com/kubernetes/test-infra/pull/3782).
We now use that directory as a registry of successful nodes and get it through a single "gsutil ls" call. This:
- reduces the current waiting time for logexporter in 5k-node cluster from >1hr to <10s.
- eliminates dependency on `kubectl logs` calls which seem to be unreliable sometimes (e.g when kubelet (or apiserver) is down)
/cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @fejta
Automatic merge from submit-queue (batch tested with PRs 49538, 49708, 47665, 49750, 49528)
Add a support for GKE regional clusters in e2e tests.
**What this PR does / why we need it**:
Add a support for GKE regional clusters in e2e tests.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49712, 49694, 49714, 49670, 49717)
set juju master charm state to blocked if the services appear to be failing
**What this PR does / why we need it**: set the juju master charm state to blocked if the services appear to be failing
**Release note**:
```release-note
set the juju master charm state to blocked if the services appear to be failing
```
Automatic merge from submit-queue (batch tested with PRs 49712, 49694, 49714, 49670, 49717)
Adding old Juju charm maintainers
**What this PR does / why we need it**: Update email addresses of past Juju charm maintainers
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```NONE
```
Automatic merge from submit-queue (batch tested with PRs 47738, 49196, 48907, 48533, 48822)
Fix a dead link in cluster/update-storage-objects.sh
**What this PR does / why we need it**: This PR fixes a dead link in cluster/update-storage-objects.sh.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48360, 48469, 49576, 49516, 49558)
Update maintainers for Juju charm layers
**What this PR does / why we need it**: Update maintainers of harm layers to reflect ... reality
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```NONE
```
Automatic merge from submit-queue (batch tested with PRs 48360, 48469, 49576, 49516, 49558)
Rev Calico's Typha daemon to v0.2.3 in add-on deployment.
**What this PR does / why we need it**:
This PR revs the version of Calico's Typha daemon used in the calico-policy-controller add-on to the latest bug-fix release, which incorporates a [critical bug fix](https://github.com/projectcalico/typha/issues/28).
**Which issue this PR fixes**
fixes#49473
**Release note**:
```release-note
Rev version of Calico's Typha daemon used in add-on to v0.2.3 to pull in bug-fixes.
```
Automatic merge from submit-queue
Set snat to false
**What this PR does / why we need it**:
- the [version](e8bea554c5) of the portmap plugin included with calico CNI version `v1.9.1` doesn't have `noSnat` config option, it has `snat` which is not specified (which is the case without this PR), [will be set to true by default](https://github.com/containernetworking/plugins/tree/master/plugins/meta/portmap#usage) , so we need to explicitly set it to `false`
CC @caseydavenport
Automatic merge from submit-queue (batch tested with PRs 45040, 48960)
Add ceph-common to hyperkube image
**What this PR does / why we need it**:
Adds the ceph-common package to the hyperkube image
Automatic merge from submit-queue (batch tested with PRs 48976, 49474, 40050, 49426, 49430)
Fix bug in cluster/log-dump
We're breaking in case KUBECTL is set as "./cluster/kubectl.sh --match-server-version". Moreover we always are using cluster/kubectl.sh as the default and don't want to do match-server-version for the purpose of logexporter.
Also adding owners file so I'm not blocked for approves while making fixes in log-dump. Besides I'll be able to review fixes sent by others.
/cc @wojtek-t
Automatic merge from submit-queue (batch tested with PRs 48976, 49474, 40050, 49426, 49430)
Use presence of kubeconfig file to toggle standalone mode
Fixes#40049
```release-note
The deprecated --api-servers flag has been removed. Use --kubeconfig to provide API server connection information instead. The --require-kubeconfig flag is now deprecated. The default kubeconfig path is also deprecated. Both --require-kubeconfig and the default kubeconfig path will be removed in Kubernetes v1.10.0.
```
/cc @kubernetes/sig-cluster-lifecycle-misc @kubernetes/sig-node-misc
Automatic merge from submit-queue
Remove flags low-diskspace-threshold-mb and outofdisk-transition-frequency
issue: #48843
This removes two flags replaced by the eviction manager. These have been depreciated for two releases, which I believe correctly follows the kubernetes depreciation guidelines.
```release-note
Remove depreciated flags: --low-diskspace-threshold-mb and --outofdisk-transition-frequency, which are replaced by --eviction-hard
```
cc @mtaufen since I am changing kubelet flags
cc @vishh @derekwaynecarr
/sig node
Replaces use of --api-servers with --kubeconfig in Kubelet args across
the turnup scripts. In many cases this involves generating a kubeconfig
file for the Kubelet and placing it in the correct location on the node.
Automatic merge from submit-queue (batch tested with PRs 49326, 49394, 49346, 49379, 49399)
more robust stat handling from ceph df output in the kubernetes-master charm create-rbd-pv action
**What this PR does / why we need it**: more robust stat handling from ceph df output in the kubernetes-master charm create-rbd-pv action
**Release note**:
```release-note
more robust stat handling from ceph df output in the kubernetes-master charm create-rbd-pv action
```
Automatic merge from submit-queue (batch tested with PRs 49420, 49296, 49299, 49371, 46514)
Fix: PV metric is not namespaced
**What this PR does / why we need it**: The PV metric of juju deployments is not namespaced. This PR fixes this bug.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/348
**Special notes for your reviewer**:
**Release note**:
```NONE
```
Automatic merge from submit-queue (batch tested with PRs 49420, 49296, 49299, 49371, 46514)
Update status to show failing services.
**What this PR does / why we need it**: Report on charm status any services that are not running.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/341
**Special notes for your reviewer**:
**Release note**:
```Report failing services in Juju deployed clusters.
```
Automatic merge from submit-queue
Auto-calculate master disk and root disk sizes in GCE
@gmarek PR https://github.com/kubernetes/kubernetes/pull/49282 didn't fix the issue because MASTER_DISK_SIZE was defaulting to 20GB in config-test.sh before being calculated inside get-master-disk-size() where you use pre-existing value if any.
It should be fixed by this now.
Automatic merge from submit-queue (batch tested with PRs 48565, 49172)
On GCE check whether NODE_LOCAL_SSDS=0 and handle this case appropriately
**What this PR does / why we need it**: Presently if you are using a mac and GCE and specify NODE_LOCAL_SSDS=0, or use the default, you end up with 2 local SSDs.
**Which issue this PR fixes** : fixes https://github.com/kubernetes/kubernetes/issues/49171
**Special notes for your reviewer**:
I've discovered that this issue is due to b353792f9c/cluster/gce/util.sh (L579)
If NODE_LOCAL_SSDS=0, this evaluates to $(seq 0)
```
$ for i in $(seq 0); do echo $i; done
1
0
```
From man seq on mac osx
```
The seq utility prints a sequence of numbers, one per line (default), from first (default 1),
to near last as possible, in increments of incr (default 1).When first is larger than last the
default incr is -1.
```
This was run on mac with the seq manpage indicating it comes from BSD Feb 19 2010.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49222, 49333, 48708, 49337)
Fix issue in installing containerized mounter
Fix PR #49335
PR #49157 causes failure when installing containerized mounter. This
PR is a fix for it
Automatic merge from submit-queue (batch tested with PRs 49222, 49333, 48708, 49337)
glbc: change the label of the l7-lb-controller pod
This ensures that the default http backend service doesn't include this
pod as its endpoint. This fixes#49159
Automatic merge from submit-queue (batch tested with PRs 49330, 49252, 49262, 49278, 49334)
Simplify master-worker relation missing message
**What this PR does / why we need it**: Simplify messaging of missing relation in Juju deployments
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/309
**Special notes for your reviewer**:
**Release note**:
```NONE
```
Automatic merge from submit-queue
Use custom port for node-problem-detector
It fixes https://github.com/kubernetes/kubernetes/issues/49263
```release-note
Use port 20256 for node-problem-detector in standalone mode.
```
Automatic merge from submit-queue
fix the typo of Kubernetes Worker
**What this PR does / why we need it**:
fix the typo of Kubernetes Worker that Kubernetes spell error
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```NONE
Automatic merge from submit-queue
Bump rescheduler version to v0.3.1
**What this PR does / why we need it**:
Bump Rescheduler version to v0.3.1 to log to STDERR.
**Which issue this PR fixes**
Fixes https://github.com/kubernetes/contrib/issues/2518
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48377, 48940, 49144, 49062, 49148)
add some more deprecation warnings to cluster
Part of https://github.com/kubernetes/kubernetes/issues/49213
@kubernetes/sig-cluster-lifecycle-misc
Automatic merge from submit-queue
Bump e2e mounttest image version to 0.8
Reduce the number of image files required for e2e test run
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49058, 49072, 49137, 49182, 49045)
Set default CIDR to /16 for Juju deployments
**What this PR does / why we need it**: Increase the number of IPs on a deployment
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/272
**Special notes for your reviewer**:
**Release note**:
```Set default CIDR to /16 for Juju deployments
```
Automatic merge from submit-queue (batch tested with PRs 49120, 46755, 49157, 49165, 48950)
gce: don't print every file in mounter to stdout
This is printing ~3000 lines.
Automatic merge from submit-queue (batch tested with PRs 48914, 48535, 49099, 48935, 48871)
Log error when fail to execute command in with-retry()
**What this PR does / why we need it**: Enhance gke/util.sh logging.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48913
**Special notes for your reviewer**:
/cc @krzyzacy
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49019, 48919, 49040, 49018, 48874)
Set default snap channel on charms to 1.7 stable
**What this PR does / why we need it**: This PR sets the default snap channel on charms to 1.7/stable.
This addresses problems where the the user might want to deploy the charm and get the same kubernetes version found on the bundles.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/305
**Special notes for your reviewer**:
**Release note**:
```Set default snap channel on charms to 1.7/stable
```
Automatic merge from submit-queue (batch tested with PRs 48231, 47377, 48797, 49020, 49033)
prevent unsetting of nonexistent previous port in kubeapi-load-balancer charm
**What this PR does / why we need it**: prevent unsetting of nonexistent previous port in kubeapi-load-balancer charm
**Release note**:
```release-note
prevent unsetting of nonexistent previous port in kubeapi-load-balancer charm
```
Automatic merge from submit-queue (batch tested with PRs 48578, 48895, 48958)
use port configuration
**What this PR does / why we need it**: Uses the `port` config option in the kubeapi-load-balancer charm.
**Release note**:
```release-note
Uses the port config option in the kubeapi-load-balancer charm.
```
Automatic merge from submit-queue
remove some people from OWNERS so they don't get reviews anymore
These are googlers who don't work on the project anymore but are still
getting reviews assigned to them:
- @bprashanth
- @rjnagal
- @vmarmol
Automatic merge from submit-queue (batch tested with PRs 48812, 48276)
Change fluentd-gcp monitoring to use metrics exposed by SD plugin
Following https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud/pull/135, make fluentd-gcp expose metrics in Prometheus registry and use them instead of counting records in the pipeline.
/cc @piosz @igorpeshansky
```release-note
Fluentd-gcp DaemonSet exposes different set of metrics.
```
Automatic merge from submit-queue (batch tested with PRs 48864, 48651, 47703)
Enable logexporter mechanism to dump logs from k8s nodes to GCS directly
Ref https://github.com/kubernetes/kubernetes/issues/48513
This adds support for logexporter from k8s side. Next I'll send a PR adding support from test-infra side.
/cc @kubernetes/sig-scalability-misc @kubernetes/test-infra-maintainers @fejta @wojtek-t @gmarek
Automatic merge from submit-queue
Fixed cluster validation for multizonal clusters.
Fixed cluster validation for multizonal clusters.
This should fix HA master e2e tests.
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 46748, 48826)
Added `CriticalAddonsOnly` toleration for npd.
**What this PR does / why we need it**:
We should add `CriticalAddonsOnly` toleration to make sure the daemonset can be scheduled on the node even if already planned to run critical pod.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47015
**Special notes for your reviewer**:
**Release note**:
```release-note
none
```
Automatic merge from submit-queue
Properly nest code blocks
**What this PR does / why we need it**:
Markdown code blocks are adjusted to better display on GitHub. See [rendered](c3fbec7663/cluster/addons/cluster-loadbalancing/glbc/README.md) version.
**Release note**:
```release-note
Adjust markdown code block in README for Google Load Balancer addon.
```
Automatic merge from submit-queue
Update docs for user-guide
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48781, 48817, 48830, 48829, 48053)
Fix yaml-quote typo
Caught this looking through CI logs.
/assign wojtek-t
Automatic merge from submit-queue (batch tested with PRs 48279, 48566, 48319, 48794, 47952)
Add prometheus plugin on fluentd image.
**What this PR does / why we need it**:
This PR adds the prometheus plugin on Fluentd.
**Special notes for your reviewer**:
The plugin used was: https://github.com/kazegusuri/fluent-plugin-prometheus, on the latest stable version.
All configs used are default.
**Release note**:
```release-note
Fluentd-es addon now exposes a /metrics endpoint for monitoring on port 24231.
```
Automatic merge from submit-queue
Use Container-optimzed OS images for nodes by default
Part of the deprecation of the debian-based ContainerVM images.
```release-note
kube-up and kubemark will default to using cos (GCI) images for nodes.
The previous default was container-vm (CVM, "debian"), which is deprecated.
If you need to explicitly use container-vm for some reason, you should set
KUBE_NODE_OS_DISTRIBUTION=debian
```
Automatic merge from submit-queue
Pass cluster name to Heapster with Stackdriver sink.
**What this PR does / why we need it**:
Passes cluster name as argument to Heapster when it's used with Stackdriver sink to allow setting resource label 'cluster_name' in exported metrics.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48405, 48742, 48748, 48571, 48482)
Setting default FlexVolume driver directory on COS images.
**What this PR does / why we need it**: The original default FlexVolume driver directory is not writable on COS. A new location is necessary to make FlexVolume work.
This directory doesn't exist by default. FlexVolume users need to create this directory, bind mount it, and remount with the executable permission. The other candidate is /home/kubernetes/bin, but the directory is already getting cluttered. I will submit a different PR for a script that automates this step.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48570
Automatic merge from submit-queue (batch tested with PRs 48698, 48712, 48516, 48734, 48735)
GCE: Allow empty NETWORK_PROJECT_ID env var
Changes:
1. Adds `GCE_API_ENDPOINT` logic to container-linux as it was added to GCI in #47881.
1. Apply `NETWORK_PROJECT_ID` value to gce.conf only if the env var is set.
/sig network
/area platform/gce
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Launch kubemark with an existing Kubemark master
In order to expand the use of kubemark, allow developers to use kubemark with a pre-existing Kubernetes cluster.
Ref issue #44393
Automatic merge from submit-queue (batch tested with PRs 48399, 48450, 48144)
Skip errors when unregistering juju kubernetes-workers
**What this PR does / why we need it**: When removing a kubernetes node from using Juju and for some reason kubernetes master fails we should not error the node, instead we should proceed with the removal of the node and the master will recognise that node as unavailable because it will fail heartbeats.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/300
**Special notes for your reviewer**:
**Release note**:
```
Clean decommission of Juju kubernetes worker units
```
Automatic merge from submit-queue (batch tested with PRs 48399, 48450, 48144)
configure kube-proxy to run with unset conntrack param when in lxc
**What this PR does / why we need it**: Configures the Juju Charm code to run kube-proxy with `conntrack-max-per-core` set to `0` when in an lxc as a workaround for issues when mounting `/sys/module/nf_conntrack/parameters/hashsize`
**Release note**:
```release-note
Configures the Juju Charm code to run kube-proxy with conntrack-max-per-core set to 0 when in an lxc as a workaround for issues when mounting /sys/module/nf_conntrack/parameters/hashsize
```
Automatic merge from submit-queue (batch tested with PRs 47043, 48448, 47515, 48446)
Fix charms leaving services running after remove-unit
**What this PR does / why we need it**:
This fixes a case where removed charm units can sometimes leave behind running services that interfere with the rest of the cluster.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix charms leaving services running after remove-unit
```
Automatic merge from submit-queue (batch tested with PRs 48439, 48440, 48394)
Fix kubernetes charms not restarting services after snap upgrades
**What this PR does / why we need it**:
This fixes a problem where the Kubernetes charms don't restart services after upgrading snaps. This can cause certain fixes not to be picked up (for example https://github.com/juju-solutions/release/pull/10)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixed kubernetes charms not restarting services after snap upgrades
```
Automatic merge from submit-queue (batch tested with PRs 48439, 48440, 48394)
Fix: namespace-create have kubectl in path
**What this PR does / why we need it**: In juju deployed clusters namespace-create action is failing
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/326
**Special notes for your reviewer**:
**Release note**:
```Fix: namespace-create action on Juju deployed clusters
```
Automatic merge from submit-queue
Add configuration for swift container name
**What this PR does / why we need it:**
This review updates the OpenStack Heat provider to allow for configuring the name of the Swift object store.
**Which issue this PR fixes:**
fixes#47966
**Special notes for your reviewer**:
Note that the terminology for OpenStack Swift conflicts with K8S terminology. In this instance, container is referring to the organization structure of Swift storage objects.
**Release note**:
```release-note
Adds configuration option for Swift object store container name to OpenStack Heat provider.
```
Automatic merge from submit-queue (batch tested with PRs 48317, 48313, 48351, 48357, 48115)
Ensure get_password is accessing a file that exists.
**What this PR does / why we need it**: get_password will throw an exception instead of returning None in case the basic_auth.csv file is missing but /root/cdk/ is there in a juju deployment.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/324
**Special notes for your reviewer**:
**Release note**:
```
Fix race condition where /root/cdk is not yet initialised in kubernetes-master setup by Juju
```
Automatic merge from submit-queue (batch tested with PRs 47918, 47964, 48151, 47881, 48299)
Add ApiEndpoint support to GCE config.
**What this PR does / why we need it**:
Add the ability to change ApiEndpoint for GCE.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 43558, 48261, 42376, 46803, 47058)
Add bind mount /etc/resolv.conf from host to containerized mounter
Currently, in containerized mounter rootfs, there is no DNS setup. If client
try to set up volume with host name instead of IP address, it will fail to resolve
the host name.
By bind mount the host's /etc/resolv.conf to mounter rootfs, VM hosts name
could be resolved when using host name during mount.
```release-note
Fixes issue where you could not mount NFS or glusterFS volumes using hostnames on GCI/GKE with COS images.
```
Automatic merge from submit-queue (batch tested with PRs 47850, 47835, 46197, 47250, 48284)
Securing the cluster created by Juju
**What this PR does / why we need it**: This PR secures the deployments done with Juju master. Works around certain security issues inherent to kubernetes (see for example dashboard access)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
Securing Juju kubernetes dashboard
```
Automatic merge from submit-queue (batch tested with PRs 46850, 47984)
Update addon-resizer version
Update addon-resizer version and remove the flags that have been deprecated in the new version.
**What this PR does / why we need it**:
ref kubernetes/contrib#2623
**Special notes for your reviewer**:
Need to wait for merging kubernetes/contrib#2623 first.
**Release note**:
```release-note
addon-resizer flapping behavior was removed.
```
Automatic merge from submit-queue
Allow log-dumping only N randomly-chosen nodes in the cluster
This should let us save "lots" (~3-4 hours) of time in our 5000-node cluster scale tests as we copy logs from all the nodes to jenkins worker and then upload all of them to gcs (while we don't need too many).
This will also prevent the jenkins container facing "No space left on device" error while dumping logs, that we saw in runs 12-13 of gce-enormous-cluster.
The longterm fix will be to enable [logexporter](https://github.com/kubernetes/test-infra/tree/master/logexporter) for our tests.
cc @kubernetes/sig-scalability-misc @kubernetes/test-infra-maintainers @gmarek @fejta
Automatic merge from submit-queue (batch tested with PRs 48004, 48205, 48130, 48207)
Bumped Heapster to v1.4.0
``` release-note
Bumped Heapster to v1.4.0.
More details about the release https://github.com/kubernetes/heapster/releases/tag/v1.4.0
```
follow up #47961
The release candidate `v1.4.0-beta.0` turned out to be stable.
Automatic merge from submit-queue (batch tested with PRs 48004, 48205, 48130, 48207)
Do not set CNI in cases where there is a private master and network policy provider is set.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
In GCE and in a "private master" setup, do not set the network-plugin provider to CNI by default if a network policy provider is given.
```
Automatic merge from submit-queue (batch tested with PRs 48192, 48182)
Add generic NoSchedule toleration to fluentd in gcp config as a quick…
…-fix for #44445
Automatic merge from submit-queue (batch tested with PRs 48139, 48042, 47645, 48054, 48003)
Add a failsafe for etcd not returning a connection string
**What this PR does / why we need it**: Removing a kubernetes-master will fail as described on this issue: https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/311
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/311
**Special notes for your reviewer**: This is a two liner defensive code. I am not totally sold on this patch. I might not be the right place to address the above issue. However, solving the problem on the etcd side and updating the interface scope to be unit (as suggested) seems much more involving.
**Release note**:
```
Fix error when removing juju kubernetes-master unit
```
Automatic merge from submit-queue
Make big clusters work again after introduction of subnets
This PR does two things:
- make IP aliases automatically pick Node IP Range based on number of Nodes,
- fix logic for starting clusters >4095 Nodes that was broken by introduction of subnets,
cc @wojtek-t @shyamjvs
```release-note
Setting env var ENABLE_BIG_CLUSTER_SUBNETS=true will allow kube-up.sh to start clusters bigger that 4095 Nodes on GCE.
```
Ref https://github.com/kubernetes/kubernetes/issues/47344
Automatic merge from submit-queue
Insert Cynerva and Kjackal to approvers list
**What this PR does / why we need it**:
Per the membership reviews, we're looking to promote Konstantinos and
George to approvers to help distribute the review/bug load for the `cluster/juju` code
tree.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
cc @marcoceppi and @tvansteenburgh
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48092, 47894, 47983)
fix systemd service file for custom args.
`KUBE_SCHEDULER_ARGS` and `KUBELET_ARGS` are used to custom args for scheduler or kubelet by users.
But if there are more than one params in `KUBELET_ARGS`, for example, if I set KUBELET_ARGS="--cgroups-per-qos=false --enforce-node-allocatable=", the kubelet will judge the `false --enforce-node-allocatable=` as the value of `cgroups-per-qos`. Because `${KUBELET_ARGS}` in kubelet.service will expands the variable into one word. And if I take `$KUBELET_ARGS` instead, kubelet will worker perfectly.
For more info, please click [EnvironmentFiles and support for /etc/sysconfig files](http://fedoraproject.org/wiki/Packaging:Systemd#EnvironmentFiles_and_support_for_.2Fetc.2Fsysconfig_files). This bug is reported by @huanxingyouyoutoo. And I make this PR for her to fix it.
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48012, 47443, 47702, 47178)
Fix setting juju worker labels during deployment
**What this PR does / why we need it**: Allows for setting the labels of juju workers during deployment (eg inside a bundle)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47176
**Special notes for your reviewer**:
**Release note**:
```
Fix bug in setting Juju kubernetes-worker labels in bundle.yaml files.
```
Automatic merge from submit-queue (batch tested with PRs 47860, 47170)
Fix restart action on juju kubernetes-master
**What this PR does / why we need it**: Restart action of kubernetes-master of Juju is not functioning.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/299
**Special notes for your reviewer**:
**Release note**:
```
Fix: Restart action of juju's kubernetes-master restarts the respective snap based services
```
Automatic merge from submit-queue (batch tested with PRs 47860, 47170)
Make fluentd log to stdio instead of a dedicated file
Lower verbosity also, to reduce volume of system logs exported to the backend.
Fix https://github.com/kubernetes/kubernetes/issues/43772
/cc @piosz
Automatic merge from submit-queue (batch tested with PRs 47961, 46276)
Bumped Heapster to v1.4.0-beta.0
Heapster release candidate for Kubernetes 1.7
cc @dchen1107 @caesarxuchao
Automatic merge from submit-queue
Parameterize the binary path and host arch for the hyperkube image
As the [cluster/images/hyperkube/README.md](https://github.com/kubernetes/kubernetes/tree/master/cluster/images/hyperkube) shows, I run the command: `make build VERSION=test ARCH=ppc64le`, but got the below errors, so this PR will fix it.
```
ARCH=ppc64le
cp -r ./* /tmp/hyperkubeTFbYrI
mkdir -p /tmp/hyperkubeTFbYrI/cni-bin
cp ../../../_output/dockerized/bin/linux/ppc64le/hyperkube /tmp/hyperkubeTFbYrI
cp: cannot stat '../../../_output/dockerized/bin/linux/ppc64le/hyperkube': No such file or directory
Makefile:62: recipe for target 'build' failed
make: *** [build] Error 1
```
Automatic merge from submit-queue (batch tested with PRs 47993, 47892, 47591, 47469, 47845)
Bump up npd version to v0.4.1
```
Bump up npd version to v0.4.1
```
Fixes#47219
Automatic merge from submit-queue (batch tested with PRs 47993, 47892, 47591, 47469, 47845)
Use a different env var to enable the ip-masq-agent addon.
We shouldn't mix setting the non-masq-cidr with enabling the addon.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
https://github.com/kubernetes/kubernetes/issues/47865
Automatic merge from submit-queue (batch tested with PRs 47883, 47179, 46966, 47982, 47945)
Strip versions from known api groups in audit policy
Props to @CaoShuFeng for catching this.
Issue: kubernetes/features#22
/cc @ericchiang
Automatic merge from submit-queue (batch tested with PRs 46151, 47602, 47507, 46203, 47471)
Add RBAC support to fluentd-elasticsearch cluster addon
**What this PR does / why we need it**:
Adds rbac support to the fluentd-elasticsearch addon
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#46023
**Special notes for your reviewer**:
**Release note**:
```release-note
Add RBAC support to fluentd-elasticsearch cluster addon
```
Automatic merge from submit-queue (batch tested with PRs 46151, 47602, 47507, 46203, 47471)
es discovery support args apiserver-host and kubeconfig
Now discovery elasticsearch through kubernetes client,but now does not support specifying the apiserver-host or kubeconfig create client.
Automatic merge from submit-queue (batch tested with PRs 47403, 46646, 46906, 46527, 46792)
Avoid redundant copying of tars during kube-up for gce if the same file already exists
**What this PR does / why we need it**:
Whenever I execute cluster/kube-up.sh it copies my tar files to google cloud, even if the files haven't changed. This PR checks to see whether the files already exist, and avoids uploading them again. These files are large and can take a long time to upload.
**Which issue this PR fixes**: fixes#46791
**Special notes for your reviewer**:
Here is the new output:
cluster/kube-up.sh
... Starting cluster in us-central1-b using provider gce
... calling verify-prereqs
... calling verify-kube-binaries
... calling kube-up
Project: PROJECT
Zone: us-central1-b
+++ Staging server tars to Google Storage: gs://kubernetes-staging-PROJECT/kubernetes-devel
+++ kubernetes-server-linux-amd64.tar.gz uploaded earlier, cloud and local file md5 match (md5 = 3a095kcf27267a71fe58f91f89fab1bc)
**Release note**:
```cluster/kube-up.sh on gce now avoids redundant copying of kubernetes tars if the local and cloud files' md5 hash match```
Automatic merge from submit-queue
Remove limits from ip-masq-agent for now and disable ip-masq-agent in GCE
ip-masq-agent when issuing an iptables-save will read any configured iptables on the node. This means that the ip-masq-agent's memory requirements would grow with the number of iptables (i.e. services) on the node.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
#47865
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Don't audit log tokens in TokenReviews
We don't want to leak auth tokens in the audit logs, so only log TokenReview requests at the metadata level.
Issue: kubernetes/features#22
ip-masq-agent when issuing an iptables-save will read
any configured iptables on the node. This means that
the ip-masq-agent's memory requirements would grow
with the number of iptables (i.e. services) on the node.
Disable ip-masq-agent in GCE
Automatic merge from submit-queue
Fix broken command in registry addon document
**What this PR does / why we need it**:
Fix a command example in registry addon document so it matches the example yaml above.
Automatic merge from submit-queue (batch tested with PRs 47906, 47910)
Reduce scheduler CPU request to 75m
On a 1 cpu master we are over budget with CPU requests. Components like npd or cluster autoscaler don't have *any* space to run. We need to reduce some requests.
cc: @gmarek @mikedanese @roberthbailey @davidopp @dchen1107
Automatic merge from submit-queue
Add aleksandra-malinowska to cluster-autoscaler salt definition owners
@aleksandra-malinowska is working on Cluster Autoscaler so she should be added to reviewers and approvers.
Automatic merge from submit-queue
Reduce Cluster Autoscaler cpu request to 10m
We are super tight on 1 cpu master node. With the recent changes we cannot fit to the master if request is bigger than 10m.
cc: @gmarek @MaciekPytel @aleksandra-malinowska
Automatic merge from submit-queue
Update addons with upstream CVE fixes
**What this PR does / why we need it**: refreshes the kube-dns, metadata-proxy, and fluentd-gcp, event-exporter, prometheus-to-sd, and ip-masq-agent addons with new base images containing fixes for the following vulnerabilities:
* CVE-2016-4448
* CVE-2016-9841
* CVE-2016-9843
* CVE-2017-1000366
* CVE-2017-2616
* CVE-2017-9526
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47386 (yay!)
**Special notes for your reviewer**:
**Release note**:
```release-note
Update kube-dns, metadata-proxy, and fluentd-gcp, event-exporter, prometheus-to-sd, and ip-masq-agent addons with new base images containing fixes for CVE-2016-4448, CVE-2016-9841, CVE-2016-9843, CVE-2017-1000366, CVE-2017-2616, and CVE-2017-9526.
```
/assign @bowei @MrHohn @Q-Lee @crassirostris @dnardo
/cc @dchen1107 @timstclair
Automatic merge from submit-queue (batch tested with PRs 42252, 42251, 42249, 47512, 47887)
Bump the memory request/limit for ip-masq-daemon.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
issue #47865
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Per the membership reviews we're looking to promote Konstantinos and
George to approvers to help distribute the review/bug load for the juju code
tree.
Automatic merge from submit-queue
Add ip-masq-agent readiness label by default.
Since we are setting the non-masq-cidr in the kubelet to 0.0.0.0/0 we
need to ensure the ip-masq-agent runs.
pr/#46473 made the NON_MASQUERADE_CIDR default to 0.0.0.0/0 which means we need to have this label set now.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
fixes#47752
**Special notes for your reviewer**:
**Release note**:
```release-note
ip-masq-agent is now the default for GCE
```
Automatic merge from submit-queue
fix validate-cluster.sh
attempt to fix#47379.
Without this fix, the validate-cluster.sh never retries if `kubectl-retry get cs` fails.
cc @dchen1107
Automatic merge from submit-queue (batch tested with PRs 45268, 47573, 47632, 47818)
NODE_TAINTS in gce startup scripts
Currently there is now way to pass a list of taints that should be added on node registration (at least not in gce or other saltbased deployment). This PR adds necessary plumbing to pass the taints from user or instance group template to kubelet startup flags.
```release-note
Taints support in gce/salt startup scripts.
```
The PR was manually tested.
```
NODE_TAINTS: 'dedicated=ml:NoSchedule'
```
in kube-env results in
```
spec:
[...]
taints:
- effect: NoSchedule
key: dedicated
timeAdded: null
value: ml
```
cc: @davidopp @gmarek @dchen1107 @MaciekPytel
setting the non-masq-cidr in the kubelet to 0.0.0.0/0 we
need to ensure the ip-masq-agent runs.
Add node label pre-req back to ip-masq-agent.
Make gce test consistent with gce default scripts.
Automatic merge from submit-queue (batch tested with PRs 46604, 47634)
Set price expander in Cluster Autoscaler for GCE
With CA 0.6 we will make price-preferred node expander the default one for GCE. For other cloud providers we will stick to the default one (random) until the community implement the required interfaces in CA repo.
https://github.com/kubernetes/autoscaler/issues/82
cc: @MaciekPytel @aleksandra-malinowska
Automatic merge from submit-queue (batch tested with PRs 47669, 40284, 47356, 47458, 47701)
Standardize on home/kubernetes/bin for CNI
**What this PR does / why we need it**:
Standardizes where CNI plugins get installed on GCE.
**Which issue this PR fixes**
Fixes: https://github.com/kubernetes/kubernetes/issues/47453
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47669, 40284, 47356, 47458, 47701)
Mark Static pods on the Master as critical
fixes#47277.
A known issue with static pods is that they do not interact well with evictions. If a static pod is evicted or oom killed, then it will never be recreated. To mitigate this, we do not evict static pods that are critical. In addition, non-critical pods are candidates for preemption if a critical pod is scheduled to the node. If there are not enough allocatable resources on the node, this causes the static pod to be preempted.
This PR marks all static pods in the kube-system namspace as critical.
cc @vishh @dchen1107
Automatic merge from submit-queue (batch tested with PRs 46327, 47166)
mark --network-plugin-dir deprecated for kubelet
**What this PR does / why we need it**:
**Which issue this PR fixes** : fixes#43967
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47530, 47679)
Use cos-stable-59-9460-64-0 instead of cos-beta-59-9460-20-0.
Remove dead code that has now moved to another repo as part of #47467
**Release note**:
```release-note
NONE
```
/sig node
Automatic merge from submit-queue (batch tested with PRs 47626, 47674, 47683, 47290, 47688)
Use echoserver:1.6 for better debugging and XSS prevention.
**What this PR does / why we need it**: This updates our test code to use a newer echoserver with XSS preventions.
**Which issue this PR fixes**: fixes#47682
**Special notes for your reviewer**: Marking as 1.7 since it's a fix to test code.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47626, 47674, 47683, 47290, 47688)
Fix Juju kubernetes-master idle status never being set
**What this PR does / why we need it**:
This fixes a problem with the kubernetes-master charm where the "Kubernetes master running." status message never gets set.
This happens because the `kube-api-endpoint.connected` state that it's waiting for doesn't exist. The state we need is `kube-api-endpoint.available` as seen [here](https://github.com/juju-solutions/interface-http/blob/master/provides.py#L12).
Additionally, we need to add the relation arguments to idle_status so it doesn't break when called.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47676
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix Juju kubernetes-master idle status never being set
```
Automatic merge from submit-queue (batch tested with PRs 47626, 47674, 47683, 47290, 47688)
The KUBE-METADATA-SERVER firewall must be applied before the universa…
…l tcp ACCEPT
**What this PR does / why we need it**: the metadata firewall rule was broken by being appended after the universal tcp accept.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 38751, 44282, 46382, 47603, 47606)
Working on fixing #43716.
This will create the necessary certificates.
On GCE is will upload those certificates to Metadata.
They are then pulled down on to the kube-apiserver.
They are written to the /etc/src/kubernetes/pki directory.
Finally they are loaded vi the appropriate command line flags.
The requestheader-client-ca-file can be seen by running the following:-
kubectl get ConfigMap extension-apiserver-authentication
--namespace=kube-system -o yaml
Minor bug fixes.
Made sure AGGR_MASTER_NAME is set up in all configs.
Clean up variable names.
Added additional requestheader configuration parameters.
Added check so that if there is no Aggregator CA contents we won't start
the aggregator with the relevant flags.
**What this PR does / why we need it**:
This PR creates a request header CA. It also creates a proxy client cert/key pair.
It causes these files to end up on kube-apiserver and set the CLI flags so they are properly loaded.
Without it the customer either has to set them up themselves or re-use the master CA which is a security vulnerability.
Currently this creates everything on GCE.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43716
**Special notes for your reviewer**:
This is a reapply of pull/47094 with the GKE issue resolved.
**Release note**: None
- It contains a fix for ipaliasing.
- It contains a fix which decouples GPU driver installation from kernel
version.
Remove dead code that has now moved to another repo as part of #47467
Automatic merge from submit-queue
Don't start any Typha instances if not using Calico
**What this PR does / why we need it**:
Don't start any Typha instances if Calico isn't being used. A recent change now includes all add-ons on the master, but we don't always want a Typha replica.
**Which issue this PR fixes**
Fixes https://github.com/kubernetes/kubernetes/issues/47622
**Release note**:
```release-note
NONE
```
cc @dnardo
Automatic merge from submit-queue (batch tested with PRs 47562, 47605)
Adding option in node start script to add "volume-plugin-dir" flag to kubelet.
**What this PR does / why we need it**: Adds a variable to allow specifying FlexVolume driver directory through cluster/kube-up.sh. Without this, the process of setting up FlexVolume in a non-default directory is very manual.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47561
Automatic merge from submit-queue
Add encryption provider support via environment variables
These changes are needed to allow cloud providers to use the encryption providers as an alpha feature. The version checks can be done in the respective cloud providers'.
Context: #46460 and #46916
@destijl @jcbsmpsn @smarterclayton
Automatic merge from submit-queue
[GCE] Bump GLBC version to 0.9.5
Fixes#47559
```release-note
Bump GLBC version to 0.9.5 - fixes [loss of manually modified GCLB health check settings](https://github.com/kubernetes/kubernetes/issues/47559) upon upgrade from pre-1.6.4 to either 1.6.4 or 1.6.5, or from pre-1.5.8 to 1.5.8.
```
This will create the necessary certificates.
On GCE is will upload those certificates to Metadata.
They are then pulled down on to the kube-apiserver.
They are written to the /etc/src/kubernetes/pki directory.
Finally they are loaded vi the appropriate command line flags.
The requestheader-client-ca-file can be seen by running the following:-
kubectl get ConfigMap extension-apiserver-authentication
--namespace=kube-system -o yaml
Minor bug fixes.
Made sure AGGR_MASTER_NAME is set up in all configs.
Clean up variable names.
Added additional requestheader configuration parameters.
Added check so that if there is no Aggregator CA contents we won't start
the aggregator with the relevant flags.
Automatic merge from submit-queue (batch tested with PRs 47492, 47542, 46800, 47545, 45764)
Update addons with upstream CVE fixes
**What this PR does / why we need it**: refreshes the cluster-proportional-autoscaler, metadata-proxy, and fluentd-gcp addons with new base images with fixes for the following vulnerabilities:
* CVE-2016-4448
* CVE-2016-8859
* CVE-2016-9841
* CVE-2016-9843
* CVE-2017-9526
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: x-ref #47386, though there are still a few images left to update
**Release note**:
```release-note
Update cluster-proportional-autoscaler, metadata-proxy, and fluentd-gcp addons with fixes for CVE-2016-4448, CVE-2016-8859, CVE-2016-9841, CVE-2016-9843, and CVE-2017-9526.
```
/cc @timstclair @MrHohn @Q-Lee @crassirostris
Automatic merge from submit-queue
Fix dangling reference to gcloud alpha API for GCI (should be beta)
This reference to the alpha API was missed (fixed in GCE, but not GCI)
Fixes#47494
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 47510, 47516, 47482, 47521, 47537)
Allow autoscaler min at 0 in GCE
Allow scaling migs to zero in GCE startup scripts. This only makes sense when there is more than 1 mig. The main use case (for now) will be to test scaling to to zero in e2e tests.
Automatic merge from submit-queue (batch tested with PRs 47302, 47389, 47402, 47468, 47459)
Change port on which fluentd exposes its metrics
Fix https://github.com/kubernetes/kubernetes/issues/47397
/cc @Q-Lee @nicksardo
```release-note
Stackdriver Logging deployment exposes metrics on node port 31337 when enabled.
```
Automatic merge from submit-queue (batch tested with PRs 47302, 47389, 47402, 47468, 47459)
Update to kube-addon-manager:v6.4-beta.2: kubectl v1.6.4 and refreshed base images
**What this PR does / why we need it**: refreshes base images for kube-addon-manager with fixes for CVE-2016-9841 and CVE-2016-9843.
x-ref https://github.com/kubernetes/kubernetes/issues/47386
**Special notes for your reviewer**: the updated images are not yet pushed, so tests will fail until that's done.
**Release note**:
```release-note
```
/assign @MrHohn
Automatic merge from submit-queue (batch tested with PRs 46441, 43987, 46921, 46823, 47276)
Enable Node authorizer and NodeRestriction admission in kubemark
xref https://github.com/kubernetes/features/issues/279
We want to ensure scale testing covers use of the authorizer/admission pair that partitions nodes. This includes enabling the authorizer, which populates a graph of existing nodes and pods.
Kubemark is still running all nodes with a single credential, so a follow-up step is to generate unique credentials per node (or enable TLS bootstrapping) and remove the temporary rolebinding added in this PR so the node authorizer is the one authorizing each call by a hollow node.
Automatic merge from submit-queue (batch tested with PRs 47000, 47188, 47094, 47323, 47124)
Set up proxy certs for Aggregator.
Working on fixing https://github.com/kubernetes/kubernetes/issues/43716.
This will create the necessary certificates.
On GCE is will upload those certificates to Metadata.
They are then pulled down on to the kube-apiserver.
They are written to the /etc/src/kubernetes/pki directory.
Finally they are loaded vi the appropriate command line flags.
The requestheader-client-ca-file can be seen by running the following:-
kubectl get ConfigMap extension-apiserver-authentication --namespace=kube-system -o yaml
**What this PR does / why we need it**:
This PR creates a request header CA. It also creates a proxy client cert/key pair.
It causes these files to end up on kube-apiserver and set the CLI flags so they are properly loaded.
Without it the customer either has to set them up themselves or re-use the master CA which is a security vulnerability.
Currently this creates everything on GCE.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43716
**Special notes for your reviewer**:
Automatic merge from submit-queue (batch tested with PRs 47000, 47188, 47094, 47323, 47124)
Add Calico typha agent
**What this PR does / why we need it**:
- Adds the Calico typha agent with autoscaling to the GCE scripts.
- Adds logic to adjust Calico resource requests based on cluster size.
Fixes https://github.com/kubernetes/kubernetes/issues/47269
**Special notes for your reviewer**:
CC @dnardo
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Make fluentd-gcp run with host network
Fluentd-gcp should have access to instance's platform-dependent service account in order to work.
/cc @piosz
Working on fixing https://github.com/kubernetes/kubernetes/issues/43716.
This will create the necessary certificates.
On GCE is will upload those certificates to Metadata.
They are then pulled down on to the kube-apiserver.
They are written to the /etc/src/kubernetes/pki directory.
Finally they are loaded vi the appropriate command line flags.
The requestheader-client-ca-file can be seen by running the following:-
kubectl get ConfigMap extension-apiserver-authentication
--namespace=kube-system -o yaml
Minor bug fixes.
Made sure AGGR_MASTER_NAME is set up in all configs.
Clean up variable names.
Added additional requestheader configuration parameters.
Automatic merge from submit-queue
Remove e2e-rbac-bindings.
Replace todo-grabbag binding w/ more specific heapster roles/bindings.
Move kubelet binding.
**What this PR does / why we need it**:
The "e2e-rbac-bindings" held 2 leftovers from the 1.6 RBAC rollout process:
- One is the "kubelet-binding" which grants the "system:node" role to kubelet. This is needed until we enable the node authorizer. I moved this to the folder w/ some other kubelet related bindings.
- The other is the "todo-remove-grabbag-cluster-admin" binding, which grants the cluster-admin role to the default service account in the kube-system namespace. This appears to only be required for heapster. Heapster will instead use a "heapster" service account, bound to a "system:heapster" role on the cluster (no write perms), and a "system:pod-nanny" role in the kube-system namespace.
**Which issue this PR fixes**: Addresses part of #39990
**Release Note**:
```release-note
New and upgraded 1.7 GCE/GKE clusters no longer have an RBAC ClusterRoleBinding that grants the `cluster-admin` ClusterRole to the `default` service account in the `kube-system` namespace.
If this permission is still desired, run the following command to explicitly grant it, either before or after upgrading to 1.7:
kubectl create clusterrolebinding kube-system-default --serviceaccount=kube-system:default --clusterrole=cluster-admin
```
Automatic merge from submit-queue
Audit webhook config for GCE
Add a `ADVANCED_AUDIT_BACKEND` (comma delimited list) environment variable to the GCE cluster config to select the audit backend, and add configuration for the webhook backend.
~~Based on the first commit from https://github.com/kubernetes/kubernetes/pull/46557~~
For kubernetes/features#22
Since this is GCE-only configuration plumbing, I think this should be exempt from code-freeze.
Automatic merge from submit-queue
Remove Initializers from admission-control in kubernetes-master charm for pre-1.7
**What this PR does / why we need it**:
This fixes a problem with the kubernetes-master charm where kube-apiserver never comes up:
```
failed to initialize admission: Unknown admission plugin: Initializers
```
The Initializers plugin does not exist before Kubernetes 1.7. The charm needs to support 1.6 as well.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47062
**Special notes for your reviewer**:
This fixes a problem introduced by https://github.com/kubernetes/kubernetes/pull/36721
**Release note**:
```release-note
Remove Initializers from admission-control in kubernetes-master charm for pre-1.7
```
Automatic merge from submit-queue
Fixes 47182
**What this PR does / why we need it**: Adds some state guards to the idle_status message to speed up the deployment
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47182
**Special notes for your reviewer**:
This adds additional state guards of the idle_status method, which will
prevent it from being run until a worker has joined the relationship.
Previous invocations may have some messaging inconsistencies but will reach
eventual consistency once a worker has joined.
This prevents the polling loop from executing too soon, bloating the
installation time by bare-minimum an additional 10 minutes.
**Release note**:
```release-note
Added state guards to the idle_status messaging in the kubernetes-master charm to make deployment faster on initial deployment.
```
Automatic merge from submit-queue
Bump up npd version to v0.4.0
Fixes#47070.
Bump up npd version to [v0.4.0](https://github.com/kubernetes/node-problem-detector/releases/tag/v0.4.0).
```release-note
Bump up Node Problem Detector version to v0.4.0, which added support of parsing log from /dev/kmsg and ABRT.
```
/cc @dchen1107 @ajitak
This adds additional state guardsof the idle_status method, which will
prevent it from being run until a worker has joined the relationship.
Previous invocations may have some message artifacting, but will reach
eventual consistency once a worker has joined.
This prevents the polling loop from executing too soon, bloating the
installation time by bare-minimum an additional 10 minutes.
Automatic merge from submit-queue (batch tested with PRs 46718, 46828, 46988)
Update docs/ links to point to main site
**What this PR does / why we need it**:
This updates various links to either point to kubernetes.io or to the kubernetes/community repo instead of the legacy docs/ tree in k/k
Pre-requisite for #46813
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
@kubernetes/sig-docs-maintainers @chenopis @ahmetb @thockin
Automatic merge from submit-queue (batch tested with PRs 46897, 46899, 46864, 46854, 46875)
Write audit policy file for GCE/GKE configuration
Setup the audit policy configuration for GCE & GKE. Here is the high level summary of the policy:
- Default logging everything at `Metadata`
- Known write APIs default to `RequestResponse`
- Known read-only APIs default to `Request`
- Except secrets & configmaps are logged at `Metadata`
- Don't log events
- Don't log `/version`, swagger or healthchecks
In addition to the above, I spent time analyzing the noisiest lines in the audit log from a cluster that soaked for 24 hours (and ran a batch of e2e tests). Of those top requests, those that were identified as low-risk (all read-only, except update kube-system endpoints by controllers) are dropped.
I suspect we'll want to tweak this a bit more once we've had a time to soak it on some real clusters.
For kubernetes/features#22
/cc @sttts @ericchiang
Automatic merge from submit-queue
Add event exporter deployment to the fluentd-gcp addon
Introduce event exporter deployment to the fluentd-gcp addon so that by default if logging to Stackdriver is enabled, events will be available there also.
In this release, event exporter is a non-critical pod in BestEffort QoS class to avoid preempting actual workload in tightly loaded clusters. It will become critical in one of the future releases.
```release-note
Stackdriver cluster logging now deploys a new component to export Kubernetes events.
```
Automatic merge from submit-queue (batch tested with PRs 46972, 42829, 46799, 46802, 46844)
promote tls-bootstrap to beta
last commit of this PR.
Towards https://github.com/kubernetes/kubernetes/issues/46999
```release-note
Promote kubelet tls bootstrap to beta. Add a non-experimental flag to use it and deprecate the old flag.
```
Automatic merge from submit-queue (batch tested with PRs 46734, 46810, 46759, 46259, 46771)
Add iptables lock-file mount to kube-proxy manifest
**What this PR does / why we need it**: kube-proxy is broken in make bazel-release. The new iptables binary uses a lockfile in "/run", but the directory doesn't exist. This causes iptables-restore to fail. We need to share the same lock-file amongst all containers, so mount the host /run dir.
This is similar to #46132 but expediency matters, since builds are broken.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#46103
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Respect PDBs during node upgrades and add test coverage to the ServiceTest upgrade test.
This is still a WIP... needs to be squashed at least, and I don't think it's currently passing until I increase the scale of the RC, but please have a look at the general outline. Thanks!
Fixes#38336
@kow3ns @bdbauer @krousey @erictune @maisem @davidopp
```
On GCE, node upgrades will now respect PodDisruptionBudgets, if present.
```
Automatic merge from submit-queue
Add some initial resource limits to the ip-masq-agent.
These limits were based on observing the agent over roughly a day RES was typically ~4M for me but I'd like to make sure we have some headroom. If there was a huge config map then this could increase slightly but not significantly since we only allow 64 entries.
VmPeak: 11164 kB
VmSize: 11164 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 7652 kB
VmRSS: 4260 kB
VmData: 7612 kB
VmStk: 136 kB
VmExe: 1856 kB
VmLib: 0 kB
VmPTE: 40 kB
VmPMD: 20 kB
VmSwap: 0 kB
Automatic merge from submit-queue
Adding a metadata proxy addon
**What this PR does / why we need it**: adds a metadata server proxy daemonset to hide kubelet secrets.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: this partially addresses #8867
**Special notes for your reviewer**:
**Release note**: the gce metadata server can be hidden behind a proxy, hiding the kubelet's token.
```release-note
The gce metadata server can be hidden behind a proxy, hiding the kubelet's token.
```
Automatic merge from submit-queue
Add initializer support to admission and uninitialized filtering to rest storage
Initializers are the opposite of finalizers - they allow API clients to react to object creation and populate fields prior to other clients seeing them.
High level description:
1. Add `metadata.initializers` field to all objects
2. By default, filter objects with > 0 initializers from LIST and WATCH to preserve legacy client behavior (known as partially-initialized objects)
3. Add an admission controller that populates .initializer values per type, and denies mutation of initializers except by certain privilege levels (you must have the `initialize` verb on a resource)
4. Allow partially-initialized objects to be viewed via LIST and WATCH for initializer types
5. When creating objects, the object is "held" by the server until the initializers list is empty
6. Allow some creators to bypass initialization (set initializers to `[]`), or to have the result returned immediately when the object is created.
The code here should be backwards compatible for all clients because they do not see partially initialized objects unless they GET the resource directly. The watch cache makes checking for partially initialized objects cheap. Some reflectors may need to change to ask for partially-initialized objects.
```release-note
Kubernetes resources, when the `Initializers` admission controller is enabled, can be initialized (defaulting or other additive functions) by other agents in the system prior to those resources being visible to other clients. An initialized resource is not visible to clients unless they request (for get, list, or watch) to see uninitialized resources with the `?includeUninitialized=true` query parameter. Once the initializers have completed the resource is then visible. Clients must have the the ability to perform the `initialize` action on a resource in order to modify it prior to initialization being completed.
```
Automatic merge from submit-queue (batch tested with PRs 46456, 46675, 46676, 46416, 46375)
Move tolerations to PodSpec for ip-masq-agent.yaml.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 46456, 46675, 46676, 46416, 46375)
Move tolerations to PodSpec for calico-node.yaml.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 46239, 46627, 46346, 46388, 46524)
Configure NPD version through env variable
This lets user specify NPD version to be installed with kubernetes.
Automatic merge from submit-queue (batch tested with PRs 41563, 45251, 46265, 46462, 46721)
change kubemark image project to match new cos image project
The old project is not available anymore.
https://github.com/kubernetes/kubernetes/pull/45136
Automatic merge from submit-queue (batch tested with PRs 46648, 46500, 46238, 46668, 46557)
Add an e2e test for AdvancedAuditing
Enable a simple "advanced auditing" setup for e2e tests running on GCE, and add an e2e test that creates & deletes a pod, a secret, and verifies that they're audited.
Includes https://github.com/kubernetes/kubernetes/pull/46548
For https://github.com/kubernetes/features/issues/22
/cc @ericchiang @sttts @soltysh @ihmccreery
Respect PDBs during node upgrades and add test coverage to the
ServiceTest upgrade test. Modified that test so that we include pod
anti-affinity constraints and a PDB.
Automatic merge from submit-queue
Set Kubelet Disk Defaults for the 1.7 release
The `--low-diskspace-threshold-mb` flag has been depreciated since 1.6.
This PR sets the default to `0`, and sets defaults for disk eviction based on the values used for our [e2e tests](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/services/kubelet.go#L145).
This also removes the custom defaults for vagrant, as the new defaults should work for it as well.
/assign @derekwaynecarr
cc @vishh
```release-note
By default, --low-diskspace-threshold-mb is not set, and --eviction-hard includes "nodefs.available<10%,nodefs.inodesFree<5%"
```
Automatic merge from submit-queue (batch tested with PRs 46661, 46562, 46657, 46655, 46640)
remove openvpn and nginx from salt
only used in azure which doesn't exist.
Automatic merge from submit-queue
Plumb through the ENABLE_LEGACY_ABAC flag for GKE kube-up.
**What this PR does / why we need it**:
Makes the "gke" provider in `cluster/` respect the `ENABLE_LEGACY_ABAC` env var by plumbing it through to the `--enable-legacy-authorization` gcloud flag.
Automatic merge from submit-queue
Support storageclass storage updates to v1
**What this PR does / why we need it**: enable cluster administrators to update storageclasses stored in etcd from storage.k8s.io/v1beta1 to storage.k8s.io/v1.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**: I had a hard time getting the test to work with how it was handling KUBE_API_VERSIONS and RUNTIME_CONFIG. I would appreciate some extra review attention there. Also, I had to hack in a `cluster-scoped` "namespace" to get the verification portions of the test script to work. I'm definitely open to ideas for how to improve that if needed.
**Release note**:
```release-note
Support updating storageclasses in etcd to storage.k8s.io/v1. You must do this prior to upgrading to 1.8.
```
cc @kubernetes/sig-storage-pr-reviews @kubernetes/sig-api-machinery-pr-reviews @jsafrane @deads2k @saad-ali @enj
Automatic merge from submit-queue
gcloud command syntax changed between alpha and beta versions
syntax for secondary-ranges changed from:
name=NAME,range=RANGE
to
NAME=RANGE
Automatic merge from submit-queue
Add generic NoExecute Toleration to NPD
Ref. #44445
cc @davidopp
```release-note
Add generic Toleration for NoExecute Taints to NodeProblemDetector
```
Automatic merge from submit-queue
fix typo in build.sh
**What this PR does / why we need it**:
fix typo in build.sh
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
NONE
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
proxy_handler now uses the endpoint router to map the cluster IP to
appropriate endpoint (Pod) IP for the given resource.
Added code to allow aggregator routing to be optional.
Updated bazel build.
Fixes to cover JLiggit comments.
Added util ResourceLocation method based on Listers.
Fixed issues from verification steps.
Updated to add an interface to obfuscate some of the routing logic.
Collapsed cluster IP resolution in to the aggregator routing
implementation.
Added 2 simple unit tests for ResolveEndpoint
Automatic merge from submit-queue (batch tested with PRs 46501, 45944, 46473)
Enable the ip-masq-agent on GCE installs
Setting this will trigger cluster/addons/ip-masq-agent/ip-masq-agent.yaml to be installed as an addon, which disable configure IP masquerade for all of RFC1918, rather
than just 10.0/8.
Because the flag defaulted to 10.0/8 we can't just change the default. I think anyone who needs IP masquerade set up should probably use this instead.
@justinsb @kubernetes/sig-cluster-lifecycle-misc
Fixes#11204
@dnardo - any reason not to do this?
Release Note:
```release-note
GCE installs will now avoid IP masquerade for all RFC-1918 IP blocks, rather than just 10.0.0.0/8. This means that clusters can
be created in 192.168.0.0./16 and 172.16.0.0/12 while preserving the container IPs (which would be lost before).
```
Automatic merge from submit-queue (batch tested with PRs 46429, 46308, 46395, 45867, 45492)
Bump Go version to 1.8.3
This PR also removed this patched version of Go 1.8.1 which we used to use to workaround performance problem of Go 1.8.1.
Fix https://github.com/kubernetes/kubernetes/issues/45216
Ref #46391
@timothysc @bradfitz
Automatic merge from submit-queue (batch tested with PRs 46124, 46434, 46089, 45589, 46045)
Bump elasticsearch and kibana to 5.4.0
**What this PR does / why we need it**: Updates elasticsearch and kibana docker image assets to 5.4.0 version
**Release note**:
```release-note
Upgrade Elasticsearch Addon to v5.4.0
```
Setting this will trigger
cluster/addons/ip-masq-agent/ip-masq-agent.yaml to be installed as an
addon, which disable configure IP masquerade for all of RFC1918, rather
than just 10.0/8.
Automatic merge from submit-queue (batch tested with PRs 44774, 46266, 46248, 46403, 46430)
kube-proxy: ratelimit runs of iptables by sync-period flags
This bounds how frequently iptables can be synced. It will be no more often than every 10 seconds and no less often than every 1 minute, by default.
@timothysc FYI
@dcbw @freehan FYI
Automatic merge from submit-queue (batch tested with PRs 45573, 46354, 46376, 46162, 46366)
GCE - Retrieve subnetwork name/url from gce.conf
**What this PR does / why we need it**:
Features like ILB require specifying the subnetwork if the network is type manual.
**Notes:**
The network URL can be [constructed](68e7e18698/pkg/cloudprovider/providers/gce/gce.go (L211-L217)) by fetching instance metadata; however, the subnetwork is not provided through this feature. Users must specify the subnetwork name/url through the gce.conf.
Although multiple subnets can exist in the same region for a network, the cloud provider will only use one subnet url for creating LBs.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150)
Create a subnet for reserving the service cluster IP range
This will be done if IP aliases is enabled on GCP.
```release-note
NONE
```
Automatic merge from submit-queue
Allow the /logs handler on the apiserver to be toggled.
Adds a flag to kube-apiserver, and plumbs through en environment variable in configure-helper.sh
Automatic merge from submit-queue
Enable "kick the tires" support for Nvidia GPUs in COS
This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS).
User space libraries and debug utilities from the Nvidia driver installation are made available on the host in a special directory on the host -
* `/home/kubernetes/bin/nvidia/lib` for libraries
* `/home/kubernetes/bin/nvidia/bin` for debug utilities
Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from the host directories using `HostPath` volumes.
Note: This solution requires updating Pod Spec across distros. This is a known issue and will be addressed in the future. Until then CUDA workloads will not be portable.
This PR updates the COS base image version to m59. This is coupled with this PR for the following reasons:
1. Driver installation requires disabling a kernel feature in COS.
2. The kernel API for disabling this interface changed across COS versions
3. If the COS image update is not handled in this PR, then a subsequent COS image update will break GPU integration and will require an update to the installation scripts in this PR.
4. Instead of having to post `3` PRs, one each for adding the basic installer, updating COS to m59, and then updating the installer again, this PR combines all the changes to reduce review overhead and latency, and additional noise that will be created when GPU tests break.
**Try out this PR**
1. Get Quota for GPUs in any region
2. `export `KUBE_GCE_ZONE=<zone-with-gpus>` KUBE_NODE_OS_DISTRIBUTION=gci`
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh`
4. `kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml`
5. Run your CUDA app in a pod.
**Another option is to run a e2e manually to try out this PR**
1. Get Quota for GPUs in any region
2. export `KUBE_GCE_ZONE=<zone-with-gpus>` KUBE_NODE_OS_DISTRIBUTION=gci
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"`
4. `go run hack/e2e.go -- --up`
5. `hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"`
The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration.
TODO:
- [x] Update COS image version to m59 release.
- [x] Remove sleep from the install script and add it to the daemonset
- [x] Add an e2e that will run the daemonset and run a sample CUDA app on COS clusters.
- [x] Setup a test project with necessary quota to run GPU tests against HEAD to start with https://github.com/kubernetes/test-infra/pull/2759
- [x] Update node e2e serial configs to install nvidia drivers on COS by default
Automatic merge from submit-queue (batch tested with PRs 46201, 45952, 45427, 46247, 46062)
remove the elasticsearch template
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
Loading file-based index template has been disabled since 2.0.0-beta1 version of Elasticsearch. https://www.elastic.co/guide/en/elasticsearch/reference/2.0/breaking_20_index_api_changes.html#_file_based_index_templates
So the `template-k8s-logstash.json` is not longer useful.
On the other hand, as https://github.com/kubernetes/kubernetes/issues/25127 indicated, we might better curl the elasticsearch API to load this template.
Automatic merge from submit-queue
Add version for fluentd-gcp config
Fluentd-gcp config should be versioned, because otherwise during the update race can happen and the new pod can mount the old config
Packaged the script as a docker container stored in gcr.io/google-containers
A daemonset deployment is included to make it easy to consume the installer
A cluster e2e has been added to test the installation daemonset along with verifying installation
by using a sample CUDA application.
Node e2e for GPUs updated to avoid running on nodes without GPU devices.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Automatic merge from submit-queue
Update Calico add-on
**What this PR does / why we need it:**
Updates Calico to the latest version using self-hosted install as a DaemonSet, removes Calico's dependency on etcd.
- [x] Remove [last bits of Calico salt](175fe62720/cluster/saltbase/salt/calico/master.sls (L3))
- [x] Failing on the master since no kube-proxy to access API.
- [x] Fix outgoing NAT
- [x] Tweak to work on both debian / GCI (not just GCI)
- [x] Add the portmap plugin for host port support
Maybe:
- [ ] Add integration test
**Which issue this PR fixes:**
https://github.com/kubernetes/kubernetes/issues/32625
**Try it out**
Clone the PR, then:
```
make quick-release
export NETWORK_POLICY_PROVIDER=calico
export NODE_OS_DISTRIBUTION=gci
export MASTER_SIZE=n1-standard-4
./cluster/kube-up.sh
```
**Release note:**
```release-note
The Calico version included in kube-up for GCE has been updated to v2.2.
```