Automatic merge from submit-queue
Make fluentd-gcp run with host network
Fluentd-gcp should have access to instance's platform-dependent service account in order to work.
/cc @piosz
Automatic merge from submit-queue
Remove e2e-rbac-bindings.
Replace todo-grabbag binding w/ more specific heapster roles/bindings.
Move kubelet binding.
**What this PR does / why we need it**:
The "e2e-rbac-bindings" held 2 leftovers from the 1.6 RBAC rollout process:
- One is the "kubelet-binding" which grants the "system:node" role to kubelet. This is needed until we enable the node authorizer. I moved this to the folder w/ some other kubelet related bindings.
- The other is the "todo-remove-grabbag-cluster-admin" binding, which grants the cluster-admin role to the default service account in the kube-system namespace. This appears to only be required for heapster. Heapster will instead use a "heapster" service account, bound to a "system:heapster" role on the cluster (no write perms), and a "system:pod-nanny" role in the kube-system namespace.
**Which issue this PR fixes**: Addresses part of #39990
**Release Note**:
```release-note
New and upgraded 1.7 GCE/GKE clusters no longer have an RBAC ClusterRoleBinding that grants the `cluster-admin` ClusterRole to the `default` service account in the `kube-system` namespace.
If this permission is still desired, run the following command to explicitly grant it, either before or after upgrading to 1.7:
kubectl create clusterrolebinding kube-system-default --serviceaccount=kube-system:default --clusterrole=cluster-admin
```
Automatic merge from submit-queue
Bump up npd version to v0.4.0
Fixes#47070.
Bump up npd version to [v0.4.0](https://github.com/kubernetes/node-problem-detector/releases/tag/v0.4.0).
```release-note
Bump up Node Problem Detector version to v0.4.0, which added support of parsing log from /dev/kmsg and ABRT.
```
/cc @dchen1107 @ajitak
Automatic merge from submit-queue (batch tested with PRs 46718, 46828, 46988)
Update docs/ links to point to main site
**What this PR does / why we need it**:
This updates various links to either point to kubernetes.io or to the kubernetes/community repo instead of the legacy docs/ tree in k/k
Pre-requisite for #46813
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
@kubernetes/sig-docs-maintainers @chenopis @ahmetb @thockin
Automatic merge from submit-queue
Add event exporter deployment to the fluentd-gcp addon
Introduce event exporter deployment to the fluentd-gcp addon so that by default if logging to Stackdriver is enabled, events will be available there also.
In this release, event exporter is a non-critical pod in BestEffort QoS class to avoid preempting actual workload in tightly loaded clusters. It will become critical in one of the future releases.
```release-note
Stackdriver cluster logging now deploys a new component to export Kubernetes events.
```
Automatic merge from submit-queue
Add some initial resource limits to the ip-masq-agent.
These limits were based on observing the agent over roughly a day RES was typically ~4M for me but I'd like to make sure we have some headroom. If there was a huge config map then this could increase slightly but not significantly since we only allow 64 entries.
VmPeak: 11164 kB
VmSize: 11164 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 7652 kB
VmRSS: 4260 kB
VmData: 7612 kB
VmStk: 136 kB
VmExe: 1856 kB
VmLib: 0 kB
VmPTE: 40 kB
VmPMD: 20 kB
VmSwap: 0 kB
Automatic merge from submit-queue
Adding a metadata proxy addon
**What this PR does / why we need it**: adds a metadata server proxy daemonset to hide kubelet secrets.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: this partially addresses #8867
**Special notes for your reviewer**:
**Release note**: the gce metadata server can be hidden behind a proxy, hiding the kubelet's token.
```release-note
The gce metadata server can be hidden behind a proxy, hiding the kubelet's token.
```
Automatic merge from submit-queue (batch tested with PRs 46456, 46675, 46676, 46416, 46375)
Move tolerations to PodSpec for ip-masq-agent.yaml.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 46456, 46675, 46676, 46416, 46375)
Move tolerations to PodSpec for calico-node.yaml.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
none
```
Automatic merge from submit-queue
Add generic NoExecute Toleration to NPD
Ref. #44445
cc @davidopp
```release-note
Add generic Toleration for NoExecute Taints to NodeProblemDetector
```
Automatic merge from submit-queue (batch tested with PRs 46201, 45952, 45427, 46247, 46062)
remove the elasticsearch template
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
Loading file-based index template has been disabled since 2.0.0-beta1 version of Elasticsearch. https://www.elastic.co/guide/en/elasticsearch/reference/2.0/breaking_20_index_api_changes.html#_file_based_index_templates
So the `template-k8s-logstash.json` is not longer useful.
On the other hand, as https://github.com/kubernetes/kubernetes/issues/25127 indicated, we might better curl the elasticsearch API to load this template.
Automatic merge from submit-queue
Add version for fluentd-gcp config
Fluentd-gcp config should be versioned, because otherwise during the update race can happen and the new pod can mount the old config
Automatic merge from submit-queue
Update Calico add-on
**What this PR does / why we need it:**
Updates Calico to the latest version using self-hosted install as a DaemonSet, removes Calico's dependency on etcd.
- [x] Remove [last bits of Calico salt](175fe62720/cluster/saltbase/salt/calico/master.sls (L3))
- [x] Failing on the master since no kube-proxy to access API.
- [x] Fix outgoing NAT
- [x] Tweak to work on both debian / GCI (not just GCI)
- [x] Add the portmap plugin for host port support
Maybe:
- [ ] Add integration test
**Which issue this PR fixes:**
https://github.com/kubernetes/kubernetes/issues/32625
**Try it out**
Clone the PR, then:
```
make quick-release
export NETWORK_POLICY_PROVIDER=calico
export NODE_OS_DISTRIBUTION=gci
export MASTER_SIZE=n1-standard-4
./cluster/kube-up.sh
```
**Release note:**
```release-note
The Calico version included in kube-up for GCE has been updated to v2.2.
```
Automatic merge from submit-queue (batch tested with PRs 44606, 46038)
Add ip-masq-agent addon to the addons folder.
This also ensures that under gce we add this DaemonSet if the non-masq-cidr
is set to 0/0.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Add ip-masq-agent addon to the addons folder which is used in GCE if --non-masquerade-cidr is set to 0/0
```
Automatic merge from submit-queue (batch tested with PRs 45860, 45119, 44525, 45625, 44403)
Support running StatefulSetBasic e2e tests with local-up-cluster
**What this PR does / why we need it**:
Currently StatefulSet(s) fail when you use local-up-cluster without
setting a cloud provider. In this PR, we use set the
kubernetes.io/host-path provisioner as the default provisioner when
there CLOUD_PROVIDER is not specified. This enables e2e test(s)
(specifically StatefulSetBasic) to work.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 44337, 45775, 45832, 45574, 45758)
Refactor gcr.io/google_containers/elasticsearch to alpine
**What this PR does / why we need it**:
This reduces the image size of the gcr.io/google_containers/elasticsearch image.
Before:
```
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/google_containers/elasticsearch v2.4.1-2 6941e43df81a 4 weeks ago 419MB
```
After:
```
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/google_containers/elasticsearch v2.4.1-2 24ad40c21a52 About an hour ago 178MB
```
**Special notes for your reviewer**:
I used a workaround to make the elasticsearch_logging_discovery binary work with alpine. (See [stackoverflow](https://stackoverflow.com/questions/34729748/installed-go-binary-not-found-in-path-on-alpine-linux-docker/35613430#35613430)). Alternatively this can be solved by setting ```CGO_ENABLED=0```when compiling the binary. I didn't feel comfortable chaing the Makefile though, since I'm no golang expert. Feedback wanted!
Automatic merge from submit-queue (batch tested with PRs 45691, 45667, 45698, 45715)
Add general NoExecute Toleration to fluentd in gcp configuration
Ref #44445
Once merged I'll create a cherry-pick that will be picked up in GKE together with the next patch release.
cc @JorritSalverda @davidopp @aveshagarwal @nimeshksingh @piosz
```release-note
fluentd will tolerate all NoExecute Taints when run in gcp configuration.
```
Changes:
- Support kube-master-url flag without kubeconfig
- Fix concurrent R/Ws in dns.go
- Fix confusing logging when initialize server
- Fix printf in cmd/kube-dns/app/server.go
- Fix version on startup and --version flag
- Support specifying port number for nameserver in stubDomains
Automatic merge from submit-queue (batch tested with PRs 43884, 44712, 45124, 43883)
Increase Dashboard memory limits
**What this PR does / why we need it**: Increases memory requests and limits for Dashboard.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/dashboard/issues/1431
**Special notes for your reviewer**: Dashboard crashes on large clusters, this change should fix that problem.
**Release note**:
```release-note
Increase Dashboard's memory requests and limits
```
Currently StatefulSet(s) fail when you use local-up-cluster without
setting a cloud provider. In this PR, we use set the
kubernetes.io/host-path provisioner as the default provisioner when
there CLOUD_PROVIDER is not specified. This enables e2e test(s)
(specifically StatefulSetBasic) to work.
Automatic merge from submit-queue
issue_43986: fix docu with non-functional proxy
The documentation defines a couple of replication-controller and service
to provision a docker-registry somewhere on the cluster and have it
available by the name viz. A record of
kube-registry.default.svc.<clustername>.
On each node, http-proxies are placed as daemon-set with the
kube-registry DNS name set as upstream, so that the registry is
available on each host under endpoint localhost:5000
Because in the documentation, selector-identifiers are the same for
"upstream" registry and proxies, the proxies themselves register under
the service intended for the upstream and now have themselves as
upstream under a different port, where connection attempts result in
"connection refused".
Adapting selectors to be unique as in this patch fixes the problem.
**What this PR does / why we need it**:
Patch fixes (cf. above) erroneous documentation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes#43986
**Special notes for your reviewer**:
Thank you for your consideration.
**Release note**:
```release-note
```
The documentation defines a couple of replication-controller and service
to provision a docker-registry somewhere on the cluster and have it
available by the name viz. A record of
kube-registry.default.svc.<clustername>.
On each node, http-proxies are placed as daemon-set with the
kube-registry DNS name set as upstream, so that the registry is
available on each host under endpoint localhost:5000
Because in the documentation, selector-identifiers are the same for
"upstream" registry and proxies, the proxies themselves register under
the service intended for the upstream and now have themselves as
upstream under a different port, where connection attempts result in
"connection refused".
Adapting selectors to be unique as in this patch fixes the problem.
modified: cluster/addons/registry/README.md
modified: cluster/addons/registry/registry-rc.yaml
modified: cluster/addons/registry/registry-svc.yaml
Automatic merge from submit-queue
[Federation] Remove FEDERATIONS_DOMAIN_MAP references
Remove all references to FEDERATIONS_DOMAIN_MAP as this method is no longer is used and is replaced by adding federation domain map to kube-dns configmap.
cc @madhusudancs @kubernetes/sig-federation-pr-reviews
**Release note**:
```
[Federation] Mechanism of adding `federation domain maps` to kube-dns deployment via `--federations` flag is superseded by adding/updating `federations` key in `kube-system/kube-dns` configmap. If user is using kubefed tool to join cluster federation, adding federation domain maps to kube-dns is already taken care by `kubefed join` and does not need further action.
```
Automatic merge from submit-queue (batch tested with PRs 42379, 42668, 42876, 41473, 43260)
Silence error messages from the docker rmi call we expect to fail
**What this PR does / why we need it**: when we removed `docker tag -f` in #34361 we added a bunch of `docker rmi` calls to preserve behavior for older docker versions. That step is usually a no-op, however, and results in confusing messages like
```
Tagging docker image gcr.io/google_containers/kube-proxy:c8d0b2e7a06b451117a8ac58fc3bb3d3 as gcr.io/kubernetes-release-test/kube-proxy-amd64:v1.5.4
Error response from daemon: No such image: gcr.io/kubernetes-release-test/kube-proxy-amd64:v1.5.4
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#42665
**Special notes for your reviewer**: I could probably remove the `docker rmi` calls entirely, though I don't know if folks are still using docker < 1.10. (I think Jenkins still has 1.9.1.)
**Release note**:
```release-note
NONE
```
cc @jessfraz
Automatic merge from submit-queue
Moves dns-horizontal-autoscaler to a separate service account
Similar to #38816.
As one of the cluster add-ons, dns-horizontal-autoscaler is now using the default service account in kube-system namespace, which is introduced by https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/e2e-rbac-bindings/random-addon-grabbag.yaml for the ease of transition. This default service account will be removed in the future.
This PR subdivides dns-horizontal-autoscaler to a separate service account and setup the necessary permissions.
@bowei
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Update NPD rbac.
I recently enabled NPD in gke.
However, I found that in gke e2e test (https://k8s-testgrid.appspot.com/google-gke#gci-gke), npd on the node could not talk with apiserver, and reported full of following errors:
```
E0324 05:08:26.745545 1328 manager.go:160] failed to update node conditions: the server does not allow access to the requested resource (patch nodes gke-bootstrap-e2e-default-pool-fd91d792-mqh4)
E0324 05:08:37.719423 1328 manager.go:160] failed to update node conditions: the server does not allow access to the requested resource (patch nodes gke-bootstrap-e2e-default-pool-fd91d792-mqh4)
E0324 05:08:47.719694 1328 manager.go:160] failed to update node conditions: the server does not allow access to the requested resource (patch nodes gke-bootstrap-e2e-default-pool-fd91d792-mqh4)
```
I created a GKE cluster (v1.7.0-alpha.0.1483+1e879c69ecf09e) myself, and found that addon manager could not create npd binding with the following error:
```
error: error validating "/etc/kubernetes/addons/node-problem-detector/standalone/npd-binding.yaml": error validating data: couldn't find type: v1alpha1.ClusterRoleBinding; if you choose to ignore these errors, turn validation off with --validate=false
```
I found that rbac was updated to beta, but npd was missed because it was merged after 9e6a3496b4 (diff-b05c70853d9a772b310db71a61297841).
I updated rbac to beta in the master manifest and npd on the node could talk with apiserver immediately.
We must get this in 1.6 to make NPD working. @dchen1107
@dchen1107 @fabioy @liggitt
Automatic merge from submit-queue
Update Dashboard version to v1.6.0
**What this PR does / why we need it**:
Updates dashboard addon to latest version. Changelog can be found [here](https://github.com/kubernetes/dashboard/releases/tag/v1.6.0).
**Release note**:
```release-note
Update dashboard version to v1.6.0
```
Automatic merge from submit-queue
Update npd to the official v0.3.0 release.
Update npd to the official release v0.3.0.
This also fixes a npd bug https://github.com/kubernetes/node-problem-detector/pull/98.
@dchen1107 @kubernetes/node-problem-detector-reviewers
From UX perspective, 'default' is a bad name for the default storage class:
$ kubectl get storageclass
NAME TYPE
default (default) kubernetes.io/aws-ebs
This is sort of OK, it gets more confusing when user is not happy with the
preinstalled default storage class and creates its own and makes it default:
NAME TYPE
default kubernetes.io/aws-ebs
iops (default) kubernetes.io/aws-ebs
Automatic merge from submit-queue (batch tested with PRs 42940, 42906, 42970, 42848)
Enable RollingUpdates for the fluentd daemonset addon
In anticipation of needing to rev fluentd-gcp image versions in patch releases, we should enable rolling update so the new versions get rolled out in a timely manner.
/cc @ixdy
Automatic merge from submit-queue (batch tested with PRs 41830, 42630)
Arrange for elasticsearch to shutdown cleanly
Kubernetes initiates "graceful shutdown" by sending SIGTERM to pid 1, which
is exactly what elasticsearch is expecting (good!)
The way the existing startup scripts worked however, this signal arrived at
the shell wrapper, not elasticsearch, and the shell wrapper exited,
killing the container immediately (bad!)
Before this change:
```
1 ? Ss 0:00 /bin/sh -c /run.sh
6 ? S 0:00 /bin/bash /run.sh
13 ? S 0:00 \_ /bin/su -c /elasticsearch/bin/elasticsearch elasticsearch
14 ? Ss 0:00 \_ sh -c /elasticsearch/bin/elasticsearch
15 ? Sl 19:18 \_ /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java ... org.elasticsearch.bootstrap.Elasticsearch start
```
After this change:
```
1 ? Ssl 0:29 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java ... org.elasticsearch.bootstrap.Elasticsearch start
```
Automatic merge from submit-queue (batch tested with PRs 42768, 42760, 42771, 42767)
Create EnsureExists class addons before Reconcile class addons
From #42757.
The addon-manager creates "Reconcile" class addons before creates "EnsureExists" class addons, which is not the best order. The "EnsureExists" class addons tend to be some default configurations like `default-storage-class` and `default kube-dns ConfigMap` (being added in #42757), and we would like to have these default configurations created before other addons are created.
@mikedanese @bowei
```release-note
NONE
```
Kubernetes initiates "graceful shutdown" by sending SIGTERM to pid 1.
The way the existing startup scripts worked, this signal arrived at
the shell wrapper, not elasticsearch, and the shell wrapper exited,
killing the container immediately.
Before this change:
1 ? Ss 0:00 /bin/sh -c /run.sh
6 ? S 0:00 /bin/bash /run.sh
13 ? S 0:00 \_ /bin/su -c /elasticsearch/bin/elasticsearch elasticsearch
14 ? Ss 0:00 \_ sh -c /elasticsearch/bin/elasticsearch
15 ? Sl 19:18 \_ /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java ... org.elasticsearch.bootstrap.Elasticsearch start
After this change:
1 ? Ssl 0:29 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java ... org.elasticsearch.bootstrap.Elasticsearch start
Automatic merge from submit-queue (batch tested with PRs 41826, 42405)
Add stubDomains and upstreamNameservers configuration to kube-dns
```release-note
Updates the dnsmasq cache/mux layer to be managed by dnsmasq-nanny.
dnsmasq-nanny manages dnsmasq based on values from the
kube-system:kube-dns configmap:
"stubDomains": {
"acme.local": ["1.2.3.4"]
},
is a map of domain to list of nameservers for the domain. This is used
to inject private DNS domains into the kube-dns namespace. In the above
example, any DNS requests for *.acme.local will be served by the
nameserver 1.2.3.4.
"upstreamNameservers": ["8.8.8.8", "8.8.4.4"]
is a list of upstreamNameservers to use, overriding the configuration
specified in /etc/resolv.conf.
```
Automatic merge from submit-queue (batch tested with PRs 42070, 42127)
Remove fluentd-gcp image sources
This PR removes fluentd-gcp image sources from the main kubernetes repo to move it the `contrib`: https://github.com/kubernetes/contrib/pull/2426
Once image is moved, it will be maintained by Stackdriver team (@igorpeshansky, @qingling128 and @dhrupadb)
CC @ixdy @timstclair
Automatic merge from submit-queue (batch tested with PRs 42126, 42130, 42232, 42245, 41932)
Update fluentd-gcp configuration for hosted masters
This PR makes use of the new fluentd-gcp image, which is not configured per se, for the hosted masters, which cannot use configmaps.
Mirroring https://github.com/kubernetes/kubernetes/pull/42126
Automatic merge from submit-queue
Move fluentd DS config to configmap
This is the logical continuation of https://github.com/kubernetes/kubernetes/pull/41998. This PR makes fluentd-gcp DaemonSet use the new image configured using ConfigMap.
This PR doesn't change the way fluentd-gcp works in case master is not registered, that'll be fixed in a separate PR
CC @ixdy @timstclair @igorpeshansky @qingling128 @dhrupadb
**Release note:**
```release-note
Fluentd-gcp containers spawned by DaemonSet are now configured using ConfigMap
```
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)
Update defaultbackend image to 1.3
Update `gcr.io/google-containers/defaultbackend` to the latest version.
See https://github.com/kubernetes/contrib/pull/2386
/cc @ixdy
Updates the dnsmasq cache/mux layer to be managed by dnsmasq-nanny.
dnsmasq-nanny manages dnsmasq based on values from the
kube-system:kube-dns configmap:
"stubDomains": {
"acme.local": ["1.2.3.4"]
},
is a map of domain to list of nameservers for the domain. This is used
to inject private DNS domains into the kube-dns namespace. In the above
example, any DNS requests for *.acme.local will be served by the
nameserver 1.2.3.4.
"upstreamNameservers": ["8.8.8.8", "8.8.4.4"]
is a list of upstreamNameservers to use, overriding the configuration
specified in /etc/resolv.conf.
Automatic merge from submit-queue (batch tested with PRs 42058, 41160, 42065, 42076, 39338)
Bump up dns-horizontal-autoscaler to 1.1.1
cluster-proportional-autoscaler 1.1.1 is releasing by kubernetes-incubator/cluster-proportional-autoscaler#26, also bump it up for dns-horizontal-autoscaler to introduce below features:
- Add PreventSinglePointFailure option in linear mode.
- Use protobufs for communication with apiserver.
- Support switching control mode on-the-fly.
Note:
The new entry `"preventSinglePointFailure":true` ensures kube-dns to have at least 2 replicas when there is more than one node. Mitigate the issue mentioned in #40063.
@bowei @thockin
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Cleanup fluentd-gcp image, rebase on debian-base
**Why we need this PR**:
There are several problems with our current fluentd-gcp image:
- It pulls in lots of unused packages, which expose unnecessary risk and create noise in CVE scans (and scare customers). The most notable example is the fluent-ui, which pulls in rails.
- `curl | sh ` is not a good practice for a Dockerfile. First, the script is not checked in the same source control branch, so builds are not reproducible. Second, the actions it is taking are opaque. Third, in this case, using non-standard packages means they're harder to manage with CVE scans & upstream fixes.
**What is changed by this PR?**
- Rather than relying on td-agent (which includes fluent-ui), use standard upstream packages. This is largely based off the [official fluentd debian-based image](https://github.com/fluent/fluentd-docker-image/blob/master/v0.12/debian/Dockerfile).
- Rebases the image on debian-base (depends on https://github.com/kubernetes/kubernetes/pull/41915). We would like to move towards a single full-distro base image we can maintain. This change should be relatively minor.
As a result of these changes, the image size is reduced from 360.6 MB to 185.8 MB (nearly half). Many packages were removed, and the full diff (focus on the unversioned files) is listed here: 3fb704f977
**Which issue this PR fixes** https://github.com/kubernetes/kubernetes/issues/40248
**Special notes for your reviewer**:
This change both addresses security concerns, and is expected to greatly reduce the maintenance burden of the fluentd-gcp image. I'd *really* like to get this into 1.6, so please prioritize this review if possible.
I tested this by running the default e2e suite on a private e2e cluster using the new image. If there are other tests you'd like me to run, please let me know ASAP.
**Release note**:
```release-note
Cleanup fluentd-gcp image: rebase on debian-base, switch to upstream packages, remove fluent-ui & rails
```
Automatic merge from submit-queue (batch tested with PRs 41854, 41801, 40088, 41590, 41911)
Default storage class for vSphere Fixes#40070
**What this PR does / why we need it**:
Create default storage class for vSphere. This is part of the storage class GA effort https://github.com/kubernetes/features/issues/36
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
fixes#40070
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Base etcd-empty-dir-cleanup on busybox, run as nobody, and update to etcdctl 3.0.14
**What this PR does / why we need it**: since the `etcd-empty-dir-cleanup` image just uses a simple shell script and `etcdctl`, we can base it on busybox, which is a smaller target than alpine.
I've also updated this to use an `etcdctl` from etcd 3.0.14, which matches the version of etcd we're running in 1.6 clusters (I believe), and changed the tag to match the `etcdctl` version.
Tested in my own e2e cluster, where it seems to work.
I haven't pushed the image yet, so e2e tests *may* fail. Tagging `do-not-merge`; if you think this looks good, I'll push the image and retest.
**Release note**:
```release-note
```
cc @timstclair @mml @wojtek-t
Automatic merge from submit-queue
move kube-dns to a separate service account
Switches the kubedns addon to run as a separate service account so that we can subdivide RBAC permission for it. The RBAC permissions will need a little more refinement which I'm expecting to find in https://github.com/kubernetes/kubernetes/pull/38626 .
@cjcullen @kubernetes/sig-auth since this is directly related to enabling RBAC with subdivided permissions
@thockin @kubernetes/sig-network since this directly affects now kubedns is added.
```release-note
`kube-dns` now runs using a separate `system:serviceaccount:kube-system:kube-dns` service account which is automatically bound to the correct RBAC permissions.
```
Automatic merge from submit-queue (batch tested with PRs 39855, 41433, 41567, 41887, 41652)
Add fluentd monitoring to fluentd-gcp image
Right now we are not able to monitor the state of fluentd in cluster, which may result in logging subsystem quietly failing. This PR tries to address that problem by introducing the fluentd container monitoring:
* fluentd internal metrics, like number of buffers and number of data in buffers
* `logging_line_count`, number of lines, read by fluentd from application containers' logs
* Has `tag` label, corresponding to the fluentd tag of the entry
* `logging_entry_count`, number of entries, emitted to the output plugin
* With label `component` set to `container`, generated by application containers
* With label `component` set to `system`, generated by system components like kubelet, docker, scheduler, etc.
* Has `tag` label, corresponding to the fluentd tag of the entry
CC @fabxc @igorpeshansky @edsiper
Automatic merge from submit-queue (batch tested with PRs 41797, 41793, 41795, 41807, 41781)
Turn fluentd supervisor off for fluentd-gcp
By default, turn fluentd supervisor off so that when fluentd process fails, for example due to OOM, container fails completely and it would be easy to detect.
CC @igorpeshansky @qingling128
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)
Update kubectl in addon-manager to use HPA in autoscaling/v1
Addon-manager is broken since HPA objects were removed from extensions api group.
Came across the logs from [the latest addon-manager on Jenkins](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/4290/artifacts/bootstrap-e2e-master/kube-addon-manager.log):
```
INFO: == Entering periodical apply loop at 2017-02-16T17:33:37+0000 ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:38+0000. 2 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:46+0000. 1 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:53+0000. 0 tries remaining. ==
WRN: == Kubernetes addon update completed with errors at 2017-02-16T17:33:58+0000 ==
```
And notice this commit (f66679a4e9) came in two weeks ago, which removed HorizontalPodAutoscaler from extensions/v1beta1.
Addon-manager is now partially functioning that it could successfully create and update addons, but will fail to prune objects, which means upgrade tests may mostly fail.
Pushed another version of addon-manager with kubectl v1.6.0-alpha.2 ([release 2 days ago](https://github.com/kubernetes/kubernetes/releases/tag/v1.6.0-alpha.2)) for fixing, including below images:
- gcr.io/google-containers/kube-addon-manager:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-amd64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-s390x:v6.4-alpha.2
@mikedanese
cc @wojtek-t @shyamjvs
Automatic merge from submit-queue
Bump fluentd-gcp google_cloud plugin version
Bump the version of `fluent-plugin-google-cloud` in fluentd-gcp image, because it's broken for version `0.5.2`.
Recently, gem `google-api-client` was updated to version `0.10.0`. The new version broke `fluent-plugin-google-cloud` which doesn't specify the upper version of `google-api-client` gem. I'm bumping the version used in our image to allow future changes in this release to be run and tested.
This PR doesn't bump the version, since no effective changes has happened, leaving this for the next PR to do.
CC @igorpeshansky
Automatic merge from submit-queue (batch tested with PRs 40000, 41508, 41489)
Add toleration to fluentd daemonset to make it run on master
Because of https://github.com/kubernetes/kubernetes/pull/41172 fluentd pods stopped being allocated on master node.
This PR introduces toleration for master taint for fluentd.
CC @davidopp @janetkuo @kubernetes/sig-scheduling-bugs
Unfortunately, we don't have e2e tests to ensure that master logs are being ingested. This problem is a great signal to work on https://github.com/kubernetes/kubernetes/issues/41411
Automatic merge from submit-queue
fluentd-gcp: Add kube-apiserver-audit.log.
**What this PR does / why we need it**:
Add `kube-apiserver-audit.log` from https://github.com/kubernetes/kubernetes/pull/41211 to fluentd config, so the audit log gets sent to the same place as `kube-apiserver.log`.
**Which issue this PR fixes**:
**Special notes for your reviewer**:
We would like to backport this to release-1.5 also.
**Release note**:
```release-note
The apiserver audit log (`/var/log/kube-apiserver-audit.log`) will be sent through fluentd if enabled.
```
Automatic merge from submit-queue (batch tested with PRs 41357, 41178, 41280, 41184, 41278)
Switch RBAC subject apiVersion to apiGroup in v1beta1
Referencing a subject from an RBAC role binding, the API group and kind of the subject is needed to fully-qualify the reference.
The version is not, and adds complexity around re-writing the reference when returning the binding from different versions of the API, and when reconciling subjects.
This PR:
* v1beta1: change the subject `apiVersion` field to `apiGroup` (to match roleRef)
* v1alpha1: convert apiVersion to apiGroup for backwards compatibility
* all versions: add defaulting for the three allowed subject kinds
* all versions: add validation to the field so we can count on the data in etcd being good until we decide to relax the apiGroup restriction
```release-note
RBAC `v1beta1` RoleBinding/ClusterRoleBinding subjects changed `apiVersion` to `apiGroup` to fully-qualify a subject. ServiceAccount subjects default to an apiGroup of `""`, User and Group subjects default to an apiGroup of `"rbac.authorization.k8s.io"`.
```
@deads2k @kubernetes/sig-auth-api-reviews @kubernetes/sig-auth-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 41182, 41290)
Add a default storage class for Azure Disk
Part of https://github.com/kubernetes/kubernetes/issues/40071
@jsafrane @colemickens @codablock @rootfs
These files have been created lately, so we don't have much information
about them anyway, so let's just:
- Remove assignees and make them approvers
- Copy approves as reviewers
Automatic merge from submit-queue (batch tested with PRs 36693, 40154, 40170, 39033)
make client-go authoritative for pkg/client/restclient
Moves client/restclient to client-go and a util/certs, util/testing as transitives.
Automatic merge from submit-queue (batch tested with PRs 40168, 40165, 39158, 39966, 40190)
Include system:masters group in the bootstrap admin client certificate
Sets up the bootstrap admin client certificate for new clusters to be in the system:masters group
Removes the need for an explicit grant to the kubecfg user in e2e-bindings
```release-note
The default client certificate generated by kube-up now contains the superuser `system:masters` group
```
Automatic merge from submit-queue (batch tested with PRs 40003, 40017)
Remove library copying from fluentd image
It seems that fluentd can no longer copy systemd libraries from host to be able to read journals.
Automatic merge from submit-queue
Build release tars using bazel
**What this PR does / why we need it**: builds equivalents of the various kubernetes release tarballs, solely using bazel.
For example, you can now do
```console
$ make bazel-release
$ hack/e2e.go -v -up -test -down
```
**Special notes for your reviewer**: this is currently dependent on 3b29803eb5, which I have yet to turn into a pull request, since I'm still trying to figure out if this is the best approach.
Basically, the issue comes up with the way we generate the various server docker image tarfiles and load them on nodes:
* we `md5sum` the binary being encapsulated (e.g. kube-proxy) and save that to `$binary.docker_tag` in the server tarball
* we then build the docker image and tag using that md5sum (e.g. `gcr.io/google_containers/kube-proxy:$MD5SUM`)
* we `docker save` this image, which embeds the full tag in the `$binary.tar` file.
* on cluster startup, we `docker load` these tarballs, which are loaded with the tag that we'd created at build time. the nodes then use the `$binary.docker_tag` file to find the right image.
With the current bazel `docker_build` rule, the tag isn't saved in the docker image tar, so the node is unable to find the image after `docker load`ing it.
My changes to the rule save the tag in the docker image tar, though I don't know if there are subtle issues with it. (Maybe we want to only tag when `--stamp` is given?)
Also, the docker images produced by bazel have the timestamp set to the unix epoch, which is not great for debugging. Might be another thing to change with a `--stamp`.
Long story short, we probably need to follow up with bazel folks on the best way to solve this problem.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Use $HOSTNAME as node.name by default
**What this PR does / why we need it**:
Allows to identify elasticsearch instances more easily.
As $HOSTNAME of a pod is unique, this should be no problem.
Automatic merge from submit-queue
Update images that use ubuntu-slim base image to :0.6
**What this PR does / why we need it**: `ubuntu-slim:0.4` is somewhat old, being based on Ubuntu 16.04, whereas `ubuntu-slim:0.6` is based on Ubuntu 16.04.1.
**Special notes for your reviewer**: I haven't pushed any of these images yet, so I expect all of the e2e builds to fail. If we're happy with the changes, I can push the images and then re-trigger tests.
**Release note**:
```release-note
NONE
```
cc @aledbf as FYI
Automatic merge from submit-queue
Update kubectl to stable version for Addon Manager
Bumps up Addon Manager to v6.2, below images are pushed:
- gcr.io/google-containers/kube-addon-manager:v6.2
- gcr.io/google-containers/kube-addon-manager-amd64:v6.2
- gcr.io/google-containers/kube-addon-manager-arm:v6.2
- gcr.io/google-containers/kube-addon-manager-arm64:v6.2
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.2
- gcr.io/google-containers/kube-addon-manager-s390x:v6.2
@mikedanese
cc @ixdy
Automatic merge from submit-queue (batch tested with PRs 39803, 39698, 39537, 39478)
include bootstrap admin in super-user group, ensure tokens file is correct on upgrades
Fixes https://github.com/kubernetes/kubernetes/issues/39532
Possible issues with cluster bring-up scripts:
- [x] known_tokens.csv and basic_auth.csv is not rewritten if the file already exists
* new users (like the controller manager) are not available on upgrade
* changed users (like the kubelet username change) are not reflected
* group additions (like the addition of admin to the superuser group) don't take effect on upgrade
* this PR updates the token and basicauth files line-by-line to preserve user additions, but also ensure new data is persisted
- [x] existing 1.5 clusters may depend on more permissive ABAC permissions (or customized ABAC policies). This PR adds an option to enable existing ABAC policy files for clusters that are upgrading
Follow-ups:
- [ ] both scripts are loading e2e role-bindings, which only be loaded in e2e tests, not in normal kube-up scenarios
- [ ] when upgrading, set the option to use existing ABAC policy files
- [ ] update bootstrap superuser client certs to add superuser group? ("We also have a certificate that "used to be" a super-user. On GCE, it has CN "kubecfg", on GKE it's "client"")
- [ ] define (but do not load by default) a relaxed set of RBAC roles/rolebindings matching legacy ABAC, and document how to load that for new clusters that do not want to isolate user permissions
Automatic merge from submit-queue
Enable kubernetes_metadata by default for ELK stack
Looks like it was accidentally removed and was not restored back in this PR https://github.com/kubernetes/kubernetes/pull/29883
Because actually this plugin still exists in the image, but new ELK deployment don't allow you to index namespaces, pod names, etc.
Automatic merge from submit-queue (batch tested with PRs 39731, 39662, 39721)
Update dashboard version to v1.5.1
**What this PR does / why we need it**:
Latest Dashboard developments, including a CSRF issue in the dashboard POST handlers
**Release note**:
```
Set Dashboard UI version to v1.5.1
```
Automatic merge from submit-queue (batch tested with PRs 39694, 39383, 39651, 39691, 39497)
Allow rolebinding/clusterrolebinding with explicit bind permission check
Fixes https://github.com/kubernetes/kubernetes/issues/39176
Fixes https://github.com/kubernetes/kubernetes/issues/39258
Allows creating/updating a rolebinding/clusterrolebinding if the user has explicitly been granted permission to perform the "bind" verb against the referenced role/clusterrole (previously, they could only bind if they already had all the permissions in the referenced role via an RBAC role themselves)
```release-note
To create or update an RBAC RoleBinding or ClusterRoleBinding object, a user must:
1. Be authorized to make the create or update API request
2. Be allowed to bind the referenced role, either by already having all of the permissions contained in the referenced role, or by having the "bind" permission on the referenced role.
```
Automatic merge from submit-queue (batch tested with PRs 39628, 39551, 38746, 38352, 39607)
fix e2e kubelet binding
Fixes#39543
This limits scope of the kubelet. It was an oversight before. Hopefully we won't end up chasing permissions again.
Automatic merge from submit-queue
Try parse golang logs by default
Glog by default logs to stderr, so Stackdriver Logging shows them all as errors. This PR makes fluentd try to parse messages using glog format and if succeeded, set timestamp and severity accordingly.
CC @piosz @fgrzadkowski
Automatic merge from submit-queue
Remove all MAINTAINER statements in the codebase as they are deprecated
**What this PR does / why we need it**:
ref: https://github.com/docker/docker/pull/25466
**Release note**:
```release-note
Remove all MAINTAINER statements in Dockerfiles in the codebase as they are deprecated by docker
```
@ixdy @thockin (who else should be notified?)
Automatic merge from submit-queue
Adds assignees for kube-dns
Adds assignees for auto-assigning. Does not add assignees for pkg/dns folder as we are moving it out.
@thockin
Automatic merge from submit-queue
Adds kubernetes.io link for dns autoscaler addon
The [official page for DNS Horizontal Autoscaling](http://kubernetes.io/docs/tasks/administer-cluster/dns-horizontal-autoscaling/) is available on kubernetes.io after 1.5 release. Putting the link into this dns autoscaler addon folder as well.
@bowei
Automatic merge from submit-queue (batch tested with PRs 39146, 39094)
cleanup last e2e authorization failures
Builds on https://github.com/kubernetes/kubernetes/pull/39080. This adds rbac role bindings during e2e tests for test that use SA permissions to loopback to the API server.
Assigned to me until its ready.
Automatic merge from submit-queue
Make fluentd pods critical
Related to https://github.com/kubernetes/kubernetes/issues/38322
Make fluentd critical so it will be evicted with less probability.
CC @piosz @fgrzadkowski
Automatic merge from submit-queue
Add liveness probe for fluentd-gcp
It's known that fluentd can hung up during execution until manual restart.
Liveness probe fixes this problem in the following way: if no buffer chunks were sent or created in the last 5 minutes, fluentd is hanging and should be restarted.
CC @piosz
Automatic merge from submit-queue
Coreos kube-up now with less cloud init
This update includes significant refactoring. It moves almost all of the
logic into bash scripts, modeled after the `gci` cluster scripts.
The reason to do this is:
1. Avoid duplicating the saltbase manifests by reusing gci's parsing logic (easier maintenance)
2. Take an incremental step towards sharing more code between gci/trusty/coreos, again for better maintenance
3. Pave the way for making future changes (e.g. improved rkt support, kubelet support) easier to share
The primary differences from the gci scripts are the following:
1. Use of the `/opt/kubernetes` directory over `/home/kubernetes`
2. Support for rkt as a runtime
3. No use of logrotate
4. No use of `/etc/default/`
5. No logic related to noexec mounts or gci-specific firewall-stuff
It will make sense to move 2 over to gci, as well as perhaps a few other small improvements. That will be a separate PR for ease of review.
Ref #29720, this is a part of that because it removes a copy of them.
Fixes#24165
cc @yifan-gu
Since this logic largely duplicates logic from the gci folder, it would be nice if someone closely familiar with that gave an OK or made sure I didn't fall into any gotchas related to that, so cc @andyzheng0831
This update includes significant refactoring. It moves almost all of the
logic into bash scripts, modeled after the `gci` cluster scripts.
The primary differences between the two are the following:
1. Use of the `/opt/kubernetes` directory over `/home/kubernetes`
2. Support for rkt as a runtime
3. No use of logrotate
4. No use of `/etc/default/`
5. No logic related to noexec mounts or gci-specific firewall-stuff
Automatic merge from submit-queue
Use daemonset in docker registry add on
When using registry add on with kubernetes cluster it will be right to use `daemonset` to bring up a pod on each node of cluster, right now the docs suggests to bring up a pod on each node manually by dropping the pod manifests into directory `/etc/kubernetes/manifests`.
Automatic merge from submit-queue (batch tested with PRs 38760, 38213)
Avoid exporting fluentd-gcp own logs
To prevent fluentd from exporting its own logs, redirect the output to a file. Ability to read fluentd logs remains, but because these logs will not be exported, we can increase the verbosity of these logs.
Same change should be made for fluentd-es image.
CC @piosz
Using daemonset to bring up a pod on each node of cluster,
right now the docs suggests to bring up a pod on each node by
manually dropping the pod manifests into directory /etc/kubernetes/manifests.
Automatic merge from submit-queue (batch tested with PRs 38058, 38523)
Renames kube-dns configure files from skydns* to kubedns*
`skydns-` prefix and `-rc` suffix are confusing and misleading. Renaming it to `kubedns` in existing yaml files and scripts.
@bowei @thockin
Automatic merge from submit-queue (batch tested with PRs 37860, 38429, 38451, 36050, 38463)
[Part 2] Adding s390x cross-compilation support for gcr.io images in this repo
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**: This PR enables s390x support to kube-dns , pause, addon-manager, etcd, hyperkube, kube-discovery etc. This PR also includes the changes due to which it can be cross compiled on x86 host architecture.
**Which issue this PR fixes#34328
**Special notes for your reviewer**: In existing file "build-tools/build-image/cross/Dockerfile" the repository mentioned for installing cross build tool chains for supporting architecture does not have a tool chain for s390x hence in my PR I am changing the repository so that it will be cross compiled for s390x.
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```
Allows cross compilation of Kubernetes on x86 host for s390x also enables s390x support to kube-dns , pause, addon-manager, etcd, hyperkube, kube-discovery etc
```
Automatic merge from submit-queue (batch tested with PRs 32663, 35797)
contribute deis/registry-proxy as a replacement for kube-registry-proxy
This PR is a proposal to replace the `kube-registry-proxy` addon code with [deis/registry-proxy](https://github.com/deis/registry-proxy). We have been running this component in production for several months ([since Workflow v2.3.0](15d4c1c298/workflow-v2.3.0/tpl/deis-registry-proxy-daemon.yaml)) without any issues.
There are several benefits that this proxy provides over the current implementation:
- it's the same code that is provided in [docker/distribution's contrib dir](https://github.com/docker/distribution/tree/master/contrib/compose) which I have personally used for both Docker v1 and v2 engine deployments without any issues
- the ability to [disable old Docker clients](https://github.com/deis/registry-proxy/blob/master/rootfs/etc/nginx/conf.d/default.conf.in#L19-L23) that are incompatible with the v2 registry
- better default connection timeouts, using best practices from the Docker community as a whole
- workarounds for bugs like https://github.com/docker/docker/issues/1486 (see https://github.com/deis/registry-proxy/blob/master/rootfs/etc/nginx/conf.d/default.conf.in#L15-L16)
Things that this PR differs from the current implementation:
- it's not HAProxy.
I'm not sure how the release process goes for this component, but I bumped the version to v0.4 and changed the maintainer to myself considering this is a massive overhaul. Please let me know if this is acceptable as a replacement or if we should perhaps consider this as an alternative implementation.
Happy Friday!
/etc/ssl/certs is currently mounted through in a number of places.
However, on Gentoo and CoreOS (and probably others), the files in
/etc/ssl/certs are just symlinks to files in /usr/share/ca-certificates.
For these components to correclty work, the target of the symlinks needs
to be available as well.
This is especially important for kube-controller-manager, where this
issue was noticed.
This change was originally part of #33965, but was split out for ease of
review.
Automatic merge from submit-queue
Adds docs for dns-horizontal-autoscaler and kube-dns
Although we have separate docs on kubernetes.io, we should have a short description about the dns-horizontal-autoscaler addon in folder.
Also updates kube-dns README with example command to scale kube-dns Deployment. This is needed because Addon Manager v6 has stricter reconcile behavior.
@bowei @bprashanth @thockin
Automatic merge from submit-queue
Unify fluentd-gcp configurations
There're two different configs and two different pod specs for fluentd agent for GCL: one for GCI and one for CVM. This PR makes it possible to use only one config and only one pod spec.
CC @piosz
Automatic merge from submit-queue
Set strategy spec for kube-dns to support zero downtime rolling update
From #37728 and coreos/kube-aws#111.
Set `maxUnavailable` to 0 to prevent DNS service outage during update when the replica number is only 1.
Also keeps all kube-dns yaml files in sync.
@bowei @thockin
Automatic merge from submit-queue
Update Stateful Set example files for 1.5
1. Remove initialized annotation from statefulset examples
2. Update storage class annotation to beta in statefulset examples
3. Remove alpha limitation on PetSet in cassandra example
cc @erictune @foxish @kow3ns @enisoc @chrislovecnm @kubernetes/sig-apps
```release-note
NONE
```
Automatic merge from submit-queue
Fixes Addon Manager's pruning issue for old Deployments
Fixes#37641.
Attaches the `last-applied`annotations to the existing Deployments for pruning.
Below images are built and pushed:
- gcr.io/google-containers/kube-addon-manager:v6.1
- gcr.io/google-containers/kube-addon-manager-amd64:v6.1
- gcr.io/google-containers/kube-addon-manager-arm:v6.1
- gcr.io/google-containers/kube-addon-manager-arm64:v6.1
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.1
@mikedanese
cc @saad-ali @krousey
Automatic merge from submit-queue
Set Dashboard UI version to v1.5.0-beta1
There will be one more such PR coming for 1.5 release. In one week.
Setting release note to none. Will set notes for final version PR.
Github release info:
https://github.com/kubernetes/dashboard/releases/tag/v1.5.0-beta1
Automatic merge from submit-queue
Deploy a default StorageClass instance on AWS and GCE
This needs a newer kubectl in kube-addons-manager container. It's quite tricky to test as I cannot push new container image to gcr.io and I must copy the newer container manually.
cc @kubernetes/sig-storage
**Release note**:
```release-note
Kubernetes now installs a default StorageClass object when deployed on AWS, GCE and
OpenStack with kube-up.sh scripts. This StorageClass will automatically provision
a PeristentVolume in corresponding cloud for a PersistentVolumeClaim that cannot be
satisfied by any existing matching PersistentVolume in Kubernetes.
To override this default provisioning, administrators must manually delete this default StorageClass.
```
Automatic merge from submit-queue
Bumps up Addon Manager to v6.0 with full support of kubectl apply
Below images are built and pushed:
- gcr.io/google-containers/kube-addon-manager:v6.0
- gcr.io/google-containers/kube-addon-manager-amd64:v6.0
- gcr.io/google-containers/kube-addon-manager-arm:v6.0
- gcr.io/google-containers/kube-addon-manager-arm64:v6.0
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.0
The actual change made is upgrade kubectl version from `v1.5.0-alpha.1` to `v1.5.0-beta.1`, which is released today.
@mikedanese
@saad-ali This need to get into 1.5 because Addon Manager v6.0-alpha.1 (currently in used) does not have full support of `kubectl apply --prune`.
Automatic merge from submit-queue
fix: elasticsearch template mapping to parse kubernetes.labels
**What this PR does / why we need it**:
This PR updates the field mappings for the elasticsearch template that ships with the EFK stack implementation.
Specifically, elasticsearch cannot parse the `kubernetes.labels` object because it attempts to treat it as a string and produces an error. This update treats `kubernetes.labels` as an object and all of the properties within as a string, allowing accurate indexing and allowing users in kibana to search on `kubernetes.labels.*`.
**Release note**:
```release-note
Fluentd/Elastisearch add-on: correctly parse and index kubernetes labels
```
Automatic merge from submit-queue
Add kube-proxy logs to fluentd configs
Related to https://github.com/kubernetes/kubernetes/issues/37107
Makes fluentd collect logs from kube-proxy. It's completely backward-compatible change that does not cause problems currently, so I suggest not to bump version.
cc @piosz
- Adds command line flags --config-map, --config-map-ns.
- Fixes 36194 (https://github.com/kubernetes/kubernetes/issues/36194)
- Update kube-dns yamls
- Update bazel (hack/update-bazel.sh)
- Update known command line flags
- Temporarily reference new kube-dns image (this will be fixed with
a separate commit when the DNS image is created)
Automatic merge from submit-queue
Update Dashboard UI version to 1.4.2
**What this PR does / why we need it**:
Dashboard 1.4.2 contains a fix for an XSS security bug, so I think it would be prudent to update the Dashboard version 'shipped' with kubernetes to this version
**Special notes for your reviewer**:
**Release note**:
- Updated dashboard version in addons to 1.4.2```
Automatic merge from submit-queue
Fix Docker Registry image version to 2.5.1
`registry:2` is constantly being updated with new versions. This means there's a possibility that the image may be changed unintentionally. For example, when the Pod is rescheduled on nodes that does not already have the image, depending on the time of the pull, `registry:2` may result in different images.
Fix this to the latest `registry:2.5.1` instead to avoid this problem.
@uluyol @freehan
Dashboard 1.4.2 contains a fix for an XSS security bug, so I think it would be prudent to update the Dashboard version 'shipped' with kubernetes to this version
Automatic merge from submit-queue
Migrates addons from RCs to Deployments
Fixes#33698.
Below addons are being migrated:
- kube-dns
- GLBC default backend
- Dashboard UI
- Kibana
For the new deployments, the version suffixes are removed from their names. Version related labels are also removed because they are confusing and not needed any more with regard to how Deployment and the new Addon Manager works.
The `replica` field in `kube-dns` Deployment manifest is removed for the incoming DNS horizontal autoscaling feature #33239.
The `replica` field in `Dashboard` Deployment manifest is also removed because the rescheduler e2e test is manually scaling it.
Some resource limit related fields in `heapster-controller.yaml` are removed, as they will be set up by the `addon resizer` containers. Detailed reasons in #34513.
Three e2e tests are modified:
- `rescheduler.go`: Changed to resize Dashboard UI Deployment instead of ReplicationController.
- `addon_update.go`: Some namespace related changes in order to make it compatible with the new Addon Manager.
- `dns_autoscaling.go`: Changed to examine kube-dns Deployment instead of ReplicationController.
Both of above two tests passed on my own cluster. The upgrade process --- from old Addons with RCs to new Addons with Deployments --- was also tested and worked as expected.
The last commit upgrades Addon Manager to v6.0. It is still a work in process and currently waiting for #35220 to be finished. (The Addon Manager image in used comes from a non-official registry but it mostly works except some corner cases.)
@piosz @gmarek could you please review the heapster part and the rescheduler test?
@mikedanese @thockin
cc @kubernetes/sig-cluster-lifecycle
---
Notes:
- Kube-dns manifest still uses *-rc.yaml for the new Deployment. The stale file names are preserved here for receiving faster review. May send out PR to re-organize kube-dns's file names after this.
- Heapster Deployment's name remains in the old fashion(with `-v1.2.0` suffix) for avoiding describe this upgrade transition explicitly. In this way we don't need to attach fake apply labels to the old Deployments.
Automatic merge from submit-queue
Fix startup script bug in kibana image
Big thanks to @lhopki01 for noticing this!
As mention in discussion in https://github.com/kubernetes/kubernetes/pull/36103 current image crashes if we don't want to work behind proxy because of string interpolation in bash.
@piosz
Automatic merge from submit-queue
Fixes token_found bug in addon manager
From #35832.
Above PR exposed addon manager's logs on Jenkins, found below error on the gce e2e test artifacts:
```
Error from server: serviceaccounts "default" not found
error executing template "{{with index .secrets 0}}{{.name}}{{end}}": template: output:1:7: executing "output" at <index .secrets 0>: error calling index: index of untyped nil
== default service account in the kube-system namespace has token Error executing template: template: output:1:7: executing "output" at <index .secrets 0>: error calling index: index of untyped nil. Printing more information for debugging the template:
template was:
{{with index .secrets 0}}{{.name}}{{end}}
raw data was:
{"kind":"ServiceAccount","apiVersion":"v1","metadata":{"name":"default","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/serviceaccounts/default","uid":"de3f2f85-9d6a-11e6-9df3-42010af00002","resourceVersion":"48","creationTimestamp":"2016-10-29T00:01:40Z"}}
object given to template engine was:
map[apiVersion:v1 metadata:map[selfLink:/api/v1/namespaces/kube-system/serviceaccounts/default uid:de3f2f85-9d6a-11e6-9df3-42010af00002 resourceVersion:48 creationTimestamp:2016-10-29T00:01:40Z name:default namespace:kube-system] kind:ServiceAccount] ==
```
Seems like the script failed to retrieve service token at the first time and mistakenly used the error message as the token content. Fixes by replacing `|| true` with if condition.
https://hub.docker.com/r/library/registry/tags/
`registry:2` is constantly being updated with new versions. This means there's a possibility that the image may be changed unintentionally. For example, when the Pod is rescheduled on nodes that does not already have the image, depending on the time of the pull, `registry:2` may result in different images.
Fix this to the latest `registry:2.5.1` instead to avoid this problem.
Automatic merge from submit-queue
Update fluentd-gcp configuration
Related to #32762
Though it's not a final solution to the fluentd OOM problems, it increases number of logs that can be handled without losses by
- switching to the file buffering, making buffering mechanism more resilient
- decreasing size of the buffer, decreasing the amount of memory needed
- decreasing number of threads handling the load, since number of chunks is lower than previous number of threads
which results in decrease in theoretical throughput. Tests to confirm cases covered by this change will follow.
cc @piosz @edsiper @repeatedly please take look and confirm that all of these changed are meaningful.
Automatic merge from submit-queue
Rename PetSet to StatefulSet in docs and examples.
**What this PR does / why we need it**: Addresses some of the pre-code-freeze changes for implementing the PetSet --> StatefulSet rename. (#35534)
**Special notes for your reviewer**: This PR only changes docs and examples, as #35731 hasn't been merged yet and I don't want to create merge conflicts. I'll open another PR for any remaining code changes needed after that PR is merged. /cc @erictune @janetkuo @chrislovecnm
Automatic merge from submit-queue
cluster-addons: enable Jemalloc for Fluentd based images
**What this PR does / why we need it**:
This Pull Request includes two patches that enable the recommended use of Jemalloc memory allocator for container images that are based in Fluentd. The patches applies to the following cluster-addons:
- fluentd-es-image
- fluentd-gcp-image
**Which issue this PR fixes**
This PR is part of the solution for issues:
- kubernetes/kubernetes/issues/32762
- GoogleCloudPlatform/fluent-plugin-google-cloud/issues/87
When Fluentd runs in high load environments, it's likely the default operating system memory allocator will generate a high fragmentation ending up in a high memory usage. In order to reduce fragmentation and decrease memory usage an alternative memory allocator as Jemalloc is used.

For the record: fluentd-es-image uses [td-agent](https://docs.treasuredata.com/articles/td-agent) Fluentd package maintained by Treasure Data, which contains Jemalloc 4.2.1 (latest stable version). The google-fluentd package used in fluentd-gcp-image comes with Jemalloc 2.2.5, which have many known issues, I strongly suggest google-fluentd package gets updated.
**Special notes for your reviewer**:
In the research of this topic have been involved @piosz and @Crassirostris.
Automatic merge from submit-queue
Fixed kibana image and controller to work through proxy
As described in #34969, new kibana image doesn't work properly with proxies without additional configuration.
@piosz
Automatic merge from submit-queue
Update grafana in kubernetes to version 3.1.1
Fix#33775
```release-note
Update grafana version used by default in kubernetes to 3.1.1
```
@piosz
Automatic merge from submit-queue
Update elasticsearch and kibana usage
```release-note
Updated default Elasticsearch and Kibana used for elasticsearch logging destination to versions 2.4.1 and 4.6.1 respectively.
```
Updated controllers for elasticsearch and kibana to use newer versions of images. Fixed e2e test because of elasticsearch backward incompatible API changes.
Fixed out of sync elasticsearch controller for coreos.
@piosz
Automatic merge from submit-queue
Updated addon manager READMEs
Updates addon-manager's README. Based on the pre-condition that the addon manager keeps current "reconciled" pattern instead of "fire-once".
@mikedanese
The current DockerFile build an image using td-agent package but it let
the service run with the default memory allocator provided by glibc.
In high load environments, is highly required to use a customized memory
allocator such as Jemalloc. Otherwise the service will face a high memory
fragmentation ending up in 'high memory' usage from a monitoring perspective.
td-agent package by default install Jemalloc and set the LD_PRELOAD
environment variable through it init script, but since the service is
launched through Docker the env variable needs to be set manually.
After this patch, when running td-agent container image now is possible
to see that Jemalloc is used:
root@monotop:/proc/18810# cat maps |grep jemall
7f251eddd000-7f251ee1b000 ... /opt/td-agent/embedded/lib/libjemalloc.so.2
7f251ee1b000-7f251f01b000 ... /opt/td-agent/embedded/lib/libjemalloc.so.2
7f251f01b000-7f251f01d000 ... /opt/td-agent/embedded/lib/libjemalloc.so.2
7f251f01d000-7f251f01e000 ... /opt/td-agent/embedded/lib/libjemalloc.so.2
For a reference about the memory usage difference between malloc v/s jemalloc
please refer to the following chart:
https://goo.gl/dVYTmw
Signed-off-by: Eduardo Silva <eduardo@treasure-data.com>
Automatic merge from submit-queue
Elasticsearch and Kibana update
```release-note
Updated Elasticsearch image from version 1.5.1 to version 2.4.1. Updated Kibana image from version 4.0.2 to version 4.6.1.
```
Updated es and kibana images. Made image versions match es/kibana versions they contain.
ref #19149
Automatic merge from submit-queue
Upgrade addon-manager with kubectl apply
The first step of #33698.
Use `kubectl apply` to replace addon-manager's previous logic.
The most important issue this PR is targeting is the upgrade from 1.4 to 1.5. Procedure as below:
1. Precondition: After the master is upgraded, new addon-manager starts and all the old resources on nodes are running normally.
2. Annotate the old ReplicationController resources with kubectl.kubernetes.io/last-applied-configuration=""
3. Call `kubectl apply --prune=false` on addons folder to create new addons, including the new Deployments.
4. Wait for one minute for new addons to be spinned up.
5. Enter the periodical loop of `kubectl apply --prune=true`. The old RCs will be pruned at the first call.
Procedure of a normal startup:
1. Addon-manager starts and no addon resources are running.
2. Annotate nothing.
3. Call `kubectl apply --prune=false` to create all new addons.
4. No need to explain the remain.
Remained Issues:
- Need to add `--type` flag to `kubectl apply --prune`, mentioned [here](https://github.com/kubernetes/kubernetes/pull/33075#discussion_r80814070).
- This addon manager is not working properly with the current Deployment heapster, which runs [addon-resizer](https://github.com/kubernetes/contrib/tree/master/addon-resizer) in the same pod and changes resource limit configuration through the apiserver. `kubectl apply` fights with the addon-resizers. May be we should remove the initial resource limit field in the configuration file for this specific Deployment as we removed the replica count.
@mikedanese @thockin @bprashanth
---
Below are some logical things that may need to be clarified, feel free to **OMIT** them as they are too verbose:
- For upgrade, the old RCs will not fight with the new Deployments during the overlap period even if they use the same label in template:
- Deployment will not recognize the old pods because it need to match an additional "pod-template-hash" label.
- ReplicationController will not manage the new pods (created by deployment) because the [`controllerRef`](https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/controller-ref.md) feature.
- As we are moving all addons to Deployment, all old RCs would be removed. Attach empty annotation to RCs is only for letting `kubectl apply --prune` to recognize them, the content does not matter.
- We might need to also annotate other resource types if we plan to upgrade them in 1.5 release:
- They don't need to be attached this fake annotation if they remain in the same name. `kubectl apply` can recognize them by name/type/namespace.
- In the other case, attaching empty annotations to them will still work. As the plan is to use label selector for annotate, some innocence old resources may also be attached empty annotations, they work as below two cases:
- Resources that need to be bumped up to a newer version (mainly due to some significant update --- change disallowed fields --- that could not be managed by the update feature of `kubectl apply`) are good to go with this fake annotation, as old resources will be deleted and new sources will be created. The content in annotation does not matter.
- Resources that need to stay inside the management of `kubectl apply` is also good to go. As `kubectl apply` will [generate a 3-way merge patch](https://github.com/kubernetes/kubernetes/blob/master/pkg/util/strategicpatch/patch.go#L1202-L1226). This empty annotation is harmless enough.
Automatic merge from submit-queue
fix sed command run failed on mac os
bash command ```sed -i ... ``` run failed on mac os, it should be ```sed -i.back ..```
We can then avoid the following warning:
```
WARNING: The '--' argument must be specified between gcloud specific args on the left and DOCKER_ARGS on the right. IMPORTANT: previously, commands allowed the omission of the --, and unparsed arguments were treated as implementation args. This usage is being deprecated and will be removed in March 2017.
This will be strictly enforced in March 2017. Use 'gcloud beta docker' to see new behavior.
```
Signed-off-by: Jess Frazelle <acidburn@google.com>
Automatic merge from submit-queue
Added --log-facility flag to enhance dnsmasq logging
Fix#31010.
Dnsmasq in kube-dns pod is logging in default setting, which is somehow hard to locate. Add --log-facility=- flag to redirect logs to std.
@girishkalele
Automatic merge from submit-queue
Use a Deployment for kube-dns
Attempt to fix#31554
Switching kube-dns from using Replication Controller to Deployment.
The outdated kube-dns YAML file in coreos and juju dir is also updated. Most of the specific memory limit in the files remain unchanged because it seems like people were modifying it explicitly(c8d82fc2a9). Only the memory limit for healthz is increased due to this pending investigation(#29688).
YAML files stay in *-rc.yaml format considering there are a lots of scripts in cluster and hack dirs are using this format. But it may be fine to changed them all.
@bprashanth @girishkalele
Automatic merge from submit-queue
Reduce size of images fluentd-gcp and fluentd-elasticsearch
replaces #26652
```
aledbf/fluentd-elasticsearch 1.19 769ece5c8ba8 About an hour ago 269.9 MB
gcr.io/google_containers/fluentd-elasticsearch 1.18 0a8cbfbea7f7 5 weeks ago 530.3 MB
aledbf/fluentd-gcp 1.22 ef979b82a767 About an hour ago 307.9 MB
gcr.io/google_containers/fluentd-gcp 1.21 0ef09b1bcfd7 2 weeks ago 498.5 MB
```
closes#29782
Automatic merge from submit-queue
Add user-specified kubectl arguments to addons start script
This is a simple way, using the same environment variable paradigm used throughout these scripts, to let a user specify kubectl arguments to the addons script.
fixes#30371
Automatic merge from submit-queue
Add support for kube-up.sh to deploy Calico network policy to GCI masters
Also remove requirement for calicoctl from Debian / salt installed nodes and clean it up a little by deploying calico-node with a manifest rather than calicoctl. This also makes it more reliable by retrying properly.
How to use:
```
make quick-release
NETWORK_POLICY_PROVIDER=calico cluster/kube-up.sh
```
One place where I was uncertain:
- CPU allocations (on the master particularly, where there's very little spare capacity). I took some from etcd, but if there's a better way to decide this, I'm happy to change it.
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/29037)
<!-- Reviewable:end -->
Automatic merge from submit-queue
Add cleanup addon pod to remove empty keys from etcd
namespace deletion will leave a trace of empty keys on etcd. This PR adds an addon pod to periodically check for those empty keys on etcd and remove them.
fixes#27307
Automatic merge from submit-queue
Bump exechealthz image
With the new image at least if we observe an exec container taking more ram than it should (like the oom situation, which shouldn't happen today because of the increased limits), we can kubectl exec and check the pprof endpoints.
Note that I'm not bumping the rc version, because I just did so with: https://github.com/kubernetes/kubernetes/pull/29693.
Only run the systemd-journal plugin when on a platform that requests it.
The plugin crashes the fluentd process if the journal isn't present, so
it can't just be run blindly in all configurations.
Automatic merge from submit-queue
Update to dnsmasq:1.3 and make hyperkube always use the latest addons
This bumps dnsmasq to a version that works on all architectures: https://github.com/kubernetes/contrib/pull/1192 (which have to be pushed first indeed)
Also I removed the manifests in hyperkube addons in favor for machine-generated ones, which will avoid mistakes.
This one is required for `v1.3`, so it has to be cherrypicked I think...
It makes docker and docker-multinode addons work again...
(Yes, we'll probably get rid of docker in favor for minikube, but we'll have to have it in this release at least)
@girishkalele @thockin @ArtfulCoder @david-mcmahon @bgrant0607 @mikedanese
Automatic merge from submit-queue
Re-enable node problem detector by default
Re-enable node problem detector started in gce cluster by default.
For now, in the master node, the node problem detector will be started and do nothing (see https://github.com/kubernetes/node-problem-detector/pull/13).
But in fact, in my test cluster, the master has no extra cpu to run the node problem detector, so node problem detector is started on all nodes except master, which is what we want but not expected...
@dchen1107
/cc @kubernetes/sig-node
/cc @andyzheng0831 for the gci script change.
[]()
Automatic merge from submit-queue
Add collection of the new glbc and cluster-autoscaler logs
I've incremented the version numbers by 2 to avoid conflicting with #26652. I'll make sure the potential conflict between the images gets resolved reasonably.
cc @piosz @bprashanth @aledbf
Automatic merge from submit-queue
Switch DNS addons from skydns to kubedns
Change GCI and trusty cluster-helper scripts to use kubedns instead of skydns.
Unified skydns templates using a simple underscore based template and
added transform sed scripts to transform into salt and sed yaml
templates
Moved all content out of cluster/addons/dns into build/kube-dns and
saltbase/salt/kube-dns
Automatic merge from submit-queue
Add node problem detector as an addon pod.
```release-note
Introduce a new add-on pod NodeProblemDetector.
NodeProblemDetector is a DaemonSet running on each node, monitoring node health and reporting
node problems as NodeCondition and Event. Currently it already supports kernel log monitoring, and
will support more problem detection in the future. It is enabled by default on gce now.
```
This PR enables NodeProblemDetector as an add-on pod.
/cc @mikedanese @kubernetes/sig-node
[]()
Automatic merge from submit-queue
add index template for es aggregations
This index template helps us to do es aggregations of namespace_name, pod_name and container_name. Then after doing eggs, we will get the whole name not all the spilt pieces.
fix#25127
Automatic merge from submit-queue
Initial kube-up support for VMware's Photon Controller
This is for: https://github.com/kubernetes/kubernetes/issues/24121
Photon Controller is an open-source cloud management platform. More
information is available at:
http://vmware.github.io/photon-controller/
This commit provides initial support for Photon Controller. The
following features are tested and working:
- kube-up and kube-down
- Basic pod and service management
- Networking within the Kubernetes cluster
- UI and DNS addons
It has been tested with a Kubernetes cluster of up to 10
nodes. Further work on scaling is planned for the near future.
Internally we have implemented continuous integration testing and will
run it multiple times per day against the Kubernetes master branch
once this is integrated so we can quickly react to problems.
A few things have not yet been implemented, but are planned:
- Support for kube-push
- Support for test-build-release, test-setup, test-teardown
Assuming this is accepted for inclusion, we will write documentation
for the kubernetes.io site.
We have included a script to help users configure Photon Controller
for use with Kubernetes. While not required, it will help some
users get started more quickly. It will be documented.
We are aware of the kube-deploy efforts and will track them and
support them as appropriate.
This is for: https://github.com/kubernetes/kubernetes/issues/24121
Photon Controller is an open-source cloud management platform. More
information is available at:
http://vmware.github.io/photon-controller/
This commit provides initial support for Photon Controller. The
following features are tested and working:
- kube-up and kube-down
- Basic pod and service management
- Networking within the Kubernetes cluster
- UI and DNS addons
It has been tested with a Kubernetes cluster of up to 10
nodes. Further work on scaling is planned for the near future.
Internally we have implemented continuous integration testing and will
run it multiple times per day against the Kubernetes master branch
once this is integrated so we can quickly react to problems.
A few things have not yet been implemented, but are planned:
- Support for kube-push
- Support for test-build-release, test-setup, test-teardown
Assuming this is accepted for inclusion, we will write documentation
for the kubernetes.io site.
We have included a script to help users configure Photon Controller
for use with Kubernetes. While not required, it will help some
users get started more quickly. It will be documented.
We are aware of the kube-deploy efforts and will track them and
support them as appropriate.
Automatic merge from submit-queue
Fix GLBC cluster addon README link
Fix the link to L7 load balancer controller in GLBC cluster addon README.
Fixed#24462.
Automatic merge from submit-queue
don't ship kube-registry-proxy and pause images in tars.
pause is built into containervm. if it's not on the machine we should just pull
it. nobody that I'm aware of uses kube-registry-proxy and it makes build/deployment
more complicated and slower.
Automatic merge from submit-queue
Make kube2sky and skydns docker images cross-platform
ARM tracking issue: #17981
Continues on: #19216
Make it possible to create `kube2sky` and `skydns` docker images for ARM and other architectures too
Build in a container, so `golang` isn't a dependency
I've preserved the original default behaviour:
- `skydns`: It just compiles with go on host
- `kube2sky`: Build an image
@brendandburns @dchen1107 @ArtfulCoder @thockin @fgrzadkowski
pause is built into containervm. if it's not on the machine we should just pull
it. nobody that I'm aware of uses kube-registry-proxy and it makes build/deployment
more complicated and slower.
It includes some performance improvements for parsing JSON (which is
very important for us, since all Docker logs are JSON) as well as a
couple new settings, like forcing of a flush of multiline logs after a
time period rather than having to wait until a new log is seen before
feeling confident flushing the previous one.
I didn't expect glog to split single log statements onto multiple lines,
but apparently it does if they're long enough. This groups them back
together appropriately.
Combine the fields that will be used for content transformation
(content-type, codec, and group version) into a single struct in client,
and then pass that struct into the rest client and request. Set the
content-type when sending requests to the server, and accept the content
type as primary.
Will form the foundation for content-negotiation via the client.
To build the python image, BUILD_PYTHON_IMAGE should be set during make.
When the addon script is running, it will check if python is installed
on the machine, if not, it will use the python image that built previously.
This scopes down the initially ambitious PR:
https://github.com/kubernetes/kubernetes/pull/14960 to replace just
`pause` and `fluentd-elasticsearch` to come through `beta.gcr.io`.
The v2 versions have been pushed under new tags, `pause:2.0` and
`fluentd-elastisearch:1.12`.
NOTE: `beta.gcr.io` will still serve images using v1 until they are repushed with v2. Pulls through `gcr.io` will still work after pushing through `beta.gcr.io`, but will be served over v1 (via compat logic).
The new version fixes problem with missing metrics.
The new config decreases load on GCM/InfluxDB.
Increased stats resolution from default 5s to 30s.
Decreased sink frequency from 2m to 1m.
Since skydns is created in namespace 'kube-system' and kubernetes service is created in namespace 'default', if busybox is created in namespace 'kube-system' then nslookup will work with 'kubernetes.default'.
changes to fluent-plugin-google-cloud to attach Kubernetes metadata to
logs.
Along with this, separate logs from containers in the cluster out from
logs from the daemons running on the node by instantiating two instances
of the output plugin, one which uses the new metadata (for containers)
and one which doesn't (for things like docker and the kubelet).
gunk when installing the google-fluentd agent.
Also let it log things by not redirecting to a file within the container
and only using -q (warning logs only) rather than -qq (error logs only).
This registry can be accessed through proxies that run on each node
listening on port 5000. We send the proxy images to the nodes directly
to avoid requests that hit the network during cluster launch. For now,
we continue to pull the registry itself over the network, especially
given its large size (we should be able to dramatically shrink the
image). On GCE we create a PD and use that for storage, otherwise we
use an emptyDir. The registry is not enabled outside of GCE. All
communication is currently plain HTTP. In order to use SSL, we will
need to be able to request a certificate/key from the apiserver signed
by the apiserver's CA cert.
This was originally submitted to pick up v0.3.1 of the cloud logging
plugin which had a fix for the name 'metadata' failing to resolve.
Since new releases of google-fluentd have this fix, it is no longer
required.
I've done some additional testing of 'gem update' behavior in the interim
and I think it is ok to use in targeted situations, but we should not be
doing an unconstrained update in general. The issue is that updating a
gem may bring new dependencies, some of those dependencies may include
native code, so it may try to launch a compiler, which isn't desirable
and prone to failure.
If we do need to grab an updated gem in the future we should specify an
explicit version and the --minimal-deps flag.
Changes include:
- Add kube-ui binary for serving static dashboard UI
- Add kube-ui docker image, replication controller, and service
- Make the kube-ui a cluster-addon (enabled by default)
- Split the compiled pkg/ui/datafile.go into separate dashboard and swagger packages
- Update docs to reflect changes
pkg/service:
There were a couple of references here just as a reminder to change the
behavior of findPort. As of v1beta3, TargetPort was always defaulted, so
we could remove findDefaultPort and related tests.
pkg/apiserver:
The tests were using versioned API codecs for some of their encoding
tests. Necessary API types had to be written and registered with the
fake versioned codecs.
pkg/kubectl:
Some tests were converted to current versions where it made sense.
Refactored the code a bit to make it easy for future enhancements and
unit testing.
Tested the change manually on a kube node with and without kubeConfig based tokens.
1. Move fluentd-gcp to be a core cluster addon, rather than a contrib.
2. Get rid of the synthetic logger under contrib, since the exact same
synthetic logger was also included in the logging-demo.
3. Move the logging-demo to examples, since it's effectively an example.
We should also consider adding on a GCP section to the logging-demo
example :)
I had some trouble with the kubernetes docker image for SkyDNS being outdated. In my experience the version in `kubernetes/skydns:2014-12-23-001` will not behave correctly if it manages to startup before etcd, for details see skynetservices/skydns#142
Updating to SkyDNS latest fixes this.
This is first version of the command. It prints IPs of master and cluster
services. Should be improved once generalized labels are implemented #341.
It requires label kubernet.io/cluster-service=true set for cluster services.
Follow up cl after discussion in #4417.
Adds labels to the services, waits for them to be created (which
should be instant, but just in case), query the forwarding rules like
as we did before.
Fixes#3893
This implements phase 1 of the proposal in #3579, moving the creation
of the pods, RCs, and services to the master after the apiserver is
available.
This is such a wide commit because our existing initial config story
is special:
* Add kube-addons service and associated salt configuration:
** We configure /etc/kubernetes/addons to be a directory of objects
that are appropriately configured for the current cluster.
** "/etc/init.d/kube-addons start" slurps up everything in that dir.
(Most of the difficult is the business logic in salt around getting
that directory built at all.)
** We cheat and overlay cluster/addons into saltbase/salt/kube-addons
as config files for the kube-addons meta-service.
* Change .yaml.in files to salt templates
* Rename {setup,teardown}-{monitoring,logging} to
{setup,teardown}-{monitoring,logging}-firewall to properly reflect
their real purpose now (the purpose of these functions is now ONLY to
bring up the firewall rules, and possibly to relay the IP to the user).
* Rework GCE {setup,teardown}-{monitoring,logging}-firewall: Both
functions were improperly configuring global rules, yet used
lifecycles tied to the cluster. Use $NODE_INSTANCE_PREFIX with the
rule. The logging rule needed a $NETWORK specifier. The monitoring
rule tried gcloud describe first, but given the instancing, this feels
like a waste of time now.
* Plumb ENABLE_CLUSTER_MONITORING, ENABLE_CLUSTER_LOGGING,
ELASTICSEARCH_LOGGING_REPLICAS and DNS_REPLICAS down to the master,
since these are needed there now.
(Desperately want just a yaml or json file we can share between
providers that has all this crap. Maybe #3525 is an answer?)
Huge caveats: I've gone pretty firm testing on GCE, including
twiddling the env variables and making sure the objects I expect to
come up, come up. I've tested that it doesn't break GKE bringup
somehow. But I haven't had a chance to test the other providers.
Removed auth for Grafana to facilitate usage via service proxy on the api-server.
Added a grafana service
Removed elasticsearch dependency for monitoring - faster startup times.
Removed auth for Grafana to facilitate usage via service proxy on the api-server.
Added a grafana service
Removed elasticsearch dependency for monitoring - faster startup times.