Automatic merge from submit-queue
Update fluentd-gcp configuration
Related to #32762
Though it's not a final solution to the fluentd OOM problems, it increases number of logs that can be handled without losses by
- switching to the file buffering, making buffering mechanism more resilient
- decreasing size of the buffer, decreasing the amount of memory needed
- decreasing number of threads handling the load, since number of chunks is lower than previous number of threads
which results in decrease in theoretical throughput. Tests to confirm cases covered by this change will follow.
cc @piosz @edsiper @repeatedly please take look and confirm that all of these changed are meaningful.
Automatic merge from submit-queue
Rename PetSet to StatefulSet in docs and examples.
**What this PR does / why we need it**: Addresses some of the pre-code-freeze changes for implementing the PetSet --> StatefulSet rename. (#35534)
**Special notes for your reviewer**: This PR only changes docs and examples, as #35731 hasn't been merged yet and I don't want to create merge conflicts. I'll open another PR for any remaining code changes needed after that PR is merged. /cc @erictune @janetkuo @chrislovecnm
Automatic merge from submit-queue
cluster-addons: enable Jemalloc for Fluentd based images
**What this PR does / why we need it**:
This Pull Request includes two patches that enable the recommended use of Jemalloc memory allocator for container images that are based in Fluentd. The patches applies to the following cluster-addons:
- fluentd-es-image
- fluentd-gcp-image
**Which issue this PR fixes**
This PR is part of the solution for issues:
- kubernetes/kubernetes/issues/32762
- GoogleCloudPlatform/fluent-plugin-google-cloud/issues/87
When Fluentd runs in high load environments, it's likely the default operating system memory allocator will generate a high fragmentation ending up in a high memory usage. In order to reduce fragmentation and decrease memory usage an alternative memory allocator as Jemalloc is used.

For the record: fluentd-es-image uses [td-agent](https://docs.treasuredata.com/articles/td-agent) Fluentd package maintained by Treasure Data, which contains Jemalloc 4.2.1 (latest stable version). The google-fluentd package used in fluentd-gcp-image comes with Jemalloc 2.2.5, which have many known issues, I strongly suggest google-fluentd package gets updated.
**Special notes for your reviewer**:
In the research of this topic have been involved @piosz and @Crassirostris.
Automatic merge from submit-queue
Fixed kibana image and controller to work through proxy
As described in #34969, new kibana image doesn't work properly with proxies without additional configuration.
@piosz
Automatic merge from submit-queue
Update grafana in kubernetes to version 3.1.1
Fix#33775
```release-note
Update grafana version used by default in kubernetes to 3.1.1
```
@piosz
Automatic merge from submit-queue
Update elasticsearch and kibana usage
```release-note
Updated default Elasticsearch and Kibana used for elasticsearch logging destination to versions 2.4.1 and 4.6.1 respectively.
```
Updated controllers for elasticsearch and kibana to use newer versions of images. Fixed e2e test because of elasticsearch backward incompatible API changes.
Fixed out of sync elasticsearch controller for coreos.
@piosz
Automatic merge from submit-queue
Updated addon manager READMEs
Updates addon-manager's README. Based on the pre-condition that the addon manager keeps current "reconciled" pattern instead of "fire-once".
@mikedanese
The current DockerFile build an image using td-agent package but it let
the service run with the default memory allocator provided by glibc.
In high load environments, is highly required to use a customized memory
allocator such as Jemalloc. Otherwise the service will face a high memory
fragmentation ending up in 'high memory' usage from a monitoring perspective.
td-agent package by default install Jemalloc and set the LD_PRELOAD
environment variable through it init script, but since the service is
launched through Docker the env variable needs to be set manually.
After this patch, when running td-agent container image now is possible
to see that Jemalloc is used:
root@monotop:/proc/18810# cat maps |grep jemall
7f251eddd000-7f251ee1b000 ... /opt/td-agent/embedded/lib/libjemalloc.so.2
7f251ee1b000-7f251f01b000 ... /opt/td-agent/embedded/lib/libjemalloc.so.2
7f251f01b000-7f251f01d000 ... /opt/td-agent/embedded/lib/libjemalloc.so.2
7f251f01d000-7f251f01e000 ... /opt/td-agent/embedded/lib/libjemalloc.so.2
For a reference about the memory usage difference between malloc v/s jemalloc
please refer to the following chart:
https://goo.gl/dVYTmw
Signed-off-by: Eduardo Silva <eduardo@treasure-data.com>
Automatic merge from submit-queue
Elasticsearch and Kibana update
```release-note
Updated Elasticsearch image from version 1.5.1 to version 2.4.1. Updated Kibana image from version 4.0.2 to version 4.6.1.
```
Updated es and kibana images. Made image versions match es/kibana versions they contain.
ref #19149
Automatic merge from submit-queue
Upgrade addon-manager with kubectl apply
The first step of #33698.
Use `kubectl apply` to replace addon-manager's previous logic.
The most important issue this PR is targeting is the upgrade from 1.4 to 1.5. Procedure as below:
1. Precondition: After the master is upgraded, new addon-manager starts and all the old resources on nodes are running normally.
2. Annotate the old ReplicationController resources with kubectl.kubernetes.io/last-applied-configuration=""
3. Call `kubectl apply --prune=false` on addons folder to create new addons, including the new Deployments.
4. Wait for one minute for new addons to be spinned up.
5. Enter the periodical loop of `kubectl apply --prune=true`. The old RCs will be pruned at the first call.
Procedure of a normal startup:
1. Addon-manager starts and no addon resources are running.
2. Annotate nothing.
3. Call `kubectl apply --prune=false` to create all new addons.
4. No need to explain the remain.
Remained Issues:
- Need to add `--type` flag to `kubectl apply --prune`, mentioned [here](https://github.com/kubernetes/kubernetes/pull/33075#discussion_r80814070).
- This addon manager is not working properly with the current Deployment heapster, which runs [addon-resizer](https://github.com/kubernetes/contrib/tree/master/addon-resizer) in the same pod and changes resource limit configuration through the apiserver. `kubectl apply` fights with the addon-resizers. May be we should remove the initial resource limit field in the configuration file for this specific Deployment as we removed the replica count.
@mikedanese @thockin @bprashanth
---
Below are some logical things that may need to be clarified, feel free to **OMIT** them as they are too verbose:
- For upgrade, the old RCs will not fight with the new Deployments during the overlap period even if they use the same label in template:
- Deployment will not recognize the old pods because it need to match an additional "pod-template-hash" label.
- ReplicationController will not manage the new pods (created by deployment) because the [`controllerRef`](https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/controller-ref.md) feature.
- As we are moving all addons to Deployment, all old RCs would be removed. Attach empty annotation to RCs is only for letting `kubectl apply --prune` to recognize them, the content does not matter.
- We might need to also annotate other resource types if we plan to upgrade them in 1.5 release:
- They don't need to be attached this fake annotation if they remain in the same name. `kubectl apply` can recognize them by name/type/namespace.
- In the other case, attaching empty annotations to them will still work. As the plan is to use label selector for annotate, some innocence old resources may also be attached empty annotations, they work as below two cases:
- Resources that need to be bumped up to a newer version (mainly due to some significant update --- change disallowed fields --- that could not be managed by the update feature of `kubectl apply`) are good to go with this fake annotation, as old resources will be deleted and new sources will be created. The content in annotation does not matter.
- Resources that need to stay inside the management of `kubectl apply` is also good to go. As `kubectl apply` will [generate a 3-way merge patch](https://github.com/kubernetes/kubernetes/blob/master/pkg/util/strategicpatch/patch.go#L1202-L1226). This empty annotation is harmless enough.
Automatic merge from submit-queue
fix sed command run failed on mac os
bash command ```sed -i ... ``` run failed on mac os, it should be ```sed -i.back ..```
We can then avoid the following warning:
```
WARNING: The '--' argument must be specified between gcloud specific args on the left and DOCKER_ARGS on the right. IMPORTANT: previously, commands allowed the omission of the --, and unparsed arguments were treated as implementation args. This usage is being deprecated and will be removed in March 2017.
This will be strictly enforced in March 2017. Use 'gcloud beta docker' to see new behavior.
```
Signed-off-by: Jess Frazelle <acidburn@google.com>
Automatic merge from submit-queue
Added --log-facility flag to enhance dnsmasq logging
Fix#31010.
Dnsmasq in kube-dns pod is logging in default setting, which is somehow hard to locate. Add --log-facility=- flag to redirect logs to std.
@girishkalele
Automatic merge from submit-queue
Use a Deployment for kube-dns
Attempt to fix#31554
Switching kube-dns from using Replication Controller to Deployment.
The outdated kube-dns YAML file in coreos and juju dir is also updated. Most of the specific memory limit in the files remain unchanged because it seems like people were modifying it explicitly(c8d82fc2a9). Only the memory limit for healthz is increased due to this pending investigation(#29688).
YAML files stay in *-rc.yaml format considering there are a lots of scripts in cluster and hack dirs are using this format. But it may be fine to changed them all.
@bprashanth @girishkalele
Automatic merge from submit-queue
Reduce size of images fluentd-gcp and fluentd-elasticsearch
replaces #26652
```
aledbf/fluentd-elasticsearch 1.19 769ece5c8ba8 About an hour ago 269.9 MB
gcr.io/google_containers/fluentd-elasticsearch 1.18 0a8cbfbea7f7 5 weeks ago 530.3 MB
aledbf/fluentd-gcp 1.22 ef979b82a767 About an hour ago 307.9 MB
gcr.io/google_containers/fluentd-gcp 1.21 0ef09b1bcfd7 2 weeks ago 498.5 MB
```
closes#29782
Automatic merge from submit-queue
Add user-specified kubectl arguments to addons start script
This is a simple way, using the same environment variable paradigm used throughout these scripts, to let a user specify kubectl arguments to the addons script.
fixes#30371
Automatic merge from submit-queue
Add support for kube-up.sh to deploy Calico network policy to GCI masters
Also remove requirement for calicoctl from Debian / salt installed nodes and clean it up a little by deploying calico-node with a manifest rather than calicoctl. This also makes it more reliable by retrying properly.
How to use:
```
make quick-release
NETWORK_POLICY_PROVIDER=calico cluster/kube-up.sh
```
One place where I was uncertain:
- CPU allocations (on the master particularly, where there's very little spare capacity). I took some from etcd, but if there's a better way to decide this, I'm happy to change it.
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/29037)
<!-- Reviewable:end -->
Automatic merge from submit-queue
Add cleanup addon pod to remove empty keys from etcd
namespace deletion will leave a trace of empty keys on etcd. This PR adds an addon pod to periodically check for those empty keys on etcd and remove them.
fixes#27307
Automatic merge from submit-queue
Bump exechealthz image
With the new image at least if we observe an exec container taking more ram than it should (like the oom situation, which shouldn't happen today because of the increased limits), we can kubectl exec and check the pprof endpoints.
Note that I'm not bumping the rc version, because I just did so with: https://github.com/kubernetes/kubernetes/pull/29693.