1890 Commits

Author SHA1 Message Date
SataQiu
35a7924327 fix shellcheck failures of cluster/addons/addon-manager/kube-addons.sh 2019-09-02 19:18:51 +08:00
Kubernetes Prow Robot
0466cb6e69
Merge pull request #82140 from wangzhen127/fix-npd-test
Update NPD addon to use v0.7.1
2019-08-30 13:04:33 -07:00
Kubernetes Prow Robot
7236850194
Merge pull request #82093 from rajansandeep/reconcilecorednscm
Add the ability to migrate CoreDNS configmap in kube-up
2019-08-30 07:59:56 -07:00
Zhen Wang
58e64193c9 Update NPD addon to use v0.7.1 2019-08-29 11:24:13 -07:00
Sandeep Rajan
8a7a8032b1 hardcoded check sha of corefile tool 2019-08-29 10:03:29 -04:00
Kubernetes Prow Robot
467bdcb445
Merge pull request #81532 from MrHohn/cpa-1.7.0
Bump cluster proportional autoscaler to 1.7.1
2019-08-27 19:37:32 -07:00
Zihong Zheng
84e8bccdb5 Bump cluster proportional autoscaler to 1.7.1 2019-08-26 13:22:53 -07:00
Kubernetes Prow Robot
5ced7377c3
Merge pull request #81428 from MrHohn/cpva-v0.8.1
Bump vertical autoscaler to v0.8.1
2019-08-23 17:58:50 -07:00
Kubernetes Prow Robot
6789f38199
Merge pull request #80912 from monotek/fluentd-elasticsearch
[fluentd/elasticsearch] updated fluentd to 1.6.3
2019-08-20 23:05:16 -07:00
Zheng Chen
70a7134906
added override for sd testing env in event-exporter yaml 2019-08-20 16:29:15 -04:00
Sandeep Rajan
7980da9f46 bump coredns to 1.5.0 2019-08-20 14:38:23 -04:00
Kubernetes Prow Robot
ec57547034
Merge pull request #80864 from jeefy/owner-updates
Prune OWNERS file
2019-08-19 02:53:30 -07:00
Zihong Zheng
dfe2e1a1ee Bump vertical autoscaler to v0.8.1 2019-08-14 11:26:31 -07:00
André Bauer
8cda6da27d use image in statefulset too
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-08-13 19:45:52 +02:00
draveness
495faa22db feat: cleanup pod critical pod annotations feature 2019-08-09 08:41:23 +08:00
Kubernetes Prow Robot
bdb8e05b97
Merge pull request #80536 from lzang/policy
Upgrade Calico to 3.7.4
2019-08-05 13:35:49 -07:00
André Bauer
bb51318a07 added latest tag
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-08-02 16:34:45 +02:00
André Bauer
596328de41 fixed whitespaces
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-08-02 16:32:31 +02:00
André Bauer
ca9424dd2a updated fluentd to 1.6.3
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-08-02 16:05:15 +02:00
Maciej Borsz
a620b47e13 Added 9.0.2 to CHANGELOG.md 2019-08-01 20:15:46 +02:00
Jeffrey Sica
5bc4deafa0 prune owners 2019-08-01 11:13:03 -04:00
Zang Li
0bc273d646 Upgrade Calico to 3.7.4
Release note: https://docs.projectcalico.org/v3.7/release-notes/
2019-07-30 16:53:25 -07:00
Maciej Borsz
9f71739623
Bump kube-addon-manager's version to v9.0.2 2019-07-26 11:36:37 +02:00
Kubernetes Prow Robot
96594b6723
Merge pull request #80566 from BenTheElder/fix-image-ref
fix kube-proxy manifest
2019-07-25 22:36:36 -07:00
Maciej Borsz
b851a3365a
Fix leader election in kube-addon manager 2019-07-25 14:00:22 +02:00
Benjamin Elder
8d04fa065f fix kube-proxy manifest 2019-07-25 00:41:45 -07:00
Laurent Godet
19c0aa98e1 Fix es 7.x.x initial cluster formation 2019-07-24 16:42:40 +01:00
Tobias Bradtke
ce3e3f0660
Fix link to moved Docker image
See https://github.com/kubernetes/kubernetes/pull/79390
2019-07-22 20:15:08 +02:00
draveness
d83526d253 Revert "feat: cleanup pod critical pod annotations feature"
This reverts commit b6d41ee5cc.
2019-07-18 13:31:12 +08:00
draveness
b6d41ee5cc feat: cleanup pod critical pod annotations feature 2019-07-11 08:54:19 +08:00
André Bauer
146d7c85dc updated fluentd to 1.5.1, es & kibana to 7.1.1
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-07-03 12:26:55 +02:00
Kubernetes Prow Robot
68eb29cba0
Merge pull request #79390 from coffeepac/move-es-fd-to-quay
move images from gcr.io to quay.io
2019-06-28 12:43:32 -07:00
Kubernetes Prow Robot
85aaf7ea36
Merge pull request #79407 from BenTheElder/super-minor-shellcheck
clarify elasticsearch script as bash
2019-06-27 18:53:45 -07:00
Kubernetes Prow Robot
ed9f340add
Merge pull request #79305 from paivagustavo/clean-up-self-set-node-labels
Clean up self-set node labels
2019-06-27 11:37:21 -07:00
Pat Christopher
8c819a2dc1 add default latest tag as well 2019-06-26 12:11:00 -07:00
Benjamin Elder
1a28fbde4d clarify elasticsearch script as bash 2019-06-25 22:19:45 -07:00
Pat Christopher
a1cda614dc move images from gcr.io to quay.io 2019-06-25 14:37:02 -07:00
Kubernetes Prow Robot
f4a62ad660
Merge pull request #78868 from chardch/update-plugin-vulkan
Update gpu device plugin to better support Vulkan workloads
2019-06-25 09:32:39 -07:00
Kubernetes Prow Robot
8c3b7d7679
Merge pull request #76955 from ialidzhikov/readiness-probe
Add probes for Kibana
2019-06-23 16:23:53 -07:00
Kubernetes Prow Robot
eee3e976d8
Merge pull request #78294 from vllry/kp-remove-resource-container
Remove deprecated flag --resource-container from kube-proxy
2019-06-22 00:38:12 -07:00
Kubernetes Prow Robot
4e0b76469f
Merge pull request #75638 from ramnar/bugfix_24thMarch
Bug fix 72757.Removed deprecated label kubernetes.io/cluster-service
2019-06-20 06:54:49 -07:00
Gustavo Paiva
ca3519c7ad Clean up self-set node labels 2019-06-20 00:07:31 -03:00
Vinay Bannai
e7b7c8bc10 The default-http-backend for handling 404 pages will now point to 404 handler
with prometheus integration and provides metrics related to requests per second
and the duration of responding to the requests for various percentile
groupings. Please check
https://github.com/kubernetes/ingress-gce/blob/master/cmd/404-server-with-metrics/README.md
for details about the 404-server-with-metrics.
2019-06-17 11:24:47 -07:00
Kubernetes Prow Robot
e91556c62f
Merge pull request #72452 from SuperQ/patch-1
Adjust node_exporter CPU params
2019-06-17 02:40:11 -07:00
Vallery Lancey
dc0f14312e Removed deprecated --resource-container flag from kube-proxy. 2019-06-16 08:36:42 -07:00
Kubernetes Prow Robot
a03bc34313
Merge pull request #78672 from msau42/default-resize
Enable resize in default gce storageclass
2019-06-14 13:26:48 -07:00
Kubernetes Prow Robot
7abf2832bf
Merge pull request #78614 from liggitt/remove-beta-e2e-use
Move test/e2e use to v1 APIs
2019-06-14 07:58:38 -07:00
Sandeep Rajan
5e265e046c add upstream to CoreDNS ConfigMap 2019-06-11 18:38:46 -04:00
Richard Chen
687291c0bd Update gpu device plugin to better support Vulkan workloads 2019-06-10 11:15:54 -07:00
Sandeep Rajan
bfb809f3c6 revert coredns to 1.3.1 2019-06-04 08:53:02 -04:00
Michelle Au
95ec53e40d Enable resize in default gce storageclass
Change-Id: I0eda852543264cc1fdecc113c12dd8e797e6d362
2019-06-03 18:06:51 -07:00
Jordan Liggitt
8229af31d2 Move test/e2e use to v1 APIs 2019-06-03 14:46:08 -04:00
Kubernetes Prow Robot
0216ccf80a
Merge pull request #78546 from prameshj/nodelocal-1_15_3
Use node-cache image 1.15.3 in the yaml
2019-06-01 23:40:14 -07:00
Kubernetes Prow Robot
0f78b57fef
Merge pull request #77887 from prameshj/nodelocal-beta
Doc changes for nodelocaldns graduating to beta
2019-05-31 20:44:47 -07:00
Pavithra Ramesh
934f35f9b2 Use nodecache image 1.15.3 2019-05-30 13:29:05 -07:00
Kubernetes Prow Robot
c4a2042177
Merge pull request #78449 from santinoncs/add_annotations_for_prometheus
Add annotations for Prometheus service discovery
2019-05-30 02:24:59 -07:00
Kubernetes Prow Robot
59f0f2d2f9
Merge pull request #78417 from prameshj/nodelocal-cm
Lock down nodelocaldns configmap.
2019-05-30 00:33:11 -07:00
Santiago Nuñez-Cacho
fe53ed8ca9 metrics is default value. Not necessary. 2019-05-28 23:45:34 +02:00
Santiago Nuñez-Cacho
8603800f65 Add annotations for Prometheus service discovery 2019-05-28 18:32:37 +02:00
Pavithra Ramesh
86d12be975 Lock down nodelocaldns configmap. 2019-05-27 23:53:48 -07:00
Beata Skiba
cd6cc65236 Addon resizer version 1.8.5
Rebases addon-resizer to distroless
2019-05-27 15:12:29 +02:00
Sandeep Rajan
0b28419412 bump coredns version to 1.5.0 and update manifest 2019-05-17 10:03:02 -04:00
Kubernetes Prow Robot
3e8d49d46b
Merge pull request #77950 from yuwenma/bump-metrics-server
Bump metrics-server to v0.3.3
2019-05-17 05:38:32 -07:00
Kubernetes Prow Robot
af692da080
Merge pull request #77844 from grayluck/one-more-ip
Add 198.51.100.0/24 to non-masq ranges.
2019-05-17 05:38:19 -07:00
Kubernetes Prow Robot
f8d2b6b982
Merge pull request #77918 from mborsz/coredns
Make dns memory limit configurable
2019-05-16 08:49:08 -07:00
yankaiz
14015d9ce1 Add 198.51.100.0/24 to non-masq ranges.
Grouped the IP ranges by RFC and type.

Change reference for 198.18.0.0/15 from RFC 2544 to RFC 6815.
2019-05-15 16:23:41 -07:00
Yuwen Ma
454460f875 Bump metrics-server to v0.3.3 2019-05-15 11:44:45 -07:00
Maciej Borsz
59af63c687 Make coredns memory limit configurable 2019-05-15 13:35:28 +02:00
Pavithra Ramesh
e1748407a5 Doc changes for nodelocaldns graduating to beta 2019-05-14 14:01:33 -07:00
Kubernetes Prow Robot
d6c8edd391
Merge pull request #77690 from MrHohn/CPVA-0.7.0
Bump cluster-proportional-vertical-autoscaler to 0.7.1
2019-05-14 07:17:21 -07:00
Zihong Zheng
66086c32cf Bump cluster-proportional-vertical-autoscaler to 0.7.1 2019-05-13 13:22:27 -07:00
Marian Lobur
60e5717f4f Bump image of event-exporter.
Image has a new base image that includes some security fixes.
2019-05-13 16:27:25 +02:00
Kubernetes Prow Robot
5d9d5bca79
Merge pull request #77765 from coffeepac/es-6.7.2
upgrade elasticsearch for vuln handling
2019-05-11 17:20:10 -07:00
Kubernetes Prow Robot
b6c53beed5
Merge pull request #72667 from jeefy/update-dashboard-owners
Update OWNERS so it isn't single threaded.
2019-05-11 00:46:12 -07:00
Kubernetes Prow Robot
5669014f52
Merge pull request #76854 from ialidzhikov/update-images
Update gem versions
2019-05-10 19:28:24 -07:00
Patrick Christopher
65fcbf4afb upgrade elasticsearch for vuln handling 2019-05-10 16:57:17 -07:00
Zihong Zheng
beba9921aa Bump cluster-proportional-autoscaler to 1.6.0 2019-05-09 11:25:12 -07:00
ialidzhikov
7082ed4330 Add readiness probe for Kibana
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2019-05-09 10:23:02 +03:00
Kubernetes Prow Robot
bec5345cc6
Merge pull request #77569 from yuwenma/patch-5
Bump metrics-server version to v0.3.3
2019-05-08 20:13:13 -07:00
Kubernetes Prow Robot
df117c7625
Merge pull request #73637 from ramnar/branch_bug_fix
Removes deprecated label kubernetes.io/cluster-service in yaml files of kubernetes add-ons. Bug fix #72757
2019-05-08 14:31:51 -07:00
Kubernetes Prow Robot
b34d7ac0ce
Merge pull request #77458 from grayluck/agent-v2.3.0
Bump ip-masq-agent version to v2.3.0. Enable nomasq for reserved IPs.
2019-05-07 17:52:58 -07:00
Yuwen Ma
7f629b6921
Bump metrics-server version to v0.3.3 2019-05-07 17:44:23 -07:00
Kubernetes Prow Robot
dca61deaf9
Merge pull request #77029 from StevenACoffman/patch-1
Update k8s.gcr.io/k8s-dns-node-cache image version
2019-05-07 14:31:02 -07:00
yankaiz
1059a71973 Bump ip-masq-agent version to v2.3.0. Enable nomasq for reserved IPs.
Added the non-masq ranges to configure-helper.sh so that GCE clusters
will have the non-masq IP ranges aligned with GKE clusters.
2019-05-06 22:32:34 -07:00
Kubernetes Prow Robot
8b0c36d620
Merge pull request #77328 from varunmar/ip-masq-cve-fix
Bump the version of the ip-masq-agent addon to pick up CVE fixes
2019-05-03 18:26:28 -07:00
Kubernetes Prow Robot
dbad8f360c
Merge pull request #77357 from dekkagaijin/md-proxy-bump
Bump metadata-proxy image to v0.1.12
2019-05-03 15:11:52 -07:00
Kubernetes Prow Robot
0b10d1b830
Merge pull request #77140 from dekkagaijin/glbc
use static token to authenticate glbc
2019-05-02 16:22:30 -07:00
Jake Sanders
0b6eb2bf89
Bump metadata-proxy image to v0.1.12
Rebases the image on `gcr.io/distroless/static:latest` per kubernetes/enhancements#900

https://github.com/GoogleCloudPlatform/k8s-metadata-proxy/releases/tag/v0.1.12
2019-05-02 11:57:52 -07:00
Kubernetes Prow Robot
d2ce69d9ad
Merge pull request #76762 from serathius/fluentd-gcp-scaler-0-5-2
Pick up security patches for fluentd-gcp-scaler by upgrading to version 0.5.2
2019-05-02 07:00:26 -07:00
Jake Sanders
8bd0b45eae use static token to authenticate glbc 2019-05-01 22:24:48 -07:00
Kubernetes Prow Robot
206eb91c15
Merge pull request #77035 from chardch/fix-device-plugin
Fix a bug in the gpu device plugin
2019-05-01 22:10:17 -07:00
Varun Marupadi
d4443fef81 Bump the version of the ip-masq-agent addon to pick up CVE fixes
This is related to the same CVE fixes in PR #75845

The CVEs are in the dependencies of ip-masq-agent -
debian-base bump at: https://github.com/kubernetes-incubator/ip-masq-agent/pull/31
debian-iptables-amd64 bump at: https://github.com/kubernetes-incubator/ip-masq-agent/pull/30
2019-05-01 18:26:27 -07:00
Steve Coffman
7f30be79b3 Update k8s-dns-node-cache image version
This revised image resolves kubernetes dns#292 by updating the image from `k8s-dns-node-cache:1.15.2` to `k8s-dns-node-cache:1.15.2`
2019-05-01 13:38:42 -04:00
ialidzhikov
becbed87f1 Update gem versions
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2019-05-01 20:13:31 +03:00
Kubernetes Prow Robot
31d36d865c
Merge pull request #77172 from ialidzhikov/remove-cluster-service-label
Clean up cluster-service label from fluentd-elastic
2019-04-30 09:21:41 -07:00
Kubernetes Prow Robot
4ebe11a6cb
Merge pull request #76110 from DirectXMan12/infra/prune-owners
Prune directxman12 from metrics/autoscaling OWNERS
2019-04-29 14:35:36 -07:00
ialidzhikov
5fc1bcba3f Clean up cluster-service label from fluentd-elastic
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2019-04-27 22:52:01 +03:00
Joel Smith
f55ebc6129 Fix link for resource metrics pipeline
See https://github.com/kubernetes/website/pull/12934
2019-04-24 22:38:48 -06:00
Richard Chen
2c681334c2 Fix a bug in the gpu device plugin where not all devices were registered.
Refer to https://github.com/GoogleCloudPlatform/container-engine-accelerators/pull/110
2019-04-24 18:02:00 -07:00
Kubernetes Prow Robot
888b81b638
Merge pull request #76238 from Dieken/30s-ttl-for-coredns
change default 5s ttl to 30s for coredns to be same with kube-dns/dnsmasq
2019-04-23 06:20:09 -07:00
Kubernetes Prow Robot
a961c13db5
Merge pull request #76640 from george-angel/master
update k8s.gcr.io/k8s-dns-node-cache image version
2019-04-22 14:38:02 -07:00
Marek Siarkowicz
2fc8ac9e81 [metrics-server addon] Restore metrics-server using of IP addresses
This preference list is used to pick the preferred field from the k8s
node object. It was introduced in metrics-server 0.3 and changed default
behaviour to use DNS instead of IP addresses. It was merged into k8s
1.12 and caused breaking change by introducing dependency on DNS
configuration.
2019-04-19 11:43:17 +02:00
Marek Siarkowicz
37381eb384 Pick up security patches for fluentd-gcp-scaler by upgrading to version 0.5.2 2019-04-18 11:52:53 +02:00
George Angel
f40f767d94 update k8s.gcr.io/k8s-dns-node-cache image version
v1.15.0 is affected by https://github.com/kubernetes/dns/issues/282
2019-04-16 09:43:53 +01:00
Kubernetes Prow Robot
dda0e75d36
Merge pull request #76404 from MrHohn/addon-manager-9.1
Update addon-manager to use debian-base:v1.0.0
2019-04-15 18:05:36 -07:00
Kubernetes Prow Robot
8a636a3151
Merge pull request #76467 from MrHohn/addon-manager-owner
Add approver and label to addon-manager
2019-04-15 14:25:06 -07:00
Kubernetes Prow Robot
b4c77eff33
Merge pull request #76427 from hprateek43/Fix-#75567
Fix for #75567
2019-04-15 11:46:39 -07:00
Zihong Zheng
2d635bc29d Add approver and label to addon-manager 2019-04-12 13:04:43 -07:00
Zihong Zheng
9f8d9ba847 Update addon-manager to use debian-base:v1.0.0 2019-04-11 10:18:33 -07:00
Brett Elliott
da4a8aa5ce Bump metrics server to v0.3.2 2019-04-11 13:27:14 +02:00
Harsh Singh
47275cb6cd Fix for #75567 2019-04-11 13:18:47 +05:30
yue9944882
b5e3acc5c0 remove internal client references in cluster/* 2019-04-09 21:43:54 +08:00
Yubao Liu
f7f51fab2a change default 5s ttl to 30s for coredns to be same with kube-dns/dnsmasq 2019-04-07 20:41:25 +08:00
Kubernetes Prow Robot
3e954d3bd3
Merge pull request #76211 from wangzhen127/npd063
Use Node-Problem-Detector v0.6.3 on GCI
2019-04-05 14:34:17 -07:00
Kubernetes Prow Robot
63ae37304b
Merge pull request #75967 from ialidzhikov/fluentd-1.4.1
Update fluentd 1.4.1
2019-04-05 11:51:58 -07:00
Zhen Wang
953677d7a5 Use Node-Problem-Detector v0.6.3 on GCI 2019-04-05 11:08:24 -07:00
Solly Ross
837976cb59 Prune directxman12 from metrics/autoscaling OWNERS
Since I'm not really working on metrics or autoscaling stuff any more, I
figured it was time to remove myself from the approvers list.
2019-04-03 16:24:51 -07:00
Michelle Au
d2aa8178f2 Remove alpha CRD install 2019-04-02 10:59:11 -07:00
ialidzhikov
ebfb92bdce Update fluentd 1.4.1
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2019-04-01 18:24:47 +03:00
Kubernetes Prow Robot
da018a6bfa
Merge pull request #75362 from serathius/gcp-security-patches
Update gcp images with security patches
2019-03-28 14:25:58 -07:00
Kubernetes Prow Robot
753a6edc37
Merge pull request #74616 from ialidzhikov/fluentd-1.4.0
Update fluentd to 1.4.0
2019-03-26 12:33:48 -07:00
Kubernetes Prow Robot
81d37386aa
Merge pull request #75168 from ialidzhikov/update-golang-version
Update golang to 1.12.0
2019-03-26 11:17:20 -07:00
ramnar
0ec6eb6177 Bug fix 72757.Removed deprecated label kubernetes.io/cluster-service 2019-03-24 09:41:47 +05:30
ialidzhikov
db6add318a Update fluentd to 1.4.0
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2019-03-20 08:48:26 +02:00
Marek Siarkowicz
9e9b906047 Update gcp images with security patches
[stackdriver addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.
[fluentd-gcp addon] Bump fluentd-gcp-scaler to v0.5.1 to pick up security fixes.
[fluentd-gcp addon] Bump event-exporter to v0.2.4 to pick up security fixes.
[fluentd-gcp addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.
[metadata-proxy addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.
2019-03-15 09:24:32 +01:00
Kubernetes Prow Robot
d778b9308a
Merge pull request #75063 from wangzhen127/npd-test-fix
Fix NPD e2e test on Ubuntu node and update NPD container version
2019-03-08 14:19:09 -08:00
ialidzhikov
c72115dede Update golang to 1.12.0
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2019-03-08 11:45:46 +02:00
Tim Allclair
63f61a6714 Migrate RuntimeClass to internal API 2019-03-07 11:07:54 -08:00
Zhen Wang
f4d9e7d992 Fix NPD e2e test on Ubuntu node and update NPD container version 2019-03-06 22:42:47 -08:00
Kubernetes Prow Robot
45e5f6053b
Merge pull request #74424 from liggitt/drop-k8s-io-node-labels
Clean up self-set node labels
2019-03-06 08:24:26 -08:00
Kubernetes Prow Robot
95cd1d59e4
Merge pull request #74209 from monotek/fluentd-helm-readme
added production note about EFK stack to the readme
2019-03-04 17:55:12 -08:00
Kubernetes Prow Robot
ccf33be0cc
Merge pull request #73940 from jiayingz/nvidia-dp-update
Update nvidia-gpu-device-plugin addon.
2019-02-27 17:13:01 -08:00
Kubernetes Prow Robot
1942c1ccb0
Merge pull request #71251 from monotek/kibana
updated kibana to 6.6.1
2019-02-26 23:40:33 -08:00
Kubernetes Prow Robot
7a4496429d
Merge pull request #71252 from monotek/elasticsearch
updated elasticsearch to 6.6.1
2019-02-26 09:33:44 -08:00
Jordan Liggitt
0174e043c5 Prepare switch from beta.kubernetes.io/masq-agent-ds-ready to node.kubernetes.io/masq-agent-ds-ready 2019-02-26 11:43:10 -05:00
Jordan Liggitt
943b32a289 Prepare switch from beta.kubernetes.io/kube-proxy-ds-ready to node.kubernetes.io/kube-proxy-ds-ready 2019-02-26 11:42:23 -05:00
Jordan Liggitt
d6664a2365 Prepare switch from beta.kubernetes.io/metadata-proxy-ready to cloud.google.com/metadata-proxy-ready 2019-02-26 11:42:23 -05:00
Jordan Liggitt
8975233788 Finish migration of fluentd to daemonset 2019-02-26 11:42:23 -05:00
André Bauer
9e2d9cfbb0 changed es image repo
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-02-26 09:09:21 +01:00
Florent Delannoy
e627474e8f Fix fluentd-gcp addon liveness probe
Fix three issues with the fluentd-gcp liveness probe:

h1. STUCK_THRESHOLD_SECONDS was overridden by LIVENESS_THRESHOLD_SECONDS
if defined

Probably a copy/paste issue introduced in edf1ffc074

h1. `[[` is [a bashism](https://stackoverflow.com/a/47576482), and will always fail when called with `/bin/sh`

Introduced by a844523c20

Given that we call the liveness probe with `/bin/sh`, we cannot use the
double-bracketed `[[` syntax for test, as it is not POSIX-compliant and
will throw an error.

Annoyingly, even though it prints an error, `sh` returns with exit code 0
in this case:

```bash
root@fluentd-7mprs:/# sh liveness.sh
liveness.sh: 8: liveness.sh: [[: not found
liveness.sh: 15: liveness.sh: [[: not found
root@fluentd-7mprs:/# echo $?
0
```

Which means the liveness probe is considered successful by Kubernetes,
despite failing to test things as it was intended. This is also
probably the reason why this bug wasn't reported sooner :)

Thankfully, the test in this case can just as easily be written as
POSIX-compliant as it doesn't use any bash-specific features within the
`[[` block.
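
As a minimal sketch of that rewrite (this is not the actual contents of
`liveness.sh`; the helper name `describe_output` is hypothetical and only
there for illustration), an emptiness check that might otherwise be written
as `[[ -z "$x" ]]` can use the single-bracket POSIX `[` test, which `/bin/sh`
understands:

```shell
#!/bin/sh
# Hypothetical helper (not taken from liveness.sh): reports whether a
# captured command output was empty, using only the POSIX `[` test.
describe_output() {
  if [ -z "$1" ]; then
    echo "empty"
  else
    echo "non-empty"
  fi
}

# Illustrative calls with made-up arguments.
describe_output ""
describe_output "buffer.log"
```

Quoting `"$1"` matters with `[`, since unlike `[[` it is an ordinary command
and its arguments are subject to word splitting.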

h1. Buffers are transient and cannot be relied upon for monitoring

Finally, after fixing the above issue, we started seeing the fluentd
containers being restarted very often, and found an issue with the
underlying logic of the liveness probe.

The probe checks that the pod is still alive by running the following
command:

`find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit`

This checks if any _regular_ file exists under `/var/log/fluentd-buffers`
that is more recent than a predetermined time, and will return an empty
string otherwise.

The issue is that these buffers are temporary and volatile: they get created and
deleted constantly. Here is an example of running that check every second on a
running fluentd:

```
root@fluentd-eks-playground-jdc8m:/# LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
root@fluentd-eks-playground-jdc8m:/# STUCK_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-900};
root@fluentd-eks-playground-jdc8m:/# touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
root@fluentd-eks-playground-jdc8m:/# touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:52:57 UTC 2019
Fri Feb 22 10:52:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:52:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:53:00 UTC 2019
Fri Feb 22 10:53:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:03 UTC 2019
Fri Feb 22 10:53:04 UTC 2019
Fri Feb 22 10:53:05 UTC 2019
Fri Feb 22 10:53:06 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:07 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:08 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:09 UTC 2019
Fri Feb 22 10:53:10 UTC 2019
Fri Feb 22 10:53:11 UTC 2019
Fri Feb 22 10:53:12 UTC 2019
Fri Feb 22 10:53:13 UTC 2019
Fri Feb 22 10:53:14 UTC 2019
Fri Feb 22 10:53:15 UTC 2019
Fri Feb 22 10:53:16 UTC 2019
```

We can see buffers being created, then disappearing. The LivenessProbe running
under these conditions has a ~50% chance of failing, despite fluentd being
perfectly happy.

I believe that check is probably ok for fluentd installs using large
amounts of buffers, in which case the liveness probe will be correct more
often than not, but fluentd installs that use buffering less intensively
will be negatively impacted by this.

My solution to fix this is to check the last updated time of buffering
_folders_ within `/var/log/fluentd-buffers`. These _do_ get updated when
buffers are created, and do not get deleted as buffers are emptied,
making them the perfect candidate for our use.

Here's an example with `-type d` for directories:
```
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:57:51 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:52 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:53 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:54 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:55 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:56 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:57 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:00 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:03 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
```
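
The directory-based check can be sketched as a small POSIX function. This is
an illustration under assumptions, not the probe shipped in the addon: the
function name `check_buffer_dirs`, the "alive"/"stuck" strings, and the
throwaway demo directory are all hypothetical, and it relies on `find -quit`
and GNU `touch -d` just as the commands above do.

```shell
#!/bin/sh
# Hypothetical helper: prints "alive" if any directory under $1 was
# modified more recently than the marker file $2, otherwise "stuck".
check_buffer_dirs() {
  if [ -n "$(find "$1" -type d -newer "$2" -print -quit)" ]; then
    echo "alive"
  else
    echo "stuck"
  fi
}

# Demo on a throwaway directory rather than /var/log/fluentd-buffers.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/buffers/kubernetes.system.buffer"
# GNU touch syntax: date the marker 900 seconds in the past.
touch -d "900 seconds ago" "$demo_dir/marker-stuck"
check_buffer_dirs "$demo_dir/buffers" "$demo_dir/marker-stuck"
rm -rf "$demo_dir"
```

Because a directory's modification time changes whenever an entry is created
or removed inside it, the check keeps passing while buffers come and go, even
at moments when no buffer file happens to exist.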

An example of the directory being updated as new buffers come in:
```
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root  6 Feb 22 11:17 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 16K
drwxr-xr-x 2 root root  224 Feb 22 11:18 .
drwxr-xr-x 3 root root   38 Feb 22 11:14 ..
-rw-r--r-- 1 root root 1.8K Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log
-rw-r--r-- 1 root root  215 Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log.meta
-rw-r--r-- 1 root root  429 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log
-rw-r--r-- 1 root root  195 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log.meta
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root  6 Feb 22 11:18 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
```
2019-02-25 11:48:31 +00:00
André Bauer
2bd6d3dc12 use image version 6.6.1
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-02-25 11:05:52 +01:00
André Bauer
2d15ffc9cc updated to 6.5.2
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-02-25 10:56:50 +01:00
André Bauer
0c29ea1a2e Update es-statefulset.yaml 2019-02-25 10:55:23 +01:00
André Bauer
53a936c359 Update Makefile 2019-02-25 10:55:23 +01:00
André Bauer
0e44fa6359 updated elasticsearch to 6.5.0 2019-02-25 10:55:23 +01:00
André Bauer
fc850b5ecd fixed wording
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-02-25 10:49:43 +01:00
André Bauer
421fcd8262 added production note to readme
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-02-25 10:47:26 +01:00
Xiang Dai
36065c6dd7 delete all duplicate empty blanks
Signed-off-by: Xiang Dai <764524258@qq.com>
2019-02-23 10:28:04 +08:00
Kubernetes Prow Robot
743f864310
Merge pull request #73819 from coffeepac/move-fluentd-es-images
Move fluentd es images
2019-02-22 17:58:12 -08:00
Patrick Christopher
1bd45ba6eb review updates 2019-02-22 10:00:10 -08:00