8124 Commits

Author SHA1 Message Date
John Schnake
03d0e86117 Add support for dryRun option to kube-conformance image
A common issue users run into is wanting a list of the tests
a certain regexp will run, without actually running them.

ginkgo supports this with the dryRun flag but it was not
exposed via the kube-conformance image. This change
will set the flag if the E2E_DRYRUN environment variable
is set.
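
The behaviour described above could be sketched roughly as follows. This is only an illustration, not the image's actual run script: the function name `ginkgo_dryrun_args` is hypothetical, and treating "set" as "set to a non-empty value" is an assumption.

```sh
# Hypothetical helper: emit the ginkgo dry-run flag only when the
# E2E_DRYRUN environment variable is set to a non-empty value.
ginkgo_dryrun_args() {
  if [ -n "${E2E_DRYRUN:-}" ]; then
    printf '%s\n' "--dryRun=true"
  fi
}
```

A wrapper script would append the helper's output to the rest of its ginkgo arguments, so an unset variable leaves the command line unchanged.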

Fixes #74727
2019-02-28 09:21:04 -06:00
Kubernetes Prow Robot
02b8056efb
Merge pull request #73288 from wangzhen127/npd-config
Decouple node-problem-detector release from kubernetes
2019-02-28 00:27:25 -08:00
Kubernetes Prow Robot
ccf33be0cc
Merge pull request #73940 from jiayingz/nvidia-dp-update
Update nvidia-gpu-device-plugin addon.
2019-02-27 17:13:01 -08:00
Kubernetes Prow Robot
b2e9b2a842
Merge pull request #74608 from dims/lighter-weight-make-for-conformance-dependencies
Lighter weight make for conformance dependencies, better script and manifest
2019-02-27 07:19:55 -08:00
Davanum Srinivas
23b33f5c85
Switch to k8s.gcr.io (from staging)
Change-Id: Ib0d6f681be3537f0cbfcae1dc254f1c42a04be66
2019-02-27 08:54:45 -05:00
Davanum Srinivas
87d9903aaf
Add a script to run the conformance image and pull results
Change-Id: I1eb673fe37b5e8a719f9095473765c80fb7f2347
2019-02-27 07:38:43 -05:00
Kubernetes Prow Robot
1942c1ccb0
Merge pull request #71251 from monotek/kibana
updated kibana to 6.6.1
2019-02-26 23:40:33 -08:00
Zhen Wang
efa96f7eb8 allows configuring NPD release and flags on GCI and add cluster e2e test 2019-02-26 21:21:54 -08:00
Kubernetes Prow Robot
81ec358db4
Merge pull request #74438 from pjh/gce-windows-log-dump
Support dumping logs from Windows test nodes on GCE
2019-02-26 18:12:09 -08:00
Peter Hornyack
0bb25290c8 Update log-dump.sh for Windows nodes.
Tested:
```
$ PROJECT=${CLOUDSDK_CORE_PROJECT} KUBERNETES_SKIP_CONFIRM=y NUM_NODES=2 \
  NUM_WINDOWS_NODES=2 KUBE_GCE_ENABLE_IP_ALIASES=true go run \
  ./hack/e2e.go -- --up
$ cluster/log-dump/log-dump.sh
$ ls _artifacts
```

And with: NUM_NODES=2 NUM_WINDOWS_NODES=0; NUM_NODES=0 NUM_WINDOWS_NODES=2
2019-02-26 12:10:19 -08:00
Yu-Ju Hong
093e5a50ac GCE/Windows: create a C:\tmp directory
This is required for running host path tests.
2019-02-26 10:59:06 -08:00
Kubernetes Prow Robot
7a4496429d
Merge pull request #71252 from monotek/elasticsearch
updated elasticsearch to 6.6.1
2019-02-26 09:33:44 -08:00
Davanum Srinivas
94ad1dfb11
Better manifest for running conformance image
Change-Id: I137180ed781edd4a9877cabe039e40a72aa71366
2019-02-26 10:29:12 -05:00
Kubernetes Prow Robot
b8ddc7945b
Merge pull request #74522 from Pluies/master
Fix fluentd-gcp addon liveness probe
2019-02-26 06:38:24 -08:00
Davanum Srinivas
069eeb541b
Simpler make commands for ginkgo/kubectl/e2e.test
Change-Id: I78cff10231eabd53b1fc7bdd1526c861179e135a
2019-02-26 09:18:05 -05:00
André Bauer
9e2d9cfbb0 changed es image repo
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-02-26 09:09:21 +01:00
Kubernetes Prow Robot
3fb6e77770
Merge pull request #74549 from yujuhong/pause-image
GCE: switch to using e2eteam/pause:3.1 for pause containers
2019-02-25 23:12:00 -08:00
Kubernetes Prow Robot
0ff7e463ee
Merge pull request #73746 from mrbobbytables/kubemark-shellcheck
Fix shellcheck lint errors in Kubemark scripts
2019-02-25 17:25:13 -08:00
Kubernetes Prow Robot
3814176d42
Merge pull request #74455 from SataQiu/fix-shell-2019022302
Fix shellcheck lint errors in cluster and hack scripts
2019-02-25 15:15:19 -08:00
Yu-Ju Hong
b863655faa GCE: switch to using e2eteam/pause:3.1 for pause containers
Stop building pause images on node startup.
2019-02-25 14:36:49 -08:00
Kubernetes Prow Robot
2aacb77374
Merge pull request #74444 from pjh/gce-windows-no-defender
Disable Windows Defender on Windows nodes.
2019-02-25 13:54:42 -08:00
Bob Killen
9a4f4878f5
Fix shellcheck lint errors in cluster/kubemark/util.sh 2019-02-25 15:21:29 -05:00
Bob Killen
9a58913e8f
Fix shellcheck lint errors in cluster/kubemark/iks/config-default.sh 2019-02-25 15:21:25 -05:00
Bob Killen
ce4c85e3fd
Fix shellcheck lint errors in cluster/kubemark/gce/config-default.sh 2019-02-25 14:55:01 -05:00
Kubernetes Prow Robot
35a258d640
Merge pull request #73272 from danielqsj/juju
fix shellcheck in cluster/juju
2019-02-25 11:33:21 -08:00
Kubernetes Prow Robot
f288678cfa
Merge pull request #73261 from danielqsj/local
fix shellcheck in cluster/local
2019-02-25 11:33:11 -08:00
Florent Delannoy
e627474e8f Fix fluentd-gcp addon liveness probe
Fix three issues with the fluentd-gcp liveness probe:

h1. STUCK_THRESHOLD_SECONDS was overridden by LIVENESS_THRESHOLD_SECONDS
if defined

Probably a copy/paste issue introduced in edf1ffc074
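
The override can be reproduced with plain parameter expansion. This is a minimal sketch, assuming the 300 and 900 second defaults that appear in the shell transcript later in this message; the two function names are hypothetical, and the "fixed" form is the presumed intent rather than a quote from the patch.

```sh
# Buggy form: the stuck threshold falls back from the *liveness* variable,
# so any value of LIVENESS_THRESHOLD_SECONDS replaces the 900s default.
stuck_threshold_buggy() {
  printf '%s\n' "${LIVENESS_THRESHOLD_SECONDS:-900}"
}

# Presumed fix: the stuck threshold falls back from its own variable.
stuck_threshold_fixed() {
  printf '%s\n' "${STUCK_THRESHOLD_SECONDS:-900}"
}

LIVENESS_THRESHOLD_SECONDS=300
unset STUCK_THRESHOLD_SECONDS
stuck_threshold_buggy   # prints 300
stuck_threshold_fixed   # prints 900
```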

h1. `[[` is [a bashism](https://stackoverflow.com/a/47576482), and will always fail when called with `/bin/sh`

Introduced by a844523c20

Given that we call the liveness probe with `/bin/sh`, we cannot use the
double-bracketed `[[` syntax for test, as it is not POSIX-compliant and
will throw an error.

Annoyingly, even though it prints an error, `sh` returns with exit code 0
in this case:

```bash
root@fluentd-7mprs:/# sh liveness.sh
liveness.sh: 8: liveness.sh: [[: not found
liveness.sh: 15: liveness.sh: [[: not found
root@fluentd-7mprs:/# echo $?
0
```

Which means the liveness probe is considered successful by Kubernetes,
despite failing to test things as it was intended. This is also
probably the reason why this bug wasn't reported sooner :)

Thankfully, the test in this case can just as easily be written in a
POSIX-compliant way, as it doesn't use any bash-specific features within the
`[[` block.
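
A minimal sketch of that kind of rewrite, assuming the original condition only tested whether some command output was empty. The function name `has_recent_buffer` and its two arguments are hypothetical and are not taken from the addon's `liveness.sh`.

```sh
# Return 0 when at least one regular file under $1 is newer than the
# marker file $2, using the POSIX `[` test instead of bash's `[[`.
has_recent_buffer() {
  dir=$1
  marker=$2
  # -quit is not POSIX (it is a find extension); it stops at the first match.
  if [ -z "$(find "$dir" -type f -newer "$marker" -print -quit)" ]; then
    return 1
  fi
  return 0
}
```

Because `[` is an ordinary command, a script written this way behaves the same under `/bin/sh` and under bash.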

h1. Buffers are transient and cannot be relied upon for monitoring

Finally, after fixing the above issue, we started seeing the fluentd
containers being restarted very often, and found an issue with the
underlying logic of the liveness probe.

The probe checks that the pod is still alive by running the following
command:

`find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit`

This checks if any _regular_ file exists under `/var/log/fluentd-buffers`
that is more recent than a predetermined time, and will return an empty
string otherwise.

The issue is that these buffers are temporary and volatile: they get created and
deleted constantly. Here is an example of running that check every second on a
running fluentd:

```
root@fluentd-eks-playground-jdc8m:/# LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
root@fluentd-eks-playground-jdc8m:/# STUCK_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-900};
root@fluentd-eks-playground-jdc8m:/# touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
root@fluentd-eks-playground-jdc8m:/# touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:52:57 UTC 2019
Fri Feb 22 10:52:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:52:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:53:00 UTC 2019
Fri Feb 22 10:53:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:03 UTC 2019
Fri Feb 22 10:53:04 UTC 2019
Fri Feb 22 10:53:05 UTC 2019
Fri Feb 22 10:53:06 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:07 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:08 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:09 UTC 2019
Fri Feb 22 10:53:10 UTC 2019
Fri Feb 22 10:53:11 UTC 2019
Fri Feb 22 10:53:12 UTC 2019
Fri Feb 22 10:53:13 UTC 2019
Fri Feb 22 10:53:14 UTC 2019
Fri Feb 22 10:53:15 UTC 2019
Fri Feb 22 10:53:16 UTC 2019
```

We can see buffers being created, then disappearing. The LivenessProbe running
under these conditions has a ~50% chance of failing, despite fluentd being
perfectly happy.

I believe that check is probably ok for fluentd installs using large
amounts of buffers, in which case the liveness probe will be correct more
often than not, but fluentd installs that use buffering less intensively
will be negatively impacted by this.

My solution to fix this is to check the last updated time of buffering
_folders_ within `/var/log/fluentd-buffers`. These _do_ get updated when
buffers are created, and do not get deleted as buffers are emptied,
making them the perfect candidate for our use.

Here's an example with `-type d` for directories:
```
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:57:51 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:52 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:53 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:54 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:55 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:56 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:57 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:00 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:03 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
```

An example of the directory being updated as new buffers come in:
```
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root  6 Feb 22 11:17 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 16K
drwxr-xr-x 2 root root  224 Feb 22 11:18 .
drwxr-xr-x 3 root root   38 Feb 22 11:14 ..
-rw-r--r-- 1 root root 1.8K Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log
-rw-r--r-- 1 root root  215 Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log.meta
-rw-r--r-- 1 root root  429 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log
-rw-r--r-- 1 root root  195 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log.meta
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root  6 Feb 22 11:18 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
```
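
The directory-based check described above can be tried in isolation with a scratch directory. This is a sketch under stated assumptions rather than the probe itself: `buffer_dir_is_fresh` is a hypothetical name, the paths are illustrative, and `mktemp -d` is assumed to be available.

```sh
# Succeed when some directory under $1 was modified more recently than the
# marker file $2. Creating or deleting an entry updates the parent
# directory's mtime, so the signal survives after buffer files are removed.
buffer_dir_is_fresh() {
  # -quit is not POSIX; it is kept because the commands above use it.
  [ -n "$(find "$1" -type d -newer "$2" -print -quit)" ]
}

# Demonstration: create and delete a buffer file, then run the check.
scratch=$(mktemp -d)
mkdir -p "$scratch/buffers/kubernetes.system.buffer"
touch -t 200001010000 "$scratch/marker"   # marker dated far in the past
touch "$scratch/buffers/kubernetes.system.buffer/buffer.log"
rm "$scratch/buffers/kubernetes.system.buffer/buffer.log"
if buffer_dir_is_fresh "$scratch/buffers" "$scratch/marker"; then
  echo "alive"
else
  echo "stuck"
fi
rm -rf "$scratch"
```

The check reports "alive" even though no buffer file remains, which is the property the file-based probe lacked.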
2019-02-25 11:48:31 +00:00
André Bauer
2bd6d3dc12 use image version 6.6.1
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-02-25 11:05:52 +01:00
André Bauer
2d15ffc9cc updated to 6.5.2
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-02-25 10:56:50 +01:00
André Bauer
0c29ea1a2e Update es-statefulset.yaml 2019-02-25 10:55:23 +01:00
André Bauer
53a936c359 Update Makefile 2019-02-25 10:55:23 +01:00
André Bauer
0e44fa6359 updated elasticsearch to 6.5.0 2019-02-25 10:55:23 +01:00
danielqsj
7d051e1a75 update juju shell 2019-02-24 20:46:20 +08:00
danielqsj
7e655e8666 fix shellcheck in cluster/juju 2019-02-24 20:40:59 +08:00
danielqsj
f02a986081 add comments to shell function 2019-02-24 20:35:46 +08:00
SataQiu
9cda80e836 fix shellcheck lint errors in cluster and hack scripts 2019-02-24 11:15:35 +08:00
Kubernetes Prow Robot
1cf8001e53
Merge pull request #74449 from xichengliudui/fix190223
make more of the shell pass lints
2019-02-23 12:52:34 -08:00
Kubernetes Prow Robot
8993fbc543
Merge pull request #74328 from daixiang0/delete-blank
delete all duplicate empty blanks
2019-02-23 01:43:58 -08:00
Peter Hornyack
621df2cddb Disable Windows Defender on Windows test nodes. 2019-02-22 18:35:38 -08:00
Xiang Dai
36065c6dd7 delete all duplicate empty blanks
Signed-off-by: Xiang Dai <764524258@qq.com>
2019-02-23 10:28:04 +08:00
Jeff Grafton
7a938eb541 Create work-around wrappers for pure attribute on go_binary and go_test
This enables cgo when cross-compiling certain tests and binaries to
Linux, while disabling cgo for Windows and Darwin.
2019-02-22 18:08:34 -08:00
Kubernetes Prow Robot
46d7e9c784
Merge pull request #74381 from yujuhong/add-key
GCE: add EventLog registry key for docker
2019-02-22 17:58:44 -08:00
Kubernetes Prow Robot
fd7acc3275
Merge pull request #74065 from ixdy/break-up-test-tarball
break up the test tarball
2019-02-22 17:58:23 -08:00
Kubernetes Prow Robot
743f864310
Merge pull request #73819 from coffeepac/move-fluentd-es-images
Move fluentd es images
2019-02-22 17:58:12 -08:00
Peter Hornyack
3efd4ca1dc Enhance/repair detect-node-names() and related env vars for Windows nodes. 2019-02-22 14:56:55 -08:00
Patrick Christopher
1bd45ba6eb review updates 2019-02-22 10:00:10 -08:00
Aaron Crickenberger
2d868025be Promote spiffxp to approver for cluster/
Also add Katharine as reviewer
2019-02-22 09:25:31 -08:00
Kubernetes Prow Robot
c7ac532816
Merge pull request #74360 from pjh/gce-windows-cluster-ssh
Enable OpenSSH on Windows nodes in test clusters.
2019-02-21 20:51:23 -08:00
Kubernetes Prow Robot
125dc6c8ea
Merge pull request #74187 from xichengliudui/fixgolint0218
Fix shellcheck lint errors in cluster/addons/fluentd-elasticsearch/fluentd-es-image/run.sh
2019-02-21 20:51:13 -08:00
Yu-Ju Hong
40d0ae311c GCE: add EventLog registry key for docker 2019-02-21 17:47:32 -08:00