Commit Graph

926 Commits

Author SHA1 Message Date
Pengfei Ni
00eeb7f53a Add node e2e tests for runAsUser 2017-06-29 09:17:14 +08:00
Kubernetes Submit Queue
165c94aa7b Merge pull request #47549 from yguo0905/change-tested-images
Automatic merge from submit-queue

Changes node e2e tests to use the new Ubuntu image

ref: https://github.com/kubernetes/kubernetes/issues/46891

`ubuntu-docker10` and `ubuntu-docker12` images are deprecated in favor of the new one.

**Release note**:
```
None
```
/sig node
/area node-e2e
/assign @dchen1107
2017-06-27 23:30:24 -07:00
Kubernetes Submit Queue
98ee52ed78 Merge pull request #48001 from yguo0905/report-prefix
Automatic merge from submit-queue (batch tested with PRs 47675, 48001)

Encodes ReportPrefix into the generated metrics file names

Ref: https://github.com/kubernetes/kubernetes/issues/44003

Adds the test prefix to be part of the name. Otherwise the same test case running on different images will override each other. Nothing needs to be changed at the node-perf-dash side.

See test run at https://console.cloud.google.com/storage/browser/ygg-gke-dev-bucket/e2e-node-test/ci-kubernetes-node-kubelet-benchmark/10.


**Release note**:
```
None
```

/sig node
/area node-e2e
/assign @Random-Liu
2017-06-27 16:11:07 -07:00
Kubernetes Submit Queue
0dad2d0803 Merge pull request #47983 from yguo0905/memcg
Automatic merge from submit-queue (batch tested with PRs 48092, 47894, 47983)

Enables memcg notification in cluster/node e2e tests

Ref: https://github.com/kubernetes/kubernetes/issues/42676

This PR sets Kubelet flag `--experimental-kernel-memcg-notification=true` when running cluster/node e2e tests on COS and Ubuntu images.

Tested:
```
e2e-node-cos:
I0623 00:09:06.641776    1080 server.go:147] Starting server "kubelet" with command "/usr/bin/systemd-run --unit=kubelet-777178888.service --slice=runtime.slice --remain-after-exit /tmp/node-e2e-20170622T170739/kubelet --kubelet-cgroups=/kubelet.slice --cgroup-root=/ --api-servers http://localhost:8080 --address 0.0.0.0 --port 10250 --read-only-port 10255 --volume-stats-agg-period 10s --allow-privileged true --serialize-image-pulls false --pod-manifest-path /tmp/node-e2e-20170622T170739/pod-manifest571288056 --file-check-frequency 10s --pod-cidr 10.100.0.0/24 --eviction-pressure-transition-period 30s --feature-gates  --eviction-hard memory.available<250Mi,nodefs.available<10%%,nodefs.inodesFree<5%% --eviction-minimum-reclaim nodefs.available=5%%,nodefs.inodesFree=5%% --v 4 --logtostderr --network-plugin=kubenet --cni-bin-dir /tmp/node-e2e-20170622T170739/cni/bin --cni-conf-dir /tmp/node-e2e-20170622T170739/cni/net.d --hostname-override tmp-node-e2e-bfe5799d-cos-stable-59-9460-64-0 --experimental-mounter-path=/tmp/node-e2e-20170622T170739/cluster/gce/gci/mounter/mounter --experimental-kernel-memcg-notification=true"

e2e-node-ubuntu:
I0623 00:03:28.526984    2279 server.go:147] Starting server "kubelet" with command "/usr/bin/systemd-run --unit=kubelet-1407651753.service --slice=runtime.slice --remain-after-exit /tmp/node-e2e-20170622T170203/kubelet --kubelet-cgroups=/kubelet.slice --cgroup-root=/ --api-servers http://localhost:8080 --address 0.0.0.0 --port 10250 --read-only-port 10255 --volume-stats-agg-period 10s --allow-privileged true --serialize-image-pulls false --pod-manifest-path /tmp/node-e2e-20170622T170203/pod-manifest083943734 --file-check-frequency 10s --pod-cidr 10.100.0.0/24 --eviction-pressure-transition-period 30s --feature-gates  --eviction-hard memory.available<250Mi,nodefs.available<10%%,nodefs.inodesFree<5%% --eviction-minimum-reclaim nodefs.available=5%%,nodefs.inodesFree=5%% --v 4 --logtostderr --network-plugin=kubenet --cni-bin-dir /tmp/node-e2e-20170622T170203/cni/bin --cni-conf-dir /tmp/node-e2e-20170622T170203/cni/net.d --hostname-override tmp-node-e2e-e48cdd73-ubuntu-gke-1604-xenial-v20170420-1 --experimental-kernel-memcg-notification=true"

e2e-node-containervm:
I0623 00:14:35.392383    2774 server.go:147] Starting server "kubelet" with command "/tmp/node-e2e-20170622T171318/kubelet --runtime-cgroups=/docker-daemon --kubelet-cgroups=/kubelet --cgroup-root=/ --system-cgroups=/system --api-servers http://localhost:8080 --address 0.0.0.0 --port 10250 --read-only-port 10255 --volume-stats-agg-period 10s --allow-privileged true --serialize-image-pulls false --pod-manifest-path /tmp/node-e2e-20170622T171318/pod-manifest507536807 --file-check-frequency 10s --pod-cidr 10.100.0.0/24 --eviction-pressure-transition-period 30s --feature-gates  --eviction-hard memory.available<250Mi,nodefs.available<10%,nodefs.inodesFree<5% --eviction-minimum-reclaim nodefs.available=5%,nodefs.inodesFree=5% --v 4 --logtostderr --network-plugin=kubenet --cni-bin-dir /tmp/node-e2e-20170622T171318/cni/bin --cni-conf-dir /tmp/node-e2e-20170622T171318/cni/net.d --hostname-override tmp-node-e2e-9e3fdd7c-e2e-node-containervm-v20161208-image"

e2e-cos:
Jun 23 17:54:38 e2e-test-ygg-minion-group-t5r0 kubelet[2005]: I0623 17:54:38.646374    2005 flags.go:52] FLAG: --experimental-kernel-memcg-notification="true"

e2e-ubuntu:
Jun 23 18:25:27 e2e-test-ygg-minion-group-19qp kubelet[1547]: I0623 18:25:27.722253    1547 flags.go:52] FLAG: --experimental-kernel-memcg-notification="true"

e2e-containervm:
I0623 18:55:51.886632    3385 flags.go:52] FLAG: --experimental-kernel-memcg-notification="false"
```

**Release note**:
```
None
```

/sig node
/area node-e2e
/assign @dchen1107 @dashpole
2017-06-26 21:08:10 -07:00
Kubernetes Submit Queue
36ae4ae4e3 Merge pull request #47971 from yujuhong/bump-usage-limit
Automatic merge from submit-queue (batch tested with PRs 48074, 47971, 48044, 47514, 47647)

e2e: bump kubelet's resurce usage limit

We don't have per-OS image limits. Bumping these to more generous
numbers to not fail the tests.
2017-06-26 11:40:51 -07:00
Yang Guo
50d49d9c51 Enables memcg notification in cluster/node e2e tests 2017-06-26 11:40:22 -07:00
Kubernetes Submit Queue
14edc46c2e Merge pull request #47892 from ajitak/npd-config
Automatic merge from submit-queue (batch tested with PRs 47993, 47892, 47591, 47469, 47845)

Bump up npd version to v0.4.1

```
Bump up npd version to v0.4.1
```

Fixes #47219
2017-06-23 18:05:46 -07:00
Yang Guo
8ab15e3774 Encodes ReportPrefix into the generated metrics file names 2017-06-23 16:11:25 -07:00
Yu-Ju Hong
71bd92ce3b e2e: bump kubelet's resurce usage limit
We don't have per-OS image limits. Bumping these to more generous
numbers to not fail the tests.
2017-06-23 09:55:18 -07:00
Kubernetes Submit Queue
467705be00 Merge pull request #47195 from dims/bind-cadvisor-on-kubelet-interface
Automatic merge from submit-queue (batch tested with PRs 47922, 47195, 47241, 47095, 47401)

Run cAdvisor on the same interface as kubelet

**What this PR does / why we need it**:

cAdvisor currently binds to all interfaces. Currently the only
solution is to use iptables to block access to the port. We
are better off making cAdvisor to bind to the interface that
kubelet uses for better security.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

Fixes #11710

**Special notes for your reviewer**:

**Release note**:

```release-note
cAdvisor binds only to the interface that kubelet is running on instead of all interfaces.
```
2017-06-22 21:33:27 -07:00
Ajit Kumar
caff16c678 Bump up npd version to v0.4.1 2017-06-22 13:13:50 -07:00
Chao Xu
60604f8818 run hack/update-all 2017-06-22 11:31:03 -07:00
Chao Xu
f4989a45a5 run root-rewrite-v1-..., compile 2017-06-22 10:25:57 -07:00
mbohlool
c91a12d205 Remove all references to types.UnixUserID and types.UnixGroupID 2017-06-21 04:09:07 -07:00
Kubernetes Submit Queue
cc645a8c6f Merge pull request #46327 from supereagle/mark-network-plugin-dir-deprecated
Automatic merge from submit-queue (batch tested with PRs 46327, 47166)

mark --network-plugin-dir deprecated for kubelet

**What this PR does / why we need it**:

**Which issue this PR fixes** : fixes #43967

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-06-19 11:23:54 -07:00
Kubernetes Submit Queue
b6faf34862 Merge pull request #47530 from mindprince/issue-47388-remove-dead-code
Automatic merge from submit-queue (batch tested with PRs 47530, 47679)

Use cos-stable-59-9460-64-0 instead of cos-beta-59-9460-20-0.

Remove dead code that has now moved to another repo as part of #47467

**Release note**:
```release-note
NONE
```

/sig node
2017-06-16 20:57:58 -07:00
Rohit Agarwal
3a86c97cf6 Use cos-stable-59-9460-64-0 instead of cos-beta-59-9460-20-0.
- It contains a fix for ipaliasing.
- It contains a fix which decouples GPU driver installation from kernel
version.

Remove dead code that has now moved to another repo as part of #47467
2017-06-16 13:48:50 -07:00
Bowei Du
1ed4afca80 Fix hardcoded CIDR in the validation_test
The ideal fix is to not hardcode these values.

fixes #47479
2017-06-15 22:15:56 -07:00
Kubernetes Submit Queue
d797c219b3 Merge pull request #47260 from yguo0905/perf-dash
Automatic merge from submit-queue (batch tested with PRs 47470, 47260, 47411, 46852, 46135)

Logs node e2e perf data to standalone json files

Fixes the node-dash-perf issue in https://github.com/kubernetes/kubernetes/issues/44003.

- Move perf data types to `test/e2e/perftype/perftype.go` so that the node-perf-dash can depend on.
- Logs the perf data to standalone json files so that node-perf-dash can consume it easily. A sample run of `ci-kubernetes-node-kubelet-benchmark` is at https://console.cloud.google.com/storage/browser/ygg-gke-dev-bucket/e2e-node-test/ci-kubernetes-node-kubelet-benchmark/1.

The corresponding changes in node-perf-dash is at https://github.com/kubernetes/contrib/pull/2628.

**Release note**:
`None`

/sig node
/area node-e2e
/assign @Random-Liu
2017-06-14 12:52:18 -07:00
Yang Guo
404cda2777 Changes node e2e tests to use new Ubuntu image 2017-06-14 11:44:25 -07:00
Rohit Agarwal
9c0bf19f80 Use cos-stable-59-9460-60-0 and newer installer for GPU node e2e tests. 2017-06-13 15:36:20 -07:00
Kubernetes Submit Queue
f4d2c7b931 Merge pull request #46441 from dashpole/eviction_time
Automatic merge from submit-queue

Shorten eviction tests, and increase test suite timeout

After #43590, the eviction manager is less aggressive when evicting pods.  Because of that, many runs in the flaky suite time out.
To shorten the inode eviction test, I have lowered the eviction threshold.
To shorten the allocatable eviction test, I now set KubeReserved = NodeMemoryCapacity - 200Mb, so that any pod using 200Mb will be evicted.  This shortens this test from 40 minutes, to 10 minutes.
While this should be enough to not hit the flaky suite timeout anymore, it is better to keep lower individual test timeouts than a lower suite timeout, since hitting the suite timeout means that even successful test runs are not reported.

/assign @Random-Liu @mtaufen 

issue: #31362
2017-06-13 12:58:22 -07:00
Yang Guo
29b2db5af3 Logs node e2e perf data to standalone json files 2017-06-12 14:27:56 -07:00
David Ashpole
3365cca78a shorten eviction testst and lengthen flaky suite timeout 2017-06-12 12:56:45 -07:00
Rohit Agarwal
f7a563435f Fix bad check in node e2e tests for GPUs.
When no nvidia device was attached, the -ne check had a syntax error:

    sh: -ne: argument expected

This resulted in 'Success' being echoed and the test passing incorrectly.
This was found while debugging issue #47216
2017-06-11 19:25:35 -07:00
Kubernetes Submit Queue
3040cba17d Merge pull request #47144 from jingxu97/May/emptyDir
Automatic merge from submit-queue

Fix local capacity isolation test
2017-06-09 12:17:19 -07:00
Kubernetes Submit Queue
f75478875a Merge pull request #47113 from feiskyer/cri
Automatic merge from submit-queue

Kubelet: rename cri package name to pkg/kubelet/apis/cri/v1alpha1/runtime

**What this PR does / why we need it**:

We have moved CRI from api/v1alpha1/runtime to apis/cri/v1alpha1, which changed the package name of CRI. This would cause a significant problem: old-versioned runtime (based on CRI in v1.6) doesn't work with latest kubelet v1.7, and vice versa.

This PR renames cri package name to `pkg/kubelet/apis/cri/v1alpha1/runtime` for fixing the problem.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: 

fixes #47012

**Special notes for your reviewer**:

Should be included in v1.7.

**Release note**:

```release-note
CRI has been moved to package `pkg/kubelet/apis/cri/v1alpha1/runtime`.
```
2017-06-09 10:08:36 -07:00
Kubernetes Submit Queue
3a5df705fe Merge pull request #47190 from mindprince/faster-node-e2e-gci
Automatic merge from submit-queue

Move the nvidia installer to the beginning.

When the installer runs for the first time, it disables loadpin and restarts
the node. So, it is better to run it in the beginning so that we can avoid
redoing the later steps. One of the later steps include downloading a tar file
and untarring it. Doing that only once saves around 1m30s in test runtime for
the gci image.

/sig node
/area node-e2e

```release-note
NONE
```
2017-06-09 09:19:16 -07:00
Pengfei Ni
22e99504d7 Update CRI references 2017-06-09 10:16:40 +08:00
Kubernetes Submit Queue
3a96c31de5 Merge pull request #46885 from kewu1992/test_gci_next_canary
Automatic merge from submit-queue (batch tested with PRs 46885, 47197)

Let COS docker validation node test against gci-next-canary

**What this PR does / why we need it**:
This is for COS docker validation node test. We plan to use family gci-next-canary in container-vm-image-staging for future Docker upgration and validation.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #47134

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-06-08 15:46:41 -07:00
Davanum Srinivas
7e5c43a042 Run cAdvisor on the same interface as kubelet
cAdvisor currently binds to all interfaces. Currently the only
solution is to use iptables to block access to the port. We
are better off making cAdvisor to bind to the interface that
kubelet uses for better security.

Fixes #11710
2017-06-08 16:43:38 -04:00
Rohit Agarwal
4a5badfafa Move the nvidia installer to the beginning.
When the installer runs for the first time, it disables loadpin and restarts
the node. So, it is better to run it in the beginning so that we can avoid
redoing the later steps. One of the later steps include downloading a tar file
and untarring it. Doing that only once saves around 1m30s in test runtime for
the gci image.
2017-06-08 09:55:14 -07:00
Kubernetes Submit Queue
9c1b2aa9b5 Merge pull request #46743 from Random-Liu/bump-up-npd
Automatic merge from submit-queue

Bump up npd version to v0.4.0

Fixes #47070.

Bump up npd version to [v0.4.0](https://github.com/kubernetes/node-problem-detector/releases/tag/v0.4.0).

```release-note
Bump up Node Problem Detector version to v0.4.0, which added support of parsing log from /dev/kmsg and ABRT.
```

/cc @dchen1107 @ajitak
2017-06-08 08:24:18 -07:00
Jing Xu
426d44ded4 Fix local capacity isolation test
Fix issue #47128, also add flaky tag for this evition test
2017-06-08 06:30:29 -07:00
Kubernetes Submit Queue
6ee028249b Merge pull request #46385 from rickypai/rpai/host_mapping_node_e2e_test
Automatic merge from submit-queue (batch tested with PRs 43005, 46660, 46385, 46991, 47103)

add e2e node test for Pod hostAliases feature

**What this PR does / why we need it**: adds node e2e test for #45148 

tests requested in https://github.com/kubernetes/kubernetes/issues/43632#issuecomment-298434125

**Release note**:
```release-note
NONE
```

@yujuhong @thockin
2017-06-07 13:31:00 -07:00
Random-Liu
1d3979190c Bump up npd version to v0.4.0 2017-06-06 16:30:02 -07:00
Kubernetes Submit Queue
6ed4bc7b97 Merge pull request #46828 from cblecker/links-update
Automatic merge from submit-queue (batch tested with PRs 46718, 46828, 46988)

Update docs/ links to point to main site

**What this PR does / why we need it**:
This updates various links to either point to kubernetes.io or to the kubernetes/community repo instead of the legacy docs/ tree in k/k
Pre-requisite for #46813

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```

@kubernetes/sig-docs-maintainers @chenopis @ahmetb @thockin
2017-06-06 11:43:18 -07:00
Kubernetes Submit Queue
3338d784ba Merge pull request #46899 from mindprince/issue-46889-node-e2e-gpu-cos-fix
Automatic merge from submit-queue (batch tested with PRs 46897, 46899, 46864, 46854, 46875)

Wait for cloud-init to finish before starting tests.

This fixes #46889.


**Release note**:
```release-note
NONE
```
2017-06-06 05:22:42 -07:00
Christoph Blecker
1bdc7a29ae Update docs/ URLs to point to proper locations 2017-06-05 22:13:54 -07:00
Jing Xu
0b13aee0c0 Add EmptyDir Volume and local storage for container overlay Isolation
This PR adds two features:
1. add support for isolating the emptyDir volume use. If user
sets a size limit for emptyDir volume, kubelet's eviction manager
monitors its usage
and evict the pod if the usage exceeds the limit.
2. add support for isolating the local storage for container overlay. If
the container's overly usage exceeds the limit defined in container
spec, eviction manager will evict the pod.
2017-06-05 12:05:48 -07:00
Rohit Agarwal
1561f55c4c Wait for cloud-init to finish before starting tests.
This fixes #46889.
2017-06-05 10:50:24 -07:00
Kubernetes Submit Queue
6fef1a1deb Merge pull request #46810 from vishh/gpu-cos-image-validation
Automatic merge from submit-queue (batch tested with PRs 46734, 46810, 46759, 46259, 46771)

Update the COS kernel sha for node e2e gpu installer

cc @mindprince

Relevant COS image - https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/jenkins/image-config-serial.yaml#L19
2017-06-05 06:51:23 -07:00
Kubernetes Submit Queue
e6c74bbaaf Merge pull request #46221 from FengyunPan/close-file
Automatic merge from submit-queue

Close file after os.Open()

None
2017-06-03 04:42:00 -07:00
Kubernetes Submit Queue
b8c9ee8abb Merge pull request #46456 from jingxu97/May/allocatable
Automatic merge from submit-queue

Add local storage (scratch space) allocatable support

This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.

This feature is part of local storage capacity isolation and described in the proposal https://github.com/kubernetes/community/pull/306

**Release note**:

```release-note
This feature exposes local storage capacity for the primary partitions, and supports & enforces storage reservation in Node Allocatable 
```
2017-06-03 00:24:29 -07:00
Kubernetes Submit Queue
d063ce213f Merge pull request #46801 from dashpole/summary_container_restart
Automatic merge from submit-queue

[Flaky PR Test] Fix summary test

fixes issue: #46797 

As we can see in the [example failure build log](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet/4319/build-log.txt), the summary containers are pinging google 100s of times a second.  This causes the summary container to be killed occasionally, and fail the test.  The summary containers are only supposed to ping every 10 seconds according to the current test.  As it turns out, we were missing a semicolon, and were not sleeping between pings.  For background, we ping google to generate network traffic, so that the summary test can validate network metrics.

This PR adds the semicolon to make the container sleep between calls, and decreases the sleep time from 10 seconds to 1 second, as 1 call / 10 seconds did not produce enough activity.

cc @kubernetes/kubernetes-build-cops @dchen1107
2017-06-02 18:02:19 -07:00
Ricky Pai
4e7fed4479 e2e node test for PodSpec HostAliases 2017-06-02 17:01:44 -07:00
Kubernetes Submit Queue
a6f0033164 Merge pull request #46238 from yguo0905/package-validator
Automatic merge from submit-queue (batch tested with PRs 46648, 46500, 46238, 46668, 46557)

Support validating package versions in node conformance test

**What this PR does / why we need it**:

This PR adds a package validator in node conformance test for checking whether the locally installed packages meet the image spec.

**Special notes for your reviewer**:

The image spec for GKE (which has the package spec) will be in a separate PR. Then we will publish a new node conformance test image for GKE whose name should use the convention in https://github.com/kubernetes/kubernetes/issues/45760 and have `gke` in it.


**Release note**:
```
NONE
```
2017-06-02 15:20:47 -07:00
Ke Wu
cb28ed1f95 Let COS docker validation node tests against gci-next-canary
We plan to use family gci-next-canary in container-vm-image-staging
for future Docker upgration and validation.
2017-06-02 14:52:01 -07:00
Vishnu kannan
d45286c575 update cos kernel sha for node e2e GPU installer
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-06-01 17:09:18 -07:00
Jing Xu
943fc53bf7 Add predicates check for local storage request
This PR adds the check for local storage request when admitting pods. If
the local storage request exceeds the available resource, pod will be
rejected.
2017-06-01 15:57:50 -07:00