Commit Graph

607 Commits

Author SHA1 Message Date
Brendan Burns
ef6529bf2f make groupVersionResource listing dynamic when third party resources are
enabled.
2016-11-20 20:48:57 -08:00
David Ashpole
10f73bde27 added eviction minimum reclaim flags to test flags, and changed gce default config for eviction-hard to match what tests are using 2016-11-18 08:48:40 -08:00
Kubernetes Submit Queue
eca9e989a3 Merge pull request #36779 from sjenning/fix-memory-leak-via-terminated-pods
Automatic merge from submit-queue

fix leaking memory backed volumes of terminated pods

Currently, we allow volumes to remain mounted on the node, even though the pod is terminated.  This creates a vector for a malicious user to exhaust memory on the node by creating memory backed volumes containing large files.

This PR removes memory backed volumes (emptyDir w/ medium Memory, secrets, configmaps) of terminated pods from the node.

@saad-ali @derekwaynecarr
2016-11-17 21:29:51 -08:00
Random-Liu
87a9d94f24 Update bazel. 2016-11-17 10:18:00 -08:00
Random-Liu
edf7608c51 Remove kubelet related flags from node e2e. Add a single flag kubelet-flags to pass kubelet flags all together. 2016-11-17 10:17:32 -08:00
Random-Liu
090809d8ad Remove dependency on kubelet related flags. 2016-11-17 10:17:32 -08:00
Yu-Ju Hong
6ba2c7b857 Bump gci image version for cri builds 2016-11-16 14:09:51 -08:00
Seth Jennings
b80bea4a62 fix leaking memory backed volumes of terminated pods 2016-11-16 10:17:22 -06:00
Kubernetes Submit Queue
587cbaf988 Merge pull request #36410 from Random-Liu/avoid-printing-test-result-twice
Automatic merge from submit-queue

Node E2E: Avoid printing test result twice.

This is a problem since long time ago.

`RunSshCommand` includes the command output to the error. If the command running the test fails, the test output will also be included in the error. [The runner prints both the test output and the error](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/runner/remote/run_remote.go#L270), which leads the test result to be printed twice. (See the [test result](https://storage.googleapis.com/kubernetes-jenkins/logs/kubelet-gce-e2e-ci/10968/build-log.txt) on node tmp-node-e2e-af900a4d-e2e-node-ubuntu-trusty-docker9-v1-image)

This PR changes `RunSshCommand` not to put command output into the error, and leave the caller to decide how to deal with command output when the command fails.
2016-11-16 02:23:12 -08:00
Random-Liu
09bc5e23a6 Avoid printing test result twice. 2016-11-15 18:10:27 -08:00
Kubernetes Submit Queue
f0aba3d6fe Merge pull request #35811 from dashpole/garbage_collect_testing
Automatic merge from submit-queue

Garbage collection tests the MaxPerPodContainers and MaxContainers constraints

This is the first version of this test.  It tests that containers are garbage collected according to the default configuration.
2016-11-15 11:22:52 -08:00
David Ashpole
f6224590f7 Test Container Garbage Collection 2016-11-15 09:15:31 -08:00
Kubernetes Submit Queue
d8fa4f4d56 Merge pull request #35897 from k82cn/fix_mac_build
Automatic merge from submit-queue

Fixed failed build on Mac.

fixed build error on Mac.
2016-11-14 20:42:50 -08:00
Vishnu kannan
9066253491 [kubelet] rename --cgroups-per-qos to --experimental-cgroups-per-qos to reflect the true nature of that feature
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-11-14 14:06:39 -08:00
Michael Taufen
a38c61395e Bump GCI version to gci-dev-56-8977-0-0 2016-11-11 16:00:18 -08:00
Kubernetes Submit Queue
1bc5b822cd Merge pull request #36479 from Random-Liu/node-e2e-node-name
Automatic merge from submit-queue

Node Conformance & E2E: Get node name from node object.

This PR changes the node e2e test framework to get node name from apiserver instead of test flags.

When a user tried out the node conformance test, he found that node conformance test will not work properly if kubelet is started with `hostname-override`.

The reason is that node conformance test is using [the default node name - `os.Hostname`](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/e2e_node_suite_test.go#L124), which may be different from `hostname-override`. This will cause test pods not scheduled, and eventually test timeout.

We can expose a flag from node conformance test, and let user set node name themselves if they are using `hostname-override` on kubelet. However, let the framework automatically detect it from apiserver is more user friendly.

/cc @kubernetes/sig-node 
This PR 1) only changes node e2e test framework; 2) fixes a problem in node conformance test which is a 1.5 feature. @saad-ali Can we have this in 1.5?
2016-11-10 18:56:53 -08:00
Kubernetes Submit Queue
e7754e89df Merge pull request #36594 from mtaufen/fixup-density_test
Automatic merge from submit-queue

Fix wrong comparison var in e2e_node density test
2016-11-10 15:27:47 -08:00
Random-Liu
1c70c899f7 Get node name from node object. 2016-11-10 14:25:50 -08:00
Michael Taufen
90f8bffc33 Fix wrong comparison var in e2e_node density test 2016-11-10 10:26:00 -08:00
Kubernetes Submit Queue
44f672e5e2 Merge pull request #34877 from resouer/e2e-log-path
Automatic merge from submit-queue

Add e2e node test for log path

fixes #34661

A node e2e test to check if container logs files are properly created with right content.

Since the log files under `/var/log/containers` are actually symbolic of docker containers log files, we can not use a pod to mount them in and do check (symbolic doesn't supported by docker volume).

cc @Random-Liu
2016-11-10 08:35:59 -08:00
Kubernetes Submit Queue
467a1cd23b Merge pull request #35868 from Random-Liu/cleanup-node-e2e-output-dir
Automatic merge from submit-queue

Node E2E: Reorganize node e2e output directories.

Fixes https://github.com/kubernetes/kubernetes/issues/35074.

This PR cleans up the result directory and workspace directory of node e2e test.

Local result directory:

```
/tmp/_artifacts/
        |----- build-log.txt  (build log)
        |----- *.xml  (junit xml file)
        |----- local/  (local run *.log)
        |----- hostname1/  (remote run *.log)
        |----- hostname2/
```

Workspace directory on test node:

```
/tmp/node-e2e-yyyy-mm-ddThh-mm-ss/
        |----- cluster/  (gci mounter)
        |----- cni/  (cni binary)
        |----- e2e_node.test  (test binary)
        |----- e2e_node_test.tar.gz  (test tar)
        |----- etcd060429031/  (etcd data directory)
        |----- ginkgo  (ginkgo binary)
        |----- kubelet (kubelet binary)
        |----- pod-manifest365096781/  (mirror pod directory)
        |----- results/  (test result directory)
```

@mtaufen 
/cc @kubernetes/sig-node
2016-11-10 01:58:58 -08:00
Yu-Ju Hong
cbe2358940 Remove mounter flags from cri test configs 2016-11-08 22:14:28 -08:00
Kubernetes Submit Queue
6983262914 Merge pull request #36267 from vishh/gci-mounter-scope
Automatic merge from submit-queue

Make GCI nodes mount non tmpfs, ext* & bind mounts using an external mounter 

This PR downloads the stage1 & gci-mounter ACIs as part of cluster bring up instead of downloading them dynamically from gcr.io, which was the cause for #36206.

I have also optimized the containerized mounter to pre-load the mounter image once to avoid fetch latency while using it.

Original PR which got reverted: https://github.com/kubernetes/kubernetes/pull/35821

```release-note
GCI nodes use an external mounter script to mount NFS & GlusterFS storage volumes
```

@mtaufen Node e2e is not re-enabled in this PR.

cc @jingxu97
2016-11-08 19:46:32 -08:00
Vishnu kannan
0562386385 re-enable node e2e for GCI mounter
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-11-08 11:09:13 -08:00
Vishnu kannan
dd8ec911f3 Revert "Revert "Merge pull request #35821 from vishh/gci-mounter-scope""
This reverts commit 402116aed4.
2016-11-08 11:09:10 -08:00
Kubernetes Submit Queue
2e4e932391 Merge pull request #36413 from Random-Liu/extend-node-e2e-timeout
Automatic merge from submit-queue

Node E2E: Extend the default ci node e2e test timeout to 1h.

With more and more test added into node e2e, 45m seems to be not enough now.
I saw sometimes the test takes 40+m:
* https://storage.googleapis.com/kubernetes-jenkins/logs/kubelet-gce-e2e-ci/10942/build-log.txt (44m4.917870119s)
* https://storage.googleapis.com/kubernetes-jenkins/logs/kubelet-gce-e2e-ci/10965/build-log.txt (40m2.37254827s)

And sometimes even timeout:
* https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubelet-gce-e2e-ci/10968
* https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubelet-gce-e2e-ci/10941

Although it's quite likely that the timeout happened because the limit is too tight, we are not 100% sure.
This PR extends the test timeout of regular ci node e2e test to 1h. Let's see whether the timeout will happen again.

@yujuhong
2016-11-08 11:08:12 -08:00
Harry Zhang
fad1990eaa Fixe verify bazel
Remove rootfs and chroot in scripts
2016-11-08 13:01:28 -05:00
Harry Zhang
64c8d3ad3d Add e2e node test for log path
Update to use pod to check log file
2016-11-08 13:01:25 -05:00
Random-Liu
d9ddd64c9c Reorganize node e2e output directories. 2016-11-08 00:12:14 -08:00
Kubernetes Submit Queue
0df6384770 Merge pull request #31093 from Random-Liu/containerize-node-e2e-test
Automatic merge from submit-queue

Node Conformance Test: Containerize the node e2e test

For #30122, #30174.
Based on #32427, #32454.

**Please only review the last 3 commits.**

This PR packages the node e2e test into a docker image:
- 1st commit: Add `NodeConformance` flag in the node e2e framework to avoid starting kubelet and collecting system logs. We do this because:
  - There are all kinds of ways to manage kubelet and system logs, for different situation we need to mount different things into the container, run different commands. It is hard and unnecessary to handle the complexity inside the test suite.
- 2nd commit: Remove all `sudo` in the test container. We do this because:
  - In most container, there is no `sudo` command, and there is no need to use `sudo` inside the container.
  - It introduces some complexity to use `sudo` inside the test. (https://github.com/kubernetes/kubernetes/issues/29211, https://github.com/kubernetes/kubernetes/issues/26748) In fact we just need to run the test suite with `sudo`.
- 3rd commit: Package the test into a docker container with corresponding `Makefile` and `Dockerfile`. We also added a `run_test.sh` script to start kubelet and run the test container. The script is only for demonstration purpose and we'll also use the script in our node e2e framework. In the future, we should update the script to start kubelet in production way (maybe with `systemd` or `supervisord`).

@dchen1107 @vishh 
/cc @kubernetes/sig-node @kubernetes/sig-testing



**Release note**:

<!--  Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access) 
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`. 
-->

``` release-note
Release alpha version node test container gcr.io/google_containers/node-test-ARCH:0.1 for users to verify their node setup.
```
2016-11-07 23:41:25 -08:00
Random-Liu
0247626ed2 Extend the default ci node e2e test timeout to 1h. 2016-11-07 19:02:53 -08:00
Kubernetes Submit Queue
18cdbadb96 Merge pull request #36319 from yujuhong/cri_flag
Automatic merge from submit-queue

Rename experimental-runtime-integration-type to experimental-cri

Also rename the field in the component config to `EnableCRI`
2016-11-07 17:07:14 -08:00
Random-Liu
9345e12bc9 Add Dockerfile and Makefile to containerize node conformance test. 2016-11-07 15:27:53 -08:00
Random-Liu
919935beec Remove sudo in test suite and run test with sudo. 2016-11-07 15:27:53 -08:00
Kubernetes Submit Queue
356230f8a1 Merge pull request #36299 from Random-Liu/mark-more-conformance-test
Automatic merge from submit-queue

Node Conformance Test: Mark more conformance test

For https://github.com/kubernetes/kubernetes/issues/30122.

This PR:
1) Removes unused image test.
2) Marks more conformance tests based on https://docs.google.com/spreadsheets/d/1yib6ypfdWuq8Ikyo-rTcBGHe76Xur7tGqCKD9dkzx0Y/edit?usp=sharing.

Notice that 2 tests are not marked conformance for now:
1. **OOM score test:** The test is serial and is verifying host PID directly. The test should start a pod with PID=host and verify inside the pod. @vishh 
2. **Summary api test:** The assumption made in the test doesn't always make sense for arbitrary image, for example: The fs capacity bounds is only [(100mb, 100gb)](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/summary_test.go#L62). @timstclair 
3. We should consider mark **[cgroup manager test](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/cgroup_manager_test.go)** as conformance test. 

@dchen1107 @vishh @timstclair 
/cc @kubernetes/sig-node
2016-11-07 12:45:40 -08:00
Yu-Ju Hong
dcce768a3e Rename experimental-runtime-integration-type to experimental-cri 2016-11-07 11:29:24 -08:00
Tim St. Clair
3977a14463 Enable StreamingProxyRedirects for CRI e2e tests 2016-11-07 09:42:44 -08:00
Random-Liu
13a50e3b97 Add containerize flag to avoid starting kubelet and collecting logs. 2016-11-06 20:18:23 -08:00
Kubernetes Submit Queue
9534c4f563 Merge pull request #32427 from Random-Liu/system-verification
Automatic merge from submit-queue

Node Conformance Test: Add system verification

For #30122 and #29081.

This PR introduces system verification test in node e2e and conformance test. It will run before the real test. Once the system verification fails, the test will just fail. The output of the system verification is like this:

```
I0909 23:33:20.622122    2717 validators.go:45] Validating os...
OS: Linux
I0909 23:33:20.623274    2717 validators.go:45] Validating kernel...
I0909 23:33:20.624037    2717 kernel_validator.go:79] Validating kernel version
KERNEL_VERSION: 3.16.0-4-amd64
I0909 23:33:20.624146    2717 kernel_validator.go:93] Validating kernel config
CONFIG_NAMESPACES: enabled
CONFIG_NET_NS: enabled
CONFIG_PID_NS: enabled
CONFIG_IPC_NS: enabled
CONFIG_UTS_NS: enabled
CONFIG_CGROUPS: enabled
CONFIG_CGROUP_CPUACCT: enabled
CONFIG_CGROUP_DEVICE: enabled
CONFIG_CGROUP_FREEZER: enabled
CONFIG_CGROUP_SCHED: enabled
CONFIG_CPUSETS: enabled
CONFIG_MEMCG: enabled
I0909 23:33:20.679328    2717 validators.go:45] Validating cgroups...
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
I0909 23:33:20.679454    2717 validators.go:45] Validating docker...
DOCKER_GRAPH_DRIVER: aufs
```

It verifies the system following a predefined `SysSpec`:

``` go
// DefaultSysSpec is the default SysSpec.
 var DefaultSysSpec = SysSpec{
    OS:            "Linux",
    KernelVersion: []string{`3\.[1-9][0-9].*`, `4\..*`}, // Requires 3.10+ or 4+
    // TODO(random-liu): Add more config
    KernelConfig: KernelConfig{
        Required: []string{
            "NAMESPACES", "NET_NS", "PID_NS", "IPC_NS", "UTS_NS",
            "CGROUPS", "CGROUP_CPUACCT", "CGROUP_DEVICE", "CGROUP_FREEZER",
            "CGROUP_SCHED", "CPUSETS", "MEMCG",
        },
        Forbidden: []string{},
    },
    Cgroups: []string{"cpu", "cpuacct", "cpuset", "devices", "freezer", "memory"},
    RuntimeSpec: RuntimeSpec{
        DockerSpec: &DockerSpec{
            Version: []string{`1\.(9|\d{2,})\..*`}, // Requires 1.9+
            GraphDriver: []string{"aufs", "overlay", "devicemapper"},
        },
    },
 }
```

Currently, it only supports:
- Kernel validation: version validation and kernel configuration validation
- Cgroup validation: validating whether required cgroups subsystems are enabled.
- Runtime Validation: currently, only validates docker graph driver.

The validating framework is ready. The specific validation items could be added over time.

@dchen1107 
/cc @kubernetes/sig-node
2016-11-06 17:12:39 -08:00
Kubernetes Submit Queue
f650ddf800 Merge pull request #35132 from dashpole/per_volume_inode
Automatic merge from submit-queue

Per Volume Inode Accounting

Collects volume inode stats using the same find command as cadvisor.  The command is "find _path_ -xdev -printf '.' | wc -c".  The output is passed to the summary api, and will be consumed by the eviction manager.

This cannot be merged yet, as it depends on changes adding the InodesUsed field to the summary api, and the eviction manager consuming this.  Expect tests to fail until this happens.
DEPENDS ON #35137
2016-11-05 23:45:44 -07:00
Kubernetes Submit Queue
649c0ddd0e Merge pull request #35342 from timstclair/rejected
Automatic merge from submit-queue

[AppArmor] Hold bad AppArmor pods in pending rather than rejecting

Fixes https://github.com/kubernetes/kubernetes/issues/32837

Overview of the fix:

If the Kubelet needs to reject a Pod for a reason that the control plane doesn't understand (e.g. which AppArmor profiles are installed on the node), then it might contiinuously try to run the pod on the same rejecting node. This change adds a concept of "soft rejection", in which the Pod is admitted, but not allowed to run (and therefore held in a pending state). This prevents the pod from being retried on other nodes, but also prevents the high churn. This is consistent with how other missing local resources (e.g. volumes) is handled.

A side effect of the change is that Pods which are not initially runnable will be retried. This is desired behavior since it avoids a race condition when a new node is brought up but the AppArmor profiles have not yet been loaded on it.

``` release-note
Pods with invalid AppArmor configurations will be held in a Pending state, rather than rejected (failed). Check the pod status message to find out why it is not running.
```

@kubernetes/sig-node @timothysc @rrati @davidopp
2016-11-05 22:52:26 -07:00
Random-Liu
150a04d2fc Remove unused image test. 2016-11-05 22:19:43 -07:00
Random-Liu
f4aee8664d Mark more conformance tests. 2016-11-05 21:11:51 -07:00
Kubernetes Submit Queue
56526043d5 Merge pull request #32530 from mtaufen/dynamic-settings-tests
Automatic merge from submit-queue

Utility functions for using dynamic Kubelet configuration from a test

/cc @vishh @dchen1107
2016-11-04 20:24:03 -07:00
Kubernetes Submit Queue
fbe29f43ea Merge pull request #35724 from mtaufen/disable-cmount-for-e2e-node
Automatic merge from submit-queue

Temporarily disable GCI mounter in e2e node tests

This is just so we have an off-switch ready to go if we need it. Don't merge unless we need to disable this functionality in the e2e node tests.
2016-11-04 14:49:52 -07:00
Michael Taufen
c76c9c5330 Temporarily disable GCI mounter in e2e node tests 2016-11-04 12:42:47 -07:00
Yu-Ju Hong
0918a5d5f3 Revert "cr2 e2e: remove experimental-mounter-rootfs flag" 2016-11-04 08:25:03 -07:00
Random-Liu
f9b50f0949 Update bazel. 2016-11-03 20:38:29 -07:00
Random-Liu
b76b2f218b Add unit test for system verification 2016-11-03 20:38:28 -07:00
Random-Liu
a5fdf3850c Add system verification. 2016-11-03 20:37:18 -07:00