Commit Graph

913 Commits

Author SHA1 Message Date
Random-Liu
dc023144a3 Move docker validation test to separate project. 2017-05-23 14:07:15 -07:00
FengyunPan
287f703d3a Close file after os.Open() 2017-05-22 21:51:11 +08:00
Vishnu kannan
86b5edb79a Update COS version to m59
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-05-20 21:17:19 -07:00
Vishnu kannan
1e77594958 Adding an installer script that installs Nvidia drivers in Container Optimized OS
Packaged the script as a docker container stored in gcr.io/google-containers
A daemonset deployment is included to make it easy to consume the installer
A cluster e2e has been added to test the installation daemonset along with verifying installation
by using a sample CUDA application.
Node e2e for GPUs updated to avoid running on nodes without GPU devices.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-05-20 21:17:19 -07:00
Kubernetes Submit Queue
112ed869c7 Merge pull request #46053 from dashpole/test_eviction_metrics
Automatic merge from submit-queue (batch tested with PRs 46033, 46122, 46053, 46018, 45981)

Log age of stats used for evictions during eviction tests

I recently added prometheus metrics for the age of the metrics used for evictions #43031.  It would be nice to surface these during eviction tests, so I can better assess how old stats are, and whether or not the age of stats causes extra evictions.

This isnt super-high priority, and can be done after code-freeze, since it is a testing improvement.  Feel free to take a look whenever either of you has time.

/assign @mtaufen 
/assign @Random-Liu
2017-05-19 23:29:28 -07:00
Kubernetes Submit Queue
51f3ac1b99 Merge pull request #45004 from feiskyer/hostnetwork
Automatic merge from submit-queue

Add node e2e tests for hostNetwork

**What this PR does / why we need it**:

Add node e2e tests for hostNetwork.

**Which issue this PR fixes**

Part of #44118.

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```

/assign @Random-Liu @yujuhong
2017-05-19 01:18:07 -07:00
David Ashpole
0bd0d705e3 log age of stats used for evictions during eviction tests 2017-05-18 13:51:23 -07:00
Kubernetes Submit Queue
23f0fe8632 Merge pull request #45901 from Random-Liu/fix-node-e2e
Automatic merge from submit-queue (batch tested with PRs 44520, 45253, 45838, 44685, 45901)

Fix node e2e panic when not using image config file.

In https://github.com/kubernetes/kubernetes/pull/45430, `resources` field in image config is a pointer, and we only initialize it when using image config file.

However, we still have test specifying images directly without image config file, this will cause those test to panic. See:
* https://k8s-testgrid.appspot.com/google-docker#kubelet
* https://k8s-testgrid.appspot.com/google-docker#e2e-cos

This PR fixes this.

@vishh @mtaufen @kewu1992
2017-05-16 21:28:02 -07:00
Kubernetes Submit Queue
85775105f1 Merge pull request #44520 from dashpole/test_eviction_fix
Automatic merge from submit-queue (batch tested with PRs 44520, 45253, 45838, 44685, 45901)

Ensure ordering of using dynamic kubelet config and setting up tests.

This PR simply places the body of the eviction test within its own context.  This ensures that the kubelet config is set before the pods are created, and that the kubelet config is reverted only after the pods are deleted.
2017-05-16 21:27:54 -07:00
Pengfei Ni
f9eafea8bf Add node e2e tests for hostNetwork 2017-05-17 10:31:17 +08:00
Kubernetes Submit Queue
d823a6e228 Merge pull request #45899 from vishh/fix-nodee2e-sone
Automatic merge from submit-queue (batch tested with PRs 45247, 45810, 45034, 45898, 45899)

Fix zone in node e2e serial tests

```shell
$ gcloud compute regions describe us-west1 --project k8s-jkns-ci-node-e2e
creationTimestamp: '2016-06-14T17:29:18.761-07:00'
description: us-west1
id: '1210'
kind: compute#region
name: us-west1
quotas:
- limit: 100.0
  metric: CPUS
  usage: 0.0
- limit: 10240.0
  metric: DISKS_TOTAL_GB
  usage: 0.0
- limit: 7.0
  metric: STATIC_ADDRESSES
  usage: 0.0
- limit: 100.0
  metric: IN_USE_ADDRESSES
  usage: 0.0
- limit: 2048.0
  metric: SSD_TOTAL_GB
  usage: 0.0
- limit: 10240.0
  metric: LOCAL_SSD_TOTAL_GB
  usage: 0.0
- limit: 100.0
  metric: INSTANCE_GROUPS
  usage: 0.0
- limit: 50.0
  metric: INSTANCE_GROUP_MANAGERS
  usage: 0.0
- limit: 1000.0
  metric: INSTANCES
  usage: 0.0
- limit: 50.0
  metric: AUTOSCALERS
  usage: 0.0
- limit: 20.0
  metric: REGIONAL_AUTOSCALERS
  usage: 0.0
- limit: 20.0
  metric: REGIONAL_INSTANCE_GROUP_MANAGERS
  usage: 0.0
selfLink: https://www.googleapis.com/compute/v1/projects/k8s-jkns-ci-node-e2e/regions/us-west1
status: UP
zones:
- https://www.googleapis.com/compute/v1/projects/k8s-jkns-ci-node-e2e/zones/us-west1-a
- https://www.googleapis.com/compute/v1/projects/k8s-jkns-ci-node-e2e/zones/us-west1-b
```
2017-05-16 19:02:03 -07:00
Random-Liu
56803ec97d Fix node e2e panic when not using image config file. 2017-05-16 11:36:50 -07:00
Vishnu Kannan
03b3d2e119 fix zone in node e2e serial tests
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-05-16 11:03:07 -07:00
David Ashpole
9f098a9c1a add context around test 2017-05-16 09:38:41 -07:00
zhengjiajin
de1150385d fix typo 2017-05-15 12:07:36 +08:00
Vishnu kannan
d1b4dba440 adding support for gpus in node e2e
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-05-13 16:35:54 -07:00
Kubernetes Submit Queue
3619c33350 Merge pull request #42759 from mtaufen/kubelet-apis-reorg
Automatic merge from submit-queue

Reorganize kubelet tree so apis can be independently versioned

@yujuhong @lavalamp @thockin @bgrant0607 
This is an example of how we might reorganize `pkg/kubelet` so the apis it exposes can be independently versioned. This would also provide a logical place to put the `KubeletConfiguration` type, which currently lives in `pkg/apis/componentconfig`; it could live in e.g. `pkg/kubelet/apis/config` instead.

Take a look when you have a chance and let me know what you think. The most significant change in this PR is reorganizing `pkg/kubelet/api` to `pkg/kubelet/apis`, the rest is pretty much updating import paths and `BUILD` files.
2017-05-12 17:43:22 -07:00
Michael Taufen
cbad320205 Reorganize kubelet tree so apis can be independently versioned 2017-05-12 10:02:33 -07:00
Kubernetes Submit Queue
3d704fa40f Merge pull request #45676 from mtaufen/fix-dkcfg-test-gci
Automatic merge from submit-queue

Fix flag formatting errors in the node tests

There were three problems:
- Lack of a trailing space after prepending flags.
- Passing multiple flags in a string to --kubelet-flags seems to confuse
the flag parser; it stops parsing ALL flags as soon as it sees the
second kubelet flag. Fortunately, all instances of --kubelet-flags are
combined together, so we can just pass two of those.
- --feature-gates should be passed to the test framework, which then
forwards it to the kubelet, instead of using --kubelet-flags.

This hopefully fixes the dynamic config test failures on COS, which
started after #45602. (See: https://k8s-testgrid.appspot.com/google-node#kubelet-serial-gce-e2e)
2017-05-12 09:04:38 -07:00
Kubernetes Submit Queue
e1bb9a5177 Merge pull request #45667 from yujuhong/mv-pull-tests
Automatic merge from submit-queue (batch tested with PRs 45691, 45667, 45698, 45715)

dockertools: migrate the unit tests and delete the package
2017-05-12 04:09:41 -07:00
Kubernetes Submit Queue
ee4e5e79f3 Merge pull request #45437 from kewu1992/cos-docker-validation
Automatic merge from submit-queue

Add properties file for cos-docker-validation test job

**What this PR does / why we need it**: 
This is forked from test/e2e_node/jenkins/docker_validation/jenkins-validation.properties. It is used for COS docker validation test.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```NONE
```
2017-05-11 20:58:49 -07:00
Michael Taufen
abb5c3fd5a Fix flag formatting errors in the node tests
There were three problems:
- Lack of a trailing space after prepending flags.
- Passing multiple flags in a string to --kubelet-flags seems to confuse
the flag parser; it stops parsing ALL flags as soon as it sees the
second kubelet flag. Fortunately, all instances of --kubelet-flags are
combined together, so we can just pass two of those.
- --feature-gates should be passed to the test framework, which then
forwards it to the kubelet, instead of using --kubelet-flags.

This hopefully fixes the dynamic config test failures on COS, which
started after #45602.
2017-05-11 11:28:49 -07:00
Yu-Ju Hong
fccf34ccb6 Remove various references of dockertools
Also update the bazel files.
2017-05-11 10:01:41 -07:00
Kubernetes Submit Queue
a507d30833 Merge pull request #45602 from dashpole/enable_memcg_for_all_tests
Automatic merge from submit-queue (batch tested with PRs 45569, 45602, 45604, 45478, 45550)

Enable kernel memcg notification for node and cluster GCI/COS testing.

Sets --experimental-kernel-memcg-notification=true when running on the GCI/COS image.  It sets this for master and nodes for cluster e2e tests, and for the node in node e2e tests.

Issue #42676 

cc @dchen1107 @Random-Liu
2017-05-10 21:34:39 -07:00
David Ashpole
0b1e45c5ff enable memcg on all testing 2017-05-10 11:38:26 -07:00
Kubernetes Submit Queue
51a3413371 Merge pull request #45307 from yujuhong/mv-docker-client
Automatic merge from submit-queue (batch tested with PRs 45453, 45307, 44987)

Migrate the docker client code from dockertools to dockershim

Move docker client code from dockertools to dockershim/libdocker. This includes
DockerInterface (renamed to Interface), FakeDockerClient, etc.

This is part of #43234
2017-05-09 20:23:44 -07:00
Pengfei Ni
2b540b6d74 Add node e2e tests for hostIPC 2017-05-09 18:25:19 +08:00
Ke Wu
bdcfad15ce Add properties file for cos-docker-validation test job 2017-05-05 14:49:25 -07:00
Yu-Ju Hong
cf3635c876 Update bazel BUID files 2017-05-05 11:48:08 -07:00
Yu-Ju Hong
389c140eaf Move docker client code from dockertools to dockershim/dockerlib
The code affected include DockerInterface (renamed to Interface),
FakeDockerClient, etc.
2017-05-05 11:48:08 -07:00
Jamie Hannaford
9440a68744 Use dedicated Unix User and Group ID types 2017-05-05 14:07:38 +02:00
Yu-Ju Hong
5644587e07 More dockertools cleanup
Move some constants/functions to dockershim and remove unused tests.
2017-05-03 11:22:06 -07:00
Kubernetes Submit Queue
4998d78f89 Merge pull request #43883 from yujuhong/rm_non-cri-configs
Automatic merge from submit-queue (batch tested with PRs 43884, 44712, 45124, 43883)

Node e2e: Remove CRI/non-CRI configs

This depends on https://github.com/kubernetes/test-infra/pull/2363
2017-05-01 15:49:13 -07:00
Yang Guo
f7c2efa42b adds Ubuntu node e2e tests 2017-05-01 12:22:11 -07:00
Kubernetes Submit Queue
8efb5c9957 Merge pull request #44983 from caesarxuchao/easy-remove-client-go-api-scheme
Automatic merge from submit-queue (batch tested with PRs 45052, 44983, 41254)

Non-controversial part of #44523

For easier review of #44523, i extracted the non-controversial part out to this PR.
2017-04-27 17:14:04 -07:00
Kubernetes Submit Queue
8ab63dd9ea Merge pull request #42740 from mtaufen/tarball-cleanup
Automatic merge from submit-queue (batch tested with PRs 42740, 44980, 45039, 41627, 45044)

Cleanup some of the tarball producing code for e2e node tests

This is some e2e node cleanup work I found sitting in a local branch while deleting old local git branches. It looks like it's still useful.
2017-04-27 13:27:00 -07:00
Chao Xu
958903509c bazel 2017-04-27 09:41:53 -07:00
Chao Xu
3fa7b7824a easy changes 2017-04-27 09:41:53 -07:00
Kubernetes Submit Queue
7670341f56 Merge pull request #43395 from sjenning/selinux-e2e-node-lifecycle
Automatic merge from submit-queue (batch tested with PRs 43395, 44960)

e2e-node: refactor lifecycle test to avoid selinux issues

Fixes #42905

Previously, the exec hook tests mounted a HostPath volume from /tmp and touched a file as a indicator that the hook had run.  This is prohibited by selinux policy on Fedora/RHEL/Centos.

This PR refactors the test to avoid filesystem indication and use the same indication that the HTTP hooks use; a GET to a http endpoint.  The exec hooks run `curl` to hit this endpoint and trigger the indication.  This simplifies this test quite a bit as well, removing over 85 lines of code.

REVIEWER NOTE: The diff is a mess on this one.  Probably better to just review the new version of the file.

@derekwaynecarr
2017-04-26 14:29:41 -07:00
Ryan Hitchman
0c6fd62582 Fix the last deprecated "gcloud docker push args" usage. 2017-04-26 11:20:25 -07:00
NickrenREN
d4376599ba Cleanup: replace some hardcoded codes and remove unused functions 2017-04-25 09:38:25 +08:00
Maru Newby
9413071ce8 e2e: Prefer kubeconfig host to default 2017-04-19 14:58:43 -07:00
Kubernetes Submit Queue
4f50a8d7cd Merge pull request #44457 from dnardo/e2e_node_cni
Automatic merge from submit-queue (batch tested with PRs 43000, 44500, 44457, 44553, 44267)

Updates e2e_node test to allow both kubenet and cni to be specified f…

…or the network plugin.

This adds a simple CNI configuration which is added to the node during test setup.
This also modifies the default flags in services/kubelet.go to specify the "cni-bin-dir"
and the "cni-conf-dir" and removes the "network-plugin-dir" flag.  This leaves the default
network plugin to kubenet.



**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-04-18 13:19:08 -07:00
Chao Xu
4f9591b1de move pkg/api/v1/ref.go and pkg/api/v1/resource.go to subpackages. move some functions in resource.go to pkg/api/v1/node and pkg/api/v1/pod 2017-04-17 11:38:11 -07:00
Mike Danese
a05c3c0efd autogenerated 2017-04-14 10:40:57 -07:00
Daniel Nardo
29b2708046 Add comment to setupCNI. 2017-04-13 15:11:24 -07:00
Daniel Nardo
4e458ce001 Updates e2e_node test to allow both kubenet and cni to be specified for the network plugin.
This adds a simple CNI configuration which is added to the node during test setup.
This also modifies the default flags in services/kubelet.go to specify the "cni-bin-dir"
and the "cni-conf-dir" and removes the "network-plugin-dir" flag.  This leaves the default
network plugin to kubenet.
2017-04-13 09:57:46 -07:00
Timothy St. Clair
442713aaaf Remove leagcy init that no longer works. 2017-04-11 08:48:59 -05:00
Kubernetes Submit Queue
f3d2ea5dfd Merge pull request #43990 from php-coder/e2e_readmes
Automatic merge from submit-queue

test/e2e*: add/update README.md files

**What this PR does / why we need it**:

This PR is adding `README.md` files with a link to the documentation to all E2E tests.
2017-04-10 08:05:04 -07:00
Michael Taufen
e6321c7440 Cleanup some of the tarball producing code 2017-04-10 07:09:31 -07:00
Pengfei Ni
a696e86bb0 Add node e2e tests for hostPid 2017-04-07 13:41:58 +08:00
Kubernetes Submit Queue
08fefc9d9a Merge pull request #42769 from timchenxiaoyu/acrosstypo
Automatic merge from submit-queue

fix across typo

fix across typo


NONE
2017-04-05 14:28:26 -07:00
Kubernetes Submit Queue
0a1385178d Merge pull request #43248 from yujuhong/pause_proc
Automatic merge from submit-queue

node e2e: improve the validate OOM score test for infra containers

The test blindly checked all "pause" processes on the node, assuming
they were all infra containers. This change takes a snapshot of all
existing "pause" processes on the node, and exclude them in the
validation. The test still relies on the fact that it runs exclusively
on the node. If that assumption changes, we will need other methods to
locate the PIDs of the infra containers.

This fixes #37580
2017-04-03 20:20:53 -07:00
Slava Semushin
be78d03afb test/e2e*: add/update README.md files. 2017-04-03 19:05:50 +02:00
Kubernetes Submit Queue
25a87fa19c Merge pull request #40804 from runcom/prepull-cri
Automatic merge from submit-queue

test/e2e_node: prepull images with CRI

Part of https://github.com/kubernetes/kubernetes/issues/40739

- This PR builds on top of #40525 (and contains one commit from #40525)
- The second commit contains a tiny change in the `Makefile`.
- Third commit is a patch to be able to prepull images using the CRI (as opposed to run `docker` to pull images which doesn't make sense if you're using CRI most of the times)

Marked WIP till #40525 makes its way into master

@Random-Liu @lucab @yujuhong @mrunalp @rhatdan
2017-04-01 03:08:35 -07:00
Antonio Murdaca
2634f57f7f
test/e2e_node: prepull images with CRI
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-04-01 10:18:56 +02:00
Kubernetes Submit Queue
659ea8708f Merge pull request #43407 from sjenning/selinux-npd-refactor
Automatic merge from submit-queue

tests: e2e-node: refactor node-problem-detector test to avoid selinux…

Fixes https://github.com/kubernetes/kubernetes/issues/43401

This test creates a file in /tmp on the host and creates a HostPath volume in the container to so that the host can inject messages into the logfile being read by the node problem detector in the container.  However, selinux prohibits the container from reading files out of /tmp on the host.

This PR modifies the test to create the log file in an EmptyDir volume instead, which will be properly labeled for container access.

@derekwaynecarr
2017-03-31 18:30:46 -07:00
Derek Carr
7ded90eafb Add derekwaynecarr to approvers list for test/e2e_node 2017-03-31 18:00:15 -04:00
Yu-Ju Hong
09cf5c8192 Node e2e: Remove CRI/non-CRI configs 2017-03-30 12:07:05 -07:00
Seth Jennings
8f6f6bf141 tests: e2e-node: refactor node-problem-detector test to avoid selinux issues 2017-03-29 10:26:59 -05:00
deads2k
8e26fa25da wire in aggregation 2017-03-27 09:44:10 -04:00
Kubernetes Submit Queue
0afb06b600 Merge pull request #43083 from alejandroEsc/ae/osx/e2e_node_problem_detector
Automatic merge from submit-queue (batch tested with PRs 43378, 43216, 43384, 43083, 43428)

Darwin won't build: syscall.Sysinfo issue.

**What this PR does / why we need it**: On darwin had problems building and testing because of syscall.Sysinfo_t etc which is a linux specific command. 

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:

**Special notes for your reviewer**: Definitely would like another set of eyes on the bootTime function, it will have to be inaccurate but open to suggestions about improving this for darwin.


**Release note**:
```
NONE
```
2017-03-25 21:22:26 -07:00
Kubernetes Submit Queue
fb537762fc Merge pull request #42297 from YuPengZTE/devErrorf
Automatic merge from submit-queue (batch tested with PRs 42237, 42297, 42279, 42436, 42551)

should replace errors.New(fmt.Sprintf(...)) with fmt.Errorf(...)

Signed-off-by: yupengzte <yu.peng36@zte.com.cn>



**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-03-24 14:16:23 -07:00
caleb miles
f4d9bbc7d8
Bump CNI consumers to latest version
- vendored CNI plugins properly handle `DEL` on missing resources
- [based on v0.5.1](https://github.com/kubernetes/kubernetes/issues/43488#issuecomment-288525151)
2017-03-22 16:03:13 -07:00
Derek Carr
5c8b957779 Fix faulty assumptions in summary API testing 2017-03-20 14:56:11 -04:00
Seth Jennings
5156f582fe e2e-node: refactor lifecycle test to avoid selinux issues 2017-03-20 11:56:10 -05:00
Alejandro Escobar
ee741f23af added a space
bazel update

added new files to reflect that only one method has changed between arch types.

forgot to add changes to a commit.

changes made and gfmt run.

changed node_problem_detector to node_problem_detector_linux and made it linux only.

updated bazel
2017-03-20 06:40:27 -07:00
Yu-Ju Hong
056e343e03 node e2e: improve the validate OOM score test for infra containers
The test blindly checked all "pause" processes on the node, assuming
they were all infra containers. This change takes a snapshot of all
existing "pause" processes on the node, and exclude them in the
validation. The test still relies on the fact that it runs exclusively
on the node. If that assumption changes, we will need other methods to
locate the PIDs of the infra containers.
2017-03-16 15:39:03 -07:00
Kubernetes Submit Queue
5e0f0047dd Merge pull request #43242 from timstclair/summary-test
Automatic merge from submit-queue

Relax 'misc' container memory constraints

Fixes https://github.com/kubernetes/kubernetes/issues/40607

/cc @dchen1107
2017-03-16 15:25:23 -07:00
Tim St. Clair
827dd340d4
Relax 'misc' container memory constraints 2017-03-16 12:08:22 -07:00
Kubernetes Submit Queue
6656ffc300 Merge pull request #43165 from Random-Liu/update-npd
Automatic merge from submit-queue

Update npd to the official v0.3.0 release.

Update npd to the official release v0.3.0.

This also fixes a npd bug https://github.com/kubernetes/node-problem-detector/pull/98.

@dchen1107 @kubernetes/node-problem-detector-reviewers
2017-03-16 11:23:43 -07:00
Random-Liu
c4b3fd4e63 Update npd to the official v0.3.0 release. 2017-03-15 14:26:12 -07:00
Tim St. Clair
9a1236ae20
Add process debug information to summary test 2017-03-14 17:45:12 -07:00
Vishnu Kannan
8ed9bff073 handle container restarts for GPUs
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
Random-Liu
f81460e35d Change the junit file name format to junit_image-name_id.xml,
and make the gci image name shorter.
2017-03-09 16:47:48 -08:00
Kubernetes Submit Queue
7c08e817a5 Merge pull request #42734 from dashpole/deletion_timeout
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)

Create DefaultPodDeletionTimeout for e2e tests

In our e2e and e2e_node tests, we had a number of different timeouts for deletion.
Recent changes to the way deletion works (#41644, #41456) have resulted in some timeouts in e2e tests.  #42661 was the most recent fix for this.
Most of these tests are not meant to test pod deletion latency, but rather just to clean up pods after a test is finished.
For this reason, we should change all these tests to use a standard, fairly high timeout for deletion.

cc @vishh @Random-Liu
2017-03-09 15:06:53 -08:00
Kubernetes Submit Queue
eefa2ef1bb Merge pull request #42425 from apprenda/kubeadm_189_docker_version
Automatic merge from submit-queue (batch tested with PRs 42762, 42739, 42425, 42778)

kubeadm: update docker version for CE and EE

**What this PR does / why we need it**: Update regex for docker version to also capture new CE and EE versions. 

**Which issue this PR fixes**: fixes #https://github.com/kubernetes/kubeadm/issues/189

**Special notes for your reviewer**: /cc @jbeda @luxas

**Release note**:
```release-note
NONE
```
2017-03-09 02:51:40 -08:00
timchenxiaoyu
767719ea9c fix across typo 2017-03-09 09:07:21 +08:00
David Ashpole
3806d386df use default timeout for deletion 2017-03-08 14:40:19 -08:00
Derek McQuay
35f07095d8
kubeadm: validators pass warnings and errors
This change allows validators to pass warnings as well as errors. This
was needed because of how support for docker 1.13+ and the new EE and CE
versions is currently being handled.
2017-03-08 14:35:26 -08:00
Kubernetes Submit Queue
bf7f42d362 Merge pull request #42499 from dashpole/memcg_test_suite
Automatic merge from submit-queue

New e2e node test suite with memcg turned on

The flag --experimental-kernal-memcg-notification was initially added to allow disabling an eviction feature which used memcg notifications to make memory evictions more reactive.
As documented in #37853, memcg notifications increased the likelihood of encountering soft lockups, especially on CVM.

This feature would valuable to turn on, at least for GCI, since soft lockup issues were less prevalent on GCI and appeared (at the time) to be unrelated to memcg notifications.

In the interest of caution, I would like to monitor serial tests on GCI with --experimental-kernal-memcg-notification=true.

cc @vishh @Random-Liu @dchen1107 @kubernetes/sig-node-pr-reviews
2017-03-07 18:47:40 -08:00
Kubernetes Submit Queue
0d60fc4013 Merge pull request #42687 from dashpole/flaky_to_serial
Automatic merge from submit-queue (batch tested with PRs 42664, 42687)

[Fix Flaky Tests] E2e Node Flaky test suite runs serially

The [e2e Node Flaky Test Suite](https://k8s-testgrid.appspot.com/google-node#kubelet-flaky-gce-e2e&width=20) has been failing with strange errors.
This is because the tests in that suite are meant to be run serially, but are running in parallel, since that was left out of the config.  This PR fixes this by changing the Flaky test suite to serial

cc @Random-Liu
2017-03-07 17:51:17 -08:00
David Ashpole
b0d138692e make the flaky suite run serially. Should prevent all the dynamic config errors 2017-03-07 15:12:17 -08:00
David Ashpole
0e20caf3fb new suite with memcg turned on 2017-03-07 14:14:08 -08:00
David Ashpole
6a0d5506c2 use default timeout 2017-03-07 11:45:59 -08:00
Derek McQuay
eeefd2ca87
kubeadm: fail on docker version 1.13+, CE, and EE 2017-03-07 10:20:32 -08:00
Derek Carr
48d822eafe cgroup names created by kubelet should be lowercased 2017-03-06 11:19:21 -05:00
yupengzte
363f321f32 should replace errors.New(fmt.Sprintf(...)) with fmt.Errorf(...)
Signed-off-by: yupengzte <yu.peng36@zte.com.cn>
2017-03-06 09:14:48 +08:00
Kubernetes Submit Queue
cb0728c50f Merge pull request #42457 from yujuhong/do_not_panic
Automatic merge from submit-queue (batch tested with PRs 42456, 42457, 42414, 42480, 42370)

node e2e: apparmor test should fail instead of panicking

This doesn't fix #42420, but at least stop the test from panicking.
2017-03-04 00:17:42 -08:00
Kubernetes Submit Queue
f9ccee7714 Merge pull request #42435 from dashpole/timestamps_for_fsstats
Automatic merge from submit-queue (batch tested with PRs 42369, 42375, 42397, 42435, 42455)

[Bug Fix]: Avoid evicting more pods than necessary by adding Timestamps for fsstats and ignoring stale stats

Continuation of #33121.  Credit for most of this goes to @sjenning.  I added volume fs timestamps.

**why is this a bug** 
This PR attempts to fix part of https://github.com/kubernetes/kubernetes/issues/31362 which results in multiple pods getting evicted unnecessarily whenever the node runs into resource pressure. This PR reduces the chances of such disruptions by avoiding reacting to old/stale metrics.
Without this PR, kubernetes nodes under resource pressure will cause unnecessary disruptions to user workloads. 
This PR will also help deflake a node e2e test suite.

The eviction manager currently avoids evicting pods if metrics are old.  However, timestamp data is not available for filesystem data, and this causes lots of extra evictions.
See the [inode eviction test flakes](https://k8s-testgrid.appspot.com/google-node#kubelet-flaky-gce-e2e) for examples.
This should probably be treated as a bugfix, as it should help mitigate extra evictions.

cc: @kubernetes/sig-storage-pr-reviews  @kubernetes/sig-node-pr-reviews @vishh @derekwaynecarr @sjenning
2017-03-03 23:21:48 -08:00
Kubernetes Submit Queue
2d319bd406 Merge pull request #42204 from dashpole/allocatable_eviction
Automatic merge from submit-queue

Eviction Manager Enforces Allocatable Thresholds

This PR modifies the eviction manager to enforce node allocatable thresholds for memory as described in kubernetes/community#348.
This PR should be merged after #41234. 

cc @kubernetes/sig-node-pr-reviews @kubernetes/sig-node-feature-requests @vishh 

** Why is this a bug/regression**

Kubelet uses `oom_score_adj` to enforce QoS policies. But the `oom_score_adj` is based on overall memory requested, which means that a Burstable pod that requested a lot of memory can lead to OOM kills for Guaranteed pods, which violates QoS. Even worse, we have observed system daemons like kubelet or kube-proxy being killed by the OOM killer.
Without this PR, v1.6 will have node stability issues and regressions in an existing GA feature `out of Resource` handling.
2017-03-03 20:20:12 -08:00
Kubernetes Submit Queue
67500b3947 Merge pull request #42443 from Random-Liu/fix-node-e2e-npd
Automatic merge from submit-queue (batch tested with PRs 42443, 38924, 42367, 42391, 42310)

Cast system uptime to time.Duration to fix cross build.

Fixes https://github.com/kubernetes/kubernetes/issues/42441.

Cast system uptime to `time.Duration` to avoid different behavior on different architectures.

@sjenning @ixdy @ncdc
2017-03-03 18:08:38 -08:00
Kubernetes Submit Queue
98eae9b222 Merge pull request #42341 from dashpole/critial_pod_test
Automatic merge from submit-queue

Critial pod test uses allocatable instead of capacity

This solves #42239.

When this test was first introduced, pods could request up to the capacity of the node.
With the addition of allocatable introduced in #41234, this is no longer the case, and pods can only use up to allocatable.

This should be included in 1.6, as it is a bug related to a 1.6 feature.

cc @vish @yujuhong
2017-03-03 14:34:37 -08:00
Yu-Ju Hong
1d907dbf4f node e2e: apparmor test should fail instead of panicking 2017-03-02 16:36:52 -08:00
David Ashpole
a90c7951d4 add volume timestamps 2017-03-02 15:01:59 -08:00
Random-Liu
d41c2503e7 Cast system uptime to time.Duration to fix cross build. 2017-03-02 14:48:09 -08:00
Kubernetes Submit Queue
1d97472361 Merge pull request #41928 from Random-Liu/move-npd-test-to-node-e2e
Automatic merge from submit-queue (batch tested with PRs 41984, 41682, 41924, 41928)

Move node problem detector test into node e2e.

Move current NPD e2e test into node e2e.

In fact, current NPD e2e test is only a functionality test for NPD. It creates test NPD pod, sets test configuration, generates test logs and verifies test result.
It doesn't actually test the NPD really deployed in the cluster.

So it doesn't actually need to run in cluster e2e. Running it in node e2e will:
1) Make it easier to run the test.
2) Make it more light weight to introduce this as a pre/post submit test in NPD repo in the future.

Except this, I'm working on a cluster e2e to run some basic functionality test and benchmark test against the real NPD deployed in the cluster. Will send the PR later.

/cc @dchen1107 @kubernetes/node-problem-detector-reviewers
2017-03-02 10:51:18 -08:00
David Ashpole
ac612eab8e eviction manager changes for allocatable 2017-03-02 07:36:24 -08:00
David Ashpole
5fa6515509 critial pod test uses allocatable instead of capacity 2017-03-01 09:57:17 -08:00
Kubernetes Submit Queue
ed479163fa Merge pull request #42116 from vishh/gpu-experimental-support
Automatic merge from submit-queue

Extend experimental support to multiple Nvidia GPUs

Extended from #28216

```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with  support for multiple Nvidia GPUs. 
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```

1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```

TODO:

- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package
2017-03-01 04:52:50 -08:00