Commit Graph

425 Commits

Author SHA1 Message Date
Paul Morie
88acffcda1 Fix error message around gcloud calls in node e2e and gubernator 2016-09-17 01:05:20 -04:00
Kubernetes Submit Queue
d69cdce704 Merge pull request #32820 from coufon/change_collector_log
Automatic merge from submit-queue

change the error log for empty resource usage

This PR changes the error log for empty resource usage buffer for a container to be more clear. It happens when the container name is wrong, or cAdvisor somehow does not response.
2016-09-15 23:54:34 -07:00
Vish Kannan
492ca3bc9c Revert "[kubelet] Fix oom-score-adj policy in kubelet" 2016-09-15 19:28:59 -07:00
Kubernetes Submit Queue
fcc97f37ee Merge pull request #32718 from mikedanese/mv-informer
Automatic merge from submit-queue

move informer and controller to pkg/client/cache

@kubernetes/sig-api-machinery
2016-09-15 16:44:30 -07:00
Zhou Fang
3e16eb5082 change the error log for empty resource usage 2016-09-15 14:13:25 -07:00
Mike Danese
a765d59932 move informer and controller to pkg/client/cache
Signed-off-by: Mike Danese <mikedanese@google.com>
2016-09-15 12:50:08 -07:00
Vishnu kannan
e4acad7afb Fix oom-score-adj policy in kubelet.
Docker daemon and kubelet needs to be protected by setting oom-score-adj to -999.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-09-14 11:56:10 -07:00
Kubernetes Submit Queue
312acd9e30 Merge pull request #32342 from coufon/get_image_machine_info_from_apiserver
Automatic merge from submit-queue

Get image and machine info from apiserver in node e2e test

This PR changes node e2e test to get image and machine information from API server instead of pass them from Jenkins test framework. The original format to pass image and machine info is naming the test node as "machine-image-uuid", which is hard to parse because "-" occurs a lot in both machine and image names.

Now we add two labels "image" and "machine" into performance data. The machine type has the format "cpu:1core,memory:3.6GB".

This PR is based on #32250.
2016-09-14 03:34:45 -07:00
Zhou Fang
b47e22d013 endable DynamicKubeletConfig in benchmark test properties 2016-09-13 13:59:19 -07:00
Zhou Fang
a683eb0418 get image and machine info from api server instead of passing from test
# Please enter the commit message for your changes. Lines starting
2016-09-13 08:41:29 -07:00
Kubernetes Submit Queue
0ca6506850 Merge pull request #32250 from coufon/increase_qps
Automatic merge from submit-queue

Add node e2e density test using 60 QPS for benchmark

This PR adds a new benchmark node e2e density test which sets Kubelet API QPS limit from default 5 to 60, through ConfigMap. 

The latency caused by API QPS limit is as large as ~30% when creating a large batch of pods (e.g. 105). It makes the pod startup latency, as well creation throughput underestimated. This test helps us to know the real performance of Kubelet core.
2016-09-12 20:27:11 -07:00
Michael Taufen
28db03869b Fix memory eviction test parameters. Those parameters should NOT have come through in b9f0bd95 2016-09-12 12:01:53 -07:00
Zhou Fang
a6500cc74a change benchmark configration file to add QPS60 tests 2016-09-12 11:46:38 -07:00
Kubernetes Submit Queue
af325ee7bf Merge pull request #31797 from aveshagarwal/master-dapi-volume-tests-image-update
Automatic merge from submit-queue

Update container image version for downward api volume tests

Some tests were using 0.7, and some were using 0.6, so updating all to 0.7.
@kubernetes/rh-cluster-infra
2016-09-12 01:22:27 -07:00
Kubernetes Submit Queue
469698a803 Merge pull request #32169 from ixdy/node-e2e-flake
Automatic merge from submit-queue

Make error more useful when failing to list node e2e images

To help investigate https://github.com/kubernetes/kubernetes/issues/31694 if it happens again.
2016-09-11 05:07:00 -07:00
Kubernetes Submit Queue
51d996e5d7 Merge pull request #32003 from Random-Liu/change-docker-validation-config-file
Automatic merge from submit-queue

Automated Docker Validation: Change wrong name in perf config.

The config key `containervm-density*` is improper, remove it.

/cc @coufon
2016-09-10 17:58:23 -07:00
Kubernetes Submit Queue
09efe0457d Merge pull request #32163 from mtaufen/more-eviction-logging
Automatic merge from submit-queue

Log pressure condition, memory usage, events in memory eviction test

I want to log this to help us debug some of the latest memory eviction test flakes, where we are seeing burstable "fail" before the besteffort. I saw (in the logs) attempts by the eviction manager to evict besteffort a while before burstable phase changed to "Failed", but the besteffort's phase appeared to remain "Running". I want to see the pressure condition interleaved with the pod phases to get a sense of the eviction manager's knowledge vs. pod phase.
2016-09-09 18:37:55 -07:00
Michael Taufen
b9f0bd959e Log the following items in memory eviction test:
- memory working set
- pressure condition
- events for the default and test namespaces, after the test completes
2016-09-09 13:42:26 -07:00
Kubernetes Submit Queue
e317af87cc Merge pull request #31819 from mtaufen/plumb-feature-gates
Automatic merge from submit-queue

Plumb --feature-gates from TEST_ARGS to components in node e2e tests

This means you can set `TEST_ARGS` on the command line, in a `.properties` config for a Jenkins job, etc, to toggle gated features. For example:

`TEST_ARGS='--feature-gates=DynamicKubeletConfig=true'`

/cc @vishh @jlowdermilk
2016-09-09 12:31:00 -07:00
Zhou Fang
a1d2f43e0e add benchmark test using 60 QPS 2016-09-07 17:51:51 -07:00
Kubernetes Submit Queue
0bd0d5571a Merge pull request #31540 from mtaufen/DockerOrDieRename
Automatic merge from submit-queue

Rename ConnectToDockerOrDie to CreateDockerClientOrDie

This function does not actually attempt to connect to the docker daemon, it just creates a client object that can be used to do so later. The old name was confusing, as it implied that a failure to touch the docker daemon could cause program termination (rather than just a failure to create the client).
2016-09-07 15:27:41 -07:00
Jeff Grafton
1e0cbbf451 Make error more useful when failing to list node e2e images 2016-09-06 17:16:53 -07:00
Minhan Xia
1e88c99e3e bump cni 2016-09-06 10:48:36 -07:00
Kubernetes Submit Queue
2a7d0df30d Merge pull request #30727 from asalkeld/iptables-caps
Automatic merge from submit-queue

Clean up IPTables caps i.e.: sed -i "s/Iptables/IPTables/g"

Fixes #30651
2016-09-06 09:01:27 -07:00
Kubernetes Submit Queue
631fda19a1 Merge pull request #31769 from Random-Liu/wrong-log-permission-bit
Automatic merge from submit-queue

Node E2E: Fix wrong permission bit for log file.

When creating log for logs from journald, we use `0755` which is weird to me.
This PR changes it to `0666`.
2016-09-06 01:24:12 -07:00
Random-Liu
35cabc370e Fix wrong permission bit for log file. 2016-09-02 14:05:18 -07:00
Vishnu kannan
59e14cf55b Increase logging level for e2e node services
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-09-02 14:05:09 -07:00
Random-Liu
d36f7220da Change wrong name in perf config. 2016-09-02 13:48:10 -07:00
Kubernetes Submit Queue
0cdaf1028e Merge pull request #31912 from mtaufen/eviction-aftereach
Automatic merge from submit-queue

Move wait for pressure to subside to AfterEach 

so we still wait if the test part of the test for eviction order fails.
2016-09-02 13:15:36 -07:00
Zhou Fang
bb0a0c0fe9 increase resource usage limit in node e2e test 2016-09-02 09:12:40 -07:00
Avesh Agarwal
4ba39b4722 Update mounttest container image version to 0.7 everywhere. 2016-09-02 09:03:11 -04:00
Kubernetes Submit Queue
ca98482da9 Merge pull request #31945 from Random-Liu/prepull-common-test-image
Automatic merge from submit-queue

Add images used in common test into prepull image list.

Addresses https://github.com/kubernetes/kubernetes/issues/31774#issuecomment-243800830.

Fixes #31774, #31579.

This PR added all images in common test into the node e2e prepull list.
Mark P2 because this help get rid of image pulling flake.

@yujuhong
2016-09-02 01:36:55 -07:00
Random-Liu
76b35cf387 Add images used in common test into prepull image list. 2016-09-01 21:20:49 -07:00
Random-Liu
5420138378 Add automated docker performance validation. 2016-09-01 19:18:20 -07:00
Michael Taufen
3ad9a1e423 Move wait for pressure to subside to AfterEach so we still wait if the test for eviction order fails 2016-09-01 14:39:43 -07:00
Kubernetes Submit Queue
d3270bce71 Merge pull request #31566 from ronnielai/container-gc
Automatic merge from submit-queue

Avoid disk eviction node e2e test using up all the disk space
2016-09-01 14:19:25 -07:00
Amey Deshpande
6a2201f410 Pick a specific GCI version by default on GCE.
Prior to this change, a K8s branch (master as well as release) was
pinned to a GCI milestone.  It would pick up the latest GCI release on
that milestone at the time of cluster creation.  The rationale was the
K8s users would automatically get the bug fixes in newer versions of
GCI.  However in practice, it makes the runtime environment
non-deterministic, and lack of continuous e2e tests mean we would run
into breakages sooner or later.

With this change, each K8s release will pick a specific version
of GCI by default (similar to how the Debian-based container-vm gets used).
Users can override the default version through KUBE_GCE_MASTER_IMAGE and
KUBE_GCE_NODE_IMAGE environment variables.

We expect the default GCI version will be updated relatively frequently stay
updated with newer GCI releases.  We can also automate the process to
automatically bump the hard-coded GCI version in future.
2016-08-31 17:26:00 -07:00
Michael Taufen
af0a0c6367 Enable dynamic kubelet configuration for node e2e Jenkins serial tests
This commit enables the dynamic kubelet configuration feature for the
node e2e Jenkins serial tests, which is where the test for dynamic kubelet
configuration currently runs.
2016-08-31 13:54:57 -07:00
Michael Taufen
a40b2cbe10 feature-gates flag plumbing for node e2e tests
This gives the node e2e test binary a --feature-gates flag that populates a
FeatureGates field on the test context. The value of this field is forwarded
to the kubelet's --feature-gates flag and is also used to populate the global
DefaultFeatureGate object so that statically-linked components see the same
feature gate settings as provided via the flag.

This means that you can set feature gates via the TEST_ARGS environment
variable when running node e2e tests. For example:

TEST_ARGS='--feature-gates=DynamicKubeletConfig=true'
2016-08-31 13:52:08 -07:00
Michael Taufen
b3e9875fcc Wait before trying to start a new pod after the eviction test
This should stop the test from flaking while we figure out why there is
a mismatch between the reported pressure condition and the eviction
manager's decision to evict due to memory pressure.
2016-08-31 10:42:20 -07:00
Kubernetes Submit Queue
3b404bd213 Merge pull request #31651 from Random-Liu/move-host-info-around-test-result
Automatic merge from submit-queue

Node E2E: Move host info around test result.

Discussed offline with @yujuhong and @dchen1107. Currently, the node e2e result is organized as:
```
================================================================
Success Finished Host tmp-node-e2e-b6c375c7-e2e-node-containervm-v20160321-image Test Suite
{ginkgo-output}
{framework-error}
================================================================
```
This makes it painful to find which image the test is failing on. The `{ginkgo-output}` is usually quite long, so we have to scroll mouse up and down to find the host name.
This PR changes the test result to:
```
================================================================
Start Host tmp-node-e2e-b6c375c7-e2e-node-containervm-v20160321-image Test Suite
{ginkgo-output}
Success Finished Host tmp-node-e2e-b6c375c7-e2e-node-containervm-v20160321-image Test Suite
{framework-error}
================================================================
```
This is not perfect, but much better than before. We can easily find the host name under the ginkgo test result, like this:
```
================================================================
Start Host test-gci-dev-54-8743-3-0 Test Suite
Running Suite: E2eNode Suite
============================
Random Seed: 1472511489 - Will randomize all specs
Will run 0 of 131 specs

Running in parallel across 8 nodes

I0829 22:58:13.727764    1143 e2e_node_suite_test.go:98] Pre-pulling images so that they are cached for the tests.
I0829 22:58:28.562459    1143 e2e_node_suite_test.go:111] Node services started.  Running tests...
I0829 22:58:28.562477    1143 e2e_node_suite_test.go:116] Wait for the node to be ready

SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
I0829 22:58:29.742596    1143 e2e_node_suite_test.go:136] Stopping node services...
I0829 22:58:29.742650    1143 services.go:673] Killing process 1423 (services) with -TERM
I0829 22:58:29.860893    1143 e2e_node_suite_test.go:141] Tests Finished


Ran 0 of 131 Specs in 16.185 seconds
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 131 Skipped 

Ginkgo ran 1 suite in 19.939034297s
Test Suite Passed

Success Finished Host test-gci-dev-54-8743-3-0 Test Suite
================================================================
```

In a following PR, I'll print the test result from different images into different files to make it more clear for debugging. Mark v1.4 because this helps us de-flake test.

/cc @kubernetes/sig-node
2016-08-30 23:08:41 -07:00
Random-Liu
2907d4019d Move host info around test result. 2016-08-30 21:31:04 -07:00
Random-Liu
e7a1b4e16f Do not set stop-services=false for node e2e and add more logs. 2016-08-30 17:52:23 -07:00
Kubernetes Submit Queue
2b755dc480 Merge pull request #31716 from coufon/explicitly_delete_pods_in_node_perf_test
Automatic merge from submit-queue

Explicitly delete pods in node performance tests

This PR explicitly deletes all created pods at the end in node e2e performance related tests.

The large number of pods may cause namespace cleanup times out (in #30878), therefore we explicitly delete all pods for cleaning up.
2016-08-30 16:52:03 -07:00
Kubernetes Submit Queue
09e3fb355b Merge pull request #31629 from rmmh/fix-gubernator-line
Automatic merge from submit-queue

Only print "running gubernator.sh" when actually running it.
2016-08-30 16:10:18 -07:00
Zhou Fang
0167f74c6c explicitly delete pods in node perf tests 2016-08-30 15:18:37 -07:00
Kubernetes Submit Queue
12429e1690 Merge pull request #31664 from coufon/fix_perf_test_limit
Automatic merge from submit-queue

increase latency and resource limit accroding to test results

This PR increases the latency limit of node e2e density test according to previous test results.

Fixed #30878
2016-08-30 09:57:19 -07:00
Kubernetes Submit Queue
cca6d7ddd9 Merge pull request #31634 from timstclair/gubernator
Automatic merge from submit-queue

Cleanup node failure message

Fix missing newline
2016-08-30 00:53:15 -07:00
Kubernetes Submit Queue
4ae5770162 Merge pull request #31370 from mtaufen/dynkubetest-update
Automatic merge from submit-queue

Capitalize feature name in test for dynamic kubelet configuration
2016-08-30 00:53:10 -07:00
Kubernetes Submit Queue
22fe385c13 Merge pull request #31653 from euank/kill-the-updates-for-the-nth-time
Automatic merge from submit-queue

test/node-e2e: Update CoreOS update disabling

Previously in this saga... #25004

This disables update-engine and locksmithd with ignition instead of
cloud-init so that they're really totally 100% disabled. Our ignition guy promises.

Pretty much every way of disabling them with cloud-init is mildly racy.

Fixes #31633 

I think @vishh can say "I told you so" after the comment on https://github.com/kubernetes/kubernetes/pull/30023#discussion-diff-73431324 .. he was right, but it turns out "stop" there doesn't really work either because of the mess that is cloud-init. Fortunately, converting our cloud-init to json and calling it "ignition" works quite well 😄 

Testing done: I ssh'd in and verified that yes, they're disabled. I didn't wait on the e2e tests to pass, so we'll let this PR check that.
2016-08-30 00:08:45 -07:00