Automatic merge from submit-queue
Node E2E: Remove fatal error in e2e_node_suite_test.go
Addresses https://github.com/kubernetes/kubernetes/issues/30779#issuecomment-240532190.
Currently we run node e2e test in parallel, and ginkgo makes sure that we only initialize test framework in the first test node.
However, because we throw out some fatal error during the initialization. Once there is an fatal error, the first test node will die immediately without reporting any error, and the other nodes will exit because the first node is gone with meaningless error.
If kubelet start fails, we'll get something like:
```
------------------------------
Failure [132.485 seconds]
[BeforeSuite] BeforeSuite
/usr/local/google/home/lantaol/workspace/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:138
BeforeSuite on Node 1 failed
/usr/local/google/home/lantaol/workspace/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:138
------------------------------
......
------------------------------
Failure [132.465 seconds]
[BeforeSuite] BeforeSuite
/usr/local/google/home/lantaol/workspace/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:138
BeforeSuite on Node 1 failed
/usr/local/google/home/lantaol/workspace/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:138
```
This PR replaces these fatal errors with gomega assertion, with this PR, we'll get:
```
Failure [132.482 seconds]
[BeforeSuite] BeforeSuite
/usr/local/google/home/lantaol/workspace/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:138
should be able to start node services.
Expected success, but got an error:
<*errors.errorString | 0xc8203351b0>: {
s: "failed to run server start command \"/tmp/ginkgo869068712/e2e_node.test --run-services-mode --server-start-timeout 2m0s --report-dir --node-name lantaol0.mtv.corp.google.com --disable-kubenet=true --cgroups-per-qos=false --manifest-path /tmp/node-e2e-pod221291440 --eviction-hard memory.available<250Mi\": exit status 255",
}
failed to run server start command "/tmp/ginkgo869068712/e2e_node.test --run-services-mode --server-start-timeout 2m0s --report-dir --node-name lantaol0.mtv.corp.google.com --disable-kubenet=true --cgroups-per-qos=false --manifest-path /tmp/node-e2e-pod221291440 --eviction-hard memory.available<250Mi": exit status 255
/usr/local/google/home/lantaol/workspace/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:117
------------------------------
Failure [132.485 seconds]
[BeforeSuite] BeforeSuite
/usr/local/google/home/lantaol/workspace/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:138
BeforeSuite on Node 1 failed
/usr/local/google/home/lantaol/workspace/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:138
------------------------------
......
------------------------------
Failure [132.465 seconds]
[BeforeSuite] BeforeSuite
/usr/local/google/home/lantaol/workspace/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:138
BeforeSuite on Node 1 failed
/usr/local/google/home/lantaol/workspace/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:138
```
This is much more informative.
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Node E2E: Wait for node ready before the node e2e test started.
Fixes https://github.com/kubernetes/kubernetes/issues/30252.
This PR makes node e2e test wait for exactly one node ready before running other test.
@ronnielai @mtaufen
Automatic merge from submit-queue
Add benchmark to jenkins
This PR contains the following changes:
1. Add more tests in density benchmark test;
2. Add the peak value (100%) in latency and CPU usage statistic data;
3. Move the Ginkgo focus flag from e2e_remote.go to run_e2e.go;
4. Support running benchmark in run_e2e.go. The benchmark configuration file is an extension of image configuration. Each item requires additional GCE machine type (e.g. n1-standard-1, default value will be used if empty) and test names (Ginkgo focus regex strings). A test item is regarded as benchmark if the tests field is non-empty.
Automatic merge from submit-queue
Node Conformance Test: Move namespace controller to services
For #30122, #30174.
Based on #30116, #30198.
**Please only review the 3rd PR.**
This PR is part of our roadmap to package node conformance test.
The 1st commit is from #30116, which started e2e services in a separate process.
The 2nd commit is from #30198, it statically linked etcd into the node e2e framework.
The 3rd commit is new, it moved namespace controller into e2e services.
@dchen1107 @vishh
/cc @kubernetes/sig-node @kubernetes/sig-testing
Automatic merge from submit-queue
Unblock iterative development on pod-level cgroups
In order to allow forward progress on this feature, it takes the commits from #28017#29049 and then it globally disables the flag that allows these features to be exercised in the kubelet. The flag can be re-added to the kubelet when its actually ready.
/cc @vishh @dubstack @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
When running inside docker, activate service account ASAP
Also switching to just use `GOOGLE_APPLICATION_CREDENTIALS`, rather than both.
x-ref https://github.com/kubernetes/test-infra/issues/318
Automatic merge from submit-queue
Wait for memory to be reclaimed after node_e2e MemoryEviction test
This helps prevent interference with other tests that run immediately after the MemoryEviction test.
/cc @Random-Liu @coufon
Automatic merge from submit-queue
Gubernator bug fixes: mv and GCS bucket permissions
Fixed issue where results file was not moved correctly, and also the permissions issue with the GCS bucket.
Will rebase after #30414 is merged
@timstclair
Automatic merge from submit-queue
Add logging time series to benchmark test
This PR adds a new file benchmark_util.go which contains tool functions for benchmark (we can migrate benchmark related functions into it).
The PR logs time series data for density benchmark test.
Automatic merge from submit-queue
Node E2E: Make readiness check handling process exits with 0 exit code.
As is mentioned by @mtaufen:
"there is a problem with the way service `start` is currently implemented in test/e2e_node/e2e_service.go. If the Kubelet exits with status 0 before the health check completes, cmdErrorChan will be closed and, as a result, nil will be read from that channel, and you will return a nil error from `start`."
This PR changes the logic to:
1) If the err channel returns an error, return the error
2) If the err channel returns a nil, ignore it and continue checking readiness.
3) If the err channel is closed before readiness check succeeds, replace it with `blockCh` and continue checking readiness.
@mtaufen
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Add GUBERNATOR flag which produces g8r link for node e2e tests
When you run 'make tests-e2e-node REMOTE=true GUBERNATOR=true' outputs a URL to view the test results on Gubernator. ~~Should work after my PR for Gubernator is merged.~~
@timstclair
Automatic merge from submit-queue
Add tag [benchmark] to node-e2e-test where performance limits are not verified
This PR adds a new tag "[benchmark]" to density and resource-usage node e2e test. The performance limits will not be verified at the end of benchmark tests.
Automatic merge from submit-queue
Run CI Jenkins node e2e tests in project k8s-jkns-ci-node-e2e
Fixes#27648.
If node VMs leak, they should only harm themselves, not the rest of Jenkins.
This also lets us do VM cleanup without worrying that we might accidentally delete important Jenkins VMs.
The `k8s-jkns-ci-node-e2e` should have the right ACLs in place already. The quota is at defaults, but I don't think we'll need to increase it at this point.
Automatic merge from submit-queue
Update core etcd references to use 3.0.4
This updates the core references to use 3.0.4.
There are still legacy references in the code base that should be cleaned, or just removed but I'm reluctant to purge.
/cc @kubernetes/sig-scalability
Automatic merge from submit-queue
Add Time Series Data and Labels in Node density test
This pull requests contain:
1. Increase the pod creation latency limit according to test results;
2. Add 'GetResourceSeriesWithLabels' in 'resource_collector.go' to provide resource usage time series data;
3. Modify 'GetBasicCPUStats' in 'resource_collector.go' to make a copy of CPU usage array before sorting (otherwise time series data is disordered);
4. Add 'ResourceUsageToPerfDataWithLabels' and 'CPUUsageToPerfDataWithLabels' to attach labels to 'PerfData' for benchmark dashboard;
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/30333)
<!-- Reviewable:end -->