Automatic merge from submit-queue
Add test/test_owners.csv, for automatic assignment of test failures.
This file will be read by the munger -- see kubernetes/contrib#1264
This also includes a simple script to do minor automatic updates to the CSV.
I'd like to get `update_owners.py` into a more usable state -- right now the CSV is based directly on the Google Sheets data. It has 9 outdated tests and is missing 80 new tests.
I can randomly assign new tests to people on kubernetes-maintainers, but are there any caveats to how the assignment should work? Should they be load balanced? Should some people in the group not receive issues? Etc.
Automatic merge from submit-queue
Fix node e2e issues on selinux enabled systems
It fixes following 3 node e2es:
```
[Fail] [k8s.io] Container Runtime Conformance Test container runtime conformance blackbox test when starting a container that exits [It] it should run with the expected status [Conformance]
/root/upstream-code/gocode/src/k8s.io/kubernetes/test/e2e_node/runtime_conformance_test.go:114
[Fail] [k8s.io] Kubelet metrics api when querying /stats/summary [It] it should report resource usage through the stats api
/root/upstream-code/gocode/src/k8s.io/kubernetes/test/e2e_node/kubelet_test.go:158
```
```
[Fail] [k8s.io] Container Runtime Conformance Test container runtime conformance blackbox test when starting a container that exits [It] should report termination message if TerminationMessagePath is set [Conformance]
/root/upstream-code/gocode/src/k8s.io/kubernetes/test/e2e_node/runtime_conformance_test.go:150
```
@kubernetes/rh-cluster-infra
Automatic merge from submit-queue
e2e: increase timeout when waiting for deployment pods to be deleted
Use the same timeout as the one used for waiting for the deployment
reaper to complete.
Takes a stab at https://github.com/kubernetes/kubernetes/issues/28067
@kubernetes/deployment PTAL
Automatic merge from submit-queue
Reorganize volume controllers and manager
* Move both PV and attach/detach volume controllers to `controllers/volume` (closes#26222)
* Rename `kubelet/volume` to `kubelet/volumemanager`
* Add/update OWNER files
Automatic merge from submit-queue
Add MinReadySeconds to rolling updater
Add MinReadySeconds support to RollingUpdater that allows to specify the number of seconds to wait on top of the pod is "ready" because its readiness probe passed.
Automatic merge from submit-queue
Fix node confomance test
Fixes https://github.com/kubernetes/kubernetes/issues/28255, https://github.com/kubernetes/kubernetes/issues/28250, https://github.com/kubernetes/kubernetes/issues/28341.
The main reason of the flake is that in the failed test expects the `PodPhase` to keep `Pending`. It did `Eventually` check and `Consistently` check for 5 seconds. However, the default `PodPhase` is `Pending`, when the check passes, the `PodStatus` could still be in default state.
After that, the test expects the container status to be `Waiting`, which may not be the case, because the default `ContainerStatuses` is empty, and the pod could still be in the default state.
This PR changes the test to ensure `ContainerStatuses` first and then check the `PodPhase` after that.
Mark P1 because the test fails relatively frequently and does block some PRs.
@pwittrock
/cc @liangchenye @ncdc
[]()
Automatic merge from submit-queue
Use slices of items to clean up after tests
Fixes#27582.
We used to maintain a pointer variable for each process to kill after the
tests finish. @lavalamp suggested using a slice instead, which is a much
cleaner solution. This implements @lavalamp's suggestion and also extends
the idea to tracking directories that need to be removed after the tests finish.
This also means that we should no longer check for nil `killCmd`s inside
`func (k *killCmd) Kill() error {...}` (see #27582 and #27589). If a nil
`killCmd` makes it in there, something is bad elsewhere and we want to see
the nil pointer exception immediately.
Mentioning @timstclair and @euank wrt the original issue/PR.
Automatic merge from submit-queue
Federated Services e2e: Simplify logic and logging around verificatio…
Simplify logic and logging around verification of underlying services.
Fixes#28269.
Without this PR, service verification in 4 of our e2e tests sometimes fails.
[Fail] [k8s.io] Kubelet metrics api when querying /stats/summary [It] it should report resource usage through the stats api
/root/upstream-code/gocode/src/k8s.io/kubernetes/test/e2e_node/kubelet_test.go:158
[Fail] [k8s.io] Container Runtime Conformance Test container runtime conformance blackbox test when starting a container that exits [It] should report termination message if TerminationMessagePath is set [Conformance]
/root/upstream-code/gocode/src/k8s.io/kubernetes/test/e2e_node/runtime_conformance_test.go:150
[Fail] [k8s.io] Container Runtime Conformance Test container runtime conformance blackbox test when starting a container that exits [It] it should run with the expected status [Conformance]
/root/upstream-code/gocode/src/k8s.io/kubernetes/test/e2e_node/runtime_conformance_test.go:114
Fixes#27582
We used to maintain a pointer variable for each process to kill after the
tests finish. @lavalamp suggested using a slice instead, which is a much
cleaner solution. This implements @lavalamp's suggestion and also extends
the idea to tracking directories that need to be removed after the tests finish.
This also means that we should no longer check for nil `killCmd`s inside
`func (k *killCmd) Kill() error {...}` (see #27582 and #27589). If a nil
`killCmd` makes it in there, something is bad elsewhere and we want to see
the nil pointer exception immediately.
Automatic merge from submit-queue
Remove duplicated nginx image. Use nginx-slim instead
This PR removes the image `gcr.io/google_containers/nginx:1.7.9` and uses `gcr.io/google_containers/nginx-slim:0.7`.
Besides removing the duplication `1.7.9` is 16 months old.
Automatic merge from submit-queue
Fix federation e2e tests by correctly managing cluster clients
1. The main fix: Correct overall BeforeEach() to create a new set of cluster clients, rather than just append to the set created by all previous tests. This was screwing up a lot of stuff in difficult to diagnose ways.
2. Add lots of debug logging.
3. Be better about cleaning up after each test.
```
SUCCESS! -- 6 Passed | 0 Failed :-)
```
cc @nikhiljindal @madhusudancs @mfanjie @colhom FYI
Automatic merge from submit-queue
Add two pd tests with default grace period
Add two tests in pd.go. They are same as the flaky test, but the pod deletion has default grace period
Automatic merge from submit-queue
Refactored, expanded and fixed federated-services e2e tests.
1. Moved BeforeEach() and AfterEach() to an inner scope, to prevent clashes with Framework's BeforeEach() and AfterEach(). Morte to come on this, as it's a major bug in our use of Ginkgo, and affects many other tests.
2. Keep track of which clusters we have created namespaces in, so that we don't try to delete namespaces out of clusters that we didn't create them in (e.g. the primary cluster, where the framework already creates and deleted the required namespace).
3. Separate tests for federated service creation and verification that underlying services are created correctly.
4. For DNS resolution tests, create backend pods (and delete on cleanup) where required).
5. For non-local DNS resolution, delete a backend pod in one cluster to test, and in the remainder of clusters on cleanup.
6. Lots of refactoring to make code re-usable across multiple test.
7. Lots of debugging/fixing to make sure that everything that the testscreate are cleaned up properly afterwards, and don't clash with the cleanups done by the e2e Framework.
Automatic merge from submit-queue
TLS bootstrap API group (alpha)
This PR only covers the new types and related client/storage code- the vast majority of the line count is codegen. The implementation differs slightly from the current proposal document based on discussions in design thread (#20439). The controller logic and kubelet support mentioned in the proposal are forthcoming in separate requests.
I submit that #18762 ("Creating a new API group is really hard") is, if anything, understating it. I've tried to structure the commits to illustrate the process.
@mikedanese @erictune @smarterclayton @deads2k
```release-note-experimental
An alpha implementation of the the TLS bootstrap API described in docs/proposals/kubelet-tls-bootstrap.md.
```
[]()
Automatic merge from submit-queue
Add EndpointReconcilerConfig to master Config
Add EndpointReconcilerConfig to master Config to allow downstream integrators to customize the reconciler and reconciliation interval when starting a customized master
@kubernetes/sig-api-machinery @deads2k @smarterclayton @liggitt @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Skip multi-zone e2e tests unless provider is GCE, GKE or AWS
No need to fail the tests. If label is not present then it means that node is not in any zone.
Related issue: #27372
Automatic merge from submit-queue
Convert service account token controller to use a work queue
Converts the service account token controller to use a work queue. This allows parallelization of token generation (useful when there are several simultaneous namespaces or service accounts being created). It also lets us requeue failures to be retried sooned than the next sync period (which can be very long).
Fixes an issue seen when a namespace is created with secrets quotaed, and the token controller tries to create a token secret prior to the quota status having been initialized. In that case, the secret is rejected at admission, and the token controller wasn't retrying until the resync period.
Automatic merge from submit-queue
Mark "RW PD, remove it, then schedule" test flaky
Mark test as flaky while it is being investigated. Tracked by https://github.com/kubernetes/kubernetes/issues/27691
Assigning to @jlowdermilk since he's on call
Add EndpointReconcilerConfig to master Config to allow downstream integrators to customize the reconciler
and reconciliation interval when starting a customized master.
Automatic merge from submit-queue
e2e: Allow skipping tests for specific runtimes, skip a few tests under rkt
The main benefit of this is that it gives a developer more useful output (more signal to noise) for things that are known broken on that runtime.
cc @kubernetes/rktnetes-maintainers , @ixdy
I'll run this PR through our jenkins and make sure things look happy and compare to the e2e results for this PR.
Automatic merge from submit-queue
[Refactor] QOS to have QOS Class type for QoS classes
This PR adds a QOSClass type and initializes QOSclass constants for the three QoS classes.
It would be good to use this in all future QOS related features.
This would be good to have for the (Pod level cgroups isolation proposal)[https://github.com/kubernetes/kubernetes/pull/26751] that i am working on aswell.
@vishh PTAL
Signed-off-by: Buddha Prakash <buddhap@google.com>
Automatic merge from submit-queue
e2e.framework.util.StartPods: panic if the number or replicas is zero
The number of pods to start must be non-zero.
Otherwise the function waits for pods forever if ``waitForRunning`` is true.
It the number of replicas is zero, panic so the mistake is heard all over the e2e realm.
Update all callers of StartPods to test for non-zero number of replicas.
Automatic merge from submit-queue
Set grace period to 0 when deleting namespaces after the test.
Otherwise, we try to run the next test and the pods are still there.
Automatic merge from submit-queue
Proportionally scale paused and rolling deployments
Enable paused and rolling deployments to be proportionally scaled.
Also have cleanup policy work for paused deployments.
Fixes#20853Fixes#20966Fixes#20754
@bgrant0607 @janetkuo @ironcladlou @nikhiljindal
<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/20273)
<!-- Reviewable:end -->
Automatic merge from submit-queue
e2e: Delete old code
These tests were added commented out over a year ago. Now they don't compile. The port forward test has a whole file devoted to replacing it (`e2e/portforward.go`) and while the exec test doesn't have a perfect replacement, it has several tests that cover for it (exec over a websocket, an e2e_node test, all the kubectl execs). If we want that test, it would be better to write it fresh anyways.
cc @ncdc
Automatic merge from submit-queue
Use gcloud for default node pool and api for other in cluster autoscaler e2e test
cc: @piosz @jszczepkowski @fgrzadkowski
Currently there is a problem with gcloud when non-default pool is used for cluster update. So we temporarily switch to the old ca-enable method for non-default pools until it is fixed.
Automatic merge from submit-queue
A few changes to federated-service e2e test.
Most of the changes that get the test to pass have been made already or
elsewhere. Here we restructure a bit fixing a nesting problem, extend the
timeouts, and start creating distinct backend pods that I'll delete in the
non-local test (coming shortly).
Also some extra debugging info in the DNS code. I made some upstream
changes to skydns in https://github.com/skynetservices/skydns/pull/283
For #27739
Includes a commit from @madhusudancs that I will remove once his merges.
Automatic merge from submit-queue
e2e_node: lower the log verbosity level
The current level is so high that the logs are almost unreadable.
This fixes#27593
Most of the changes that get the test to pass have been made already or
elsewhere. Here we restructure a bit fixing a nesting problem, extend
the timeouts, and start creating distinct backend pods that I'll delete
in the non-local test (coming shortly).
Also some extra debugging info in the DNS code. I made some upstream
changes to skydns in https://github.com/skynetservices/skydns/pull/283
Automatic merge from submit-queue
Fixes a node e2e test error
Fixes following node e2e test error:
[k8s.io] Kubelet metrics api when querying /stats/summary [It] it should report resource usage through the stats api
And the logs show following error:
```
Jun 21 15:57:13 localhost journal: tee: /test-empty-dir-mnt: Is a directory
```
And the test fails with:
```
------------------------------
• Failure [310.665 seconds]
[k8s.io] Kubelet
/root/upstream-code/gocode/src/k8s.io/kubernetes/test/e2e/framework/framework.go:685
metrics api
/root/upstream-code/gocode/src/k8s.io/kubernetes/test/e2e_node/kubelet_test.go:161
when querying /stats/summary
/root/upstream-code/gocode/src/k8s.io/kubernetes/test/e2e_node/kubelet_test.go:160
it should report resource usage through the stats api [It]
/root/upstream-code/gocode/src/k8s.io/kubernetes/test/e2e_node/kubelet_test.go:159
Timed out after 300.000s.
Expected
<*errors.errorString | 0xc82026b6f0>: {
s: "expected \"volume used\" to not be zero",
}
to be nil
/root/upstream-code/gocode/src/k8s.io/kubernetes/test/e2e_node/kubelet_test.go:158
------------------------------
```
@kubernetes/rh-cluster-infra
Automatic merge from submit-queue
increase addon check interval
Do static pods have a crash loop back off? If so, this test would be much faster if we restarted the kubelet to clear that.
Fixes#26770
Automatic merge from submit-queue
Add integration test for binding PVs using label selectors
Adds an integration test for persistent volume claim 'MatchExpressions' label selector.
Automatic merge from submit-queue
Fix 7 broken example e2e tests
Fixes#27325, Fixes#27727
7 broken example e2e tests:
- [x] Spark
* `namespace` is specified in example yaml files which conflict with e2e test namespaces, fixed by removing the namespace in yaml (the yaml files of [spark example](https://github.com/kubernetes/kubernetes/tree/master/examples/spark) doesn't need the namespace specified since it's specified in its context) -- cc @k82 who added namespace to Spark example in #23807
* wait for pods to exist before determining if it's running
- [x] Hazelcast
* wait for pods to exist before determining if it's running
- [x] Redis
* image `kubernetes/redis:v2` is not found, changed to `kubernetes/redis:v1` instead
* wait for pods to exist before determining if it's running
- [x] Celery-RabbitMQ
* remove 1 redundant call to `forEachPod`
* wait for pods to exist before determining if it's running
- [x] Cassandra
* fix `kubectl exec` on incorrect pod name
* fix getting endpoint ip addresses before creating pods
* wait for pods to exist before determining if it's running
- [x] Storm
* wait for pods to exist before determining if it's running
- [x] RethinkDB
* wait for pods to exist before determining if it's running
[]()
[k8s.io] Kubelet metrics api when querying /stats/summary [It] it should report resource usage through the stats api
And the logs show following error:
Jun 21 15:57:13 localhost journal: tee: /test-empty-dir-mnt: Is a directory
Automatic merge from submit-queue
Reapply ScheduledJob tests (2ab885a53a)
Re-applied the ScheduledJob tests (#25737) which were reverted due to an integration test error in #27184.
The problem was in `TestBatchGroupBackwardCompatibility` which is testing backwards compatibility for storing jobs (`extensions/v1beta1` vs `batch/v1`), which is not needed for `batch/v2alpha1`. I've added a skip to aforementioned test for that group. See `test/integration/master_test.go` for the actual fix.
@caesarxuchao @mikedanese ptal
@piosz @jszczepkowski @erictune fyi
[]()
Automatic merge from submit-queue
GCE provider: Limit Filter calls to regexps rather than insane blobs
Filters can't exceed 4k, and GET requests against the GCE API are also limited, so these break down in different ways at different cluster counts. Fix it by introducing an advisory `node-instance-prefix` configuration in the GCE provider that can hint the `EnsureLoadBalancer`/`UpdateLoadBalancer code` (and the firewall creation/update code). If it's not there, or wrong (a hostname that's registered violates it), just ignore it and grab the whole project.
Fixes#27731
[]()
Filters can't exceed 4k, and GET requests against the GCE API are also
limited, so these break down in different ways at different cluster
counts. Fix it by introducing an advisory node-instance-prefix
configuration in the GCE provider that can hint the
EnsureLoadBalancer/UpdateLoadBalancer code (and the firewall
creation/update code). If it's not there, or wrong (a hostname that's
registered violates it), just ignore it and grab the whole project.