Automatic merge from submit-queue (batch tested with PRs 36419, 38330, 37718, 38244, 38375)
Guarantees drop packets commands succeed in reboot test
Fixes the main case in #33405 and #36230.
Previous attempted fix in #38057.
During the reboot test, the iptables command that was supposed to take the node offline failed to exec.
Turned out the xtables lock was holding by other processes led to this failure. Logs as below:
```
I1202 20:00:29.686] Dec 2 20:00:29.685: INFO: ssh jenkins@146.148.111.167:22: stdout:
"+ sleep 10
+ sudo iptables -I INPUT 1 -s 127.0.0.1 -j ACCEPT
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?"
I1202 20:00:29.686] Dec 2 20:00:29.685: INFO: ssh jenkins@146.148.111.167:22: stderr: ""
I1202 20:00:29.686] Dec 2 20:00:29.685: INFO: ssh jenkins@146.148.111.167:22: exit code: 0
```
This reboot test won't pass if any one of these iptables commands fails. This PR put "reboot" commands into while loops to guarantee it retries until succeed.
`sudo iptables -t filter -nL` is removed since it is clear now that the `FILTER` rules won't be clobbered.
(Tests passed on local cluster.)
@bprashanth
Automatic merge from submit-queue (batch tested with PRs 36071, 32752, 37998, 38350, 38401)
Pass addressable values to DeepCopy
Extracted from https://github.com/kubernetes/kubernetes/pull/35728
These are the places we are currently calling DeepCopy incorrectly, and we need to fix, even if we don't pick up the changes to DeepCopy in #35728:
* creating a new cloner means we have no generated functions registered
* passing non-addressable values doesn't pick up generated deep copy functions, and forces us into reflective mode
Automatic merge from submit-queue (batch tested with PRs 37092, 37850)
Turns on dns horizontal scaling tests for GKE
Seems like the dns-autoscaler is already enabled in [this recent gke build](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gke/769/).
Turning on the corresponding e2e tests to increase test coverage.
Probably better to wait for this fix#37261 to go in first.
@bowei @bprashanth
cc @maisem @roberthbailey
Automatic merge from submit-queue (batch tested with PRs 37325, 38313, 38141, 38321, 38333)
Cleanup firewalls, add nginx ingress to presubmit
Make the firewall cleanup code follow the same pattern as the other cleanup functions, and add the nginx ingress e2e to presubmit.
Planning to watch the test for a bit, and if it works alright, I'll add the other Ingress e2e to post-submit merge blocker.
Automatic merge from submit-queue (batch tested with PRs 37325, 38313, 38141, 38321, 38333)
Fix running e2e with 'Completed' kube-system pods
As of now, e2e runner keeps waiting for pods in `kube-system` namespace to be "Running and Ready" if there are any pods in `Completed` state in that namespace.
This for example happens after following [Kubernetes Hosted Installation](http://docs.projectcalico.org/v2.0/getting-started/kubernetes/installation/#kubernetes-hosted-installation) instructions for Calico, making it impossible to run conformance tests against the cluster. It's also to possible to reproduce the problem like that:
```
$ cat testjob.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: tst
namespace: kube-system
spec:
template:
metadata:
name: tst
spec:
containers:
- name: tst
image: busybox
command: ["echo", "test"]
restartPolicy: Never
$ kubectl create -f testjob.yaml
$ go run hack/e2e.go -v --test --test_args='--ginkgo.focus=existing\s+RC'
```
Automatic merge from submit-queue
Delete regional static-ip instead of global for type=lb
Global vs region is the difference between
```
$ gcloud compute addresses delete foo --global
$ gcloud compute addresses delete foo --region us-central1
```
Type=LoadBalancer users the second type and were were doing the first.
Also adds some logging.
Automatic merge from submit-queue (batch tested with PRs 38294, 37009, 36778, 38130, 37835)
fix permissions when using fsGroup
Currently, when an fsGroup is specified, the permissions of the defaultMode are not respected and all files created by the atomic writer have mode 777. This is because in `SetVolumeOwnership()` the `filepath.Walk` includes the symlinks created by the atomic writer. The symlinks have mode 777 when read from `info.Mode()`. However, when the are chmod'ed later, the chmod applies to the file the symlink points to, not the symlink itself, resulting in the wrong mode for the underlying file.
This PR skips chmod/chown for symlinks in the walk since those operations are carried out on the underlying file which will be included elsewhere in the walk.
xref https://bugzilla.redhat.com/show_bug.cgi?id=1384458
@derekwaynecarr @pmorie
Automatic merge from submit-queue (batch tested with PRs 38173, 38151, 38197, 38221)
test: wait for ready replica set before adopting
Reworked version of https://github.com/kubernetes/kubernetes/pull/36439 which was reverted in https://github.com/kubernetes/kubernetes/pull/38049. This PR doesn't use any of the new status API added in replica sets so it should cause no trouble with upgrade tests.
@kubernetes/deployment @smarterclayton
Automatic merge from submit-queue (batch tested with PRs 37032, 38119, 38186, 38200, 38139)
New ns param for NewClusterVerification
**What this PR does / why we need it**: Allows the test to specify alternate namespaces to when waiting for pods to be in a specific state.
**Which issue this PR fixes**: fixes#38138
**Special notes for your reviewer**: Minor fix
**Release note**: None
Automatic merge from submit-queue
[Federation] Separate the cleanup phases of service and service shards so that service shards can be cleaned up even after the service is deleted elsewhere.
Fixes Federated Service e2e test.
This separation is necessary because "Federated Service DNS should be
able to discover a federated service" e2e test recently added a case
where it deletes the service from federation but not the shards from
the underlying clusters.
Because of the way cleanup was implemented in the AfterEach block
currently, we did not cleanup any of the underlying shards. However,
separating the two phases of the cleanup needs this separation.
cc @kubernetes/sig-cluster-federation @nikhiljindal
Automatic merge from submit-queue (batch tested with PRs 37328, 38102, 37261, 31321, 38146)
Fixes flake: wait for dns pods terminating after test completed
From #37194. Based on #36600. Please only look at the second commit.
As mentioned in [comment](https://github.com/kubernetes/kubernetes/issues/37194#issuecomment-262007174), "DNS horizontal autoscaling" test does not wait for the additional pods to be terminated and this may lead to the failure of later tests.
This fix adds a wait loop at the end of the serial test to ensure the cluster recovers to the original state. In the non-serial test it does not wait for the additional pods terminating because it will not affect other tests, given they are able to be run simultaneously. Plus wait for pods terminating will take certain amount of time.
Note this only fixes certain case of #37194. I noticed there are other failures irrelevant to dns autoscaler. LIke [this one](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-serial/34/).
@bprashanth @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 37328, 38102, 37261, 31321, 38146)
Make thirdparty codec able to decode DeleteOptions
Fix#37278.
Without this PR, the gvk sent to the delegated codec will be the thirdparty one, which is not recognized by the delegated codec (usually api.Codecs).
Automatic merge from submit-queue (batch tested with PRs 38076, 38137, 36882, 37634, 37558)
Make logging for gcl e2e test more verbose
To help debug https://github.com/kubernetes/kubernetes/issues/37241
CC @piosz
Automatic merge from submit-queue (batch tested with PRs 36352, 36538, 37976, 36374)
test: update deployment helper to return better error messages
@kubernetes/deployment the problem with https://github.com/kubernetes/kubernetes/issues/36270 is that the selector key is never added in the deployment but this change would make it clearer.
This separation is necessary because "Federated Service DNS should be
able to discover a federated service" e2e test recently added a case
where it deletes the service from federation but not the shards from
the underlying clusters.
Because of the way cleanup was implemented in the AfterEach block
currently, we did not cleanup any of the underlying shards. However,
separating the two phases of the cleanup needs this separation.
Automatic merge from submit-queue (batch tested with PRs 38049, 37823, 38000, 36646)
Revert "test: update rollover test to wait for available rs before adopting"
This reverts commit 5b7bf78f3f from pr #36439 which appears to have mostly broken the gci-gke test.
Automatic merge from submit-queue (batch tested with PRs 37094, 37663, 37442, 37808, 37826)
Moved gobindata, refactored ReadOrDie refs
**What this PR does / why we need it**: Having gobindata inside of test/e2e/framework prevents external projects from importing the framework. Moving it out and managing refs fixes this problem.
**Which issue this PR fixes**: fixes#37007
Automatic merge from submit-queue (batch tested with PRs 37997, 37939, 37990, 36700, 37258)
Add cluster-level AppArmor E2E test
My goal is to reuse this test for an automated cluster upgrade test.
Automatic merge from submit-queue
test: update rollover test to wait for available rs before adopting
Scenario that happened in https://github.com/kubernetes/kubernetes/issues/35355#issuecomment-257808460
-- Replica set that is about to be adopted has 2 out of 4 ready replicas
-- Deployment is created with 4 replicas, adopts pre-existing replica set, creates a new one, and starts rolling replicas over to the new replica set.
```
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:04 -0700 PDT - event for test-rollover-deployment: {deployment-controller } ScalingReplicaSet: Scaled down replica set test-rollover-controller to 3
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:04 -0700 PDT - event for test-rollover-deployment: {deployment-controller } ScalingReplicaSet: Scaled up replica set test-rollover-deployment-2505289747 to 1
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:04 -0700 PDT - event for test-rollover-deployment-2505289747: {replicaset-controller } SuccessfulCreate: Created pod: test-rollover-deployment-2505289747-iuiei
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:04 -0700 PDT - event for test-rollover-deployment-2505289747-iuiei: {default-scheduler } Scheduled: Successfully assigned test-rollover-deployment-2505289747-iuiei to gke-jenkins-e2e-default-pool-33c0400e-6q5m
Nov 2 01:38:17.088: INFO: At 2016-11-02 01:38:05 -0700 PDT - event for test-rollover-deployment: {deployment-controller } ScalingReplicaSet: Scaled up replica set test-rollover-deployment-2505289747 to 2
```
At this point there is no minimum availability for the Deployment (maxUnavailable is 1 meaning desired minimum available is 3 but we only have 2), and the new replica set uses a non-existent image. New replica set is scaled up to 1 (maxSurge is 1), then old replica set is scaled down by one, because cleanupUnhealthyReplicas observes that it has 2 unhealthy replicas - it can only scale down one though because the [maximum replicas it can cleanup is one](d87dfa2723/pkg/controller/deployment/rolling.go (L125)) (4+1-3-1). New replica set is scaled to 2. Available replicas are still 2 (third replica from the old replica set has yet to come up).
-- Deployment is rolled over with a new update. Test reaches for the WaitForDeploymentStatus check but there are only 2 availableReplicas (maxUnavailable is still violated).
This change makes the test wait for a healthy replica set before proceeding thus it should never hit the scenario described above.
@kubernetes/deployment
- Remaining spaghetti untangled
- Missed bazel update and a few hardcoded refs
- New instance of framework.ReadOrDie reference removed post rebase
- Resolve new clientset rebase
- Fixed e2e/generated BUILD dep
- A space
- Missed gobindata ref in golang.sh
Automatic merge from submit-queue
Build vendored copy of go-bindata and use that in go generate step
**What this PR does / why we need it**: as the title says, uses the vendored version of `go-bindata` rather than expecting developers to `go get` it (when building outside docker).
**Which issue this PR fixes**: fixes#34067, partially addresses #36655
**Special notes for your reviewer**: we still call `go generate` far too many times:
```console
~/.../src/k8s.io/kubernetes $ which go-bindata
~/.../src/k8s.io/kubernetes $ make
+++ [1116 17:35:28] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:29] Generating bindata:
test/e2e/framework/gobindata_util.go
+++ [1116 17:35:30] Building go targets for linux/amd64:
cmd/libs/go2idl/deepcopy-gen
+++ [1116 17:35:35] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:35] Generating bindata:
test/e2e/framework/gobindata_util.go
+++ [1116 17:35:36] Building go targets for linux/amd64:
cmd/libs/go2idl/defaulter-gen
+++ [1116 17:35:41] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:41] Generating bindata:
test/e2e/framework/gobindata_util.go
+++ [1116 17:35:42] Building go targets for linux/amd64:
cmd/libs/go2idl/conversion-gen
+++ [1116 17:35:47] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:47] Generating bindata:
test/e2e/framework/gobindata_util.go
+++ [1116 17:35:48] Building go targets for linux/amd64:
cmd/libs/go2idl/openapi-gen
+++ [1116 17:35:56] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
k8s.io/kubernetes/vendor/github.com/jteeuwen/go-bindata/go-bindata
+++ [1116 17:35:56] Generating bindata:
test/e2e/framework/gobindata_util.go
```
Fixing that is a separate effort, though.
cc @sebgoa @ZhangBanger
Automatic merge from submit-queue
Cleanup old cloud resources after 48 hours
With this pr the ingress e2e purges old leaked resources (>48h), so even if tests fail due to leaks, the entire queue won't close till someone bumps up quota through a manual request.
Automatic merge from submit-queue
Adds termination hook in reboot test for debugging
From #33405 and #36230.
Logs the SSH command issued for dropping inbound / outbound traffic to file and dump it out when test ends.
The first `sudo iptables -t filter -nL` is called to confirm the rules are injected. The second `sudo iptables -t filter -nL` is to check whether the rules get clobbered. Adds `date` in between to check time frame.
@bprashanth @freehan
Automatic merge from submit-queue
Update Stateful Set example files for 1.5
1. Remove initialized annotation from statefulset examples
2. Update storage class annotation to beta in statefulset examples
3. Remove alpha limitation on PetSet in cassandra example
cc @erictune @foxish @kow3ns @enisoc @chrislovecnm @kubernetes/sig-apps
```release-note
NONE
```
Automatic merge from submit-queue
Fix package aliases to follow golang convention
Some package aliases are not not align with golang convention https://blog.golang.org/package-names. This PR fixes them. Also adds a verify script and presubmit checks.
Fixes#35070.
cc/ @timstclair @Random-Liu
Automatic merge from submit-queue
Revision handling in federated deployment controller
Deployment controller in regular kubernetes automatically adds an annotation in deployment. This causes a bit of confusion in controller and tests. This PR skips revision annotation in checks. In the next K8S release we will need to have better support for deployment revisions.
Helps with #36588
cc: @nikhiljindal @madhusudancs
Automatic merge from submit-queue
Stop deleting underlying services when federation service is deleted
Fixes https://github.com/kubernetes/kubernetes/issues/36799
Fixing federation service controller to not delete services from underlying clusters when federated service is deleted.
None of the federation controller should do this unless explicitly asked by the user using DeleteOptions. This is the only federation controller that does that.
cc @kubernetes/sig-cluster-federation @madhusudancs
```release-note
federation service controller: stop deleting services from underlying clusters when federated service is deleted.
```
Automatic merge from submit-queue
Skip rather than fail networking tests on single node
**What this PR does / why we need it**:
Needed for the general e2e tidying we need to do for flakey slow tests, imo pre 1.5, see #31402 and so on.
**Which issue this PR fixes** *
Dont fail multinode tests if on a single node cluster, skip instead.
Automatic merge from submit-queue
Fix nil pointer dereference in test framework
Checking the `result.Code` prior to `err` in the if statement causes a panic if result is `nil`. It turns out the formatting of the error is already in `IssueSSHCommandWithResult`, so removing redundant code is enough to fix the issue. Logging the SSH result was also redundant, so I removed that as well.
Automatic merge from submit-queue
Fixes dns autoscaling test flakes
Fixes #36457 and fixes#36569.
#36457 is flake due to the 10 minutes timeout for scaling down cluster. Changes to use `scaleDownTimeout` from [test/e2e/cluster_size_autoscaling.go](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/cluster_size_autoscaling.go), which is 15 minutes.
The failure in #36569 is because we get the schedulable nodes number at the beginning of the test and assume it will not change unless we manually change the cluster size. But below logs indicate there may be nodes become ready after the test has begun.
```
[BeforeEach] [k8s.io] DNS horizontal autoscaling
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/dns_autoscaling.go:71
Nov 10 00:36:26.951: INFO: Condition Ready of node jenkins-e2e-minion-group-x6w1 is false instead of true. Reason: KubeletNotReady, message: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR
STEP: Replace the dns autoscaling parameters with testing parameters
Nov 10 00:36:26.961: INFO: DNS autoscaling ConfigMap updated.
STEP: Wait for kube-dns scaled to expected number
Nov 10 00:36:26.961: INFO: Waiting up to 5m0s for kube-dns reach 8 replicas
...
Expected error:
<*errors.errorString | 0xc420b17ef0>: {
s: "err waiting for DNS replicas to satisfy 8, got 9: timed out waiting for the condition",
}
err waiting for DNS replicas to satisfy 8, got 9: timed out waiting for the condition
not to have occurred
```
This fix puts the logic of counting schedulable nodes into the polling loop. By doing so, the test will have the correct expected replicas count even if schedulable nodes change in between.
@bowei @bprashanth
---
Updates: all `ExpectNoError(err)` are changed to `Expect(err).NotTo(HaveOccurred())`
Automatic merge from submit-queue
federation service e2e: Creating configmap for kube-dns
Ref #37105 and #37143.
Creating kube-dns config map to pass federations flag to kube-dns.
This is required since we moved to the new add on manager. With the old add on manager, we were using kube-dns rc that included the federations flag.
cc @kubernetes/sig-cluster-federation @madhusudancs @bowei @MrHohn
Verified that the tests pass with this change.
Checking the result.Code prior to err in the if statement causes a panic
if result is nil. It turns out the formatting of the error is already in
IssueSSHCommandWithResult, so removing redundant code is enough to fix
the issue. Logging the SSH result was also redundant, so I removed that
as well.
Automatic merge from submit-queue
Modify GCI mounter to enable NFSv3
In order to make NFSv3 work, mounter needs to start rpcbind daemon. This
change modify mounter's Dockerfile and mounter script to start the
rpcbind daemon if it is not running on the host.
After this change, need to make push the image and update the sha number in Changelog.
Automatic merge from submit-queue
Guard the ready replica checking by server version
I fixed replica readiness checking for 1.4->1.5 upgrades by using a field that only exists in versions >=1.4.0 in #36924
This fixed a lot of issues in 1.4->1.5 upgrade testing, but did not fix 1.3->1.5 upgrade tests. I've disabled replica checking for 1.3 masters as the old logic was broken anyway.
This will not affect the 1.3 CI tests. Just 1.3 -> {1.4, 1.5} upgrade tests.
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke-container_vm-1.3-container_vm-1.5-upgrade-cluster-new/330?log
is an example of this breakage. This is the tell-tale logs:
```console
Nov 22 09:40:50.469: INFO: 11 / 11 pods in namespace 'kube-system' are running and ready (506 seconds elapsed)
Nov 22 09:40:50.469: INFO: expected 5 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Nov 22 09:40:50.469: INFO: POD NODE PHASE GRACE CONDITIONS
```
Automatic merge from submit-queue
Use netexec container in http lifecycle hook test.
Fixes https://github.com/kubernetes/kubernetes/issues/33636.
The original test is using `"echo -e \"HTTP/1.1 200 OK\n\" | nc -l -p 1234` as a simple http server.
However, it seems that this is not very reliable, which may response before golang thinks it should.
So we get the error:
```
I1106 06:14:13.325397 2096 logs.go:41] Unsolicited response received on idle HTTP channel starting with "HTTP/1.1 200 OK\n\n"; err=<nil>
```
This PR changes the test to use the `netexec` container which is a simple http server written by golang and used in many of our networking e2e test. It should be more reliable.
Mark 1.5 since this is fixing a 1.5 release blocking issue. Mark P0 to match the original issue.
@dchen1107
This disables ready replica checking for 1.3 masters, but only from 1.4
or 1.5 clients. The old logic was broken anyway due to overlapping
labels with replica sets.
Automatic merge from submit-queue
Fixed e2e tests for HA master.
Set of fixes that allows HA master e2e tests to pass for removal/addition master replicas.
The summary of changes:
- fixed host name in etcd certs,
- added cluster validation after kube-down,
- fixed the number of master replicas in cluster validation,
- made MULTIZONE=true required for HA master deployments, ensured we correctly handle MULTIZONE=true when user wants to create HA master but not kubelets in multiple zones,
- extended verification of master replicas in HA master e2e tests.
Automatic merge from submit-queue
Test flake: ignore dig failure; wait until timeout
The intent of this test was to continue trying until the timeout has
been reached. The extra assertion makes the test fail early when it
should not.
https://github.com/kubernetes/kubernetes/issues/37144
fix 37114
Automatic merge from submit-queue
Update the timeout in CLOSE_WAIT e2e test
I see some test flakes due to the timeout being too strict,
updating to a larger value.
Adding tail -n 1, it looks like there may be leftover state for other
runs. We really only care about one of the CLOSE_WAIT entries.
Automatic merge from submit-queue
Test case for scalling down before scale up finished
Verifies that current pet (one which was created last by scale up action)
will enter running before scale down will take place
related: #30082
In order to make NFSv3 work, mounter needs to start rpcbind daemon. This
change modify mounter's Dockerfile and mounter script to start the
rpcbind daemon if it is not running on the host.
After this change, need to make push the image and update the sha number in Changelog.
I see some test flakes due to the timeout being too strict,
updating to a larger value.
Adding tail -n 1, it looks like there may be leftover state for other
runs. We really only care about one of the CLOSE_WAIT entries.
Automatic merge from submit-queue
Validate pet set scale up/down order
This change implements test cases to validate that:
- scale up will be done in order
- scale down in reverse order
- if any of the pets will be unhealthy, scale up/down will halt
related: https://github.com/kubernetes/kubernetes/issues/30082
Verifies that current pet (one which was created last by scale up action)
will enter running before scale down will take place
Change-Id: Ib47644c22b3b097bc466f68c5cac46c78dd44552
This change implements test cases to validate that:
scale up will be done in order
scale down in reverse order
if any of the pets will be unhealthy, scale up/down will halt
Change-Id: I4487a7c9823d94a2995bbb06accdfd8f3001354c
Automatic merge from submit-queue
Retry job update after failure to prevent modification conflict
This fixes#34585 flake.
@janetkuo || @kubernetes/sig-apps ptal
I've been getting too many emails recently wrt to that issue, so I wanted to "clean" my inbox a bit 😉
Automatic merge from submit-queue
Add limited config-map support to kube-dns
This is an integration bugfix for https://github.com/kubernetes/kubernetes/issues/36194
```release-note
kube-dns
Added --config-map and --config-map-namespace command line options.
If --config-map is set, kube-dns will load dynamic configuration from the config map
referenced by --config-map-namespace, --config-map. The config-map supports
the following properties: "federations".
--federations flag is now deprecated. Prefer to set federations via the config-map.
Federations can be configured by settings the "federations" field to the value currently
set in the command line.
Example:
kind: ConfigMap
apiVersion: v1
metadata:
name: kube-dns
namespace: kube-system
data:
federations: abc=def
```
- Adds command line flags --config-map, --config-map-ns.
- Fixes 36194 (https://github.com/kubernetes/kubernetes/issues/36194)
- Update kube-dns yamls
- Update bazel (hack/update-bazel.sh)
- Update known command line flags
- Temporarily reference new kube-dns image (this will be fixed with
a separate commit when the DNS image is created)
Automatic merge from submit-queue
Add e2e test for statefulset updates
Verify that one can (manually) update statefulset template
cc @erictune @foxish @kow3ns @kubernetes/sig-apps
Automatic merge from submit-queue
Wait for all Nodes to be schedulable before running e2e tests
This should fix the problem we're seeing when running tests on large clusters.
cc @dchen1107
Automatic merge from submit-queue
Delete taint annotation when removing last taint
It messes with debugging of tests failures.
cc @davidopp @kevin-wangzefeng
Automatic merge from submit-queue
Add more test cases to k8s e2e upgrade tests
**Special notes for your reviewer**:
Added guestbook, secrets, daemonsets, configmaps, jobs to e2e upgrade tests according to the discussions in #35078
Still need to run these test cases in real setup, raised a PR here for initial comments
@quinton-hoole
Automatic merge from submit-queue
Add clarity/retries to proxy url test
Improve one segment of the kube-proxy networking test by:
1. Retrying for 30s
2. Bucketing into 2 failure modes
3. Adding some clarity by describing the exec pod on failure
Althought 1 shouldn't be necessary, I don't think we lose anything if the kube-proxy convenience endpoint doesn't respond immediately, and if it fails for 30s straight it is indicative of something that requires attention probably within 1.5.
Fixes https://github.com/kubernetes/kubernetes/issues/32436
Automatic merge from submit-queue
Enable NFSv4 and GlusterFS tests on cluster e2e tests
Enable NFSv4 and GlusterFS tests on cluster e2e tests for GCI images
only.
Automatic merge from submit-queue
Add e2e test for CockroachDB statefulset
Refactor the code of statefulset e2e test for clustered applications, and add a test for CockroachDB.
The yaml file is copied from examples/cockroachdb/
cc @erictune @foxish @kow3ns @kubernetes/sig-apps
Automatic merge from submit-queue
e2e pod cleanup test: restrict pods to be assigned to nodes observed …
The test checks the individual kubelet /runningPods endpoint based on the
initial list of nodes it observes. It is important that all pods are
scheduled only onto those nodes. Apply node labels to ensure no stray pods on
other nodes.
This fixes#35197
kubectl-in-pod.json has a hardcoded path for
kubectl in its hostPath ('/usr/bin/kubectl').
When the test actually runs on gke, kubectl
is available at '/workspace/kubernetes/platforms/linux/amd64/kubectl'
and NOT at '/usr/bin/kubectl'. So we need to
fix the hostPath for the test to work properly.
Fixes#36586
Automatic merge from submit-queue
Cleanup pod in MatchContainerOutput
MatchContainerOutput always creates a pod and does not cleanup. We need
to fix this to be better at re-trying the scenarios.
When there is an error say in the first attempt of ExpectNoErrorWithRetries
(for example in "Pods should contain environment variables for services" test)
the retries logic calls MatchContainerOutput another time and the
podClient.create fails correctly since the pod was not cleaned up the
first time MatchContainerOutput was called.
Fixes#35089
MatchContainerOutput always creates a pod and does not cleanup. We need
to fix this to be better at re-trying the scenarios.
When there is an error say in the first attempt of ExpectNoErrorWithRetries
(for example in "Pods should contain environment variables for services" test)
the retries logic calls MatchContainerOutput another time and the
podClient.create fails correctly since the pod was not cleaned up the
first time MatchContainerOutput was called.
Fixes#35089
Automatic merge from submit-queue
[kubelet] rename --cgroups-per-qos to --experimental-cgroups-per-qos
This reflects the true nature of "cgroups per qos" feature.
```release-note
* Rename `--cgroups-per-qos` to `--experimental-cgroups-per-qos` in Kubelet
```
Automatic merge from submit-queue
tests: update memory resource limits
```release-note
NONE
```
On ubuntu, the `RestartNever` test OOMs its cgroup limit fairly
frequently.
This bumps the number up to something suitably large since the test
isn't testing anything related to this anyways.
Fixes#36159
Fix based on https://github.com/kubernetes/kubernetes/issues/36159#issuecomment-258992255
cc @yujuhong @saad-ali
The test checks the individual kubelet /runningPods endpoint based on the
initial list of nodes it observes. It is important that all pods are
scheduled only onto those nodes. Apply node labels to ensure no stray pods on
other nodes.
On ubuntu, the `RestartNever` test OOMs its cgroup limit fairly
frequently.
This bumps the number up to something suitably large since the test
isn't testing anything related to this anyways.
Fixes#36159
Automatic merge from submit-queue
Use generous limits in the resource usage tracking tests
These tests are mainly used to catch resource leaks in the soak cluster. Using
higher limits to reduce noise.
This should fix#32942 and #32214.
Automatic merge from submit-queue
Add back e2e tests for disruption budget
The tests were temporarily removed due to problems with api versions in client-go. The test is almost exactly the same as it used to be before api version change.
cc: @caesarxuchao @davidopp
Automatic merge from submit-queue
Restore event messages for replica sets in the deployment controller
Needed to unblock release upgrade tests (see https://github.com/kubernetes/kubernetes/issues/36453)
@kubernetes/deployment ptal
Automatic merge from submit-queue
Node Conformance & E2E: Get node name from node object.
This PR changes the node e2e test framework to get node name from apiserver instead of test flags.
When a user tried out the node conformance test, he found that node conformance test will not work properly if kubelet is started with `hostname-override`.
The reason is that node conformance test is using [the default node name - `os.Hostname`](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/e2e_node_suite_test.go#L124), which may be different from `hostname-override`. This will cause test pods not scheduled, and eventually test timeout.
We can expose a flag from node conformance test, and let user set node name themselves if they are using `hostname-override` on kubelet. However, let the framework automatically detect it from apiserver is more user friendly.
/cc @kubernetes/sig-node
This PR 1) only changes node e2e test framework; 2) fixes a problem in node conformance test which is a 1.5 feature. @saad-ali Can we have this in 1.5?
Automatic merge from submit-queue
Add e2e node test for log path
fixes#34661
A node e2e test to check if container logs files are properly created with right content.
Since the log files under `/var/log/containers` are actually symbolic of docker containers log files, we can not use a pod to mount them in and do check (symbolic doesn't supported by docker volume).
cc @Random-Liu
Automatic merge from submit-queue
Migrates addons from RCs to Deployments
Fixes#33698.
Below addons are being migrated:
- kube-dns
- GLBC default backend
- Dashboard UI
- Kibana
For the new deployments, the version suffixes are removed from their names. Version related labels are also removed because they are confusing and not needed any more with regard to how Deployment and the new Addon Manager works.
The `replica` field in `kube-dns` Deployment manifest is removed for the incoming DNS horizontal autoscaling feature #33239.
The `replica` field in `Dashboard` Deployment manifest is also removed because the rescheduler e2e test is manually scaling it.
Some resource limit related fields in `heapster-controller.yaml` are removed, as they will be set up by the `addon resizer` containers. Detailed reasons in #34513.
Three e2e tests are modified:
- `rescheduler.go`: Changed to resize Dashboard UI Deployment instead of ReplicationController.
- `addon_update.go`: Some namespace related changes in order to make it compatible with the new Addon Manager.
- `dns_autoscaling.go`: Changed to examine kube-dns Deployment instead of ReplicationController.
Both of above two tests passed on my own cluster. The upgrade process --- from old Addons with RCs to new Addons with Deployments --- was also tested and worked as expected.
The last commit upgrades Addon Manager to v6.0. It is still a work in process and currently waiting for #35220 to be finished. (The Addon Manager image in used comes from a non-official registry but it mostly works except some corner cases.)
@piosz @gmarek could you please review the heapster part and the rescheduler test?
@mikedanese @thockin
cc @kubernetes/sig-cluster-lifecycle
---
Notes:
- Kube-dns manifest still uses *-rc.yaml for the new Deployment. The stale file names are preserved here for receiving faster review. May send out PR to re-organize kube-dns's file names after this.
- Heapster Deployment's name remains in the old fashion(with `-v1.2.0` suffix) for avoiding describe this upgrade transition explicitly. In this way we don't need to attach fake apply labels to the old Deployments.
Automatic merge from submit-queue
Enable NFS and GlusterFS tests in both node and cluster e2e tests
This PR is to enable NFS and GlusterFS tests on both node and cluster
e2e tests.
It also change the code to use ExecCommandInPod instead of kubectl since
node does not have kubectl available
This PR is to enable NFS and GlusterFS tests on both node and cluster
e2e tests for gci and containervm distro.
It also change the code to use ExecCommandInPod instead of kubectl since
node does not have kubectl available
Automatic merge from submit-queue
Extend test timeout for LB creation in large clusters
This will most probably be necessary to test 3000-node clusters.
Automatic merge from submit-queue
add e2e test for kubectl in a Pod
Add a e2e test to make sure kubectl can talk to the api server when it is mounted in a pod.
Fixes: #33138
Automatic merge from submit-queue
Make GCI nodes mount non tmpfs, ext* & bind mounts using an external mounter
This PR downloads the stage1 & gci-mounter ACIs as part of cluster bring up instead of downloading them dynamically from gcr.io, which was the cause for #36206.
I have also optimized the containerized mounter to pre-load the mounter image once to avoid fetch latency while using it.
Original PR which got reverted: https://github.com/kubernetes/kubernetes/pull/35821
```release-note
GCI nodes use an external mounter script to mount NFS & GlusterFS storage volumes
```
@mtaufen Node e2e is not re-enabled in this PR.
cc @jingxu97
Automatic merge from submit-queue
Node Conformance Test: Containerize the node e2e test
For #30122, #30174.
Based on #32427, #32454.
**Please only review the last 3 commits.**
This PR packages the node e2e test into a docker image:
- 1st commit: Add `NodeConformance` flag in the node e2e framework to avoid starting kubelet and collecting system logs. We do this because:
- There are all kinds of ways to manage kubelet and system logs, for different situation we need to mount different things into the container, run different commands. It is hard and unnecessary to handle the complexity inside the test suite.
- 2nd commit: Remove all `sudo` in the test container. We do this because:
- In most container, there is no `sudo` command, and there is no need to use `sudo` inside the container.
- It introduces some complexity to use `sudo` inside the test. (https://github.com/kubernetes/kubernetes/issues/29211, https://github.com/kubernetes/kubernetes/issues/26748) In fact we just need to run the test suite with `sudo`.
- 3rd commit: Package the test into a docker container with corresponding `Makefile` and `Dockerfile`. We also added a `run_test.sh` script to start kubelet and run the test container. The script is only for demonstration purpose and we'll also use the script in our node e2e framework. In the future, we should update the script to start kubelet in production way (maybe with `systemd` or `supervisord`).
@dchen1107 @vishh
/cc @kubernetes/sig-node @kubernetes/sig-testing
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
``` release-note
Release alpha version node test container gcr.io/google_containers/node-test-ARCH:0.1 for users to verify their node setup.
```
Automatic merge from submit-queue
Adding cadcading deletion support for federated secrets
Ref https://github.com/kubernetes/kubernetes/issues/33612
Adding cascading deletion support for federated secrets.
The code is same as that for namespaces. Just ensuring that DeletionHelper functions are called at right places in secret_controller.
Also added e2e tests.
cc @kubernetes/sig-cluster-federation @caesarxuchao
```release-note
federation: Adding support for DeleteOptions.OrphanDependents for federated secrets. Setting it to false while deleting a federated secret also deletes the corresponding secrets from all registered clusters.
```
Automatic merge from submit-queue
Rename experimental-runtime-integration-type to experimental-cri
Also rename the field in the component config to `EnableCRI`
The e2e tests cover cases like cluster size changed, parameters
changed, ConfigMap got deleted, autoscaler pod got deleted, etc.
They are separated into a fast part(could be run parallelly) and
a slow part(put in [serial]). The fast part of the e2e tests cost
around 50 seconds to run.
Automatic merge from submit-queue
Rename ScheduledJobs to CronJobs
I went with @smarterclayton idea of registering named types in schema. This way we can support both the new (CronJobs) and old (ScheduledJobs) resource name. Fixes#32150.
fyi @erictune @caesarxuchao @janetkuo
Not ready yet, but getting close there...
**Release note**:
```release-note
Rename ScheduledJobs to CronJobs.
```
This allows us to interrupt/kill the executed command if it exceeds the
timeout (not implemented by this commit).
Set timeout in Exec probes. HTTPGet and TCPSocket probes respect the
timeout, while Exec probes used to ignore it.
Add e2e test for exec probe with timeout. However, the test is skipped
while the default exec handler doesn't support timeouts.
Automatic merge from submit-queue
Adding more e2e tests for federated namespace cascading deletion and fixing bugs
Ref https://github.com/kubernetes/kubernetes/issues/33612
Adding more e2e tests for testing cascading deletion of federated namespace.
New tests are now verifying that cascading deletion happen when DeletionOptions.OrphanDependents=false and it does not happen when DeleteOptions.OrphanDependents=true.
Also updated deletion helper to always add OrphanFinalizer. generic registry will remove it if DeleteOptions.OrphanDependents=false. Also updated namespace registry to do the same.
We need to add the orphan finalizer to keep the orphan by default behavior. We assume that its dependents are going to be orphaned and hence add that finalizer. If user does not want the orphan behavior, he can do so using DeleteOptions and then the registry will remove that finalizer.
cc @kubernetes/sig-cluster-federation @caesarxuchao @derekwaynecarr
Automatic merge from submit-queue
Update how we detect overlapping deployments
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#24152
**Special notes for your reviewer**: cc @kubernetes/deployment
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
NONE
```
When looking for overlapping deployments, we should also find other deployments that select current deployment's pods,
not just the ones whose pods are selected by current deployment.
Automatic merge from submit-queue
Add --force to kubectl delete and explain force deletion
--force is required for --grace-period=0. --now is == --grace-period=1.
Improve command help to explain what graceful deletion is and warn about
force deletion.
Part of #34160 & #29033
```release-note
In order to bypass graceful deletion of pods (to immediately remove the pod from the API) the user must now provide the `--force` flag in addition to `--grace-period=0`. This prevents users from accidentally force deleting pods without being aware of the consequences of force deletion. Force deleting pods for resources like StatefulSets can result in multiple pods with the same name having running processes in the cluster, which may lead to data corruption or data inconsistency when using shared storage or common API endpoints.
```
Automatic merge from submit-queue
NPD: Add e2e test for NPD v0.2.
Node problem detector has been updated after v0.1, including:
1. Add lookback support. It will lookback for configured time to search for possible kernel panic before node reboot.
2. Get node name via downward api.
This PR updates the test to test the new NPD behavior.
@dchen1107
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Made changes to DELETE API to let v1.DeleteOptions be passed in as a queryParameter
**Which issue this PR fixes** _(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)_: fixes#34856
```release-note
DELETE requests can now pass in their DeleteOptions as a query parameter or a body parameter, rather than just as a body parameter.
```
Automatic merge from submit-queue
Set the annotation only if the test requires it.
**What this PR does / why we need it**: Fixes StatefulSet flake
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/36107
**Special notes for your reviewer**: We shouldn't be setting the debug annotation in all our tests, only the ones that bring statefulset pods up one after another. In the absence of the annotation, we have the new default behavior governed by https://github.com/kubernetes/kubernetes/pull/35739
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-apps @bprashanth @calebamiles
Automatic merge from submit-queue
Controller changes for perma failed deployments
This PR adds support for reporting failed deployments based on a timeout
parameter defined in the spec. If there is no progress for the amount
of time defined as progressDeadlineSeconds then the deployment will be
marked as failed by a Progressing condition with a ProgressDeadlineExceeded
reason.
Follow-up to https://github.com/kubernetes/kubernetes/pull/19343
Docs at kubernetes/kubernetes.github.io#1337
Fixes https://github.com/kubernetes/kubernetes/issues/14519
@kubernetes/deployment @smarterclayton
Automatic merge from submit-queue
add secret type to RBD secrets in examples and e2e test
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
This is a followup to recent changes in secret type matching
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
@kubernetes/sig-storage @liggitt
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
```
Automatic merge from submit-queue
Remove statefulset e2e test setup for alpha
Depends on #35731, once statefulset is beta, it doesn't need special treatment for alpha version in e2e test
cc @erictune @foxish @kubernetes/sig-apps
Automatic merge from submit-queue
Fixed kibana test problem
After #36103 a problem was introduced. If you change basePath, dynamic bundle optimization is performed upon startup, which can take several minutes.
Following PR fixed this problem by waiting until optimization is completed.
@piosz
Edit: This problem has no known workarounds if basePath is dynamic (like in our case)
Reference: https://discuss.elastic.co/t/how-to-avoid-dynamic-bundle-optimization/37367
Automatic merge from submit-queue
Enable NFS volume test
This PR fixes the dockerfile for NFS server image and enable NFSv4.
After using containeried mounts approach on GCI, this test should pass.
The functionality used to exist entirely in the NC which would
previously clean up pods and nodes together. Now, we simply
wait for the PodGC to see that the node is now deleted and clean up the
pods. This may take a while and hence we set a 1 minute timeout.
Automatic merge from submit-queue
Switch DisruptionBudget api from bool to int allowed disruptions [only v1beta1]
Continuation of #34546. Apparently it there is some bug that prevents us from having 2 different incompatibile version of API in integration tests. So in this PR v1alpha1 is removed until testing infrastructure is fixed.
Base PR comment:
Currently there is a single bool in disruption budget api that denotes whether 1 pod can be deleted or not. Every time a pod is deleted the apiserver filps the bool to false and the disruptionbudget controller sets it to true if more deletions are allowed. This works but it is far from optimal when the user wants to delete multiple pods (for example, by decreasing replicaset size from 10000 to 8000).
This PR adds a new api version v1beta1 and changes bool to int which contains a number of pods that can be deleted at once.
cc: @davidopp @mml @wojtek-t @fgrzadkowski @caesarxuchao
Automatic merge from submit-queue
[Kubelet] Use the custom mounter script for Nfs and Glusterfs only
This patch reduces the scope for the containerized mounter to NFS and GlusterFS on GCE + GCI clusters
This patch also enabled the containerized mounter on GCI nodes
Shepherding multiple PRs through the submit queue is painful. Hence I combined them into this PR. Please review each commit individually.
cc @jingxu97 @saad-ali
https://github.com/kubernetes/kubernetes/pull/35652 has also been reverted as part of this PR
Automatic merge from submit-queue
pod and qos level cgroup support
```release-note
[Kubelet] Add alpha support for `--cgroups-per-qos` using the configured `--cgroup-driver`. Disabled by default.
```
Automatic merge from submit-queue
Move Statefulset (previously PetSet) to v1beta1
**What this PR does / why we need it**: #28718
**Which issue this PR fixes** _(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)_: fixes #
**Special notes for your reviewer**: depends on #35663 (PetSet rename)
cc @erictune @foxish @kubernetes/sig-apps
**Release note**:
``` release-note
v1beta1/StatefulSet replaces v1alpha1/PetSet.
```
When looking for overlapping deployments, we should also find other deployments that select current deployment's pods,
not just the ones whose pods are selected by current deployment.
--force is required for --grace-period=0. --now is == --grace-period=1.
Improve command help to explain what graceful deletion is and warn about
force deletion.
Automatic merge from submit-queue
Making the pod.alpha.kubernetes.io/initialized annotation optional in PetSet pods
**What this PR does / why we need it**: As of now, the absence of the annotation `pod.alpha.kubernetes.io/initialized` in PetSets causes the PetSet controller to effectively "pause". Being a debug hook, users expect that its absence has no effect on the working of a PetSet. This PR inverts the logic so that we let the PetSet controller operate as expected in the absence of the annotation.
Letting the annotation remain alpha seems ok. Renaming it to something more meaningful needs further discussion.
**Which issue this PR fixes** _(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)_: fixes https://github.com/kubernetes/kubernetes/issues/35498
**Special notes for your reviewer**:
**Release note**:
``` release-note
The annotation "pod.alpha.kubernetes.io/initialized" on StatefulSets (formerly PetSets) is now optional and only encouraged for debug use.
```
cc @erictune @smarterclayton @bprashanth @kubernetes/sig-apps
@kow3ns The examples will need to be cleaned up as well I think later on to remove them.
Automatic merge from submit-queue
Node controller to not force delete pods
Fixes https://github.com/kubernetes/kubernetes/issues/35145
- [x] e2e tests to test Petset, RC, Job.
- [x] Remove and cover other locations where we force-delete pods within the NodeController.
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
``` release-note
Node controller no longer force-deletes pods from the api-server.
* For StatefulSet (previously PetSet), this change means creation of replacement pods is blocked until old pods are definitely not running (indicated either by the kubelet returning from partitioned state, or deletion of the Node object, or deletion of the instance in the cloud provider, or force deletion of the pod from the api-server). This has the desirable outcome of "fencing" to prevent "split brain" scenarios.
* For all other existing controllers except StatefulSet , this has no effect on the ability of the controller to replace pods because the controllers do not reuse pod names (they use generate-name).
* User-written controllers that reuse names of pod objects should evaluate this change.
```
Automatic merge from submit-queue
Disruption e2e: wait for running pods in the table test, too
**What this PR does / why we need it**: Makes the `DisruptionController` tests less flaky.
**Which issue this PR fixes**: fixes#34032
Automatic merge from submit-queue
e2e: Fix GetReadySchedulableNodesOrDie for taints
**What this PR does / why we need it**:
This changes framework.GetReadySchedulableNodesOrDie and
framework.GetMasterAndWorkerNodesOrDie so that nodes that can't take a
generic fake pod due to a taint/toleration mismatch aren't returned.
This is a rehash of #35210, but pulls in the scheduler code.
**Which issue this PR fixes**: c.f. #35210
**Special notes for your reviewer**: I think it's gross that we keep having to manually compute this in e2es. Maybe we need a bug for that?
This changes framework.GetReadySchedulableNodesOrDie and
framework.GetMasterAndWorkerNodesOrDie so that nodes that can't take a
generic fake pod due to a taint/toleration mismatch aren't returned.
This is a rehash of #35210, but pulls in the scheduler code.
Automatic merge from submit-queue
Verify and update client-go staging area for every PR
We need to keep the staging area up-to-date to prevent PRs from breaking client-go.
It's marked as "WIP" because we need to decide the [versioning strategy](https://github.com/kubernetes/client-go/issues/9) for client-go first. This PR contains breaking changes for client-go.
This is blocking #29934 and potentially #34441
cc @kubernetes/sig-api-machinery
Automatic merge from submit-queue
Let release_1_5 clientset include multiple versions of a group
Fix#35237
This PR make versioned clientset to include multiple versions of a group. Currently only `batch` has `v1` and `v2alpha1`. The clientset interface now looks like:
```go
BatchV2alpha1() v2alpha1batch.BatchV2alpha1Interface
BatchV1() v1batch.BatchV1Interface
// Deprecated: please explicitly pick a version if possible.
Batch() v1batch.BatchV1Interface
```
Commit "update client-gen to say internalversion rather than unversioned" fixes https://github.com/kubernetes/kubernetes/issues/24481.
cc @kubernetes/sig-api-machinery @soltysh @deads2k @nikhiljindal
```release-note
release_1_5 clientset supports multiple versions of a group.
```
Automatic merge from submit-queue
Make overlapping deployments deletable
@kubernetes/deployment ptal
Fixes https://github.com/kubernetes/kubernetes/issues/34466 by 1) not adding the overlapping annotation in the working deployment, 2) updates observedGeneration for overlapping deployments, and 3) updates the kubectl deployment reaper to do non-cascading deletion for deployments with the overlapping annotation.
Automatic merge from submit-queue
Fixing e2e tests which rely on network disruptions
**What this PR does / why we need it**: It fixes e2e tests on network disruptions
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/27324, https://github.com/kubernetes/kubernetes/issues/35293https://github.com/kubernetes/kubernetes/pull/33655 changed the kubelet's `--api-server` flag to use the external IP address of the APIServer in GCE. Hence, the iptables rules were failing previously. We now return the external IP for `getMaster()`. This fix is required in order to write e2e tests which test behavior in the face of network disruptions.
/cc @bprashanth @dchen1107 @kubernetes/sig-apps
Automatic merge from submit-queue
Add a retry when reading a file content from a container
To avoid temporal failure in reading the file content, add a retry
process in function verifyPDContentsViaContainer
- Prevents kubelet from overwriting capacity during sync.
- Handles opaque integer resources in the scheduler.
- Adds scheduler predicate tests for opaque resources.
- Validates opaque int resources:
- Ensures supplied opaque int quantities in node capacity,
node allocatable, pod request and pod limit are integers.
- Adds tests for new validation logic (node update and pod spec).
- Added e2e tests for opaque integer resources.
Automatic merge from submit-queue
remove the non-generated client
Removes the non-generated client from kube. The package has a few methods left, but nothing that needs updating when adding new groups.
@ingvagabund
Automatic merge from submit-queue
Set done to true & return error if RestartPolicy not Always in test framework
Found a small issue with https://github.com/kubernetes/kubernetes/pull/34632, it returns an error if the RestartPolicy is not Always, but the user will never see it because done isn't set to true & they will timeout instead.
@Random-Liu because you wrote that PR
Automatic merge from submit-queue
Adding cascading deletion support to federated namespaces
Ref https://github.com/kubernetes/kubernetes/issues/33612
With this change, whenever a federated namespace is deleted with `DeleteOptions.OrphanDependents = false`, then federation namespace controller first deletes the corresponding namespaces from all underlying clusters before deleting the federated namespace.
cc @kubernetes/sig-cluster-federation @caesarxuchao
```release-note
Adding support for DeleteOptions.OrphanDependents for federated namespaces. Setting it to false while deleting a federated namespace also deletes the corresponding namespace from all registered clusters.
```
Automatic merge from submit-queue
rename kubelet flag mounter-path to experimental-mounter-path
```release-note
* Kubelet flag '--mounter-path' renamed to '--experimental-mounter-path'
```
The feature the flag controls is an experimental feature and this renaming ensures that users do not depend on this feature just yet.
Automatic merge from submit-queue
Add multiple PV/PVC pair handling to persistent volume e2e test
Adds the framework for creating, validating, and deleting groups of PVs and PVCs.
Automatic merge from submit-queue
Speed up some networking tests in large clusters
Since we are getting towards testing larger and larger clusters (hopefully 5000-node ones soon-ish), I'm trying to limit the amount of super long tests to minimum.
This should significantly reduce amount of time used by those from test/e2e/networking.go.
@gmarek
Automatic merge from submit-queue
Remove versioned LabelSelectors
We have LabelSelectors defined in `unversioned`, `batch/v1`, `batch/v2alpha1`, and `extensions/v1beta1`. Their definitions are all the same. I kept the definition in `unversioned` and removed the others. It only makes sense to define a versioned LabelSelectors if the definition is different.
Automatic merge from submit-queue
Marked NodeOutOfDisk test with feature label
Marked NodeOutOfDisk test with feature label to temporarily remove it from flaky suite.
@madhusudancs @piosz
Automatic merge from submit-queue
always clean gce resources in service e2e
@bprashanth the previous PR was closed when I squashed my commits.
Here is the new change set, please help to review again.
1). only the following two It() create, I created a string array to persist the LB name so that they can be cleaned in AfterEach(), and the string array was reset after clean up.
```
"should be able to change the type and ports of a service [Slow]"
"should be able to create services of type LoadBalancer and externalTraffic=localOnly"
```
2). Directly call gce api to delete the resource and ignore any error returned.
Automatic merge from submit-queue
CRI: Add dockershim grpc server.
This PR adds a in-process grpc server for dockershim.
Flags change:
1. `container-runtime` will not be automatically set to remote when `container-runtime-endpoint` is set. @feiskyer
2. set kubelet flag `--experimental-runtime-integration-type=remote --container-runtime-endpoint=UNIX_SOCKET_FILE_PATH` to enable the in-process dockershim grpc server.
3. set node e2e test flag `--runtime-integration-type=remote -container-runtime-endpoint=UNIX_SOCKET_FILE_PATH` to run node e2e test against in-process dockershim grpc server.
I've run node e2e test against the remote cri integration, tests which don't rely on stream and log functions can pass.
This unblocks the following work:
1) CRI conformance test.
2) Performance comparison between in-process integration and in-process grpc integration.
@yujuhong @feiskyer
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Make hack/update_owners.py get list from local repo, add --check option.
This should become a verify step soon. The munger understands the * syntax already.
Automatic merge from submit-queue
Add a retry when reading a file content from a container
To avoid temporal failure in reading the file content, add a retry
process in function verifyPDContentsViaContainer
Automatic merge from submit-queue
e2e node plumbing and bundling for GCI mounter
**Note:** The code in this PR only bundles the mounter and modifies `--mounter-path` if it can find `cluster/gce/gci/mounter` in the K8s source dir when building the test bundle.
This bundles the mounter script for GCI with the node e2e tests and allows the `--mounter-path` to be passed to the Kubelet via the node test framework. The node test runner will detect when we are running on a remote GCI node and add the appropriate `--mounter-path` to the `testArgs`.
It also includes a simple node test that mounts a tmpfs volume. This will exercise the Kubelet's mounter code path.
**ITEM OF NOTE:** To get the k8s root dir (in order to copy the mount script into the tarball), I changed `getK8sRootDir` -> `GetK8sRootDir` in `test/e2e_node/build/build.go`. Based on the comment above that function (and the fact that it was private to begin with), I'm not sure this is the best way to do things:
```
// TODO: Dedup / merge this with comparable utilities in e2e/util.go
```
On the other hand, the `e2e/util.go` file mentioned in that comment doesn't exist anymore. This should be resolved before this PR is merged.
Automatic merge from submit-queue
Fixing a typo in federated secrets test
Realized that our secret e2e test was not running due to a typo in my last PR :)
cc @kubernetes/sig-cluster-federation
Automatic merge from submit-queue
service e2e: remove TODO and subtle changes in logging
Removes the stale `TODO` for external source IP preservation as the e2e test of ESIPP was added.
Changes logging in create service functions: namespace/namespace -> namespace/serviceName.
@bprashanth
Automatic merge from submit-queue
Update elasticsearch and kibana usage
```release-note
Updated default Elasticsearch and Kibana used for elasticsearch logging destination to versions 2.4.1 and 4.6.1 respectively.
```
Updated controllers for elasticsearch and kibana to use newer versions of images. Fixed e2e test because of elasticsearch backward incompatible API changes.
Fixed out of sync elasticsearch controller for coreos.
@piosz
Automatic merge from submit-queue
Loadbalanced client src ip preservation enters beta
Sounds like we're going to try out the proposal (https://github.com/kubernetes/kubernetes/issues/30819#issuecomment-249877334) for annotations -> fields on just one feature in 1.5 (scheduler). Or do we want to just convert to fields right now?
Automatic merge from submit-queue
Verify petset status.replicas in e2e test
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**: follow up #33983. PetSet status.replicas bug is fixed, so adding tests for it (especially for the `should handle healthy pet restarts during scale` case)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**: cc @erictune @foxish @kubernetes/sig-apps
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
NONE
```
Automatic merge from submit-queue
Set non-always RestartPolicy for write-pod in pv e2e
Due to https://github.com/kubernetes/kubernetes/pull/34632 the RestartPolicy can't be Always (& it shouldn't be anyway)
Automatic merge from submit-queue
Add secret e2e test for keys mapping
**What this PR does / why we need it**: Adds a basic e2e test missing in secrets
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```NONE
```
This patch adds a secret e2e test. While configmap e2e tests are far
more complete, this patch makes secret e2e tests one step closer.
Also, now is more easy to add more tests without code duplication (as I
did in earlier patches :-/), because of the functions created, and is
more easy to make it similar to confimap e2e in the future, that is
really complete.
Automatic merge from submit-queue
Adding default StorageClass annotation printout for resource_printer and describer and some refactoring
adding ISDEFAULT for _kubectl get storageclass_ output
```
[root@screeley-sc1 gce]# kubectl get storageclass
NAME TYPE ISDEFAULT
another-class kubernetes.io/gce-pd NO
generic1-slow kubernetes.io/gce-pd YES
generic2-fast kubernetes.io/gce-pd YES
```
```release-note
Add ISDEFAULT to kubectl get storageClass output
```
@kubernetes/sig-storage
Commit 53ec6e6 missed to remove this deferred call (probably due to a
rebase).
I noticied it while working with the code. I'm not sure why the original
commits removes them, but it seems the right thing to do.
This patch adds a secret e2e test. While configmap e2e tests are far
more complete, this patch makes secret e2e tests one step closer.
Also, now is more easy to add more tests without code duplication (as I
did in earlier patches :-/), because of the functions created, and is
more easy to make it similar to confimap e2e in the future, that is
really complete.
Automatic merge from submit-queue
e2e: stop tracking resource usage for the "misc" container
There is e2e test checking the resource usage of "misc", and it is not
supported on GCI.
Automatic merge from submit-queue
New client-go structure
This PR is part of restructuring client-go (https://github.com/kubernetes/client-go/issues/9#issue-181545998). In short, the top-level folder for client-go versions are removed.
This PR also runs copy.sh to pick up changes in the main repository. The number of files in client-go has increase from 1361 files to 1405.
@mbohlool @mml @timoreimann
Automatic merge from submit-queue
Add test_list command, to enumerate unit and e2e tests.
This uses go/parser and go/ast to analyze all test files in ~1 second.
It only recognizes a few simple structures that the tests all have, and
modifies a few tests to fit expected structure better.
This is part of an effort to ensure all tests have owners, by having a
verify check to catch new tests being added without an owner.
This uses go/parser and go/ast to analyze all test files in ~1 second.
It only recognizes a few simple structures that the tests all have, and
modifies a few tests to fit expected structure better.
This is part of an effort to ensure all tests have owners, by having a
verify check to catch new tests being added without an owner.
Automatic merge from submit-queue
Make E2E tests easier to debug
This PR removes deferred deletion calls in E2E tests in order to make test easier to debug. Some of these tests predate namespace finalization; we should just rely on that mechanism to ensure that framework test artifacts are deleted.
Automatic merge from submit-queue
controller: set minReadySeconds in deployment's replica sets
* Estimate available pods for a deployment by using minReadySeconds on
the replica set.
* Stop requeueing deployments on pod events, superseded by following the
replica set status.
* Cleanup redundant deployment utilities
Fixes https://github.com/kubernetes/kubernetes/issues/26079
@kubernetes/deployment ptal
Automatic merge from submit-queue
Cleanup the commented code for overriding flags with viper. For now,…
Minor cleanup for the viper configuration logic, removes commented code into a function of its own. We can decide wether or not to overwrite flag values at a later time...
Automatic merge from submit-queue
Move RunRC-like functions to test/utils
Ref. #34336
cc @timothysc - the "move" part of the small refactoring. @jayunit100
Automatic merge from submit-queue
e2e: don't require minimum availability once scaling takes place
This test shouldn't care about availability at all in the first place.
@mfojtik @kubernetes/deployment ptal
Fixes https://github.com/kubernetes/kubernetes/issues/34717
Automatic merge from submit-queue
Add e2e tests for storageclass
- test pd-ssd and pd-standard on GCE,
- test all four volume types and encryption on AWS
- test just the default volume type on OpenStack (right now, there is no API
to get list of them)
These tests are quite slow, e.g. there are two tests on AWS that has to run mkfs.ext4 on 500 GB magnetic drive with low IOPS, which takes ~3-4 minutes each.
Automatic merge from submit-queue
Fix race condition in test with git server startup
Fixes https://github.com/kubernetes/kubernetes/issues/32467 (hopefully)
Previously, the test didn't ensure the git server was running before attempting to connect to it.
- test pd-ssd and pd-standard on GCE,
- test all four volume types on AWS
- test just the default volume type on OpenStack (right now, there is no API
to get list of them)
* Estimate available pods for a deployment by using minReadySeconds on
the replica set.
* Stop requeueing deployments on pod events, superseded by following the
replica set status.
* Cleanup redundant deployment utilities
Automatic merge from submit-queue
Fix leaking ingress resources in federated ingress e2e test.
Originally the federated ingresses were being deleted, but due to the lack of cascading deletion, the cluster ingresses were never being deleted, leading to leaked GCE loadbalancer resources. This fixes that.
Automatic merge from submit-queue
Ignore mirror pods with RestartPolicy == Never in restart tests
Kubelet does not sync the mirror pods once they have terminated. If, for some
reason, that such mirror pods get deleted once they have terminated (either by
the node controller or by users), kubelet will not attempt to recreate them.
However, when kubelet restarts, it will examine the static pods, sync once,
and create a mirror pod. This has led to unexpected pod counts in disruptive
tests where kubelet gets restarted on purpose (see #34003).
This change disregard such mirror pods when totaling the pods to fix the test
flake until we have time to implement a long-term solution.
This PR addresses #34003
Automatic merge from submit-queue
Add gcl cluster logging test
This PR changes default logging destination for tests to gcp and adds test for cluster logging using google cloud logging
Fix#20760
Automatic merge from submit-queue
test: move deployment deletion in its own test
@kubernetes/deployment this PR moves the deletion of deployment in its own test (no need to delete deployments in every test since the namespace controller does that for us once the namespace gets deleted after the test finishes)
Fixes https://github.com/kubernetes/kubernetes/issues/33256
Automatic merge from submit-queue
Make logging function injectable in RCConfig
Ref. #34336
cc @timothysc - the "move" part of the small refactoring. @jayunit100
Automatic merge from submit-queue
remove [Conformance] flag on some e2es
Downstream distributions that absorb the upstream tests would like to give their customers a standard mechanism to validate their clusters, post setup. As of today [Conformance] works for most things, but there are a known set of tests that vary due to opinionated differences around networking, security, etc... and providing a complete skip list can be cumbersome. To address this, we've simply modified the flag on some tests to [Conformance:Variant]. All existing behavior should be maintained.
Fixes: #34105
Kubelet does not sync the mirror pods once they have terminated. If, for some
reason, that such mirror pods get deleted once they have terminated (either by
the node controller or by users), kubelet will not attempt to recreate them.
However, when kubelet restarts, it will examine the static pods, sync once,
and create a mirror pod. This has led to unexpected pod counts in disruptive
tests where kubelet gets restarted on purpose (see #34003).
This change disregard such mirror pods when totaling the pods to fix the test
flake until we have time to implement a long-term solution.
Automatic merge from submit-queue
Fix confusing log messages
While debugging https://github.com/kubernetes/kubernetes/issues/33876 , I noticed following confusing message:
```
The status of Pod kibana-logging-v1-j99la is Running, waiting for it to be either Running or Failed
```
Automatic merge from submit-queue
annotate some addtional errors in e2e tests
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
Adds some additional context to e2e test failures.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: some work toward #34059
**Special notes for your reviewer**: I didn't want to go through all of the offending cases so I picked off a few files and addressed those.
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
NONE
```
Automatic merge from submit-queue
remove testapi.Default.GroupVersion
I'm going to try to take this as a series of mechanicals. This removes `testapi.Default.GroupVersion()` and replaces it with `registered.GroupOrDie(api.GroupName).GroupVersion`.
@caesarxuchao I'm trying to see how much of `pkg/api/testapi` I can remove.
Automatic merge from submit-queue
try to use ifdown/ifup if available
Tried this on both ContainerVM and GCI image.
`ip link set eth0 down` is too destructive for containerVM. It could not recover with correct network setup hence failing the test. Need to use ifdown/ifup on containerVM.
reference:
http://serverfault.com/questions/603906/ip-link-set-not-assigning-ip-address-but-ifup-does
Automatic merge from submit-queue
Don't set timeouts in clients in tests
We are not setting timeouts in production - we shouldn't do it in tests then...
Addresses point 2. of #31345
Automatic merge from submit-queue
add delete-namespace-on-failure flag
I have been doing this for a while.
Setting `--delete-namespace=false --clean-start=true` only works if you have only one e2e test running in a loop.
This PR lets someone to set `delete-namespace-on-failure=false` and run multiple tests in parallel and preserve the crime scene. It makes it easier to reproduce failures.
Let me know if this is worth it or there are some other tricks I am not aware.
Automatic merge from submit-queue
Delete federation namespace after the test completes
The code was commented because of a bug in namespace deletion which is now fixed.
Note that this deletes the namespace in federation control plane. We still need to delete the namespace from each cluster (cascading deletion)
cc @kubernetes/sig-cluster-federation
Automatic merge from submit-queue
Revert "Revert "move pod networking tests common""
Reverts #34011
And fix the problem causing `Granular Checks: Services [Slow] should update nodePort` tests to fail
Automatic merge from submit-queue
Add a retry loop to KubeletManagerEtcHosts flake
Add a retry loop to KubeletManagerEtcHosts flake
Note: this still makes the test fail if a retry occurs, but
will give us more information regarding whether or not the
test flake could be occuring due to delay in mounting of
/etc/hosts.
Note: this still makes the test fail if a retry occurs, but
will give us more information regarding whether or not the
test flake could be occuring due to delay in mounting of
/etc/hosts.
Automatic merge from submit-queue
Check for empty string post trimming
We curl in a retry loop and timeout, trimming stdout to find endpoint names. When curl hits the timeout, stdout is empty, so we insert the empty string into the received set of endpoints.
Fixes https://github.com/kubernetes/kubernetes/issues/32684
Automatic merge from submit-queue
Improve source ip preservation test, fail the test instead of panic.
From #31085.
The source IP preserve test starts to be flake again. Sending out this PR to get rid of panicing and log the unexpected output for future investigation.
@freehan
Automatic merge from submit-queue
[Client-gen] Let versioned client use versioned options
i.e., use v1.ListOptions, v1.DeleteOptions when possible.
Remove the extension/v1beta1.ListOptions, because it's exactly the same as v1.ListOptions, and is not referred throughout the code base. After its removal, I register v1.ListOptions during extensions/v1beta1 scheme registration.
First three commits are manual changes.
Fix#27753
cc @lavalamp
Automatic merge from submit-queue
Add a check for file size if the reading content returns empty
In order to debug the flaky tests for writing/reading files via
contains, this PR adds a check for file size if reading returns empty
content.
Automatic merge from submit-queue
Fix kubelet perf data to make it work again for perfdash.
Addresses https://github.com/kubernetes/kubernetes/pull/30333#issuecomment-248791257.
Add the "node" label back to fix kubelet perf dash. At least for now, we still need original perfdash to catch summary api performance regression.
/cc @coufon @yujuhong
Automatic merge from submit-queue
Heal the namespaceless ingresses in federation e2e.
For createIngressOrFail, it incorrectly returned the ingress passed in as an argument, which does not include the namespace, instead of the ingress returned from the create call (which does).
This in turn leads to errors in e2e tests like this:
INFO: Waiting for Ingress federated-ingress to acquire IP, error an empty namespace may not be set when a resource name is provided.
Self-applying LGTM label, as this is the same code that was LGTM'd by @nikhiljindal in #33502
Automatic merge from submit-queue
Delete evicted pet
If pet was evicted by kubelet - it will stuck in this state forever.
By analogy to regular pod we need to re-create pet so that it will
be re-scheduled to another node, so in order to re-create pet
and preserve consitent naming we will delete it in petset controller
and create after that.
fixes: https://github.com/kubernetes/kubernetes/issues/31098
Automatic merge from submit-queue
Deregister clusters during federated namespace e2e tear down.
This is causing other tests to leak resources.
cc @mwielgus @kubernetes/sig-cluster-federation
Automatic merge from submit-queue
fix kubectl taint e2e flake: add retries for removing taint
**What this PR does / why we need it**:
Why we need it: recent failures occurred in #29503 are caused by taints removing conflict on nodes, this PR is to fix it. (#33073 fixed taints updating conflict, but not taints removing.)
What this PR does: use `runKubectlRetryOrDie()` instead of `RunKubectlOrDie()` in all the places in "Kubectl taint" e2e tests.
**Which issue this PR fixes** : fixes part of #29503, (would like to keep this issue open for some days more to make sure no other failures occur)
**Special notes for your reviewer**: NONE
**Release note**: NONE
Automatic merge from submit-queue
Increase timeout for federated ingress test.
Right now federated ingress e2e takes more than 1 minute, as we need to wait for the first clusters ingress to have an IP address allocated to it before creating the others. Sometimes this takes a while due to GCE loadbalancer backend delays.
Automatic merge from submit-queue
Change minion to node
Continuation of #1111
I tried to keep this PR down to just a simple search-n-replace to keep
things simple. I may have gone too far in some spots but its easy to
roll those back if needed - just let me know.
I avoided renaming `contrib/mesos/pkg/minion` because there's already
a `contrib/mesos/pkg/node` dir and fixing that will require a bit of work
due to a circular import chain that pops up. So I'm saving that for a
follow-on PR.
Signed-off-by: Doug Davis <dug@us.ibm.com>
Automatic merge from submit-queue
Make the restart test restart the nodes without a mig rolling update.
This is one approach to fix#33113. I switched from using a mig rolling-update to just pushing the reset button on the nodes and then waiting for their boot IDs to change.
Contination of #1111
I tried to keep this PR down to just a simple search-n-replace to keep
things simple. I may have gone too far in some spots but its easy to
roll those back if needed.
I avoided renaming `contrib/mesos/pkg/minion` because there's already
a `contrib/mesos/pkg/node` dir and fixing that will require a bit of work
due to a circular import chain that pops up. So I'm saving that for a
follow-on PR.
I rolled back some of this from a previous commit because it just got
to big/messy. Will follow up with additional PRs
Signed-off-by: Doug Davis <dug@us.ibm.com>
We had another bug where we confused the hostname with the NodeName.
To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
Automatic merge from submit-queue
Added e2e framweork and a simple test for HA master.
Added e2e framweork for testing HA master. Added one simple e2e test for HA master that first grows and then shrinks GCE cluster.
Automatic merge from submit-queue
Corrected timeout on Downward API volume
The tests for Downward API volume were apparently created before the
default sync-frequency was set to 1 minute. As the current
implementation of Pod gives us no guarantee that the changes will be
seen in Downward API before `sync-frequency` we need the tests to
reflect this.
Fixes: #29633
Automatic merge from submit-queue
Pass the real cluster scoped service object to cleanup functions instead of passing the federation scoped object.
cc @kubernetes/sig-cluster-federation
Ref: Issue #31624
Automatic merge from submit-queue
Logging soak
Implements #24427
Needs
- #24471 so that it doesnt clog test outputs for scale
- builds on the utils function added in support of #22869
cc @timothysc @kubernetes/sig-testing
Automatic merge from submit-queue
Allow garbage collection to work against different API prefixes
The GC needs to build clients based only on Resource or Kind. Hoist the
restmapper out of the controller and the clientpool, support a new
ClientForGroupVersionKind and ClientForGroupVersionResource, and use the
appropriate one in both places.
Allows OpenShift to use the GC
Automatic merge from submit-queue
Speed up job's e2e when waiting for failure
**What this PR does / why we need it**:
Job controller synchronizes objects only when job itself or underlying pod changes. Or, when full resync is performed once 10 mins. This leads e2e test to unnecessarily wait that longer timeout, sometimes at least. I've added job modification action which triggers resync, if the job wasn't terminated within shorter period of time.
@ixdy ptal
@janetkuo @erictune fyi
Automatic merge from submit-queue
Staging 1.5 client
Created the 1.5 folder and remove the 1.4 folder in the staging area in the master branch.
Content of kubernetes/client-go/1.4 will be pulled from the kubernetes/kubernetes 1.4 branch (https://github.com/kubernetes/contrib/pull/1719)
The GC needs to build clients based only on Resource or Kind. Hoist the
restmapper out of the controller and the clientpool, support a new
ClientForGroupVersionKind and ClientForGroupVersionResource, and use the
appropriate one in both places.
Automatic merge from submit-queue
Node E2E: Change the disk eviction test to pull images again after the test.
Fixes https://github.com/kubernetes/kubernetes/issues/32022#issuecomment-248677706.
This PR changes the disk eviction test to pull test images again in `AfterEach`, because images may be evicted during the test.
@yujuhong
/cc @kubernetes/sig-node
The tests for Downward API volume were apparently created before the
default sync-frequency was set to 1 minute. As the current
implementation of Pod gives us no guarantee that the changes will be
seen in Downward API before `sync-frequency` we need the tests to
reflect this.
Fixes: #29633
Automatic merge from submit-queue
Minor Ingress tests cleanup, that includes service shard and GCE resource cleanups in underlying clusters.
Follow up for #32810.
cc @kubernetes/sig-cluster-federation
Automatic merge from submit-queue
Dumping federation events if federation e2e test failed
Updating the e2e framework to dump events in federation control plane if a federation e2e test failed.
This should help in debugging https://github.com/kubernetes/kubernetes/issues/32733
cc @kubernetes/sig-cluster-federation
Spawn pet set with single replica and simple pod. They will have
conflicting hostPort definitions, and spawned on the same node.
As the result pet set pod, it will be created after simple pod, will be
in Failed state. Pet set controller will try to re-create it. After
verifying that pet set pod failed and was recreated atleast once, we will
remove pod with conflicting hostPort and wait until pet set pod will be in
running state.
Change-Id: I5903f5881f8606c696bd390df58b06ece33be88a
Prior to this, we would approve eviction as long as the current state of
the pods matched the budget. The new version requires that after the
eviction, the pods would still match the budget.
Also update tests to match.
Automatic merge from submit-queue
Viper direct bindings to TestContext struct with hierarchichal suppor…
Part of #31453 to support hierarchichal parameters. This one does so for density, paves way for other tests as well.
Automatic merge from submit-queue
Node E2E: Add image white list
This is part of #29081. Fixes#29155.
As is discussed with @yujuhong in #29155, it is difficult to maintain the prepull image list if it is not enforced.
This PR added an image white list in the test framework, only images in the white list could be used in the test. If the image is not in the white list, the test will fail with reason:
```
Image "XXX" is not in the white list, consider adding it to CommonImageWhiteList in test/e2e/common/util.go or NodeImageWhiteList in test/e2e_node/image_list.go
```
Notice that if image pull policy is `PullAlways`, the image is not necessary to be in the white list or prepulled, because the test expects the image to be pulled during the test.
Currently, the image white list is only enabled in node e2e, because the image puller in e2e test is not integrated with the image white list yet.
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Implement cleanup methods to cleanup underlying cluster resources in federated service tests.
cc @kubernetes/sig-cluster-federation @colhom
Automatic merge from submit-queue
Make container exec failures in e2e easier to debug
Makes container exec failures in e2e tests easier to debug. Found while chasing some SELinux bugs :)
@pwittrock I'm adding this to the 1.4 milestone because it makes e2e failures easier to debug.
Automatic merge from submit-queue
Pet Set Example for Cassandra
- updating cassandra to 3.7
- added pet set example
- adding pet set for Cassandra e2e tests
- changed service as we do not want a lb service, as we are running C*
- updated docs
cc @bgrant0607
cc @kubernetes/examples
We can probably close a couple of other open PR, since I did some other stuff.
Automatic merge from submit-queue
Decrease timeout for namespace creation in test
If apiserver is unresponsive (e.g. because of crashloop or sth), we are wasting a lot of test time on retries.
Automatic merge from submit-queue
Add kubectl run ScheduledJob e2e test
**What this PR does / why we need it**:
This add another `kubectl run` e2e test, this time verifying proper creation of a ScheduledJob.
@janetkuo ptal
@deads2k that should give you more confidence when ditching manual clients
Automatic merge from submit-queue
Add Viper parametrization as E2E config option.
do-not-merge
Fixes#18099 via viper rather than inis.
Wont build until we remove BurntSushi/ COPYING based deps from upstream viper.
I'll dig into those issues independently and update later, before pushing the updated godeps into this PR.
Automatic merge from submit-queue
Allow to use GetSigner with vagrant provider
In order to run tests that require ssh access to a node on vagrant
we need to provide path to private ssh key.
Now it will be possible to do using VAGRANT_SSH_KEY environment variable
Automatic merge from submit-queue
Move nginx ingress e2e to slow
Normal GCE L7 e2e takes ~15m and runs in a feature private suite. This e2e ensure that the api isn't broken, by creating an nginx controller. I plan to write a really slimmed down version for presubmit, but I need to shave off a minute to get it below 5m.
Fixes https://github.com/kubernetes/kubernetes/issues/23416
Automatic merge from submit-queue
Add test for --quiet flag for kubectl run
This adds a test for the changes introduced in #30247 and #28801.
Ref #28695
Automatic merge from submit-queue
Only skip petset test if resource is missing
**What this PR does / why we need it**:
Unblock testing petset on other providers.
cc @pwittrock. Would like to cherrypick onto 1.4 but this is test code only, so it can wait til after release cut.
Automatic merge from submit-queue
Adding support for upgrading testing across image types.
Adds support for upgrade testing across image types.
@spxtr @vishh @ixdy @pwittrock
This change only affects upgrade testing. This does not touch production code and hence should be safe for cherrypicks into the 1.4 release branch.
Automatic merge from submit-queue
Re-enable Federated Ingress e2e test to check connectivity to global load balancer
...Now that it works properly.
Should not merge before #31600, as it will fail until then.
In order to run tests that require ssh access to a node on vagrant
we need to provide path to private ssh key.
Now it will be possible to do using VAGRANT_SSH_KEY environment variable
Change-Id: Ic5fe0037edd46d0db3b8036ad7fc03cf1ea07574
Automatic merge from submit-queue
Ensure that we are closing files.
**What this PR does / why we need it**: In several places we are leaking file descriptors. This could be problematic on systems with low ulimits for them.
**Release note**:
```release-note
```
Automatic merge from submit-queue
Generate 1 5 clientset
Generate the 1.5 clientset. Stop updating 1.4 clientset. Remove 1.2 clientset.
@nikhiljindal @lavalamp
I will rebase #31994 atop of this one.
Automatic merge from submit-queue
remove the rest of the non-generated clients from the kubectl code
Die `Client` Die!
It's always bigger than you think. Last bit @kargakis after this, it's gone.
Automatic merge from submit-queue
Remove long sleep in provisioning e2e tests.
PV controller sync is now 15 seconds, i.e. the controller re-tries to delete a PV four times in a minute until it succeeds. There is no need to wait for three minutes.
@kubernetes/sig-storage
Automatic merge from submit-queue
Provide an e2e skip helper checking for available resource
@janetkuo @dims this is the promised util function, but unfortunately I just learned that dynamic client suffers from the problem I've fixed in the manually written one (https://github.com/kubernetes/kubernetes/pull/29187) I need to look into the dynamic client in that case :/
Automatic merge from submit-queue
Update container image version for downward api volume tests
Some tests were using 0.7, and some were using 0.6, so updating all to 0.7.
@kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Revert "tag scheduledjob e2e as [Feature:ScheduledJob]"
Reverts kubernetes/kubernetes#32233
The way the e2e jobs are configured, `[Feature:...]` tests can't easily be run in jenkins-pr or any of submit-queue blocking jobs.
Automatic merge from submit-queue
Replace gcloud shelling out with cloudprovider calls.
gcloud flakes a lot leading to resource leak. Also fixes https://github.com/kubernetes/kubernetes/issues/16636 by verifying instance-groups, ssl-certs and firewall-rules and cleaned up.
Automatic merge from submit-queue
Add e2e tests that check for wrapped volume race
This PR adds two new e2e tests that reproduce the race condition fixed in #29641 (see e.g. #29297)
In order to observe the race, you need to revert the PR that fixes it, via e.g.
```
git revert -n df1e925143
```
or
```
curl -sL https://github.com/kubernetes/kubernetes/pull/29641.patch | patch -p1 -R
```
The tests are `[Slow]` because they need to run several passes that involve creating pods with many volumes. They also are `[Serial]` because the load on the cluster may affect reproducibility of the race. They take about ~450s each when they fail on standard GCE cluster created by `go run hack/e2e.go -v --up`. `git_repo` test takes about 66s to run when it succeeds (fix PR not reverted) and `configmap` test takes about 546s in this case because configmap mounting is slower and still requires 3 passes x 5 pods x 50 configmap volumes to fail constantly with fix PR reverted. Probably these times can be reduced but frankly I've already spent quite a bit of time on tuning the numbers to find a balance between reproducibility and speed.
Managed to reproduce the problem in more or less reliable way for `configMap` and `gitRepo` volumes. Tried to reproduce it for `secret` volumes too but without success so far because they use tmpfs-based `emptyDir` variety. For `downwardAPI` volumes I expect the same problems with race reproducibility as with `secret` volumes, although I think some e2e races were caused by the bug, e.g. #29633.
The tests operate by creating several pods (via an RC) with many volumes and waiting for them to become Running. It sets node affinity for pods so that they all get created on a single node (the first one in the node list). The race condition leads to volume mount failures with slow retries, thus causing the test to time out.
The test failures look like this:
configmap:
```
• Failure [435.547 seconds]
[k8s.io] Wrapped EmptyDir volumes
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:709
should not cause race condition when used for configmaps [Serial] [Slow] [It]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/wrapped_empty_dir.go:170
Failed waiting for pod wrapped-volume-race-8c097734-6376-11e6-9ffa-5254003793ad-acbtt to enter running state
Expected error:
<*errors.errorString | 0xc8201758d0>: {
s: "timed out waiting for the condition",
}
timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/wrapped_empty_dir.go:395
```
You'll see errors like this in kubelet log on the first node in the cluster:
```
E0816 00:27:23.319431 3510 configmap.go:174] Error creating atomic writer: stat /var/lib/kubelet/pods/e5986355-6347-11e6-a5d7-42010af00002/volumes/kubernetes.io~configmap/racey-configmap-14: no such file or directory
E0816 00:27:23.319478 3510 nestedpendingoperations.go:232] Operation for "\"kubernetes.io/configmap/e5986355-6347-11e6-a5d7-42010af00002-racey-configmap-14\" (\"e5986355-6347-11e6-a5d7-42010af00002\")" failed. No retries permitted until 2016-08-16 00:28:27.319450118 +0000 UTC (durationBeforeRetry 1m4s). Error: MountVolume.SetUp failed for volume "kubernetes.io/configmap/e5986355-6347-11e6-a5d7-42010af00002-racey-configmap-14" (spec.Name: "racey-configmap-14") pod "e5986355-6347-11e6-a5d7-42010af00002" (UID: "e5986355-6347-11e6-a5d7-42010af00002") with: stat /var/lib/kubelet/pods/e5986355-6347-11e6-a5d7-42010af00002/volumes/kubernetes.io~configmap/racey-configmap-14: no such file or directory
```
git_repo:
```
• Failure [455.035 seconds] [0/1882]
[k8s.io] Wrapped EmptyDir volumes
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:709
should not cause race condition when used for git_repo [Serial] [Slow] [It]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/wrapped_empty_dir.go:179
Failed waiting for pod wrapped-volume-race-71b12b3d-6375-11e6-9ffa-5254003793ad-b0slz to enter running state
Expected error:
<*errors.errorString | 0xc8201758d0>: {
s: "timed out waiting for the condition",
}
timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/wrapped_empty_dir.go:395
```
Errors in kubelet log:
```
E0815 23:41:08.670203 3510 nestedpendingoperations.go:232] Operation for "\"kubernetes.io/git-repo/97636bd8-6341-11e6-a5d7-42010af00002-racey-git-repo-8\" (\"97636bd8-6341-11e6-a5d7-42010af00002\")" failed. No retries permitted until 2016-08-15 23:42:12.670181604 +0000 UTC (durationBeforeRetry 1m4s). Error: MountVolume.SetUp failed for volume "kubernetes.io/git-repo/97636bd8-6341-11e6-a5d7-42010af00002-racey-git-repo-8" (spec.Name: "racey-git-repo-8") pod "97636bd8-6341-11e6-a5d7-42010af00002" (UID: "97636bd8-6341-11e6-a5d7-42010af00002") with: failed to exec 'git clone http://10.0.68.35:2345 test': : chdir /var/lib/kubelet/pods/97636bd8-6341-11e6-a5d7-42010af00002/volumes/kubernetes.io~git-repo/racey-git-repo-8: no such file or directory
```
Generally, the races cause unexpected "no such directory" errors in kubelet logs with subsequent volume mount failures.
I've added race tests to e2e test `empty_dir_wrapper.go` ("EmptyDir wrapper volumes"). This test was added in #18445, the same PR that introduced the race bug. The original purpose of the test was making sure that no conflicts occur between different wrapped emptyDir volumes, so I've replaced "should becomes" with "should not conflict" in the first `It(...)`.
Automatic merge from submit-queue
update taints e2e, restrict taints operation with key, effect
Since taints are now unique by key, effect on a node, this PR is to restrict existing taints adding/removing/updating operations in taints e2e.
Also fixes https://github.com/kubernetes/kubernetes/issues/31066#issuecomment-242870101
Related prior Issue/PR #29362 and #30590
Automatic merge from submit-queue
Log pressure condition, memory usage, events in memory eviction test
I want to log this to help us debug some of the latest memory eviction test flakes, where we are seeing burstable "fail" before the besteffort. I saw (in the logs) attempts by the eviction manager to evict besteffort a while before burstable phase changed to "Failed", but the besteffort's phase appeared to remain "Running". I want to see the pressure condition interleaved with the pod phases to get a sense of the eviction manager's knowledge vs. pod phase.
Automatic merge from submit-queue
Plumb --feature-gates from TEST_ARGS to components in node e2e tests
This means you can set `TEST_ARGS` on the command line, in a `.properties` config for a Jenkins job, etc, to toggle gated features. For example:
`TEST_ARGS='--feature-gates=DynamicKubeletConfig=true'`
/cc @vishh @jlowdermilk
Automatic merge from submit-queue
re-enable provisioning test
Reverts https://github.com/kubernetes/kubernetes/pull/32199 for when the gke control plane is updated. This should be merged AFTER gke is ready.
@kubernetes/sig-storage @wojtek-t
Automatic merge from submit-queue
Networking test rewrite
Decomposes the old kubeproxy tests into (tcp, udp) tests for each of the following:
* intra-pod
* node-pod
* pod-Service
* node-Service
* endpoint-Service
* delete endpoints, confirm unreachability
* delete nodeport, confirm unreachability
* kube-proxy /proxymode, /healthz
Also gets rid of the old network conformance test that used apiserver proxy to check reported peer count of a webserver in a container (the netexec pod used in this test does the same thing without apiserver proxy).
Fixes https://github.com/kubernetes/kubernetes/issues/26490, https://github.com/kubernetes/kubernetes/issues/14204
[Feature:...] tag is recognized by most e2e suites and will prevent
test from being run in suites where it should not. This pattern is
used by other alpha feature tests.
Automatic merge from submit-queue
federated ingress e2e - retry reads properly
@quinton-hoole I made this rookie mistake when I wrote the code and missed your comment in last PR.
- Corrected the reference of constant test/e2e/federated-ingress.go
- Move federation ingress query call to wait. PollImmediate()
- set private method to non-capital
Automatic merge from submit-queue
Use federated namespace instead of the bootstrap cluster's namespace in Ingress e2e tests.
This should fix#31825.
cc @kubernetes/sig-cluster-federation @quinton-hoole
Automatic merge from submit-queue
Move StorageClass to a storage group
We discussed the pros and cons in sig-api-machinery yesterday. Choosing a particular group name means that clients (including our internal code) require less work and re-swizzling to handle promotions between versions. Even if you choose a group you end up not liking, the amount of work remains the same as the incubator work case: you move the affected kind, resource, and storage.
This moves the `StorageClass` type to the `storage.k8s.io` group (named for consistency with authentication, authorization, rbac, and imagepolicy). There are two commits, one for manaul changes and one for generated code.
- updating java to 3.7
- added pet set example
- adding pet set for Cassandra e2e tests
- changed service as we do not want a lb service, as we are running C*
- updated docs
fixing headers and adding exception for run.sh
adding documentation, thank god for reflog
Did not mean to commit that as the README ... fixing
fixing problems in README
fixing more problems in README
more README tweaks
munge updates
updating examples_test for PetSet in Cassandra examples
updating petset to no use better security context
Automatic merge from submit-queue
Add e2e tests for eviction subresource.
This branch includes changes pending in both #31638 and #31721. I will rebase
once those merge.
Job controller synchronizes objects only when job itself or underlying pod
changes. Or, when full resync is performed once 10 mins. This leads e2e test
to unnecessarily wait that longer timeout, sometimes at least. I've added job
modification action which triggers resync, if the job wasn't terminated within
shorter period of time.
Automatic merge from submit-queue
Add e2e test for Source IP preservation (pod to service cluster IP)
Working on #27134.
This PR added the e2e test for source ip preservation (pod to service cluster IP) in service.go. Test scenario described as below:
- Pick two different nodes in cluster.
- Create a clusterIP type service.
- Create an echo server, which echoes back client IP, to be part of the service.
- Create a client on another node. Hit the server through service cluster IP.
- Verify the source IP.
@girishkalele @freehan
Automatic merge from submit-queue
Bumped memory limit for resource consumer. Fixes#31591.
Bumped memory limit for resource consumer from 100 MB to 200 MB, increased request sizes so that the number of consumers will be smaller. Fixes#31591.
Automatic merge from submit-queue
Check server version when running scheduled job e2e tests
@janetkuo this is the promised followup to #30575 which is checking minimal server version when running ScheduledJob e2e's.
Automatic merge from submit-queue
update e2e test for federation replicaset controlelr
e2e test to verify replicases synced to underlying clusters.
@quinton-hoole @nikhiljindal @deepak-vij @kshafiee @mwielgus
Automatic merge from submit-queue
Return detailed error message for better debugging.
Try to provide more details error message for debugging when this flake #31561 happens again.
@pwittrock
Automatic merge from submit-queue
Bump nfs server image tag in pv e2e
Image modified in https://github.com/kubernetes/kubernetes/pull/30084 has been pushed, so we can bump this back up to enable the part where pod writes to server with restrictive permissions
Automatic merge from submit-queue
Adding namespaces/finalizer subresource to federation apiserver
Fixes https://github.com/kubernetes/kubernetes/issues/31077
cc @kubernetes/sig-cluster-federation @mwielgus
Verified manually that I can delete federation namespaces now.
Will update federation-namespace e2e test to verify that namespace is deleted fine
Automatic merge from submit-queue
test/e2e: fix flake in kubelet expose should create services for rc
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
NONE
```
Add a loop to retry the request to account for the TLS Timeout and API
credential error responses outlined by the flakes in #29227.
Fixes#29227
Automatic merge from submit-queue
Skip hazelcast E2E test
**What this PR does / why we need it**:
Skip hazelcast e2e test due to flakiness, which in turn is (most likely) due to a race condition upstream. See https://github.com/pires/hazelcast-kubernetes-bootstrapper/issues/9 for comments.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://github.com/kubernetes/kubernetes/issues/30672
**Special notes for your reviewer**:
This is temporary pending upstream changes.
**Release note**:
NONE
This gives the node e2e test binary a --feature-gates flag that populates a
FeatureGates field on the test context. The value of this field is forwarded
to the kubelet's --feature-gates flag and is also used to populate the global
DefaultFeatureGate object so that statically-linked components see the same
feature gate settings as provided via the flag.
This means that you can set feature gates via the TEST_ARGS environment
variable when running node e2e tests. For example:
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true'
Automatic merge from submit-queue
Make a scheduler predicates test resiliant to race for scheduledCondi…
Fix#31341
@pwittrock - this fixes a P1 flake.
FYI @mwielgus - I don't think that the race that caused this flake can impact cluster autoscaling, but you probably should know about it.
cc @wojtek-t
Automatic merge from submit-queue
Rewrite disruption e2e test to use versioned client.
This currently includes the changes from #31638. I will rebase once that is merged.
Automatic merge from submit-queue
e2e: log wget output on CheckConnectivityToHost error
Log output might help to diagnose e2e flakes, whether they are caused by dns issues or connection timeouts.
Might help with flake https://github.com/kubernetes/kubernetes/issues/28188.
Automatic merge from submit-queue
Add e2e tests for Federated Ingress
This is e2e code for federation ingress controller.
Based util functions, add federation ingress e2e cases(reuse current k8s ones) and add logic to validate the result.
Automatic merge from submit-queue
E2E tests for the Source IP Preservation for LoadBalancers
Breaking out E2E changes from the main PR - these tests require the Alpha feature gate turned on for this feature otherwise they will consistently fail.
Add a loop to retry the request to account for the TLS Timeout and API
credential error responses outlined by the flakes in #29227.
Fixes#29227
Signed-off-by: Jess Frazelle <me@jessfraz.com>
Automatic merge from submit-queue
add retries for add/update/remove taints on node in taints e2e
fixes taint update conflict in taints e2e by adding retries for add/update/remove taints on node.
ref #27655 and #31066
Automatic merge from submit-queue
Get network name via e2e environment.
This should work, right? I plan to pipe it through into the TestContext soon, just not today, and I'd like some test runtime over the weekend. Open to suggestions.
Automatic merge from submit-queue
Delete the broken Celery+RabbitMQ example
The celery container used in the example is broken and does not come up
on most distros. The e2e test that was validating this example was not
detecting the fact the celery pod was crash looping.
I attempted to fix the celery container, but it proved to be tedious.
The proposed fix is to update the glibc version to >= 2.23. In this case
it requires updating the python docker image and the celery base image.
https://github.com/kubernetes/kubernetes/issues/31456
has more details.
I'm deleting the example instead of marking it as broken because a user
might overlook the broken warning and it should be trivial to revert
this PR if someone can fix the celery container.
Automatic merge from submit-queue
Allow services which use same port, different protocol to use the same nodePort for both
fix#20092
@thockin @smarterclayton ptal.
Automatic merge from submit-queue
Add ExternalName kube-dns e2e test
ExternalName allows kubedns to return CNAME records for external
services. No proxying is involved.
Built on top of and includes #30599
See original issue at
https://github.com/kubernetes/kubernetes/issues/13748
Feature tracking at
https://github.com/kubernetes/features/issues/33
The e2e test is at least as comprehensive as the one for headless services (namely, only to some degree)
```release-note
Add ExternalName services as CNAME references to external ones
```
Most of the contents of docs/ has moved to kubernetes.github.io.
Development of the docs and accompanying files has continued there, making
the copies in this repo stale. I've removed everything but the .md files
which remain to redirect old links. The .yaml config files in the docs
were used by some tests, these have been moved to test/fixtures/doc-yaml,
and can remain there to be used by tests or other purposes.
Automatic merge from submit-queue
Add sysctl support
Implementation of proposal https://github.com/kubernetes/kubernetes/pull/26057, feature https://github.com/kubernetes/features/issues/34
TODO:
- [x] change types.go
- [x] implement docker and rkt support
- [x] add e2e tests
- [x] decide whether we want apiserver validation
- ~~[ ] add documentation~~: api docs exist. Existing PodSecurityContext docs is very light and links back to the api docs anyway: 6684555ed9/docs/user-guide/security-context.md
- [x] change PodSecurityPolicy in types.go
- [x] write admission controller support for PodSecurityPolicy
- [x] write e2e test for PodSecurityPolicy
- [x] make sure we are compatible in the sense of https://github.com/kubernetes/kubernetes/blob/master/docs/devel/api_changes.md
- [x] test e2e with rkt: it only works with kubenet, not with no-op network plugin. The later has no sysctl support.
- ~~[ ] add RunC implementation~~ (~~if that is already in kube,~~ it isn't)
- [x] update whitelist
- [x] switch PSC fields to annotations
- [x] switch PSP fields to annotations
- [x] decide about `--experimental-whitelist-sysctl` flag to be additive or absolute
- [x] decide whether to add a sysctl node whitelist annotation
### Release notes:
```release-note
The pod annotation `security.alpha.kubernetes.io/sysctls` now allows customization of namespaced and well isolated kernel parameters (sysctls), starting with `kernel.shm_rmid_forced`, `net.ipv4.ip_local_port_range`, `net.ipv4.tcp_max_syn_backlog` and `net.ipv4.tcp_syncookies` for Kubernetes 1.4.
The pod annotation `security.alpha.kubernetes.io/unsafeSysctls` allows customization of namespaced sysctls where isolation is unclear. Unsafe sysctls must be enabled at-your-own-risk on the kubelet with the `--experimental-allowed-unsafe-sysctls` flag. Future versions will improve on resource isolation and more sysctls will be considered safe.
```
Automatic merge from submit-queue
Make sure the StatusCode is taken into account in DoRaw()
**What this PR does / why we need it**:
Currently if there is an error (not found) the error printed out
is to do with the inablity to convert an empty body into the expected json.
This patch will fill in the err correctly.
example of before (with NotFound error):
$ kubectl top node
failed to unmarshall heapster response: json: cannot unmarshal object into Go value of type []v1alpha1.NodeMetrics
Now:
$ kubectl top node
the server could not find the requested resource (get services http:heapster:)
**Which issue this PR fixes**
related to bug #30818
**Special notes for your reviewer**:
None
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
[GarbageCollector] Allow per-resource default garbage collection behavior
What's the bug:
When deleting an RC with `deleteOptions.OrphanDependents==nil`, garbage collector is supposed to treat it as `deleteOptions.OrphanDependents==true", and orphan the pods created by it. But the apiserver is not doing that.
What's in the pr:
Allow each resource to specify the default garbage collection behavior in the registry. For example, RC registry's default GC behavior is Orphan, and Pod registry's default GC behavior is CascadingDeletion.
Automatic merge from submit-queue
federation: Adding support for namespace admission controls in federation-apiserver
Now that we have namespaces in federation apiserver, we can support namespace admission controls.
There are 3 of these:
namespace/autoprovision, namespace/exists and namespace/lifecycle.
namespace/autoprovision, namespace/exists should be deprecated in kubernetes(https://github.com/kubernetes/kubernetes/issues/31195). Adding support for namespace/lifecycle to federation-apiserver.
As in kube-apiserver, enabling namespace/lifecycle by default.
```release-note
Action required: If you have a running federation control plane, you will have to ensure that for all federation resources, the corresponding namespace exists in federation control plane.
federation-apiserver now supports NamespaceLifecycle admission control, which is enabled by default. Set the --admission-control flag on the server to change that.
```
cc @kubernetes/sig-cluster-federation @quinton-hoole
1. /validate service does not exist, so remove the test for it and add some that actually do exist
2. The namespace does not exist so this will always return NotFound
Note: DoRaw() ignores the StatusCode.
This is in preparation for the next commit
Automatic merge from submit-queue
[e2e density test] Fix unnecessary Delete RC requests when not running latency test
As the following code block
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/density.go#L666-L670
shows, after running each density test case, it will attempt to delete "additional replication controllers" even though there is **no additional replication controller**.
When we are not running latency test, API Server will return "404 error code". So, I propose to move the above code block inside thedetermine statementsif `itArg.runLatencyTest{ }` , looks like:
```
if itArg.runLatencyTest {
...
for i := 1; i <= nodeCount; i++ {
name := additionalPodsPrefix + "-" + strconv.Itoa(i)
c.ReplicationControllers(ns).Delete(name, nil)
}
}
```
In this way, removing RC will be executed only if we set `itArg.runLatencyTest` to be `true`. It can avoid post some necessary requests to API Server.
Issuse is #30977
Automatic merge from submit-queue
[e2e test] Fix e2e test pause image hard code
Use `framework.GetPauseImageName(f.Client)` instead of hard code(such as `"gcr.io/google_containers/pause-amd64:3.0"`) to represent pause image name.
Related issus is #30967
Automatic merge from submit-queue
Add benchmark to jenkins
This PR contains the following changes:
1. Add more tests in density benchmark test;
2. Add the peak value (100%) in latency and CPU usage statistic data;
3. Move the Ginkgo focus flag from e2e_remote.go to run_e2e.go;
4. Support running benchmark in run_e2e.go. The benchmark configuration file is an extension of image configuration. Each item requires additional GCE machine type (e.g. n1-standard-1, default value will be used if empty) and test names (Ginkgo focus regex strings). A test item is regarded as benchmark if the tests field is non-empty.
Automatic merge from submit-queue
Scheduledjobs e2e
@janetkuo resubmitted e2e for SJ, I've updated all scripts to consume `KUBE_RUNTIME_CONFIG` properly in 2nd commit, ptal
Automatic merge from submit-queue
Unblock iterative development on pod-level cgroups
In order to allow forward progress on this feature, it takes the commits from #28017#29049 and then it globally disables the flag that allows these features to be exercised in the kubelet. The flag can be re-added to the kubelet when its actually ready.
/cc @vishh @dubstack @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Add logging time series to benchmark test
This PR adds a new file benchmark_util.go which contains tool functions for benchmark (we can migrate benchmark related functions into it).
The PR logs time series data for density benchmark test.
Automatic merge from submit-queue
Implement dynamic provisioning (beta) of PersistentVolumes via StorageClass
Implemented according to PR #26908. There are several patches in this PR with one huge code regen inside.
* Please review the API changes (the first patch) carefully, sometimes I don't know what the code is doing...
* `PV.Spec.Class` and `PVC.Spec.Class` is not implemented, use annotation `volume.alpha.kubernetes.io/storage-class`
* See e2e test and integration test changes - Kubernetes won't provision a thing without explicit configuration of at least one `StorageClass` instance!
* Multiple provisioning volume plugins can coexist together, e.g. HostPath and AWS EBS. This is important for Gluster and RBD provisioners in #25026
* Contradicting the proposal, `claim.Selector` and `volume.alpha.kubernetes.io/storage-class` annotation are **not** mutually exclusive. They're both used for matching existing PVs. However, only `volume.alpha.kubernetes.io/storage-class` is used for provisioning, configuration of provisioning with `Selector` is left for (near) future.
* Documentation is missing. Can please someone write some while I am out?
For now, AWS volume plugin accepts classes with these parameters:
```
kind: StorageClass
metadata:
name: slow
provisionerType: kubernetes.io/aws-ebs
provisionerParameters:
type: io1
zone: us-east-1d
iopsPerGB: 10
```
* parameters are case-insensitive
* `type`: `io1`, `gp2`, `sc1`, `st1`. See AWS docs for details
* `iopsPerGB`: only for `io1` volumes. I/O operations per second per GiB. AWS volume plugin multiplies this with size of requested volume to compute IOPS of the volume and caps it at 20 000 IOPS (maximum supported by AWS, see AWS docs).
* of course, the plugin will use some defaults when a parameter is omitted in a `StorageClass` instance (`gp2` in the same zone as in 1.3).
GCE:
```
apiVersion: extensions/v1beta1
kind: StorageClass
metadata:
name: slow
provisionerType: kubernetes.io/gce-pd
provisionerParameters:
type: pd-standard
zone: us-central1-a
```
* `type`: `pd-standard` or `pd-ssd`
* `zone`: GCE zone
* of course, the plugin will use some defaults when a parameter is omitted in a `StorageClass` instance (SSD in the same zone as in 1.3 ?).
No OpenStack/Cinder yet
@kubernetes/sig-storage
Automatic merge from submit-queue
Adding e2e test for federation replicasets
Its a basic test which tests that we can create and delete replicasets. Will enhance it when we write the replicaset controller.
cc @kubernetes/sig-cluster-federation
Automatic merge from submit-queue
Allow setting permission mode bits on secrets, configmaps and downwardAPI files
cc @thockin @pmorie
Here is the first round to implement: https://github.com/kubernetes/kubernetes/pull/28733.
I made two commits: one with the actual change and the other with the auto-generated code. I think it's easier to review this way, but let me know if you prefer in some other way.
I haven't written any tests yet, I wanted to have a first glance and not write them till this (and the API) are more close to the "LGTM" :)
There are some things:
* I'm not sure where to do the "AND 0777". I'll try to look better in the code base, but suggestions are always welcome :)
* The write permission on group and others is not set when you do an `ls -l` on the running container. It does work with write permissions to the owner. Debugging seems to show that is something happening after this is correctly set on creation. Will look closer.
* The default permission (when the new fields are not specified) are the same that on kubernetes v1.3
* I do realize there are conflicts with master, but I think this is good enough to have a look. The conflicts is with the autog-enerated code, so the actual code is actually the same (and it takes like ~30 minutes to generate it here)
* I didn't generate the docs (`generated-docs` and `generated-swagger-docs` from `hack/update-all.sh`) because my machine runs out of mem. So that's why it isn't in this first PR, will try to investigate and see why it happens.
Other than that, this works fine here with some silly scripts I did to create a secret&configmap&downwardAPI, a pod and check the file permissions. Tested the "defaultMode" and "mode" for all. But of course, will write tests once this is looking fine :)
Thanks a lot again!
Rodrigo
Automatic merge from submit-queue
Continue on #30774: Change podNamespacer API
continue on #30774, credit to @wojtek-t, Ref #30759
I just fixed a test and converted IsActivePod to operate on *Pod.
Automatic merge from submit-queue
Implement federation API server authentication e2e tests.
This PR depends on #30397. Please review only the last commit here.
Fixes: Issue #28602.
cc @kubernetes/sig-cluster-federation
This implements the proposal in:
docs/proposals/secret-configmap-downwarapi-file-mode.md
Fixes: #28317.
The mounttest image is updated so it returns the permissions of the linked file
and not the symlink itself.
Automatic merge from submit-queue
Fix default resource limits (node allocatable) for downward api volumes and env vars
@kubernetes/rh-cluster-infra @pmorie @derekwaynecarr
Automatic merge from submit-queue
two new pv e2e tests
Added two more pv e2e tests: 1) creating a claim before the pv (both not pre-bound), 2) creating a claim before the pv with the claim pre-bound to the PV via Spec.Volumename.
Automatic merge from submit-queue
Let load and density e2e tests use GC if it's on
I've run the 100 and 500 nodes tests and they both pass.
The test-infra half of the PR is https://github.com/kubernetes/test-infra/pull/369
cc @lavalamp
Automatic merge from submit-queue
Extend the wait time to 2*time.Minute for liveness check test.
Fixes https://github.com/kubernetes/kubernetes/issues/30264.
https://github.com/kubernetes/kubernetes/pull/29814 changes the wait time to 1 minute, which is not enough. That's what causes the flake.
The test expected the container to be restarted, and the container was indeed restarted but it took about 1 minute:
```
Aug 9 01:03:40.696: INFO: At 2016-08-09 01:02:35 -0700 PDT - event for liveness-exec: {default-scheduler } Scheduled: Successfully assigned liveness-exec to e2e-gce-agent-pr-38-0-minion-group-bv95
Aug 9 01:03:40.696: INFO: At 2016-08-09 01:02:36 -0700 PDT - event for liveness-exec: {kubelet e2e-gce-agent-pr-38-0-minion-group-bv95} Pulled: Container image "gcr.io/google_containers/busybox:1.24" already present on machine
Aug 9 01:03:40.696: INFO: At 2016-08-09 01:02:36 -0700 PDT - event for liveness-exec: {kubelet e2e-gce-agent-pr-38-0-minion-group-bv95} Created: Created container with docker id cf4e8e60535e
Aug 9 01:03:40.696: INFO: At 2016-08-09 01:02:36 -0700 PDT - event for liveness-exec: {kubelet e2e-gce-agent-pr-38-0-minion-group-bv95} Started: Started container with docker id cf4e8e60535e
Aug 9 01:03:40.696: INFO: At 2016-08-09 01:02:55 -0700 PDT - event for liveness-exec: {kubelet e2e-gce-agent-pr-38-0-minion-group-bv95} Unhealthy: Liveness probe failed:
Aug 9 01:03:40.696: INFO: At 2016-08-09 01:03:26 -0700 PDT - event for liveness-exec: {kubelet e2e-gce-agent-pr-38-0-minion-group-bv95} Killing: Killing container with docker id cf4e8e60535e: pod "liveness-exec_e2e-tests-container-probe-b1wip(a1f856fc-5e07-11e6-a7e1-42010af00002)" container "liveness" is unhealthy, it will be killed and re-created.
Aug 9 01:03:40.696: INFO: At 2016-08-09 01:03:26 -0700 PDT - event for liveness-exec: {kubelet e2e-gce-agent-pr-38-0-minion-group-bv95} Created: Created container with docker id 0b18537dc794
Aug 9 01:03:40.696: INFO: At 2016-08-09 01:03:26 -0700 PDT - event for liveness-exec: {kubelet e2e-gce-agent-pr-38-0-minion-group-bv95} Started: Started container with docker id 0b18537dc794
```
This PR recovers the wait time to the original 2 * time. Mark P0 to match the corresponding issue.
@fejta
Automatic merge from submit-queue
Remove unversioned federation client, clientset and versioned release_1_3 clientset and all their accesses in e2e tests. Switch everything to federation release_1_4 external client.
cc @kubernetes/sig-cluster-federation
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/30397)
<!-- Reviewable:end -->
Automatic merge from submit-queue
[GarbageCollector] measure latency
First commit is #27600.
In e2e tests, I measure the average time an item spend in the eventQueue(~1.5 ms), dirtyQueue(~13ms), and orphanQueue(~37ms). There is no stress test in e2e yet, so the number may not be useful.
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/28387)
<!-- Reviewable:end -->
Automatic merge from submit-queue
Add test for supplemental gid annotation to pv e2e test
Test for feature added in #29119. Adds the annotation to the pv in the PersistentVolumes e2e test. It tests it by checking the output of 'id -G.' Also tests the actual use case of an nfs server by using an nfs server image 'volume-nfs-gid', modified slightly from 'volume-nfs' to have some permissions set.
@pmorie
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/30084)
<!-- Reviewable:end -->
Automatic merge from submit-queue
Cut the client repo, staging it in the main repo
Tracking issue: #28559
ref: https://github.com/kubernetes/kubernetes/pull/25978#issuecomment-232710174
This PR implements the plan a few of us came up with last week for cutting client into its own repo:
1. creating "_staging" (name is tentative) directory in the main repo, using a script to copy the client and its dependencies to this directory
2. periodically publishing the contents of this staging client to k8s.io/client-go repo
3. converting k8s components in the main repo to use the staged client. They should import the staged client as if the client were vendored. (i.e., the import line should be `import "k8s.io/client-go/<pacakge name>`). This requirement is to ease step 4.
4. In the future, removing the staging area, and vendoring the real client-go repo.
The advantage of having the staging area is that we can continuously run integration/e2e tests with the latest client repo and the latest main repo, without waiting for the client repo to be vendored back into the main repo. This staging area will exist until our test matrix is vendoring both the client and the server.
In the above plan, the tricky part is step 3. This PR achieves it by creating a symlink under ./vendor, pointing to the staging area, so packages in the main repo can refer to the client repo as if it's vendored. To prevent the godep tool from messing up the staging area, we export the staged client to GOPATH in hack/godep-save.sh so godep will think the client packages are local and won't attempt to manage ./vendor/k8s.io/client-go.
This is a POC. We'll rearrange the directory layout of the client before merge.
@thockin @lavalamp @bgrant0607 @kubernetes/sig-api-machinery
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/29147)
<!-- Reviewable:end -->
Automatic merge from submit-queue
E2E & Node E2E: Move pods test into common directory
This is the 4th part of #29494.
For #29081.
Based on #29092, #29806.
The first commit is squash of all dependent commits. Please only review the last 2 commits.
The 2nd commit migrates pods.go to `common/pods.go`.
Notice that the test `should be schedule with cpu and memory limits` is removed because:
* It doesn't make sense at the node level.
* It should have been tested in scheduler_predicates at the cluster level https://github.com/kubernetes/kubernetes/blob/master/test/e2e/scheduler_predicates.go#L264
The 3rd commit splits pods.go into several pods (nothing is changed, only move code):
* **Liveness probe test:** Moved into `container_probe.go`.
* **Init container test:** Moved into a new file `init_container.go`.
* **Others:** Still in pods.go
@vishh @timstclair
<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/29814)
<!-- Reviewable:end -->
This test creates three pods with QoS of besteffort, burstable, and
guaranteed, respectively, which each contain a container that tries to
consume almost all the available memory at a rate of about 12Mi/10sec.
The expectation is that eviction will be initiated when the hard
memory.available<250Mi threshold is triggered, and that eviction will proceed
in the order of besteffort, then burstable. Since guaranteed pods should
only be evicted if something charged to the host uses more resources
than were reserved for it, we currently end the test when besteffort and
burstable have both been evicted.
Note that this commit also sets --eviction-hard=memory.available<250Mi
to enable eviction during tests.
Automatic merge from submit-queue
e2e flake: real fix of PodAntiAffinity test
The fix in PR https://github.com/kubernetes/kubernetes/pull/30135 was wrong in using a wrong test condition for an already broken test.
### Summary
The test tries to launch a pod with an anti-affinity annotation, waits 10
seconds and then checks that it is still pending.
But the anti-affinity annotation does not forbid to launch that pod on just
another node that does not have the zone label at all.
This commit changes this behavior by labeling two nodes with the zone label
and then forcing the pod to be launched on one of those two nodes.
**I assume here that a non-existing label is considered as a different label value.**
Fixes#30078
Automatic merge from submit-queue
Use report-dir in test framework instead.
We already have `report-dir` option in framework test context.
The node e2e framework should use it as well.
/cc @ronnielai
Automatic merge from submit-queue
E2E & Node E2E: Move configmap, docker_containers, downward_api, expansion and secrets test into common directory.
This is the 3rd part of #29494.
For #29081.
Based on #29092, #29806.
The first commit is squash of all dependent commits. Please only review the second commit.
The second PR added 17 lines.
@vishh @timstclair
Automatic merge from submit-queue
Bug fix: Use p.Name instead of pod.Name
For example, if you used `pod.GenerateName`, `pod.Name` might be the empty
string while `p.Name` contains the actual name of your pod. Thus passing
`pod.Name` can result in a `resource name may not be empty` error.
For example, if you used pod.GenerateName, pod.Name might be the empty
string while p.Name contains the actual name of your pod. Thus passing
pod.Name can result in a `resource name may not be empty` error.
The test tries to launch a pod with an anti-affinity annotation, waits 10
seconds and then checks that it is still pending.
But the anti-affinity annotation does not forbid to launch that pod on just
another node that does not have the zone label at all.
This commit changes this behavior by labeling two nodes with the zone label
and then forcing the pod to be launched on one of those two nodes.
Automatic merge from submit-queue
E2E & NodeE2E: Move host_path, downwardapi_volume and empty_dir into common directory.
This is the second part of #29494.
For #29081.
Based on #29092, #29806.
The first commit is squash of all dependent commits. Please only review the second commit.
The second PR is only 20 lines of change.
@vishh @timstclair
Automatic merge from submit-queue
Node E2E: Change the node e2e junit file name to junit_{image-name}{test-node-number}.xml
Fixes https://github.com/kubernetes/kubernetes/issues/30103.
Reuse the `report-prefix` in e2e test framework. Now the junit file will be like: `junit_{image-name}{test-node-number}.xml`.
Mark P2 to fix the test result.
/cc @rmmh
Automatic merge from submit-queue
federation: Adding secret API
Adding secret API to federation-apiserver and updating the federation client to include secrets
Automatic merge from submit-queue
pv e2e refactor and pre-bind test
refactored persistentvolume e2e so that multiple It() tests can be run. Added one test case for pre-binding, but the overall structure of the test should allow additional test cases to be more easily added.
Automatic merge from submit-queue
Fix deployment e2e test: waitDeploymentStatus should error when entering an invalid state
Follow up #28162
1. We should check that max unavailable and max surge aren't violated at all times in e2e tests (didn't check this in deployment scaled rollout yet, but we should wait for it to become valid and then continue do the check until it finishes)
2. Fix some minor bugs in e2e tests
@kubernetes/deployment
Automatic merge from submit-queue
Revert "Revert "Drop support for --gce-service-account, require activated creds""
Reverts kubernetes/kubernetes#29242
Automatic merge from submit-queue
Limit number of pods spawned in SchedulerPredicates validates resourc…
Fixes https://github.com/kubernetes/kubernetes/issues/29190,
With this patch test should spawn at most 10 pods on the smallest node.
Add min size of pod and max number of pods for SchedulerPredicates validate resouce limits test
Fix typo in patch for SchedulerPredicates validate resouce limits test
Moving max number of pods and min pod cpu request to constants
Automatic merge from submit-queue
Add support to quota pvc storage requests
Adds support to quota cumulative `PersistentVolumeClaim` storage requests in a namespace.
Per our chat today @markturansky @abhgupta - this is not done (lacks unit testing), but is functional.
This lets quota enforcement for `PersistentVolumeClaim` to occur at creation time. Supporting bind time enforcement would require substantial more work. It's possible this is sufficient for many, so I am opening it up for feedback.
In the future, I suspect we may want to treat local disk in a special manner, but that would have to be a different resource altogether (i.e. `requests.disk`) or something.
Example quota:
```
apiVersion: v1
kind: ResourceQuota
metadata:
name: quota
spec:
hard:
persistentvolumeclaims: "10"
requests.storage: "40Gi"
```
/cc @kubernetes/rh-cluster-infra @deads2k
bindata and yaml, Gobindata automation
bindata utils for generating, go generate
match server version
gitignore for dirty, ca, rbase, KUBE_ROOT, buildfix
(rebased jul-25,29)
Automatic merge from submit-queue
Remove redundant pod deletion in scheduler predicates tests and fix taints-tolerations e2e
~~In scheduler predicates test, some tests won't clean pods they created when exit with failure, which may lead to pod leak. This PR is to fix it.~~
Remove redundant pod deletion in scheduler predicates tests, since framework.AfterEach() already did the cleanup work after every test.
Also fix the test "validates that taints-tolerations is respected if not matching", refer to the change on taint-toleration test in #29003, and https://github.com/kubernetes/kubernetes/pull/24134#discussion_r63794924.
Automatic merge from submit-queue
Faster test
<!--
Checklist for submitting a Pull Request
Please remove this comment block before submitting.
1. Please read our [contributor guidelines](https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md).
2. See our [developer guide](https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md).
3. If you want this PR to automatically close an issue when it is merged,
add `fixes #<issue number>` or `fixes #<issue number>, fixes #<issue number>`
to close multiple issues (see: https://github.com/blog/1506-closing-issues-via-pull-requests).
4. Follow the instructions for [labeling and writing a release note for this PR](https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes) in the block below.
-->
In attempting to troubleshoot flakes with this test case I actually wanted to understand how it worked.
There's some poor comments that need work.
I added some additional output which may or may not help in debugging the flakes.
I doubt this fixes the flake.
My major concern is the 'refactor' I did of the test case to batch up runs by sub-test-case. As it stood there was a 200ms pause between each sub, so they should not have interfered with each other. Now they are just started as fast as possible, but only 20 run at a time before moving on to the next 20. I am not sure if I am violating the ethos of the original test case.
Runs on my computer are down from 2m40s -> 40s.
Getting rid of the arbitrary client limiting brings it down to ~12 seconds. 11 to fetch the image and <1 to actually run the tests against the proxies. I can add a zero to the number of loops if you want to hit it harder. It would result in 10x as much text output though.
[]()
Automatic merge from submit-queue
Add support for kubectl create quota command
Follow-up of https://github.com/kubernetes/kubernetes/pull/19625
```
Create a resourcequota with the specified name, hard limits and optional scopes
Usage:
kubectl create quota NAME [--hard=key1=value1,key2=value2] [--scopes=Scope1,Scope2] [--dry-run=bool] [flags]
Aliases:
quota, q
Examples:
// Create a new resourcequota named my-quota
$ kubectl create quota my-quota --hard=cpu=1,memory=1G,pods=2,services=3,replicationcontrollers=2,resourcequotas=1,secrets=5,persistentvolumeclaims=10
// Create a new resourcequota named best-effort
$ kubectl create quota best-effort --hard=pods=100 --scopes=BestEffort
```
Automatic merge from submit-queue
Rework pod waiting mechanism in e2e tests to accept pod and watch based
This PR re-applies #28212 which was reverted in #29223. The only difference is that the initial PR contained also `PodStartTimeout` shortening (see [here](4b0c0bd924)) which might caused the problems. Let's give it a 2nd try. I've tested all the flakes and they were passing on my machine.
@smarterclayton @apelisse ptal
- what the test is doing
- how the test is set up
- subsections of the test setup
additional output
- print time spent getting ready to run proxy attempts
- number of test cases
- multiple attempts of each test case
- how many total proxying attempts will be made
- fast path output now has numerical identity of attempt like error output
- error output has time taken and http status like fast path output
batching runs
- run groups of test cases vs starting all 34*20=680 proxy attempts at
the same time.
- don't wait between starting proxy attempts anymore.
proxy e2e changes
- disable the client side rate limiter
- use `By` construct of ginkgo for inline `STEP` logging
- move the waitGroup add outside of the loop
Automatic merge from submit-queue
test/e2e: plug time.Ticker resource leak.
This commit ensures that `logPodStartupStatus` does not leak
running `time.Ticker` instances. Upon termination of the consuming
routine, we stop the ticker.
Automatic merge from submit-queue
use regular client instead of kubectl in scheduler predicate tests when checking/setting/cleanning taints/labels
The existing implementation in scheduler predicate tests uses kubectl to check/set/clean taints/labels on node, which makes the test very related to kubectl.
This PR is to use regular client instead.
Automatic merge from submit-queue
Revert "Drop support for --gce-service-account, require activated creds"
Reverts kubernetes/kubernetes#28802
This appears to break the soak tests with "invalid grant" errors -- see the recent batch of errors in #27920.
Automatic merge from submit-queue
Start namespace controller in node e2e
Fix https://github.com/kubernetes/kubernetes/issues/28320.
Based on https://github.com/kubernetes/kubernetes/pull/28807, only the last 2 commits are new.
Before this PR, there was no namespace controller running in node e2e test infrastructure. We can not enable the [`delete-namespace`](f2ddd60eb9/test/e2e/framework/test_context.go (L109)) flag in the test framework.
So after the test running, there will be running pod left on the test node. This seems to be acceptable in our test infrastructure because we create an new instance each time.
However, in 1.4 we may want to provide part of the test as node conformance test to the user, they definitely don't want the test to leave tons of pods on their node after test running.
Currently, there is no easy way to only start namespace controller in kube-controller-manager (confirmed with @mikedanese), so in this PR I started a "uncontainerized" one in the test infrastructure.
This PR:
* Started the namespace controller in the node e2e test infrastructure and enable the automatic namespace deletion.
* Change the privileged test to use framework (@yujuhong), so that all node e2e tests are using the framework and test pods will be cleaned up by namespace controller.
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Switched watches in tests require ResourceVersion to be passed
For testing the Watches are not sufficient in that it might miss the event of transitioning a Pod from one state to another which might happen before we start Watching events. To remedy this, I'm proposing to switch to Gets to always read the actual state of a Pod.
@smarterclayton this fixes https://github.com/openshift/origin/issues/9192 and hopefully all `gave up waiting for pod...` flakes
[]()
Automatic merge from submit-queue
Node E2E: Make it possible to share test between e2e and node e2e
This PR is part of the plan to improve node e2e test coverage.
* Now to improve test coverage, we have to copy test from e2e to node e2e.
* When adding a new test, we have to decide its destiny at the very beginning - whether it is a node e2e or e2e.
This PR makes it possible to share test between e2e and node e2e.
By leveraging the mechanism of ginkgo, as long as we can import the test package in the test suite, the corresponding `Describe` will be run to initialize the global variable `_`, and the test will be inserted into the test suite. (See https://github.com/onsi/composition-ginkgo-example)
In the future, we just need to use the framework to write the test, and put the test into `test/e2e/node`, then it will be automatically shared by the 2 test suites.
This PR:
1) Refactored the framework to make it automatically differentiate e2e and node e2e (Mainly refactored the `PodClient` and the apiserver client initialization).
2) Created a new directory `test/e2e/node` and make it shared by e2e and node e2e.
3) Moved `container_probe.go` into `test/e2e/node` to verify the change.
@kubernetes/sig-node
[]()
Automatic merge from submit-queue
Drop support for --gce-service-account, require activated creds
Now that `gcloud auth activate-service-account` is in remove support in the test framework for default service accounts -- testing GCE/GKE now requires prior gcloud activation.
This commit ensures that `logPodStartupStatus` does not leak
running `time.Ticker` instances. Upon termination of the consuming
routine, we stop the ticker.
Automatic merge from submit-queue
Fix verify results in MaxPods
As we already have "unschedulable" PodCondition we can stop relying on Events, which should make the tests more reliable.
cc @davidopp
Automatic merge from submit-queue
[garbage collector] add e2e test
This PR also includes some changes to plumb controller-manager's `--enable_garbage_collector` from the environment variable.
The e2e test will not be run by the core suite because it's marked `[Feature:GarbageCollector]`.
The corresponding jenkins job configuration PR is https://github.com/kubernetes/test-infra/pull/132.
Automatic merge from submit-queue
Deprecate the term "Ubernetes"
Deprecate the term "Ubernetes" in favor of "Cluster Federation" and "Multi-AZ Clusters"
Automatic merge from submit-queue
Fix path for examples - storage/volume directories changed
Added /volume and /storage in a couple of spots.
Fixes#27978
Automatic merge from submit-queue
Return server's representation of pod from framework pod creation functions
Since PodInterface.Create returns the server's representation of the pod, which may differ from the api.Pod object passed to Create, we do the same from the framework's pod creation functions. This is useful if e.g. you create pods using Pod.GenerateName rather than Pod.Name, and you still want to refer to pods by name later on (e.g. for deletion).
cc @timstclair
Since PodInterface.Create returns the server's representation of the
pod, which may differ from the api.Pod object passed to Create, we do
the same from the framework's pod creation functions. This is useful if
e.g. you create pods using Pod.GenerateName rather than
Pod.Name, and you still want to refer to pods by name later on
(e.g. for deletion).
Search and replace for references to moved examples
Reverted find and replace paths on auto gen docs
Reverting changes to changelog
Fix bugs in test-cmd.sh
Fixed path in examples README
ran update-all successfully
Updated verify-flags exceptions to include renamed files
Automatic merge from submit-queue
e2e: increase timeout when waiting for deployment pods to be deleted
Use the same timeout as the one used for waiting for the deployment
reaper to complete.
Takes a stab at https://github.com/kubernetes/kubernetes/issues/28067
@kubernetes/deployment PTAL
Automatic merge from submit-queue
Add MinReadySeconds to rolling updater
Add MinReadySeconds support to RollingUpdater that allows to specify the number of seconds to wait on top of the pod is "ready" because its readiness probe passed.
Automatic merge from submit-queue
Remove duplicated nginx image. Use nginx-slim instead
This PR removes the image `gcr.io/google_containers/nginx:1.7.9` and uses `gcr.io/google_containers/nginx-slim:0.7`.
Besides removing the duplication `1.7.9` is 16 months old.
Automatic merge from submit-queue
Fix federation e2e tests by correctly managing cluster clients
1. The main fix: Correct overall BeforeEach() to create a new set of cluster clients, rather than just append to the set created by all previous tests. This was screwing up a lot of stuff in difficult to diagnose ways.
2. Add lots of debug logging.
3. Be better about cleaning up after each test.
```
SUCCESS! -- 6 Passed | 0 Failed :-)
```
cc @nikhiljindal @madhusudancs @mfanjie @colhom FYI
Automatic merge from submit-queue
Add two pd tests with default grace period
Add two tests in pd.go. They are same as the flaky test, but the pod deletion has default grace period
Automatic merge from submit-queue
Skip multi-zone e2e tests unless provider is GCE, GKE or AWS
No need to fail the tests. If label is not present then it means that node is not in any zone.
Related issue: #27372
Automatic merge from submit-queue
Mark "RW PD, remove it, then schedule" test flaky
Mark test as flaky while it is being investigated. Tracked by https://github.com/kubernetes/kubernetes/issues/27691
Assigning to @jlowdermilk since he's on call
Automatic merge from submit-queue
e2e: Allow skipping tests for specific runtimes, skip a few tests under rkt
The main benefit of this is that it gives a developer more useful output (more signal to noise) for things that are known broken on that runtime.
cc @kubernetes/rktnetes-maintainers , @ixdy
I'll run this PR through our jenkins and make sure things look happy and compare to the e2e results for this PR.
Automatic merge from submit-queue
[Refactor] QOS to have QOS Class type for QoS classes
This PR adds a QOSClass type and initializes QOSclass constants for the three QoS classes.
It would be good to use this in all future QOS related features.
This would be good to have for the (Pod level cgroups isolation proposal)[https://github.com/kubernetes/kubernetes/pull/26751] that i am working on aswell.
@vishh PTAL
Signed-off-by: Buddha Prakash <buddhap@google.com>
Automatic merge from submit-queue
e2e.framework.util.StartPods: panic if the number or replicas is zero
The number of pods to start must be non-zero.
Otherwise the function waits for pods forever if ``waitForRunning`` is true.
It the number of replicas is zero, panic so the mistake is heard all over the e2e realm.
Update all callers of StartPods to test for non-zero number of replicas.
Automatic merge from submit-queue
Set grace period to 0 when deleting namespaces after the test.
Otherwise, we try to run the next test and the pods are still there.
Automatic merge from submit-queue
Proportionally scale paused and rolling deployments
Enable paused and rolling deployments to be proportionally scaled.
Also have cleanup policy work for paused deployments.
Fixes#20853Fixes#20966Fixes#20754
@bgrant0607 @janetkuo @ironcladlou @nikhiljindal
<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/20273)
<!-- Reviewable:end -->
Automatic merge from submit-queue
e2e: Delete old code
These tests were added commented out over a year ago. Now they don't compile. The port forward test has a whole file devoted to replacing it (`e2e/portforward.go`) and while the exec test doesn't have a perfect replacement, it has several tests that cover for it (exec over a websocket, an e2e_node test, all the kubectl execs). If we want that test, it would be better to write it fresh anyways.
cc @ncdc
Automatic merge from submit-queue
Use gcloud for default node pool and api for other in cluster autoscaler e2e test
cc: @piosz @jszczepkowski @fgrzadkowski
Currently there is a problem with gcloud when non-default pool is used for cluster update. So we temporarily switch to the old ca-enable method for non-default pools until it is fixed.