Per Clayton's suggestion, move stuff from cluster/lib/util.sh to
hack/lib/util.sh. Also consolidate ensure-temp-dir and use the
hack/lib/util.sh implementation rather than cluster/common.sh.
Automatic merge from submit-queue (batch tested with PRs 42662, 43035, 42578, 43682)
Apply [Feature:Volumes] to subset of tests in volumes.go
**What this PR does / why we need it**:
According to [this test doc](https://github.com/kubernetes/community/blob/master/contributors/devel/e2e-tests.md#kinds-of-tests), the `[Feature:xyz]` tag indicates that the tests have a non-default requirements and should not be run in the core suite. Previously all test in _e2e/volumes.go_ were tagged `[Feature:Volumes]` but I believe only six of the tests warrant this tag: iSCSI, GlusterFS, Ceph-RBD, Ceph-FS, vSPhere, and Cinder. The remaining tests (NFS, PD, ConfigMap) do not need the `Feature:` tag.
**Special notes for your reviewer**:
I moved the `[Volume]` tags from the `It`'s to the outer `KubeDescribe` to simplify the `It` text and remove redundancy.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42662, 43035, 42578, 43682)
[Federation] Fix a typo in the ReplicaSet E2E test: s/extrace/extract/
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43726, 43643)
Make a smaller redis image for testing, based on Alpine.
**What this PR does / why we need it**:
This shrinks gcr.io/google_containers/redis from 400MB to 5MB, which should reduce flakes.
**Which issue this PR fixes**:
fixes#43631
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42617, 43247, 43509, 43644, 43820)
Adding a SSH tunnel check to GKE upgrades
**What this PR does / why we need it**: After an upgrade, it takes some time fore the master to re-establish SSH tunnels to the nodes. This adds a wait for GKE since it runs in this configuration.
**Which issue this PR fixes**: fixes#43611, #43612
Automatic merge from submit-queue (batch tested with PRs 38741, 41301, 43645, 43779, 42337)
De-Flake PersistentVolume E2E: Isolate resources with selectors
PersistentVolume tests continue to flake because tests Claims often bind to other, parallel tests' volumes. This is addressed by isolating test resource with selector labels specific to each test namespace. Doing so enables deterministic binding within each test space and allows tests to run in parallel.
1. Assign selector label to volumes and claims per test.
2. Remove `[Serial]` tag
cc @jeffvance
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 38741, 41301, 43645, 43779, 42337)
replicaset adopt and release e2e tests
**What this PR does / why we need it**: adds e2e tests for replicaset pod adoption and release
**Which issue this PR fixes** fixes#39962
**Special notes for your reviewer**: for adoption pod is created, a replicaset with a selector matching the pod's label is created and then we check for the pod being in the list of pods of the replicaset. For release, a replicaset with two replicas is created, the label of one of them is patched and then we check that that pod is no longer in the list of pods of the replicaset.
For checking the list of pods the `framework.PodsCreated` is used, maybe the name is not very adecuate given that it is used to return the pods belonging to one replicaset, no matter how they were created. Also, the patching is done through `framework.KubectlOrDie`, not sure if this is better that calling the API directly.
cc @liggitt
Comments, log lines, exported MakePersistentVolume
Moved pv/pvcConfig assignment out of diskName check, nil ptrs and configs afterEach
change label/selector code to use k8s defined types from generic string:string maps
adjust for gce refactor
Automatic merge from submit-queue
Extract GCEPD pv tests
**What this PR does / why we need it**:
This is strictly a refactor moving the GCEPD suite in Persistent Volumes E2E to it's own file. This will make future provider specific additions to pv testing much more organized and readable. It will also enable a smoother transition to providers moving out of tree by consolidating related tests.
```release-note
NONE
```
Automatic merge from submit-queue
Volume Provisioning E2E: test PVC delete causes PV delete
**What this PR does / why we need it**:
Test for a regression addressed in #21268. There was a case where the PVC being created and deleted quickly may result in a provisioned PV left behind as `Available.`
```release-note
NONE
```
cc @jeffvance
Removed wait for PVC phase Pending.
iterate test 100 times to increase chance of regression
Moved claim obj assignment out of loop.
add wait loop check for PVs
loop until no PVs detected
refactor per git comments
replace api calls with framework wrappers
add default suffix
Automatic merge from submit-queue
Fix problems of not-starting image pullers
In e2e.go there are the following lines:
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/e2e.go#L150
```
if err := framework.WaitForPodsSuccess(c, metav1.NamespaceSystem, framework.ImagePullerLabels, imagePrePullingTimeout); err != nil {
// There is no guarantee that the image pulling will succeed in 3 minutes
// and we don't even run the image puller on all platforms (including GKE).
// We wait for it so we get an indication of failures in the logs, and to
// maximize benefit of image pre-pulling.
framework.Logf("WARNING: Image pulling pods failed to enter success in %v: %v", imagePrePullingTimeout, err)
}
```
However, few lines above:
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/e2e.go#L143
we were waiting for all image pullers to actually enter Success state. It's pretty clear that the latter wasn't expected.
This PR is fixing this problem.
Ref #43728
@anhowe @davidopp
Automatic merge from submit-queue
Move cluster logging tests to a separate folder
Since there are several e2e tests for cluster logging and the infrastructure for them got complicated, it makes sense to move those tests to a separate folder.
Also, adding myself and Piotr to OWNERS of this directory as owners of the tests.
Automatic merge from submit-queue
Move DNS configmap tests to slow, serial suites
These tests take a long time due to the ConfigMap update interval
and may briefly disrupt DNS resolution in the cluster.
Automatic merge from submit-queue
Use ping to ip instead of wget google.com in net connectivity check
This is a flakey test and this commit reduces the number of dependent
systems involved with the flake.
Automatic merge from submit-queue (batch tested with PRs 42835, 42974)
remove legacy insecure port options from genericapiserver
The insecure port has been a source of problems and it will prevent proper aggregation into a cluster, so the genericapiserver has no need for it. In addition, there's no reason for it to be in the main kube-apiserver flow either. This pull removes it from genericapiserver and removes it from the shared kube-apiserver code. It's still wired up in the command, but its no longer possible for someone to mess up and start using in mainline code.
@kubernetes/sig-api-machinery-misc @ncdc
Automatic merge from submit-queue
proxy to IP instead of name, but still use host verification
I think I found a setting that lets us proxy to an IP and still do hostname verification on the certificate.
@liggitt @sttts Can you see if you agree that this knob does what I think it does? Last commit only, still needs tests.
Automatic merge from submit-queue (batch tested with PRs 41728, 42231)
Adding new tests to e2e/vsphere_volume_placement.go
**What this PR does / why we need it**:
Adding new tests to e2e/vsphere_volume_placement.go
Below is the tests description and test steps.
**Test Back-to-back pod creation/deletion with different volume sources on the same worker node**
1. Create volumes - vmdk2, vmdk1 is created in the test setup.
2. Create pod Spec - pod-SpecA with volume path of vmdk1 and NodeSelector set to label assigned to node1.
3. Create pod Spec - pod-SpecB with volume path of vmdk2 and NodeSelector set to label assigned to node1.
4. Create pod-A using pod-SpecA and wait for pod to become ready.
5. Create pod-B using pod-SpecB and wait for POD to become ready.
6. Verify volumes are attached to the node.
7. Create empty file on the volume to make sure volume is accessible. (Perform this step on pod-A and pod-B)
8. Verify file created in step 5 is present on the volume. (perform this step on pod-A and pod-B)
9. Delete pod-A and pod-B
10. Repeatedly (5 times) perform step 4 to 9 and verify associated volume's content is matching.
11. Wait for vmdk1 and vmdk2 to be detached from node.
12. Delete vmdk1 and vmdk2
**Test multiple volumes from different datastore within the same pod**
1. Create volumes - vmdk2 on non default shared datastore.
2. Create pod Spec with volume path of vmdk1 (vmdk1 is created in test setup on default datastore) and vmdk2.
3. Create pod using spec created in step-2 and wait for pod to become ready.
4. Verify both volumes are attached to the node on which pod are created. Write some data to make sure volume are accessible.
5. Delete pod.
6. Wait for vmdk1 and vmdk2 to be detached from node.
7. Create pod using spec created in step-2 and wait for pod to become ready.
8. Verify both volumes are attached to the node on which PODs are created. Verify volume contents are matching with the content written in step 4.
9. Delete POD.
10. Wait for vmdk1 and vmdk2 to be detached from node.
11. Delete vmdk1 and vmdk2
**Test multiple volumes from same datastore within the same pod**
1. Create volumes - vmdk2, vmdk1 is created in testsetup
2. Create pod Spec with volume path of vmdk1 (vmdk1 is created in test setup) and vmdk2.
3. Create pod using spec created in step-2 and wait for pod to become ready.
4. Verify both volumes are attached to the node on which pod are created. Write some data to make sure volume are accessible.
5. Delete pod.
6. Wait for vmdk1 and vmdk2 to be detached from node.
7. Create pod using spec created in step-2 and wait for pod to become ready.
8. Verify both volumes are attached to the node on which PODs are created. Verify volume contents are matching with the content written in step 4.
9. Delete POD.
10. Wait for vmdk1 and vmdk2 to be detached from node.
11. Delete vmdk1 and vmdk2
**Which issue this PR fixes**
fixes #
**Special notes for your reviewer**:
Executed tests against K8S v1.5.3 release
**Release note**:
```release-note
NONE
```
cc: @kerneltime @abrarshivani @BaluDontu @tusharnt @pdhamdhere
Automatic merge from submit-queue (batch tested with PRs 43149, 41399, 43154, 43569, 42507)
Distribute load in cluster load tests uniformly
This PR makes cluster logging load tests distribute logging uniformly, to avoid situation, where 80% of pods are allocated on one node and overall results are worse then it could be.
Automatic merge from submit-queue (batch tested with PRs 43429, 43416, 43312, 43141, 43421)
Make e2e-dns test more stable by increasing wait time
**What this PR does / why we need it**:
In many cases, 60 seconds are not enough for generating all dns results.
Since the probeCmd takes up to 600 seconds, use 600 here too.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes#43264, #43094, #43100
**Special notes for your reviewer**:
**Release note**:
NONE
Automatic merge from submit-queue (batch tested with PRs 43378, 43216, 43384, 43083, 43428)
Darwin won't build: syscall.Sysinfo issue.
**What this PR does / why we need it**: On darwin had problems building and testing because of syscall.Sysinfo_t etc which is a linux specific command.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**: Definitely would like another set of eyes on the bootTime function, it will have to be inaccurate but open to suggestions about improving this for darwin.
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43144, 42671, 43226, 43314, 43361)
Removal of unused mesos e2e test.
**What this PR does / why we need it**:
Remove mesos e2e test which is not used.
```
NONE
```
/cc @sttts @k82cn
Automatic merge from submit-queue
add local option to APIService
APIServices need an option to avoid proxying in cases where the groupversion is handled later in the chain. This will allow a coherent and complete set of APIServices, but won't require extra connections.
@kubernetes/sig-api-machinery-misc @ncdc @cheftako
Automatic merge from submit-queue (batch tested with PRs 42672, 42770, 42818, 42820, 40849)
kubemark test: Bump addon-manager to v6.4-beta.1
Follow up PR of #42760. This PR bumps addon-manager to v6.4-beta.1 for kubemark test.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43048, 43624, 43649)
[Federation][e2e] Ingress delays and service DNS issues
Ingress has been seen to take >10 minutes to allocate an IP in some circumstances (even more so in parallel testing). Also, due to issues with Services and DNS, disable those tests so we can get a green grid (see #43646)
Automatic merge from submit-queue (batch tested with PRs 43653, 43654)
[Federation] Disable the E2E test for federated replica set rebalancing
We are able to reproduce the flaky failure locally, and can debug without running this on the CI.
Automatic merge from submit-queue (batch tested with PRs 43642, 43170, 41813, 42170, 41581)
Add the ability to customize federation system namespace in e2e turn up scripts.
**Release note**:
```release-note
NONE
```
Ingress has been seen to take >10 minutes to allocate an IP in
some circumstances (even more so in parallel testing). Also, due
to issues with Services and DNS, disable those tests so we can
get a green grid.
Automatic merge from submit-queue (batch tested with PRs 42237, 42297, 42279, 42436, 42551)
Cleanup federation_util.go in e2e/framework
The only function GetValidDNSSubdomainName in test/e2e/framework/federation_util.go is no longer used for some time now. so cleaning it up.
cc @kubernetes/sig-federation-pr-reviews @madhusudancs
Automatic merge from submit-queue (batch tested with PRs 42237, 42297, 42279, 42436, 42551)
Reword PVC polling message to log a more readable message.
**What this PR does / why we need it**:
Previous message used to report an error is misleading and poorly written. This PR changes the log to be more readable.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42237, 42297, 42279, 42436, 42551)
should replace errors.New(fmt.Sprintf(...)) with fmt.Errorf(...)
Signed-off-by: yupengzte <yu.peng36@zte.com.cn>
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Fix test for provisioning in unmanaged zone.
defer evaluates arguments of the deferred function immediately, so it actually
deleted a storage class and a claim before the test could do anything useful.
The test passed just accidentally, as the test is expected to time out. It
timed out from wrong reasons though.
@copejon @kubernetes/sig-storage-pr-reviews
```release-note
NONE
```
defer evaluates arguments of the deferred function immediately, so it actually
deleted a storage class and a claim before the test could do anything useful.
The test passed just accidentally, as the test is expected to time out. It
timed out from wrong reasons though.
Automatic merge from submit-queue
Bump CNI consumers to v0.5.1
**What this PR does / why we need it**:
- vendored CNI plugins properly handle `DEL` on missing resources
- update CNI version refs
**Which issue this PR fixes**
fixes#43488
**Release note**:
`bumps CNI to version v0.5.1 where plugins properly handle DEL on non existent resources`
Automatic merge from submit-queue
Increase delays between calling Stackdriver Logging API in e2e tests
Fix https://github.com/kubernetes/kubernetes/issues/43442
This is a temporary hack, proper solution will be implemented soon
Automatic merge from submit-queue (batch tested with PRs 43465, 43529, 43474, 43521)
Added retransmissions in service call by e2e resource consumer library.
Added retransmissions in service call by e2e resource consumer library.
Fixes#43187.
```release-note
NONE
```
Automatic merge from submit-queue
update influxdb dependency to v1.1.1 and change client to v2
**What this PR does / why we need it**:
1. it updates version of influxdb libraries used by tests to v1.1.1 to match version used by grafana
2. it switches influxdb client to v2 to address the fact that [v1 is being depricated](https://github.com/influxdata/influxdb/tree/v1.1.1/client#description)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
cc @piosz
1. [vendor/BUILD](https://github.com/KarolKraskiewicz/kubernetes/blob/master/vendor/BUILD) didn't get regenerated after executing `./hack/godep-save.sh` so I left previous version.
Not sure how to trigger regeneration of this file.
2. `tests/e2e/monitoring.go` seem to be passing without changes, even after changing version of the client.
**Release note**:
```release-note
```
Automatic merge from submit-queue
e2e test for cluster-autoscaler draining node
**What this PR does / why we need it**:
Adds an e2e test for Cluster-Autoscaler removing a node with a pod running (by rescheduling the pod).
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
@mwielgus can you take a look?
**Release note**:
```release-note
```
Automatic merge from submit-queue
Unify test timeouts under a common name.
Some timeouts were too aggressive and since we've slowly been moving every controller to 5 minutes, consolidate everyone under ``federatedDefaultTestTimeout``. To aid in debugging some service-related issues, if a service cannot be deleted, we issue a kubectl describe on it prior to failing.
Automatic merge from submit-queue (batch tested with PRs 42452, 43399)
Fix faulty assumptions in summary API testing
**What this PR does / why we need it**:
1. on systemd, launch kubelet in dedicated part of cgroup hierarchy
1. bump allowable memory usage for busy box containers as my own local testing often showed values > 1mb which were valid per the memory limit settings we impose
1. there is a logic flaw today in how we report node.memory.stats that needs to be fixed in follow-on.
for the last issue, we look at `/sys/fs/cgroup/memory.stat[rss]` value which if you have global accounting enabled on systemd machines (as expected) will report 0 because nothing runs local to the root cgroup. we really want to be showing the total_rss value for non-leaf cgroups so we get the full hierarchy of usage.
bazel update
added new files to reflect that only one method has changed between arch types.
forgot to add changes to a commit.
changes made and gfmt run.
changed node_problem_detector to node_problem_detector_linux and made it linux only.
updated bazel
Automatic merge from submit-queue
Loosen requirements of cluster logging e2e tests, make them more stable
There should be an e2e test for cloud logging in the main test suite, because this is the important part of functionality and it can be broken by different components.
However, existing cluster logging e2e tests were too strict for the current solution, which may loose some log entries, which results in flakes. There's no way to fix this problem in 1.6, so this PR makes basic cluster logging e2e tests less strict.
Automatic merge from submit-queue (batch tested with PRs 43355, 42827)
[Federation] Rewrite ReplicaSet CRUD and Preferences tests.
I think `should create replicasets and rebalance them` test is still flaky. I still don't know the source of this flakiness. I will continue hunting. But it is a lot less flaky than before (or perhaps it even never passed before?). This PR could be merged now and flake hunting can happen in parallel.
```release-note
NONE
```
Automatic merge from submit-queue
Use storage.k8s.io/v1 in tests instead of v1beta1
This is trimmed version of #42477 and contains only tests of the new storage API. Together with #43285 it passes all dynamic provisioning tests on my GCE.
I did not change vsphere_utils.go and vsphere_volume_diskformat.go as @divyenpatel runs master vsphere tests with Kubernetes 1.5 - @divyenpatel, did I get it right?
@kubernetes/sig-storage-pr-reviews, @msau42, @ethernetdan
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43313, 43257, 43271, 43307)
In DaemonSet e2e tests, use Patch instead of Update to avoid conflict
Fixes#43310
@marun @kargakis @lukaszo @kubernetes/sig-apps-bugs
Automatic merge from submit-queue
kubectl: Use v1.5-compatible ownership logic when listing dependents.
**What this PR does / why we need it**:
This restores compatibility between kubectl 1.6 and clusters running Kubernetes 1.5.x. It introduces transitional ownership logic in which the client considers ControllerRef when it exists, but does not require it to exist.
If we were to ignore ControllerRef altogether (pre-1.6 client behavior), we would introduce a new failure mode in v1.6 because controllers that used to get stuck due to selector overlap will now make progress. For example, that means when reaping ReplicaSets of an overlapping Deployment, we would risk deleting ReplicaSets belonging to a different Deployment that we aren't about to delete.
This transitional logic avoids such surprises in 1.6 clusters, and does no worse than kubectl 1.5 did in 1.5 clusters. To prevent this when kubectl 1.5 is used against 1.6 clusters, we can cherrypick this change.
**Which issue this PR fixes**:
Fixes#43159
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 42869, 43298, 43285)
Fix default storage class tests
Name of the default storage class is not "default", it must be discovered dynamically.
```release-note
NONE
```
This fixes flake `storageclasses.storage.k8s.io "default" not found` in #43261
Automatic merge from submit-queue (batch tested with PRs 42869, 43298, 43285)
Bumped Heapster to v1.3.0
``` release-note
Bumped Heapster to v1.3.0.
More details about the release https://github.com/kubernetes/heapster/releases/tag/v1.3.0
```
Automatic merge from submit-queue
Add retry to monitoring e2e
**What this PR does / why we need it**:
Add retry to monitoring e2e to prevent it from failing because heapster have not yet been started after cluster creation.
@piosz @jszczepkowski
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43024
**Special notes for your reviewer**:
**Release note**:
```release-note
```
The test blindly checked all "pause" processes on the node, assuming
they were all infra containers. This change takes a snapshot of all
existing "pause" processes on the node, and exclude them in the
validation. The test still relies on the fact that it runs exclusively
on the node. If that assumption changes, we will need other methods to
locate the PIDs of the infra containers.
In particular, we should not assume ControllerRefs are necessarily set.
However, we can still use ControllerRefs that do exist to avoid
interfering with controllers that do use it.
Automatic merge from submit-queue
Update npd to the official v0.3.0 release.
Update npd to the official release v0.3.0.
This also fixes a npd bug https://github.com/kubernetes/node-problem-detector/pull/98.
@dchen1107 @kubernetes/node-problem-detector-reviewers
Automatic merge from submit-queue
Add guards for StatefulSet and AppArmor upgrade testing
This PR adds automated upgrade infrastructure to allow test suites to know what versions and node images are going to be testing and whether or not they should be skipped. It also adds a guard to prevent StatefulSets from being tested with versions prior to 1.5.0, and a guard to prevent AppArmor from running on distros other than gci and ubuntu.
Automatic merge from submit-queue (batch tested with PRs 43180, 42928)
Fix waitForScheduler in scheduer predicates e2e tests
**What this PR does / why we need it**: Fixes waitForScheduler in e2e to resolve flaky tests in scheduler_predicates.go
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#42691
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43162, 43157)
Use beta default class annotation for default storageclass tests.
**What this PR does / why we need it**:
The default storageclasses are still installed with the beta annotation, so the test should explicitly use the beta annotation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43150
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add saad-ali and marun to test/OWNERS
/assign @saad-ali @marun
Also ensure that approvers are in the reviewer list, and sort both lists.
Automatic merge from submit-queue (batch tested with PRs 40964, 42967, 43091, 43115)
Add process debug information to summary test
Print out the processes in each system cgroup when the Summary API test fails, to help debug https://github.com/kubernetes/kubernetes/issues/40607
/cc @yujuhong @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 40964, 42967, 43091, 43115)
fixes dswp flake
Sometimes a pod may not appear in desired state
of world immediately, we poll before failing.
It only adds additional 30s to tests in worst case.
Fixes#42990
cc @jingxu97
Automatic merge from submit-queue
Guarantee watch before action in e2e event observer helper function.
**What this PR does / why we need it**:
Adds a missing synchronization barrier to an e2e event observation helper function.
- This change should guarantee that in observeEventAfterAction,
the action is only executed after the informer begins watching
the event stream.
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-scheduling-pr-reviews @bsalamat
Automatic merge from submit-queue (batch tested with PRs 40404, 43134, 43117)
Fix ES cluster logging test
Fix#37324
Test was broken because fluentd-gcp now parses golang and fluentd-es doesn't
Automatic merge from submit-queue
Fix Deployment upgrade test.
**What this PR does / why we need it**:
When the upgrade test operates on Deployments in a pre-1.6 cluster (i.e. during the Setup phase), it needs to use the v1.5 deployment/util logic. In particular, the v1.5 logic does not filter children to only those with a matching ControllerRef.
**Which issue this PR fixes**:
Fixes#42738
**Special notes for your reviewer**:
**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 43106, 43110)
Wait for garbagecollector to be synced in test
Fix#42952
Without the `cache.WaitForCacheSync` in the test, it's possible for the GC to get a merged event of RC's creation and its update (update to deletionTimestamp != 0), before GC gets the creation event of the pod, so it's possible the GC will handle the foreground deletion of the RC before it adds the Pod to the dependency graph, thus the race.
With the `cache.WaitForCacheSync` in the test, because GC runs a single thread to process graph changes, it's guaranteed the Pod will be added to the dependency graph before GC handles the foreground deletion of the RC.
Note that this pull fixes the race in the test. The race described in the first point of #26120 still exists.
Automatic merge from submit-queue
Retry calls to ReadFileViaContainer in PD tests
**What this PR does / why we need it**:
kubectl exec occasionally fails to return a valid output string. It seems to be an issue with docker #34256. This PR retries the 'kubectl exec' call to workaround the issue. This should fix the flaky PD test issues.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#28283
**Release note**:
NONE
Automatic merge from submit-queue (batch tested with PRs 42854, 43105, 43090)
Add a timeout to allow replacement pod to become ready
Hopefully fixes https://github.com/kubernetes/kubernetes/issues/37259
```
I0314 04:26:02.562] Mar 14 04:26:02.562: INFO: Pod my-hostname-net-1bgrj still exists
I0314 04:26:22.491] Mar 14 04:26:22.491: INFO: Waiting for pod my-hostname-net-1bgrj to disappear
I0314 04:26:22.496] Mar 14 04:26:22.495: INFO: Pod my-hostname-net-1bgrj no longer exists
I0314 04:26:22.496] STEP: verifying whether the pod from the unreachable node is recreated
I0314 04:26:22.498] Mar 14 04:26:22.498: INFO: Pod name my-hostname-net: Found 3 pods out of 3
I0314 04:26:22.499] STEP: ensuring each pod is running
I0314 04:26:22.499] STEP: trying to dial each unique pod
I0314 04:26:22.579] Mar 14 04:26:22.579: INFO: Controller my-hostname-net: Got expected result from replica 1 [my-hostname-net-5jrdb]: "my-hostname-net-5jrdb", 1 of 3 required successes so far
I0314 04:26:22.642] Mar 14 04:26:22.642: INFO: Controller my-hostname-net: Got expected result from replica 2 [my-hostname-net-mjf3c]: "my-hostname-net-mjf3c", 2 of 3 required successes so far
I0314 04:31:22.645] Mar 14 04:31:22.644: INFO: Controller my-hostname-net: Failed to Get from replica 3 [my-hostname-net-rf46s]: Get https://35.184.87.178/api/v1/namespaces/e2e-tests-network-partition-s5gqt/pods/my-hostname-net-rf46s/proxy/: context deadline exceeded
```
The issue appears to be that we have a race between the pod being "running + ready" and being accessible via the APIServer proxy.
cc @kow3ns @bowei @davidopp
Automatic merge from submit-queue (batch tested with PRs 42854, 43105, 43090)
Move e2e sched event predicates to new file.
**What this PR does / why we need it**:
Small e2e test refactor for scheduler. Moves scheduler event predicates out of opaque_resource.go for reuse elsewhere.
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-scheduling-pr-reviews @timothysc @bsalamat
When the upgrade test operates on Deployments in a pre-1.6 cluster
(i.e. during the Setup phase), it needs to use the v1.5 deployment/util
logic. In particular, the v1.5 logic does not filter children to only
those with a matching ControllerRef.
Automatic merge from submit-queue (batch tested with PRs 43018, 42713)
Log instead of fail on GLBCs tendency to leak resources
**What this PR does / why we need it**:
Stops upgrade tests from flaking because the GLBC does not cleanup all resources due to a race condition.
**Which issue this PR fixes**: fixes#38569
**Special notes for your reviewer**:
To be reviewed by @mml
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)
Add e2e test for Deployment controllerRef orphaning and adoption
Follow up #42908
@enisoc @kubernetes/sig-apps-bugs @kargakis
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)
Initial breakout of scheduling e2es to help assist in assignment and refactoring
**What this PR does / why we need it**:
This PR segregates the scheduling specific e2es to isolate the library which will assist both in refactoring but also auto-assignment of issues.
**Which issue this PR fixes**
xref: https://github.com/kubernetes/kubernetes/issues/42691#issuecomment-285563265
**Special notes for your reviewer**:
All this change does is shuffle code around and quarantine. Behavioral, and other cleanup changes, will be in follow on PRs. As of today, the e2es are a monolith and there is massive symbol pollution, this 1st step allows us to segregate the e2es and tease apart the dependency mess.
**Release note**:
```
NONE
```
/cc @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-testing-pr-reviews @marun @skriss
/cc @gmarek - same trick for load + density, etc.
Automatic merge from submit-queue (batch tested with PRs 43034, 43066)
Allow StatefulSet controller to PATCH Pods.
**What this PR does / why we need it**:
StatefulSet now needs the PATCH permission on Pods since it calls into ControllerRefManager to adopt and release. This adds the permission and the missing e2e test that should have caught this.
**Which issue this PR fixes**:
**Special notes for your reviewer**:
This is based on #42925.
**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42942, 42935)
[Bug] Handle container restarts and avoid using runtime pod cache while allocating GPUs
Fixes#42412
**Background**
Support for multiple GPUs is an experimental feature in v1.6.
Container restarts were handled incorrectly which resulted in stranding of GPUs
Kubelet is incorrectly using runtime cache to track running pods which can result in race conditions (as it did in other parts of kubelet). This can result in same GPU being assigned to multiple pods.
**What does this PR do**
This PR tracks assignment of GPUs to containers and returns pre-allocated GPUs instead of (incorrectly) allocating new GPUs.
GPU manager is updated to consume a list of active pods derived from apiserver cache instead of runtime cache.
Node e2e has been extended to validate this failure scenario.
**Risk**
Minimal/None since support for GPUs is an experimental feature that is turned off by default. The code is also isolated to GPU manager in kubelet.
**Workarounds**
In the absence of this PR, users can mitigate the original issue by setting `RestartPolicyNever` in their pods.
There is no workaround for the race condition caused by using the runtime cache though.
Hence it is worth including this fix in v1.6.0.
cc @jianzhangbjz @seelam @kubernetes/sig-node-pr-reviews
Replaces #42560
Automatic merge from submit-queue
Allow DaemonSet controller to PATCH pods, and add more steps and logs in DaemonSet pods adoption e2e test
DaemonSet pods adoption failed because DS controller aren't allowed to patch pods when claiming pods.
[Edit] This PR fixes#42908 by modifying RBAC to allow DaemonSet controllers to patch pods, as well as adding more logs and steps to the original e2e test to make debugging easier.
Tested locally with a local cluster and GCE cluster.
@kargakis @lukaszo @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42940, 42906, 42970, 42848)
Move node and event observer helpers to e2e/common
**What this PR does / why we need it**:
Moves existing test helper functions in OIR e2e tests to `test/e2e/common`. These functions wrap informers to help test writers to observe events instead of long-polling for status updates.
For usage examples, see `test/e2e/opaque_resource.go`.
cc @kubernetes/sig-scheduling-misc
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add fabianofranz as approver for test/e2e/kubectl.go
Adding myself as approver for `kubectl` end-to-end tests.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41794, 42349, 42755, 42901, 42933)
Fixes kubectl skew test failure when using kubectl.sh
Fixes leftovers from https://github.com/kubernetes/kubernetes/pull/42737.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41794, 42349, 42755, 42901, 42933)
Fix DefaultTolerationSeconds admission plugin
DefaultTolerationSeconds is not working as expected. It is supposed to add default tolerations (for unreachable and notready conditions). but no pod was getting these toleration. And api server was throwing this error:
```
Mar 08 13:43:57 fedora25 hyperkube[32070]: E0308 13:43:57.769212 32070 admission.go:71] expected pod but got Pod
Mar 08 13:43:57 fedora25 hyperkube[32070]: E0308 13:43:57.789055 32070 admission.go:71] expected pod but got Pod
Mar 08 13:44:02 fedora25 hyperkube[32070]: E0308 13:44:02.006784 32070 admission.go:71] expected pod but got Pod
Mar 08 13:45:39 fedora25 hyperkube[32070]: E0308 13:45:39.754669 32070 admission.go:71] expected pod but got Pod
Mar 08 14:48:16 fedora25 hyperkube[32070]: E0308 14:48:16.673181 32070 admission.go:71] expected pod but got Pod
```
The reason for this error is that the input to admission plugins is internal api objects not versioned objects so expecting versioned object is incorrect. Due to this, no pod got desired tolerations and it always showed:
```
Tolerations: <none>
```
After this fix, the correct tolerations are being assigned to pods as follows:
```
Tolerations: node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
```
@davidopp @kevin-wangzefeng @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-scheduling-bugs @derekwaynecarr
Fixes https://github.com/kubernetes/kubernetes/issues/42716
Automatic merge from submit-queue (batch tested with PRs 41794, 42349, 42755, 42901, 42933)
AppArmor cluster upgrade test
Add a cluster upgrade test for AppArmor. I still need to test this (having some trouble with the cluster-upgrade tests), but wanted to start the review process.
/cc @dchen1107 @roberthbailey
Automatic merge from submit-queue (batch tested with PRs 41794, 42349, 42755, 42901, 42933)
[Federation][e2e] Add framework for upgrade test in federation
Adding framework for federation upgrade tests. please refer to #41791
cc @madhusudancs @nikhiljindal @kubernetes/sig-federation-pr-reviews
The Deployment controller was not propagating ReadyReplicas to underlying clusters causing these errors:
```
Error syncing cluster controller: Deployment.apps "federation-deployment" is invalid: status.availableReplicas: Invalid value: 5: cannot be greater than readyReplicas
```
This was caught in e2e testing and is a 1.6 regression for support that was added in #37959. Without this fix, users will be unable to scale up their deployments.
Automatic merge from submit-queue (batch tested with PRs 42608, 42444)
Return nil when deleting non-exist GCE PD
When gce cloud tries to delete a disk, if the disk could not be found
from the zones, the function should return nil error. This modified behavior is also consistent with AWS
Automatic merge from submit-queue (batch tested with PRs 36704, 42719)
Extend timeouts in taints test to account for slow Pod deletions
Fix#42685
Before merging this we need a consensus on what to do with slow Pod deletions.
Automatic merge from submit-queue
e2e test: Log container output on TestContainerOutput error
When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails from unknown reason, we should log all output of all its containers
so we can analyze what went wrong.
This would help us to see what wrong in https://github.com/kubernetes/kubernetes/issues/40811 - a container is running there for 3 minutes and dies and we want to see what it did for these 3 minutes.
```release-note
NONE
```
When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails from unknown reason, we should log all output of all its containers
so we can analyze what went wrong.
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)
Implement automated downgrade testing.
Node version cannot be higher than the master version, so we must
switch the node version first. Also, we must use the upgrade script
from the appropriate version for GCE.
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)
Create DefaultPodDeletionTimeout for e2e tests
In our e2e and e2e_node tests, we had a number of different timeouts for deletion.
Recent changes to the way deletion works (#41644, #41456) have resulted in some timeouts in e2e tests. #42661 was the most recent fix for this.
Most of these tests are not meant to test pod deletion latency, but rather just to clean up pods after a test is finished.
For this reason, we should change all these tests to use a standard, fairly high timeout for deletion.
cc @vishh @Random-Liu
Automatic merge from submit-queue
Don't wait for the final deletion of pod
The final deletion of the pod depends on kubelet and other components operating correctly. The purpose of this e2e test is verifying the clientset can handle deleteOptions correctly, so waiting for the deletionTimestamp and deletionGraceperiod get set is good enough.
In the long run, we should move this set of e2e tests to integration tests.
Fix#42724#42646
cc @marun
Node version cannot be higher than the master version, so we must
switch the node version first. Also, we must use the upgrade script
from the appropriate version for GCE.
Automatic merge from submit-queue
add debugging to the client watch test
Adds debugging information for https://github.com/kubernetes/kubernetes/issues/42724. I suspect that the watch is closing early, but I'd like proof before I consider things like retrying the list and doing another watch to observe the delete. I'm not even sure that would satisfy the test
It seems like a flaky way to build the test. Why wouldn't we delete non-gracefully?
@kubernetes/sig-api-machinery-misc @caesarxuchao
@wojtek-t saw you just hit this if you wanted to take a quick look at the debugging I added.
Automatic merge from submit-queue (batch tested with PRs 42728, 42278)
[Federation] Create integration test fixture for api
This PR factors a reusable fixture for the federation api server out of the existing integration test.
Targets #40705
cc: @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42762, 42739, 42425, 42778)
kubeadm: update docker version for CE and EE
**What this PR does / why we need it**: Update regex for docker version to also capture new CE and EE versions.
**Which issue this PR fixes**: fixes #https://github.com/kubernetes/kubeadm/issues/189
**Special notes for your reviewer**: /cc @jbeda @luxas
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42762, 42739, 42425, 42778)
[Federation][e2e] Use correct default dns name in e2e-testing
After some kubefed changes, the environment variable did not get propagated and we defaulted back to 'federation' instead of 'e2e-federation'. This fixes ongoing service test issues in e2e.
Automatic merge from submit-queue
Add default storageclass tests
**What this PR does / why we need it**:
Adds test cases for using and disabling the default storageclass.
**Release note**:
NONE
Automatic merge from submit-queue (batch tested with PRs 42211, 38691, 42737, 42757, 42754)
[Federation] Generate a random nodePort for each service object in e2e tests.
We now run e2e tests in parallel in the CI environment and nodeports are a single available range of numbers for all the service objects, so they have to be unique for each service object.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42211, 38691, 42737, 42757, 42754)
Add more e2e tests for DaemonSet templateGeneration and pod adoption
Depends on #42173
@erictune @kargakis @lukaszo @kubernetes/sig-apps-pr-reviews
After some kubefed changes, the environment variable did not get propagated
and we defaulted back to 'federation' instead of
'e2e-federation'. This fixes ongoing service test issues in e2e.
This change allows validators to pass warnings as well as errors. This
was needed because of how support for docker 1.13+ and the new EE and CE
versions is currently being handled.
We now run e2e tests in parallel in the CI environment and nodeports are
a single available range of numbers for all the service objects, so they
have to be unique for each service object.
Automatic merge from submit-queue (batch tested with PRs 42652, 42681, 42708, 42730)
e2e: fix restarting the apiserver
The string used to match the image name of the apiserver (e.g., `gcr.io/google_containers/kube-apiserver:3be...`),
but this no longer works. Change the test to locate the kube-apiserver container by name.
Automatic merge from submit-queue
Use namespace from context
Fixes#42653
Updates rbac_test.go to submit objects without namespaces set, which matches how actual objects are submitted to the API.
Automatic merge from submit-queue (batch tested with PRs 42705, 42647)
[federation][e2e] Increase timeout waiting for service shard to appear
Most of recent federation service tests are failing with timeouts. although some times they do pass, giving kind of flaky nature. So increasing the timeout waiting for service shard to appear in federated cluster from 1 min to 5 mins.
cc @madhusudancs @kubernetes/sig-federation-bugs
Automatic merge from submit-queue
[Federation] Use and return created replicaset instead of the passed object.
Passed replicaset object doesn't contain object name, but has a prefix set in `GenerateName`. However, we need to operate on the object name later to uniquely identified the created object. So we need the created object with the name set by the API server.
```release-note
NONE
```
Passed replicaset object doesn't contain object name, but has a prefix
set in `GenerateName`. However, we need to operate on the object name
later to uniquely identified the created object. So we need the created
object with the name set by the API server.
Automatic merge from submit-queue
New e2e node test suite with memcg turned on
The flag --experimental-kernal-memcg-notification was initially added to allow disabling an eviction feature which used memcg notifications to make memory evictions more reactive.
As documented in #37853, memcg notifications increased the likelihood of encountering soft lockups, especially on CVM.
This feature would valuable to turn on, at least for GCI, since soft lockup issues were less prevalent on GCI and appeared (at the time) to be unrelated to memcg notifications.
In the interest of caution, I would like to monitor serial tests on GCI with --experimental-kernal-memcg-notification=true.
cc @vishh @Random-Liu @dchen1107 @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42664, 42687)
[Fix Flaky Tests] E2e Node Flaky test suite runs serially
The [e2e Node Flaky Test Suite](https://k8s-testgrid.appspot.com/google-node#kubelet-flaky-gce-e2e&width=20) has been failing with strange errors.
This is because the tests in that suite are meant to be run serially, but are running in parallel, since that was left out of the config. This PR fixes this by changing the Flaky test suite to serial
cc @Random-Liu
Automatic merge from submit-queue
[Bug Fix] Garbage Collect Node e2e Failing
This node e2e test uses its own deletion timeout (1 minute) instead of the default (3 minutes).
#41644 likely increased time for deletion. See that PR for analysis on that.
There may be other problems with this test, but those are difficult to pick apart from hitting this low timeout.
This PR changes the Garbage Collector test to use the default timeout. This should allow us to discern if there are any actual bugs to fix.
cc @kubernetes/sig-node-bugs @calebamiles @derekwaynecarr
Automatic merge from submit-queue
Create "framework" per upgrade test
There were already a few tests just using the default framework
namespace instead of creating a new one. Also there are several
testing libraries that use the default framework's default namespace
as well. It's just easier this way.
Automatic merge from submit-queue
deflake TestPatch by waiting for cache
Fixes#39471
Rather than retry on conflicts for an unknown number of times, we have the resource version after the previous patch and we can use that to wait for a GET response that is at least as current as that resourceVersion.
@kubernetes/sig-api-machinery-pr-reviews @liggitt @wojtek-t
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)
Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available (e2e disabled).
#41870 was reverted because it introduced an e2e test flake. This is the same code with the e2e for OIR disabled again.
We can attempt to enable the e2e test cases one-by-one in follow-up PRs, but it would be preferable to get the main fix merged in time for 1.6 since OIR is broken on master (see #41861).
cc @timothysc
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)
Remove everything that is not new from batch/v2alpha1
Fixes#37166.
@lavalamp you've asked for it
@erictune this is a prereq for moving CronJobs to beta. I initially planned to put all in one PR, but after I did that I figured out it'll be easier to review separately. ptal
@kubernetes/api-approvers @kubernetes/sig-api-machinery-pr-reviews ptal
Automatic merge from submit-queue (batch tested with PRs 42506, 42585, 42596, 42584)
[Federation][e2e] Add wait for namespaces to appear in federated cluster
Some federation e2e tests are failing because the namespace (created for every test case) does not seem to appear in one (or more) of federated clusters.
This may be a timing issue, so introducing a wait until the namespace is created in federated clusters.
cc @madhusudancs, @nikhiljindal, @kubernetes/sig-federation-bugs
Automatic merge from submit-queue
cgroup names created by kubelet should be lowercased
**What this PR does / why we need it**:
This PR modifies the kubelet to create cgroupfs names that are lowercased. This better aligns us with the naming convention for cgroups v2 and other cgroup managers in ecosystem (docker, systemd, etc.)
See: https://www.kernel.org/doc/Documentation/cgroup-v2.txt
"2-6-2. Avoid Name Collisions"
**Special notes for your reviewer**:
none
**Release note**:
```release-note
kubelet created cgroups follow lowercase naming conventions
```
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)
Fix resource cleanup in ingress_utils.go within e2e/framework
**What this PR does / why we need it**:
The GLBC is failing to delete resources during the etcd rollback tests and the e2e cleanup is leaking them. After a short while, tests are failing to create new resources.
This PR addresses the e2e/framework's ability to delete GLBC-created resources and adds more logging.
**Which issue this PR fixes**:
Helps #38569 but does not completely close this flake
**Special notes for your reviewer**:
Resources were not being deleted because resource names were being truncated and then their ability to be deleted was determined by the entire cluster id existing in the name. Truncated names also have an extra '0' append to the end of their name (unknown origin). This PR tries to match on a common prefix.
Minor changes were made to improve log readability.
**Testing this PR**:
This was tested by running a master upgrade test and by adding a second forwarding-rule mid-run. This forwarding rule referenced the same url-map used by the first forwarding-rule created by the GLBC. Therefore, the GLBC will be able to delete the forwarding-rule but not anymore L7 resources. This second forwarding rule's name was nearly identical to the first forwarding rule so that the cleanup code will find it.
As you can see from the test run below, the cleanup code deleted all the resources that the GLBC could not.
```log
...
Mar 5 18:35:53.112: INFO: Monitoring glbc's cleanup of gce resources:
k8s-fws-e2e-tests-ingress-upgrsde-0px85-static-ip--5f38ac0e2420 (forwarding rule)
k8s-tps-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420 (target-https-proxy)
k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260 (url-map)
k8s-be-32331--5f38ac0e2426f796 (backend-service)
k8s-be-32613--5f38ac0e2426f796 (backend-service)
k8s-be-32331--5f38ac0e2426f796 (http-health-check)
k8s-be-32613--5f38ac0e2426f796 (http-health-check)
k8s-ig--5f38ac0e2426f796 (instance-group)
k8s-ssl-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420 (ssl-certificate)
STEP: Performing final delete of any remaining resources
Mar 5 18:35:54.055: INFO: Deleting forwarding-rules: k8s-fws-e2e-tests-ingress-upgrsde-0px85-static-ip--5f38ac0e2420
Mar 5 18:36:06.945: INFO: Deleting target-https-proxies: k8s-tps-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420
Mar 5 18:36:14.301: INFO: Deleting url-map: k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260
Mar 5 18:36:18.309: INFO: Deleting backed-service: k8s-be-32331--5f38ac0e2426f796
Mar 5 18:36:22.112: INFO: Deleting backed-service: k8s-be-32613--5f38ac0e2426f796
Mar 5 18:36:26.192: INFO: Deleting http-health-check: k8s-be-32331--5f38ac0e2426f796
Mar 5 18:36:29.846: INFO: Deleting http-health-check: k8s-be-32613--5f38ac0e2426f796
Mar 5 18:36:33.722: INFO: Deleting instance-group: k8s-ig--5f38ac0e2426f796
Mar 5 18:36:37.762: INFO: Deleting ssl-certificate: k8s-ssl-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420
STEP: No resources leaked.
Mar 5 18:36:46.441: INFO: Deleting addresses: e2e-tests-ingress-upgrade-0px85-static-ip
Mar 5 18:36:53.902: INFO: L7 controller failed to delete all cloud resources on time. timed out waiting for the condition
...
```
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)
Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available."
Reverts kubernetes/kubernetes#41870 for stopping bleeding edge: #42597
cc/ @ConnorDoyle @kubernetes/release-team
Connor if there is a pending pr to fix the issue, please point it out to me. We can close this one, otherwise, I would like to revert the pr first. You can resubmit the fix. Thanks!
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)
StatefulSet: Respect ControllerRef
**What this PR does / why we need it**:
This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings StatefulSet into full compliance with ControllerRef. See the individual commit messages for details.
**Which issue this PR fixes**:
Fixes#36859
**Special notes for your reviewer**:
**Release note**:
```release-note
StatefulSet now respects ControllerRef to avoid fighting over Pods. At the time of upgrade, **you must not have StatefulSets with selectors that overlap** with any other controllers (such as ReplicaSets), or else [ownership of Pods may change](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md#upgrading).
```
cc @erictune @kubernetes/sig-apps-pr-reviews
The list functions in deployment/util are used outside the Deployment
controller itself. Therefore, they don't do actual adoption/orphaning.
However, they still need to avoid listing things that don't belong.
Automatic merge from submit-queue (batch tested with PRs 41826, 42405)
Fixed too long name in HPA e2e upgrade test.
Fixed too long name in HPA e2e upgrade test.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41826, 42405)
Add stubDomains and upstreamNameservers configuration to kube-dns
```release-note
Updates the dnsmasq cache/mux layer to be managed by dnsmasq-nanny.
dnsmasq-nanny manages dnsmasq based on values from the
kube-system:kube-dns configmap:
"stubDomains": {
"acme.local": ["1.2.3.4"]
},
is a map of domain to list of nameservers for the domain. This is used
to inject private DNS domains into the kube-dns namespace. In the above
example, any DNS requests for *.acme.local will be served by the
nameserver 1.2.3.4.
"upstreamNameservers": ["8.8.8.8", "8.8.4.4"]
is a list of upstreamNameservers to use, overriding the configuration
specified in /etc/resolv.conf.
```
There were already a few tests just using the default framework
namespace instead of creating a new one. Also there are several
testing libraries that use the default framework's default namespace
as well. It's just easier this way.
Automatic merge from submit-queue (batch tested with PRs 31783, 41988, 42535, 42572, 41870)
Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available.
This appears to be a regression since v1.5.0 in scheduler behavior for opaque integer resources, reported in https://github.com/kubernetes/kubernetes/issues/41861.
- [X] Add failing e2e test to trigger the regression
- [x] Restore previous behavior (pods pending due to insufficient OIR get scheduled once sufficient OIR becomes available.)
Automatic merge from submit-queue (batch tested with PRs 31783, 41988, 42535, 42572, 41870)
update names for kube plugin initializer to avoid conflicts
Fixes#42581
Other API servers are likely to create admission plugin initializers and so the names we choose for our interfaces matter (they may want to run multiple initializers in the chain). This updates the names for the plugin initializers to be more specific. No other changes.
@ncdc
Automatic merge from submit-queue
Remove the kube-discovery binary from the tree
**What this PR does / why we need it**:
kube-discovery was a temporary solution to implementing proposal: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/bootstrap-discovery.md
However, this functionality is now gonna be implemented in the core for v1.6 and will fully replace kube-discovery:
- https://github.com/kubernetes/kubernetes/pull/36101
- https://github.com/kubernetes/kubernetes/pull/41281
- https://github.com/kubernetes/kubernetes/pull/41417
So due to that `kube-discovery` isn't used in any v1.6 code, it should be removed.
The image `gcr.io/google_containers/kube-discovery-${ARCH}:1.0` should and will continue to exist so kubeadm <= v1.5 continues to work.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Remove cmd/kube-discovery from the tree since it's not necessary anymore
```
@jbeda @dgoodwin @mikedanese @dmmcquay @lukemarsden @errordeveloper @pires
Automatic merge from submit-queue
Add ProviderUid support to Federated Ingress
This PR (along with GLBC support [here](https://github.com/kubernetes/ingress/pull/278)) is a proposed fix for #39989. The Ingress controller uses a configMap reconciliation process to ensure that all underlying ingresses agree on a unique UID. This works for all of GLBC's resources except firewalls which need their own cluster-unique UID. This PR introduces a ProviderUid which is maintained and synchronized cross-cluster much like the UID. We chose to derive the ProviderUid from the cluster name (via md5 hash).
Testing here is augmented to guarantee that configMaps are adequately propagated prior to Ingress creation.
```release-note
Federated Ingress over GCE no longer requires separate firewall rules to be created for each cluster to circumvent flapping firewall health checks.
```
cc @madhusudancs @quinton-hoole
Automatic merge from submit-queue (batch tested with PRs 42456, 42457, 42414, 42480, 42370)
In DaemonSet e2e test, don't check nodes with NoSchedule taints
Fixes#42345
For example, master node has a ismaster:NoSchedule taint. We don't expect pods to be created there without toleration.
cc @marun @lukaszo @kargakis @yujuhong @Random-Liu @davidopp @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42456, 42457, 42414, 42480, 42370)
node e2e: apparmor test should fail instead of panicking
This doesn't fix#42420, but at least stop the test from panicking.
Automatic merge from submit-queue (batch tested with PRs 42456, 42457, 42414, 42480, 42370)
Update npd in kubemark since #42201 is merged.
Revert https://github.com/kubernetes/kubernetes/pull/41716.
#42201 has been merged, and #41713 is fixed. Now we could retry update npd in kubemark.
/cc @shyamjvs @wojtek-t @dchen1107
Automatic merge from submit-queue (batch tested with PRs 42369, 42375, 42397, 42435, 42455)
Add alsologtostderr flag to hollow node
@yujuhong @wojtek-t that should solve the kubemark log issue.
Automatic merge from submit-queue (batch tested with PRs 42369, 42375, 42397, 42435, 42455)
[Bug Fix]: Avoid evicting more pods than necessary by adding Timestamps for fsstats and ignoring stale stats
Continuation of #33121. Credit for most of this goes to @sjenning. I added volume fs timestamps.
**why is this a bug**
This PR attempts to fix part of https://github.com/kubernetes/kubernetes/issues/31362 which results in multiple pods getting evicted unnecessarily whenever the node runs into resource pressure. This PR reduces the chances of such disruptions by avoiding reacting to old/stale metrics.
Without this PR, kubernetes nodes under resource pressure will cause unnecessary disruptions to user workloads.
This PR will also help deflake a node e2e test suite.
The eviction manager currently avoids evicting pods if metrics are old. However, timestamp data is not available for filesystem data, and this causes lots of extra evictions.
See the [inode eviction test flakes](https://k8s-testgrid.appspot.com/google-node#kubelet-flaky-gce-e2e) for examples.
This should probably be treated as a bugfix, as it should help mitigate extra evictions.
cc: @kubernetes/sig-storage-pr-reviews @kubernetes/sig-node-pr-reviews @vishh @derekwaynecarr @sjenning
Automatic merge from submit-queue
Eviction Manager Enforces Allocatable Thresholds
This PR modifies the eviction manager to enforce node allocatable thresholds for memory as described in kubernetes/community#348.
This PR should be merged after #41234.
cc @kubernetes/sig-node-pr-reviews @kubernetes/sig-node-feature-requests @vishh
** Why is this a bug/regression**
Kubelet uses `oom_score_adj` to enforce QoS policies. But the `oom_score_adj` is based on overall memory requested, which means that a Burstable pod that requested a lot of memory can lead to OOM kills for Guaranteed pods, which violates QoS. Even worse, we have observed system daemons like kubelet or kube-proxy being killed by the OOM killer.
Without this PR, v1.6 will have node stability issues and regressions in an existing GA feature `out of Resource` handling.
Automatic merge from submit-queue (batch tested with PRs 42443, 38924, 42367, 42391, 42310)
Fix StatefulSet e2e flake
**What this PR does / why we need it**:
Fixes StatefulSet e2e flake by ensuring that the StatefulSet controller has observed the unreadiness of Pods prior to attempting to exercise scale functionality.
**Which issue this PR fixes**
fixes#41889
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42443, 38924, 42367, 42391, 42310)
Cast system uptime to time.Duration to fix cross build.
Fixes https://github.com/kubernetes/kubernetes/issues/42441.
Cast system uptime to `time.Duration` to avoid different behavior on different architectures.
@sjenning @ixdy @ncdc
When gce cloud tries to delete a disk, if the disk could not be found
from the zones, the function should return nil error. This modified behavior is also consistent with AWS
Automatic merge from submit-queue
Critial pod test uses allocatable instead of capacity
This solves #42239.
When this test was first introduced, pods could request up to the capacity of the node.
With the addition of allocatable introduced in #41234, this is no longer the case, and pods can only use up to allocatable.
This should be included in 1.6, as it is a bug related to a 1.6 feature.
cc @vish @yujuhong
Automatic merge from submit-queue (batch tested with PRs 41306, 42187, 41666, 42275, 42266)
Bump test timeouts to make secret tests work in large clusters
The previous Get/Update pattern with no retry on resource version mismatch
would flake with the following error:
"the object has been modified; please apply your changes to the latest
version and try again"
gives each ingress object a cluster-unique Uid that can be
leveraged by ingress providers.
In the process, supplement the testing of configMap updates to
ensure that the updates are propagated prior to any ingress
object being created. Configmap key/vals for Uid and ProviderUid
must exist at time of Ingress creation.
Automatic merge from submit-queue (batch tested with PRs 41984, 41682, 41924, 41928)
Move node problem detector test into node e2e.
Move current NPD e2e test into node e2e.
In fact, current NPD e2e test is only a functionality test for NPD. It creates test NPD pod, sets test configuration, generates test logs and verifies test result.
It doesn't actually test the NPD really deployed in the cluster.
So it doesn't actually need to run in cluster e2e. Running it in node e2e will:
1) Make it easier to run the test.
2) Make it more light weight to introduce this as a pre/post submit test in NPD repo in the future.
Except this, I'm working on a cluster e2e to run some basic functionality test and benchmark test against the real NPD deployed in the cluster. Will send the PR later.
/cc @dchen1107 @kubernetes/node-problem-detector-reviewers
Automatic merge from submit-queue (batch tested with PRs 41984, 41682, 41924, 41928)
Add options to kubefed telling it to generate HTTP Basic and/or token credentials for the Federated API server
fixes#41265.
**Release notes**:
```release-note
Adds two options to kubefed, `-apiserver-enable-basic-auth` and `-apiserver-enable-token-auth`, which generate an HTTP Basic username/password and a token respectively for the Federated API server.
```
Automatic merge from submit-queue (batch tested with PRs 41984, 41682, 41924, 41928)
RC/RS: Fully Respect ControllerRef
**What this PR does / why we need it**:
This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings ReplicaSet and ReplicationController into full compliance with ControllerRef. See the individual commit messages for details.
**Which issue this PR fixes**:
Although RC/RS had partially implemented ControllerRef, they didn't use it to determine which controller to sync, or to update expectations. This could lead to instability or controllers getting stuck.
Ref: https://github.com/kubernetes/kubernetes/issues/24433
**Special notes for your reviewer**:
**Release note**:
```release-note
```
cc @erictune @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42128, 42064, 42253, 42309, 42322)
Add storage.k8s.io/v1 API
This is combined version of reverted #40088 (first 4 commits) and #41646. The difference is that all controllers and tests use old `storage.k8s.io/v1beta1` API so in theory all tests can pass on GKE.
Release note:
```release-note
StorageClassName attribute has been added to PersistentVolume and PersistentVolumeClaim objects and should be used instead of annotation `volume.beta.kubernetes.io/storage-class`. The beta annotation is still working in this release, however it will be removed in a future release.
```
Automatic merge from submit-queue (batch tested with PRs 41980, 42192, 42223, 41822, 42048)
Adjust parameters of GCL cluster logging load tests
This PR increases the amount of logs produced in load tests to match the number of nodes and provide the predictable load of 100 KB/sec on each node.
Also this PR reduces in half amount of time, given for ingesting logs.
Automatic merge from submit-queue (batch tested with PRs 41980, 42192, 42223, 41822, 42048)
Take into account number of restarts in cluster logging tests
Before, in cluster logging tests, we only measured e2e number of lines delivered to the backend.
Also, befure https://github.com/kubernetes/kubernetes/pull/41795 was merged, from the k8s perspective, fluentd was always working properly, even if it's crashlooping inside.
Now we can detect whether fluentd is truly working properly, experiencing no, or almost no OOMs duing its operation.
Automatic merge from submit-queue (batch tested with PRs 41980, 42192, 42223, 41822, 42048)
Modified kubemark startup scripts to restore master on reboot
Fixes#41735
As discussed in the issue, modified the scripts to satisfy the conditions of restoring master env, running non-idempotent operations only for the first time and persist important data like pki/auth files on a PD.
Also attached `start-kubemark-master.sh` as startup-script metadata to master instance (on GCE) so that it is called automatically on each boot.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 41931, 39821, 41841, 42197, 42195)
Admission Controller: Add Pod Preset
Based off the proposal in https://github.com/kubernetes/community/pull/254
cc @pmorie @pwittrock
TODO:
- [ ] tests
**What this PR does / why we need it**: Implements the Pod Injection Policy admission controller
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Added new Api `PodPreset` to enable defining cross-cutting injection of Volumes and Environment into Pods.
```
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)
Ingress-glbc upgrade tests
Basically #41676 but with some fixes and added comments. @bprashanth has been away this week and it's desirable to have this in before code freeze.
export functions from pkg/api/validation
add settings API
add settings to pkg/registry
add settings api to pkg/master/master.go
add admission control plugin for pod preset
add new admission control plugin to kube-apiserver
add settings to import_known_versions.go
add settings to codegen
add validation tests
add settings to client generation
add protobufs generation for settings api
update linted packages
add settings to testapi
add settings install to clientset
add start of e2e
add pod preset plugin to config-test.sh
Signed-off-by: Jess Frazelle <acidburn@google.com>
Automatic merge from submit-queue
Extensible Userspace Proxy
This PR refactors the userspace proxy to allow for custom proxy socket implementations.
It changes the the ProxySocket interface to ensure that other packages can properly implement it (making sure all arguments are publicly exposed types, etc), and adds in a mechanism for an implementation to create an instance of the userspace proxy with a non-standard ProxySocket.
Custom ProxySockets are useful to inject additional logic into the actual proxying. For example, our idling proxier uses a custom proxy socket to hold connections and notify the cluster that idled scalable resources need to be woken up.
Also-Authored-By: Ben Bennett bbennett@redhat.com
Automatic merge from submit-queue
Add apps/v1beta1 deployments with new defaults
This pull introduces deployments under `apps/v1beta1` and fixes#23597 and #23304.
TODO:
* [x] - create new type `apps/v1beta1.Deployment`
* [x] - update kubectl (stop, scale)
* [ ] - ~~new `kubectl run` generator~~ - this will only duplicate half of generator code, I suggest replacing current to use new endpoint
* [ ] - ~~create extended tests~~ - I've added integration and cmd tests verifying new endpoints
* [ ] - ~~create `hack/test-update-storage-objects.sh`~~ - see above
This is currently blocked by https://github.com/kubernetes/kubernetes/pull/38071, due to conflicting name `v1beta1.Deployment`.
```release-note
Introduce apps/v1beta1.Deployments resource with modified defaults compared to extensions/v1beta1.Deployments.
```
@kargakis @mfojtik @kubernetes/sig-apps-misc
Automatic merge from submit-queue (batch tested with PRs 42316, 41618, 42201, 42113, 42191)
[Kubemark] Add init container in hollow node for setting inotify limit of node to 200
Fixes#41713
Along with adding the init container, I also changed the manifest to a yaml as otherwise the entire init container annotation would have to be in a single line (with escaped characters), as json doesn't allow multi-line strings.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @Random-Liu
Automatic merge from submit-queue
Extend experimental support to multiple Nvidia GPUs
Extended from #28216
```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with support for multiple Nvidia GPUs.
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```
1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```
TODO:
- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package
Automatic merge from submit-queue (batch tested with PRs 41597, 42185, 42075, 42178, 41705)
auto discovery CA for extension API servers
This is what the smaller pulls were leading to. Only the last commit is unique and I expect I'll still tweak some pod definitions, but this is where I was going.
@sttts @liggitt
Automatic merge from submit-queue (batch tested with PRs 42216, 42136, 42183, 42149, 36828)
[Federation] Remove federat{ed,ion} prefixes from e2e file names since they are all now scoped under the e2e_federation package.
This is purely cosmetic.
```release-note
NONE
```
cc @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42200, 39535, 41708, 41487, 41335)
Add support for statefulset spreading to the scheduler
**What this PR does / why we need it**:
The scheduler SelectorSpread priority funtion didn't have the code to spread pods of StatefulSets. This PR adds StatefulSets to the list of controllers that SelectorSpread supports.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#41513
**Special notes for your reviewer**:
**Release note**:
```release-note
Add the support to the scheduler for spreading pods of StatefulSets.
```
Automatic merge from submit-queue (batch tested with PRs 35094, 42095, 42059, 42143, 41944)
Use chroot for containerized mounts
This PR is to modify the containerized mounter script to use chroot
instead of rkt fly. This will avoid the problem of possible large number
of mounts caused by rkt containers if they are not cleaned up.
Automatic merge from submit-queue (batch tested with PRs 35094, 42095, 42059, 42143, 41944)
add aggregation integration test
Wires up an integration test which runs a full kube-apiserver, the wardle server, and the kube-aggregator and creates the APIservice object for the wardle server. Without services and DNS the aggregator doesn't proxy, but it does ensure we don't have an obvious panic or bring up failure.
@sttts @ncdc
Automatic merge from submit-queue (batch tested with PRs 40746, 41699, 42108, 42174, 42093)
Avoid fake node names in user info
Node usernames should follow the format `system:node:<node-name>`,
but if we don't know the node name, it's worse to put a fake one in.
In the future, we plan to have a dedicated node authorizer, which would
start rejecting requests from a user with a bogus node name like this.
The right approach is to either mint correct credentials per node, or use node bootstrapping so it requests a correct client certificate itself.
Automatic merge from submit-queue
clean up generic apiserver options
Clean up generic apiserver options before we tag any levels. This makes them more in-line with "normal" api servers running on the platform.
Also remove dead example code.
@sttts
Automatic merge from submit-queue (batch tested with PRs 41937, 41151, 42092, 40269, 42135)
[Federation] ReplicaSet e2es should let the API server to generate the names to avoid collision while running tests in parallel.
cc @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 41937, 41151, 42092, 40269, 42135)
Add a unit test for idempotent applys to the TPR entries.
The test in apply_test follows the general pattern of other tests.
We load from a file in test/fixtures and mock the API server in the
function closure in the HttpClient call.
The apply operation expects a last-modified-configuration annotation.
That is written verbatim in the test/fixture file.
References #40841
**What this PR does / why we need it**:
Adds one unit test for TPR's using applies.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
References:
https://github.com/kubernetes/features/issues/95https://github.com/kubernetes/kubernetes/issues/40841#issue-204769102
**Special notes for your reviewer**:
I am not super proud of the tpr-entry name.
But I feel like we need to call the two objects differently.
The one which has Kind:ThirdPartyResource
and the one has Kind:Foo.
Is the name "ThirdPartyResource" used interchangeably for both ? I used tpr-entry for the Kind:Foo object.
Also I !assume! this is testing an idempotent apply because the last-applied-configuration annotation is the same as the object itself.
This is the state I see in the logs of kubectl if I do a proper idempotent apply of a third party resource entry.
I guess I will know more once I start playing around with apply command that change TPR objects.
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 41234, 42186, 41615, 42028, 41788)
Additional upgrade e2e tests
**What this PR does / why we need it**: Add basic upgrade tests for DaemonSet and Job, and add "during upgrade" testing to ConfigMap test. Add a simple harness for testing upgrade tests.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**: continuation of #41296 @krousey please review, thanks
**Release note**: `NONE`
Automatic merge from submit-queue
Enforce Node Allocatable via cgroups
This PR enforces node allocatable across all pods using a top level cgroup as described in https://github.com/kubernetes/community/pull/348
This PR also provides an option to enforce `kubeReserved` and `systemReserved` on user specified cgroups.
This PR will by default make kubelet create top level cgroups even if `kubeReserved` and `systemReserved` is not specified and hence `Allocatable = Capacity`.
```release-note
New Kubelet flag `--enforce-node-allocatable` with a default value of `pods` is added which will make kubelet create a top level cgroup for all pods to enforce Node Allocatable. Optionally, `system-reserved` & `kube-reserved` values can also be specified separated by comma to enforce node allocatable on cgroups specified via `--system-reserved-cgroup` & `--kube-reserved-cgroup` respectively. Note the default value of the latter flags are "".
This feature requires a **Node Drain** prior to upgrade failing which pods will be restarted if possible or terminated if they have a `RestartNever` policy.
```
cc @kubernetes/sig-node-pr-reviews @kubernetes/sig-node-feature-requests
TODO:
- [x] Adjust effective Node Allocatable to subtract hard eviction thresholds
- [x] Add unit tests
- [x] Complete pending e2e tests
- [x] Manual testing
- [x] Get the proposal merged
@dashpole is working on adding support for evictions for enforcing Node allocatable more gracefully. That work will show up in a subsequent PR for v1.6
Automatic merge from submit-queue (batch tested with PRs 41205, 42196, 42068, 41588, 41271)
Implements an upgrade test for Job
**What this PR does / why we need it**:
This PR implements a cluster upgrade test for Job. Some functionality for Job testing has been moved from the e2e package to the framework package to facilitate code reuse between the e2e package and the upgrade package without introducing cyclic dependencies.
We need this PR to help automate the testing of cluster upgrades between versions.
**Release note**
```release-note
NONE
```
This changes the userspace proxy so that it cleans up its conntrack
settings when a service is removed (as the iptables proxy already
does). This could theoretically cause problems when a UDP service
as deleted and recreated quickly (with the same IP address). As
long as packets from the same UDP source IP and port were going to
the same destination IP and port, the the conntrack would apply and
the packets would be sent to the old destination.
This is astronomically unlikely if you did not specify the IP address
to use in the service, and even then, only happens with an "established"
UDP connection. However, in cases where a service could be "switched"
between using the iptables proxy and the userspace proxy, this case
becomes much more frequent.
Updates the dnsmasq cache/mux layer to be managed by dnsmasq-nanny.
dnsmasq-nanny manages dnsmasq based on values from the
kube-system:kube-dns configmap:
"stubDomains": {
"acme.local": ["1.2.3.4"]
},
is a map of domain to list of nameservers for the domain. This is used
to inject private DNS domains into the kube-dns namespace. In the above
example, any DNS requests for *.acme.local will be served by the
nameserver 1.2.3.4.
"upstreamNameservers": ["8.8.8.8", "8.8.4.4"]
is a list of upstreamNameservers to use, overriding the configuration
specified in /etc/resolv.conf.
The tests in apply_test follows the general pattern of other tests.
We load from a file in test/fixtures and mock the API server in the
function closure in the HttpClient call.
In PATCH request rount-tripper we check that the kubectl apply
implementation worked as expected.
References #40841
Automatic merge from submit-queue (batch tested with PRs 41116, 41804, 42104, 42111, 42120)
make kubectl taint command respect effect NoExecute
**What this PR does / why we need it**:
Part of feature forgiveness implementation, make kubectl taint command respect effect NoExecute.
**Which issue this PR fixes**:
Related Issue: #1574
Related PR: #39469
**Special notes for your reviewer**:
**Release note**:
```release-note
make kubectl taint command respect effect NoExecute
```
Automatic merge from submit-queue (batch tested with PRs 41962, 42055, 42062, 42019, 42054)
Update flag to --check-version-skew instead of --check_version_skew
https://github.com/kubernetes/test-infra/issues/2012
Also add a `--` to send the flags to kubetest without triggering a warning.
Automatic merge from submit-queue (batch tested with PRs 41962, 42055, 42062, 42019, 42054)
PV upgrade test
**What this PR does / why we need it**:
This PR adds a PV upgrade test to the new upgrade test framework. Before, this test had to be done manually. Currently the upgrade test framework only works on the GCE environment, so I plan to add support for other providers later. In order to write the test, I had to modify and refactor some volume test util libraries. I reran the impacted tests to make sure they still passed.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
It's probably easier to review the two commits separately. I split it up into the refactor changes, and the upgrade test changes.
**Release note**:
NONE
cc @saad-ali @krousey
Automatic merge from submit-queue (batch tested with PRs 42044, 41694, 41927, 42050, 41987)
Use existing nginx image in the node
**What this PR does / why we need it**:
Fixes a test flake: `[k8s.io] Garbage collector should orphan pods created by rc if delete options say so`
**Which issue this PR fixes**
fixes#35771
**Special notes for your reviewer**:
**Release note**:
```release-note NONE
```
Automatic merge from submit-queue (batch tested with PRs 42044, 41694, 41927, 42050, 41987)
Add apply set-last-applied subcommand
implement part of https://github.com/kubernetes/community/pull/287, will rebase after https://github.com/kubernetes/kubernetes/pull/41699 got merged, EDIT: since bug output format has been confirmed, will update the behavior of output format soon
cc @kubernetes/sig-cli-pr-reviews @AdoHe @pwittrock
```release-note
Support kubectl apply set-last-applied command to update the applied-applied-configuration annotation
```
Automatic merge from submit-queue (batch tested with PRs 35408, 41915, 41992, 41964, 41925)
e2e/upgrade: add sysctls
Add sysctl upgrade tests.
How can these be effectively tested?
Automatic merge from submit-queue (batch tested with PRs 41954, 40528, 41875, 41165, 41877)
Updating apiserver to return 202 when resource is being deleted asynchronously via cascading deletion
As per https://github.com/kubernetes/kubernetes/issues/33196#issuecomment-278440622.
cc @kubernetes/sig-api-machinery-pr-reviews @smarterclayton @caesarxuchao @bgrant0607 @kubernetes/api-reviewers
```release-note
Updating apiserver to return http status code 202 for a delete request when the resource is not immediately deleted because of user requesting cascading deletion using DeleteOptions.OrphanDependents=false.
```
Automatic merge from submit-queue (batch tested with PRs 41701, 41818, 41897, 41119, 41562)
Optimization of on-wire information sent to scheduler extender interfaces that are stateful
The changes are to address the issue described in https://github.com/kubernetes/kubernetes/issues/40991
```release-note
Added support to minimize sending verbose node information to scheduler extender by sending only node names and expecting extenders to cache the rest of the node information
```
cc @ravigadde
Automatic merge from submit-queue (batch tested with PRs 41994, 41969, 41997, 40952, 40576)
Guaranteed admission for Critical Pods
This is the first step in implementing node-level preemption for critical pods.
It defines the AdmissionFailureHandler interface, which allows callers, like the kubelet, to define how failed predicates are handled, and take steps to correct failures if necessary.
In the kubelet's implementation, it triggers preemption if the pod being admitted is critical, and if the only failed predicates are InsufficientResourceErrors, then it prempts (not yet implemented) other other pods to allow admission of the critical pod.
cc: @vishh
Automatic merge from submit-queue (batch tested with PRs 41994, 41969, 41997, 40952, 40576)
Inode Eviction test Tests that Innocent pods are not evicted throughout test.
The current version of the inode eviction test only makes sure that the two abusive pods are evicted before the innocent pod. This change adds a check during the post-eviction monitoring period to ensure that the innocent pod is not evicted, even after the abusive pods.
This will likely result in a higher failure rate for this test, but since it is in the flaky test suite, this is not an issue.
This change should help give us a better perspective on eviction issues.
cc @Random-Liu @vishh @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 40932, 41896, 41815, 41309, 41628)
Modify CronJob API to add job history limits, cleanup jobs in controller
**What this PR does / why we need it**:
As discussed in #34710: this adds two limits to `CronJobSpec`, to limit the number of finished jobs created by a CronJob to keep.
**Which issue this PR fixes**: fixes#34710
**Special notes for your reviewer**:
cc @soltysh, please have a look and let me know what you think -- I'll then add end to end testing and update the doc in a separate commit. What is the timeline to get this into 1.6?
The plan:
- [x] API changes
- [x] Changing versioned APIs
- [x] `types.go`
- [x] `defaults.go` (nothing to do)
- [x] `conversion.go` (nothing to do?)
- [x] `conversion_test.go` (nothing to do?)
- [x] Changing the internal structure
- [x] `types.go`
- [x] `validation.go`
- [x] `validation_test.go`
- [x] Edit version conversions
- [x] Edit (nothing to do?)
- [x] Run `hack/update-codegen.sh`
- [x] Generate protobuf objects
- [x] Run `hack/update-generated-protobuf.sh`
- [x] Generate json (un)marshaling code
- [x] Run `hack/update-codecgen.sh`
- [x] Update fuzzer
- [x] Actual logic
- [x] Unit tests
- [x] End to end tests
- [x] Documentation changes and API specs update in separate commit
**Release note**:
```release-note
Add configurable limits to CronJob resource to specify how many successful and failed jobs are preserved.
```
Automatic merge from submit-queue (batch tested with PRs 42106, 42094, 42069, 42098, 41852)
Pod deletion observation is flaking, increase timeout and debug more
We can afford to wait longer than 30 seconds, and we should be printing
more error and output information about the cause of the failure.
Fixes / triages #41902
Automatic merge from submit-queue (batch tested with PRs 42106, 42094, 42069, 42098, 41852)
Add ncdc to test/OWNERS
**What this PR does / why we need it**: add me to test/OWNERS
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-testing-pr-reviews @smarterclayton @bgrant0607
Automatic merge from submit-queue (batch tested with PRs 41854, 41801, 40088, 41590, 41911)
Add storage.k8s.io/v1 API
v1 API is direct copy of v1beta1 API. This v1 API gets installed and exposed in this PR, I tested that kubectl can create both v1beta1 and v1 StorageClass.
~~Rest of Kubernetes (controllers, examples,. tests, ...) still use v1beta1 API, I will update it when this PR gets merged as these changes would get lost among generated code.~~ Most parts use v1 API now, it would not compile / run tests without it.
**Release note**:
```
Kubernetes API storage.k8s.io for storage objects is now fully supported and is available as storage.k8s.io/v1. Beta version of the API storage.k8s.io/v1beta1 is still available in this release, however it will be removed in a future Kubernetes release.
Together with the API endpoint, StorageClass annotation "storageclass.beta.kubernetes.io/is-default-class" is deprecated and "storageclass.kubernetes.io/is-default-class" should be used instead to mark a default storage class. The beta annotation is still working in this release, however it won't be supported in the next one.
```
@kubernetes/sig-storage-misc
Automatic merge from submit-queue
Supports 'ensure exist' class addon in Addon-manager
Fixes#39561, fixes#37047 and fixes#36411. Depends on #40057.
This PR splits cluster addons into two categories:
- Reconcile: Addons that need to be reconciled (`kube-dns` for instance).
- EnsureExists: Addons that need to be exist but changeable (`default-storage-class`).
The behavior for the 'EnsureExists' class addon would be:
- Create it if not exist.
- Users could do any modification they want, addon-manager will not reconcile it.
- If it is deleted, addon-manager will recreate it with the given template.
- It will not be updated/clobbered during upgrade.
As Brian pointed out in [#37048/comment](https://github.com/kubernetes/kubernetes/issues/37048#issuecomment-272510835), this may not be the best solution for addon-manager. Though #39561 needs to be fixed in 1.6 and we might not have enough bandwidth to do a big surgery.
@mikedanese @thockin
cc @kubernetes/sig-cluster-lifecycle-misc
---
Tasks for this PR:
- [x] Supports 'ensure exist' class addon and switch to use new labels in addon-manager.
- [x] Updates READMEs regarding the new behavior of addon-manager.
- [x] Updated `test/e2e/addon_update.go` to match the new behavior.
- [x] Go through all current addons and apply the new labels on them regarding what they need.
- [x] Bump addon-manager and update its template files.
Automatic merge from submit-queue (batch tested with PRs 41714, 41510, 42052, 41918, 31515)
Switch scheduler to use generated listers/informers
Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).
I think this can wait until master is open for 1.7 pulls, given that we're close to the 1.6 freeze.
After this and #41482 go in, the only code left that references legacylisters will be federation, and 1 bit in a stateful set unit test (which I'll clean up in a follow-up).
@resouer I imagine this will conflict with your equivalence class work, so one of us will be doing some rebasing 😄
cc @wojtek-t @gmarek @timothysc @jayunit100 @smarterclayton @deads2k @liggitt @sttts @derekwaynecarr @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-scalability-pr-reviews
This PR is to modify the containerized mounter script to use chroot
instead of rkt fly. This will avoid the problem of possible large number
of mounts caused by rkt containers if they are not cleaned up.
Fix bash command's execution in MakePod().
Add isPriviledged as a parameter to MakePod().
Move PD utils to pv_util.go
Ran all the tests in pd.go, persistent_volumes.go,
persistent_volumes-disruptive.go.
These changes are needed for the PV upgrade test I am working on.
Automatic merge from submit-queue (batch tested with PRs 41667, 41820, 40910, 41645, 41361)
Switch admission to use shared informers
Originally part of #40097
cc @smarterclayton @derekwaynecarr @deads2k @liggitt @sttts @gmarek @wojtek-t @timothysc @lavalamp @kubernetes/sig-scalability-pr-reviews @kubernetes/sig-api-machinery-pr-reviews
node cache don't need to get all the information about every
candidate node. For huge clusters, sending full node information
of all nodes in the cluster on the wire every time a pod is scheduled
is expensive. If the scheduler is capable of caching node information
along with its capabilities, sending node name alone is sufficient.
These changes provide that optimization in a backward compatible way
- removed the inadvertent signature change of Prioritize() function
- added defensive checks as suggested
- added annotation specific test case
- updated the comments in the scheduler types
- got rid of apiVersion thats unused
- using *v1.NodeList as suggested
- backing out pod annotation update related changes made in the
1st commit
- Adjusted the comments in types.go and v1/types.go as suggested
in the code review
Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).
Automatic merge from submit-queue (batch tested with PRs 41812, 41665, 40007, 41281, 41771)
Bump golang versions to 1.7.5
**What this PR does / why we need it**: While #41636 might not make it in until 1.7, this would bump current golang versions from 1.7.4 to 1.7.5 to integrate the fixes from that patch version. This would include, among other things, a fix to ensure cross-built binaries for darwin don't have certificate validation errors (golang/go#18688)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: none
**Special notes for your reviewer**:
**Release note**:
```release-note
Upgrade golang versions to 1.7.5
```
Automatic merge from submit-queue (batch tested with PRs 41146, 41486, 41482, 41538, 41784)
Switch statefulset controller to shared informers
Originally part of #40097
I *think* the controller currently makes a deep copy of a StatefulSet before it mutates it, but I'm not 100% sure. For those who are most familiar with this code, could you please confirm?
@beeps @smarterclayton @ingvagabund @sttts @liggitt @deads2k @kubernetes/sig-apps-pr-reviews @kubernetes/sig-scalability-pr-reviews @timothysc @gmarek @wojtek-t
Automatic merge from submit-queue (batch tested with PRs 41146, 41486, 41482, 41538, 41784)
Add apply view-last-applied subcommand
reopen pr https://github.com/kubernetes/kubernetes/pull/40984, implement part of https://github.com/kubernetes/community/pull/287
for now unit test all pass, the output looks like:
```console
shiywang@dhcp-140-33 template $ ./kubectl apply view last-applied deployment nginx-deployment
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
creationTimestamp: null
name: nginx-deployment
spec:
strategy: {}
template:
metadata:
creationTimestamp: null
labels:
app: nginx
spec:
containers:
- image: nginx:1.12.10
name: nginx
ports:
- containerPort: 80
resources: {}
status: {}
```
```release-note
Support new kubectl apply view-last-applied command for viewing the last configuration file applied
```
not sure if there is any flag I should updated or the some error handling I should changed.
will generate docs when you guys think is ok.
cc @pwittrock @jessfraz @AdoHe @ymqytw
Automatic merge from submit-queue (batch tested with PRs 38957, 41819, 41851, 40667, 41373)
Move pvutil.go from e2e package to framework package
**What this PR does / why we need it**:
This PR moves pvutil.go to the e2e/framework package.
I am working on a PV upgrade test, and would like to use some of the wrapper functions in pvutil.go. However, the upgrade test is in the upgrade package, and not the e2e package, and it cannot import the e2e package because it would create a circular dependency. So pvutil.go needs to be moved out of e2e in order to break the circular dependency. This is a refactoring name change, no logic has been modified.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
NONE
Automatic merge from submit-queue (batch tested with PRs 38957, 41819, 41851, 40667, 41373)
Change taints/tolerations to api fields
This PR changes current implementation of taints and tolerations from annotations to API fields. Taint and toleration are now part of `NodeSpec` and `PodSpec`, respectively. The annotation keys: `scheduler.alpha.kubernetes.io/tolerations` and `scheduler.alpha.kubernetes.io/taints` have been removed.
**Release note**:
Pod tolerations and node taints have moved from annotations to API fields in the PodSpec and NodeSpec, respectively. Pod tolerations and node taints that are defined in the annotations will be ignored. The annotation keys: `scheduler.alpha.kubernetes.io/tolerations` and `scheduler.alpha.kubernetes.io/taints` have been removed.
Automatic merge from submit-queue
Removes additional columns in test_owners.csv
**What this PR does / why we need it**:
fixes a huge collection of typos with the number of columns in the CSV file that probably has broken the auto assign bot
**Special notes for your reviewer**:
None
**Release note**:
`NONE`
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)
Update kubectl in addon-manager to use HPA in autoscaling/v1
Addon-manager is broken since HPA objects were removed from extensions api group.
Came across the logs from [the latest addon-manager on Jenkins](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/4290/artifacts/bootstrap-e2e-master/kube-addon-manager.log):
```
INFO: == Entering periodical apply loop at 2017-02-16T17:33:37+0000 ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:38+0000. 2 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:46+0000. 1 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:53+0000. 0 tries remaining. ==
WRN: == Kubernetes addon update completed with errors at 2017-02-16T17:33:58+0000 ==
```
And notice this commit (f66679a4e9) came in two weeks ago, which removed HorizontalPodAutoscaler from extensions/v1beta1.
Addon-manager is now partially functioning that it could successfully create and update addons, but will fail to prune objects, which means upgrade tests may mostly fail.
Pushed another version of addon-manager with kubectl v1.6.0-alpha.2 ([release 2 days ago](https://github.com/kubernetes/kubernetes/releases/tag/v1.6.0-alpha.2)) for fixing, including below images:
- gcr.io/google-containers/kube-addon-manager:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-amd64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-s390x:v6.4-alpha.2
@mikedanese
cc @wojtek-t @shyamjvs
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)
Enable pod level cgroups by default
**What this PR does / why we need it**:
It enables pod level cgroups by default.
**Special notes for your reviewer**:
This is intended to be enabled by default on 2/14/2017 per the plan outlined here:
https://github.com/kubernetes/community/pull/314
**Release note**:
```release-note
Each pod has its own associated cgroup by default.
```
Automatic merge from submit-queue (batch tested with PRs 41844, 41803, 39116, 41129, 41240)
NPD: Update NPD test.
For https://github.com/kubernetes/node-problem-detector/issues/58.
Update NPD e2e test based on the new behavior.
Note that before merging this PR, we need to merge all pending PRs in npd, and release the v0.3.0-alpha.1 version of NPD.
/cc @dchen1107 @kubernetes/node-problem-detector-reviewers
Automatic merge from submit-queue (batch tested with PRs 41844, 41803, 39116, 41129, 41240)
Allow for not-ready pods in large clusters
This is to workaround issues with non-starting pods in large clusters in roughly 1/3rd of runs.
Automatic merge from submit-queue (batch tested with PRs 41844, 41803, 39116, 41129, 41240)
test: fetch updated deployment before finding new and old rss
@krousey @janetkuo ptal
Ref https://github.com/kubernetes/kubernetes/issues/41518
Automatic merge from submit-queue (batch tested with PRs 41364, 40317, 41326, 41783, 41782)
Debug what is hapening in large clusters
What I'm seeing in large clusters is:
```
I0219 19:34:29.994] [90m/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/common/secrets.go:44[0m
I0219 19:34:29.994] [90m------------------------------[0m
I0219 21:27:11.421] Dumping master and node logs to /workspace/_artifacts
I0219 21:27:11.422] Master SSH not supported for gke
```
i have no idea what is happening during those 2 hours, and would like to understand this.
Automatic merge from submit-queue
[Federation] Modify the comments in Federation E2E tests to use standard Go conventions for documentation comments
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41709, 41685, 41754, 41759, 37237)
Projected volume plugin
This is a WIP volume driver implementation as noted in the commit for https://github.com/kubernetes/kubernetes/pull/35313.
change to GetOriginalConfiguration
add bazel
refactor apply view-last-applied command
update some changes
minor change
add unit tests, update
update some codes and genreate docs
update LongDesc
Automatic merge from submit-queue (batch tested with PRs 41706, 39063, 41330, 41739, 41576)
[Kubemark] Add option to log hollow-node logs
Ref https://github.com/kubernetes/kubernetes/issues/41613
Added an option to log kubemark hollow-node logs which includes kubelet, kubeproxy and npd logs for each hollow-node.
Setting the env var `ENABLE_HOLLOW_NODE_LOGS=true` should now enable logging for tests.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @yujuhong @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 41421, 41440, 36765, 41722)
Use watch param instead of deprecated /watch/ prefix
Switches clients to use watch param instead of /watch/ prefix
```release-note
Clients now use the `?watch=true` parameter to make watch API calls, instead of the `/watch/` path prefix
```
Automatic merge from submit-queue (batch tested with PRs 41421, 41440, 36765, 41722)
ResourceQuota ability to support default limited resources
Add support for the ability to configure the quota system to identify specific resources that are limited by default. A limited resource means its consumption is denied absent a covering quota. This is in contrast to the current behavior where consumption is unlimited absent a covering quota. Intended use case is to allow operators to restrict consumption of high-cost resources by default.
Example configuration:
**admission-control-config-file.yaml**
```
apiVersion: apiserver.k8s.io/v1alpha1
kind: AdmissionConfiguration
plugins:
- name: "ResourceQuota"
configuration:
apiVersion: resourcequota.admission.k8s.io/v1alpha1
kind: Configuration
limitedResources:
- resource: pods
matchContains:
- pods
- requests.cpu
- resource: persistentvolumeclaims
matchContains:
- .storageclass.storage.k8s.io/requests.storage
```
In the above configuration, if a namespace lacked a quota for any of the following:
* cpu
* any pvc associated with particular storage class
The attempt to consume the resource is denied with a message stating the user has insufficient quota for the matching resources.
```
$ kubectl create -f pvc-gold.yaml
Error from server: error when creating "pvc-gold.yaml": insufficient quota to consume: gold.storageclass.storage.k8s.io/requests.storage
$ kubectl create quota quota --hard=gold.storageclass.storage.k8s.io/requests.storage=10Gi
$ kubectl create -f pvc-gold.yaml
... created
```
Automatic merge from submit-queue
Fix kubemark default e2e test suite's name
Seems like the suite "[Feature:performance]" doesn't trigger tests anymore. Changed it to "[Feature:Performance]" in kubemark run-e2e-tests.sh.
cc @wojtek-t @gmarek
Automatic merge from submit-queue
Adds kube-public to the whitelist to not be deleted for e2e tests
We added the `kube-public` namespace but didn't add it to a whitelist of namespaces to not delete as part of e2e cleanup.
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 39373, 41585, 41617, 41707, 39958)
[Federation][e2e] Remove ns creation in federated clusters
**What this PR does / why we need it**:
In federation e2e, framework creates a namespace for each test case. the same ns is supposed to be created in federated clusters. Due to issues in namespace controller, this was not working earlier. but now it is working.
so currently the namespace is created twice, once by namespace controller and another when we call `getRegisteredClusters`. depending on the timing of these 2 calls, some [test cases fails ](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-federation/1199#k8sio-federation-secrets-featurefederation-secret-objects-should-not-be-deleted-from-underlying-clusters-when-orphandependents-is-true). So removing the creation of namespace when `getRegisteredClusters` which is unnecessary.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes flakes in federation e2e.
cc @madhusudancs @nikhiljindal @kubernetes/sig-federation-bugs
Automatic merge from submit-queue (batch tested with PRs 39373, 41585, 41617, 41707, 39958)
Owners file related changes for kubectl and docs contributors
- adding a command to kubectl updates the root .generated_docs file requiring root level approval: move .generated_docs under docs/
- run hack/update-generated-docs.sh so the docs are up to date
- add kubectl contributors to test/OWNERS and test/fixtures/pkg/kubectl/OWNERS so they can approve kubectl e2e test changes
```release-note
NONE
```
Automatic merge from submit-queue
Add standalone npd on GCI.
This PR added standalone NPD in GCE GCI cluster. I already verified the PR, and it should work.
/cc @dchen1107 @fabioy @andyxning @kubernetes/sig-node-misc
Automatic merge from submit-queue (batch tested with PRs 41401, 41195, 41664, 41521, 41651)
Remove default failure domains from anti-affinity feature
Removing it is necessary to make performance of this feature acceptable at some point.
With default failure domains (or in general when multiple topology keys are possible), we don't have transitivity between node belonging to a topology. And without this, it's pretty much impossible to solve this effectively.
@timothysc
Automatic merge from submit-queue (batch tested with PRs 41604, 41273, 41547)
Switch pv controller to shared informer
This is WIP because I still need to do something with bazel? and add 'get storageclasses' to the controller-manager rbac role
@jsafrane PTAL and make sure I did not break anything in the PV controller. Do we need to clone the volumes/claims we get from the shared informer before we use them? I could not find a place where we modify them but you would know for certain.
cc @ncdc because I copied what you did in your other PRs.
Automatic merge from submit-queue
Add cluster logging load test for GCL
Changed code around logging tests a little to avoid duplication. Added GCL go library, because command line tool cannot handle required number of logs effectively.
Related to https://github.com/kubernetes/kubernetes/issues/34310
@piosz
Automatic merge from submit-queue (batch tested with PRs 41548, 41221)
StatefulSet Upgrade Test
Adds StatefulSet upgrade tests and moves common functionality into the framework package. This removes the potential for cyclic dependencies while allowing for code reuse.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41606, 41475)
Modify kubemark run-e2e-tests.sh to run right command based on environment
In order to run-e2e-tests.sh in kubemark, we currently need to toggle comment b/w the last 2 lines of the script. This is inconvenient to change each time and sometimes this change gets accidentally included as part of a PR. It's even difficult for someone newly trying kubemark to figure this out in order to run tests. Changed the logic to use the right command based on whether the script is running from within a docker container or just locally.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 41517, 41494, 41163)
Deployment: filter out old RSes that are deleted or with non-zero replicas before cleanup
Fixes#36379
cc @zmerlynn @yujuhong @kargakis @kubernetes/sig-apps-bugs
Automatic merge from submit-queue (batch tested with PRs 41517, 41494, 41163)
Make scheduler_perf warn rather then fail if scheduling rate is betwe…
**What this PR does / why we need it**
Fix scheduler per bench tests so they pass even if there is a cold interval .
**Special notes for your reviewer**:
This wont effect CI, as this bench test is run manually by devs.
This is the first step in a two step change. First is to move the file.
The second step is to change the package and callers. It needs to be
two steps in order to correctly transfer commit history.
Automatic merge from submit-queue (batch tested with PRs 40505, 34664, 37036, 40726, 41595)
Allow `make test-integration` to pass on OSX
**What this PR does / why we need it**: `make test-integration` isn't passing on my OSX setup (10.11.6, go1.7, docker 1.13.1). Tests that startup an api server fail because the default `cert-dir` of `/var/run/kubernetes` isn't world-writable. Use a tempdir instead.
**Release note**:
```release-note
NONE
```
/cc @kubernetes/sig-testing-pr-reviews
Adds SIG owners to e2e tests in order to support assigning test flakes and general test failures to SIGs rather than relying solely on individuals. Ideally this will help spread the maintenance burden across more This work is part of closing https://github.com/kubernetes/contrib/issues/2389
Automatic merge from submit-queue (batch tested with PRs 41505, 41484, 41544, 41514, 41022)
[Federation] Fixes the newly-added test for replicaset deletion upon namespace deletion.
See #41278 for the original test.
**Release note**:
```release-note
NONE
```
`/var/run` is not world-writable on my OSX 10.11.x setup, so tests that
standup a secure apiserver fail with the default cert dir. Use a
tempdir instead.
Automatic merge from submit-queue (batch tested with PRs 41466, 41456, 41550, 41238, 41416)
Revert "Modify load test to not create too may services/endpoints"
Reverts kubernetes/kubernetes#40798
We seem to be ready to go for it now.
Automatic merge from submit-queue
Ingress e2e typos
**What this PR does / why we need it**: fix typos in e2e test
**Special notes for your reviewer**: none
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Remove alpha provisioning
This is the first part of https://github.com/kubernetes/features/issues/36
@kubernetes/sig-storage-misc
**Release note**:
```release-note
Alpha version of dynamic volume provisioning is removed in this release. Annotation
"volume.alpha.kubernetes.io/storage-class" does not have any special meaning. A default storage class
and DefaultStorageClass admission plugin can be used to preserve similar behavior of Kubernetes cluster,
see https://kubernetes.io/docs/user-guide/persistent-volumes/#class-1 for details.
```
Automatic merge from submit-queue
Switch serviceaccounts controller to generated shared informers
Originally part of #40097
cc @deads2k @sttts @liggitt @smarterclayton @gmarek @wojtek-t @timothysc @kubernetes/sig-scalability-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 37137, 41506, 41239, 41511, 37953)
Add field to control service account token automounting
Fixes https://github.com/kubernetes/kubernetes/issues/16779
* adds an `automountServiceAccountToken *bool` field to `ServiceAccount` and `PodSpec`
* if set in both the service account and pod, the pod wins
* if unset in both the service account and pod, we automount for backwards compatibility
```release-note
An `automountServiceAccountToken *bool` field was added to ServiceAccount and PodSpec objects. If set to `false` on a pod spec, no service account token is automounted in the pod. If set to `false` on a service account, no service account token is automounted for that service account unless explicitly overridden in the pod spec.
```
Automatic merge from submit-queue (batch tested with PRs 37137, 41506, 41239, 41511, 37953)
Fix typos in e2e
**What this PR does / why we need it**: fix typos in e2e test
**Special notes for your reviewer**: none
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 37137, 41506, 41239, 41511, 37953)
e2e test for storage class diskformat verification for vsphere cloud provider
**What this PR does / why we need it**:
This PR adds a new e2e test for vsphere cloud provider.
Test is to verify diskformat specified in storage-class is being honored while volume creation.
Steps:
1. Create StorageClass with diskformat set to valid type (supported options are `eagerzeroedthick`, `zeroedthick` and `thin`)
2. Create PVC which uses the StorageClass created in step 1.
3. Wait for PV to be provisioned.
4. Wait for PVC's status to become Bound
5. Create POD using PVC on specific node.
6. Wait for Disk to be attached to the node.
7. Get node VM's devices and find PV's Volume Disk.
8. Get Backing Info of the Volume Disk and obtain Property of `VirtualDiskFlatVer2BackingInfo` - `EagerlyScrub` and `ThinProvisioned`
9. Based on the value of `EagerlyScrub` and `ThinProvisioned`, verify if diskformat is correct.
10. Delete POD and Wait for Volume Disk to be detached from the Node.
11. Delete PVC, PV and Storage Class
**Which issue this PR fixes** *
fixes #
**Special notes for your reviewer**:
Test is executed against v1.6.0-alpha.1
Test is failing on v1.4.8
**Release Note**
```release-note
NONE
```
@kerneltime @BaluDontu @abrarshivani please review this PR
Automatic merge from submit-queue (batch tested with PRs 37137, 41506, 41239, 41511, 37953)
Bump addon-manager version to v6.4-alpha.1 in kubemark
Fixes https://github.com/kubernetes/kubernetes/issues/41493
cc @wojtek-t @liggitt
Automatic merge from submit-queue
Change node e2e cri-validation configs
Copy the configs to a new directory to test non-cri implementation. We can
remove the original directory after the dependent PRs are merged.
Automatic merge from submit-queue
Switch resourcequota controller to shared informers
Originally part of #40097
I have had some issues with this change in the past, when I updated `pkg/quota` to use the new informers while `pkg/controller/resourcequota` remained on the old informers. In this PR, both are switched to using the new informers. The issues in the past were lots of flakey test failures in the ResourceQuota e2es, where it would randomly fail to see deletions and handle replenishment. I am hoping that now that everything here is consistently using the new informers, there won't be any more of these flakes, but it's something to keep an eye out for.
I also think `pkg/controller/resourcequota` could be cleaned up. I don't think there's really any need for `replenishment_controller.go` any more since it's no longer running individual controllers per kind to replenish. It instead just uses the shared informer and adds event handlers to it. But maybe we do that in a follow up.
cc @derekwaynecarr @smarterclayton @wojtek-t @deads2k @sttts @liggitt @timothysc @kubernetes/sig-scalability-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 41332, 41069, 41470, 41474)
Update test owners
@nikhiljindal I've noticed you've duplicated the `test_owners.csv` contents in c1c2a12 was that intentional. I'm removing it here, since it's failing `hack/update_owners.py`
Automatic merge from submit-queue (batch tested with PRs 41134, 41410, 40177, 41049, 41313)
apiserver: further cleanup of apiserver storage plumbing
- move kubeapiserver`s `RESTOptionsFactory` back to EtcdOptions by adding a `AddWithStorageFactoryTo`
- factor out storage backend `Config` construction from EtcdOptions
- move all `StorageFactory` related code into server/storage subpackage.
In short: remove my stomach ache about `kubeapiserver.RESTOptionsFactory`.
approved based on #40363
Automatic merge from submit-queue (batch tested with PRs 41134, 41410, 40177, 41049, 41313)
PV E2E: dynamically provisioned volume should not create in unmanaged zone
**What this PR does / why we need it**:
Adds e2e test to that attempts to provision a volume in an unmanaged zone and fails on success. This is to catch regressions of #31948.
cc @jeffvance
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41134, 41410, 40177, 41049, 41313)
Isolate recycler behavior in PV E2E
**What this PR does / why we need it**:
Sets the default `reclaimPolicy` for PV E2E to `Retain` and isolates `Recycle` tests to their own context. The purpose of this is to future proof the PV test suite against the possible deprecation of the `Recycle` behavior. This is done by consolidating recycling test code into a single Context block that can be removed en masse without affecting the test suite.
Secondly, adds a liveliness check for the NFS server pod prior to each test to avoid maxing out timeouts if the NFS server becomes unavailable.
cc @saad-ali @jeffvance
Automatic merge from submit-queue (batch tested with PRs 41134, 41410, 40177, 41049, 41313)
Refactored kubemark code into provider-specific and provider-independent parts [Part-3]
Fixes#38967
Applying final part of the changes in PR #39033 (which refactored kubemark code completely). The changes included in this PR are:
- Removed `test/kubemark/common.sh` and moved relevant parts of its code to the right places in start-kubemark/stop-kubemark scripts.
- Added DOCKER_REGISTRY, PROJECT, KUBEMARK_IMAGE_MAKE_TARGET variables to `/test/kubemark/cloud-provider-config.sh` to make the kubemark image push location variable wrt provider.
- Removed get-real-pod-for-hollow-node.sh as it doesn't seem to do anything useful.
@kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 40297, 41285, 41211, 41243, 39735)
Secure kube-scheduler
This PR:
* Adds a bootstrap `system:kube-scheduler` clusterrole
* Adds a bootstrap clusterrolebinding to the `system:kube-scheduler` user
* Sets up a kubeconfig for kube-scheduler on GCE (following the controller-manager pattern)
* Switches kube-scheduler to running with kubeconfig against secured port (salt changes, beware)
* Removes superuser permissions from kube-scheduler in local-up-cluster.sh
* Adds detailed RBAC deny logging
```release-note
On kube-up.sh clusters on GCE, kube-scheduler now contacts the API on the secured port.
```
Automatic merge from submit-queue
delete cadvisor pod after test
tracing looks at events for pod deletion and volume teardown. SInce the cadvisor pod has more than 1 volume, this can make results harder to analyze.
This PR moves the deletion of the cadvisor pod to after the logPodCreateThroughput call, since that marks the "end" of the test.
cc: @dchen1107 @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 41216, 41362, 41275, 41277, 41412)
Fix statefulset e2e test
...by removing the liveness/readiness probes from the cockroachdb
manifests, as explained in
https://github.com/kubernetes/test-infra/issues/1740#issuecomment-279555187
@kow3ns @spxtr
Added nfs-server verification between tests; prototyped recycler test
recycler test in working state, needs clean up/comments
comments added; gets mount path from pod instead of hard code
typos
removed NFS server checker, will be handled in separate PR
catch err; inaccurate comment
gofmt; typo
Automatic merge from submit-queue (batch tested with PRs 41382, 41407, 41409, 41296, 39636)
Update to use proxy subresource consistently
Proxy subresources have been in place since 1.2.0 and improve the ability to put policy in place around proxy access.
This PR updates the last few clients to use proxy subresources rather than the root proxy
Automatic merge from submit-queue (batch tested with PRs 41382, 41407, 41409, 41296, 39636)
implement configmap upgrade test
**What this PR does / why we need it**: Add an automated ConfigMap upgrade test to the e2e upgrade test suite. The test creates a ConfigMap and verifies it can be consumed by pods before/after upgrade. See PR #39953 and #40747 for context.
**Special notes for your reviewer**:
@krousey please review for consistency with new upgrade test interface. I copied heavily from secrets test for this. I will implement similar tests for DaemonSets and Jobs next. Note that I have not run this test locally.
**Release note**:
`NONE`
Automatic merge from submit-queue (batch tested with PRs 41299, 41325, 41386, 41329, 41418)
Fix resource leak in federation e2e tests and another issue
**What this PR does / why we need it**:
The cleanup after federation service e2e tests is not effective as this function cleanupServiceShardsAndProviderResources is getting called with empty string for namespace ("nsName") because the nsName variable is getting redefined.
Another issue is we are prematurely exiting the Poll in waitForServiceOrFail and the error check is incorrect.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixing the 2 issues mentioned above.
**Special notes for your reviewer**:
**Release note**:
`NONE`
cc @madhusudancs @kubernetes/sig-federation-bugs
...by removing the liveness/readiness probes from the cockroachdb
manifests, as explained in
github.com/kubernetes/test-infra/issues/1740#issuecomment-279555187
Automatic merge from submit-queue (batch tested with PRs 41342, 41257)
Move two flaky e2e tests to the flaky suite.
cc @kargakis @davidopp
We should have moved these a long time ago.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41357, 41178, 41280, 41184, 41278)
[Federation] Add an end-to-end test verifying that deleting a federated namespace deletes child replicasets.
Verifies #38225.
Also, remove a few custom package aliases.
Automatic merge from submit-queue (batch tested with PRs 41357, 41178, 41280, 41184, 41278)
Switch RBAC subject apiVersion to apiGroup in v1beta1
Referencing a subject from an RBAC role binding, the API group and kind of the subject is needed to fully-qualify the reference.
The version is not, and adds complexity around re-writing the reference when returning the binding from different versions of the API, and when reconciling subjects.
This PR:
* v1beta1: change the subject `apiVersion` field to `apiGroup` (to match roleRef)
* v1alpha1: convert apiVersion to apiGroup for backwards compatibility
* all versions: add defaulting for the three allowed subject kinds
* all versions: add validation to the field so we can count on the data in etcd being good until we decide to relax the apiGroup restriction
```release-note
RBAC `v1beta1` RoleBinding/ClusterRoleBinding subjects changed `apiVersion` to `apiGroup` to fully-qualify a subject. ServiceAccount subjects default to an apiGroup of `""`, User and Group subjects default to an apiGroup of `"rbac.authorization.k8s.io"`.
```
@deads2k @kubernetes/sig-auth-api-reviews @kubernetes/sig-auth-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 41357, 41178, 41280, 41184, 41278)
fix flaky host cleanup test
**What this PR does / why we need it**:
Fixes 2 flakes in the "HostCleanup tests in e2e/_kubelet.go_
Also does some very minor refactoring.
**Which issue this PR fixes**
This is an improved fix for issue [31272](https://github.com/kubernetes/kubernetes/issues/31272)
**Special notes for your reviewer**:
```release-note
NONE
```
Automatic merge from submit-queue
Fixes Hazelcast example e2e test
**What this PR does / why we need it**:
This PR fixes the Hazelcast example e2e test
**Special notes for your reviewer**:
It is related to this PR https://github.com/kubernetes/kubernetes/pull/39580
Automatic merge from submit-queue (batch tested with PRs 41274, 41241)
[Federation] Make federation namespace e2e tests parallelizable.
Because deleteAllTestNamespaces deleted all the e2e namespaces it interefered with other federation namespace tests running in parallel. This change should mitigate the problem and make the tests runnable in parallel.
cc @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue
Add e2e test for external provisioners
this is a retry of https://github.com/kubernetes/kubernetes/pull/39545
This time around:
* take advantage of the system:persistent-volume-provisioner bootstrap cluster role to grant the external provisioner pod serviceaccount permissions
* add storageclass suffix so that the first and third test don't conflict when one creates the storageclass with the same name before the other
also tested more thoroughly myself on gce :)
@jsafrane if you would like to re-review
Automatic merge from submit-queue (batch tested with PRs 41248, 41214)
Switch hpa controller to shared informer
**What this PR does / why we need it**: switch the hpa controller to use a shared informer
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**: Only the last commit is relevant. The others are from #40759, #41114, #41148
**Release note**:
```release-note
```
cc @smarterclayton @deads2k @sttts @liggitt @DirectXMan12 @timothysc @kubernetes/sig-scalability-pr-reviews @jszczepkowski @mwielgus @piosz
Automatic merge from submit-queue (batch tested with PRs 39418, 41175, 40355, 41114, 32325)
add e2e tests for replicasets with weight, min and max replicas
e2e test with weight, min and max replicas set
#31904#32014
@quinton-hoole @nikhiljindal @deepak-vij @kshafiee @mwielgus
Automatic merge from submit-queue (batch tested with PRs 39418, 41175, 40355, 41114, 32325)
TaintController
```release-note
This PR adds a manager to NodeController that is responsible for removing Pods from Nodes tainted with NoExecute Taints. This feature is beta (as the rest of taints) and enabled by default. It's gated by controller-manager enable-taint-manager flag.
```
Automatic merge from submit-queue (batch tested with PRs 41112, 41201, 41058, 40650, 40926)
[Federation][e2e] Fix few flakes in federation e2e tests
**What this PR does / why we need it**:
Fixes few flakes in #37105
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes # partly fixes few test cases in the above mentioned issue.
**Special notes for your reviewer**:
While cleaning up in AfterEach Block some objects are returned while listing, but by the time the object is delete is issued the object is disappearing resulting in this flake occasionally.
To fix this, we need to check if the err is NotFound while deleting, its ok and need not fail the test.
**Release note**: `NONE`
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 41112, 41201, 41058, 40650, 40926)
Promote TokenReview to v1
Peer to https://github.com/kubernetes/kubernetes/pull/40709
We have multiple features that depend on this API:
- [webhook authentication](https://kubernetes.io/docs/admin/authentication/#webhook-token-authentication)
- [kubelet delegated authentication](https://kubernetes.io/docs/admin/kubelet-authentication-authorization/#kubelet-authentication)
- add-on API server delegated authentication
The API has been in use since 1.3 in beta status (v1beta1) with negligible changes:
- Added a status field for reporting errors evaluating the token
This PR promotes the existing v1beta1 API to v1 with no changes
Because the API does not persist data (it is a query/response-style API), there are no data migration concerns.
This positions us to promote the features that depend on this API to stable in 1.7
cc @kubernetes/sig-auth-api-reviews @kubernetes/sig-auth-misc
```release-note
The authentication.k8s.io API group was promoted to v1
```
Because deleteAllTestNamespaces deleted all the e2e namespaces
it interefered with other federation namespace tests running in
parallel. This change should mitigate the problem and make the
tests runnable in parallel.
Automatic merge from submit-queue (batch tested with PRs 40796, 40878, 36033, 40838, 41210)
StatefulSet hardening
**What this PR does / why we need it**:
This PR contains the following changes to StatefulSet. Only one change effects the semantics of how the controller operates (This is described in #38418), and this change only brings the controller into conformance with its documented behavior.
1. pcb and pcb controller are removed and their functionality is encapsulated in StatefulPodControlInterface. This class modules the design contoller.PodControlInterface and provides an abstraction to clientset.Interface which is useful for testing purposes.
2. IdentityMappers has been removed to clarify what properties of a Pod are mutated by the controller. All mutations are performed in the UpdateStatefulPod method of the StatefulPodControlInterface.
3. The statefulSetIterator and petQueue classes are removed. These classes sorted Pods by CreationTimestamp. This is brittle and not resilient to clock skew. The current control loop, which implements the same logic, is in stateful_set_control.go. The Pods are now sorted and considered by their ordinal indices, as is outlined in the documentation.
4. StatefulSetController now checks to see if the Pods matching a StatefulSet's Selector also match the Name of the StatefulSet. This will make the controller resilient to overlapping, and will be enhanced by the addition of ControllerRefs.
5. The total lines of production code have been reduced, and the total number of unit tests has been increased. All new code has 100% unit coverage giving the module 83% coverage. Tests for StatefulSetController have been added, but it is not practical to achieve greater coverage in unit testing for this code (the e2e tests for StatefulSet cover these areas).
6. Issue #38418 is fixed in that StaefulSet will ensure that all Pods that are predecessors of another Pod are Running and Ready prior to launching a new Pod. This removes the potential for deadlock when a Pod needs to be rescheduled while its predecessor is hung in Pending or Initializing.
7. All reference to pet have been removed from the code and comments.
**Which issue this PR fixes**
fixes #38418,#36859
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixes issue #38418 which, under circumstance, could cause StatefulSet to deadlock.
Mediates issue #36859. StatefulSet only acts on Pods whose identity matches the StatefulSet, providing a partial mediation for overlapping controllers.
```
Automatic merge from submit-queue (batch tested with PRs 40796, 40878, 36033, 40838, 41210)
Implement TTL controller and use the ttl annotation attached to node in secret manager
For every secret attached to a pod as volume, Kubelet is trying to refresh it every sync period. Currently Kubelet has a ttl-cache of secrets of its pods and the ttl is set to 1 minute. That means that in large clusters we are targetting (5k nodes, 30pods/node), given that each pod has a secret associated with ServiceAccount from its namespaces, and with large enough number of namespaces (where on each node (almost) every pod is from a different namespace), that resource in ~30 GETs to refresh all secrets every minute from one node, which gives ~2500QPS for GET secrets to apiserver.
Apiserver cannot keep up with it very easily.
Desired solution would be to watch for secret changes, but because of security we don't want a node watching for all secrets, and it is not possible for now to watch only for secrets attached to pods from my node.
So as a temporary solution, we are introducing an annotation that would be a suggestion for kubelet for the TTL of secrets in the cache and a very simple controller that would be setting this annotation based on the cluster size (the large cluster is, the bigger ttl is).
That workaround mean that only very local changes are needed in Kubelet, we are creating a well separated very simple controller, and once watching "my secrets" will be possible it will be easy to remove it and switch to that. And it will allow us to reach scalability goals.
@dchen1107 @thockin @liggitt
Automatic merge from submit-queue (batch tested with PRs 40917, 41181, 41123, 36592, 41183)
fix scheduler performance test script
**What this PR does / why we need it**:
`test-performance.sh` is in dir `kubernetes/test/integration/scheduler_perf`
the dir `kubernetes/test/component/scheduler/perf` does not exist
Thanks.
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 41074, 41147, 40854, 41167, 40045)
Fix some funky funcs.
This is code cleanup. Fix function declarations and remove stale comment.
Automatic merge from submit-queue (batch tested with PRs 41074, 41147, 40854, 41167, 40045)
Upgrade test for deployments
Upgrade test for Deployments. Should prevent issues like https://github.com/kubernetes/kubernetes/issues/40415 in the future.
@krousey @janetkuo @soltysh
Haven't managed to run it locally...
```
$ go run hack/e2e.go --up --test --test_args="--ginkgo.focus=\[Feature:MasterUpgrade\] --upgrade-target=ci/latest --upgrade-image=gci"
2017/02/02 11:43:22 e2e.go:946: Running: ./hack/e2e-internal/e2e-down.sh
2017/02/02 11:43:22 e2e.go:948: Step './hack/e2e-internal/e2e-down.sh' finished in 7.278236ms
2017/02/02 11:43:22 e2e.go:946: Running: ./hack/e2e-internal/e2e-up.sh
2017/02/02 11:43:22 e2e.go:948: Step './hack/e2e-internal/e2e-up.sh' finished in 5.286328ms
2017/02/02 11:43:22 e2e.go:946: Running: ./cluster/kubectl.sh version --match-server-version=false
2017/02/02 11:43:22 e2e.go:948: Step './cluster/kubectl.sh version --match-server-version=false' finished in 213.847259ms
2017/02/02 11:43:22 e2e.go:946: Running: ./hack/e2e-internal/e2e-status.sh
2017/02/02 11:43:22 e2e.go:948: Step './hack/e2e-internal/e2e-status.sh' finished in 103.253183ms
2017/02/02 11:43:22 e2e.go:230: Something went wrong: encountered 2 errors: [exit status 1 exit status 1]
exit status 1
```
@krousey any eta for when the upgrade framework will be integrated in the pr builder?
Automatic merge from submit-queue (batch tested with PRs 41121, 40048, 40502, 41136, 40759)
add k8s.io/sample-apiserver to demonstrate how to build an aggregated API server
builds on https://github.com/kubernetes/kubernetes/pull/41093
This creates a sample API server is a separate staging repo to guarantee no cheating with `k8s.io/kubernetes` dependencies. The sample is run during integration tests (simple tests on it so far) to ensure that it continues to run.
@sttts @kubernetes/sig-api-machinery-misc ptal
@pwittrock @pmorie @kris-nova an aggregated API server example that will stay up to date.
Automatic merge from submit-queue (batch tested with PRs 41121, 40048, 40502, 41136, 40759)
Remove deprecated kubelet flags that look safe to remove
Removes:
```
--config
--auth-path
--resource-container
--system-container
```
which have all been marked deprecated since at least 1.4 and look safe to remove.
```release-note
The deprecated flags --config, --auth-path, --resource-container, and --system-container were removed.
```
Automatic merge from submit-queue (batch tested with PRs 41145, 38771, 41003, 41089, 40365)
Use privileged containers for statefulset e2e tests
Test containers need to run as spc_t in order to interact with the host
filesystem under /tmp, as the tests for StatefulSet are doing. Docker
will transition the container into this domain when running the container
as privileged.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
**Release note**:
```release-note
NONE
```
/cc @ncdc @soltysh @pmorie
1. pcb and pcb controller are removed and their functionality is
encapsulated in StatefulPodControlInterface.
2. IdentityMappers has been removed to clarify what properties of a Pod are
mutated by the controller. All mutations are performed in the
UpdateStatefulPod method of the StatefulPodControlInterface.
3. The statefulSetIterator and petQueue classes are removed. These classes
sorted Pods by CreationTimestamp. This is brittle and not resilient to
clock skew. The current control loop, which implements the same logic,
is in stateful_set_control.go. The Pods are now sorted and considered by
their ordinal indices, as is outlined in the documentation.
4. StatefulSetController now checks to see if the Pods matching a
StatefulSet's Selector also match the Name of the StatefulSet. This will
make the controller resilient to overlapping, and will be enhanced by
the addition of ControllerRefs.
Automatic merge from submit-queue (batch tested with PRs 40175, 41107, 41111, 40893, 40919)
[Federation][e2e] Move Cluster Registration to federation-up.sh
**What this PR does / why we need it**:
Remove cluster register/unregister calls from test case BeforeEach/AfterEach blocks.
Register clusters once in federation-up.sh
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#40768
**Special notes for your reviewer**:
**Release note**: `NONE`
cc: @madhusudancs @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 38796, 40823, 40756, 41083, 41105)
e2e tests for vSphere cloud provider
**What this PR does / why we need it**:
This PR contains changes for existing e2e volume provisioning test cases for running on vsphere cloud provider.
**Following is the summary of changes made in existing e2e test cases**
**Added test/e2e/persistent_volumes-vsphere.go**
- This test verifies deleting a PVC before the pod does not cause pod deletion to fail on PD detach and deleting the PV before the pod does not cause pod deletion to fail on PD detach.
**test/e2e/volume_provisioning.go**
- This test creates a StorageClass and claim with dynamic provisioning and alpha dynamic provisioning annotations and verifies that required volumes are getting created. Test also verifies that created volume is readable and retaining data.
- Added vsphere as supported cloud provider. Also set pluginName to "kubernetes.io/vsphere-volume" for vsphere cloud provider.
**test/e2e/volumes.go**
- Added test spec for vsphere
- This test creates requested volume, mount it on the pod, write some random content at /opt/0/index.html and verifies file contents are perfect to make sure we don't see the content from previous test runs.
- This test also passes "1234" as fsGroup to mount volume and verifies fsGroup is set correctly.
**added test/e2e/vsphere_utils.go**
- Added function verifyVSphereDiskAttached - Verify the persistent disk attached to the node.
- Added function waitForVSphereDiskToDetach - Wait until vsphere vmdk is deteched from the given node or time out after 5 minutes
- Added getVSpherePersistentVolumeSpec - create vsphere volume spec with given VMDK volume path, Reclaim Policy and labels
- Added getVSpherePersistentVolumeClaimSpec - get vsphere persistent volume spec with given selector labels
- createVSphereVolume - function to create vmdk volume
**Following is the summary of new e2e tests added with this PR**
**test/e2e/vsphere_volume_placement.go**
- contains volume placement tests using node label selector
- Test Back-to-back pod creation/deletion with the same volume source on the same worker node
- Test Back-to-back pod creation/deletion with the same volume source attach/detach to different worker nodes
**test/e2e/pv_reclaimpolicy.go**
- contains tests for PV/PVC - Reclaiming Policy
- Test verifies persistent volume should be deleted when reclaimPolicy on the PV is set to delete and associated claim is deleted
- Test also verified that persistent volume should be retained when reclaimPolicy on the PV is set to retain and associated claim is deleted
**test/e2e/pvc_label_selector.go**
- This is function test for Selector-Label Volume Binding Feature.
- Verify volume with the matching label is bounded with the PVC.
Other changes
Updated pkg/cloudprovider/providers/vsphere/BUILD and test/e2e/BUILD
**Which issue this PR fixes** *
fixes # 41087
**Special notes for your reviewer**:
Updated tests were executed on kubernetes v1.4.8 release on vsphere.
Test steps are provided in comments
@kerneltime @BaluDontu
Test containers need to run as spc_t in order to interact with the host
filesystem under /tmp, as the tests for StatefulSet are doing. Docker
will transition the container into this domain when running the container
as privileged.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Automatic merge from submit-queue (batch tested with PRs 40345, 38183, 40236, 40861, 40900)
Remove checks for pods responding in deployment e2e tests
Fixes#39879
Remove it because it caused deployment e2e tests sometimes timed out waiting for pods responding, and pods responding isn't related to deployment controller and is not a prerequisite of deployment e2e tests.
@kargakis
addressed review comments
Addressed review comment for pv_reclaimpolicy.go to verify content of the volume
addressed 2nd round of review comments
addressed 3rd round of review comments from jeffvance
Automatic merge from submit-queue (batch tested with PRs 41023, 41031, 40947)
scrub aggregator names to eliminate discovery
Cleanup old uses of `discovery`. Also removes the legacy functionality.
@kubernetes/sig-api-machinery-misc @sttts
Automatic merge from submit-queue (batch tested with PRs 40971, 41027, 40709, 40903, 39369)
Promote SubjectAccessReview to v1
We have multiple features that depend on this API:
SubjectAccessReview
- [webhook authorization](https://kubernetes.io/docs/admin/authorization/#webhook-mode)
- [kubelet delegated authorization](https://kubernetes.io/docs/admin/kubelet-authentication-authorization/#kubelet-authorization)
- add-on API server delegated authorization
The API has been in use since 1.3 in beta status (v1beta1) with negligible changes:
- Added a status field for reporting errors evaluating access
- A typo was discovered in the SubjectAccessReviewSpec Groups field name
This PR promotes the existing v1beta1 API to v1, with the only change being the typo fix to the groups field. (fixes https://github.com/kubernetes/kubernetes/issues/32709)
Because the API does not persist data (it is a query/response-style API), there are no data migration concerns.
This positions us to promote the features that depend on this API to stable in 1.7
cc @kubernetes/sig-auth-api-reviews @kubernetes/sig-auth-misc
```release-note
The authorization.k8s.io API group was promoted to v1
```
Automatic merge from submit-queue (batch tested with PRs 40971, 41027, 40709, 40903, 39369)
Bump GCI to gci-beta-56-9000-80-0
cc/ @Random-Liu @adityakali
Changelogs since gci-dev-56-8977-0-0 (currently used in Kubernetes):
- "net.ipv4.conf.eth0.forwarding" and "net.ipv4.ip_forward" may get reset to 0
- Track CVE-2016-9962 in Docker in GCI
- Linux kernel CVE-2016-7097
- Linux kernel CVE-2015-8964
- Linux kernel CVE-2016-6828
- Linux kernel CVE-2016-7917
- Linux kernel CVE-2016-7042
- Linux kernel CVE-2016-9793
- Linux kernel CVE-2016-7039 and CVE-2016-8666
- Linux kernel CVE-2016-8655
- Toolbox: allow docker image to be loaded from local tarball
- Update compute-image-package in GCI
- Change the product name on /etc/os-release (to COS)
- Remove 'dogfood' from HWID_OVERRIDE in /etc/lsb-release
- Include Google NVME extensions to optimize LocalSSD performance.
- /proc/<pid>/io missing on GCI (enables process stats accounting)
- Enable BLK_DEV_THROTTLING
cc/ @roberthbailey @fabioy for GKE cluster update
Automatic merge from submit-queue
federation: Refactoring namespaced resources deletion code from kube ns controller and sharing it with fed ns controller
Ref https://github.com/kubernetes/kubernetes/issues/33612
Refactoring code in kube namespace controller to delete all resources in a namespace when the namespace is deleted. Refactored this code into a separate NamespacedResourcesDeleter class and calling it from federation namespace controller.
This is required for enabling cascading deletion of namespaced resources in federation apiserver.
Before this PR, we were directly deleting the namespaced resources and assuming that they go away immediately. With cascading deletion, we will have to wait for the corresponding controllers to first delete the resources from underlying clusters and then delete the resource from federation control plane. NamespacedResourcesDeleter has this waiting logic.
cc @kubernetes/sig-federation-misc @caesarxuchao @derekwaynecarr @mwielgus
Automatic merge from submit-queue (batch tested with PRs 40385, 40786, 40999, 41026, 40996)
Refactor federated services tests a bit to move a test that requires no cluster creation to a separate block.
Follow up to PR #40769.
cc @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue
Replace hand-written informers with generated ones
Replace existing uses of hand-written informers with generated ones.
Follow-up commits will switch the use of one-off informers to shared
informers.
This is a precursor to #40097. That PR will switch one-off informers to shared informers for the majority of the code base (but not quite all of it...).
NOTE: this does create a second set of shared informers in the kube-controller-manager. This will be resolved back down to a single factory once #40097 is reviewed and merged.
There are a couple of places where I expanded the # of caches we wait for in the calls to `WaitForCacheSync` - please pay attention to those. I also added in a commented-out wait in the attach/detach controller. If @kubernetes/sig-storage-pr-reviews is ok with enabling the waiting, I'll do it (I'll just need to tweak an integration test slightly).
@deads2k @sttts @smarterclayton @liggitt @soltysh @timothysc @lavalamp @wojtek-t @gmarek @sjenning @derekwaynecarr @kubernetes/sig-scalability-pr-reviews
Remove it because it caused deployment e2e tests sometimes timed out
waiting for pods responding, and pods responding isn't related to
deployment controller and is not a prerequisite of deployment e2e tests.
Automatic merge from submit-queue (batch tested with PRs 40978, 40994, 41008, 40622)
Refactored kubemark code into provider-specific and provider-independent parts [Part-2]
Applying part of the changes of PR https://github.com/kubernetes/kubernetes/pull/39033 (which refactored kubemark code completely). The changes included in this PR are:
- Added test/kubemark/skeleton/util.sh which defines a well-commented interface that any cloud-provider should implement to run kubemark.
This includes functions like creating the master machine instance along with its resources, remotely executing a given command on the master (like ssh), scp, deleting the master instance and its resources.
All these functions have to be over-ridden by each cloud provider inside the file /test/kubemark/$CLOUD_PROVIDER/util.sh
- Implemented the above mentioned interface for gce in /test/kubemark/$CLOUD_PROVIDER/util.sh
- Made start- and stop- kubemark scripts (almost) provider independent by making them source the interface based on cloud provider.
@kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue
Add SIG to test owners
**What this PR does / why we need it**:
This PR adds a `sig` column to the test owners file generation script.
A problem experienced with the current owners file is that since members are auto-assigned there are times where tests are assigned to non-active users who don't follow up to notifications to fix flakes. By assigning a SIG to each test we can hold a group we know is active responsible for taking care of flakes it's less likely that flakes will fall through the cracks.
**Special notes for your reviewer**:
* A companion PR will go into *kubernetes/contrib* adding support for mungers parsing this new column.
* Another PR in contrib will add labeling GitHub flake issues with the appropriate SIG
* Currently SIGs are not labeled, this will be added in another PR where SIG determinations can be discussed
@saad-ali @pwittrock
Automatic merge from submit-queue (batch tested with PRs 40289, 40877, 40879, 39972, 40942)
Rename experimental-cgroups-per-pod flag
**What this PR does / why we need it**:
1. Rename `experimental-cgroups-per-qos` to `cgroups-per-qos`
1. Update hack/local-up-cluster to match `CGROUP_DRIVER` with docker runtime if used.
**Special notes for your reviewer**:
We plan to roll this feature out in the upcoming release. Previous node e2e runs were running with this feature on by default. We will default this feature on for all e2es next week.
**Release note**:
```release-note
Rename --experiemental-cgroups-per-qos to --cgroups-per-qos
```
Automatic merge from submit-queue (batch tested with PRs 40289, 40877, 40879, 39972, 40942)
PV E2E: provide each spec with a fresh nfs host
**What this PR does / why we need it**:
PersistentVolume e2e currently reuses an NFS host pod created at the start of the suite and accessed by each test. This is far less favorable than using a fresh volume per test. Additionally, this guards against the volume host pod or it's kubelet being disrupted, which has led to flakes.
```release-note-none
```
Automatic merge from submit-queue (batch tested with PRs 40906, 40924, 40938, 40902, 40911)
Add [Flaky] tag to persistent volumes tests
**What this PR does / why we need it**:
Persistent Volume tests continue to flake in CI.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
```release-note
NONE
```
Automatic merge from submit-queue
Add an upgrade test for secrets.
**What this PR does / why we need it**: This PR adds an upgrade test for secrets. It creates a secret and makes sure that pods can consume it before an after an upgrade.
Automatic merge from submit-queue (batch tested with PRs 40696, 39914, 40374)
Forgiveness library changes
**What this PR does / why we need it**:
Splited from #34825, contains library changes that are needed to implement forgiveness:
1. ~~make taints-tolerations matching respect timestamps, so that one toleration can just tolerate a taint for only a period of time.~~ As TaintManager is caching taints and observing taint changes, time-based checking is now outside the library (in TaintManager). see #40355.
2. make tolerations respect wildcard key.
3. add/refresh some related functions to wrap taints-tolerations operation.
**Which issue this PR fixes**:
Related issue: #1574
Related PR: #34825, #39469
~~Please note that the first 2 commits in this PR come from #39469 .~~
**Special notes for your reviewer**:
~~Since currently we have `pkg/api/helpers.go` and `pkg/api/v1/helpers.go`, there are some duplicated periods of code laying in these two files.~~
~~Ideally we should move taints-tolerations related functions into a separate package (pkg/util/taints), and make it a unified set of implementations. But I'd just suggest to do it in a follow-up PR after Forgiveness ones done, in case of feature Forgiveness getting blocked to long.~~
**Release note**:
```release-note
make tolerations respect wildcard key
```
Automatic merge from submit-queue (batch tested with PRs 40864, 40666, 38382, 40874)
Density Test includes deletion and volumes
Moved the calls to deletePodSync to BEFORE logDensityTimeSeries. This is because the parser considers a line printed in logDensityTimeSeries to be the "end" of the test. This change includes deletion in the "test window", but makes no other changes.
I also added volumes to the test, so that we can make sure that mounting and unmounting volumes are also taken into account for performance profiling.
Automatic merge from submit-queue (batch tested with PRs 40864, 40666, 38382, 40874)
Promote init containers to GA
This is proposed for 1.6
PR moves beta proved concept for init containers to stable. Specification of init containers can be now stated under initContainers field in PodSpec/PodTemplateSpec. Specifying init-containers in annotation is still possible, but will be removed in future version.
```release-note
Init containers have graduated to GA and now appear as a field. The beta annotation value will still be respected and overrides the field value.
```
Automatic merge from submit-queue (batch tested with PRs 40864, 40666, 38382, 40874)
test: reduce deployment progress deadline, ensure its rs is up
Fixes https://github.com/kubernetes/kubernetes/issues/39785 by reducing the deadline of the expected progress and making sure the new replica set is up before checking the deployment condition.
@kubernetes/sig-apps-misc
Automatic merge from submit-queue (batch tested with PRs 40884, 40809, 40845, 40866, 40875)
Remove many job e2e tests.
These tests have equivalent unit test coverage as far as I can tell, in [pkg/controller/job/jobcontroller_test.go](https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/job/jobcontroller_test.go). See #40839 for context.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 40884, 40809, 40845, 40866, 40875)
Node E2E: Create new ubuntu image with docker 1.12.6.
We should test the newest docker 1.12 version - 1.12.6.
/cc @dchen1107 @yujuhong @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 40812, 39903, 40525, 40729)
test/node_e2e: wire-in cri-enabled local testing
This commit wires-in the pre-existing `--container-runtime` flag for
local node_e2e testing.
This is needed in order to further skip docker specific testing
and validation.
Local CRI node_e2e can now be performed via
`make test-e2e-node RUNTIME=remote REMOTE=false`
which will also take care of passing the appropriate argument to
the kubelet.
Automatic merge from submit-queue
test: move deployment helper in testing framework
Wanted to get this out of the way before submitting an upgrade test for Deployments and I need the helper in the framework utility
@janetkuo @soltysh
Automatic merge from submit-queue (batch tested with PRs 35782, 35831, 39279, 40853, 40867)
genericapiserver: cut off more dependencies – episode 7
Follow-up of https://github.com/kubernetes/kubernetes/pull/40822
approved based on #40363
Automatic merge from submit-queue (batch tested with PRs 35782, 35831, 39279, 40853, 40867)
Test GCE PD unmounts and detaches when the namespace of the pvc&pod is deleted.
Addition to Persistent Volume E2E testing. On a GCE cluster, create a pv, pvc, and client pod. Delete the namespace and check that the disk detaches successfully.
@jeffvance
~~DEPENDENT ON~~ #34353 merged. No dependencies.
Automatic merge from submit-queue
Removed HPA objects from extensions api group
fix#29778
``` release-note
HorizontalPodAutoscaler is no longer supported in extensions/v1beta1 version. Use autoscaling/v1 instead.
```
cc @kubernetes/autoscaling
Automatic merge from submit-queue (batch tested with PRs 39169, 40719, 38954, 40808, 40689)
Add StatefulSets checks at Service level
Hi!
Please let me propose some very small e2e testsuite enhancement.
This PR removed a `TODO` about checking governing service at unit test level (which is hard) and adds this to e2e testsuite.
Thanks
Sebastian
Automatic merge from submit-queue
Add websocket support for port forwarding
#32880
**Release note**:
```release-note
Port forwarding can forward over websockets or SPDY.
```
This commit wires-in the pre-existing `--container-runtime` flag for
local node_e2e testing.
This is needed in order to further skip docker specific testing
and validation.
Local CRI node_e2e can now be performed via
`make test-e2e-node RUNTIME=remote REMOTE=false`
which will also take care of passing the appropriate arguments to
the kubelet.
Automatic merge from submit-queue
Use full package path for definition name in OpenAPI spec
We were using short package name (last part of package name) plus type name for OpenAPI spec definition name. That can result in duplicate names and make the spec invalid. To be sure we will always have unique names, we are going to use full package name as definition name. Also "x-kubernetes-tag" custom field is added to definitions to list Group/Version/Kind for the definitions that has it. This will help clients to discover definitions easier.
Lastly, we've added a reference from old definition names to the new ones to keep backward compatibilities. The list of old definitions will not be updated.
**Release note**:
- Rename OpenAPI definition names to type's full package names to prevent duplicates
- Create OpenAPI extension "x-kubernetes-group-version-kind" for definitions to store Group/Version/Kind
- Deprecate old definition names and create a reference to the new definitions. Old definitions will be removed in the next release.
- split out port forwarding into its own package
Allow multiple port forwarding ports
- Make it easy to determine which port is tied to which channel
- odd channels are for data
- even channels are for errors
- allow comma separated ports to specify multiple ports
Add portfowardtester 1.2 to whitelist
Automatic merge from submit-queue (batch tested with PRs 40529, 40630)
propagate explicit nulls in apply
Rebase of https://github.com/kubernetes/kubernetes/pull/35496 on top of https://github.com/kubernetes/kubernetes/pull/40260
The client-side propagation of the raw value is no longer needed, since the client is preserving the original object in unstructured form (explicit nulls are preserved).
Kept tests and CreateThreeWayMergePatch changes from https://github.com/kubernetes/kubernetes/pull/35496
```release-note
kubectl apply now supports explicitly clearing values not present in the config by setting them to null
```
- [x] Clean up orphaned objects in test-cmd to preserve pre- and post- conditions
- [x] improve CreateThreeWayMergePatch test to not filter based on string comparison to test name
Automatic merge from submit-queue (batch tested with PRs 40529, 40630)
test/e2e_node: tie together expected string and exec
This commit ties together busybox-sh invocation and test expectation
to avoid subtle mismatches between exec command and output string.
Automatic merge from submit-queue (batch tested with PRs 40645, 40541, 40769)
[Federation] Marked the tests that don't need registered clusters so.
Somewhat related to issue #40766.
cc @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 40527, 40738, 39366, 40609, 40748)
Removed Flaky tag from PV e2e, added [Volume] to disruptive PV e2e
**What this PR does / why we need it**:
Removes `[Flaky]` from PV e2e testing. Flakes were due to interference from an external test disrupting a cluster node. The test has been [removed](9f36c032de) and the flakes have [cleared](https://k8s-testgrid.appspot.com/google-gce#gce-flaky).
Secondly, added `[Volume]` tag to PV disruptive tests.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#39119
**Release note**:
NONE
```release-note
```
Automatic merge from submit-queue
Add flag to node e2e test specifying location of ssh privkey
**What this PR does / why we need it**: in CI, the ssh private key is not always located at `$HOME/.ssh`, so it's helpful to be able to override it.
@krzyzacy here's my resurrected change. I'm not sure why I neglected to follow-through on it originally.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
test/images/mount-tester: ensure exec binary is o+rx
The `mount-tester` image is currently used as a base layer for other
test images (like `mounttest-user`) which perform uid/gid changes.
However, the binary built in here just follows local umask, and currently is
```
-rwxr-x--- 1 root root 2052704 May 19 2016 mt
```
This commit adds an explicit chmod on the binary to make sure uid/gid
tests can still run it as "others".
Automatic merge from submit-queue
refactor pv e2e code to improve readability
**What this PR does / why we need it**:
Moved the helper functions out of _persistent_volumes.go_ to a new file, _pvutil.go_, in order to improve readability and make it easier to add new tests.
Also, all pod delete code now calls the same helper function `deletePodWithWait`.
**Release note**:
```
NONE
```
Automatic merge from submit-queue
e2e test should fail if err==timeout
If err==timeout, it means the replicaset (the owner) is not deleted after 30s, which indicates a bug, so the e2e test should fail.
Automatic merge from submit-queue
Remove jsafrane from some tests
I do not know anything about ESIPP nor clouddns and I have never touched these tests. It would be better to assign flakes to someone else.
@bprashanth @quinton-hoole, PTAL. I took your names as authors of these two tests.
Automatic merge from submit-queue
Improve the multiarch situation; armel => armhf; reenable pcc64le; remove the patched golang
**What this PR does / why we need it**:
- Improves the multiarch situation as described in #38067
- Tries to bump to go1.8 for arm (and later enable ppc64le)
- GOARM 6 => GOARM 7
- Remove the golang 1.7 patch
- armel => armhf
- Bump QEMU version to v2.7.0
**Release note**:
```release-note
Improve the ARM builds and make hyperkube on ARM working again by upgrading the Go version for ARM to go1.8beta2
```
@kubernetes/sig-testing-misc @jessfraz @ixdy @jbeda @david-mcmahon @pwittrock
Automatic merge from submit-queue (batch tested with PRs 40584, 40319)
ssh support for local
**What this PR does / why we need it**: adds local deployment support for e2e tests. Useful for non-cloud, simple testing.
**Special notes for your reviewer**: Formerly this pr was part of #38214
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 40132, 39302, 40194, 40619, 40601)
Update NPD version to v0.3.0-alpha.0 in kubemark.
@wojtek-t @shyamjvs Update the NPD version in kubemark.
I just built the alpha release https://github.com/kubernetes/node-problem-detector/releases/tag/v0.3.0-alpha.0.
And the PR https://github.com/kubernetes/node-problem-detector/pull/79 is included.
However, I'm not sure whether 1 minute period is longer enough.
If it's still not longer enough, in fact we can extend it by split the resync and heartbeat:
* Every 1 minute, check whether there is inconsistency between apiserver and npd, and only update when there is inconsistency. (1 GET/m)
* Every > 2 minute, do forcibly update as heartbeat. (<0.5 PATCH/m)
And I can also make the sync period configurable after we finalize the sync mechanism.
Automatic merge from submit-queue (batch tested with PRs 39469, 40557)
Refactored kubemark code into provider-specific and provider-independent parts [Part-1]
Applying part of the changes of PR https://github.com/kubernetes/kubernetes/pull/39033 (which refactored kubemark code completely). The changes included in this PR are:
The following are the major changes as part of this refactoring:
- Moved cluster-kubemark/config-default.sh -> cluster-kubemark/gce/config-default.sh (as the config is gce-specific)
- Changed kubernetes/cluster/kubemark/util.sh to source the right scripts based on the cloud-provider
- Added the file test/kubemark/cloud-provider-config.sh which sets the variable CLOUD_PROVIDER that is later picked up by various scripts (run-e2e-tests.sh, common.sh)
- Removed useless code and restructured start-kubemark.sh and stop-kubemark.sh scripts.
@kubernetes/sig-scalability-misc @wojtek-t @gmarek
The `mount-tester` image is currently used as a base layer for other
test images (like `mounttest-user`) which perform uid/gid changes.
However, the binary built in here just follows local umask, and currently is
```
-rwxr-x--- 1 root root 2052704 May 19 2016 mt
```
This commit adds an explicit chmod on the binary to make sure uid/gid
tests can still run it as "others".
Automatic merge from submit-queue (batch tested with PRs 40046, 40073, 40547, 40534, 40249)
Fix e2e: validates that InterPodAntiAffinity is respected if matching 2
**What this PR does / why we need it**:
Fixed e2e: validates that InterPodAntiAffinity is respected if matching 2
```
• Failure [120.255 seconds]
[k8s.io] SchedulerPredicates [Serial]
/root/code/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:656
validates that InterPodAntiAffinity is respected if matching 2 [It]
/root/code/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/scheduler_predicates.go:561
Not scheduled Pods: []v1.Pod(nil)
Expected
<int>: 0
to equal
<int>: 1
/root/code/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/scheduler_predicates.go:933
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://github.com/kubernetes/kubernetes/issues/30142
**Special notes for your reviewer**:
xref: https://bugzilla.redhat.com/show_bug.cgi?id=1413748
While looking into the above bug, I found that the e2e was failing.
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)
move the discovery and dynamic clients
Moved the dynamic client, discovery client, testing/core, and testing/cache to `client-go`. Dependencies on api groups we don't have generated clients for have dropped out, so federation, kubeadm, and imagepolicy.
@caesarxuchao @sttts
approved based on https://github.com/kubernetes/kubernetes/issues/40363
Automatic merge from submit-queue (batch tested with PRs 38739, 40480, 40495, 40172, 40393)
Rename controller pkg/registry/core/controller to pkg/registry/core/r…
…eplicationcontroller
**What this PR does / why we need it**:
Rename controller pkg/registry/core/controller to pkg/registry/core/replicationcontroller
This will clarify the purpose of the controller since intent is replicationcontroller
Please refer to
https://github.com/kubernetes/kubernetes/issues/17648
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
NONE
**Release note**:
```NONE
```
Automatic merge from submit-queue
Prep node_e2e for GCI to COS name change
GCI will soon change name in etc/os-release from "gci" to "cos".
This prepares the node_e2e tests to deal with that change and also updates some comments/log messages/var names in anticipation.
Automatic merge from submit-queue (batch tested with PRs 39538, 40188, 40357, 38214, 40195)
test for host cleanup in unfavorable pod deletes
addresses issue #31272 with a new e2e test in _kubelet.go_
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 39538, 40188, 40357, 38214, 40195)
genericapiserver: cut off more dependencies – episode 2
Compare commit subjects.
approved based on #40363
Automatic merge from submit-queue (batch tested with PRs 39538, 40188, 40357, 38214, 40195)
Decoupling scheduler creation from creation of scheduler.Config struc…
**What this PR does / why we need it**:
Adds functionality to the scheduler to initialize from an Configurator interface, rather then via a Config struct.
**Which issue this PR fixes**
Reduces coupling to `scheduler.Config` data structure format so that we can proliferate more interface driven composition of scheduler components.
Automatic merge from submit-queue (batch tested with PRs 40428, 40176)
Cleaup Affinity post conversion from annotations to fields
**What this PR does / why we need it**:
Cleans up leftover work from the conversion of affinity from annotations to fields.
fixes#40016
related #25319
**Special notes for your reviewer**:
There are some TODO items left for @luxas or @errordeveloper b/c they were trying to use affinity in a way that is not possible.
**Release note**:
```release-note
NONE
```
/cc @kubernetes/sig-scheduling-misc @rrati
Automatic merge from submit-queue
move client/cache and client/discovery to client-go
mechanical changes to move those packages. Had to create a `k8s.io/kubernetes/pkg/client/tests` package for tests that were blacklisted from client-go. We can rewrite these tests later and move them, but for now they'll still run at least.
@caesarxuchao @sttts
Automatic merge from submit-queue (batch tested with PRs 40130, 40419, 40416)
fixing source for heapster eventer in kubemark
Fixing the out of place heapster eventer source IP.
cc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 40215, 40340, 39523)
Retry resource quota lookup until count stabilizes
On contended servers the service account controller can slow down,
leading to the count changing during a run. Wait up to 5s for the count
to stabilize, assuming that updates come at a consistent rate, and are
not held indefinitely.
Upstream of openshift/origin#12605 (we create more secrets and flake more often)
Automatic merge from submit-queue
Fix federation component logging when e2e test case fails
When a federation e2e test case fails, federation component logs (esp. controller-manager) were very useful in debugging the failure cause. Due to recent updates in framework, the logs were not captured. This PR will fix those issues.
cc @kubernetes/sig-federation-misc @nikhiljindal @madhusudancs
Automatic merge from submit-queue (batch tested with PRs 40335, 40320, 40324, 39103, 40315)
add some resillency to the new volume-server func
**What this PR does / why we need it**: server-pod `Create` won't fail the test if the server pod already exists.
**Special notes for your reviewer**: Formerly this pr was part of #38214
Automatic merge from submit-queue
Move b.gcr.io/k8s_authenticated_test to gcr.io/k8s-authenticated-test
**What this PR does / why we need it**:
Per https://cloud.google.com/container-registry/docs/support/deprecation-notices, b.gcr.io access will be deprecated soon.
I've already mirrored the repo to the location specified in this PR.
Automatic merge from submit-queue (batch tested with PRs 39260, 40216, 40213, 40325, 40333)
Adding framework to allow multiple upgrade tests
**What this PR does / why we need it**: This adds a framework for multiple tests to run during an upgrade. This also moves the existing services test to that framework.
On contended servers the service account controller can slow down,
leading to the count changing during a run. Wait up to 5s for the count
to stabilize, assuming that updates come at a consistent rate, and are
not held indefinitely.
Automatic merge from submit-queue
Move remaining *Options to metav1
Primarily delete options, but will remove all internal references to non-metav1 options (except ListOptions).
Still working through it @sttts @deads2k
Automatic merge from submit-queue
Adding OWNERS file for federation e2e tests
Now that we have a separate `test/e2e_federation` dir for federation tests (thanks to @shashidharatd), we can have our own OWNERS file.
OWNERS file copied from https://github.com/kubernetes/kubernetes/pull/40328.
cc @kubernetes/sig-federation-misc
Automatic merge from submit-queue
Optional configmaps and secrets
Allow configmaps and secrets for environment variables and volume sources to be optional
Implements approved proposal c9f881b7bb
Release note:
```release-note
Volumes and environment variables populated from ConfigMap and Secret objects can now tolerate the named source object or specific keys being missing, by adding `optional: true` to the volume or environment variable source specifications.
```
Automatic merge from submit-queue (batch tested with PRs 36693, 40154, 40170, 39033)
Refactored kubemark into cloud-provider independent code and GCE specific code
Ref issue #38967
The following are the major changes as part of this refactoring:
- Moved cluster-kubemark/config-default.sh -> cluster-kubemark/gce/config-default.sh (as the config is gce-specific)
- Changed kubernetes/cluster/kubemark/util.sh to source the right scripts based on the cloud-provider
- Added test/kubemark/skeleton/util.sh which defines a well-commented interface that any cloud-provider should implement to run kubemark. (We have this interface defined only for gce currently)
This includes functions like creating the master machine instance along with its resources, executing a given command on the master (like ssh), scp, deleting the master instance and its resources.
All these functions have to be overrided by each cloud provider inside the file /test/kubemark/$CLOUD_PROVIDER/util.sh
- Added the file test/kubemark/cloud-provider-config.sh which sets the variable CLOUD_PROVIDER that is later picked up by various scripts (start-kubemark.sh, stop-kubemark.sh, run-e2e-tests.sh)
- Removed test/kubemark/common.sh and moved whatever provider-independent code it had into start-kubemark.sh (the only place where the scipt is called) and moved the little gce-specific code
into test/kubemark/gce/util.sh.
- Finally, removed useless code and restructured start-kubemark.sh and stop-kubemark.sh scripts.
@kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 39446, 40023, 36853)
Create environment variables from secrets
Allow environment variables to be populated from entire secrets.
**Release note**:
```release-note
Populate environment variables from a secrets.
```
Automatic merge from submit-queue
promote certificates api to beta
Mostly posting to see what breaks but also this API is ready to be promoted.
```release-note
Promote certificates.k8s.io to beta and enable it by default. Users using the alpha certificates API should delete v1alpha1 CSRs from the API before upgrading and recreate them as v1beta1 CSR after upgrading.
```
@kubernetes/api-approvers @jcbsmpsn @pipejakob
Automatic merge from submit-queue
Update root approvers files
Replaces #40040
Update top level OWNERS files mostly to set assignees to approvers. Also remove @bgrant0607 from everywhere but the very top level OWNERS file.
Automatic merge from submit-queue
move pkg/fields to apimachinery
Purely mechanical move of `pkg/fields` to apimachinery.
Discussed with @lavalamp on slack. Moving this an `labels` to apimachinery.
@liggitt any concerns? I think the idea of field selection should become generic and this ends up shared between client and server, so this is a more logical location.
Automatic merge from submit-queue
make client-go more authoritative
Builds on https://github.com/kubernetes/kubernetes/pull/40103
This moves a few more support package to client-go for origination.
1. restclient/watch - nodep
1. util/flowcontrol - used interface
1. util/integer, util/clock - used in controllers and in support of util/flowcontrol
Automatic merge from submit-queue (batch tested with PRs 40081, 39951)
Passing correct master address to kubemark NPD & authenticating+authorizing it with apiserver
Fixes#39245
Fixes https://github.com/kubernetes/node-problem-detector/issues/50
Added RBAC for npd and fixed issue with the npd falling back to inClusterConfig.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue
Move Federation e2e test code to independent package
**What this PR does / why we need it**: Move federation e2e test code to an independent package called e2e_federation. This will help in multiple ways.
- easy to move the federation related code to a separate repo from core.
- one step closer to register/unregister clusters to federation only once during e2e instead of every test case. we need to introduce singleton to register cluster during framework creation which will be handled in subsequent PR.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Optimize federation e2e suite which takes long time to execute currently.
**Special notes for your reviewer**: I have tried to segregate into multiple commits. request to review commit by commit. also mostly the change is about moving the functions to a new location/package.
**Release note**:
```release-note
```
@madhusudancs @nikhiljindal @colhom
Automatic merge from submit-queue
turn on dynamic config for flaky tests
Added dynamic config to inode eviction node e2e tests in #39546, but did not enable it for flaky tests. This PR enables this feature for the flaky test suite
Automatic merge from submit-queue (batch tested with PRs 39898, 39904)
[scheduler] interface for config
**What this PR fixes**
This PR converts the Scheduler configuration factory into an interface, so that
- the scheduler_perf and scheduler integration tests dont rely on the struct for their implementation
- the exported functionality of the factory (i.e. what it needs to provide to create a scheduler configuration) is completely explicit, rather then completely coupled to a struct.
- makes some parts of the factory immutable, again to minimize possible coupling.
This makes it easier to make a custom factory in instances where we might specifically want to import scheduler logic without actually reusing the entire scheduler codebase.
Automatic merge from submit-queue
Build release tars using bazel
**What this PR does / why we need it**: builds equivalents of the various kubernetes release tarballs, solely using bazel.
For example, you can now do
```console
$ make bazel-release
$ hack/e2e.go -v -up -test -down
```
**Special notes for your reviewer**: this is currently dependent on 3b29803eb5, which I have yet to turn into a pull request, since I'm still trying to figure out if this is the best approach.
Basically, the issue comes up with the way we generate the various server docker image tarfiles and load them on nodes:
* we `md5sum` the binary being encapsulated (e.g. kube-proxy) and save that to `$binary.docker_tag` in the server tarball
* we then build the docker image and tag using that md5sum (e.g. `gcr.io/google_containers/kube-proxy:$MD5SUM`)
* we `docker save` this image, which embeds the full tag in the `$binary.tar` file.
* on cluster startup, we `docker load` these tarballs, which are loaded with the tag that we'd created at build time. the nodes then use the `$binary.docker_tag` file to find the right image.
With the current bazel `docker_build` rule, the tag isn't saved in the docker image tar, so the node is unable to find the image after `docker load`ing it.
My changes to the rule save the tag in the docker image tar, though I don't know if there are subtle issues with it. (Maybe we want to only tag when `--stamp` is given?)
Also, the docker images produced by bazel have the timestamp set to the unix epoch, which is not great for debugging. Might be another thing to change with a `--stamp`.
Long story short, we probably need to follow up with bazel folks on the best way to solve this problem.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Curating Owners: test/e2e_node
cc @random-liu @timstclair @vishh
In an effort to expand the existing pool of reviewers and establish a
two-tiered review process (first someone lgtms and then someone
experienced in the project approves), we are adding new reviewers to
existing owners files.
If You Care About the Process:
------------------------------
We did this by algorithmically figuring out who’s contributed code to
the project and in what directories. Unfortunately, that doesn’t work
well: people that have made mechanical code changes (e.g change the
copyright header across all directories) end up as reviewers in lots of
places.
Instead of using pure commit data, we generated an excessively large
list of reviewers and pruned based on all time commit data, recent
commit data and review data (number of PRs commented on).
At this point we have a decent list of reviewers, but it needs one last
pass for fine tuning.
Also, see https://github.com/kubernetes/contrib/issues/1389.
TLDR:
-----
As an owner of a sig/directory and a leader of the project, here’s what
we need from you:
1. Use PR https://github.com/kubernetes/kubernetes/pull/35715 as an example.
2. The pull-request is made editable, please edit the `OWNERS` file to
remove the names of people that shouldn't be reviewing code in the
future in the **reviewers** section. You probably do NOT need to modify
the **approvers** section. Names asre sorted by relevance, using some
secret statistics.
3. Notify me if you want some OWNERS file to be removed. Being an
approver or reviewer of a parent directory makes you a reviewer/approver
of the subdirectories too, so not all OWNERS files may be necessary.
4. Please use ALIAS if you want to use the same list of people over and
over again (don't hesitate to ask me for help, or use the pull-request
above as an example)
Automatic merge from submit-queue (batch tested with PRs 39625, 39842)
Add RBAC v1beta1
Add `rbac.authorization.k8s.io/v1beta1`. This scrubs `v1alpha1` to remove cruft, then add `v1beta1`. We'll update other bits of infrastructure to code to `v1beta1` as a separate step.
```release-note
The `attributeRestrictions` field has been removed from the PolicyRule type in the rbac.authorization.k8s.io/v1alpha1 API. The field was not used by the RBAC authorizer.
```
@kubernetes/sig-auth-misc @liggitt @erictune
Automatic merge from submit-queue
Enable lazy initialization of ext3/ext4 filesystems
**What this PR does / why we need it**: It enables lazy inode table and journal initialization in ext3 and ext4.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#30752, fixes#30240
**Release note**:
```release-note
Enable lazy inode table and journal initialization for ext3 and ext4
```
**Special notes for your reviewer**:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.
These extended options come from a script that was historically located at */usr/share/google/safe_format_and_mount* and later ported to GO so this dependency to the script could be removed. After some search, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount
Checking the history of this script, I found the commit [Disable lazy init of inode table and journal.](4d7346f7f5). This one introduces the extended flags with this description:
```
Now that discard with guaranteed zeroing is supported by PD,
initializing them is really fast and prevents perf from being affected
when the filesystem is first mounted.
```
The problem is, that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so maybe it's different for SSDs on these cloud providers. The result is that this performance optimization dramatically increases the time needed to format a disk in such cases.
When mkfs.ext4 is told to not lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance on this highly depends on free memory and also uses up all this free memory for write caching, reducing performance of everything else in the system.
As of https://github.com/kubernetes/kubernetes/issues/30752, there is also something inside kubelet that somehow degrades performance of all this. It's however not exactly known what it is but I'd assume it has something to do with cgroups throttling IO or memory.
I checked the kernel code for lazy inode table initialization. The nice thing is, that the kernel also does the guaranteed zeroing on discard check. If it is guaranteed, the kernel uses discard for the lazy initialization, which should finish in a just few seconds. If it is not guaranteed, it falls back to using *bio*s, which does not require the use of the write cache. The result is, that free memory is not required and not touched, thus performance is maxed and the system does not suffer.
As the original reason for disabling lazy init was a performance optimization and the kernel already does this optimization by default (and in a much better way), I'd suggest to completely remove these flags and rely on the kernel to do it in the best way.
Automatic merge from submit-queue
Start making `k8s.io/client-go` authoritative for generic client packages
Right now, client-go has copies of various generic client packages which produces golang type incompatibilities when you want to switch between different kinds of clients. In many cases, there's no reason to have two sets of packages. This pull eliminates the copy for `pkg/client/transport` and makes `client-go` the authoritative copy.
I recommend going by commits, the first just synchronizes the client-go code again so that I could test the copy script to make sure it correctly preserves the original package.
@kubernetes/sig-api-machinery-misc @lavalamp @sttts
Automatic merge from submit-queue (batch tested with PRs 34763, 38706, 39939, 40020)
Use Statefulset instead in e2e and controller
Quick fix ref: #35534
We should finish the issue to meet v1.6 milestone.
Automatic merge from submit-queue
Move PatchType to apimachinery/pkg/types
Fixes https://github.com/kubernetes/kubernetes/issues/39970
`PatchType` is shared by the client and server, they have to agree, and its critical for our API to function.
@smarterclayton @kubernetes/sig-api-machinery-misc
Automatic merge from submit-queue (batch tested with PRs 39911, 40002, 39969, 40012, 40009)
Fix RBAC role for kube-proxy in Kubemark
Ref #39959
This should ensure that kube-proxy (in Kubemark) has the required role and RBAC binding.
@deads2k PTAL
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 38592, 39949, 39946, 39882)
move api/errors to apimachinery
`pkg/api/errors` is a set of helpers around `meta/v1.Status` that help to create and interpret various apiserver errors. Things like `.NewNotFound` and `IsNotFound` pairings. This pull moves it into apimachinery for use by the clients and servers.
@smarterclayton @lavalamp First commit is the move plus minor fitting. Second commit is straight replace and generation.
Automatic merge from submit-queue (batch tested with PRs 38592, 39949, 39946, 39882)
Add optional per-request context to restclient
**What this PR does / why we need it**: It adds per-request contexts to restclient's API, and uses them to add timeouts to all proxy calls in the e2e tests. An entire e2e shouldn't hang for hours on a single API call.
**Which issue this PR fixes**: #38305
**Special notes for your reviewer**:
This adds a feature to the low-level rest client request feature that is entirely optional. It doesn't affect any requests that don't use it. The api of the generated clients does not change, and they currently don't take advantage of this.
I intend to patch this in to 1.5 as a mostly test only change since it's not going to affect any controller, generated client, or user of the generated client.
cc @kubernetes/sig-api-machinery
cc @saad-ali
Automatic merge from submit-queue
Fix examples e2e permission check
Ref #39382
Follow-up from #39896
Permission check should be done within the e2e test namespace, not cluster-wide
Also improved RBAC audit logging to make the scope of the permission check clearer
Automatic merge from submit-queue
Remove sleep from DynamicProvisioner test.
The comment says that the sleep is there because of 10 minute PV controller
sync. The controller sync is now 15 seconds and it should be quick enough
to hide this in subsequent `WaitForPersistentVolumeDeleted(.. , 20*time.Minute)`
Automatic merge from submit-queue (batch tested with PRs 38427, 39896, 39889, 39871, 39895)
Grant permissions to e2e examples test service account
ref #39382
Automatic merge from submit-queue
Updated unit tests
@janetkuo updated the flaky unit test to have the same structure with regard to uncasting as the rest of the tests. ptal
Automatic merge from submit-queue (batch tested with PRs 39807, 37505, 39844, 39525, 39109)
Made cache.Controller to be interface.
**What this PR does / why we need it**:
#37504
Automatic merge from submit-queue
run staging client-go update
Chasing to see what real problems we have in staging-client-go.
@sttts you get similar results?
Automatic merge from submit-queue
replace global registry in apimachinery with global registry in k8s.io/kubernetes
We'd like to remove all globals, but our immediate problem is that a shared registry between k8s.io/kubernetes and k8s.io/client-go doesn't work. Since client-go makes a copy, we can actually keep a global registry with other globals in pkg/api for now.
@kubernetes/sig-api-machinery-misc @lavalamp @smarterclayton @sttts
Automatic merge from submit-queue
Update images that use ubuntu-slim base image to :0.6
**What this PR does / why we need it**: `ubuntu-slim:0.4` is somewhat old, being based on Ubuntu 16.04, whereas `ubuntu-slim:0.6` is based on Ubuntu 16.04.1.
**Special notes for your reviewer**: I haven't pushed any of these images yet, so I expect all of the e2e builds to fail. If we're happy with the changes, I can push the images and then re-trigger tests.
**Release note**:
```release-note
NONE
```
cc @aledbf as FYI
Automatic merge from submit-queue (batch tested with PRs 39803, 39698, 39537, 39478)
[scheduling] Moved pod affinity and anti-affinity from annotations to api fields #25319
Converted pod affinity and anti-affinity from annotations to api fields
Related: #25319
Related: #34508
**Release note**:
```Pod affinity and anti-affinity has moved from annotations to api fields in the pod spec. Pod affinity or anti-affinity that is defined in the annotations will be ignored.```
Automatic merge from submit-queue
Refactor registry etcd to storage
Fixes#17546
Simple shuffle on naming so any sane new person entering the code base can understand what the actual etcd dependencies are.
Automatic merge from submit-queue (batch tested with PRs 39768, 39463)
Check if path exists before performing unmount
This is part 3 of an effort to check if path exists before performing an unmount operation.
[Part 1](https://github.com/kubernetes/kubernetes/pull/38547) and [part 2](https://github.com/kubernetes/kubernetes/pull/39311) involved auditing the different volume plugins and refactoring their `TearDownAt()s` to use the common util function/or create one if absent.
The ideal way to do this change would involve refactoring of the `TearDownAt()s` of these plugins and make a common util function that checks path. (The plugins involved in this PR use someway of unmounting a bind mount and unmounting a global path, there is also refactoring needed to consolidate disk_manager of fc, rbd and iscsi). A non-goal part of this effort can also involve refactoring all the `SetupAt()s`
In the interest of time and considering other higher priority issues that I am caught up with, I am unable to give the time the refactoring needs. Hence I've made the minimum change that would give the desired output.
I am tracking the work pending in this issue: https://github.com/kubernetes/kubernetes/issues/39251
```release-note
NONE
```
Automatic merge from submit-queue
Add [Volume] tag to all the volume-related E2E tests.
**What this PR does / why we need it**:
Tags all the volume/storage related e2e tests to make it easier to run a volume test suite.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
fixes#35542
**Special notes for your reviewer**:
Please let me know if there are tests that should/should not be included.
**Release note**:
NONE
```release-note
```
Automatic merge from submit-queue
[CRI] Don't include user data in CRI streaming redirect URLs
Fixes: https://github.com/kubernetes/kubernetes/issues/36187
Avoid userdata in the redirect URLs by caching the {Exec,Attach,PortForward}Requests with a unique token. When the redirect URL is created, the token is substituted for the request params. When the streaming server receives the token request, the token is used to fetch the actual request parameters out of the cache.
For additional security, the token is generated using the secure random function, is single use (i.e. the first request with the token consumes it), and has a short expiration time.
/cc @kubernetes/sig-node
Automatic merge from submit-queue (batch tested with PRs 39475, 38666, 39327, 38396, 39613)
e2e tests: new portforwardertester with another three tests for case …
PR include:
- add new e2e test cases for BIND_ADDRESS='0.0.0.0'
- add to portforwardertester.go os.Getenv("BIND_ADDRESS") and if not set, it should be localhost for backward compability with existing tests
- for existing tests pass explicity BIND_ADDRESS='localhost'
- rename existing tests
It was mention in the issue: #32128
cc @mzylowski @pskrzyns
Automatic merge from submit-queue (batch tested with PRs 39495, 39547)
Tag persistent volume PersistentVolume E2E [Volume][Serial][Flaky]
**What this PR does / why we need it**:
When run parallel with other tests that use PV(C)s, cross-test binding causes flakes. Add `[Serial]` tag.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: f
Partly addresses #39119
**Special notes for your reviewer**:
cc @saad-ali @jsafrane @jeffvance
Automatic merge from submit-queue (batch tested with PRs 39684, 39577, 38989, 39534, 39702)
Set PodStatus QOSClass field
This PR continues the work for https://github.com/kubernetes/kubernetes/pull/37968
It converts all local usage of the `qos` package class types to the new API level types (first commit) and sets the pod status QOSClass field in the at pod creation time on the API server in `PrepareForCreate` and in the kubelet in the pod status update path (second commit). This way the pod QOS class is set even if the pod isn't scheduled yet.
Fixes#33255
@ConnorDoyle @derekwaynecarr @vishh
Automatic merge from submit-queue (batch tested with PRs 39694, 39383, 39651, 39691, 39497)
Fix evictions test
**What this PR does / why we need it**:
Fixes bugs in evictions test. Make vet happy.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#39412
Ref: #39452
cc: @calebamiles
Automatic merge from submit-queue (batch tested with PRs 39694, 39383, 39651, 39691, 39497)
Allow rolebinding/clusterrolebinding with explicit bind permission check
Fixes https://github.com/kubernetes/kubernetes/issues/39176
Fixes https://github.com/kubernetes/kubernetes/issues/39258
Allows creating/updating a rolebinding/clusterrolebinding if the user has explicitly been granted permission to perform the "bind" verb against the referenced role/clusterrole (previously, they could only bind if they already had all the permissions in the referenced role via an RBAC role themselves)
```release-note
To create or update an RBAC RoleBinding or ClusterRoleBinding object, a user must:
1. Be authorized to make the create or update API request
2. Be allowed to bind the referenced role, either by already having all of the permissions contained in the referenced role, or by having the "bind" permission on the referenced role.
```
Automatic merge from submit-queue (batch tested with PRs 38212, 38792, 39641, 36390, 39005)
Updating federated service controller to support cascading deletion
Ref https://github.com/kubernetes/kubernetes/issues/33612
Service controller is special than other federation controllers because it does not use federatedinformer and updater to sync services (it was written before we had those frameworks).
Updating service controller code to instantiate these frameworks and then use deletion helper to perform cascading deletion.
Note that, I havent changed the queuing logic in this PR so we still dont use federated informer to manage the queue. Will do that in the next PR.
cc @kubernetes/sig-federation-misc @mwielgus @quinton-hoole
```release-note
federation: Adding support for DeleteOptions.OrphanDependents for federated services. Setting it to false while deleting a federated service also deletes the corresponding services from all registered clusters.
```
Automatic merge from submit-queue (batch tested with PRs 39695, 37054, 39627, 39546, 39615)
Add configs that run more advanced density and load tests
Wojtek is on vacation this week - @timothysc can you please take a look? It's rather terrible, but I don't have a better idea on how to make parametric tests.
cc @wojtek-t
Automatic merge from submit-queue (batch tested with PRs 39695, 37054, 39627, 39546, 39615)
Use Dynamic Config in e2e_node inode eviction test
Alternative solution to #39249. Similar to solution proposed by @vishh in #36828.
@Random-Liu @mtaufen
Automatic merge from submit-queue (batch tested with PRs 39648, 38167, 39591, 39415, 39612)
Add verbs to thirdparty resources in discovery
The namespace controller ignores thirdparty resources right now because verbs are not set. This PR sets a static list of verbs.
Moreover, integration tests are added for the discovery info of thirdparty resources.
/cc @zhouhaibing089
Automatic merge from submit-queue
Deleting federation-util-14.go that is not being used anywhere
We have the same code in federation-util.go
cc @mwielgus @madhusudancs
Automatic merge from submit-queue (batch tested with PRs 39628, 39551, 38746, 38352, 39607)
Increasing times on reconciling volumes fixing impact to AWS.
#**What this PR does / why we need it**:
We are currently blocked by API timeouts with PV volumes. See https://github.com/kubernetes/kubernetes/issues/39526. This is a workaround, not a fix.
**Special notes for your reviewer**:
A second PR will be dropped with CLI cobra options in it, but we are starting with increasing the reconciliation periods. I am dropping this without major testing and will test on our AWS account. Will be marked WIP until I run smoke tests.
**Release note**:
```release-note
Provide kubernetes-controller-manager flags to control volume attach/detach reconciler sync. The duration of the syncs can be controlled, and the syncs can be shut off as well.
```
Added [Volume] tag per issue #35542; added [Flaky] to GCE tests until confirmed fixed. Added [Serial] to NFS to address possible cross test contamination.
Automatic merge from submit-queue (batch tested with PRs 39394, 38270, 39473, 39516, 36243)
Modified run-gcloud-compute-with-retries and used it wherever possible in kubemark
This PR fixes#39335
Simple changes fixing flaky issues within kubemark.
@kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 39394, 38270, 39473, 39516, 36243)
Fix wrong skipf parameter
**How to reproduce**
When run e2e test, it reports `%!!(MISSING)d(MISSING)`:
```
STEP: Checking for multi-zone cluster. Zone count = 1
Dec 6 14:16:43.272: INFO: Zone count is %!!(MISSING)d(MISSING), only run for multi-zone clusters, skipping test
[AfterEach] [k8s.io] Multi-AZ Clusters
```
We need to pass a string parameter to `SkipUnlessAtLeast`
The comment says that the sleep is there because of 10 minute PV controller
sync. The controller sync is now 15 seconds and it should be quick enough
to hide this in subsequent WaitForPersistentVolumeDeleted(.. , 20*time.Minute)
Automatic merge from submit-queue
Updated kubemark with RBAC for controllers, proxy and kubelet
Fixes issue #39244
@kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 39493, 39496)
Use privileged containers for host path e2e tests
Test containers need to run as spc_t in order to interact with the host
filesystem under /tmp, as the tests for HostPath are doing. Docker will
transition the container into this domain when running the container as
privileged.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Currently, this test fails with AVC denials like:
```
time->Thu Jan 5 10:17:51 2017
type=SYSCALL msg=audit(1483629471.846:6623): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c820010120 a2=80241 a3=1a4 items=0 ppid=4112 pid=4130 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mt" exe="/mt" subj=system_u:system_r:svirt_lxc_net_t:s0:c123,c328 key=(null)
type=AVC msg=audit(1483629471.846:6623): avc: denied { write } for pid=4130 comm="mt" name="sub-path" dev="xvda2" ino=118491348 scontext=system_u:system_r:svirt_lxc_net_t:s0:c123,c328 tcontext=system_u:object_r:container_runtime_tmp_t:s0 tclass=dir
```
```release-note
NONE
```
/cc @ncdc @pmorie
Test containers need to run as spc_t in order to interact with the host
filesystem under /tmp, as the tests for HostPath are doing. Docker will
transition the container into this domain when running the container as
privileged.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Automatic merge from submit-queue (batch tested with PRs 36229, 39450)
Bump etcd to 3.0.14 and switch to v3 API in etcd.
Ref #20504
**Release note**:
```release-note
Switch default etcd version to 3.0.14.
Switch default storage backend flag in apiserver to `etcd3` mode.
```
Automatic merge from submit-queue (batch tested with PRs 39408, 38981)
Remove RBAC UserAll
* Removes special handling of User * subjects in rolebinding matching evaluation
* Converts v1alpha1 rolebindings to `User *` subjects to `Group system:authenticated` subjects for backwards compatibility
```release-note
RBAC's special handling of the User subject named "*" in RoleBinding and ClusterRoleBinding objects is being deprecated and will be removed in v1beta1. Existing v1alpha1 role bindings to User "*" will be converted to the group "system:authenticated". To match unauthenticated requests, RBAC role bindings must explicitly bind to the group "system:unauthenticated".
```
Automatic merge from submit-queue
Ensure invalid token returns 401 error, not 403
fixes#39267
If a user attempts to use a bearer token, and the token is rejected, the authenticator should return an error. This distinguishes requests that did not provide a bearer token (and are unauthenticated without error) from ones that attempted to, and failed.
Automatic merge from submit-queue
Remove jobs that do not exist from active list of CronJob
**What this PR does / why we need it**: This PR modifies the controller for CronJob to remove from the active job list any job that does not exist anymore, to avoid staying blocked in active state forever. See #37957.
**Which issue this PR fixes**: fixes#37957
**Special notes for your reviewer**:
**Release note**:
```
```
Automatic merge from submit-queue (batch tested with PRs 38433, 36245)
Allow pods to define multiple environment variables from a whole ConfigMap
Allow environment variables to be populated from ConfigMaps
- ConfigMaps represent an entire set of EnvVars
- EnvVars can override ConfigMaps
fixes#26299
Automatic merge from submit-queue (batch tested with PRs 39280, 37350, 39389, 39390, 39313)
Moves e2e service util functions into service_util.go and cleans up
Basically moves codes into a central place for service util functions.
Some other codes are touched mostly only due to this migration. Also put a bunch of network reachability utils functions into network_utils.go. They seem somehow redundant, may consider combine they later.
@bowei @freehan
Automatic merge from submit-queue (batch tested with PRs 39001, 39104, 35978, 39361, 39273)
refactored admission to avoid internal client references
Refactored admission to avoid internal client references. This required switching to plugin initializers for them. And that required some rewiring of the plugin initializers.
Technically I can decouple from the other two commits, but I'm optimistic that those will go through easy. This is slightly move invasive, but I'd like to shoot for pre-christmas to avoid new admission plugins coming through and breaking bits.
@sttts @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 39092, 39126, 37380, 37093, 39237)
Endpoints with TolerateUnready annotation, should list Pods in state terminating
**What this PR does / why we need it**:
We are using preStop lifecycle hooks to gracefully remove a node from a cluster. This hook is potentially long running and after the preStop hook is fired, the DNS resolution of the soon to be stopped Pod is failing, which causes a failure there.
**Special notes for your reviewer**:
Would be great to backport that to 1.4, 1.3
**Release note**:
```release-note
Endpoints, that tolerate unready Pods, are now listing Pods in state Terminating as well
```
@bprashanth
Automatic merge from submit-queue (batch tested with PRs 39150, 38615)
Add work queues to PV controller
PV controller should not use Controller.Requeue, as as it is not available in
shared informers. We need to implement our own work queues instead, where we
can enqueue volumes/claims as we want.
Automatic merge from submit-queue
Add Persistent Volume E2E in the context of a disrupted kubelet
This PR adds a test suite for persistent volumes affected by a disrupted kubelet. Two cases are presented:
1. A volume mounted via PVC remains accessible after a kubelet restart.
2. When a pod is deleted while the kubelet is down, the mounted volume is unmounted successfully.
PV controller should not use Controller.Requeue, as as it is not available in
shared informers. We need to implement our own work queues instead where we
can enqueue volumes/claims as we want.
Automatic merge from submit-queue
Remove all MAINTAINER statements in the codebase as they are deprecated
**What this PR does / why we need it**:
ref: https://github.com/docker/docker/pull/25466
**Release note**:
```release-note
Remove all MAINTAINER statements in Dockerfiles in the codebase as they are deprecated by docker
```
@ixdy @thockin (who else should be notified?)
Automatic merge from submit-queue
Remove system:anonymous check from kubectl test
This verbiage doesn't appear when the cluster is `AlwaysAllow` (and just makes the check more brittle).
Follow-on to #39263, this is the last (consistent) failure on [kops-aws](https://k8s-testgrid.appspot.com/google-aws#kops-aws&sort-by-failures=)
Automatic merge from submit-queue
Avoid unnecessary memory allocations
Low-hanging fruits in saving memory allocations. During our 5000-node kubemark runs I've see this:
ControllerManager:
- 40.17% k8s.io/kubernetes/pkg/util/system.IsMasterNode
- 19.04% k8s.io/kubernetes/pkg/controller.(*PodControllerRefManager).Classify
Scheduler:
- 42.74% k8s.io/kubernetes/plugin/pkg/scheduler/algrorithm/predicates.(*MaxPDVolumeCountChecker).filterVolumes
This PR is eliminating all of those.
Automatic merge from submit-queue
CreateNodeSelectorPods should respect parameter
Fix (1): `CreateNodeSelectorPods` should respect parameter `id`.
The existing e2e does not break because it happened use "node-selector" as id, which is the same as the hard coded value.
Fix (2): The current `CreateNodeSelectorPods` does not use `nodeSelector` parameter, it hard coded a label instead.
The reason current e2e does not influenced because we happened use the same label: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/cluster_size_autoscaling.go#L177
Found these bugs during testing #36238
Automatic merge from submit-queue
Begin paths for internationalization in kubectl
This is just the first step, purposely simple so we can get the interface correct.
@kubernetes/sig-cli @deads2k
Automatic merge from submit-queue
Support loading UTF16 files if a byte-order-mark is present
Add support in kubectl for loading UTF16 encoded files if they have a correct BOM (Byte-Order-Mark https://en.wikipedia.org/wiki/Byte_order_mark) at the beginning
of the file. Falls back on UTF8 encoding, if no understandable BOM is present.
Fixes part of https://github.com/kubernetes/kubernetes/issues/39007
@fabianofranz @deads2k @kubernetes/sig-cli-misc
Automatic merge from submit-queue (batch tested with PRs 39059, 39175, 35676, 38655)
ReplicaSet has onwer ref of the Deployment that created it
**What this PR does / why we need it**:
This enabled garbage collection for ReplicaSets and ensures they are owned by their respective Deployment objects.
fixes https://github.com/kubernetes/kubernetes/issues/33845
This is an initial PR to get feedback. Will update this quickly with unit tests if this seems like in the right direction
Automatic merge from submit-queue
In-cluster configs must take flag overrides into account
**What this PR does / why we need it**: Some flags must override in-cluster configs if provided to `kubectl` inside a cluster.
**Which issue this PR fixes**: Fixes https://github.com/kubernetes/kubernetes/issues/38834
**Release note**:
```release-note
Fixed a bug where the --server, --token, and --certificate-authority flags were not overriding the related in-cluster configs when provided in a `kubectl` call inside a cluster.
```
Automatic merge from submit-queue
remove unneeded authenticator dependencies from genericapiserver
Refactors the authenticator options to remove unneeded dependencies.
@sttts
Automatic merge from submit-queue (batch tested with PRs 39146, 39094)
cleanup last e2e authorization failures
Builds on https://github.com/kubernetes/kubernetes/pull/39080. This adds rbac role bindings during e2e tests for test that use SA permissions to loopback to the API server.
Assigned to me until its ready.
Automatic merge from submit-queue
Node E2E: Set user with `--ssh-user` flag when running remote node e2e.
This PR unblocks https://github.com/kubernetes/test-infra/issues/1348.
In our test environment, we must login test instance as user `jenkins` because of the service account. Node e2e is always using the default user on the host, which works fine till now, because it is always run as `jenkins` in our test environment.
However, now we moved the test runner into a docker container, inside the container user is `root` by default, which will cause error:
```
Permission denied (publickey)
```
This PR added a flag `--ssh-user` to explicitly specify the user used to ssh into test instance. The dockerized test runner can set user to `jenkins` with this flag.
@krzyzacy @ixdy
Automatic merge from submit-queue
register batch/jobs to federation-apiserver
register batch/jobs api objects to federation-apiserver
**Release note**:
```release-note
Federation: Add `batch/jobs` API objects to federation-apiserver
```
@quinton-hoole @nikhiljindal @deepak-vij
#34261
Automatic merge from submit-queue
Added 'hollow'-node-problem-detector to hollow-nodes in kubemark
Added node-problem-detector container in kubemark hollow-nodes, which takes in a 'hollow' (having an empty list of rules and conditions) kernel monitor config.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 36751, 38968)
Convert * users/groups to system:authenticated group in ABAC
Part of enabling anonymous auth by default in 1.6 means protecting earlier policies that did not intend to grant access to anonymous users.
This modifies ABAC policies that match `user` or `group` `*` to only match authenticated users.
Docs PR to update examples to use `system:authenticated` or `system:unauthenticated` groups explicitly: https://github.com/kubernetes/kubernetes.github.io/pull/1992
```release-note
ABAC policies using "user":"*" or "group":"*" to match all users or groups will only match authenticated requests. To match unauthenticated requests, ABAC policies must explicitly specify "group":"system:unauthenticated"
```
Automatic merge from submit-queue
Moved kubemark master from Debian to GCI
This PR fixes issue #37484
Kubemark master now runs on GCI instead of Debian, taking it one step closer to a real cluster master.
Primary changes:
1. changing master VM image/OS in kubemark's config-default.sh to debian
2. moving kubelet to systemd from supervisord
3. changing directory for cert/key/csv files from /srv/kubernetes to /etc/srv/kubernetes
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue
Add test to detach a pd whose node was deleted
**What this PR does / why we need it**:
A test for the following issue :
If a node with a GCE PD attached is deleted (before the volume is detached), subsequent attempts by the attach/detach controller to detach it should not fail.
**Bonus** :Added additional code to ensure that the pd can still be attached to a different node.
Edit : Removed it as it was making the test much slower.
https://github.com/kubernetes/kubernetes/issues/29358
Automatic merge from submit-queue (batch tested with PRs 38426, 38917, 38891, 38935)
Support different image during GCE node upgrade
**What this PR does / why we need it**: It lets GCE upgrade tests upgrade to a GCI node image.
**Which issue this PR fixes**: fixes#37855
Automatic merge from submit-queue (batch tested with PRs 38942, 38958)
Added MULTIZONE flag to e2e remove master script.
Added MULTIZONE flag to e2e remove master script. The script is used by HA tests which set-up multizone cluster.
Automatic merge from submit-queue (batch tested with PRs 34353, 33837, 38878)
Add e2e test for configmap volume
There are two patches:
- refactor e2e volume tests to allow multiple volumes mounted into single pod
- add a test for ConfigMap volume mounted twice to test #28502
Automatic merge from submit-queue (batch tested with PRs 34353, 33837, 38878)
Gce persistentvolume testing
Add E2E PersistentVolume test for a GCE environment. Tests that deleting a PV or PVC before the referencing pod does not fail on unmount and detach during pod deletion.
cc @jeffvance
Automatic merge from submit-queue (batch tested with PRs 37468, 36546, 38713, 38902, 38614)
Remove extensions/v1beta1 Job
Fixes https://github.com/kubernetes/kubernetes/issues/32763. This endpoint was deprecated in 1.5 and was planned to be removed in 1.6.
**Release note**:
```release-note
Remove extensions/v1beta1 Jobs resource, and job/v1beta1 generator.
```
Automatic merge from submit-queue (batch tested with PRs 37468, 36546, 38713, 38902, 38614)
Adds e2e firewall tests for LoadBalancer service, ingress, and e2e cluster
Fixes#25488 and fixes#31827.
This PR adds e2e firewall test for LoadBalancer type service, ingress and e2e cluster.
Test details for LoadBalancer type service as below:
- Verifies corresponding firewall rule has correct `sourceRanges`, `ports and protocols` and `target tags`.
- Verifies requests can reach all expected instances.
- Verifies requests can not reach instances that are not included.
Overview of the test procedure:
- Creates a LoadBalancer type service.
- Validates the corresponding firewall rule.
- Creates netexec pods as service backends.
- Sends requests from outside of the cluster and examine hitting all instances in range.
- Removes tags from one of the instances in order to get it out of firewall rule's range.
- Sends requests from outside of the cluster and examine not hitting this instance.
- Recovers tags for this instances and verifies its traffic is back.
@bprashanth @bowei @thockin
For LoadBalancer type service:
- Verifies corresponding firewall rule has correct sourceRanges, ports
& protocols, target tags.
- Verifies requests can reach all expected instances.
- Verifies requests can not reach instances that are not included.
For Ingress resrouce:
- Verifies the ingress firewall rule has correct sourceRanges, target
tags and tcp ports.
For general e2e cluster:
- Verifies all required firewall rules has correct sourceRange, ports
& protocols, source tags and target tags.
- Verifies well know ports on master and nodes are not
exposed externally
Automatic merge from submit-queue
Don't check nodeport for nginx ingress
Services behind a standard nginx ingress don't need nodeport, so don't check that.
Extracted delete operations into functions
wait on pv/pvc bind
removed redundant verification, minor refactors
GCEPD: fixed typo
name verifyDiskAttached to verifyGCEDiskAttached
fix empty log msg
Updated test owners
removed unnecessary api calls
Check for apierr IsNotFound for pod,pv,pvc but ignore result
Disable dynamic provisioning in test PVCs
gofmt'd
Automatic merge from submit-queue
Fix Recreate for Deployments and stop using events in e2e tests
Fixes https://github.com/kubernetes/kubernetes/issues/36453 by removing events from the deployment tests. The test about events during a Rolling deployment is redundant so I just removed it (we already have another test specifically for Rolling deployments).
Closes https://github.com/kubernetes/kubernetes/issues/32567 (preferred to use pod LISTs instead of a new status API field for replica sets that would add many more writes to replica sets).
@kubernetes/deployment
Automatic merge from submit-queue (batch tested with PRs 38830, 38750)
[Federation] Stop cleaning federation namespace in e2e tests
when --clean-start=true flag is provided to e2e tests it would cleanup all the leftover namespaces except `default` and `kube-system` and because of this when we run e2e tests in federation soak test job, the federation control plane is destroyed before it runs the tests and all tests start to fail.
So adding federation-system to the list of namespace to be left intact and also changed the default federation namespace name from `federation` to `federation-system` to be consistent with the newer method of deploying federation using kubefed.
@madhusudancs @nikhiljindal
Automatic merge from submit-queue (batch tested with PRs 38830, 38750)
Remove the ReadyReplica version guard
**What this PR does / why we need it**: Removes outlived version guards.
**Which issue this PR fixes**: fixes#37310
Automatic merge from submit-queue
Node Conformance Test: Fix report prefix for node conformance test.
The node conformance CI is running now.
The only problem is that junit files overwrite each other because of the lack of junit prefix. http://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-conformance/42/artifacts/
This PR fixes this. I've verified in my environment, it works well.
@timstclair
Automatic merge from submit-queue (batch tested with PRs 38788, 38821, 38829)
Node Conformance Test: Fix node conformance test.
The test suite could build on my desktop. However it is failing on jenkins.
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-conformance/1
It turns out that `docker save $IMAGE -o $FILE` only works for docker 1.12. (My desktop is 1.12) For older version docker, we should use `docker save -o $FILE $IMAGE instead`. (Jenkins is using 1.9.1)
@timstclair Could you help me review this short PR? :)
Automatic merge from submit-queue (batch tested with PRs 38154, 38502)
Rename "release_1_5" clientset to just "clientset"
We used to keep multiple releases in the main repo. Now that [client-go](https://github.com/kubernetes/client-go) does the versioning, there is no need to keep releases in the main repo. This PR renames the "release_1_5" clientset to just "clientset", clientset development will be done in this directory.
@kubernetes/sig-api-machinery @deads2k
```release-note
The main repository does not keep multiple releases of clientsets anymore. Please find previous releases at https://github.com/kubernetes/client-go
```
Automatic merge from submit-queue
make spellcheck for test/*
**What this PR does / why we need it**: Increase code readability
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**: only a slight effort
**Release note**:
```release-note
```
Automatic merge from submit-queue
genericapiserver: unify swagger and openapi in config
- make swagger config customizable
- remove superfluous `Config.Enable*` flags for OpenAPI and Swagger.
This is necessary for downstream projects to tweak the swagger spec.
Automatic merge from submit-queue
Node Conformance: Node Conformance CI
For https://github.com/kubernetes/kubernetes/issues/37252.
The first 2 commits of this PR are from #38150 and #38152. Please review those 2 PRs first, they are both minor cleanup.
This PR:
* Add `TestSuite` interface in `test/e2e_node/remote` to separate test suite logic (packaging, deploy, run test) from VM lifecycle management logic, so that different test suites can share the same VM lifecycle management logic.
* Different test suites such as node e2e, node conformance, node soaking, cri validation etc. should implement different `TestSuite`.
* `test/e2e_node/runner/remote` will initialize and run different test suite based on the subcommand.
* Add `run-kubelet-mode` which only starts and monitors kubelet, similar with `run-services-mode`. The reason we need this:
* Unlike node e2e, node conformance test doesn't start kubelet inside the test suite (in fact, in the future node e2e shouldn't do that either), it assumes kubelet is already running before the test.
* In fact, node e2e should use similar node bootstrap script like cluster e2e, and the bootstrap script should initialize the node with all necessary node software including kubelet. However, it's not the case now.
* The easiest way for now is to reuse the kubelet start logic in the test suite. So in this PR, we added `run-kubelet-mode`, and use the test binary as a kubelet launcher to start kubelet before running the test.
* Implement node e2e `TestSuite`.
* Implement node conformance `TestSuite`. Use `docker save` and `docker load` to create and deploy conformance docker image; Start kubelet by running test binary in `run-kubelet-mode`; Run conformance test with `docker run`.
This PR will make it easy to implement continuous integration node soaking test and cri validation test (https://github.com/kubernetes/kubernetes/pull/35266).
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Inode Eviction Test is Flaky
This Pull Request:
Marks the InodeEviciton test as flaky
Increases the timeout for disk pressure because coreos has nearly 2 million inodes.
Decreases the status polling interval so we can see eviction ordering better.
@Random-Liu
Automatic merge from submit-queue
Add a package for handling version numbers (including non-"Semantic" versions)
As noted in #32401, we are using Semantic Version-parsing libraries to parse version numbers that aren't necessarily "Semantic". Although, contrary to what I'd said there, it turns out that this wasn't actually currently a problem for the iptables code, because the regexp used to extract the version number out of the "iptables --version" output only pulled out three components, so given "iptables v1.4.19.1", it would have extracted just "1.4.19". Still, it could be a problem if they later release "1.5" rather than "1.5.0", or if we eventually need to _compare_ against a 4-digit version number.
Also, as noted in #23854, we were also using two different semver libraries in different parts of the code (plus a wrapper around one of them in pkg/version).
This PR adds pkg/util/version, with code to parse and compare both semver and non-semver version strings, and then updates kubernetes to use it everywhere (including getting rid of a bunch of code duplication in kubelet by making utilversion.Version implement the kubecontainer.Version interface directly).
Ironically, this does not actually allow us to get rid of either of the vendored semver libraries, because we still have other dependencies that depend on each of them. (cadvisor uses blang/semver and etcd uses coreos/go-semver)
fixes#32401, #23854
Automatic merge from submit-queue
Add an option to run Job in Density/Load config
cc @timothysc @jeremyeder
@erictune @soltysh - I run this test and it seems to me that Job has noticeably worse performance than Deployment. I'll create an issue for this, but this PR is for easy repro.
Automatic merge from submit-queue (batch tested with PRs 38609, 38227)
On kubemark master, kubelet now runs as a supervisord process and all master components as pods
This PR fixes issue #37485
On kubemark, previously we had a custom setup that runs master components as top-level processes under supervisord, which is lighter than running kubelet and docker.
This PR makes kubelet run as a process under supervisord and the master components (apiserver, controller-manager, scheduler, etcd) as pods, making testing on kubemark mimic real clusters better.
Also, start-kubemark-master.sh now closely resembles cluster/gce/gci/configure-helper.sh, allowing easy integration in future.
cc @kubernetes/sig-scalability @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 38419, 38457, 38607)
Node E2E: Update CVM version to e2e-node-containervm-v20161208-image.
I built the new node e2e image from e2e-node-containervm-v20161208-image.
@timstclair
/cc @kubernetes/sig-node
Automatic merge from submit-queue
test: cleanup test logs for deployments
@mfojtik @janetkuo this will help with the deployment logs (should make them a bit cleaner) ptal
Automatic merge from submit-queue (batch tested with PRs 34002, 38535, 37330, 38522, 38423)
Node E2E: `make test-e2e-node` runs the same test with pr builder by default.
This PR makes `make test-e2e-node` run non-serial, non-flaky, non-slow test by default.
This will make it easier to use.
/cc @timstclair
Automatic merge from submit-queue (batch tested with PRs 37860, 38429, 38451, 36050, 38463)
[Part 2] Adding s390x cross-compilation support for gcr.io images in this repo
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**: This PR enables s390x support to kube-dns , pause, addon-manager, etcd, hyperkube, kube-discovery etc. This PR also includes the changes due to which it can be cross compiled on x86 host architecture.
**Which issue this PR fixes#34328
**Special notes for your reviewer**: In existing file "build-tools/build-image/cross/Dockerfile" the repository mentioned for installing cross build tool chains for supporting architecture does not have a tool chain for s390x hence in my PR I am changing the repository so that it will be cross compiled for s390x.
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```
Allows cross compilation of Kubernetes on x86 host for s390x also enables s390x support to kube-dns , pause, addon-manager, etcd, hyperkube, kube-discovery etc
```
Automatic merge from submit-queue (batch tested with PRs 38354, 38371)
Add GetOptions parameter to Get() calls in client library
Ref #37473
This PR is super mechanical - the non trivial commits are:
- Update client generator
- Register GetOptions in batch/v2alpha1 group
Automatic merge from submit-queue (batch tested with PRs 38278, 37770)
Refactor REST storage to use generic defaults
This removes the repetition in the REST storage builders by moving the logic to `restoptions.ApplyOptions`. `registry.StorageWithCacher`/`generic.StorageDecorator` no longer assume that they can build the `keyFunc` for arbitrary objects. `restoptions.ApplyOptions` uses the `registry.Store`'s `KeyFunc` for its call to `generic.StorageDecorator`.
```release-note
Cluster federation servers have changed the location in etcd where federated services are stored, so existing federated services must be deleted and recreated. Before upgrading, export all federated services from the federation server and delete the services. After upgrading the cluster, recreate the federated services from the exported data.
```
Automatic merge from submit-queue (batch tested with PRs 36419, 38330, 37718, 38244, 38375)
Guarantees drop packets commands succeed in reboot test
Fixes the main case in #33405 and #36230.
Previous attempted fix in #38057.
During the reboot test, the iptables command that was supposed to take the node offline failed to exec.
Turned out the xtables lock was holding by other processes led to this failure. Logs as below:
```
I1202 20:00:29.686] Dec 2 20:00:29.685: INFO: ssh jenkins@146.148.111.167:22: stdout:
"+ sleep 10
+ sudo iptables -I INPUT 1 -s 127.0.0.1 -j ACCEPT
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?"
I1202 20:00:29.686] Dec 2 20:00:29.685: INFO: ssh jenkins@146.148.111.167:22: stderr: ""
I1202 20:00:29.686] Dec 2 20:00:29.685: INFO: ssh jenkins@146.148.111.167:22: exit code: 0
```
This reboot test won't pass if any one of these iptables commands fails. This PR put "reboot" commands into while loops to guarantee it retries until succeed.
`sudo iptables -t filter -nL` is removed since it is clear now that the `FILTER` rules won't be clobbered.
(Tests passed on local cluster.)
@bprashanth
Automatic merge from submit-queue (batch tested with PRs 36419, 38330, 37718, 38244, 38375)
adjusted timeouts for inode eviction and garbage collection tests
Inode eviction tests appear to run slower on coreos than the other operating systems I tested on.
I adjusted the timeout for the test from 10 to 30 minutes to compensate.
Garbage collection tests also flake occasionally due to timeouts.
I adjusted the timeout for runtime commands from 2 to 3 minutes, and removed an unused constant.
cc: @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 36071, 32752, 37998, 38350, 38401)
Pass addressable values to DeepCopy
Extracted from https://github.com/kubernetes/kubernetes/pull/35728
These are the places we are currently calling DeepCopy incorrectly, and we need to fix, even if we don't pick up the changes to DeepCopy in #35728:
* creating a new cloner means we have no generated functions registered
* passing non-addressable values doesn't pick up generated deep copy functions, and forces us into reflective mode
Automatic merge from submit-queue (batch tested with PRs 36071, 32752, 37998, 38350, 38401)
Add test for concurrent evictions requests
This is a followup PR after #37668.
Add a test case to make sure concurrent eviction requests can be handled.
@davidopp @lavalamp
Automatic merge from submit-queue (batch tested with PRs 35939, 38381, 37825, 38306, 38110)
Moved start-kubemark-master.sh from test/kubemark/ to test/kubemark/r…
Automatic merge from submit-queue
Fix useless uuid in container log path node e2e
@timstclair pointed out there're nits in original PR, ref: https://github.com/kubernetes/kubernetes/pull/34877
So this patch:
1. removed useless uuid
2. change all those strings to const
Thanks. 🐱
Automatic merge from submit-queue (batch tested with PRs 36626, 37294, 37463, 37943, 36541)
Use tmpfs for gluster-server container volumes.
Gluster server needs a filesystem that supports extended attributes for its data. Some distros (Debian, Ubuntu) ship Docker that runs containers on aufs, which does not support extended attributes and therefore Gluster server fails there.
We can use tmpfs for Gluster server data volumes. ~~This expects that host's /tmp is tmpfs, which is true for Debian, Ubuntu, RHEL, CentOS and Fedora.~~
I reworked it to mount tmpfs inside the container, the server pod is privileged.
And *after* this PR is merged and new Gluster server container image is pushed we need to bump the image version in https://github.com/kubernetes/kubernetes/blob/master/test/e2e/volumes.go#L407
Edit: I also fixed Gluster server Dockerfile, it was not working at all.
Automatic merge from submit-queue (batch tested with PRs 37092, 37850)
Turns on dns horizontal scaling tests for GKE
Seems like the dns-autoscaler is already enabled in [this recent gke build](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gke/769/).
Turning on the corresponding e2e tests to increase test coverage.
Probably better to wait for this fix#37261 to go in first.
@bowei @bprashanth
cc @maisem @roberthbailey
Automatic merge from submit-queue (batch tested with PRs 37325, 38313, 38141, 38321, 38333)
Decrease expected lower bound for misc CPU
Fixes https://github.com/kubernetes/kubernetes/issues/34990
We started enforcing expectations on the `misc` system container in https://github.com/kubernetes/kubernetes/pull/37856, but the CPU usage tends to be lower than the `kueblet` & `runtime` containers (to be expected). For simplicity, I lowered the lower bound for all system containers.
Automatic merge from submit-queue (batch tested with PRs 37325, 38313, 38141, 38321, 38333)
Cleanup firewalls, add nginx ingress to presubmit
Make the firewall cleanup code follow the same pattern as the other cleanup functions, and add the nginx ingress e2e to presubmit.
Planning to watch the test for a bit, and if it works alright, I'll add the other Ingress e2e to post-submit merge blocker.
Automatic merge from submit-queue (batch tested with PRs 37325, 38313, 38141, 38321, 38333)
Fix running e2e with 'Completed' kube-system pods
As of now, e2e runner keeps waiting for pods in `kube-system` namespace to be "Running and Ready" if there are any pods in `Completed` state in that namespace.
This for example happens after following [Kubernetes Hosted Installation](http://docs.projectcalico.org/v2.0/getting-started/kubernetes/installation/#kubernetes-hosted-installation) instructions for Calico, making it impossible to run conformance tests against the cluster. It's also to possible to reproduce the problem like that:
```
$ cat testjob.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: tst
namespace: kube-system
spec:
template:
metadata:
name: tst
spec:
containers:
- name: tst
image: busybox
command: ["echo", "test"]
restartPolicy: Never
$ kubectl create -f testjob.yaml
$ go run hack/e2e.go -v --test --test_args='--ginkgo.focus=existing\s+RC'
```
Automatic merge from submit-queue
Delete regional static-ip instead of global for type=lb
Global vs region is the difference between
```
$ gcloud compute addresses delete foo --global
$ gcloud compute addresses delete foo --region us-central1
```
Type=LoadBalancer users the second type and were were doing the first.
Also adds some logging.
Automatic merge from submit-queue
Fix scheduler_perf test so that QPS is non-zero even if there is a scheduling "cold start"
@gmarek ... is something like this more realistic as an expectation ? That is, wait till scheduling starts, then wait 1 second before any polling attempts.... i.e.
- gaurantee uniform sleep before measure, by doing it first rather then last
- print the initial status so that its easier to debug in case of issues
#36532
Automatic merge from submit-queue (batch tested with PRs 38294, 37009, 36778, 38130, 37835)
fix permissions when using fsGroup
Currently, when an fsGroup is specified, the permissions of the defaultMode are not respected and all files created by the atomic writer have mode 777. This is because in `SetVolumeOwnership()` the `filepath.Walk` includes the symlinks created by the atomic writer. The symlinks have mode 777 when read from `info.Mode()`. However, when the are chmod'ed later, the chmod applies to the file the symlink points to, not the symlink itself, resulting in the wrong mode for the underlying file.
This PR skips chmod/chown for symlinks in the walk since those operations are carried out on the underlying file which will be included elsewhere in the walk.
xref https://bugzilla.redhat.com/show_bug.cgi?id=1384458
@derekwaynecarr @pmorie
Automatic merge from submit-queue (batch tested with PRs 38173, 38151, 38197, 38221)
test: wait for ready replica set before adopting
Reworked version of https://github.com/kubernetes/kubernetes/pull/36439 which was reverted in https://github.com/kubernetes/kubernetes/pull/38049. This PR doesn't use any of the new status API added in replica sets so it should cause no trouble with upgrade tests.
@kubernetes/deployment @smarterclayton
Automatic merge from submit-queue (batch tested with PRs 37032, 38119, 38186, 38200, 38139)
New ns param for NewClusterVerification
**What this PR does / why we need it**: Allows the test to specify alternate namespaces to when waiting for pods to be in a specific state.
**Which issue this PR fixes**: fixes#38138
**Special notes for your reviewer**: Minor fix
**Release note**: None
Automatic merge from submit-queue (batch tested with PRs 37032, 38119, 38186, 38200, 38139)
Detect long-running requests from parsed request info
Follow up to https://github.com/kubernetes/kubernetes/pull/36064
Uses parsed request info to more tightly match verbs and subresources
Removes regex-based long-running request path matching (which is easily fooled)
```release-note
The --long-running-request-regexp flag to kube-apiserver is deprecated and will be removed in a future release. Long-running requests are now detected based on specific verbs (watch, proxy) or subresources (proxy, portforward, log, exec, attach).
```
Automatic merge from submit-queue
Add integration tests for desire state of world populator
Add integration tests for desire state of world populator
This adds tests for code introduced here :
https://github.com/kubernetes/kubernetes/issues/26994
Via integration test we can now verify that if pod delete
event is somehow missed by AttachDetach controller - it still
get cleaned up by Desired State of World populator.
Automatic merge from submit-queue (batch tested with PRs 38194, 37594, 38123, 37831, 37084)
remove unnecessary fields from genericapiserver config
Cleans up some unnecessary fields in the genericapiserver config.
Automatic merge from submit-queue
Skip not registered nodes in labeling in CA e2e tests
This PR fixes problems with querying for not yet registered nodes. The underlying problem is related to the way the test is written. So we apply labels to the existing nodes, create pods that require N+1 nodes with the labels and expect a new node to be added. But the new node is created without the labels. As soon as the node is spotted it is labeled. But sometimes it is too late. CA notices that the new node doesn't solve the problem and ask for another, hoping that this time it will get the node with the labels. The node is added by MIG but it takes a minute or more for the node to start and register in kubernetes. At this moment the labeling is started. The list of nodes to be labeled is taken from MIG. The extra node is there. But it is not in kubernetes yet. So 404 error is returned on labeling attempt and test fails.
This PR filters the list of nodes to be labeled and applies the labels only on the fully registered nodes.
Fixes 404 in #33754
cc: @jszczepkowski @piosz @fgrzadkowski
Automatic merge from submit-queue (batch tested with PRs 36990, 37494, 38152, 37561, 38136)
api federation types
First commit adds types that can back the kubernetes-discovery server with an `kubectl` compatible way of adding federated servers. Second commit is just generated code.
After we have types, I'd like to start splitting `kubernetes-discovery` into a "legacy" mode which will support what we have today and a "normal" mode which will provide an API federation server like this: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/federated-api-servers.md that includes both discovery and proxy in a single server. Something like this: https://github.com/openshift/kube-aggregator .
@kubernetes/sig-api-machinery @nikhiljindal
Automatic merge from submit-queue (batch tested with PRs 36990, 37494, 38152, 37561, 38136)
Node E2E: Move ssh related functions into ssh.go.
This PR moves all ssh related functions and variables into a separate file `ssh.go`.
This is a minor cleanup preparing for my test framework refactoring work. Will send out the refactor PR later.
/cc @kubernetes/sig-node
This adds tests for code introduced here :
https://github.com/kubernetes/kubernetes/issues/26994
Via integration test we can now verify that if pod delete
event is somehow missed by AttachDetach controller - it still
get cleaned up by Desired State of World populator.
Automatic merge from submit-queue (batch tested with PRs 37870, 36643, 37664, 37545)
Add option to disable federation ingress controller
**What this PR does / why we need it**:
Added an option to enable/disable federation ingress controller as currently federated ingresses doesn't work in environments other than GCE/GKE. Also ignore reconcile config maps if no federated ingresses exist.
**Which issue this PR fixes**
fixes#33943
@quinton-hoole
**Release note**:
```release-note
Add `--controllers` flag to federation controller manager for enable/disable federation ingress controller
```
Automatic merge from submit-queue
[Federation] Separate the cleanup phases of service and service shards so that service shards can be cleaned up even after the service is deleted elsewhere.
Fixes Federated Service e2e test.
This separation is necessary because "Federated Service DNS should be
able to discover a federated service" e2e test recently added a case
where it deletes the service from federation but not the shards from
the underlying clusters.
Because of the way cleanup was implemented in the AfterEach block
currently, we did not cleanup any of the underlying shards. However,
separating the two phases of the cleanup needs this separation.
cc @kubernetes/sig-cluster-federation @nikhiljindal
Automatic merge from submit-queue (batch tested with PRs 38149, 38156, 38150)
Node E2E: Remove setup-node option
This PR removes `setup-node` option, because:
* It is misleading. `setup-node` doesn't really setup the node, test framework will only put current user into docker user group when it is specified.
* It is not necessary anymore. Because we always run node e2e test as root now, we don't need to do this anymore.
This is a minor cleanup preparing for my test framework refactoring work. Will send out the refactor PR later.
/cc @kubernetes/sig-node
Automatic merge from submit-queue (batch tested with PRs 38149, 38156, 38150)
Remove girishkalele from most places
@matchstick you might need to help here. I am doing this because the bot is trying to create an issue assigned to @girishkalele but it cannot be created as he is not a member of the org any longer.
Automatic merge from submit-queue (batch tested with PRs 37328, 38102, 37261, 31321, 38146)
Fixes flake: wait for dns pods terminating after test completed
From #37194. Based on #36600. Please only look at the second commit.
As mentioned in [comment](https://github.com/kubernetes/kubernetes/issues/37194#issuecomment-262007174), "DNS horizontal autoscaling" test does not wait for the additional pods to be terminated and this may lead to the failure of later tests.
This fix adds a wait loop at the end of the serial test to ensure the cluster recovers to the original state. In the non-serial test it does not wait for the additional pods terminating because it will not affect other tests, given they are able to be run simultaneously. Plus wait for pods terminating will take certain amount of time.
Note this only fixes certain case of #37194. I noticed there are other failures irrelevant to dns autoscaler. LIke [this one](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-serial/34/).
@bprashanth @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 37328, 38102, 37261, 31321, 38146)
Make thirdparty codec able to decode DeleteOptions
Fix#37278.
Without this PR, the gvk sent to the delegated codec will be the thirdparty one, which is not recognized by the delegated codec (usually api.Codecs).
Automatic merge from submit-queue (batch tested with PRs 38076, 38137, 36882, 37634, 37558)
Make logging for gcl e2e test more verbose
To help debug https://github.com/kubernetes/kubernetes/issues/37241
CC @piosz
Automatic merge from submit-queue (batch tested with PRs 38111, 38121)
remove rbac super user
Cleaning up cruft and duplicated capabilities as we transition from RBAC alpha to beta. In 1.5, we added a secured loopback connection based on the `system:masters` group name. `system:masters` have full power in the API, so the RBAC super user is superfluous.
The flag will stay in place so that the process can still launch, but it will be disconnected.
@kubernetes/sig-auth
Automatic merge from submit-queue (batch tested with PRs 36352, 36538, 37976, 36374)
test: update deployment helper to return better error messages
@kubernetes/deployment the problem with https://github.com/kubernetes/kubernetes/issues/36270 is that the selector key is never added in the deployment but this change would make it clearer.
This separation is necessary because "Federated Service DNS should be
able to discover a federated service" e2e test recently added a case
where it deletes the service from federation but not the shards from
the underlying clusters.
Because of the way cleanup was implemented in the AfterEach block
currently, we did not cleanup any of the underlying shards. However,
separating the two phases of the cleanup needs this separation.