Automatic merge from submit-queue
Do not add unique label to DaemonSet
**What this PR does / why we need it**:
It's mainly for #46925. DaemonSet controller adds a unique label to DaemonSet, which is unexpected to federation.
The 1st commit addressed #46981 to construct history once and pass it around, so that we can avoid adding that unique label in DaemonSet in the 2nd commit. ~The 3rd commit just reverts the band-aid PR #47103.~
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#46925, xref #46981
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47470, 47260, 47411, 46852, 46135)
Write reports for each upgrade test
Due to the way Ginkgo runs individual test cases and the level of coordination required for the upgrade tests, they were all run under a single Ginkgo test case. This PR generates and auxiliary report that break out the results of each upgrade test. This is accomplished by:
1) Wrapping `ginkgo.Fail` and `ginkgo.Skip` to get the actual failure or skip messages.
2) Recovering that info in the upgrade test to generate an auxiliary report.
I suggest reviewing commit by commit.
Sample report: https://storage.googleapis.com/krouseytestreports/logs/results/1/artifacts/junit_upgrades.xmlFixes: #47371
Automatic merge from submit-queue
test/kubemark/resources: configure custom etcd endpoints
We want to stress our own etcd cluster with Kubernetes
workloads, using kubemark e2e tests. This PR adds a new
environment variable 'ETCD_SERVERS' to configure custom
etcd endpoints.
/cc @xiang90 @hongchaodeng
Automatic merge from submit-queue (batch tested with PRs 47302, 47389, 47402, 47468, 47459)
Update to kube-addon-manager:v6.4-beta.2: kubectl v1.6.4 and refreshed base images
**What this PR does / why we need it**: refreshes base images for kube-addon-manager with fixes for CVE-2016-9841 and CVE-2016-9843.
x-ref https://github.com/kubernetes/kubernetes/issues/47386
**Special notes for your reviewer**: the updated images are not yet pushed, so tests will fail until that's done.
**Release note**:
```release-note
```
/assign @MrHohn
Automatic merge from submit-queue
Update GPU e2e tests.
* Use nvidia driver installer from external repo.
That installer decouples itself from COS image version (as long as the
image version is newer than cos-stable-59-9460-60-0).
A separate commit in the test-infra repo will update the cos version
used for this test to cos-stable-59-9460-60-0.
* Use cos-stable-59-9460-60-0 and newer installer for GPU node e2e tests.
This is to enable #47388.
This supercedes #47091.
**Release note**:
```release-note
NONE
```
/sig node
We want to stress our own etcd cluster with Kubernetes
workloads, using kubemark e2e tests. This PR adds a new
environment variable 'ETCD_SERVERS' to configure custom
etcd endpoints.
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
In 1.7, we add controller history to avoid unnecessary DaemonSet pod
restarts during pod adoption. We will not restart pods with matching
templateGeneration for backward compatibility, and will not restart pods
when template hash label matches current DaemonSet history, regardless
of templateGeneration.
Automatic merge from submit-queue (batch tested with PRs 47084, 46016, 46372)
Update adoption/release of DaemonSet controller history, and wait for history store sync
**What this PR does / why we need it**:
~Depends on #47075, so that DaemonSet controller can update history's controller ref. Ignore that commit when reviewing.~ (merged)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: #46981
**Special notes for your reviewer**: @kubernetes/sig-apps-bugs
**Release note**:
```release-note
NONE
```
That installer decouples itself from COS image version (as long as the
image version is newer than cos-stable-59-9460-60-0).
A separate commit in the test-infra repo will update the cos version
used for this test to cos-stable-59-9460-60-0.
Automatic merge from submit-queue (batch tested with PRs 46441, 43987, 46921, 46823, 47276)
Enable Node authorizer and NodeRestriction admission in kubemark
xref https://github.com/kubernetes/features/issues/279
We want to ensure scale testing covers use of the authorizer/admission pair that partitions nodes. This includes enabling the authorizer, which populates a graph of existing nodes and pods.
Kubemark is still running all nodes with a single credential, so a follow-up step is to generate unique credentials per node (or enable TLS bootstrapping) and remove the temporary rolebinding added in this PR so the node authorizer is the one authorizing each call by a hollow node.
Automatic merge from submit-queue
Shorten eviction tests, and increase test suite timeout
After #43590, the eviction manager is less aggressive when evicting pods. Because of that, many runs in the flaky suite time out.
To shorten the inode eviction test, I have lowered the eviction threshold.
To shorten the allocatable eviction test, I now set KubeReserved = NodeMemoryCapacity - 200Mb, so that any pod using 200Mb will be evicted. This shortens this test from 40 minutes, to 10 minutes.
While this should be enough to not hit the flaky suite timeout anymore, it is better to keep lower individual test timeouts than a lower suite timeout, since hitting the suite timeout means that even successful test runs are not reported.
/assign @Random-Liu @mtaufen
issue: #31362
Automatic merge from submit-queue
Change what is stored in DaemonSet history `.data`
**What this PR does / why we need it**:
In DaemonSet history `.data`, store a strategic merge patch that can be applied to restore a DaemonSet. Only PodSpecTemplate is saved.
This will become consistent with the data stored in StatefulSet history.
Before this fix, a serialized pod template is stored in `.data`; however, seriazlized pod template isn't a `runtime.RawExtension`, and caused problems when controllers try to patch the history's controller ref.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47008
**Special notes for your reviewer**: @kubernetes/sig-apps-bugs @erictune @kow3ns @kargakis @lukaszo @mengqiy
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Fix bad check in node e2e tests for GPUs.
When no nvidia device was attached, the -ne check had a syntax error:
sh: -ne: argument expected
This resulted in `Success` being echoed and the test passing incorrectly.
This was found while debugging issue #47216
/release-note-none
/sig node
/area node-e2e
/kind bug
When no nvidia device was attached, the -ne check had a syntax error:
sh: -ne: argument expected
This resulted in 'Success' being echoed and the test passing incorrectly.
This was found while debugging issue #47216
Automatic merge from submit-queue (batch tested with PRs 46750, 47141)
Speed up volume integration test
Partly solves https://github.com/kubernetes/kubernetes/issues/47129 .
On my local box:
before - 7m56.751s
after - 5m53.132s
So approx. 2m time saving. More saving will require refactoring of attach detach controller.
cc @mikedanese
Automatic merge from submit-queue (batch tested with PRs 36376, 47251)
client-go: GetOptions for dynamic client
Looks like `GetOptions` were forgotten in the dynamic client. Without them it's hard to write a dynamic initializer controller (useful for custom resources).
Automatic merge from submit-queue (batch tested with PRs 47113, 46665, 47189)
Improve the e2e node restart test
This commit includes the following two changes:
* Move pre-test checks (pods/nodes ready) to BeforeEach() so that it's
clear whether the test has run or not.
* Dumping logs for unready pods.
Automatic merge from submit-queue
Kubelet: rename cri package name to pkg/kubelet/apis/cri/v1alpha1/runtime
**What this PR does / why we need it**:
We have moved CRI from api/v1alpha1/runtime to apis/cri/v1alpha1, which changed the package name of CRI. This would cause a significant problem: old-versioned runtime (based on CRI in v1.6) doesn't work with latest kubelet v1.7, and vice versa.
This PR renames cri package name to `pkg/kubelet/apis/cri/v1alpha1/runtime` for fixing the problem.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes#47012
**Special notes for your reviewer**:
Should be included in v1.7.
**Release note**:
```release-note
CRI has been moved to package `pkg/kubelet/apis/cri/v1alpha1/runtime`.
```
Automatic merge from submit-queue
Move the nvidia installer to the beginning.
When the installer runs for the first time, it disables loadpin and restarts
the node. So, it is better to run it in the beginning so that we can avoid
redoing the later steps. One of the later steps include downloading a tar file
and untarring it. Doing that only once saves around 1m30s in test runtime for
the gci image.
/sig node
/area node-e2e
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46835, 46856)
Made tests that create Horizontal Pod Autoscaler delete it after they are done.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#46847
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46835, 46856)
Made WaitForReplicas and EnsureDesiredReplicas use PollImmediate and improved logging.
**What this PR does / why we need it**: Most importantly, this results in better logging: timeout is logged at the level of the caller, not the helper function, helping debugging.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Allow pods to opt out of PodPreset mutation via an annotation on the pod
An annotation in the pod spec of the form:
podpreset.admission.kubernetes.io/PodPresetOptOut: "true"
Will cause the admission controller to skip manipulating the pod spec,
no matter the labelling.
This is an alternative implementation to pull #44163.
```release-note
Allow pods to opt out of PodPreset mutation via an annotation on the pod.
```
Automatic merge from submit-queue (batch tested with PRs 46885, 47197)
Fix e2e ns deletion message for flake analysis
**What this PR does / why we need it**:
Let's us know when pods have a missing deletion timestamp.
**Special notes for your reviewer**:
helps https://github.com/kubernetes/kubernetes/issues/47135
Automatic merge from submit-queue (batch tested with PRs 46885, 47197)
Let COS docker validation node test against gci-next-canary
**What this PR does / why we need it**:
This is for COS docker validation node test. We plan to use family gci-next-canary in container-vm-image-staging for future Docker upgration and validation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47134
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 47065, 47157, 47143)
Use actual hostname when creating network e2e test pod
**What this PR does / why we need it**:
This changes a e2e framework network test Pod use the actual hostname value to match the `kubernetes.io/hostname` label in it's `NodeSelector`. Currently it assumes the Node name will match that hostname label which is not true in all environments.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Fixescoreos/tectonic-installer#1018
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47065, 47157, 47143)
Removed a race condition from ResourceConsumer
**What this PR does / why we need it**: Without this PR there is a race condition in ResourceConsumer that sometimes results in communication to pods that might not exist anymore.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47127
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
When the installer runs for the first time, it disables loadpin and restarts
the node. So, it is better to run it in the beginning so that we can avoid
redoing the later steps. One of the later steps include downloading a tar file
and untarring it. Doing that only once saves around 1m30s in test runtime for
the gci image.
Automatic merge from submit-queue
Bump up npd version to v0.4.0
Fixes#47070.
Bump up npd version to [v0.4.0](https://github.com/kubernetes/node-problem-detector/releases/tag/v0.4.0).
```release-note
Bump up Node Problem Detector version to v0.4.0, which added support of parsing log from /dev/kmsg and ABRT.
```
/cc @dchen1107 @ajitak
Automatic merge from submit-queue
Bump external provisioner image to smaller version
The image is roughly half as big so this should improve speed/flakiness maybe
-->
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46979, 47078, 47138, 46916)
DeleteCollection should include uninitialized resources
Users who delete a collection expect all resources to be deleted, and
users can also delete an uninitialized resource. To preserve this
expectation, DeleteCollection selects all resources regardless of
initialization.
The namespace controller should list uninitialized resources in order to
gate cleanup of a namespace.
Fixes#47137
Automatic merge from submit-queue (batch tested with PRs 46979, 47078, 47138, 46916)
[federation][e2e] Fix cleanupServiceShardLoadBalancer
**What this PR does / why we need it**:
Fixes the issue mentioned in #46976
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#46976
**Special notes for your reviewer**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45877, 46846, 46630, 46087, 47003)
update NetworkPolicy e2e test for v1 semantics
This makes the NetworkPolicy test at least correct for v1, although ideally we'll eventually add a few more tests... (So this covers about half of #46625.)
I've tested that this compiles, but not that it passes, since I don't have a v1-compatible NetworkPolicy implementation yet...
@caseydavenport @ozdanborne, maybe you're closer to having a testable plugin than I am?
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47024, 47050, 47086, 47081, 47013)
client-go: deprecate TPR example and add CRD example
/cc @nilebox
Part of https://github.com/kubernetes/kubernetes/issues/46702
Users who delete a collection expect all resources to be deleted, and
users can also delete an uninitialized resource. To preserve this
expectation, DeleteCollection selects all resources regardless of
initialization.
The namespace controller should list uninitialized resources in order to
gate cleanup of a namespace.
Automatic merge from submit-queue (batch tested with PRs 43005, 46660, 46385, 46991, 47103)
[gke-slow always fails] Defer DeleteGCEStaticIP before asserting error
From https://github.com/kubernetes/kubernetes/issues/46918.
I'm getting close to the root cause: During tests, CreateGCEStaticIP() in fact successfully created static IP, but the parser we wrote in test mistakenly think we failed, probably because the gcloud output format was changed recently (or not). I'm still looking into fixing that.
This PR defer the delete function before asserting the error so that we can stop consistently leaking static IP in every run.
/assign @krzyzacy @dchen1107
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43005, 46660, 46385, 46991, 47103)
add e2e node test for Pod hostAliases feature
**What this PR does / why we need it**: adds node e2e test for #45148
tests requested in https://github.com/kubernetes/kubernetes/issues/43632#issuecomment-298434125
**Release note**:
```release-note
NONE
```
@yujuhong @thockin
Automatic merge from submit-queue
Federation: create loadbalancer service in tests only if test depends on it
**What this PR does / why we need it**:
Creating LoadBalancer type of service for every test case is kind of expensive and time consuming to provision. So this PR changes the test cases to use LoadBalancer type services only when necessary.
**Which issue this PR fixes** (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes#47068
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-federation-pr-reviews
/assign @madhusudancs
Automatic merge from submit-queue (batch tested with PRs 46977, 47005, 47018, 47061, 46809)
Directly grab map values instead of using loop-clause variables when setting up federated sync controller tests.
Go's loop-clause variables are allocated once and the items are copied to that variable while iterating through the loop. This means, these variables can't escape the scope since closures are bound to loop-clause variables whose value change during each iteration. Doing so would lead to undesired behavior. For more on this topic see: https://github.com/golang/go/wiki/CommonMistakes
So in order to workaround this problem in sync controller e2e tests, we iterate through the map and copy the map value to a variable inside the loop before using it in closures.
Fixes issue: #47059
**Release note**:
```release-note
NONE
```
/assign @marun @shashidharatd @perotinus
cc @csbell @nikhiljindal
/sig federation
Automatic merge from submit-queue (batch tested with PRs 46977, 47005, 47018, 47061, 46809)
Fix for cluster-autoscaler e2e failures
This may help with cluster-autoscaler e2e failing in setup if the tests are run before all machines in mig get fully ready.
Automatic merge from submit-queue (batch tested with PRs 46235, 44786, 46833, 46756, 46669)
implements StatefulSet update
**What this PR does / why we need it**:
1. Implements rolling update for StatefulSets
2. Implements controller history for StatefulSets.
3. Makes StatefulSet status reporting consistent with DaemonSet and ReplicaSet.
https://github.com/kubernetes/features/issues/188
**Special notes for your reviewer**:
**Release note**:
```release-note
Implements rolling update for StatefulSets. Updates can be performed using the RollingUpdate, Paritioned, or OnDelete strategies. OnDelete implements the manual behavior from 1.6. status now tracks
replicas, readyReplicas, currentReplicas, and updatedReplicas. The semantics of replicas is now consistent with DaemonSet and ReplicaSet, and readyReplicas has the semantics that replicas did prior to this release.
```
Automatic merge from submit-queue (batch tested with PRs 46235, 44786, 46833, 46756, 46669)
Fixed ResourceConsumer.CleanUp to properly clean up non-replication-controller resources and pods
**What this PR does / why we need it**: Without this fix CleanUp does not remove non-replication-controller resources and pods. This leads to pollution that in some cases inadvertently affects what is happening in AfterEachs before the namespace gets deleted.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46997, 47021)
Don't parse human-readable output from gcloud in tests
This is the reason `[k8s.io] Services should be able to change the type and ports of a service [Slow]` is currently failing on GKE e2e tests. For GKE jobs we run a prerelease version of gcloud, in which the default command output was changed.
gcloud's default output for commands is human readable, and is subject to change. Anything scripting against gcloud should always pass `--format=json|yaml|value(...)` so you get standardized output.
fixes: #46918
Automatic merge from submit-queue (batch tested with PRs 47083, 44115, 46881, 47082, 46577)
Add an e2e test for server side get
Print a better error from the response. Performs validation to ensure it
does not regress in alpha state.
This is tests and bug fixes for https://github.com/kubernetes/community/pull/363
@kubernetes/sig-api-machinery-pr-reviews
Implements history utilities for ControllerRevision in the controller/history package
StatefulSetStatus now has additional fields for consistency with DaemonSet and Deployment
StatefulSetStatus.Replicas now represents the current number of createdPods and StatefulSetStatus.ReadyReplicas is the current number of ready Pods
Automatic merge from submit-queue (batch tested with PRs 46718, 46828, 46988)
Update docs/ links to point to main site
**What this PR does / why we need it**:
This updates various links to either point to kubernetes.io or to the kubernetes/community repo instead of the legacy docs/ tree in k/k
Pre-requisite for #46813
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
@kubernetes/sig-docs-maintainers @chenopis @ahmetb @thockin