Automatic merge from submit-queue (batch tested with PRs 50988, 50509, 52660, 52663, 52250). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Added device plugin e2e kubelet failure test
Signed-off-by: Renaud Gaubert <renaud.gaubert@gmail.com>
**What this PR does / why we need it**:
This is part of issue #52859 (fixes#52859)
This PR adds a e2e_node test for the device plugin.
Specifically it implements testing of failure handling by the device plugin components in case Kubelet restart / crashes.
I might try to refactor the GPU tests in a later PR.
**Special notes for your reviewer**:
@jiayingz @vishh
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51067, 52319, 52803, 52961, 51972). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Add support for skeleton in GetSigner
Adding support for skeleton to GetSigner to be able to run
e2e tests against a bare metal multinode cluster.
Closes#35613
Automatic merge from submit-queue (batch tested with PRs 52905, 52766). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Refactor parsing cluster autoscaler status, add logging error
Minor improvements to autoscaling test suite and e2e framework.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Fix GCE LB resource cleanup for service e2e tests.
**What this PR does / why we need it**: Fix GCE LB resource cleanup logic.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52347
**Special notes for your reviewer**:
/assign @shyamjvs @nicksardo
**Release note**:
```release-note
NONE
```
It turns out that at some points while the Node is recovering from a
reboot, we get a different kind of error ("unable to upgrade
connection"). Since we can't distinguish these transient errors from an
error encountered after successfully executing the remote command,
let's just retry all errors for 5min. If this doesn't work, I'm gonna
blame it on sig-node.
The initial retry up to 20s was giving up too soon.
I'm seeing this test flake because the Node rebooted and it takes ~2min
to recover.
Now StatefulSet RunHostCmd calls will use the same 5min timeout as with
other Pod state checks.
Automatic merge from submit-queue
AllowedNotReadyNodes allowed to be not ready for absolutely *any* reason
It's as good as we allow those many nodes to be not part of the cluster at all, ever.
Btw - currently our 5k-node correctness test fails if "kubelet stopped posting node status" or "route not created", etc (ref: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-correctness/3/build-log.txt)
cc @kubernetes/sig-scalability-misc
We seem to get a lot of flakes due to "connection refused" while running
`kubectl exec`. I can't find any reason this would be caused by the test
flow, so I'm adding retries to see if that helps.
Automatic merge from submit-queue
Migrate to controller references helpers in meta/v1
**What this PR does / why we need it**:
This is a follow up for #48319 that migrates all method usages to new methods in meta/v1.
**Special notes for your reviewer**:
Looking at each commit individually might be easier.
**Release note**:
```release-note
NONE
```
/sig api-machinery
/kind cleanup
Automatic merge from submit-queue (batch tested with PRs 49651, 49707, 49662, 47019, 49747)
improve detectability of deleted pods
**What this PR does / why we need it**:
Adds comment to `waitForPodTerminatedInNamespace` to better explain how it's implemented.
~~It improves pod deletion detection in the e2e framework as follows:~~
~~1. the `waitForPodTerminatedInNamespace` func looks for pod.Status.Phase == _PodFailed_ or _PodSucceeded_ since both values imply that all containers have terminated.~~
~~2. the `waitForPodTerminatedInNamespace` func also ignores the pod's Reason if the passed-in `reason` parm is "". Reason is not really relevant to the pod being deleted or not, but if the caller passes a non-blank `reason` then it will be lower-cased, de-blanked and compared to the pod's Reason (also lower-cased and de-blanked). The idea is to make Reason checking more flexible and to prevent a pod from being considered running when all of its containers have terminated just because of a Reason mis-match.~~
Releated to pr [49597](https://github.com/kubernetes/kubernetes/pull/49597) and issue [49529](https://github.com/kubernetes/kubernetes/issues/49529).
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45813, 49594, 49443, 49167, 47539)
Add node e2e tests for GKE environment
Ref: https://github.com/kubernetes/kubernetes/issues/46891
This PR adds node e2e tests for validating images used on GKE.
- We pass the `SYSTEM_SPEC_NAME` to the node e2e test process via the flag `--system-spec-name` so that we can skip the environment specific tests using `RunIfSystemSpecNameIs()`.
- Also added `SkipIfContainerRuntimeIs()` as the opposite of `RunIfContainerRuntimeIs()`.
**Release note**:
```
None
```
Automatic merge from submit-queue (batch tested with PRs 49619, 49598, 47267, 49597, 49638)
improve log for pod deletion poll loop
**What this PR does / why we need it**:
It improves some logging related to waiting for a pod to reach a passed-in condition. Specifically, related to issue [49529](https://github.com/kubernetes/kubernetes/issues/49529) where better logging may help to debug the root cause.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49420, 49296, 49299, 49371, 46514)
Refactoring taint functions to reduce sprawl
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#45060
**Special notes for your reviewer**:
@gmarek @timothysc @k82cn @jayunit100 - I moved some fn's to helpers and some to utils. LMK, if you are ok with this change.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48377, 48940, 49144, 49062, 49148)
fixit: break sig-cluster-lifecycle tests into subpackage
this is part of fixit week. ref #49161
@kubernetes/sig-cluster-lifecycle-misc
Automatic merge from submit-queue (batch tested with PRs 48139, 48042, 47645, 48054, 48003)
Pipe clusterID into gce_loadbalancer_external.go
**What this PR does / why we need it**: Small cleanup for GCE ELB codes.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48002
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47915, 47856, 44086, 47575, 47475)
deprecate created-by annotation for e2e test framework
**What this PR does / why we need it**: This PR deprecates created-by annotation for e2e test framework. This is needed as we now have ControllerRef.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref #44407
**Special notes for your reviewer**: This is the third PR for deprecating created-by annotation. Other PRs can be found here: #47469, #47471
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 47470, 47260, 47411, 46852, 46135)
Write reports for each upgrade test
Due to the way Ginkgo runs individual test cases and the level of coordination required for the upgrade tests, they were all run under a single Ginkgo test case. This PR generates and auxiliary report that break out the results of each upgrade test. This is accomplished by:
1) Wrapping `ginkgo.Fail` and `ginkgo.Skip` to get the actual failure or skip messages.
2) Recovering that info in the upgrade test to generate an auxiliary report.
I suggest reviewing commit by commit.
Sample report: https://storage.googleapis.com/krouseytestreports/logs/results/1/artifacts/junit_upgrades.xmlFixes: #47371
Automatic merge from submit-queue (batch tested with PRs 46835, 46856)
Made WaitForReplicas and EnsureDesiredReplicas use PollImmediate and improved logging.
**What this PR does / why we need it**: Most importantly, this results in better logging: timeout is logged at the level of the caller, not the helper function, helping debugging.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46252, 45524, 46236, 46277, 46522)
Make GCE load-balancers create health checks for nodes
From #14661. Proposal on kubernetes/community#552. Fixes#46313.
Bullet points:
- Create nodes health check and firewall (for health checking) for non-OnlyLocal service.
- Create local traffic health check and firewall (for health checking) for OnlyLocal service.
- Version skew:
- Don't create nodes health check if any nodes has version < 1.7.0.
- Don't backfill nodes health check on existing LBs unless users explicitly trigger it.
**Release note**:
```release-note
GCE Cloud Provider: New created LoadBalancer type Service now have health checks for nodes by default.
An existing LoadBalancer will have health check attached to it when:
- Change Service.Spec.Type from LoadBalancer to others and flip it back.
- Any effective change on Service.Spec.ExternalTrafficPolicy.
```
Automatic merge from submit-queue
move from daemon_restart.go to framework/util.go
**What this PR does / why we need it**:
Moves the func `nodeExec` from daemon_restart.go to framework/util.go. This is the correct file for this func and is a more intuitive pkg for other callers to use. This is a small step of the larger effort of restructuring e2e tests to be more logically structured and easier for newcomers to understand.
```release-note
NONE
```
cc @timothysc @copejon
Automatic merge from submit-queue
util.go: format for
**What this PR does / why we need it**:
format for.
delete redundant para.
make code clean.
**Release note**:
```release-note
NONE
```
Not all CI systems support ssh keys to be present on the node. This
supports the case where "local" provider is being used when running
e2e test, but the environment does not have a SSH key.
Automatic merge from submit-queue (batch tested with PRs 44424, 44026, 43939, 44386, 42914)
Try in cluster config when input KubeConfig is empty
**What this PR does / why we need it**:
Allows for downstream providers to run e2es "incluster" sans kubeconfig
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
/cc @kubernetes/sig-testing-pr-reviews @marun
Automatic merge from submit-queue (batch tested with PRs 44447, 44456, 43277, 41779, 43942)
Clean up pre-ControllerRef compatibility logic
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43323
**Special notes for your reviewer**:
No
**Release note**:
```
NONE
```
Automatic merge from submit-queue
fix wrong error return
Signed-off-by: Crazykev <crazykev@zju.edu.cn>
**What this PR does / why we need it**: The err return here is wrong, correct it.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```