Commit Graph

23283 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
0010333bdd
Merge pull request #116161 from danielvegamyhre/mutable-scheduling-directives
Mutable pod scheduling directives
2023-03-10 12:40:58 -08:00
Kubernetes Prow Robot
7529178924
Merge pull request #111372 from HeavenTonight/master
code cleanup
2023-03-10 11:44:40 -08:00
Sergey Kanzhelev
13855373e5 three more checks for containers lifecycle 2023-03-10 18:42:14 +00:00
Daniel Vega-Myhre
86f41dc012 mutable pod scheduling directives 2023-03-10 18:30:09 +00:00
Antonio Ojea
eecfaf658e decouple system namespaces from bootstrap controller
Use an informer instead of polling.

Change-Id: Ib071e53addb914fcb31d8a1346cf61ca6d22520b
2023-03-10 17:49:47 +00:00
Kubernetes Prow Robot
2e3c5003b9
Merge pull request #115630 from Jefftree/agg-discovery-metrics
Add metrics for aggregated discovery
2023-03-10 07:44:41 -08:00
vinay kulkarni
01b96e7704 Rename ContainerStatus.ResourcesAllocated to ContainerStatus.AllocatedResources 2023-03-10 14:49:26 +00:00
Jan Safranek
771c9be291 Add e2e test for SELinux metrics
This is the commit message for patch #2 (refresh-temp):
2023-03-10 15:03:56 +01:00
Kubernetes Prow Robot
bdf5af93e8
Merge pull request #115468 from pwschuurman/kep-3335-e2e-tests
Add e2e tests for StatefulSetStartOrdinal feature
2023-03-10 04:35:13 -08:00
Kubernetes Prow Robot
cb00077cd3
Merge pull request #113471 from ncdc/gc-contextual-logging
garbagecollector: use contextual logging
2023-03-10 04:34:39 -08:00
Kubernetes Prow Robot
08fbe92fa7
Merge pull request #116423 from ffromani/e2e-podres-node-conformance
e2e: podresources: promote platform-independent test as NodeConformance
2023-03-10 00:06:52 -08:00
Kubernetes Prow Robot
8ce3a2bbef
Merge pull request #116398 from tzneal/rework-init-containers-test
rework init containers test to remove host file dependency
2023-03-09 22:44:25 -08:00
Kubernetes Prow Robot
16d2d55bc0
Merge pull request #115969 from DangerOnTheRanger/messageExpression-for-crd
Add messageExpression field for CRD validation
2023-03-09 22:43:19 -08:00
Kubernetes Prow Robot
e8ae6658ed
Merge pull request #115065 from apelisse/apimachinery-managed-fields
managedfields: Move most of fieldmanager package to managefields
2023-03-09 21:34:22 -08:00
Kermit Alexander II
4e26f680a9 Implement MessageExpression. 2023-03-09 23:37:59 +00:00
Kubernetes Prow Robot
45b96eae98
Merge pull request #113145 from smarterclayton/zombie_terminating_pods
kubelet: Force deleted pods can fail to move out of terminating
2023-03-09 15:32:30 -08:00
Jefftree
387d97605e Add metrics for aggregated discovery 2023-03-09 17:24:02 +00:00
Francesco Romani
5ca235e0ee e2e: podresources: promote platform-independent test as NodeConformance
We have quite a few podresources e2e tests and, as the feature
progresses to GA, we should consider moving them to NodeConformance.
Unfortunately most of them require linux-specific features not in the
test themselves but in the test prelude (fixture) to check or create the
node conditions (e.g. presence or not of devices, online CPUS...) to be
verified in the test proper.

For this reason we promote only a single test for starters.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-03-09 16:26:01 +01:00
Kubernetes Prow Robot
f90643435e
Merge pull request #113840 from 249043822/br-context-logging-statefulset
statefulset: use contextual logging
2023-03-09 06:42:02 -08:00
Kubernetes Prow Robot
f02da82e36
Merge pull request #116404 from cpanato/go1202
[go] Bump images, dependencies and versions to go 1.20.2
2023-03-09 05:16:20 -08:00
Kubernetes Prow Robot
da87af638f
Merge pull request #115856 from lanycrost/e2e-115780-grpc-probe-tests
Promote gRPC probe e2e test to Conformance
2023-03-09 01:06:03 -08:00
cpanato
99c80ac119
[go] Bump images, dependencies and versions to go 1.20.2 2023-03-09 09:57:45 +01:00
Todd Neal
78ca93e39c rework init containers test to remove host file dependency
Since we can't rely on the test runner and hosts under test to
be on the same machine, we write to the terminate log from each
container and concatenate the results.
2023-03-08 23:17:17 -06:00
Clayton Coleman
6b9a381185
kubelet: Force deleted pods can fail to move out of terminating
If a CRI error occurs during the terminating phase after a pod is
force deleted (API or static) then the housekeeping loop will not
deliver updates to the pod worker which prevents the pod's state
machine from progressing. The pod will remain in the terminating
phase but no further attempts to terminate or cleanup will occur
until the kubelet is restarted.

The pod worker now maintains a store of the pods state that it is
attempting to reconcile and uses that to resync unknown pods when
SyncKnownPods() is invoked, so that failures in sync methods for
unknown pods no longer hang forever.

The pod worker's store tracks desired updates and the last update
applied on podSyncStatuses. Each goroutine now synchronizes to
acquire the next work item, context, and whether the pod can start.
This synchronization moves the pending update to the stored last
update, which will ensure third parties accessing pod worker state
don't see updates before the pod worker begins synchronizing them.

As a consequence, the update channel becomes a simple notifier
(struct{}) so that SyncKnownPods can coordinate with the pod worker
to create a synthetic pending update for unknown pods (i.e. no one
besides the pod worker has data about those pods). Otherwise the
pending update info would be hidden inside the channel.

In order to properly track pending updates, we have to be very
careful not to mix RunningPods (which are calculated from the
container runtime and are missing all spec info) and config-
sourced pods. Update the pod worker to avoid using ToAPIPod()
and instead require the pod worker to directly use
update.Options.Pod or update.Options.RunningPod for the
correct methods. Add a new SyncTerminatingRuntimePod to prevent
accidental invocations of runtime only pod data.

Finally, fix SyncKnownPods to replay the last valid update for
undesired pods which drives the pod state machine towards
termination, and alter HandlePodCleanups to:

- terminate runtime pods that aren't known to the pod worker
- launch admitted pods that aren't known to the pod worker

Any started pods receive a replay until they reach the finished
state, and then are removed from the pod worker. When a desired
pod is detected as not being in the worker, the usual cause is
that the pod was deleted and recreated with the same UID (almost
always a static pod since API UID reuse is statistically
unlikely). This simplifies the previous restartable pod support.
We are careful to filter for active pods (those not already
terminal or those which have been previously rejected by
admission). We also force a refresh of the runtime cache to
ensure we don't see an older version of the state.

Future changes will allow other components that need to view the
pod worker's actual state (not the desired state the podManager
represents) to retrieve that info from the pod worker.

Several bugs in pod lifecycle have been undetectable at runtime
because the kubelet does not clearly describe the number of pods
in use. To better report, add the following metrics:

  kubelet_desired_pods: Pods the pod manager sees
  kubelet_active_pods: "Admitted" pods that gate new pods
  kubelet_mirror_pods: Mirror pods the kubelet is tracking
  kubelet_working_pods: Breakdown of pods from the last sync in
    each phase, orphaned state, and static or not
  kubelet_restarted_pods_total: A counter for pods that saw a
    CREATE before the previous pod with the same UID was finished
  kubelet_orphaned_runtime_pods_total: A counter for pods detected
    at runtime that were not known to the kubelet. Will be
    populated at Kubelet startup and should never be incremented
    after.

Add a metric check to our e2e tests that verifies the values are
captured correctly during a serial test, and then verify them in
detail in unit tests.

Adds 23 series to the kubelet /metrics endpoint.
2023-03-08 22:03:51 -06:00
Kubernetes Prow Robot
8d5c96fed2
Merge pull request #116093 from swatisehgal/topologymanager-ga-graduation
node: topologymgr: Graduate Kubelet Topology Manager to GA
2023-03-08 16:56:06 -08:00
Kubernetes Prow Robot
30ee6914c5
Merge pull request #115149 from nilekhc/encrypt-all
Allow encryption for all resources
2023-03-08 16:55:59 -08:00
Kubernetes Prow Robot
8fa82976fc
Merge pull request #116356 from pacoxu/cleanup-bump_qps_kubelet
sync default qps of kubelet change everywhere
2023-03-08 15:42:41 -08:00
Maksim Nabokikh
c1431af4f8
KEP-3325: Promote SelfSubjectReview to Beta (#116274)
* Promote SelfSubjectReview to Beta

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>

* Fix whoami API

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>

* Fixes according to code review

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>

---------

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>
2023-03-08 15:42:33 -08:00
Kubernetes Prow Robot
0a5310fe9a
Merge pull request #116232 from aojea/e2e_terminating_connectivity
test connectivity for terminating pods
2023-03-08 15:42:21 -08:00
Kubernetes Prow Robot
8ee9b82b10
Merge pull request #115984 from tzneal/init-container-tests
add more init container testing
2023-03-08 15:42:08 -08:00
Jiahui Feng
0a954cc10d always get fresh object before updating. 2023-03-08 15:17:58 -08:00
Jiahui Feng
82eb24156a add test for reset fields. 2023-03-08 15:01:06 -08:00
Peter Schuurman
c57bc292de Add e2e tests for StatefulSetStartOrdinal feature 2023-03-08 14:55:58 -08:00
Nilekh Chaudhari
9382fab9b6
feat: implements encrypt all
Signed-off-by: Nilekh Chaudhari <1626598+nilekhc@users.noreply.github.com>
2023-03-08 22:18:49 +00:00
Antoine Pelisse
4f3859ce91 managedfields: Move most of fieldmanager package to managefields 2023-03-08 13:44:00 -08:00
Kubernetes Prow Robot
8319ac5274
Merge pull request #116383 from Huang-Wei/fix/sched-perf-test
fix: remove SchedulingMigratedInTreePVs feature gate in sched perf test
2023-03-08 13:12:20 -08:00
Kubernetes Prow Robot
2a22864d9c
Merge pull request #116381 from pohly/cronjob-integration-test-shutdown
cronjob: shut down integration test quickly again
2023-03-08 13:12:08 -08:00
Todd Neal
123ab80333 add more init container lifetime testing
Add some additional init container tests that work via monitoring
container lifetime based on logs written to a common file. This allows
more easily writing assertions about the container lifetimes with
respect to one another.
2023-03-08 14:39:10 -06:00
Kubernetes Prow Robot
7598ff36cf
Merge pull request #116333 from aojea/multiport_service
e2e network test for multiple protocol services on same port
2023-03-08 11:27:12 -08:00
Wei Huang
c9bc2f98d0
fix: remove SchedulingMigratedInTreePVs feature gate in sched perf test 2023-03-08 08:34:44 -08:00
Patrick Ohly
be82872eff cronjob: shut down integration test quickly again
6f2cd1b5bd swapped the order of cancel() and
closeFn() so that closeFn got called first when the test was done. This caused
it to block while waiting for goroutines which themselves were waiting for
the context cancellation. The test still shut down, it just took ~86s instead
of ~30s.

The fix is to register the cancel twice: once as soon as the context is
created (to clean up in case of an unexpected panic) and once after
closeFn (because then it'll get called first, as before).
2023-03-08 17:26:47 +01:00
Kubernetes Prow Robot
499a03d88b
Merge pull request #115451 from zhucan/nodeexpandvolume-secret-e2e
e2e: add e2e test to node expand volume with secret
2023-03-08 07:53:12 -08:00
Kubernetes Prow Robot
03ff890ef4
Merge pull request #116329 from dims/drop-aws-kubelet-credential-provider-and-cleanup-aws-storage-e2e-tests
Drop aws kubelet credential provider and cleanup aws storage e2e tests
2023-03-08 06:49:11 -08:00
Andy Goldstein
26e3dab78b garbagecollector: use contextual logging
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
2023-03-08 08:37:56 -05:00
ZhangKe10140699
a239b9986b Migrated the StatefulSet controller (within `kube-controller-manager) to use [contextual logging](https://k8s.io/docs/concepts/cluster-administration/system-logs/#contextual-logging) 2023-03-08 18:57:57 +08:00
Paco Xu
f368413d65 sync default qps of kubelet change 2023-03-08 14:04:51 +08:00
Kubernetes Prow Robot
e390791e5f
Merge pull request #116341 from bobbypage/revert-114640-handle-device-mgr-recovery
Revert "node: device-mgr: Handle recovery flow by checking if healthy devices exist"
2023-03-07 19:31:33 -08:00
Kubernetes Prow Robot
83334ccaa1
Merge pull request #116320 from wangchen615/lastminute-scheduler-fix
Address last-minute requested changes for inplace update feature testing in scheduler
2023-03-07 19:31:26 -08:00
Kubernetes Prow Robot
548e856b58
Merge pull request #116144 from dashpole/apiserver_tracing_beta_round_2
Graduate API Server tracing to beta
2023-03-07 19:31:12 -08:00
zhucan
7a87b4363b e2e: add e2e test to node expand volume with secret
Signed-off-by: zhucan <zhucan.k8s@gmail.com>
2023-03-08 10:27:50 +08:00
Jiahui Feng
feb18b3f5f implmementing type checking
with multi-type support.
2023-03-07 15:49:19 -08:00
Jiahui Feng
54283a1d38 exempt validatingadmissionpolicies/status
because admission control object does not apply to themselves.
2023-03-07 15:48:21 -08:00
Kubernetes Prow Robot
88cce39155
Merge pull request #116200 from Jefftree/e2e-openapi
Add OpenAPI V3 E2E Tests
2023-03-07 15:08:55 -08:00
David Ashpole
4014d0fbbf
graduate API Server tracing to beta 2023-03-07 21:39:39 +00:00
Antonio Ojea
7cb135a888 e2e network test for multiple protocol services on same port
The test creates a Service exposing two protocols on the same port
and a backend that replies on both protocols.

1. Test that Service with works for both protocol
2. Update Service to expose only the TCP port
3. Verify that TCP works and UDP does not work
4. Update Service to expose only the UDP port
5. Verify that TCP does not work and UDP does work

Change-Id: Ic4f3a6509e332aa5694d20dfc3b223d7063a7871
2023-03-07 21:30:39 +00:00
Jefftree
658ea4b591 Refactor aggregated apiserver e2e, add openapi e2e 2023-03-07 20:04:00 +00:00
David Porter
9c20cee504
Revert "node: device-mgr: Handle recovery flow by checking if healthy devices exist" 2023-03-07 11:50:52 -08:00
Kubernetes Prow Robot
2bac225e42
Merge pull request #116331 from aojea/revert_aojea_mistake
Revert "do not assume backend on e2e service jig"
2023-03-07 10:44:53 -08:00
Kubernetes Prow Robot
7ec3c2727b
Merge pull request #115358 from pohly/logs-performance-benchmarks
Logs performance benchmarks
2023-03-07 10:44:46 -08:00
Andrea Tosatto
cae19f9e85 Remove deprecated pod-eviction-timeout flag from controller-manager 2023-03-07 18:14:18 +00:00
kerthcet
e5c812bbe7 Remove CLI flag enable-taint-manager
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-03-07 18:11:49 +00:00
Antonio Ojea
bcc61bbb8b test connectivity for terminating pods
Test 2 scenarios:

- pod can connect to a terminating pods
- terminating pod can connect to other pods

Change-Id: Ia5dc4e7370cc055df452bf7cbaddd9901b4d229d
2023-03-07 17:48:07 +00:00
Kubernetes Prow Robot
0e9ad242bd
Merge pull request #116298 from soltysh/simplify_sset_test
Get rid of context.TODO and simplify waitForStatusCurrentReplicas
2023-03-07 09:40:45 -08:00
Kubernetes Prow Robot
37326f7cea
Merge pull request #112670 from yangjunmyfm192085/delklogV0
use contextual logging(nodeipam and nodelifecycle part)
2023-03-07 09:40:33 -08:00
Davanum Srinivas
b50d9d1e28
Cleanup vendor/
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-03-07 10:27:15 -05:00
Antonio Ojea
0826438d72 Revert "do not assume backend on e2e service jig"
This reverts commit 909a08d011.

Change-Id: Ie9ba4284ac739b3ea7eb57c44bb4551316121920
2023-03-07 15:12:22 +00:00
Patrick Ohly
5ee679b340 test/integration/logs: use stable struct for unit test
v1.Container is still changing a log which caused the test to fail each time a
new field was added. To test loading, let's better use something that is
unlikely to change. The runtimev1.VersionResponse gets logged by kubelet and
seems to be stable.
2023-03-07 16:04:32 +01:00
Patrick Ohly
eaa95b9178 test/integration/logs: benchmark using logsapi
The benchmarks and unit tests were written so that they used custom APIs for
each log format. This made them less realistic because there were subtle
differences between the benchmark and a real Kubernetes component. Now all
logging configuration is done with the official
k8s.io/component-base/logs/api/v1.

To make the different test cases more comparable, "messages/s" is now reported
instead of the generic "ns/op".
2023-03-07 16:04:32 +01:00
Patrick Ohly
10c15d7a67 test/integration/logs: replace assert.Contains
For long strings the output of assert.Contains is not very readable.
2023-03-07 16:04:32 +01:00
Patrick Ohly
a862a269b0 test/integration/logs: remove useless stats case
The same effect can be achieved with `-bench=BenchmarkEncoding/none`.
2023-03-07 16:04:32 +01:00
Patrick Ohly
97a8d72a67 test/integration/logs: update benchmark support
When trying again with recent log files from the CI job, it was found that some
JSON messages get split across multiple lines, both in container logs and in
the systemd journal:

   2022-12-21T07:09:47.914739996Z stderr F {"ts":1671606587914.691,"caller":"rest/request.go:1169","msg":"Response ...
   2022-12-21T07:09:47.914984628Z stderr F 70 72  6f 78 79 10 01 1a 13 53 ... \".|\n","v":8}

Note the different time stamp on the second line. That first line is
long (17384 bytes). This seems to happen because the data must pass through a
stream-oriented pipe and thus may get split up by the Linux kernel.

The implication is that lines must get merged whenever the JSON decoder
encounters an incomplete line. The benchmark loader now supports that. To
simplifies this, stripping the non-JSON line prefixes must be done before using
a log as test data.

The updated README explains how to do that when downloading a CI job
result. The amount of manual work gets reduced by committing symlinks under
data to the expected location under ci-kubernetes-kind-e2e-json-logging and
ignoring them when the data is not there.

Support for symlinks gets removed and path/filepath is used instead of path
because it has better Windows support.
2023-03-07 16:03:48 +01:00
Davanum Srinivas
90d185b7e1
Drop AWS kubelet credential provider and cleanup AWS storage e2e tests
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-03-07 09:00:12 -05:00
Maciej Szulik
27d5cda811
Simplify waitForStatusCurrentReplicas helper 2023-03-07 14:18:31 +01:00
Maciej Szulik
17117dc47d
Replace context.TODO with proper context 2023-03-07 14:18:29 +01:00
Antonio Ojea
4482d1c2f4 e2e support Services with multiple EndpointSlices
A Service can use multiple EndpointSlices for its backend, when
using custom Endpoint Slices, the data plane should forward traffic
to any of the endpoints in the Endpointslices that belong to the
Service.

Change-Id: I80b42522bf6ab443050697a29b94d8245943526f
2023-03-07 13:00:46 +00:00
Kubernetes Prow Robot
4eb29bcb21
Merge pull request #116040 from pbeschetnov/master
[HPA e2e] Reduce possible number of scale steps to minimize stabilization test flakiness
2023-03-07 04:20:16 -08:00
Naman Lakhwani
b6f9a65558
Migrating pkg/controller/serviceaccount to contextual logging (#114918)
* migrating pkg/controller/serviceaccount to contextual logging

Signed-off-by: Naman <namanlakhwani@gmail.com>

* small nit

Signed-off-by: Naman <namanlakhwani@gmail.com>

* capitalising first letter of error

Signed-off-by: Naman <namanlakhwani@gmail.com>

* addressed review comments

Signed-off-by: Naman <namanlakhwani@gmail.com>

* small nit to add key

Signed-off-by: Naman <namanlakhwani@gmail.com>

---------

Signed-off-by: Naman <namanlakhwani@gmail.com>
2023-03-07 04:19:59 -08:00
Naman Lakhwani
8f45b64c93
Migrated pkg/controller/replicaset to contextual logging (#114871)
* migrated controller/replicaset to contextual logging

Signed-off-by: Naman <namanlakhwani@gmail.com>

* small nits

Signed-off-by: Naman <namanlakhwani@gmail.com>

* addressed changes

Signed-off-by: Naman <namanlakhwani@gmail.com>

* small nit

Signed-off-by: Naman <namanlakhwani@gmail.com>

* taking t as input

Signed-off-by: Naman <namanlakhwani@gmail.com>

---------

Signed-off-by: Naman <namanlakhwani@gmail.com>
2023-03-07 04:19:51 -08:00
Kubernetes Prow Robot
4aaa4df840
Merge pull request #113986 from songxiao-wang87/runwxs-test2
Migrate StorageVersionGC to contextual logging
2023-03-07 04:19:43 -08:00
Kubernetes Prow Robot
471b392f43
Merge pull request #113916 from songxiao-wang87/runwxs-test1
Migrate ttl_controller to contextual logging
2023-03-07 04:18:30 -08:00
Swati Sehgal
e5ad3cbf6a node: topologymgr: update node e2e test tags
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-07 09:52:07 +00:00
Kubernetes Prow Robot
3489796d5c
Merge pull request #113428 from mengjiao-liu/contextual-logging-controller-cronjob
Update `pkg/controller/cronjob/` for contextual logging
2023-03-07 01:28:18 -08:00
JunYang
780ef3afb0 use klog.InfoS instead of klog.V(0),Info 2023-03-07 15:50:01 +08:00
Kubernetes Prow Robot
04675428bb
Merge pull request #115973 from jpbetz/enforcement-actions
KEP-3488: Implement Enforcement Actions and Audit Annotations
2023-03-06 21:56:37 -08:00
Joe Betz
c2b3871502 Add integration tests 2023-03-06 21:51:33 -05:00
Chen Wang
fd6105d015 fix last minute scheduler changes for inplace update 2023-03-06 18:47:02 -05:00
David Porter
d3214226de test: Fix node e2e shutdown test flake
Bump the timeout as the previous timeout was sometimes too short,
resulting in the pod status update not sent. Also, fixed a typo in
previous refactor.

Signed-off-by: David Porter <david@porter.me>
2023-03-06 15:38:45 -08:00
Kubernetes Prow Robot
64259b43b8
Merge pull request #116054 from jpbetz/secondary-authz
KEP-3488: Implement secondary authz for ValidatingAdmissionPolicy
2023-03-06 11:54:16 -08:00
Joe Betz
4d30c43494 Add integration tests for secondary authz 2023-03-06 12:08:53 -05:00
Kubernetes Prow Robot
d6e9cff212
Merge pull request #115838 from torredil/remove-aws
Remove AWS legacy cloud provider + EBS in-tree storage plugin
2023-03-06 08:18:29 -08:00
torredil
6aebda9b1e Remove AWS legacy cloud provider + EBS in-tree storage plugin
Signed-off-by: torredil <torredil@amazon.com>
2023-03-06 14:01:15 +00:00
Swati Sehgal
d536a342b4 node: topologymgr: GA graduation implies Feature Gate is ON by default
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 12:51:05 +00:00
Swati Sehgal
01a9148887 node: device-mgr: e2e: adapt to sample device plugin refactoring
These updates are to adapt to the sample device plugin
refactoring done here: 92e00203e0.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 12:15:59 +00:00
Swati Sehgal
bae8a164e0 node: device-mgr: e2e: address e2e test review comments
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 12:15:58 +00:00
Swati Sehgal
674879a959 node: device-mgr: e2e: Update the e2e test to reproduce issue:109595
Breakdown of the steps implemented as part of this e2e test is as follows:
1. Create a file `registration` at path `/var/lib/kubelet/device-plugins/sample/`
2. Create sample device plugin with an environment variable with
   `REGISTER_CONTROL_FILE=/var/lib/kubelet/device-plugins/sample/registration` that
    waits for a client to delete the control file.
3. Trigger plugin registeration by deleting the abovementioned directory.
4. Create a test pod requesting devices exposed by the device plugin.
5. Stop kubelet.
6. Remove pods using CRI to ensure new pods are created after kubelet restart.
7. Restart kubelet.
8. Wait for the sample device plugin pod to be running. In this case,
   the registration is not triggered.
9. Ensure that resource capacity/allocatable exported by the device plugin is zero.
10. The test pod should fail with `UnexpectedAdmissionError`
11. Delete the test pod.
12. Delete the sample device plugin pod.
13. Remove `/var/lib/kubelet/device-plugins/sample/` and its content, the directory
    created to control registration

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 12:15:58 +00:00
Swati Sehgal
db7afc1cd8 node: device-mgr: e2e: Implement End to end test
This commit reuses e2e tests implmented as part of https://github.com/kubernetes/kubernetes/pull/110729.
The commit is borrowed from the aforementioned PR as is to preserve
authorship. Subsequent commit will update the end to end test to
simulate the problem this PR is trying to solve by reproducing
the issue: 109595.

Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
Ed Bartosh
35fd124f4d DRA: fix CDI spec version
The latest CDI release includes spec version check that fails
if version is less than 0.3.0:
  https://github.com/container-orchestrated-devices/container-device-interface/blob/v0.5.4/pkg/cdi/version.go#L42

Updating CDI spec version to 0.3.0 in the test kubelet plugin code
should fix e2e test failures on the CRI runtimes that use CDI >= 0.5.4
(Containerd master atm, CRI-O soon).
2023-03-05 16:49:56 +02:00
Kubernetes Prow Robot
20c3a007f5
Merge pull request #115693 from bobbypage/shutdown_test
test: e2e node shutdown test logging improvements
2023-03-03 15:20:57 -08:00
Kubernetes Prow Robot
15c5366a1c
Merge pull request #116240 from bobbypage/devicepluginfix
test: Fix path to e2e node sample device plugin
2023-03-03 14:15:09 -08:00
Kubernetes Prow Robot
9f0b491953
Merge pull request #113270 from rrangith/fix/create-pvc-for-pending-pod
Automatically recreate PVC for pending STS pod
2023-03-03 10:24:58 -08:00
Kubernetes Prow Robot
37d8b5a2b8
Merge pull request #116227 from gnufied/wait-for-pod-startup-before-resize
Wait for pod to be running before expanding
2023-03-03 09:18:59 -08:00
David Porter
c5a1f0188b
test: Add node e2e test to verify static pod termination
Add node e2e test to verify that static pods can be started after a
previous static pod with the same config temporarily failed termination.

The scenario is:

1. Static pod is started
2. Static pod is deleted
3. Static pod termination fails (internally `syncTerminatedPod` fails)
4. At later time, pod termination should succeed
5. New static pod with the same config is (re)-added
6. New static pod is expected to start successfully

To repro this scenario, setup a pod using a NFS mount. The NFS server is
stopped which will result in volumes failing to unmount and
`syncTerminatedPod` to fail. The NFS server is later started, allowing
the volume to unmount successfully.

xref:

1. https://github.com/kubernetes/kubernetes/pull/113145#issuecomment-1289587988
2. https://github.com/kubernetes/kubernetes/pull/113065
3. https://github.com/kubernetes/kubernetes/pull/113093

Signed-off-by: David Porter <david@porter.me>
2023-03-03 10:00:48 -06:00
David Porter
1c75c2cda8
test: Add e2e to verify static pod termination
Add a node e2e to verify that if a static pod is terminated while the
container runtime or CRI returns an error, the pod is eventually
terminated successfully.

This test serves as a regression test for k8s.io/issue/113145 which
fixes an issue where force deleted pods may not be terminated if the
container runtime fails during a `syncTerminatingPod`.

To test this behavior, start a static pod, stop the container runtime,
and later start the container runtime. The static pod is expected to
eventually terminate successfully.

To start and stop the container runtime, we need to find the container
runtime systemd unit name. Introduce a util function
`findContainerRuntimeServiceName` which finds the unit name by getting
the pid of the container runtime from the existing
`ContainerRuntimeProcessName` flag passed into node e2e and using
systemd dbus `GetUnitNameByPID` function to convert the pid of the
container runtime to a unit name. Using the unit name, introduce helper
functions to start and stop the container runtime.

Signed-off-by: David Porter <david@porter.me>
2023-03-03 10:00:48 -06:00
Hemant Kumar
53585ec009 Bump the timeout for volume expansion 2023-03-02 22:36:24 -05:00
David Porter
8647c23c11 test: Fix path to e2e node sample device plugin
The existing path is incorrect (missing `sample-device-plugin`)
directory and thus causing test failures. The full path should be
`test/e2e/testing-manifests/sample-device-plugin/sample-device-plugin.yaml`.

Signed-off-by: David Porter <david@porter.me>
2023-03-02 19:22:59 -08:00
Kubernetes Prow Robot
74f0819069
Merge pull request #116152 from torredil/fix-windows-e2e-test
Add windows nodeSelector to provisioning functions
2023-03-02 11:36:56 -08:00
Kubernetes Prow Robot
ab002db788
Merge pull request #116223 from logicalhan/metric-docs
include beta metrics in documentation and update docs for metrics
2023-03-02 10:31:04 -08:00
Hemant Kumar
99fe00797d Wait for pod to be running before expanding 2023-03-02 12:57:17 -05:00
Kubernetes Prow Robot
b6d102d634
Merge pull request #116071 from yuanchen8911/symlink
Add symlink data verification to statefulset e2e
2023-03-02 05:43:07 -08:00
Kubernetes Prow Robot
78e5db0931
Merge pull request #115107 from swatisehgal/handle-device-mgr-recovery-sample-dp-changes
node: device-mgr: sample device plugin: Add support to control registration process
2023-03-02 05:42:55 -08:00
Kubernetes Prow Robot
949bee0118
Merge pull request #116189 from marosset/windows-hyperv-basic-e2e-test
Adding e2e test to verify hyperv container is running inside a VM on Windows
2023-03-01 22:27:07 -08:00
Kubernetes Prow Robot
d788d436c9
Merge pull request #115893 from mgoltzsche/go-jose-update-2.6
bump go-jose to v2.6.0
2023-03-01 20:23:06 -08:00
Kubernetes Prow Robot
59a7e34052
Merge pull request #115442 from bobbypage/unknown_pods_test
test: Add e2e node test to check for unknown pods
2023-03-01 19:08:55 -08:00
Max Goltzsche
df8fa2eab5
bump go-jose to v2.6.0
Update go-jose from v2.2.2 to v2.6.0.
This is to make the kubernetes code compatible with newer go-jose versions that have a small breaking change (`jwt.NewNumericDate()` returns a pointer).

Signed-off-by: Max Goltzsche <max.goltzsche@gmail.com>
2023-03-02 02:53:17 +01:00
Kubernetes Prow Robot
1646ed8222
Merge pull request #116057 from bobbypage/nodee2elog
test: Add log artifact for ginkgo node e2e and tune default ginkgo flags
2023-03-01 16:55:16 -08:00
Kubernetes Prow Robot
dfa03231da
Merge pull request #116110 from knabben/knabben/polling-hpc-stats
Poll for stats until Windows kubelet present it in the stats endpoint
2023-03-01 15:11:27 -08:00
Kubernetes Prow Robot
51dedff4f3
Merge pull request #115277 from pohly/klog-update
klog update
2023-03-01 15:11:16 -08:00
Mark Rossetti
ab020ee628
Adding e2e test to verify hyperv container is running inside a VM on Windows
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2023-03-01 14:08:46 -08:00
Kubernetes Prow Robot
b0c949d9dd
Merge pull request #116148 from aramase/aramase/f/ci-metrics
[KMSv2] update ci script to create cluster and gather metrics
2023-03-01 12:39:30 -08:00
Amim Knabben
3fd3a76eb9 Poll for stats until Windows kubelet present it in the stats endpoint 2023-03-01 17:17:23 -03:00
Han Kang
0199276f85 include beta metrics in documentation and update docs for metrics 2023-03-01 11:32:19 -08:00
Kubernetes Prow Robot
60eefa8066
Merge pull request #115425 from pohly/scheduler-perf-benchstat
scheduler perf: benchstat support
2023-03-01 11:19:29 -08:00
Kubernetes Prow Robot
fe671737ec
Merge pull request #116181 from pohly/dra-test-driver-update
e2e: dra test driver update
2023-03-01 10:10:39 -08:00
Patrick Ohly
961819a4d0 dependencies: update klog v2.90.1
This improves performance of the text formatting and ktesting.

Because ktesting no longer buffers messages by default, one unit
test needs to ask for that explicitly.
2023-03-01 19:03:50 +01:00
Anish Ramasekar
c52ac0d59d
[KMSv2] update ci script to create cluster and gather metrics
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
2023-03-01 18:03:37 +00:00
Patrick Ohly
74785074c6 e2e dra: update logging
When running as part of the scheduler_perf benchmark testing, we want to print
less information by default, so we should use V to limit verbosity

Pretty-printing doesn't belong into "application" code. I am moving that into
the ktesting formatting (https://github.com/kubernetes/kubernetes/pull/116180).
2023-03-01 15:02:03 +01:00
Patrick Ohly
106fce6fae e2e dra: improve goroutine handling
There is an API now to wait for informer factory goroutine termination.
While at it, an incorrect comment for mutex locking gets removed.
2023-03-01 15:00:30 +01:00
Justin SB
50a025acdb e2e: Remove dead code in tests
We were building a local pod variable that we were no longer using.

Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
2023-03-01 08:08:33 -05:00
Kubernetes Prow Robot
9ef145d3a7
Merge pull request #116127 from pacoxu/negative-grace-period
retry for negative TerminationGracePeriodSeconds update
2023-03-01 04:29:16 -08:00
Swati Sehgal
7ea35d0cd8 node: device-mgr: sample device plugin: manifest to avoid registration
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-01 10:01:34 +00:00
Swati Sehgal
2c8fc26b89 node: device-mgr: sample device plugin: control registration process
Update the sample device plugin to enable the e2e node tests (or any
other entity with full access to the node filesystem) to control the
registration process. We add a new environment variable `REGISTER_CONTROL_FILE`.
The value of this variable must be a file which prevents the plugin
to register itself while it's present. Once removed, the plugin will
go on and complete the registration. The plugin will automatically
detect the parent directory on which the file resides and detect
deletions, unblocking the registration process. If the file is specified
but unaccessible, the plugin will fail. If the file is not specified,
the registration process will progress as usual and never pause.
The plugin will need read access to the parent directory.

This feature is useful because it is not possible to control the order
in which the pods are recovered after node reboot/kubelet restart.

In this approach, the testing environment will create a directory and
then a empty file to pause the registration process of the plugin.
Once pointed to that file, the plugin will start and wait for it to
be deleted. Only after the directory has been deleted,
the plugin would proceed to registration.

This feature is used in #114640 where e2e test is implemented to
simulate scenarios where application pods requesting devices come up before
the device plugin pod on node reboot/ kubelet restart.

Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-01 10:00:52 +00:00
Paco Xu
7d8437933e retry on conflict for negative TerminationGracePeriodSeconds update 2023-03-01 12:55:58 +08:00
Kubernetes Prow Robot
93a5181871
Merge pull request #116022 from nilekhc/reference-implementation-provider
[kmsv2] feat: add kms mock plugin for e2e tests
2023-02-28 17:57:17 -08:00
Nilekh Chaudhari
43acba8084
feat: kms base64 plugin for e2e tests
Signed-off-by: Nilekh Chaudhari <1626598+nilekhc@users.noreply.github.com>
2023-03-01 00:11:17 +00:00
Kubernetes Prow Robot
9b213330f5
Merge pull request #116153 from alexzielenski/podsecurity-featuregate-re-enable
skip special features in TestPodSecurityGAOnly
2023-02-28 16:07:23 -08:00
Kubernetes Prow Robot
6e202d6fdb
Merge pull request #116116 from ahg-g/ahg-mutable-job-ga
Graduate JobMutableNodeSchedulingDirectives feature to GA
2023-02-28 14:53:52 -08:00
Kubernetes Prow Robot
0469455ff7
Merge pull request #116082 from mimowo/fix-oomkiller-test
Fix the flaky OOMKiller test by sleep at start
2023-02-28 14:53:37 -08:00
Patrick Ohly
cc4bcd1d8e scheduler_perf: report data items as benchmark results
This replaces the pretty useless us/op metric (useless because it includes
setup and teardown times) with the same values that also get stored in the JSON
file.

The main advantage is that benchstat can be used to analyze and compare
results.
2023-02-28 23:08:23 +01:00
Patrick Ohly
961129c5f1 scheduler_perf: add logging flags
This enables testing of different real production configurations (JSON
vs. text, different log levels, contextual logging).
2023-02-28 23:08:17 +01:00
Patrick Ohly
00d1459530 test/utils: extend ktesting
The upstream ktesting has to be very flexible to accommodate different ways of
using it. In Kubernetes, we can be opinionated and make certain choices, like
using klog flags, and only those.
2023-02-28 23:06:00 +01:00
Patrick Ohly
c008732948 test/integration: add StartEtcd
In contrast to EtcdMain, it can be called by individual tests or benchmarks and
each caller will get a fresh etcd instance. However, it uses the same
underlying code and the same port for all instances, so tests cannot run in
parallel.
2023-02-28 23:05:17 +01:00
Alexander Zielenski
9ef1fc543f skip special features in TestPodSecurityGAOnly
was causing some alpha/beta features to be disabled after running sometimes
2023-02-28 13:21:35 -08:00
torredil
42909af615
Add windows nodeSelector to provisioning functions
Signed-off-by: torredil <torredil@amazon.com>
2023-02-28 21:21:29 +00:00
ahg-g
2ecd24011a Graduate JobMutableNodeSchedulingDirectives feature to GA 2023-02-28 15:47:13 +00:00
Kubernetes Prow Robot
6f68a13696
Merge pull request #115961 from pohly/e2e-framework-deprecate-gomega-wrappers
e2e framework: deprecate gomega wrappers
2023-02-28 06:27:29 -08:00
Kubernetes Prow Robot
04e7021d06
Merge pull request #114625 from Divya063/feature-private-image-registry
[E2E] Add support for pulling images from private registry
2023-02-28 06:27:17 -08:00
Kubernetes Prow Robot
806b215cce
Merge pull request #115987 from yuanchen8911/cleanup
Replace closures in test packages
2023-02-28 01:47:29 -08:00
Divya Rani
a8b1e57246 add support for pulling images from private registry 2023-02-28 00:40:51 -08:00
David Porter
e001884594 test: Add some default flags ginkgo flags for node e2e
Add the following ginkgo flags for each node e2e similar to the
existing hack/ginkgo-e2e.sh script.

* --no-color, colors aren't rendered properly in prow and make examining
  the log in text editors more difficult, so let's disable them.
  `hack/ginkgo-e2e.sh` (used for kind e2e tests) also disables them
  already.
* -v, enable verbose logs. This is needed so we get more detailed info
  even when the tests pass. This is useful so we can compare successful
  runs to failed runs.

Signed-off-by: David Porter <david@porter.me>
2023-02-28 00:24:40 -08:00
David Porter
e9ecdf3534 test: Emit ginkgo log for each node e2e
When running multiple node e2e with multiple machine images, the tests
are run separately for each node. The final build log has all of the
results for each of the hosts combined together which make debugging the
log difficult. To make it easier, emit a log for each host that was run.
This log will be written to the results directory and uploaded as an
artifact in prow jobs.

Signed-off-by: David Porter <david@porter.me>
2023-02-28 00:21:34 -08:00
David Porter
0980f026c9 test: Remove tests argument from node e2e image config
This was never being used, the only config that used it was deleted in
https://github.com/kubernetes/test-infra/pull/26017 so we don't need
this anymore, so let's delete it.

Signed-off-by: David Porter <david@porter.me>
2023-02-28 00:21:03 -08:00
Kubernetes Prow Robot
b9fd1802ba
Merge pull request #102884 from vinaykul/restart-free-pod-vertical-scaling
In-place Pod Vertical Scaling feature
2023-02-27 22:53:15 -08:00
Kevin Delgado
9828f62237 turn field validation e2e tests into conformance tests 2023-02-27 14:39:21 -08:00
Kubernetes Prow Robot
12ceec47aa
Merge pull request #115977 from elmiko/fix-lb-test
remove aws from e2e loadbalancer udp conntrack tests
2023-02-27 10:26:50 -08:00
Yuan Chen
a24aef6510 Replace a function closure
Replace more closures with pointer conversion

Replace deprecated Int32Ptr to Int32
2023-02-27 09:13:36 -08:00
Roman Bednar
fbe39a8e96 skip e2e test for checking filesystem resizing on Windows
Test is currently using `stat` Linux command which won't work on Windows.
2023-02-27 10:47:02 +01:00
Roman Bednar
7484c69a16 add e2e test for snapshot restore resizing 2023-02-27 10:47:02 +01:00
Roman Bednar
8567eafa02 add anti-capability to allow resize test skipping 2023-02-27 10:47:02 +01:00
Roman Bednar
b62794c52e fix a typo 2023-02-27 10:47:02 +01:00
Kubernetes Prow Robot
015e2fa20c
Merge pull request #115953 from pohly/lint-gomega
test: fixing + linting gomega usage
2023-02-27 00:56:20 -08:00
Michal Wozniak
36eef0600d Fix the flaky OOMKiller test by sleep at start 2023-02-27 08:15:46 +01:00
Yuan Chen
b929908f3b Add symlink check to e2e statefulset app
Fix a bug
2023-02-26 18:37:58 -08:00
Khachatur Ashotyan
1033cf80df e2e: promote gRPC probe tests to conformance 2023-02-25 08:54:51 +04:00
Kubernetes Prow Robot
70fee660fb
Merge pull request #115854 from kerthcet/cleanup/apiserver-cleanup
Cleanup resources when initializing error in integration
2023-02-24 15:58:05 -08:00
Chen Wang
7db339dba2 This commit contains the following:
1. Scheduler bug-fix + scheduler-focussed E2E tests
2. Add cgroup v2 support for in-place pod resize
3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes.

Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>
2023-02-24 18:21:21 +00:00
Vinay Kulkarni
f2bd94a0de In-place Pod Vertical Scaling - core implementation
1. Core Kubelet changes to implement In-place Pod Vertical Scaling.
2. E2E tests for In-place Pod Vertical Scaling.
3. Refactor kubelet code and add missing tests (Derek's kubelet review)
4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature.
5. Fix corner-case where resize A->B->A gets ignored
6. Add cgroup v2 support to pod resize E2E test.
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources

Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>
2023-02-24 18:21:21 +00:00
Vinay Kulkarni
76962b0fa7 In-place Pod Vertical Scaling - API changes
1. Define ContainerResizePolicy and add it to Container struct.
 2. Add ResourcesAllocated and Resources fields to ContainerStatus struct.
 3. Define ResourcesResizeStatus and add it to PodStatus struct.
 4. Add InPlacePodVerticalScaling feature gate and drop disabled fields.
 5. ResizePolicy validation & defaulting and Resources mutability for CPU/Memory.
 6. Various fixes from code review feedback (originally committed on Apr 12, 2022)
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources
2023-02-24 17:18:04 +00:00
Pavel Beschetnov
e25badc5de Reduce possible number of scale steps to improve test stability 2023-02-24 13:19:05 +00:00
Kubernetes Prow Robot
35f3fc59c1
Merge pull request #115236 from danielvegamyhre/scalable-indexed-job
Support for elastic Indexed Jobs
2023-02-23 14:57:34 -08:00
Kubernetes Prow Robot
10cdaefc1f
Merge pull request #116005 from gjkim42/fix-createStaticPod
Fix createStaticPod to not use container.RestartPolicy
2023-02-23 12:33:35 -08:00
Kubernetes Prow Robot
3bf57e3ded
Merge pull request #115989 from yuanchen8911/closure
Replace a function argument in statefulset e2e framework
2023-02-23 09:52:37 -08:00
Kubernetes Prow Robot
f3bb101f54
Merge pull request #115964 from atwamahmoud/fix-scale-crd-target-ref
Update ExistsInDiscovery to ignore 404 errors in autoscaling utils framework
2023-02-23 08:47:46 -08:00
Kubernetes Prow Robot
aa98f6f4da
Merge pull request #115606 from wzshiming/fix/termination_grace_period_seconds
`pod.spec.terminationGracePeriodSeconds` is a negative then convert to 1
2023-02-23 07:35:35 -08:00
Mahmoud Atwa
5df798f22c Fix import alias issue for errors package 2023-02-23 12:22:32 +00:00
Gunju Kim
f690a0ce41
Fix createStaticPod to not use container.RestartPolicy 2023-02-23 21:18:24 +09:00
Kubernetes Prow Robot
3702411ef9
Merge pull request #115926 from ffromani/e2e-node-remove-kubevirt-device-plugin
e2e: node remove: kubevirt device plugin
2023-02-23 04:13:35 -08:00
Patrick Ohly
181fc50f8e e2e framework: deprecate gomega wrappers
All wrappers except for ExpectNoError are identical to their gomega
counterparts. The only advantage that they have is that their invocations are
shorter.

That advantage does not outweigh their disadvantages:
- cannot be used in combination with gomega.Eventually/Consistently
- not a full replacement for gomega, so we just end up using both
- don't support passing a stack offset and thus cannot be used in helper
  functions
- ginkgolinter does not work for them, so sub-optimal calls like this one
  are not reported:

     framework.ExpectEqual(len(items), 0)
     ->
     gomega.Expect(items).To(gomega.BeEmpty())
- developers try to make do with what's available in the framework, leading
  to sub-optimal checks like this:

    framework.ExpectEqual(true, strings.Contains(event.Message, expectedEventError), "Event error should indicate non-root policy caused container to not start")
    ->
    gomega.Expect(event.Message).To(gomega.ContainSubstring(expectedEventError), "Event error should indicate non-root policy caused container to not start")

So let's remove these wrappers. As a first step they get marked as deprecated.
This enables stricter
linting (https://github.com/kubernetes/kubernetes/pull/109728), once enabled,
to report new code which uses them.
2023-02-23 09:51:42 +01:00
Yuan Chen
214bb8e573 Remove a closure function in statefulset e2e 2023-02-22 19:30:13 -08:00
Daniel Vega-Myhre
c63f448451 change test names and address other comments 2023-02-23 03:25:17 +00:00
Daniel Vega-Myhre
b0b0959b92 address comments 2023-02-23 03:25:16 +00:00
Daniel Vega-Myhre
d41302312e update validation logic so completions is mutable iff completions is modified in tandem with parallelsim so completions == parallelism 2023-02-23 03:25:16 +00:00
michael mccune
3a76c95f8e remove aws from e2e loadbalancer udp conntrack tests
These tests are not working as expected on AWS due to the requirement
for the user to add an annotation to the UDP Service created.
2023-02-22 17:08:57 -05:00
Riaan Kleinhans
16a15d0c44 pending_eligible_endpoints.yaml
start work on moving pending eligible endpoints

Move endpoints from ineligible_endpoints.yaml to pending_eligible_endpoints.yaml
2023-02-23 10:30:09 +13:00
Patrick Ohly
41f23f52d0 test: fix ginkgolinter issues
All of these issues were reported by https://github.com/nunnatsa/ginkgolinter.
Fixing these issues is useful (several expressions get simpler, using
framework.ExpectNoError is better because it has additional support for
failures) and a necessary step for enabling that linter in our golangci-lint
invocation.
2023-02-22 19:36:05 +01:00
Mahmoud Atwa
13d25acdfa Remove version check & 403 ignore 2023-02-22 16:03:18 +00:00
cc
d49bff855f
refine: the server-side http Request Body is always non-nil (#115908)
* refine: the server-side http Request Body is always non-nil

* revert changes under vendor

* Update staging/src/k8s.io/pod-security-admission/cmd/webhook/server/server.go

Co-authored-by: Jordan Liggitt <jordan@liggitt.net>

* Update main.go

---------

Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
2023-02-22 07:18:09 -08:00
Mahmoud Atwa
e4c25afa59 Remove unneeded edit in error formatting 2023-02-22 13:51:01 +00:00
Mahmoud Atwa
b1980b2f6b Update ExistsInDiscovery to ignore 404 & 403 errors in autoscaling utils framework 2023-02-22 13:41:46 +00:00
Francesco Romani
00b41334bf e2e: node: podresources: fix restart wait
Fix the waiting logic in the e2e test loop to wait
for resources to be reported again instead of making logic on the
timestamp. The idea is that waiting for resource availability
is the canonical way clients should observe the desired state,
and it should also be more robust than comparing timestamps,
especially on CI environments.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-02-22 14:04:55 +01:00
Francesco Romani
92e00203e0 e2e: node: unify sample device plugin utilities
Start to consolidate the sample device plugin utility
and constants in a central place, because we need
to use it in different e2e tests.

Having a central dependency is better than a maze of
entangled e2e tests depending on each other helpers.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-02-22 14:04:55 +01:00
Francesco Romani
871201ba64 e2e: node: remove kubevirt device plugin
The podresources e2e tests want to exercise the case on which a device
plugin doesn't report topology affinity. The only known device plugin
which had this requirement and didn't depend on specialized hardware
was the kubevirt device plugin, which was however deprecated after
we started using it.

So the e2e tests are now broken, and in any case they can't depend on
unmaintained and liable to be obsolete code.

To unblock the state and preserve some e2e signal, we switch to the
sample device plugin, which is a stub implementation and which is
managed in-tree, so we can maintain it and ensure it fits the e2e test
usecase.

This is however a regression, because e2e tests should try their hardest
to use real devices and avoid any mocking or faking.
The upside is that using a OS-neutral device plugin for the tests enables
us to run on all the supported platform (windows!) so this could allow
us to transition these tests to conformance.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-02-22 14:04:22 +01:00
Francesco Romani
aa1a0385e2 e2e: node: podresources: internal cleanup
rename getPodResources for clarity. Allow to return error
(and not use ginkgo expectations), so it can actually be used
as intended inside `Eventually` blocks without blow up at the
first failure.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-02-22 13:49:13 +01:00
Shiming Zhang
b70d1377bb Add test for Pod TerminationGracePeriodSeconds is negative 2023-02-22 13:36:16 +08:00
Shang Ding
177905302e add integration tests for debug profiles general & baseline 2023-02-21 17:43:03 -06:00
Anish Ramasekar
c9b8ad6a55
[KMSv2] restructure kms staging dir
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
2023-02-21 22:40:25 +00:00
Kubernetes Prow Robot
06b6644fcf
Merge pull request #115815 from Huang-Wei/pod-scheduling-readiness-beta
Graduate PodSchedulingReadiness to beta
2023-02-21 14:24:32 -08:00
Kubernetes Prow Robot
edea44c82e
Merge pull request #113205 from mimowo/oomkiller-e2e-node-test
Add e2e_node test for oom killed container reason
2023-02-21 14:23:55 -08:00
Kubernetes Prow Robot
487c443239
Merge pull request #115710 from pohly/e2e-import-restrictions
e2e framework: revise import restrictions
2023-02-20 17:17:48 -08:00
Kubernetes Prow Robot
b6582ffcd5
Merge pull request #115863 from jsafrane/remove-vsphere-test-global
Remove global vSphere framework variable
2023-02-20 11:09:48 -08:00
cpanato
a2c5863adc
update distroless iptables to v0.2.1
Signed-off-by: cpanato <ctadeu@gmail.com>
2023-02-20 13:44:09 +01:00
Jan Safranek
ba099644b2 Remove global framework variable
`f framework.Framework` does not need to be global, it's used only on a few
places.

This fixes vSphereDriver.PrepareTest() in in_tree.go that schedules
ginkgo.DeferCleanup() that uses the global `f` variable, but its value is not
valid at the time of ginkgo cleanup.
2023-02-20 11:00:12 +01:00
Michal Wozniak
fd28f69ca4 Add e2e_node test for oom killed container reason 2023-02-20 08:15:45 +01:00
Arda Güçlü
6c346e6cc9 Re-enable label selector 2023-02-20 09:10:51 +03:00
Arda Güçlü
6e8a1beda7 Add integration test for diff --prune --selector
This PR adds new integration tests for `kubectl diff --prune -l` to
catch possible regressions in the future.
2023-02-20 09:10:50 +03:00
Kubernetes Prow Robot
d6fe718e19
Merge pull request #115800 from shogohida/switch-image-in-grpc-probe-tests-to-agnhost
Switch image in gRPC probe tests to agnhost
2023-02-18 10:19:36 -08:00
Shogo Hida
7cbf007e47 Fix port number
Signed-off-by: Shogo Hida <shogo.hida@gmail.com>
2023-02-18 20:44:10 +09:00
Kubernetes Prow Robot
70b2e4aa3e
Merge pull request #113312 from jiahuif-forks/feature/cel/builtins
OpenAPI-based CEL type library
2023-02-18 00:31:36 -08:00
Wei Huang
72863f65d6
Graduate PodSchedulingReadiness to beta 2023-02-17 18:45:20 -08:00
Shogo Hida
26f95f475a Fix arguments
Signed-off-by: Shogo Hida <shogo.hida@gmail.com>
2023-02-18 09:51:17 +09:00
Kante Yin
ad55d0cbc9 Use context instead when cleaning up
Signed-off-by: Kante Yin <kerthcet@gmail.com>
2023-02-17 17:13:35 +08:00
Kante Yin
014be8444a Make sure resoruces will be cleaned up when initializing error
Signed-off-by: Kante Yin <kerthcet@gmail.com>
2023-02-17 17:10:38 +08:00
Davanum Srinivas
4ecb4670cc
Remove unnecessary ETCD_UNSUPPORTED_ARCH for arm64
we should only use this env var for `arm`, since `arm64` is fully
supported by etcd folks, let us drop this!

(ex - https://github.com/etcd-io/etcd/releases/tag/v3.5.6)

ppc64le comment should be dropped as well

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-02-16 21:29:13 -05:00
Kubernetes Prow Robot
5c09c9de29
Merge pull request #115828 from cpanato/go1201
[go] Bump images, dependencies and versions to go 1.20.1
2023-02-16 09:55:56 -08:00
Kubernetes Prow Robot
d004f324d3
Merge pull request #114447 from yulng/elseyw
fix:Optimize code for else logic
2023-02-16 08:33:40 -08:00
cpanato
65230338ad
[go] Bump images, dependencies and versions to go 1.20.1
Signed-off-by: cpanato <ctadeu@gmail.com>
2023-02-16 13:38:32 +01:00
Kubernetes Prow Robot
1e84987bac
Merge pull request #115799 from pohly/test-util-data-race
test/utils: avoid data race during parallel create
2023-02-16 01:53:38 -08:00
Patrick Ohly
501a7678b3 test/utils: avoid data race during parallel create
The client-go Create call writes into the object that it gets passed. Each call
therefore needs its own copy when invoked in parallel.

Seen in

   go test -v -timeout=0 -bench=.*/SchedulingBasic/5000Nodes -race ./test/integration/scheduler_perf

WARNING: DATA RACE
Read at 0x00c003fa5b00 by goroutine 45227:
  k8s.io/apimachinery/pkg/apis/meta/v1.(*TypeMeta).GroupVersionKind()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/meta.go:126 +0x84
  k8s.io/apimachinery/pkg/runtime.WithVersionEncoder.Encode()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/helper.go:231 +0x176
  k8s.io/apimachinery/pkg/runtime.(*WithVersionEncoder).Encode()
      <autogenerated>:1 +0xfb
  k8s.io/apimachinery/pkg/runtime.Encode()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/codec.go:50 +0xb3
  k8s.io/client-go/rest.(*Request).Body()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/rest/request.go:469 +0x884
  k8s.io/client-go/kubernetes/typed/core/v1.(*pods).Create()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/kubernetes/typed/core/v1/pod.go:126 +0x264
  k8s.io/kubernetes/test/utils.CreatePodWithRetries.func1()
      /nvme/gopath/src/k8s.io/kubernetes/test/utils/create_resources.go:61 +0x111
  k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:222 +0x30
  k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:262 +0x7b
  k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:255 +0x5c
  k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:431 +0x67
  k8s.io/kubernetes/test/utils.RetryWithExponentialBackOff()
      /nvme/gopath/src/k8s.io/kubernetes/test/utils/create_resources.go:53 +0x1be
  k8s.io/kubernetes/test/utils.CreatePodWithRetries()
      /nvme/gopath/src/k8s.io/kubernetes/test/utils/create_resources.go:70 +0x1bf
  k8s.io/kubernetes/test/utils.makeCreatePod()
      /nvme/gopath/src/k8s.io/kubernetes/test/utils/runners.go:1339 +0x68
  k8s.io/kubernetes/test/utils.CreatePod.func1()
      /nvme/gopath/src/k8s.io/kubernetes/test/utils/runners.go:1349 +0xab
  k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue/parallelizer.go:90 +0x1c1

Previous write at 0x00c003fa5b00 by goroutine 45250:
  k8s.io/apimachinery/pkg/apis/meta/v1.(*TypeMeta).SetGroupVersionKind()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/meta.go:121 +0x1cc
  k8s.io/apimachinery/pkg/runtime.WithVersionEncoder.Encode()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/helper.go:241 +0x408
  k8s.io/apimachinery/pkg/runtime.(*WithVersionEncoder).Encode()
      <autogenerated>:1 +0xfb
  k8s.io/apimachinery/pkg/runtime.Encode()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/codec.go:50 +0xb3
  k8s.io/client-go/rest.(*Request).Body()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/rest/request.go:469 +0x884
  k8s.io/client-go/kubernetes/typed/core/v1.(*pods).Create()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/kubernetes/typed/core/v1/pod.go:126 +0x264
  k8s.io/kubernetes/test/utils.CreatePodWithRetries.func1()
      /nvme/gopath/src/k8s.io/kubernetes/test/utils/create_resources.go:61 +0x111
  k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:222 +0x30
  k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:262 +0x7b
  k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:255 +0x5c
  k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:431 +0x67
  k8s.io/kubernetes/test/utils.RetryWithExponentialBackOff()
      /nvme/gopath/src/k8s.io/kubernetes/test/utils/create_resources.go:53 +0x1be
  k8s.io/kubernetes/test/utils.CreatePodWithRetries()
      /nvme/gopath/src/k8s.io/kubernetes/test/utils/create_resources.go:70 +0x1bf
  k8s.io/kubernetes/test/utils.makeCreatePod()
      /nvme/gopath/src/k8s.io/kubernetes/test/utils/runners.go:1339 +0x68
  k8s.io/kubernetes/test/utils.CreatePod.func1()
      /nvme/gopath/src/k8s.io/kubernetes/test/utils/runners.go:1349 +0xab
  k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
      /nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue/parallelizer.go:90 +0x1c1
2023-02-16 08:44:42 +01:00
Kubernetes Prow Robot
908803081f
Merge pull request #115811 from danwinship/e2e-userspace-cleanup
Remove checks for userspace proxy mode in e2e tests
2023-02-15 18:03:49 -08:00
Kubernetes Prow Robot
292450717c
Merge pull request #115394 from ritazh/kmsv2-metrics
kmsv2: add metrics
2023-02-15 18:03:37 -08:00
Kubernetes Prow Robot
a25834cb5a
Merge pull request #115802 from logicalhan/webhook-metrics
webhook metrics top out at 2.5s but default timeout is 10s
2023-02-15 15:29:11 -08:00
Rita Zhang
bd0f7f8ee8
kmsv2: add metrics
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2023-02-15 15:08:24 -08:00
Dan Winship
9283429f22 Remove checks for userspace proxy mode in e2e tests
It's gone
2023-02-15 16:30:58 -05:00
Han Kang
7b823002f3 add 25s bucket 2023-02-15 10:31:12 -08:00
Kubernetes Prow Robot
fa3d5730a4
Merge pull request #115797 from pohly/dra-test-driver-resource-limit-fix
e2e dra: fix resource limits in a mixed cluster
2023-02-15 09:58:33 -08:00
Han Kang
20b5205dad use 10 seconds as the biggest bucket for webhook metrics otherwise charts will top out at 2.5s for webhook latencies 2023-02-15 09:17:41 -08:00
Shogo Hida
58ae449604 Change etcd to agnhost
Signed-off-by: Shogo Hida <shogo.hida@gmail.com>
2023-02-16 00:49:28 +09:00
Kubernetes Prow Robot
e18fa74551
Merge pull request #115590 from swatisehgal/topology-mgr-duration-metrics
node: topology-mgr: Add metric to measure topology manager admission latency
2023-02-15 07:12:25 -08:00
Patrick Ohly
20d7fa2771 e2e dra: fix resource limits in a mixed cluster
The check for "resources available on a node" must treat nodes that are not
listed as "no resources available". The previous logic only worked because all
nodes were listed during E2E testing. The upcoming integration testing is
covering additional scenarios and triggered this broken case.
2023-02-15 15:12:19 +01:00
Swati Sehgal
cf21dcef51 node: topology-mgr: e2e: changes to validate admission latency metrics
The component was previously incorrect. This patch updates to
the correct component name.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-02-15 13:59:56 +00:00
Kubernetes Prow Robot
390ddafe9e
Merge pull request #114494 from chrishenzie/readwriteoncepod-beta
Graduate ReadWriteOncePod to beta, updated e2e test
2023-02-14 16:35:42 -08:00
Kubernetes Prow Robot
731238fb41
Merge pull request #115739 from ii/update-ineligible-yaml-debug-endpoints
Update ineligible endpoints yaml to include debug endpoints
2023-02-14 12:46:02 -08:00
Chris Henzie
f855c90c1e chore: Update hostpath driver to v1.11.0
This version enforces the new SINGLE_NODE_SINGLE_WRITER CSI access mode
in NodePublishVolume.

See for more details:
https://github.com/kubernetes-csi/csi-driver-host-path/pull/381
2023-02-14 10:09:58 -08:00
Chris Henzie
0e47d90dd1 test: e2e test for ReadWriteOncePod preemption 2023-02-14 10:09:57 -08:00
Kubernetes Prow Robot
4cf352c4bb
Merge pull request #115456 from pohly/goroutine-leak-check
test/integration: goroutine leak check
2023-02-14 08:31:31 -08:00
Andy Goldstein
71ec5ed81d
resourcequota: use contexual logging (#113315)
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
2023-02-14 07:19:31 -08:00
Patrick Ohly
f131cabfa0 test: use go-uber/goleak for strict leak checking
It provides more readable output and has additional APIs for using it inside a
unit test. goleak.IgnoreCurrent is needed to filter out the goroutine that gets
started when importing go.opencensus.io/stats/view.

In order to handle background goroutines that get created on demand and cannot
be stopped (like the one for LogzHealth), a helper function ensures that those
are running before calling goleak.IgnoreCurrent. Keeping those goroutines
running is not a problem and thus not worth the effort of adding new APIs to
stop them.

Other goroutines are genuine leaks for which no fix is available. Those get
suppressed via IgnoreTopFunction, which works as long as that function
is unique enough.

Example output for the leak fixed in https://github.com/kubernetes/kubernetes/pull/115423:

    E0202 09:30:51.641841   74789 etcd.go:205] "EtcdMain goroutine check" err=<
        found unexpected goroutines:
        [Goroutine 4889 in state chan receive, with k8s.io/apimachinery/pkg/watch.(*Broadcaster).loop on top of the stack:
        goroutine 4889 [chan receive]:
        k8s.io/apimachinery/pkg/watch.(*Broadcaster).loop(0xc0076183c0)
        	/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/watch/mux.go:268 +0x65
        created by k8s.io/apimachinery/pkg/watch.NewBroadcaster
        	/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/watch/mux.go:77 +0x116
    >
2023-02-14 12:11:37 +01:00
Kubernetes Prow Robot
8b2545efa3
Merge pull request #115730 from ravisantoshgudimetla/remove-cgo
Remove cgo dependency
2023-02-13 12:49:38 -08:00
Stephen Heywood
4d2611cf58 Update ineligible endpoints yaml
Adding the following endpoints
- connectCoreV1GetNamespacedPodPortforward
- connectCoreV1GetNamespacedPodAttach
- connectCoreV1PostNamespacedPodAttach
2023-02-14 09:00:44 +13:00
Kubernetes Prow Robot
b8b18ecd85
Merge pull request #114051 from chrishenzie/rwop-preemption
[scheduler] Support preemption of pods using ReadWriteOncePod PVCs
2023-02-13 11:45:30 -08:00
ravisantoshgudimetla
d65262d1f9 Remove cgo dependency 2023-02-13 11:16:39 -05:00
Kubernetes Prow Robot
bf79066749
Merge pull request #115714 from aramase/aramase/f/kubernetes#115595
[KMSv2] Add kind cluster and encryption config for e2e
2023-02-13 05:43:42 -08:00
Kubernetes Prow Robot
4933005b38
Merge pull request #115697 from aojea/lbds
don't run loadbalancer tests on large environments
2023-02-13 05:43:30 -08:00
Kubernetes Prow Robot
8ee0d3b6e8
Merge pull request #115584 from pbeschetnov/master
[HPA e2e] Calculate more precise consumed CPU usage for N replicas
2023-02-13 03:27:29 -08:00
Anish Ramasekar
4e6d5dddfb
[KMSv2] Add kind cluster and encryption config for e2e
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
2023-02-13 06:42:54 +00:00
Patrick Ohly
3e760310b2 e2e: revise import restrictions
- test/e2e/framework/*.go should have very minimal dependencies.
  We can enforce that via import-boss.

- What each test/e2e/framework/* sub-package uses is less relevant,
  although ideally it also should be as minimal as possible in each case.

Enforcing this via import-boss ensures that new dependencies get flagged as
problem and thus will get additional scrutiny. It might be okay to add them,
but it needs to be considered.
2023-02-12 14:56:45 +01:00
Antonio Ojea
244d7449ce don't run loadbalancer tests on large environments
Change-Id: Id987e9469e563c0837c6437a44a65889cec2e202
2023-02-11 10:28:25 +00:00
David Porter
826472c99d test: e2e node shutdown test logging improvements
Since the pod names are reused across the test, searching the logs is
currently difficult.

Use a uuid for each pod name to make grepping the logs easier. Also,
always include the pod name and pod namespace in any logs or error
messages to make debugging easier.

Signed-off-by: David Porter <david@porter.me>
2023-02-10 16:54:31 -08:00
Kubernetes Prow Robot
0424a530a4
Merge pull request #115678 from pohly/e2e-full-reports
e2e: revise complete report creation
2023-02-10 15:07:29 -08:00
Kubernetes Prow Robot
1749bb2991
Merge pull request #115579 from ardaguclu/fix-wait-sh-timeout
flaky test wait.sh: Add deployment assertion before running wait
2023-02-10 13:59:29 -08:00
Patrick Ohly
3e2b26ce52 e2e: revise complete report creation
The previous approach was based on the observation that some Prow jobs use the
--report-dir parameter instead of the E2E_REPORT_DIR env variable. Parsing the
command line was necessary to use the --json-report and --junit-report
parameters.

But that is complex and can be avoided by triggering the creation of complete
reports in the E2E test suite. The paths are hard-coded and relative to the
report directory to keep the code simple.

There was a report that k8s-triage started processing more data after
6db4b741dd was merged. It's unclear whether
that was because of the new <report-dir>/ginkgo_report.xml file. To avoid
this potential problem, the reports are now in a "ginkgo" sub-directory.

While at it, error checking gets enhanced:
- Create directories at the start of
  the suite and bail out early if that fails.
- *All* e2e suites using the framework do this, not just test/e2e.
- Added missing error checking of truncated JUnit report writing.
2023-02-10 10:20:20 +01:00
Arda Güçlü
e0fedec69d (kubectl debug): Support debugging via files
Currently `kubectl debug` only supports passing names in command line.
However, users might want to pass resources in files by passing `-f` flag like
in all other kubectl commands.

This PR adds this ability.
2023-02-10 10:21:30 +03:00
Kubernetes Prow Robot
b2f8c8f00d
Merge pull request #115635 from bobbypage/npd-time-fix
test: Simplify NPD start timestamp calculation
2023-02-09 18:37:31 -08:00
Kubernetes Prow Robot
0698d9eb82
Merge pull request #115649 from aramase/grpc-metrics
[KMSv2] Add metrics for grpc service
2023-02-09 15:50:45 -08:00
Kubernetes Prow Robot
6e2e61bb3c
Merge pull request #115657 from saschagrunert/inject-base64
Allow SSH e2e node base64 key injection
2023-02-09 14:45:06 -08:00
Kubernetes Prow Robot
95c65ca3a0
Merge pull request #115454 from dgrisonnet/promote-pod-resource-metrics
Promote pod resource metrics to stable
2023-02-09 12:36:16 -08:00
Damien Grisonnet
49da8a1d4a scheduler: promote pod resource metrics to stable
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2023-02-09 20:30:45 +01:00
Anish Ramasekar
de3b2d525b
[KMSv2] Add metrics for grpc service
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
2023-02-09 18:51:37 +00:00
Sascha Grunert
85106dc327
Allow SSH e2e node base64 key injection
With the change of the CRI-O jobs to use butane, we now have a
verification for base64 data urls in place. This means that the
following URL is invalid:

```
data:text/plain;base64,GCE_SSH_PUBLIC_KEY_FILE_CONTENT
```

This means we have to pass valid base64 to the URL. To fix that, we now
allow to inject SSH key values with both, the
`GCE_SSH_PUBLIC_KEY_FILE_CONTENT` field and its base64 encoded variant.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2023-02-09 16:17:11 +01:00
vaibhav2107
6ab8a8fbec Updated the change in registry 2023-02-09 09:37:44 +05:30
David Porter
7fe371a974 test: Simplify NPD start timestamp calculation
The NPD test checks when NPD started to determine if it is needed to
check the kubelet start event. The current logic requires parsing the
journalctl logs which is quite fragile and is broken now because of
systemd changing the expected log format.

Newer versions of systemd do not print "end at" or "logs begin at" and
instead may print "No entries", which will result in the test panicking.

```
$ journalctl -u foo.service
-- No entries --
```

For units started, it will not print "end at" or "logs begin at":

```
root@ubuntu-jammy:~# journalctl -u foo.service
Feb 08 22:02:19 ubuntu-jammy systemd[1]: Started /usr/bin/sleep 1s.
Feb 08 22:02:20 ubuntu-jammy systemd[1]: foo.service: Deactivated successfully.
```

To avoid relying on log parsing which is fragile, let's instead directly
ask systemd when the NPD service started and parse the resulting
timestamp.

Signed-off-by: David Porter <david@porter.me>
2023-02-08 13:58:45 -08:00
Kubernetes Prow Robot
468ce59183
Merge pull request #115557 from MikeSpreitzer/cleanup-path-hack
Simplify construction of /metrics request
2023-02-08 09:28:58 -08:00
Matthew Cary
69808b74ec Remove obsolete GKE local SSD test
Change-Id: I156bd03ac740c2ebe394081d3106851f7182269f
2023-02-07 17:33:32 -08:00
Riaan Kleinhans
999e9f14f7
remove conformance tested endpoints 2023-02-08 11:55:44 +13:00
Kubernetes Prow Robot
090025f5e6
Merge pull request #115548 from pohly/e2e-wait-for-pods-with-gomega
e2e: wait for pods with gomega, II
2023-02-07 07:01:21 -08:00
Kubernetes Prow Robot
22b88dea36
Merge pull request #115315 from enj/enj/i/kas_kubelet_conn_close
kubelet/client: collapse transport wiring onto standard approach
2023-02-07 07:01:14 -08:00
Kubernetes Prow Robot
4f321041bd
Merge pull request #115537 from MadhavJivrajani/bump-tools-deps-go120
*: Bump golangci-lint version and adapt to new linters
2023-02-07 05:53:12 -08:00
Pavel Beschetnov
456de495ef Calculate more precise usage for replicas 2023-02-07 12:41:36 +00:00
Arda Güçlü
60401d35d1 flaky test wait.sh: Add deployment assertion before running wait
There is a test in wait.sh integration suite which is checking the
given timeout value(passed by user) is equal to actual happened timeout value.

However, this test rarely gets `no matching resources found` error and
causes flakyiness. The reason is we are running wait command, immediately
after applying deployment. In reality, timeout test does not care about
deployment, since it is testing the timeout by passing invalid configurations.
But we need this deployment to not get `no matching resources found` error.

That's why, this PR adds deployment assertion before executing wait command.
2023-02-07 14:19:34 +03:00
Madhav Jivrajani
5e1f440d0a *: Fix linter warnings
Adapt to newly improved linters in golangci-lint v1.51.1

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
2023-02-07 13:01:41 +05:30
Kubernetes Prow Robot
e944fc28ca
Merge pull request #115443 from torredil/master
Add windows nodeSelector to e2e storage testing pods
2023-02-06 18:27:09 -08:00
Monis Khan
754cb3d601
kubelet/client: collapse transport wiring onto standard approach
Signed-off-by: Monis Khan <mok@microsoft.com>
2023-02-06 20:34:49 -05:00
Mike Spreitzer
e9973979d0 Simplify construction of /metrics request 2023-02-06 16:20:34 -05:00
torredil
25389ee0ee
Add nodeSelector to e2e storage testing pods
Signed-off-by: torredil <torredil@amazon.com>
2023-02-06 16:00:51 +00:00
Patrick Ohly
136f89dfc5 e2e: use error wrapping with %w
The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure and
then will add another useless "unexpected error" prefix and will not dump the
additional failure information (currently the backtrace inside the E2E
framework).

Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with

    sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)

This may be unnecessary in some cases, but it's not wrong.
2023-02-06 15:39:13 +01:00
Patrick Ohly
9878e735dd e2e pod: unit test for pod status + API error
This covers new behavior in gomega.
2023-02-06 15:39:13 +01:00
Patrick Ohly
1bd1167d56 e2e pod: remove dead code 2023-02-06 15:39:13 +01:00
Patrick Ohly
3bb735e6fa e2e pod: use gomega.Eventually in WaitForRestartablePods 2023-02-06 15:39:13 +01:00
Patrick Ohly
1e346c4e4a e2e pod: convert ProxyResponseChecker into matcher
Instead of pod responses being printed to the log each time polling fails, we
get a consolidated failure message with all unexpected pod responses if (and
only if) the check times out or a progress report gets produced.
2023-02-06 15:39:13 +01:00
Patrick Ohly
c3266cde77 e2e: consolidate pod response checking
This renames PodsResponding to WaitForPodsResponding for the sake of
consistency and adds a timeout parameter. That is necessary because some other
users of NewProxyResponseChecker used a much lower timeout (2min vs. 15min).

Besides simplifying some code, it also makes it easier to rewrite
ProxyResponseChecker because it only gets used in WaitForPodsResponding.
2023-02-06 15:39:13 +01:00
Patrick Ohly
89a5d6d8af e2e pod: use gomega.Eventually in WaitForPodNotFoundInNamespace 2023-02-06 15:39:13 +01:00
Patrick Ohly
9df3e2a47a e2e: replace WaitForPodToDisappear with WaitForPodNotFoundInNamespace
WaitForPodToDisappear was always called such that it listed all pods, which
made it less efficient than trying to get just the one pod it was checking for.

Being able to customize the poll interval in practice wasn't useful, therefore
it can be replaced with WaitForPodNotFoundInNamespace.
2023-02-06 15:39:12 +01:00
Patrick Ohly
45d4631069 e2e: consolidate checking a pod list
WaitForPods is now a generic function which lists pods and then checks the pods
that it found against some provided condition. A parameter determines how many
pods must be found resp. match the condition for the check to succeed.
2023-02-06 15:39:12 +01:00
Patrick Ohly
d8428c6fb1 e2e pod: use gomega.Eventually in WaitTimeoutForPodReadyInNamespace/WaitForPodCondition
These get converted together because they relied on FinalErr which now isn't
needed anymore.
2023-02-06 15:39:12 +01:00
Patrick Ohly
01a40d9d6b e2e framework: support getting list of objects
This is similar to the previous support for getting a single object.
2023-02-06 15:39:12 +01:00
Patrick Ohly
3dd185aa40 e2e pod: use gomega.Eventually in WaitForPodsRunningReady
The code becomes simpler (78 insertions, 91 deletions), easier to read (all
code entirely inside WaitForPodsRunningReady, no need to declare and later
overwrite variables) and possibly more correct (if all API calls failed,
the resulting error was ignored when allowedNotReadyPods > 0).
2023-02-06 15:39:12 +01:00
Patrick Ohly
afbb2c5323 e2e framework: turn function into gomega.Matcher
The intention is to use this inside a helper function where the
corresponding Expect call is known.
2023-02-06 15:39:12 +01:00
Patrick Ohly
4d63e7d4d6 e2e: remove unused label filter from WaitForPodsRunningReady
None of the users of the functions passed anything other than nil or an empty
map and the implementation ignore the parameter - it seems like a candidate for
simplification.
2023-02-06 15:39:12 +01:00
Patrick Ohly
8181f97ecc e2e framework: include additional stack backtrace in failures
When a Gomega failure is converted to an error, the stack at the time when the
failure occurs may be useful: error wrapping provides some bread crumbs that
can be followed to determine where the failure really occurred, but error
wrapping may be missing or ambiguous.

To provide the additional information, a FailureError now includes a full stack
backtrace. The backtrace intentionally makes no attempt to exclude framework
functions besides the gomega support itself because helpers like
e2e/framework/pod may be relevant.

That backtrace is not included in the failure message for the sake of
brevity. Instead, it gets logged as part of the test's output.
2023-02-06 15:39:12 +01:00
Patrick Ohly
005a9da0cc e2e framework: implement pod polling with gomega.Eventually
gomega.Eventually provides better progress reports: instead of filling up the
log with rather useless one-line messages that are not enough to to understand
the current state, it integrates with Gingko's progress reporting (SIGUSR1,
--poll-progress-after) and then dumps the same complete failure message as
after a timeout. That makes it possible to understand why progress isn't
getting made without having to wait for the timeout.

The other advantage is that the failure message for some unexpected pod state
becomes more readable: instead of encapsulating it as "observed object" inside
an error, it directly gets rendered by gomega.
2023-02-06 15:39:12 +01:00
Patrick Ohly
71dc81ec89 e2e framework: gomega assertions as errors
Calling gomega.Expect/Eventually/Consistently deep inside a helper call chain
has several challenges:
- the stack offset must be tracked correctly, otherwise the callstack
  for the failure starts at some helper code, which is often not informative
- augmenting the failure message with additional information from each
  caller implies that each caller must pass down a string and/or format
  string plus arguments

Both challenges can be solved by returning errors:
- the stacktrace is taken at that level where the error is
  treated as a failure instead of passing back an error, i.e.
  inside the It callback
- traditional error wrapping can add additional information, if
  desirable

What was missing was some easy way to generate an error via a gomega
assertion. The new infrastructure achieves that by mirroring the
Gomega/Assertion/AsyncAssertion interfaces with errors as return values instead
of calling a fail handler.

It is intentionally less flexible than the gomega APIs:
- A context must be passed to Eventually/Consistently as first
  parameter because that is needed for proper timeout handling.
- No additional text can be added to the failure through this
  API because error wrapping is meant to be used for this.
- No need to adjust the callstack offset because no backtrace
  is recorded when a failure occurs.

To avoid the useless "unexpected error" log message when passing back a gomega
failure, ExpectNoError gets extended to recognize such errors and then skips
the logging.
2023-02-06 15:39:12 +01:00
Patrick Ohly
d17ce64ac5 e2e storage: remove WaitForPodTerminatedInNamespace
Calling WaitForPodTerminatedInNamespace after testFlexVolume is useless because
the client pod that it waits for always gets deleted by testVolumeClient:

0fcc3dbd55/test/e2e/framework/volume/fixtures.go (L541-L546)

Worse, because WaitForPodTerminatedInNamespace treats "not found" as "must keep
polling", these two tests always kept waiting for 5 minutes:

    Kubernetes e2e suite: [It] [sig-storage] Flexvolumes should be mountable
    when non-attachable 	6m4s

The only reason why these tests passed is that WaitForPodTerminatedInNamespace
used to return the "not found" API error. That is not guaranteed and about to
change.
2023-02-06 15:39:12 +01:00
Antonio Ojea
7f5ae1c0c1
Revert "e2e: wait for pods with gomega" 2023-02-06 12:08:22 +01:00
Kubernetes Prow Robot
cf14b50b0d
Merge pull request #114502 from cpanato/go1.20
[go] Bump images, dependencies and versions to go 1.20
2023-02-04 13:40:28 -08:00
Kubernetes Prow Robot
85aa0057c6
Merge pull request #113298 from pohly/e2e-wait-for-pods-with-gomega
e2e: wait for pods with gomega
2023-02-04 05:26:29 -08:00
David Porter
039a848274 test: Add e2e node test to check for unknown pods
Unknown pods are pods which are unknown pods to the kubelet, but are still
running in the container runtime. If kubelet detects a pod which is not in
the config (i.e. not present in API-server or static pod), but running as
detected in container runtime, kubelet should aggressively terminate the pod.

This situation can be encountered if a pod is running, then kubelet is
stopped, and while stopped, the manifest is deleted (by force deleting the
API pod or deleting the static pod manifest), and then restarting the
kubelet. Upon restart, kubelet will see the pod as running via the container
runtime, but it will not be present in the config, thus making the pod a
"unknown pod". Kubelet should then proceed to terminate these unknown pods.

Add two tests that ensure that unknown pods will be terminated (1)
static pods and (2) API pods. The test will start a pod, stop the
kubelet, force delete the pod (by deleting the manifest or force
deleting the pod), and then restarting the kubelet. The container
runtime is then queried to ensure the containers are terminated by
kubelet.

Signed-off-by: David Porter <david@porter.me>
2023-02-03 23:04:45 -08:00
David Porter
c2923c472d test: Move waitForAllContainerRemoval() into node e2e util
This is used across multiple tests, so let's move into the util file.

Also, refactor it a bit to provide a better error message in case of a
failure.

Signed-off-by: David Porter <david@porter.me>
2023-02-03 23:04:35 -08:00
cpanato
b9ddf07a75
[go] Bump images, dependencies and versions to go 1.20
Signed-off-by: cpanato <ctadeu@gmail.com>
2023-02-03 22:55:24 +01:00
pwschuurman
7bf175d5a2
Add integration tests for StatefulSetStartOrdinal feature (#115466)
* Add integration tests for StatefulSetStartOrdinal feature

* Move expensive test setup (apiserver and running controller) to be run once in StatefulSetStartOrdinal parameterized tests
2023-02-03 05:26:29 -08:00
Maciej Szulik
8b48ff3584
Don't explicitly set image version in tests
Image versions are already explicitly set in our manifests
configuration, so tests should not be setting these values
to ensure we're using the same versions across the board.
2023-02-02 19:06:00 +01:00
Mengjiao Liu
6f2cd1b5bd Update pkg/controller/cronjob/ for contextual logging 2023-02-02 14:27:13 +08:00