Commit Graph

25108 Commits

Author SHA1 Message Date
Nilekh Chaudhari
131216fa8f chore: hashes keyID
Signed-off-by: Nilekh Chaudhari <1626598+nilekhc@users.noreply.github.com>
2023-07-13 20:42:09 +00:00
Jiahui Feng
049614f884 ValidatingAdmissionPolicy controller for Type Checking (#117377)
* [API REVIEW] ValidatingAdmissionPolicyStatucController config.

worker count.

* ValidatingAdmissionPolicyStatus controller.

* remove CEL typechecking from API server.

* fix initializer tests.

* remove type checking integration tests

from API server integration tests.

* validatingadmissionpolicy-status options.

* grant access to VAP controller.

* add defaulting unit test.

* generated: ./hack/update-codegen.sh

* add OWNERS for VAP status controller.

* type checking test case.
2023-07-13 13:41:50 -07:00
Andrew Sy Kim
d25075f342 update generated list of stable metrics
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2023-07-13 20:13:04 +00:00
Patrick Ohly
80ab8f0542 dra: handle scheduled pods in kube-controller-manager
When someone decides that a Pod should definitely run on a specific node, they
can create the Pod with spec.nodeName already set. Some custom scheduler might
do that. Then kubelet starts to check the pod and (if DRA is enabled) will
refuse to run it, either because the claims are still waiting for the first
consumer or the pod wasn't added to reservedFor. Both are things the scheduler
normally does.

Also, if a pod got scheduled while the DRA feature was off in the
kube-scheduler, a pod can reach the same state.

The resource claim controller can handle these two cases by taking over for the
kube-scheduler when nodeName is set. Triggering an allocation is simpler than
in the scheduler because all it takes is creating the right
PodSchedulingContext with spec.selectedNode set. There's no need to list nodes
because that choice was already made, permanently. Adding the pod to
reservedFor also isn't hard.

What's currently missing is triggering de-allocation of claims to re-allocate
them for the desired node. This is not important for claims that get created
for the pod from a template and then only get used once, but it might be
worthwhile to add de-allocation in the future.
2023-07-13 21:27:11 +02:00
Jordan Liggitt
39207dada2 Add integration test for node authorizer claim references 2023-07-13 20:42:21 +02:00
CoderSherlock
b7cbebcd03 Added oomkill test for init container and fix typos 2023-07-13 17:19:34 +00:00
Jan Safranek
052b06bdad Remove test Pods sharing a single local PV
The test runs two pods accessing the same local volume, which is duplicate
with "Two pods mounting a local volume at the same time" test.
2023-07-13 18:33:18 +02:00
Rafael Fonseca
9f5b6db8be test: azure: check error for cloud detection.
If something goes wrong during the Azure cloud detection, trying to cast
the returned value will result in the following panic and give no clue
as to what the error was.

```
  panic: interface conversion: cloudprovider.Interface is nil, not *azure.Cloud

goroutine 1 [running]:
k8s.io/kubernetes/test/e2e/framework/providers/azure.newProvider()
	test/e2e/framework/providers/azure/azure.go:50 +0x2b5
k8s.io/kubernetes/test/e2e/framework.SetupProviderConfig({0xc0007966b8, 0x5})
	test/e2e/framework/provider.go:82 +0x1a6
```
2023-07-13 09:04:24 +02:00
Kubernetes Prow Robot
406d2dfe61 Merge pull request #119250 from pohly/controller-contextual-logging
kube-controller-manager: finish conversion to contextual logging
2023-07-12 18:59:30 -07:00
Kubernetes Prow Robot
4af23c157c Merge pull request #119242 from carlory/add-logger
change the QueueingHintFn to pass a logger
2023-07-12 13:03:31 -07:00
Kubernetes Prow Robot
047d040ce7 Merge pull request #119012 from pohly/dra-batch-node-prepare
kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
2023-07-12 10:57:37 -07:00
Kubernetes Prow Robot
2ec4e14bfa Merge pull request #118812 from serathius/storage-metric
Improve apiserver storage size metric
2023-07-12 10:57:26 -07:00
carlory
0599b3caa0 change the QueueingHintFn to pass a logger 2023-07-13 00:56:41 +08:00
Patrick Ohly
08d40f53a7 dra: test with and without immediate ReservedFor
The recommendation and default in the controller helper code is to set
ReservedFor to the pod which triggered delayed allocation. However, this
is neither required nor enforced. Therefore we should also test the fallback
path were kube-scheduler itself adds the pod to ReservedFor.
2023-07-12 16:57:17 +02:00
Patrick Ohly
7d064812bb kube-controller-manager: finish conversion to contextual logging
This removes all exceptions and fixes the remaining unconverted log calls.
2023-07-12 14:57:29 +02:00
Kubernetes Prow Robot
3cc729fc7f Merge pull request #119195 from pohly/dra-reallocate-flake
dra e2e: fix "reallocation works" flake
2023-07-12 05:55:25 -07:00
Patrick Ohly
d743c50bb9 kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1beta2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
2023-07-12 14:50:30 +02:00
Marek Siarkowicz
7a63997c8a Improve apiserver storage size metric to allow it's graduation
Change name to make it compliant with prometheus guidelines.
Calculate it on demand instead of periodic to comply with prometheus standards.
Replace "endpoint" with "server" label to make it semantically consistent with storage factory
2023-07-12 14:33:10 +02:00
Francesco Romani
01c3a51a78 node: podresources: getallocatable: move to GA
lock the feature gate to GA, and remove the now-redundant code.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 14:11:22 +02:00
Francesco Romani
d78671447f e2e: node: add test to check device-requiring pods are cleaned up
Make sure orphanded pods (pods deleted while kubelet is down) are
handled correctly.
Outline:
1. create a pod (not static pod)
2. stop kubelet
3. while kubelet is down, force delete the pod on API server
4. restart kubelet
the pod becomes an orphaned pod and is expected to be killed by HandlePodCleanups.

There is a similar test already, but here we want to check device
assignment.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Francesco Romani
5cf50105a2 e2e: node: devices: improve the node reboot test
The recently added e2e device plugins test to cover node reboot
works fine if runs every time on CI environment (e.g CI) but
doesn't handle correctly partial setup when run repeatedly on
the same instance (developer setup).

To accomodate both flows, we extend the error management, checking
more error conditions in the flow.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Francesco Romani
b926aba268 e2e: node: devicemanager: update tests
Fix e2e device manager tests.
Most notably, the workload pods needs to survive a kubelet
restart. Update tests to reflect that.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Maciej Szulik
ab3a0b78ea Match both old and new kubectl version for a while in e2e 2023-07-12 12:49:33 +02:00
Kubernetes Prow Robot
745cfa35bd Merge pull request #119147 from mengjiao-liu/contextual-logging-controller-disruption
Migrate /pkg/controller/disruption to structured and contextual logging
2023-07-12 03:35:25 -07:00
Kubernetes Prow Robot
a8093823c3 Merge pull request #119042 from sttts/sttts-restcore-split
cmd/kube-apiserver: turn core (legacy) rest storage into standard RESTStorageProvider
2023-07-12 03:35:17 -07:00
Patrick Ohly
c143a875ed dra e2e: fix "reallocation works" flake
The main problem probably was that
https://github.com/kubernetes/kubernetes/pull/118862 moved creating the first
pod before setting up the callback which blocks allocating one claim for that
pod. This is racy because allocations happen in the background.

The test also was unnecessarily complex and hard to read:
- The intended effect can be achieved with three instead of four claims.
- It wasn't clear which claim has "external-claim-other" as name.
  Using the claim variable avoids that.
2023-07-12 11:20:47 +02:00
Mengjiao Liu
19869478c1 Migrate /pkg/controller/disruption to structured and contextual logging 2023-07-12 11:30:45 +08:00
Kubernetes Prow Robot
98e7c2a751 Merge pull request #119237 from jpbetz/jpbetz-apiserver-integration-owner
Add jpbetz as approver of apiserver integration tests
2023-07-11 20:03:18 -07:00
Kubernetes Prow Robot
2d9c951abe Merge pull request #117011 from fabi200123/Add-Node-Log-Query-Tests-
Add e2e tests for feature NodeLogQuery
2023-07-11 20:03:11 -07:00
Kubernetes Prow Robot
6ffca50136 Merge pull request #116443 from benluddy/secondary-authz-decision-caching
Cache authz decisions within the scope of validating policy admission.
2023-07-11 12:41:11 -07:00
Joe Betz
6d6595d0f6 Add jpbetz as approver of apiserver integration tests 2023-07-11 14:36:45 -04:00
Kubernetes Prow Robot
da61644869 Merge pull request #119179 from gjkim42/add-prestop-e2e-test
node-e2e: Add container lifecycle e2e tests for preStop hook
2023-07-11 10:33:23 -07:00
Kubernetes Prow Robot
e0dafe57a3 Merge pull request #117351 from pohly/dra-generated-resource-claim-names
DRA: generated resource claim names
2023-07-11 10:33:11 -07:00
Dr. Stefan Schimanski
75e3576523 kube-apiserver: rewire service controllers: kubernetesservice + IP repair 2023-07-11 17:27:20 +02:00
Arda Güçlü
3267dd9d52 kubectl delete: Introduce new interactive flag for interactive deletion (#114530) 2023-07-11 06:05:11 -07:00
Patrick Ohly
ba810871ad dra e2e: check that not generating a ResourceClaim works
This is not something that normally happens, but the API supports it because it
might be needed at some point, so we have to test it.
2023-07-11 14:23:49 +02:00
Patrick Ohly
444d23bd2f dra: generated name for ResourceClaim from template
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.

What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.

The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.

The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.

There's no immediate need for it, but just in case that it may become relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
2023-07-11 14:23:48 +02:00
Kubernetes Prow Robot
86038ae590 Merge pull request #116846 from moshe010/e2e--node-pod-resources
kubelet pod-resources: add e2e for KubeletPodResourcesGet feature
2023-07-11 04:53:24 -07:00
Kubernetes Prow Robot
8f1852bb44 Merge pull request #115295 from Namanl2001/pkg/controller/endpointslice
Migrated `pkg/controller/endpointslice` and `pkg/controller/endpointslicemirroring` to contextual logging
2023-07-11 03:19:12 -07:00
carlory
f443c458af move non-graceful node shutdown to GA 2023-07-11 13:51:51 +08:00
Kubernetes Prow Robot
ad72319ece Merge pull request #115122 from r-erema/110782-oidc-test-coverage
add integration tests for OIDC authenticator
2023-07-10 15:29:10 -07:00
Naman
645cb90732 migrated pkg/controller/endpointslicemirroring to contextual logging
Signed-off-by: Naman <namanlakhwani@gmail.com>
2023-07-11 01:43:30 +05:30
Naman
09849b09cf migrated pkg/controller/endpointslice to contextual logging
Signed-off-by: Naman <namanlakhwani@gmail.com>
2023-07-11 01:28:22 +05:30
Sascha Grunert
3bae26ae58 Check dbus error on container runtime start/stop
We should evaluate the error, otherwise we risk to hang indefinately on
waiting for the `reschan` in:

64939b66c6/test/e2e_node/util.go (L419)

We also increase the timeout, because it can take a bit longer for
runtimes to determinate depending on the work they have to be done on
running containers.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2023-07-10 13:45:40 +02:00
Kubernetes Prow Robot
80dab4127b Merge pull request #116720 from soltysh/remove_short_version
Remove long/golang version information making short the default
2023-07-10 02:41:06 -07:00
Sascha Grunert
a6554b9d5d Make kubelet label types public
We use the label definitions in CRI-O, means we now make them public to
stop vendoring/copying this part of Kubernetes.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2023-07-10 10:58:44 +02:00
Gunju Kim
8fb5b6eb4c node-e2e: Add container lifecycle e2e tests for preStop hook
This ensures that the container's pre-stop hook is invoked if the
startup or liveness probe fails.
2023-07-10 08:55:48 +09:00
Kubernetes Prow Robot
d653dcab5a Merge pull request #119048 from pohly/scheduler-perf-metrics-for-perfdash
scheduler-perf: metrics for perfdash
2023-07-09 09:27:04 -07:00
Kubernetes Prow Robot
19a25bac05 Merge pull request #119159 from alculquicondor/fix-job-uncounted
Only declare job as finished after removing all finalizers
2023-07-08 01:55:03 -07:00
kerthcet
47ef977ddd Direct reference to the packages
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-07-08 12:03:46 +08:00