Commit Graph

6593 Commits

Author SHA1 Message Date
Drew Sirenko
16c2ad5b84 Add labels to PVCollector bound/unbound PVC metrics for VolumeAttributesClass Feature (#126166)
* Add labels to PVCollector bound/unbound PVC metrics

* fixup! Add labels to PVCollector bound/unbound PVC metrics

* wip: Fix 'Unknown
    Decorator'

* fixup! Add labels to PVCollector bound/unbound PVC metrics
2024-07-23 12:21:29 -07:00
Kubernetes Prow Robot
a00181d4d4 Merge pull request #121902 from carlory/kep-3751-pv-controller
[kep-3751] pvc bind pv with vac
2024-07-23 11:02:13 -07:00
Kubernetes Prow Robot
1854839ff0 Merge pull request #126067 from tenzen-y/implement-job-success-policy-e2e
Graduate the JobSuccessPolicy to Beta
2024-07-23 06:14:23 -07:00
carlory
3a6a4830df pvc bind pv with vac 2024-07-23 15:04:11 +08:00
Yuki Iwai
551931c6a8 Graduate the JobSuccessPolicy to beta
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-07-23 09:29:06 +09:00
Yuki Iwai
6e8dc2c250 Job: Extend the jobs_finished_total metric reason label with SuccessPolicy and CompletionsReached
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-07-23 09:29:02 +09:00
Kubernetes Prow Robot
6e52e705d0 Merge pull request #125374 from pwschuurman/kep-3335-stable
Promote StatefulSetStartOrdinal to stable in 1.31
2024-07-22 14:25:49 -07:00
Kubernetes Prow Robot
d21b17264e Merge pull request #125488 from pohly/dra-1.31
DRA for 1.31
2024-07-22 11:45:55 -07:00
Patrick Ohly
0fc78b9bcc DRA resource claim controller: update test
The resource claim controller is completely agnostic to the claim spec. It
doesn't care about classes or devices, therefore it needs no changes in 1.31
besides the v1alpha2 -> v1alpha3 renaming from a previous commit.
2024-07-22 18:09:34 +02:00
Kubernetes Prow Robot
1f436e0fba Merge pull request #124108 from carlory/update-test-InTreePluginXXXUnregister
update unit test for adc to test volume migration
2024-07-22 06:49:49 -07:00
Yuki Iwai
594490fd77 Job: Add the CompletionsReached reason to the SuccessCriteriaMet condition
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-07-22 21:24:52 +09:00
Patrick Ohly
8a629b9f15 DRA: remove "sharable" from claim allocation result
Now all claims are shareable up to the limit imposed by the size of the
"reserverFor" array.

This is one of the agreed simplifications for 1.31.
2024-07-21 17:28:14 +02:00
Patrick Ohly
de5742ae83 DRA: remove immediate allocation
As agreed in https://github.com/kubernetes/enhancements/pull/4709, immediate
allocation is one of those features which can be removed because it makes no
sense for structured parameters and the justification for classic DRA is weak.
2024-07-21 17:28:14 +02:00
Patrick Ohly
b51d68bb87 DRA: bump API v1alpha2 -> v1alpha3
This is in preparation for revamping the resource.k8s.io completely. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.

Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.

Only source code where the version really matters (like API registration)
retains the versioned import.
2024-07-21 17:28:13 +02:00
Kubernetes Prow Robot
892acaa6a7 Merge pull request #126107 from enj/enj/i/svm_not_found_err
svm: set UID and RV on SSA patch to cause conflict on logical create
2024-07-20 08:18:01 -07:00
googs1025
6626b9ce28 chore(Job): remove deprecated fake.NewSimpleClientset method 2024-07-19 23:46:29 +08:00
googs1025
75a4cfbd58 chore(Job): use ctx.Done() instead of stopCh 2024-07-19 23:43:36 +08:00
googs1025
af5b8bed70 chore(Job): use WaitForCacheSync method after sharedInformerFactory Start 2024-07-19 23:41:20 +08:00
Monis Khan
6a6771b514 svm: set UID and RV on SSA patch to cause conflict on logical create
When a resource gets deleted during migration, the SVM SSA patch
calls are interpreted as a logical create request.  Since the object
from storage is nil, the merged result is just a type meta object,
which lacks a name in the body.  This fails when the API server
checks that the name from the request URL and the body are the same.
Note that a create request is something that SVM controller should
never do.

Once the UID is set on the patch, the API server will fail the
request at a slightly earlier point with an "uid mismatch" conflict
error, which the SVM controller can handle gracefully.

Setting UID by itself is not sufficient.  When a resource gets
deleted and recreated, if RV is not set but UID is set, we would get
an immutable field validation error for attempting to update the
UID.  To address this, we set the resource version on the SSA patch
as well.  This will cause that update request to also fail with a
conflict error.

Added the create verb on all resources for SVM controller RBAC as
otherwise the API server will reject the request before it fails
with a conflict error.

The change addresses a host of other issues with the SVM controller:

1. Include failure message in SVM resource
2. Do not block forever on unsynced GC monitor
3. Do not immediately fail on GC monitor being missing, allow for
   a grace period since discovery may be out of sync
4. Set higher QPS and burst to handle large migrations

Test changes:

1. Clean up CRD webhook convertor logs
2. Allow SVM tests to be run multiple times to make finding flakes easier
3. Create and delete CRs during CRD test to force out any flakes
4. Add a stress test with multiple parallel migrations
5. Enable RBAC on KAS
6. Run KCM directly to exercise wiring and RBAC
7. Better logs during CRD migration
8. Scan audit logs to confirm SVM controller never creates

Signed-off-by: Monis Khan <mok@microsoft.com>
2024-07-18 17:19:11 -04:00
Michal Wozniak
1be4df6e02 Cleanup Job controller isPodFailed function 2024-07-18 09:08:23 +02:00
carlory
dae05f3b88 cleanup after JobPodFailurePolicy is promoted to GA 2024-07-18 10:00:56 +08:00
Kubernetes Prow Robot
5d40866fae Merge pull request #125994 from carlory/fix-job-api
clean up codes after PodDisruptionConditions was promoted to GA
2024-07-17 14:37:09 -07:00
Peter Schuurman
585971431b Remove StatefulSetStartOrdinal feature gate to target stable in 1.31 2024-07-16 08:05:09 -07:00
Michal Wozniak
370631eb30 Update the set of reasons in comment for JobFinishedNum metric 2024-07-16 12:34:29 +02:00
Kubernetes Prow Robot
68da9a6762 Merge pull request #125925 from mmorel-35/testifylint/pkg/controller
fix: enable testifylint on `pkg/controller`
2024-07-12 12:50:25 -07:00
Michal Wozniak
f1233ac5e0 JobPodFailurePolicy to GA
# Conflicts:
#	pkg/controller/job/job_controller_test.go
2024-07-12 17:21:32 +02:00
Kubernetes Prow Robot
0a3330d6c9 Merge pull request #125510 from mimowo/extend-job-conditions
Delay setting terminal Job conditions until all pods are terminal
2024-07-12 08:12:46 -07:00
Matthieu MOREL
be59afc102 fix: enable testifylint on pkg/controller
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-07-12 06:18:34 +00:00
Kubernetes Prow Robot
2d4514e169 Merge pull request #125802 from mmorel-35/testifylint/len+empty
fix: enable empty and len rules from testifylint on pkg and staging package
2024-07-11 23:12:06 -07:00
Michal Wozniak
fb7704ba03 Delay setting terminal Job conditions until all pods are terminal
Fix the integration test typecheck

Fix after rebase

# Conflicts:
#	pkg/controller/job/job_controller_test.go
2024-07-11 20:54:09 +02:00
Kubernetes Prow Robot
9356b0e3e9 Merge pull request #126030 from kaisoz/test-SuccessPolicyMet-false-ignored
job_controller: Add test for SuccessCriteriaMet=False
2024-07-11 09:24:04 -07:00
Kubernetes Prow Robot
b977096f5a Merge pull request #125767 from carlory/fix-external-provisioner-1235
Fix pv reclaim failed due to its phase is wrongly updated to the Failed state by kcm
2024-07-11 07:45:18 -07:00
Tomas Tormo
5a046a4fd9 job_controller: Add test for SuccessCriteriaMet=False 2024-07-11 12:08:43 +00:00
carlory
850bc09e9b clean up codes after PodDisruptionConditions was promoted to GA and locked to default 2024-07-11 10:40:21 +08:00
Kubernetes Prow Robot
135f2e0372 Merge pull request #125997 from mimowo/job-comment-cleanup
Cleanup TODO comment in the Job controller
2024-07-10 12:15:39 -07:00
Kubernetes Prow Robot
1608dc2b09 Merge pull request #125985 from kaisoz/fix-failureTarget-manually-added
job_controller: Ignore FailureTarget JobCondition with Status != True
2024-07-10 09:02:59 -07:00
Kubernetes Prow Robot
27c6c30905 Merge pull request #124607 from gjtempleton/HPA-Reviewers-Addition
HPA - Add gjtempleton to reviewers
2024-07-10 07:08:50 -07:00
Michal Wozniak
8a8717c3a9 Cleanup TODO comment in the Job controller 2024-07-10 12:27:56 +02:00
Tomas Tormo
2aed11ec78 job_controller: Ignore FailureTarget!=True 2024-07-10 08:02:14 +00:00
Kubernetes Prow Robot
4a214f6ad9 Merge pull request #125461 from mimowo/pod-disruption-conditions-ga
Graduate PodDisruptionConditions to stable
2024-07-09 11:08:13 -07:00
Matthieu MOREL
f014b754fb fix: enable empty and len rules from testifylint on pkg package
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>

Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
2024-07-06 23:15:43 +00:00
Michal Wozniak
4250d444f8 Cleanup Job controller tests 2024-07-05 14:59:03 +02:00
lukashankeln
58c44005cd fix(cronjob): lastSuccessfullTime not set when successfulJobsHistoryLimit equal to zero (#122025)
* fix(cronjob): lastSuccessfullTime not set when successfulJobsHistoryLimit equal to zero

* fix(cronjob): added tests for successfulJobsHistoryLimit mutations
2024-07-05 03:57:47 -07:00
Kubernetes Prow Robot
11c689b945 Merge pull request #125675 from tnqn/fix-rapid-endpoints-update
Fix endpoints status out-of-sync when the pod state changes rapidly
2024-07-02 23:48:42 -07:00
Kubernetes Prow Robot
6a0aeb2adb Merge pull request #125151 from skitt/drop-ptr-wrappers-pkg-controller
pkg/controller: drop pointer wrapper functions
2024-07-02 09:28:00 -07:00
Stephen Kitt
f55b59fc02 pkg/controller: drop pointer wrapper functions
The new k8s.io/utils/ptr package provides generic wrapper functions,
which can be used instead of type-specific pointer wrapper functions.
This replaces the latter with the former, and migrates other uses of
the deprecated pointer package to ptr in affected files.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-07-02 16:19:12 +02:00
Quan Tian
3bd975862a Fix endpoints status out-of-sync when the pod state changes rapidly
When Pod state changes rapidly, endpoints controller may use outdated
informer cache to sync Service. If the outdated endpoints appear to be
expected by the controller, it skips updating it.

The commit fixes it by checking if endpoints informer cache is outdated
when processing a service. If the endpoints is stale, it returns an
error and retries later.

Signed-off-by: Quan Tian <quan.tian@broadcom.com>
2024-07-01 21:56:36 +08:00
Kubernetes Prow Robot
93d56511e6 Merge pull request #125021 from aojea/servicecidrbeta
KEP-1880 Multiple Service CIDRs: Graduate to Beta (2/2)
2024-06-30 08:53:25 -07:00
Antonio Ojea
0e1f9dadd6 modify components to use the networking v1beta1 API 2024-06-30 09:48:46 +00:00
Kubernetes Prow Robot
ac9aec9f9b Merge pull request #125116 from pohly/dra-one-of-source
DRA: remove "source" indirection from v1 Pod API
2024-06-28 12:46:45 -07:00