Commit Graph

6375 Commits

Author SHA1 Message Date
carlory
5ff42b2368 fix issue with using feature HonorPVReclaimPolicy in csi-provisioner 2024-01-17 10:57:30 +08:00
Kevin Hannon
2645b22003
Self nominate Kevin Hannon for reviewer for job controller
I have been leading the PodReplacementPolicy KEP for alpha and I helped review/fix some issues in beta.

https://github.com/kubernetes/kubernetes/pulls?q=+is%3Apr+reviewed-by%3Akannon92+label%3Asig%2Fapps+

I have also been an active reviewer and helped GA job tracking last release. I hope to continue reviewing Job-related code.
2023-11-07 13:21:02 -05:00
Antonio Ojea
3b69bd6a9b servicecidrs controller clarify condition false reevaluation
Change-Id: I0eb8d39abe9b7b0ce6472ff426e9a62e7155aae1
2023-10-31 21:05:58 +00:00
Antonio Ojea
3edcce52e3 service cidr controller manager: use new ServiceCIDR API 2023-10-31 21:05:50 +00:00
Antonio Ojea
599597ca65 fix race on ServiceCIDR deletion
When a ServiceCIDR is deleted, the service CIDR controller in the
controller manager verifies that it is safe to delete before removing
the finalizer. However, since the deletion takes time to
propagate, there can be a race where the apiserver allocators have not
yet received the deletion and assign an IP address that will
be orphaned.

To avoid this race, the service CIDR controller waits a grace period
before removing the finalizer to ensure the allocators do not assign any
new IP address from that range before it is completely deleted.

Change-Id: Ib34d32c0bdde91c6e84f1d056db9374589b25c0b
2023-10-31 21:05:06 +00:00
Antonio Ojea
4ff80864e1 service cidr controller manager
Controls the lifecycle of ServiceCIDRs: adds finalizers and
sets the Ready condition in status when they are created, and
removes the finalizers once it is safe to do so (no orphaned IPAddresses).

An IPAddress is orphaned if no ServiceCIDR contains it.

Change-Id: Icbe31e1ed8525fa04df3b741c8a817e5f2a49e80
2023-10-31 21:05:05 +00:00
James Munnelly
76463e21d4 KEP-4193: bound service account token improvements 2023-10-30 21:15:10 +00:00
Kubernetes Prow Robot
05765a851c
Merge pull request #121389 from aleksandra-malinowska/sts-restart-always
Resubmit "Make StatefulSet restart pods with phase Succeeded"
2023-10-30 21:11:51 +01:00
Kubernetes Prow Robot
e4212878dd
Merge pull request #119208 from atosatto/separate-taint-manager
Decouple TaintManager from NodeLifeCycleController (KEP-3902)
2023-10-30 21:11:33 +01:00
Kubernetes Prow Robot
ceea5fd0cb
Merge pull request #119109 from jiahuif-forks/feature/validating-admission-policy/crd-typechecking
ValidatingAdmissionPolicy - Type Checking for API Extensions types
2023-10-30 21:11:19 +01:00
Andrea Tosatto
ccda2d6fd4 kube-controller-manager: Decouple TaintManager from NodeLifeCycleController (KEP-3902) 2023-10-30 12:23:56 +00:00
carlory
5a20ff1617 fix wrong controller name for ephemeralController 2023-10-30 18:45:13 +08:00
Kubernetes Prow Robot
74098ab5ad
Merge pull request #119500 from JackTroy/fix-threshold-arg
Add explanation for large-cluster-size-threshold arg
2023-10-30 02:50:10 +01:00
Kubernetes Prow Robot
99bf6a674c
Merge pull request #121039 from josselin-c/master
hpa: always update status metrics when updating the replica count
2023-10-28 19:35:01 +02:00
Kubernetes Prow Robot
848de697d8
Merge pull request #115711 from sourcelliu/improve
Improve lock performance
2023-10-27 23:41:32 +02:00
Kubernetes Prow Robot
fe21e4d749
Merge pull request #120682 from yt2985/cleanSA
LegacyServiceAccountTokenCleanUp beta
2023-10-27 19:08:05 +02:00
tinatingyu
5925dc0775 LegacyServiceAccountTokenCleanUp beta 2023-10-27 03:52:06 +00:00
Dejan Pejchev
e98c33bfaf
switch feature flag to beta for pod replacement policy and add e2e test
update pod replacement policy feature flag comment and refactor the e2e test for pod replacement policy

minor fixes for pod replacement policy and e2e test

fix wrong assertions for pod replacement policy e2e test

more fixes to pod replacement policy e2e test

refactor PodReplacementPolicy e2e test to use finalizers

fix unit tests when pod replacement policy feature flag is promoted to beta

fix podgc controller unit tests when pod replacement feature is enabled

fix lint issue in pod replacement policy e2e test

assert no error in defer function for removing finalizer in pod replacement policy e2e test

implement test using a sh trap for pod replacement policy

reduce sleep after SIGTERM in pod replacement policy e2e test to 5s
2023-10-26 21:50:37 +02:00
Jiahui Feng
fd132665a8 extend VAP status controller for extensions type checking. 2023-10-26 10:26:03 -07:00
Aleksandra Malinowska
e07d898cfd Make StatefulSet restart pods with phase Succeeded 2023-10-26 15:34:01 +02:00
Dejan Pejchev
88c0a8be1b
feat: add job_pods_creation_total metric 2023-10-24 17:49:04 +02:00
Dejan Zele Pejchev
f8a4e343a1
Fix tracking of terminating Pods when nothing else changes (#121342)
* cleanup: refactor pod replacement policy integration test into staged assertion

* cleanup: remove typo in job_test.go

* refactor PodReplacementPolicy test and remove test for defaulting the policy

* fix issue with missing update in job controller for terminating status and refactor pod replacement policy integration test

* use t.Cleanup instead of defer in PodReplacementPolicy integration tests

* revert t.Cleanup to defer for resetting feature flag in PodReplacementPolicy integration tests
2023-10-24 15:04:46 +02:00
Kubernetes Prow Robot
cdd20eebb7
Merge pull request #118381 from SataQiu/fix-controller-20230601
controller: fix the help information format of sorting_deletion_age_ratio metric
2023-10-24 15:04:25 +02:00
Kubernetes Prow Robot
015297a577
Merge pull request #121327 from soltysh/fix_nextScheduleTimeDuration
Fix next schedule time duration
2023-10-24 12:18:35 +02:00
Maciej Szulik
bf2f640ea2
Add more test cases ensuring nextScheduleTimeDuration is never < 0 2023-10-24 11:08:02 +02:00
Kubernetes Prow Robot
ccca58aa36
Merge pull request #120075 from lowang-bh/enhancement
Call getPodRevision once
2023-10-23 19:51:40 +02:00
Kubernetes Prow Robot
8149ab3f3f
Merge pull request #121356 from mimowo/backoff-limit-per-index-beta
Graduate BackoffLimitPerIndex to Beta
2023-10-23 18:39:58 +02:00
Kubernetes Prow Robot
2b16f7b6bb
Merge pull request #120001 from qingwave/hpa-sidecar
HPA: calculate sidecar container resource in pod autoscaler
2023-10-23 18:39:31 +02:00
Kubernetes Prow Robot
1fc3d10f7e
Merge pull request #121292 from mimowo/backoff-limit-per-index-metrics
Introduce the job_finished_indexes_total metric
2023-10-20 23:50:57 +02:00
Anton Stuchinskii
34294cd67f locking feature-gate for ready pods job status 2023-10-20 16:08:54 +02:00
Michal Wozniak
b0d04d933b Introduce the job_finished_indexes_total metric 2023-10-20 15:19:04 +02:00
Kubernetes Prow Robot
568aee16e8
Merge pull request #120731 from Nordix/sts_issue/adil
Fixing CurrentReplicas and CurrentRevision in completeRollingUpdate
2023-10-20 14:42:08 +02:00
Michal Wozniak
32fdb55192 Use Patch instead of SSA for Pod Disruption condition 2023-10-19 21:00:19 +02:00
adil ghaffar
00c21ced3a
Fixing CurrentReplicas and CurrentRevision in completeRollingUpdate 2023-10-19 14:17:42 +03:00
Michal Wozniak
6dd0ad5c0f Graduate BackoffLimitPerIndex to Beta 2023-10-19 12:18:36 +02:00
Maciej Szulik
db8b303156
Modify mostRecentScheduleTime to return more detailed information about missed schedules
Initially this method returned the number of missed schedules, but
that turned out to be unreliable for some complex schedules, for
example those that run only on weekdays. The second
approach was to return only a boolean indicating that too many
schedules were missed. It turns out that we need to return one of three
values: none missed, few missed, or many missed, to let consumers know
what to do without leaking a wrong number out of mostRecentScheduleTime.
2023-10-18 20:03:03 +02:00
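
The three-valued result described in the commit message above can be sketched in Go as follows. This is an illustrative sketch only: the type and function names (missedSchedules, classifyMissed, describe) and the threshold of 100 are hypothetical and are not taken from the commit.

```go
package main

import "fmt"

// missedSchedules is a hypothetical three-valued result that replaces
// both a raw count and a single boolean.
type missedSchedules int

const (
	noneMissed missedSchedules = iota
	fewMissed
	manyMissed
)

// tooManyThreshold is an assumed cutoff used only for this illustration.
const tooManyThreshold = 100

// classifyMissed turns an internal, possibly imprecise count into a
// category, so callers never see the number itself.
func classifyMissed(count int) missedSchedules {
	switch {
	case count <= 0:
		return noneMissed
	case count > tooManyThreshold:
		return manyMissed
	default:
		return fewMissed
	}
}

// describe shows how a consumer might branch on the category.
func describe(m missedSchedules) string {
	switch m {
	case noneMissed:
		return "nothing missed"
	case fewMissed:
		return "a few schedules missed"
	default:
		return "too many schedules missed"
	}
}

func main() {
	for _, n := range []int{0, 3, 250} {
		fmt.Println(describe(classifyMissed(n)))
	}
}
```

Returning a category rather than a count keeps an unreliable number from leaking to callers while still letting them distinguish the three cases.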
Maciej Szulik
6c4f71b31c
Fix spelling 2023-10-18 19:15:34 +02:00
Yuki Iwai
d7556769e7 Job: Replace deprecated wait functions with supported one
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-10-19 00:14:35 +09:00
Kubernetes Prow Robot
6d70013af5
Merge pull request #121147 from kannon92/rm-at-least-no-terminating-count
Remove terminating count from rmAtLeast
2023-10-18 00:44:51 +02:00
Kubernetes Prow Robot
27ff547a14
Merge pull request #121011 from kannon92/job-pod-replacement-policy-feature-on-but-api-specified
Fix panic when enablement of pod replacement policy is skewed
2023-10-17 21:28:48 +02:00
Yuki Iwai
201c30fba8
Job: Handle error returned from AddEventHandler function (#119917)
* Job: Handle error returned from AddEventHandler function

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Use an error message similar to CronJob's

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Clean up error messages

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.T on the second place in the args for the newControllerFromClient function

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.T on the second place in the args for the newControllerFromClientWithClock function

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Call t.Helper()

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.TB on the second place in the args for the createJobControllerWithSharedInformers function and call tb.Helper() there

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.TB on the second place in the args for the startJobControllerAndWaitForCaches function and call tb.Helper() there

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Adapt TestFinializerCleanup to the eventhandler error

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-10-17 21:28:34 +02:00
Kevin Hannon
7a1ac18bc8 Fix panic if there are more terminating pods than active pods
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-10-17 14:50:38 -04:00
Antonio Ojea
c2d473f0d4 remove ClusterCIDR
KEP-2593 proposed to expand the existing node-ipam controller
to be configurable via ClusterCIDR objects. However, there
were reasonable doubts in the SIG about the feature, and after
several months of discussions we decided not to move forward
with the KEP in-tree. Hence, we are going to remove the existing
code, which is still in alpha.

https://groups.google.com/g/kubernetes-sig-network/c/nts1xEZ--gQ/m/2aTOUNFFAAAJ

Change-Id: Ieaf2007b0b23c296cde333247bfb672441fe6dfc
2023-10-14 19:06:22 +00:00
Kubernetes Prow Robot
bae6911b11
Merge pull request #121142 from aleksandra-malinowska/sts-concurrent-write-fix
Fix concurrent map writes on missing PVC creation in StatefulSet controller
2023-10-12 17:11:19 +02:00
Kubernetes Prow Robot
07029999f9
Merge pull request #120666 from b8kings0ga/feature/fix-comment-correction
AttachDetachControllerConfiguration.ReconcilerSyncLoopPeriod default value comment fix
2023-10-11 22:51:49 +02:00
Aleksandra Malinowska
7989400bef Fix concurrent write when filling PVC labels 2023-10-11 15:07:55 +02:00
Aleksandra Malinowska
54714686bc Modify test PVC to detect concurrent map write bug 2023-10-11 15:07:50 +02:00
Kevin Hannon
d7ee6b9d1b fix possible panic if pod replacement policy is turned on and jobs do not set pod replacement policy 2023-10-11 08:37:50 -04:00
Kubernetes Prow Robot
d3559bf77f
Merge pull request #120595 from jsafrane/fix-detach-uncertain
Mark a volume as uncertain-attached after detach error
2023-10-08 05:54:01 +02:00
Josselin Costanzi
3c4512c6cc hpa: always update status metrics when updating the replica count
Have hpa always update both the metrics and the replica count. This fixes
an edge case where the metrics would not be updated if a
custom metric was unavailable.
2023-10-06 21:34:09 +00:00