6427 Commits

Author SHA1 Message Date
Antonio Ojea
3b69bd6a9b servicecidrs controller clarify condition false reevaluation
Change-Id: I0eb8d39abe9b7b0ce6472ff426e9a62e7155aae1
2023-10-31 21:05:58 +00:00
Antonio Ojea
3edcce52e3 service cidr controller manager: use new ServiceCIDR API 2023-10-31 21:05:50 +00:00
Antonio Ojea
599597ca65 fix race on ServiceCIDR deletion
When a ServiceCIDR is deleted, the service CIDR controller on the
controller manager verifies that it is safe to delete before removing
the finalizer. However, since the deletion takes time to propagate,
there can be a race where the apiserver allocators have not yet
received the deletion and assign an IP address that will be
orphaned.

To avoid this race, the service cidr controller waits a grace period
before removing the finalizer to ensure the allocators do not assign any
new IP address from that range before it is completely deleted.

Change-Id: Ib34d32c0bdde91c6e84f1d056db9374589b25c0b
2023-10-31 21:05:06 +00:00
Antonio Ojea
4ff80864e1 service cidr controller manager
Controls the lifecycle of ServiceCIDRs: adds finalizers and
sets the Ready condition in status when they are created, and
removes the finalizers once it is safe to do so (no orphaned IPAddresses).

An IPAddress is orphaned if no ServiceCIDR contains it.

Change-Id: Icbe31e1ed8525fa04df3b741c8a817e5f2a49e80
2023-10-31 21:05:05 +00:00
James Munnelly
76463e21d4 KEP-4193: bound service account token improvements 2023-10-30 21:15:10 +00:00
Kubernetes Prow Robot
05765a851c
Merge pull request #121389 from aleksandra-malinowska/sts-restart-always
Resubmit "Make StatefulSet restart pods with phase Succeeded"
2023-10-30 21:11:51 +01:00
Kubernetes Prow Robot
e4212878dd
Merge pull request #119208 from atosatto/separate-taint-manager
Decouple TaintManager from NodeLifeCycleController (KEP-3902)
2023-10-30 21:11:33 +01:00
Kubernetes Prow Robot
ceea5fd0cb
Merge pull request #119109 from jiahuif-forks/feature/validating-admission-policy/crd-typechecking
ValidatingAdmissionPolicy - Type Checking for API Extensions types
2023-10-30 21:11:19 +01:00
Andrea Tosatto
ccda2d6fd4 kube-controller-manager: Decouple TaintManager from NodeLifeCycleController (KEP-3902) 2023-10-30 12:23:56 +00:00
carlory
5a20ff1617 fix wrong controller name for ephemeralController 2023-10-30 18:45:13 +08:00
Kubernetes Prow Robot
74098ab5ad
Merge pull request #119500 from JackTroy/fix-threshold-arg
Add explanation for large-cluster-size-threshold arg
2023-10-30 02:50:10 +01:00
Kubernetes Prow Robot
99bf6a674c
Merge pull request #121039 from josselin-c/master
hpa: always update status metrics when updating the replica count
2023-10-28 19:35:01 +02:00
Kubernetes Prow Robot
848de697d8
Merge pull request #115711 from sourcelliu/improve
Improve lock performance
2023-10-27 23:41:32 +02:00
Kubernetes Prow Robot
fe21e4d749
Merge pull request #120682 from yt2985/cleanSA
LegacyServiceAccountTokenCleanUp beta
2023-10-27 19:08:05 +02:00
huweiwen
63b3085f2a fix ad controller populators test
The informer was not initialized, so no assertion was performed before. This is now fixed.

Then fixed the test failure by using NewAttachDetachController to initialize adc.
2023-10-27 23:35:45 +08:00
tinatingyu
5925dc0775 LegacyServiceAccountTokenCleanUp beta 2023-10-27 03:52:06 +00:00
Dejan Pejchev
e98c33bfaf
switch feature flag to beta for pod replacement policy and add e2e test
update pod replacement policy feature flag comment and refactor the e2e test for pod replacement policy

minor fixes for pod replacement policy and e2e test

fix wrong assertions for pod replacement policy e2e test

more fixes to pod replacement policy e2e test

refactor PodReplacementPolicy e2e test to use finalizers

fix unit tests when pod replacement policy feature flag is promoted to beta

fix podgc controller unit tests when pod replacement feature is enabled

fix lint issue in pod replacement policy e2e test

assert no error in defer function for removing finalizer in pod replacement policy e2e test

implement test using a sh trap for pod replacement policy

reduce sleep after SIGTERM in pod replacement policy e2e test to 5s
2023-10-26 21:50:37 +02:00
Jiahui Feng
fd132665a8 extend VAP status controller for extensions type checking. 2023-10-26 10:26:03 -07:00
Aleksandra Malinowska
e07d898cfd Make StatefulSet restart pods with phase Succeeded 2023-10-26 15:34:01 +02:00
Dejan Pejchev
88c0a8be1b
feat: add job_pods_creation_total metric 2023-10-24 17:49:04 +02:00
Dejan Zele Pejchev
f8a4e343a1
Fix tracking of terminating Pods when nothing else changes (#121342)
* cleanup: refactor pod replacement policy integration test into staged assertion

* cleanup: remove typo in job_test.go

* refactor PodReplacementPolicy test and remove test for defaulting the policy

* fix issue with missing update in job controller for terminating status and refactor pod replacement policy integration test

* use t.Cleanup instead of defer in PodReplacementPolicy integration tests

* revert t.Cleanup to defer for resetting feature flag in PodReplacementPolicy integration tests
2023-10-24 15:04:46 +02:00
Kubernetes Prow Robot
cdd20eebb7
Merge pull request #118381 from SataQiu/fix-controller-20230601
controller: fix the help information format of sorting_deletion_age_ratio metric
2023-10-24 15:04:25 +02:00
Kubernetes Prow Robot
015297a577
Merge pull request #121327 from soltysh/fix_nextScheduleTimeDuration
Fix next schedule time duration
2023-10-24 12:18:35 +02:00
Maciej Szulik
bf2f640ea2
Add more test cases ensuring nextScheduleTimeDuration is never < 0 2023-10-24 11:08:02 +02:00
Kubernetes Prow Robot
ccca58aa36
Merge pull request #120075 from lowang-bh/enhancement
Call getPodRevision once
2023-10-23 19:51:40 +02:00
Kubernetes Prow Robot
8149ab3f3f
Merge pull request #121356 from mimowo/backoff-limit-per-index-beta
Graduate BackoffLimitPerIndex to Beta
2023-10-23 18:39:58 +02:00
Kubernetes Prow Robot
2b16f7b6bb
Merge pull request #120001 from qingwave/hpa-sidecar
HPA: calculate sidecar container resource in pod autoscaler
2023-10-23 18:39:31 +02:00
Kubernetes Prow Robot
1fc3d10f7e
Merge pull request #121292 from mimowo/backoff-limit-per-index-metrics
Introduce the job_finished_indexes_total metric
2023-10-20 23:50:57 +02:00
Anton Stuchinskii
34294cd67f locking feature-gate for ready pods job status 2023-10-20 16:08:54 +02:00
Michal Wozniak
b0d04d933b Introduce the job_finished_indexes_total metric 2023-10-20 15:19:04 +02:00
Kubernetes Prow Robot
568aee16e8
Merge pull request #120731 from Nordix/sts_issue/adil
Fixing CurrentReplicas and CurrentRevision in completeRollingUpdate
2023-10-20 14:42:08 +02:00
Michal Wozniak
32fdb55192 Use Patch instead of SSA for Pod Disruption condition 2023-10-19 21:00:19 +02:00
adil ghaffar
00c21ced3a
Fixing CurrentReplicas and CurrentRevision in completeRollingUpdate 2023-10-19 14:17:42 +03:00
Michal Wozniak
6dd0ad5c0f Graduate BackoffLimitPerIndex to Beta 2023-10-19 12:18:36 +02:00
Maciej Szulik
db8b303156
Modify mostRecentScheduleTime to return more detailed information about missed schedules
Initially this method returned the number of missed schedules, but
that turned out to be unreliable for some complex schedules, for
example those which run only on weekdays. The second
approach was to return only a boolean indicating that too many schedules
were missed. It turns out that we need to return all three values
(none missed, few missed, and many missed) to let consumers know what to
do without leaking the wrong number out of mostRecentScheduleTime.
2023-10-18 20:03:03 +02:00
Maciej Szulik
6c4f71b31c
Fix spelling 2023-10-18 19:15:34 +02:00
Yuki Iwai
d7556769e7 Job: Replace deprecated wait functions with supported one
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-10-19 00:14:35 +09:00
Kubernetes Prow Robot
6d70013af5
Merge pull request #121147 from kannon92/rm-at-least-no-terminating-count
Remove terminating count from rmAtLeast
2023-10-18 00:44:51 +02:00
Kubernetes Prow Robot
27ff547a14
Merge pull request #121011 from kannon92/job-pod-replacement-policy-feature-on-but-api-specified
Fix panic when enablement of pod replacement policy is skewed
2023-10-17 21:28:48 +02:00
Yuki Iwai
201c30fba8
Job: Handle error returned from AddEventHandler function (#119917)
* Job: Handle error returned from AddEventHandler function

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Use an error message similar to CronJob's

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Clean up error messages

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.T on the second place in the args for the newControllerFromClient function

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.T on the second place in the args for the newControllerFromClientWithClock function

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Call t.Helper()

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.TB on the second place in the args for the createJobControllerWithSharedInformers function and call tb.Helper() there

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.TB on the second place in the args for the startJobControllerAndWaitForCaches function and call tb.Helper() there

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Adapt TestFinializerCleanup to the eventhandler error

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-10-17 21:28:34 +02:00
Kevin Hannon
7a1ac18bc8 Fix panic if there are more terminating pods than active pods
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-10-17 14:50:38 -04:00
Antonio Ojea
c2d473f0d4 remove ClusterCIDR
KEP-2593 proposed to expand the existing node-ipam controller
to be configurable via ClusterCIDR objects. However, there
were reasonable doubts in the SIG about the feature, and after
several months of discussions we decided not to move forward
with the KEP in-tree. Hence, we are going to remove the existing
code, which is still in alpha.

https://groups.google.com/g/kubernetes-sig-network/c/nts1xEZ--gQ/m/2aTOUNFFAAAJ

Change-Id: Ieaf2007b0b23c296cde333247bfb672441fe6dfc
2023-10-14 19:06:22 +00:00
Kubernetes Prow Robot
bae6911b11
Merge pull request #121142 from aleksandra-malinowska/sts-concurrent-write-fix
Fix concurrent map writes on missing PVC creation in StatefulSet controller
2023-10-12 17:11:19 +02:00
Kubernetes Prow Robot
07029999f9
Merge pull request #120666 from b8kings0ga/feature/fix-comment-correction
AttachDetachControllerConfiguration.ReconcilerSyncLoopPeriod default value comment fix
2023-10-11 22:51:49 +02:00
Aleksandra Malinowska
7989400bef Fix concurrent write when filling PVC labels 2023-10-11 15:07:55 +02:00
Aleksandra Malinowska
54714686bc Modify test PVC to detect concurrent map write bug 2023-10-11 15:07:50 +02:00
Kevin Hannon
d7ee6b9d1b fix possible panic if pod replacement policy is turned on and jobs do not set pod replacement policy 2023-10-11 08:37:50 -04:00
Kubernetes Prow Robot
d3559bf77f
Merge pull request #120595 from jsafrane/fix-detach-uncertain
Mark a volume as uncertain-attached after detach error
2023-10-08 05:54:01 +02:00
Josselin Costanzi
3c4512c6cc hpa: always update status metrics when updating the replica count
Have hpa always update both the metrics and replica count. This fixes an
edge-case bug where the metrics would not be updated if a
custom metric was unavailable.
2023-10-06 21:34:09 +00:00
Lukasz Stankiewicz
1b489963c8 Add nil checks for hpa object target type values 2023-10-05 17:15:51 -07:00
Kevin Hannon
b96a074bcd convert pointer to ptr for job controller 2023-10-05 09:30:01 -04:00
Abhishek Srivastav
5f8fc30b2c
Added locks on request tracker before accessing fields (#120599)
* Added locks on request tracker before accessing fields

Unit test StatefulSetAutoDeletePVCEnabled has been
flaking with DATARACE. Added lock on request tracker
before accessing err field.

* Addressed review comments for PR : Added locks on request tracker before accessing fields
2023-10-03 16:38:08 +02:00
Kubernetes Prow Robot
622509830c
Merge pull request #120716 from xrstf/fix-typos
Fix typos
2023-09-30 00:25:56 -07:00
b8kings0ga
9345da51ac fix comment mistake, run "make update" 2023-09-22 16:37:55 +08:00
Filip Křepinský
c816601d83 reintroduce resourcequota.NewMonitor
- this function is used by other packages and was mistakenly removed
  in 397cc73dc9
- let resource quota controller use this constructor instead of an
  object instantiation
2023-09-20 17:18:55 +02:00
Kubernetes Prow Robot
fd5f36e6a0
Merge pull request #120175 from kannon92/move-pod-failure-policy-constant
move reasons to api package for job controller
2023-09-20 03:06:00 -07:00
Kubernetes Prow Robot
355feb21fd
Merge pull request #120649 from andrewsykim/fix-cronjob-controller-already-exists-err
cronjob controller: ensure already existing jobs are added to active list
2023-09-20 02:00:00 -07:00
Kubernetes Prow Robot
963c9b3cb9
Merge pull request #119317 from mochizuki875/fix_ds_rolling_update_118823
Exclude nodes from rolling update depending on tolerations
2023-09-19 16:50:17 -07:00
Kevin Hannon
a62eb45ae2 Rename job reasons to JobReasons as part of api review 2023-09-19 13:10:22 -04:00
Andrew Sy Kim
301aa69fec cronjob controller: ensure already existing jobs are added to Active list of cronjobs
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2023-09-19 15:18:44 +00:00
Aleksandra Malinowska
5ed60a72f6
Revert "Make StatefulSet restart pods with phase Succeeded" 2023-09-19 15:49:36 +02:00
mochizuki875
2a82776745 change rolling update logic to exclude sunsetting nodes 2023-09-19 11:39:32 +00:00
Christoph Mewes
79a7833ade fix typo Mininum => Minimum 2023-09-17 11:24:29 +02:00
Stephen Kitt
567fca7baa
Use copy() instead of a for loop
Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-09-15 09:20:08 +02:00
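The change named in the commit above swaps an element-by-element loop for Go's built-in `copy`. The commit does not show the affected code, so the sketch below uses hypothetical helpers (`cloneWithLoop`, `cloneWithCopy`) on an `[]int` to show the two equivalent forms.

```go
package main

import "fmt"

// cloneWithLoop copies src element by element (the older form).
func cloneWithLoop(src []int) []int {
	dst := make([]int, len(src))
	for i := range src {
		dst[i] = src[i]
	}
	return dst
}

// cloneWithCopy does the same with the built-in copy, which copies
// min(len(dst), len(src)) elements and returns that count.
func cloneWithCopy(src []int) []int {
	dst := make([]int, len(src))
	copy(dst, src)
	return dst
}

func main() {
	src := []int{1, 2, 3}
	fmt.Println(cloneWithLoop(src), cloneWithCopy(src))
}
```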
Kevin Hannon
c6e9fba79b move reasons to api package for job controller 2023-09-14 13:24:29 -04:00
Kubernetes Prow Robot
a68093a3ff
Merge pull request #120506 from alexzielenski/import-restrictions
Update e2e import restrictions
2023-09-13 21:56:22 -07:00
Kubernetes Prow Robot
3eca0a5f78
Merge pull request #120398 from aleksandra-malinowska/sts-restart-always
Make StatefulSet restart pods with phase Succeeded
2023-09-13 12:40:12 -07:00
Jan Safranek
7fc11f47ff Mark a volume as uncertain-attached after detach error
Volume that failed Detach() should not be marked as attached, CSI
external-attacher is probably still trying to detach it.

Mark it uncertain instead and wait for Detach() to succeed.
2023-09-13 10:03:28 +02:00
Kubernetes Prow Robot
db49b13ccd
Merge pull request #120252 from kerthcet/cleanup/framework-import
Move framework testing libraries to the right place
2023-09-12 17:44:11 -07:00
kerthcet
6fbb8ec7e4 Move scheduler testing utils to /scheduler/testing
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-09-12 13:42:38 +08:00
Aldo Culquicondor
6b4ab616a2
Increase range of job_sync_duration_seconds
Change-Id: I7ed4b006faecf0a7e6e583c42b4d6bc4b786a164
2023-09-11 18:01:33 -04:00
Kubernetes Prow Robot
aa4ec3c5b0
Merge pull request #119944 from Sharpz7/jm/backup-finalizers
Adding backup code for removing finalizers to more Job End States.
2023-09-11 09:30:30 -07:00
Alexander Zielenski
f135eed37b update codegen 2023-09-08 09:49:35 -07:00
Aleksandra Malinowska
d7264d0af0 Make StatefulSet restart pods with phase Succeeded 2023-09-08 17:47:17 +02:00
Sharpz7
7e4b5d0d49 Final Fix 2023-09-08 14:44:22 +00:00
Stephen Kitt
aa89e6dc97
Use ptr.To to retrieve intstr addresses
This uses the generic ptr.To in k8s.io/utils to replace functions and
code constructs which only serve to return pointers to intstr
values. Other uses of the deprecated pointer package are updated in
modified files.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-09-08 11:10:50 +02:00
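The generic helper mentioned in the commit above replaces one-off "return a pointer to this value" functions. To stay self-contained, the sketch below defines a local `to` function with the same shape as `ptr.To` in `k8s.io/utils/ptr`, and a simplified `intOrString` struct standing in for the real intstr type; both are illustrative stand-ins, not the library code.

```go
package main

import "fmt"

// to returns a pointer to a copy of v. Local stand-in with the same shape
// as ptr.To from k8s.io/utils/ptr.
func to[T any](v T) *T {
	return &v
}

// intOrString is a simplified, hypothetical stand-in for an int-or-string
// API value.
type intOrString struct {
	IntVal int32
	StrVal string
}

func main() {
	// One generic helper covers every type, so per-type helper functions
	// that only return a pointer are no longer needed.
	maxUnavailable := to(intOrString{IntVal: 1})
	replicas := to(int32(3))
	fmt.Println(maxUnavailable.IntVal, *replicas)
}
```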
Sharpz7
43fc6b5bdb Added suggests changes 2023-09-06 03:05:14 +00:00
Kubernetes Prow Robot
73580b2038
Merge pull request #120336 from pohly/dra-generated-name-hyphen
resource claim controller: separate generated suffix from base
2023-09-05 11:22:51 -07:00
Kubernetes Prow Robot
8e2b12a220
Merge pull request #119068 from lauchokyip/podgc-unit-test
added podgc orphaned pod unit tests
2023-09-05 03:19:49 -07:00
Patrick Ohly
3c2cfd9a4f resource claim controller: separate generated suffix from base
When the resource claim name inside the pod had some suffix like "1a" in
"resource-1a", the generated name suffix got added directly after that, leading
to "my-pod-resource-1ax6zgt".

Adding another hyphen makes the result more readable: "my-pod-resource-1a-x6zgt".
2023-09-04 09:45:25 +02:00
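The naming change in the commit above amounts to ending the generated-name base with a hyphen so the random suffix is visually separated. A minimal sketch follows, with hypothetical helper names (`oldPrefix`, `newPrefix`) and the suffix fixed to the `x6zgt` value quoted in the commit message; in a real cluster the suffix is generated by the API server.

```go
package main

import "fmt"

// oldPrefix builds the base name without a trailing hyphen
// (the behavior before the change).
func oldPrefix(podName, claimName string) string {
	return podName + "-" + claimName
}

// newPrefix appends a hyphen so the generated suffix is separated from
// the claim name (the behavior after the change).
func newPrefix(podName, claimName string) string {
	return podName + "-" + claimName + "-"
}

func main() {
	const suffix = "x6zgt" // fixed here; normally random
	fmt.Println(oldPrefix("my-pod", "resource-1a") + suffix)
	fmt.Println(newPrefix("my-pod", "resource-1a") + suffix)
}
```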
Kubernetes Prow Robot
1cadbd5887
Merge pull request #120172 from DrAuYueng/fix-log-in-deployment-controller
Fix pod deletion log in deployment controller
2023-09-01 11:28:31 -07:00
Albert Sverdlov
a46bab6930
Fix a job quota related deadlock (#119776)
* Fix a job quota related deadlock

In case ResourceQuota is used and sets a max # of jobs, a CronJob may get
trapped in a deadlock:
  1. Job quota for a namespace is reached.
  2. CronJob controller can't create a new job, because quota is
     reached.
  3. Cleanup of jobs owned by a cronjob doesn't happen, because a
     control loop iteration is finished because of an error to create a
     job.

To fix this, we stop quitting early from a control loop iteration when
cronjob reconciliation fails and always let old jobs be cleaned up.

* Don't reorder imports

* Don't stop requeuing on reconciliation error

Previous code only logged the reconciliation error inside jm.sync() and
didn't return the reconciliation error to its invoker
processNextWorkItem().

Adding a copy-paste back to avoid this issue.

* Remove copy-pasted cleanupFinishedJobs()

Now we always call jm.cleanupFinishedJobs() first and then
jm.syncCronJob().

We also extract cronJobCopy and updateStatus outside jm.syncCronJob
function and pass pointers to them in both jm.syncCronJob and
jm.cleanupFinishedJobs to make delayed updates handling more explicit
and not dependent on the order in which cleanupFinishedJobs and
syncCronJob are invoked.

* Return updateStatus bool instead of changing the reference

* Explicitly ignore err in tests to fix linter
2023-08-31 08:25:00 -07:00
Sharpz7
e9be1d7438 Test now has coverage! 2023-08-27 05:06:53 +00:00
DrAuYueng
a4ce32769f fix pod delete log in deployment controller
Signed-off-by: DrAuYueng <ouyang1204@gmail.com>
2023-08-25 22:20:51 +08:00
Adam McArthur
0bc0256093
Update job_controller_test.go 2023-08-25 08:15:53 -06:00
Sharpz7
22f4b1c56a Static check fix 2023-08-25 11:35:05 +00:00
Sharpz7
70e2deb32f Fixing lint problem 2023-08-25 11:08:59 +00:00
Sharpz7
6ded53ce4d Added back test changes 2023-08-25 10:35:58 +00:00
Sharpz7
5fb049ff47 Added create job & cleanup 2023-08-25 10:35:58 +00:00
Sharpz7
ff1659cb79 Added syncjob 2023-08-25 10:35:58 +00:00
Sharpz7
f87cc43cdb Review Changes 2023-08-25 10:35:58 +00:00
Sharpz7
d08fc3a4d0 Another one creeped in 2023-08-25 10:35:58 +00:00
Sharpz7
ef6a0eb6d8 Final Lint Fix 2023-08-25 10:35:58 +00:00
Sharpz7
aa9f38c36d More Lint Fixes 2023-08-25 10:35:58 +00:00
Sharpz7
601679446a Lint fixes 2023-08-25 10:35:58 +00:00
Sharpz7
cf32ae9453 Initial Commit 2023-08-25 10:35:58 +00:00
Sharpz7
297f04b74a Added function to remove finalizers as backup 2023-08-25 10:35:57 +00:00
Kubernetes Prow Robot
f852d7fead
Merge pull request #118653 from pohly/volume-resource-requirements
Volume resource requirements
2023-08-21 14:08:05 -07:00
Kubernetes Prow Robot
6cbc5dfac6
Merge pull request #114095 from aimuz/fix-114083
scheduler: Fix field apiVersion is missing from events reported from taint manager
2023-08-21 07:03:23 -07:00
Patrick Ohly
2472291790 api: introduce separate VolumeResourceRequirements struct
PVC and containers shared the same ResourceRequirements struct to define their
API. When resource claims were added, that struct got extended, which
accidentally also changed the PVC API. To prevent such a mistake from happening
again, PVC now uses its own VolumeResourceRequirements struct.

The `Claims` field gets removed because the risk of breaking someone is low:
theoretically, YAML files which have a claims field for volumes now
get rejected when validating against the OpenAPI. Such files
have never made sense and should be fixed.

Code that uses the struct definitions needs to be updated.
2023-08-21 15:31:28 +02:00