6333 Commits

Author SHA1 Message Date
Michal Wozniak
b0d04d933b Introduce the job_finished_indexes_total metric 2023-10-20 15:19:04 +02:00
Michal Wozniak
32fdb55192 Use Patch instead of SSA for Pod Disruption condition 2023-10-19 21:00:19 +02:00
Yuki Iwai
d7556769e7 Job: Replace deprecated wait functions with supported one
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-10-19 00:14:35 +09:00
Kubernetes Prow Robot
6d70013af5
Merge pull request #121147 from kannon92/rm-at-least-no-terminating-count
Remove terminating count from rmAtLeast
2023-10-18 00:44:51 +02:00
Kubernetes Prow Robot
27ff547a14
Merge pull request #121011 from kannon92/job-pod-replacement-policy-feature-on-but-api-specified
Fix panic when enablement of pod replacement policy is skewed
2023-10-17 21:28:48 +02:00
Yuki Iwai
201c30fba8
Job: Handle error returned from AddEventHandler function (#119917)
* Job: Handle error returned from AddEventHandler function

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Use an error message similar to the CronJob one

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Clean up error messages

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.T on the second place in the args for the newControllerFromClient function

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.T on the second place in the args for the newControllerFromClientWithClock function

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Call t.Helper()

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.TB on the second place in the args for the createJobControllerWithSharedInformers function and call tb.Helper() there

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.TB on the second place in the args for the startJobControllerAndWaitForCaches function and call tb.Helper() there

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Adapt TestFinializerCleanup to the eventhandler error

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-10-17 21:28:34 +02:00
Kevin Hannon
7a1ac18bc8 Fix panic if there are more terminating pods than active pods
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-10-17 14:50:38 -04:00
Antonio Ojea
c2d473f0d4 remove ClusterCIDR
KEP-2593 proposed to expand the existing node-ipam controller
to be configurable via ClusterCIDR objects, however, there
were reasonable doubts in the SIG about the feature and after
several months of discussions we decided not to move forward
with the KEP in-tree, hence, we are going to remove the existing
code, which is still in alpha.

https://groups.google.com/g/kubernetes-sig-network/c/nts1xEZ--gQ/m/2aTOUNFFAAAJ

Change-Id: Ieaf2007b0b23c296cde333247bfb672441fe6dfc
2023-10-14 19:06:22 +00:00
Kubernetes Prow Robot
bae6911b11
Merge pull request #121142 from aleksandra-malinowska/sts-concurrent-write-fix
Fix concurrent map writes on missing PVC creation in StatefulSet controller
2023-10-12 17:11:19 +02:00
Kubernetes Prow Robot
07029999f9
Merge pull request #120666 from b8kings0ga/feature/fix-comment-correction
AttachDetachControllerConfiguration.ReconcilerSyncLoopPeriod default value comment fix
2023-10-11 22:51:49 +02:00
Aleksandra Malinowska
7989400bef Fix concurrent write when filling PVC labels 2023-10-11 15:07:55 +02:00
Aleksandra Malinowska
54714686bc Modify test PVC to detect concurrent map write bug 2023-10-11 15:07:50 +02:00
Kevin Hannon
d7ee6b9d1b fix possible panic if pod replacement policy is turned on and jobs do not set pod replacement policy 2023-10-11 08:37:50 -04:00
Kubernetes Prow Robot
d3559bf77f
Merge pull request #120595 from jsafrane/fix-detach-uncertain
Mark a volume as uncertain-attached after detach error
2023-10-08 05:54:01 +02:00
Lukasz Stankiewicz
1b489963c8 Add nil checks for hpa object target type values 2023-10-05 17:15:51 -07:00
Kevin Hannon
b96a074bcd convert pointer to ptr for job controller 2023-10-05 09:30:01 -04:00
Abhishek Srivastav
5f8fc30b2c
Added locks on request tracker before accessing fields (#120599)
* Added locks on request tracker before accessing fields

Unit test StatefulSetAutoDeletePVCEnabled has been
flaking with DATARACE. Added lock on request tracker
before accessing err field.

* Addressed review comments for PR : Added locks on request tracker before accessing fields
2023-10-03 16:38:08 +02:00
Kubernetes Prow Robot
622509830c
Merge pull request #120716 from xrstf/fix-typos
Fix typos
2023-09-30 00:25:56 -07:00
b8kings0ga
9345da51ac fix comment mistake, run "make update" 2023-09-22 16:37:55 +08:00
Filip Křepinský
c816601d83 reintroduce resourcequota.NewMonitor
- this function is used by other packages and was mistakenly removed
  in 397cc73dc9
- let resource quota controller use this constructor instead of an
  object instantiation
2023-09-20 17:18:55 +02:00
Kubernetes Prow Robot
fd5f36e6a0
Merge pull request #120175 from kannon92/move-pod-failure-policy-constant
move reasons to api package for job controller
2023-09-20 03:06:00 -07:00
Kubernetes Prow Robot
355feb21fd
Merge pull request #120649 from andrewsykim/fix-cronjob-controller-already-exists-err
cronjob controller: ensure already existing jobs are added to active list
2023-09-20 02:00:00 -07:00
Kubernetes Prow Robot
963c9b3cb9
Merge pull request #119317 from mochizuki875/fix_ds_rolling_update_118823
Exclude nodes from rolling update depending on tolerations
2023-09-19 16:50:17 -07:00
Kevin Hannon
a62eb45ae2 Rename job reasons to JobReasons as part of api review 2023-09-19 13:10:22 -04:00
Andrew Sy Kim
301aa69fec cronjob controller: ensure already existing jobs are added to Active list of cronjobs
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2023-09-19 15:18:44 +00:00
Aleksandra Malinowska
5ed60a72f6
Revert "Make StatefulSet restart pods with phase Succeeded" 2023-09-19 15:49:36 +02:00
mochizuki875
2a82776745 change rolling update logic to exclude sunsetting nodes 2023-09-19 11:39:32 +00:00
Christoph Mewes
79a7833ade fix typo Mininum => Minimum 2023-09-17 11:24:29 +02:00
Kevin Hannon
c6e9fba79b move reasons to api package for job controller 2023-09-14 13:24:29 -04:00
Kubernetes Prow Robot
a68093a3ff
Merge pull request #120506 from alexzielenski/import-restrictions
Update e2e import restrictions
2023-09-13 21:56:22 -07:00
Kubernetes Prow Robot
3eca0a5f78
Merge pull request #120398 from aleksandra-malinowska/sts-restart-always
Make StatefulSet restart pods with phase Succeeded
2023-09-13 12:40:12 -07:00
Jan Safranek
7fc11f47ff Mark a volume as uncertain-attached after detach error
A volume that failed Detach() should not be marked as attached; the CSI
external-attacher is probably still trying to detach it.

Mark it uncertain instead and wait for Detach() to succeed.
2023-09-13 10:03:28 +02:00
Kubernetes Prow Robot
db49b13ccd
Merge pull request #120252 from kerthcet/cleanup/framework-import
Move framework testing libraries to the right place
2023-09-12 17:44:11 -07:00
kerthcet
6fbb8ec7e4 Move scheduler testing utils to /scheduler/testing
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-09-12 13:42:38 +08:00
Aldo Culquicondor
6b4ab616a2
Increase range of job_sync_duration_seconds
Change-Id: I7ed4b006faecf0a7e6e583c42b4d6bc4b786a164
2023-09-11 18:01:33 -04:00
Kubernetes Prow Robot
aa4ec3c5b0
Merge pull request #119944 from Sharpz7/jm/backup-finalizers
Adding backup code for removing finalizers to more Job End States.
2023-09-11 09:30:30 -07:00
Alexander Zielenski
f135eed37b update codegen 2023-09-08 09:49:35 -07:00
Aleksandra Malinowska
d7264d0af0 Make StatefulSet restart pods with phase Succeeded 2023-09-08 17:47:17 +02:00
Sharpz7
7e4b5d0d49 Final Fix 2023-09-08 14:44:22 +00:00
Stephen Kitt
aa89e6dc97
Use ptr.To to retrieve intstr addresses
This uses the generic ptr.To in k8s.io/utils to replace functions and
code constructs which only serve to return pointers to intstr
values. Other uses of the deprecated pointer package are updated in
modified files.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-09-08 11:10:50 +02:00
Sharpz7
43fc6b5bdb Added suggested changes 2023-09-06 03:05:14 +00:00
Kubernetes Prow Robot
73580b2038
Merge pull request #120336 from pohly/dra-generated-name-hyphen
resource claim controller: separate generated suffix from base
2023-09-05 11:22:51 -07:00
Kubernetes Prow Robot
8e2b12a220
Merge pull request #119068 from lauchokyip/podgc-unit-test
added podgc orphaned pod unit tests
2023-09-05 03:19:49 -07:00
Patrick Ohly
3c2cfd9a4f resource claim controller: separate generated suffix from base
When the resource claim name inside the pod had some suffix like "1a" in
"resource-1a", the generated name suffix got added directly after that, leading
to "my-pod-resource-1ax6zgt".

Adding another hyphen makes the result more readable: "my-pod-resource-1a-x6zgt".
2023-09-04 09:45:25 +02:00
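The naming change in the commit above can be illustrated with a small sketch. The helper `generatedClaimName` and its parameters are hypothetical and are not the controller's actual code; the random suffix is passed in explicitly so the result is reproducible.

```go
package main

import "fmt"

// generatedClaimName is a hypothetical helper that joins the pod name, the
// claim name from the pod spec, and a generated suffix. When separate is true,
// an extra hyphen is placed between the base name and the suffix.
func generatedClaimName(podName, claimName, suffix string, separate bool) string {
	base := podName + "-" + claimName
	if separate {
		base += "-"
	}
	return base + suffix
}

func main() {
	// Without the extra hyphen the suffix runs into the claim name.
	fmt.Println(generatedClaimName("my-pod", "resource-1a", "x6zgt", false)) // my-pod-resource-1ax6zgt
	// With the extra hyphen the suffix is visibly separate.
	fmt.Println(generatedClaimName("my-pod", "resource-1a", "x6zgt", true)) // my-pod-resource-1a-x6zgt
}
```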
Kubernetes Prow Robot
1cadbd5887
Merge pull request #120172 from DrAuYueng/fix-log-in-deployment-controller
Fix pod deletion log in deployment controller
2023-09-01 11:28:31 -07:00
Albert Sverdlov
a46bab6930
Fix a job quota related deadlock (#119776)
* Fix a job quota related deadlock

In case ResourceQuota is used and sets a max # of jobs, a CronJob may get
trapped in a deadlock:
  1. Job quota for a namespace is reached.
  2. CronJob controller can't create a new job, because quota is
     reached.
  3. Cleanup of jobs owned by a cronjob doesn't happen, because the
     control loop iteration ends early when creating a job fails.

To fix this we stop quitting early from a control loop iteration when
cronjob reconciliation fails and always let old jobs be cleaned up.

* Don't reorder imports

* Don't stop requeuing on reconciliation error

Previous code only logged the reconciliation error inside jm.sync() and
didn't return the reconciliation error to its invoker
processNextWorkItem().

Adding a copy-paste back to avoid this issue.

* Remove copy-pasted cleanupFinishedJobs()

Now we always call jm.cleanupFinishedJobs() first and then
jm.syncCronJob().

We also extract cronJobCopy and updateStatus outside jm.syncCronJob
function and pass pointers to them in both jm.syncCronJob and
jm.cleanupFinishedJobs to make delayed updates handling more explicit
and not dependent on the order in which cleanupFinishedJobs and
syncCronJob are invoked.

* Return updateStatus bool instead of changing the reference

* Explicitly ignore err in tests to fix linter
2023-08-31 08:25:00 -07:00
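The deadlock and its fix described in the commit above can be sketched with a toy model. Everything here (the `namespace` type, `syncOld`, `syncFixed`) is a hypothetical simplification and not the CronJob controller's real code; in particular, the toy cleanup removes all finished jobs, whereas the real controller applies history limits.

```go
package main

import (
	"errors"
	"fmt"
)

// namespace is a toy stand-in for a namespace with a quota on Job objects.
type namespace struct {
	quota    int // maximum number of jobs allowed
	finished int // finished jobs that are eligible for cleanup
	active   int // running jobs
}

var errQuota = errors.New("job quota exceeded")

// createJob fails when the quota is already used up.
func (n *namespace) createJob() error {
	if n.finished+n.active >= n.quota {
		return errQuota
	}
	n.active++
	return nil
}

// cleanupFinished removes finished jobs (simplified: removes all of them).
func (n *namespace) cleanupFinished() { n.finished = 0 }

// syncOld models the old ordering: a creation error ends the iteration
// before cleanup runs, so the quota is never freed.
func syncOld(n *namespace) error {
	if err := n.createJob(); err != nil {
		return err
	}
	n.cleanupFinished()
	return nil
}

// syncFixed models the new ordering: cleanup always runs first, and any
// reconciliation error is still returned so the item can be requeued.
func syncFixed(n *namespace) error {
	n.cleanupFinished()
	return n.createJob()
}

func main() {
	old := &namespace{quota: 3, finished: 3}
	fmt.Println(syncOld(old), old.finished) // error; finished jobs remain

	fixed := &namespace{quota: 3, finished: 3}
	fmt.Println(syncFixed(fixed), fixed.finished, fixed.active) // no error; quota freed
}
```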
Sharpz7
e9be1d7438 Test now has coverage! 2023-08-27 05:06:53 +00:00
DrAuYueng
a4ce32769f fix pod delete log in deployment controller
Signed-off-by: DrAuYueng <ouyang1204@gmail.com>
2023-08-25 22:20:51 +08:00
Adam McArthur
0bc0256093
Update job_controller_test.go 2023-08-25 08:15:53 -06:00
Sharpz7
22f4b1c56a Static check fix 2023-08-25 11:35:05 +00:00