Commit Graph

6449 Commits

Author SHA1 Message Date
Anton Stuchinskii
34294cd67f locking feature-gate for ready pods job status 2023-10-20 16:08:54 +02:00
Michal Wozniak
b0d04d933b Introduce the job_finished_indexes_total metric 2023-10-20 15:19:04 +02:00
Kubernetes Prow Robot
568aee16e8
Merge pull request #120731 from Nordix/sts_issue/adil
Fixing CurrentReplicas and CurrentRevision in completeRollingUpdate
2023-10-20 14:42:08 +02:00
Michal Wozniak
32fdb55192 Use Patch instead of SSA for Pod Disruption condition 2023-10-19 21:00:19 +02:00
adil ghaffar
00c21ced3a
Fixing CurrentReplicas and CurrentRevision in completeRollingUpdate 2023-10-19 14:17:42 +03:00
Michal Wozniak
6dd0ad5c0f Graduate BackoffLimitPerIndex to Beta 2023-10-19 12:18:36 +02:00
Maciej Szulik
db8b303156
Modify mostRecentScheduleTime to return more detailed information about missed schedules
Initially this method was returning a number of missed schedules, but
that turned out to be not reliable for some complex schedules. For
example, those which are being run only during week days. The second
approach was to only return a boolean indicating the too many missed
information. It turns out that we need to return all three values:
none missed, few missed and many missed, to let consumers know what to
do, but don't leak the wrong number out of mostRecentScheduleTime.
2023-10-18 20:03:03 +02:00
Maciej Szulik
6c4f71b31c
Fix spelling 2023-10-18 19:15:34 +02:00
Yuki Iwai
d7556769e7 Job: Replace deprecated wait functions with supported one
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-10-19 00:14:35 +09:00
Kubernetes Prow Robot
6d70013af5
Merge pull request #121147 from kannon92/rm-at-least-no-terminating-count
Remove terminating count from rmAtLeast
2023-10-18 00:44:51 +02:00
Kubernetes Prow Robot
27ff547a14
Merge pull request #121011 from kannon92/job-pod-replacement-policy-feature-on-but-api-specified
Fix panic when enablement of pod replacement policy is skewed
2023-10-17 21:28:48 +02:00
Yuki Iwai
201c30fba8
Job: Handle error returned from AddEventHandler function (#119917)
* Job: Handle error returned from AddEventHandler function

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Use the error message the similar to CronJob

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Clean up error messages

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the tesing.T on the second place in the args for the newControllerFromClient function

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.T on the second place in the args for the newControllerFromClientWithClock function

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Call t.Helper()

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.TB on the second place in the args for the createJobControllerWithSharedInformers function and call tb.Helper() there

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Put the testing.TB on the second place in the args for the startJobControllerAndWaitForCaches function and call tb.Helper() there

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* Adapt TestFinializerCleanup to the eventhandler error

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-10-17 21:28:34 +02:00
Kevin Hannon
7a1ac18bc8 Fix panic if there are more terminating pods than active pods
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-10-17 14:50:38 -04:00
Antonio Ojea
c2d473f0d4 remove ClusterCIDR
KEP-2593 proposed to expand the existing node-ipam controller
to be configurable via a ClusterCIDR objects, however, there
were reasonable doubts on the SIG about the feature and after
several months of dicussions we decided to not move forward
with the KEP intree, hence, we are going to remove the existing
code, that is still in alpha.

https://groups.google.com/g/kubernetes-sig-network/c/nts1xEZ--gQ/m/2aTOUNFFAAAJ

Change-Id: Ieaf2007b0b23c296cde333247bfb672441fe6dfc
2023-10-14 19:06:22 +00:00
Kubernetes Prow Robot
bae6911b11
Merge pull request #121142 from aleksandra-malinowska/sts-concurrent-write-fix
Fix concurrent map writes on missing PVC creation in StatefulSet controller
2023-10-12 17:11:19 +02:00
Kubernetes Prow Robot
07029999f9
Merge pull request #120666 from b8kings0ga/feature/fix-comment-correction
AttachDetachControllerConfiguration.ReconcilerSyncLoopPeriod default value comment fix
2023-10-11 22:51:49 +02:00
Aleksandra Malinowska
7989400bef Fix concurrent write when filling PVC labels 2023-10-11 15:07:55 +02:00
Aleksandra Malinowska
54714686bc Modify test PVC to detect concurrent map write bug 2023-10-11 15:07:50 +02:00
Kevin Hannon
d7ee6b9d1b fix possible panic if pod replacement policy is turned on and jobs do not set pod replacement policy 2023-10-11 08:37:50 -04:00
Kubernetes Prow Robot
d3559bf77f
Merge pull request #120595 from jsafrane/fix-detach-uncertain
Mark a volume as uncertain-attached after detach error
2023-10-08 05:54:01 +02:00
Josselin Costanzi
3c4512c6cc hpa: always update status metrics when updating the replica count
Have hpa always update both the metrics and replica count. This fix an
edge case behavior bug where the metrics would not be updated if a
custom metrics was unavailable.
2023-10-06 21:34:09 +00:00
Lukasz Stankiewicz
1b489963c8 Add nil checks for hpa object target type values 2023-10-05 17:15:51 -07:00
Kevin Hannon
b96a074bcd convert pointer to ptr for job controller 2023-10-05 09:30:01 -04:00
Abhishek Srivastav
5f8fc30b2c
Added locks on request tracker before accessing fields (#120599)
* Added locks on request tracker before accessing fields

Unit test StatefulSetAutoDeletePVCEnabled has been
flaking with DATARACE. Added lock on request tracker
before accessing err field.

* Addressed review comments for PR : Added locks on request tracker before accessing fields
2023-10-03 16:38:08 +02:00
Kubernetes Prow Robot
622509830c
Merge pull request #120716 from xrstf/fix-typos
Fix typos
2023-09-30 00:25:56 -07:00
b8kings0ga
9345da51ac fix comment mistake, run "make update" 2023-09-22 16:37:55 +08:00
Filip Křepinský
c816601d83 reintroduce resourcequota.NewMonitor
- this function is used by other packages and  was mistakenly removed
  in 397cc73dc9
- let resource quota controller use this constructor instead of an
  object instantiation
2023-09-20 17:18:55 +02:00
Kubernetes Prow Robot
fd5f36e6a0
Merge pull request #120175 from kannon92/move-pod-failure-policy-constant
move reasons to api package for job controller
2023-09-20 03:06:00 -07:00
Kubernetes Prow Robot
355feb21fd
Merge pull request #120649 from andrewsykim/fix-cronjob-controller-already-exists-err
cronjob controller: ensure already existing jobs are added to active list
2023-09-20 02:00:00 -07:00
Kubernetes Prow Robot
963c9b3cb9
Merge pull request #119317 from mochizuki875/fix_ds_rolling_update_118823
Exclude nodes from rolling update depending on tolerations
2023-09-19 16:50:17 -07:00
Kevin Hannon
a62eb45ae2 Rename job reasons to JobReasons as part of api review 2023-09-19 13:10:22 -04:00
Andrew Sy Kim
301aa69fec cronjob controller: ensure already existing jobs are added to Active list of cronjobs
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2023-09-19 15:18:44 +00:00
Aleksandra Malinowska
5ed60a72f6
Revert "Make StatefulSet restart pods with phase Succeeded" 2023-09-19 15:49:36 +02:00
mochizuki875
2a82776745 change rolling update logic to exclude sunsetting nodes 2023-09-19 11:39:32 +00:00
Christoph Mewes
79a7833ade fix typo Mininum => Minimum 2023-09-17 11:24:29 +02:00
Stephen Kitt
567fca7baa
Use copy() instead of a for loop
Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-09-15 09:20:08 +02:00
Kevin Hannon
c6e9fba79b move reasons to api package for job controller 2023-09-14 13:24:29 -04:00
Kubernetes Prow Robot
a68093a3ff
Merge pull request #120506 from alexzielenski/import-restrictions
Update e2e import restrictions
2023-09-13 21:56:22 -07:00
Kubernetes Prow Robot
3eca0a5f78
Merge pull request #120398 from aleksandra-malinowska/sts-restart-always
Make StatefulSet restart pods with phase Succeeded
2023-09-13 12:40:12 -07:00
Jan Safranek
7fc11f47ff Mark a volume as uncertain-attached after detach error
Volume that failed Detach() should not be marked as attached, CSI
external-attacher is probably still trying to detach it.

Mark it uncertain instead and wait for Detach() to succeed.
2023-09-13 10:03:28 +02:00
Kubernetes Prow Robot
db49b13ccd
Merge pull request #120252 from kerthcet/cleanup/framework-import
Move framework testing libraries to the right place
2023-09-12 17:44:11 -07:00
kerthcet
6fbb8ec7e4 Move scheduler testing utils to /scheduler/testing
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-09-12 13:42:38 +08:00
Aldo Culquicondor
6b4ab616a2
Increase range of job_sync_duration_seconds
Change-Id: I7ed4b006faecf0a7e6e583c42b4d6bc4b786a164
2023-09-11 18:01:33 -04:00
Kubernetes Prow Robot
aa4ec3c5b0
Merge pull request #119944 from Sharpz7/jm/backup-finalizers
Adding backup code for removing finalizers to more Job End States.
2023-09-11 09:30:30 -07:00
Alexander Zielenski
f135eed37b update codegen 2023-09-08 09:49:35 -07:00
Aleksandra Malinowska
d7264d0af0 Make StatefulSet restart pods with phase Succeeded 2023-09-08 17:47:17 +02:00
Sharpz7
7e4b5d0d49 Final Fix 2023-09-08 14:44:22 +00:00
Stephen Kitt
aa89e6dc97
Use ptr.To to retrieve intstr addresses
This uses the generic ptr.To in k8s.io/utils to replace functions and
code constructs which only serve to return pointers to intstr
values. Other uses of the deprecated pointer package are updated in
modified files.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-09-08 11:10:50 +02:00
Sharpz7
43fc6b5bdb Added suggests changes 2023-09-06 03:05:14 +00:00
Kubernetes Prow Robot
73580b2038
Merge pull request #120336 from pohly/dra-generated-name-hyphen
resource claim controller: separate generated suffix from base
2023-09-05 11:22:51 -07:00
Kubernetes Prow Robot
8e2b12a220
Merge pull request #119068 from lauchokyip/podgc-unit-test
added podgc orphaned pod unit tests
2023-09-05 03:19:49 -07:00
Patrick Ohly
3c2cfd9a4f resource claim controller: separate generated suffix from base
When the resource claim name inside the pod had some suffix like "1a" in
"resource-1a", the generated name suffix got added directly after that, leading
to "my-pod-resource-1ax6zgt".

Adding another hyphen makes the result more readable: "my-pod-resource-1a-x6zgt".
2023-09-04 09:45:25 +02:00
Kubernetes Prow Robot
1cadbd5887
Merge pull request #120172 from DrAuYueng/fix-log-in-deployment-controller
Fix pod deletion log in deployment controller
2023-09-01 11:28:31 -07:00
Albert Sverdlov
a46bab6930
Fix a job quota related deadlock (#119776)
* Fix a job quota related deadlock

In case ResourceQuota is used and sets a max # of jobs, a CronJob may get
trapped in a deadlock:
  1. Job quota for a namespace is reached.
  2. CronJob controller can't create a new job, because quota is
     reached.
  3. Cleanup of jobs owned by a cronjob doesn't happen, because a
     control loop iteration is finished because of an error to create a
     job.

To fix this we stop early quitting from a control loop iteration when
cronjob reconciliation failed and always let old jobs to be cleaned up.

* Dont reorder imports

* Don't stop requeuing on reconciliation error

Previous code only logged the reconciliation error inside jm.sync() and
didn't return the reconciliation error to it's invoker
processNextWorkItem().

Adding a copy-paste back to avoid this issue.

* Remove copy-pasted cleanupFinishedJobs()

Now we always call jm.cleanupFinishedJobs() first and then
jm.syncCronJob().

We also extract cronJobCopy and updateStatus outside jm.syncCronJob
function and pass pointers to them in both jm.syncCronJob and
jm.cleanupFinishedJobs to make delayed updates handling more explicit
and not dependent on the order in which cleanupFinishedJobs and
syncCronJob are invoked.

* Return updateStatus bool instead of changing the reference

* Explicitly ignore err in tests to fix linter
2023-08-31 08:25:00 -07:00
Sharpz7
e9be1d7438 Test now has coverage! 2023-08-27 05:06:53 +00:00
DrAuYueng
a4ce32769f fix pod delete log in deployment controller
Signed-off-by: DrAuYueng <ouyang1204@gmail.com>
2023-08-25 22:20:51 +08:00
Adam McArthur
0bc0256093
Update job_controller_test.go 2023-08-25 08:15:53 -06:00
Sharpz7
22f4b1c56a Static check fix 2023-08-25 11:35:05 +00:00
Sharpz7
70e2deb32f Fixing lint problem 2023-08-25 11:08:59 +00:00
Sharpz7
6ded53ce4d Added back test changes 2023-08-25 10:35:58 +00:00
Sharpz7
5fb049ff47 Added create job & cleanup 2023-08-25 10:35:58 +00:00
Sharpz7
ff1659cb79 Added syncjob 2023-08-25 10:35:58 +00:00
Sharpz7
f87cc43cdb Review Changes 2023-08-25 10:35:58 +00:00
Sharpz7
d08fc3a4d0 Another one creeped in 2023-08-25 10:35:58 +00:00
Sharpz7
ef6a0eb6d8 Final Lint Fix 2023-08-25 10:35:58 +00:00
Sharpz7
aa9f38c36d More Lint Fixes 2023-08-25 10:35:58 +00:00
Sharpz7
601679446a Lint fixes 2023-08-25 10:35:58 +00:00
Sharpz7
cf32ae9453 Initial Commit 2023-08-25 10:35:58 +00:00
Sharpz7
297f04b74a Added function to remove finalizers as backup 2023-08-25 10:35:57 +00:00
Kubernetes Prow Robot
f852d7fead
Merge pull request #118653 from pohly/volume-resource-requirements
Volume resource requirements
2023-08-21 14:08:05 -07:00
Kubernetes Prow Robot
6cbc5dfac6
Merge pull request #114095 from aimuz/fix-114083
scheduler: Fix field apiVersion is missing from events reported from taint manager
2023-08-21 07:03:23 -07:00
Patrick Ohly
2472291790 api: introduce separate VolumeResourceRequirements struct
PVC and containers shared the same ResourceRequirements struct to define their
API. When resource claims were added, that struct got extended, which
accidentally also changed the PVC API. To avoid such a mistake from happening
again, PVC now uses its own VolumeResourceRequirements struct.

The `Claims` field gets removed because risk of breaking someone is low:
theoretically, YAML files which have a claims field for volumes now
get rejected when validating against the OpenAPI. Such files
have never made sense and should be fixed.

Code that uses the struct definitions needs to be updated.
2023-08-21 15:31:28 +02:00
lowang-bh
2f576969ee reducae function calling to once
Signed-off-by: lowang-bh <lhui_wang@163.com>
2023-08-20 19:36:50 +08:00
qingwave
7f805dc935 fix test name 2023-08-19 03:31:34 +00:00
Kubernetes Prow Robot
df493712e4
Merge pull request #119874 from kannon92/pod-replacement-policy-typos
fix typos for pod replacement policy
2023-08-17 11:21:34 -07:00
qingwave
f07a3a3f26 calculate sidecar container resource in pod autoscaler
Signed-off-by: qingwave <isguory@gmail.com>
2023-08-17 07:47:59 +00:00
git-jxj
a5b3a4b738
cleanup: Update deprecated FromInt to FromInt32 (#119858)
* redo commit

* apply suggestions from liggitt

* update Parse function based on suggestions
2023-08-16 09:33:01 -07:00
Kubernetes Prow Robot
d5f2420309
Merge pull request #119914 from luohaha3123/job-feature
Job: Change job controller  methods receiver to pointer
2023-08-15 23:14:05 -07:00
Kubernetes Prow Robot
fa1fc7a9cb
Merge pull request #119904 from tenzen-y/replace-deprecated-workqueue-lib
Job: Replace deprecated workqueue function with supported one
2023-08-15 23:13:52 -07:00
Kubernetes Prow Robot
5638fe5f33
Merge pull request #119214 from kaisoz/refactor-controller-utils-test
Rewrite the tests to be table driven
2023-08-15 15:17:55 -07:00
Kubernetes Prow Robot
7407f36b4b
Merge pull request #117992 from liggitt/gc-discovery-flutter
Fix duplicate GC event handlers getting added if discovery flutters
2023-08-15 15:16:50 -07:00
lhaha
947c9376f6 change struct methods receiver to pointer 2023-08-12 10:21:14 +08:00
Yuki Iwai
6f27733af8 Job: Replace deprecated workqueue library with supported one
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2023-08-11 20:35:36 +09:00
kannon92
f73c253acc fix typos for pod replacement policy 2023-08-09 20:34:48 +00:00
Tomas Tormo
074d5b5329 Rewrite the tests to be table driven 2023-08-03 08:39:46 +00:00
Kubernetes Prow Robot
18f8cb8398
Merge pull request #118644 from alexzielenski/apiserver/policy/namespaceParamRef
KEP-3488: Promote ValidatingAdmissionPolicy to Beta
2023-07-21 17:44:08 -07:00
Alexander Zielenski
ef8670c946 refactor: replace usage of v1alpha1 with v1beta1
v1alpha -> v1beta

fill in DenyAction where there is no ParameterNotFoundAction
2023-07-21 13:41:24 -07:00
Kubernetes Prow Robot
a30f6b7922
Merge pull request #119506 from mimowo/fix-job-controller-flaky-test
Fix the flaky TestJobApiBackoffReset test
2023-07-21 09:30:07 -07:00
Michal Wozniak
dbea279112 Fix the flaky TestJobApiBackoffReset test 2023-07-21 14:45:04 +02:00
jackcui
9d8959224c add explanation for large-cluster-size-threshold arg about multiple zones cluster 2023-07-21 17:25:51 +08:00
kannon92
74fcf3e766 implementation of PodReplacementPolicy kep in the job controller 2023-07-21 00:44:53 +00:00
Michal Wozniak
35d0af9243 Include ignored pods when computing backoff delay for Job pod failures 2023-07-19 17:39:58 +02:00
Kubernetes Prow Robot
88c8bcbb4a
Merge pull request #115952 from pacoxu/cleanup-cronjob
cronjob: return immediately when failed to create job for the namespace is terminating
2023-07-18 21:28:02 -07:00
Chok Yip Lau
dbaa9fe6b4 added podgc orphaned pod unit tests 2023-07-18 23:06:19 -04:00
Kubernetes Prow Robot
d1d86dafb7
Merge pull request #118772 from kannon92/terminating-pod-gc
KEP-3939: pod gc changes for pod replacement policy kep
2023-07-18 16:46:03 -07:00
Michał Woźniak
a15c27661e
Job controller implementation of backoff limit per index (#118009) 2023-07-18 13:44:11 -07:00
carlory
ad0b671b56 getPersistentVolume remove reduant deep copy 2023-07-18 18:55:42 +08:00
Daniel Vega-Myhre
7698fe7639
Add StatefulSet pod index as pod label (#119232)
* add statefulset pod index as pod label

* change statefulset pod index label name

* check 3 pods

* change label variable name
2023-07-17 12:47:10 -07:00
kannon92
e38ab6d367 Add PodGC changes for PodReplacementPolicy 2023-07-16 23:47:04 +00:00
Kubernetes Prow Robot
84a999923f
Merge pull request #119335 from mimowo/use-final-diff-for-job-pod-creation
Ensure final diff is used for setting expectations for Job pod creation
2023-07-14 15:20:54 -07:00
Kubernetes Prow Robot
6f3856f953
Merge pull request #118883 from danielvegamyhre/kep-4017-job
Add completion index as pod label for indexed jobs
2023-07-14 12:23:50 -07:00
Michal Wozniak
9564bdc39d Ensure final diff is used for setting expectations for Job pod creation 2023-07-14 19:09:39 +02:00
Kubernetes Prow Robot
5c72df7281
Merge pull request #118953 from mskrocki/escLib
Convert EndpointSlice Reconciler to a library in staging.
2023-07-13 17:13:34 -07:00
Kubernetes Prow Robot
be2cfc9697
Merge pull request #118228 from carlory/move-non-graceful-node-shutdown-to-GA
move non-graceful node shutdown to GA
2023-07-13 15:47:37 -07:00
Daniel Vega-Myhre
037091284e fix unit test bug 2023-07-13 22:38:21 +00:00
Kubernetes Prow Robot
bea27f82d3
Merge pull request #118209 from pohly/dra-pre-scheduled-pods
dra: pre-scheduled pods
2023-07-13 14:43:37 -07:00
Daniel Vega-Myhre
a1a5f49bb9 remove statefulset label added to wrong branch 2023-07-13 21:07:17 +00:00
Daniel Vega-Myhre
1ae60c0ed1 use job completion index annotation as label 2023-07-13 21:04:37 +00:00
Jiahui Feng
049614f884
ValidatingAdmissionPolicy controller for Type Checking (#117377)
* [API REVIEW] ValidatingAdmissionPolicyStatucController config.

worker count.

* ValidatingAdmissionPolicyStatus controller.

* remove CEL typechecking from API server.

* fix initializer tests.

* remove type checking integration tests

from API server integration tests.

* validatingadmissionpolicy-status options.

* grant access to VAP controller.

* add defaulting unit test.

* generated: ./hack/update-codegen.sh

* add OWNERS for VAP status controller.

* type checking test case.
2023-07-13 13:41:50 -07:00
Patrick Ohly
80ab8f0542 dra: handle scheduled pods in kube-controller-manager
When someone decides that a Pod should definitely run on a specific node, they
can create the Pod with spec.nodeName already set. Some custom scheduler might
do that. Then kubelet starts to check the pod and (if DRA is enabled) will
refuse to run it, either because the claims are still waiting for the first
consumer or the pod wasn't added to reservedFor. Both are things the scheduler
normally does.

Also, if a pod got scheduled while the DRA feature was off in the
kube-scheduler, a pod can reach the same state.

The resource claim controller can handle these two cases by taking over for the
kube-scheduler when nodeName is set. Triggering an allocation is simpler than
in the scheduler because all it takes is creating the right
PodSchedulingContext with spec.selectedNode set. There's no need to list nodes
because that choice was already made, permanently. Adding the pod to
reservedFor also isn't hard.

What's currently missing is triggering de-allocation of claims to re-allocate
them for the desired node. This is not important for claims that get created
for the pod from a template and then only get used once, but it might be
worthwhile to add de-allocation in the future.
2023-07-13 21:27:11 +02:00
Patrick Ohly
cffbb1f1b2 dra controller: enhance testing
The allocation mode is relevant when clearing the reservedFor: for delayed
allocation, deallocation gets requested, for immediate allocation not. Both
should get tested.

All pre-defined claims now use delayed allocation, just as they would if
created normally.
2023-07-13 21:27:11 +02:00
Patrick Ohly
5cec6d798c dra: revamp event handlers in kube-controller-manager
Enabling logging is useful to track what the code is doing.

There are some functional changes:
- The pod handler checks for existence of claims. This
  avoids adding pods to the work queue in more cases
  when nothing needs to be done, at the cost of
  making the event handlers a bit slower. This will become
  more important when adding more work to the controller
- The handler for deleted ResourceClaim did not check for
  cache.DeletedFinalStateUnknown.
2023-07-13 21:27:11 +02:00
Kubernetes Prow Robot
4fa97eae1b
Merge pull request #119291 from mimowo/use-jobctx-for-first-pending
Pass Job context down to firstPendingIndexes
2023-07-13 09:57:05 -07:00
Michal Wozniak
7e3b53042b Pass Job context down to firstPendingIndexes 2023-07-13 16:11:06 +02:00
Kubernetes Prow Robot
cd9215915b
Merge pull request #118480 from carlory/gc_metrics
podgc metrics should count all pod deletion behaviors
2023-07-13 06:52:05 -07:00
Jordan Liggitt
12a874d227
Preserve resourcequota informers for groups with discovery resolution errors only 2023-07-12 12:29:33 -04:00
Jordan Liggitt
c9a084d59c
Fix duplicate GC event handlers getting added if discovery flutters 2023-07-12 12:29:31 -04:00
Patrick Ohly
98ba89d31d resourceclaim controller: avoid caching deleted pod unnecessarily
We don't need to remember that a pod got deleted when it had no resource claims
because the code which checks the cached UIDs only checks for pods which have
resource claims.
2023-07-12 16:57:17 +02:00
Patrick Ohly
7d064812bb kube-controller-manager: finish conversion to contextual logging
This removes all exceptions and fixes the remaining unconverted log calls.
2023-07-12 14:57:29 +02:00
Patrick Ohly
1b8ddf6b79 podgc controller: convert to contextual logging 2023-07-12 13:45:10 +02:00
Mengjiao Liu
19869478c1 Migrate /pkg/controller/disruption to structured and contextual logging 2023-07-12 11:30:45 +08:00
Maciej Skrocki
7c873327b6 Convert controller name to reconciler variable. 2023-07-11 18:08:25 +00:00
Maciej Skrocki
29fad383da move endpointslice reconciler to staging endpointslice repo 2023-07-11 18:08:12 +00:00
Kubernetes Prow Robot
a6890b361d
Merge pull request #119193 from mimowo/sync-job-context
Introduce syncJobContext to limit the number of function parameters
2023-07-11 10:33:30 -07:00
Kubernetes Prow Robot
e0dafe57a3
Merge pull request #117351 from pohly/dra-generated-resource-claim-names
DRA: generated resource claim names
2023-07-11 10:33:11 -07:00
Patrick Ohly
fec25785ee dra: store generated ResourceClaims in cache
This addresses the following bad sequence of events:
- controller creates ResourceClaim
- updating pod status fails
- pod gets retried before the informer receives
  the created ResourceClaim
- another ResourceClaim gets created

Storing the generated ResourceClaim in a MutationCache ensures that the
controller knows about it during the retry.

A positive side effect is that ResourceClaims now get index by pod owner and
thus iterating over existing ones becomes a bit more efficient.
2023-07-11 14:23:49 +02:00
Patrick Ohly
444d23bd2f dra: generated name for ResourceClaim from template
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.

What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.

The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.

The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.

There's no immediate need for it, but just in case that it may become relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
2023-07-11 14:23:48 +02:00
Kubernetes Prow Robot
8f1852bb44
Merge pull request #115295 from Namanl2001/pkg/controller/endpointslice
Migrated `pkg/controller/endpointslice` and `pkg/controller/endpointslicemirroring` to contextual logging
2023-07-11 03:19:12 -07:00
Michal Wozniak
bf48165232 Remarks to syncJobCtx 2023-07-11 09:44:08 +02:00
Michal Wozniak
990339d4c3 Introduce syncJobContext to limit the number of function parameters 2023-07-11 09:27:21 +02:00
carlory
f443c458af move non-graceful node shutdown to GA 2023-07-11 13:51:51 +08:00
Kubernetes Prow Robot
986171d388
Merge pull request #119185 from xing-yang/metrics_attach
Add reason to force detach metric
2023-07-10 14:03:18 -07:00
Naman
645cb90732 migrated pkg/controller/endpointslicemirroring to contextual logging
Signed-off-by: Naman <namanlakhwani@gmail.com>
2023-07-11 01:43:30 +05:30
Daniel Vega-Myhre
98c6e25c37 update name of pod index label 2023-07-10 20:11:52 +00:00
Naman
09849b09cf migrated pkg/controller/endpointslice to contextual logging
Signed-off-by: Naman <namanlakhwani@gmail.com>
2023-07-11 01:28:22 +05:30
Kubernetes Prow Robot
10a12165de
Merge pull request #116755 from my-git9/feat/endpoint/logging
Migrated `pkg/controller/endpoint` to contextual logging
2023-07-10 05:37:05 -07:00
Kubernetes Prow Robot
64939b66c6
Merge pull request #119146 from xuexu6666/xuexu6666/ControllerUtilUseCmpDiff
Use cmp diff in controller_util_test.go
2023-07-10 02:41:18 -07:00
xing-yang
cca6601106 Add reason to force detach metric 2023-07-10 06:30:05 +00:00
Aldo Culquicondor
f7a1fb76f4
Only declare job as finished after removing all finalizers
Change-Id: Id4b01b0e6fabe24134e57e687356e0fc613cead4
2023-07-07 14:08:19 -04:00
xuexu6666
d7708e79d3 Use cmp diff 2023-07-06 23:01:06 -05:00
Heba Elayoty
2fe38f93e5
feat: Append job creation timestamp to cronjob annotations (#118137)
* Append job name to job annotations

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

* Update annotation description, remove timezone, and fix time

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

* Remove unused ctx

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

* code review comments

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

* code review comments

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

* Add timezone back

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

---------

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
2023-07-06 14:39:04 -07:00
Daniel Vega-Myhre
3a02ecb341 check test case param instead of feature flag in unit test code 2023-07-06 17:30:40 +00:00
Kubernetes Prow Robot
6f9d1d38d8
Merge pull request #118817 from pohly/dra-delete-claims
DRA: improve handling of completed pods
2023-07-06 10:15:15 -07:00
Kubernetes Prow Robot
7e5506de8d
Merge pull request #119111 from kannon92/remove-equal-ready-job
remove equalReady and replace with k8 util function
2023-07-06 09:13:16 -07:00
Ziqi Zhao
dfc1838379 Migrated pkg/controller/volume|util|replicaset|nodeipam to contextual logging
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
2023-07-06 07:39:52 +08:00
xin.li
6c0387d004 Migrated pkg/controller/endpoint to contextual logging
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-07-06 07:36:51 +08:00
xin.li
3cf2822bc5 Migrated pkg/controller/garbagecollector to contextual logging
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-07-06 07:36:51 +08:00
Kubernetes Prow Robot
916c3466b9
Merge pull request #118940 from soltysh/drop_missedschedules
Hide numberOfMissedSchedules as an algorithm internal number
2023-07-05 16:27:02 -07:00
kannon92
921b7e6e8f remove equalReady and replace with k8 util function 2023-07-05 20:11:48 +00:00
Daniel Vega-Myhre
a647f9febb default enabled pod index for test cases, add test case disabling it 2023-07-05 18:47:45 +00:00