Commit Graph

6217 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
4fa97eae1b
Merge pull request #119291 from mimowo/use-jobctx-for-first-pending
Pass Job context down to firstPendingIndexes
2023-07-13 09:57:05 -07:00
Michal Wozniak
7e3b53042b Pass Job context down to firstPendingIndexes 2023-07-13 16:11:06 +02:00
Kubernetes Prow Robot
cd9215915b
Merge pull request #118480 from carlory/gc_metrics
podgc metrics should count all pod deletion behaviors
2023-07-13 06:52:05 -07:00
Patrick Ohly
7d064812bb kube-controller-manager: finish conversion to contextual logging
This removes all exceptions and fixes the remaining unconverted log calls.
2023-07-12 14:57:29 +02:00
Patrick Ohly
1b8ddf6b79 podgc controller: convert to contextual logging 2023-07-12 13:45:10 +02:00
Mengjiao Liu
19869478c1 Migrate /pkg/controller/disruption to structured and contextual logging 2023-07-12 11:30:45 +08:00
Kubernetes Prow Robot
a6890b361d
Merge pull request #119193 from mimowo/sync-job-context
Introduce syncJobContext to limit the number of function parameters
2023-07-11 10:33:30 -07:00
Kubernetes Prow Robot
e0dafe57a3
Merge pull request #117351 from pohly/dra-generated-resource-claim-names
DRA: generated resource claim names
2023-07-11 10:33:11 -07:00
Patrick Ohly
fec25785ee dra: store generated ResourceClaims in cache
This addresses the following bad sequence of events:
- controller creates ResourceClaim
- updating pod status fails
- pod gets retried before the informer receives
  the created ResourceClaim
- another ResourceClaim gets created

Storing the generated ResourceClaim in a MutationCache ensures that the
controller knows about it during the retry.

A positive side effect is that ResourceClaims now get index by pod owner and
thus iterating over existing ones becomes a bit more efficient.
2023-07-11 14:23:49 +02:00
Patrick Ohly
444d23bd2f dra: generated name for ResourceClaim from template
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.

What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.

The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.

The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.

There's no immediate need for it, but just in case that it may become relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
2023-07-11 14:23:48 +02:00
Kubernetes Prow Robot
8f1852bb44
Merge pull request #115295 from Namanl2001/pkg/controller/endpointslice
Migrated `pkg/controller/endpointslice` and `pkg/controller/endpointslicemirroring` to contextual logging
2023-07-11 03:19:12 -07:00
Michal Wozniak
bf48165232 Remarks to syncJobCtx 2023-07-11 09:44:08 +02:00
Michal Wozniak
990339d4c3 Introduce syncJobContext to limit the number of function parameters 2023-07-11 09:27:21 +02:00
Kubernetes Prow Robot
986171d388
Merge pull request #119185 from xing-yang/metrics_attach
Add reason to force detach metric
2023-07-10 14:03:18 -07:00
Naman
645cb90732 migrated pkg/controller/endpointslicemirroring to contextual logging
Signed-off-by: Naman <namanlakhwani@gmail.com>
2023-07-11 01:43:30 +05:30
Naman
09849b09cf migrated pkg/controller/endpointslice to contextual logging
Signed-off-by: Naman <namanlakhwani@gmail.com>
2023-07-11 01:28:22 +05:30
Kubernetes Prow Robot
10a12165de
Merge pull request #116755 from my-git9/feat/endpoint/logging
Migrated `pkg/controller/endpoint` to contextual logging
2023-07-10 05:37:05 -07:00
Kubernetes Prow Robot
64939b66c6
Merge pull request #119146 from xuexu6666/xuexu6666/ControllerUtilUseCmpDiff
Use cmp diff in controller_util_test.go
2023-07-10 02:41:18 -07:00
xing-yang
cca6601106 Add reason to force detach metric 2023-07-10 06:30:05 +00:00
Aldo Culquicondor
f7a1fb76f4
Only declare job as finished after removing all finalizers
Change-Id: Id4b01b0e6fabe24134e57e687356e0fc613cead4
2023-07-07 14:08:19 -04:00
xuexu6666
d7708e79d3 Use cmp diff 2023-07-06 23:01:06 -05:00
Heba Elayoty
2fe38f93e5
feat: Append job creation timestamp to cronjob annotations (#118137)
* Append job name to job annotations

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

* Update annotation description, remove timezone, and fix time

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

* Remove unused ctx

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

* code review comments

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

* code review comments

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

* Add timezone back

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

---------

Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
2023-07-06 14:39:04 -07:00
Kubernetes Prow Robot
6f9d1d38d8
Merge pull request #118817 from pohly/dra-delete-claims
DRA: improve handling of completed pods
2023-07-06 10:15:15 -07:00
Kubernetes Prow Robot
7e5506de8d
Merge pull request #119111 from kannon92/remove-equal-ready-job
remove equalReady and replace with k8 util function
2023-07-06 09:13:16 -07:00
Ziqi Zhao
dfc1838379 Migrated pkg/controller/volume|util|replicaset|nodeipam to contextual logging
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
2023-07-06 07:39:52 +08:00
xin.li
6c0387d004 Migrated pkg/controller/endpoint to contextual logging
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-07-06 07:36:51 +08:00
xin.li
3cf2822bc5 Migrated pkg/controller/garbagecollector to contextual logging
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-07-06 07:36:51 +08:00
Kubernetes Prow Robot
916c3466b9
Merge pull request #118940 from soltysh/drop_missedschedules
Hide numberOfMissedSchedules as an algorithm internal number
2023-07-05 16:27:02 -07:00
kannon92
921b7e6e8f remove equalReady and replace with k8 util function 2023-07-05 20:11:48 +00:00
carlory
322da7c1aa fix gc metrics 2023-07-06 02:06:03 +08:00
Kubernetes Prow Robot
91698fe900
Merge pull request #114061 from Octopusjust/k8s-pr15
testutil: use contextual logging
2023-07-05 08:38:57 -07:00
Maciej Szulik
1240a29af9
Hide numberOfMissedSchedules as an algorithm internal number 2023-07-05 16:58:28 +02:00
Patrick Ohly
a514f40131 dra resourceclaim controller: delete generated claims when pod is done
When a pod is done, but not getting removed yet for while, then a claim that
got generated for that pod can be deleted already. This then also triggers
deallocation.
2023-07-05 16:10:20 +02:00
Patrick Ohly
e8a0c42212 dra resourceclaim controller: remove reservation for completed pods
When a pod is known to never run (again), the reservation for it also can be
removed. This is relevant in particular for the job controller.
2023-07-05 16:10:20 +02:00
Patrick Ohly
7f5a02fc7e dra resourceclaim controller: enhance logging
Adding logging to event handlers makes it more obvious why (or why not) claims
and pods need to be processed.
2023-07-05 16:10:20 +02:00
Patrick Ohly
d1ba893ad8 dra resourceclaim controller: refactor isPodDone
This covers pods that get deleted before running and will be used more than
once soon.
2023-07-05 16:09:41 +02:00
Kubernetes Prow Robot
229dd79efd
Merge pull request #117865 from aleksandra-malinowska/parallel-sts-3
Parallel StatefulSet pod create & delete
2023-07-03 10:16:51 -07:00
Kubernetes Prow Robot
0a82bdbfdb
Merge pull request #118173 from huiwq1990/feat-autoscale-variable
hpa: cleanup `currentReplicas` code
2023-07-02 23:00:50 -07:00
Kubernetes Prow Robot
ec87834bae
Merge pull request #118936 from pohly/dra-deallocate-when-unused
DRA: for delayed allocation, deallocate when no longer used
2023-07-01 12:56:48 -07:00
Kubernetes Prow Robot
52b1247b28
Merge pull request #118232 from luckymrwang/style
style: correct the sentence
2023-06-30 01:51:59 -07:00
Kubernetes Prow Robot
9af93df9b0
Merge pull request #117845 from ctripcloud/fix-hpa-plain-calc
fix HPA plain metric calculate
2023-06-30 01:51:47 -07:00
Kubernetes Prow Robot
68b9ccc511
Merge pull request #117554 from yanggangtony/clean-endpoint-controller
clean endpoint controller typo logs
2023-06-29 16:23:44 -07:00
Patrick Ohly
1b47e6433b dra delayed allocation: deallocate when a pod is done
This releases the underlying resource sooner and ensures that another consumer
can get scheduled without being influenced by a decision that was made for the
previous consumer.

An alternative would have been to have the apiserver trigger the deallocation
whenever it sees the `status.reservedFor` getting reduced to zero. But that
then also triggers deallocation when kube-scheduler removes the last
reservation after a failed scheduling cycle. In that case we want to keep the
claim allocated and let the kube-scheduler decide on a case-by-case basis which
claim should get deallocated.
2023-06-29 09:47:30 +02:00
Aleksandra Malinowska
d616cf72a3 Add unit tests for parallel StatefulSet create & delete 2023-06-28 16:55:38 +02:00
Kubernetes Prow Robot
74bd77a9df
Merge pull request #118917 from kmala/daemonsetfix
increase the log level for the GetTargetNodeName error message
2023-06-28 00:08:32 -07:00
Kubernetes Prow Robot
960830bc66
Merge pull request #118102 from RomanBednar/retro-sc-assignment-ga
graduate RetroactiveDefaultStorageClass feature to GA in 1.28
2023-06-27 20:46:32 -07:00
yanggang
860aab842d
fix a reference to the wrong variable name
Signed-off-by: yanggang <gang.yang@daocloud.io>
2023-06-28 06:12:44 +08:00
Aldo Culquicondor
a4519665fe
Skip terminal Pods with a deletion timestamp from the Daemonset sync (#118716)
* Skip terminal Pods with a deletion timestamp from the Daemonset sync

Change-Id: I64a347a87c02ee2bd48be10e6fff380c8c81f742

* Review comments and fix integration test

Change-Id: I3eb5ec62bce8b4b150726a1e9b2b517c4e993713

* Include deleted terminal pods in history

Change-Id: I8b921157e6be1c809dd59f8035ec259ea4d96301
2023-06-27 08:56:33 -07:00
Maciej Szulik
af1c9e49c4
Update schedule logic to properly calculate missed schedules
Before this change we've assumed a constant time between schedule runs,
which is not true for cases like "30 6-16/4 * * 1-5".
The fix is to calculate the potential next run using the fixed schedule
as the baseline, and then go back one schedule back and allow the cron
library to calculate the correct time.

This approach saves us from iterating multiple times between last
schedule time and now, if the cronjob for any reason wasn't running for
significant amount of time.
2023-06-27 11:29:30 +02:00
Keerthan Reddy Mala
0033f65808 increase the log level for the GetTargetNodeName error message 2023-06-26 17:31:50 -07:00