6042 Commits

Author SHA1 Message Date
Viacheslav Panasovets
6adf60fdf4
Do not create endpoints if service of type ExternalName (#114814) 2023-01-18 03:12:34 -08:00
Kubernetes Prow Robot
46f3821bf4
Merge pull request #114586 from andrewsykim/apiserver-lease-rename
Rename apiserver identity lease labels to apiserver.kubernetes.io/identity
2023-01-17 21:36:34 -08:00
Kubernetes Prow Robot
5550064bc2
Merge pull request #115063 from kannon92/tracking-remove-comments
tracking with finalizers is the default way for the job controller so comments are not needed that say we are tracking with finalizers
2023-01-17 07:56:44 -08:00
Rahul Rangith
2f0eb543c7 New function for creating missing pvcs 2023-01-17 10:21:42 -05:00
Rahul Rangith
392cd5ce8c Make e2e test not rely on local volumes 2023-01-17 10:21:42 -05:00
Rahul Rangith
3cf636b22e PR feedback 2023-01-17 10:21:41 -05:00
Rahul Rangith
c1cc18ccd5 Automatically recreate pvc when sts pod is stuck in pending 2023-01-17 10:21:41 -05:00
Kubernetes Prow Robot
7b01daba71
Merge pull request #115074 from yangjunmyfm192085/deleteklogv0-controller
use klog instead of klog.V(0)--controller manager part
2023-01-16 09:58:50 -08:00
Kubernetes Prow Robot
ed8cad1e80
Merge pull request #115056 from mimowo/podgc-do-not-add-condition-for-terminated-pods
PodGC should not add DisruptionTarget condition for pods which are in terminal phase
2023-01-16 03:04:50 -08:00
JunYang
29086e2b04 use klog instead of klog.V(0) 2023-01-14 21:15:50 +08:00
Andrew Sy Kim
3da0f1809c apiserver: update lease label key to apiserver.kubernetes.io/identity
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2023-01-13 15:37:22 -05:00
Kubernetes Prow Robot
9af5ae0365
Merge pull request #115030 from kannon92/remove-pod-error-job-tracking
Update SyncJob with PodControllerError updates in job unit tests
2023-01-13 12:08:14 -08:00
Kubernetes Prow Robot
70217a4083
Merge pull request #114944 from mimowo/fix-active-deadline-test
Fix the job controller unit test for enforcing ActiveDeadlineSeconds
2023-01-13 10:46:26 -08:00
Michal Wozniak
3833c0c349 PodGC should not add DisruptionTarget condition for pods which are in terminal phase 2023-01-13 18:28:44 +01:00
kannon92
4890928b78 tracking with finalizers is the default way for the job controller 2023-01-13 16:48:35 +00:00
kannon92
3a838033f8 Update SyncJob with PodControllerError updates in job unit tests 2023-01-13 16:39:18 +00:00
Michal Wozniak
7065b42bb2 Fix the job controller unit test for enforcing ActiveDeadlineSeconds 2023-01-13 16:48:15 +01:00
Kubernetes Prow Robot
c0c386b9c9
Merge pull request #114516 from nikhita/job-backoff-fix
pkg/controller/job: re-honor exponential backoff delay
2023-01-13 07:36:40 -08:00
Kubernetes Prow Robot
1b8692ce46
Merge pull request #114296 from cbroglie/concurrent-monitor-node-health
controller/nodelifecycle: Make monitorNodeHealth process nodes concurrently
2023-01-12 12:42:54 -08:00
Nikhita Raghunath
fd8d92a29d pkg/controller/job: re-honor exponential backoff
This commit makes the job controller re-honor exponential backoff for
failed pods. Before this commit, the controller created pods without any
backoff. This is a regression because the controller previously
created pods with an exponential backoff delay (10s, 20s, 40s ...).

The issue occurs only when the JobTrackingWithFinalizers feature is
enabled (which is enabled by default right now). With this feature, we
get an extra pod update event when the finalizer of a failed pod is
removed.

Note that the pod failure detection and new pod creation happen in the
same reconcile loop so the 2nd pod is created immediately after the 1st
pod fails. The backoff is only applied on 2nd pod failure, which means
that the 3rd pod is created 10s after the 2nd pod, the 4th pod is
created 20s after the 3rd pod, and so on.

This commit fixes a few bugs:

1. Right now, each time `uncounted != nil` and the job does not see a
_new_ failure, `forget` is set to true and the job is removed from the
queue. This condition is therefore also triggered each time the
finalizer for a failed pod is removed, and `NumRequeues` is reset, which
results in a backoff of 0s.

2. Updates `updatePod` to only apply backoff when we see a particular
pod failed for the first time. This is necessary to ensure that the
controller does not apply backoff when it sees a pod update event
for finalizer removal of a failed pod.

3. If `JobsReadyPods` feature is enabled and backoff is 0s, the job is
now enqueued after `podUpdateBatchPeriod` seconds, instead of 0s. The
unit test for this check also had a few bugs:
    - `DefaultJobBackOff` is overwritten to 0 in certain unit tests,
    which meant that `DefaultJobBackOff` was considered to be 0,
    effectively not running any meaningful checks.
    - `JobsReadyPods` was not enabled for test cases that ran tests
    which required the feature gate to be enabled.
    - The check for expected and actual backoff had incorrect
    calculations.
2023-01-12 20:34:10 +05:30
Christopher Broglie
3c88de52c8 controller/nodelifecycle: Make monitorNodeHealth process nodes concurrently
Marking the pods not ready on a node requires looping over them and
updating each pod's status one at a time. This is performed serially,
and can take a while if we're processing each node serially as well.

Since the time is spent waiting on io, there's an opportunity to go
faster by processing multiple nodes concurrently. This change modifies
the loop to process nodes in parallel, using the same number of workers
as doNodeProcessingPassWorker.

This change also introduces histogram metrics to better observe
monitorNodeHealth.
2023-01-11 12:34:39 -08:00
kannon92
6dfaeff33c Remove Legacy Job Tracking 2023-01-10 14:52:54 +00:00
Kubernetes Prow Robot
e7549eae87
Merge pull request #114905 from kannon92/sync-job-test-fix
Fix SyncPastDeadlineJobFinished for enabling finalizer path
2023-01-09 12:47:28 -08:00
kannon92
0362c67859 Fix SyncPastDeadlineJobFinished for enabling finalizer path 2023-01-09 17:12:52 +00:00
Aldo Culquicondor
4c1b95ddfa
Ensure job is up to date in informer cache in test
The fake client doesn't guarantee that the informer cache is updated.
If it's not up-to-date, the controller always tries to set the
StartTime, leading to a broken test.

Change-Id: I71f26d46ea44beff88f0d03517985348654aec95
2023-01-09 10:53:19 -05:00
Kubernetes Prow Robot
901c1de5ea
Merge pull request #114870 from mattcary/mutation
Avoid mutation of PVC in stateful set controller shared cache
2023-01-05 23:16:09 -08:00
Matthew Cary
ed18ab54ba Avoid mutation of PVC in stateful set controller shared cache
Change-Id: Ieb8e443e460150d16524ca1c1fb3770f546b2c28
2023-01-05 18:09:05 -08:00
Kubernetes Prow Robot
492637878f
Merge pull request #111660 from pacoxu/key-encipherment-v1.26
Key encipherment usage v1.27
2023-01-04 15:51:57 -08:00
Michal Wozniak
c3d0e8ff05 Fix clearing rate limiter in disruption controller 2023-01-03 15:06:06 +01:00
Kushagra
80384bbb55 spelling mistake rectified 2022-12-29 17:55:17 +00:00
Kushagra
f380ef8b61 Misleading message when there are no metrics. 2022-12-29 10:57:43 +00:00
Paco Xu
160f015ef4 kubelet: add key encipherment usage only if it is rsa key
remove allowOmittingUsageKeyEncipherment as it is always true

Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2022-12-27 16:04:25 +08:00
Kubernetes Prow Robot
45f14a93f1
Merge pull request #113787 from gjkim42/update-daemonset-status-despite-error
Update daemonSet status even if syncDaemonSet fails
2022-12-22 15:49:25 -08:00
Kubernetes Prow Robot
d1c715a982
Merge pull request #113834 from atiratree/sts-handle-delete-pod-error
statefulset: handle API error on pod deletion
2022-12-22 08:17:26 -08:00
Harsha Narayana
208c3868cf
job controller: refactored job controller to be able to inject FakeClock for Unit Test 2022-12-20 21:29:24 +05:30
Jordan Liggitt
78cb3862f1
Fix indentation/spacing in comments to render correctly in godoc 2022-12-17 23:27:38 -05:00
Kubernetes Prow Robot
7f7bf68c7c
Merge pull request #111178 from lucming/cleanup
clean up code
2022-12-16 19:17:52 -08:00
Kubernetes Prow Robot
9edd4d86c8
Merge pull request #114522 from zshihang/master
lock LegacyServiceAccountTokenNoAutoGeneration
2022-12-16 14:24:32 -08:00
Kubernetes Prow Robot
f11e9aaf10
Merge pull request #113929 from howardjohn/endpointslice/use-optimized-set
endpoints: remove obsolete ServiceSelectorCache
2022-12-16 14:24:09 -08:00
Shihang Zhang
4fd09a06d6 lock LegacyServiceAccountTokenNoAutoGeneration 2022-12-16 10:45:35 -08:00
Daniel Smith
8100efc7b3 Enable propagation of HasSynced
* Add tracker types and tests
* Modify ResourceEventHandler interface's OnAdd member
* Add additional ResourceEventHandlerDetailedFuncs struct
* Fix SharedInformer to let users track HasSynced for their handlers
* Fix in-tree controllers which weren't computing HasSynced correctly
* Deprecate the cache.Pop function
2022-12-14 18:43:33 +00:00
Kubernetes Prow Robot
741bd5c382
Merge pull request #113947 from mowangdk/chore/change_adcontroller_log_level
Lower volume attached touch log level
2022-12-12 17:41:51 -08:00
John Howard
d9f2cc0c95 endpoints: remove obsolete ServiceSelectorCache
Since https://github.com/kubernetes/kubernetes/pull/112648, we can
efficiently handle selectors from pre-existing `map[string]string`,
making the cache obsolete.

Benchmark:

```
name                         old time/op    new time/op    delta
GetPodServiceMemberships-48     189µs ± 1%     193µs ± 1%  +2.10%  (p=0.000 n=10+10)

name                         old alloc/op   new alloc/op   delta
GetPodServiceMemberships-48    59.0kB ± 0%    58.9kB ± 0%  -0.09%  (p=0.000 n=9+9)

name                         old allocs/op  new allocs/op  delta
GetPodServiceMemberships-48     1.02k ± 0%     1.02k ± 0%    ~     (all equal)
```
2022-12-12 08:00:48 -08:00
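To show what matching against a selector held as a plain `map[string]string` looks like, here is a minimal sketch. `selectorMatches` is a hypothetical function, not the endpoints controller's code, and treating an empty selector as matching nothing is an assumption of this sketch.

```go
package main

import "fmt"

// selectorMatches reports whether every key/value pair in selector is
// present in podLabels. An empty selector matches nothing here, which
// is an assumption of this sketch.
func selectorMatches(selector, podLabels map[string]string) bool {
	if len(selector) == 0 {
		return false
	}
	for k, v := range selector {
		if got, ok := podLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	pod := map[string]string{"app": "web", "tier": "frontend"}
	fmt.Println(selectorMatches(map[string]string{"app": "web"}, pod))
	fmt.Println(selectorMatches(map[string]string{"app": "db"}, pod))
}
```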
Kubernetes Prow Robot
2118bc8aec
Merge pull request #114155 from aojea/mirroring_repack
endpointslicemirroring handle endpoints with multiple subsets
2022-12-10 07:53:42 -08:00
Kubernetes Prow Robot
9303ea836f
Merge pull request #114076 from akhilles/remove-unused-var
Remove unused `numExistingEndpoints` variable
2022-12-10 06:04:25 -08:00
Kubernetes Prow Robot
92ffe94592
Merge pull request #114033 from Octopusjust/k8s-pr14
pkg/controller/deployment/util/deployment_util.go:Improving test cove…
2022-12-10 06:03:55 -08:00
Antonio Ojea
ef6d9edea5 endpointslicemirroring handle endpoints with multiple subsets
Endpoints generated by the endpoints controller are in canonical
form; custom endpoints, however, may not be in canonical form
(there was a time they were canonicalized in the apiserver, but this
caused performance issues because the endpoints controller kept
updating them, since the created endpoints differed from the
stored ones due to the canonicalization).

There are cases where a custom endpoint may generate multiple slices
due to the controller, for example, when the same address is present
in different subsets.

The endpointslice mirroring controller should canonicalize the
endpoints subsets before processing them, so that the generated
slices are consistent. There is no risk of hotlooping because
the endpoint is only used as input.

Change-Id: I2a8cd53c658a640aea559a88ce33e857fa98cc5c
2022-12-10 11:44:10 +00:00
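The idea of canonicalizing subsets can be sketched on a deliberately simplified model. The `Subset` type below, with addresses as strings and ports as integers, and the `canonicalize` function are hypothetical; the real Endpoints types carry more fields, and the controller's own repacking logic may differ in detail.

```go
package main

import (
	"fmt"
	"sort"
)

// Subset is a simplified stand-in for an Endpoints subset.
type Subset struct {
	Addresses []string
	Ports     []int
}

// canonicalize regroups subsets so that each address appears once,
// alongside every other address that exposes the same set of ports.
func canonicalize(subsets []Subset) []Subset {
	// Collect the full set of ports seen for each address.
	portsByAddr := map[string]map[int]bool{}
	for _, s := range subsets {
		for _, a := range s.Addresses {
			if portsByAddr[a] == nil {
				portsByAddr[a] = map[int]bool{}
			}
			for _, p := range s.Ports {
				portsByAddr[a][p] = true
			}
		}
	}
	// Group addresses by their sorted port list.
	type group struct {
		ports []int
		addrs []string
	}
	groups := map[string]*group{}
	for a, set := range portsByAddr {
		ports := make([]int, 0, len(set))
		for p := range set {
			ports = append(ports, p)
		}
		sort.Ints(ports)
		key := fmt.Sprint(ports)
		g, ok := groups[key]
		if !ok {
			g = &group{ports: ports}
			groups[key] = g
		}
		g.addrs = append(g.addrs, a)
	}
	// Emit groups in a deterministic order.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]Subset, 0, len(keys))
	for _, k := range keys {
		g := groups[k]
		sort.Strings(g.addrs)
		out = append(out, Subset{Addresses: g.addrs, Ports: g.ports})
	}
	return out
}

func main() {
	in := []Subset{
		{Addresses: []string{"10.0.0.1"}, Ports: []int{80}},
		{Addresses: []string{"10.0.0.1"}, Ports: []int{443}},
	}
	fmt.Println(canonicalize(in))
}
```

In this model, the same address listed in two subsets collapses into a single subset carrying both ports, which is the situation the commit message describes.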
Kubernetes Prow Robot
0cd13e573c
Merge pull request #113196 from mimowo/job-controller-reviewer
Self-nominate mimowo as a reviewer for pkg/controller/job & test/integration/job packages
2022-12-10 02:01:39 -08:00
Gunju Kim
69fcde750a
Update daemonSet status even if syncDaemonSet fails
This ensures that the daemonset controller updates daemonset statuses in
a best-effort manner even if syncDaemonSet fails.

In order to add an integration test, this also replaces
`cmd/kube-apiserver/app/testing.StartTestServer` with
`test/integration/framework.StartTestServer` and adds
`setupWithServerSetup` to configure the admission control of the
apiserver.
2022-12-10 11:45:56 +09:00
Kubernetes Prow Robot
63a01a5465
Merge pull request #112260 from aryan9600/cidr-metrics
Add metric for max no. of CIDRs available
2022-12-09 15:42:59 -08:00
Kubernetes Prow Robot
da3d98277b
Merge pull request #111839 from ialidzhikov/cleanup/pkg-controller
pkg/controller: Replace deprecated func usage from the `k8s.io/utils/pointer` pkg
2022-12-09 15:42:37 -08:00
Kubernetes Prow Robot
4557c694ef
Merge pull request #111683 from lucming/code-cleanup5
reorganize some logic of controller_utils.go
2022-12-09 15:42:21 -08:00
Kubernetes Prow Robot
5fe12aae11
Merge pull request #111207 from lucming/code-cleanup2
Reduce indentation in daemonset controller code
2022-12-09 15:41:41 -08:00
Sanskar Jaiswal
b501d6036a add metric for max no. of CIDRs that can be allocated from MultiCIDRSet
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
2022-12-05 15:18:45 +00:00
Sanskar Jaiswal
37f4d4624b add metric for max no. of CIDRs that can be allocated from CidrSet
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
2022-12-05 15:18:45 +00:00
ialidzhikov
aede3fbf40 pkg/controller: Replace deprecated func usage from the k8s.io/utils/pointer pkg 2022-11-23 17:40:23 +02:00
songxiao-wang87
9ae5af4b6a StorageVersionGC logger
Signed-off-by: songxiao-wang87 <wang.xiaosong23@zte.com.cn>
2022-11-23 03:20:12 +00:00
Akhil Velagapudi
70d31ea917 Remove unused numExistingEndpoints variable 2022-11-23 00:54:50 +00:00
ZhangYu
d61849e800 pkg/controller/deployment/util/deployment_util.go:Improving test coverage 2022-11-22 10:12:58 +08:00
mowangdk
bf244d3046 Lower volume attached touch log level 2022-11-16 16:49:07 +08:00
Michelle Au
524a8b32a6 add sig-storage reviewers, remove inactive sig-storage reviewers, remove redundant owners files 2022-11-15 23:51:57 +00:00
Aldo Culquicondor
7dc36bdf82
Wait for Pods to finish before considering Failed in Job (#113860)
* Wait for Pods to finish before considering Failed

Limit behavior to feature gates PodDisruptionConditions and
JobPodFailurePolicy and jobs with a podFailurePolicy.

Change-Id: I926391cc2521b389c8e52962afb0d4a6a845ab8f

* Remove check for unscheduled terminating pod

Change-Id: I3dc05bb4ea3738604f01bf8cb5fc8cc0f6ea54ec
2022-11-15 09:44:53 -08:00
Kubernetes Prow Robot
84a55ad8d2
Merge pull request #113147 from andrewsykim/storageversiongc-controller-tests
add unit tests for storageversiongc controller
2022-11-14 10:56:41 -08:00
Michal Wozniak
a910ca563b Fix race conditions 2022-11-14 10:11:26 +01:00
Michal Wozniak
3b5c3acd61 Improve stability of the taint_manager tests 2022-11-13 19:40:18 +01:00
Kubernetes Prow Robot
d1c0171aed
Merge pull request #111023 from pohly/dynamic-resource-allocation
dynamic resource allocation
2022-11-11 16:21:56 -08:00
Aldo Culquicondor
bc5afaf580
Fix match onExitCodes when Pod is not terminated
Change-Id: Id1f9c46f8b6a12115577a1fadb12adc580c9ba6a
2022-11-11 10:05:11 -05:00
Kubernetes Prow Robot
d7bff1c809
Merge pull request #111577 from brianpursley/troubleshoot-unit-test-flake
Add logging for reconciler unit test
2022-11-11 00:44:09 -08:00
Filip Křepinský
ec0b200f3d statefulset: handle API error on pod deletion
when new revision is being rolled out
2022-11-10 22:47:23 +01:00
Patrick Ohly
0133df3929 kube-controller-manager: add ResourceClaim controller
The controller uses the exact same logic as the generic ephemeral inline volume
controller, just for inline ResourceClaimTemplate -> ResourceClaim.

In addition, it supports removal of pods from the ReservedFor field when those
pods are known to not need the claim anymore. At the moment, only this special
case is supported. Removal of arbitrary objects would imply granting full read
access to all types to determine whether a) an object is gone and b) if the
current incarnation is the one which is listed in ReservedFor. This may get
added later.
2022-11-10 20:23:50 +01:00
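The special case described above, dropping pods from ReservedFor once they no longer need the claim, can be sketched on a simplified model. The `Consumer` type and `pruneReservedFor` function are hypothetical; keying on the UID reflects the commit's point that the entry must refer to the current incarnation of the object.

```go
package main

import "fmt"

// Consumer is a simplified stand-in for an entry in ReservedFor.
type Consumer struct {
	Resource string
	Name     string
	UID      string
}

// pruneReservedFor drops pod entries whose UID is no longer live.
// Entries for other resource types are kept untouched, mirroring the
// "only pods are supported" limitation described in the commit.
func pruneReservedFor(reserved []Consumer, livePodUIDs map[string]bool) []Consumer {
	kept := make([]Consumer, 0, len(reserved))
	for _, c := range reserved {
		if c.Resource == "pods" && !livePodUIDs[c.UID] {
			continue
		}
		kept = append(kept, c)
	}
	return kept
}

func main() {
	reserved := []Consumer{
		{Resource: "pods", Name: "a", UID: "uid-1"},
		{Resource: "pods", Name: "b", UID: "uid-2"},
		{Resource: "widgets", Name: "w", UID: "uid-3"},
	}
	live := map[string]bool{"uid-1": true}
	fmt.Println(pruneReservedFor(reserved, live))
}
```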
Patrick Ohly
b87530af4f kube-controller-manager: clone resource controller from volume/ephemeral 2022-11-10 20:23:50 +01:00
Andrew Sy Kim
dba7740115 pkg/controller/storageversiongc: add constructor function newKubeApiserverLease
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2022-11-09 15:52:47 -05:00
Kubernetes Prow Robot
ff19efdf9b
Merge pull request #112744 from pwschuurman/statefulset-slice-impl
Add implementation of KEP-3335, StatefulSetSlice
2022-11-09 11:12:28 -08:00
wangxiaojian
02db35ab1c
Optimize case conditions 2022-11-10 00:49:20 +08:00
Andrew Sy Kim
1320adc83f pkg/controller/storageversiongc: add comments for Test_StorageVersionUpdatedWithAllEncodingVersionsEqualOnLeaseDeletion, Test_StorageVersionUpdatedWithDifferentEncodingVersionsOnLeaseDeletion, Test_StorageVersionContainsInvalidLeaseID, and Test_StorageVersionDeletedOnLeaseDeletion
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2022-11-09 11:49:07 -05:00
Andrew Sy Kim
2fb8329eee pkg/controller/storageversiongc: add unit tests for storageversiongc controller
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2022-11-09 11:48:20 -05:00
Michal Wozniak
c803892bd8 Enable the feature into beta 2022-11-09 09:02:40 +01:00
Kubernetes Prow Robot
694698ca38
Merge pull request #110485 from Octopusjust/k8s-pr
cidr_set.go: fix several typos
2022-11-08 13:51:00 -08:00
Peter Schuurman
9258cb4041 Fix typo in function emptyInvariants() 2022-11-08 07:48:10 -08:00
Peter Schuurman
366997951b Update doc comments and change name of feature gate 2022-11-08 07:48:10 -08:00
Peter Schuurman
8a9c126eca Small updates and comment fixes 2022-11-08 07:48:09 -08:00
Peter Schuurman
7b3d77a41a Adding implementation of KEP-3335, StatefulSetSlice 2022-11-08 07:48:00 -08:00
Maciej Szulik
3c93d540c6
Revert "Update daemonSet status even if syncDaemonSet fails"
This reverts commit 2ee024a4df.
2022-11-08 15:01:09 +01:00
Kubernetes Prow Robot
aef9a37df9
Merge pull request #113010 from soltysh/promote_job_metrics
Promote job metrics
2022-11-08 03:16:32 -08:00
Kubernetes Prow Robot
3451501c2e
Merge pull request #112737 from gjkim42/cleanup-defer-from-sts
StatefulSet: Cleanup the complex defer function updating the status
2022-11-08 03:16:21 -08:00
Kubernetes Prow Robot
0e530f44af
Merge pull request #113544 from LiorLieberman/topology-hints-events
Added: publishing events for topologyAwareHints changes
2022-11-07 16:01:15 -08:00
Kubernetes Prow Robot
47952e0917
Merge pull request #112360 from mimowo/handling-pod-failures-beta-kubelet
Add pod disruption conditions for kubelet-initiated failures
2022-11-07 16:00:40 -08:00
Kubernetes Prow Robot
b4f42864f5
Merge pull request #112127 from gjkim42/update-status-despite-error
Update daemonSet status even if syncDaemonSet fails
2022-11-07 16:00:28 -08:00
Gunju Kim
6559050ee1
StatefulSet: Cleanup the complex defer function updating the status
In the long term, the complex defer function makes the code harder to
maintain as code after it should take that into account. This removes
the complex defer function updating the status of a statefulset.
2022-11-08 08:39:42 +09:00
Kubernetes Prow Robot
1c230d519e
Merge pull request #113262 from jsafrane/rework-reconstruction
Rework volume reconstruction
2022-11-07 12:42:29 -08:00
Lior Lieberman
4faede03fa Added events publishing for topologyHints changes 2022-11-07 19:45:40 +00:00
Maciej Szulik
39d9981dc2
Promote job-related metrics to stable 2022-11-07 19:28:40 +01:00
Kubernetes Prow Robot
ac95e5b701
Merge pull request #113510 from alculquicondor/finalizers-stable
Graduate JobTrackingWithFinalizers to stable
2022-11-07 08:06:41 -08:00
Michal Wozniak
52cd6755eb Add pod disruption conditions for kubelet initiated failures 2022-11-07 11:23:22 +01:00
Kubernetes Prow Robot
c519bc02e8
Merge pull request #112011 from pbeschetnov/ambiguous-selectors
Add ambiguous selector check to HPA
2022-11-06 21:08:16 -08:00
Aldo Culquicondor
4948918155
Graduate JobTrackingWithFinalizers to stable
Change-Id: Ifc749a85b1270c0155ac511b91d4681d53236820
2022-11-04 17:05:53 -04:00
Kubernetes Prow Robot
b20ddbd75a
Merge pull request #113351 from andrewsykim/endpointslice-terminating-ga
Promote EndpointSliceTerminatingCondition to GA
2022-11-04 09:36:39 -07:00
Kubernetes Prow Robot
ead17f3dc8
Merge pull request #113008 from soltysh/promote_cronjob_metrics
Promote cronjob_job_creation_skew metric to stable
2022-11-04 09:36:27 -07:00
Kubernetes Prow Robot
20ffe3bbf9
Merge pull request #111607 from tnqn/reduce-redundant-index
Remove duplicate and unused index from PodIndexer
2022-11-04 09:36:16 -07:00
Maciej Szulik
4af97e599a
Promote cronjob_job_creation_skew metric to stable 2022-11-04 13:55:32 +01:00