Commit Graph

5992 Commits

Author SHA1 Message Date
Sarvesh Rangnekar
a8f120b76c Fix the delete flow for ClusterCIDR objects
Fixes the deletion of ClusterCIDR object, when a Node is associated(has
Pod CIDRs allocated from this ClusterCIDR) with it. Currently the
ClusterCIDR finalizer is never cleaned up as there is no reconciliation
happening after the associated Node has been deleted. This commit fixes
the issue by adding workitems from all events to a worker queue and
reconcile until the delete is successful.
2023-01-30 19:35:41 +00:00
Kubernetes Prow Robot
ad2a9f2f33
Merge pull request #113863 from msau42/owners
update sig-storage owners
2023-01-30 10:10:50 -08:00
Patrick Ohly
bc6c7fa912 logging: fix names of keys
The stricter checking with the upcoming logcheck v0.4.1 pointed out these names
which don't comply with our recommendations in
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#name-arguments.
2023-01-23 14:24:29 +01:00
Kubernetes Prow Robot
c63434aaff
Merge pull request #110838 from soltysh/cronjob_improvements
CronJob controller cleanups
2023-01-18 09:44:34 -08:00
Maciej Szulik
be44d67566
Re-use common parts between getNextScheduleTime and nextScheduledTimeDuration
The two methods nextScheduledTimeDuration and getNextScheduleTime have a
lot of similarities, so this commit squashes the common parts together
along with getMostRecentScheduleTime to avoid code duplication.
2023-01-18 16:52:45 +01:00
Maciej Szulik
cb491a8d0f
Cleanups in controller utils
1. Squash two identical sorters byTime
2. Move helper for searching active jobs into utils to exist next to its
  counterpart
2023-01-18 13:40:23 +01:00
Viacheslav Panasovets
6adf60fdf4
Do not create endpoints if service of type ExternalName (#114814) 2023-01-18 03:12:34 -08:00
Kubernetes Prow Robot
46f3821bf4
Merge pull request #114586 from andrewsykim/apiserver-lease-rename
Rename apiserver identity lease labels to apiserver.kubernetes.io/identity
2023-01-17 21:36:34 -08:00
Kubernetes Prow Robot
5550064bc2
Merge pull request #115063 from kannon92/tracking-remove-comments
tracking with finalizers is the default way for the job controller so comments are not needed that say we are tracking with finalizers
2023-01-17 07:56:44 -08:00
Kubernetes Prow Robot
7b01daba71
Merge pull request #115074 from yangjunmyfm192085/deleteklogv0-controller
use klog instead of klog.V(0)--controller manager part
2023-01-16 09:58:50 -08:00
Kubernetes Prow Robot
ed8cad1e80
Merge pull request #115056 from mimowo/podgc-do-not-add-condition-for-terminated-pods
PodGC should not add DisruptionTarget condition for pods which are in terminal phase
2023-01-16 03:04:50 -08:00
JunYang
29086e2b04 use klog instead of klog.V(0) 2023-01-14 21:15:50 +08:00
Andrew Sy Kim
3da0f1809c apiserver: update lease label key to apiserver.kubernetes.io/identity
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2023-01-13 15:37:22 -05:00
Kubernetes Prow Robot
9af5ae0365
Merge pull request #115030 from kannon92/remove-pod-error-job-tracking
Update SyncJob with PodControllerError updates in job unit tests
2023-01-13 12:08:14 -08:00
Kubernetes Prow Robot
70217a4083
Merge pull request #114944 from mimowo/fix-active-deadline-test
Fix the job controller unit test for enforcing ActiveDeadlineSeconds
2023-01-13 10:46:26 -08:00
Michal Wozniak
3833c0c349 PodGC should not add DisruptionTarget condition for pods which are in terminal phase 2023-01-13 18:28:44 +01:00
kannon92
4890928b78 tracking with finalizers is the default way for the job controller 2023-01-13 16:48:35 +00:00
kannon92
3a838033f8 Update SyncJob with PodControllerError updates in job unit tests 2023-01-13 16:39:18 +00:00
Michal Wozniak
7065b42bb2 Fix the job controller unit test for enforcing ActiveDeadlineSeconds 2023-01-13 16:48:15 +01:00
Kubernetes Prow Robot
c0c386b9c9
Merge pull request #114516 from nikhita/job-backoff-fix
pkg/controller/job: re-honor exponential backoff delay
2023-01-13 07:36:40 -08:00
Kubernetes Prow Robot
1b8692ce46
Merge pull request #114296 from cbroglie/concurrent-monitor-node-health
controller/nodelifecycle: Make monitorNodeHealth process nodes concurrently
2023-01-12 12:42:54 -08:00
Nikhita Raghunath
fd8d92a29d pkg/controller/job: re-honor exponential backoff
This commit makes the job controller re-honor exponential backoff for
failed pods. Before this commit, the controller created pods without any
backoff. This is a regression because the controller used to
create pods with an exponential backoff delay before (10s, 20s, 40s ...).

The issue occurs only when the JobTrackingWithFinalizers feature is
enabled (which is enabled by default right now). With this feature, we
get an extra pod update event when the finalizer of a failed pod is
removed.

Note that the pod failure detection and new pod creation happen in the
same reconcile loop so the 2nd pod is created immediately after the 1st
pod fails. The backoff is only applied on 2nd pod failure, which means
that the 3rd pod created 10s after the 2nd pod, 4th pod is created 20s
after the 3rd pod and so on.

This commit fixes a few bugs:

1. Right now, each time `uncounted != nil` and the job does not see a
_new_ failure, `forget` is set to true and the job is removed from the
queue. Which means that this condition is also triggered each time the
finalizer for a failed pod is removed and `NumRequeues` is reset, which
results in a backoff of 0s.

2. Updates `updatePod` to only apply backoff when we see a particular
pod failed for the first time. This is necessary to ensure that the
controller does not apply backoff when it sees a pod update event
for finalizer removal of a failed pod.

3. If `JobsReadyPods` feature is enabled and backoff is 0s, the job is
now enqueued after `podUpdateBatchPeriod` seconds, instead of 0s. The
unit test for this check also had a few bugs:
    - `DefaultJobBackOff` is overwritten to 0 in certain unit tests,
    which meant that `DefaultJobBackOff` was considered to be 0,
    effectively not running any meaningful checks.
    - `JobsReadyPods` was not enabled for test cases that ran tests
    which required the feature gate to be enabled.
    - The check for expected and actual backoff had incorrect
    calculations.
2023-01-12 20:34:10 +05:30
Christopher Broglie
3c88de52c8 controller/nodelifecycle: Make monitorNodeHealth process nodes concurrently
Marking the pods not ready on a node requires looping over them and
updating each pod's status one at a time. This is performed serially,
and can take a while if we're processing each node serially as well.

Since the time is spent waiting on io, there's an opportunity to go
faster by processing multiple nodes concurrently. This change modifies
the loop to process nodes in parallel, using the same number of workers
as doNodeProcessingPassWorker.

This change also introduces histogram metrics to better observe
monitorNodeHealth.
2023-01-11 12:34:39 -08:00
kannon92
6dfaeff33c Remove Legacy Job Tracking 2023-01-10 14:52:54 +00:00
Kubernetes Prow Robot
e7549eae87
Merge pull request #114905 from kannon92/sync-job-test-fix
Fix SyncPastDeadlineJobFinished for enabling finalizer path
2023-01-09 12:47:28 -08:00
kannon92
0362c67859 Fix SyncPastDeadlineJobFinished for enabling finalizer path 2023-01-09 17:12:52 +00:00
Aldo Culquicondor
4c1b95ddfa
Ensure job is up to date in informer cache in test
The fake client doesn't guarantee that the informer cache is updated.
If it's not up-to-date, the controller always tries to set the
StartTime, leading to a broken test.

Change-Id: I71f26d46ea44beff88f0d03517985348654aec95
2023-01-09 10:53:19 -05:00
Kubernetes Prow Robot
901c1de5ea
Merge pull request #114870 from mattcary/mutation
Avoid mutation of PVC in stateful set controller shared cache
2023-01-05 23:16:09 -08:00
Matthew Cary
ed18ab54ba Avoid mutation of PVC in stateful set controller shared cache
Change-Id: Ieb8e443e460150d16524ca1c1fb3770f546b2c28
2023-01-05 18:09:05 -08:00
Kubernetes Prow Robot
492637878f
Merge pull request #111660 from pacoxu/key-encipherment-v1.26
Key encipherment usage  v1.27
2023-01-04 15:51:57 -08:00
Michal Wozniak
c3d0e8ff05 Fix clearing rate limiter in disruption controller 2023-01-03 15:06:06 +01:00
Kushagra
80384bbb55 spelling mistake rectified 2022-12-29 17:55:17 +00:00
Kushagra
f380ef8b61 Misleading message when there are no metrics. 2022-12-29 10:57:43 +00:00
Paco Xu
160f015ef4 kubelet: add key encipherment usage only if it is rsa key
remove allowOmittingUsageKeyEncipherment as it is always true

Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2022-12-27 16:04:25 +08:00
Kubernetes Prow Robot
45f14a93f1
Merge pull request #113787 from gjkim42/update-daemonset-status-despite-error
Update daemonSet status even if syncDaemonSet fails
2022-12-22 15:49:25 -08:00
Kubernetes Prow Robot
d1c715a982
Merge pull request #113834 from atiratree/sts-handle-delete-pod-error
statefulset: handle API error on pod deletion
2022-12-22 08:17:26 -08:00
Harsha Narayana
208c3868cf
job controller: refactored job controller to be able to inject FakeClock for Unit Test 2022-12-20 21:29:24 +05:30
Jordan Liggitt
78cb3862f1
Fix indentation/spacing in comments to render correctly in godoc 2022-12-17 23:27:38 -05:00
Kubernetes Prow Robot
7f7bf68c7c
Merge pull request #111178 from lucming/cleanup
clean up code
2022-12-16 19:17:52 -08:00
Kubernetes Prow Robot
9edd4d86c8
Merge pull request #114522 from zshihang/master
lock LegacyServiceAccountTokenNoAutoGeneration
2022-12-16 14:24:32 -08:00
Kubernetes Prow Robot
f11e9aaf10
Merge pull request #113929 from howardjohn/endpointslice/use-optimized-set
endpoints: remove obsolete ServiceSelectorCache
2022-12-16 14:24:09 -08:00
Shihang Zhang
4fd09a06d6 lock LegacyServiceAccountTokenNoAutoGeneration 2022-12-16 10:45:35 -08:00
Daniel Smith
8100efc7b3 Enable propagration of HasSynced
* Add tracker types and tests
* Modify ResourceEventHandler interface's OnAdd member
* Add additional ResourceEventHandlerDetailedFuncs struct
* Fix SharedInformer to let users track HasSynced for their handlers
* Fix in-tree controllers which weren't computing HasSynced correctly
* Deprecate the cache.Pop function
2022-12-14 18:43:33 +00:00
Kubernetes Prow Robot
741bd5c382
Merge pull request #113947 from mowangdk/chore/change_adcontroller_log_level
Lower volume attached touch log level
2022-12-12 17:41:51 -08:00
John Howard
d9f2cc0c95 endpoints: remove obsolete ServiceSelectorCache
Since https://github.com/kubernetes/kubernetes/pull/112648, we can
efficiently handle selectors from pre-existing `map[string]string`,
making the cache obsolete.

Benchmark:

```
name                         old time/op    new time/op    delta
GetPodServiceMemberships-48     189µs ± 1%     193µs ± 1%  +2.10%  (p=0.000 n=10+10)

name                         old alloc/op   new alloc/op   delta
GetPodServiceMemberships-48    59.0kB ± 0%    58.9kB ± 0%  -0.09%  (p=0.000 n=9+9)

name                         old allocs/op  new allocs/op  delta
GetPodServiceMemberships-48     1.02k ± 0%     1.02k ± 0%    ~     (all equal)
```
2022-12-12 08:00:48 -08:00
Kubernetes Prow Robot
2118bc8aec
Merge pull request #114155 from aojea/mirroring_repack
endpointslicemirroring handle endpoints with multiple subsets
2022-12-10 07:53:42 -08:00
Kubernetes Prow Robot
9303ea836f
Merge pull request #114076 from akhilles/remove-unused-var
Remove unused `numExistingEndpoints` variable
2022-12-10 06:04:25 -08:00
Kubernetes Prow Robot
92ffe94592
Merge pull request #114033 from Octopusjust/k8s-pr14
pkg/controller/deployment/util/deployment_util.go:Improving test cove…
2022-12-10 06:03:55 -08:00
Antonio Ojea
ef6d9edea5 endpointslicemirroring handle endpoints with multiple subsets
Endpoints generated by the endpoints controller are in the canonical
form, however, custom endpoints can not be in canonical format
(there was a time they were canonicalized in the apiserver, but this
caused performance issues because the endpoint controller kept
updating them since the created endpoint were different than the
stored one due to the canonicalization)

There are cases where a custom endpoint may generate multiple slices
due to the controller, per example, when the same address is present
in different subsets.

The endpointslice mirroring controller should canonicalize the
endpoints subsets before start processing them to be consistent
on the slices generated, there is no risk of hotlooping because
the endpoint is only used as input.

Change-Id: I2a8cd53c658a640aea559a88ce33e857fa98cc5c
2022-12-10 11:44:10 +00:00
Kubernetes Prow Robot
0cd13e573c
Merge pull request #113196 from mimowo/job-controller-reviewer
Self-nominate mimowo as a reviewer for pkg/controller/job & test/integration/job packages
2022-12-10 02:01:39 -08:00