Commit Graph

6018 Commits

Author SHA1 Message Date
Filip Křepinský
747ffe785d improve message, log level and testing for unmanaged pods in disruption controller
- set higher severity and log level when unmanaged pods are found and improve testing
- do not mention unsupported controller when triggering event for
  unmanaged pods (this is covered by the CalculateExpectedPodCountFailed
  event)
- test unsupported controller
- make testing for events non-blocking when an event is not found
2023-03-03 23:03:06 +01:00
Kubernetes Prow Robot
6fd488a4e6
Merge pull request #115861 from JayKayy/inform-unsupported-pdb
Add a warning event when pdb has found an unmanaged pod
2023-03-03 03:16:58 -08:00
Kubernetes Prow Robot
3835c7aecd
Merge pull request #115882 from binacs/binacs/controller-use-issuperset
cleanup(controller): use IsSuperset to avoid interim slice
2023-03-02 17:00:57 -08:00
John Kwiatkoski
1f42ebc013 Add a warning event when pdb has found an unmanaged pod 2023-03-01 20:14:10 -05:00
weizhichen
4d6be42c1a add unit test 2023-03-01 06:48:37 +00:00
weizhichen
d06c0995cb fix 116028 2023-02-27 12:49:44 +00:00
Kubernetes Prow Robot
a34f8423a7
Merge pull request #115907 from qinqon/svc-same-address-different-pod
svc: Support pods with same address
2023-02-24 19:00:05 -08:00
Enrique Llorente
697ea476e2 svc: Support pods with same address
If different pods with the same address are exposed by the same service,
some of the EndpointSlice endpoints are overwritten. This change adds the
pod name to the hash function to ensure that all the endpoints are in
place.

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
2023-02-23 11:37:57 +01:00
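A minimal Go sketch of why the pod name has to be part of the hash. `endpointKey` is a hypothetical helper, not the EndpointSlice controller's real hash function: keyed on the address alone, two pods sharing 10.0.0.1 would collapse into one entry.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// endpointKey is a hypothetical helper: it hashes the address together with
// the pod name, so pods that share an address still get distinct keys.
func endpointKey(address, podName string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(address))
	h.Write([]byte{0}) // separator so ("a", "bc") and ("ab", "c") differ
	h.Write([]byte(podName))
	return h.Sum64()
}

func main() {
	type ep struct{ addr, pod string }
	eps := []ep{{"10.0.0.1", "pod-a"}, {"10.0.0.1", "pod-b"}}

	byAddr := map[string]string{} // address-only key: second pod overwrites first
	byKey := map[uint64]string{}  // address+pod key: both endpoints survive
	for _, e := range eps {
		byAddr[e.addr] = e.pod
		byKey[endpointKey(e.addr, e.pod)] = e.pod
	}
	fmt.Println(len(byAddr), len(byKey)) // 1 2
}
```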
Daniel Vega-Myhre
d41302312e update validation logic so completions is mutable iff completions is modified in tandem with parallelism so completions == parallelism 2023-02-23 03:25:16 +00:00
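The validation rule in this commit message can be sketched as a small Go check. `validateCompletionsUpdate` is hypothetical and is not the real Job validation code; it only encodes "completions may change only when the new completions equals the new parallelism".

```go
package main

import (
	"errors"
	"fmt"
)

// validateCompletionsUpdate is a hypothetical sketch of the rule: an update
// that leaves completions unchanged is always fine; an update that changes it
// is accepted only if completions == parallelism afterwards.
func validateCompletionsUpdate(oldCompletions, newCompletions, newParallelism int32) error {
	if oldCompletions == newCompletions {
		return nil // completions untouched: nothing to check
	}
	if newCompletions != newParallelism {
		return errors.New("completions can only change together with parallelism so that completions == parallelism")
	}
	return nil
}

func main() {
	fmt.Println(validateCompletionsUpdate(4, 6, 6))        // <nil>
	fmt.Println(validateCompletionsUpdate(4, 6, 4) != nil) // true
}
```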
kannon92
32ac4a9581 left over uncounted from tracking cleanup 2023-02-22 16:45:53 +00:00
binacs
84ff621309 cleanup(controller): use IsSuperset to avoid interim slice 2023-02-19 21:49:58 +08:00
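A minimal Go sketch of the difference this cleanup targets. The `set` type here is a hand-rolled stand-in (the commit relies on an existing `IsSuperset` method, which this sketch does not import): the first method builds an interim slice of missing items just to test it for emptiness, while `isSuperset` answers the same question without allocating and stops at the first missing element.

```go
package main

import "fmt"

// set is a minimal string set used only for illustration.
type set map[string]struct{}

func newSet(items ...string) set {
	s := make(set, len(items))
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// missing shows the "interim slice" pattern: collect every element of other
// that s lacks, then the caller checks whether the slice is empty.
func (s set) missing(other set) []string {
	var out []string
	for it := range other {
		if _, ok := s[it]; !ok {
			out = append(out, it)
		}
	}
	return out
}

// isSuperset answers the same question without allocating: it returns false
// on the first element of other that s does not contain.
func (s set) isSuperset(other set) bool {
	for it := range other {
		if _, ok := s[it]; !ok {
			return false
		}
	}
	return true
}

func main() {
	have := newSet("a", "b", "c")
	want := newSet("a", "c")
	fmt.Println(len(have.missing(want)) == 0, have.isSuperset(want)) // true true
	fmt.Println(want.isSuperset(have))                               // false
}
```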
Kubernetes Prow Robot
d9ed2ff4b0
Merge pull request #114687 from freddie400/migrate-hpa
Migrate pkg/controller/podautoscaler to contextual logging
2023-02-17 05:44:03 -08:00
Freddie
dee494ece1 squashing without rebase 2023-02-17 01:47:52 +05:30
Patrick Ohly
0e1139d027 dra: avoid goroutine leaks from event broadcaster
When using these controllers in test/integration/scheduler_perf, the goroutine
leak check there pointed out that broadcaster.Shutdown function wasn't called
and thus goroutines leaked during a test.
2023-02-15 15:14:27 +01:00
Andy Goldstein
71ec5ed81d
resourcequota: use contextual logging (#113315)
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
2023-02-14 07:19:31 -08:00
Kubernetes Prow Robot
49babf218a
Merge pull request #115464 from sunnylovestiramisu/fixCSIMigrationBug
Remove check for CSI driver running on node for CSI migration attach operations
2023-02-13 12:49:30 -08:00
Kubernetes Prow Robot
2c37b470b3
Merge pull request #113794 from littlejiancc/feature_stateful_cleanup
Simplify case conditions
2023-02-09 20:37:39 -08:00
Sunny Song
98f944f55d Remove check for CSI driver running on node for CSI migration attach operations 2023-02-09 02:45:02 +00:00
Antonio Ojea
3bb203e7eb replace nodeipam custom logic by a workqueue
Change-Id: I242174b9d92606b1225a4af29a0730b7cd7d3c03
2023-02-06 19:34:29 +00:00
Sarvesh Rangnekar
c791d69b3e Fix the nodeSelector key creation mechanism
Fixes the issue that occurs when multiple ClusterCIDR objects have the
same nodeSelector values: the order of the requirements in the nodeSelector
is not preserved when the nodeSelector is marshalled and converted to a string.
2023-02-01 13:48:07 +00:00
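One general way to get an order-independent key is to sort before joining. The Go sketch below uses a simplified, hypothetical `requirement` type and `selectorKey` helper; it illustrates the idea, not the ClusterCIDR controller's actual fix.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// requirement is a hypothetical, simplified stand-in for a nodeSelector requirement.
type requirement struct {
	Key      string
	Operator string
	Values   []string
}

// selectorKey builds a deterministic key: values within each requirement and
// the requirements themselves are sorted before joining, so selectors that
// differ only in ordering map to the same key.
func selectorKey(reqs []requirement) string {
	parts := make([]string, 0, len(reqs))
	for _, r := range reqs {
		vals := append([]string(nil), r.Values...) // copy; don't mutate the caller's slice
		sort.Strings(vals)
		parts = append(parts, r.Key+" "+r.Operator+" ("+strings.Join(vals, ",")+")")
	}
	sort.Strings(parts)
	return strings.Join(parts, ";")
}

func main() {
	zone := requirement{"zone", "In", []string{"b", "a"}}
	disk := requirement{"disk", "In", []string{"ssd"}}
	fmt.Println(selectorKey([]requirement{zone, disk})) // disk In (ssd);zone In (a,b)
	fmt.Println(selectorKey([]requirement{zone, disk}) ==
		selectorKey([]requirement{disk, {"zone", "In", []string{"a", "b"}}})) // true
}
```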
Kubernetes Prow Robot
bd63a912d6
Merge pull request #115349 from danielvegamyhre/job-controller-changes
Update previous succeeded indexes for Indexed jobs unconditionally
2023-01-31 15:51:04 -08:00
Daniel Vega-Myhre
2a81337e7c update prev succeeded indexes for indexed jobs unconditionally 2023-01-31 19:15:53 +00:00
Kubernetes Prow Robot
fb9884577e
Merge pull request #115345 from gnufied/ignore-error-when-unable-find-plugin
Ignore error when we can't find plugin capable of expanding the volum…
2023-01-31 05:24:50 -08:00
Sarvesh Rangnekar
a8f120b76c Fix the delete flow for ClusterCIDR objects
Fixes the deletion of a ClusterCIDR object when a Node is associated with
it (has Pod CIDRs allocated from this ClusterCIDR). Currently the
ClusterCIDR finalizer is never cleaned up, because no reconciliation
happens after the associated Node has been deleted. This commit fixes
the issue by adding work items from all events to a work queue and
reconciling until the delete succeeds.
2023-01-30 19:35:41 +00:00
Kubernetes Prow Robot
ad2a9f2f33
Merge pull request #113863 from msau42/owners
update sig-storage owners
2023-01-30 10:10:50 -08:00
Hemant Kumar
c9fc35c496 reword the warning that gets printed on external expansion 2023-01-30 11:37:30 -05:00
Hemant Kumar
1e57dae5ec Ignore error when we can't find plugin capable of expanding the volume intre 2023-01-26 14:39:05 -05:00
Patrick Ohly
bc6c7fa912 logging: fix names of keys
The stricter checking with the upcoming logcheck v0.4.1 pointed out these names
which don't comply with our recommendations in
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#name-arguments.
2023-01-23 14:24:29 +01:00
Kubernetes Prow Robot
c63434aaff
Merge pull request #110838 from soltysh/cronjob_improvements
CronJob controller cleanups
2023-01-18 09:44:34 -08:00
Maciej Szulik
be44d67566
Re-use common parts between getNextScheduleTime and nextScheduledTimeDuration
The two methods nextScheduledTimeDuration and getNextScheduleTime have a
lot of similarities, so this commit squashes the common parts together
along with getMostRecentScheduleTime to avoid code duplication.
2023-01-18 16:52:45 +01:00
Maciej Szulik
cb491a8d0f
Cleanups in controller utils
1. Squash two identical sorters byTime
2. Move helper for searching active jobs into utils to exist next to its
  counterpart
2023-01-18 13:40:23 +01:00
Viacheslav Panasovets
6adf60fdf4
Do not create endpoints if service of type ExternalName (#114814) 2023-01-18 03:12:34 -08:00
Kubernetes Prow Robot
46f3821bf4
Merge pull request #114586 from andrewsykim/apiserver-lease-rename
Rename apiserver identity lease labels to apiserver.kubernetes.io/identity
2023-01-17 21:36:34 -08:00
Kubernetes Prow Robot
5550064bc2
Merge pull request #115063 from kannon92/tracking-remove-comments
tracking with finalizers is the default way for the job controller so comments are not needed that say we are tracking with finalizers
2023-01-17 07:56:44 -08:00
Kubernetes Prow Robot
7b01daba71
Merge pull request #115074 from yangjunmyfm192085/deleteklogv0-controller
use klog instead of klog.V(0)--controller manager part
2023-01-16 09:58:50 -08:00
Kubernetes Prow Robot
ed8cad1e80
Merge pull request #115056 from mimowo/podgc-do-not-add-condition-for-terminated-pods
PodGC should not add DisruptionTarget condition for pods which are in terminal phase
2023-01-16 03:04:50 -08:00
JunYang
29086e2b04 use klog instead of klog.V(0) 2023-01-14 21:15:50 +08:00
Andrew Sy Kim
3da0f1809c apiserver: update lease label key to apiserver.kubernetes.io/identity
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2023-01-13 15:37:22 -05:00
Kubernetes Prow Robot
9af5ae0365
Merge pull request #115030 from kannon92/remove-pod-error-job-tracking
Update SyncJob with PodControllerError updates in job unit tests
2023-01-13 12:08:14 -08:00
Kubernetes Prow Robot
70217a4083
Merge pull request #114944 from mimowo/fix-active-deadline-test
Fix the job controller unit test for enforcing ActiveDeadlineSeconds
2023-01-13 10:46:26 -08:00
Michal Wozniak
3833c0c349 PodGC should not add DisruptionTarget condition for pods which are in terminal phase 2023-01-13 18:28:44 +01:00
kannon92
4890928b78 tracking with finalizers is the default way for the job controller 2023-01-13 16:48:35 +00:00
kannon92
3a838033f8 Update SyncJob with PodControllerError updates in job unit tests 2023-01-13 16:39:18 +00:00
Michal Wozniak
7065b42bb2 Fix the job controller unit test for enforcing ActiveDeadlineSeconds 2023-01-13 16:48:15 +01:00
Kubernetes Prow Robot
c0c386b9c9
Merge pull request #114516 from nikhita/job-backoff-fix
pkg/controller/job: re-honor exponential backoff delay
2023-01-13 07:36:40 -08:00
Kubernetes Prow Robot
1b8692ce46
Merge pull request #114296 from cbroglie/concurrent-monitor-node-health
controller/nodelifecycle: Make monitorNodeHealth process nodes concurrently
2023-01-12 12:42:54 -08:00
Nikhita Raghunath
fd8d92a29d pkg/controller/job: re-honor exponential backoff
This commit makes the job controller re-honor exponential backoff for
failed pods. Before this commit, the controller created pods without any
backoff. This is a regression: the controller previously created pods
with an exponential backoff delay (10s, 20s, 40s ...).

The issue occurs only when the JobTrackingWithFinalizers feature is
enabled (which is enabled by default right now). With this feature, we
get an extra pod update event when the finalizer of a failed pod is
removed.

Note that the pod failure detection and new pod creation happen in the
same reconcile loop, so the 2nd pod is created immediately after the 1st
pod fails. The backoff is only applied from the 2nd pod failure onward,
which means that the 3rd pod is created 10s after the 2nd pod, the 4th
pod 20s after the 3rd pod, and so on.

This commit fixes a few bugs:

1. Right now, each time `uncounted != nil` and the job does not see a
_new_ failure, `forget` is set to true and the job is removed from the
queue. This means the condition is also triggered each time the
finalizer of a failed pod is removed, which resets `NumRequeues` and
results in a backoff of 0s.

2. Updates `updatePod` to only apply backoff when we see a particular
pod failed for the first time. This is necessary to ensure that the
controller does not apply backoff when it sees a pod update event
for finalizer removal of a failed pod.

3. If `JobsReadyPods` feature is enabled and backoff is 0s, the job is
now enqueued after `podUpdateBatchPeriod` seconds, instead of 0s. The
unit test for this check also had a few bugs:
    - `DefaultJobBackOff` is overwritten to 0 in certain unit tests,
    which meant that `DefaultJobBackOff` was considered to be 0,
    effectively not running any meaningful checks.
    - `JobsReadyPods` was not enabled for test cases that ran tests
    which required the feature gate to be enabled.
    - The check for expected and actual backoff had incorrect
    calculations.
2023-01-12 20:34:10 +05:30
Christopher Broglie
3c88de52c8 controller/nodelifecycle: Make monitorNodeHealth process nodes concurrently
Marking the pods not ready on a node requires looping over them and
updating each pod's status one at a time. This is performed serially,
and can take a while if we're processing each node serially as well.

Since the time is spent waiting on I/O, there's an opportunity to go
faster by processing multiple nodes concurrently. This change modifies
the loop to process nodes in parallel, using the same number of workers
as doNodeProcessingPassWorker.

This change also introduces histogram metrics to better observe
monitorNodeHealth.
2023-01-11 12:34:39 -08:00
kannon92
6dfaeff33c Remove Legacy Job Tracking 2023-01-10 14:52:54 +00:00
Kubernetes Prow Robot
e7549eae87
Merge pull request #114905 from kannon92/sync-job-test-fix
Fix SyncPastDeadlineJobFinished for enabling finalizer path
2023-01-09 12:47:28 -08:00