Commit Graph

201 Commits

Author SHA1 Message Date
Andrea Tosatto
ccda2d6fd4 kube-controller-manager: Decouple TaintManager from NodeLifeCycleController (KEP-3902) 2023-10-30 12:23:56 +00:00
Kubernetes Prow Robot
74098ab5ad
Merge pull request #119500 from JackTroy/fix-threshold-arg
Add explanation for large-cluster-size-threshold arg
2023-10-30 02:50:10 +01:00
Michal Wozniak
32fdb55192 Use Patch instead of SSA for Pod Disruption condition 2023-10-19 21:00:19 +02:00
Kubernetes Prow Robot
6cbc5dfac6
Merge pull request #114095 from aimuz/fix-114083
scheduler: Fix field apiVersion is missing from events reported from taint manager
2023-08-21 07:03:23 -07:00
jackcui
9d8959224c add explanation for large-cluster-size-threshold arg about multiple zones cluster 2023-07-21 17:25:51 +08:00
Ziqi Zhao
dfc1838379 Migrated pkg/controller/volume|util|replicaset|nodeipam to contextual logging
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
2023-07-06 07:39:52 +08:00
aimuz
396c8a6783
test: TestPodDeletionEvent
Signed-off-by: aimuz <mr.imuz@gmail.com>
2023-06-16 10:31:33 +08:00
aimuz
975b2c6611
scheduler: Fix field apiVersion is missing from events reported from taint manager
Signed-off-by: aimuz <mr.imuz@gmail.com>
2023-06-16 09:51:25 +08:00
Kubernetes Prow Robot
484645e817
Merge pull request #116659 from claudiubelu/skip-flaky-tests-2
unit tests: Skip flaky tests on Windows (part 2)
2023-05-23 20:04:48 -07:00
Kubernetes Prow Robot
29fe2c70b1
Merge pull request #117252 from alculquicondor/node-lifecycle-owner
Add SIG ownership to controller/nodelifecycle
2023-04-17 14:10:57 -07:00
Claudiu Belu
0979d55443 unit tests: Skip flaky tests on Windows (part 2)
Some of the unit tests are currently flaky on Windows. This commit
skips them until they are resolved.
2023-04-13 12:07:18 +00:00
Aldo Culquicondor
b23ab389b4
Add SIG ownership to controller/nodelifecycle
Change-Id: I31a329d9ca08bdf12a428cae44a5f061afa01e73
2023-04-12 15:42:06 -04:00
Tim Hockin
29c0b73d64
Replace uses of diff.ObjectDiff with cmp.Diff
ObjectDiff is already a shim over cmp.Diff, so no actual output or
behavior changes
2023-04-12 08:46:12 -07:00
Andrea Tosatto
d09842e0ad
node-lifecycle-controller: improve monitorNodeHealth test-coverage (#116687)
* node-lifecycle-controller: refactor monitorNodeHealth tests to improve test-coverage

* address PR review comments

* dedupe test logic
2023-04-12 07:02:33 -07:00
Kubernetes Prow Robot
27e23bad7d
Merge pull request #116529 from pohly/controllers-with-name
kube-controller-manager: convert to structured logging
2023-03-14 14:12:55 -07:00
Ziqi Zhao
d1aa73312c
pkg/controller/util support contextual logging (#115049)
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
2023-03-14 12:38:14 -07:00
Patrick Ohly
99151c39b7 kube-controller-manager: convert to structured logging
Most of the individual controllers were already converted earlier. Some log
calls were missed or added and then not updated during a rebase. Some of those
get updated here to fill those gaps.

Adding of the name to the logger used by each controller gets
consolidated in this commit. By using the name under which the
controller is registered we ensure that the names in the log
are consistent.
2023-03-14 19:16:32 +01:00
Damien Grisonnet
ac394c5c19 Cleanup deprecated metrics
Remove the following deprecated metrics:
- node_collector_evictions_number
- scheduler_e2e_scheduling_duration_seconds

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2023-03-13 22:55:34 +01:00
Andrea Tosatto
cae19f9e85 Remove deprecated pod-eviction-timeout flag from controller-manager 2023-03-07 18:14:18 +00:00
kerthcet
e5c812bbe7 Remove CLI flag enable-taint-manager
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-03-07 18:11:49 +00:00
Kubernetes Prow Robot
37326f7cea
Merge pull request #112670 from yangjunmyfm192085/delklogV0
use contextual logging(nodeipam and nodelifecycle part)
2023-03-07 09:40:33 -08:00
JunYang
780ef3afb0 use klog.InfoS instead of klog.V(0),Info 2023-03-07 15:50:01 +08:00
Claudiu Belu
5ba74c81ca unit tests: Skip flaky tests on Windows
Some of the unit tests are currently flaky on Windows. This commit
skips them until they are resolved.
2023-03-06 20:46:05 +00:00
binacs
84ff621309 cleanup(controller): use IsSuperset to avoid interim slice 2023-02-19 21:49:58 +08:00
JunYang
29086e2b04 use klog instead of klog.V(0) 2023-01-14 21:15:50 +08:00
Kubernetes Prow Robot
1b8692ce46
Merge pull request #114296 from cbroglie/concurrent-monitor-node-health
controller/nodelifecycle: Make monitorNodeHealth process nodes concurrently
2023-01-12 12:42:54 -08:00
Christopher Broglie
3c88de52c8 controller/nodelifecycle: Make monitorNodeHealth process nodes concurrently
Marking the pods not ready on a node requires looping over them and
updating each pod's status one at a time. This is performed serially,
and can take a while if we're processing each node serially as well.

Since the time is spent waiting on io, there's an opportunity to go
faster by processing multiple nodes concurrently. This change modifies
the loop to process nodes in parallel, using the same number of workers
as doNodeProcessingPassWorker.

This change also introduces histogram metrics to better observe
monitorNodeHealth.
2023-01-11 12:34:39 -08:00
ialidzhikov
aede3fbf40 pkg/controller: Replace deprecated func usage from the k8s.io/utils/pointer pkg 2022-11-23 17:40:23 +02:00
Michal Wozniak
a910ca563b Fix race conditions 2022-11-14 10:11:26 +01:00
Michal Wozniak
3b5c3acd61 Improve stability if the taint_manager tests 2022-11-13 19:40:18 +01:00
Michal Wozniak
c803892bd8 Enable the feature into beta 2022-11-09 09:02:40 +01:00
Michal Wozniak
fea883687f SSA to add pod failure conditions - ready for review 2022-10-27 18:21:33 +02:00
Jakub Przychodzeń
de25c5fdcf NodeLifecycleController: Remove race condition
Patch request does not support RV by default, we need to include them explicitly and patching lists actually overwrites whole field. It means that there is a race condition, in which we can overwrite changes to taints that happened between GET and PATCH requests.
2022-10-24 19:36:58 +00:00
Han Kang
2bbd445f50 remove rate limiter metric as it is not in use
Change-Id: I91157653e3860eeecc3f572aee88da6ffc65faed
2022-10-13 13:07:11 -07:00
Michal Wozniak
bb561e0324 Fix controller policy and improve logging of related errors
Improve error logging from timed workers which are used for pod eviction

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2022-09-23 16:53:32 +02:00
kerthcet
b4277e7ce4 Fix potential goroutine leakages in taint manager tests
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-08-04 00:00:48 +08:00
Michal Wozniak
04fcbd721c Introduction of a pod condition type indicating disruption. Its reason field indicates the reason:
- PreemptionByKubeScheduler (Pod preempted by kube-scheduler)
- DeletionByTaintManager (Pod deleted by taint manager due to NoExecute taint)
- EvictionByEvictionAPI (Pod evicted by Eviction API)
- DeletionByPodGC (an orphaned Pod deleted by PodGC)PreemptedByScheduler (Pod preempted by kube-scheduler)
2022-08-02 11:12:16 +02:00
Davanum Srinivas
a9593d634c
Generate and format files
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-07-26 13:14:05 -04:00
Kubernetes Prow Robot
a6afdf45dd
Merge pull request #110359 from MadhavJivrajani/remove-api-call-under-lock
controller/nodelifecycle: Refactor to not make API calls under lock
2022-07-25 06:20:34 -07:00
Madhav Jivrajani
3c0bc26d90 controller/nodelifecycle: Refactor to not make API calls under lock
The evictorLock only protects zonePodEvictor and zoneNoExecuteTainter.
processTaintBaseEviction showed indications of increased lock contention
among goroutines (see issue 110341 for more details).

The refactor done is to ensure that all codepaths in that function that
hold the evictorLock AND make API calls under the lock, are now making
API calls outside the lock and the lock is held only for accessing either
zonePodEvictor or zoneNoExecuteTainter or both.

Two other places where the refactor was done is the doEvictionPass and
doNoExecuteTaintingPass functions which make multiple API calls under
the evictorLock.

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
2022-07-25 15:16:26 +05:30
Michal Wozniak
4ec8cf08da This PR refactors taint_manager to eliminate the getPod and getNode stubs. 2022-07-14 18:00:44 +02:00
Wojciech Tyczyński
006ff4510b Clean shutdown of nodecontroller integration tests 2022-06-06 20:33:20 +02:00
Kubernetes Prow Robot
fcffb6de7e
Merge pull request #108089 from mysunshine92/nodeController-20220212
fix typo for nodelifecycle controller
2022-05-07 03:33:18 -07:00
Kubernetes Prow Robot
f979b4094e
Merge pull request #99292 from yangjunmyfm192085/run-test23
replace all occurrences of "node", nodeName to "node", klog.KRef("", nodeName)
2022-03-17 12:22:42 -07:00
wangyamei
3808e193d9 fix typo for nodelifecycle controller 2022-02-28 14:30:28 +08:00
Kubernetes Prow Robot
3bd422dc76
Merge pull request #107293 from dims/jan-1-owners-cleanup
Cleanup OWNERS files - Jan 2021 Week 1
2022-01-13 10:30:30 -08:00
Patrick Ohly
9eaa2dc554 avoid klog Info calls without verbosity
In the following code pattern, the log message will get logged with v=0 in JSON
output although conceptually it has a higher verbosity:

   if klog.V(5).Enabled() {
       klog.Info("hello world")
   }

Having the actual verbosity in the JSON output is relevant, for example for
filtering out only the important info messages. The solution is to use
klog.V(5).Info or something similar.

Whether the outer if is necessary at all depends on how complex the parameters
are. The return value of klog.V can be captured in a variable and be used
multiple times to avoid the overhead for that function call and to avoid
repeating the verbosity level.
2022-01-12 07:48:36 +01:00
Davanum Srinivas
9682b7248f
OWNERS cleanup - Jan 2021 Week 1
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-01-10 08:14:29 -05:00
JUN YANG
13fa2b1949 replace all occurrences of "node", nodeName to "node", klog.KRef("", nodeName) 2022-01-06 21:19:29 +08:00
Kubernetes Prow Robot
1426587e08
Merge pull request #106436 from dims/cleanup-owners-files-no-activity-in-a-year
Cleanup OWNERS files (No Activity in the last year)
2021-12-15 12:07:51 -08:00