Commit Graph

5429 Commits

Author SHA1 Message Date
Swetha Repakula
be2ef551d1 Graduate EndpointSliceNodeName feature gate to GA
- Feature gate can be removed when EndpointSlice v1beta1 is removed
  - Remove test cases where feature gate is disabled
2021-03-03 18:13:51 -08:00
Aldo Culquicondor
8812531b8c Add completion index to Job Pods
When .spec.completionMode="Indexed"
2021-03-03 22:45:53 +00:00
Kubernetes Prow Robot
7b0ad65d4d
Merge pull request #99288 from supriya-premkumar/ineffassign
Adds ineffassign to GO linter script.
2021-03-03 14:40:46 -08:00
Kubernetes Prow Robot
53492f79df
Merge pull request #98756 from pacoxu/ut/faster
networking nodeipam UT: set node poll interval to 1s in UT
2021-03-03 14:40:31 -08:00
Kubernetes Prow Robot
59f8ba5ffa
Merge pull request #98595 from SergeyKanzhelev/nodelifecycleSchedulerUnitTestsSpeedUp
sped up scheduler tests by using fake clock
2021-03-03 14:40:21 -08:00
Kubernetes Prow Robot
90ac9cfc58
Merge pull request #97881 from ialidzhikov/nit/unnecessary-type-conversion
Nit: Remove unnecessary type conversion
2021-03-03 14:39:49 -08:00
Sergey Kanzhelev
d24ed4ace1 sped up scheduler tests by using fake clock 2021-03-03 19:12:53 +00:00
Supriya Premkumar
e52e5e486c
Adds ineffassign to GO linter script.
Changes:
 - Enables ineffassign check in the verify scripts.
 - Fixes lint errs.
2021-03-03 08:28:10 -08:00
Patrick Ohly
512401a8a2 scheduler: tests for generic ephemeral volumes
This covers some failure scenarios and feature gate enablement.
2021-03-03 10:13:05 +01:00
Patrick Ohly
d2cc70ee2c scheduler: fail when a pod uses disabled generic ephemeral volumes
Without this error, kube-scheduler was simply ignoring the special
volume source and scheduled the pod. This was unlikely to work in
practice because the volume might have needed binding or the feature
is also disabled on kubelet which then doesn't know what to do with
the volume.
2021-03-03 10:13:05 +01:00
pacoxu
cef6e81fb3 set node poll interval to 10ms in UT
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-03-03 16:39:57 +08:00
Mikkel Oscar Lyderik Larsen
fef092b417
hpa: Don't scale down if at least one metric was invalid
Signed-off-by: Mikkel Oscar Lyderik Larsen <mikkel.larsen@zalando.de>
2021-03-03 07:53:01 +01:00
Kubernetes Prow Robot
ee90db514c
Merge pull request #99345 from robscott/endpointslice-wait-for-cache
Updating EndpointSlice controller to wait for cache to be updated
2021-03-02 13:23:20 -08:00
Kubernetes Prow Robot
d1a2af554a
Merge pull request #99115 from pohly/ephemeral-volume-metrics
generic ephemeral volume: add metrics
2021-03-02 12:15:59 -08:00
Kubernetes Prow Robot
a89d783b27
Merge pull request #99553 from alaypatel07/fix-flaky-unit-test
cronjob: fix flaky unit test TestController2_updateCronJob
2021-03-02 11:11:31 -08:00
Rob Scott
e1542606c2
Updating EndpointSlice controller to wait for cache to be updated
This updates the EndpointSlice controller to make use of the
EndpointSlice tracker to identify when expected changes are not present
in the cache yet. If this is detected, the controller will wait to sync
until all expected updates have been received. This should help avoid
race conditions that would result in duplicate EndpointSlices or failed
attempts to update stale EndpointSlices. To simplify this logic, this
also moves the EndpointSlice tracker from relying on resource versions
to generations.
2021-03-02 09:43:46 -08:00
Alay Patel
08bc827e66 cronjob_controller: add metrics for job creation skew duration 2021-03-02 10:38:59 -05:00
Patrick Ohly
98f75290ba generic ephemeral volume: simpler metrics
A CounterVector with status as label may create unnecessary overhead
and using the success case with the empty label value wasn't
easy. It's better to have two seperate counters, one for total number
of calls and one for failed calls.
2021-03-02 12:01:37 +01:00
Patrick Ohly
6cb28fd1b4 generic ephemeral volume: add metrics
As discussed during the production readiness review, a metric for the
PVC create operations is useful. The "ephemeral_volume" workqueue
metrics were already added in the initial implementation.

The new code follows the example set by the endpoints controller.
2021-03-02 12:01:37 +01:00
Patrick Ohly
e98c40a6f9 volume binder: test different CSIStorageCapacity/CSIDriver combinations
When the feature is disabled either in the scheduler or the CSIDriver,
the scheduler is expected to schedule pods without considering whether
storage capacity is available.
2021-03-02 11:08:57 +01:00
Patrick Ohly
1f3ede50f7 PVC protection controller: clarify pod shutdown
The code was correct and now the comment references the code in
kubelet to illustrate how pod shutdown works.
2021-03-02 08:31:12 +01:00
Kubernetes Prow Robot
2e055dac58
Merge pull request #99163 from ahg-g/ahg-pod-deletion
Implements pod deletion cost
2021-03-01 18:19:21 -08:00
Kubernetes Prow Robot
1b88c2ee47
Merge pull request #98912 from wzshiming/ut/speed-up-volume-scheduling
Speed up pkg/controller/volume/scheduling unit tests
2021-03-01 13:58:16 -08:00
Alay Patel
602435ccb9 cronjob: fix flaky unit test TestController2_updateCronJob 2021-03-01 15:45:32 -05:00
Abdullah Gharaibeh
d7e80ab038 Implement pod deletion cost 2021-03-01 13:45:58 -05:00
Clayton Coleman
8d8884a907
daemonset: Remove unnecessary error returns in strategy code
The nodeShouldRunDaemonPod method does not need to return an error
because there are no scenarios under which it fails. Remove the
error return path for its direct calls as well.
2021-03-01 13:23:18 -05:00
Clayton Coleman
9f296c133d
daemonset: Simplify the logic for calculating unavailable pods
In order to maintain the correct invariants, the existing maxUnavailable
logic calculated the same data several times in different ways. Leverage
the simpler structure from maxSurge and calculate pod availability only
once, as well as perform only a single pass over all the pods in the
daemonset. This changed no behavior of the current controller, and
has a structure that is almost identical to maxSurge.
2021-03-01 13:23:16 -05:00
Clayton Coleman
18f43e4120
daemonset: Implement MaxSurge on daemonset update
If MaxSurge is set, the controller will attempt to double up nodes
up to the allowed limit with a new pod, and then when the most recent
(by hash) pod is ready, trigger deletion on the old pod. If the old
pod goes unready before the new pod is ready, the old pod is immediately
deleted. If an old pod goes unready before a new pod is placed on that
node, a new pod is immediately added for that node even past the MaxSurge
limit.

The backoff clock is used consistently throughout the daemonset controller
as an injectable clock for the purposes of testing.
2021-03-01 13:21:12 -05:00
Clayton Coleman
8e6f58b0bd
daemonset: Prevent accidental omissions of test condition checks
It is too easy to omit checking the return value for the
syncAndValidateDaemonSet test in large suites. Switch the method
type to be a test helper and fatal/error directly. Also rename
a method that referenced the old name 'Rollback' instead of
'RollingUpdate'.
2021-03-01 13:20:58 -05:00
Kubernetes Prow Robot
5498ee641b
Merge pull request #99561 from BenTheElder/remove-bazel
Remove Bazel
2021-03-01 09:55:27 -08:00
Kubernetes Prow Robot
f6152d1521
Merge pull request #97086 from xing-yang/check_datasource
Only CSI plugin can have a DataSource
2021-03-01 06:53:26 -08:00
wzshiming
67e4ba0797 Speed up pkg/controller/volume/scheduling unit tests 2021-03-01 11:53:45 +08:00
Morten Torkildsen
5ad4280c09 Avoid sending events for every non-conformant pod in disruption controller 2021-02-28 19:43:25 -08:00
Benjamin Elder
56e092e382 hack/update-bazel.sh 2021-02-28 15:17:29 -08:00
Kubernetes Prow Robot
c200a8f9b7
Merge pull request #98433 from damemi/remove-helper-from-volume-zone
Move GetPersistentVolumeClaimClass to component-helpers
2021-02-26 12:38:15 -08:00
Kubernetes Prow Robot
c198632a12
Merge pull request #97098 from alaypatel07/cronjob-controller-2-follow-up-2
fix the case of time drift and re-implement next schedule calculation
2021-02-25 12:21:40 -08:00
xing-yang
676a3a7012 Only CSI plugin can have a DataSource 2021-02-25 15:27:26 +00:00
Kubernetes Prow Robot
8feec9bf94
Merge pull request #99351 from CaoDonghui123/fixissues3
Remove deadcode
2021-02-24 15:29:34 -08:00
Alay Patel
cebbc6b487 cronjob: use the same schedule Parser for tests as the reconcile loop 2021-02-24 13:50:03 -05:00
Alay Patel
6290a23ceb cronjob_controllerv2: gracefully handle 0 seconds between schedules 2021-02-24 13:48:56 -05:00
xiaofei.sun
fd62f32125 Scheduler: remove pkg/apis/core/field_constants.go 2021-02-24 18:06:29 +08:00
Kubernetes Prow Robot
1af99c4b21
Merge pull request #99274 from serathius/fix
Fix usage of klog.InfoS
2021-02-23 15:23:03 -08:00
caodonghui
f435e24403 Remove deadcode 2021-02-23 17:58:47 +08:00
Kubernetes Prow Robot
186f934e4c
Merge pull request #98346 from mortent/checkForScalePDBs
Check if resources implement scale in disruption controller
2021-02-22 13:58:03 -08:00
Alay Patel
4d7d14d249 update bazel 2021-02-22 12:11:04 -05:00
Alay Patel
0e171969e2 refine the test case for time drift, calculate next schedule with math 2021-02-22 12:11:04 -05:00
Shihang Zhang
1095778dcc remove secret-based sa token client builder 2021-02-21 22:00:40 -08:00
Kubernetes Prow Robot
031f2afbba
Merge pull request #98931 from michaelbeaumont/kubelet_well_known
Move pkg/kubelet/apis to k8s.io/kubelet/pkg/apis
2021-02-20 11:55:41 -08:00
Marek Siarkowicz
7b1f3584f5 Fix usage of klog.InfoS 2021-02-20 19:22:16 +01:00
Shihang Zhang
bbce0468d4 add metrics for rootcacertpublisher controller 2021-02-16 21:56:41 -08:00
Kubernetes Prow Robot
aa28a3563b
Merge pull request #99050 from Jiawei0227/storage_metrics
Add migrated field to storage_operation_duration_seconds metric
2021-02-16 18:21:06 -08:00
Nikhita Raghunath
b11516d69f *: move gmarek to emeritus_approvers 2021-02-16 10:59:19 +05:30
Alex Dudko
dcdde707ab Migrate deployment controller log messages to structured logging 2021-02-14 11:09:52 -08:00
Jiawei Wang
3d61b56bcd update bazel 2021-02-12 17:50:40 -08:00
Jiawei Wang
6a7222cf4e Add migrated field to storage_operation_duration_seconds metric 2021-02-12 17:35:01 -08:00
Kubernetes Prow Robot
002fb9fc5e
Merge pull request #98676 from ahg-g/ahg-ttl
JobDeletionDurationSeconds metric in TTLAfterFinished controller
2021-02-12 16:57:05 -08:00
Abdullah Gharaibeh
181e2173cf Added time to deletion metric to ttl-after-finidhed controller 2021-02-12 13:02:17 -05:00
Khaled (Kal) Henidak
3e56ddae67 upgrade IPv6DualStack feature to beta and turn on by default 2021-02-10 23:14:05 +00:00
Kubernetes Prow Robot
18605d8814
Merge pull request #98792 from wzshiming/ut/speed-up-persistentvolume
Speed up pkg/controller/volume/persistentvolume unit tests
2021-02-10 01:06:59 -08:00
Michael Beaumont
a5a6762d33
Move pkg/kubelet/apis to k8s.io/kubelet/pkg/apis 2021-02-09 21:37:39 +01:00
Kubernetes Prow Robot
425d29b39a
Merge pull request #98688 from wangkai1994/fix/pvc_protection_controller_log
migrate pkg/controller/volume/pvc_protection_controller.go to structured logs
2021-02-06 20:13:11 -08:00
Kubernetes Prow Robot
c15851b777
Merge pull request #98793 from wzshiming/ut/speed-up-endpointslice
Speed up pkg/controller/endpointslice unit tests
2021-02-05 17:51:11 -08:00
Kubernetes Prow Robot
036cab71a6
Merge pull request #98691 from pacoxu/cronjob-ut
run cronjob every 1minute in UT
2021-02-05 11:34:52 -08:00
Kubernetes Prow Robot
6dc0047396
Merge pull request #98259 from tanjing2020/taint_manager_log
migrate scheduler/taint_manager.go structured logging
2021-02-04 23:50:51 -08:00
wzshiming
98eb869d63 Speed up pkg/controller/endpointslice unit tests 2021-02-05 15:28:37 +08:00
wzshiming
fb518af0fc Speed up pkg/controller/volume/persistentvolume unit tests 2021-02-05 15:09:36 +08:00
Kubernetes Prow Robot
4eb9d825d5
Merge pull request #98489 from alculquicondor/job-testtable
Make sync Job test tables more readable
2021-02-04 15:55:16 -08:00
Morten Torkildsen
b63342af70 Fix nil pointer dereference in disruption controller 2021-02-03 21:04:29 -08:00
Morten Torkildsen
96ea28aa77 Check if resources implement scale in disruption controller 2021-02-03 20:19:35 -08:00
wangkai1994
7edf9e0155 change to kref and kobj 2021-02-03 17:45:38 +08:00
pacoxu
81e0ebcde9 run cronjob every 1minute(means 1s) in UT 2021-02-02 18:54:50 +08:00
wangkai1994
ab11816570 migrate pkg/controller/volume/pvcprotection.go to structured logs 2021-02-02 17:42:20 +08:00
Kubernetes Prow Robot
f27318c0da
Merge pull request #98445 from damemi/taint-toleration-internal-helpers
Move Taint/Toleration helpers to component-helpers repo
2021-02-01 17:50:27 -08:00
Aldo Culquicondor
609116b147 Test failed pod recreation
Change-Id: I31a2e667e9d96c385a921e25347ebeb5a8424e62
2021-02-01 13:20:03 -05:00
Mike Dame
578ff3ec34 Move Taint/Toleration helpers to component-helpers repo
This is part of the goal for scheduling to remove dependencies on internal
packages for the scheduling framework. It also provides these functions in an
external location for other components and projects to import.
2021-02-01 11:06:03 -05:00
Mike Dame
ba72411aa2 Move GetPersistentVolumeClaimClass to component-helpers
The goal of this move is related to issue 89930, to break the dependence
of scheduling plugins on internal helpers. This function can easily move to
component-helpers where it will be used by other components as well.
2021-02-01 10:48:38 -05:00
Rob Scott
4c9c9bb5a0 Adding myself as an approver for EndpointSlice controller
This also adds me as a reviewer for the Endpoints controller.
2021-01-31 22:34:39 -08:00
Aldo Culquicondor
dbf9e3b2d3 Make sync Job test tables more readable
And use t.Run to improve debugging experience

Change-Id: Ia91adbfe9c419cc640abe0efe287f5b9ab715e87
2021-01-27 16:56:41 -05:00
Kubernetes Prow Robot
ed4025a47b
Merge pull request #96328 from qingsenLi/201107-tc
test: Add comment for the redundant define name
2021-01-26 10:30:32 -08:00
tanjing2020
e144c4e1e9 migrate scheduler/taint_manager.go structured logging 2021-01-25 10:06:41 +08:00
Kubernetes Prow Robot
73fbd3da09
Merge pull request #97407 from waynepeking348/clean_rs_by_revision_instead_of_creat_timestamp
clean rs by revision instead of creation timestamp in deployment controller
2021-01-21 20:15:27 -08:00
Kubernetes Prow Robot
889b0ef004
Merge pull request #98107 from andrewsykim/controllers-owner
add myself as reviewer in pkg/controller/OWNERS
2021-01-19 22:18:20 -08:00
JunYang
f241036d2b fix mistake about [avaliable] for index_test.go
Signed-off-by: JunYang <yang.jun22@zte.com.cn>
2021-01-20 08:37:50 +08:00
waynepeking348
b2de3507d0 add dependency in bazel BUILD file 2021-01-16 22:25:37 +08:00
Kubernetes Prow Robot
8bf42039e6
Merge pull request #96552 from pandaamanda/klog_fmt
use klog.Info and klog.Warning when had no format
2021-01-15 17:57:43 -08:00
Andrew Sy Kim
fdc9790b2d add myself as reviewer in pkg/controller/OWNERS
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2021-01-15 17:21:35 -05:00
Jordan Liggitt
1e71a28490 Clean up namespaced children of missing virtual parents with incorrectly cluster-scoped nodes 2021-01-14 11:29:31 -05:00
Jordan Liggitt
65d43faf3a Add unit test for child scope mismatch with missing parent 2021-01-14 10:45:56 -05:00
Kubernetes Prow Robot
8f09066809
Merge pull request #97788 from heqg/expect-scheduler_binder
fix typo of [expect] in pkg/controller/../scheduler_binder.go
2021-01-13 18:43:03 -08:00
Kubernetes Prow Robot
1209c59612
Merge pull request #96876 from howieyuen/no-execute-taint-missing
fix nodelifecyle controller not add NoExecute taint bug
2021-01-13 14:17:03 -08:00
Kubernetes Prow Robot
51f31bd42c
Merge pull request #97113 from 249043822/bugfix-job-log111
Optimize log output for job controller
2021-01-11 20:52:38 -08:00
ialidzhikov
e1e7c39b47 Remove unnecessary type conversion
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2021-01-09 22:43:29 +02:00
Kubernetes Prow Robot
5a6f9ed550
Merge pull request #97721 from aojea/slicelog
fix slice controller logging for services ipfamily
2021-01-07 09:34:40 -08:00
he.qingguo
b72da953be fix typo of [expect] in pkg/controller/../scheduler_binder.go
Signed-off-by: he.qingguo <he.qingguo@zte.com.cn>
2021-01-07 12:23:14 +08:00
Kubernetes Prow Robot
e8b3a02fd2
Merge pull request #97572 from gavinfish/gc-test-typo
Cleanup: fix typos in garbagecollector_test.go
2021-01-05 15:11:53 -08:00
Kubernetes Prow Robot
571a7ce2c1
Merge pull request #97348 from josephburnett/bistable-review
Up and down scale stabilize with envelope.
2021-01-05 12:39:59 -08:00
Kubernetes Prow Robot
0300aa712e
Merge pull request #97543 from pacoxu/fix/94495-ga
remove LegacyNodeRoleBehavior and mv ServiceNodeExclusion to GA
2021-01-05 11:46:45 -08:00
Kubernetes Prow Robot
07bd985724
Merge pull request #96561 from ialidzhikov/cleanup/csi-node-info
Remove CSINodeInfo feature gate
2021-01-05 11:46:00 -08:00
pacoxu
441985afb6 set LegacyNodeRoleBehavior to false and mv ServiceNodeExclusion to GA
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-01-05 22:34:18 +08:00
Antonio Ojea
561133ccf1 fix slice controller logging for services ipfamily 2021-01-05 12:52:04 +01:00
ialidzhikov
1b74f28ed8 Clean up some redundant imports
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2021-01-02 01:09:22 +02:00
drfish
4a839fa6d3 Fix typos in garbagecollector_test.go 2020-12-29 23:20:18 +08:00
Kubernetes Prow Robot
f11c3b475d
Merge pull request #94858 from waynepeking348/master
fix bugs when copying deployment annotations to replicaSet if value is empty
2020-12-22 19:08:26 -08:00
Kubernetes Prow Robot
83156c6246
Merge pull request #92335 from farah/farah/remove-unused-interface
Delete unused interface
2020-12-21 14:50:25 -08:00
Joseph Burnett
16133c2b77 Up and down scale stabilize with envelope.
The HPA controller keeps a flat history of recommendations for
stabilization. However when both up and down scale stabilization are
configured, the interpretation of the history changes depending on the
direction of movement. What we want is to keep the stabilized
recommendation within the envelope of the minimum and maximum over
configured stabilization windows. We should only move when the
envelope forces a move.
2020-12-21 14:36:13 +01:00
waynepeking348
c8f9858920 add unit test cases 2020-12-20 17:05:48 +08:00
waynepeking348
ab1be93809 clean rs by revision instead of creat timestamp 2020-12-20 17:05:42 +08:00
waynepeking348
6c36a48550 fix bugs when copying deployment annotations to replicaSet if value is empty 2020-12-20 13:46:26 +08:00
Kubernetes Prow Robot
efb9489acb
Merge pull request #96617 from yuga711/dangling
Recover CSI volumes from dangling attachments
2020-12-16 14:54:30 -08:00
Kubernetes Prow Robot
c5efee02ac
Merge pull request #89465 from shibataka000/84142-cm
Fix HPA bug about unintentional scale out during updating deployment when using PodMetric.
2020-12-16 04:44:21 -08:00
Kubernetes Prow Robot
b97aa71519
Merge pull request #96825 from roycaihw/storage-version/conditions
storage-version: update conditions
2020-12-14 14:01:51 -08:00
Hao Yuan
5569db4902 fix nodelifecyle controller not add NoExecute taint bug 2020-12-14 09:34:57 +08:00
Jayasekhar Konduru
9b2b73600d Recover CSI volumes from dangling attachments
Change-Id: I72105d67d8a4069ab19bfa4638a7ac365cf4194c
2020-12-11 18:31:53 -08:00
ialidzhikov
bc432124a2 Remove CSINodeInfo feature gate
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2020-12-10 09:58:22 +02:00
Kubernetes Prow Robot
ce7ac8442e
Merge pull request #94599 from verult/adc-op-asw-race
Fixes Attach Detach Controller reconciler race reading ActualStateOfWorld and operation pending states
2020-12-08 16:28:53 -08:00
Kubernetes Prow Robot
0ee9c391f1
Merge pull request #92827 from yuanhuaiwang/disruptionresync
Remove resync period for disruption controller
2020-12-08 16:28:26 -08:00
KeZhang
67b40a50c6 Optimize log output 2020-12-08 11:20:24 +08:00
Haowei Cai
7a28ca419e generated 2020-12-02 14:32:50 -08:00
Haowei Cai
d13fe474f9 storage-version: update conditions 2020-12-02 14:32:34 -08:00
Kubernetes Prow Robot
248c116963
Merge pull request #96417 from hvenev-vmware/fix-ipam
Fix double counting of IP addresses
2020-11-23 07:45:33 -08:00
Hristo Venev
c8c81be8af range_allocator: Test (lack of) double counting 2020-11-22 23:09:09 -08:00
Hristo Venev
ee581278bd cidrset: Add test for double counting 2020-11-22 23:09:09 -08:00
Hristo Venev
4d28391c24 Fix double counting of IP addresses
The range allocator in pkg/controller/nodeipam/ipam/range_allocator.go
may call Occupy() on the same range twice:

1. Just before subscribing to the NodeInformer
2. From a callback given to the NodeInformer soon after registration
2020-11-22 23:09:09 -08:00
Jordan Liggitt
e491c3bc70 Add GC unit tests
Adds unit tests covering the problematic scenarios identified
around conflicting data in child owner references

                      Before   After
package level         51%      68%
garbagecollector.go   60%      75%
graph_builder.go      50%      81%
graph.go              50%      68%

Added/improved coverage of key functions that had lacking unit test coverage:

* attemptToDeleteWorker
* attemptToDeleteItem
* processGraphChanges (added coverage of all added code)
2020-11-17 10:49:32 -05:00
Jordan Liggitt
603a0b016e Log cluster-scoped owners referencing namespaced owners, avoid retrying lookups forever
If a cluster-scoped dependent references a namespace-scoped owner,
this is an invalid relationship, and the lookup will never succeed in attemptToDelete.

Short-circuit requeueing in attemptToDelete and log.
2020-11-17 10:49:30 -05:00
Jordan Liggitt
221e4aa2c2 Queue non-matching children for deletion when a virtual node is marked as observed
When we observe valid coordinates for a previously virtual node,
if there are dependents that do not agree with those coordinates,
add them to the attemptToDelete queue.

This queue will check the dependent's ownerReferences using the coordinates specified by the dependent.
If all of the owners can be verified absent, the dependent will be deleted.
If some are still present, or if there are errors looking them up, the dependent will not be deleted.

If the verified owner is namespaced, and the dependent is not in the same namespace,
an event will be recorded for user visibility, since cross-namespace ownerReferences are not supported.
2020-11-17 10:49:27 -05:00
Jordan Liggitt
b655f22509 Handle virtual delete events when children don't agree on owner coordinates
If a virtual delete event is received for a node whose dependents disagree on the parent's coordinates:
1. propagate the delete to children that matched the verified absent coordinates
2. if the existing node is virtual, select a new set of coordinates from the remaining dependents
3. do not delete the parent node from the graph if the parent node is non-virtual,
   or if there are dependents that do not agree with the virtual delete event coordinates
2020-11-17 10:49:07 -05:00
Jordan Liggitt
b8d7ecf73b Make node removal conditional in processGraphChanges 2020-11-17 10:49:04 -05:00
Jordan Liggitt
ac8d419b4c Enqueue dependents for deletion when their ownerReference does not match observed parent coordinates
When adding a dependent to the graph, we ensure there is a node representing each owner reference,
and add the dependent to each parent node.

If the parent node already exists, and the dependent's ownerReference
coordinates disagree with the verified coordinates, add the dependent to the attemptToDelete queue.

This queue will check the dependent's ownerReferences using the coordinates specified by the dependent.
If all of the owners can be verified absent, the dependent will be deleted.
If some are still present, or if there are errors looking them up, the dependent will not be deleted.

If the parent node has been observed via informer event (so we know the coordinates are accurate),
and the verified owner is namespaced, and the dependent is not in the same namespace,
an event will be recorded for user visibility, since cross-namespace ownerReferences are not supported.
2020-11-17 10:47:39 -05:00
Jordan Liggitt
78317edb8b Short-circuit attemptToDelete loop for virtual nodes that are removed or observed
Virtual nodes are added to the attemptToDelete queue, and continue getting requeued
until they are successfully verified absent or are observed via informer.

In the meantime, if the real object associated with that UID is observed via informer,
or is observed to be deleted via informer, the graph node for that UID can be removed
or marked as observed. In that case, we should stop retrying to get the virtual node coordinates.
2020-11-17 10:46:00 -05:00
Jordan Liggitt
cae56bea0a Replace virtual node with observed node if identity differs
If the graph contains a virtual node (because some child object referenced it in an OwnerRef),
and a real informer event is observed for that uid at different coordinates,
we want to fix the coordinates of the node in the graph to match the actual coordinates.

The safe way to do this is to clone the node, replace the identity in the clone,
then replace the node with the clone.

Modifying the identity directly is not safe because it is accessed lock-free from many code paths.

Replacing the node in the graph from processGraphChanges is safe because it is the only graph writer.
2020-11-17 10:42:48 -05:00
Jordan Liggitt
cb7b9ed532 Refactor identityFromEvent 2020-11-17 10:42:48 -05:00
Jordan Liggitt
30eb6683e6 Avoid marking virtual nodes as observed when they haven't been
Virtual nodes can be added to the GC graph in order to represent objects
which have not been observed via an informer, but are referenced via ownerReferences.

These virtual nodes are requeued into attemptToDelete until they are observed via an informer,
or successfully verified absent via a live lookup. Previously, both of those code paths
called markObserved() to stop requeuing into attemptToDelete.

Because it is useful to know whether a particular node has been observed via
a real informer event, this commit does the following:

* adds a `virtual bool` attribute to graph events so we know which ones came from a real informer
* limits the markObserved() call to the code path where a real informer event is observed
* uses an alternative mechanism to stop requeueing into attemptToDelete when a virtual node is verified absent via a live lookup
2020-11-17 10:42:48 -05:00
Jordan Liggitt
445f20dbdb Switch GC absentOwnerCache to full reference
Before deleting an object based on absent owners, GC verifies absence of those owners with a live lookup.

The coordinates used to perform that live lookup are the ones specified in the ownerReference of the child.

In order to performantly delete multiple children from the same parent (e.g. 1000 pods from a replicaset),
a 404 response to a lookup is cached in absentOwnerCache.

Previously, the cache was a simple uid set. However, since children can disagree on the coordinates
that should be used to look up a given uid, the cache should record the exact coordinates verified absent.
This is a [apiVersion, kind, namespace, name, uid] tuple.
2020-11-17 10:42:48 -05:00
Jordan Liggitt
09bdf76b8a Plumb event recorder to garbage collector controller 2020-11-17 10:42:45 -05:00
xiongzhongliang
90f4aeeea4 use klog.Info and klog.Warning when had no format 2020-11-14 00:55:06 +08:00
Kubernetes Prow Robot
da75c26648
Merge pull request #95978 from roycaihw/storage-version/gc
Storage version garbage collector
2020-11-12 18:36:37 -08:00
Haowei Cai
1d5a8f8f24 fixup! add storage version garbage collector 2020-11-12 16:34:27 -08:00
Haowei Cai
f675dac440 generated 2020-11-12 16:25:22 -08:00
Haowei Cai
ee9ace14c2 add storage version garbage collector 2020-11-12 16:21:00 -08:00
Kubernetes Prow Robot
765d949bfc
Merge pull request #96440 from robscott/endpointslice-pre-ga
Adding NodeName to EndpointSlice API, deprecation updates
2020-11-12 16:03:13 -08:00
Kubernetes Prow Robot
798eb07720
Merge pull request #96443 from alaypatel07/cronjob-controller-2-follow-up
handle slow cronjob lister in cronjob controller v2 and improve memory footprint
2020-11-12 13:16:52 -08:00
Kubernetes Prow Robot
4b46d44e0c
Merge pull request #96327 from robscott/app-protocol-ga
Graduating AppProtocol to GA
2020-11-12 13:16:39 -08:00
Kubernetes Prow Robot
55856ed727
Merge pull request #93130 from zshihang/master
plumb service account token down to csi driver
2020-11-12 13:16:25 -08:00
Rob Scott
84e4b30a3e
Updates related to PR feedback
- Remove feature gate consideration from EndpointSlice validation
- Deprecate topology field, note that it will be removed in future
release
- Update kube-proxy to check for NodeName if feature gate is enabled
- Add comments indicating the feature gates that can be used to enable
alpha API fields
- Add comments explaining use of deprecated address type in tests
2020-11-12 12:30:50 -08:00
Kubernetes Prow Robot
e38b1b94f8
Merge pull request #96399 from andrewsykim/service-config
move service controller config to k8s.io/cloud-provider/controllers/service/config
2020-11-12 11:21:57 -08:00
Shihang Zhang
d2859cd89b plumb service account token down to csi driver 2020-11-12 09:26:43 -08:00
Rob Scott
d985438772
Updating EndpointSlice controllers to support NodeName field 2020-11-11 16:50:36 -08:00
Alay Patel
15089aab94 update bazel 2020-11-11 19:47:11 -05:00
Alay Patel
d6ca5b8d14 handle the case for slow cronjob lister, add unit tests 2020-11-11 18:48:57 -05:00
Alay Patel
41c82e69ed convert to stardard lister, use []*batchv1.Job instead of []batchv1.Job 2020-11-11 18:48:57 -05:00
Kubernetes Prow Robot
667d1c2c3f
Merge pull request #93370 from alaypatel07/add-new-cronjob-controller
Add cronjob controller v2
2020-11-11 15:42:50 -08:00
Kubernetes Prow Robot
423f8731ef
Merge pull request #95719 from tsmetana/add-pv_collector-provisioner-metric
PV Controller: Add plugin name and volume mode to PV metrics
2020-11-11 01:49:49 -08:00
10177505
c9abf09136 Remove the redundant define name 2020-11-11 12:00:57 +08:00
Alay Patel
38bb53555e update violation_exceptions.list and make generated 2020-11-10 17:32:06 -05:00
Alay Patel
8d7dd4415e add cronjob_controllerv2.go 2020-11-10 17:32:06 -05:00
Andrew Sy Kim
b1e0decce1 move service controller config to k8s.io/cloud-provider/controllers/service/config
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2020-11-10 14:59:44 -05:00
Rob Scott
b044fadf66
Graduating AppProtocol to GA 2020-11-09 11:08:19 -08:00
Tim Hockin
819ff9b087
Use topology labels instead of old beta names (#96033)
* Rename const for topology.../zone

* Rename const for topology.../region

* Rename const for failure-domain.../zone

* Rename const for failure-domain.../region

* Restore old names for compat
2020-11-05 20:26:50 -08:00
Andrew Sy Kim
7cf19e5fb7 endpointslice API: rename 'accepting' condition to 'serving' condition
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2020-11-05 19:18:45 -05:00
Andrew Sy Kim
17cf1b4415 endpointslice controller: add test cases to TestSyncServiceFull for terminating endpoints
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2020-11-05 19:18:45 -05:00
Andrew Sy Kim
2947f5ce4f endpointslice controller: refactor TestSyncServiceFull to use test tables
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2020-11-05 19:18:45 -05:00
Andrew Sy Kim
1c603e90ef endpointslice controller: set new conditions 'accepting' and 'terminating'
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2020-11-05 19:18:45 -05:00
Kubernetes Prow Robot
2d6cd683bd
Merge pull request #95541 from cofyc/fix95538
volume binding: report UnschedulableAndUnresolvable status instead of an error when bound PVs not found
2020-11-05 15:16:51 -08:00
Shihang Zhang
2c378beb64 abort if namespace doesn't exist or terminating 2020-11-05 11:12:15 -08:00
Yecheng Fu
0961891a7a report UnschedulableAndUnresolvable status instead of an error when PVCs can't find bound
persistent volumes

This is an user error. We should't report an error.
2020-11-05 10:28:40 +08:00
Shihang Zhang
d40f0c43c4 separate RootCAConfigMap from BoundServiceAccountTokenVolume 2020-11-04 17:10:39 -08:00
Kubernetes Prow Robot
096819c963
Merge pull request #95909 from pohly/pv-controller-delete-pv-fix
PV controller: don't delete PVs when PVC is not known yet
2020-11-02 02:00:52 -08:00
zouyu
7dd4622c84 Some comments' typos
Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>
2020-11-02 15:05:23 +08:00
Kubernetes Prow Robot
4b65f70652
Merge pull request #95740 from cici37/moveCCM
Move cloud-controller-manager to staging k8s.io/cloud-provider
2020-10-30 13:48:51 -07:00
cici37
9465d95ea6 Move CCM to staging k8s.io/cloud-provider 2020-10-29 20:50:23 -07:00
Cheng Xing
d9a629fe3a IsVolumeAttachedToNode() renamed to GetAttachState(), and returns 3 states instead of combining "uncertain" and "detached" into "false" 2020-10-29 13:24:51 -07:00
cici37
a91a2cdad6 Move informer_factory to staging 2020-10-29 12:20:33 -07:00
Patrick Ohly
24f5764787 pv controller test: more test cases
The main goal was to cover retrieval of a PVC from the apiserver when
it isn't known yet. This is achieved by adding PVCs and (for the sake
of completeness) PVs to the reactor, but not the controller, when a
special annotation is set. The approach with a special annotation was
chosen because it doesn't affect other tests.

The other test cases were added while checking the existing tests
because (at least at first glance) the situations seemed to be not
covered.
2020-10-28 10:52:11 +01:00
Patrick Ohly
22f81e9e0b pv controller test: use sub tests
This makes it possible to run individual tests.
2020-10-28 10:39:59 +01:00
Patrick Ohly
06f934ea1f pv controller test: enable klog output
This makes it possible to run tests with -v=5 and thus actually get
some output.
2020-10-28 10:39:10 +01:00
Cheng Xing
a61743b125 Fixes Attach Detach Controller reconciler race reading ActualStateOfWorld and operation pending states; fixes reconciler_test mock detach to account for multiple attaches on a node 2020-10-27 23:51:55 -07:00
Kubernetes Prow Robot
554319cce8
Merge pull request #95410 from benhxy/staticcheck
Fix static check for pkg/controller/podautoscaler
2020-10-27 10:36:14 -07:00
Patrick Ohly
5686664a1d PV controller: don't delete PVs when PVC is not known yet
Normally, the PV controller knows about the PVC that triggers the
creation of a PV before it sees the PV, because the PV controller must
set the volume.beta.kubernetes.io/storage-provisioner annotation that
tells an external provisioner to create the PV.

When restarting, the PV controller first syncs its caches, so that
case is also covered.

However, the creator of a PVC might decided to set that annotation
itself to speed up volume creation. While unusual, it's not forbidden
and thus part of the external Kubernetes API. Whether it makes sense
depends on the intentions of the user.

When that is done and there is heavy load, an external provisioner
might see the PVC and create a PV before the PV controller sees the
PVC. If the PV controller then encounters the PV before the PVC, it
incorrectly concludes that the PV needs to be deleted instead of being
bound.

The same issue occurred earlier for external binding and the existing
code for looking up a PVC in the cache or in the apiserver solves the
issue also for volume provisioning, it just needs to be enabled also
for PVs without the pv.kubernetes.io/bound-by-controller annotation.
2020-10-27 11:26:58 +01:00
Khaled Henidak (Kal)
6675eba3ef
dual stack services (#91824)
* api: structure change

* api: defaulting, conversion, and validation

* [FIX] validation: auto remove second ip/family when service changes to SingleStack

* [FIX] api: defaulting, conversion, and validation

* api-server: clusterIPs alloc, printers, storage and strategy

* [FIX] clusterIPs default on read

* alloc: auto remove second ip/family when service changes to SingleStack

* api-server: repair loop handling for clusterIPs

* api-server: force kubernetes default service into single stack

* api-server: tie dualstack feature flag with endpoint feature flag

* controller-manager: feature flag, endpoint, and endpointSlice controllers handling multi family service

* [FIX] controller-manager: feature flag, endpoint, and endpointSlicecontrollers handling multi family service

* kube-proxy: feature-flag, utils, proxier, and meta proxier

* [FIX] kubeproxy: call both proxier at the same time

* kubenet: remove forced pod IP sorting

* kubectl: modify describe to include ClusterIPs, IPFamilies, and IPFamilyPolicy

* e2e: fix tests that depends on IPFamily field AND add dual stack tests

* e2e: fix expected error message for ClusterIP immutability

* add integration tests for dualstack

the third phase of dual stack is a very complex change in the API,
basically it introduces Dual Stack services. Main changes are:

- It pluralizes the Service IPFamily field to IPFamilies,
and removes the singular field.
- It introduces a new field IPFamilyPolicyType that can take
3 values to express the "dual-stack(mad)ness" of the cluster:
SingleStack, PreferDualStack and RequireDualStack
- It pluralizes ClusterIP to ClusterIPs.

The goal is to add coverage to the services API operations,
taking into account the 6 different modes a cluster can have:

- single stack: IP4 or IPv6 (as of today)
- dual stack: IPv4 only, IPv6 only, IPv4 - IPv6, IPv6 - IPv4

* [FIX] add integration tests for dualstack

* generated data

* generated files

Co-authored-by: Antonio Ojea <aojea@redhat.com>
2020-10-26 13:15:59 -07:00
Ben Hu
4e62298c1b Fix static checks for pkg/controller/podautoscaler 2020-10-23 18:53:07 +00:00
Kubernetes Prow Robot
ec453ffb1a
Merge pull request #90691 from arjunrn/container-resource-hpa
Add container based scaling to HPA
2020-10-23 05:51:51 -07:00
Tomas Smetana
0c4e05a245 PV Controller: PV plugin and mode metrics 2020-10-23 14:35:35 +02:00
Kubernetes Prow Robot
106ee38796
Merge pull request #95647 from JoshuaAndrew/master
Horizontal Pod Autoscaler doesn`t automatically scale the number of pods correctly
2020-10-23 04:05:59 -07:00
Kubernetes Prow Robot
1257bc5acb
Merge pull request #91474 from cici37/pkgController
Cleanup CCM dependencies
2020-10-22 23:17:45 -07:00
Kubernetes Prow Robot
afa941b8e1
Merge pull request #95789 from qingsenLi/k8s201023
remove unused const failedExpiration
2020-10-22 22:17:35 -07:00
Kubernetes Prow Robot
153d33091b
Merge pull request #95632 from mrkm4ntr/remove-redundant-variable
Remove redundant variable
2020-10-22 22:16:48 -07:00
qingsenLi
30bfa7d078 remove unused const failedExpiration 2020-10-22 18:57:36 +08:00
weiwei
b19a115f42 If we set SelectPolicy MinPolicySelect on scaleUp behavior or scaleDown behavior,Horizontal Pod Autoscaler doesn`t automatically scale the number of pods correctly
Signed-off-by: weiwei <weiwei@tenxcloud.com>
2020-10-22 18:00:49 +08:00
Arjun Naik
0fec7b0f7e Added functionality and API for pod autoscaling based on container resources
Signed-off-by: Arjun Naik <anaik@redhat.com>
2020-10-21 21:10:05 +02:00
Kubernetes Prow Robot
163b23f163
Merge pull request #95529 from cici37/cleanup
Add back openapi gen for generic types and clean up doc.go
2020-10-20 11:22:34 -07:00
Kubernetes Prow Robot
3175b59ac2
Merge pull request #94489 from ialidzhikov/fix/volume-expand
Do not assume storageclass is still in-tree after csi migration
2020-10-19 15:08:07 -07:00
cici37
95acec5a3b Move client_builder to k8s.io/controller-manager 2020-10-19 14:48:22 -07:00
cici37
0d2002229f Add back openapi gen for generic types and clean up doc.go 2020-10-16 10:54:15 -07:00
Shintaro Murakami
acc970399d Remove redundant variable
The variable firstUnhealthyOrdinal is redundant because replicas and condemned are already sorted in ascending order.
2020-10-16 09:53:34 +09:00
Joseph Burnett
1ccaaa768d Ignore deleted pods.
When a pod is deleted, it is given a deletion timestamp. However the
pod might still run for some time during graceful shutdown. During
this time it might still produce CPU utilization metrics and be in a
Running phase.

Currently the HPA replica calculator attempts to ignore deleted pods
by skipping over them. However by not adding them to the ignoredPods
set, their metrics are not removed from the average utilization
calculation. This allows pods in the process of shutting down to drag
down the recommmended number of replicas by producing near 0%
utilization metrics.

In fact the ignoredPods set is misnomer. Those pods are not fully
ignored. When the replica calculator recommends to scale up, 0%
utilization metrics are filled in for those pods to limit the scale
up. This prevents overscaling when pods take some time to startup. In
fact, there should be 4 sets considered (readyPods, unreadyPods,
missingPods, ignoredPods) not just 3.

This change renames ignoredPods as unreadyPods and leaves the scaleup
limiting semantics. Another set (actually) ignoredPods is added to
which delete pods are added instead of being skipped during
grouping. Both ignoredPods and unreadyPods have their metrics removed
from consideration. But only unreadyPods have 0% utilization metrics
filled in upon scaleup.
2020-10-14 16:45:06 +02:00
Kubernetes Prow Robot
8647eece9c
Merge pull request #95113 from Git-Jiro/lint_ttlcontroller
Lint ttl_controller
2020-10-13 22:51:53 -07:00
Kubernetes Prow Robot
ea896a2e64
Merge pull request #95224 from Git-Jiro/lint_endpoint
Fix lint errors in pkg/contoller/endpoint
2020-10-13 12:06:27 -07:00
cici37
ae8ce0d190 Move cmd/controller-manager to k8s.io/controller-manager and cloud specific configs to k8s.io/cloud-provider. 2020-10-08 13:23:16 -07:00
Kubernetes Prow Robot
f30d6a463d
Merge pull request #93779 from yodarshafrir1/fix_restart_job_failure_with_restart_policy_never
Fix job backoff limit for restart policy Never
2020-10-06 10:02:44 -07:00