Commit Graph

5160 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
b97aa71519
Merge pull request #96825 from roycaihw/storage-version/conditions
storage-version: update conditions
2020-12-14 14:01:51 -08:00
Kubernetes Prow Robot
ce7ac8442e
Merge pull request #94599 from verult/adc-op-asw-race
Fixes Attach Detach Controller reconciler race reading ActualStateOfWorld and operation pending states
2020-12-08 16:28:53 -08:00
Kubernetes Prow Robot
0ee9c391f1
Merge pull request #92827 from yuanhuaiwang/disruptionresync
Remove resync period for disruption controller
2020-12-08 16:28:26 -08:00
Haowei Cai
7a28ca419e generated 2020-12-02 14:32:50 -08:00
Haowei Cai
d13fe474f9 storage-version: update conditions 2020-12-02 14:32:34 -08:00
Kubernetes Prow Robot
248c116963
Merge pull request #96417 from hvenev-vmware/fix-ipam
Fix double counting of IP addresses
2020-11-23 07:45:33 -08:00
Hristo Venev
c8c81be8af range_allocator: Test (lack of) double counting 2020-11-22 23:09:09 -08:00
Hristo Venev
ee581278bd cidrset: Add test for double counting 2020-11-22 23:09:09 -08:00
Hristo Venev
4d28391c24 Fix double counting of IP addresses
The range allocator in pkg/controller/nodeipam/ipam/range_allocator.go
may call Occupy() on the same range twice:

1. Just before subscribing to the NodeInformer
2. From a callback given to the NodeInformer soon after registration
2020-11-22 23:09:09 -08:00
Jordan Liggitt
e491c3bc70 Add GC unit tests
Adds unit tests covering the problematic scenarios identified
around conflicting data in child owner references

                      Before   After
package level         51%      68%
garbagecollector.go   60%      75%
graph_builder.go      50%      81%
graph.go              50%      68%

Added/improved coverage of key functions that had lacking unit test coverage:

* attemptToDeleteWorker
* attemptToDeleteItem
* processGraphChanges (added coverage of all added code)
2020-11-17 10:49:32 -05:00
Jordan Liggitt
603a0b016e Log cluster-scoped owners referencing namespaced owners, avoid retrying lookups forever
If a cluster-scoped dependent references a namespace-scoped owner,
this is an invalid relationship, and the lookup will never succeed in attemptToDelete.

Short-circuit requeueing in attemptToDelete and log.
2020-11-17 10:49:30 -05:00
Jordan Liggitt
221e4aa2c2 Queue non-matching children for deletion when a virtual node is marked as observed
When we observe valid coordinates for a previously virtual node,
if there are dependents that do not agree with those coordinates,
add them to the attemptToDelete queue.

This queue will check the dependent's ownerReferences using the coordinates specified by the dependent.
If all of the owners can be verified absent, the dependent will be deleted.
If some are still present, or if there are errors looking them up, the dependent will not be deleted.

If the verified owner is namespaced, and the dependent is not in the same namespace,
an event will be recorded for user visibility, since cross-namespace ownerReferences are not supported.
2020-11-17 10:49:27 -05:00
Jordan Liggitt
b655f22509 Handle virtual delete events when children don't agree on owner coordinates
If a virtual delete event is received for a node whose dependents disagree on the parent's coordinates:
1. propagate the delete to children that matched the verified absent coordinates
2. if the existing node is virtual, select a new set of coordinates from the remaining dependents
3. do not delete the parent node from the graph if the parent node is non-virtual,
   or if there are dependents that do not agree with the virtual delete event coordinates
2020-11-17 10:49:07 -05:00
Jordan Liggitt
b8d7ecf73b Make node removal conditional in processGraphChanges 2020-11-17 10:49:04 -05:00
Jordan Liggitt
ac8d419b4c Enqueue dependents for deletion when their ownerReference does not match observed parent coordinates
When adding a dependent to the graph, we ensure there is a node representing each owner reference,
and add the dependent to each parent node.

If the parent node already exists, and the dependent's ownerReference
coordinates disagree with the verified coordinates, add the dependent to the attemptToDelete queue.

This queue will check the dependent's ownerReferences using the coordinates specified by the dependent.
If all of the owners can be verified absent, the dependent will be deleted.
If some are still present, or if there are errors looking them up, the dependent will not be deleted.

If the parent node has been observed via informer event (so we know the coordinates are accurate),
and the verified owner is namespaced, and the dependent is not in the same namespace,
an event will be recorded for user visibility, since cross-namespace ownerReferences are not supported.
2020-11-17 10:47:39 -05:00
Jordan Liggitt
78317edb8b Short-circuit attemptToDelete loop for virtual nodes that are removed or observed
Virtual nodes are added to the attemptToDelete queue, and continue getting requeued
until they are successfully verified absent or are observed via informer.

In the meantime, if the real object associated with that UID is observed via informer,
or is observed to be deleted via informer, the graph node for that UID can be removed
or marked as observed. In that case, we should stop retrying to get the virtual node coordinates.
2020-11-17 10:46:00 -05:00
Jordan Liggitt
cae56bea0a Replace virtual node with observed node if identity differs
If the graph contains a virtual node (because some child object referenced it in an OwnerRef),
and a real informer event is observed for that uid at different coordinates,
we want to fix the coordinates of the node in the graph to match the actual coordinates.

The safe way to do this is to clone the node, replace the identity in the clone,
then replace the node with the clone.

Modifying the identity directly is not safe because it is accessed lock-free from many code paths.

Replacing the node in the graph from processGraphChanges is safe because it is the only graph writer.
2020-11-17 10:42:48 -05:00
Jordan Liggitt
cb7b9ed532 Refactor identityFromEvent 2020-11-17 10:42:48 -05:00
Jordan Liggitt
30eb6683e6 Avoid marking virtual nodes as observed when they haven't been
Virtual nodes can be added to the GC graph in order to represent objects
which have not been observed via an informer, but are referenced via ownerReferences.

These virtual nodes are requeued into attemptToDelete until they are observed via an informer,
or successfully verified absent via a live lookup. Previously, both of those code paths
called markObserved() to stop requeuing into attemptToDelete.

Because it is useful to know whether a particular node has been observed via
a real informer event, this commit does the following:

* adds a `virtual bool` attribute to graph events so we know which ones came from a real informer
* limits the markObserved() call to the code path where a real informer event is observed
* uses an alternative mechanism to stop requeueing into attemptToDelete when a virtual node is verified absent via a live lookup
2020-11-17 10:42:48 -05:00
Jordan Liggitt
445f20dbdb Switch GC absentOwnerCache to full reference
Before deleting an object based on absent owners, GC verifies absence of those owners with a live lookup.

The coordinates used to perform that live lookup are the ones specified in the ownerReference of the child.

In order to performantly delete multiple children from the same parent (e.g. 1000 pods from a replicaset),
a 404 response to a lookup is cached in absentOwnerCache.

Previously, the cache was a simple uid set. However, since children can disagree on the coordinates
that should be used to look up a given uid, the cache should record the exact coordinates verified absent.
This is a [apiVersion, kind, namespace, name, uid] tuple.
2020-11-17 10:42:48 -05:00
Jordan Liggitt
09bdf76b8a Plumb event recorder to garbage collector controller 2020-11-17 10:42:45 -05:00
Kubernetes Prow Robot
da75c26648
Merge pull request #95978 from roycaihw/storage-version/gc
Storage version garbage collector
2020-11-12 18:36:37 -08:00
Haowei Cai
1d5a8f8f24 fixup! add storage version garbage collector 2020-11-12 16:34:27 -08:00
Haowei Cai
f675dac440 generated 2020-11-12 16:25:22 -08:00
Haowei Cai
ee9ace14c2 add storage version garbage collector 2020-11-12 16:21:00 -08:00
Kubernetes Prow Robot
765d949bfc
Merge pull request #96440 from robscott/endpointslice-pre-ga
Adding NodeName to EndpointSlice API, deprecation updates
2020-11-12 16:03:13 -08:00
Kubernetes Prow Robot
798eb07720
Merge pull request #96443 from alaypatel07/cronjob-controller-2-follow-up
handle slow cronjob lister in cronjob controller v2 and improve memory footprint
2020-11-12 13:16:52 -08:00
Kubernetes Prow Robot
4b46d44e0c
Merge pull request #96327 from robscott/app-protocol-ga
Graduating AppProtocol to GA
2020-11-12 13:16:39 -08:00
Kubernetes Prow Robot
55856ed727
Merge pull request #93130 from zshihang/master
plumb service account token down to csi driver
2020-11-12 13:16:25 -08:00
Rob Scott
84e4b30a3e
Updates related to PR feedback
- Remove feature gate consideration from EndpointSlice validation
- Deprecate topology field, note that it will be removed in future
release
- Update kube-proxy to check for NodeName if feature gate is enabled
- Add comments indicating the feature gates that can be used to enable
alpha API fields
- Add comments explaining use of deprecated address type in tests
2020-11-12 12:30:50 -08:00
Kubernetes Prow Robot
e38b1b94f8
Merge pull request #96399 from andrewsykim/service-config
move service controller config to k8s.io/cloud-provider/controllers/service/config
2020-11-12 11:21:57 -08:00
Shihang Zhang
d2859cd89b plumb service account token down to csi driver 2020-11-12 09:26:43 -08:00
Rob Scott
d985438772
Updating EndpointSlice controllers to support NodeName field 2020-11-11 16:50:36 -08:00
Alay Patel
15089aab94 update bazel 2020-11-11 19:47:11 -05:00
Alay Patel
d6ca5b8d14 handle the case for slow cronjob lister, add unit tests 2020-11-11 18:48:57 -05:00
Alay Patel
41c82e69ed convert to stardard lister, use []*batchv1.Job instead of []batchv1.Job 2020-11-11 18:48:57 -05:00
Kubernetes Prow Robot
667d1c2c3f
Merge pull request #93370 from alaypatel07/add-new-cronjob-controller
Add cronjob controller v2
2020-11-11 15:42:50 -08:00
Kubernetes Prow Robot
423f8731ef
Merge pull request #95719 from tsmetana/add-pv_collector-provisioner-metric
PV Controller: Add plugin name and volume mode to PV metrics
2020-11-11 01:49:49 -08:00
Alay Patel
38bb53555e update violation_exceptions.list and make generated 2020-11-10 17:32:06 -05:00
Alay Patel
8d7dd4415e add cronjob_controllerv2.go 2020-11-10 17:32:06 -05:00
Andrew Sy Kim
b1e0decce1 move service controller config to k8s.io/cloud-provider/controllers/service/config
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2020-11-10 14:59:44 -05:00
Rob Scott
b044fadf66
Graduating AppProtocol to GA 2020-11-09 11:08:19 -08:00
Tim Hockin
819ff9b087
Use topology labels instead of old beta names (#96033)
* Rename const for topology.../zone

* Rename const for topology.../region

* Rename const for failure-domain.../zone

* Rename const for failure-domain.../region

* Restore old names for compat
2020-11-05 20:26:50 -08:00
Andrew Sy Kim
7cf19e5fb7 endpointslice API: rename 'accepting' condition to 'serving' condition
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2020-11-05 19:18:45 -05:00
Andrew Sy Kim
17cf1b4415 endpointslice controller: add test cases to TestSyncServiceFull for terminating endpoints
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2020-11-05 19:18:45 -05:00
Andrew Sy Kim
2947f5ce4f endpointslice controller: refactor TestSyncServiceFull to use test tables
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2020-11-05 19:18:45 -05:00
Andrew Sy Kim
1c603e90ef endpointslice controller: set new conditions 'accepting' and 'terminating'
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2020-11-05 19:18:45 -05:00
Kubernetes Prow Robot
2d6cd683bd
Merge pull request #95541 from cofyc/fix95538
volume binding: report UnschedulableAndUnresolvable status instead of an error when bound PVs not found
2020-11-05 15:16:51 -08:00
Shihang Zhang
2c378beb64 abort if namespace doesn't exist or terminating 2020-11-05 11:12:15 -08:00
Yecheng Fu
0961891a7a report UnschedulableAndUnresolvable status instead of an error when PVCs can't find bound
persistent volumes

This is an user error. We should't report an error.
2020-11-05 10:28:40 +08:00