Update the maximum sync backoff value to 1000s to match the sequence of
delays expected by the endpointslice controller when syncing Services:
Before this change the sequence was:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 100s
Now it is:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s, 256s, 512s, 1000s
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Fixes instances of #98213 (to ultimately complete #98213 linting is
required).
This commit fixes a few instances of a common mistake done when writing
parallel subtests or Ginkgo tests (basically any test in which the test
closure is dynamically created in a loop and the loop doesn't wait for
the test closure to complete).
I'm developing a very specific linter that detects this king of mistake
and these are the only violations of it it found in this repo (it's not
airtight so there may be more).
In the case of Ginkgo tests, without this fix, only the last entry in
the loop iteratee is actually tested. In the case of Parallel tests I
think it's the same problem but maybe a bit different, iiuc it depends
on the execution speed.
Waiting for the CI to confirm the tests are still passing, even after
this fix - since it's likely it's the first time those test cases are
executed - they may be buggy or testing code that is buggy.
Another instance of this is in `test/e2e/storage/csi_mock_volume.go` and
is still failing so it has been left out of this commit and will be
addressed in a separate one
To be able to implement controllers that are dynamically deciding
on which resources to watch, it is required to get rid of
dedicated watches and event handlers again. This requires the
possibility to remove event handlers from SharedIndexInformers again.
Stopping an informer is not sufficient, because there might
be multiple controllers in a controller manager that independently
decide which resources to watch.
Unfortunately the ResourceEventHandler interface encourages to use
value objects for handlers (like the ResourceEventHandlerFuncs
struct, that uses value receivers to implement the interface).
Go does not support comparison of function pointers and therefore
the comparison of such structs is not possible, also. To be able
to remove all kinds of handlers and to solve the problem of
multi-registrations of handlers a registration handle is introduced.
It is returned when adding a handler and can later be used to remove
the registration again. This handle directly stores the created
listener to simplify the deletion.
The Priority is determined as follows:
P0: ClusterCIDR with higher number of matching labels has highest
priority.
P1: ClusterCIDR having cidrSet with fewer allocatable Pod CIDRs has
higher priority.
P2: ClusterCIDR with a PerNodeMaskSize having fewer IPs has higher
priority.
P3: ClusterCIDR having label with lower alphanumeric value has higher
priority.
P4: ClusterCIDR with a cidrSet having a smaller IP address value has
higher priority.
Add a new cidrset named `multicidrset` which extends the current
cidrset mechanism to track allocatable Pod and Service CIDRs.
multicidrset stores the info about allocated CIDRs in a Map as opposed
to the current cidrset implementation where it is stored in a bitmap.
Add a new call to VolumePlugin interface and change all its
implementations.
Kubelet's VolumeManager will be interested whether a volume supports
mounting with -o conext=XYZ or not to hanle SetUp() / MountDevice()
accordingly.
In future commits we will need this to set the user/group of supported
volumes of KEP 127 - Phase 1.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
- PreemptionByKubeScheduler (Pod preempted by kube-scheduler)
- DeletionByTaintManager (Pod deleted by taint manager due to NoExecute taint)
- EvictionByEvictionAPI (Pod evicted by Eviction API)
- DeletionByPodGC (an orphaned Pod deleted by PodGC)PreemptedByScheduler (Pod preempted by kube-scheduler)
DaemonSetsController adds a "nodeName" index to PodIndexer, which is
redundant with the "spec.nodeName" index of NodeLifecycleController.
However, DaemonSetsController hasn't been using this index since #86730.
This patch removes the redundant and unused index to reduce memory and
CPU spent on it.
Signed-off-by: Quan Tian <qtian@vmware.com>
The field replicaChange in timestampedScaleEvent was wrongly described
as either positive or negative depending on the scale direction. In
fact the change is set as unsigned, positive or 0 even for downscales.
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
The evictorLock only protects zonePodEvictor and zoneNoExecuteTainter.
processTaintBaseEviction showed indications of increased lock contention
among goroutines (see issue 110341 for more details).
The refactor done is to ensure that all codepaths in that function that
hold the evictorLock AND make API calls under the lock, are now making
API calls outside the lock and the lock is held only for accessing either
zonePodEvictor or zoneNoExecuteTainter or both.
Two other places where the refactor was done is the doEvictionPass and
doNoExecuteTaintingPass functions which make multiple API calls under
the evictorLock.
Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
Fix a TODO to plumb an update filter from above in the resource quota
monitor code that was handling update events for quota-able objects,
instead of hard-coding the logic in the resource quota monitor.
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
6 minute force-deatch timeout should be used only for nodes that are not
healthy.
In case a CSI driver is being upgraded or it's simply slow, NodeUnstage
can take more than 6 minutes. In that case, Pod is already deleted from the
API server and thus A/D controller will force-detach a mounted volume,
possibly corrupting the volume and breaking CSI - a CSI driver expects
NodeUnstage to succeed before Kubernetes can call ControllerUnpublish.
When a Pod is referencing a Node that doesn't exist on the local
informer cache, the current behavior was to return an error to
retry later and stop processing.
However, this can cause scenarios that a missing node leaves a
Slice stuck, it can no reflect other changes, or be created.
Also, this doesn't respect the publishNotReadyAddresses options
on Services, that considers ok to publish pod Addresses that are
known to not be ready.
The new behavior keeps retrying the problematic Service, but it
keeps processing the updates, reflacting current state on the
EndpointSlice. If the publishNotReadyAddresses is set, a missing
node on a Pod is not treated as an error.
There is always a placeholder slice.
The ServicePortCache logic was considering always one endpointSlice
per Endpoint, but if there are multiple empty Endpoints, we just
use one placeholder slice, not multiple placeholder slices.
Fixes Issue 108231 by checking `slicesToDelete` in the EndpointSlice
reconciler for a pre-existing placeholder slice.
Also adds a helper function for comparing the slices.
in case its status is False or Unknown.
In case the status of the pre-existing condition is true we ignore the new
condition. If there is no pre-existing failed condition, then append
the new failed condition as before.
Also, make the condition comparisons less hacky by ignoring timestamp fields
in tests.
Terminal pods, whose phase its Failed or Succeeded, are guaranteed
to never regress and to be stopped, so their IPs never should
be published on the Endpoints.
* feat: Provide previous replica count for deployment/replica set scale up/down event
Signed-off-by: GitHub <noreply@github.com>
* change format of event
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
In some rare race conditions, the job controller might create new pods after the job is declared finished.
Change-Id: I8a00429c8845463259cd7f82bb3c241d0011583c
When calculating the scale-up/scale-down limit, the number of replicas
at the start of the scaling policy period is calculated correctly by
taken into account the number of scaled-up and scaled-down replicas.
Signed-off-by: Olivier Michaelis <38879457+oliviermichaelis@users.noreply.github.com>
Remove the comment "As of v1.22, this field is beta and is controlled
via the CSRDuration feature gate" from the expirationSeconds field's
godoc.
Mark the "CSRDuration" feature gate as GA in 1.24, lock its value to
"true", and remove the various logic which handled when the gate was
"false".
Update conformance test to check that the CertificateSigningRequest's
Spec.ExpirationSeconds field is stored, but do not check if the field
is honored since this functionality is optional.
This patch aims to simplify decoupling "pkg/scheduler/framework/plugins"
from internal "k8s.io/kubernetes" packages. More described in
issue #89930 and PR #102953.
Some helpers from "k8s.io/kubernetes/pkg/controller/volume/persistentvolume"
package moved to "k8s.io/component-helpers/storage/volume" package:
- IsDelayBindingMode
- GetBindVolumeToClaim
- IsVolumeBoundToClaim
- FindMatchingVolume
- CheckVolumeModeMismatches
- CheckAccessModes
- GetVolumeNodeAffinity
Also "CheckNodeAffinity" from "k8s.io/kubernetes/pkg/volume/util"
package moved to "k8s.io/component-helpers/storage/volume" package
to prevent diamond dependency conflict.
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
- Lock feature gate to true and schedule for deletion in 1.26
- Remove checks on feature gate
- Graduate E2E test to Conformance
Change-Id: I6814819d318edaed5c86dae4055f4b050a4d39fd
All the controllers should use context for signalling termination of communication with API server. Once kcm cancels context all the cert controllers which are started via kcm should cancel the APIServer request in flight instead of hanging around.
The field is not used anywhere and its value may be stale as Endpoints
and EndpointSlice won't be updated if there is only Pod ResourceVersion
change..
- actual_state_of_world_test.go: test the new method GetVolumesToReportAttachedForNode
for an existing node and a non-existing node
- node_status_updater_test.go: test UpdateNodeStatuses and UpdateNodeStatuses in nominal
case with 2 nodes getting one volume each. Test UpdateNodeStatuses with the first call
to node.patch failing but the following one succeeding
- add comment in node_status_updater.go
- fix log line in reconciler.go
- rename variable in actual_state_of_world.go
The UpdateNodeStatuses code stops too early in case there is
an error when calling updateNodeStatus. It will return immediately
which means any remaining node won't have its update status put back
to true.
Looking at the call sites for UpdateNodeStatuses, it appears this is
not the only issue. If the lister call fails with anything but a Not Found
error, it's silently ignored which is wrong in the detach path.
Also the reconciler detach path calls UpdateNodeStatuses but the real intent
is to only update the node currently processed in the loop and not proceed
with the detach call if there is an error updating that specifi node volumesAttached
property. With the current implementation, it will not proceed if there is
an error updating another node (which is not completely bad but not ideal) and
worse it will proceed if there is a lister error on that node which means the
node volumesAttached property won't have been updated.
To fix those issues, introduce the following changes:
- [node_status_updater] introduce UpdateNodeStatusForNode which does what
UpdateNodeStatuses does but only for the provided node
- [node_status_updater] if the node lister call fails for anything but a Not
Found error, we will return an error, not ignore it
- [node_status_updater] if the update of a node volumesAttached properties fails
we continue processing the other nodes
- [actual_state_of_world] introduce GetVolumesToReportAttachedForNode which
does what GetVolumesToReportAttached but for the node whose name is provided
it returns a bool which indicates if the node in question needs an update as
well as the volumesAttached list. It is used by UpdateNodeStatusForNode
- [actual_state_of_world] use write lock in updateNodeStatusUpdateNeeded, we're
modifying the map content
- [reconciler] use UpdateNodeStatusForNode in the detach loop
When comparing EndpointSubsets and Endpoints, we ignore the difference
in ResourceVersion of Pod to avoid unnecessary updates caused by Pod
updates that we don't care, e.g. annotation update.
Otherwise periodic Service resync would intensively update Endpoints or
EndpointSlice whose Pods have irrelevant change between two resyncs,
leading to delay in processing newly created Services. In a scale
cluster with thousands of such Endpoints, we observed 2 minutes of
delay when the resync happens.
Remove `tolerate-unready-endpoints` annotation in Service deprecated
from 1.11, use `Service.spec.publishNotReadyAddresses` instead.
Signed-off-by: He Xiaoxi <tossmilestone@gmail.com>
In the following code pattern, the log message will get logged with v=0 in JSON
output although conceptually it has a higher verbosity:
if klog.V(5).Enabled() {
klog.Info("hello world")
}
Having the actual verbosity in the JSON output is relevant, for example for
filtering out only the important info messages. The solution is to use
klog.V(5).Info or something similar.
Whether the outer if is necessary at all depends on how complex the parameters
are. The return value of klog.V can be captured in a variable and be used
multiple times to avoid the overhead for that function call and to avoid
repeating the verbosity level.
ServerResources function was deprecated and instead ServerGroupsAndResources
function is suggested.
This PR removes ServerResources function and move every place to use ServerGroupsAndResources.
Test 5-7 tries to delete a PVC at the very same time when it detects that
the PV controller started processing the PVC. The controller then sometimes
can't update the PVC and generate an event for it that the test expects.
From PV controller logs (not shown in CI):
> I1221 14:36:34.548160 104481 pv_controller.go:815] updating PersistentVolumeClaim[default/claim5-7] status: set phase Lost failed: cannot update claim claim5-7: claim not found
Typical error in CI:
> FAIL: TestControllerSync (83.22s)
> framework_test.go:202: Event "Warning ClaimLost" not emitted
Therefore wait for the PVC to be fully processed before deleting the PVC to
avoid races.
Add fake Pod and Node watchers to the tests. It only reduces test noise:
Failed to watch *v1.Pod: unhandled watch: testing.WatchActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"watch", Resource:schema.GroupVersionResource{Group:"", Version:"v1", Resource:"pods"}, Subresource:""}, WatchRestrictions:testing.WatchRestrictions{Labels:labels.internalSelector(nil), Fields:fields.andTerm{}, ResourceVersion:""}}
CRON_TZ variable slipped in during upgrading github.com/robfig/cron
library. It allows setting a time zone which is a long requested
feature but one that is not officially supported. This adds warning
event since users should not rely on unsupported features.