Update the maximum sync backoff value to 1000s to match the sequence of
delays expected by the endpointslice controller when syncing Services:
Before this change the sequence was:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 100s
Now it is:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s, 256s, 512s, 1000s
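For reference, this capped doubling is the standard per-item exponential backoff; a minimal sketch of the behavior using client-go's workqueue rate limiter (the 1s base and 1000s cap are taken from the text above, the controller's actual wiring may differ):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Per-item exponential backoff capped at a maximum delay.
	limiter := workqueue.NewItemExponentialFailureRateLimiter(1*time.Second, 1000*time.Second)
	for i := 0; i < 11; i++ {
		// Each failed sync of the same key doubles the delay until the cap.
		fmt.Println(limiter.When("namespace/service"))
	}
	// The returned delays follow the doubling sequence above and then stay
	// at the 1000s cap.
}
```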
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Fixes instances of #98213 (to ultimately complete #98213, linting is
required).
This commit fixes a few instances of a common mistake made when writing
parallel subtests or Ginkgo tests (basically any test in which the test
closure is dynamically created in a loop and the loop doesn't wait for
the test closure to complete).
I'm developing a very specific linter that detects this kind of mistake,
and these are the only violations it found in this repo (it's not
airtight, so there may be more).
In the case of Ginkgo tests, without this fix only the last entry
produced by the loop is actually tested. Parallel tests suffer from
essentially the same problem, although, if I understand correctly, the
outcome there also depends on execution speed.
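For illustration, a minimal, hypothetical table-driven test showing the mistake and the conventional fix (none of these names come from this repo):

```go
package example

import "testing"

func TestDouble(t *testing.T) {
	cases := []struct {
		name string
		in   int
		want int
	}{
		{"one", 1, 2},
		{"two", 2, 4},
	}
	for _, tc := range cases {
		tc := tc // the fix: rebind so each parallel closure captures its own copy
		t.Run(tc.name, func(t *testing.T) {
			t.Parallel()
			// Without the rebinding above, this closure reads the shared loop
			// variable, which may already hold the last element of cases by
			// the time the parallel subtest actually runs.
			if got := tc.in * 2; got != tc.want {
				t.Errorf("got %d, want %d", got, tc.want)
			}
		})
	}
}
```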
I'm waiting for the CI to confirm that the tests still pass after this
fix: since this is likely the first time some of these test cases are
actually executed, they may be buggy or may be testing code that is buggy.
Another instance of this is in `test/e2e/storage/csi_mock_volume.go`; it
is still failing, so it has been left out of this commit and will be
addressed in a separate one.
To be able to implement controllers that dynamically decide which
resources to watch, it must be possible to get rid of dedicated watches
and event handlers again. This requires the ability to remove event
handlers from SharedIndexInformers.
Stopping an informer is not sufficient, because there might
be multiple controllers in a controller manager that independently
decide which resources to watch.
Unfortunately, the ResourceEventHandler interface encourages the use of
value objects as handlers (like the ResourceEventHandlerFuncs struct,
which uses value receivers to implement the interface).
Go does not support comparing function values, so such structs cannot be
compared either. To be able to remove all kinds of handlers, and to solve
the problem of multiple registrations of the same handler, a registration
handle is introduced.
It is returned when adding a handler and can later be used to remove
the registration again. This handle directly stores the created
listener to simplify the deletion.
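As a sketch of how a consumer would use such a handle (the exact method names and signatures in client-go may differ from this illustration):

```go
package example

import "k8s.io/client-go/tools/cache"

// watchTemporarily registers a handler on a shared informer and returns a
// function that removes only this registration again. This is a sketch of
// the handle-based API described above.
func watchTemporarily(informer cache.SharedIndexInformer, onAdd func(obj interface{})) (stop func() error, err error) {
	handle, err := informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: onAdd,
	})
	if err != nil {
		return nil, err
	}
	// The handle keeps a reference to the listener created for this
	// registration, so removal does not need to compare handler values.
	return func() error { return informer.RemoveEventHandler(handle) }, nil
}
```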
The Priority is determined as follows:
P0: ClusterCIDR with higher number of matching labels has highest
priority.
P1: ClusterCIDR having cidrSet with fewer allocatable Pod CIDRs has
higher priority.
P2: ClusterCIDR with a PerNodeMaskSize having fewer IPs has higher
priority.
P3: ClusterCIDR having label with lower alphanumeric value has higher
priority.
P4: ClusterCIDR with a cidrSet having a smaller IP address value has
higher priority.
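A sketch of this ordering as a comparison function; all field names here are hypothetical, chosen only to mirror the rules above (the real ClusterCIDR allocator uses its own types):

```go
package example

import "net/netip"

// candidate mirrors the properties used by the priority rules above.
type candidate struct {
	matchingLabels   int        // P0: node-selector labels matching the node
	allocatableCIDRs int        // P1: allocatable Pod CIDRs left in the cidrSet
	perNodeIPs       int        // P2: IPs implied by PerNodeMaskSize
	labelValue       string     // P3: label value compared alphanumerically
	firstIP          netip.Addr // P4: smallest IP address in the cidrSet
}

// less reports whether a has higher priority than b.
func less(a, b candidate) bool {
	if a.matchingLabels != b.matchingLabels {
		return a.matchingLabels > b.matchingLabels // P0: more matching labels
	}
	if a.allocatableCIDRs != b.allocatableCIDRs {
		return a.allocatableCIDRs < b.allocatableCIDRs // P1: fewer allocatable Pod CIDRs
	}
	if a.perNodeIPs != b.perNodeIPs {
		return a.perNodeIPs < b.perNodeIPs // P2: fewer IPs per node
	}
	if a.labelValue != b.labelValue {
		return a.labelValue < b.labelValue // P3: lower alphanumeric label value
	}
	return a.firstIP.Less(b.firstIP) // P4: smaller IP address value
}
```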
Add a new cidrset named `multicidrset` which extends the current
cidrset mechanism to track allocatable Pod and Service CIDRs.
multicidrset stores the info about allocated CIDRs in a map, as opposed
to the current cidrset implementation, where it is stored in a bitmap.
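A minimal sketch of the map-based bookkeeping (illustrative names only, not the real multicidrset types):

```go
package example

import "net/netip"

// multiCIDRSet tracks allocated CIDRs in a map keyed by prefix, rather than
// in a bitmap indexed by the CIDR's position within a parent range.
type multiCIDRSet struct {
	allocated map[netip.Prefix]struct{}
}

func (s *multiCIDRSet) occupy(p netip.Prefix) {
	if s.allocated == nil {
		s.allocated = make(map[netip.Prefix]struct{})
	}
	s.allocated[p] = struct{}{}
}

func (s *multiCIDRSet) release(p netip.Prefix) {
	delete(s.allocated, p)
}

func (s *multiCIDRSet) isAllocated(p netip.Prefix) bool {
	_, ok := s.allocated[p]
	return ok
}
```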
Add a new call to VolumePlugin interface and change all its
implementations.
Kubelet's VolumeManager will be interested in whether a volume supports
mounting with -o context=XYZ or not, so it can handle SetUp() /
MountDevice() accordingly.
In future commits we will need this to set the user/group of supported
volumes of KEP 127 - Phase 1.
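A rough sketch of the shape of such an interface addition; the method name and signature here are illustrative, not necessarily the ones added:

```go
package example

// Spec stands in for the existing volume.Spec type; this is only an
// illustrative sketch, not the real pkg/volume code.
type Spec struct{}

// VolumePlugin is shown with only the new call described above; the real
// interface has many more methods and the actual name/signature may differ.
type VolumePlugin interface {
	// SupportsSELinuxContextMount reports whether the volume described by
	// spec can be mounted with "-o context=XYZ", so the kubelet's
	// VolumeManager can handle SetUp() / MountDevice() accordingly.
	SupportsSELinuxContextMount(spec *Spec) (bool, error)
}
```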
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
- PreemptionByKubeScheduler (Pod preempted by kube-scheduler)
- DeletionByTaintManager (Pod deleted by taint manager due to NoExecute taint)
- EvictionByEvictionAPI (Pod evicted by Eviction API)
- DeletionByPodGC (an orphaned Pod deleted by PodGC)
DaemonSetsController adds a "nodeName" index to PodIndexer, which is
redundant with the "spec.nodeName" index of NodeLifecycleController.
However, DaemonSetsController hasn't been using this index since #86730.
This patch removes the redundant and unused index to reduce memory and
CPU spent on it.
Signed-off-by: Quan Tian <qtian@vmware.com>
The field replicaChange in timestampedScaleEvent was wrongly described
as either positive or negative depending on the scale direction. In
fact, the change is stored as an unsigned value: it is positive or 0
even for downscales.
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
The evictorLock only protects zonePodEvictor and zoneNoExecuteTainter.
processTaintBaseEviction showed indications of increased lock contention
among goroutines (see issue 110341 for more details).
The refactor ensures that all codepaths in that function which used to
hold the evictorLock while making API calls now make those API calls
outside the lock; the lock is held only while accessing zonePodEvictor
and/or zoneNoExecuteTainter.
The same refactor was applied in two other places, the doEvictionPass and
doNoExecuteTaintingPass functions, which made multiple API calls under
the evictorLock.
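The resulting pattern, sketched with hypothetical names: snapshot the protected state under the lock, release it, then make the API calls without holding the lock.

```go
package example

import "sync"

type controller struct {
	evictorLock sync.Mutex
	queue       []string // stands in for zonePodEvictor / zoneNoExecuteTainter
}

func (c *controller) process(apiCall func(item string) error) error {
	// Hold the lock only long enough to read/modify the protected state.
	c.evictorLock.Lock()
	items := append([]string(nil), c.queue...)
	c.queue = nil
	c.evictorLock.Unlock()

	// API calls happen outside the critical section, so other goroutines
	// are not blocked on the lock while requests are in flight.
	for _, item := range items {
		if err := apiCall(item); err != nil {
			return err
		}
	}
	return nil
}
```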
Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
Fix a TODO by plumbing an update filter from above into the resource
quota monitor code that handles update events for quota-able objects,
instead of hard-coding the logic in the resource quota monitor.
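A sketch of the kind of predicate being plumbed in; the names here are illustrative, not the exact ones used by the monitor:

```go
package example

import "k8s.io/apimachinery/pkg/runtime/schema"

// UpdateFilter lets the caller decide which update events can change quota
// usage, instead of the monitor hard-coding that logic.
type UpdateFilter func(resource schema.GroupVersionResource, oldObj, newObj interface{}) bool

// shouldReplenish shows how a monitor might consult the filter when it sees
// an update event for a quota-able object.
func shouldReplenish(filter UpdateFilter, resource schema.GroupVersionResource, oldObj, newObj interface{}) bool {
	if filter == nil {
		return false
	}
	return filter(resource, oldObj, newObj)
}
```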
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
The 6 minute force-detach timeout should be used only for nodes that are
not healthy.
When a CSI driver is being upgraded or is simply slow, NodeUnstage
can take more than 6 minutes. In that case, the Pod is already deleted from
the API server and thus the A/D controller will force-detach a mounted
volume, possibly corrupting the volume and breaking CSI - a CSI driver
expects NodeUnstage to succeed before Kubernetes calls ControllerUnpublish.
When a Pod references a Node that doesn't exist in the local
informer cache, the previous behavior was to return an error, so the
Service would be retried later, and to stop processing.
However, this could leave a Slice stuck on a missing node: it could
neither reflect other changes nor be created.
It also didn't respect the publishNotReadyAddresses option on Services,
which considers it acceptable to publish Pod addresses that are known
to not be ready.
The new behavior keeps retrying the problematic Service, but it
keeps processing updates, reflecting the current state in the
EndpointSlice. If publishNotReadyAddresses is set, a missing
node on a Pod is not treated as an error.
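A sketch of that decision with hypothetical helper names (the real reconciler code is organized differently):

```go
package example

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// missingNodeError reports whether a missing Node should surface as an error
// for this Service. Either way, the caller keeps processing the remaining
// endpoints so the EndpointSlice reflects the current state.
func missingNodeError(pod *v1.Pod, svc *v1.Service) error {
	if svc.Spec.PublishNotReadyAddresses {
		// Publishing not-ready addresses is allowed, so the missing node is
		// tolerated and the endpoint is still written to the EndpointSlice.
		return nil
	}
	// The Service is requeued for retry, but processing continues.
	return fmt.Errorf("node %q for pod %s/%s not found", pod.Spec.NodeName, pod.Namespace, pod.Name)
}
```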
There is always a placeholder slice.
The ServicePortCache logic assumed one EndpointSlice per Endpoint, but
if there are multiple empty Endpoints we just use one placeholder slice,
not multiple placeholder slices.
Fixes Issue 108231 by checking `slicesToDelete` in the EndpointSlice
reconciler for a pre-existing placeholder slice.
Also adds a helper function for comparing the slices.
Overwrite a pre-existing failed condition with the new one only in case
its status is False or Unknown.
If the status of the pre-existing condition is True, we ignore the new
condition. If there is no pre-existing failed condition, then append
the new failed condition as before.
Also, make the condition comparisons less hacky by ignoring timestamp fields
in tests.
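One way to do such a comparison, sketched with go-cmp and Job conditions purely for illustration (the tests in question may use a different helper or condition type):

```go
package example

import (
	"testing"

	"github.com/google/go-cmp/cmp"
	"github.com/google/go-cmp/cmp/cmpopts"
	batchv1 "k8s.io/api/batch/v1"
)

// assertConditionsEqual compares conditions while ignoring their timestamp
// fields, so tests don't have to fake clock values.
func assertConditionsEqual(t *testing.T, want, got []batchv1.JobCondition) {
	t.Helper()
	ignoreTimestamps := cmpopts.IgnoreFields(batchv1.JobCondition{}, "LastProbeTime", "LastTransitionTime")
	if diff := cmp.Diff(want, got, ignoreTimestamps); diff != "" {
		t.Errorf("unexpected conditions (-want +got):\n%s", diff)
	}
}
```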
Terminal pods, whose phase is Failed or Succeeded, are guaranteed
to never regress and to be stopped, so their IPs should never
be published in the Endpoints.
* feat: Provide previous replica count for deployment/replica set scale up/down event
Signed-off-by: GitHub <noreply@github.com>
* change format of event
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
In some rare race conditions, the job controller might create new pods after the job is declared finished.
Change-Id: I8a00429c8845463259cd7f82bb3c241d0011583c
When calculating the scale-up/scale-down limit, the number of replicas
at the start of the scaling policy period is now calculated correctly by
taking into account the number of scaled-up and scaled-down replicas.
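In other words (a sketch with illustrative names, not the HPA controller's own helpers):

```go
package example

// periodStartReplicas recovers the replica count at the start of a scaling
// policy period from the current count and the changes recorded during the
// period.
func periodStartReplicas(currentReplicas, addedInPeriod, removedInPeriod int32) int32 {
	return currentReplicas - addedInPeriod + removedInPeriod
}
```

For example, with 10 current replicas, 4 added and 1 removed during the period, the period started at 10 - 4 + 1 = 7 replicas, which is the base the policy's limit is applied to.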
Signed-off-by: Olivier Michaelis <38879457+oliviermichaelis@users.noreply.github.com>
Remove the comment "As of v1.22, this field is beta and is controlled
via the CSRDuration feature gate" from the expirationSeconds field's
godoc.
Mark the "CSRDuration" feature gate as GA in 1.24, lock its value to
"true", and remove the various logic which handled when the gate was
"false".
Update conformance test to check that the CertificateSigningRequest's
Spec.ExpirationSeconds field is stored, but do not check if the field
is honored since this functionality is optional.
This patch aims to simplify decoupling "pkg/scheduler/framework/plugins"
from internal "k8s.io/kubernetes" packages. More details in
issue #89930 and PR #102953.
Some helpers from "k8s.io/kubernetes/pkg/controller/volume/persistentvolume"
package moved to "k8s.io/component-helpers/storage/volume" package:
- IsDelayBindingMode
- GetBindVolumeToClaim
- IsVolumeBoundToClaim
- FindMatchingVolume
- CheckVolumeModeMismatches
- CheckAccessModes
- GetVolumeNodeAffinity
Also "CheckNodeAffinity" from "k8s.io/kubernetes/pkg/volume/util"
package moved to "k8s.io/component-helpers/storage/volume" package
to prevent diamond dependency conflict.
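For callers, the change is essentially an import swap; a sketch referencing one of the moved helpers from its new home:

```go
package example

import (
	// Before: pvutil "k8s.io/kubernetes/pkg/controller/volume/persistentvolume"
	// After:
	volumehelpers "k8s.io/component-helpers/storage/volume"
)

// e.g. IsVolumeBoundToClaim, FindMatchingVolume, ... are now imported from
// component-helpers.
var _ = volumehelpers.IsVolumeBoundToClaim
```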
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
- Lock feature gate to true and schedule for deletion in 1.26
- Remove checks on feature gate
- Graduate E2E test to Conformance
Change-Id: I6814819d318edaed5c86dae4055f4b050a4d39fd
All controllers should use a context for signalling termination of communication with the API server. Once kcm cancels the context, all the cert controllers started via kcm should cancel their in-flight API server requests instead of hanging around.
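The pattern, sketched with a hypothetical helper (the function name is illustrative; the client-go call itself is standard):

```go
package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listCSRs threads the context the controller was started with (cancelled by
// kube-controller-manager on shutdown) into the API call, so the in-flight
// request is cancelled instead of hanging.
func listCSRs(ctx context.Context, client kubernetes.Interface) error {
	_, err := client.CertificatesV1().CertificateSigningRequests().List(ctx, metav1.ListOptions{})
	return err
}
```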