Once the feature gate is locked, the tests that run with this feature
disabled will crash. So we should remove them together with locking the
feature.
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
The feature gate gets locked to "true", with the goal of removing it in two
releases.
All code can now assume that the feature is enabled. Tests for "feature
disabled" are no longer needed and get removed.
Some code wasn't using the new helper functions yet. That gets changed while
touching those lines.
The name concatenation and ownership check were originally considered small
enough not to warrant dedicated functions, but the intent of the code is more
readable with them.
There was also a missing owner check in the attach controller.
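A sketch of what such helpers could look like, assuming the pod-owned-claim
pattern that the owner check implies; `claimName` and `claimIsForPod` are
illustrative names, not the actual ones in the tree:

```go
package helpers

import (
	corev1 "k8s.io/api/core/v1"
)

// claimName wraps the name concatenation: the claim created for a pod's
// volume is named after both, so call sites read as intent.
func claimName(pod *corev1.Pod, volumeName string) string {
	return pod.Name + "-" + volumeName
}

// claimIsForPod wraps the ownership check: the claim must be controlled by
// the pod it is named after before the controller acts on it.
func claimIsForPod(pod *corev1.Pod, pvc *corev1.PersistentVolumeClaim) bool {
	for _, ref := range pvc.OwnerReferences {
		if ref.Controller != nil && *ref.Controller && ref.UID == pod.UID {
			return true
		}
	}
	return false
}
```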
During volume detach, the following might happen in the reconciler:
1. Pod is being deleted.
2. The volume is removed from reportedAsAttached, so the node status updater
   will update the volumesAttached list.
3. Detach fails due to some issue.
4. The volume is added back to reportedAsAttached.
5. The reconciler loops over the volume again and removes it from
   reportedAsAttached.
6. Detach is not triggered because of exponential backoff; the detach call
   fails with an exponential backoff error.
7. Another pod is added which uses the same volume on the same node.
8. The reconciler loops again and will NOT try to trigger detach anymore.
At this point, the volume is still attached and in the actual state of the
world, but the volumesAttached list in the node status no longer has this
volume, which will block volume mount from kubelet.
The first-round fix was to add the volume back into the list of volumes to
be reported as attached at step 6, when the detach call failed with an
exponential backoff error. However, this might cause performance issues if
detach keeps failing for a while: during that time, the volume would keep
being removed from and added back to the node status, causing a surge of
API calls.
So we changed the logic to first check whether the operation is safe to
retry, meaning there is no pending operation and we are not within the
exponential backoff period, before calling detach. This way we avoid
repeatedly removing and re-adding the volume in the node status, as
sketched below.
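A self-contained sketch of that decision, with illustrative names; the real
reconciler consults its pending-operations tracker, but the shape of the
check is the same:

```go
package main

import (
	"fmt"
	"time"
)

// detachTracker remembers, per volume/node pair, when the next detach
// attempt is allowed, growing the wait exponentially on every failure.
type detachTracker struct {
	nextAttempt map[string]time.Time
	backoff     map[string]time.Duration
}

func newDetachTracker() *detachTracker {
	return &detachTracker{
		nextAttempt: map[string]time.Time{},
		backoff:     map[string]time.Duration{},
	}
}

// safeToRetry reports whether we are outside the backoff window. Only when
// it returns true should the reconciler remove the volume from
// reportedAsAttached and call detach; otherwise the node status is left
// untouched and kubelet can still mount the volume.
func (t *detachTracker) safeToRetry(key string) bool {
	return time.Now().After(t.nextAttempt[key])
}

// recordFailure doubles the wait (with a cap) after a failed detach.
func (t *detachTracker) recordFailure(key string) {
	d := t.backoff[key]
	switch {
	case d == 0:
		d = 500 * time.Millisecond
	case d < 2*time.Minute:
		d *= 2
	}
	t.backoff[key] = d
	t.nextAttempt[key] = time.Now().Add(d)
}

func main() {
	t := newDetachTracker()
	key := "pd-1/node-a"
	fmt.Println(t.safeToRetry(key)) // true: first attempt may proceed
	t.recordFailure(key)
	fmt.Println(t.safeToRetry(key)) // false: inside the backoff window
}
```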
Change-Id: I5d4e760c880d72937d34b9d3e904ecad125f802e
Add the UIDs of Pods for which we are removing finalizers to an in-memory cache.
The controller removes UIDs from the cache as Pod updates or deletes come in.
This avoids double counting finished Pods when Pod updates arrive after Job status updates.
https://github.com/kubernetes/kubernetes/issues/105200
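A minimal sketch of such a cache, with illustrative names; the real
controller keys by Pod UID and wires the add/forget calls into the sync loop
and the event handlers:

```go
package main

import (
	"fmt"
	"sync"
)

// uidCache holds the UIDs of Pods whose finalizer removal has been sent.
// Event handlers delete the UID again when the resulting Pod update or
// delete arrives, so a stale update cannot be counted a second time.
type uidCache struct {
	mu   sync.Mutex
	uids map[string]struct{} // keyed by Pod UID
}

func newUIDCache() *uidCache {
	return &uidCache{uids: make(map[string]struct{})}
}

func (c *uidCache) Add(uid string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.uids[uid] = struct{}{}
}

// Forget removes the UID and reports whether it was present; "true" tells
// the caller this Pod was already accounted for in the Job status.
func (c *uidCache) Forget(uid string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	_, ok := c.uids[uid]
	delete(c.uids, uid)
	return ok
}

func main() {
	c := newUIDCache()
	c.Add("pod-uid-1")
	fmt.Println(c.Forget("pod-uid-1")) // true: skip double counting
	fmt.Println(c.Forget("pod-uid-1")) // false: not tracked
}
```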
The bug could result in the EndpointSlice controller unnecessarily updating
EndpointSlices associated with a Service that had Topology Aware Hints enabled.
Doing a GET right before retrying has 2 problems:
- It can mask conflicts.
- It adds an additional delay.
As for retries, we are better off going through the sync backoff.
In the case of a conflict, we know that there was a Job update that will trigger another sync, so there is no need to do a rate-limited requeue.
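A sketch of this retry policy, assuming client-go's workqueue and the
apierrors helpers; `handleSyncResult` is an illustrative name:

```go
package jobcontroller

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/util/workqueue"
)

// handleSyncResult decides what to do with a Job key after a sync attempt:
// conflicts are simply dropped, because the conflicting Job update queues
// another sync by itself; all other errors take the normal rate-limited
// sync backoff instead of a GET-and-retry.
func handleSyncResult(queue workqueue.RateLimitingInterface, key string, err error) {
	switch {
	case err == nil:
		queue.Forget(key)
	case apierrors.IsConflict(err):
		queue.Forget(key) // another sync is already on its way
	default:
		queue.AddRateLimited(key)
	}
}
```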
The `root_ca_cert_publisher_sync_duration_seconds` metric tracks the sync
duration in the root CA cert publisher per code and namespace. In
clusters with a high namespace turnover (like CI clusters), this may
cause the kube-controller-manager to expose over 100k series to
Prometheus, which may cause degradation of that service.
Drop the `namespace` label to reduce the metric's cardinality; tracking
this metric by namespace does not justify the impact of keeping it.
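For illustration, the reduced-cardinality definition could look like the
following, sketched here with client_golang rather than the component-base
metrics wrappers; help text and buckets are illustrative:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
)

// syncDuration keeps only `code` as a label, so the series count is bounded
// by the number of distinct status codes instead of growing with namespace
// turnover.
var syncDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "root_ca_cert_publisher_sync_duration_seconds",
		Help:    "Sync duration of the root CA cert publisher, by status code.",
		Buckets: prometheus.ExponentialBuckets(0.001, 2, 15),
	},
	[]string{"code"}, // previously []string{"code", "namespace"}
)
```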
When doing partial updates for uncountedTerminatedPods, the controller might have removed UIDs for Pods that still had finalizers.
Also, make more space by removing UIDs that no longer have finalizers at the beginning of the sync.
This PR adds the GA AnnStorageProvisioner annotation to
a PVC if the PVC requires dynamic provisioning. It
also deprecates the beta AnnStorageProvisioner annotation,
which will be removed in a later release.
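A sketch of how both keys could be set during the deprecation window;
`setProvisionerAnnotations` is an illustrative name, and the provisioner
value in main is just an example:

```go
package main

import "fmt"

// Annotation keys as described above; the GA key drops "beta" from the path.
const (
	annBetaStorageProvisioner = "volume.beta.kubernetes.io/storage-provisioner" // deprecated
	annStorageProvisioner     = "volume.kubernetes.io/storage-provisioner"      // GA
)

// setProvisionerAnnotations puts both keys on a PVC that needs dynamic
// provisioning: the GA key is authoritative, and the beta key is kept so
// existing provisioners keep working until it is removed.
func setProvisionerAnnotations(annotations map[string]string, provisioner string) {
	annotations[annStorageProvisioner] = provisioner
	annotations[annBetaStorageProvisioner] = provisioner
}

func main() {
	ann := map[string]string{}
	setProvisionerAnnotations(ann, "ebs.csi.aws.com")
	fmt.Println(ann)
}
```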
All dependencies of the VolumeBinding plugin were moved from the
"k8s.io/kubernetes/pkg/controller/volume/scheduling" package to the
"k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding" package:
- whole file pkg/controller/volume/scheduling/scheduler_assume_cache.go
- whole file pkg/controller/volume/scheduling/scheduler_assume_cache_test.go
- whole file pkg/controller/volume/scheduling/scheduler_binder.go
- whole file pkg/controller/volume/scheduling/scheduler_binder_fake.go
- whole file pkg/controller/volume/scheduling/scheduler_binder_test.go
Package "k8s.io/kubernetes/pkg/controller/volume/scheduling/metrics" moved
to "k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding/metrics"
because it only used in VolumeBinding plugin and (e2e) tests.
More described in issue #89930 and PR #102953.
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
Job completion is tracked through Job.status.uncountedPodUIDs and a Pod finalizer.
An annotation marks whether a Job should be tracked with the new behavior.
A separate work queue is used to remove finalizers from orphan Pods, as sketched below.
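A sketch of what the orphan-pod worker could do; `removeTrackingFinalizer`
is an illustrative name, the finalizer string shown is assumed, and the real
controller has its own error handling and wiring:

```go
package jobcontroller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// removeTrackingFinalizer strips the job-tracking finalizer from a Pod whose
// owning Job no longer exists, so the orphan can be garbage collected.
func removeTrackingFinalizer(ctx context.Context, client kubernetes.Interface, pod *corev1.Pod) error {
	const finalizer = "batch.kubernetes.io/job-tracking" // assumed finalizer name
	updated := pod.DeepCopy()
	kept := updated.Finalizers[:0]
	for _, f := range updated.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	if len(kept) == len(pod.Finalizers) {
		return nil // nothing to do
	}
	updated.Finalizers = kept
	_, err := client.CoreV1().Pods(pod.Namespace).Update(ctx, updated, metav1.UpdateOptions{})
	return err
}
```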
Change-Id: I1862e930257a9d1f7f1b2b0a526ed15bc8c248ad
* set `endpoints.kubernetes.io/over-capacity` to "truncated" when the
  number of addresses has been truncated to 1000
* ready addresses are prioritized over non-ready addresses
* addresses are proportionally truncated across subsets (see the sketch
  below)
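A self-contained sketch of this budget split, not the exact algorithm: each
subset keeps a share of the 1000-address budget proportional to its size,
and ready addresses are kept first. The `subset` type is a stripped-down
stand-in holding only counts:

```go
package main

import (
	"fmt"
	"math"
)

type subset struct{ ready, notReady int }

const maxAddresses = 1000

// truncate returns the per-subset counts after truncation and whether any
// truncation happened (which is when the over-capacity annotation is set).
func truncate(subsets []subset) ([]subset, bool) {
	total := 0
	for _, s := range subsets {
		total += s.ready + s.notReady
	}
	if total <= maxAddresses {
		return subsets, false
	}
	out := make([]subset, len(subsets))
	for i, s := range subsets {
		size := s.ready + s.notReady
		// Each subset keeps a share of the budget proportional to its size.
		share := int(math.Ceil(float64(size) / float64(total) * maxAddresses))
		if share > size {
			share = size
		}
		// Ready addresses fill the share first; not-ready take the rest.
		keptReady := share
		if s.ready < keptReady {
			keptReady = s.ready
		}
		out[i] = subset{ready: keptReady, notReady: share - keptReady}
	}
	return out, true
}

func main() {
	got, truncated := truncate([]subset{{ready: 1500, notReady: 500}, {ready: 100, notReady: 400}})
	fmt.Println(got, truncated) // [{800 0} {100 100}] true
}
```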
As of now, we allow PDBs to be applied to pods via
selectors, so there can be unmanaged pods (pods that
don't have backing controllers) that still have PDBs associated with them.
Such pods are logged instead of immediately raising
a sync error. This ensures the disruption controller is
not frequently updating the status subresource, thus
preventing excessive and expensive writes to etcd.
This promotes the LogarithmicScaleDown feature gate to Beta, enabling it
by default. It also introduces a new metric, `sorting_deletion_age_ratio`,
intended to measure the efficacy of this new replica set scaledown behavior.
This change updates the CSR API to add a new, optional field called
expirationSeconds. This field is a request to the signer for the
maximum duration the client wishes the cert to have. The signer is
free to ignore this request based on its own internal policy. The
signers built into KCM will honor this field if it is not set to a
value greater than --cluster-signing-duration. The minimum allowed
value for this field is 600 seconds (ten minutes).
This change will help enforce safer durations for certificates in
the Kube ecosystem and will help related projects such as
cert-manager with their migration to the Kube CSR API.
Future enhancements may update the Kubelet to take advantage of this
field when it is configured in a way that can tolerate shorter
certificate lifespans with regular rotation.
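For illustration, a CSR requesting a one-hour certificate through the new
field; the name, signer, and usages below are just example values:

```go
package main

import (
	"fmt"

	certificatesv1 "k8s.io/api/certificates/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newCSR builds a CSR that asks the signer for a one-hour certificate via
// the new optional field. The signer may still shorten this based on its
// own policy; for the built-in KCM signers it is capped by
// --cluster-signing-duration.
func newCSR(request []byte) *certificatesv1.CertificateSigningRequest {
	expiration := int32(3600) // seconds; must be at least 600 (ten minutes)
	return &certificatesv1.CertificateSigningRequest{
		ObjectMeta: metav1.ObjectMeta{Name: "example-csr"},
		Spec: certificatesv1.CertificateSigningRequestSpec{
			Request:           request, // PEM-encoded certificate request
			SignerName:        "kubernetes.io/kube-apiserver-client",
			ExpirationSeconds: &expiration,
			Usages:            []certificatesv1.KeyUsage{certificatesv1.UsageClientAuth},
		},
	}
}

func main() {
	csr := newCSR(nil) // a real caller would pass PEM-encoded CSR bytes
	fmt.Println(*csr.Spec.ExpirationSeconds)
}
```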
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the backdating logic so that it is applied only to the
NotBefore date, and not the NotAfter date, when the certificate is
short-lived. Thus, when such a certificate is issued, it will not be
immediately expired. Long-lived certificates continue to have the
same lifetime as before.
Consolidated all certificate lifetime logic into the
PermissiveSigningPolicy.policy method.
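A minimal sketch of the corrected computation; the backdate amount and the
short-lived cutoff here are illustrative, and the real logic lives in the
PermissiveSigningPolicy.policy method:

```go
package main

import (
	"fmt"
	"time"
)

const backdate = 5 * time.Minute // tolerate clock skew between signer and clients

// certLifetime always backdates NotBefore, but for short-lived certificates
// it counts the full requested duration from "now", so the certificate is
// not issued already expired.
func certLifetime(now time.Time, duration time.Duration) (notBefore, notAfter time.Time) {
	notBefore = now.Add(-backdate)
	if duration < 6*backdate { // illustrative cutoff for "short lived"
		notAfter = now.Add(duration)
	} else {
		// Long-lived certificates keep the same lifetime as before.
		notAfter = notBefore.Add(duration)
	}
	return notBefore, notAfter
}

func main() {
	nb, na := certLifetime(time.Now(), 10*time.Minute)
	fmt.Println(na.Sub(nb)) // 15m0s: still valid for the requested 10m from now
}
```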
Signed-off-by: Monis Khan <mok@vmware.com>
Handles incorrect mirroring of Endpoints annotations to the created
EndpointSlices, specifically the last-applied-configuration annotation.
Also updates tests and adds test cases for the same; the core of the fix is
sketched below.
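A sketch of the filtering this implies when copying annotations from an
Endpoints object to its mirrored EndpointSlices; `mirroredAnnotations` is an
illustrative name:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// mirroredAnnotations copies annotations from the Endpoints object, dropping
// control-plane bookkeeping such as kubectl's last-applied-configuration,
// which belongs to the Endpoints object and not to the slice.
func mirroredAnnotations(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		if k == corev1.LastAppliedConfigAnnotation {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(mirroredAnnotations(map[string]string{
		corev1.LastAppliedConfigAnnotation: "{...}",
		"example.com/team":                 "storage",
	}))
}
```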