Terminal pods, whose phase is Failed or Succeeded, are guaranteed
never to regress and to be stopped, so their IPs should never
be published in Endpoints.
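A minimal Go sketch of that rule, with an illustrative helper name (not the
controller's actual code):

```go
package main

import corev1 "k8s.io/api/core/v1"

// isTerminal reports whether the pod has reached a terminal phase and will
// never run again; such a pod's IP must not be published as an endpoint address.
func isTerminal(pod *corev1.Pod) bool {
	return pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed
}
```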
* feat: Provide previous replica count for deployment/replica set scale up/down event
Signed-off-by: GitHub <noreply@github.com>
* change format of event
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
In some rare race conditions, the job controller might create new pods after the job is declared finished.
Change-Id: I8a00429c8845463259cd7f82bb3c241d0011583c
Remove the comment "As of v1.22, this field is beta and is controlled
via the CSRDuration feature gate" from the expirationSeconds field's
godoc.
Mark the "CSRDuration" feature gate as GA in 1.24, lock its value to
"true", and remove the various logic which handled when the gate was
"false".
Update the conformance test to check that the CertificateSigningRequest's
Spec.ExpirationSeconds field is stored, but do not check whether the field
is honored, since this functionality is optional.
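For illustration only, a CSR object requesting a roughly one-hour certificate
via spec.expirationSeconds; as noted above, a signer may honor or ignore the
requested duration:

```go
package main

import (
	certv1 "k8s.io/api/certificates/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exampleCSR asks the kube-apiserver-client signer for a ~1h client certificate.
// The expirationSeconds value is a request, not a guarantee.
func exampleCSR(csrPEM []byte) *certv1.CertificateSigningRequest {
	oneHour := int32(3600)
	return &certv1.CertificateSigningRequest{
		ObjectMeta: metav1.ObjectMeta{Name: "example-csr"},
		Spec: certv1.CertificateSigningRequestSpec{
			Request:           csrPEM,
			SignerName:        certv1.KubeAPIServerClientSignerName,
			ExpirationSeconds: &oneHour,
			Usages:            []certv1.KeyUsage{certv1.UsageClientAuth},
		},
	}
}
```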
This patch aims to simplify decoupling "pkg/scheduler/framework/plugins"
from internal "k8s.io/kubernetes" packages. More described in
issue #89930 and PR #102953.
Some helpers from "k8s.io/kubernetes/pkg/controller/volume/persistentvolume"
package moved to "k8s.io/component-helpers/storage/volume" package:
- IsDelayBindingMode
- GetBindVolumeToClaim
- IsVolumeBoundToClaim
- FindMatchingVolume
- CheckVolumeModeMismatches
- CheckAccessModes
- GetVolumeNodeAffinity
Also "CheckNodeAffinity" from "k8s.io/kubernetes/pkg/volume/util"
package moved to "k8s.io/component-helpers/storage/volume" package
to prevent diamond dependency conflict.
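For callers, the move amounts to an import-path change; a hedged sketch using
one of the listed helpers (the wrapper function is illustrative):

```go
package main

import (
	v1 "k8s.io/api/core/v1"
	volume "k8s.io/component-helpers/storage/volume" // was k8s.io/kubernetes/pkg/controller/volume/persistentvolume
)

// bound wraps one of the moved helpers; only the import path changed.
func bound(pv *v1.PersistentVolume, claim *v1.PersistentVolumeClaim) bool {
	return volume.IsVolumeBoundToClaim(pv, claim)
}
```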
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
- Lock feature gate to true and schedule for deletion in 1.26
- Remove checks on feature gate
- Graduate E2E test to Conformance
Change-Id: I6814819d318edaed5c86dae4055f4b050a4d39fd
All controllers should use a context to signal termination of communication with the API server. Once kcm cancels the context, all the cert controllers started via kcm should cancel their in-flight API server requests instead of hanging around.
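A hedged sketch of the pattern, not the actual controller code: the context
handed down by kcm is passed straight into the client-go call, so cancelling
it aborts the request in flight.

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listCSRs is cancelled together with ctx: when kcm cancels the context,
// the in-flight API server request is aborted instead of hanging.
func listCSRs(ctx context.Context, client kubernetes.Interface) error {
	_, err := client.CertificatesV1().CertificateSigningRequests().List(ctx, metav1.ListOptions{})
	return err
}
```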
The field is not used anywhere and its value may be stale, as Endpoints
and EndpointSlice won't be updated if there is only a Pod ResourceVersion
change.
- actual_state_of_world_test.go: test the new method GetVolumesToReportAttachedForNode
for an existing node and a non-existing node
- node_status_updater_test.go: test UpdateNodeStatuses and UpdateNodeStatusForNode in the nominal
case with 2 nodes getting one volume each. Test UpdateNodeStatuses with the first call
to node.patch failing but the following one succeeding
- add comment in node_status_updater.go
- fix log line in reconciler.go
- rename variable in actual_state_of_world.go
The UpdateNodeStatuses code stops too early if there is
an error when calling updateNodeStatus. It returns immediately,
which means any remaining node won't have its update status put back
to true.
Looking at the call sites for UpdateNodeStatuses, it appears this is
not the only issue. If the lister call fails with anything but a Not Found
error, it's silently ignored, which is wrong in the detach path.
The reconciler detach path also calls UpdateNodeStatuses, but the real intent
is to only update the node currently processed in the loop and not proceed
with the detach call if there is an error updating that specific node's
volumesAttached property. With the current implementation, it will not proceed
if there is an error updating another node (which is not completely bad but
not ideal) and, worse, it will proceed if there is a lister error on that node,
which means the node's volumesAttached property won't have been updated.
To fix those issues, introduce the following changes:
- [node_status_updater] introduce UpdateNodeStatusForNode which does what
UpdateNodeStatuses does but only for the provided node
- [node_status_updater] if the node lister call fails for anything but a Not
Found error, we will return an error, not ignore it
- [node_status_updater] if the update of a node's volumesAttached property fails,
we continue processing the other nodes
- [actual_state_of_world] introduce GetVolumesToReportAttachedForNode which
does what GetVolumesToReportAttached does but only for the node whose name is
provided. It returns a bool indicating whether the node in question needs an
update, as well as the volumesAttached list. It is used by UpdateNodeStatusForNode
- [actual_state_of_world] use a write lock in updateNodeStatusUpdateNeeded, since
we're modifying the map content
- [reconciler] use UpdateNodeStatusForNode in the detach loop
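A simplified sketch of the intended multi-node control flow (the signature and
names are illustrative, not the actual attach/detach controller code): keep
processing the remaining nodes and aggregate per-node errors instead of
returning on the first one.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/types"
	utilerrors "k8s.io/apimachinery/pkg/util/errors"
)

// updateNodeStatuses updates every node and collects per-node failures so one
// failing node no longer prevents the others from being updated.
func updateNodeStatuses(nodes []types.NodeName, updateOne func(types.NodeName) error) error {
	var errs []error
	for _, nodeName := range nodes {
		if err := updateOne(nodeName); err != nil {
			errs = append(errs, fmt.Errorf("node %q: %v", nodeName, err))
		}
	}
	return utilerrors.NewAggregate(errs)
}
```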
When comparing EndpointSubsets and Endpoints, we ignore the difference
in the Pods' ResourceVersion to avoid unnecessary updates caused by Pod
updates that we don't care about, e.g. an annotation update.
Otherwise the periodic Service resync would intensively update Endpoints or
EndpointSlices whose Pods have irrelevant changes between two resyncs,
leading to delays in processing newly created Services. In a large-scale
cluster with thousands of such Endpoints, we observed 2 minutes of
delay when the resync happens.
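A hedged sketch of the comparison, not the controller's exact code: clear the
Pod ResourceVersion carried in each address' TargetRef on deep copies, then
compare semantically.

```go
package main

import (
	v1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
)

// subsetsEqualIgnoringPodRV compares two sets of subsets while ignoring the
// Pod ResourceVersion recorded in the address TargetRefs.
func subsetsEqualIgnoringPodRV(a, b []v1.EndpointSubset) bool {
	return apiequality.Semantic.DeepEqual(stripPodRV(a), stripPodRV(b))
}

func stripPodRV(subsets []v1.EndpointSubset) []v1.EndpointSubset {
	out := make([]v1.EndpointSubset, len(subsets))
	for i := range subsets {
		out[i] = *subsets[i].DeepCopy()
		for j := range out[i].Addresses {
			if ref := out[i].Addresses[j].TargetRef; ref != nil {
				ref.ResourceVersion = ""
			}
		}
		for j := range out[i].NotReadyAddresses {
			if ref := out[i].NotReadyAddresses[j].TargetRef; ref != nil {
				ref.ResourceVersion = ""
			}
		}
	}
	return out
}
```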
Remove the `tolerate-unready-endpoints` annotation on Service, deprecated
since 1.11; use `Service.spec.publishNotReadyAddresses` instead.
Signed-off-by: He Xiaoxi <tossmilestone@gmail.com>
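The replacement field, sketched as a Go Service object (name, selector, and
port are placeholders); in YAML this is spec.publishNotReadyAddresses:

```go
package main

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// headlessServiceToleratingUnready publishes addresses of not-ready pods via
// the spec field that replaces the removed annotation.
func headlessServiceToleratingUnready() *v1.Service {
	return &v1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "example"},
		Spec: v1.ServiceSpec{
			ClusterIP:                v1.ClusterIPNone,
			Selector:                 map[string]string{"app": "example"},
			Ports:                    []v1.ServicePort{{Port: 80}},
			PublishNotReadyAddresses: true, // replaces the tolerate-unready-endpoints annotation
		},
	}
}
```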
In the following code pattern, the log message will get logged with v=0 in JSON
output although conceptually it has a higher verbosity:
if klog.V(5).Enabled() {
    klog.Info("hello world")
}
Having the actual verbosity in the JSON output is relevant, for example for
filtering out only the important info messages. The solution is to use
klog.V(5).Info or something similar.
Whether the outer if is necessary at all depends on how complex the parameters
are. The return value of klog.V can be captured in a variable and be used
multiple times to avoid the overhead for that function call and to avoid
repeating the verbosity level.
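A short sketch of both options (the expensiveDetail helper is just a placeholder):

```go
package main

import "k8s.io/klog/v2"

func logExample() {
	// Verbosity carried by the call itself, so JSON output records v=5.
	klog.V(5).Info("hello world")

	// If building parameters is expensive, capture the Verbose value once
	// and reuse it, avoiding a repeated verbosity level.
	if loggerV := klog.V(5); loggerV.Enabled() {
		loggerV.InfoS("hello world", "detail", expensiveDetail())
	}
}

func expensiveDetail() string { return "..." }
```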
The ServerResources function was deprecated; the ServerGroupsAndResources
function is suggested instead.
This PR removes the ServerResources function and moves every caller to ServerGroupsAndResources.
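The mechanical replacement at a call site looks roughly like this (sketch only):

```go
package main

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/discovery"
)

// listResources returns the server's resource lists.
// Previously: resources, err := dc.ServerResources()
func listResources(dc discovery.DiscoveryInterface) ([]*metav1.APIResourceList, error) {
	_, resources, err := dc.ServerGroupsAndResources()
	return resources, err
}
```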
Test 5-7 tries to delete a PVC at the very moment it detects that
the PV controller has started processing the PVC. The controller then sometimes
can't update the PVC and generate the event for it that the test expects.
From PV controller logs (not shown in CI):
> I1221 14:36:34.548160 104481 pv_controller.go:815] updating PersistentVolumeClaim[default/claim5-7] status: set phase Lost failed: cannot update claim claim5-7: claim not found
Typical error in CI:
> FAIL: TestControllerSync (83.22s)
> framework_test.go:202: Event "Warning ClaimLost" not emitted
Therefore, wait for the PVC to be fully processed before deleting it, to
avoid races.
Add fake Pod and Node watchers to the tests. This only reduces test noise:
Failed to watch *v1.Pod: unhandled watch: testing.WatchActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"watch", Resource:schema.GroupVersionResource{Group:"", Version:"v1", Resource:"pods"}, Subresource:""}, WatchRestrictions:testing.WatchRestrictions{Labels:labels.internalSelector(nil), Fields:fields.andTerm{}, ResourceVersion:""}}
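A hedged sketch of how such watchers can be registered on a fake clientset in
a test (the real test wiring may differ):

```go
package main

import (
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes/fake"
	clienttesting "k8s.io/client-go/testing"
)

// newFakeClientWithWatchers returns a fake clientset whose pod and node
// watches are served by fake watchers, silencing "unhandled watch" noise.
func newFakeClientWithWatchers() *fake.Clientset {
	client := fake.NewSimpleClientset()
	client.PrependWatchReactor("pods", clienttesting.DefaultWatchReactor(watch.NewFake(), nil))
	client.PrependWatchReactor("nodes", clienttesting.DefaultWatchReactor(watch.NewFake(), nil))
	return client
}
```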
The CRON_TZ variable slipped in during an upgrade of the github.com/robfig/cron
library. It allows setting a time zone, which is a long-requested
feature but one that is not officially supported. This adds a warning
event, since users should not rely on unsupported features.
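An illustrative check only, as the real controller's detection and event
wording may differ:

```go
package main

import "strings"

// usesUnofficialTimeZone reports whether a cron schedule relies on the
// unsupported TZ/CRON_TZ prefix and should therefore trigger a warning event.
func usesUnofficialTimeZone(schedule string) bool {
	return strings.HasPrefix(schedule, "TZ=") || strings.HasPrefix(schedule, "CRON_TZ=")
}
```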