Commit Graph

5509 Commits

Author SHA1 Message Date
cyclinder
ab47b8b94b fix some lint error
Signed-off-by: cyclinder <qifeng.guo@daocloud.io>
2021-10-26 13:56:29 +08:00
Konstantin Misyutin
dbc9d7b71a Remove tests when StorageObjectInUseProtection feature is disabled
As well as feature gate are locked, the tests when this feature is
disabled will crash. So we should remove them together with locking
the feature.

Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
2021-10-15 19:39:37 +08:00
Kubernetes Prow Robot
0bfa37dfcc
Merge pull request #105676 from alculquicondor/job-name
Fix name for Pods of NonIndexed Jobs
2021-10-14 10:50:12 -07:00
Aldo Culquicondor
4ef9d18abe Fix name for Pods of NonIndexed Jobs
Change-Id: I0ea4685a82f4cdec0caab362d52144476652f95a
2021-10-14 10:55:46 -04:00
Kubernetes Prow Robot
3aafe75698
Merge pull request #105461 from damemi/wire-contexts-autoscaling
Wire contexts to Autoscaling controllers
2021-10-14 06:59:33 -07:00
Kubernetes Prow Robot
f27e4714ba
Merge pull request #105377 from damemi/wire-contexts-apps
Wire contexts to Apps controllers
2021-10-14 06:59:19 -07:00
Kubernetes Prow Robot
baaa53db64
Merge pull request #105211 from xiaopingrubyist/fix-pv-controller-claim-cache-issue
fix:claim cached in pvcontroller is not the newest may cause unexpected issue
2021-10-14 05:47:18 -07:00
Mike Dame
41fcb95f2f Wire contexts to Apps controllers 2021-10-13 16:32:13 -04:00
torubylist
f28a8d7f2b fix:cached claim is not the newest will cause unexpected issue 2021-10-13 20:03:00 +08:00
Mike Dame
7780024916 Wire contexts to Autoscaling controllers 2021-10-12 14:34:05 -04:00
Maciej Szulik
8322121434
Move test-related utils to test/utils 2021-10-12 14:52:19 +02:00
Maciej Szulik
1fb6bf8a14
Wire context instead of TODO 2021-10-12 13:21:45 +02:00
Kubernetes Prow Robot
b0eac84937
Merge pull request #105345 from pohly/generic-ephemeral-volume-util
generic ephemeral volume util, base code and controller
2021-10-07 08:19:47 -07:00
Jordan Liggitt
81fa9855c1 Fix quota controller hotloop in integration tests 2021-10-06 11:34:32 -04:00
Patrick Ohly
4ae0eecb34 controller: use generic ephemeral volume helper functions
The name concatenation and ownership check were originally considered small
enough to not warrant dedicated functions, but the intent of the code is more
readable with them.

There also was a missing owner check in the attach controller.
2021-10-06 14:01:44 +02:00
Kubernetes Prow Robot
debd6c1e9e
Merge pull request #104526 from jingxu97/aug/volumeattach
Fix issue in node status updating VolumeAttached list
2021-10-05 17:30:32 -07:00
Jing Xu
69b9f9b1f0 Fix issue in node status updating VolumeAttached list
During volume detach, the following might happen in reconciler

1. Pod is deleting
2. remove volume from reportedAsAttached, so node status updater will
update volumeAttached list
3. detach failed due to some issue
4. volume is added back in reportedAsAttached
5. reconciler loops again the volume, remove volume from
reportedAsAttached
6. detach will not be trigged because exponential back off, detach call
will fail with exponential backoff error
7. another pod is added which using the same volume on the same node
8. reconciler loops and it will NOT try to tigger detach anymore

At this point, volume is still attached and in actual state, but
volumeAttached list in node status does not has this volume anymore, and
will block volume mount from kubelet.

The fix in first round is to add volume back into the volume list that
need to reported as attached at step 6 when detach call failed with
error (exponentical backoff). However this might has some performance
issue if detach fail for a while. During this time, volume will be keep
removing/adding back to node status which will cause a surge of API
calls.

So we changed to logic to check first whether operation is safe to retry which
means no pending operation or it is not in exponentical backoff time
period before calling detach. This way we can avoid keep removing/adding
volume from node status.

Change-Id: I5d4e760c880d72937d34b9d3e904ecad125f802e
2021-10-05 09:44:35 -07:00
Kubernetes Prow Robot
519b164db1
Merge pull request #105222 from cyclinder/remove_node_lease_GA
remove nodeLease feature GA
2021-10-05 05:41:21 -07:00
Author cyclinder
e61b901628 remove nodeLease feature GA
Signed-off-by: cyclinder <qifeng.guo@daocloud.io>
2021-10-05 12:23:27 +08:00
Kubernetes Prow Robot
0a29e2a73a
Merge pull request #105197 from alculquicondor/job-tracking
Roll-forward: Beta requirements for JobTrackingWithFinalizers
2021-10-04 18:57:49 -07:00
Aldo Culquicondor
5929ccd391 Track expected removals of Pod finalizers
Add the UIDs of Pods for which we are removing finalizers to an in-memory cache.

The controller removes UIDs from the cache as Pod updates or deletes come in.

This avoids double counting finished Pods when Pod updates arrive after Job status updates.

https://github.com/kubernetes/kubernetes/issues/105200
2021-10-04 16:09:58 -04:00
Kubernetes Prow Robot
c724abae81
Merge pull request #105375 from jonyhy96/fix-clock-util
node: test file use k8s.io/utils/clock instead
2021-10-04 04:19:31 -07:00
Rob Scott
d7a640d831
Excluding Control Plane Nodes from Topology Hints calculations 2021-10-01 15:12:38 -07:00
haoyun
8d9fa1decc fix: use k8s.io/utils/clock instead
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-10-02 00:08:53 +08:00
Aldo Culquicondor
95c2a8024c Parallelize pod updates in job test
To potentially reduce the number of job controller syncs.

Also reduce the maximum number of pods to sync in tests.
2021-10-01 09:55:53 -04:00
Kubernetes Prow Robot
598a8829c1
Merge pull request #105267 from llhuii/fix-topology-hint-bug
TopologyAwareHints: fix getHintsByZone bug
2021-09-28 20:54:48 -07:00
llhuii
e7350d126e TopologyAwareHints: add getHintsByZone unit test 2021-09-29 10:39:03 +08:00
llhuii
528cd30145 TopologyAwareHints: fix getHintsByZone bug
The bug could result in the EndpointSlice controller unnecessarily updating
EndpointSlices associated with a Service that had Topology Aware Hints enabled.
2021-09-28 09:02:24 +08:00
Kubernetes Prow Robot
aec9acda68
Merge pull request #105214 from alculquicondor/job-update
Remove GET job and retries for status updates
2021-09-27 09:05:47 -07:00
Khaled Henidak (Kal)
a53e2eaeab
move IPv6DualStack feature to stable. (#104691)
* kube-proxy

* endpoints controller

* app: kube-controller-manager

* app: cloud-controller-manager

* kubelet

* app: api-server

* node utils + registry/strategy

* api: validation (comment removal)

* api:pod strategy (util pkg)

* api: docs

* core: integration testing

* kubeadm: change feature gate to GA

* service registry and rest stack

* move feature to GA

* generated
2021-09-24 16:30:22 -07:00
Kubernetes Prow Robot
b6924839ca
Merge pull request #101987 from sky-philipalmeida/patch-1
Log if PV is still in use trying to delete it
2021-09-23 14:30:54 -07:00
Aldo Culquicondor
a438f16741 Revert "Revert "Add metric job_pod_finished""
This reverts commit 7868fbbe64.
2021-09-23 12:56:29 -04:00
Aldo Culquicondor
47a957d163 Revert "Revert "Limit number of Pods counted in a single Job sync""
This reverts commit 8bcb780808.
2021-09-23 12:56:29 -04:00
Aldo Culquicondor
01f27cd93e Fix log line for target number of running pods 2021-09-23 12:56:29 -04:00
Aldo Culquicondor
eebd678cda Remove GET job and retries for status updates.
Doing a GET right before retrying has 2 problems:
- It can masquerade conflicts
- It adds an additional delay

As for retries, we are better of going through the sync backoff.

In the case of conflict, we know that there was a Job update that would trigger another sync, so there is no need to do a rate limited requeue.
2021-09-23 11:48:34 -04:00
Kubernetes Prow Robot
372103f4b8
Merge pull request #100672 from wangyx1992/structured-log
Structured Logging migration: modify logs of controller-manager
2021-09-22 20:27:10 -07:00
Kubernetes Prow Robot
76c0573ff4
Merge pull request #105181 from alculquicondor/revert
Revert #104739
2021-09-21 16:54:00 -07:00
Aldo Culquicondor
7868fbbe64 Revert "Add metric job_pod_finished"
This reverts commit a0e7a567c5.
2021-09-21 15:16:54 -04:00
Aldo Culquicondor
8bcb780808 Revert "Limit number of Pods counted in a single Job sync"
This reverts commit 7d9cb88fed.
2021-09-21 15:16:50 -04:00
Phil
f1a9402082 Log if PV is still in use trying to delete it
Similar to what we have in:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/pvcprotection/pvc_protection_controller.go#L181
The objective is to have a easy way to monitor if a PV will enter in Terminating state due to a failed removal when still in use.
This way we can capture the PV log and alert according.
The code is not tested.

Update pv_protection_controller.go

Change call to Infof
2021-09-21 18:05:16 +01:00
Kubernetes Prow Robot
b34a735bbe
Merge pull request #102523 from stlaz/rootca_metrics_cleanup
rootcacertpublisher: drop the namespace label from metrics to reduce its cardinality
2021-09-20 13:54:24 -07:00
Kubernetes Prow Robot
353f0a5eab
Merge pull request #105095 from wojtek-t/migrate_clock_3
Unify towards k8s.io/utils/clock - part 3
2021-09-20 12:46:45 -07:00
Kubernetes Prow Robot
f55101913f
Merge pull request #105098 from Karthik-K-N/fix-error-format
Fix incorrect format specifier in test files
2021-09-20 08:56:09 -07:00
Shivanshu Raj Shrivastava
bbd809cbd0
Fixing incorrectly migrated structured logs (#105122)
* added keys for structured logging

* used KObj
2021-09-19 12:28:08 -07:00
wojtekt
d9b08c611d Migrate to k8s.io/utils/clock 2021-09-17 15:19:08 +02:00
Kubernetes Prow Robot
399656369f
Merge pull request #104739 from alculquicondor/job-tracking
Beta requirements for JobTrackingWithFinalizers
2021-09-17 04:57:00 -07:00
Karthik K N
c651d50202 Fix incorrect format specifier in test files 2021-09-17 16:27:53 +05:30
Stanislav Laznicka
b67bd722a9
rootcacertpublisher: drop the namespace label from metrics to reduce its cardinality
The `root_ca_cert_publisher_sync_duration_seconds` metric tracks the sync
duration in the root CA cert publisher per code and namespace. In
clusters with a high namespace turnover (like CI clusters), this may
cause the kube-controller-manager to expose over 100k series to
Prometheus, which may cause degradation of that service.

Drop the `namespace` label to remove the metrics' cardinality, tracking
this metric by namespace does not justify the impact of keeping it.
2021-09-16 14:05:32 +02:00
Aldo Culquicondor
a0e7a567c5 Add metric job_pod_finished
To count the number of pods that the job controller successfully tracked with the JobTrackingWithFinalizers feature gate.
2021-09-15 11:19:47 -04:00
Kubernetes Prow Robot
047a6b9f86
Merge pull request #104874 from wojtek-t/migrate_clock_1
Unify towards k8s.io/utils/clock - part 1
2021-09-13 19:09:20 -07:00