Commit Graph

371 Commits

Author SHA1 Message Date
Harsha Narayana
c3cbc443ef
structured-logging: replace KObjs with KObjSlice for logging 2022-07-01 09:52:07 +05:30
Jan Safranek
3b94ac228a Don't force detach volume from healthy nodes
6 minute force-deatch timeout should be used only for nodes that are not
healthy. 

In case a CSI driver is being upgraded or it's simply slow, NodeUnstage
can take more than 6 minutes. In that case, Pod is already deleted from the
API server and thus A/D controller will force-detach a mounted volume,
possibly corrupting the volume and breaking CSI - a CSI driver expects
NodeUnstage to succeed before Kubernetes can call ControllerUnpublish.
2022-06-24 12:51:41 +02:00
Jiawei Wang
760365d5c9 CSIMigration feature gate to GA 2022-06-06 21:19:19 +00:00
Hemant Kumar
a99466ca86 check existing size before querying new size from api-server 2022-03-28 11:32:49 -04:00
Hemant Kumar
ed217f4140 rename SetVolumeSize to InitializeVolumeSize 2022-03-28 11:32:49 -04:00
Hemant Kumar
7a43406138 Do not update PVC if it already has updated size 2022-03-28 11:32:49 -04:00
Hemant Kumar
e4f62d6c41 Modify code to use new interface functions 2022-03-28 11:32:49 -04:00
Ashutosh Kumar
c00975370a
Handle Non-graceful Node Shutdown (#108486)
Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>

Co-authored-by: Ashutosh Kumar <sonasingh46@gmail.com>

Co-authored-by: xing-yang <xingyang105@gmail.com>
2022-03-26 09:23:21 -07:00
Kubernetes Prow Robot
66daef4aa7
Merge pull request #108167 from jfremy/fix-107973
Fix nodes volumesAttached status not being updated
2022-03-01 12:49:54 -08:00
Kubernetes Prow Robot
06e107081e
Merge pull request #104732 from mengjiao-liu/remove-flag-experimental-check-node-capabilities-before-mount
kubelet: Remove the deprecated flag `--experimental-check-node-capabilities-before-mount`
2022-02-24 07:56:30 -08:00
Jean-Francois Remy
e83184568d Add unit tests
- actual_state_of_world_test.go: test the new method GetVolumesToReportAttachedForNode
  for an existing node and a non-existing node
- node_status_updater_test.go: test UpdateNodeStatuses and UpdateNodeStatuses in nominal
  case with 2 nodes getting one volume each. Test UpdateNodeStatuses with the first call
  to node.patch failing but the following one succeeding
- add comment in node_status_updater.go
- fix log line in reconciler.go
- rename variable in actual_state_of_world.go
2022-02-22 12:21:58 -08:00
Jean-Francois Remy
f1717baaaa Fix nodes volumesAttached status not updated
The UpdateNodeStatuses code stops too early in case there is
an error when calling updateNodeStatus. It will return immediately
which means any remaining node won't have its update status put back
to true.

Looking at the call sites for UpdateNodeStatuses, it appears this is
not the only issue. If the lister call fails with anything but a Not Found
error, it's silently ignored which is wrong in the detach path.
Also the reconciler detach path calls UpdateNodeStatuses but the real intent
is to only update the node currently processed in the loop and not proceed
with the detach call if there is an error updating that specifi node volumesAttached
property. With the current implementation, it will not proceed if there is
an error updating another node (which is not completely bad but not ideal) and
worse it will proceed if there is a lister error on that node which means the
node volumesAttached property won't have been updated.

To fix those issues, introduce the following changes:
- [node_status_updater] introduce UpdateNodeStatusForNode which does what
  UpdateNodeStatuses does but only for the provided node
- [node_status_updater] if the node lister call fails for anything but a Not
  Found error, we will return an error, not ignore it
- [node_status_updater] if the update of a node volumesAttached properties fails
  we continue processing the other nodes
- [actual_state_of_world] introduce GetVolumesToReportAttachedForNode which
  does what GetVolumesToReportAttached but for the node whose name is provided
  it returns a bool which indicates if the node in question needs an update as
  well as the volumesAttached list. It is used by UpdateNodeStatusForNode
- [actual_state_of_world] use write lock in updateNodeStatusUpdateNeeded, we're
  modifying the map content
- [reconciler] use UpdateNodeStatusForNode in the detach loop
2022-02-22 12:20:53 -08:00
yujunwang
8f96600907 perf:logic-optimiz-for-DetermineVolumeAction 2022-01-22 23:45:29 +08:00
Patrick Ohly
9eaa2dc554 avoid klog Info calls without verbosity
In the following code pattern, the log message will get logged with v=0 in JSON
output although conceptually it has a higher verbosity:

   if klog.V(5).Enabled() {
       klog.Info("hello world")
   }

Having the actual verbosity in the JSON output is relevant, for example for
filtering out only the important info messages. The solution is to use
klog.V(5).Info or something similar.

Whether the outer if is necessary at all depends on how complex the parameters
are. The return value of klog.V can be captured in a variable and be used
multiple times to avoid the overhead for that function call and to avoid
repeating the verbosity level.
2022-01-12 07:48:36 +01:00
Mengjiao Liu
beda4cafb6 kubelet: Remove the deprecated flag --experimental-check-node-capabilities-before-mount 2022-01-06 11:47:11 +08:00
Kubernetes Prow Robot
f0dbc32ed9
Merge pull request #106853 from gnufied/disable-exp-backoff-volume-not-inuse
When volume is not marked in-use, do not backoff
2021-12-22 19:46:37 -08:00
Hemant Kumar
7989f27044 use node informer to check volumes attachment status before backoff
fix unit tests
2021-12-20 11:57:05 -05:00
Davanum Srinivas
9405e9b55e
Check in OWNERS modified by update-yamlfmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-09 21:31:26 -05:00
Léiyì Zhang
275fdf0884 fixing unit test failures induced by turning on CSIMigrationGCE
disable CSIMigrationGCE in some unit tests
2021-11-16 19:26:30 +00:00
Neha Lohia
fa1b6765d5
move pkg/util/node to component-helpers/node/util (#105347)
Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com>
2021-11-12 07:52:27 -08:00
Patrick Ohly
a8c930ef46 generic ephemeral volume: graduation to GA
The feature gate gets locked to "true", with the goal to remove it in two
releases.

All code now can assume that the feature is enabled. Tests for "feature
disabled" are no longer needed and get removed.

Some code wasn't using the new helper functions yet. That gets changed while
touching those lines.
2021-10-11 20:54:20 +02:00
Kubernetes Prow Robot
b0eac84937
Merge pull request #105345 from pohly/generic-ephemeral-volume-util
generic ephemeral volume util, base code and controller
2021-10-07 08:19:47 -07:00
Patrick Ohly
4ae0eecb34 controller: use generic ephemeral volume helper functions
The name concatenation and ownership check were originally considered small
enough to not warrant dedicated functions, but the intent of the code is more
readable with them.

There also was a missing owner check in the attach controller.
2021-10-06 14:01:44 +02:00
Kubernetes Prow Robot
debd6c1e9e
Merge pull request #104526 from jingxu97/aug/volumeattach
Fix issue in node status updating VolumeAttached list
2021-10-05 17:30:32 -07:00
Jing Xu
69b9f9b1f0 Fix issue in node status updating VolumeAttached list
During volume detach, the following might happen in reconciler

1. Pod is deleting
2. remove volume from reportedAsAttached, so node status updater will
update volumeAttached list
3. detach failed due to some issue
4. volume is added back in reportedAsAttached
5. reconciler loops again the volume, remove volume from
reportedAsAttached
6. detach will not be trigged because exponential back off, detach call
will fail with exponential backoff error
7. another pod is added which using the same volume on the same node
8. reconciler loops and it will NOT try to tigger detach anymore

At this point, volume is still attached and in actual state, but
volumeAttached list in node status does not has this volume anymore, and
will block volume mount from kubelet.

The fix in first round is to add volume back into the volume list that
need to reported as attached at step 6 when detach call failed with
error (exponentical backoff). However this might has some performance
issue if detach fail for a while. During this time, volume will be keep
removing/adding back to node status which will cause a surge of API
calls.

So we changed to logic to check first whether operation is safe to retry which
means no pending operation or it is not in exponentical backoff time
period before calling detach. This way we can avoid keep removing/adding
volume from node status.

Change-Id: I5d4e760c880d72937d34b9d3e904ecad125f802e
2021-10-05 09:44:35 -07:00
Stephen Augustus
481cf6fbe7
generated: Run hack/update-gofmt.sh
Signed-off-by: Stephen Augustus <foo@auggie.dev>
2021-08-24 15:47:49 -04:00
Cheng Xing
0e315355df Pass FsGroup to MountDevice 2021-07-03 16:29:42 -07:00
Jordan Liggitt
ca279bbcc1 Fix race in attachdetach tests 2021-06-04 01:59:32 -04:00
yuzhiquan
0b8dc56408 fix volume failing test 2021-06-04 09:45:21 +08:00
Tim Ebert
cd3709232f
Fix VolumeAttachment garbage collection for migrated PVs 2021-05-28 08:35:05 +02:00
Jiawei Wang
be583070d2 Use CSI driver to determine unique name for migrated in-tree plugins 2021-05-06 10:31:30 -07:00
Kubernetes Prow Robot
fe88bdc1ab
Merge pull request #101304 from wangyx1992/capatial-log-controller
cleanup: fix errors in wrapped format and log capitalization in controller
2021-04-22 15:55:52 -07:00
wangyx1992
fd51e654af cleanup: fix errors in wrapped format and log capitalization in controller
Signed-off-by: wangyx1992 <wang.yixiang@zte.com.cn>
2021-04-22 15:40:54 +08:00
andyzhangx
e10d3948f5 fix: azure file namespace issue in csi translation
fix build failure

fix comments
2021-04-20 07:23:09 +00:00
Kubernetes Prow Robot
df9ad4d7d2
Merge pull request #96094 from Hellcatlk/m
Some comments' typos
2021-04-16 11:54:22 -07:00
Kubernetes Prow Robot
410d092d8a
Merge pull request #99643 from pohly/generic-ephemeral-volume-beta
generic ephemeral volume beta
2021-03-09 17:39:26 -08:00
Patrick Ohly
555d4a12bf generic ephemeral volumes: drop ReadOnly field
As discussed during the alpha review, the ReadOnly field is not really
needed because volume mounts can also be read-only. It's a historical
oddity that can be avoided for generic ephemeral volumes as part
of the promotion to beta.
2021-03-09 08:22:48 +01:00
Jan Safranek
219cbc818a Refactor CSI migration plugin manager to get featureGates as a parameter
This allows caller to provide fake ones for testing of various corner cases
(migration on A/D controller disabled while enabled on kubelet).
2021-03-08 13:50:01 +01:00
Benjamin Elder
56e092e382 hack/update-bazel.sh 2021-02-28 15:17:29 -08:00
caodonghui
f435e24403 Remove deadcode 2021-02-23 17:58:47 +08:00
Kubernetes Prow Robot
8bf42039e6
Merge pull request #96552 from pandaamanda/klog_fmt
use klog.Info and klog.Warning when had no format
2021-01-15 17:57:43 -08:00
Kubernetes Prow Robot
07bd985724
Merge pull request #96561 from ialidzhikov/cleanup/csi-node-info
Remove CSINodeInfo feature gate
2021-01-05 11:46:00 -08:00
Jayasekhar Konduru
9b2b73600d Recover CSI volumes from dangling attachments
Change-Id: I72105d67d8a4069ab19bfa4638a7ac365cf4194c
2020-12-11 18:31:53 -08:00
ialidzhikov
bc432124a2 Remove CSINodeInfo feature gate
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2020-12-10 09:58:22 +02:00
Kubernetes Prow Robot
ce7ac8442e
Merge pull request #94599 from verult/adc-op-asw-race
Fixes Attach Detach Controller reconciler race reading ActualStateOfWorld and operation pending states
2020-12-08 16:28:53 -08:00
xiongzhongliang
90f4aeeea4 use klog.Info and klog.Warning when had no format 2020-11-14 00:55:06 +08:00
Shihang Zhang
d2859cd89b plumb service account token down to csi driver 2020-11-12 09:26:43 -08:00
zouyu
7dd4622c84 Some comments' typos
Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>
2020-11-02 15:05:23 +08:00
Cheng Xing
d9a629fe3a IsVolumeAttachedToNode() renamed to GetAttachState(), and returns 3 states instead of combining "uncertain" and "detached" into "false" 2020-10-29 13:24:51 -07:00
Cheng Xing
a61743b125 Fixes Attach Detach Controller reconciler race reading ActualStateOfWorld and operation pending states; fixes reconciler_test mock detach to account for multiple attaches on a node 2020-10-27 23:51:55 -07:00
Kubernetes Prow Robot
dd466bccde
Merge pull request #94527 from brahmaroutu/mount-utils-2
Change code to use  staging/k8s.io/mount-utils
2020-09-21 17:46:47 -07:00
Matthew Cary
f2e23afcf1 Adds filtering of hosts to DialContexts.
The provided DialContext wraps existing clients' DialContext in an attempt to
preserve any existing timeout configuration. In some cases, we may replace
infinite timeouts with golang defaults.

- scaleio: tcp connect/keepalive values changed from 0/15 to 30/30
- storageos: no change
2020-09-18 00:07:32 +00:00
Srini Brahmaroutu
fbe5daed73 Change code to use staging/k8s.io/mount-utils 2020-09-16 21:51:24 -07:00
Jiawei Wang
a6d8e6c5c2 Detect change of volume attachability in the middle of attaching
- Add Unit tests for both volumemanager and attach/detach controller
- Add E2E test
2020-08-24 17:15:11 -07:00
Patrick Ohly
ff3e5e06a7 GenericEphemeralVolume: initial implementation
The implementation consists of
- identifying all places where VolumeSource.PersistentVolumeClaim has
  a special meaning and then ensuring that the same code path is taken
  for an ephemeral volume, with the ownership check
- adding a controller that produces the PVCs for each embedded
  VolumeSource.EphemeralVolume
- relaxing the PVC protection controller such that it removes
  the finalizer already before the pod is deleted (only
  if the GenericEphemeralVolume feature is enabled): this is
  needed to break a cycle where foreground deletion of the pod
  blocks on removing the PVC, which waits for deletion of the pod

The controller was derived from the endpointslices controller.
2020-07-09 23:29:24 +02:00
Kubernetes Prow Robot
00d6255f44
Merge pull request #91712 from KobayashiD27/structured-logging-in-event
Migrate log to klog.InfoS for staging/src/k8s.io/client-go
2020-06-22 23:53:40 -07:00
Kobayashi Daisuke
4ae11dac2e Replace StartLogging(klog.Infof) with StartStructuredLogging(0) 2020-06-15 17:48:35 +09:00
Jayasekhar Konduru
2a89577659 CSI: Modify VolumeAttachment check to use Informer/Cache
Change-Id: Ie70c8b6657c67eefbf13042f36d56ca84a2e42bb
2020-06-11 10:34:09 -07:00
Yecheng Fu
8422044f17 sharing a common pod pvc indexer among volume controllers 2020-06-03 14:51:21 +08:00
Yecheng Fu
eaf2f54bba auto-generated files 2020-06-03 14:51:21 +08:00
Davanum Srinivas
07d88617e5
Run hack/update-vendor.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Kubernetes Prow Robot
ef672c1c2d
Merge pull request #88678 from verult/slow-rxm-attach
Parallelize attach operations across different nodes for volumes that allow multi-attach
2020-03-06 13:17:21 -08:00
Christian Huffman
c6fd25d100 Updated CSIDriver references 2020-03-06 08:21:26 -05:00
Cheng Xing
ef3d66b98b Parallelize attach operations across different nodes for volumes that allow multi-attach 2020-03-05 22:22:05 -08:00
taesun_lee
79680b5d9b Fix pkg/controller typos in some error messages, comments etc
- applied review results by LuisSanchez
- Co-Authored-By: Luis Sanchez <sanchezl@redhat.com>

genernal -> general
iniital -> initial
initalObjects -> initialObjects
intentionaly -> intentionally
inforer -> informer
anotother -> another
triger -> trigger
mutli -> multi
Verifyies -> Verifies
valume -> volume
unexpect -> unexpected
unfulfiled -> unfulfilled
implenets -> implements
assignement -> assignment
expectataions -> expectations
nexpected -> unexpected
boundSatsified -> boundSatisfied
externel -> external
calcuates -> calculates
workes -> workers
unitialized -> uninitialized
afater -> after
Espected -> Expected
nodeMontiorGracePeriod -> NodeMonitorGracePeriod
estimateGrracefulTermination -> estimateGracefulTermination
secondrary -> secondary
ShouldRunDaemonPodOnUnscheduableNode -> ShouldRunDaemonPodOnUnschedulableNode
rrror -> error
expectatitons -> expectations
foud -> found
epackage -> package
succesfulJobs -> successfulJobs
namesapce -> namespace
ConfigMapResynce -> ConfigMapResync
2020-02-27 00:15:33 +09:00
Mike Danese
25651408ae generated: run refactor 2020-02-08 12:30:21 -05:00
Mike Danese
3aa59f7f30 generated: run refactor 2020-02-07 18:16:47 -08:00
Jordan Liggitt
cd1059e3c4 Revert "Merge pull request #87258 from verult/slow-rxm-attach"
This reverts commit 15c3f1b119, reversing
changes made to 52d7614a8c.
2020-01-29 14:58:32 -05:00
Cheng Xing
c6a03fa5be Parallelize attach operations across different nodes for volumes that allow multi-attach 2020-01-27 15:02:25 -08:00
David Zhu
41c65f4740 Jump out of spec translation early if the spec is not migratable. Unit tests work after all! 2019-11-15 11:23:32 -08:00
David Zhu
6e716af89e Add CSINodes to AttachDetachControllerRecovery test 2019-11-15 11:23:32 -08:00
Travis Rhoden
0c5c3d8bb9
Remove pkg/util/mount (moved out of tree)
This patch removes pkg/util/mount completely, and replaces it with the
mount package now located at k8s.io/utils/mount. The code found at
k8s.io/utils/mount was moved there from pkg/util/mount, so the code is
identical, just no longer in-tree to k/k.
2019-11-15 08:29:12 -07:00
Kubernetes Prow Robot
372ebd24f5
Merge pull request #83098 from ddebroy/disable-intree
CSI Migration phase 2: disable probing of in-tree plugins
2019-11-14 20:51:42 -08:00
Deep Debroy
129f15328b Disable in-tree plugins migrated to CSI
Signed-off-by: Deep Debroy <ddebroy@docker.com>
2019-11-14 17:28:21 -08:00
Travis Rhoden
367f879131
Retire mount.Exec for k8s.io/utils/exec
This patch removes mount.Exec entirely and instead uses the common
utility from k8s.io/utils/exec.

The fake exec implementation found in k8s.io/utils/exec differs a bit
than mount.Exec, with the ability to pre-script expected calls to
Command.CombinedOutput(), so tests that previously relied on a callback
mechanism to produce specific output have been updated to use that
mechanism.
2019-11-13 14:09:57 -07:00
RainbowMango
09dc221d7b Deal with auto-generated files.
Update bazel by hack/update-bazel.sh
2019-11-07 10:30:12 +08:00
RainbowMango
9efabe8124 Migrate custom collector to stablility framework 2019-11-07 10:26:27 +08:00
yuxiaobo
81e9f21f83 Correct spelling mistakes
Signed-off-by: yuxiaobo <yuxiaobogo@163.com>
2019-11-06 20:25:19 +08:00
Kubernetes Prow Robot
1d1385af91
Merge pull request #83474 from msau42/topology-ga
CSI Topology ga
2019-11-04 15:28:27 -08:00
Michelle Au
fb6dfeb718 Convert attach-detach controller to use v1.CSINode 2019-10-28 13:41:13 -07:00
wojtekt
7b6bcdf780 Autogenerated code 2019-10-24 20:21:00 +02:00
Kubernetes Prow Robot
a5dc1fffb0
Merge pull request #83543 from yutedz/attach-resync-comment
Remove stale comment about resyncPeriod
2019-10-09 13:17:50 -07:00
Kubernetes Prow Robot
4b002b3baa
Merge pull request #82123 from xiaoanyunfei/cleanup/take-effect-stateofworld-hashmap
replace iteration with hashmap in *state_of_world
2019-10-09 02:17:50 -07:00
Ted Yu
f0a6aa1e9b Log error from AddIndexers in NewAttachDetachController 2019-10-07 16:15:25 -07:00
Ted Yu
b81242b62e Remove stale comment about resyncPeriod 2019-10-06 05:02:07 -07:00
sunxiaofei03
45d41ed9e5 replace iteration with hashmap in *state_of_world 2019-08-29 19:22:25 +08:00
Han Kang
59db3ac27e migrate controller-manager metrics to stability framework 2019-08-28 12:26:57 -07:00
Yassine TIJANI
7e4c3096fe move WaitForCacheSync to the sharedInformer package
Signed-off-by: Yassine TIJANI <ytijani@vmware.com>
2019-08-22 16:13:41 +01:00
Travis Rhoden
4574473753
Rename mount.NewOsExec to mount.NewOSExec 2019-08-09 12:30:56 -06:00
David Xia
fabfd950b1
cleanup: fix some log and error capitalizations
Part of https://github.com/kubernetes/kubernetes/issues/15863
2019-07-20 18:26:16 -04:00
Kubernetes Prow Robot
5777fdfe31
Merge pull request #78105 from cwdsuzhou/narrow_down_lock
Narrow down the lock
2019-06-14 04:08:23 -07:00
caiweidong
e39ec09975 Add error info for plugin do not support attachment 2019-05-23 18:32:49 +08:00
caiweidong
5a0e7f19b6 Narrow down the lock 2019-05-20 19:12:27 +08:00
Vladimir Vivien
cfafde983b Volume AttachablePlugin.CanAttach() now returns both bool and error 2019-04-08 16:53:22 -04:00
Xing Yang
000ab86788 Move CSIDriver Lister to the controller 2019-04-05 12:20:11 -07:00
Vladimir Vivien
3777514f83 Adds DeviceMountablePlugin.CanDeviceMount check when retrieving plugins 2019-03-28 10:39:32 -04:00
Kubernetes Prow Robot
4499275cb9
Merge pull request #72800 from stewart-yu/stewart-component-base
Move config local to every controller in KCM
2019-03-21 19:26:19 -07:00
Kubernetes Prow Robot
046dcbd1ed
Merge pull request #73917 from droslean/cleanup
replace loops with go idiomatic.
2019-03-19 19:01:04 -07:00
David Zhu
41b3579345 Address review comments 2019-03-07 17:17:09 -08:00