Commit Graph

323 Commits

Author SHA1 Message Date
Matthew Cary
e5d387c5d6 Upgrade CSIMigrationGCE feature gate to GA
Change-Id: I620bc4913765c0d6562eb1008216a72e8b0a2970
2022-08-02 09:14:27 -07:00
Davanum Srinivas
a9593d634c
Generate and format files
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-07-26 13:14:05 -04:00
Wojciech Tyczyński
13e4f2b554 Clean shutdown of volume integration tests 2022-07-14 11:25:57 +02:00
Jan Safranek
3b94ac228a Don't force detach volume from healthy nodes
6 minute force-deatch timeout should be used only for nodes that are not
healthy. 

In case a CSI driver is being upgraded or it's simply slow, NodeUnstage
can take more than 6 minutes. In that case, Pod is already deleted from the
API server and thus A/D controller will force-detach a mounted volume,
possibly corrupting the volume and breaking CSI - a CSI driver expects
NodeUnstage to succeed before Kubernetes can call ControllerUnpublish.
2022-06-24 12:51:41 +02:00
Jiawei Wang
760365d5c9 CSIMigration feature gate to GA 2022-06-06 21:19:19 +00:00
Hemant Kumar
a99466ca86 check existing size before querying new size from api-server 2022-03-28 11:32:49 -04:00
Hemant Kumar
ed217f4140 rename SetVolumeSize to InitializeVolumeSize 2022-03-28 11:32:49 -04:00
Hemant Kumar
7a43406138 Do not update PVC if it already has updated size 2022-03-28 11:32:49 -04:00
Hemant Kumar
e4f62d6c41 Modify code to use new interface functions 2022-03-28 11:32:49 -04:00
Ashutosh Kumar
c00975370a
Handle Non-graceful Node Shutdown (#108486)
Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com>

Co-authored-by: Ashutosh Kumar <sonasingh46@gmail.com>

Co-authored-by: xing-yang <xingyang105@gmail.com>
2022-03-26 09:23:21 -07:00
Kubernetes Prow Robot
66daef4aa7
Merge pull request #108167 from jfremy/fix-107973
Fix nodes volumesAttached status not being updated
2022-03-01 12:49:54 -08:00
Kubernetes Prow Robot
06e107081e
Merge pull request #104732 from mengjiao-liu/remove-flag-experimental-check-node-capabilities-before-mount
kubelet: Remove the deprecated flag `--experimental-check-node-capabilities-before-mount`
2022-02-24 07:56:30 -08:00
Jean-Francois Remy
e83184568d Add unit tests
- actual_state_of_world_test.go: test the new method GetVolumesToReportAttachedForNode
  for an existing node and a non-existing node
- node_status_updater_test.go: test UpdateNodeStatuses and UpdateNodeStatuses in nominal
  case with 2 nodes getting one volume each. Test UpdateNodeStatuses with the first call
  to node.patch failing but the following one succeeding
- add comment in node_status_updater.go
- fix log line in reconciler.go
- rename variable in actual_state_of_world.go
2022-02-22 12:21:58 -08:00
Jean-Francois Remy
f1717baaaa Fix nodes volumesAttached status not updated
The UpdateNodeStatuses code stops too early in case there is
an error when calling updateNodeStatus. It will return immediately
which means any remaining node won't have its update status put back
to true.

Looking at the call sites for UpdateNodeStatuses, it appears this is
not the only issue. If the lister call fails with anything but a Not Found
error, it's silently ignored which is wrong in the detach path.
Also the reconciler detach path calls UpdateNodeStatuses but the real intent
is to only update the node currently processed in the loop and not proceed
with the detach call if there is an error updating that specifi node volumesAttached
property. With the current implementation, it will not proceed if there is
an error updating another node (which is not completely bad but not ideal) and
worse it will proceed if there is a lister error on that node which means the
node volumesAttached property won't have been updated.

To fix those issues, introduce the following changes:
- [node_status_updater] introduce UpdateNodeStatusForNode which does what
  UpdateNodeStatuses does but only for the provided node
- [node_status_updater] if the node lister call fails for anything but a Not
  Found error, we will return an error, not ignore it
- [node_status_updater] if the update of a node volumesAttached properties fails
  we continue processing the other nodes
- [actual_state_of_world] introduce GetVolumesToReportAttachedForNode which
  does what GetVolumesToReportAttached but for the node whose name is provided
  it returns a bool which indicates if the node in question needs an update as
  well as the volumesAttached list. It is used by UpdateNodeStatusForNode
- [actual_state_of_world] use write lock in updateNodeStatusUpdateNeeded, we're
  modifying the map content
- [reconciler] use UpdateNodeStatusForNode in the detach loop
2022-02-22 12:20:53 -08:00
yujunwang
8f96600907 perf:logic-optimiz-for-DetermineVolumeAction 2022-01-22 23:45:29 +08:00
Patrick Ohly
9eaa2dc554 avoid klog Info calls without verbosity
In the following code pattern, the log message will get logged with v=0 in JSON
output although conceptually it has a higher verbosity:

   if klog.V(5).Enabled() {
       klog.Info("hello world")
   }

Having the actual verbosity in the JSON output is relevant, for example for
filtering out only the important info messages. The solution is to use
klog.V(5).Info or something similar.

Whether the outer if is necessary at all depends on how complex the parameters
are. The return value of klog.V can be captured in a variable and be used
multiple times to avoid the overhead for that function call and to avoid
repeating the verbosity level.
2022-01-12 07:48:36 +01:00
Mengjiao Liu
beda4cafb6 kubelet: Remove the deprecated flag --experimental-check-node-capabilities-before-mount 2022-01-06 11:47:11 +08:00
Kubernetes Prow Robot
f0dbc32ed9
Merge pull request #106853 from gnufied/disable-exp-backoff-volume-not-inuse
When volume is not marked in-use, do not backoff
2021-12-22 19:46:37 -08:00
Hemant Kumar
7989f27044 use node informer to check volumes attachment status before backoff
fix unit tests
2021-12-20 11:57:05 -05:00
Davanum Srinivas
9405e9b55e
Check in OWNERS modified by update-yamlfmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-09 21:31:26 -05:00
Léiyì Zhang
275fdf0884 fixing unit test failures induced by turning on CSIMigrationGCE
disable CSIMigrationGCE in some unit tests
2021-11-16 19:26:30 +00:00
Neha Lohia
fa1b6765d5
move pkg/util/node to component-helpers/node/util (#105347)
Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com>
2021-11-12 07:52:27 -08:00
Patrick Ohly
a8c930ef46 generic ephemeral volume: graduation to GA
The feature gate gets locked to "true", with the goal to remove it in two
releases.

All code now can assume that the feature is enabled. Tests for "feature
disabled" are no longer needed and get removed.

Some code wasn't using the new helper functions yet. That gets changed while
touching those lines.
2021-10-11 20:54:20 +02:00
Kubernetes Prow Robot
b0eac84937
Merge pull request #105345 from pohly/generic-ephemeral-volume-util
generic ephemeral volume util, base code and controller
2021-10-07 08:19:47 -07:00
Patrick Ohly
4ae0eecb34 controller: use generic ephemeral volume helper functions
The name concatenation and ownership check were originally considered small
enough to not warrant dedicated functions, but the intent of the code is more
readable with them.

There also was a missing owner check in the attach controller.
2021-10-06 14:01:44 +02:00
Kubernetes Prow Robot
debd6c1e9e
Merge pull request #104526 from jingxu97/aug/volumeattach
Fix issue in node status updating VolumeAttached list
2021-10-05 17:30:32 -07:00
Jing Xu
69b9f9b1f0 Fix issue in node status updating VolumeAttached list
During volume detach, the following might happen in reconciler

1. Pod is deleting
2. remove volume from reportedAsAttached, so node status updater will
update volumeAttached list
3. detach failed due to some issue
4. volume is added back in reportedAsAttached
5. reconciler loops again the volume, remove volume from
reportedAsAttached
6. detach will not be trigged because exponential back off, detach call
will fail with exponential backoff error
7. another pod is added which using the same volume on the same node
8. reconciler loops and it will NOT try to tigger detach anymore

At this point, volume is still attached and in actual state, but
volumeAttached list in node status does not has this volume anymore, and
will block volume mount from kubelet.

The fix in first round is to add volume back into the volume list that
need to reported as attached at step 6 when detach call failed with
error (exponentical backoff). However this might has some performance
issue if detach fail for a while. During this time, volume will be keep
removing/adding back to node status which will cause a surge of API
calls.

So we changed to logic to check first whether operation is safe to retry which
means no pending operation or it is not in exponentical backoff time
period before calling detach. This way we can avoid keep removing/adding
volume from node status.

Change-Id: I5d4e760c880d72937d34b9d3e904ecad125f802e
2021-10-05 09:44:35 -07:00
Stephen Augustus
481cf6fbe7
generated: Run hack/update-gofmt.sh
Signed-off-by: Stephen Augustus <foo@auggie.dev>
2021-08-24 15:47:49 -04:00
Cheng Xing
0e315355df Pass FsGroup to MountDevice 2021-07-03 16:29:42 -07:00
Jordan Liggitt
ca279bbcc1 Fix race in attachdetach tests 2021-06-04 01:59:32 -04:00
yuzhiquan
0b8dc56408 fix volume failing test 2021-06-04 09:45:21 +08:00
Tim Ebert
cd3709232f
Fix VolumeAttachment garbage collection for migrated PVs 2021-05-28 08:35:05 +02:00
Jiawei Wang
be583070d2 Use CSI driver to determine unique name for migrated in-tree plugins 2021-05-06 10:31:30 -07:00
Kubernetes Prow Robot
fe88bdc1ab
Merge pull request #101304 from wangyx1992/capatial-log-controller
cleanup: fix errors in wrapped format and log capitalization in controller
2021-04-22 15:55:52 -07:00
wangyx1992
fd51e654af cleanup: fix errors in wrapped format and log capitalization in controller
Signed-off-by: wangyx1992 <wang.yixiang@zte.com.cn>
2021-04-22 15:40:54 +08:00
andyzhangx
e10d3948f5 fix: azure file namespace issue in csi translation
fix build failure

fix comments
2021-04-20 07:23:09 +00:00
Kubernetes Prow Robot
df9ad4d7d2
Merge pull request #96094 from Hellcatlk/m
Some comments' typos
2021-04-16 11:54:22 -07:00
Kubernetes Prow Robot
410d092d8a
Merge pull request #99643 from pohly/generic-ephemeral-volume-beta
generic ephemeral volume beta
2021-03-09 17:39:26 -08:00
Patrick Ohly
555d4a12bf generic ephemeral volumes: drop ReadOnly field
As discussed during the alpha review, the ReadOnly field is not really
needed because volume mounts can also be read-only. It's a historical
oddity that can be avoided for generic ephemeral volumes as part
of the promotion to beta.
2021-03-09 08:22:48 +01:00
Jan Safranek
219cbc818a Refactor CSI migration plugin manager to get featureGates as a parameter
This allows caller to provide fake ones for testing of various corner cases
(migration on A/D controller disabled while enabled on kubelet).
2021-03-08 13:50:01 +01:00
Benjamin Elder
56e092e382 hack/update-bazel.sh 2021-02-28 15:17:29 -08:00
caodonghui
f435e24403 Remove deadcode 2021-02-23 17:58:47 +08:00
Kubernetes Prow Robot
8bf42039e6
Merge pull request #96552 from pandaamanda/klog_fmt
use klog.Info and klog.Warning when had no format
2021-01-15 17:57:43 -08:00
Kubernetes Prow Robot
07bd985724
Merge pull request #96561 from ialidzhikov/cleanup/csi-node-info
Remove CSINodeInfo feature gate
2021-01-05 11:46:00 -08:00
Jayasekhar Konduru
9b2b73600d Recover CSI volumes from dangling attachments
Change-Id: I72105d67d8a4069ab19bfa4638a7ac365cf4194c
2020-12-11 18:31:53 -08:00
ialidzhikov
bc432124a2 Remove CSINodeInfo feature gate
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2020-12-10 09:58:22 +02:00
Kubernetes Prow Robot
ce7ac8442e
Merge pull request #94599 from verult/adc-op-asw-race
Fixes Attach Detach Controller reconciler race reading ActualStateOfWorld and operation pending states
2020-12-08 16:28:53 -08:00
xiongzhongliang
90f4aeeea4 use klog.Info and klog.Warning when had no format 2020-11-14 00:55:06 +08:00
Shihang Zhang
d2859cd89b plumb service account token down to csi driver 2020-11-12 09:26:43 -08:00
zouyu
7dd4622c84 Some comments' typos
Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>
2020-11-02 15:05:23 +08:00