Commit Graph

224 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
9c9a9b7c48 Merge pull request #124242 from carlory/cleanup-after-NewVolumeManagerReconstruction-ga
remove unneeded func SyncReconstructedVolume from ActualStateOfWorld
2024-04-18 03:24:50 -07:00
carlory
a6b2619274 remove unneeded func SyncReconstructedVolume from ActualStateOfWorld 2024-04-09 15:34:33 +08:00
宋文杰
420db6e82b delete 'TODO: move to reconstruct.go and remove old code there.' 2024-02-28 11:26:12 +08:00
Jan Safranek
2e92036576 Rename "new" reconstruction just to reconstruction
There is no "old" reconstruction, so remove "_new" from the file names and
function names.
2024-02-22 13:20:38 +01:00
Jan Safranek
2a2542289f Remove usage of NewVolumeManagerReconstruction feature gate
This removes a lot of code related to the "old" VolumeManager reconstruction.
2024-02-22 10:21:13 +01:00
Hemant Kumar
d190fa3e7d Fix race condition between external-resizer and kubelet
This fixes a race condition that can occur when the
resize controller has just finished volume expansion and has
marked the PV but has not yet marked the PVC.

The workaround proposed here should not be necessary once
RecoverVolumeExpansionFailure goes GA/beta.
2024-01-31 12:23:56 -05:00
Kubernetes Prow Robot
c633ea71ed Merge pull request #122211 from gnufied/fix-uncertain-raw-block-devices
Fix device uncertain errors on reboot
2023-12-15 15:42:40 +01:00
Kubernetes Prow Robot
26e2cc5299 Merge pull request #119923 from cvvz/fix-119921
fix: Mount point may become local without calling `NodePublishVolume` after node rebooting
2023-12-13 21:25:51 +01:00
Hemant Kumar
56dd5ab10f Add tests for checking of uncertain device paths 2023-12-11 17:15:16 -05:00
Hemant Kumar
ed0facacfa Fix device uncertain errors on reboot 2023-12-06 22:19:14 -05:00
weizhichen
b91f07008c add ut 2023-11-06 08:20:42 +00:00
cvvz
03126c5465 add comment 2023-08-29 10:46:31 +08:00
cvvz
94d03ccc83 Squashed commit of the following:
commit d623614de31fe411f1dcb1e784472135f3ca0c5e
Merge: 8054af3b303 91344b4008
Author: cvvz <ftdchenwz@gmail.com>
Date:   Mon Aug 28 18:43:49 2023 +0800

    Merge branch 'master' of https://github.com/kubernetes/kubernetes into fix-volumemanager-logs

commit 8054af3b303e10e7b74b1ba4d3c4035f488cbdad
Author: cvvz <ftdchenwz@gmail.com>
Date:   Fri Aug 25 22:03:08 2023 +0800

    fix

commit b414972831c4e4030162ee385d8f600e1e0257ac
Author: cvvz <ftdchenwz@gmail.com>
Date:   Fri Aug 25 21:41:36 2023 +0800

    fix

commit ebea00a8dd50eb3d8859a912b464bbda5548b1d4
Author: cvvz <ftdchenwz@gmail.com>
Date:   Fri Aug 25 20:54:40 2023 +0800

    123

commit 9f6f1dbbe717fa34e1c13fec645f4c474cbf99a0
Author: cvvz <ftdchenwz@gmail.com>
Date:   Fri Aug 25 20:53:16 2023 +0800

    add MarshalLog

commit d7d2878409343df937c770d6796f8c125e18ce7a
Author: cvvz <ftdchenwz@gmail.com>
Date:   Tue Aug 8 23:57:47 2023 +0800

    fix volumemanager logs
2023-08-28 18:44:40 +08:00
cvvz
56c241783e fix 2023-08-25 19:56:54 +08:00
cvvz
ab1f97bd6e fix 2023-08-25 19:55:56 +08:00
Patrick Ohly
2472291790 api: introduce separate VolumeResourceRequirements struct
PVC and containers shared the same ResourceRequirements struct to define their
API. When resource claims were added, that struct got extended, which
accidentally also changed the PVC API. To avoid such a mistake from happening
again, PVC now uses its own VolumeResourceRequirements struct.

The `Claims` field gets removed because the risk of breaking someone is low:
theoretically, YAML files which have a claims field for volumes now
get rejected when validating against the OpenAPI. Such files
have never made sense and should be fixed.

Code that uses the struct definitions needs to be updated.
2023-08-21 15:31:28 +02:00
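
The struct split described in 2472291790 can be illustrated with a minimal sketch. All type and function names below are hypothetical stand-ins written for this example, not the Kubernetes API definitions; the point is only that once the volume side has its own struct, a field added to the container side (here `claims`) no longer leaks into the volume API.

```go
package main

import "fmt"

// resourceList is a simplified stand-in for a resource name -> quantity map.
type resourceList map[string]string

// resourceClaim is a hypothetical placeholder for a container resource claim.
type resourceClaim struct{ name string }

// resourceRequirements is the container-side struct. It is free to grow.
type resourceRequirements struct {
	limits   resourceList
	requests resourceList
	claims   []resourceClaim // added later for containers only
}

// volumeResourceRequirements is the volume-side struct. It has no claims
// field, so container-only additions cannot change the volume API by accident.
type volumeResourceRequirements struct {
	limits   resourceList
	requests resourceList
}

// claimSpec is a hypothetical, heavily reduced volume claim spec.
type claimSpec struct {
	resources volumeResourceRequirements
}

// toVolumeRequirements shows the kind of update calling code needs:
// copy the shared fields and drop the container-only ones.
func toVolumeRequirements(r resourceRequirements) volumeResourceRequirements {
	return volumeResourceRequirements{limits: r.limits, requests: r.requests}
}

func main() {
	old := resourceRequirements{
		requests: resourceList{"storage": "1Gi"},
		claims:   []resourceClaim{{name: "ignored"}},
	}
	spec := claimSpec{resources: toVolumeRequirements(old)}
	fmt.Println(spec.resources.requests["storage"]) // prints "1Gi"
}
```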
cvvz
e40d00cf53 fix: 119921 2023-08-13 15:52:25 +08:00
Hemant Kumar
e011187114 Update code to use new generic allocatedResourceStatus field 2023-07-17 15:30:35 -04:00
Jan Safranek
354b6c409f Rename updateReconstructedFromAPIServer
to be in sync with volumesNeedUpdateFromNodeStatus.
2023-07-11 11:25:43 +02:00
Jan Safranek
1903f5aa2a Rename volumesNeedDevicePath
To volumesNeedUpdateFromNodeStatus - because both devicePath and uncertain
attach-ability need to be fixed from node status.
2023-07-11 11:15:24 +02:00
Jan Safranek
7cd60df4aa Update volumesInUse after attachability is confirmed
node.status.volumesInUse should report only attachable volumes, therefore
it needs to wait for the reconciler to update uncertain attachability of
volumes from the API server.
2023-07-11 10:32:22 +02:00
Jan Safranek
0a2272dc68 Add uncertain state of volume attach-ability
During CSI volume reconstruction it's not possible to tell whether the volume
is attachable or not - the CSIDriver instance may not be available, because
kubelet may not have a connection to the API server at that time.

Adding uncertain state during reconstruction + adding a correct state when
the API server is available.
2023-07-11 10:32:22 +02:00
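
The idea behind 0a2272dc68 and 7cd60df4aa can be sketched as a three-valued state. This is a minimal illustration under assumptions: the names `attachability`, `reconstruct`, `confirm` and `reportable` are invented for this example and are not the kubelet's identifiers. A reconstructed volume starts as uncertain, is resolved once authoritative information is available, and is reported only after it is confirmed attachable.

```go
package main

import "fmt"

// attachability is a hypothetical tri-state used only in this sketch.
type attachability int

const (
	attachabilityUncertain attachability = iota // not known yet
	attachabilityTrue
	attachabilityFalse
)

type volume struct {
	name       string
	attachable attachability
}

// reconstruct records a volume found on disk. Without an API server
// connection, whether it is attachable cannot be decided, so it is uncertain.
func reconstruct(name string) volume {
	return volume{name: name, attachable: attachabilityUncertain}
}

// confirm resolves an uncertain volume once authoritative information is
// available. A state that is already known is never overwritten.
func (v *volume) confirm(isAttachable bool) {
	if v.attachable != attachabilityUncertain {
		return
	}
	if isAttachable {
		v.attachable = attachabilityTrue
	} else {
		v.attachable = attachabilityFalse
	}
}

// reportable returns the names of volumes confirmed as attachable.
// Uncertain volumes are held back until they are resolved.
func reportable(vols []volume) []string {
	var names []string
	for _, v := range vols {
		if v.attachable == attachabilityTrue {
			names = append(names, v.name)
		}
	}
	return names
}

func main() {
	v := reconstruct("pv-1")
	fmt.Println(len(reportable([]volume{v}))) // prints 0: still uncertain
	v.confirm(true)
	fmt.Println(reportable([]volume{v})) // prints [pv-1]
}
```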
Jan Safranek
45aa59946a Refactor FindAttachablePluginBySpec out of CSI code path
reconstructVolume() is called when kubelet may not have connection to the
API server yet, therefore it cannot get CSIDriver instances to figure out
if a CSI volume is attachable or not.

Refactor reconstructVolume(), so it does not need
FindAttachablePluginBySpec for CSI volumes, because all of them are
deviceMountable (i.e. FindDeviceMountablePluginBySpec always returns the
CSI volume plugin).
2023-06-23 12:28:15 +02:00
Kubernetes Prow Robot
4893c66a48 Merge pull request #116134 from cvvz/fix-111933
fix: After a Node goes down and takes some time to come back up, the mount points of the evicted Pods cannot be cleaned up successfully.
2023-04-11 15:35:41 -07:00
Paco Xu
5134520a3b add lock in volume manager reconciler to avoid data race
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2023-03-17 21:29:10 +08:00
Kubernetes Prow Robot
49649c89ea Merge pull request #113584 from yangjunmyfm192085/volume-contextual-logging
volume: use contextual logging
2023-03-14 10:40:16 -07:00
Kubernetes Prow Robot
aa49f001bc Merge pull request #114701 from goushicui/vlm
update comment
2023-03-14 09:38:53 -07:00
Jan Safranek
c4f8c3f628 Fix volume reconstruction in standalone mode
Kubelet in standalone mode won't have a kubeclient, so it cannot get node.status
and read devices from it. Such a kubelet cannot mount attachable volumes
anyway.
2023-03-14 12:32:21 +01:00
杨军10092085
361e4ff0fa volume: use contextual logging 2023-03-14 08:37:30 +08:00
weizhichen
a6ffbb41f8 Squashed commit of the following:
commit 1b3ae27e7af577372d5aaaf28ea401eb33d1c4df
Author: weizhichen <weizhichen@microsoft.com>
Date:   Thu Mar 9 08:39:04 2023 +0000

    fix

commit 566e139308e3cec4c9d4765eb4ccc3a735346c2e
Author: weizhichen <weizhichen@microsoft.com>
Date:   Thu Mar 9 08:36:32 2023 +0000

    fix unit test

commit 13a58ebd25b824dcf854a132e9ac474c8296f0bf
Author: weizhichen <weizhichen@microsoft.com>
Date:   Thu Mar 2 03:32:39 2023 +0000

    add unit test

commit c984e36e37c41bbef8aec46fe3fe81ab1c6a2521
Author: weizhichen <weizhichen@microsoft.com>
Date:   Tue Feb 28 15:25:56 2023 +0000

    fix imports

commit 58ec617e0ff1fbd209ca0af3237017679c3c0ad7
Author: weizhichen <weizhichen@microsoft.com>
Date:   Tue Feb 28 15:24:21 2023 +0000

    delete CheckVolumeExistenceOperation

commit 0d8cf0caa78bdf1f1f84ce011c4cc0e0de0e8707
Author: weizhichen <weizhichen@microsoft.com>
Date:   Tue Feb 28 14:29:37 2023 +0000

    fix 111933
2023-03-09 09:53:38 +00:00
Kubernetes Prow Robot
2c8f63f693 Merge pull request #115268 from jsafrane/split-reconstruction
Split volume reconstruction refactoring from SELinuxMountReadWriteOncePod
2023-03-07 10:44:34 -08:00
Jan Safranek
9ca548fcf0 Add metrics for force cleaned mounts after failed reconstruction
Count nr. of force cleaned mounts + their failures after a volume fails
reconstruction.
2023-03-06 17:48:59 +01:00
Jan Safranek
bd73aee9db Add volume reconstruction metrics
Count nr. of volumes that kubelet tried to reconstruct + reconstruction
errors.
2023-02-22 13:01:26 +01:00
Jan Safranek
cca3d557e6 Split volume reconstruction refactoring from SELinuxMountReadWriteOncePod
Add a new feature gate NewVolumeManagerReconstruction and add the new
volume reconstruction done in 1.26 under that feature gate.
2023-01-23 14:43:29 +01:00
goushicui
6e0832a5aa update comment 2022-12-27 00:09:59 +08:00
Jordan Liggitt
78cb3862f1 Fix indentation/spacing in comments to render correctly in godoc 2022-12-17 23:27:38 -05:00
Kubernetes Prow Robot
a668924cb6 Merge pull request #113255 from claudiubelu/path-filepath-update-kubelet
Replaces path.Operation with filepath.Operation (kubelet)
2022-12-09 22:27:41 -08:00
Claudiu Belu
b9bf3e5c49 Replaces path.Operation with filepath.Operation (kubelet)
The path module has a few different functions:
Clean, Split, Join, Ext, Dir, Base, IsAbs. These functions do not
take into account the OS-specific path separator, meaning that they
won't behave as intended on Windows.

For example, Dir is supposed to return all but the last element of the
path. For the path "C:\some\dir\somewhere", it is supposed to return
"C:\some\dir"; however, it returns ".".

The equivalent functions in filepath should be used instead.
2022-11-08 16:05:48 +00:00
Jan Safranek
e575e60ea4 Reconstruct SELinux mount option
When reconstructing volumes from disk after kubelet restart, reconstruct
also context=XYZ mount option and add it to the ActualStateOfWorld.
2022-11-08 11:17:38 +01:00
Jan Safranek
9a98f7318b Increase verbosity of volume reconstruction messages
Add volume reconstruction logs to V(2) to see initial kubelet
ActualStateOfWorld after kubelet start. Kubelet logs SetUp / TearDown
events at V(2) already, so we can track the whole volume mount state in
V(2) logs.
2022-11-07 11:05:27 +01:00
Jan Safranek
286e19c460 Add node name parameter
Add nodeName to MarkVolumeAsAttached. MarkVolumeAsAttached implementation
in kubelet does not use the parameter, but it could do that in the future.
2022-11-07 10:50:23 +01:00
Jan Safranek
20c5cc0a39 Add unit test for failed mount after reconstruction
To preserve the fix in https://github.com/kubernetes/kubernetes/pull/110670,
add a unit test that checks that a volume is *uncertain* even after a final
mount error when it was reconstructed.

And actually fix a regression introduced in the previous patch.
2022-11-04 12:25:21 +01:00
Jan Safranek
6d810f2cd4 Add unit tests 2022-11-03 17:54:58 +01:00
Jan Safranek
3a79466ddd Reshuffle functions between reconstruct and reconstruct_common
Move common functions to reconstruct_common.go and functions used only for
the current (old) reconstruction to reconstruct.go
2022-11-03 16:55:13 +01:00
Jan Safranek
44b72d0348 Move new reconciler logic into reconciler_new.go
Move reconciler logic from reconstruct{new}.go to:
- reconciler.go - only the functionality used by the current (old)
  reconciler.
- reconciler_new.go - only the functionality used by the new reconciler.
- reconciler_common.go - common functions.
2022-11-03 16:55:13 +01:00
Jan Safranek
fc245b339b Refactor ConstructVolumeSpec
Return a struct from ConstructVolumeSpec to be able to add more fields to
it later.
2022-11-03 16:55:13 +01:00
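
The motivation given in fc245b339b, returning a struct so that fields can be added later, can be sketched as follows. Every name here is a hypothetical stand-in chosen for this example, and the extra field is only a plausible illustration of a later addition, not a statement about the real signature.

```go
package main

import "fmt"

// volumeSpec is a simplified stand-in for a reconstructed volume spec.
type volumeSpec struct {
	name string
}

// reconstructedVolume bundles everything the reconstruction returns.
// New fields can be added here without touching any caller's signature.
type reconstructedVolume struct {
	spec volumeSpec
	// seLinuxMountContext is an example of a field added later.
	seLinuxMountContext string
}

// constructVolumeSpec is a hypothetical reduction of the refactored function:
// it returns one struct plus an error instead of a growing list of values.
func constructVolumeSpec(volumeName, mountPath string) (reconstructedVolume, error) {
	if volumeName == "" {
		return reconstructedVolume{}, fmt.Errorf("volume name is empty for mount path %q", mountPath)
	}
	return reconstructedVolume{spec: volumeSpec{name: volumeName}}, nil
}

func main() {
	rv, err := constructVolumeSpec("data", "/var/lib/example/data")
	if err != nil {
		fmt.Println("reconstruction failed:", err)
		return
	}
	fmt.Println(rv.spec.name) // prints "data"
}
```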
Jan Safranek
e0f3e5c457 Rework volume reconstruction
Subsequent SELinux work (see http://kep.k8s.io/1710) will need
ActualStateOfWorld populated around the time kubelet starts mounting
volumes.

Therefore reconstruct volumes before starting reconciler, but do not depend
on the desired state of world populated nor node.status - both need a
working API server, which may not be available at that time.

All reconstructed volumes are marked as Uncertain and reconciler will sort
them out - call SetUp to ensure the volume is really mounted when a pod
needs the volume or call TearDown when there is no such pod.

Finish the reconstruction when the API server becomes available:
- Clean up volumes that failed reconstruction and are not needed.

- Update devicePath of reconstructed volumes from node.status. Make sure
  not to overwrite devicePath that may have been updated when the volume
  was mounted by reconcile().

Hiding all this rework behind SELinuxMountReadWriteOncePod FeatureGate,
just to make sure we have a way back if this commit is buggy.
2022-11-03 16:55:12 +01:00
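
The flow described in e0f3e5c457 can be sketched in a few functions. This is a simplified model under assumptions, not the kubelet code: the types, the map-based state, and the function names are invented for this example, and SetUp/TearDown are reduced to a state change and a map deletion.

```go
package main

import "fmt"

// mountState is a hypothetical two-value state used only in this sketch.
type mountState int

const (
	stateUncertain mountState = iota // found on disk, not yet verified
	stateMounted                     // SetUp succeeded
)

type actualVolume struct {
	state      mountState
	devicePath string
}

// actualState stands in for the ActualStateOfWorld, keyed by volume name.
type actualState map[string]*actualVolume

// reconstruct adds every volume found on disk as uncertain.
// It needs neither the desired state nor node.status.
func reconstruct(foundOnDisk []string) actualState {
	asw := actualState{}
	for _, name := range foundOnDisk {
		asw[name] = &actualVolume{state: stateUncertain}
	}
	return asw
}

// reconcile resolves uncertain volumes. desired maps the volumes that some
// pod needs to the device path observed while mounting them.
func reconcile(asw actualState, desired map[string]string) {
	for name, vol := range asw {
		if vol.state != stateUncertain {
			continue
		}
		devicePath, wanted := desired[name]
		if !wanted {
			delete(asw, name) // stands in for TearDown
			continue
		}
		vol.state = stateMounted // stands in for SetUp
		vol.devicePath = devicePath
	}
}

// updateDevicePaths applies device paths from node status, but only to
// volumes that are still uncertain, so a path set during mounting survives.
func updateDevicePaths(asw actualState, nodeStatus map[string]string) {
	for name, devicePath := range nodeStatus {
		if vol, ok := asw[name]; ok && vol.state == stateUncertain {
			vol.devicePath = devicePath
		}
	}
}

func main() {
	asw := reconstruct([]string{"vol-a", "vol-b"})
	reconcile(asw, map[string]string{"vol-a": "/dev/sdb"})
	updateDevicePaths(asw, map[string]string{"vol-a": "/dev/stale"})
	fmt.Println(len(asw), asw["vol-a"].devicePath) // prints "1 /dev/sdb"
}
```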
Jan Safranek
989e391d08 Move all volume reconstruction code into separate files
There is no code change, just moving code around and preparing for the
subsequent commit.
2022-11-02 15:58:21 +01:00
Jan Safranek
d37808faae Report error on a pod startup on SELinux mismatch
When a volume is already mounted with an unexpected SELinux label,
kubelet must unmount it first and then mount it back with the expected one.
Report an error to user, just in case the unmount takes too long.

In theory, this error should not happen too often, because two Pods with
different SELinux labels will not enter the Desired State of World, see
dsw.AddPodToVolume. DSW and ASW SELinux labels can differ only when
a volume has been deleted from DSW (= Pod was deleted) or a volume was
reconstructed after kubelet restart. In both cases, volume manager should
unmount the volume quickly.
2022-10-31 13:59:23 +01:00
Jan Safranek
a910d83070 Reduce log noise on SELinux mount mismatch
The Desired State of World can require a different SELinux mount context than
is in the Actual State of World and it's perfectly OK. For example when
user changes SELinux context of Pods or when the context is reconstructed
after kubelet restart.

Don't spam log and don't report errors to the user as event - reconciler
will do the right thing and unmount the old volume (with wrong context) and
mount a new one in the next reconciliation. It's not an error, it's
expected workflow.
2022-10-27 18:00:42 +02:00