node.status.volumesInUse should report only attachable volumes, therefore
it needs to wait for the reconciler to update uncertain attachability of
volumes from the API server.
During CSI volume reconstruction it's not possible to tell, if the volume
is attachable or not - CSIDriver instance may not be available, because
kubelet may not have connection to the API server at that time.
Adding uncertain state during reconstruction + adding a correct state when
the API server is available.
reconstructVolume() is called when kubelet may not have connection to the
API server yet, therefore it cannot get CSIDriver instances to figure out
if a CSI volume is attachable or not.
Refactor reconstructVolume(), so it does not need
FindAttachablePluginBySpec for CSI volumes, because all of them are
deviceMountable (i.e. FindDeviceMountablePluginBySpec always returns the
CSI volume plugin).
Every component that uses a pod.Manager should use a stub interface
(like we do for podWorker) that explicitly describes what methods
they use. This will allow podWorker to implement the minimum set
of manager interfaces.
The two are not coupled except accidentally. Separate them and
update callsites. This will reduce the scope of PodManager interface
to make exposing the pod worker cleaner.
This allows us to return with a timeout error as soon as the
context is canceled. Previously in cases where the mount will
never succeed pods can get stuck deleting for 2 minutes.
In the Sync*Pod methods that call VolumeManager.WaitFor*, we
must filter out wait.Interrupted errors from being logged as
they are part of control flow, not runtime problems. Any
early interruption should result in exiting the Sync*Pod method
as quickly as possible without logging intermediate errors.
`findAndRemoveDeletedPods()` processes only pods from volume_manager cache: `dswp.desiredStateOfWorld.GetVolumesToMount()`. `podWorker` calls volume_manager `WaitForUnmount()` asynchronously. If it happens after populator cleaned up resources, an entry is added to `processedPods` and will never be seen. Let's cleanup such entries if they don't have a pod and marked for deletion.
Kubelet in standalone mode won't have kubeclient, it cannot get node.status
and get devices from it. Such a kubelet cannot mount attachable volumes
anyway.
DesiredStateOfWorld must remember both
- the effective SELinux label to apply as a mount option (non-empty for
RWOP volumes, empty otherwise)
- and the label that _would_ be used if the mount option would be used by
all access modes.
Mismatch warning metrics must be generated from the second label.
The path module has a few different functions:
Clean, Split, Join, Ext, Dir, Base, IsAbs. These functions do not
take into account the OS-specific path separator, meaning that they
won't behave as intended on Windows.
For example, Dir is supposed to return all but the last element of the
path. For the path "C:\some\dir\somewhere", it is supposed to return
"C:\some\dir\", however, it returns ".".
Instead of these functions, the ones in filepath should be used instead.
SELinux context discovered from Pod is not final, it can be cleared when a
volume plugin does not support SELinux or the volume is not
ReadWriteOncePod. Update the existing log line + add a new one for easier
debugging.
Add volume reconstruction logs to V(2) to see initial kubelet
ActualStateOfWorld after kubelet start. Kubelet logs SetUp / TearDown
events at V(2) already, so we can track the whole volume mount state in
V(2) logs.
To preserve fix in https://github.com/kubernetes/kubernetes/pull/110670,
add an unit test that check a volume is *uncertain* even after final mount
error when it was reconstructed.
And actually fix a regression introduced in the previous patch.
Move reconciler logic from reconstruct{new}.go to:
- reconciler.go - only the functionality used by the current (old)
reconciler.
- reconciler_new.go - only the functionality used by the new reconciler.
- reconciler_common.go - common functions.
Subsequent SELinux work (see http://kep.k8s.io/1710) will need
ActualStateOfWorld populated around the time kubelet starts mounting
volumes.
Therefore reconstruct volumes before starting reconciler, but do not depend
on the desired state of world populated nor node.status - both need a
working API server, which may not be available at that time.
All reconstructed volumes are marked as Uncertain and reconciler will sort
them out - call SetUp to ensure the volume is really mounted when a pod
needs the volume or call TearDown then there is no such pod.
Finish the reconstruction when the API server becomes available:
- Clean up volumes that failed reconstruction and are not needed.
- Update devicePath of reconstructed volumes from node.status. Make sure
not to overwrite devicePath that may have been updated when the volume
was mounted by reconcile().
Hiding all this rework behind SELinuxMountReadWriteOncePod FeatureGate,
just to make sure we have a way back if this commit is buggy.