The feature gate enables mounting with the -o context=XYZ mount option for all
volume types, not only ReadWriteOncePod.
All SELinux label tracking and error reporting infrastructure is already in
place from the SELinuxMountReadWriteOncePod feature gate; this is just a
trivial extension to all access modes.
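For illustration, a minimal sketch of how a pod's seLinuxOptions translate into the mount option value (plain Go, not kubelet's actual code; the type and helper below are illustrative):

```go
package main

import "fmt"

// SELinuxOptions mirrors the fields of a pod's securityContext.seLinuxOptions.
type SELinuxOptions struct {
	User, Role, Type, Level string
}

// mountOption formats the "-o context=..." value passed to the mount call,
// e.g. context="system_u:object_r:container_file_t:s0:c10,c20".
func mountOption(o SELinuxOptions) string {
	label := fmt.Sprintf("%s:%s:%s:%s", o.User, o.Role, o.Type, o.Level)
	return fmt.Sprintf("context=%q", label)
}

func main() {
	fmt.Println(mountOption(SELinuxOptions{
		User: "system_u", Role: "object_r", Type: "container_file_t", Level: "s0:c10,c20",
	}))
}
```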
This fixes a race condition that could happen when the resize controller
has just finished a volume expansion and has marked the PV, but not yet
the PVC.
The workaround proposed here should not be necessary once
RecoverVolumeExpansionFailure goes GA/beta.
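One way to picture the window being guarded against: the PV already records the new size while the PVC status still carries the old one. A sketch under that assumption, with a hypothetical helper name, using the real resource.Quantity API (this illustrates the race window, not necessarily the exact logic of the patch):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

// expansionInFlight (hypothetical helper): if the PV already records the new
// size but the PVC status still has the old one, the resize controller has
// marked the PV and not yet the PVC, so the mismatch is not an error.
func expansionInFlight(pvCapacity, pvcStatusCapacity resource.Quantity) bool {
	return pvCapacity.Cmp(pvcStatusCapacity) > 0
}

func main() {
	pv := resource.MustParse("10Gi")
	pvc := resource.MustParse("8Gi")
	fmt.Println(expansionInFlight(pv, pvc)) // true: expansion still being recorded
}
```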
Use a device-mountable volume to make it impossible to share the same global
mount with different SELinux contexts.
Also fix pod2Name to actually refer to pod2.
volume_manager_selinux_volume_context_mismatch_warnings_total should be
counted only once per volume + pod. The previous location is evaluated
periodically, so bump the metric only when a new pod is added to a volume.
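A minimal sketch of the dedup idea, with illustrative names (kubelet's real bookkeeping lives in the desired state of world, not a global map):

```go
package main

import "fmt"

type volumePodKey struct{ volumeName, podName string }

// mismatchSeen deduplicates warnings per volume+pod pair so the periodically
// evaluated reconciliation loop does not inflate the counter.
var mismatchSeen = map[volumePodKey]bool{}

func recordMismatch(volumeName, podName string) {
	key := volumePodKey{volumeName, podName}
	if mismatchSeen[key] {
		return // already counted for this volume+pod
	}
	mismatchSeen[key] = true
	// Here the real code bumps
	// volume_manager_selinux_volume_context_mismatch_warnings_total.
	fmt.Println("bump metric for", volumeName, podName)
}

func main() {
	recordMismatch("pvc-1", "pod-a")
	recordMismatch("pvc-1", "pod-a") // no-op, already counted
}
```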
Record the volume plugin name when a volume in a Pod needs a different
"mount -o context" value than the one actually mounted.
We expect that NFS, CIFS and CephFS volumes can be mounted just fine with
multiple "-o context" values, while we know that the block-volume based
filesystems (ext4, xfs, btrfs, ...) cannot do that.
Therefore we want to distinguish the volume plugin in metrics: anything
block-volume based could break an existing application.
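A sketch of the labeled counter using plain client_golang (kubelet actually registers its metrics through the k8s.io/component-base/metrics wrappers, and the label name here is an assumption):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// The counter carries a volume_plugin label so block-based plugins can be
// told apart from shared filesystems like NFS or CephFS.
var contextMismatch = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "volume_manager_selinux_volume_context_mismatch_warnings_total",
		Help: "Volumes whose SELinux context differs from the mounted one.",
	},
	[]string{"volume_plugin"},
)

func main() {
	prometheus.MustRegister(contextMismatch)
	contextMismatch.WithLabelValues("kubernetes.io/csi").Inc()
	fmt.Println("recorded mismatch for kubernetes.io/csi")
}
```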
PVC and containers shared the same ResourceRequirements struct to define their
API. When resource claims were added, that struct got extended, which
accidentally also changed the PVC API. To prevent such a mistake from happening
again, PVC now uses its own VolumeResourceRequirements struct.
The `Claims` field gets removed because the risk of breaking someone is low:
theoretically, YAML files which have a claims field for volumes now
get rejected when validated against the OpenAPI schema. Such files
have never made sense and should be fixed.
Code that uses the struct definitions needs to be updated.
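For illustration, a self-contained paraphrase of the split; ResourceList is simplified to a string map so the snippet compiles on its own, while the real VolumeResourceRequirements lives in k8s.io/api/core/v1 and keeps only Limits and Requests:

```go
package main

import "fmt"

// ResourceList stands in for v1.ResourceList (map[ResourceName]resource.Quantity).
type ResourceList map[string]string

// VolumeResourceRequirements paraphrases the new PVC-only struct: it keeps
// Limits and Requests but, unlike the container ResourceRequirements, has no
// Claims field.
type VolumeResourceRequirements struct {
	Limits   ResourceList
	Requests ResourceList
}

func main() {
	r := VolumeResourceRequirements{Requests: ResourceList{"storage": "10Gi"}}
	fmt.Println(r.Requests["storage"])
}
```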
node.status.volumesInUse should report only attachable volumes, so kubelet
needs to wait for the reconciler to resolve the uncertain attachability of
volumes via the API server.
During CSI volume reconstruction it's not possible to tell if the volume
is attachable or not: the CSIDriver instance may not be available, because
kubelet may not have a connection to the API server at that time.
Add an uncertain state during reconstruction and set the correct state once
the API server is available.
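A sketch of the tristate; the names are illustrative, not the exact kubelet identifiers:

```go
package main

import "fmt"

// Reconstruction records Uncertain; the reconciler upgrades the state once
// the API server answers.
type attachability int

const (
	attachabilityUncertain attachability = iota // set during reconstruction
	attachabilityTrue                           // CSIDriver requires attach
	attachabilityFalse                          // no attach needed
)

func (a attachability) String() string {
	return [...]string{"Uncertain", "Attachable", "NonAttachable"}[a]
}

func main() {
	state := attachabilityUncertain
	fmt.Println("after reconstruction:", state)
	// Once kubelet can reach the API server and read the CSIDriver object:
	state = attachabilityTrue
	fmt.Println("after API sync:", state)
}
```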
reconstructVolume() is called when kubelet may not have a connection to the
API server yet, therefore it cannot get CSIDriver instances to figure out
if a CSI volume is attachable or not.
Refactor reconstructVolume() so it does not need
FindAttachablePluginBySpec for CSI volumes, because all of them are
deviceMountable (i.e. FindDeviceMountablePluginBySpec always returns the
CSI volume plugin).
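A simplified sketch of the refactored flow, with stand-in functions instead of the real VolumePluginMgr lookups:

```go
package main

import "fmt"

type spec struct{ driver string }

// Stand-in for FindDeviceMountablePluginBySpec: every CSI volume plugin is
// deviceMountable, so this lookup succeeds for CSI specs even without API
// server connectivity.
func findDeviceMountablePluginBySpec(s spec) (string, bool) {
	return "kubernetes.io/csi", true
}

// reconstructVolume leaves attachability uncertain instead of deriving it
// from FindAttachablePluginBySpec, which would need a CSIDriver object from
// the API server.
func reconstructVolume(s spec) {
	plugin, ok := findDeviceMountablePluginBySpec(s)
	if !ok {
		fmt.Println("not a device-mountable volume")
		return
	}
	fmt.Printf("reconstructed with plugin %s, attachability: uncertain\n", plugin)
}

func main() {
	reconstructVolume(spec{driver: "ebs.csi.aws.com"})
}
```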
Every component that uses a pod.Manager should use a stub interface
(like we do for podWorker) that explicitly describes the methods
it uses. This will allow podWorker to implement the minimum set
of manager interfaces.
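A minimal sketch of the stub-interface pattern with a hypothetical consumer; GetPodByName mirrors one pod.Manager method, and the other names are illustrative:

```go
package main

import "fmt"

// Pod is a stand-in for the kubelet pod type.
type Pod struct{ Name string }

// podStateProvider declares only the methods this component actually calls,
// instead of depending on the full pod.Manager interface.
type podStateProvider interface {
	GetPodByName(namespace, name string) (*Pod, bool)
}

type evictionManager struct {
	pods podStateProvider // the full pod.Manager still satisfies this
}

func (m *evictionManager) describe(ns, name string) {
	if p, ok := m.pods.GetPodByName(ns, name); ok {
		fmt.Println("found pod", p.Name)
	}
}

type fakeManager struct{}

func (fakeManager) GetPodByName(ns, name string) (*Pod, bool) {
	return &Pod{Name: name}, true
}

func main() {
	m := &evictionManager{pods: fakeManager{}}
	m.describe("default", "web-0")
}
```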
The two are coupled only accidentally. Separate them and
update call sites. This will reduce the scope of the PodManager interface
and make exposing the pod worker cleaner.
This allows us to return with a timeout error as soon as the
context is canceled. Previously, in cases where the mount would
never succeed, pods could get stuck deleting for 2 minutes.
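A sketch of the context-aware wait (an illustrative helper, not the actual VolumeManager signature):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitForMount returns as soon as the caller's context is canceled (e.g. the
// pod is being deleted), instead of polling out a fixed 2-minute budget.
func waitForMount(ctx context.Context, mounted <-chan struct{}) error {
	select {
	case <-mounted:
		return nil
	case <-ctx.Done():
		return fmt.Errorf("timed out waiting for volume mount: %w", ctx.Err())
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	err := waitForMount(ctx, make(chan struct{})) // mount never succeeds
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true
}
```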
In the Sync*Pod methods that call VolumeManager.WaitFor*, we
must filter out wait.Interrupted errors from being logged as
they are part of control flow, not runtime problems. Any
early interruption should result in exiting the Sync*Pod method
as quickly as possible without logging intermediate errors.
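A sketch of the filtering, using the wait.Interrupted helper from k8s.io/apimachinery; the surrounding function is illustrative:

```go
package main

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/util/wait"
)

// syncStep sketches the check in the Sync*Pod methods: an interruption is
// control flow (the sync is being shut down), so it is returned without
// being logged as a runtime error.
func syncStep(waitErr error) error {
	if waitErr == nil {
		return nil
	}
	if wait.Interrupted(waitErr) {
		return waitErr // propagate quietly, do not log
	}
	fmt.Println("logging real error:", waitErr) // klog.ErrorS in kubelet
	return waitErr
}

func main() {
	_ = syncStep(wait.ErrorInterrupted(context.Canceled))  // silent
	_ = syncStep(fmt.Errorf("mount failed: io error"))     // logged
}
```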
`findAndRemoveDeletedPods()` processes only pods from the volume manager cache: `dswp.desiredStateOfWorld.GetVolumesToMount()`. `podWorker` calls the volume manager's `WaitForUnmount()` asynchronously; if that happens after the populator has cleaned up its resources, an entry is added to `processedPods` and is never seen again. Clean up such entries when they have no pod and are marked for deletion.
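A sketch of the cleanup with illustrative names (the real populator consults the desired state of world and the pod worker for these facts):

```go
package main

import "fmt"

type podKey string

// populator models the orphan problem: entries in processedPods whose pod is
// gone and marked for deletion would otherwise stay forever, because
// GetVolumesToMount never returns them again.
type populator struct {
	processedPods map[podKey]bool
	podExists     func(podKey) bool
	podDeleted    func(podKey) bool
}

func (p *populator) cleanupOrphanedEntries() {
	for key := range p.processedPods {
		if !p.podExists(key) && p.podDeleted(key) {
			delete(p.processedPods, key)
			fmt.Println("removed orphaned entry", key)
		}
	}
}

func main() {
	p := &populator{
		processedPods: map[podKey]bool{"ns/pod-a": true},
		podExists:     func(podKey) bool { return false },
		podDeleted:    func(podKey) bool { return true },
	}
	p.cleanupOrphanedEntries()
}
```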
Kubelet in standalone mode won't have a kubeclient, so it cannot get node.status
and read devices from it. Such a kubelet cannot mount attachable volumes
anyway.
DesiredStateOfWorld must remember both
- the effective SELinux label to apply as a mount option (non-empty for
  RWOP volumes, empty otherwise),
- and the label that _would_ be used if the mount option were applied to
  all access modes.
Mismatch warning metrics must be generated from the second label.
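A sketch of the two fields with illustrative names (the real struct is the desired state of world's per-volume record):

```go
package main

import "fmt"

// volumeToMount keeps two labels: one actually applied as "-o context", one
// used only for mismatch metrics.
type volumeToMount struct {
	// effectiveSELinuxMountLabel is passed as the mount option; non-empty
	// only for ReadWriteOncePod volumes while SELinuxMount is off.
	effectiveSELinuxMountLabel string
	// plannedSELinuxLabel is the label that would be used if "-o context"
	// applied to all access modes; mismatch warnings compare this one.
	plannedSELinuxLabel string
}

func main() {
	v := volumeToMount{
		effectiveSELinuxMountLabel: "", // RWX volume: no mount option today
		plannedSELinuxLabel:        "system_u:object_r:container_file_t:s0:c10,c20",
	}
	fmt.Printf("mount option label=%q, metrics label=%q\n",
		v.effectiveSELinuxMountLabel, v.plannedSELinuxLabel)
}
```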