This PR fixes issue #32727.
When an attach operation fails, it is still possible that the volume
will be attached to the node later. This PR adds logic to record the
volume-to-node mapping with an attached state regardless of whether the
operation succeeded. If the operation fails, the attached state is
marked false; if it succeeds, the state is marked true. The reconciler
keeps issuing attach operations until one returns successfully. If the
pod is removed in the meantime, the reconciler issues detach operations
for all of its volumes regardless of their attached state.
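The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the type `actualState`, its methods, and the error value `errAttachFailed` are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// errAttachFailed is a hypothetical error used only in this example.
var errAttachFailed = errors.New("attach timed out")

// attachKey identifies a volume/node pair (hypothetical type).
type attachKey struct {
	volume string
	node   string
}

// actualState records every pair for which an attach was attempted,
// together with whether the attach is confirmed.
type actualState struct {
	attached map[attachKey]bool
}

func newActualState() *actualState {
	return &actualState{attached: make(map[attachKey]bool)}
}

// recordAttach stores the pair regardless of the outcome: true when the
// operation succeeded, false when it failed.
func (s *actualState) recordAttach(volume, node string, attachErr error) {
	s.attached[attachKey{volume: volume, node: node}] = attachErr == nil
}

// needsAttach reports whether the reconciler should issue another attach.
func (s *actualState) needsAttach(volume, node string) bool {
	return !s.attached[attachKey{volume: volume, node: node}]
}

// volumesToDetach lists every recorded volume for the node, whatever its
// attached state, because a failed attach may still complete later.
func (s *actualState) volumesToDetach(node string) []string {
	var out []string
	for k := range s.attached {
		if k.node == node {
			out = append(out, k.volume)
		}
	}
	return out
}

func main() {
	s := newActualState()
	s.recordAttach("vol-1", "node-a", errAttachFailed)
	fmt.Println("retry attach:", s.needsAttach("vol-1", "node-a"))
	fmt.Println("detach on pod removal:", s.volumesToDetach("node-a"))
}
```

The point of the design is that a failed attach still leaves a record behind, so a later detach has something to act on.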
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has an explicit InitFlags(), so we add the flags as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicitly calling InitFlags in their init() functions
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
Users must not be allowed to step outside the volume with subPath.
Therefore the final subPath directory must be "locked" somehow
and checked to confirm that it is inside the volume.
On Windows, we lock the directories. On Linux, we bind-mount the final
subPath into /var/lib/kubelet/pods/<uid>/volume-subpaths/<container name>/<subPathName>;
the user cannot change it to a symlink once it is bind-mounted.
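The "is it inside the volume" check can be sketched as below. This covers only the containment test, not the locking or bind-mounting that makes the result stay valid; the function names `pathWithin` and `subPathInsideVolume` are hypothetical and are not taken from this change.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// pathWithin reports whether target lies inside base. Both paths are
// assumed to be absolute and already free of symlinks.
func pathWithin(base, target string) bool {
	rel, err := filepath.Rel(base, target)
	if err != nil {
		return false
	}
	// A relative path equal to ".." or starting with "../" climbs out of base.
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// subPathInsideVolume resolves symlinks on both paths and then applies the
// lexical containment check. On its own this check is racy: the path can be
// swapped for a symlink after the check, which is why it has to be combined
// with locking or a bind mount.
func subPathInsideVolume(volumePath, subPath string) (bool, error) {
	resolvedVolume, err := filepath.EvalSymlinks(volumePath)
	if err != nil {
		return false, err
	}
	resolvedSub, err := filepath.EvalSymlinks(filepath.Join(volumePath, subPath))
	if err != nil {
		return false, err
	}
	return pathWithin(resolvedVolume, resolvedSub), nil
}

func main() {
	fmt.Println(pathWithin("/vol", "/vol/data/logs"))
	fmt.Println(pathWithin("/vol", "/vol/../etc"))
}
```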
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Redesign and implement volume reconstruction work
This PR is the first part of the redesign of the volume reconstruction work. The detailed design is described in https://github.com/kubernetes/community/pull/1601
The changes include
1. Remove dependency on volume spec stored in actual state for volume
cleanup process (UnmountVolume and UnmountDevice)
Modify AttachedVolume struct to add DeviceMountPath so that volume
unmount operation can use this information instead of constructing from
volume spec
2. Modify reconciler's volume reconstruction process (syncState). In the current
workflow, when kubelet restarts, syncState() is called only once, before the
reconciler starts its loop.
a. If volume plugin supports reconstruction, it will use the
reconstructed volume spec information to update actual state as before.
b. If volume plugin cannot support reconstruction, it will use the
scanned mount path information to clean up the mounts.
In this PR, all the plugins still support reconstruction (except
glusterfs), so reconstruction of some plugins will still have issues.
The next PR will modify those plugins that cannot support reconstruction
well.
This PR addresses issue #52683
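As a rough illustration of the two branches (a) and (b) above, the sketch below routes each scanned mount either into the actual state or into cleanup. All type, field, and function names are hypothetical, and details such as consulting the desired state are left out.

```go
package main

import "fmt"

// scannedMount is a volume directory found on disk after a kubelet restart.
// Every name in this sketch is hypothetical.
type scannedMount struct {
	volumeName     string
	mountPath      string
	canReconstruct bool // whether the volume's plugin supports reconstruction
}

// syncResult records what a single syncState pass decided.
type syncResult struct {
	addedToActualState []string // volumes restored from a reconstructed spec
	cleanedUpMounts    []string // mount paths cleaned up directly
}

// syncState is meant to run once, before the reconcile loop starts.
func syncState(mounts []scannedMount) syncResult {
	var res syncResult
	for _, m := range mounts {
		if m.canReconstruct {
			// (a) Use the reconstructed spec to update the actual state.
			res.addedToActualState = append(res.addedToActualState, m.volumeName)
			continue
		}
		// (b) No reconstruction: use the scanned mount path to clean up.
		res.cleanedUpMounts = append(res.cleanedUpMounts, m.mountPath)
	}
	return res
}

func main() {
	res := syncState([]scannedMount{
		{volumeName: "pd-1", mountPath: "/pods/a/volumes/pd-1", canReconstruct: true},
		{volumeName: "gluster-1", mountPath: "/pods/a/volumes/gluster-1", canReconstruct: false},
	})
	fmt.Println("actual state:", res.addedToActualState)
	fmt.Println("cleaned up:", res.cleanedUpMounts)
}
```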
This is the 2nd attempt. The previous was reverted while we figured out
the regional mirrors (oops).
New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest. To publish
an image, push to k8s-staging.gcr.io and it will be synced to the regionals
automatically (similar to today). For now the staging is an alias to
gcr.io/google_containers (the legacy URL).
When we move off of google-owned projects (working on it), then we just
do a one-time sync, and change the google-internal config, and nobody
outside should notice.
We can, in parallel, change the auto-sync into a manual sync - send a PR
to "promote" something from staging, and a bot activates it. Nice and
visible, easy to keep track of.
This PR is the first part of the redesign of the volume reconstruction work. The
changes include
1. Remove dependency on volume spec stored in actual state for volume
cleanup process (UnmountVolume and UnmountDevice)
Modify AttachedVolume struct to add DeviceMountPath so that volume
unmount operation can use this information instead of constructing from
volume spec
2. Modify reconciler's volume reconstruction process (syncState). In the current
workflow, when kubelet restarts, syncState() is called only once, before the
reconciler starts its loop.
a. If volume plugin supports reconstruction, it will use the
reconstructed volume spec information to update actual state as before.
b. If volume plugin cannot support reconstruction, it will use the
scanned mount path information to clean up the mounts.
In this PR, all the plugins still support reconstruction (except
glusterfs), so reconstruction of some plugins will still have issues.
The next PR will modify those plugins that cannot support reconstruction
well.
This PR addresses issues #52683 and #54108 (it includes the changes to
update devicePath after local attach finishes)
Automatic merge from submit-queue (batch tested with PRs 46450, 46272, 46453, 46019, 46367)
Move MountVolume.SetUp succeeded to debug level
This message is verbose and repeated over and over again in log files
creating a lot of noise. Leave the message in, but require a -v in
order to actually log it.
**What this PR does / why we need it**: Moves a verbose log message to actually be verbose.
**Which issue this PR fixes**: fixes #46364, fixes #29059
Automatic merge from submit-queue (batch tested with PRs 46383, 45645, 45923, 44884, 46294)
Node status updater now deletes the node entry in attach updates...
… when node is missing in NodeInformer cache.
- Added RemoveNodeFromAttachUpdates as part of node status updater operations.
**What this PR does / why we need it**: Fixes issue of unnecessary node status updates when node is deleted.
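A simplified sketch of the behaviour follows. It is not the updater in this PR: the struct and its fields are hypothetical stand-ins, with a plain map playing the role of the NodeInformer cache; only the operation name is borrowed from RemoveNodeFromAttachUpdates.

```go
package main

import "fmt"

// nodeStatusUpdater is a simplified, hypothetical stand-in.
type nodeStatusUpdater struct {
	knownNodes    map[string]bool     // stands in for the NodeInformer cache
	attachUpdates map[string][]string // node name -> attached volumes to report
	patched       []string            // nodes whose status was updated
}

// removeNodeFromAttachUpdates drops the pending update for a node.
func (u *nodeStatusUpdater) removeNodeFromAttachUpdates(node string) {
	delete(u.attachUpdates, node)
}

// updateNodeStatuses reports pending attach updates. A node that is missing
// from the cache has its entry removed instead of being retried forever.
func (u *nodeStatusUpdater) updateNodeStatuses() {
	for node := range u.attachUpdates {
		if !u.knownNodes[node] {
			u.removeNodeFromAttachUpdates(node)
			continue
		}
		// The status patch itself is omitted; record that it happened and
		// clear the pending entry.
		u.patched = append(u.patched, node)
		u.removeNodeFromAttachUpdates(node)
	}
}

func main() {
	u := &nodeStatusUpdater{
		knownNodes: map[string]bool{"node-a": true},
		attachUpdates: map[string][]string{
			"node-a":    {"vol-1"},
			"node-gone": {"vol-2"},
		},
	}
	u.updateNodeStatuses()
	fmt.Println("patched:", u.patched, "pending:", len(u.attachUpdates))
}
```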
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #42438
**Special notes for your reviewer**: Unit tests added, but a more comprehensive test involving the attach detach controller requires certain testing functionality that is currently absent, and will require larger effort. Will be added at a later time.
There is an edge case caused by the following steps:
1) A node is deleted and restarted. The node exists, but is not yet recognized by Kubernetes.
2) A pod that requires a volume attach is created with nodeName specifically set to this node.
This would make the pod stuck in ContainerCreating state. This is low-pri since it's a specific edge case that can be avoided.
**Release note**:
```release-note
NONE
```
This message is verbose and repeated over and over again in log files
creating a lot of noise. Leave the message in, but require a -v in
order to actually log it.
Fixes #29059
When the attach/detach controller crashes and a pod with an attached PV is
deleted afterwards, the controller will never detach the pod's attached
volumes. To prevent this, the controller should try to recover the state from
the nodes' status.
This implements Bulk volume polling using ideas presented by
justin in https://github.com/kubernetes/kubernetes/pull/39564
But it changes the implementation to use an interface
and doesn't affect other implementations.
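One common Go way to add a capability through an interface without touching other implementations is an optional interface checked with a type assertion. The sketch below illustrates that pattern only; the interface and method names are hypothetical and are not the ones introduced by this change.

```go
package main

import "fmt"

// volumeChecker is the per-volume check every provider is assumed to offer.
type volumeChecker interface {
	volumeIsAttached(volume, node string) (bool, error)
}

// bulkVolumeChecker is an optional extension for providers that can verify
// many volumes in one call.
type bulkVolumeChecker interface {
	volumesAreAttached(volumesByNode map[string][]string) (map[string]map[string]bool, error)
}

// verifyAttached uses the bulk call when the provider implements it and
// falls back to one call per volume otherwise.
func verifyAttached(p volumeChecker, volumesByNode map[string][]string) (map[string]map[string]bool, error) {
	if bulk, ok := p.(bulkVolumeChecker); ok {
		return bulk.volumesAreAttached(volumesByNode)
	}
	result := make(map[string]map[string]bool)
	for node, volumes := range volumesByNode {
		result[node] = make(map[string]bool)
		for _, v := range volumes {
			attached, err := p.volumeIsAttached(v, node)
			if err != nil {
				return nil, err
			}
			result[node][v] = attached
		}
	}
	return result, nil
}

// simpleProvider supports only per-volume checks.
type simpleProvider struct{ calls int }

func (s *simpleProvider) volumeIsAttached(volume, node string) (bool, error) {
	s.calls++
	return true, nil
}

// bulkProvider additionally supports the bulk call.
type bulkProvider struct {
	simpleProvider
	bulkCalls int
}

func (b *bulkProvider) volumesAreAttached(volumesByNode map[string][]string) (map[string]map[string]bool, error) {
	b.bulkCalls++
	result := make(map[string]map[string]bool)
	for node, volumes := range volumesByNode {
		result[node] = make(map[string]bool)
		for _, v := range volumes {
			result[node][v] = true
		}
	}
	return result, nil
}

func main() {
	volumes := map[string][]string{"node-a": {"vol-1", "vol-2"}}
	sp, bp := &simpleProvider{}, &bulkProvider{}
	verifyAttached(sp, volumes)
	verifyAttached(bp, volumes)
	fmt.Println("per-volume calls:", sp.calls, "bulk calls:", bp.bulkCalls)
}
```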
To safely mark a volume detached when the volume controller manager is used.
An example of one such problem:
1. pod is created, volume is added to desired state of the world
2. reconciler process starts
3. reconciler starts MountVolume, which is kicked off asynchronously via
operation_executor.go
4. MountVolume mounts the volume, but hasn't yet marked it as mounted
5. pod is deleted, volume is removed from desired state of the world
6. reconciler detects volume is no longer in desired state of world,
removes it from volumes in use
7. MountVolume tries to mark volume in use, throws an error because
volume is no longer in actual state of world list.
8. controller-manager tries to detach the volume, this fails because it
is still mounted to the OS.
9. EBS gets stuck indefinitely in busy state trying to detach.
This is a fix on top of #38124. In this fix, we move the logic that filters
out shared mount references into operation_executor's UnmountDevice
function so that it is not used by other volume types such as
rbd, azure, etc. This filter function should only be needed when
unmounting a device on the GCI image.
Automatic merge from submit-queue
Reduce verbosity of volume reconciler
**What this PR does / why we need it**:
It reduces the log verbosity for attaching of volumes
**Release note**:
```release-note
Reduce verbosity of volume reconciler when attaching volumes
```
Change the logging level for information about attaching of volumes from 1 to 4
Otherwise the log is spammed with one line per 100ms while attaching is
in progress and afterwards as long as the volume is attached.
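The effect of raising the level can be illustrated with a small stand-in for a leveled logger. This is a sketch using only the standard library; `leveledLogger` and `logAt` are hypothetical names, and the real code goes through glog's `V(level)` mechanism instead.

```go
package main

import "fmt"

// leveledLogger is a hypothetical, minimal stand-in for a leveled logger.
type leveledLogger struct {
	verbosity int      // the value passed with -v
	lines     []string // emitted messages, kept in memory for the example
}

// logAt emits the message only when the configured verbosity is at least
// the level attached to the message.
func (l *leveledLogger) logAt(level int, format string, args ...interface{}) {
	if level > l.verbosity {
		return
	}
	l.lines = append(l.lines, fmt.Sprintf(format, args...))
}

func main() {
	// With -v=1, a level 1 message is printed on every reconciler pass,
	// while the same message at level 4 stays silent.
	l := &leveledLogger{verbosity: 1}
	for i := 0; i < 3; i++ {
		l.logAt(4, "volume %q is attached to node %q", "vol-1", "node-a")
	}
	l.logAt(1, "attach operation started for volume %q", "vol-1")
	fmt.Println(len(l.lines), "line(s) logged")
}
```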
Automatic merge from submit-queue
refactor DeviceOpened() so it won't return error if device doesn't exist
**What this PR does / why we need it**:
DeviceOpened() is called after the device is unmounted but before it is detached. Some volumes, such as rbd, don't support 3rd party detach; they have to be detached during unmount. Once detached, the device path vanishes, which causes a false alarm when DeviceOpened() is called.
The fix is to ignore the IsNotExist error.
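A minimal sketch of the idea, assuming the check starts with a stat of the device path. The `inUse` probe parameter is hypothetical and stands in for the platform-specific "is the device open" test, which is not shown here.

```go
package main

import (
	"fmt"
	"os"
)

// deviceOpened reports whether the device at devicePath is in use.
// inUse is a hypothetical probe supplied by the caller.
func deviceOpened(devicePath string, inUse func(string) (bool, error)) (bool, error) {
	if _, err := os.Stat(devicePath); err != nil {
		if os.IsNotExist(err) {
			// The device path has vanished, e.g. because the volume was
			// already detached during unmount. Treat it as not opened
			// instead of returning an error.
			return false, nil
		}
		return false, err
	}
	return inUse(devicePath)
}

func main() {
	probe := func(string) (bool, error) { return true, nil }
	opened, err := deviceOpened("/dev/does-not-exist-example", probe)
	fmt.Println(opened, err)
}
```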
**Special notes for your reviewer**:
@kubernetes/sig-storage
**Release note**:
``` release-note
```
Signed-off-by: Huamin Chen hchen@redhat.com
In the master's volume reconciler, the information about which volumes are
attached to nodes is cached in the actual state of the world. However, this
information might be out of date if a node is terminated (the volume
is detached automatically). In this situation, the reconciler assumes the
volume is still attached and will not issue an attach operation when the node
comes back. Pods created on those nodes will fail to mount.
This PR adds logic to periodically sync the attached-volume information kept in the actual state cache with the truth. If a volume is no longer attached to the node, the actual state is updated to reflect that. In turn, the reconciler will take action if needed.
To avoid issuing many concurrent operations to the cloud provider, this PR
tries to add a batch operation that checks whether a list of volumes is
attached to the node, instead of issuing one request per volume.
More details are explained in PR #33760
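The periodic sync with one batched request per node could look roughly like this. It is a sketch under assumptions: the `attachment` type, the `batchCheck` callback, and the rule that a volume is only marked detached when the provider explicitly reports it as not attached are all choices made for this example.

```go
package main

import "fmt"

// attachment identifies a volume attached to a node (hypothetical type).
type attachment struct {
	volume string
	node   string
}

// batchCheck is a hypothetical provider call that answers, in one request,
// which of the given volumes are still attached to the node.
type batchCheck func(node string, volumes []string) (map[string]bool, error)

// syncAttachedVolumes groups the cached attachments by node, issues one
// batch check per node, and marks volumes the provider reports as no
// longer attached.
func syncAttachedVolumes(state map[attachment]bool, check batchCheck) error {
	byNode := make(map[string][]string)
	for a, attached := range state {
		if attached {
			byNode[a.node] = append(byNode[a.node], a.volume)
		}
	}
	for node, volumes := range byNode {
		result, err := check(node, volumes)
		if err != nil {
			return fmt.Errorf("checking volumes on node %q: %w", node, err)
		}
		for _, v := range volumes {
			// Only an explicit "not attached" answer changes the cache.
			if stillAttached, reported := result[v]; reported && !stillAttached {
				state[attachment{volume: v, node: node}] = false
			}
		}
	}
	return nil
}

func main() {
	state := map[attachment]bool{
		{volume: "vol-1", node: "node-a"}: true,
		{volume: "vol-2", node: "node-a"}: true,
	}
	check := func(node string, volumes []string) (map[string]bool, error) {
		// Pretend the provider says vol-2 was detached behind our back.
		return map[string]bool{"vol-1": true, "vol-2": false}, nil
	}
	if err := syncAttachedVolumes(state, check); err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	fmt.Println(state[attachment{volume: "vol-2", node: "node-a"}])
}
```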
When kubelet restarts, all the information about the volumes is gone
from the actual/desired states. When the node status is updated with
mounted volumes, the volume list might be empty even though volumes are
still mounted, which in turn causes the master to detach those volumes
because they are not in the mounted volumes list. This fix makes sure
the mounted volumes list is updated only after the reconciler starts the
sync states process. The sync states process scans the existing volume
directories and reconstructs actual states if they are missing.
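The gating can be sketched as a flag that the status update consults before publishing the list. The names below (`reconciler`, `syncStates`, `volumesInUse`) are hypothetical and the sketch leaves out the actual directory scan.

```go
package main

import "fmt"

// reconciler is a simplified, hypothetical stand-in.
type reconciler struct {
	statesSynced  bool
	actualMounted []string
}

// syncStates stands in for scanning the volume directories on disk and
// reconstructing the missing actual state.
func (r *reconciler) syncStates(foundOnDisk []string) {
	r.actualMounted = append(r.actualMounted, foundOnDisk...)
	r.statesSynced = true
}

// volumesInUse returns the mounted volumes list and whether it is safe to
// publish it in the node status. Before the first sync the list may be
// empty only because kubelet restarted, so it must not be published.
func (r *reconciler) volumesInUse() ([]string, bool) {
	if !r.statesSynced {
		return nil, false
	}
	return r.actualMounted, true
}

func main() {
	r := &reconciler{}
	_, ok := r.volumesInUse()
	fmt.Println("safe to publish before sync:", ok)

	r.syncStates([]string{"vol-1"})
	volumes, ok := r.volumesInUse()
	fmt.Println("safe to publish after sync:", ok, volumes)
}
```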
This PR also fixes a problem with orphaned pods' directories. If a pod
directory is unmounted but has not yet been deleted (e.g., the cleanup was
interrupted by a kubelet restart), the cleanup routine will delete the
directory so that the pod directory can be cleaned up (it is safe to
delete the directory since it is no longer mounted).
The third issue this PR fixes is that when reconstructing a volume in
the actual state, the mounter must not be nil, since it is required for
creating container.VolumeMap. If it is nil, it might cause a nil pointer
dereference in kubelet.
Details are in proposal PR #33203