We sometimes face issues with iSCSI PVs and it's hard to guess what's going
on without iscsiadm commands logged. Using level 5 (iscsiadm output can be
long sometimes) and explicitly avoiding logging of CHAP passwords.
In addition, log which path failed to appear after timeout, so the admin
can see which portals are not providing devices.
Kubernetes should retry detaching iSCSI volumes on error. In addition, it
should not report an error when detaching a disk while the disk is already
detached.
At present, iscsi plugin wait for 10seconds for a path to appear for a multipath
device, but at certain scenarios this may not be sufficient for device mapper
to get the path. The default multipath configuration has a configuation
called 'checker_timeout' which specify the timeout to user for path checkers
that issue scsi commands with an explicit timeout, in seconds;
default taken from /sys/block/sd*/device/timeout which is 30s.
This patch lift the timeout value from 10s to 30s.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This patch cleans up pkg/util/mount/* and pkg/util/volume/* to always
use filepath.Join instead of path.Join. filepath.Join is preferred
because path.Join can have issues on Windows.
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
Don't mount single path instead of multipath volumes and always wait until
at least 2 paths are available. Try up to 3 times to get all paths. Try 5
times to get at last 2 paths.
Automatic merge from submit-queue (batch tested with PRs 66916, 67252, 67794, 67619, 67328). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Using a fixed set of locks, then we don't need to free unused locks anymore.
**What this PR does / why we need it**:
Using a fixed set of locks, then we don't need to free unused locks anymore.
See kubernetes/kubernetes/pull/66442 for discussions.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#65113
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @msau42
/assign @thockin
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Attacher/Detacher refactor for local storage
Proposal link: https://github.com/kubernetes/community/pull/2438
**What this PR does / why we need it**:
Attacher/Detacher refactor for the plugins which just need to mount device, but do not need to attach, such as local storage plugin.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
```release-note
Attacher/Detacher refactor for local storage
```
/sig storage
/kind feature
It takes a variable amount of time for the multipath daemon
to create /dev/dm-XX in response to new LUNs being discovered.
The old iscsi_util code only discovered the multipath device
if it was created quickly enough, but in a significant number
of cases, kubelet would grab one of the individual paths and
put a filesystem it on before multipathd could construct a
multipath device.
This change waits for the multipath device to get created for
up to 10 seconds, but only if the PV actually had more than
one portal.
This change ensures that iSCSI block devices are deleted after
unmounting, and implements scanning of individual LUNs rather
than scanning the whole iSCSI bus.
In cases where an iSCSI bus is in use by more than one attachment,
detaching used to leave behind phantom block devices, which could
cause I/O errors, long timeouts, or even corruption in the case
when the underlying LUN number was recycled. This change makes
sure to flush references to the block devices after unmounting.
The original iSCSI code scanned the whole target every time a LUN
was attached. On storage controllers that export multiple LUNs on
the same target IQN, this led to a situation where nodes would
see SCSI disks that they weren't supposed to -- possibly dozens or
hundreds of extra SCSI disks. This caused 3 significant problems:
1) The large number of disks wasted resources on the node and
caused a minor drag on performance.
2) The scanning of all the devices caused a huge number of uevents
from the kernel, causing udev to bog down for multiple minutes in
some cases, triggering timeouts and other transient failures.
3) Because Kubernetes was not tracking all the "extra" LUNs that
got discovered, they would not get cleaned up until the last LUN
on a particular target was detached, causing a logout. This led
to significant complications:
In the time window between when a LUN was unintentially scanned,
and when it was removed due to a logout, if it was deleted on the
backend, a phantom reference remained on the node. In the best
case, the phantom LUN would cause I/O errors and timeouts in the
udev system. In the worst case, the backend could reuse the LUN
number for a new volume, and if that new volume were to be
scheduled to a pod with a phantom reference to the old LUN by the
same number, the initiator could get confused and possibly corrupt
data on that volume.
To avoid these problems, the new implementation only scans for
the specific LUN number it expects to see. It's worth noting that
the default behavior of iscsiadm is to automatically scan the
whole bus on login. That behavior can be disabled by setting
node.session.scan = manual
in iscsid.conf, and for the reasons mentioned above, it is
strongly recommended to set that option. This change still works
regardless of the setting in iscsid.conf, and while automatic
scanning will cause some problems, this change doesn't make the
problems any worse, and can make things better in some cases.
These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.
Lubomir I. Ivanov <neolit123@gmail.com>
applied ammend for:
pkg/cloudprovider/provivers/vsphere/nodemanager.go
Google's configure-helper.sh script bind-mounts /var/lib/kubelet somewhere
into /home/kubernetes and thus every mount that Kubernetes does is visible
twice in /proc/mounts.
iSCSI and RBD should not rely on counting on entries in /proc/mounts and
unmount device when Kubernetes thinks it's unusued. Kubernetes tracks
the mounts by itself and most of other volume plugins rely on it safely.