kubernetes/pkg/volume
Kubernetes Submit Queue 5e70562c6a
Merge pull request #57512 from cofyc/improve_rbd_highavailability
Automatic merge from submit-queue (batch tested with PRs 57572, 57512, 57770). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

RBD Plugin: Pass monitor addresses in a comma-separated list instead of trying them one by one.

**What this PR does / why we need it**:

In production, monitors may crash or have network problems. If we try monitors one by one, the `rbd`
command hangs for a long time when it hits an unconnectable monitor (e.g. `rbd map -m <unconnectable_host_ip>`
on Linux 4.4 timed out after 6 minutes). This is unacceptable.

Instead, we can simply pass a comma-separated list of monitor addresses to the `rbd`
command-line utility. The kernel rbd/libceph modules pick a monitor at random
and try them one by one, so `rbd` succeeds quickly as long as the list
contains at least one reachable monitor.

The [docs](http://docs.ceph.com/docs/jewel/man/8/rbd/#cmdoption-rbd-m) for the `-m` option of `rbd` are wrong. The `rbd` utility simply passes the `-m <mon>` parameter to the kernel rbd/libceph modules, which
have accepted a comma-separated list of one or more monitor addresses (e.g. `ip1[:port1][,ip2[:port2]...]`) since their first version in Linux (see 602adf4002/net/ceph/ceph_common.c (L239)). Also, libceph chooses a monitor randomly, so we can simply pass all addresses without randomizing them ourselves (see 602adf4002/net/ceph/mon_client.c (L132)).

From what I saw, there is no need to iterate monitor hosts one by one.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

Output of running `rbd map` against an unconnectable monitor address on Linux 4.4:

```
root@myhost:~# uname -a
Linux myhost 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@myhost:~# time rbd map kubernetes-dynamic-pvc-941ff4d2-b951-11e7-8836-049fca8e58df --pool <pool> --id <id> -m <unconnectable_host_ip> --key=<password>
rbd: sysfs write failed
2017-12-20 18:55:11.810583 7f7ec56863c0  0 monclient(hunting): authenticate timed out after 300
2017-12-20 18:55:11.810638 7f7ec56863c0  0 librados: client.<id> authentication error (110) Connection timed out
rbd: couldn't connect to the cluster!
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (110) Connection timed out

real	6m0.018s
user	0m0.052s
sys	0m0.064s
```  

If we instead pass a comma-separated list of monitors and at least one of them is reachable, `rbd map` succeeds quickly:

```
root@myhost:~# time rbd map kubernetes-dynamic-pvc-941ff4d2-b951-11e7-8836-049fca8e58df --pool <pool> --id <id> -m <unconnectable_host_ip>,<good_host_ip> --key=<password>

/dev/rbd3

real	0m0.426s
user	0m0.008s
sys	0m0.008s
```

**Release note**:

```release-note
NONE
```
2018-01-03 13:46:32 -08:00
aws_ebs Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
azure_dd Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
azure_file Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
cephfs Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
cinder Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
configmap Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
csi Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
downwardapi Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
empty_dir Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
fc Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
flexvolume Remove unused command waitfordetach from flex volume driver 2018-01-03 16:02:31 +08:00
flocker Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
gce_pd Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
git_repo Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
glusterfs Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
host_path Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
iscsi Merge pull request #57475 from stmcginnis/iscsi_node_startup 2018-01-02 10:57:57 -08:00
local Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
nfs Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
photon_pd Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
portworx Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
projected Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
quobyte Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
rbd Merge pull request #57512 from cofyc/improve_rbd_highavailability 2018-01-03 13:46:32 -08:00
scaleio Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
secret Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
storageos Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
testing VolumeHost.GetNodeName method added for CSI fix 2017-12-02 05:54:54 -05:00
util Merge pull request #57702 from mlmhl/volume_resize_event 2018-01-03 08:30:30 -08:00
validation Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
vsphere_volume Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
BUILD Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
doc.go
metrics_cached.go
metrics_du_test.go switch from package syscall to x/sys/unix 2017-07-21 12:14:42 +02:00
metrics_du.go
metrics_errors.go
metrics_nil_test.go
metrics_nil.go
metrics_statfs_test.go
metrics_statfs.go
OWNERS Update volume OWNERS to reflect active sig-storage reviewers 2017-10-26 13:26:33 -07:00
plugins_test.go Use const instead of hard code for volume plugin 2017-09-18 20:09:07 +08:00
plugins.go VolumeHost.GetNodeName method added for CSI fix 2017-12-02 05:54:54 -05:00
util_test.go Revert k8s.gcr.io vanity domain 2017-12-22 14:36:16 -08:00
util.go Merge pull request #56742 from zouyee/patch-12 2017-12-20 16:47:34 -08:00
volume_linux.go Fixes cross platform build failure 2017-08-26 09:58:51 -04:00
volume_unsupported.go Fixes cross platform build failure 2017-08-26 09:58:51 -04:00
volume.go BlockVolumesSupport: CRI, VolumeManager and OperationExecutor changes 2017-11-20 14:10:26 -05:00