kubernetes

Files

Kubernetes Submit Queue 5e70562c6a Merge pull request #57512 from cofyc/improve_rbd_highavailability

Automatic merge from submit-queue (batch tested with PRs 57572, 57512, 57770). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

RBD Plugin: Pass monitors addresses in a comma-separed list instead of trying one by one.

**What this PR does / why we need it**:

In production, monitors may crash (or have a network problem), if we try monitors one by one, rbd
command will hang a long time (e.g. `rbd map -m <unconnectable_host_ip>`
on linux 4.4 timed out in 6 minutes) when trying a unconnectable monitor. This is unacceptable.

Actually, we can simply pass a comma-separated list monitor addresses to `rbd`
command utility. Kernel rbd/libceph modules will pick monitor randomly
and try one by one, `rbd` command utility succeed soon if there is a
good one in monitors list.

[Docs](http://docs.ceph.com/docs/jewel/man/8/rbd/#cmdoption-rbd-m) about `-m` option of `rbd` is wrong,  'rbd' utility simply pass '-m <mon>' parameter to kernel rbd/libceph modules, which
takes a comma-seprated list of one or more monitor addresses (e.g. ip1[:port1][,ip2[:port2]...]) in its first version in linux (see 602adf4002/net/ceph/ceph_common.c (L239)). Also, libceph choose monitor randomly, so we can simply pass all addresses without randomization (see 602adf4002/net/ceph/mon_client.c (L132)).

From what I saw, there is no need to iterate monitor hosts one by one.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

Run `rbd map` against unconnectable monitor address logs on Linux 4.4:

```
root@myhost:~# uname -a
Linux myhost 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@myhost:~# time rbd map kubernetes-dynamic-pvc-941ff4d2-b951-11e7-8836-049fca8e58df --pool <pool> --id <id> -m <unconnectable_host_ip> --key=<password>
rbd: sysfs write failed
2017-12-20 18:55:11.810583 7f7ec56863c0  0 monclient(hunting): authenticate timed out after 300
2017-12-20 18:55:11.810638 7f7ec56863c0  0 librados: client.<id> authentication error (110) Connection timed out
rbd: couldn't connect to the cluster!
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (110) Connection timed out

real	6m0.018s
user	0m0.052s
sys	0m0.064s
```  

We can simply pass a comma-separated list of monitors, if there is a good one in them, `rbd map` succeed soon.

```
root@myhost:~# time rbd map kubernetes-dynamic-pvc-941ff4d2-b951-11e7-8836-049fca8e58df --pool <pool> --id <id> -m <unconnectable_host_ip>,<good_host_ip> --key=<password>

/dev/rbd3

real	0m0.426s
user	0m0.008s
sys	0m0.008s
```

**Release note**:

```release-note
NONE
```

2018-01-03 13:46:32 -08:00