Hard-coding the tests to use /tmp/device_plugin for sockets is
problematic because it prevents running tests in parallel on the same
machine (perhaps because there are multiple developers, perhaps
because testing is done independently on different code checkouts).
/tmp/device_plugin also was not removed after testing.
This is probably not that relevant. But more importantly, this change
also fixes https://github.com/kubernetes/kubernetes/issues/59488.
"make test" failed in TestDevicePluginReRegistration because something
removed /tmp/device_plugin/device-plugin.sock while something else
tried to connect to it:
2018/02/07 14:34:39 Starting to serve on /tmp/device_plugin/device-plugin.sock
[pid 29568] connect(14, {sa_family=AF_UNIX, sun_path="/tmp/device_plugin/server.sock"}, 33) = 0
[pid 29568] unlinkat(AT_FDCWD, "/tmp/device_plugin/server.sock", 0) = 0
[pid 29568] unlinkat(AT_FDCWD, "/tmp/device_plugin/device-plugin.sock", 0) = 0
[pid 29568] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=29568, si_uid=1000} ---
[pid 29568] connect(6, {sa_family=AF_UNIX, sun_path="/tmp/device_plugin/device-plugin.sock"}, 40) = -1 ENOENT (No such file or directory)
E0207 14:34:39.961321 29568 endpoint.go:117] listAndWatch ended unexpectedly for device plugin mock with error rpc error: code = Unavailable desc = transport is closing
strace: Process 29623 attached
[pid 29574] connect(3, {sa_family=AF_UNIX, sun_path="/tmp/device_plugin/device-plugin.sock"}, 40) = -1 ENOENT (No such file or directory)
[pid 29623] connect(3, {sa_family=AF_UNIX, sun_path="/tmp/device_plugin/device-plugin.sock"}, 40) = -1 ENOENT (No such file or directory)
[pid 29574] connect(3, {sa_family=AF_UNIX, sun_path="/tmp/device_plugin/device-plugin.sock"}, 40) = -1 ENOENT (No such file or directory)
E0207 14:34:49.961324 29568 endpoint.go:60] Can't create new endpoint with path /tmp/device_plugin/device-plugin.sock err failed to dial device plugin: context deadline exceeded
E0207 14:34:49.961390 29568 manager.go:340] Failed to dial device plugin with request &RegisterRequest{Version:v1alpha2,Endpoint:device-plugin.sock,ResourceName:fake-domain/resource,}: failed to dial device plugin: context deadline exceeded
panic: test timed out after 2m0s
It's not entirely certain which code was to blame for this unlinkat()
calls (perhaps some cleanup code from a previous test running in a
goroutine?) but this no longer happened after switching to per-test
socket directories.
Automatic merge from submit-queue (batch tested with PRs 59466, 58912, 59605, 59548). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix incorrect logic in canSupport
**What this PR does / why we need it**:
if `spec.PersistentVolume` is nil or `spec.Volume` is nil, func `canSupport` should return false.
**Release note**:
```
NONE
```
/release-note-none
/sig storage
Automatic merge from submit-queue (batch tested with PRs 59466, 58912, 59605, 59548). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
return a more human readable error message if mount an unformatted vo…
**What this PR does / why we need it**:
If an unformatted volume is requested as read only mode, according device mount operation will fail, and the message is verbose and obscure. We should check this scenario and return a more human readable message.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58911
**Release note**:
```release-note
NONE
```
/sig storage
Automatic merge from submit-queue (batch tested with PRs 59466, 58912, 59605, 59548). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Gluster: Remove provisioner configuration from info message.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update azure_loadbalancer.md to fix typo
fix typo
incase -> in case
selction -> selection
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
fix typo
incase -> in case
selction -> selection
```
Automatic merge from submit-queue (batch tested with PRs 58437, 59490, 55684). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
[GCE] Unit test ExternalLoadBalancer
**What this PR does / why we need it**:
- Unit tests for `pkg/cloudprovider/providers/gce/gce_loadbalancer_external.go`
- Tests creating, updating, and deleting an external LoadBalancer
**Future Improvements**
In order to further test `gce_loadbalancer_external.go`, we should add tests for the following cases:
- Network Tiers - when the current/desired network tier doesn't match, existing resources with the wrong tier should be torn down
- Expect an error when the TargetPool does not exist
- Expect an error when the LoadBalancer Firewall does not exist
- Case when TargetPool needs to be recreated
- Case when IP needs to be released (calls gce.DeleteRegionAddress)
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Enable golinting for scheduler packages.
**What this PR does / why we need it**:
Enable golinting for scheduler packages
**Which issue(s) this PR fixes**:
Fixes#58234
**Special notes for your reviewer**:
- `pkg/scheduler/api` and `pkg/scheduler/api/v1` are not removed from `hack/.golint_failures`, because there are auto-generated go files by `deepcopy-gen` in the package, which have golint errors and are not suggested manually edited.
- Please help to refine the comments if there are error or inaccurate descriptions. Thanks!
**Release note**:
```release-note
Enable golint for `pkg/scheduler` and fix the golint errors in it.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Make AWS attach/detach operations faster
Most attach/detach operations on AWS finish within 1-4seconds.
By using a shorter time interval and higher exponetial
factor we can shorten time taken for attach and detach to complete.
After this change retry interval looks like:
```
[1, 1.8, 3.24, 5.832000000000001, 10.4976]
```
Before it was:
```
[10, 12.0, 14.399999999999999, 17.279999999999998]
```
/sig aws
```release-note
AWS: Make attach/detach operations faster. from 10-12s to 2-6s
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add mountpoint as CRI image filesystem storage identifier.
Fixes https://github.com/kubernetes/kubernetes/issues/57356.
This PR changes CRI to use mountpoint as storage identifier. See https://github.com/kubernetes/kubernetes/issues/57356#issuecomment-363608733.
Note that:
1) This doesn't work with devicemapper for now. Please feel free to propose change for device mapper, we can discuss more about this after this first version is merged. @mrunalp @runcom
2) `mountpoint` is added as new field in `StorageIdentifier` now. After https://github.com/kubernetes/kubernetes/pull/58973 is merged, we can remove the UUID field in `v1alpha2`.
/cc @yujuhong @feiskyer @yguo0905 @dashpole @mikebrow @abhi @kubernetes/sig-node-api-reviews
**Release note**:
```release-note
CRI starts using moutpoint as image filesystem identifier instead of UUID.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Cleanup and add category doc
**What this PR does / why we need it**:
CategoryExpender interface and some implements structs in package pkg/kubectl/categories are quite confusing to contributors due to the insufficiency documents and illustration.
This PR does:
1. Clean up some useless or vague comments
2. Add illustration for CategoryExpender and implements structs
@pwittrock @droot
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Redesign and implement volume reconstruction work
This PR is the first part of redesign of volume reconstruction work. The detailed design information is https://github.com/kubernetes/community/pull/1601
The changes include
1. Remove dependency on volume spec stored in actual state for volume
cleanup process (UnmountVolume and UnmountDevice)
Modify AttachedVolume struct to add DeviceMountPath so that volume
unmount operation can use this information instead of constructing from
volume spec
2. Modify reconciler's volume reconstruction process (syncState). Currently workflow
is when kubelet restarts, syncState() is only called once before
reconciler starts its loop.
a. If volume plugin supports reconstruction, it will use the
reconstructed volume spec information to update actual state as before.
b. If volume plugin cannot support reconstruction, it will use the
scanned mount path information to clean up the mounts.
In this PR, all the plugins still support reconstruction (except
glusterfs), so reconstruction of some plugins will still have issues.
The next PR will modify those plugins that cannot support reconstruction
well.
This PR addresses issue #52683
Automatic merge from submit-queue (batch tested with PRs 59344, 59595, 59598). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove unnecessary summary api call.
Summary API call is not as cheap as we think. Especially for CRI container runtime, it means:
1) Extra cgroups parsing (because cpu/memory are considered to be on demand);
2) Extra grpc encoding/decoding `ListPodSandboxes`, `ListContainers`, `ListContainerStats`;
I don't think it is necessary to call summary twice inside the same function.
/cc @kubernetes/sig-node-pr-reviews @dashpole @jingxu97
Signed-off-by: Lantao Liu <lantaol@google.com>
**Release note**:
```release-note
none
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
AWS: Do not ignore errors from EC2::DescribeVolume in DetachDisk
The DetachDisk method of AWS cloudprovider indirectly calls
EC2::DescribeVolume AWS API function to check if the volume
being detached is really attached to the specified node.
The AWS API call may fail and return error which is logged however
the DetachDisk then finishes successfully. This may cause the AWS
volumes to remain attached to the instances forever because the
attach/detach controller will mark the volume as attached. The PV
controller will never be able to delete those disks and they need
to be detached manually.
This patch ensures the error from DescribeVolume is propagated to
attach/detach controller and the detach operation is re-tried.
cc: @gnufied, @jsafrane
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59580, 58854). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Prefer apps/v1 storage for daemonsets, deployments, replicasets, statefulsets
The workload API objects went GA in 1.9. This means we can safely begin persisting them in etcd in apps/v1 format in 1.10.
xref #43214
```release-note
DaemonSet, Deployment, ReplicaSet, and StatefulSet objects are now persisted in etcd in apps/v1 format
```
Automatic merge from submit-queue (batch tested with PRs 59054, 59515, 59577). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add 'none' option to EnforceNodeAllocatable
This lets us use omitempty on `EnforceNodeAllocatable`. We shouldn't treat
`nil` as different from `[]T{}`, because this can play havoc with
serializers (a-la #43203).
See: https://github.com/kubernetes/kubernetes/pull/53833#discussion_r166672137
```release-note
'none' can now be specified in KubeletConfiguration.EnforceNodeAllocatable (--enforce-node-allocatable) to explicitly disable enforcement.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
vclib: enable VM disk attach test
**What this PR does / why we need it**:
Follow up to PR #58534 , where this test was disabled due to a limitation in
govmomi/simulator. The test passes as expected with godeps update of govmomi.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
This PR is 1-line update to the vSphere Cloud Provider tests and godep update of the vendor'd vmware/govmomi repo.
**Release note**:
```release-note
NONE
```
We're about to tear the container down, there's no point. It also suppresses
an annoying error message due to kubelet stupidity that causes multiple
parallel calls to StopPodSandbox for the same sandbox.
docker_sandbox.go:355] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "docker-registry-1-deploy_default": Unexpected command output nsenter: cannot open /proc/22646/ns/net: No such file or directory
1) A first StopPodSandbox() request triggered by SyncLoop(PLEG) for
a ContainerDied event calls into TearDownPod() and thus the network
plugin. Until this completes, networkReady=true for the
sandbox.
2) A second StopPodSandbox() request triggered by SyncLoop(REMOVE)
calls PodSandboxStatus() and calls into the network plugin to read
the IP address because networkReady=true
3) The first request exits the network plugin, sets networReady=false,
and calls StopContainer() on the sandbox. This destroys the network
namespace.
4) The second request finally gets around to running nsenter but
the network namespace is already destroyed. It returns an error
which is logged by getIP().
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Report InstanceID for vSphere Cloud Provider as UUID obtained from product_serial file
**What this PR does / why we need it**:
vSphere Cloud Provider is not able to find the nodes for VMs created on vSphere v1.6.5. Kubelet fetches SystemUUID from file ```/sys/class/dmi/id/product_uuid```. vSphere Cloud Provider uses this uuid as VM identifier to get node information from vCenter. vCenter v1.6.5 doesn't recognize this uuids, as a result, nodes are not found.
UUID present in file ```/sys/class/dmi/id/product_serial``` is recognized by vCenter. Yet, Kubelet doesn't report this. Therefore, in this PR InstanceID is reported as UUID which is fetched from file
```/sys/class/dmi/id/product_serial```.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/58927
**Special notes for your reviewer**:
Internally review here: https://github.com/vmware/kubernetes/pull/452
Tested:
Launched K8s cluster using kubeadm (Used Ubuntu VM compatible with vSphere version 6.5.)
_**Note: Installed Ubuntu from ISO**_
Observed following:
```
Master
> cat /sys/class/dmi/id/product_uuid
743F0E42-84EA-A2F9-7736-6106BB5DBF6B
> cat /sys/class/dmi/id/product_serial
VMware-42 0e 3f 74 ea 84 f9 a2-77 36 61 06 bb 5d bf 6b
Node
> cat /sys/class/dmi/id/product_uuid
956E0E42-CC9D-3D89-9757-F27CEB539B76
> cat /sys/class/dmi/id/product_serial
VMware-42 0e 6e 95 9d cc 89 3d-97 57 f2 7c eb 53 9b 76
```
With this fix controller manager was able to find the nodes.
**controller manager logs**
```
{"log":"I0205 22:43:00.106416 1 nodemanager.go:183] Found node ubuntu-node as vm=VirtualMachine:vm-95 in vc=10.161.120.115 and datacenter=vcqaDC\n","stream":"stderr","time":"2018-02-05T22:43:00.421010375Z"}
```
**Release note**:
```release-note
vSphere Cloud Provider supports VMs provisioned on vSphere v1.6.5
```