Automatic merge from submit-queue (batch tested with PRs 41597, 42185, 42075, 42178, 41705)
force rbd image unlock if the image is not used
**What this PR does / why we need it**:
Ceph RBD image could be locked if the host that holds the lock is down. In such case, the image cannot be used by other Pods.
The fix is to detect the orphaned locks and force unlock.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#31790
**Special notes for your reviewer**:
Note, previously, RBD volume plugin maps the image, mount it, and create a lock on the image. Since the proposed fix uses `rbd status` output to determine if the image is being used, the sequence has to change to: rbd lock checking (through `rbd lock list`), mapping check (through `rbd status`), forced unlock if necessary (through `rbd lock rm`), image lock, image mapping, and mount.
**Release note**:
```release-note
force unlock rbd image if the image is not used
```
- Add a new type PortworxVolumeSource
- Implement the kubernetes volume plugin for Portworx Volumes under pkg/volume/portworx
- The Portworx Volume Driver uses the libopenstorage/openstorage specifications and apis for volume operations.
Changes for k8s configuration and examples for portworx volumes.
- Add PortworxVolume hooks in kubectl, kube-controller-manager and validation.
- Add a README for PortworxVolume usage as PVs, PVCs and StorageClass.
- Add example spec files
Handle code review comments.
- Modified READMEs to incorporate to suggestions.
- Add a test for ReadWriteMany access mode.
- Use util.UnmountPath in TearDown.
- Add ReadOnly flag to PortworxVolumeSource
- Use hostname:port instead of unix sockets
- Delete the mount dir in TearDown.
- Fix link issue in persistentvolumes README
- In unit test check for mountpath after Setup is done.
- Add PVC Claim Name as a Portworx Volume Label
Generated code and documentation.
- Updated swagger spec
- Updated api-reference docs
- Updated generated code under pkg/api/v1
Godeps update for Portworx Volume Driver
- Adds github.com/libopenstorage/openstorage
- Adds go.pedge.io/pb/go/google/protobuf
- Updates Godep Licenses
Automatic merge from submit-queue (batch tested with PRs 41116, 41804, 42104, 42111, 42120)
Add support for attacher/detacher interface in Flex volume
Add support for attacher/detacher interface in Flex volume
This change breaks backward compatibility and requires to be release noted.
```release-note
Flex volume plugin is updated to support attach/detach interfaces. It broke backward compatibility. Please update your drivers and implement the new callouts.
```
Add some lines about how to enable multipath for block storage.
A new README was added, because multipath is relevant for at least
FC and iSCSI.
Signed-off-by: Fabian Deutsch <fabiand@fedoraproject.org>
A minimalistic multipath.conf got written, but it was useless, as
it is unclear if multipathd is running and there was also no
config reload triggered.
This patch drops this snippet. In general it's probably a better idea
to leave the multipath.conf to the component managing the host.
Signed-off-by: Fabian Deutsch <fabiand@fedoraproject.org>
is contacted if there is a failure in the connected server.
Mount option becomes:
mount -t glusterfs -o log-level=ERROR,log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/glustermount/glusterpod-glusterfs.log,backup-volfile-servers=192.168.100.0:192.168.200.0:192.168.43.149 ..
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Automatic merge from submit-queue
Fix for Support selection of datastore for dynamic provisioning in vS…
Fixes#40558
Current vSphere Cloud provider doesn't allow a user to select a datastore for dynamic provisioning. All the volumes are created in default datastore provided by the user in the global vsphere configuration file.
With this fix, the user will be able to provide the datastore in the storage class definition. This will allow the volumes to be created in the datastore specified by the user in the storage class definition. This field is optional. If no datastore is specified, the volume will be created in the default datastore specified in the global config file.
For example:
User creates a storage class with the datastore
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
name: slow
provisioner: kubernetes.io/vsphere-volume
parameters:
diskformat: thin
datastore: VMFSDatastore
Now the volume will be created in the datastore - "VMFSDatastore" specified by the user.
If the user creates a storage class without any datastore
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
name: slow
provisioner: kubernetes.io/vsphere-volume
parameters:
diskformat: thin
Now the volume will be created in the datastore which in the global configuration file (vsphere.conf)
@pdhamdhere @kerneltime
Automatic merge from submit-queue (batch tested with PRs 41667, 41820, 40910, 41645, 41361)
Allow multiple mounts in StatefulSet volume zone placement
We have some heuristics that ensure that volumes (and hence stateful set
pods) are spread out across zones. Sadly they forgot to account for
multiple mounts. This PR updates the heuristic to ignore the mount name
when we see something that looks like a statefulset volume, thus
ensuring that multiple mounts end up in the same AZ.
Fix#35695
```release-note
Fix zone placement heuristics so that multiple mounts in a StatefulSet pod are created in the same zone
```
Automatic merge from submit-queue (batch tested with PRs 41364, 40317, 41326, 41783, 41782)
changes to cleanup the volume plugin for recycle
**What this PR does / why we need it**:
Code cleanup. Changing from creating a new interface from the plugin, that then calls a function to recycle a volume, to adding the function to the plugin itself.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#26230
**Special notes for your reviewer**:
Took same approach from closed PR #28432.
Do you want the approach to be the same for NewDeleter(), NewMounter(), NewUnMounter() and should they be in this same PR or submit different PR's for those?
**Release note**:
```NONE
```
Automatic merge from submit-queue (batch tested with PRs 39373, 41585, 41617, 41707, 39958)
Fix ConfigMaps for Windows
**What this PR does / why we need it**: ConfigMaps were broken for Windows as the existing code used linux specific file paths. Updated the code in `kubelet_getters.go` to use `path/filepath` to get the directories. Also reverted back the code in `secret.go` as updating `kubelet_getters.go` to use `path/filepath` also fixes `secrets`
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/39372
```release-note
Fix ConfigMap for Windows Containers.
```
cc: @pires
We have some heuristics that ensure that volumes (and hence stateful set
pods) are spread out across zones. Sadly they forgot to account for
multiple mounts. This PR updates the heuristic to ignore the mount name
when we see something that looks like a statefulset volume, thus
ensuring that multiple mounts end up in the same AZ.
Fix#35695
Automatic merge from submit-queue (batch tested with PRs 41531, 40417, 41434)
Always detach volumes in operator executor
**What this PR does / why we need it**:
Instead of marking a volume as detached immediately in Kubelet's
reconciler, delegate the marking asynchronously to the operator
executor. This is necessary to prevent race conditions with other
operations mutating the same volume state.
An example of one such problem:
1. pod is created, volume is added to desired state of the world
2. reconciler process starts
3. reconciler starts MountVolume, which is kicked off asynchronously via
operation_executor.go
4. MountVolume mounts the volume, but hasn't yet marked it as mounted
5. pod is deleted, volume is removed from desired state of the world
6. reconciler reaches detach volume section, detects volume is no longer in desired state of world,
removes it from volumes in use
7. MountVolume tries to mark mount, throws an error because
volume is no longer in actual state of world list. After this, kubelet isn't aware of the mount
so doesn't try to unmount again.
8. controller-manager tries to detach the volume, this fails because it
is still mounted to the OS.
9. EBS gets stuck indefinitely in busy state trying to detach.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#32881, fixes ##37854 (maybe)
**Special notes for your reviewer**:
**Release note**:
```release-note
```
To safely mark a volume detached when the volume controller manager is used.
An example of one such problem:
1. pod is created, volume is added to desired state of the world
2. reconciler process starts
3. reconciler starts MountVolume, which is kicked off asynchronously via
operation_executor.go
4. MountVolume mounts the volume, but hasn't yet marked it as mounted
5. pod is deleted, volume is removed from desired state of the world
6. reconciler detects volume is no longer in desired state of world,
removes it from volumes in use
7. MountVolume tries to mark volume in use, throws an error because
volume is no longer in actual state of world list.
8. controller-manager tries to detach the volume, this fails because it
is still mounted to the OS.
9. EBS gets stuck indefinitely in busy state trying to detach.
Automatic merge from submit-queue (batch tested with PRs 41246, 39998)
Cinder volume attacher: use instanceID instead of NodeID when verifying attachment
**What this PR does / why we need it**: Cinder volume attacher incorrectly uses NodeID instead of openstack instance id, so that reconciliation fails.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#39978
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Add gnufied as reviewer for aws and gce volumes
Adding myself as reviewer for aws and gce volume plugins. I understand the code well enough and have helped with review in those areas already.
cc @childsb @justinsb @saad-ali
This fix prevents the PV controller from forcefully overwriting the provisioned volume's name with the generated PV name. Instead, it allows dynamic provisioner implementers to set the name of the volume to a value that they choose.
Automatic merge from submit-queue (batch tested with PRs 38772, 38797, 40732, 40740)
Synchronous spellcheck for pkg/volume/*
**What this PR does / why we need it**: Increase code readability
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**: Minor contribution
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 40299, 40311)
move authoritative client-go util out of pkg
Move `client-go/pkg/util` which are authoritative to `client-go/util` to make it easier to reason about what comes from where.
Automatic merge from submit-queue
Adding rescan scsi controller for cinder
For lsilogic scsi controller, attached cinder volume does not
appear under /dev/ automatically unless do a rescan.
This approach was used in vSphere volume provider before PR #27496
dropped support for lsilogic scsi controller.
Automatic merge from submit-queue
Optional configmaps and secrets
Allow configmaps and secrets for environment variables and volume sources to be optional
Implements approved proposal c9f881b7bb
Release note:
```release-note
Volumes and environment variables populated from ConfigMap and Secret objects can now tolerate the named source object or specific keys being missing, by adding `optional: true` to the volume or environment variable source specifications.
```
Automatic merge from submit-queue (batch tested with PRs 37228, 40146, 40075, 38789, 40189)
Cleanup temp dirs
So funny story my /tmp ran out of space running the unit tests so I am cleaning up all the temp dirs we create.
Automatic merge from submit-queue (batch tested with PRs 39826, 40030)
azure disk: restrict length of name
**What this PR does / why we need it**:
Fixes dynamic disk provisioning on Azure by properly truncating the disk name to conform to the Azure API spec.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
n/a
**Special notes for your reviewer**:
n/a
**Release note**:
```release-note
azure disk: restrict name length for Azure specifications
```
cc: @rootfs
Automatic merge from submit-queue
Curating Owners: pkg/volume
cc @jsafrane @spothanis @agonzalezro @justinsb @johscheuer @simonswine @nelcy @pmorie @quofelix @sdminonne @thockin @saad-ali @rootfs
In an effort to expand the existing pool of reviewers and establish a
two-tiered review process (first someone lgtms and then someone
experienced in the project approves), we are adding new reviewers to
existing owners files.
If You Care About the Process:
------------------------------
We did this by algorithmically figuring out who’s contributed code to
the project and in what directories. Unfortunately, that doesn’t work
well: people that have made mechanical code changes (e.g change the
copyright header across all directories) end up as reviewers in lots of
places.
Instead of using pure commit data, we generated an excessively large
list of reviewers and pruned based on all time commit data, recent
commit data and review data (number of PRs commented on).
At this point we have a decent list of reviewers, but it needs one last
pass for fine tuning.
Also, see https://github.com/kubernetes/contrib/issues/1389.
TLDR:
-----
As an owner of a sig/directory and a leader of the project, here’s what
we need from you:
1. Use PR https://github.com/kubernetes/kubernetes/pull/35715 as an example.
2. The pull-request is made editable, please edit the `OWNERS` file to
remove the names of people that shouldn't be reviewing code in the
future in the **reviewers** section. You probably do NOT need to modify
the **approvers** section. Names asre sorted by relevance, using some
secret statistics.
3. Notify me if you want some OWNERS file to be removed. Being an
approver or reviewer of a parent directory makes you a reviewer/approver
of the subdirectories too, so not all OWNERS files may be necessary.
4. Please use ALIAS if you want to use the same list of people over and
over again (don't hesitate to ask me for help, or use the pull-request
above as an example)
Automatic merge from submit-queue (batch tested with PRs 39807, 37505, 39844, 39525, 39109)
fix bug not using volumetype config in create volume
fixes#39843
@humblec
we are building the volumetype config but I don't see where we are using it in the CreateVolume for dyn provisioning, this is why volumetype parameter from the Storage Class was being overlooked because we are hard coding constants like replicaCount which is always 3.
unless I'm missing something?
Automatic merge from submit-queue (batch tested with PRs 39486, 37288, 39477, 39455, 39542)
Fix wc zombie goroutine issue in volume util
See [Cadvisor #1558](https://github.com/google/cadvisor/pull/1558). This should solve problems for those using images that do not support "wc".
cc: @timstclair
Automatic merge from submit-queue (batch tested with PRs 39433, 39413)
"Attach" function records information collation
In the "attach" function, the log information, for the variable "instanceid", has been described as "node", as well as recorded as "instance", recorded as "instance" should be better.
Automatic merge from submit-queue
Add unit tests for operation_executor
Add unit test for `Unmount operations should start in parallel for all volume plugins`
cc: @saad-ali
Automatic merge from submit-queue (batch tested with PRs 39092, 39126, 37380, 37093, 39237)
Improve error reporting in Ceph RBD provisioner.
- We should report an error when user references a secret that cannot be found
- We should report output of rbd create/delete commands, logging "exit code 1"
is not enough.
Before:
```
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
33m 33m 1 {persistentvolume-controller } Warning ProvisioningFailed Failed to provision volume with StorageClass "cephrbdprovisioner": rbd: create volume failed, err: exit status 1
```
After:
```
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
33m 33m 1 {persistentvolume-controller } Warning ProvisioningFailed Failed to provision volume with StorageClass "cephrbdprovisioner": failed to create rbd image: exit status 1, command output: rbd: couldn't connect to the cluster
```
@rootfs, PTAL
Automatic merge from submit-queue (batch tested with PRs 37959, 36221)
Recycle Pod Template Check
The kube-controller-manager has two command line arguments (--pv-recycler-pod-template-filepath-hostpath and --pv-recycler-pod-template-filepath-nfs) that specify a recycle pod template. The recycle pod template may not contain the volume that shall be recycled.
A check is added to make sure that the recycle pod template contains at least a volume.
cc: @jsafrane
Automatic merge from submit-queue
Refactor operation_executor to make it testable
**What this PR does / why we need it**:
To refactor operation_executor to make it unit testable
**Release note**:
`NONE`
Automatic merge from submit-queue (batch tested with PRs 39152, 39142, 39055)
openstack: Forcibly detach an attached cinder volume before attaching elsewhere
Fixes#33288
**What this PR does / why we need it**:
Without this fix, we can't preemptively reschedule pods with persistent volumes to other hosts (for rebalancing or hardware failure recovery).
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#33288
**Special notes for your reviewer**:
(This is a resurrection/cleanup of PR #33734, originally authored by @Rotwang)
**Release note**:
Automatic merge from submit-queue (batch tested with PRs 39029, 39014)
[Glusterfs Vol Plugin]: Check kube client is invalid and return error
Fixes: #38939
In volume plugins, we need to create a kube client to make api call. And this kube client can be nil when, for example, wrong api-server configuration, but kubelet should not crash in this case.
I have also checked other plugins and found only glusterfs need this fix.
The kube-controller-manager has two command line arguments (--pv-recycler-pod-template-filepath-hostpath and --pv-recycler-pod-template-filepath-nfs) that specify a recycle pod template. The recycle pod template may not contain the volume that shall be recycled.
A check is added to make sure that the recycle pod template contains at least a volume.
Automatic merge from submit-queue (batch tested with PRs 36888, 38180, 38855, 38590)
wrong pod reference in error message for volume attach timeout
**What this PR does / why we need it**:
when a disk mount times out you get the following error:
```
Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "nginx"/"default". list of unattached/unmounted volumes=[data]
```
where the pod is referenced by "podname"/"namespace", but should be "namespace"/"podname".
**Which issue this PR fixes**
no issue number
**Special notes for your reviewer**:
untested :(
Automatic merge from submit-queue (batch tested with PRs 38284, 38403, 38265, 38378)
glusterfs: properly check gidMin and gidMax values from SC individually
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
This fixes a misleading debug message, and also prevents the glusterfs provisioner from adapting a misconfiguration of the gid-range in the storage class. Instead it will fail with proper error messages.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://bugzilla.redhat.com/show_bug.cgi?id=1402286
**Special notes for your reviewer**:
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
```
Don't override explict out-of max-range configuration, but
fail with an error message instead.
Signed-off-by: Michael Adam <obnox@redhat.com>
Automatic merge from submit-queue (batch tested with PRs 36071, 32752, 37998, 38350, 38401)
Pass addressable values to DeepCopy
Extracted from https://github.com/kubernetes/kubernetes/pull/35728
These are the places we are currently calling DeepCopy incorrectly, and we need to fix, even if we don't pick up the changes to DeepCopy in #35728:
* creating a new cloner means we have no generated functions registered
* passing non-addressable values doesn't pick up generated deep copy functions, and forces us into reflective mode
Automatic merge from submit-queue
Fix unmountDevice issue caused by shared mount in GCI
This is a fix on top #38124. In this fix, we move the logic to filter
out shared mount references into operation_executor's UnmountDevice
function to avoid this part is being used by other types volumes such as
rdb, azure etc. This filter function should be only needed during
unmount device for GCI image.
Automatic merge from submit-queue (batch tested with PRs 36310, 37349, 38319, 38402, 38338)
Fix space issue in volumePath with vSphere Cloud Provider
I tried to create a kubernetes deployment with vSphere volume with volume path
"[datastore] kubevols/redis-master".
In this case the cloud provider queries the getDeviceNameFromMount() to return the path of the volume mounted. Since getDeviceNameFromMount() queries the filesystem to get the mount references, it returns a volume path "[datastore]\\040kubevols/redis-master". Later the kubelet searches for this volume path in both the actual and desired states. Th actual and desired states contains volume with path "[datastore] kubevols/redis-master". So, it couldn't find such volume path and therefore kubernetes stalls unable to make any progress further similar to one described in #37022.
This PR will fix the space issue in volume path by replacing \\040 to empty space. This fixes#37712.
Also fixes#38148
@kerneltime @pdhamdhere
This is a fix on top #38124. In this fix, we move the logic to filter
out shared mount references into operation_executor's UnmountDevice
function to avoid this part is being used by other types volumes such as
rdb, azure etc. This filter function should be only needed during
unmount device for GCI image.
Automatic merge from submit-queue (batch tested with PRs 38294, 37009, 36778, 38130, 37835)
fix permissions when using fsGroup
Currently, when an fsGroup is specified, the permissions of the defaultMode are not respected and all files created by the atomic writer have mode 777. This is because in `SetVolumeOwnership()` the `filepath.Walk` includes the symlinks created by the atomic writer. The symlinks have mode 777 when read from `info.Mode()`. However, when the are chmod'ed later, the chmod applies to the file the symlink points to, not the symlink itself, resulting in the wrong mode for the underlying file.
This PR skips chmod/chown for symlinks in the walk since those operations are carried out on the underlying file which will be included elsewhere in the walk.
xref https://bugzilla.redhat.com/show_bug.cgi?id=1384458
@derekwaynecarr @pmorie
this is a workaround for the unmount device issue caused by gci mounter. In GCI cluster, if gci mounter is used for mounting, the container started by mounter script will cause additional mounts created in the container. Since these mounts are irrelavant to the original mounts, they should be not considered when checking the mount references. By comparing the mount path prefix, those additional mounts can be filtered out.
Plan to work on better approach to solve this issue.
Automatic merge from submit-queue (batch tested with PRs 38076, 38137, 36882, 37634, 37558)
glusterfs: Fix all gid types to int to prevent failures on 32bit systems
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
The glusterfs dynamic provisioner with GID security has an issue on 32 bit systems.
This fixes that issue by forcing all gid types to int internally.
<!--
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
-->
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
Fix the glusterfs dynamic provisioner for 32bit systems by limiting the gids to type int internally, and allowing 2147483647 as the highest GID.
```
This makes all types int until we hand the GID to heketi/gluster,
at which point it's converted to int64.
It also limits the maximum usable GID ti math.MaxInt32 = 2147483647.
Signed-off-by: Michael Adam <obnox@redhat.com>
This makes all types int until we hand the GID to heketi/gluster,
at which point it's converted to int64.
It also limits the maximum usable GID ti math.MaxInt32 = 2147483647.
Signed-off-by: Michael Adam <obnox@redhat.com>
Automatic merge from submit-queue
Implement GID security for the GlusterFS dynamic provisioner.
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
This PR implements GID security for the glusterfs dynamic provisioner.
It is a reworked version of PR #37549 .
<!--
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
-->
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
The glusterfs dynamic volume provisioner will now choose a unique GID for new persistent volumes from a range that can be configured in the storage class with the "gidMin" and "gidMax" parameters. The default range is 2000 - 4294967295 (max uint32).
```
An allocator of integers that allows for changing the range.
Previously allocated numbers are not lost, and can be
released later even if they have fallen outside of the range.
Signed-off-by: Michael Adam <obnox@redhat.com>
Automatic merge from submit-queue
Add `clusterid`, an optional parameter to storageclass.
At present, admin doesn't have the privilege to chose the
trusted storage pool from which persistent gluster volume
has to be provided.
This patch introduce a new storage class parameter which allows
the admin to specify storage pool/cluster if required.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Automatic merge from submit-queue
kubernetes attempts to unmount a wrong vSphere volume and stops making any progress after that
This is in reference to the bug #37332 which was accidentally closed. So created this new PR.
The code is already reviewed as part of PR #37332
Fixes issue #37022
@saad-ali @jingxu97 @abrarshivani @kerneltime
Automatic merge from submit-queue
Fix photon controller plugin to construct with correct PdID
**What this PR does / why we need it**:
This PR is to fix a mismatching of unmount path in photon volume plugin, which is resulted from the assigning volume spec name to persistent disk ID. Without this path, unmounting process is stalling in reconciler when a pod is deleted. Restart the same pod will see a mount failure because the previous unmounting is still going on.
The input variable of function ConstructVolumeSpec is the volume spec name instead of persistent disk ID. Previously the function directly construct new volume spec by assigning volume spec name to persistent disk ID, which will result in mismatching of mount path. The fix will find the pdID according to mount path and construct volume spec with the correct pdID.
I have tested the patch with back-to-back pod creation/deletion and mounting/unmounting of photon persistent disk volume source performs normal now.
This need to be cherry-picked to 1.5 release branch.
Automatic merge from submit-queue
Reduce verbosity of volume reconciler
**What this PR does / why we need it**:
It reduces the log verbosity for attaching of volumes
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Reduce verbosity of volume reconciler when attaching volumes
```
Set logging level for information about attaching of volumes to from 1 to 4
Otherwise the log is spammed with one line per 100ms while attaching is
in progress and afterwards as long as the volume is attached.
- We should report an error when user references a secret that cannot be found
- We should report output of rbd create/delete commands, logging "exit code 1"
is not enough.
Before:
```
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
33m 33m 1 {persistentvolume-controller } Warning ProvisioningFailed Failed to provision volume with StorageClass "cephrbdprovisioner": rbd: create volume failed, err: exit status 1
```
After:
```
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
33m 33m 1 {persistentvolume-controller } Warning ProvisioningFailed Failed to provision volume with StorageClass "cephrbdprovisioner": failed to create rbd image: exit status 1, command output: rbd: couldn't connect to the cluster
```
The input variable of function ConstructVolumeSpec is the volume spec
name instead of persistent disk ID. Previously the function directly
construct new volume spec by assigning volume spec name to persistent
disk ID, which will result in mismatching of mount path. The fix will
find the pdID according to mount path and construct volume spec with the
correct pdID.
trusted storage pool from which persistent gluster volume
has to be provided.
This patch introduce a new storage class parameter which allows
the admin to specify storage pool/cluster if required.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Automatic merge from submit-queue
Add more logging around Pod deletion
After this PR we'll have at least V(2) level log near all Pod deletions.
@saad-ali - this is required by GKE to help with diagnosing possible problem.
cc @dchen1107 @wojtek-t
For lsilogic scsi controller, attached cinder volume does not
appear under /dev/ automatically unless do a rescan.
This approach was used in vSphere volume provider before PR #27496
dropped support for lsilogic scsi controller.
Automatic merge from submit-queue
Revert "Use Gid when provisioning Gluster Volumes."
On further inspection the design in #35460 was not secure enough. This PR reverts the change.
This reverts commit 7a0d219d12.
Automatic merge from submit-queue
fix issue in converting aws volume id from mount paths
This PR is to fix the issue in converting aws volume id from mount
paths. Currently there are three aws volume id formats supported. The
following lists example of those three formats and their corresponding
global mount paths:
1. aws:///vol-123456
(/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/vol-123456)
2. aws://us-east-1/vol-123456
(/var/lib/kubelet/plugins/kubernetes.io/mounts/aws/us-est-1/vol-123455)
3. vol-123456
(/var/lib/kubelet/plugins/kubernetes.io/mounts/aws/us-est-1/vol-123455)
For the first two cases, we need to check the mount path and convert
them back to the original format.
This PR fixes#36269
Automatic merge from submit-queue
Fix recycler pod deletion race.
We should use clone of recycler pod template instead of reusing the same
one for two or more recyclers running in parallel.
Also add some logs to relevant places to spot the error easily next time.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1392338
This PR is to fix the issue in converting aws volume id from mount
paths. Currently there are three aws volume id formats supported. The
following lists example of those three formats and their corresponding
global mount paths:
1. aws:///vol-123456
(/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/vol-123456)
2. aws://us-east-1/vol-123456
(/var/lib/kubelet/plugins/kubernetes.io/mounts/aws/us-est-1/vol-123455)
3. vol-123456
(/var/lib/kubelet/plugins/kubernetes.io/mounts/aws/us-est-1/vol-123455)
For the first two cases, we need to check the mount path and convert
them back to the original format.
We should use clone of recycler pod template instead of reusing the same
one for two or more recyclers running in parallel.
Also add some logs to relevant places to spot the error easily next time.
Automatic merge from submit-queue
Avoid setting S_ISGID on files in volumes
Some applications are having issues with setting the S_ISGID bit on files in volumes. We intend to do this for directories so that the group ID is inherited, but not files for which S_ISGID indicates madatory file locking https://linux.die.net/man/2/stat
xref https://bugzilla.redhat.com/show_bug.cgi?id=1387306
@ncdc @derekwaynecarr @pmorie
Automatic merge from submit-queue
Make a consistent name ( GlusterFS instead of Gluster) in variables a…
Signed-off-by: Humble Chirammal hchiramm@redhat.com
Directories in volumes are set S_ISGID to ensure files created inside
them inherit group ownership. Currently, files are also set S_ISGID
however this is not relevant to the original intent, and indicates
'mandatory file locking' (stat(2)).
With this commit, only directories are set S_ISGID.
struct hostPathPlugin contains newRecyclerFunc, newDeleterFunc and newProvisionerFunc items that have only one instance, i.e. newRecycler, newDeleter or newProvisioner function.
That's why the newRecyclerFunc, newDeleterFunc and newProvisionerFunc items are removed and the newRecycler, newDeleter or newProvisioner functions are called directly.
In addition, the TestRecycler tests whether NewFakeRecycler function is called and returns nil. This is no longer needed so this particular part of the test is removed. In addition, the no longer used NewFakeRecycler function is removed also.
Similarly for the NFS plugin, struct nfsPlugin contains newRecyclerFunc item that has only one instance, i.e. newRecycler function. That's why the newRecyclerFunc item is removed and the newRecycler function is called directly. In addition, the TestRecycler tests whether newMockRecycler function is called and returns nil. This is no longer needed so this particular part of the test is removed. In addition, the no longer used newMockRecycler function is removed also.
Automatic merge from submit-queue
Remove PV annotations for quobyte provisioner
This is the last provisioner that uses annotations to pass secrets from provisioner to deleter.
Fixes#34822
@johscheuer, I don't have access to quobyte, please take look and retest the plugin. An e2e test for quobyte would be nice!
@kubernetes/sig-storage
Automatic merge from submit-queue
Remove unused WaitForDetach from Detacher interface and plugins
See issue #33128 and PR #33270
We can't rely on the device name provided by OpenStack Cinder, and thus
must perform detection based on the drive serial number (aka It's cinder ID)
on the kubelet itself.
This needs to be removed now, as part of #33128, as the code can't be
updated to attempt device detection and fallback through to the Cinder
provided deviceName, as detection "fails" when the device is gone, and
if cinder has reported a deviceName that another volume has used in
relaity, then this will block forever (or until the other, unreleated,
volume has been detached)
Automatic merge from submit-queue
Remove GetRootContext method from VolumeHost interface
Remove the `GetRootContext` call from the `VolumeHost` interface, since Kubernetes no longer needs to know the SELinux context of the Kubelet directory.
Per #33951 and #35127.
Depends on #33663; only the last commit is relevant to this PR.
Automatic merge from submit-queue
Initial work on running windows containers on Kubernetes
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
This is the first stab at getting the Kubelet running on Windows (fixes#30279), and getting it to deploy network-accessible pods that consist of Windows containers. Thanks @csrwng, @jbhurat for helping out.
The main challenge with Windows containers at this point is that container networking is not supported. In other words, each container in the pod will get it's own IP address. For this reason, we had to make a couple of changes to the kubelet when it comes to setting the pod's IP in the Pod Status. Instead of using the infra-container's IP, we use the IP address of the first container.
Other approaches we investigated involved "disabling" the infra container, either conditionally on `runtime.GOOS` or having a separate windows-docker container runtime that re-implemented some of the methods (would require some refactoring to avoid maintainability nightmare).
Other changes:
- The default docker endpoint was removed. This results in the docker client using the default for the specific underlying OS.
More detailed documentation on how to setup the Windows kubelet can be found at https://docs.google.com/document/d/1IjwqpwuRdwcuWXuPSxP-uIz0eoJNfAJ9MWwfY20uH3Q.
cc: @ikester @brendandburns @jstarks
Automatic merge from submit-queue
Per Volume Inode Accounting
Collects volume inode stats using the same find command as cadvisor. The command is "find _path_ -xdev -printf '.' | wc -c". The output is passed to the summary api, and will be consumed by the eviction manager.
This cannot be merged yet, as it depends on changes adding the InodesUsed field to the summary api, and the eviction manager consuming this. Expect tests to fail until this happens.
DEPENDS ON #35137
Automatic merge from submit-queue
refactor DeviceOpened() so it won't return error if device doesn't exist
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
DeviceOpened() is called after device is unmounted but before detached. Some volumes such as rbd don't support 3rd party detach, they have to be detached during unmount. Once detached, the device path vanishes. This causes false alarm when DeviceOpened() is called.
The fix is to ignore error IsNotExist
**Which issue this PR fixes** _(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)_: fixes #
**Special notes for your reviewer**:
@kubernetes/sig-storage
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
``` release-note
```
Signed-off-by: Huamin Chen hchen@redhat.com
See issue #33128
We can't rely on the device name provided by Cinder, and thus must perform
detection based on the drive serial number (aka It's cinder ID) on the
kubelet itself.
This patch re-works the cinder volume attacher to ignore the supplied
deviceName, and instead defer to the pre-existing GetDevicePath method to
discover the device path based on it's serial number and /dev/disk/by-id
mapping.
This new behavior is controller by a config option, as falling back
to the cinder value when we can't discover a device would risk devices
not showing up, falling back to cinder's guess, and detecting the wrong
disk as attached.
We are more liberal in what we accept as a volume id in k8s, and indeed
we ourselves generate names that look like `aws://<zone>/<id>` for
dynamic volumes.
This volume id (hereafter a KubernetesVolumeID) cannot directly be
compared to an AWS volume ID (hereafter an awsVolumeID).
We introduce types for each, to prevent accidental comparison or
confusion.
Issue #35746
This has been unused since 542f2dc7, and relies on deviceName, which
can no longer be relied upon (see issue #33128).
This needs to be removed now, as part of #33128, as the code can't be
updated to attempt device detection and fallback through to the Cinder
provided deviceName, as detection "fails" when the device is gone, and
if cinder has reported a deviceName that another volume has used in
relaity, then this will block forever (or until the other, unreleated,
volume has been detached)
Automatic merge from submit-queue
SELinux Overhaul
Overhauls handling of SELinux in Kubernetes. TLDR: Kubelet dir no longer has to be labeled `svirt_sandbox_file_t`.
Fixes#33351 and #33510. Implements #33951.
Automatic merge from submit-queue
Remove stale volumes if endpoint/svc creation fails.
Remove stale volumes if endpoint/svc creation fails.
Signed-off-by: Humble Chirammal hchiramm@redhat.com
Automatic merge from submit-queue
Require PV provisioner secrets to match type
In 1.5, PV provisioners are allowing targeting namespaced secrets via storageclass params. This adds a requirement that those secrets' type match the volume provisioner plugin name, to prevent targeting and extraction of arbitrary secrets
Helps limit secret targeting issues mentioned in https://github.com/kubernetes/kubernetes/issues/34822
Automatic merge from submit-queue
Add sync state loop in master's volume reconciler
At master volume reconciler, the information about which volumes are
attached to nodes is cached in actual state of world. However, this
information might be out of date in case that node is terminated (volume
is detached automatically). In this situation, reconciler assume volume
is still attached and will not issue attach operation when node comes
back. Pods created on those nodes will fail to mount.
This PR adds the logic to periodically sync up the truth for attached
volumes kept in
the actual state cache. If the volume is no longer attached to the node,
the actual state will be updated to reflect the truth. In turn,
reconciler will take actions if needed.
To avoid issuing many concurrent operations on cloud provider, this PR
tries to add batch operation to check whether a list of volumes are
attached to the node instead of one request per volume.
At master volume reconciler, the information about which volumes are
attached to nodes is cached in actual state of world. However, this
information might be out of date in case that node is terminated (volume
is detached automatically). In this situation, reconciler assume volume
is still attached and will not issue attach operation when node comes
back. Pods created on those nodes will fail to mount.
This PR adds the logic to periodically sync up the truth for attached volumes kept in the actual state cache. If the volume is no longer attached to the node, the actual state will be updated to reflect the truth. In turn, reconciler will take actions if needed.
To avoid issuing many concurrent operations on cloud provider, this PR
tries to add batch operation to check whether a list of volumes are
attached to the node instead of one request per volume.
More details are explained in PR #33760
When kubelet restarts, all the information about the volumes will be
gone from actual/desired states. When update node status with mounted
volumes, the volume list might be empty although there are still volumes
are mounted and in turn causing master to detach those volumes since
they are not in the mounted volumes list. This fix is to make sure only
update mounted volumes list after reconciler starts sync states process.
This sync state process will scan the existing volume directories and
reconstruct actual states if they are missing.
This PR also fixes the problem during orphaned pods' directories. In
case of the pod directory is unmounted but has not yet deleted (e.g.,
interrupted with kubelet restarts), clean up routine will delete the
directory so that the pod directoriy could be cleaned up (it is safe to
delete directory since it is no longer mounted)
The third issue this PR fixes is that during reconstruct volume in
actual state, mounter could not be nil since it is required for creating
container.VolumeMap. If it is nil, it might cause nil pointer exception
in kubelet.
Details are in proposal PR #33203
Don't store Gluster SotrageClass parameters in annotations, it's insecure.
Instead, expect that there is the StorageClass available at the time
when it's needed by Gluster deleter.
Gluster provisioner is interested in pvc.Namespace and I don't want to add
at as a new field in VolumeOptions - it would contain almost whole PVC.
Let's pass direct reference to PVC instead and let the provisioner to pick
information it is interested in.
Automatic merge from submit-queue
Dynamic provisioning for flocker volume plugin
Refactor flocker volume plugin
* [x] Support provisioning beta (#29006)
* [x] Support deletion
* [x] Use bind mounts instead of /flocker in containers
* [x] support ownership management or SELinux relabeling.
* [x] adds volume specification via datasetUUID (this is guranted to be unique)
I based my refactor work to replicate pretty much GCE-PD behaviour
**Related issues**: #29006#26908
@jsafrane @mattbates @wallrj @wallnerryan
Automatic merge from submit-queue
Use strongly-typed types.NodeName for a node name
We had another bug where we confused the hostname with the NodeName.
Also, if we want to use different values for the Node.Name (which is
an important step for making installation easier), we need to keep
better control over this.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
We had another bug where we confused the hostname with the NodeName.
To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
Automatic merge from submit-queue
Use secrets for glusterfs provisioning passwords
- no plain password in StorageClass!
- fix the style along the way
- use PV annotations to pass the configuration from provisioners to deleters, inspired by Ceph RBD provisioning.
~~Proposing 1.4:~~
~~- GlusterFS provisioning is a new 1.4 feature~~
~~- if we release GlusterFS provisioner as it is now, we need to support it's API (i.e. plaintext passwords) until 2.0~~
~~- it can break only GlusterFS provisioning, nothing else~~
~~- it's easy to revert~~
@kubernetes/sig-storage
fixes#31871
Automatic merge from submit-queue
Do not report error when deleting an attached volume
Persistent volume controller should not send warning events to a PV and mark the PV as failed when the volume is still attached.
This happens when a user quickly deletes a pod and associated PVC - PV is slowly detaching, while the PVC is already deleted and the PV enters Failed phase.
`Deleter.Deleter` can now return `tryAgainError`, which is sent as INFO to the PV to let the user know we did not forget to delete the PV, however the PV stays in Released state. The controller tries again in the next sync (15 seconds by default).
Fixes#31511
Automatic merge from submit-queue
Send recycle events from pod to pv.
This allows users to diagnose what's wrong with recycler. Recycler pods are started automatically with a cryptic name and they are deleted immediately when they finish.
e.g, `kubectl describe pv` could show that NFS cannot be mounted (and how many pods have tried it):
```
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
59m 59m 1 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Unable to mount volumes for pod "recycler-for-nfs_default(5421800e-347b-11e6-a79b-3c970e965218)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
53m 53m 1 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Unable to mount volumes for pod "recycler-for-nfs_default(3c9809e5-347c-11e6-a79b-3c970e965218)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
46m 46m 1 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Unable to mount volumes for pod "recycler-for-nfs_default(250dd2a2-347d-11e6-a79b-3c970e965218)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
40m 40m 1 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Unable to mount volumes for pod "recycler-for-nfs_default(0d84ea33-347e-11e6-a79b-3c970e965218)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
33m 33m 1 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Unable to mount volumes for pod "recycler-for-nfs_default(f5fb63bf-347e-11e6-a79b-3c970e965218)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
27m 27m 1 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Unable to mount volumes for pod "recycler-for-nfs_default(de7128fd-347f-11e6-a79b-3c970e965218)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
1h 3m 75 {persistentvolume-controller } Normal RecyclerPod Recycler pod: Successfully assigned recycler-for-nfs to 127.0.0.1
1h 3m 76 {persistentvolume-controller } Normal RecyclerPod Recycler pod: Pod was active on the node longer than specified deadline
1h 1m 12 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
20m 1m 4 {persistentvolume-controller } Warning RecyclerPod (events with common reason combined)
```
These steps were necessary:
- added event watcher to volume.RecycleVolumeByWatchingPodUntilCompletion
- pass all these events through volume plugins to volume controller
- rework volume.RecycleVolumeByWatchingPodUntilCompletion unit tests to a table (too much copy-paste)
- fix all unit tests along the way
Automatic merge from submit-queue
Support Quobyte as StorageClass
This PR allows Users to use Quobyte as StorageClass for dynamic volume provisioning and implements the Provisioner/Deleter Interface.
@quolix @kubernetes/sig-storage @rootfs
Automatic merge from submit-queue
Disambiguate unsupported metrics from metrics errors
**What this PR does / why we need it**:
Stop logging "metrics are not supported for MetricsNil Volumes" as it spams the log.
**Which issue this PR fixes**
fixes#20676, fixes#27373
**Special notes for your reviewer**:
None
**Release note**:
```release-note
Don't log "metrics are not supported for MetricsNil Volumes"
```
Automatic merge from submit-queue
Change the default volume type of GlusterFS provisioner.
At present provisioner creates 'Distribute' Volume and this patch change the default
volume type 'Distribute Replica:(3)' volume.
At present, provisioner creates Distribute Volume and this patch
change the default volume type to Distribute-Replica(3) volume.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Automatic merge from submit-queue
Fix race condition in updating attached volume between master and node
This PR tries to fix issue #29324. The cause of this issue is that a race
condition happens when marking volumes as attached for node status. This
PR tries to clean up the logic of when and where to mark volumes as
attached/detached. Basically the workflow as follows,
1. When volume is attached sucessfully, the volume and node info is
added into nodesToUpdateStatusFor to mark the volume as attached to the
node.
2. When detach request comes in, it will check whether it is safe to
detach now. If the check passes, remove the volume from volumesToReportAsAttached
to indicate the volume is no longer considered as attached now.
Afterwards, reconciler tries to update node status and trigger detach
operation. If any of these operation fails, the volume is added back to
the volumesToReportAsAttached list showing that it is still attached.
These steps should make sure that kubelet get the right (might be
outdated) information about which volume is attached or not. It also
garantees that if detach operation is pending, kubelet should not
trigger any mount operations.
This PR tries to fix issue #29324. This cause of this issue is a race
condition happens when marking volumes as attached for node status. This
PR tries to clean up the logic of when and where to mark volumes as
attached/detached. Basically the workflow as follows,
1. When volume is attached sucessfully, the volume and node info is
added into nodesToUpdateStatusFor to mark the volume as attached to the
node.
2. When detach request comes in, it will check whether it is safe to
detach now. If the check passes, remove the volume from volumesToReportAsAttached
to indicate the volume is no longer considered as attached now.
Afterwards, reconciler tries to update node status and trigger detach
operation. If any of these operation fails, the volume is added back to
the volumesToReportAsAttached list showing that it is still attached.
These steps should make sure that kubelet get the right (might be
outdated) information about which volume is attached or not. It also
garantees that if detach operation is pending, kubelet should not
trigger any mount operations.
Automatic merge from submit-queue
support storage class in Ceph RBD volume
replace WIP PR #30959, using PV annotation idea from @jsafrane
@kubernetes/sig-storage @johscheuer @elsonrodriguez
This allows users to diagnose what's wrong with recycler. Recycler pods are
started automatically with a cryptic name and they are deleted immediately
when they finish.
kubectl describe pods will show:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
59m 59m 1 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Unable to mount volumes for pod "recycler-for-nfs_default(5421800e-347b-11e6-a79b-3c970e965218)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
53m 53m 1 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Unable to mount volumes for pod "recycler-for-nfs_default(3c9809e5-347c-11e6-a79b-3c970e965218)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
46m 46m 1 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Unable to mount volumes for pod "recycler-for-nfs_default(250dd2a2-347d-11e6-a79b-3c970e965218)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
40m 40m 1 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Unable to mount volumes for pod "recycler-for-nfs_default(0d84ea33-347e-11e6-a79b-3c970e965218)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
33m 33m 1 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Unable to mount volumes for pod "recycler-for-nfs_default(f5fb63bf-347e-11e6-a79b-3c970e965218)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
27m 27m 1 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Unable to mount volumes for pod "recycler-for-nfs_default(de7128fd-347f-11e6-a79b-3c970e965218)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
1h 3m 75 {persistentvolume-controller } Normal RecyclerPod Recycler pod: Successfully assigned recycler-for-nfs to 127.0.0.1
1h 3m 76 {persistentvolume-controller } Normal RecyclerPod Recycler pod: Pod was active on the node longer than specified deadline
1h 1m 12 {persistentvolume-controller } Warning RecyclerPod Recycler pod: Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs"/"default". list of unattached/unmounted volumes=[vol]
20m 1m 4 {persistentvolume-controller } Warning RecyclerPod (events with common reason combined)
These steps were necessary:
- added event watcher to volume.RecycleVolumeByWatchingPodUntilCompletion
- pass all these events through volume plugins to volume controller
- rework volume.RecycleVolumeByWatchingPodUntilCompletion unit tests to a table
(too much copy-paste)
- fix all unit tests along the way
Automatic merge from submit-queue
Make @rootfs the assignee for various volumes
This, combined with the '/lgtm' capability of reviewers means you can approve
PRs. @rootfs - I assume you're OK with this?
Automatic merge from submit-queue
Post event message for volume attachment
This PR is to add event message when attaching volume fails to help
users to debug. For detach failure, may address in a different PR since
it requires more data structure change.
This PR is to add event message when attaching volume fails to help
users to debug. For detach failure, may address in a different PR since
it requires more data structure change.
Automatic merge from submit-queue
Implements Attacher Plugin Interface for vSphere
This PR does the following,
Fixes#29028 (vsphere volume should implement attacher interface): Implements Attacher Plugin Interface for vSphere.
See file:
pkg/volume/vsphere_volume/vsphere_volume.go. - Removed attach and detach calls from SetupAt and TearDownAt.
pkg/volume/vsphere_volume/attacher.go. - Implements Attacher & Detacher Plugin Interface for vSphere. (Ref :- GCE_PD & AWS attacher.go)
pkg/cloudproviders/provider/vsphere.go - Added DiskIsAttach method.
The vSphere plugin code needs clean up. (ex: The code for getting vSphere instance is repeated in file pkg/cloudprovider/providers/vsphere.go). I will fix this in next PR.
Automatic merge from submit-queue
Fix coding style
cc @pmorie
**What this PR does / why we need it**: Fixes case on a variable name, it's simple and adjust the code to the coding style.
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```NONE
```
Automatic merge from submit-queue
Add encryption to EBS dynamic provisioner
Resolves https://github.com/kubernetes/kubernetes/issues/30792
Adds encryption to the EBS cloud provider and provisioner.
Follow up to #29006 (all commits but the one in this PR will drop out).
@kubernetes/sig-storage
```release-note
```
The serviceAccountName is occasionally useful for clients running on
Kube that need to know who they are when talking to other components.
The nodeName is useful for PetSet or DaemonSet pods that need to make
calls back to the API to fetch info about their node.
Both fields are immutable, and cannot easily be retrieved in another
way.
Automatic merge from submit-queue
Add ismounted check in unmountpath function
This change is to fix PR #30930. The function should check if the
mountpath is still mounted or not. If it is not, it should continue with
removing the directory instead of returning error.
This change is for fixing PR #30930. The function should check if the
mountpath is still mounted or not. If it is not, it should continue with
removing the directory instead of returning error.
Automatic merge from submit-queue
Adds myself to the flocker volume plugin owners
I am happy to look after the flocker volume plugin and support @agonzalezro. Currently refactoring the volume plugin and adding dynamic provisioning features in #31005
Automatic merge from submit-queue
Add Events for operation_executor to show status of mounts, failed/successful to show in describe events
Fixes#27590
@saad-ali @pmorie @erinboyd
After talking with @pmorie last week about the above issue, I decided to poke around and see if I could remedy. The refactoring broke my previous UXP merged PR's that correctly showed failed mount errors in the describe events. However, Not sure I implemented correctly, but it tested out and seems to be working, let me know what I missed or if this is not the correct approach.
```
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 {default-scheduler } Normal Scheduled Successfully assigned nfs-bb-pod1 to 127.0.0.1
44s 44s 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "nfs-bb-pod1_default(a94f64f1-37c9-11e6-9aa5-52540073d346)": timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol]
44s 44s 1 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol]
38s 38s 1 {kubelet } Warning FailedMount Unable to mount volumes for pod "a94f64f1-37c9-11e6-9aa5-52540073d346": Mount failed: exit status 32
Mounting arguments: nfs1.rhs:/opt/data99 /var/lib/kubelet/pods/a94f64f1-37c9-11e6-9aa5-52540073d346/volumes/kubernetes.io~nfs/nfsvol nfs []
Output: mount.nfs: Connection timed out
Resolution hint: Check and make sure the NFS Server exists (ensure that correct IPAddress/Hostname was given) and is available/reachable.
Also make sure firewall ports are open on both client and NFS Server (2049 v4 and 2049, 20048 and 111 for v3).
Use commands telnet <nfs server> <port> and showmount <nfs server> to help test connectivity.
```
Automatic merge from submit-queue
Quobyte Volume plugin
@quofelix and myself developed a volume plugin for [Quobyte](http://www.quobyte.com) which is a software-defined storage solution. This PR allows Kubernetes users to mount a Quobyte Volume inside their containers over Kubernetes.
Here are some further informations about [Quobyte and Storage for containers](http://www.quobyte.com/containers)
Automatic merge from submit-queue
Implement dynamic provisioning (beta) of PersistentVolumes via StorageClass
Implemented according to PR #26908. There are several patches in this PR with one huge code regen inside.
* Please review the API changes (the first patch) carefully, sometimes I don't know what the code is doing...
* `PV.Spec.Class` and `PVC.Spec.Class` is not implemented, use annotation `volume.alpha.kubernetes.io/storage-class`
* See e2e test and integration test changes - Kubernetes won't provision a thing without explicit configuration of at least one `StorageClass` instance!
* Multiple provisioning volume plugins can coexist together, e.g. HostPath and AWS EBS. This is important for Gluster and RBD provisioners in #25026
* Contradicting the proposal, `claim.Selector` and `volume.alpha.kubernetes.io/storage-class` annotation are **not** mutually exclusive. They're both used for matching existing PVs. However, only `volume.alpha.kubernetes.io/storage-class` is used for provisioning, configuration of provisioning with `Selector` is left for (near) future.
* Documentation is missing. Can please someone write some while I am out?
For now, AWS volume plugin accepts classes with these parameters:
```
kind: StorageClass
metadata:
name: slow
provisionerType: kubernetes.io/aws-ebs
provisionerParameters:
type: io1
zone: us-east-1d
iopsPerGB: 10
```
* parameters are case-insensitive
* `type`: `io1`, `gp2`, `sc1`, `st1`. See AWS docs for details
* `iopsPerGB`: only for `io1` volumes. I/O operations per second per GiB. AWS volume plugin multiplies this with size of requested volume to compute IOPS of the volume and caps it at 20 000 IOPS (maximum supported by AWS, see AWS docs).
* of course, the plugin will use some defaults when a parameter is omitted in a `StorageClass` instance (`gp2` in the same zone as in 1.3).
GCE:
```
apiVersion: extensions/v1beta1
kind: StorageClass
metadata:
name: slow
provisionerType: kubernetes.io/gce-pd
provisionerParameters:
type: pd-standard
zone: us-central1-a
```
* `type`: `pd-standard` or `pd-ssd`
* `zone`: GCE zone
* of course, the plugin will use some defaults when a parameter is omitted in a `StorageClass` instance (SSD in the same zone as in 1.3 ?).
No OpenStack/Cinder yet
@kubernetes/sig-storage
Automatic merge from submit-queue
Allow setting permission mode bits on secrets, configmaps and downwardAPI files
cc @thockin @pmorie
Here is the first round to implement: https://github.com/kubernetes/kubernetes/pull/28733.
I made two commits: one with the actual change and the other with the auto-generated code. I think it's easier to review this way, but let me know if you prefer in some other way.
I haven't written any tests yet, I wanted to have a first glance and not write them till this (and the API) are more close to the "LGTM" :)
There are some things:
* I'm not sure where to do the "AND 0777". I'll try to look better in the code base, but suggestions are always welcome :)
* The write permission on group and others is not set when you do an `ls -l` on the running container. It does work with write permissions to the owner. Debugging seems to show that is something happening after this is correctly set on creation. Will look closer.
* The default permission (when the new fields are not specified) are the same that on kubernetes v1.3
* I do realize there are conflicts with master, but I think this is good enough to have a look. The conflicts is with the autog-enerated code, so the actual code is actually the same (and it takes like ~30 minutes to generate it here)
* I didn't generate the docs (`generated-docs` and `generated-swagger-docs` from `hack/update-all.sh`) because my machine runs out of mem. So that's why it isn't in this first PR, will try to investigate and see why it happens.
Other than that, this works fine here with some silly scripts I did to create a secret&configmap&downwardAPI, a pod and check the file permissions. Tested the "defaultMode" and "mode" for all. But of course, will write tests once this is looking fine :)
Thanks a lot again!
Rodrigo
This implements the proposal in:
docs/proposals/secret-configmap-downwarapi-file-mode.md
Fixes: #28317.
The mounttest image is updated so it returns the permissions of the linked file
and not the symlink itself.
Automatic merge from submit-queue
Fix default resource limits (node allocatable) for downward api volumes and env vars
@kubernetes/rh-cluster-infra @pmorie @derekwaynecarr
Automatic merge from submit-queue
Add volume reconstruct/cleanup logic in kubelet volume manager
Currently kubelet volume management works on the concept of desired
and actual world of states. The volume manager periodically compares the
two worlds and perform volume mount/unmount and/or attach/detach
operations. When kubelet restarts, the cache of those two worlds are
gone. Although desired world can be recovered through apiserver, actual
world can not be recovered which may cause some volumes cannot be cleaned
up if their information is deleted by apiserver. This change adds the
reconstruction of the actual world by reading the pod directories from
disk. The reconstructed volume information is added to both desired
world and actual world if it cannot be found in either world. The rest
logic would be as same as before, desired world populator may clean up
the volume entry if it is no longer in apiserver, and then volume
manager should invoke unmount to clean it up.
Fixes https://github.com/kubernetes/kubernetes/issues/27653
Currently kubelet volume management works on the concept of desired
and actual world of states. The volume manager periodically compares the
two worlds and perform volume mount/unmount and/or attach/detach
operations. When kubelet restarts, the cache of those two worlds are
gone. Although desired world can be recovered through apiserver, actual
world can not be recovered which may cause some volumes cannot be cleaned
up if their information is deleted by apiserver. This change adds the
reconstruction of the actual world by reading the pod directories from
disk. The reconstructed volume information is added to both desired
world and actual world if it cannot be found in either world. The rest
logic would be as same as before, desired world populator may clean up
the volume entry if it is no longer in apiserver, and then volume
manager should invoke unmount to clean it up.
Automatic merge from submit-queue
Verify volume.GetPath() never returns ""
Add a new helper method volume.GetPath(Mounter) instead of calling
the GetPath() of the Mounter directly. Check if GetPath() is returning
a "" and convert that into an error.
Fixes#23163
Automatic merge from submit-queue
Run goimport for the whole repo
While removing GOMAXPROC and running goimports, I noticed quite a lot of other files also needed a goimport format. Didn't commit `*.generated.go`, `*.deepcopy.go` or files in `vendor`
This is more for testing if it builds.
The only strange thing here is the gopkg.in/gcfg.v1 => github.com/scalingdata/gcfg replace.
cc @jfrazelle @thockin
Add a new helper method volume.GetPath(Mounter) instead of calling
the GetPath() of the Mounter directly. Check if GetPath() is returning
a "" and convert that into an error. At this point, we only have
information about the type of the Mounter, so let's log that if
there is a problem
Fixes#23163
Automatic merge from submit-queue
pkg/various: plug leaky time.New{Timer,Ticker}s
According to the documentation for Go package time, `time.Ticker` and
`time.Timer` are uncollectable by garbage collector finalizers. They
leak until otherwise stopped. This commit ensures that all remaining
instances are stopped upon departure from their relative scopes.
Similar efforts were incrementally done in #29439 and #29114.
```release-note
* pkg/various: plugged various time.Ticker and time.Timer leaks.
```
Automatic merge from submit-queue
Check iscsi iface file for transport name
When checking for tcp vs hardware transports, check actual iscsi iface file to see if we are using tcp as a transport, rather than relying on just the transport name of 'default'.
This fixes the open-iscsi software iscsi initiator for non-default interfaces.
fixes#27131
Automatic merge from submit-queue
pkg/util/goroutinemap: apply idiomatic Go cleanups
Package goroutinemap can be structurally simplified to be more
idiomatic, concise, and free of error potential. No structural changes
are made.
It is unconventional declare `sync.Mutex` directly as a pointerized
field in a parent structure. The `sync.Mutex` operates on pointer
receivers of itself; and by relying on that, the types that contain
those fields can be safely constructed using
https://golang.org/ref/spec#The_zero_value semantic.
The duration constants are already of type `time.Duration`, so
re-declaring that is redundant.
/CC: @saad-ali
Automatic merge from submit-queue
Fix mount collision timeout issue
Short- or medium-term workaround for #29555. The root issue being fixed here is that the recent attach/detach work in the kubelet uses a unique volume name as a key that tracks the work that has to be done for each volume in a pod to attach/mount/umount/detach. However, the non-attachable volume plugins do not report unique names for themselves, which causes collisions when a single secret or configmap is mounted multiple times in a pod.
This is still a WIP -- I need to add a couple E2E tests that ensure that tests break in the future if there is a regression -- but posting for early review.
cc @kubernetes/sig-storage
Ultimately, I would like to refine this a bit further. A couple things I would like to change:
1. `GetUniqueVolumeName` should be a property ONLY of attachable volumes
2. I would like to see the kubelet apparatus for attach/mount/umount/detach handle non-attachable volumes specifically to avoid things like the `WaitForControllerAttach` call that has to be done for those volume types now
Automatic merge from submit-queue
volume/flocker: plug time.Ticker resource leak
This commit ensures that `flockerMounter.updateDatasetPrimary` does not leak
running `time.Ticker` instances. Upon termination of the consuming routine, we
stop the tickers.
```release-note
* flockerMounter.updateDatasetPrimary no longer leaks running time.Ticker instances.
Upon termination of the consuming routine, we stop the tickers.
```
For non-attachable volumes, do not call GetVolumeName on the plugin and instead
generate a unique name based on the identity of the pod and the name of the volume
within the pod.
This fixes race conditions in configmap, secret, downwardapi & git_repo
volume plugins.
wrappedVolumeSpec vars used by volume mounters and unmounters contained
a pointer to api.Volume structs which were being patched by
NewWrapperMounter/NewWrapperUnmounter, causing race condition during
volume mounts.
Package goroutinemap can be structurally simplified to be more
idiomatic, concise, and free of error potential. No structural changes
are made.
It is unconventional declare `sync.Mutex` directly as a pointerized
field in a parent structure. The `sync.Mutex` operates on pointer
receivers of itself; and by relying on that, the types that contain
those fields can be safely constructed using
https://golang.org/ref/spec#The_zero_value.
The duration constants are already of type `time.Duration`, so
re-declaring that is redundant.
According to the documentation for Go package time, `time.Ticker` and
`time.Timer` are uncollectable by garbage collector finalizers. They
leak until otherwise stopped. This commit ensures that all remaining
instances are stopped upon departure from their relative scopes.
Automatic merge from submit-queue
add enhanced volume and mount logging for block devices
Fixes#24568
Adding better logging and debugging for block device volumes and the shared SafeFormatAndMount (aws, gce, flex, rbd, cinder, etc...)
Allow mount volume operations to run in parallel for non-attachable
volume plugins.
Allow unmount volume operations to run in parallel for all volume
plugins.
This commit ensures that `flockerMounter.updateDatasetPrimary` does not leak
running `time.Ticker` instances. Upon termination of the consuming
routine, we stop the tickers.
This is to base the name on the volume not just on the
source configMap. If you have 2 volumes that both have the same
configMap as a source, the volume is see as being in the attached
state (it's state is looked up based on GetVolumeName()).
See bug #28502
Automatic merge from submit-queue
Track object modifications in fake clientset
Fake clientset is used by unit tests extensively but it has some
shortcomings:
- no filtering on namespace and name: tests that want to test objects in
multiple namespaces end up getting all objects from this clientset,
as it doesn't perform any filtering based on name and namespace;
- updates and deletes don't modify the clientset state, so some tests
can get unexpected results if they modify/delete objects using the
clientset;
- it's possible to insert multiple objects with the same
kind/name/namespace, this leads to confusing behavior, as retrieval is
based on the insertion order, but anchors on the last added object as
long as no more objects are added.
This change changes core.ObjectRetriever implementation to track object
adds, updates and deletes.
Some unit tests were depending on the previous (and somewhat incorrect)
behavior. These are fixed in the following few commits.
Ensure that kublet marks VolumeInUse before checking if it is Attached.
Also ensures that the attach/detach controller always fetches a fresh
copy of the node object before detach (instead ofKubelet relying on node
informer cache).
Fake clientset no longer needs to be prepopulated with records: keeping
them in leads to the name conflict on creates. Also, since fake
clientset now respects namespaces, we need to correctly populate them.
Automatic merge from submit-queue
Adding SCSI controller type filter for vSphere disk attach
Hot plug of disks to a SCSI controller of type lsilogic doesn't work as expected. When a device is detached from the controller, it fails to remove the device from the /dev path which makes the subsequent attaches to the node to fail. With scsi controller types lsilogic-sas or paravirtual this seems to work well. This patch filters the existing controller for these types, and if it doesn't find one, it creates a new controller for disk attach.
This PR is dependent on https://github.com/kubernetes/kubernetes/pull/26658 (1st commit) also targeting this for 1.3
Automatic merge from submit-queue
AWS/GCE: Spread PetSet volume creation across zones, create GCE volumes in non-master zones
Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.
We hash the volume name so we don't bias to the first few zones.
If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset. In that case we hash
the base name.
Lots of comments describing the heuristics, how it fits together and the
limitations.
In particular, we can't guarantee correct volume placement if the set of
zones is changing between allocating volumes.
Modify attach/detach controller to keep track of volumes to report
attached in Node VolumeToAttach status.
Modify kubelet volume manager to wait for volume to show up in Node
VolumeToAttach status.
Implement exponential backoff for errors in volume manager and attach
detach controller
Hot attach of disk to a scsi controller will work only if the
controller type is lsilogic-sas or paravirtual.This patch filters
the existing controller for these types, if it doesn't find one it
creates a new scsi controller.
Automatic merge from submit-queue
Remove an empty line being output when exposing annotations and labels via downward api volume
The issue is that formatMap function (for annotations and labels) in pkg/fieldpath/fieldpath.go appends a "\n" after each key value pair which is correct for all pairs except the last pair because then a complete string is returned with a "\n" in the end. It is inconsistent with other strings (metadata.name, namespace and resources) being returned as they dont have "\n" in the end. These returned strings are processed by sortLines function in pkg/volume/downwardapi/downwardapi.go and the function finally appends "\n" to each string, but incorrectly outputs an empty line if there is an already "\n" in the end with the input string. To illustrate:
The sortLines works as follows: lets say the input string is : "a\nb\nc\n".
1. It splits them as "a", "b", "c", "" (note empty string in the end).
2. it sort them: "", "a", b", "c"
3. And then it appends "\n" again to each string: "\n", "a\n" ,"b\n", "c\n"
So we can see that it is erroneously creating an empty string in the beginning when the input string to sortLines has "\n" in the end. As I said above, it is not an issue with metadata.name, namespace and resources as their input strings are without \n" in the end.
So now, the output in the downward api volume, (using the example in http://kubernetes.io/docs/user-guide/downward-api/):
```
# cat /etc/annotations
zone="us-est-coast"
cluster="test-cluster1"
rack="rack-22"
```
After this patch, the output will be correct and without the erroneous empty line in the beginning.
I could think other ways to solve this but I found the way in this patch with minimal code changes.
@kubernetes/rh-cluster-infra
We had a long-lasting bug which prevented creation of volumes in
non-master zones, because the cloudprovider in the volume label
admission controller is not initialized with the multizone setting
(issue #27656).
This implements a simple workaround: if the volume is created with the
failure-domain zone label, we look for the volume in that zone. This is
more efficient, avoids introducing a new semantic, and allows users (and
the dynamic provisioner) to create volumes in non-master zones.
Fixes#27657
Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.
We hash the volume name so we don't bias to the first few zones.
If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset. In that case we hash
the base name.
Fixes#27256
This commit adds a new volume manager in kubelet that synchronizes
volume mount/unmount (and attach/detach, if attach/detach controller
is not enabled).
This eliminates the race conditions between the pod creation loop
and the orphaned volumes loops. It also removes the unmount/detach
from the `syncPod()` path so volume clean up never blocks the
`syncPod` loop.
- replaces probeVolume with scsiHostRescan to scan hot attached disks
- fixes substring match of UUID returned from AttachDisk
- changes DetachDisk to take volumePath argument instead of diskID
- fixes delayed failure at mount rather than attach disk
- removes cloning of virtual disk in AttachDisk
Automatic merge from submit-queue
GCE attach tests
Add basic tests for GCE attacher.
Looking at the code, it would deserve some refactoring as suggested in #25888, so mounting is not tested at all.
Automatic merge from submit-queue
Fix GCE attacher/detacher to ignore return value of failed calls.
The plugin should ignore any return value if err is set. Found when writing unit tests in #26615 - my dummy `DiskIsAttached` returned `false, errors.New('fake error')` and the volume was **not** detached although the log message `"Error checking if PD (%q) is already attached to current node (%q). Will continue and try detach anyway."` suggested otherwise
@saad-ali, PTAL
@kubernetes/sig-storage
This PR contains Kubelet changes to enable attach/detach controller control.
* It introduces a new "enable-controller-attach-detach" kubelet flag to
enable control by controller. Default enabled.
* It removes all references "SafeToDetach" annoation from controller.
* It adds the new VolumesInUse field to the Node Status API object.
* It modifies the controller to use VolumesInUse instead of SafeToDetach
annotation to gate detachment.
* There is a bug in node-problem-detector that causes VolumesInUse to
get reset every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9
opened to fix that.
Automatic merge from submit-queue
read gluster log to surface glusterfs plugin errors properly in describe events
glusterfs.go does not properly expose errors as all mount errors go to a log file, I propose we read the log file to expose the errors without asking the users to 'go look at this log'
This PR does the following:
1. adds a gluster option for log-level=ERROR to remove all noise from log file
2. change log file name and path based on PV + Pod name - so specific per PV and Pod
3. create a utility to read the last two lines of the log file when failure occurs
old behavior:
```
13s 13s 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-gluster-pod2_default(34b18c6b-070d-11e6-8e95-52540092b5fb)": glusterfs: mount failed: Mount failed: exit status 1
Mounting arguments: 192.168.234.147:myVol2 /var/lib/kubelet/pods/34b18c6b-070d-11e6-8e95-52540092b5fb/volumes/kubernetes.io~glusterfs/pv-gluster glusterfs [log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pv-gluster/glusterfs.log]
Output: Mount failed. Please check the log file for more details.
```
improved behavior: (updated after suggestions from community)
```
34m 34m 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "bb-multi-pod1_default(e7d7f790-0d4b-11e6-a275-52540092b5fb)": glusterfs: mount failed: Mount failed: exit status 1
Mounting arguments: 192.168.123.222:myVol2 /var/lib/kubelet/pods/e7d7f790-0d4b-11e6-a275-52540092b5fb/volumes/kubernetes.io~glusterfs/pv-gluster2 glusterfs [log-level=ERROR log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pv-gluster2/bb-multi-pod1-glusterfs.log]
Output: Mount failed. Please check the log file for more details.
the following error information was pulled from the log to help resolve this issue:
[2016-04-28 14:21:29.109697] E [socket.c:2332:socket_connect_finish] 0-glusterfs: connection to 192.168.123.222:24007 failed (Connection timed out)
[2016-04-28 14:21:29.109767] E [glusterfsd-mgmt.c:1819:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 192.168.123.222 (Transport endpoint is not connected)
```
also this PR is alternate approach to : #24624
Automatic merge from submit-queue
Attach Detach Controller Business Logic
This PR adds the meat of the attach/detach controller proposed in #20262.
The PR splits the in-memory cache into a desired and actual state of the world.
Split controller cache into actual and desired state of world.
Controller will only operate on volumes scheduled to nodes that
have the "volumes.kubernetes.io/controller-managed-attach" annotation.
Automatic merge from submit-queue
vSphere Volume Plugin Implementation
This PR implements vSphere Volume plugin support in Kubernetes (ref. issue #23932).
Automatic merge from submit-queue
Add support for PersistentVolumeClaim in Attacher/Detacher interface
The attach detach interface does not support volumes which are referenced through PVCs. This PR adds that support
Automatic merge from submit-queue
Extend secrets volumes with path control
As per [1] this PR extends secrets mapped into volume with:
* key-to-path mapping the same way as is for configmap. E.g.
```
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"name": "mypod",
"namespace": "default"
},
"spec": {
"containers": [{
"name": "mypod",
"image": "redis",
"volumeMounts": [{
"name": "foo",
"mountPath": "/etc/foo",
"readOnly": true
}]
}],
"volumes": [{
"name": "foo",
"secret": {
"secretName": "mysecret",
"items": [{
"key": "username",
"path": "my-username"
}]
}
}]
}
}
```
Here the ``spec.volumes[0].secret.items`` added changing original target ``/etc/foo/username`` to ``/etc/foo/my-username``.
* secondly, refactoring ``pkg/volumes/secrets/secrets.go`` volume plugin to use ``AtomicWritter`` to project a secret into file.
[1] https://github.com/kubernetes/kubernetes/blob/master/docs/design/configmap.md#changes-to-secret
Automatic merge from submit-queue
volume recycler: Don't start a new recycler pod if one already exists.
Recycling is a long duration process and when the recycler controller is restarted in the meantime, it should not start a new recycler pod if there is one already running.
This means that the recycler pod must have deterministic name based on name of the recycled PV, we then get name conflicts when creating the pod.
Two things need to be changed:
- recycler controller and recycler plugins must pass the PV.Name to place, where the pod is created. This is most of the patch and it should be pretty straightforward.
- create recycler pod with deterministic name and check "already exists" error.
When at it, remove useless 'resourceVersion' argument and make log messages starting with lowercase.
There is an unit test to check the behavior + there is an e2e test that checks that regular recycling is not broken (it does not try to run two recycler pods in parallel as the recycler is single-threaded now).
Recycling is a long duration process and when the recycler controller is
restarted in the meantime, it should not start a new recycler pod if there is
one already running.
This means that the recycler pod must have deterministic name based on name
of the recycled PV, we then get name conflicts when creating the pod.
Two things need to be changed:
- recycler controller and recycler plugins must pass the PV.Name to place,
where the pod is created.
- create recycler pod with deterministic name and check "already exists" error.
When at it, remove useless 'resourceVersion' argument and make log messages
starting with lowercase.
Automatic merge from submit-queue
Refactor persistent volume controller
Here is complete persistent controller as designed in https://github.com/pmorie/pv-haxxz/blob/master/controller.go
It's feature complete and compatible with current binder/recycler/provisioner. No new features, it *should* be much more stable and predictable.
Testing
--
The unit test framework is quite complicated, still it was necessary to reach reasonable coverage (78% in `persistentvolume_controller.go`). The untested part are error cases, which are quite hard to test in reasonable way - sure, I can inject a VersionConflictError on any object update and check the error bubbles up to appropriate places, but the real test would be to run `syncClaim`/`syncVolume` again and check it recovers appropriately from the error in the next periodic sync. That's the hard part.
Organization
---
The PR starts with `rm -rf kubernetes/pkg/controller/persistentvolume`. I find it easier to read when I see only the new controller without old pieces scattered around.
[`types.go` from the old controller is reused to speed up matching a bit, the code looks solid and has 95% unit test coverage].
I tried to split the PR into smaller patches, let me know what you think.
~~TODO~~
--
* ~~Missing: provisioning, recycling~~.
* ~~Fix integration tests~~
* ~~Fix e2e tests~~
@kubernetes/sig-storage
<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/24331)
<!-- Reviewable:end -->
Fixes#15632
The key to path mapping allows pod to specify different name (thus location) of each secret.
At the same time refactor the volume plugin to use AtomicWritter to project secrets to files in a volume.
Update e2e Secrets test, the secret file permission has changed from 0444 to 0644
Remove TestPluginIdempotent as the AtomicWritter is responsible for secret creation
Automatic merge from submit-queue
Make IsQualifiedName return error strings
Part of the larger validation PR, broken out for easier review and merge.
@lavalamp FYI, but I know you're swamped, too.
Automatic merge from submit-queue
Use local disk for ConfigMap volume instead of tmpfs
So that ConfigMap volumes are counted against pod's storage quota.
@kubernetes/sig-node
cc @derekwaynecarr @vishh
Automatic merge from submit-queue
Abstract node side functionality of attachable plugins
- Create PhysicalAttacher interface to abstract MountDevice and
WaitForAttach.
- Create PhysicalDetacher interface to abstract WaitForDetach and
UnmountDevice.
- Expand unit tests to check that Attach, Detach, WaitForAttach,
WaitForDetach, MountDevice, and UnmountDevice get call where
appropriet.
Physical{Attacher,Detacher} are working titles suggestions welcome. Some other thoughts:
- NodeSideAttacher or NodeAttacher.
- AttachWatcher
- Call this Attacher and call the Current Attacher CloudAttacher.
- DeviceMounter (although there are way too many things called Mounter right now :/)
This is to address: https://github.com/kubernetes/kubernetes/pull/21709#issuecomment-192035382
@saad-ali
Automatic merge from submit-queue
Automatically Add Supplemental Groups from Volumes to Pods
This adds support for a "GID" annotation that one can add to their PVs. When this annotation is seen the kubelet automatically adds the given GID to the list of supplemental groups for the pod to which the PV is attached. This allows admins to create volumes and suggest a GID to use to access the volume. This is needed for volumes which do not support ownership management such as NFS.
@markturansky PTAL
- Expand Attacher/Detacher interfaces to break up work more
explicitly.
- Add arguments to all functions to avoid having implementers store
the data needed for operations.
- Expand unit tests to check that Attach, Detach, WaitForAttach,
WaitForDetach, MountDevice, and UnmountDevice get call where
appropriet.
Automatic merge from submit-queue
Rackspace improvements (OpenStack Cinder)
This adds PV support via Cinder on Rackspace clusters. Rackspace Cloud Block Storage is pretty much vanilla OpenStack Cinder, so there is no need for a separate Volume Plugin. Instead I refactored the Cinder/OpenStack interaction a bit (by introducing a CinderProvider Interface and moving the device path detection logic to the OpenStack part).
Right now this is limited to `AttachDisk` and `DetachDisk`. Creation and deletion of Block Storage is not in scope of this PR.
Also the `ExternalID` and `InstanceID` cloud provider methods have been implemented for Rackspace.
Automatic merge from submit-queue
Add mpio support for iscsi
This allows the iscsi volume to check if a iscsi device belongs to a mpio device
If it does belong to the device then we make sure we mount the mpio device instead of
the raw device.
The code is based on the current FibreChannel volume support for mpio
example
/dev/disk/by-path/iqn-example.com.2999 -> /dev/sde
Then we check
/sys/block/[dm-X]/slaves/xx
until we find the [dm-X] containing /dev/sde and mount it
Additional work that can be done in future
1. Add multiple portal support to iscsi
2. Move the FibreChannel volume provider to use the code that has been extracted