Automatic merge from submit-queue (batch tested with PRs 45374, 44537, 45739, 44474, 45888)
Fix attach volume to instance repeatedly
1. When a volume's status is 'attaching', the controller-manager will try to attach
it again and return an error, so the volume's status must be checked before
attaching or detaching it (see the sketch below).
2. When a volume's status is 'attaching', its attachments will be None, so the
controller-manager cannot get the device path and emits a failure event.
This is normal, so do not return an error when attachments is None.
Fix bug: #44536
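A minimal sketch of the status check described above (the helper names and status strings are illustrative assumptions, not the exact Cinder cloud-provider code):
```go
package main

import "fmt"

// getVolumeStatus and attachVolume are hypothetical stand-ins for Cinder API calls.
var volumeStatus = map[string]string{"vol-1": "attaching", "vol-2": "available"}

func getVolumeStatus(volumeID string) (string, error) {
	return volumeStatus[volumeID], nil
}

func attachVolume(volumeID, instanceID string) error { return nil }

// attachIfNeeded checks the volume status before attaching: if an attach is
// already in flight ("attaching") or the volume is already attached ("in-use"),
// no second attach call is issued.
func attachIfNeeded(volumeID, instanceID string) error {
	status, err := getVolumeStatus(volumeID)
	if err != nil {
		return err
	}
	switch status {
	case "attaching":
		return fmt.Errorf("volume %q is already being attached, not attaching again", volumeID)
	case "in-use":
		return nil // already attached, nothing to do
	}
	return attachVolume(volumeID, instanceID)
}

func main() {
	fmt.Println(attachIfNeeded("vol-1", "instance-1")) // error: attach already in flight
	fmt.Println(attachIfNeeded("vol-2", "instance-1")) // nil: attach issued
}
```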
Automatic merge from submit-queue (batch tested with PRs 45408, 45355, 45528)
Make createEndpointService() and deleteEndpointService() plugin interface methods.
Why this change?
In some setups, after dynamic PVs are created and before they are mounted/used in a pod, the endpoint/service is mistakenly deleted by the user/developer. By making these methods plugin-specific, we can call them from the mounter in scenarios where the endpoint and service were accidentally wiped in between.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
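A rough sketch of what plugin-level methods could look like (the receiver and signatures are illustrative assumptions, not the exact glusterfs plugin code):
```go
package main

import "fmt"

// glusterfsPlugin is a stand-in for the real volume plugin type.
type glusterfsPlugin struct{}

// createEndpointService creates the Endpoints/Service pair backing a
// dynamically provisioned Gluster volume in the given namespace.
func (p *glusterfsPlugin) createEndpointService(namespace, epServiceName string, hostIPs []string) error {
	// ... create the Endpoints and Service objects via the API server ...
	return nil
}

// deleteEndpointService removes the Endpoints/Service pair again.
func (p *glusterfsPlugin) deleteEndpointService(namespace, epServiceName string) error {
	// ... delete the Endpoints and Service objects ...
	return nil
}

func main() {
	p := &glusterfsPlugin{}
	// Because these are plugin methods, the mounter can recreate a wiped
	// endpoint/service before mounting.
	if err := p.createEndpointService("default", "glusterfs-dynamic-claim1", []string{"10.0.0.1"}); err != nil {
		fmt.Println(err)
	}
}
```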
Automatic merge from submit-queue (batch tested with PRs 44798, 45537, 45448, 45432)
nfs.go: cleancode err
**What this PR does / why we need it**:
The modification makes the code cleaner, simpler, and easier to inspect.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Statefulsets for cinder: allow multi-AZ deployments, spread pods across zones
**What this PR does / why we need it**: Currently, if we do not specify an availability zone in the cinder storage class, the volume is provisioned in the zone called nova. However, as mentioned in the issue, we have situations where we want to spread a statefulset across 3 different zones, which is currently not possible with statefulsets and the cinder storage class. With this new solution, if the availability zone is left empty, an algorithm chooses the zone for the cinder volume, in a similar style to the aws and gce storage class solutions (a rough sketch of the zone-choosing idea follows the release note below).
**Which issue this PR fixes** fixes #44735
**Special notes for your reviewer**:
example:
```
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: all
provisioner: kubernetes.io/cinder
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
  name: galera
  labels:
    app: mysql
spec:
  ports:
  - port: 3306
    name: mysql
  clusterIP: None
  selector:
    app: mysql
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "galera"
  replicas: 3
  template:
    metadata:
      labels:
        app: mysql
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      containers:
      - name: mysql
        image: adfinissygroup/k8s-mariadb-galera-centos:v002
        imagePullPolicy: Always
        ports:
        - containerPort: 3306
          name: mysql
        - containerPort: 4444
          name: sst
        - containerPort: 4567
          name: replication
        - containerPort: 4568
          name: ist
        volumeMounts:
        - name: storage
          mountPath: /data
        readinessProbe:
          exec:
            command:
            - /usr/share/container-scripts/mysql/readiness-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
  volumeClaimTemplates:
  - metadata:
      name: storage
      annotations:
        volume.beta.kubernetes.io/storage-class: all
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 12Gi
```
If this example is deployed, it will automatically create one replica per AZ, which helps a lot when building HA databases.
The current storage class for cinder is not ideal for statefulsets. Assume the cinder storage class is defined to be in the zone called nova; because labels are not added to the PV, pods can still be started in any zone. The problem is that, at least in our OpenStack, it is not possible to use a cinder volume located in zone x from zone y. However, should we have the possibility to choose between cross-zone cinder mounts or not? In my opinion, mounting a volume from a different zone than the one the pod runs in is not a good way of doing things (it means more network traffic between zones). What do you think? The new solution does not allow that anymore (should we allow it? That would mean removing the labels from the PV).
There might still be some things that need to be fixed in this release, and I need help with that. Some parts of the code are not perfect.
Issues I am thinking about (I need some help with these):
1) Can everybody in OpenStack see which AZ their servers are in? Could there be an access policy that hides this? If the AZ cannot be found in the server specs, I have no idea how the code behaves.
2) In the GetAllZones() function, is it really necessary to create a new service client using openstack.NewComputeV2, or could I somehow reuse an existing one?
3) This fetches all servers from an OpenStack tenant (project). However, in some cases Kubernetes may be deployed only to a specific zone. If the kube servers are located, for instance, in zone 1 and there are other servers in the same tenant in zone 2, a cinder volume might be provisioned in zone-2 even though no pod can start there, because Kubernetes has no nodes in zone-2. Could we have a better way to fetch the zones of the Kubernetes nodes? Currently that information is not added to Kubernetes node labels automatically in OpenStack (which I think it should be); I have added those labels manually to the nodes. If the zone information is not added to the nodes, the new solution does not start stateful pods at all, because it cannot target them.
cc @rootfs @anguslees @jsafrane
```release-note
The default behaviour of the cinder storage class has changed: if availability is not specified, the zone is chosen by an algorithm. This makes it possible to spread stateful pods across many zones.
```
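A rough sketch of the kind of zone-choosing algorithm referred to above: a hash-based spread over the discovered zones, similar in spirit to the aws/gce provisioners. The function, its inputs, and the claim-name handling are illustrative assumptions, not the exact implementation:
```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
	"strings"
)

// chooseZoneForVolume picks a zone for a PVC. For statefulset claims such as
// "storage-mysql-0", "storage-mysql-1", ..., the trailing ordinal is used as an
// offset so that consecutive replicas land in different zones; the remaining
// name is hashed so that different statefulsets start from different zones.
func chooseZoneForVolume(zones []string, pvcName string) string {
	sort.Strings(zones) // deterministic ordering of the discovered zones
	base, index := pvcName, 0
	if i := strings.LastIndex(pvcName, "-"); i != -1 {
		if n, err := strconv.Atoi(pvcName[i+1:]); err == nil {
			base, index = pvcName[:i], n
		}
	}
	h := fnv.New32a()
	h.Write([]byte(base))
	idx := (h.Sum32() + uint32(index)) % uint32(len(zones))
	return zones[idx]
}

func main() {
	zones := []string{"zone-1", "zone-2", "zone-3"}
	for _, claim := range []string{"storage-mysql-0", "storage-mysql-1", "storage-mysql-2"} {
		fmt.Printf("%s -> %s\n", claim, chooseZoneForVolume(zones, claim))
	}
}
```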
Automatic merge from submit-queue
add rootfs gnufied and childsb to volume approver
**What this PR does / why we need it**:
add me, @gnufied and @childsb to the volume approvers
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Remove unnecessary constants and add type to secret
**What this PR does / why we need it**:
Adds the type field to the secret in the `persistent-volume-provisioning` example for Quobyte. Also removes unnecessary constants from the Quobyte code base.
FYI
@rootfs @saad-ali @quolix
Automatic merge from submit-queue (batch tested with PRs 44590, 44969, 45325, 45208, 44714)
Use dedicated UnixUserID and UnixGroupID types
**What this PR does / why we need it**:
DRYs up type definitions by using the dedicated types in apimachinery
**Which issue this PR fixes**
#38120
**Release note**:
```release-note
UIDs and GIDs now use apimachinery types
```
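A small sketch of what "dedicated types" means here (the type and field names mirror the idea; see the apimachinery types package for the real definitions):
```go
package main

import "fmt"

// Dedicated ID types, defined once and shared, instead of repeating
// `int64` aliases in every component.
type (
	UnixUserID  int64
	UnixGroupID int64
)

// SecurityContext is an illustrative struct using the shared types.
type SecurityContext struct {
	RunAsUser *UnixUserID
	FSGroup   *UnixGroupID
}

func main() {
	uid := UnixUserID(1000)
	gid := UnixGroupID(2000)
	sc := SecurityContext{RunAsUser: &uid, FSGroup: &gid}
	fmt.Printf("runAsUser=%d fsGroup=%d\n", *sc.RunAsUser, *sc.FSGroup)
}
```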
Automatic merge from submit-queue (batch tested with PRs 44590, 44969, 45325, 45208, 44714)
Refactor volume operation log and error messages
**What this PR does / why we need it**:
Adds wrappers for volume-specific error and log messages. Each message has a simple version that can be displayed to the user and a detailed version that can be used in logs. The messages used for events were also cleaned up. @msau42 A sketch of the idea follows the release note below.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #40905
**Special notes for your reviewer**:
pkg/kubelet/volumemanager/reconciler/reconciler.go can be refactored. I can do that refactoring after this one.
**Release note**:
NONE
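A minimal sketch of such a message wrapper (the type and method names are illustrative, not the exact helpers added in the PR):
```go
package main

import "fmt"

// volumeToMount is a stand-in carrying enough context to describe the operation.
type volumeToMount struct {
	VolumeName string
	PodName    string
	DevicePath string
}

// GenerateMsg returns a simple message suitable for user-facing events and a
// detailed message (with extra context) suitable for logs.
func (v volumeToMount) GenerateMsg(prefix, suffix string) (simple, detailed string) {
	simple = fmt.Sprintf("%s for volume %q %s", prefix, v.VolumeName, suffix)
	detailed = fmt.Sprintf("%s (pod=%q, devicePath=%q)", simple, v.PodName, v.DevicePath)
	return simple, detailed
}

func main() {
	v := volumeToMount{VolumeName: "pvc-123", PodName: "web-0", DevicePath: "/dev/sdc"}
	simple, detailed := v.GenerateMsg("MountVolume.SetUp failed", ": timed out")
	fmt.Println(simple)   // goes to the user-visible event
	fmt.Println(detailed) // goes to the log
}
```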
Automatic merge from submit-queue (batch tested with PRs 45283, 45289, 45248, 44295)
Azure disk: dealing with missing disk probe
**What this PR does / why we need it**:
While Azure disks are expected to attach to SCSI host 3 and above on general purpose instances, on certain Azure instances disks are under SCSI host 2.
This fix searches all LUNs but excludes those used by Azure sys disks, based on udev rules [here](https://raw.githubusercontent.com/Azure/WALinuxAgent/master/config/66-azure-storage.rules)
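A rough sketch of scanning all SCSI devices rather than assuming host 3 and above, while skipping LUNs reserved for the system disks (the sysfs path and the reserved-LUN set are assumptions for illustration, not the actual udev-rule logic):
```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// findDiskByLun walks /sys/bus/scsi/devices, whose entries are named
// "host:channel:target:lun", and returns the first device matching the wanted
// LUN while skipping LUNs assumed to be occupied by the system disks.
func findDiskByLun(wantLun int) (string, error) {
	sysDiskLuns := map[int]bool{0: true, 1: true} // assumption: OS and resource disk
	entries, err := os.ReadDir("/sys/bus/scsi/devices")
	if err != nil {
		return "", err
	}
	for _, e := range entries {
		parts := strings.Split(e.Name(), ":")
		if len(parts) != 4 {
			continue
		}
		lun, err := strconv.Atoi(parts[3])
		if err != nil || sysDiskLuns[lun] || lun != wantLun {
			continue
		}
		return "/sys/bus/scsi/devices/" + e.Name(), nil
	}
	return "", fmt.Errorf("no device found for LUN %d", wantLun)
}

func main() {
	fmt.Println(findDiskByLun(2))
}
```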
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Log node name when error attaching volume
Helps with debugging to know immediately which node the volume failed to attach to. Went through all plugins and added this to 3 of them. @gnufied
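A trivial illustration of the change (hypothetical plugin code; the exact message text differs per plugin):
```go
package main

import (
	"errors"
	"fmt"
)

// attachVolume is a stand-in for a plugin's Attach call; the point is simply
// that the returned error now names the node as well as the volume.
func attachVolume(volumeID, nodeName string) error {
	if err := cloudAttach(volumeID, nodeName); err != nil {
		return fmt.Errorf("error attaching volume %q to node %q: %v", volumeID, nodeName, err)
	}
	return nil
}

// cloudAttach simulates a failing cloud-provider API call.
func cloudAttach(volumeID, nodeName string) error {
	return errors.New("timed out waiting for attachment")
}

func main() {
	fmt.Println(attachVolume("vol-0abc", "node-1"))
}
```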
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44601, 44842, 44893, 44491, 44588)
Define const annotation variable once
We do not need to define the annotation constant twice, once in pkg/volume and once in pkg/volume/validation.
**Release note**:
```release-note
NONE
```
When the attach/detach controller crashes and a pod with an attached PV is deleted
afterwards, the controller will never detach the pod's attached volumes. To
prevent this, the controller should try to recover the state from the node's
status.
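A minimal sketch of recovering attachment state from the node status on controller startup (the client wiring and the actual-state cache are simplified assumptions):
```go
package main

import "fmt"

// AttachedVolume mirrors an entry published in a node's status.volumesAttached.
type AttachedVolume struct {
	Name       string // unique volume name
	DevicePath string
}

// NodeStatus is a trimmed-down stand-in for v1.NodeStatus.
type NodeStatus struct {
	VolumesAttached []AttachedVolume
}

// recoverAttachedVolumes seeds the controller's actual state of the world from
// the node status, so volumes attached before a crash can still be detached later.
func recoverAttachedVolumes(nodeName string, status NodeStatus, markAttached func(volume, node, devicePath string)) {
	for _, av := range status.VolumesAttached {
		markAttached(av.Name, nodeName, av.DevicePath)
	}
}

func main() {
	status := NodeStatus{VolumesAttached: []AttachedVolume{{Name: "kubernetes.io/cinder/vol-1", DevicePath: "/dev/vdb"}}}
	recoverAttachedVolumes("node-1", status, func(volume, node, devicePath string) {
		fmt.Printf("recovered: %s on %s at %s\n", volume, node, devicePath)
	})
}
```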
Automatic merge from submit-queue (batch tested with PRs 40055, 42085, 44509, 44568, 43956)
Fix gofmt errors
**What this PR does / why we need it**:
There were some gofmt errors on master. Ran the following to fix:
```
hack/verify-gofmt.sh | grep ^diff | awk '{ print $2 }' | xargs gofmt -w -s
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: none
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 40777, 43673)
remove an unnecessary variable assignment in glusterfs_test
**What this PR does / why we need it**:
`path` is exactly the same as `volumePath`, which is defined on line 122, so there is no need to assign it.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 44362, 44421, 44468, 43878, 44480)
Delete EmptyDir volume directly instead of renaming the directory.
**What this PR does / why we need it**:
The volume operation executor can now handle duplicate requests on the same volume, so it is no longer necessary to rename the directory. This change can cause pod deletion to take longer for large emptydir volumes, because the pod now waits for the volume to be deleted before continuing pod cleanup. But this is actually required for local disk scheduling, so that we don't schedule new pods that need emptydir volumes onto the node if the previous emptydir has not been fully reclaimed yet. An illustrative sketch follows this paragraph.
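An illustrative sketch of the difference (the path and helper name are assumptions, not the actual plugin code):
```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// teardownEmptyDir deletes the emptyDir volume directory in place. Previously
// the directory was first renamed and cleaned up asynchronously; now it is
// removed directly and the caller waits for the deletion to finish.
func teardownEmptyDir(volumeDir string) error {
	if err := os.RemoveAll(volumeDir); err != nil {
		return fmt.Errorf("failed to delete emptyDir volume %q: %v", volumeDir, err)
	}
	return nil
}

func main() {
	dir := filepath.Join(os.TempDir(), "emptydir-example")
	_ = os.MkdirAll(dir, 0o750)
	fmt.Println(teardownEmptyDir(dir))
}
```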
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #43534
**Special notes for your reviewer**:
**Release note**:
NONE
cc @kubernetes/sig-storage-pr-reviews
Automatic merge from submit-queue
Update owners to include kerneltime
**What this PR does / why we need it**: Update owners to include kerneltime to help with PRs
Automatic merge from submit-queue
Catch the error when making a directory fails in the NFS volume plugin
NFS: catch the error when making the directory fails
Currently, the NFS volume plugin doesn't catch the error from
os.MkdirAll, which makes it difficult to debug why creating the
directory failed. This patch checks and returns the error from os.MkdirAll.
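A minimal illustration of the pattern (the mountpoint path is a placeholder):
```go
package main

import (
	"fmt"
	"os"
)

func setUpNFSMountpoint(dir string) error {
	// Previously the error from MkdirAll was ignored; now it is checked and
	// returned so the failure reason shows up in the mount error.
	if err := os.MkdirAll(dir, 0o750); err != nil {
		return fmt.Errorf("failed to create NFS mountpoint %q: %v", dir, err)
	}
	return nil
}

func main() {
	fmt.Println(setUpNFSMountpoint("/var/lib/kubelet/example-nfs-mountpoint"))
}
```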
Automatic merge from submit-queue
azure disk: add logging on disk attach
**What this PR does / why we need it**:
While we were debugging a failed azure disk attach, we were missing the logging information needed to identify the root cause. This fix logs information at each stage of the attach so that we can identify where the problem is if it happens again.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
NONE