Commit Graph

3325 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
dab6f6a43d Merge pull request #102344 from smarterclayton/keep_pod_worker
Prevent Kubelet from incorrectly interpreting "not yet started" pods as "ready to terminate pods" by unifying responsibility for pod lifecycle into pod worker
2021-07-08 16:48:53 -07:00
Li Bo
c3d9b10ca8 feature: support Memory QoS for cgroups v2 2021-07-08 09:26:46 +08:00
Kubernetes Prow Robot
36a7426aa5 Merge pull request #99144 from bart0sh/PR0094-promote-HugePageStorageMediumSize-to-GA
promote huge page storage medium size to GA
2021-07-07 18:09:05 -07:00
Clayton Coleman
3eadd1a9ea Keep pod worker running until pod is truly complete
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.

Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).

Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.

Removing containers no longer blocks final pod deletion in the
API server and is handled as background cleanup. Node shutdown
no longer marks pods as failed as they can be restarted in the
next step.

See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
2021-07-06 15:55:22 -04:00
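The lifecycle split described in this commit can be pictured as a small state machine owned by the pod worker. The following is a minimal sketch, not the kubelet implementation: only UpdatePod and SyncPodKill come from the commit message, while the phase names, the string UID field, and the helper methods mayHaveRunningContainers, isTerminated, and containersStopped are hypothetical, and the locking a real worker would need is omitted.

```go
package main

import "fmt"

// UpdateType mirrors the idea of a sync type; only SyncPodKill is named in
// the commit message, SyncPodSync is an assumed placeholder.
type UpdateType int

const (
	SyncPodSync UpdateType = iota
	SyncPodKill
)

// phase is a hypothetical name for the three stages syncPod was split into.
type phase int

const (
	phaseSetup       phase = iota // containers may be started
	phaseTerminating              // containers are being stopped, none may start
	phaseTerminated               // no running containers, none coming
)

// UpdatePodOptions is simplified: the real options carry a Pod object,
// this sketch carries only a UID string.
type UpdatePodOptions struct {
	UpdateType UpdateType
	PodUID     string
}

// podWorkers is the single owner of each pod's allowed state.
type podWorkers struct {
	phases map[string]phase
}

func newPodWorkers() *podWorkers {
	return &podWorkers{phases: map[string]phase{}}
}

// UpdatePod records the request. A kill request moves a pod out of setup,
// and a pod never moves back to an earlier phase.
func (w *podWorkers) UpdatePod(o UpdatePodOptions) {
	cur, known := w.phases[o.PodUID]
	if !known {
		cur = phaseSetup
	}
	if o.UpdateType == SyncPodKill && cur == phaseSetup {
		cur = phaseTerminating
	}
	w.phases[o.PodUID] = cur
}

// containersStopped is called once the terminate phase has finished.
func (w *podWorkers) containersStopped(uid string) {
	if p, known := w.phases[uid]; known && p == phaseTerminating {
		w.phases[uid] = phaseTerminated
	}
}

// mayHaveRunningContainers lets other loops decide whether resources for
// the pod must still be kept around.
func (w *podWorkers) mayHaveRunningContainers(uid string) bool {
	p, known := w.phases[uid]
	return known && p != phaseTerminated
}

// isTerminated reports whether the pod has no containers and none coming.
func (w *podWorkers) isTerminated(uid string) bool {
	p, known := w.phases[uid]
	return known && p == phaseTerminated
}

func main() {
	w := newPodWorkers()
	w.UpdatePod(UpdatePodOptions{UpdateType: SyncPodSync, PodUID: "pod-a"})
	w.UpdatePod(UpdatePodOptions{UpdateType: SyncPodKill, PodUID: "pod-a"})
	fmt.Println("terminating, may still have containers:", w.mayHaveRunningContainers("pod-a"))
	w.containersStopped("pod-a")
	fmt.Println("terminated:", w.isTerminated("pod-a"))
}
```

Because other components only ask the worker by UID, a pod that was killed before it ever started is reported as terminating rather than terminated until the worker itself confirms that no containers are running.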
Cheng Xing
c50b3074fe Moved VOLUME_MOUNT_GROUP capability check from NodeStageVolume to MountDevice; added log message in SetupAt to indicate FSGroup is delegated to driver 2021-07-03 16:29:42 -07:00
Cheng Xing
794a925a85 Pass FsGroup to NodeStageVolume 2021-07-03 16:29:42 -07:00
Cheng Xing
0e315355df Pass FsGroup to MountDevice 2021-07-03 16:29:42 -07:00
Cheng Xing
ae5668edef Pass FsGroup to NodePublishVolume 2021-07-03 16:29:42 -07:00
Cheng Xing
65db13a3a5 Combine capability check implementations 2021-07-03 16:29:42 -07:00
Chris Henzie
b7d732d3d6 Map PV access modes to CSI access modes 2021-06-28 21:25:38 -07:00
Chris Henzie
8db83c89aa CSI client helpers for NodeGetCapabilities 2021-06-28 21:25:37 -07:00
Chris Henzie
2b98f8edc7 Enforce ReadWriteOncePod access mode during mount 2021-06-28 21:25:37 -07:00
Chris Henzie
83e3ee780a Rename access mode contains helper method
So it is consistent with other methods performing the same check (one
for internal and external types)
2021-06-28 21:24:56 -07:00
Kubernetes Prow Robot
a0f9c8c277 Merge pull request #103001 from zshihang/csi
CSIServiceAccountToken ga
2021-06-26 19:31:23 -07:00
Kubernetes Prow Robot
55c0d318bb Merge pull request #103127 from PushkarJ/pkg-vol-csi-non-root-test-fix
Fix panic in pkg/volume/csi tests
2021-06-25 06:38:44 -07:00
Pushkar Joglekar
1e250610b2 Fix panic in pkg/volume/csi tests
When run as a non-root user, TestAttacherMountDevice fails because a missing
nil check induces a panic. Fixed by checking that err is nil
before using the user value returned from user.Current()
2021-06-24 10:14:20 -07:00
Davanum Srinivas
5feff280e1 remove fakefs to drop spf13/afero dependency
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-06-24 09:51:34 -04:00
Kubernetes Prow Robot
7f4abd897e Merge pull request #102414 from divyenpatel/use-ga-topology-labels-for-vsphere
Update vSphere volume topology label to GA
2021-06-21 18:13:57 -07:00
Shihang Zhang
8231a3e921 CSIServiceAccountToken ga 2021-06-21 11:35:24 -07:00
Divyen Patel
518844fd25 use GA topology labels for vsphere 2021-06-21 10:37:31 -07:00
Kubernetes Prow Robot
53bc4c13c1 Merge pull request #96115 from ncopa/disk-usage
Get inodes and disk usage via pure go
2021-06-18 20:30:50 -07:00
Kubernetes Prow Robot
4afb72a863 Merge pull request #100183 from jsafrane/fix-unstage-retry
Mark volume as uncertain after Unmount* fails
2021-06-18 11:04:06 -07:00
Jan Safranek
f4b41c0a17 Fix UnmountDevice error cases
When UnmountDevice fails, kubelet treats the volume mount as uncertain,
because it does not know at which stage UnmountDevice failed. It may be
already partially unmounted / destroyed.

As a result, MountDevice will be performed when a new Pod is started on the
node after UnmountDevice failure.
2021-06-16 18:39:04 +02:00
Kubernetes Prow Robot
278f856144 Merge pull request #102653 from wzshiming/fix/npe
Fix NPE for CSI mounter
2021-06-09 10:17:26 -07:00
Kubernetes Prow Robot
8b787f3a22 Merge pull request #100741 from mengjiao-liu/fix-test-err
Fix incorrect test code in pkg/volume/csi/csi_attacher_test.go file
2021-06-08 08:33:12 -07:00
Kubernetes Prow Robot
0322d34a3e Merge pull request #100937 from mengjiao-liu/fix-metrics-nil-pointer
Fix csi_client_test.go metrics nil pointer dereference
2021-06-08 07:27:14 -07:00
Shiming Zhang
1eb8060dd6 Add test for CSI mounter 2021-06-08 18:42:31 +08:00
Shiming Zhang
c065d7c7b3 Fix NPE for CSI mounter 2021-06-08 10:29:46 +08:00
Kubernetes Prow Robot
dff5940ac3 Merge pull request #97534 from heqg/typo01
fix Spelling error for klog
2021-06-06 22:24:39 -07:00
Kubernetes Prow Robot
29a8105cec Merge pull request #101272 from Jiawei0227/deprecateflag
Remove CSIMigrationvSphereComplete flag
2021-06-05 10:40:38 -07:00
Hemant Kumar
f5739a15d1 The test was not very useful and required elevated access 2021-06-04 15:48:35 -04:00
Kubernetes Prow Robot
38783bfeb7 Merge pull request #102059 from jsafrane/fix-consistentread
Retry reading /proc/mounts when unable to get a consistent read
2021-06-03 21:59:37 -07:00
Kubernetes Prow Robot
807e70c46f Merge pull request #101605 from njuptlzf/flexvloume_test
cleanup: delete tempDir correctly after flexvolume_test is executed
2021-06-01 19:48:33 -07:00
Jan Safranek
f9a04f3bc4 Move error reporting to volume plugins
Move reporting of GetReliableMountRefs error to the volume plugins that
have more context about severity of the error.
2021-05-27 18:30:17 +02:00
Kubernetes Prow Robot
81e159f0b0 Merge pull request #101862 from jsafrane/fix-fc-detach-retry
Retry detaching FibreChannel volume a few times
2021-05-27 07:14:23 -07:00
Kubernetes Prow Robot
f7e62dc5bb Merge pull request #100746 from mengjiao-liu/fix-nil-call
Fixed a possible nil pointer dereference caused by variable `plug`
2021-05-26 14:49:38 -07:00
Jan Safranek
a95842095e Retry reading /proc/mounts indefinitely in FC and iSCSI volume reconstruction
iSCSI and FC volume plugins do not implement real 3rd party attach/detach.
If reconstruction fails with an error on a FC or iSCSI volume, it will not
be unmounted from the volume global dir and at the same time it will be
marked as unused, to be available to be mounted on another node.

The volume can then be mounted on several nodes, resulting in volume
corruption.

The other block based volume plugins implement attach/detach that either
makes the volume stuck (can't be detached) or will be force-detached from a
node before attaching it somewhere else.
2021-05-26 23:08:19 +02:00
Jan Safranek
64e8396e30 Retry detaching FibreChannel volume a few times
When UnmountDevice() of a FibreChannel volume fails after unmounting the
device and before the device is fully cleaned up, a subsequent
UnmountDevice() retry won't find the device mounted and will return without
retrying the device cleanup.

Therefore implement its own retry inside UnmountDevice() to make sure that
the volume devices are either fully cleaned or the error is serious enough
that even 1 minute of trying does not help.
2021-05-26 23:05:06 +02:00
刁浩 10284789
5908cd0d90 simplify returning boolean expression in /pkg/volume
Signed-off-by: 刁浩 10284789 <diao.hao@zte.com.cn>
2021-05-25 02:39:55 +00:00
Kubernetes Prow Robot
f545438bd3 Merge pull request #101587 from nixpanic/in-tree/block-metrics
Fix a panic for in-tree drivers that partially support Block volume metrics
2021-05-24 16:18:47 -07:00
Kubernetes Prow Robot
f803daaca7 Merge pull request #101510 from huchengze/patch-12
migrate log in pkg/volume/plugins.go
2021-05-20 21:14:57 -07:00
mengjiao.liu
c24b87b133 Fixed a possible nil pointer dereference caused by variable plug 2021-05-21 10:17:04 +08:00
Niels de Vos
b997e0e4d6 Add SupportsMetrics() for Block-mode volumes
Volumes that are provisioned with `VolumeMode: Block` often have a
MetricsProvider interface declared in their type. However, the
MetricsProvider should implement a GetMetrics() function. In the cases
where the storage drivers do not implement GetMetrics(), a panic can
occur.

Usual type-assertions are not sufficient in this case. All assertions
assume the interface is present. There is no straightforward way to
verify that a valid GetMetrics() function is provided.

By adding SupportsMetrics(), storage driver implementations require
careful reviewing for metrics support.
2021-05-20 17:10:23 +02:00
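Why a type assertion is not enough can be shown with a small, self-contained sketch. All types here are simplified stand-ins for illustration (Metrics, blockVolume, legacyMapper, capacityMapper, fixedMetrics, and collect are hypothetical names); only MetricsProvider, GetMetrics(), and SupportsMetrics() come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// Metrics is a simplified stand-in with a single field.
type Metrics struct{ Capacity int64 }

// MetricsProvider is the interface drivers embed in their volume types.
type MetricsProvider interface {
	GetMetrics() (*Metrics, error)
}

// blockVolume is an assumed minimal interface combining both methods.
type blockVolume interface {
	SupportsMetrics() bool
	MetricsProvider
}

var errNotSupported = errors.New("metrics are not supported")

// legacyMapper embeds MetricsProvider but never sets it, so the promoted
// GetMetrics method exists yet calling it would dereference a nil interface.
type legacyMapper struct{ MetricsProvider }

func (legacyMapper) SupportsMetrics() bool { return false }

// capacityMapper sets the embedded provider and says so explicitly.
type capacityMapper struct{ MetricsProvider }

func (capacityMapper) SupportsMetrics() bool { return true }

// fixedMetrics is a trivial provider returning a constant capacity.
type fixedMetrics struct{ capacity int64 }

func (f fixedMetrics) GetMetrics() (*Metrics, error) {
	return &Metrics{Capacity: f.capacity}, nil
}

// collect asks the driver first instead of trusting the method set.
func collect(v blockVolume) (*Metrics, error) {
	if !v.SupportsMetrics() {
		return nil, errNotSupported
	}
	return v.GetMetrics()
}

func main() {
	// The assertion succeeds even though the embedded provider is nil.
	_, ok := interface{}(legacyMapper{}).(MetricsProvider)
	fmt.Println("type assertion succeeds:", ok)

	_, err := collect(legacyMapper{})
	fmt.Println("legacy driver:", err)

	m, _ := collect(capacityMapper{fixedMetrics{capacity: 1 << 30}})
	fmt.Println("capacity driver bytes:", m.Capacity)
}
```

The explicit method forces each driver author to state whether the embedded provider is usable, which is the careful review the commit message refers to.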
Niels de Vos
e7dedc5cd1 Support Capacity metric for block PVCs for in-tree drivers
PR #97972 added support for gathering metrics for Block PVCs provided by
CSI drivers. The in-tree drivers can support at least the most basic
metric: Capacity.
2021-05-20 16:37:12 +02:00
Niels de Vos
2b9c81b87d Add helper functions for Block volume Capacity detection
Similar to how NewMetricsStatFS() works, the new NewMetricsBlock()
provides the GetMetrics() interface for Block volumes.

Additional metrics for Block volumes are difficult to gather. There is
no guarantee that there is a filesystem on the volume, which makes most
of the volume metrics useless.

Advanced storage might be able to detect the actual consumption (when
thin-provisioned) vs the capacity. However, this is out of the scope for
a standard helper function and requires intimate knowledge of the used
storage system.
2021-05-20 16:37:12 +02:00
Jiawei Wang
94db1e18ba Remove scaleio from volume plugins 2021-05-19 10:35:21 -07:00
Kubernetes Prow Robot
d01a5cae9c Merge pull request #97965 from chymy/fix-spell
Fix some case issue
2021-05-16 06:31:59 -07:00
Kubernetes Prow Robot
6768ac8115 Merge pull request #100894 from clickyotomy/sk/loop-dev-sysfs
Handle invalid `losetup' options
2021-05-12 05:05:39 -07:00
Ed Bartosh
c12aa0f6b7 promote HugePageStorageMediumSize to GA 2021-05-10 15:57:55 +03:00
Kubernetes Prow Robot
160cdbbdca Merge pull request #101534 from kassarl/issue-98281
Use GA topology labels for Azuredisk
2021-05-07 13:32:00 -07:00