kubernetes

Author	SHA1	Message	Date
Ashutosh Kumar	c00975370a	Handle Non-graceful Node Shutdown (#108486) Signed-off-by: Ashutosh Kumar <sonasingh46@gmail.com> Co-authored-by: Ashutosh Kumar <sonasingh46@gmail.com> Co-authored-by: xing-yang <xingyang105@gmail.com>	2022-03-26 09:23:21 -07:00
Jean-Francois Remy	e83184568d	Add unit tests - actual_state_of_world_test.go: test the new method GetVolumesToReportAttachedForNode for an existing node and a non-existing node - node_status_updater_test.go: test UpdateNodeStatuses and UpdateNodeStatusForNode in nominal case with 2 nodes getting one volume each. Test UpdateNodeStatuses with the first call to node.patch failing but the following one succeeding - add comment in node_status_updater.go - fix log line in reconciler.go - rename variable in actual_state_of_world.go	2022-02-22 12:21:58 -08:00
Jean-Francois Remy	f1717baaaa	Fix nodes volumesAttached status not updated The UpdateNodeStatuses code stops too early in case there is an error when calling updateNodeStatus. It will return immediately which means any remaining node won't have its update status put back to true. Looking at the call sites for UpdateNodeStatuses, it appears this is not the only issue. If the lister call fails with anything but a Not Found error, it's silently ignored which is wrong in the detach path. Also the reconciler detach path calls UpdateNodeStatuses but the real intent is to only update the node currently processed in the loop and not proceed with the detach call if there is an error updating that specific node's volumesAttached property. With the current implementation, it will not proceed if there is an error updating another node (which is not completely bad but not ideal) and worse it will proceed if there is a lister error on that node which means the node volumesAttached property won't have been updated. To fix those issues, introduce the following changes: - [node_status_updater] introduce UpdateNodeStatusForNode which does what UpdateNodeStatuses does but only for the provided node - [node_status_updater] if the node lister call fails for anything but a Not Found error, we will return an error, not ignore it - [node_status_updater] if the update of a node volumesAttached properties fails we continue processing the other nodes - [actual_state_of_world] introduce GetVolumesToReportAttachedForNode which does what GetVolumesToReportAttached does, but only for the node whose name is provided. It returns a bool which indicates if the node in question needs an update as well as the volumesAttached list. It is used by UpdateNodeStatusForNode - [actual_state_of_world] use write lock in updateNodeStatusUpdateNeeded, we're modifying the map content - [reconciler] use UpdateNodeStatusForNode in the detach loop	2022-02-22 12:20:53 -08:00
Patrick Ohly	9eaa2dc554	avoid klog Info calls without verbosity In the following code pattern, the log message will get logged with v=0 in JSON output although conceptually it has a higher verbosity: if klog.V(5).Enabled() { klog.Info("hello world") } Having the actual verbosity in the JSON output is relevant, for example for filtering out only the important info messages. The solution is to use klog.V(5).Info or something similar. Whether the outer if is necessary at all depends on how complex the parameters are. The return value of klog.V can be captured in a variable and be used multiple times to avoid the overhead for that function call and to avoid repeating the verbosity level.	2022-01-12 07:48:36 +01:00
Jing Xu	69b9f9b1f0	Fix issue in node status updating VolumeAttached list During volume detach, the following might happen in reconciler 1. Pod is deleting 2. remove volume from reportedAsAttached, so node status updater will update volumeAttached list 3. detach failed due to some issue 4. volume is added back in reportedAsAttached 5. reconciler loops again over the volume, removes volume from reportedAsAttached 6. detach will not be triggered because of exponential backoff, detach call will fail with exponential backoff error 7. another pod is added which uses the same volume on the same node 8. reconciler loops and it will NOT try to trigger detach anymore At this point, volume is still attached and in actual state, but volumeAttached list in node status does not have this volume anymore, and will block volume mount from kubelet. The fix in first round is to add volume back into the volume list that needs to be reported as attached at step 6 when detach call failed with error (exponential backoff). However this might have a performance issue if detach fails for a while. During this time, the volume will keep being removed from and added back to node status, which will cause a surge of API calls. So we changed the logic to check first whether the operation is safe to retry, which means there is no pending operation and it is not in the exponential backoff time period, before calling detach. This way we can avoid repeatedly removing/adding the volume from node status. Change-Id: I5d4e760c880d72937d34b9d3e904ecad125f802e	2021-10-05 09:44:35 -07:00
Kubernetes Prow Robot	8bf42039e6	Merge pull request #96552 from pandaamanda/klog_fmt use klog.Info and klog.Warning when had no format	2021-01-15 17:57:43 -08:00
xiongzhongliang	90f4aeeea4	use klog.Info and klog.Warning when had no format	2020-11-14 00:55:06 +08:00
Cheng Xing	d9a629fe3a	IsVolumeAttachedToNode() renamed to GetAttachState(), and returns 3 states instead of combining "uncertain" and "detached" into "false"	2020-10-29 13:24:51 -07:00
Cheng Xing	a61743b125	Fixes Attach Detach Controller reconciler race reading ActualStateOfWorld and operation pending states; fixes reconciler_test mock detach to account for multiple attaches on a node	2020-10-27 23:51:55 -07:00
Davanum Srinivas	442a69c3bd	switch over k/k to use klog v2 Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2020-05-16 07:54:27 -04:00
Cheng Xing	ef3d66b98b	Parallelize attach operations across different nodes for volumes that allow multi-attach	2020-03-05 22:22:05 -08:00
Jordan Liggitt	cd1059e3c4	Revert "Merge pull request #87258 from verult/slow-rxm-attach" This reverts commit `15c3f1b119`, reversing changes made to `52d7614a8c`.	2020-01-29 14:58:32 -05:00
Cheng Xing	c6a03fa5be	Parallelize attach operations across different nodes for volumes that allow multi-attach	2020-01-27 15:02:25 -08:00
yuxiaobo	81e9f21f83	Correct spelling mistakes Signed-off-by: yuxiaobo <yuxiaobogo@163.com>	2019-11-06 20:25:19 +08:00
Jing Xu	7bac6ca73a	Address comments This commit addressed the comment and also add a unit test.	2019-01-11 10:57:37 -08:00
Jing Xu	562d0fea53	Handle failed attach operation leave uncertain volume attach state This commit adds the unit tests for the PR. It also includes some files that are affected by the function name changes.	2018-11-19 17:21:49 -08:00
Davanum Srinivas	954996e231	Move from glog to klog - Move from the old github.com/golang/glog to k8s.io/klog - klog as explicit InitFlags() so we add them as necessary - we update the other repositories that we vendor that made a similar change from glog to klog * github.com/kubernetes/repo-infra * k8s.io/gengo/ * k8s.io/kube-openapi/ * github.com/google/cadvisor - Entirely remove all references to glog - Fix some tests by explicit InitFlags in their init() methods Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135	2018-11-10 07:50:31 -05:00
nikopen	6f2a45aefe	Fix VMWare VM freezing bug by reverting #51066	2018-08-24 14:28:44 +02:00
Fabio Bertinatto	4ce2058ef6	Add more metrics for A/D Controller: * Number of Volumes in ActualStateofWorld and DesiredStateofWorld * Number of times A/D Controller performs force detach	2018-08-15 10:01:57 +02:00
Michal Fojtik	97f546d249	volume: decrease memory allocations for debugging messages	2018-06-11 13:52:38 +02:00
Di Xu	48388fec7e	fix all the typos across the project	2018-02-11 11:04:14 +08:00
Jan Safranek	e46c886bf3	Add list of pods that use a volume to multiattach events So users know what pods are blocking a volume and can realize their error.	2018-01-24 13:22:03 +01:00
Hemant Kumar	67d4c40849	Fix spam of multiattach errors in event logs We should be careful while generating multiattach errors. We seem to be generating too many of them because old code had minor bug.	2017-10-03 15:45:06 -04:00
Balu Dontu	cfdff1ae46	Multi-Attach volume fix for vSphere	2017-08-21 18:06:29 -07:00
Kubernetes Submit Queue	c662e1d7d8	Merge pull request #46949 from xingzhou/typo Automatic merge from submit-queue Fixed a comment typo Typo fix Fixed #48414 Release note: ``` None ```	2017-07-03 11:33:36 -07:00
Chao Xu	60604f8818	run hack/update-all	2017-06-22 11:31:03 -07:00
Chao Xu	f4989a45a5	run root-rewrite-v1-..., compile	2017-06-22 10:25:57 -07:00
Kubernetes Submit Queue	bebe346d5f	Merge pull request #42252 from justinsb/volumes_raise_loglevels Automatic merge from submit-queue (batch tested with PRs 42252, 42251, 42249, 47512, 47887) volumes: promote some logs from info -> warning Part of #40583 ```release-note NONE ```	2017-06-21 22:13:24 -07:00
Xing Zhou	750d0d8730	Fixed a comment typo	2017-06-05 10:47:59 +08:00
NickrenREN	add091b1fb	fix regression in UX experience for double attach volume send event when volume is not allowed to multi-attach	2017-05-25 09:27:24 +08:00
Justin Santa Barbara	35be997c2f	volumes: promote some logs from info -> warning Part of #40583	2017-05-23 22:36:42 -04:00
Alexander Block	06baeb33b2	Don't try to attach volumes which are already attached to other nodes	2017-05-18 06:56:30 +02:00
Kubernetes Submit Queue	6dbe853e29	Merge pull request #45544 from ianchakeres/reconciler-err-cleanup Automatic merge from submit-queue (batch tested with PRs 45990, 45544, 45745, 45742, 45678) Refactor reconciler volume log and error messages What this PR does / why we need it: Utilizes volume-specific error and log messages introduced in #44969, inside files that also log volume information. Specifically: - pkg/kubelet/volumemanager/reconciler/reconciler.go, - pkg/controller/volume/attachdetach/reconciler/reconciler.go, and - pkg/kubelet/volumemanager/populator/desired_state_of_world_populator.go Which issue this PR fixes : fixes #40905 Special notes for your reviewer: Release note: ```release-note ``` NONE	2017-05-17 18:40:51 -07:00
Ian Chakeres	b1315f4491	Refactor reconciler volume log and error messages	2017-05-11 22:33:17 -07:00
NickrenREN	0861688237	add and clear err message in RemoveVolumeFromReportAsAttached	2017-05-08 09:37:21 +08:00
Tomas Smetana	852c44ae59	Fix issue #34242 : Attach/detach should recover from a crash When the attach/detach controller crashes and a pod with attached PV is deleted afterwards the controller will never detach the pod's attached volumes. To prevent this the controller should try to recover the state from the nodes status.	2017-04-20 13:04:50 +02:00
Hemant Kumar	786da1de12	Implement bulk polling of volumes This implements Bulk volume polling using ideas presented by justin in https://github.com/kubernetes/kubernetes/pull/39564 But it changes the implementation to use an interface and doesn't affect other implementations.	2017-03-02 14:59:59 -05:00
Kubernetes Submit Queue	f74b4bbbad	Merge pull request #38094 from yarntime/fix_update_typo Automatic merge from submit-queue fix typos fix typos.	2017-01-16 18:22:33 -08:00
deads2k	6a4d5cd7cc	start the apimachinery repo	2017-01-11 09:09:48 -05:00
yarntime@163.com	f7c737e8a9	fix typos	2017-01-11 16:08:20 +08:00
chrislovecnm	ac49139c9f	updates from review	2017-01-09 17:20:19 -07:00
chrislovecnm	a973c38c7d	The capability to control duration via controller-manager flags, and the option to shut off reconciliation.	2017-01-09 16:47:13 -07:00
Jing Xu	3d3e44e77e	fix issue in converting aws volume id from mount paths This PR is to fix the issue in converting aws volume id from mount paths. Currently there are three aws volume id formats supported. The following lists example of those three formats and their corresponding global mount paths: 1. aws:///vol-123456 (/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/vol-123456) 2. aws://us-east-1/vol-123456 (/var/lib/kubelet/plugins/kubernetes.io/mounts/aws/us-est-1/vol-123455) 3. vol-123456 (/var/lib/kubelet/plugins/kubernetes.io/mounts/aws/us-est-1/vol-123455) For the first two cases, we need to check the mount path and convert them back to the original format.	2016-11-16 22:35:48 -08:00
Jing Xu	abbde43374	Add sync state loop in master's volume reconciler At master volume reconciler, the information about which volumes are attached to nodes is cached in actual state of world. However, this information might be out of date in case a node is terminated (volume is detached automatically). In this situation, the reconciler assumes the volume is still attached and will not issue an attach operation when the node comes back. Pods created on those nodes will fail to mount. This PR adds the logic to periodically sync up the truth for attached volumes kept in the actual state cache. If the volume is no longer attached to the node, the actual state will be updated to reflect the truth. In turn, reconciler will take actions if needed. To avoid issuing many concurrent operations on cloud provider, this PR tries to add batch operation to check whether a list of volumes are attached to the node instead of one request per volume. More details are explained in PR #33760	2016-10-28 09:24:53 -07:00
Jing Xu	efaceb28cc	Fix race condition in updating attached volume between master and node This PR tries to fix issue #29324. The cause of this issue is a race condition that happens when marking volumes as attached for node status. This PR tries to clean up the logic of when and where to mark volumes as attached/detached. Basically the workflow is as follows: 1. When volume is attached successfully, the volume and node info is added into nodesToUpdateStatusFor to mark the volume as attached to the node. 2. When detach request comes in, it will check whether it is safe to detach now. If the check passes, remove the volume from volumesToReportAsAttached to indicate the volume is no longer considered as attached now. Afterwards, reconciler tries to update node status and trigger detach operation. If any of these operations fails, the volume is added back to the volumesToReportAsAttached list showing that it is still attached. These steps should make sure that kubelet gets the right (might be outdated) information about which volume is attached or not. It also guarantees that if detach operation is pending, kubelet should not trigger any mount operations.	2016-09-12 13:51:08 -07:00
saadali	88d495026d	Allow mounts to run in parallel for non-attachable Allow mount volume operations to run in parallel for non-attachable volume plugins. Allow unmount volume operations to run in parallel for all volume plugins.	2016-07-19 21:54:26 -07:00
Morgan Bauer	69719167a3	close channel to prevent memory leak - wait.JitterUntil goroutine is never cleaned up when used with wait.NeverStop - fixup comment	2016-07-06 09:34:20 -07:00
saadali	0dd17fff22	Reorganize volume controllers and manager	2016-07-01 18:50:25 -07:00
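
The behaviour described in commit f1717baaaa (keep processing the remaining nodes when one update fails, and ignore only Not Found errors) can be sketched roughly as follows. This is a minimal standard-library illustration under stated assumptions, not the controller's code: `updateAll`, `updateFunc`, `errNotFound`, and `errTransient` are hypothetical names that do not appear in the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a lister "Not Found" error.
var errNotFound = errors.New("node not found")

// errTransient is a hypothetical stand-in for any other failure.
var errTransient = errors.New("transient failure")

// updateFunc is a hypothetical callback that looks up one node and
// updates its volumesAttached status.
type updateFunc func(node string) error

// updateAll tries every node. A failure on one node does not stop the
// loop. Nodes that failed are returned so the caller can mark them as
// still needing an update. A Not Found error means the node is gone and
// is not counted as a failure; any other error is reported.
// The returned error wraps the last failure seen, or is nil.
func updateAll(nodes []string, update updateFunc) (failed []string, err error) {
	for _, n := range nodes {
		uerr := update(n)
		if uerr == nil || errors.Is(uerr, errNotFound) {
			continue
		}
		failed = append(failed, n)
		err = fmt.Errorf("updating node %q: %w", n, uerr)
	}
	return failed, err
}
```

A caller would pass the node names and a callback, then put every name in `failed` back into its "needs update" set before the next reconciler pass.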

48 Commits
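
The "safe to retry" check described in commit 69b9f9b1f0 can be sketched as below: the volume is removed from the reported-as-attached set only when a detach can actually be attempted, so node status is not rewritten on every loop during backoff. This is an illustrative sketch, not the reconciler's implementation; `opState`, `safeToRetry`, `tryDetach`, `errDetach`, and the one-second doubling backoff are assumptions.

```go
package main

import (
	"errors"
	"time"
)

// errDetach is a hypothetical detach failure for callers of this sketch.
var errDetach = errors.New("detach failed")

// opState tracks one hypothetical detach operation for a volume/node pair.
type opState struct {
	pending   bool          // an operation is currently in flight
	lastError time.Time     // when the last attempt failed
	delay     time.Duration // current backoff window; zero means none
}

// safeToRetry reports whether a new attempt may start at time now:
// nothing is pending and the backoff window has elapsed.
func (s *opState) safeToRetry(now time.Time) bool {
	return !s.pending && !now.Before(s.lastError.Add(s.delay))
}

// tryDetach removes the volume from the reported set only if a detach
// can be attempted. If the attempt fails, the volume is put back and the
// backoff window doubles (starting at one second, an assumed value).
// It returns true when detach was actually called.
func tryDetach(s *opState, now time.Time, reported map[string]bool,
	volume string, detach func() error) bool {
	if !s.safeToRetry(now) {
		return false // leave the reported set untouched
	}
	delete(reported, volume)
	if err := detach(); err != nil {
		reported[volume] = true // still attached: report it again
		s.lastError = now
		if s.delay == 0 {
			s.delay = time.Second
		} else {
			s.delay *= 2
		}
		return true
	}
	s.delay = 0
	return true
}
```

With this ordering, a loop iteration that falls inside the backoff window returns early and the reported set stays unchanged, which is the property the commit message relies on to avoid a surge of API calls.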