Commit Graph

28 Commits

Author SHA1 Message Date
Jeff Grafton
efee0704c6 Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
vikaschoudhary16
bf1fb46347 Look for requested resources in the Requests 2017-12-17 22:56:45 -05:00
vikaschoudhary16
8c51d235d6 Refactor TestPodContainerDeviceAllocation to make it readable and extensible 2017-12-16 20:32:08 -05:00
Kubernetes Submit Queue
a902959544
Merge pull request #56911 from WanLinghao/projected_test_fix
Automatic merge from submit-queue (batch tested with PRs 56650, 55813, 56911, 56921, 56871). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix deviceplugin test file create leak file problem

When execute make test, this test file will create a file named "kubelet_internal_checkpoint" in k8s directory and not delete it.

This patch fix this error




**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56365

**Special notes for your reviewer**:

**Release note**:

```release-note

```
2017-12-16 12:10:48 -08:00
Kubernetes Submit Queue
578f3db8d5
Merge pull request #55382 from vikaschoudhary16/checkpoint
Automatic merge from submit-queue (batch tested with PRs 57172, 55382, 56147, 56146, 56158). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use file store utility for device plugin checkpointing

Partially address issue #54088
cc @sjenning @jeremyeder @jiayingz @vishh 

/sig node
2017-12-14 12:38:13 -08:00
WanLinghao
3e7e4ab397 old test file will create a leak file in current directory.
this patch fix this.
	modified:   pkg/kubelet/cm/deviceplugin/manager_test.go
2017-12-07 11:57:17 +08:00
Jiaying Zhang
d4244f3ded Re-uses device plugin resources allocated to init containers.
Implements option 2 mentioned in
https://github.com/kubernetes/kubernetes/issues/56022#issuecomment-348286184
2017-12-04 22:01:28 -08:00
vikaschoudhary16
de358fb21f Use file store utility for device plugin check-pointing 2017-11-24 08:41:11 -05:00
Jiaying Zhang
048bafdd0b Adds device plugin registration count metric and allocation latency metric. 2017-11-21 13:44:10 -08:00
Jiaying Zhang
1eb4e79453 Extends deviceplugin to gracefully handle full device plugin lifecycle.
- Instead of using cm.capacity field to communicate device plugin resource
capacity, this PR changes to use an explicit cm.GetDevicePluginResourceCapacity()
function that returns device plugin resource capacity as well as any inactive
device plugin resource. Kubelet syncNodeStatus call this function during its
periodic run to update node status capacity and allocatable. After this call,
device plugin can remove the inactive device plugin resource from its allDevices
field as the update is already pushed to API server.
- Extends device plugin checkpoint data to record registered resources
so that we can finish resource removing even upon kubelet restarts.
- Passes sourcesReady from kubelet to device plugin to avoid removing
inactive pods during grace period of kubelet restart.
2017-11-20 23:40:14 -08:00
Niklas Q. Nielsen
b16bfc768d Merging handler into manager API 2017-11-20 21:37:46 +00:00
Kubernetes Submit Queue
0b1d023aa7
Merge pull request #55884 from mpolednik/dpi-race-fix
Automatic merge from submit-queue (batch tested with PRs 55839, 54495, 55884, 55983, 56069). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

deviceplugin: fix race when multiple plugins are registered

**What this PR does / why we need it**:
When registering multiple device plugins to Kubelet concurrently, there exists a race that crashes the Kubelet.

Consider two plugins: D1 and D2. The call order method is roughly

D1 -> manager.go:register -> endpoint.go:listAndWatch -> device_plugin_handler.go:(*D1).callback
D2 -> manager.go:register -> endpoint.go:listAndWatch -> device_plugin_handler.go:(*D2).callback

The callback function accesses HandlerImpl's allDevices map that maps (resourceName -> DeviceID). If both plugins reach these accesses at the same time, Kubelet crashes with "fatal error: concurrent map read and map write".

This can be solved by making sure handler is locked when allDevices are being updated. The functionality is needed to avoid Kubelet crashes when multiple device plugins are trying to register with Kubelet at the same moment. Occurs frequently when single binary tries to register itself as multiple plugins.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
2017-11-20 13:08:09 -08:00
Martin Polednik
6e3f8f3890 deviceplugin: fix race when multiple plugins are registered
Signed-off-by: Martin Polednik <mpolednik@redhat.com>
2017-11-16 15:20:00 +01:00
Kubernetes Submit Queue
6f35d49079
Merge pull request #52149 from lichuqiang/combineListwatch
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Deviceplugin refactoring: merge func list and listwatch in endpoint into one

**What this PR does / why we need it**:
merge func list and listwatch in endpoint into one, since we won't call list func individually

**Which issue this PR fixes**
fixes #51993
Part2

**Special notes for your reviewer**:
/cc @jiayingz @RenaudWasTaken @vishh

**Release note**:

```release-note
NONE
```
2017-11-15 16:56:51 -08:00
Jiaying Zhang
93916242f7 Adds jiayingz@ and vish@ as approvers for pkg/kubelet/cm/deviceplugin/. 2017-11-14 15:27:02 -08:00
lichuqiang
4fa0fa5ad1 pass devices of previous endpoint into re-registered one to avoid potential orphaned devices upon re-registration 2017-11-14 16:43:19 +08:00
Kubernetes Submit Queue
e2c02f425a
Merge pull request #53970 from ScorpioCPH/add-more-comments
Automatic merge from submit-queue (batch tested with PRs 55283, 55461, 55288, 53970, 55487). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add more comments for DevicePluginHandlerImpl struct

**What this PR does / why we need it**:

Add more comments

**Special notes for your reviewer**:

@jiayingz PTAL.

**Release note**:

```
NONE
```
2017-11-13 12:32:27 -08:00
Dr. Stefan Schimanski
bec617f3cc Update generated files 2017-11-09 12:14:08 +01:00
Dr. Stefan Schimanski
012b085ac8 pkg/apis/core: mechanical import fixes in dependencies 2017-11-09 12:14:08 +01:00
Penghao Cen
1d4e1942d8 Add more comments for HandlerImpl struct 2017-11-03 18:24:32 +08:00
Kubernetes Submit Queue
2084f7f4f3
Merge pull request #54488 from lichuqiang/plugin_base
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add admission handler for device resources allocation

**What this PR does / why we need it**:
Add admission handler for device resources allocation to fail fast during pod creation

**Which issue this PR fixes** 
fixes #51592

**Special notes for your reviewer**:
@jiayingz Sorry, there is something wrong with my branch in #51895. And I think the existing comments in the PR might be too long for others to view. So I closed it and opened the new one, as we have basically reach an agreement on the implement :)
I have covered the functionality and unit test part here, and would set about the e2e part ASAP

/cc @jiayingz @vishh @RenaudWasTaken 

**Release note**:

```release-note
NONE
```
2017-11-02 17:24:06 -07:00
lichuqiang
0630896383 update unit test for plugin resources allocation reinforcement 2017-11-02 09:18:24 +08:00
lichuqiang
ebd445eb8c add admission handler for device resources allocation 2017-11-02 09:17:48 +08:00
Rohit Agarwal
092429be1c Better error messages and logging while registering device plugins. 2017-10-26 15:17:38 -07:00
lichuqiang
6a39ac3874 merge func list and listwatch into one 2017-10-26 16:36:16 +08:00
Jiaying Zhang
e501f01d85 Move podDevices code into a separate file. 2017-10-24 17:48:59 -07:00
Jiaying Zhang
ff4e8d429e Device plugin code refactoring to cope with file move.
While moving device_plugin_handler_test.go from pkg/kubelet/cm/ to
pkg/kubelet/cm/deviceplugin/, we can no longer uses cm in its tests
because that would cause a cycle dependency. To solve this problem,
I moved the main cm GetResources functionality as well as part of the
current device plugin handler Allocate functionality into a new device
plugin handler function, GetDeviceRunContainerOptions(). This
refactoring is also needed by another PR 51895 that moves device
allocation into admission phase. Now device plugin handler Allocate()
first checks whether there is cached device runtime state and only
issues Allocate grpc call if there is no cached state available.
The new GetDeviceRunContainerOptions() function simply returns device
runtime config from the cached state. To support this change, extended the
podDevices struct and checkpoint data structure with device runtime state.
2017-10-24 14:38:15 -07:00
Jiaying Zhang
796f488789 Move device plugin related files under pkg/kubelet/cm/deviceplugin/. 2017-10-24 14:17:20 -07:00