Commit Graph

821 Commits

Author SHA1 Message Date
nolancon
709989efa2 CPU Manager - Rename policy.AddContainer() to policy.Allocate() 2020-02-27 07:24:33 +00:00
Kevin Klues
0d68bffd03 Change GetTopologyPodAdmitHandler() to be more general
GetTopologyPodAdmitHandler() now returns a lifecycle.PodAdmitHandler
type instead of the TopologyManager directly. The handler it returns
is generally responsible for attempting to allocate any resources that
require a pod admission check. When the TopologyManager feature gate
is on, this comes directly from the TopologyManager. When it is off,
we simply attempt the allocations ourselves and fail the admission
on an unexpected error. The higher level kubelet.go feature gate
check will be removed in an upcoming PR.
2020-02-27 07:24:26 +00:00
Oleg Chunikhin
b651178849 fix incorrect configuration of kubepods.slice unit by kubelet (issue #88197) 2020-02-17 13:22:45 -05:00
Kubernetes Prow Robot
1a0f923a65
Merge pull request #87712 from alena1108/jan30kubelet
Ineffassign fixes for pkg/controller and kubelet
2020-02-14 14:29:27 -08:00
Kevin Klues
0b168f0243 Change devicemanager to implement HintProvider.Allocate()
This change will not work on its own. Higher level code needs to make
sure and call Allocate() before AddContainer is called. This is already
being done in cases when the TopologyManager feature gate is enabled (in
the PodAdmitHandler of the TopologyManager). However, we need to make
sure we add proper logic to call it in cases when the TopologyManager
feature gate is disabled.
2020-02-10 03:27:47 +00:00
Kevin Klues
91f91858a5 Change CPUManager to implement HintProvider.Allocate()
This change will not work on its own. Higher level code needs to make
sure and call Allocate() before AddContainer is called. This is already
being done in cases when the TopologyManager feature gate is enabled (in
the PodAdmitHandler of the TopologyManager). However, we need to make
sure we add proper logic to call it in cases when the TopologyManager
feature gate is disabled.
2020-02-10 03:27:47 +00:00
Kevin Klues
9e4ee5ecc3 Add Allocate() call to TopologyManager's HintProvider interface
Having this interface allows us to perform a tight loop of:

    for each container {
        containerHints = {}
        for each provider {
	    containerHints[provider] = provider.GatherHints(container)
        }

        containerHints.MergeAndPublish()

        for each provider {
	    provider.Allocate(container)
        }
    }

With this in place we can now be sure that the hints gathered in one
iteration of the loop always consider the allocations made in the
previous.
2020-02-10 03:27:47 +00:00
Kevin Klues
a3f099ea4d Split devicemanager Allocate into two functions
Instead of having a single call for Allocate(), we now split this into two
functions Allocate() and UpdatePluginResources().

The semantics split across them:

// Allocate configures and assigns devices to a pod. From the requested
// device resources, Allocate will communicate with the owning device
// plugin to allow setup procedures to take place, and for the device
// plugin to provide runtime settings to use the device (environment
// variables, mount points and device files).
Allocate(pod *v1.Pod) error

// UpdatePluginResources updates node resources based on devices already
// allocated to pods. The node object is provided for the device manager to
// update the node capacity to reflect the currently available devices.
UpdatePluginResources(
    node *schedulernodeinfo.NodeInfo,
    attrs *lifecycle.PodAdmitAttributes) error

As we move to a model in which the TopologyManager is able to ensure
aligned allocations from the CPUManager, devicemanger, and any
other TopologManager HintProviders in the same synchronous loop, we will
need to be able to call Allocate() independently from an
UpdatePluginResources(). This commit makes that possible.
2020-02-10 03:27:47 +00:00
Takeaki Matsumoto
785fac6826 Make updateAllocatedDevices() as a public method and call it in
podresources api
2020-02-07 13:26:56 +09:00
Kevin Klues
d5addb4090 Cleanup logging and creation logic of TopologyManager in prep for beta 2020-02-03 17:13:29 +00:00
Kevin Klues
bc686ea27b Update TopologyManager.GetTopologyHints() to take pointers
Previously, this function was taking full Pod and Container objects
unnecessarily. This commit updates this so that they will take pointers
instead.
2020-02-03 17:13:28 +00:00
Kevin Klues
adaa58b6cb Update TopologyManager.Policy.Merge() to return a simple bool
Previously, the verious Merge() policies of the TopologyManager all
eturned their own lifecycle.PodAdmitResult result. However, for
consistency in any failed admits, this is better handled in the
top-level Topology manager, with each policy only returning a boolean
about whether or not they would like to admit the pod or not. This
commit changes the semantics to match this logic.
2020-02-03 17:13:28 +00:00
Kevin Klues
95a3ac447f Fix bug in TopologManager RemoveContainer()
Previously, we unconditionally removed *all* topology hints from a pod
whenever just one container was being removed. This commit makes it so
we only remove the hints for the single container being removed, and
then conditionally remove the pod from the podTopologyHints[podUID] when
no containers left in it.
2020-02-03 17:13:14 +00:00
Alena Prokharchyk
6c3093f970 Ineffassign fixes for pkg/controller and kubelet 2020-01-30 14:35:10 -08:00
sewon.oh
463442aa29
Update container hugepage limit when creating the container
Unit test for updating container hugepage limit
Add warning message about ignoring case.
Update error handling about hugepage size requirements

Signed-off-by: sewon.oh <sewon.oh@samsung.com>
2020-01-28 09:35:02 +09:00
Kubernetes Prow Robot
98f63eee1b
Merge pull request #87460 from nolancon/policies_refactor
Refactor Topology Manager policies to reduce code duplication
2020-01-24 21:11:15 -08:00
nolancon
4d76b1c8de Add mergeFilteredHints:
- Move remaining logic from mergeProvidersHints to generic top level
mergeFilteredHints function.
- Add numaNodes as parameter in order to make generic.
- Move single NUMA node specific check to single-numa-node Merge
function.
2020-01-22 09:07:41 +00:00
nolancon
fc300e0e7d Move filterSingleNumaHints call to top level Merge 2020-01-22 08:39:22 +00:00
nolancon
45660fd3a2 Add filterProvidersHints function:
- Move initial 'filtering' functionality to generic function
filterProvidersHints level policy.go.
- Call new function from top level Merge function.
- Rename some variables/parameters to reflect changes.
2020-01-22 08:35:28 +00:00
nolancon
df9b2595f3 Update filterHints to filterSingleNumaHints:
- Change function name
- Remove policy parameter (unnecessary)
- Update unit test to reflect change
2020-01-22 07:15:00 +00:00
Kubernetes Prow Robot
9822016bf8
Merge pull request #87397 from klueska/upstream-cpu-manager-set-initial-containers
Initialize CPUManager containerMap to set of initial containers
2020-01-20 17:39:50 -08:00
Kubernetes Prow Robot
e6b5194ec1
Merge pull request #84300 from klueska/upstream-cpu-manager-reconcile-on-container-state
Update logic in `CPUManager` `reconcileState()`
2020-01-20 12:27:37 -08:00
Kevin Klues
bd9d8fa42f Initialize CPUManager containerMap to set of initial containers
A recent change made it so that the CPUManager receives a list of
initial containers that exist on the system at startup. This list can be
non-empty, for example, after a kubelet retart.

This commit ensures that the CPUManagers containerMap structure is
initialized with the containers from this list.
2020-01-20 20:42:29 +01:00
Kubernetes Prow Robot
37ee6425ef
Merge pull request #87255 from klueska/upstream-remove-redundant-active-pods-check
Remove check for empty activePods list in CPUManager removeStaleState
2020-01-20 09:05:50 -08:00
Kubernetes Prow Robot
23fa359d6c
Merge pull request #84705 from whypro/cpumanager-panic-master
Return error instead of panic when cpu manager fails on startup.
2020-01-20 07:25:37 -08:00
Kevin Klues
7be9b0fe55 Update comments and error messages in the CPUManager 2020-01-20 15:31:01 +01:00
Kevin Klues
f2acbf6607 Base CPUManager state reconciliation on container state, not pod state 2020-01-20 13:57:30 +00:00
Kevin Klues
f6cf9b8ce9 Move CPUManager Pod Status logic before container loop 2020-01-20 13:57:30 +00:00
Kubernetes Prow Robot
50f9ea7999
Merge pull request #85798 from nolancon/merge-policy-rebase
Updated - topologymanager: Add Merge method to Policy
2020-01-17 05:14:56 -08:00
Kubernetes Prow Robot
9701baea0f
Merge pull request #87283 from klueska/update-printing-for-tm-bitmask
Update bitmask printing to print in groups of 2 instead of all 64 bits
2020-01-16 12:04:32 -08:00
Kevin Klues
708278098a Update bitmask printing to print in groups of 2 instead of all 64 bits 2020-01-16 17:28:52 +01:00
Kevin Klues
7069b1d6e8 Update TopologyManager single-numa-node logic to handle "don't cares"
The logic has been updated to match the logic of the best-effort policy
except in two places:

1) The hint filtering frunction has been updated to allow "don't care"
hints encoded with a `nil` affinity mask, to pass through the filter in
addition to hints that have just a single NUMA bit set.

2) After calculating the `bestHint` we transform "don't care" affinities
encoded as having all NUMA bits set in their affinity masks into "don't
care" affinities encoded as `nil`.
2020-01-16 08:50:35 +00:00
Kevin Klues
2905ffffa7 Rename TopologyManager test TestPolicyBestEffortMerge for consistency 2020-01-16 08:50:21 +00:00
Kevin Klues
94489c137c Cleanup use of defaultAffinity in mergePermutation of TopologyManager 2020-01-16 08:50:12 +00:00
nolancon
5e23517ebf Use reflect.DeepEqual check in policy_test.go 2020-01-16 08:13:07 +00:00
nolancon
92eb7cd601 Update "Single NUMA hint generation" expected affinity to nil 2020-01-16 08:13:07 +00:00
nolancon
8b3f6e61a2 Move test case "Two providers, 1 with 2 hints, 1 with single
non-preferred hint matching" into specific policy tests
2020-01-16 08:13:07 +00:00
nolancon
681c42bfc2 Move test case "Two providers, 1 hint each, same mask, 1 preferred, 1
not 2/2" into specific policy tests
2020-01-16 08:13:07 +00:00
nolancon
a38a2562b2 Move test case "Two providers, 1 hint each, same mask, 1 preferred, 1
not 1/2" into specific policy test.
2020-01-16 08:13:07 +00:00
nolancon
f639da7637 Move test case "Two providers, 1 hint each, no common mask" into
specific policy tests.
2020-01-16 08:13:07 +00:00
nolancon
401a2bb285 Move test case "Single TopologyHint with Preferred as false and
NUMANodeAffinity as nil" into specific policy tests.
2020-01-16 08:13:06 +00:00
nolancon
6460ef6392 Move test case "Single TopologyHint with Preferred as true and
NUMANodeAffinity as nil" into specific policy tests.
2020-01-16 08:13:06 +00:00
nolancon
baeff9ec5d Move test case "HintProvider returns empty non-nil map[string][]TopologyHint from
provider" into specific policy tests.
2020-01-16 08:13:06 +00:00
nolancon
599217d482 Move test case "HintProvider returns -nil map[string][]TopologyHint from provider" into specific policy tests 2020-01-16 08:13:06 +00:00
nolancon
57661ee946 Move test case 'HintProvider returns empty non-nil map[string][]TopologyHint' into specific policy tests. 2020-01-16 08:13:06 +00:00
nolancon
51f1af0395 Move test case 'TopologyHint not set' into individual policy tests 2020-01-16 08:13:06 +00:00
nolancon
8466a5852a Restore policy_test.go to upstream
Following commits will contain incremental changes to this file to ease
review process and ensure all tests are accounted for.
2020-01-16 08:13:06 +00:00
nolancon
59bb6c4d6f Update checks in mergeProvidersHints:
- Initialize best Hint to TopologyHint{}
- Update checks.
- Move generic unit test case into policy specific tests and updated
expected outcome to reflect changes.
2020-01-16 08:13:06 +00:00
nolancon
6758f95117 Restore original policy none test cases:
Mistakenly overwritten in earlier commit
2020-01-16 08:13:06 +00:00
nolancon
2d1a535a35 Make mergePermutation generic:
- Remove policy parameters to make function generic
- Move function into top level policy.go
2020-01-16 08:13:06 +00:00
nolancon
5487941485 Refactor filterHints:
- Restructure function
- Remove bug fix for catching {nil true} - To be fixed in later commit
- Restore unit tests to original state for testing filterHints
2020-01-16 08:13:06 +00:00
nolancon
adfd11f38f Make iterateAllProviderTopologyHints generic:
- Remove policy parameters to make this function generic.
- Move function out of individual policies and into policy.go
2020-01-16 08:13:06 +00:00
nolancon
e43f0a5293 Reinstate canAdmitPodResult in policy_none:
This is to keep consistency with the other policies.
This change may be made across all policies in a future PR, but removing it
from the scope of this PR for now.
2020-01-16 08:13:05 +00:00
nolancon
4cc5b9e46c Edit hints returned from policies and unit tests:
- Best Effort Policy: Return hint with nil affinity as opposed to
defaultAffinity when provider has no preference for NUMA affinty or no
possible NUMA affinities.
- Single NUMA Node Policy: Remove defaultHint from mergeProvidersHints.
Instead return appropriate TopologyHint where required.
- Update unit tests to reflect changes. Some test cases moved into
individual policy test functions due to differing returned affinties
per policy.
2020-01-16 08:13:05 +00:00
nolancon
e3d0c9397f Updates to single-numa-node policy:
- Remove getHintMatch method.
- Replace with simplified versions of mergePermutation and
iterateAllProviderTopologyHints methods - as used in best-effort.
- Remove getHintMatch unit tests.
2020-01-16 08:13:05 +00:00
nolancon
b5ca4989e3 Update unit tests:
- Update filterHints test to reflect changes in previous commit.
- Some common test cases achieve differing expected results based on
policy due to independent merge strategies. These cases are moved into
individual policy based test functions.
2020-01-16 08:13:05 +00:00
nolancon
17d615bca2 Update filterHints:
- Only append valid preferred-true hints to filtered
- Return true if allResourceHints only consist of
nil-affinity/preferred-true hints: {nil true}, update defaultHint
preference accordingly.
2020-01-16 08:13:05 +00:00
Adrian Chiris
9f21f49493 Additional unit tests for Topology Manager methods 2020-01-16 08:13:05 +00:00
Adrian Chiris
f886d2a832 Update single-numa-node policy unit tests 2020-01-16 08:13:05 +00:00
Adrian Chiris
2825a7be1a Add new functionality for single-numa-node policy:
Explanation taken from original commit:
- Change the current method of finding the best hint.
  Instead of going over all permutations, sort the hints and find
  the narrowest hint common to all resources.
- Break out early when merging to a preferred hint is not possible
2020-01-16 08:13:05 +00:00
Adrian Chiris
5ce2ea2773 Return defaultAffinity from PolicyBestEffort:
Now that PolicySingleNUMANode is not considered here, return
defaultAffinity as was the original case before previous bug fix
2020-01-16 08:13:05 +00:00
Adrian Chiris
eda1521562 Make mergeProviderHints policy-specific:
- Remove need to pass policy and numaNodes as arguments
- Remove PolicySingleNUMANode special case check in policy_best_effort
- Add mergeProviderHints base to policy_single_numa_node for upcoming
commit
2020-01-16 08:13:05 +00:00
Adrian Chiris
dc36924c37 Update policy_none removing canAdmitPodResult
Update unit tests for none_policy
Add Name test for policy_restricted
2020-01-16 08:13:05 +00:00
Adrian Chiris
cf8b098dda Refactor policy-best-effort
- Modularize code with mergePermutation method
2020-01-16 08:13:05 +00:00
Sascha Grunert
278717bc57
Fix ineffectual assignment to CPUSets
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
2020-01-16 08:57:42 +01:00
Kevin Klues
34b942a41d Remove check for empty activePods list in CPUManager removeStaleState
This check is redundant since we protect this call with a call to
`m.sourcesReady.AllReady()` earlier on. Moreover, having this check in
place means that we will leave some stale state around in cases where
there are actually no active pods in the system and this loop hasn't
cleaned them up yet. This can happen, for example, if a pod exits while
the kubelet is down for some reason. We see this exact case being
triggered in our e2e tests, where a test has been failing since October
when this change was first introduced.
2020-01-15 20:09:24 +01:00
Kevin Klues
5802f3a910 Add proper activePods list in TestGetTopologyHints for CPUManager 2020-01-15 20:08:41 +01:00
danielqsj
1a9b121764 remove deprecated metrics of kubelet 2020-01-10 16:46:52 +08:00
Kubernetes Prow Robot
fd0358fd21
Merge pull request #86689 from klueska/upstream-fix-cpumanager-v1-state-checksum
Lock checksum calculation for v1 CPUManager state to pre 1.18 logic
2020-01-08 02:57:40 -08:00
Kubernetes Prow Robot
d6412b856f
Merge pull request #84345 from danielqsj/withdialer
replace grpc.WithDialer which is deprecated
2020-01-06 15:56:17 -08:00
Kubernetes Prow Robot
9acf7d11fe
Merge pull request #86344 from klueska/upstream-cm-approver
Add klueska as an approver in pkg/kubelet/cm/OWNERS
2020-01-06 09:54:16 -08:00
Kevin Klues
b373121a14 Make CPUManagerCheckpointV2 type an alias of CPUManagerCheckpoint
This change is to prevent problems when we remove the V1->V2 migration
code in the future. Without this, the checksums of all checkpoints would
be hashed with the name CPUManagerCheckpointV2 embedded inside of them,
which is undesirable. We want the checkpoints to be hashed with the name
CPUManagerCheckpoint instead.
2019-12-28 19:29:13 +01:00
Kevin Klues
5faf8f4c52 Lock checksum calculation for v1 CPUManager state to pre 1.18 logic
The updated CPUManager from PR #84462 implements logic to migrate the
CPUManager checkpoint file from an old format to a new one. To do so, it
defines the following types:

```
type CPUManagerCheckpoint = CPUManagerCheckpointV2
type CPUManagerCheckpointV1 struct {  ...  }
type CPUManagerCheckpointV2 struct {  ...  }
```

This replaces the old definition of just:

```
type CPUManagerCheckpoint struct {  ...  }
```

Code was put in place to ensure proper migration from checkpoints in V1
format to checkpoints in V2 format. However (and this is a big however),
all of the unit tests were performed on V1 checkpoints that were
generated using the type name `CPUManagerCheckpointV1` and not the
original type name of `CPUManagerCheckpoint`. As such, the checksum in
the checkpoint file uses the `CPUManagerCheckpointV1` type to calculate
its checksum and not the original type name of `CPUManagerCheckpoint`.

This causes problems in the real world since all pre-1.18 checkpoint
files will have been generated with the original type name of
`CPUManagerCheckpoint`. When verifying the checksum of the checkpoint
file across an upgrade to 1.18, the checksum is calculated assuming
a type name of `CPUManagerCheckpointV1` (which is incorrect) and the
file is seen to be corrupt.

This patch ensures that all V1 checksums are verified against a type
name of `CPUManagerCheckpoint` instead of ``CPUManagerCheckpointV1`.
It also locks the algorithm used to calculate the checksum in place,
since it wil never change in the future (for pre-1.18 checkpoint
files at least).
2019-12-28 14:17:55 +01:00
danielqsj
19fe9f8d94 replace grpc.WithDialer which is deprecated 2019-12-26 17:46:59 +08:00
whypro
f4bd4e2e96 Return error instead of panic when cpu manager starts failed. 2019-12-19 21:56:23 +08:00
Kevin Klues
9818b4522e Add klueska as an approver in pkg/kubelet/cm/OWNERS 2019-12-17 10:40:23 +01:00
Kevin Klues
f553286156 Pass initial set of runtime containers to the CPUManager at startup
These information associatedd with these containers is used to migrate
the CPUManager state from it's old format to its new (i.e. keyed off of
podUID and containerName instead of containerID).
2019-12-11 23:02:51 +01:00
Kevin Klues
6441e1ef43 Move CPUManager Checkpoint restoration to Start() instead of New() 2019-12-11 23:02:51 +01:00
Kevin Klues
69f8053850 Update top-level CPUManager to adhere to new state semantics
For now, we just pass 'nil' as the set of 'initialContainers' for
migrating from old state semantics to new ones. In a subsequent commit
will we pull this information from higher layers so that we can pass it
down at this stage properly.
2019-12-11 23:02:51 +01:00
Kevin Klues
185e790f71 Update CPUManager policies to adhere to new state semantics 2019-12-11 23:02:51 +01:00
Kevin Klues
7c760fea38 Change CPUManager state to key off of podUID and containerName
Previously, the state was keyed off of containerID intead of podUID and
containerName. Unfortunately, this is no longer possible as we move to a
to model where we we allocate CPUs to containers at pod adit time rather
than container start time.

This patch is the first step towards full migration to the new
semantics. Only the unit tests in cpumanager/state are passing. In
subsequent commits we will update the CPUManager itself to use these new
semantics.

This patch also includes code to do migration from the old checkpoint format
to the new one, assuming the existence of a ContainerMap with the proper
mapping of (containerID)->(podUID, containerName). A subsequent commit
will update code in higher layers to make sure that this ContainerMap is
made available to this state logic.
2019-12-11 23:02:51 +01:00
Kevin Klues
9191a949ae Extend makePod() helper in CPUManager to take PodUID and ContainerName 2019-12-11 23:02:51 +01:00
Kevin Klues
7a15d3a4d7 Fix bug in parsing int to string in CPUManager tests 2019-12-11 23:02:51 +01:00
Kevin Klues
765aae93f8 Move containerMap out of static policy and into top-level CPUManager 2019-12-11 23:02:51 +01:00
Kevin Klues
1d995c98ef Update CPUmanager containerMap to allow removal by containerRef 2019-12-11 23:02:47 +01:00
Kevin Klues
0639bd0942 Change CPUManager containerMap to key off of (podUID, containerName)
Previously it keyed off of a pointer to the actual pod / container,
which was unnecessary, and hard to work with (especially on the
retrieval side).
2019-12-11 23:02:11 +01:00
Kevin Klues
3881e50cce Update CPUmanager containerMap to also return a containerRef 2019-12-11 23:01:01 +01:00
Kevin Klues
347d5f57ac Move CPUManager ContainerMap to its own package 2019-12-11 22:59:00 +01:00
Kubernetes Prow Robot
ad5d4c4705
Merge pull request #85706 from yutedz/per-node-dev
Remove nodes slice in loop of takeByTopology
2019-12-05 13:50:30 -08:00
Kubernetes Prow Robot
57b6b287d4
Merge pull request #85688 from yutedz/pods-to-rm
Reduce unnecessary Set in updateAllocatedDevices
2019-12-02 17:07:26 -08:00
Kubernetes Prow Robot
833f585104
Merge pull request #85760 from yutedz/chkpt-write-err
Log error when writing checkpoint fails
2019-12-02 10:27:06 -08:00
Ted Yu
84a9803741 Log error when writing checkpoint fails 2019-11-29 19:47:17 -08:00
Ted Yu
6415fa765e Remove nodes slice in loop of takeByTopology 2019-11-29 12:12:22 -08:00
Kubernetes Prow Robot
80eed952f0
Merge pull request #84854 from BSWANG/fix-hugetlb-cgroup
fix kubelet failed to start on setting hugetlb limits
2019-11-27 12:29:03 -08:00
Ted Yu
86f3bc25e1 Reduce unnecessary Set in updateAllocatedDevices 2019-11-27 08:48:06 -08:00
Travis Rhoden
0c5c3d8bb9
Remove pkg/util/mount (moved out of tree)
This patch removes pkg/util/mount completely, and replaces it with the
mount package now located at k8s.io/utils/mount. The code found at
k8s.io/utils/mount was moved there from pkg/util/mount, so the code is
identical, just no longer in-tree to k/k.
2019-11-15 08:29:12 -07:00
Kubernetes Prow Robot
30e6238795
Merge pull request #85147 from yutedz/devmgr-rm-contents
Continue removing file in ManagerImpl#removeContents
2019-11-14 16:38:28 -08:00
Ted Yu
fb046f7787 Continue removing file in ManagerImpl#removeContents 2019-11-13 06:00:34 -08:00
Kubernetes Prow Robot
ed10b5b17f
Merge pull request #85047 from yutedz/dev-mgr-err-handling
Handle error return from allocatePodResources
2019-11-12 11:51:27 -08:00
Kubernetes Prow Robot
897ce3073c
Merge pull request #84533 from davidz627/fix/deprecatedPath
Remove plugin watching of deprecated directory and CSI v0 support in accordance with deprecation policy
2019-11-12 04:48:20 -08:00
David Zhu
802fe12803 Remove plugin watching of deprecated directory {kubelet_root_dir}/plugins and support for CSI V0 in accordance with deprecation announcement in https://v1-13.docs.kubernetes.io/docs/setup/release/notes/ 2019-11-11 11:42:58 -08:00
Ted Yu
db0f616974 Handle error return from allocatePodResources 2019-11-09 16:25:15 -08:00
Travis Rhoden
1fd8921546
Move mount/fake.go to mount/fake_mount.go
This patch moves fake.go to mount_fake.go, and follows to principle of
always returning a discrete type rather than an Interface. All callers
of "FakeMounter" are changed to instead use "NewFakeMounter()". The
FakeMounter "Log" struct member is changed to not be exported, and
instead only access through a new "GetLog()" method.
2019-11-08 08:07:41 -07:00
mrobson
e401ee9158 Errors from cgroup destroy and pid kills are swallowed. Log a warning when that happens. 2019-11-07 07:47:57 -05:00
Kubernetes Prow Robot
73b2c82b28
Merge pull request #83592 from jianzzha/opt-reserved-cpus
added --reserved-cpus kubelet command option
2019-11-06 22:14:42 -08:00
Kubernetes Prow Robot
695c3061dd
Merge pull request #82809 from liggitt/go-1.13-no-modules
update to use go1.13.4
2019-11-06 17:02:43 -08:00
Kubernetes Prow Robot
08e5781b41
Merge pull request #84525 from klueska/upstream-fix-hint-generation-after-kubelet-restart
Fix bug in TopologyManager hint generation after kubelet restart
2019-11-06 15:33:50 -08:00
Jordan Liggitt
297570e06a hack/update-vendor.sh 2019-11-06 17:42:34 -05:00
Kubernetes Prow Robot
46472773cb
Merge pull request #84836 from yuxiaobo96/k8s-checks
Correct spelling mistakes
2019-11-06 12:21:11 -08:00
Kevin Klues
4d4d4bdd61 Ensure devicemanager TopologyHints are regenerated after kubelet restart
This patch also includes test to make sure the newly added logic works
as expected.
2019-11-06 15:01:34 +00:00
Jianzhu Zhang
89dfd24483 added --reserved-cpus kubelet command option 2019-11-06 07:33:52 -05:00
yuxiaobo
81e9f21f83 Correct spelling mistakes
Signed-off-by: yuxiaobo <yuxiaobogo@163.com>
2019-11-06 20:25:19 +08:00
bingshen.wbs
47642a0bad fix kubelet failed to start on setting hugetlb limits in non-exist cgroup dir
cause by kubelet startup be interrupted on setting list of cgroups
In the 'cgroupManagerImpl.Exists' not check&recreate the hugetlb cgroup dir. Then setting the limits in non-exist cgroup dir will cause kubelet start failed.

Signed-off-by: bingshen.wbs <bingshen.wbs@alibaba-inc.com>
2019-11-06 16:39:55 +08:00
Kubernetes Prow Robot
0c0408c790
Merge pull request #76407 from yanghaichao12/dev0411
change directory permissions from 0755 to 0750
2019-11-05 19:30:59 -08:00
Kevin Klues
9dc116eb08 Ensure CPUManager TopologyHints are regenerated after kubelet restart
This patch also includes test to make sure the newly added logic works
as expected.
2019-11-05 15:48:51 +00:00
Kevin Klues
a338c8f7fd Add some more comments to GetTopologyHints() in the devicemanager 2019-11-05 13:06:23 +00:00
Kevin Klues
58f3554ebe Sync all CPU and device state before generating TopologyHints for them
This ensures that we have the most up-to-date state when generating
topology hints for a container. Without this, it's possible that some
resources will be seen as allocated, when they are actually free.
2019-11-05 13:00:20 +00:00
Kevin Klues
d9adf20360 Abstract removeStaleState from reconcileState in CPUManager
This will become especially important as we move to a model where
exclusive CPUs are assigned at pod admission time rather than at pod
creation time.

Having this function will allow us to do garbage collection on these
CPUs anytime we are about to allocate CPUs to a new set of containers,
in addition to reclaiming state periodically in the reconcileState()
loop.
2019-11-05 12:45:11 +00:00
Kevin Klues
b5f52e6072 Modularize TopologyManager policy Merge() tests
These changes make it so that a set of common test cases can be used for
all merge strategies, with specific test cases being able to be
specified on a policy-by-policy basis.
2019-11-04 18:43:07 +01:00
Kevin Klues
7ea1fc9be4 Move TopologyManager TestPolicyMerge() to shared test file 2019-11-04 18:43:07 +01:00
Kevin Klues
d7d7bfcda0 Abstract TopologyManager Policy Merge() tests into their own function 2019-11-04 18:43:07 +01:00
Adrian Chiris
dee22d1fbc Fix comments in TopologyManager 2019-11-04 18:43:07 +01:00
Adrian Chiris
5f7db54d3c Move function from top-level TopologyManager to best-effort policy
This is in preparation for removing the special-case of the
SingleNumaNode policy in mergeProvidersHints() in favor of a custom
merging strategy with much less overhead.
2019-11-04 18:43:07 +01:00
Adrian Chiris
d95464645c Add Merge() API to TopologyManager Policy abstraction
This abstraction moves the responsibility of merging topology hints to
the individual policies themselves. As part of this, it removes the
CanAdmitPodResult() API from the policy abstraction, and rolls it into a
second return value from Merge()
2019-11-04 18:43:07 +01:00
Adrian Chiris
78d7856288 Globalize a few TopologyManager functions
This is in preparation for a larger refactoring effort that will add a
'Merge()'  API to the TopologyManager policy API.
2019-11-04 18:43:07 +01:00
Adrian Chiris
e72847676f Pass a list of NUMA nodes to the various TopologyManager policies
This is in preparation for a larger refactoring effort that will add a
'Merge()'  API to the TopologyManager policy API.
2019-11-04 18:43:07 +01:00
Adrian Chiris
6fd8a6eb69 Make restricted TopologyManager policy inherit from best-effort policy
These policies only differ on whether they admit the pod or not when a
TopologyHint is preferred or not. As such, the restricted policy should
simply inherit whatever it can from the best effort policy and only
overwrite what is necessary.

This does not matter for now, but will become important when we add a
new 'Merge()' abstraction to a Policy later on.
2019-11-04 18:43:07 +01:00
Adrian Chiris
3391daeb00 Break TopologyManager.calculateAffinity() into more modular functions
This modularization is in preparation for a larger refactoring effort
that will add a 'Merge()'  API to the TopologyManager policy API.
2019-11-04 18:43:07 +01:00
Adrian Chiris
b17706b149 Added LessThan() and IsEqual() methods for TopologyHints 2019-11-04 18:43:07 +01:00
yanghaichao12
5cbafba457 change directory permissions from 0755 to 0750 2019-11-04 17:04:37 +08:00
Kubernetes Prow Robot
002dbf6a4c
Merge pull request #83777 from lmdaly/fix-single-numa-node-with-best-effort-pods
Fixed bug in TopologyManager with SingleNUMANode Policy
2019-11-01 04:53:23 -07:00
Kubernetes Prow Robot
17a57f99d5
Merge pull request #81344 from zouyee/cpm
fix cpumanager reconcileState without sourceready
2019-10-30 23:33:36 -07:00
nolancon
b0a85177d2 Clean-up and additional test cases for socket-mask unit test. 2019-10-18 04:16:06 +01:00
Kubernetes Prow Robot
017842d49d
Merge pull request #83492 from ConnorDoyle/topo-align-all-qos
Topology manager aligns pods of all QoS classes.
2019-10-11 03:03:40 -07:00
Louise Daly
a353247d44 Fixed bug in TopologyManager with SingleNUMANode Policy
This patch fixes an issue where best-effort pods were not admitted
to the node if the single-numa-node policy was set.

This was because the Admit policy in single-numa-node policy does
not admit any pod where the hint is anything but single NUMA node. The 'best hint' in this case is {<set bits for num. Numa Nodes on machine>, true}
So on a machine with 2 NUMA nodes the best hint for a best-effort pod is {11,true} as best-effort pods have no Topology preferences.

The single-numa-node policy fails any pod with a not preferred hint OR a hint where > 1 bits are set, thus the above example resulting in termintaed pods with a Topology Affinity Error.

This is a short term fix for the single-numa-node policy, as there will be code refactoring for the 1.17 release.
2019-10-11 07:00:37 +01:00
Kubernetes Prow Robot
4561b67971
Merge pull request #83697 from klueska/fix-single-numa-with-one-provider
Fixed bug in TopologyManager with SingleNUMANode Policy
2019-10-10 19:00:33 -07:00
Kubernetes Prow Robot
3db6d3abcf
Merge pull request #83551 from dims/move-external-facing-kubelet-apis-to-staging
Move external facing kubelet apis to staging
2019-10-10 13:41:36 -07:00
Connor Doyle
a598369e3c Gofmt. 2019-10-10 12:16:21 -07:00
Connor Doyle
a9203ebdcf Topology manager aligns pods of all QoS classes. 2019-10-10 12:16:21 -07:00
Kevin Klues
5501f542cd Fixed bug in TopologyManager with SingleNUMANode Policy
This patch fixes an issue in the TopologyManager that wouldn't allow
pods to be admitted if pods were launched with the SingleNUMANode policy
and any of the hint providers had no NUMA preferences.

This is due to 2 factors:

1) Any hint provider that passes back a `nil` as its hints, has its hint
automatically transformed into a single {11 true} hint before merging

2) We added a special casing for the SingleNumaNodePolicy() in the
TopologyManager that essentially turns these hints into a
{11 false} anytime a {11 true} is seen.

The current patch reworks this logic so the that TopologyManager can
tell the difference between a "don't care" hint and a true "{11 true}"
hint returned by the hint provider. Only true "{11 true}" hints will be
converted by the special casing for the SingleNumaNodePolicy(), while
"don't care" hints will not.

This is a short term fix for this issue until we do a larger refactoring
of this code for the 1.17 release.
2019-10-09 17:41:08 -07:00
mrobson
ad3dcb9fa0 Add podCgroup to process kill events to allow for correlation 2019-10-08 13:12:48 -04:00
Kubernetes Prow Robot
d70b2db1f2
Merge pull request #83296 from yutedz/kill-cgrp-proc
Only kill process where killing failed during previous iterations
2019-10-08 07:19:13 -07:00
Kubernetes Prow Robot
3f8f0a32fa
Merge pull request #83527 from odinuge/runc-rc9
Bump dependency opencontainers/runc@v1.0.0-rc9
2019-10-08 03:45:44 -07:00
Davanum Srinivas
f29d2272c8
fix gofmt and golint failures
Change-Id: I6535b506f50558b31663a13cd270b15023afa2c6
2019-10-06 18:43:17 -04:00
Kubernetes Prow Robot
48b90db9c3
Merge pull request #83495 from tanjunchen/fix-typo
remove the repeat word in documents
2019-10-06 15:05:08 -07:00
Davanum Srinivas
6ecc0f83af
update bazel BUILD files
Change-Id: Ia3917cec1453c0b22a958faf8c22bccd79242d14
2019-10-06 15:29:23 -04:00
Davanum Srinivas
d30c489c54
Move pkg/kubelet/pluginregistration and deviceplugin
Change-Id: I06adcb43bd278b430ffad2010869e1524c8cc4ff
2019-10-06 15:28:38 -04:00
tanjunchen
de3cf23414 remove the repeat word in documents 2019-10-06 23:32:01 +08:00
Odin Ugedal
b9cfb19321
Rename cgroupsystemd.Manager to LegacyManager 2019-10-05 14:22:35 +02:00
Kubernetes Prow Robot
d60bda1971
Merge pull request #83043 from ConnorDoyle/cleanup-cpumanger-topo-hints
Delegate topology hint gen to CPU manager policy
2019-10-05 00:59:39 -07:00