Commit Graph

44022 Commits

Author SHA1 Message Date
Abirdcfly
e35cfbb5a7 fix: some functions should pass a context parameter
Change-Id: Ib509573a72c8bd0c61233ade415fef470c61bf5f
2022-03-04 00:42:45 +08:00
Kubernetes Prow Robot
88f9728339 Merge pull request #108309 from zshihang/token
no auto-generation of secret-based service account token
2022-03-02 06:19:15 -08:00
Kubernetes Prow Robot
422001df8b Merge pull request #108154 from klueska/fix-topology-manager
Update TopologyManager algorithm for selecting "best" non-preferred hint
2022-03-02 04:13:13 -08:00
Kubernetes Prow Robot
0dc83fe696 Merge pull request #108078 from tnqn/endpoints-resync
Skip updating Endpoints if no relevant fields change
2022-03-01 23:23:13 -08:00
Kubernetes Prow Robot
2de37aa9fa Merge pull request #108276 from AllenZMC/improve_util_test_coverage
improve test coverage
2022-03-01 16:57:13 -08:00
Kubernetes Prow Robot
604ab4fc6c Merge pull request #108340 from ArangoGutierrez/misspelled/1
Fix typo in pkg/kubelet/pluginmanager/cache/actual_state_of_world
2022-03-01 15:45:55 -08:00
Kubernetes Prow Robot
5d6a793221 Merge pull request #96828 from panjf2000/opt-epoll-eventfd
kubelet/eviction: eliminate redundant allocations when handling eventfd
2022-03-01 13:59:54 -08:00
Kubernetes Prow Robot
66daef4aa7 Merge pull request #108167 from jfremy/fix-107973
Fix nodes volumesAttached status not being updated
2022-03-01 12:49:54 -08:00
Kubernetes Prow Robot
0e8e307567 Merge pull request #106570 from odinuge/fix-cpu-shares-on-big-systems
Fix cpu share issues on systems with large amounts of cpu
2022-03-01 10:15:55 -08:00
Kevin Klues
e370b7335c Add extensive unit testing for TopologyManager hint generation algorithm
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-03-01 17:30:24 +00:00
Kubernetes Prow Robot
46e78c1b80 Merge pull request #108407 from kerthcet/feature/graduate-defaultPodTopologySpread-to-ga-in-kube-feature
update feature gate DefaultPodTopologySpread release note
2022-03-01 08:53:47 -08:00
Kevin Klues
99c57828ce Update TopologyManager algorithm for selecting "best" non-preferred hint
For the 'single-numa' and 'restricted' TopologyManager policies, pods are only
admitted if all of their containers have perfect alignment across the set of
resources they are requesting. The best-effort policy, on the other hand, will
prefer allocations that have perfect alignment, but fall back to a non-preferred
alignment if perfect alignment can't be achieved.

The existing algorithm for choosing the best hint from the set of
"non-preferred" hints is fairly naive and often results in choosing a
sub-optimal hint. It works fine in cases where all resources would end up
coming from a single NUMA node (even if it's not the same NUMA node), but
breaks down as soon as multiple NUMA nodes are required for the "best"
alignment. We will never be able to achieve perfect alignment with these
non-preferred hints, but we should try to do something more intelligent than
simply choosing the hint with the narrowest mask.

In an ideal world, we would have the TopologyManager return a set of
"resource-relative" hints (as opposed to a common hint for all resources as is
done today). Each resource-relative hint would indicate how many other
resources could be aligned to it on a given NUMA node, and a hint provider
would use this information to allocate its resources in the most aligned way
possible. There are likely some edge cases to consider here, but such an
algorithm would allow us to do partial-perfect-alignment of "some" resources,
even if all resources could not be perfectly aligned.

Unfortunately, supporting something like this would require a major redesign to
how the TopologyManager interacts with its hint providers (as well as how those
hint providers make decisions based on the hints they get back).

That said, we can still do better than the naive algorithm we have today, and
this patch provides a mechanism to do so.

We start by looking at the set of hints passed into the TopologyManager for
each resource and generate a list of the minimum number of NUMA nodes required
to satisfy an allocation for a given resource. Each entry in this list then
contains the 'minNUMAAffinity.Count()' for a given resource. Once we have this
list, we find the *maximum* 'minNUMAAffinity.Count()' from the list and mark
that as the 'bestNonPreferredAffinityCount' that we would like to have
associated with whatever "bestHint" we ultimately generate. The intuition is
that we would like to (at the very least) get alignment for those resources
that *require* multiple NUMA nodes to satisfy their allocation. If we can't
quite get there, then we should try to come as close to it as possible.
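
As a rough sketch of this first step, using invented, simplified stand-ins for
the TopologyManager's hint and bitmask types (the names below are illustrative
only, not the actual code):

```go
// Rough sketch: compute bestNonPreferredAffinityCount from per-resource hints.
package hintcount

import "math/bits"

// hint pairs a NUMA-node bitmask with a "preferred" flag.
type hint struct {
	numaAffinity uint64 // bit i set => NUMA node i is part of the affinity
	preferred    bool
}

// count returns the number of NUMA nodes in the hint's affinity mask.
func (h hint) count() int { return bits.OnesCount64(h.numaAffinity) }

// bestNonPreferredAffinityCount finds, for each resource, the minimum number
// of NUMA nodes any of its hints needs, and returns the maximum of those
// per-resource minimums.
func bestNonPreferredAffinityCount(hintsPerResource map[string][]hint) int {
	best := 0
	for _, hints := range hintsPerResource {
		minCount := -1
		for _, h := range hints {
			if minCount == -1 || h.count() < minCount {
				minCount = h.count()
			}
		}
		if minCount > best {
			best = minCount
		}
	}
	return best
}
```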

Once we have this 'bestNonPreferredAffinityCount', the algorithm proceeds as
follows:

If the mergedHint and bestHint are both non-preferred, then try to find a hint
whose affinity count is as close to (but not higher than) the
bestNonPreferredAffinityCount as possible. To do this, we need to consider the
following cases and react accordingly:

  1. bestHint.NUMANodeAffinity.Count() >  bestNonPreferredAffinityCount
  2. bestHint.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount
  3. bestHint.NUMANodeAffinity.Count() <  bestNonPreferredAffinityCount

For case (1), the current bestHint is larger than the
bestNonPreferredAffinityCount, so updating to any narrower mergedHint is
preferred over staying where we are.

For case (2), the current bestHint is equal to the
bestNonPreferredAffinityCount, so we would like to stick with what we have
*unless* the current mergedHint is also equal to bestNonPreferredAffinityCount
and it is narrower.

For case (3), the current bestHint is less than bestNonPreferredAffinityCount,
so we would like to creep back up as close to bestNonPreferredAffinityCount as
we can. There are three cases to consider here:

  3a. mergedHint.NUMANodeAffinity.Count() >  bestNonPreferredAffinityCount
  3b. mergedHint.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount
  3c. mergedHint.NUMANodeAffinity.Count() <  bestNonPreferredAffinityCount

For case (3a), we just want to stick with the current bestHint because choosing
a new hint that is greater than bestNonPreferredAffinityCount would be
counter-productive.

For case (3b), we want to immediately update bestHint to the current
mergedHint, making it now equal to bestNonPreferredAffinityCount.

For case (3c), we know that *both* the current bestHint and the current
mergedHint are less than bestNonPreferredAffinityCount, so we want to choose
one that brings us back up as close to bestNonPreferredAffinityCount as
possible. There are three cases to consider here:

  3ca. mergedHint.NUMANodeAffinity.Count() >  bestHint.NUMANodeAffinity.Count()
  3cb. mergedHint.NUMANodeAffinity.Count() <  bestHint.NUMANodeAffinity.Count()
  3cc. mergedHint.NUMANodeAffinity.Count() == bestHint.NUMANodeAffinity.Count()

For case (3ca), we want to immediately update bestHint to mergedHint because
that will bring us closer to the (higher) value of
bestNonPreferredAffinityCount.

For case (3cb), we want to stick with the current bestHint because choosing the
current mergedHint would strictly move us further away from the
bestNonPreferredAffinityCount.

Finally, for case (3cc), we know that the current bestHint and the current
mergedHint are equal, so we simply choose the narrower of the two.
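
As a rough sketch of the case analysis above, again using an invented,
simplified hint type (this illustrates the described decision logic and is not
the actual TopologyManager code):

```go
// Rough sketch of choosing between two non-preferred hints.
package hintcompare

import "math/bits"

type hint struct {
	numaAffinity uint64 // bit i set => NUMA node i is part of the affinity
	preferred    bool
}

func (h hint) count() int { return bits.OnesCount64(h.numaAffinity) }

// isNarrower reports whether a is "narrower" than b: fewer NUMA nodes, or the
// same number of nodes but a numerically lower mask.
func isNarrower(a, b hint) bool {
	if a.count() != b.count() {
		return a.count() < b.count()
	}
	return a.numaAffinity < b.numaAffinity
}

// chooseNonPreferred decides whether mergedHint should replace the current
// bestHint, following cases (1), (2), and (3a)-(3cc) described above. Both
// hints are assumed to be non-preferred.
func chooseNonPreferred(bestHint, mergedHint hint, bestNonPreferredAffinityCount int) hint {
	switch {
	case bestHint.count() > bestNonPreferredAffinityCount:
		// Case 1: bestHint is wider than the target; any narrower mergedHint wins.
		if isNarrower(mergedHint, bestHint) {
			return mergedHint
		}
		return bestHint
	case bestHint.count() == bestNonPreferredAffinityCount:
		// Case 2: keep bestHint unless mergedHint also hits the target and is narrower.
		if mergedHint.count() == bestNonPreferredAffinityCount && isNarrower(mergedHint, bestHint) {
			return mergedHint
		}
		return bestHint
	default:
		// Case 3: bestHint is below the target; creep back up toward it.
		switch {
		case mergedHint.count() > bestNonPreferredAffinityCount:
			return bestHint // 3a: mergedHint overshoots the target
		case mergedHint.count() == bestNonPreferredAffinityCount:
			return mergedHint // 3b: mergedHint hits the target exactly
		case mergedHint.count() > bestHint.count():
			return mergedHint // 3ca: mergedHint is closer to the target
		case mergedHint.count() < bestHint.count():
			return bestHint // 3cb: mergedHint moves further from the target
		default:
			// 3cc: equal counts; keep the narrower of the two.
			if isNarrower(mergedHint, bestHint) {
				return mergedHint
			}
			return bestHint
		}
	}
}
```

In this sketch, "narrower" means fewer NUMA nodes, or a numerically lower mask
when the counts are equal; the real implementation defines narrowness on its
own bitmask type.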

This patch implements this algorithm for the case where we must choose from a
set of non-preferred hints and provides a set of unit-tests to verify its
correctness.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-03-01 14:38:26 +00:00
Kubernetes Prow Robot
50e07c9a12 Merge pull request #108381 from thockin/add-generated-openapi
Add the last zz_generated.openapi.go file
2022-03-01 06:35:48 -08:00
kerthcet
9c5898ba90 feat: update feature gate DefaultPodTopologySpread release note
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-03-01 19:06:24 +08:00
Kubernetes Prow Robot
bef9d807a0 Merge pull request #108325 from pacoxu/donotReturnErrWhenPauseLose
do not return err when PodSandbox does not exist
2022-02-28 18:15:46 -08:00
Kubernetes Prow Robot
e9ba9dc4e4 Merge pull request #107201 from pacoxu/add-metrics-volume-stats-cal
add VolumeStatCalDuration metrics for fsquota monitoring benchmark
2022-02-28 16:07:46 -08:00
Kevin Klues
f8601cb5a3 Refactor TopologyManager to be more explicit about bestHint calculation
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-02-28 20:30:01 +00:00
Kubernetes Prow Robot
f4df7d0eb2 Merge pull request #108393 from ialidzhikov/nit/ingressclassnamespacedparams
Correct comment related to IngressClassNamespacedParams feature gate
2022-02-28 12:23:46 -08:00
Tim Hockin
f9e19fc83e Add the last zz_generated.openapi.go file
We had 4 of 5 checked in.
2022-02-28 10:17:54 -08:00
ialidzhikov
856cf94d55 Correct comment related to IngressClassNamespacedParams feature gate
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
2022-02-28 18:55:13 +02:00
Kubernetes Prow Robot
bf7b9119f0 Merge pull request #108278 from kerthcet/feature/graduate-defaultPodTopologySpread-to-ga
graduate default pod topology spread to ga
2022-02-28 08:02:57 -08:00
Kubernetes Prow Robot
d3ece70f0b Merge pull request #108269 from kerthcet/refactor/rename-schedulercache-to-cache
refactor: rename SchedulerCache to Cache in Scheduler
2022-02-24 14:46:13 -08:00
Carlos Eduardo Arango Gutierrez
bbb8ef1d10 Fix typo in pkg/kubelet/pluginmanager/cache/actual_state_of_world
Signed-off-by: Carlos Eduardo Arango Gutierrez <carangog@redhat.com>
2022-02-24 16:20:24 -05:00
Kubernetes Prow Robot
06e107081e Merge pull request #104732 from mengjiao-liu/remove-flag-experimental-check-node-capabilities-before-mount
kubelet: Remove the deprecated flag `--experimental-check-node-capabilities-before-mount`
2022-02-24 07:56:30 -08:00
chenyw1990
e26df3594c do not return err when PodSandbox not exist
Co-authored-by: pacoxu <paco.xu@daocloud.io>
2022-02-24 14:58:39 +08:00
kerthcet
eafbaad9f7 refactor: rename SchedulerCache to Cache in Scheduler
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-02-24 09:47:21 +08:00
kerthcet
09623be0b1 refactor: rename schedulerCache to cacheImpl in internal cache
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-02-24 09:42:51 +08:00
Kubernetes Prow Robot
2fcdbd098c Merge pull request #107993 from deads2k/simplify
prevent enabling beta by default for new api groups
2022-02-23 16:03:35 -08:00
Kubernetes Prow Robot
77eb1a03df Merge pull request #94637 from liggitt/namespace-before-admission
set/validate object namespace before admission
2022-02-23 14:35:58 -08:00
Shihang Zhang
fb6c727fde no auto-generation of secret-based service account token
2022-02-23 14:17:30 -08:00
Kubernetes Prow Robot
08c31088c1 Merge pull request #106858 from cmssczy/add_RegisterWithTaints_validation_test
add kubelet config validation test for RegisterWithTaints
2022-02-23 12:51:58 -08:00
David Eads
af99d192cf prevent enabling beta by default for new api groups
2022-02-23 13:51:43 -05:00
David Eads
a59b92e8c0 reduce API surface area of whether a resource is enabled
2022-02-23 13:36:33 -05:00
Kubernetes Prow Robot
343125cc6c Merge pull request #107997 from d-honeybadger/fix-tracking-cronjob-owned-jobs
Fix cronjob status reconciliation when job template labels change
2022-02-23 07:14:18 -08:00
d-honeybadger
fb094dc44e cronjob_controllerv2: do not filter jobs to be reconciled by labels
2022-02-23 09:10:33 -05:00
kerthcet
4439fc3590 feat: graduate DefaultPodTopologySpread to GA
Co-authored-by: drfish <drfish.me@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-02-23 19:45:27 +08:00
Kubernetes Prow Robot
296bf4f016 Merge pull request #108230 from sanposhiho/fake-extender-name
Support ExtenderName in FakeExtender
2022-02-22 21:36:18 -08:00
Kubernetes Prow Robot
eacbf87bfe Merge pull request #108156 from jsafrane/rename-selinuxsupport
Rename SupportsSELinux to SELinuxRelabel
2022-02-22 20:12:20 -08:00
sanposhiho
0b16a7fefa Support ExtenderName in FakeExtender
2022-02-23 12:14:39 +09:00
Kubernetes Prow Robot
5211a4b214 Merge pull request #103061 from SergeyKanzhelev/removeAlphaRuntimeClass
Remove RuntimeClass feature gate and stop serving older versions of RuntimeClass
2022-02-22 19:08:18 -08:00
Kubernetes Prow Robot
bb610d0816 Merge pull request #108280 from liggitt/secrets
Update secrets field API doc
2022-02-22 17:48:18 -08:00
Kubernetes Prow Robot
8f3636e8ac Merge pull request #108224 from danwinship/kube-proxy-logging
Only log full iptables-restore input at V(9)
2022-02-22 16:42:18 -08:00
Kubernetes Prow Robot
a2adaf75b7 Merge pull request #108205 from dkkb/fix/typo
Fix typo allcoated -> allocated
2022-02-22 14:35:03 -08:00
Jean-Francois Remy
e83184568d Add unit tests
- actual_state_of_world_test.go: test the new method GetVolumesToReportAttachedForNode
  for an existing node and a non-existing node
- node_status_updater_test.go: test UpdateNodeStatuses and UpdateNodeStatusForNode in the
  nominal case with 2 nodes getting one volume each. Test UpdateNodeStatuses with the first
  call to node.patch failing but the following one succeeding
- add comment in node_status_updater.go
- fix log line in reconciler.go
- rename variable in actual_state_of_world.go
2022-02-22 12:21:58 -08:00
Jean-Francois Remy
f1717baaaa Fix nodes volumesAttached status not updated
The UpdateNodeStatuses code stops too early in case there is
an error when calling updateNodeStatus. It returns immediately,
which means any remaining node won't have its update status put back
to true.

Looking at the call sites for UpdateNodeStatuses, it appears this is
not the only issue. If the lister call fails with anything but a Not Found
error, it's silently ignored, which is wrong in the detach path.
Also, the reconciler detach path calls UpdateNodeStatuses, but the real intent
is to only update the node currently processed in the loop and not proceed
with the detach call if there is an error updating that specific node's
volumesAttached property. With the current implementation, it will not proceed
if there is an error updating another node (which is not completely bad but
not ideal) and, worse, it will proceed if there is a lister error on that node,
which means the node's volumesAttached property won't have been updated.

To fix those issues, introduce the following changes (see the sketch after
this list):
- [node_status_updater] introduce UpdateNodeStatusForNode, which does what
  UpdateNodeStatuses does but only for the provided node
- [node_status_updater] if the node lister call fails with anything but a Not
  Found error, return the error instead of ignoring it
- [node_status_updater] if updating a node's volumesAttached property fails,
  continue processing the other nodes
- [actual_state_of_world] introduce GetVolumesToReportAttachedForNode, which
  does what GetVolumesToReportAttached does but only for the node whose name
  is provided; it returns a bool indicating whether the node in question needs
  an update, as well as the volumesAttached list. It is used by
  UpdateNodeStatusForNode
- [actual_state_of_world] use a write lock in updateNodeStatusUpdateNeeded,
  since we're modifying the map content
- [reconciler] use UpdateNodeStatusForNode in the detach loop
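
As a rough, hypothetical sketch of the control flow described above (the
function signatures and helper names below are invented for illustration and
are not the actual volume manager API):

```go
// Rough sketch of the per-node volumesAttached update flow.
package nodestatus

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the lister's "node not found" error.
var errNotFound = errors.New("node not found")

// updateNodeStatusForNode updates the volumesAttached status of a single node.
// getNode, needsUpdate, and patchNode stand in for the node lister, the
// actual-state-of-world lookup, and the node patch call.
func updateNodeStatusForNode(
	nodeName string,
	getNode func(string) error,
	needsUpdate func(string) (bool, []string),
	patchNode func(string, []string) error,
) error {
	if err := getNode(nodeName); err != nil {
		if errors.Is(err, errNotFound) {
			return nil // a deleted node needs no update
		}
		// Any other lister error is returned, not silently ignored.
		return fmt.Errorf("failed to get node %q: %w", nodeName, err)
	}
	update, volumesAttached := needsUpdate(nodeName)
	if !update {
		return nil
	}
	return patchNode(nodeName, volumesAttached)
}

// updateNodeStatuses keeps processing the remaining nodes when one node's
// update fails, so a single failure no longer stops the whole pass.
func updateNodeStatuses(nodeNames []string, updateOne func(string) error) error {
	var firstErr error
	for _, name := range nodeNames {
		if err := updateOne(name); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}
```
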
2022-02-22 12:20:53 -08:00
Sergey Kanzhelev
06ee2969ef do not serve node.k8s.io, version v1alpha1
2022-02-22 18:30:24 +00:00
Kubernetes Prow Robot
b917653296 Merge pull request #108263 from deads2k/more-resthandlers
migrate more rest handlers to select by resource enablement
2022-02-22 10:15:16 -08:00
Jordan Liggitt
6b09e232cd Update secrets field API doc
2022-02-22 13:12:03 -05:00
David Eads
0ec20f97d2 migrate more rest handlers to select by resource enablement
2022-02-22 12:07:43 -05:00
Kubernetes Prow Robot
108e8136e2 Merge pull request #107393 from danwinship/filter-endpoints
kube-proxy endpoint filtering unit test refactoring
2022-02-22 08:55:15 -08:00