Commit Graph

43518 Commits

Author SHA1 Message Date
Kevin Klues
1436e33642 Add more extensive testing for NUMA distribution algorithm in CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
cf3afb8602 Add 2 distinguishing test cases between the 2 takeByTopology algorithms
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
eb78e2406b Add a new TestTakeByTopologyNUMADistributed() test to the CPUManager
As part of this, pull out all of the existing "TakeByTopology" tests and have
them be called by the original TestTakeByTopologyNUMAPacked() as well as the
new TestTakeByTopologyNUMADistributed() test. In a subsequent commit, we will
add some tests that should differ between these two algorithms.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
876dd9b078 Added algorithm to CPUManager to distribute CPUs across NUMA nodes
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
462544d079 Split CPUManager takeByTopology() into two different algorithms
The first implements the original algorithm which packs CPUs onto NUMA nodes if
more than one NUMA node is required to satisfy the allocation. The second
distributes CPUs across NUMA nodes if they can't all fit into one.

The "distributing" algorithm is currently a noop and just returns an error of
"unimplemented". A subsequent commit will add the logic to implement this
algorithm according to KEP 2902:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
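
To make the difference between the two strategies in 462544d079 concrete, here is a minimal sketch. It is not the kubelet code: `takePacked` and `takeDistributed` are hypothetical names, a NUMA node is reduced to a count of free CPUs, and the even-split rule with the remainder going to the nodes with the most free CPUs is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// takePacked fills NUMA nodes in index order, taking as many CPUs as possible
// from each node before moving to the next. free[i] is the number of free CPUs
// on node i; the result holds the number of CPUs taken from each node.
func takePacked(free []int, want int) ([]int, error) {
	taken := make([]int, len(free))
	for i, f := range free {
		n := f
		if want < n {
			n = want
		}
		taken[i] = n
		want -= n
	}
	if want > 0 {
		return nil, fmt.Errorf("not enough free CPUs: %d short", want)
	}
	return taken, nil
}

// takeDistributed uses the smallest number of NUMA nodes that can hold the
// request and splits the request as evenly as possible across them
// (assumption: any remainder goes to the nodes with the most free CPUs).
func takeDistributed(free []int, want int) ([]int, error) {
	// Visit nodes from most to least free CPUs.
	order := make([]int, len(free))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool { return free[order[a]] > free[order[b]] })

	for k := 1; k <= len(order); k++ {
		per, rem := want/k, want%k
		taken := make([]int, len(free))
		ok := true
		for j := 0; j < k; j++ {
			share := per
			if j < rem {
				share++
			}
			if free[order[j]] < share {
				ok = false
				break
			}
			taken[order[j]] = share
		}
		if ok {
			return taken, nil
		}
	}
	return nil, fmt.Errorf("cannot place %d CPUs on %d NUMA nodes", want, len(free))
}

func main() {
	free := []int{6, 6} // two NUMA nodes with 6 free CPUs each
	packed, _ := takePacked(free, 8)
	distributed, _ := takeDistributed(free, 8)
	fmt.Println("packed:", packed)           // takes 6 from node 0 and 2 from node 1
	fmt.Println("distributed:", distributed) // takes 4 from each node
}
```

When a single node can satisfy the request, both functions return an allocation confined to one node; they only diverge once more than one node is needed.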
Kevin Klues
0e7928edce Add new CPUManager policy option for "distribute-cpus-across-numa"
This commit only adds the option to the policy options framework. A
subsequent commit will add the logic to utilize it.

The KEP describing this new option can be found here:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
Aldo Culquicondor
2c1b3fdb5b Graduate JobTrackingWithFinalizers to beta
Enable feature by default.

Update integration tests for other features to assume that finalizers are present.

Change-Id: Ie969344f572627dba882c0e862e5700dadaf3026
2021-10-15 10:29:40 -04:00
kerthcet
fc9533e72f remove scheduler ServiceAffinity plugin
Signed-off-by: kerthcet <kerthcet@gmail.com>
2021-10-15 22:10:31 +08:00
Konstantin Misyutin
dbc9d7b71a Remove tests when StorageObjectInUseProtection feature is disabled
Once the feature gate is locked, the tests that run with this feature
disabled will crash. So we should remove them together with locking
the feature.

Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
2021-10-15 19:39:37 +08:00
Konstantin Misyutin
e07d736522 Lock StorageObjectInUseProtection feature gate to default
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
2021-10-15 19:36:53 +08:00
yuzhiquanlong
27fe56e916 remove unused import 2021-10-15 18:40:31 +08:00
Francesco Romani
4bae656835 cpumanager: test NUMA node support for CPU assign (2)
This batch of tests adds a fake topology on which each numa node
has multiple sockets. We have not yet found a real HW topology like this
in the wild, but we need one to fully exercise the code.

So, until we find a HW topology, we add a fake one flipping
the NUMA/socket config of the existing xeon dual gold 6320.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
547996f3f6 cpumanager: test NUMA node support for CPU assign (1)
This batch of tests adds a real topology on which each physical socket
has multiple NUMA zones. Taken from a real dual xeon 6320 gold.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
f6ccc4426a cpumanager: test: use proper subtests
The existing unit tests were performing subtests without
actually using the full features of the testing package
(https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks)

Update them with fairly minimal changes. The patch is deceptively
large because we need to move the code inside a new block.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
15caa134b2 cpumanager: topology: use rich cmp package
Use `cmp.Diff` from the `cmp` package in the unit tests, moving away from
`reflect.DeepEqual`. This gives us a clearer picture of the differences
when the tests fail.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Kevin Klues
aff54a0914 Abstract out whether NUMA or Sockets come first in the memory hierarchy
This allows us to get rid of the check for determining which one is higher all
throughout the code. Now we just check once and instantiate an interface of the
appropriate type that makes sure the ordering in the hierarchy is preserved
through the appropriate calls.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 10:29:15 +00:00
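
The "check once, then instantiate an interface" idea described in aff54a0914 can be sketched as follows. The `hierarchy` interface, the two implementations, and the rule used by `newHierarchy` are hypothetical and only illustrate the pattern; they are not taken from the kubelet source.

```go
package main

import "fmt"

// hierarchy reports the topology levels from outermost to innermost.
// Callers walk the levels in order and never re-check which one is higher.
type hierarchy interface {
	levels() []string
}

type numaFirst struct{}    // each NUMA node contains one or more sockets
type socketsFirst struct{} // each socket contains several NUMA nodes

func (numaFirst) levels() []string    { return []string{"numa", "socket", "core"} }
func (socketsFirst) levels() []string { return []string{"socket", "numa", "core"} }

// newHierarchy performs the ordering check exactly once.
// Assumption: with no more NUMA nodes than sockets, NUMA is the outer level.
func newHierarchy(numaNodes, sockets int) hierarchy {
	if numaNodes <= sockets {
		return numaFirst{}
	}
	return socketsFirst{}
}

func main() {
	fmt.Println(newHierarchy(2, 2).levels())
	fmt.Println(newHierarchy(4, 2).levels())
}
```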
yuzhiquanlong
be9e1fda5e remove format pods func, use klog.KObjs instead 2021-10-15 18:26:02 +08:00
Kevin Klues
17c7e86c6d Add NUMA support to the CPU assignment algorithm in the CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 08:35:59 +00:00
Shiming Zhang
e47c78a354 Add log for creating node shutdown manager 2021-10-15 11:16:21 +08:00
Shiming Zhang
b468c24e85 Refactor to use structure to pass parameters 2021-10-15 11:16:21 +08:00
Kubernetes Prow Robot
30a32a39a4 Merge pull request #105136 from astraw99/fix-csi-mount-log
Fix CSI `mounter.TearDownAt` log msg
2021-10-14 11:54:55 -07:00
Kubernetes Prow Robot
0bfa37dfcc Merge pull request #105676 from alculquicondor/job-name
Fix name for Pods of NonIndexed Jobs
2021-10-14 10:50:12 -07:00
Shivanshu Raj Shrivastava
7d9a6d1de6 Migrated pkg/proxy/ipvs to structured logging (#104932)
* migrated ipset.go

* migrated graceful_termination.go

* fixed vstring

* fixed ip set entry, made it consistent

* fixed rs logging

* resolving review comments for key graceful_termination.go

* refactoring ipset.go

* included review changes
2021-10-14 09:47:29 -07:00
Shivanshu Raj Shrivastava
daf5af2917 Migrated pkg/proxy to structured logging (#104891)
* migrated service.go to structured logging

* fixing capital letter in starting

* migrated topology.go

* migrated endpointslicecache.go

* migrated endpoints.go

* nit typo

* nit plural to singular

* fixed format

* code formatting

* resolving review comment for key ipFamily

* resolving review comment for key endpoints.go

* code formatting

* Converted Warningf to ErrorS, wherever applicable

* included review changes

* included review changes
2021-10-14 09:47:17 -07:00
Kubernetes Prow Robot
dea052ceba Merge pull request #105479 from ahg-g/ahg-mutable
Allow updating scheduling directives of suspended jobs that never started
2021-10-14 08:09:18 -07:00
Aldo Culquicondor
4ef9d18abe Fix name for Pods of NonIndexed Jobs
Change-Id: I0ea4685a82f4cdec0caab362d52144476652f95a
2021-10-14 10:55:46 -04:00
Abdullah Gharaibeh
335817cbce Allow updating node affinity, selector and tolerations for suspended jobs that never started 2021-10-14 10:04:47 -04:00
Kubernetes Prow Robot
3aafe75698 Merge pull request #105461 from damemi/wire-contexts-autoscaling
Wire contexts to Autoscaling controllers
2021-10-14 06:59:33 -07:00
Kubernetes Prow Robot
f27e4714ba Merge pull request #105377 from damemi/wire-contexts-apps
Wire contexts to Apps controllers
2021-10-14 06:59:19 -07:00
Kubernetes Prow Robot
baaa53db64 Merge pull request #105211 from xiaopingrubyist/fix-pv-controller-claim-cache-issue
fix: the claim cached in pvcontroller may not be the newest, which may cause unexpected issues
2021-10-14 05:47:18 -07:00
Kubernetes Prow Robot
a8bda48abe Merge pull request #105474 from mauriciopoppe/readd-volume-subpath-flag
Add VolumeSubpath feature gate back in preparation for its removal
2021-10-13 21:55:28 -07:00
CKchen0726
f1c523cfa6 remove storageOperationErrorMetric and storageOperationStatusMetric in 1.21 release 2021-10-14 12:03:58 +08:00
astraw99
5e789f157c fix CSI mount log 2021-10-14 10:27:50 +08:00
Kubernetes Prow Robot
894ceb63d0 Merge pull request #105003 from swatisehgal/getallocatable-to-beta
podresource-api: getAllocatableResources to Beta
2021-10-13 17:43:27 -07:00
Mike Dame
41fcb95f2f Wire contexts to Apps controllers 2021-10-13 16:32:13 -04:00
Jefftree
8cb2b798c6 Feature flag openapi v3 2021-10-13 09:40:57 -07:00
torubylist
f28a8d7f2b fix: the cached claim may not be the newest, which may cause unexpected issues 2021-10-13 20:03:00 +08:00
Mike Dame
7780024916 Wire contexts to Autoscaling controllers 2021-10-12 14:34:05 -04:00
Maciej Szulik
8322121434 Move test-related utils to test/utils 2021-10-12 14:52:19 +02:00
Maciej Szulik
1fb6bf8a14 Wire context instead of TODO 2021-10-12 13:21:45 +02:00
Kubernetes Prow Robot
a923852ba0 Merge pull request #105215 from rphillips/add_probe_shutdown
kubelet: add probe termination to graceful shutdowns
2021-10-11 21:19:46 -07:00
Kubernetes Prow Robot
67afa05c17 Merge pull request #105531 from aojea/master_leases
improve error message on control-plane endpoint reconciler
2021-10-11 15:01:02 -07:00
Patrick Ohly
a8c930ef46 generic ephemeral volume: graduation to GA
The feature gate gets locked to "true", with the goal to remove it in two
releases.

All code now can assume that the feature is enabled. Tests for "feature
disabled" are no longer needed and get removed.

Some code wasn't using the new helper functions yet. That gets changed while
touching those lines.
2021-10-11 20:54:20 +02:00
nolancon
6bbb36df10 Additional cases for reconcileState testing 2021-10-11 16:17:21 +00:00
Patrick Ohly
bc263f3ba5 scheduler: use generic ephemeral volume helper functions
The name concatenation and ownership check were originally considered small
enough to not warrant dedicated functions, but the intent of the code is more
readable with them.
2021-10-11 17:33:57 +02:00
Kubernetes Prow Robot
dc9c571166 Merge pull request #105569 from pohly/generic-ephemeral-kubelet-volume-stats
kubelet: also provide filesystem stats for generic ephemeral volumes
2021-10-11 07:52:39 -07:00
Kubernetes Prow Robot
1f2813368e Merge pull request #105542 from pohly/generic-ephemeral-volume-util-kubelet
kubelet: use generic ephemeral volume helper functions
2021-10-11 02:16:40 -07:00
Kubernetes Prow Robot
fb82a0d7eb Merge pull request #104873 from pohly/json-output-stream
JSON output streams
2021-10-10 17:04:37 -07:00
Patrick Ohly
b22263d835 component-base: configurable JSON output
This implements the replacement of klog output to different files per level
with optionally splitting JSON output into two streams: one for info messages
on stdout, one for error messages on stderr. The info messages can get buffered
to increase performance. Because stdout and stderr might be merged by the
consumer, the info stream gets flushed before writing an error, to ensure that
the order of messages is preserved.

This also ensures that the following code pattern doesn't leak info messages:
   klog.ErrorS(err, ...)
   os.Exit(1)

Commands explicitly have to flush before exiting via logs.FlushLogs. Most
already do. But buffered info messages can still get lost during an unexpected
program termination, therefore buffering is off by default.

The new options get added to the v1alpha1 LoggingConfiguration with new command
line flags. Because it is an alpha field, changing it inside the v1beta kubelet
config should be okay as long as the fields are clearly marked as alpha.
2021-10-09 10:10:35 +02:00
goofy-z
d2a0332e75 update extension point PostFilter comment 2021-10-09 14:26:09 +08:00