Commit Graph

43206 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
dea14dbdcc Merge pull request #105495 from ikeeip/storageobjectinuseprotection_lock_to_default
Lock StorageObjectInUseProtection feature gate to default
2021-10-19 21:45:57 -07:00
Kubernetes Prow Robot
c733594040 Merge pull request #105687 from alculquicondor/job-tracking
Graduate JobTrackingWithFinalizers to beta
2021-10-19 11:40:37 -07:00
Kubernetes Prow Robot
b2c4269992 Merge pull request #105631 from klueska/upstream-distribute-cpus-across-numa
Add CPUManager policy option to distribute CPUs across NUMA nodes instead of packing them
2021-10-19 11:40:24 -07:00
Kubernetes Prow Robot
2dbdd9461d Merge pull request #105748 from marosset/host-process-emphemeral-contianer-validation
Adding unit test coverage for API validation for ephemeral containers in hostprocess pods on Windows
2021-10-19 08:11:04 -07:00
Kubernetes Prow Robot
1af8a8c026 Merge pull request #105465 from marosset/remove-host-process-contianer-kubelet-annotations
Stop passing WindowsHostProcessContainer annotations for CRI calls in kubelet
2021-10-18 15:50:02 -07:00
Kubernetes Prow Robot
e595d79dfc Merge pull request #104574 from 249043822/br-repeat-package
fix duplicate package import in pod_worker
2021-10-18 15:49:46 -07:00
Mark Rossetti
3ddff55fe6 Adding unit test coverage for API validation for ephemeral containers in hostprocess pods on Windows
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2021-10-18 15:46:27 -07:00
Kubernetes Prow Robot
5889fb4fbc Merge pull request #105652 from wzshiming/feat/structure-shutdown-config
Refactor to use structure to pass parameters for GracefulNodeShutdown
2021-10-18 14:45:20 -07:00
Kevin Klues
86f9c266bc Add optimizations to reduce iterations in distributed NUMA algorithm
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-18 08:53:25 +00:00
Kevin Klues
70e0f47191 Support full-pcpus-only with the new NUMA distribution policy option
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
d54445a84d Generalize the NUMA distribution algorithm to take cpuGroupSize
This parameter ensures that CPUs are always allocated in groups of size
'cpuGroupSize'. This is important, for example, to ensure that all CPUs (i.e.
hyperthreads) from the same core are handed out together.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
1436e33642 Add more extensive testing for NUMA distribution algorithm in CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
cf3afb8602 Add 2 distinguishing test cases between the 2 takeByTopology algorithms
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
eb78e2406b Add a new TestTakeByTopologyNUMADistributed() test to the CPUManager
As part of this, pull out all of the existing "TakeByTopology" tests and have
them be called by the original TestTakeByTopologyNUMAPacked() as well as the
new TestTakeByTopologyNUMADistributed() test. In a subsequent commit, we will
add some tests that should differ between these two algorithms.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
876dd9b078 Added algorithm to CPUManager to distribute CPUs across NUMA nodes
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
462544d079 Split CPUManager takeByTopology() into two different algorithms
The first implements the original algorithm which packs CPUs onto NUMA nodes if
more than one NUMA node is required to satisfy the allocation. The second
distributes CPUs across NUMA nodes if they can't all fit into one.

The "distributing" algorithm is currently a noop and just returns an error of
"unimplemented". A subsequent commit will add the logic to implement this
algorithm according to KEP 2902:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
Kevin Klues
0e7928edce Add new CPUManager policy option for "distribute-cpus-across-numa"
This commit only adds the option to the policy options framework. A
subsequent commit will add the logic to utilize it.

The KEP describing this new option can be found here:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
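A policy options framework typically turns a raw key/value map into typed fields. Here is a minimal sketch of that idea. It uses the two option names that appear in these commits, "full-pcpus-only" and "distribute-cpus-across-numa". The struct, field names and parse function are hypothetical, not the kubelet's actual types.

```go
package main

import (
	"fmt"
	"strconv"
)

// Option names as they appear in these commits.
const (
	FullPCPUsOnlyOption            = "full-pcpus-only"
	DistributeCPUsAcrossNUMAOption = "distribute-cpus-across-numa"
)

// StaticPolicyOptions is a hypothetical typed view of the options.
type StaticPolicyOptions struct {
	FullPhysicalCPUsOnly     bool
	DistributeCPUsAcrossNUMA bool
}

// parseStaticPolicyOptions converts the raw option map into typed fields,
// rejecting non-boolean values and unknown option names.
func parseStaticPolicyOptions(raw map[string]string) (StaticPolicyOptions, error) {
	var opts StaticPolicyOptions
	for name, value := range raw {
		b, err := strconv.ParseBool(value)
		if err != nil {
			return opts, fmt.Errorf("bad value %q for option %q: %w", value, name, err)
		}
		switch name {
		case FullPCPUsOnlyOption:
			opts.FullPhysicalCPUsOnly = b
		case DistributeCPUsAcrossNUMAOption:
			opts.DistributeCPUsAcrossNUMA = b
		default:
			return opts, fmt.Errorf("unknown CPUManager policy option %q", name)
		}
	}
	return opts, nil
}

func main() {
	opts, err := parseStaticPolicyOptions(map[string]string{DistributeCPUsAcrossNUMAOption: "true"})
	fmt.Println(opts, err)
}
```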
Aldo Culquicondor
2c1b3fdb5b Graduate JobTrackingWithFinalizers to beta
Enable feature by default.

Update integration tests for other features to assume that finalizers are present.

Change-Id: Ie969344f572627dba882c0e862e5700dadaf3026
2021-10-15 10:29:40 -04:00
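With finalizer-based tracking, the Job controller keeps a finalizer on its Pods until their outcome is recorded in the Job status. A finished Pod therefore cannot be removed before it is counted, and removing the finalizer afterwards keeps it from being counted twice. The following is a toy in-memory sketch under stated assumptions. The finalizer name is assumed, all types are hypothetical, and the real controller records status and removes finalizers through separate API calls.

```go
package main

import "fmt"

// Assumed finalizer name, used here only for illustration.
const trackingFinalizer = "batch.kubernetes.io/job-tracking"

type Pod struct {
	Name       string
	Succeeded  bool
	Finalizers []string
}

type JobStatus struct{ Succeeded int }

func hasFinalizer(p *Pod) bool {
	for _, f := range p.Finalizers {
		if f == trackingFinalizer {
			return true
		}
	}
	return false
}

// removeFinalizer filters the tracking finalizer out in place.
func removeFinalizer(p *Pod) {
	kept := p.Finalizers[:0]
	for _, f := range p.Finalizers {
		if f != trackingFinalizer {
			kept = append(kept, f)
		}
	}
	p.Finalizers = kept
}

// syncJob counts succeeded Pods that still carry the tracking finalizer,
// records them in the status, then drops the finalizer so a later sync
// does not count the same Pod again.
func syncJob(status *JobStatus, pods []*Pod) {
	for _, p := range pods {
		if p.Succeeded && hasFinalizer(p) {
			status.Succeeded++
			removeFinalizer(p)
		}
	}
}

func main() {
	demo := []*Pod{{Name: "a", Succeeded: true, Finalizers: []string{trackingFinalizer}}}
	var status JobStatus
	syncJob(&status, demo)
	syncJob(&status, demo) // finalizer already gone, so nothing is re-counted
	fmt.Println(status.Succeeded) // 1
}
```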
Konstantin Misyutin
dbc9d7b71a Remove tests when StorageObjectInUseProtection feature is disabled
Because the feature gate is locked, the tests that run with this feature
disabled will crash. So we remove them together with locking the
feature.

Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
2021-10-15 19:39:37 +08:00
Konstantin Misyutin
e07d736522 Lock StorageObjectInUseProtection feature gate to default
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
2021-10-15 19:36:53 +08:00
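Locking a gate to its default explains why the disabled-feature tests had to go: once the gate is locked, any attempt to set it to the non-default value is rejected. This is a simplified model for illustration only. The types and error text are hypothetical; the real gate implementation carries more fields and logic.

```go
package main

import "fmt"

// featureSpec is a simplified model of a feature gate entry.
type featureSpec struct {
	Default       bool
	LockToDefault bool
}

type featureGate struct {
	known   map[string]featureSpec
	enabled map[string]bool
}

func newFeatureGate(known map[string]featureSpec) *featureGate {
	return &featureGate{known: known, enabled: map[string]bool{}}
}

// Set rejects any attempt to move a locked gate away from its default.
func (g *featureGate) Set(name string, value bool) error {
	spec, ok := g.known[name]
	if !ok {
		return fmt.Errorf("unrecognized feature gate: %s", name)
	}
	if spec.LockToDefault && value != spec.Default {
		return fmt.Errorf("cannot set feature gate %s to %v, feature is locked to %v", name, value, spec.Default)
	}
	g.enabled[name] = value
	return nil
}

// Enabled returns the explicitly set value, falling back to the default.
func (g *featureGate) Enabled(name string) bool {
	if v, ok := g.enabled[name]; ok {
		return v
	}
	return g.known[name].Default
}

func main() {
	gate := newFeatureGate(map[string]featureSpec{
		"StorageObjectInUseProtection": {Default: true, LockToDefault: true},
	})
	fmt.Println(gate.Set("StorageObjectInUseProtection", false))
	fmt.Println(gate.Enabled("StorageObjectInUseProtection")) // true
}
```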
Francesco Romani
4bae656835 cpumanager: test NUMA node support for CPU assign (2)
This batch of tests adds a fake topology on which each numa node
has multiple sockets. We didn't find yet a real HW topology in the wild
like this, but we need one to fully exercise the code.

So, until we find a HW topology, we add a fake one flipping
the NUMA/socket config of the existing xeon dual gold 6320.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
547996f3f6 cpumanager: test NUMA node support for CPU assign (1)
This batch of tests adds a real topology on which each physical socket
has multiple NUMA zones, taken from a real dual Xeon Gold 6320.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
f6ccc4426a cpumanager: test: use proper subtests
The existing unit tests were performing subtests without
actually using the full features of the testing package
(https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks)

Update them with fairly minimal changes. The patch is deceptively
large because we need to move the code inside a new block.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
15caa134b2 cpumanager: topology: use rich cmp package
Use `cmp.Diff` in the unit tests, moving away from
`reflect.DeepEqual`. This gives us a clearer picture of the differences
when the tests fail.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Kevin Klues
aff54a0914 Abstract out whether NUMA or Sockets come first in the memory hierarchy
This allows us to get rid of the check for determining which one is higher all
throughout the code. Now we just check once and instantiate an interface of the
appropriate type that makes sure the ordering in the hierarchy is preserved
through the appropriate calls.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 10:29:15 +00:00
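The "check once, then use an interface" idea can be sketched as follows. The ordering rule is an assumption made for this sketch: NUMA nodes are treated as the outer level when there are at least as many sockets as NUMA nodes, and sockets otherwise. The topology type, interface and method names are hypothetical, not the CPUManager's.

```go
package main

import "fmt"

// Topology is a hypothetical, trimmed-down CPU topology description.
type Topology struct {
	NumSockets   int
	NumNUMANodes int
}

// hierarchy hides which of NUMA nodes or sockets is the outer level, so
// callers never re-check the topology themselves.
type hierarchy interface {
	firstLevel() string
	secondLevel() string
	firstLevelCount() int
}

// numaFirst: each NUMA node spans one or more sockets.
type numaFirst struct{ t Topology }

func (h numaFirst) firstLevel() string   { return "NUMA node" }
func (h numaFirst) secondLevel() string  { return "socket" }
func (h numaFirst) firstLevelCount() int { return h.t.NumNUMANodes }

// socketsFirst: each socket contains several NUMA nodes.
type socketsFirst struct{ t Topology }

func (h socketsFirst) firstLevel() string   { return "socket" }
func (h socketsFirst) secondLevel() string  { return "NUMA node" }
func (h socketsFirst) firstLevelCount() int { return h.t.NumSockets }

// newHierarchy performs the ordering check exactly once.
func newHierarchy(t Topology) hierarchy {
	if t.NumSockets >= t.NumNUMANodes {
		return numaFirst{t}
	}
	return socketsFirst{t}
}

func main() {
	for _, t := range []Topology{{NumSockets: 2, NumNUMANodes: 4}, {NumSockets: 4, NumNUMANodes: 2}} {
		h := newHierarchy(t)
		fmt.Printf("%d %s(s) on the outer level, %s(s) inside\n", h.firstLevelCount(), h.firstLevel(), h.secondLevel())
	}
}
```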
Kevin Klues
17c7e86c6d Add NUMA support to the CPU assignment algorithm in the CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 08:35:59 +00:00
Shiming Zhang
e47c78a354 Add log for creating node shutdown manager
2021-10-15 11:16:21 +08:00
Shiming Zhang
b468c24e85 Refactor to use structure to pass parameters
2021-10-15 11:16:21 +08:00
Kubernetes Prow Robot
30a32a39a4 Merge pull request #105136 from astraw99/fix-csi-mount-log
Fix CSI `mounter.TearDownAt` log msg
2021-10-14 11:54:55 -07:00
Kubernetes Prow Robot
0bfa37dfcc Merge pull request #105676 from alculquicondor/job-name
Fix name for Pods of NonIndexed Jobs
2021-10-14 10:50:12 -07:00
Shivanshu Raj Shrivastava
7d9a6d1de6 Migrated pkg/proxy/ipvs to structured logging (#104932)
* migrated ipset.go

* migrated graceful_termination.go

* fixed vstring

* fixed ip set entry, made it consistent

* fixed rs logging

* resolving review comments for key graceful_termination.go

* refactoring ipset.go

* included review changes
2021-10-14 09:47:29 -07:00
Shivanshu Raj Shrivastava
daf5af2917 Migrated pkg/proxy to structured logging (#104891)
* migrated service.go to structured logging

* fixing the capital letter at the start

* migrated topology.go

* migrated endpointslicecache.go

* migrated endpoints.go

* nit typo

* nit plural to singular

* fixed format

* code formatting

* resolving review comment for key ipFamily

* resolving review comment for key endpoints.go

* code formatting

* Converted Warningf to ErrorS, wherever applicable

* included review changes

* included review changes
2021-10-14 09:47:17 -07:00
Kubernetes Prow Robot
dea052ceba Merge pull request #105479 from ahg-g/ahg-mutable
Allow updating scheduling directives of suspended jobs that never started
2021-10-14 08:09:18 -07:00
Aldo Culquicondor
4ef9d18abe Fix name for Pods of NonIndexed Jobs
Change-Id: I0ea4685a82f4cdec0caab362d52144476652f95a
2021-10-14 10:55:46 -04:00
Abdullah Gharaibeh
335817cbce Allow updating node affinity, selector and tolerations for suspended jobs that never started
2021-10-14 10:04:47 -04:00
Kubernetes Prow Robot
3aafe75698 Merge pull request #105461 from damemi/wire-contexts-autoscaling
Wire contexts to Autoscaling controllers
2021-10-14 06:59:33 -07:00
Kubernetes Prow Robot
f27e4714ba Merge pull request #105377 from damemi/wire-contexts-apps
Wire contexts to Apps controllers
2021-10-14 06:59:19 -07:00
Kubernetes Prow Robot
baaa53db64 Merge pull request #105211 from xiaopingrubyist/fix-pv-controller-claim-cache-issue
fix:claim cached in pvcontroller is not the newest may cause unexpected issue
2021-10-14 05:47:18 -07:00
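The PR title describes a stale cached claim but not the fix itself. One common guard against this class of problem is to let a cache update through only when the incoming object is not older than the cached copy. Here is a minimal sketch of that guard, not the PR's change. The Claim type is hypothetical. Comparing resourceVersions as integers is an assumption for illustration only, since API clients are generally told to treat them as opaque.

```go
package main

import (
	"fmt"
	"strconv"
)

// Claim is a hypothetical, trimmed-down PVC carrying its resourceVersion.
type Claim struct {
	Name            string
	ResourceVersion string
	Phase           string
}

// storeIfNewer updates the cache only if the incoming claim is at least as
// new as the cached one, so a late stale event cannot overwrite fresher
// state. It reports whether the cache was updated.
func storeIfNewer(cache map[string]Claim, c Claim) (bool, error) {
	newRV, err := strconv.ParseInt(c.ResourceVersion, 10, 64)
	if err != nil {
		return false, fmt.Errorf("claim %s: %w", c.Name, err)
	}
	if old, ok := cache[c.Name]; ok {
		oldRV, err := strconv.ParseInt(old.ResourceVersion, 10, 64)
		if err != nil {
			return false, fmt.Errorf("cached claim %s: %w", old.Name, err)
		}
		if newRV < oldRV {
			return false, nil // stale: keep the cached copy
		}
	}
	cache[c.Name] = c
	return true, nil
}

func main() {
	claims := map[string]Claim{}
	fmt.Println(storeIfNewer(claims, Claim{Name: "ns/data", ResourceVersion: "12", Phase: "Bound"}))  // true <nil>
	fmt.Println(storeIfNewer(claims, Claim{Name: "ns/data", ResourceVersion: "7", Phase: "Pending"})) // false <nil>
	fmt.Println(claims["ns/data"].Phase)                                                           // Bound
}
```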
Kubernetes Prow Robot
a8bda48abe Merge pull request #105474 from mauriciopoppe/readd-volume-subpath-flag
Add VolumeSubpath feature gate back in preparation for its removal
2021-10-13 21:55:28 -07:00
astraw99
5e789f157c fix CSI mount log
2021-10-14 10:27:50 +08:00
Kubernetes Prow Robot
894ceb63d0 Merge pull request #105003 from swatisehgal/getallocatable-to-beta
podresource-api: getAllocatableResources to Beta
2021-10-13 17:43:27 -07:00
Mike Dame
41fcb95f2f Wire contexts to Apps controllers
2021-10-13 16:32:13 -04:00
torubylist
f28a8d7f2b fix:cached claim is not the newest will cause unexpected issue
2021-10-13 20:03:00 +08:00
Mike Dame
7780024916 Wire contexts to Autoscaling controllers
2021-10-12 14:34:05 -04:00
Maciej Szulik
8322121434 Move test-related utils to test/utils
2021-10-12 14:52:19 +02:00
Maciej Szulik
1fb6bf8a14 Wire context instead of TODO
2021-10-12 13:21:45 +02:00
Kubernetes Prow Robot
a923852ba0 Merge pull request #105215 from rphillips/add_probe_shutdown
kubelet: add probe termination to graceful shutdowns
2021-10-11 21:19:46 -07:00
Kubernetes Prow Robot
67afa05c17 Merge pull request #105531 from aojea/master_leases
improve error message on control-plane endpoint reconciler
2021-10-11 15:01:02 -07:00
Kubernetes Prow Robot
dc9c571166 Merge pull request #105569 from pohly/generic-ephemeral-kubelet-volume-stats
kubelet: also provide filesystem stats for generic ephemeral volumes
2021-10-11 07:52:39 -07:00
Kubernetes Prow Robot
1f2813368e Merge pull request #105542 from pohly/generic-ephemeral-volume-util-kubelet
kubelet: use generic ephemeral volume helper functions
2021-10-11 02:16:40 -07:00