49698 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
76a22d3b32 Merge pull request #120711 from charles-chenzz/unify_fake_pod_scheduler
scheduler test: unify util to fake pod
2023-09-18 09:26:31 -07:00
Kubernetes Prow Robot
82bca6304b Merge pull request #119464 from TommyStarK/dra/cleanup-manager-unit-tests
dra: cleanup manager unit tests
2023-09-18 07:08:43 -07:00
Kubernetes Prow Robot
9d6180559b Merge pull request #119099 from palnabarun/authz-config
[StructuredAuthorizationConfiguration] Implement API types and wire kube-apiserver to use them
2023-09-18 07:08:31 -07:00
charles-chenzz
c8b9d64d81 scheduler test: unify util to fake pod. 2023-09-18 20:05:01 +08:00
Kubernetes Prow Robot
3cfdf3c33d Merge pull request #120434 from pohly/scheduler-backoff-metric-test
scheduler: fix TestIncomingPodsMetrics unit test
2023-09-18 03:00:31 -07:00
Prince Pereira
1a27531d2e Fix the kube-proxy mock framework, where the hcn object always held a new object rather than the pointer reference. 2023-09-18 11:59:01 +05:30
Nabarun Pal
108d195595 use AuthorizationConfiguration in kube-apiserver for storing authorizer config
Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
2023-09-18 11:33:18 +05:30
Christoph Mewes
b59d4afd14 fix typo exeucting => executing 2023-09-17 11:27:57 +02:00
Christoph Mewes
62275e3cc8 fix typo dervied => derived 2023-09-17 11:26:19 +02:00
Christoph Mewes
79a7833ade fix typo Mininum => Minimum 2023-09-17 11:24:29 +02:00
Christoph Mewes
6e3ebdc68e fix typo Conext => Context 2023-09-17 11:18:43 +02:00
Gunju Kim
b4e5b868a8 Don't reuse memory of a restartable init container 2023-09-17 14:49:15 +09:00
Kubernetes Prow Robot
4fd8bd9975 Merge pull request #118568 from qiutongs/node-startup-latency
Create a node startup latency tracker
2023-09-15 13:00:12 -07:00
Patrick Ohly
7cac1dcf67 dra scheduler: fall back to SSA for PodSchedulingContext updates
During scheduler_perf testing, roughly 10% of the PodSchedulingContext update
operations failed with a conflict error. Using SSA would avoid that, but
performance measurements showed that this causes a considerable
slowdown (primarily because of the slower encoding with JSON instead of
protobuf, but also because server-side processing is more expensive).

Therefore a normal update is tried first and SSA only gets used when there has
been a conflict. Using SSA in that case instead of giving up outright is better
because it avoids another scheduling attempt.
2023-09-15 15:05:38 +02:00
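The update-then-SSA fallback described in 7cac1dcf67 can be sketched roughly as below. This is a minimal illustration, not the scheduler's code: `errConflict`, `updateWithFallback`, and the two callbacks are hypothetical stand-ins for the API conflict error, the normal update call, and the server-side apply call.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a hypothetical stand-in for the API server's conflict error.
var errConflict = errors.New("conflict")

// updateWithFallback tries the cheaper update first and uses the slower
// apply path only when the update reports a conflict.
func updateWithFallback(update, apply func() error) (usedApply bool, err error) {
	err = update()
	if errors.Is(err, errConflict) {
		return true, apply()
	}
	return false, err
}

func main() {
	conflicting := func() error { return errConflict }
	succeeding := func() error { return nil }

	// The update conflicts, so the apply path is used.
	usedApply, err := updateWithFallback(conflicting, succeeding)
	fmt.Println(usedApply, err)

	// The update succeeds, so the apply path is never called.
	usedApply, err = updateWithFallback(succeeding, succeeding)
	fmt.Println(usedApply, err)
}
```

Errors other than a conflict are returned unchanged, so only the conflict case pays the extra cost of the second request.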
Stephen Kitt
3cb0b520d6 Scheduler CSI tests: switch maxVols to int32
This ends up stored in an int32 Count, use the target type throughout
to avoid narrowing conversions.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-09-15 09:52:50 +02:00
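As a small illustration of why 3cb0b520d6 uses the target type throughout, the sketch below shows a narrowing conversion silently changing a value. The `volumeCount` type and the `narrow` helper are hypothetical and exist only for this example.

```go
package main

import "fmt"

// volumeCount is a hypothetical type with an int32 field, like the Count
// mentioned in the commit message.
type volumeCount struct {
	Count int32
}

// narrow converts a wider integer to int32 and reports whether the value
// survived the conversion unchanged.
func narrow(v int64) (int32, bool) {
	n := int32(v)
	return n, int64(n) == v
}

func main() {
	// Declaring the value as int32 from the start needs no conversion.
	var maxVols int32 = 39
	fmt.Println(volumeCount{Count: maxVols}.Count)

	// A wider value can wrap around when it is narrowed.
	n, ok := narrow(1 << 31)
	fmt.Println(n, ok)
}
```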
Stephen Kitt
567fca7baa Use copy() instead of a for loop
Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-09-15 09:20:08 +02:00
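The kind of change 567fca7baa names can be illustrated with two equivalent helpers; both function names are hypothetical and do not come from the commit.

```go
package main

import "fmt"

// copyWithLoop copies src element by element.
func copyWithLoop(src []int) []int {
	dst := make([]int, len(src))
	for i := range src {
		dst[i] = src[i]
	}
	return dst
}

// copyWithBuiltin does the same work with the built-in copy function.
func copyWithBuiltin(src []int) []int {
	dst := make([]int, len(src))
	copy(dst, src)
	return dst
}

func main() {
	src := []int{1, 2, 3}
	fmt.Println(copyWithLoop(src), copyWithBuiltin(src))
}
```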
Kubernetes Prow Robot
d393d4e151 Merge pull request #120574 from logicalhan/cslis
promote component SLIs to GA; remove feature gates for component slis
2023-09-14 22:52:12 -07:00
ruiwen-zhao
9b50af1f4f Use a wider range of metric buckets for PodStartDuration
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2023-09-14 21:32:14 +00:00
Justin Garrison
62267518b8 Fix systemd unit string matches 2023-09-14 12:17:09 -07:00
Kevin Hannon
c6e9fba79b move reasons to api package for job controller 2023-09-14 13:24:29 -04:00
Kubernetes Prow Robot
fc786dcd1d Merge pull request #119396 from wackxu/NodeUnschedulableHintFunc
NodeUnschedulable: scheduler queueing hints
2023-09-14 09:20:12 -07:00
Kubernetes Prow Robot
a68093a3ff Merge pull request #120506 from alexzielenski/import-restrictions
Update e2e import restrictions
2023-09-13 21:56:22 -07:00
wackxu
28dbe8a34d scheduler/NodeUnschedulable: reduce pod scheduling latency
Signed-off-by: wackxu <xushiwei5@huawei.com>
2023-09-14 10:23:43 +08:00
Kubernetes Prow Robot
716b8b9d83 Merge pull request #120623 from aojea/service_status_Finalizer
sync Service API status rest storage
2023-09-13 17:56:11 -07:00
Kubernetes Prow Robot
3eca0a5f78 Merge pull request #120398 from aleksandra-malinowska/sts-restart-always
Make StatefulSet restart pods with phase Succeeded
2023-09-13 12:40:12 -07:00
Kubernetes Prow Robot
a08ee80807 Merge pull request #119829 from cvvz/fix-volumemanager-logs
fix: implement MarshalLog for structures in volumemanager for structured-logging.
2023-09-13 07:46:12 -07:00
Antonio Ojea
21e26486ac sync Service API status rest storage
The Service API Rest implementation is complex and has to use different
hooks on the REST storage. The status store was making a shallow copy of
the storage before adding the hooks, so it was not inheriting the hooks.

The status store must have the same hooks as the rest store so that it
can correctly handle the allocation and deallocation of ClusterIPs and
nodePorts.

Change-Id: I44be21468d36017f0ec41a8f912b8490f8f13f55
Signed-off-by: Antonio Ojea <aojea@google.com>
2023-09-13 11:35:42 +00:00
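The shallow-copy problem described in 21e26486ac can be reproduced in miniature. This is a simplified sketch: the `store` type, its `afterCreate` field, and `newStores` are hypothetical and far smaller than the real Service REST storage.

```go
package main

import "fmt"

// store is a hypothetical stand-in for a REST storage with one optional hook.
type store struct {
	afterCreate func()
}

// newStores returns a store plus two value copies of it: one taken before
// the hook was installed and one taken after.
func newStores() (rest, copiedEarly, copiedLate store) {
	rest = store{}
	copiedEarly = rest // copied too early: the hook field is still nil
	rest.afterCreate = func() { fmt.Println("allocate ClusterIP") }
	copiedLate = rest // copied after wiring: the hook is inherited
	return rest, copiedEarly, copiedLate
}

func main() {
	_, early, late := newStores()
	fmt.Println(early.afterCreate != nil, late.afterCreate != nil)
}
```

Only the copy made after the hook assignment carries the hook, which is why the order of copying and wiring matters.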
Jan Safranek
7fc11f47ff Mark a volume as uncertain-attached after detach error
A volume that failed Detach() should not be marked as attached, because
the CSI external-attacher is probably still trying to detach it.

Mark it uncertain instead and wait for Detach() to succeed.
2023-09-13 10:03:28 +02:00
Stephen Kitt
9990307146 kube-scheduler: drop deprecated pointer package
This replaces deprecated k8s.io/utils/pointer functions with their ptr
equivalent.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-09-13 09:42:19 +02:00
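The replacement in 9990307146 swaps per-type pointer helpers for a generic one. To stay self-contained, the sketch below defines a local `to` function with the same shape as `ptr.To` in k8s.io/utils/ptr instead of importing that package; `int32Ptr` is a hypothetical example of the older per-type style.

```go
package main

import "fmt"

// to is a local stand-in with the same shape as ptr.To: it returns a
// pointer to a copy of the given value, for any type.
func to[T any](v T) *T { return &v }

// int32Ptr shows the per-type helper style that a generic helper replaces.
func int32Ptr(v int32) *int32 { return &v }

func main() {
	a := int32Ptr(3)
	b := to(int32(3))
	c := to("default")
	fmt.Println(*a, *b, *c)
}
```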
carlory
5fcffcf4e4 Add APIGroup ratcheting validation to PVC.DataSource 2023-09-13 13:10:22 +08:00
Kubernetes Prow Robot
a06e5a7307 Merge pull request #120330 from rohitssingh/master
Retry NodeStageVolume if CSI Driver Is Missing; Treat this Error as Transient
2023-09-12 17:44:30 -07:00
Kubernetes Prow Robot
db49b13ccd Merge pull request #120252 from kerthcet/cleanup/framework-import
Move framework testing libraries to the right place
2023-09-12 17:44:11 -07:00
Patrick Ohly
819eddaf9a scheduler: fix TestIncomingPodsMetrics unit test
addUnschedulablePodBackToBackoffQ happened to put the pod into the backoff
queue because
- the pod was not popped earlier and thus not in flight
- the PodInfo had UnschedulablePlugins set
- determineSchedulingHintForInFlightPod has code for "if UnschedulablePlugins
  is set and pod not in flight -> internal error, use backoff"

Relying on such special code is not good. A better way to force backoff is by
recording some concurrent event. isPodWorthRequeuing then calls the
queueHintReturnQueueAfterBackoff function and the pod goes to the backoff
queue.
2023-09-12 08:38:53 +02:00
kerthcet
6fbb8ec7e4 Move scheduler testing utils to /scheduler/testing
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-09-12 13:42:38 +08:00
Zhecheng Li
61023579c1 Fix Windows credential provider cannot find binary
The Windows credential provider binary path may have an ".exe" suffix, so
it is better to use LookPath(), which supports both forms.

Signed-off-by: Zhecheng Li <zhechengli@microsoft.com>
2023-09-12 02:47:39 +00:00
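A minimal sketch of the lookup approach named in 61023579c1, using `exec.LookPath` from the Go standard library. On Windows, `exec.LookPath` also tries the extensions listed in `PATHEXT`, so a name without ".exe" can still resolve. The `resolveBinary` wrapper and the binary name are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// resolveBinary returns the path of an executable, or an error if no
// matching executable file can be found.
func resolveBinary(name string) (string, error) {
	return exec.LookPath(name)
}

func main() {
	// "example-credential-provider" is a hypothetical binary name.
	path, err := resolveBinary("example-credential-provider")
	if err != nil {
		fmt.Println("not found:", err)
		return
	}
	fmt.Println("resolved to:", path)
}
```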
Aldo Culquicondor
6b4ab616a2 Increase range of job_sync_duration_seconds
Change-Id: I7ed4b006faecf0a7e6e583c42b4d6bc4b786a164
2023-09-11 18:01:33 -04:00
Kubernetes Prow Robot
74f6c263d8 Merge pull request #118544 from sohankunkerkar/remove-sandbox-image-ref
pkg/kubelet: allow sandbox image pinning from CRI
2023-09-11 11:52:12 -07:00
Kubernetes Prow Robot
aa4ec3c5b0 Merge pull request #119944 from Sharpz7/jm/backup-finalizers
Adding backup code for removing finalizers to more Job End States.
2023-09-11 09:30:30 -07:00
Kensei Nakada
0d3eafdfa3 fix(scheduling_queue): always put Pods with no unschedulable plugins into activeQ/backoffQ (#119105)
* always put Pods with no unschedulable plugins into activeQ/backoffQ

* address review comments
2023-09-11 09:30:11 -07:00
Han Kang
e6435e98ed promote component SLIs to GA; remove feature gates for component slis 2023-09-11 09:15:32 -07:00
Patrick Ohly
6f9140e421 DRA scheduler: stop allocating before deallocation
This fixes a test flake:

    [sig-node] DRA [Feature:DynamicResourceAllocation] multiple nodes reallocation [It] works
    /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/dra.go:552

      [FAILED] number of deallocations
      Expected
          <int64>: 2
      to equal
          <int64>: 1
      In [It] at: /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/dra.go:651 @ 09/05/23 14:01:54.652

This can be reproduced locally with

    stress -p 10 go test ./test/e2e -args -ginkgo.focus=DynamicResourceAllocation.*reallocation.works  -ginkgo.no-color -v=4 -ginkgo.v

Log output showed that the sequence of events leading to this was:
- claim gets allocated because of selected node
- a different node has to be used, so PostFilter sets
  claim.status.deallocationRequested
- the driver deallocates
- before the scheduler can react and select a different node,
  the driver allocates *again* for the original node
- the scheduler asks for deallocation again
- the driver deallocates again (causing the test failure)
- eventually the pod runs

The fix is to disable allocations first by removing the selected node and then
starting to deallocate.
2023-09-11 10:56:17 +02:00
Rohit Singh
61ecc2ad88 Retry operations if CSI Driver Isn't Found by Treating this Error as Transient 2023-09-11 06:07:40 +00:00
Qiutong Song
d3eb082568 Create a node startup latency tracker
Signed-off-by: Qiutong Song <songqt01@gmail.com>
2023-09-11 05:54:25 +00:00
pegasas
f446745777 Improve logging on kube-proxy exit 2023-09-11 00:50:29 +08:00
Kubernetes Prow Robot
49768134e5 Merge pull request #119754 from pbxqdown/kubelet-fix-typo
Fix some typos in kubelet component source code
2023-09-09 19:36:11 -07:00
Kubernetes Prow Robot
33c5bd631d Merge pull request #120008 from skitt/drop-intstr-ptr-wrappers
Use ptr.To to retrieve intstr addresses
2023-09-09 07:24:09 -07:00
Kubernetes Prow Robot
41689233b4 Merge pull request #120334 from pohly/scheduler-clear-unschedulable-plugins
scheduler: avoid false "unschedulable" pod state
2023-09-08 12:01:23 -07:00
Alexander Zielenski
f135eed37b update codegen 2023-09-08 09:49:35 -07:00
Aleksandra Malinowska
d7264d0af0 Make StatefulSet restart pods with phase Succeeded 2023-09-08 17:47:17 +02:00
Patrick Ohly
4e73634b53 scheduler: start scheduling attempt with clean UnschedulablePlugins
When some plugin was registered as "unschedulable" in some previous scheduling
attempt, it kept that attribute for a pod forever. When that plugin then later
failed with an error that requires backoff, the pod was incorrectly moved to the
"unschedulable" queue where it got stuck until the periodic flushing because
there was no event that the plugin was waiting for.

Here's an example where that happened:

     framework.go:1280: E0831 20:03:47.184243] Reserve/DynamicResources: Plugin failed err="Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" node="scheduler-perf-dra-7l2v2" plugin="DynamicResources" pod="test/test-dragxd5c"
    schedule_one.go:1001: E0831 20:03:47.184345] Error scheduling pod; retrying err="running Reserve plugin \"DynamicResources\": Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" pod="test/test-dragxd5c"
    ...
    scheduling_queue.go:745: I0831 20:03:47.198968] Pod moved to an internal scheduling queue pod="test/test-dragxd5c" event="ScheduleAttemptFailure" queue="Unschedulable" schedulingCycle=9576 hint="QueueSkip"

Pop still needs the information about unschedulable plugins to update the
UnschedulableReason metric. It can reset that information before returning the
PodInfo for the next scheduling attempt.
2023-09-08 16:52:36 +02:00
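The ordering described in 4e73634b53 (read the unschedulable plugins for the metric, then reset them before the next attempt) can be sketched as follows. All types and names here are hypothetical simplifications and do not match the scheduler's actual queue implementation.

```go
package main

import "fmt"

// podInfo is a hypothetical, simplified stand-in for the queued pod info.
type podInfo struct {
	name                 string
	unschedulablePlugins map[string]struct{}
}

// queue is a hypothetical FIFO that counts pops per previously failing plugin.
type queue struct {
	pods   []*podInfo
	metric map[string]int
}

// pop uses the plugin set for the metric first and only then clears it, so
// the next scheduling attempt starts with no stale plugins recorded.
func (q *queue) pop() *podInfo {
	if len(q.pods) == 0 {
		return nil
	}
	p := q.pods[0]
	q.pods = q.pods[1:]
	for plugin := range p.unschedulablePlugins {
		q.metric[plugin]++
	}
	p.unschedulablePlugins = nil
	return p
}

func main() {
	q := &queue{metric: map[string]int{}}
	q.pods = append(q.pods, &podInfo{
		name:                 "test-pod",
		unschedulablePlugins: map[string]struct{}{"DynamicResources": {}},
	})
	p := q.pop()
	fmt.Println(len(p.unschedulablePlugins), q.metric["DynamicResources"])
}
```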