10467 Commits

Author SHA1 Message Date
Jan Safranek
fc245b339b Refactor ConstructVolumeSpec
Return a struct from ConstructVolumeSpec to be able to add more fields to
it later.
2022-11-03 16:55:13 +01:00
Jan Safranek
2dc8cc13a4 Remove SyncReconstructedVolume call
With the new reconstruction, AWS.MarkVolumeAsMounted will update outer spec
name with the correct value from Pod.
2022-11-03 16:55:12 +01:00
Jan Safranek
e0f3e5c457 Rework volume reconstruction
Subsequent SELinux work (see http://kep.k8s.io/1710) will need
ActualStateOfWorld populated around the time kubelet starts mounting
volumes.

Therefore reconstruct volumes before starting reconciler, but do not depend
on the desired state of world populated nor node.status - both need a
working API server, which may not be available at that time.

All reconstructed volumes are marked as Uncertain and reconciler will sort
them out - call SetUp to ensure the volume is really mounted when a pod
needs the volume or call TearDown when there is no such pod.

Finish the reconstruction when the API server becomes available:
- Clean up volumes that failed reconstruction and are not needed.

- Update devicePath of reconstructed volumes from node.status. Make sure
  not to overwrite devicePath that may have been updated when the volume
  was mounted by reconcile().

Hiding all this rework behind SELinuxMountReadWriteOncePod FeatureGate,
just to make sure we have a way back if this commit is buggy.
2022-11-03 16:55:12 +01:00
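The reconcile decision described in e0f3e5c457 can be sketched roughly as below. This is only an illustration under stated assumptions: `MountState`, `Action`, and `decide` are hypothetical names, not the kubelet's actual types, and the real reconciler handles many more cases.

```go
package main

import "fmt"

// MountState is a hypothetical stand-in for the per-volume state
// kept in the actual state of world.
type MountState int

const (
	Uncertain MountState = iota // reconstructed from disk, not yet verified
	Mounted                     // known to be mounted by this kubelet
)

// Action is what the sketch's reconciler decides to do with a volume.
type Action string

const (
	SetUp    Action = "SetUp"
	TearDown Action = "TearDown"
	None     Action = "None"
)

// decide re-runs SetUp for an uncertain volume that a pod needs,
// tears down any volume that no pod needs, and otherwise does nothing.
func decide(state MountState, neededByPod bool) Action {
	switch {
	case state == Uncertain && neededByPod:
		return SetUp
	case !neededByPod:
		return TearDown
	default:
		return None
	}
}

func main() {
	fmt.Println(decide(Uncertain, true))
	fmt.Println(decide(Uncertain, false))
	fmt.Println(decide(Mounted, true))
}
```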
Shiming Zhang
101bfb5522 Fix grpc probe log 2022-11-03 18:05:39 +08:00
Paco Xu
57a3af1f87 kubelet: don't set secret and configmap manager if running in standalone mode 2022-11-03 17:46:52 +08:00
PiotrProkop
75bb437a6b Improved multi-numa alignment in Topology Manager: implement closest numa policy
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2022-11-03 10:45:25 +01:00
PiotrProkop
d5dd42dfac Improved multi-numa alignment in Topology Manager: introduce TopologyManagerOptions
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2022-11-03 10:45:21 +01:00
PiotrProkop
58ef3f202a Improved multi-numa alignment in Topology Manager: add NUMAInfo
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2022-11-03 10:45:09 +01:00
PiotrProkop
daee219210 Improved multi-numa alignment in Topology Manager: add topology-manager-policy-options flag in Kubelet
This patch adds new Kubelet option topologyManagerPolicyOptions.
To introduce new TopologyManager options, first we need to introduce new
flag called `topology-manager-policy-options` to allow users to modify
behaviour of best-effort and restricted policies.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2022-11-03 09:45:33 +01:00
Sascha Grunert
f9707064cf
Remove CRI v1alpha2
After the removal of dockershim we can finally also drop support for CRI
v1alpha2.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2022-11-03 09:13:43 +01:00
Kubernetes Prow Robot
98742f9d77
Merge pull request #110747 from harshanarayana/cleanup/GIT-110737/logging-improvements
structured-logging: replace KObjs with KObjSlice for logging
2022-11-03 00:49:34 -07:00
Kubernetes Prow Robot
6754265580
Merge pull request #109757 from STRRL/enriching-unit-test-for-container-manager
Add testcases for pkg/kubelet/cm/pod_container_manager_linux.go
2022-11-02 23:45:35 -07:00
Kubernetes Prow Robot
3cf75a2f76
Merge pull request #103177 from arkbriar/support_cancelable_exec_stream
Support cancelable SPDY executor stream
2022-11-02 19:47:36 -07:00
Kubernetes Prow Robot
433787d25b
Merge pull request #113018 from fromanirh/cpumanager-ga-features
node: kubelet: cpumgr: CPU Manager to GA
2022-11-02 14:41:01 -07:00
Kubernetes Prow Robot
25dc4c4f32
Merge pull request #112980 from swatisehgal/devicemanager-ga-graduation
node: devicemgr: Graduate Kubelet DeviceManager to GA
2022-11-02 13:17:01 -07:00
Francesco Romani
a6b928d90c kubelet: cpumgr: internal variable trivial rename
CPUManager is going GA, thus it makes little sense
to keep the names of the internal configuration
variables `Experimental*`.

Trivial rename only.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-11-02 18:41:42 +01:00
Francesco Romani
5e12338a22 node: cpumgr: address golint complaints
Add docstrings and trivial fixes.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-11-02 18:41:42 +01:00
Francesco Romani
ff44dc1932 cpumanager: the FG is locked to default (ON)
hence we can remove the if() guards, the feature
is always available.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-11-02 18:41:41 +01:00
Jan Safranek
989e391d08 Move all volume reconstruction code into separate files
There is no code change, just moving code around and preparing for the
subsequent commit.
2022-11-02 15:58:21 +01:00
Antonio Ojea
9c2b333925 Revert "plumb context from CRI calls through kubelet"
This reverts commit f43b4f1b95.
2022-11-02 13:37:23 +00:00
astraw99
244598af80 Add back-off restarting failed container name 2022-11-02 20:46:32 +08:00
Swati Sehgal
40741681a2 node: devicemgr: Address warnings from golint
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-11-02 11:05:20 +00:00
Swati Sehgal
8b29eded52 node: devicemgr: Remove devicePluginEnabled field from container mgr
With graduation of device plugins to GA in 1.26, the feature gate is
enabled by default so `devicePluginEnabled` field no longer needs to
be passed at the time of Container Manager creation.

In addition to that, we remove the `ManagerStub` as it is no longer
needed.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-11-02 11:05:20 +00:00
Swati Sehgal
752fa093e0 node: devicemgr: GA graduation implies Feature Gate is ON by default
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-11-02 11:05:20 +00:00
Kubernetes Prow Robot
5899432f92
Merge pull request #113481 from rphillips/fixes/77063
kubelet: fix pod log line corruption when using timestamps and long lines
2022-11-01 19:59:50 -07:00
Kubernetes Prow Robot
9bbd0fbdb2
Merge pull request #113476 from marosset/hpc-to-stable
Promoting WindowsHostProcessContainers to stable
2022-11-01 19:59:43 -07:00
Kubernetes Prow Robot
7b84436168
Merge pull request #113408 from dashpole/kubelet_context
Plumb context to Kubelet CRI calls
2022-11-01 19:59:08 -07:00
Kubernetes Prow Robot
2452a95bd4
Merge pull request #112796 from SataQiu/clean-kubelet-20220930
kubelet: remove the unused constant AnnotationInvalidReason since sysctl annotations are deprecated and migrated to fields
2022-11-01 14:56:45 -07:00
Mark Rossetti
498d065cc5
Promoting WindowsHostProcessContainers to stable
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2022-11-01 14:06:25 -07:00
Kubernetes Prow Robot
9b72f20156
Merge pull request #112373 from loktev-d/k8s_109717
Add unit tests for active_deadline.go
2022-11-01 12:54:44 -07:00
Kubernetes Prow Robot
1a41cb8985
Merge pull request #113021 from rphillips/fixes/112936
kubelet: fix nil crash in allocateRemainingFrom
2022-11-01 10:46:45 -07:00
Kubernetes Prow Robot
e6060f2780
Merge pull request #111220 from giuseppe/drop-superfluous-function
kubelet: remove superfluous function
2022-11-01 09:34:45 -07:00
Ryan Phillips
ddae396ce3 kubelet: fix pod log line corruption when using timestamps and long lines 2022-11-01 09:22:30 -05:00
Kubernetes Prow Robot
2d14d50b31
Merge pull request #113406 from jsafrane/fix-selinux-check-of-mounted
Fix SELinux check of mounted volumes
2022-11-01 04:14:45 -07:00
Kubernetes Prow Robot
4c657e5014
Merge pull request #110403 from claudiubelu/unittests-3
unittests: Fixes unit tests for Windows (part 3)
2022-10-31 15:52:44 -07:00
Kubernetes Prow Robot
f892ab1bd7
Merge pull request #113405 from jsafrane/reduce-log-noise-on-selinux
Reduce log noise on SELinux mount mismatch
2022-10-31 13:14:56 -07:00
Jan Safranek
d37808faae Report error on a pod startup on SELinux mismatch
When a volume is already mounted with an unexpected SELinux label,
kubelet must unmount it first and then mount it back with the expected one.
Report an error to user, just in case the unmount takes too long.

In theory, this error should not happen too often, because two Pods with
different SELinux labels will not enter Desired State of World, see
dsw.AddPodToVolume. DSW and ASW SELinux labels can differ only when
a volume has been deleted from DSW (= Pod was deleted) or a volume was
reconstructed after kubelet restart. In both cases, volume manager should
unmount the volume quickly.
2022-10-31 13:59:23 +01:00
Jan Safranek
805482413a Fix SELinux check of mounted volumes
In PodExistsInVolume with volumeObj.seLinuxMountContext != nil we know that
the volume has been previously mounted with a given SELinuxMountContext.

Either it has been mounted by this kubelet and we know it's correct, or it
was mounted by a previous instance of kubelet and the context has been
reconstructed from the filesystem. In both cases, the actual context is
correct, regardless of whether the volume plugin or PV access mode supports
SELinux mounts.
2022-10-31 13:39:48 +01:00
Kubernetes Prow Robot
d0e86111ef
Merge pull request #112855 from fromanirh/cpumanager-metrics
node: metrics: cpumanager: add metrics about pinning
2022-10-31 03:12:56 -07:00
Kubernetes Prow Robot
9702161caa
Merge pull request #112597 from mythi/grpc-authority
grpc: set localhost Authority to unix client calls
2022-10-31 03:12:45 -07:00
David Ashpole
f43b4f1b95
plumb context from CRI calls through kubelet 2022-10-28 02:55:28 +00:00
Jan Safranek
a910d83070 Reduce log noise on SELinux mount mismatch
The Desired State of World can require a different SELinux mount context than
is in the Actual State of World and it's perfectly OK. For example when
user changes SELinux context of Pods or when the context is reconstructed
after kubelet restart.

Don't spam log and don't report errors to the user as event - reconciler
will do the right thing and unmount the old volume (with wrong context) and
mount a new one in the next reconciliation. It's not an error, it's
expected workflow.
2022-10-27 18:00:42 +02:00
Kubernetes Prow Robot
ab4907d2f4
Merge pull request #112913 from Garrybest/pr_cpumanager
fix GetAllocatableCPUs in cpumanager
2022-10-27 07:20:33 -07:00
Francesco Romani
47d3299781 node: metrics: cpumanager: add pinning metrics
In order to improve the observability of the cpumanager,
add and populate metrics to track if the combination of
the kubelet configuration and podspec would trigger
exclusive core allocation and pinning.

We should avoid leaking any node/machine specific information
(e.g. core ids, even though this is admittedly an extreme example);
tracking these metrics seems to be a good first step, because
it allows us to get feedback without exposing details.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-10-27 14:40:40 +02:00
Garrybest
95eb5670cf add GetAllocatableCPUs test in cpumanager
Signed-off-by: Garrybest <garrybest@foxmail.com>
2022-10-27 19:57:12 +08:00
Garrybest
d446f5f90e fix GetAllocatableCPUs in cpumanager
Signed-off-by: Garrybest <garrybest@foxmail.com>
2022-10-27 19:57:06 +08:00
Artur Żyliński
8a5fbce96e Fix cond: Check if pod started 2022-10-26 16:05:19 +02:00
Artur Żyliński
492f5fa82c Regenerate mocks 2022-10-26 11:31:50 +02:00
Artur Żyliński
b0fac15cd6 Make the interface local to each package 2022-10-26 11:28:18 +02:00
Artur Żyliński
9f31669a53 New histogram: Pod start SLI duration 2022-10-26 11:28:17 +02:00
Kubernetes Prow Robot
244c035b87
Merge pull request #110263 from claudiubelu/unittests
unittests: Fixes unit tests for Windows
2022-10-25 14:50:34 -07:00
Claudiu Belu
6f2eeed2e8 unittests: Fixes unit tests for Windows
Currently, there are some unit tests that are failing on Windows due to
various reasons:

- config options not supported on Windows.
- files not closed, which means that they cannot be removed / renamed.
- paths not properly joined (filepath.Join should be used).
- time.Now() is not as precise on Windows, which means that 2
  consecutive calls may return the same timestamp.
- different error messages on Windows.
- files have \r\n line endings on Windows.
- /tmp directory being used, which might not exist on Windows. Instead,
  the OS-specific Temp directory should be used.
- the default value for Kubelet's EvictionHard field was containing
  OS-specific fields. This is now moved, the field is now set during
  Kubelet's initialization, after the config file is read.
2022-10-25 23:46:56 +03:00
Kubernetes Prow Robot
6a709cf07b
Merge pull request #113194 from saltbo/refa-replace-ioutil
Replace the ioutil by the os and io for the pkg/util
2022-10-23 18:08:24 -07:00
saltbo
6f878d92fb
fix: update the fsstore_test.go
Signed-off-by: saltbo <saltbo@foxmail.com>
2022-10-23 21:51:48 +08:00
Kubernetes Prow Robot
a497c56c33
Merge pull request #113030 from Richabanker/kubelet-metrics-slis
add metrics/slis to kubelet health checks
2022-10-21 10:35:52 -07:00
Claudiu Belu
9f95b7b18c unittests: Fixes unit tests for Windows (part 3)
Currently, there are some unit tests that are failing on Windows due to
various reasons:

- paths not properly joined (filepath.Join should be used).
- Proxy Mode IPVS not supported on Windows.
- DeadlineExceeded can occur when trying to read data from an UDP
  socket. This can be used to detect whether the port was closed or not.
- In Windows, with long file name support enabled, file names can have
  up to 32,767 characters. In this case, the error
  windows.ERROR_FILENAME_EXCED_RANGE will be encountered instead.
- files not closed, which means that they cannot be removed / renamed.
- time.Now() is not as precise on Windows, which means that 2
  consecutive calls may return the same timestamp.
- path.Base() will return the same path. filepath.Base() should be used
  instead.
- path.Join() will always join the paths with a / instead of the OS
  specific separator. filepath.Join() should be used instead.
2022-10-21 19:25:48 +03:00
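The `path` versus `path/filepath` distinction listed in 9f95b7b18c can be illustrated with a minimal sketch using only the Go standard library. The helper names `joinBoth`, `baseName`, and `nativeSeparator` are made up for this example.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

// joinBoth joins the elements twice: once with path.Join, which always
// uses "/", and once with filepath.Join, which uses the OS separator.
func joinBoth(elem ...string) (slash string, native string) {
	return path.Join(elem...), filepath.Join(elem...)
}

// baseName returns the last element of an OS-native path.
func baseName(p string) string {
	return filepath.Base(p)
}

// nativeSeparator exposes the OS-specific path separator as a string.
func nativeSeparator() string {
	return string(filepath.Separator)
}

func main() {
	slash, native := joinBoth("var", "lib", "kubelet")
	fmt.Println(slash)            // always slash-separated
	fmt.Println(native)           // backslash-separated on Windows
	fmt.Println(baseName(native)) // last element on every OS
}
```

On Linux both joined results are identical, which is why tests written with `path.Join` tend to pass there and fail only on Windows.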
Kubernetes Prow Robot
9bcb81e13f
Merge pull request #113175 from liggitt/pr_normalize_probes_lifecycle_handlers
Record event and metric for lifecycle fallback to http
2022-10-20 02:31:08 -07:00
Kubernetes Prow Robot
ad26b315f2
Merge pull request #86139 from jasimmons/pr_normalize_probes_lifecycle_handlers
Normalize HTTP lifecycle handlers with HTTP probers
2022-10-19 17:44:56 -07:00
Kubernetes Prow Robot
45636684a4
Merge pull request #112897 from fromanirh/podresources-metrics-e2e-tests
register podresources metrics
2022-10-19 13:57:18 -07:00
Jordan Liggitt
a5d785fae8
Record metric for lifecycle fallback to http 2022-10-19 14:45:25 -04:00
Jordan Liggitt
122b43037e
Record event for lifecycle fallback to http 2022-10-19 14:11:36 -04:00
Kubernetes Prow Robot
bf14677914
Merge pull request #112546 from oscr/the-the
grammar: replace all occurrences of "the the" with "the"
2022-10-19 10:03:02 -07:00
Billie Cleek
dfaaa144ab fallback to http when lifecycle handler request should have been https 2022-10-19 09:51:52 -07:00
Jason Simmons
5a6acf85fa Align lifecycle handlers and probes
Align the behavior of HTTP-based lifecycle handlers and HTTP-based
probers, converging on the probers implementation. This fixes multiple
deficiencies in the current implementation of lifecycle handlers
surrounding what functionality is available.

The functionality is gated by the features.ConsistentHTTPGetHandlers feature gate.
2022-10-19 09:51:52 -07:00
Richa Banker
047f6a736b add metrics/slis to kubelet health checks 2022-10-18 14:06:20 -07:00
Kubernetes Prow Robot
2522420937
Merge pull request #111601 from claudiubelu/skip-unittests
unit tests: Skip Windows-unrelated tests on Windows
2022-10-18 11:29:30 -07:00
Kubernetes Prow Robot
23721935d3
Merge pull request #113129 from chaunceyjiang/pr_remove_redundant_conversion
Remove redundant type conversion
2022-10-18 10:23:19 -07:00
Kubernetes Prow Robot
843ad71cac
Merge pull request #113041 from saschagrunert/kubelet-pods-creation-time
Sort kubelet pods by their creation time
2022-10-18 09:17:19 -07:00
Claudiu Belu
af77381e01 unit tests: Skip Windows-unrelated tests on Windows
Some of the unit tests cannot pass on Windows due to various reasons:

- fsnotify does not have a Windows implementation.
- Proxy Mode IPVS not supported on Windows.
- Seccomp not supported on Windows.
- VolumeMode=Block is not supported on Windows.
- iSCSI volumes are mounted differently on Windows, and iscsiadm is a
  Linux utility.
2022-10-18 12:43:07 +03:00
chaunceyjiang
d2b372e029 Remove redundant type conversion
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2022-10-18 14:37:40 +08:00
Kubernetes Prow Robot
6f579d3ceb
Merge pull request #111616 from ndixita/credential-api-ga
Move the Kubelet Credential Provider feature to GA and Update the Credential Provider API to GA
2022-10-15 07:53:09 -07:00
Oscar Utbult
e4f776f230 grammar: replace all occurrences of "the the" with "the" 2022-10-14 09:03:14 +02:00
Sascha Grunert
b296f82c69
Sort kubelet pods by their creation time
There is a corner case when blocking Pod termination via a lifecycle
preStop hook, for example by using this StatefulSet:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: ubi
  serviceName: "ubi"
  replicas: 1
  template:
    metadata:
      labels:
        app: ubi
    spec:
      terminationGracePeriodSeconds: 1000
      containers:
      - name: ubi
        image: ubuntu:22.04
        command: ['sh', '-c', 'echo The app is running! && sleep 360000']
        ports:
        - containerPort: 80
          name: web
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - 'echo aaa; trap : TERM INT; sleep infinity & wait'
```

After creation, downscaling, forced deletion and upscaling of the
replica like this:

```
> kubectl apply -f sts.yml
> kubectl scale sts web --replicas=0
> kubectl delete pod web-0 --grace-period=0 --force
> kubectl scale sts web --replicas=1
```

We will end up having two pods running by the container runtime, while
the API only reports one:

```
> kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          92s
```

```
> sudo crictl pods
POD ID              CREATED              STATE     NAME     NAMESPACE     ATTEMPT     RUNTIME
e05bb7dbb7e44       12 minutes ago       Ready     web-0    default       0           (default)
d90088614c73b       12 minutes ago       Ready     web-0    default       0           (default)
```

When now running `kubectl exec -it web-0 -- ps -ef`, there is a random chance that we hit the wrong
container reporting the lifecycle command `/bin/sh -c echo aaa; trap : TERM INT; sleep infinity & wait`.

This is caused by the container lookup via its name (and no podUID) at:
02109414e8/pkg/kubelet/kubelet_pods.go (L1905-L1914)

And more specifically by the conversion of the pod result map to a slice in `GetPods`:
02109414e8/pkg/kubelet/kuberuntime/kuberuntime_manager.go (L407-L411)

We now solve that unexpected behavior by tracking the creation time of
the pod and sorting the result based on that. The lookup will then always
match the most recently created pod.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2022-10-13 16:32:44 +02:00
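The fix in b296f82c69 can be sketched as follows. This is a simplified illustration, not the kubelet's code: the `Pod` struct, `newestFirst`, and `findByName` are hypothetical, and the creation times (including which of the two pod IDs is newer) are invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, trimmed-down stand-in for the runtime's pod record.
type Pod struct {
	ID        string
	Name      string
	CreatedAt int64 // creation time; the unit does not matter for this sketch
}

// newestFirst converts the pod map to a slice ordered by creation time,
// newest first. Go map iteration order is random, so without the sort the
// slice order would differ from call to call.
func newestFirst(pods map[string]*Pod) []*Pod {
	result := make([]*Pod, 0, len(pods))
	for _, p := range pods {
		result = append(result, p)
	}
	sort.SliceStable(result, func(i, j int) bool {
		return result[i].CreatedAt > result[j].CreatedAt
	})
	return result
}

// findByName returns the first pod in the slice with the given name, or nil.
func findByName(pods []*Pod, name string) *Pod {
	for _, p := range pods {
		if p.Name == name {
			return p
		}
	}
	return nil
}

func main() {
	pods := map[string]*Pod{
		"e05bb7dbb7e44": {ID: "e05bb7dbb7e44", Name: "web-0", CreatedAt: 100},
		"d90088614c73b": {ID: "d90088614c73b", Name: "web-0", CreatedAt: 200},
	}
	fmt.Println(findByName(newestFirst(pods), "web-0").ID)
}
```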
Paco Xu
2ce7a81169 fsnotify: use event.Has instead of "event.Op&h == h" 2022-10-13 13:42:26 +08:00
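The change in 2ce7a81169 replaces a manual bit-mask comparison with a helper method. The sketch below uses local stand-in types modeled on fsnotify's `Op` and `Event` so that it runs without the library. The definitions here are an assumption made for illustration and should be checked against the fsnotify version in use.

```go
package main

import "fmt"

// Op is a local stand-in for a bit set of file operations.
type Op uint32

const (
	Create Op = 1 << iota
	Write
	Remove
	Rename
	Chmod
)

// Has reports whether all bits of h are set in o.
func (o Op) Has(h Op) bool { return o&h == h }

// Event is a local stand-in for a file system notification.
type Event struct {
	Name string
	Op   Op
}

// Has reports whether the event includes the given operation.
func (e Event) Has(op Op) bool { return e.Op.Has(op) }

func main() {
	// "/tmp/example" is an arbitrary illustrative path.
	ev := Event{Name: "/tmp/example", Op: Create | Write}

	// Both forms give the same answer; the method reads more clearly.
	fmt.Println(ev.Op&Write == Write, ev.Has(Write))
	fmt.Println(ev.Op&Remove == Remove, ev.Has(Remove))
}
```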
weizhichen
5d514601a8 gofmt 2022-10-13 01:47:08 +00:00
Ryan Phillips
2514486d80 kubelet: fix nil crash in allocateRemainingFrom 2022-10-12 12:51:17 -05:00
arkbriar
42808c8343 Support cancelable SPDY executor stream
Mark remotecommand.Executor as deprecated and related modifications.

Handle crash when streamer.stream panics

Add a test to verify if stream is closed after connection being closed

Remove blank line and update waiting time to 1s to avoid test flakes in CI.

Refine the tests of StreamExecutor according to comments.

Remove the comment of context controlling the negotiation progress and misc.

Signed-off-by: arkbriar <arkbriar@gmail.com>
2022-10-09 15:24:00 +08:00
Daniil Loktev
e954eeb255 Add comment for 0th case 2022-10-08 12:06:42 +03:00
Francesco Romani
ba6b468982 node: metrics: register podresources metrics
Because of a bug in the commit 1e7bb20c52,
podresources metrics were added, they are updated in the right
places, but they are never exported, so they cannot be consumed.
Fix trivially registering the metrics.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-10-06 15:14:56 +02:00
Kubernetes Prow Robot
98233be715
Merge pull request #112709 from swagatbora90/kubelet-tracing
Support otel tracing in cri remote image service
2022-10-04 14:12:00 -07:00
Andrew Sy Kim
4e2a2b6053
Revert "Avoid tainting with NoSchedule when DisableCloudProviders feature is on" 2022-10-03 15:13:43 -04:00
Davanum Srinivas
8b9a5b2dff
Avoid tainting with NoSchedule when DisableCloudProviders feature is on
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-10-02 13:00:58 -04:00
Kubernetes Prow Robot
02109414e8
Merge pull request #112542 from astraw99/fix-runtime-validate
Add validation for runtime endpoint flag
2022-09-30 18:04:24 -07:00
Kubernetes Prow Robot
be22f605cf
Merge pull request #112097 from wongearl/cleanup_loop
use copy() instead of a loop
2022-09-30 18:04:12 -07:00
Kubernetes Prow Robot
ad64f9c4dc
Merge pull request #112631 from tzneal/reword-image-gc-failure-log
reword image gc failure log
2022-09-30 16:56:35 -07:00
jesse.tang
759e043136
Optimize: file /cpuset slice make cap (#112270) 2022-09-30 16:56:25 -07:00
Kubernetes Prow Robot
5bcdc82911
Merge pull request #112184 from danwinship/kubelet-node-ip-annotation-cleanup
Delete the cloud node IP annotation if it is stale
2022-09-30 16:56:13 -07:00
SataQiu
7308b83a99 remove the unused constant AnnotationInvalidReason since sysctl annotations are deprecated and migrated to fields 2022-09-30 14:53:46 +08:00
Kubernetes Prow Robot
4276ed3628
Merge pull request #112414 from pacoxu/kubelet-multi-options
kubelet: append options to pod if there are multi options in /etc/resolv.conf
2022-09-29 21:10:28 -07:00
Swagat Bora
caa83c25ae Support otel tracing in cri remote image service
Signed-off-by: Swagat Bora <sbora@amazon.com>
2022-09-29 22:15:07 +00:00
Kubernetes Prow Robot
3af1e5fdf6
Merge pull request #112707 from enj/enj/i/https_links
Use https links for k8s KEPs, issues, PRs, etc
2022-09-29 12:34:40 -07:00
Dixita Narang
ff1f525511 Setting LockToDefault as true for KubeletCredentialProviders feature, and removing conditions that check if the feature is enabled since now the feature is enabled by default 2022-09-29 16:42:48 +00:00
astraw99
805be30745 Add validation for runtime endpoint 2022-09-28 10:33:35 +08:00
Kubernetes Prow Robot
00532e305a
Merge pull request #107896 from smarterclayton/track_pod_sync_latency
kubelet: Record a metric for latency of pod status update
2022-09-27 14:25:50 -07:00
Kubernetes Prow Robot
5579ddea8a
Merge pull request #112644 from vitorfhc/issue-112605
Improves message for pod status in rejectPod
2022-09-27 11:32:02 -07:00
Kubernetes Prow Robot
efc306a12d
Merge pull request #112316 from dengyufeng2206/0908test
fix test order in pkg/kubelet/sysctl/util_test.go
2022-09-27 11:31:50 -07:00
Monis Khan
b738be9b46
Use https links for k8s KEPs, issues, PRs, etc
Signed-off-by: Monis Khan <mok@microsoft.com>
2022-09-23 23:36:24 +00:00
Kubernetes Prow Robot
4e105c4814
Merge pull request #111343 from niulechuan/add_unit_test_for_asw
Add unit test in kubelet volumemanager ASW: Detach a volume that had been mounted by pod should be skipped
2022-09-23 07:04:25 -07:00
Vitor Falcao
0beafd1a5a Improved message for pod status in rejectPod
Co-authored-by: Sergey Kanzhelev <S.Kanzhelev@live.com>
2022-09-21 21:46:52 +00:00
Ryan Phillips
205adec698 kubelet: increase log level for Path does not exist message 2022-09-21 14:16:44 -05:00
Todd Neal
9e83c2d7eb reword image gc failure log
Reword the log so that it sounds less like a failure of kubelet and points
towards the root cause of not enough data being eligible to free.
2022-09-20 21:57:59 -05:00
Mikko Ylinen
fbcdf48bb8 grpc: set localhost Authority to unix client calls
Several reports exist (both with device plugins and CSI) that
kubelet w/ grpc-go sends an invalid Authority header and some non
grpc-go servers reject these unix domain socket client connections.

grpc-go sets the Authority header correctly when the dial address
is in a format where its address scheme can be determined.

Instead of making changes to get all the server addresses into the unix://
prefixed format, set the grpc.WithAuthority("localhost") client connection
override to get the same result.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-09-20 13:15:36 +03:00
Paco Xu
3bbd025982 ut: compare dns options without order 2022-09-20 11:45:43 +08:00
Paco Xu
468b2a2297 kubelet: append options to pod if there are multi options in /etc/resolv.conf 2022-09-20 10:40:54 +08:00
Kubernetes Prow Robot
127f33f63d
Merge pull request #111221 from inosato/remove-ioutil-from-kubelet
Remove ioutil in kubelet/kubeadm and its tests
2022-09-17 21:56:28 -07:00
inosato
7dc1f5e30b Fix comments 2022-09-18 12:51:03 +09:00
Dixita Narang
9c3cb6e66d Fixing boilerplate header 2022-09-16 21:20:30 +00:00
Kubernetes Prow Robot
c45ca46cdb
Merge pull request #112387 from mythi/kubelet-devicemanager-topologyinfo
devicemanager: do not leak empty TopologyInfo to TopologyManager
2022-09-14 07:17:00 -07:00
Mikko Ylinen
68bb0935bd devicemanager: do not leak empty TopologyInfo to TopologyManager
Device Plugins that wish to leverage the Topology Manager can send back a populated
TopologyInfo struct as part of the device registration, along with the device IDs
and the health of the device. TopologyInfo is converted to TopologyHints and
used by TopologyManager to find the optimal/desired resource allocation for a Pod.

If a plugin sends an empty but non-nil instance of TopologyInfo for a resource,
devicemanager passes it on as an empty instance of TopologyHint which is
currently interpreted as "Hint Provider has no possible NUMA affinities
for resource" which further means that pods requesting that resource will fail.

To not block device resources that pass TopologyInfo{Nodes:[]*NUMANode{}} from being
used, interpret that as a nil set of hints and not a []TopologyHint{}.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-09-14 16:13:31 +03:00
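The nil-versus-empty distinction behind 68bb0935bd can be sketched as below. The types are heavily simplified stand-ins and `hintsFor` is a hypothetical helper. The real hint generation in the device manager is more involved, so treat this only as an illustration of the idea.

```go
package main

import "fmt"

// Simplified stand-ins for the device plugin and topology manager types.
type NUMANode struct{ ID int64 }
type TopologyInfo struct{ Nodes []*NUMANode }
type TopologyHint struct{ NUMANodes []int64 }

// hintsFor returns nil, meaning "no NUMA preference", when the plugin sent
// no topology or an empty but non-nil one. Returning an empty non-nil slice
// instead would be read as "no possible NUMA affinities".
func hintsFor(info *TopologyInfo) []TopologyHint {
	if info == nil || len(info.Nodes) == 0 {
		return nil
	}
	hints := make([]TopologyHint, 0, len(info.Nodes))
	for _, n := range info.Nodes {
		hints = append(hints, TopologyHint{NUMANodes: []int64{n.ID}})
	}
	return hints
}

func main() {
	empty := &TopologyInfo{Nodes: []*NUMANode{}}
	fmt.Println(hintsFor(empty) == nil)

	one := &TopologyInfo{Nodes: []*NUMANode{{ID: 0}}}
	fmt.Println(len(hintsFor(one)))
}
```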
Kubernetes Prow Robot
74469ca4c5
Merge pull request #112123 from paskal/paskal/cfs_clarification
clarify CPUCFSQuotaPeriod values, set the minimum to 1ms
2022-09-12 07:01:25 -07:00
Daniil Loktev
229ce27ae4 Add unit tests for active_deadline.go 2022-09-10 11:02:36 +03:00
Dixita Narang
4cc741955c Adding default values for v1 credential provider config 2022-09-09 06:11:15 +00:00
Dixita Narang
977a8ebb3a Renaming usage of v1beta1 to v1, and adding API violation exceptions and
vendor module for v1
2022-09-09 06:11:06 +00:00
Dmitry Verkhoturov
d0f9e6dc36 clarify CPUCFSQuotaPeriod values, set the minimum to 1ms
cpu.cfs_period_us is measured in microseconds in the kernel but
provided in time.Duration by the user, that change clarifies the code
to make this evident to the reader.

Also, the minimum value for that feature is 1ms and not 1μs, and this
change alters the validation to reject values smaller than 1ms.
2022-09-08 23:29:13 +02:00
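The unit handling described in d0f9e6dc36 can be sketched like this. `cfsPeriodMicros` is a hypothetical helper, not the kubelet's validation code. It only shows the conversion from a `time.Duration` to the microsecond value the kernel expects and the 1ms lower bound named in the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// cfsPeriodMicros converts a user-supplied period to microseconds, the unit
// of cpu.cfs_period_us, and rejects values smaller than 1ms.
func cfsPeriodMicros(period time.Duration) (int64, error) {
	if period < time.Millisecond {
		return 0, errors.New("CPU CFS quota period must be at least 1ms")
	}
	return period.Microseconds(), nil
}

func main() {
	us, err := cfsPeriodMicros(100 * time.Millisecond)
	fmt.Println(us, err)

	_, err = cfsPeriodMicros(time.Microsecond)
	fmt.Println(err)
}
```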
Clayton Coleman
e9a5fb7372
kubelet: Record a metric for latency of pod status update
Track how long it takes for pod updates to propagate from detection
to successful change on API server. Will guide future improvements
in pod start and shutdown latency.

Metric is `kubelet_pod_status_sync_duration_seconds` and is ALPHA
stability. Histogram buckets are chosen based on distribution of
observed status delays in practice.
2022-09-08 12:17:44 -04:00
dengyufeng2206
e20071792f fix test order in pkg/kubelet/sysctl/util_test.go
Signed-off-by: dengyufeng2206 <deng.yufeng@zte.com.cn>
2022-09-08 17:20:22 +08:00
Kubernetes Prow Robot
6d1e9150d0
Merge pull request #108855 from haircommander/podStatsFix
kubelet/stats: deduplicate makePodStorageStats
2022-09-06 12:58:22 -07:00
Kubernetes Prow Robot
780fe01858
Merge pull request #111935 from giuseppe/userns-manager-use-bitmask-pkg-registry
kubelet: drop bitArray implementation
2022-09-06 10:27:51 -07:00
Dan Winship
e23f1a68af Delete the cloud node IP annotation if it is stale
If you run "kubelet --cloud-provider X --node-ip Y", kubelet will set
an annotation on the node, but previously, if you then ran just
"kubelet --cloud-provider X" (or just "kubelet --node-ip Y"), it
wouldn't delete the stale annotation. Fix that.
2022-09-01 16:43:18 -04:00
Dalton Hubble
7850097fd0 Avoid propagating search . into containers /etc/resolv.conf
* Adapt https://github.com/kubernetes/kubernetes/pull/109441 but
ensure that `search .` does not get propagated into containers'
/etc/resolv.conf. There is no reason to put `.` in a container's
search field and it causes issues for musl.
2022-09-01 12:07:18 -07:00
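The filtering described in 7850097fd0 amounts to dropping the root domain from the search list before it reaches the container. A minimal sketch, assuming a hypothetical helper `filterSearch` and an illustrative domain name:

```go
package main

import "fmt"

// filterSearch returns the search domains without the root domain ".",
// preserving the order of the remaining entries.
func filterSearch(search []string) []string {
	out := make([]string, 0, len(search))
	for _, domain := range search {
		if domain == "." {
			continue
		}
		out = append(out, domain)
	}
	return out
}

func main() {
	// "example.com" is an illustrative search domain.
	fmt.Println(filterSearch([]string{".", "example.com"}))
	fmt.Println(len(filterSearch([]string{"."})))
}
```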
Kubernetes Prow Robot
67d75db890
Merge pull request #111932 from azylinski/rm-lastContainerStartedTime-lru
Cleanup: Remove unused lastContainerStartedTime time.Cache lru
2022-08-29 09:54:37 -07:00
wongearl
47bd712b81 use copy() instead of a loop 2022-08-29 17:55:16 +08:00
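The cleanup in 47bd712b81 replaces an element-by-element loop with the built-in `copy`. The function names below are made up for the comparison. Both produce the same result.

```go
package main

import "fmt"

// cloneWithLoop copies the slice one element at a time.
func cloneWithLoop(src []int) []int {
	dst := make([]int, len(src))
	for i := range src {
		dst[i] = src[i]
	}
	return dst
}

// cloneWithCopy does the same with the built-in copy, which copies
// min(len(dst), len(src)) elements.
func cloneWithCopy(src []int) []int {
	dst := make([]int, len(src))
	copy(dst, src)
	return dst
}

func main() {
	src := []int{0, 1, 2, 3}
	fmt.Println(cloneWithLoop(src))
	fmt.Println(cloneWithCopy(src))
}
```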
Antonio Ojea
d434c588d7 Revert "change CPUCFSQuotaPeriod default value to 100us to match Linux default"
This reverts commit f2d591fae6.
2022-08-26 23:51:04 +02:00
sivchari
c62a7cdb32 fix: test 2022-08-26 01:25:44 +09:00
sivchari
12d49b6bfb fix: rename 2022-08-26 00:44:31 +09:00
Kubernetes Prow Robot
bc9f48b841
Merge pull request #112024 from cndoit18/remove-redundant-judgment
style: remove redundant judgment
2022-08-25 07:28:18 -07:00
Kubernetes Prow Robot
2b5475b3fa
Merge pull request #111554 from paskal/paskal/clarify_default_cfs_period
Clarify cpu.cfs_period_us default value
2022-08-25 07:28:07 -07:00
cndoit18
ec43037d0f style: remove redundant judgment
Signed-off-by: cndoit18 <cndoit18@outlook.com>
2022-08-25 12:07:36 +08:00
Mrunal Patel
65e693eccb Set correct SELinux label for host paths volumes created by host path provisioner
These host paths have a well known location under /tmp/hostpath_pv
and are therefore safe to be labeled with the shared SELinux label.

Without this label, the mounted volumes cannot be accessed by the
container processes.

Signed-off-by: Mrunal Patel <mpatel@redhat.com>
2022-08-24 17:57:47 -07:00
Kubernetes Prow Robot
70254065ea
Merge pull request #109966 from zhangxyjlu/config_validation_test
Add validation test for features.GracefulNodeShutdownBasedOnPodPriority
2022-08-24 00:02:24 -07:00
Kubernetes Prow Robot
08aac4f0ac
Merge pull request #111520 from paskal/paskal/clarify_cfs_period_us
Change CPUCFSQuotaPeriod default value from 100ms to 100us to match Linux default
2022-08-23 20:07:48 -07:00
Kubernetes Prow Robot
c9c7e245e4
Merge pull request #111692 from SataQiu/cleanup-kubelet-20220804
kubelet: remove unused custommetrics package
2022-08-23 17:17:34 -07:00
weizhichen
f2e7211ab8 delete stale code in kubelet volumemanager 2022-08-23 23:36:09 +00:00
Kubernetes Prow Robot
052bfc35b2
Merge pull request #110390 from major1201/fix_kubelet_test
fix defer in loop and optimize test cases with explicit field name
2022-08-23 16:04:37 -07:00
Kubernetes Prow Robot
17dd76f5d4
Merge pull request #108832 from waynepeking348/fix_bugs_of_container_cpu_shares
fix bugs of container cpu shares when cpu request set to zero
2022-08-23 16:04:03 -07:00
Giuseppe Scrivano
2b01af6c92
kubelet: drop bitArray implementation
drop bitArray implementation and use the bitmap implementation from
k8s.io/kubernetes/pkg/registry/core/service/allocator.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-08-19 16:55:15 +02:00
Artur Żyliński
15566d3d89 Cleanup: Remove unused lastContainerStartedTime time.Cache lru 2022-08-19 14:57:29 +02:00
Peter Hunt
d6ffca04c9 kubelet/stats: drop makePodStorageStats errors to V(6)
and by doing so, fix a bug where the stats providers report a directory is not found after a pod's storage is removed

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2022-08-16 16:54:29 -04:00
Peter Hunt
6d9264247d kubelet/stats: deduplicate makePodStorageStats
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2022-08-16 16:54:29 -04:00
Dmitry Verkhoturov
f2d591fae6 change CPUCFSQuotaPeriod default value to 100us to match Linux default
cpu.cfs_period_us is 100μs by default despite having an "ms" unit
for some unfortunate reason. Documentation:
https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html#management

The desired effect of that change is to match
k8s default `CPUCFSQuotaPeriod` value (100ms before that change)
with one used in k8s without the `CustomCPUCFSQuotaPeriod` flag enabled
and Linux CFS (100us, 1000x smaller than 100ms).
2022-08-10 03:25:05 +02:00
STRRL
1dcd0c3348
chore: use require instead of assert
Signed-off-by: STRRL <im@strrl.dev>
2022-08-08 21:06:38 +08:00
Niu Lechuan
24614f8551 Add unit test in volumemanager: Detach a volume that had been mounted by pod should be skipped
Signed-off-by: Niu Lechuan <lechuan.niu@daocloud.io>
2022-08-05 09:03:21 +08:00
SataQiu
ea9f0a8c8e kubelet: remove unused custommetrics package 2022-08-04 20:00:38 +08:00
Jan Safranek
f9c7ce5b9c Add unit tests for DesiredStateOfWorldPopulator 2022-08-04 10:51:59 +02:00
Jan Safranek
260912490e Add a comment about handling same volumes with different contexts 2022-08-04 10:51:56 +02:00
Jan Safranek
a01e720a1a Rename IsRWOP
So that the function can be extended to other access modes when we
implement SELinux mount for more of them.
2022-08-04 10:51:54 +02:00
Jan Safranek
1490d51028 Remove noisy log
The error would be logged every reconciler sync (100 ms).
2022-08-04 10:51:53 +02:00
Jan Safranek
0793ecee3a Add unit tests for ASW.AddPodToVolume 2022-08-04 10:51:52 +02:00
Jan Safranek
17d850ee0e Add interface for SELinuxOptionsToFileLabel
github.com/opencontainers/selinux/go-selinux needs OS that supports SELinux
and SELinux enabled in it to return useful data, therefore add an interface
in front of it, so we can mock its behavior in unit tests.
2022-08-04 10:51:51 +02:00
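The idea behind 17d850ee0e (hide the SELinux library behind an interface so unit tests can substitute a fake) might look roughly like the sketch below. This is a minimal illustration, not the kubelet code: the names `LabelTranslator`, `fakeTranslator` and `mountLabel`, and the local `SELinuxOptions` struct, are assumptions made so the example needs no Kubernetes or go-selinux imports.

```go
package main

import (
	"errors"
	"fmt"
)

// SELinuxOptions is a local stand-in for a pod's SELinux options
// (assumption: four string fields), so the sketch has no external imports.
type SELinuxOptions struct {
	User, Role, Type, Level string
}

// LabelTranslator is a hypothetical name for the interface placed in front
// of the SELinux library. Production code would delegate to the library;
// tests can plug in a fake instead.
type LabelTranslator interface {
	SELinuxOptionsToFileLabel(opts *SELinuxOptions) (string, error)
}

// fakeTranslator is a test double that works on hosts without SELinux.
type fakeTranslator struct {
	fail bool // when true, simulate a host where translation fails
}

func (f fakeTranslator) SELinuxOptionsToFileLabel(opts *SELinuxOptions) (string, error) {
	if f.fail {
		return "", errors.New("SELinux is not enabled")
	}
	if opts == nil {
		return "", nil
	}
	return fmt.Sprintf("%s:%s:%s:%s", opts.User, opts.Role, opts.Type, opts.Level), nil
}

// mountLabel stands in for a caller that only depends on the interface.
func mountLabel(t LabelTranslator, opts *SELinuxOptions) (string, error) {
	label, err := t.SELinuxOptionsToFileLabel(opts)
	if err != nil {
		return "", fmt.Errorf("translating SELinux options: %w", err)
	}
	return label, nil
}

func main() {
	opts := &SELinuxOptions{User: "system_u", Role: "object_r", Type: "container_file_t", Level: "s0"}
	label, err := mountLabel(fakeTranslator{}, opts)
	fmt.Println(label, err)

	_, err = mountLabel(fakeTranslator{fail: true}, opts)
	fmt.Println(err)
}
```

Because `mountLabel` accepts the interface rather than calling the library directly, the same test can exercise both the success and the failure path on any operating system.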
Jan Safranek
d9f792633d Add AddPodToVolume unit tests with SELinux 2022-08-04 10:51:50 +02:00
Jan Safranek
8d6b721ddd Extract SELinux context error handling into a common func
Add handlerSELinuxMetricError() which bumps the right metric + either
consumes a SELinux error or lets it propagate up the stack.
2022-08-04 10:51:48 +02:00
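The behaviour described in 8d6b721ddd (bump a metric, then either consume the SELinux error or let it propagate) could be sketched as follows. Everything here is an assumption for illustration: the `counters` struct replaces real metrics, and `handleContextError` and its `enforce` flag are hypothetical names, not the function added by the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// counters is a stand-in for real metrics (assumption: plain integers).
type counters struct {
	errCount  int // volumes rejected today
	warnCount int // volumes that would be rejected if enforcement were wider
}

// errMismatch is a sample SELinux context error used by the sketch.
var errMismatch = errors.New("conflicting SELinux labels for volume")

// handleContextError is a hypothetical helper. When enforce is true the
// error is counted and returned to the caller; otherwise it is counted as
// a warning and consumed, so the caller carries on.
func handleContextError(err error, enforce bool, c *counters) error {
	if err == nil {
		return nil
	}
	if enforce {
		c.errCount++
		return err
	}
	c.warnCount++
	return nil
}

func main() {
	c := &counters{}
	fmt.Println(handleContextError(errMismatch, true, c))  // error propagates
	fmt.Println(handleContextError(errMismatch, false, c)) // error consumed
	fmt.Println(c.errCount, c.warnCount)
}
```

Keeping the decision in one helper means every call site counts errors and warnings the same way.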
Jan Safranek
49148ddfd0 Extract getSELinuxLabel from AddPodToVolume
To keep the function smaller.
2022-08-04 10:51:46 +02:00
Jan Safranek
de7f5b66ed Fix existing unit tests 2022-08-04 10:51:44 +02:00
Jan Safranek
b2e18c0b20 Add metrics for SELinux context mount
Add separate _errors and _warnings metrics to distinguish volumes that were
rejected from those that will be rejected when the feature is expanded to
all access modes.
2022-08-04 10:51:43 +02:00
Jan Safranek
48b0751269 Add SELinux context tracking to volume manager
Both ActualStateOfWorld and DesiredStateOfWorld must track SELinux context
of volume mounts.
2022-08-04 10:51:41 +02:00
Kubernetes Prow Robot
442574f3a7
Merge pull request #111513 from jingxu97/july/localstorage
Promote Local storage capacity isolation feature to GA
2022-08-03 13:05:59 -07:00
Kubernetes Prow Robot
4b6134b6dc
Merge pull request #111090 from kinvolk/rata/userns-support-2022
Add support for user namespaces phase 1 (KEP 127)
2022-08-03 13:05:47 -07:00
Rodrigo Campos
138e80819e kubelet: set user namespace options
Set the user namespace options to use for the pod.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2022-08-03 19:53:22 +02:00
Giuseppe Scrivano
67b38ffe6e kubelet: propagate errors from namespacesForPod
it is a preparatory change for the next commit.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-08-03 19:53:22 +02:00
Rodrigo Campos
d07c2688fe kubelet: add GetHostIDsForPod()
In future commits we will need this to set the user/group of supported
volumes of KEP 127 - Phase 1.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2022-08-03 19:53:22 +02:00
Giuseppe Scrivano
9b2fc639a0 kubelet: add GetUserNamespaceMappings to RuntimeHelper
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-08-03 19:53:22 +02:00
Giuseppe Scrivano
63462285d5 kubelet: add userns manager
it is used to allocate and keep track of the unique users ranges
assigned to each pod that runs in a user namespace.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Co-authored-by: Rodrigo Campos <rodrigoca@microsoft.com>
2022-08-03 19:53:22 +02:00
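As a rough illustration of what 63462285d5 describes (allocating and tracking a unique range of user IDs for each pod that runs in a user namespace), here is a small allocator sketch. It is not the kubelet's userns manager: `rangeAllocator`, its methods, the fixed range length of 65536 and the first-fit strategy are all assumptions chosen to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// rangeLen is an assumed fixed length for every per-pod ID range.
const rangeLen = 65536

// rangeAllocator (hypothetical) hands out non-overlapping ID ranges and
// remembers which pod owns which range.
type rangeAllocator struct {
	firstID uint32         // first host ID available for allocation
	used    []bool         // used[i] reports whether slot i is taken
	byPod   map[string]int // pod UID -> slot index
}

func newRangeAllocator(firstID uint32, slots int) *rangeAllocator {
	return &rangeAllocator{
		firstID: firstID,
		used:    make([]bool, slots),
		byPod:   make(map[string]int),
	}
}

// start returns the first host ID of the given slot.
func (a *rangeAllocator) start(slot int) uint32 {
	return a.firstID + uint32(slot)*rangeLen
}

// allocate returns the first host ID of the pod's range. Calling it again
// for the same pod returns the same range.
func (a *rangeAllocator) allocate(podUID string) (uint32, error) {
	if slot, ok := a.byPod[podUID]; ok {
		return a.start(slot), nil
	}
	for slot, taken := range a.used {
		if !taken {
			a.used[slot] = true
			a.byPod[podUID] = slot
			return a.start(slot), nil
		}
	}
	return 0, errors.New("no free ID range")
}

// release frees the pod's range so another pod can reuse it.
func (a *rangeAllocator) release(podUID string) {
	if slot, ok := a.byPod[podUID]; ok {
		a.used[slot] = false
		delete(a.byPod, podUID)
	}
}

func main() {
	a := newRangeAllocator(65536, 2)
	first, _ := a.allocate("pod-a")
	second, _ := a.allocate("pod-b")
	_, err := a.allocate("pod-c")
	fmt.Println(first, second, err)

	a.release("pod-a")
	reused, _ := a.allocate("pod-c")
	fmt.Println(reused)
}
```

A real manager would also need to persist the assignments so they survive a kubelet restart; the sketch keeps them in memory only.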
jinxu
0064010cdd Promote Local storage capacity isolation feature to GA
This change is to promote local storage capacity isolation feature to GA

At the same time, to allow rootless systems to disable this feature because
they are unable to get the root fs, this change introduces a new kubelet config
"localStorageCapacityIsolation". It is set to true by default. Rootless
systems can set this configuration to false to disable the feature. Once it
is set to false, users cannot set ephemeral-storage request/limit because
capacity and allocatable will not be set.

Change-Id: I48a52e737c6a09e9131454db6ad31247b56c000a
2022-08-02 23:45:48 -07:00
zhangxiaoyang
7375ba4e27 add validation test for features.GracefulNodeShutdownBasedOnPodPriority 2022-08-03 14:43:00 +08:00
Vinay Kulkarni
007d93ad08 Handle UpdateContainerResources for Windows in v1alpha2 2022-08-02 15:31:00 -07:00
Vinay Kulkarni
0ef263c3b0 CRI changes to support implementation of in-place pod resize.
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources
2022-08-02 15:08:25 -07:00
Kubernetes Prow Robot
8f3b2813dc
Merge pull request #111642 from harche/evented_pleg_cri_changes
Update CRI API to support Evented PLEG
2022-08-02 13:59:16 -07:00
Kubernetes Prow Robot
369a465fae
Merge pull request #111301 from mattcary/migration-feature
Upgrade CSIMigrationGCE feature gate to GA
2022-08-02 13:58:57 -07:00
Kubernetes Prow Robot
9fb1f67af7
Merge pull request #111278 from arpitsardhana/master
KEP-3327: Add CPUManager policy option to align CPUs by Socket instead of by NUMA node
2022-08-02 13:58:45 -07:00
Kubernetes Prow Robot
d40bc18461
Merge pull request #105126 from sallyom/tracing-kubelet
kubelet tracing instrumentation
2022-08-02 11:38:06 -07:00
Harshal Patil
668b2440c5 Update CRI API to support Evented PLEG
Signed-off-by: Harshal Patil <harpatil@redhat.com>
2022-08-03 00:01:13 +05:30
Arpit Singh
d92fd8392d Adding unit test for align-by-socket policy option
Also addressed MR comments as part of same commit.
2022-08-02 11:02:07 -07:00
Arpit Singh
06f347f645 Adding validity checks for topology manager align-by-socket 2022-08-02 11:02:07 -07:00
Arpit Singh
35849bf7fb KEP-3327: Add CPUManager policy option to align CPUs by Socket instead of by NUMA node 2022-08-02 11:02:07 -07:00
Matthew Cary
e5d387c5d6 Upgrade CSIMigrationGCE feature gate to GA
Change-Id: I620bc4913765c0d6562eb1008216a72e8b0a2970
2022-08-02 09:14:27 -07:00
Michal Wozniak
04fcbd721c Introduction of a pod condition type indicating disruption. Its reason field indicates the reason:
- PreemptionByKubeScheduler (Pod preempted by kube-scheduler)
- DeletionByTaintManager (Pod deleted by taint manager due to NoExecute taint)
- EvictionByEvictionAPI (Pod evicted by Eviction API)
- DeletionByPodGC (an orphaned Pod deleted by PodGC)
2022-08-02 11:12:16 +02:00
Dmitry Verkhoturov
32df800ba7 change CPUCFSQuotaPeriod default value to 100us to match Linux default
cpu.cfs_period_us is 100μs by default despite having an "ms" unit
for some unfortunate reason. Documentation:
https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html#management

The desired effect of that change is to match
k8s default `CPUCFSQuotaPeriod` value (100ms before that change)
with one used in k8s without the `CustomCPUCFSQuotaPeriod` flag enabled
and Linux CFS (100us, 1000x smaller than 100ms).
2022-08-02 09:55:50 +02:00
Kubernetes Prow Robot
ea21947641
Merge pull request #111426 from ping035627/k8s-220726
Update design-proposals URL
2022-08-01 23:50:30 -07:00
PingWang
473be65a3c Update design-proposals URL
Signed-off-by: PingWang <wang.ping5@zte.com.cn>

update url

Signed-off-by: PingWang <wang.ping5@zte.com.cn>
2022-08-02 09:13:38 +08:00
Kubernetes Prow Robot
2e1a4da8df
Merge pull request #111358 from ddebroy/hasnet1
Introduce PodHasNetwork condition for pods
2022-08-01 15:04:52 -07:00
Kubernetes Prow Robot
acc64759f5
Merge pull request #111549 from claudiubelu/log-compression
Fixes kubelet log compression on Windows
2022-08-01 13:18:41 -07:00
Sally O'Malley
9e4e0bb48a
add runtime-service test with tracerProvider
Signed-off-by: Sally O'Malley <somalley@redhat.com>
2022-08-01 12:55:21 -04:00
Sally O'Malley
0d558c51b5
add otelrestful restful.FilterFunction
Signed-off-by: Sally O'Malley <somalley@redhat.com>
2022-08-01 12:55:19 -04:00
Sally O'Malley
7585aae1b4
kubelet-tracing:update
Signed-off-by: Sally O'Malley <somalley@redhat.com>
2022-08-01 12:55:16 -04:00
Sally O'Malley
5b4456ceea
kubelet tracing: generated files
Signed-off-by: Sally O'Malley <somalley@redhat.com>
2022-08-01 12:55:14 -04:00
Sally O'Malley
47e7d8034f
kubelet tracing
Signed-off-by: Sally O'Malley <somalley@redhat.com>
Co-authored-by: David Ashpole <dashpole@google.com>
2022-08-01 12:55:02 -04:00
Deep Debroy
dfdf8245bb Introduce PodHasNetwork condition for pods
Signed-off-by: Deep Debroy <ddebroy@gmail.com>
2022-08-01 09:51:43 -07:00
Kubernetes Prow Robot
ef8e7c471e
Merge pull request #110291 from danwinship/kep-3178-iptables-cleanup-kubelet
Implement KEP-3178 "iptables cleanup" in kubelet
2022-08-01 07:50:40 -07:00
Sascha Grunert
584783ee9f
Partly remove support for seccomp annotations
We now partly drop the support for seccomp annotations which is planned
for v1.25 as part of the KEP:

https://github.com/kubernetes/enhancements/issues/135

Pod security policies are not touched by this change and therefore we
have to keep the annotation key constants.

This means we only allow the usage of the annotations for backwards
compatibility reasons while the synchronization of the field to
annotation is no longer supported. Using the annotations for static pods
is also not supported any more.

Making the annotations fully non-functional will be deferred to a
future release.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2022-08-01 09:19:29 +02:00
Kubernetes Prow Robot
bebea5f950
Merge pull request #111152 from sivchari/fix-refer-url
fix: refer to url of Node Allocatable
2022-07-31 20:32:39 -07:00
Kubernetes Prow Robot
dd54a044ea
Merge pull request #110940 from pacoxu/ga-disable-accelerator
Disable AcceleratorUsage Metrics: ga
2022-07-31 20:32:28 -07:00
Kubernetes Prow Robot
2e64ae6d62
Merge pull request #110733 from psschwei/probe-grace-period-units
Add unit tests for grace period in killContainer func
2022-07-29 22:30:27 -07:00
Paco Xu
e073b0fd65 Disable AcceleratorUsage Metrics: ga 2022-07-30 12:31:43 +08:00
inosato
3b95d3b076 Remove ioutil in kubelet and its tests
Signed-off-by: inosato <si17_21@yahoo.co.jp>
2022-07-30 12:35:26 +09:00
Kubernetes Prow Robot
25cdaccf0d
Merge pull request #111439 from claudiubelu/fix-plugin-watcher
kubelet: Fixes plugin Watcher for Windows
2022-07-29 19:29:44 -07:00
Kubernetes Prow Robot
d838a8647b
Merge pull request #111418 from muyangren2/winstats_assert
Fix test order pkg/kubelet/winstats/winstats_test.go
2022-07-29 19:29:29 -07:00
Kubernetes Prow Robot
cf2800b812
Merge pull request #111402 from verb/111030-ec-ga
Promote EphemeralContainers feature to GA
2022-07-29 19:29:20 -07:00
Kubernetes Prow Robot
ca34eb1383
Merge pull request #111020 from claudiubelu/adds-unittests-5
unittests: Adds Windows unittests
2022-07-29 19:29:11 -07:00
Kubernetes Prow Robot
5d446b205e
Merge pull request #106244 from cncal/fix-state-checkpoint-testcase
fix test for CheckpointStateRestore
2022-07-29 15:41:14 -07:00
Dmitry Verkhoturov
5126192548 clarify cpu.cfs_period_us default value
cpu.cfs_period_us is 100μs by default despite having an "ms" unit
for some unfortunate reason. Documentation:
https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html#management

The desired effect of that change is more clarity on the default value,
so users are aware that a custom value of 10ms is not 0.1x of the
default, but 100x of it.
2022-07-29 23:02:35 +02:00