10171 Commits

Author SHA1 Message Date
Shiming Zhang
d82f606970 Add field for KubeletConfiguration and Regenerate 2021-11-17 11:47:12 +08:00
Kubernetes Prow Robot
1f6d5caa9a
Merge pull request #105437 from cmssczy/update-kubelet-configuration
migrate --register-with-taints to KubeletConfiguration
2021-11-16 17:44:00 -08:00
menglong.qi
b886b9b108 fix: typo 2021-11-17 09:22:57 +08:00
Kubernetes Prow Robot
42d8b2f3b9
Merge pull request #106289 from CatherineF-dev/fix-metrics-AlreadyRegisteredError-in-unit-test
Fix metrics AlreadyRegisteredError on TestRecordOperation and TestGetHistogramVecFromGatherer unit test
2021-11-16 16:36:15 -08:00
Kubernetes Prow Robot
6805e6ee41
Merge pull request #104722 from leiyiz/migration
turning on the CSIMigrationGCE feature flag
2021-11-16 15:28:32 -08:00
Léiyì Zhang
275fdf0884 fixing unit test failures induced by turning on CSIMigrationGCE
disable CSIMigrationGCE in some unit tests
2021-11-16 19:26:30 +00:00
CatherineF-dev
5646120fbb Use Reset at first 2021-11-16 18:57:24 +00:00
haoyun
b5409adaeb refactor: extract multiple ignore errors validate to ignoreError
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-11-16 20:43:50 +08:00
caozhiyuan
bad4faf1b9 migrate --register-with-taints to KubeletConfiguration 2021-11-16 19:10:36 +08:00
Kubernetes Prow Robot
1d1d462d2f
Merge pull request #104287 from jsturtevant/windows-stats
Reduce the number of expensive calls in the Windows stats queries for dockershim
2021-11-15 18:51:37 -08:00
Kubernetes Prow Robot
0473cab823
Merge pull request #103299 from wgahnagl/addPinned
prevents garbage collection from removing pinned images
2021-11-15 18:51:25 -08:00
Kubernetes Prow Robot
39af75af30
Merge pull request #106201 from yxxhero/fea_106111
Add more msg when exec probe timeout
2021-11-15 17:51:37 -08:00
Kubernetes Prow Robot
463802765d
Merge pull request #104650 from yxxhero/initcontainer_oomkiil_as_a_failure
fix init container oomkilled as a failure
2021-11-15 17:51:25 -08:00
Kubernetes Prow Robot
b7c4962472
Merge pull request #105685 from liggitt/kubelet-file-test
Simplify kubelet file config field allowlists
2021-11-15 14:06:48 -08:00
Odin Ugedal
de0ece541c
Fix cpu share issues on systems with large amounts of cpu
On systems where the calculated cpu shares results in a value above the
max value in linux, containers getting that value are unable to start.
This occurs on systems with 300+ cpu cores, and where containers are
given such a value.

This issue was fixed for the pod and qos control groups in the similar
cm.MilliCPUToShares that also has tests verifying the behavior. Since
this code already has a dependency on kubelet/cm, let's reuse that code
instead.
2021-11-14 19:49:19 +00:00
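The clamping described in the commit above can be sketched as follows. This is a minimal illustration, not the kubelet's code: the function name `milliCPUToShares` and the constant names are assumptions made here, and the bounds 2 and 262144 are the cgroup v1 `cpu.shares` limits on Linux.

```go
package main

import "fmt"

const (
	minShares     = 2      // lowest cpu.shares value accepted by Linux (cgroup v1)
	maxShares     = 262144 // highest cpu.shares value accepted by Linux (cgroup v1)
	sharesPerCPU  = 1024
	milliCPUToCPU = 1000
)

// milliCPUToShares converts a milli-CPU amount into cpu.shares and clamps
// the result into the range the kernel accepts. Hypothetical helper name.
func milliCPUToShares(milliCPU int64) uint64 {
	if milliCPU == 0 {
		// An unset amount maps to the minimum rather than to zero.
		return minShares
	}
	shares := (milliCPU * sharesPerCPU) / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		// Without this clamp, a container asking for 300 CPUs would get
		// 307200 shares, which the kernel rejects.
		return maxShares
	}
	return uint64(shares)
}

func main() {
	fmt.Println(milliCPUToShares(500))    // half a CPU
	fmt.Println(milliCPUToShares(300000)) // 300 CPUs, clamped to the maximum
}
```

The point of reusing one conversion is that every cgroup level applies the same upper bound, so a value that is valid for the pod cgroup is also valid for the container.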
Kubernetes Prow Robot
e4c795168b
Merge pull request #106332 from bobbypage/disable-memcg-notifier
kubelet: cgroupv2 disable memcg notifications
2021-11-12 18:36:46 -08:00
CatherineF-dev
d9737eabf4 Use HandlerFor 2021-11-12 23:09:51 +00:00
CatherineF-dev
49d341aa2b Use defer in non-loop 2021-11-12 23:03:38 +00:00
Kubernetes Prow Robot
1f6aa87a93
Merge pull request #105744 from jsturtevant/windows-containerd-networkstats
Get Windows network stats directly for Containerd
2021-11-12 12:36:41 -08:00
Kubernetes Prow Robot
5f0a94b23c
Merge pull request #104743 from gjkim42/ensure-pod-uniqueness
Ensure there is one running static pod with the same full name
2021-11-12 12:36:28 -08:00
Kubernetes Prow Robot
6c04f87470
Merge pull request #106382 from rphillips/fix_close_log
kubelet: fix file descriptor leak in log rotations
2021-11-12 09:22:40 -08:00
Neha Lohia
fa1b6765d5
move pkg/util/node to component-helpers/node/util (#105347)
Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com>
2021-11-12 07:52:27 -08:00
CatherineF-dev
a30af261f1 remove lint 2021-11-12 15:03:44 +00:00
Ryan Phillips
d6f9df424a defer close the rotated log open 2021-11-12 08:13:24 -06:00
CatherineF-dev
a8324a3bb7 clean 2021-11-12 03:52:19 +00:00
CatherineF-dev
744785ee40 remove prometheus.DefaultRegisterer 2021-11-12 02:17:28 +00:00
Kubernetes Prow Robot
3ca3daac76
Merge pull request #103415 from tiloso/staticcheck-kubelet
Fix staticcheck failure in pkg/kubelet/cm/cpuset
2021-11-11 15:15:13 -08:00
Gunju Kim
2dd4a00509
kubelet: Remove false PLEG errors 2021-11-12 00:03:01 +09:00
David Porter
f5140d3145 kubelet: cgroupv2 disable memcg notifications
The current memory notifier on cgroupv2 relies on reading
`cgroup.event_control` which is unsupported on cgroupv2. For now, let's
disable the feature on cgroupv2.
2021-11-10 15:40:59 -08:00
ravisantoshgudimetla
696abecada [test][kubelet]: Fix out of bounds in TestSyncLabels unit 2021-11-10 16:53:59 -05:00
James Sturtevant
ab2e58c416 Get networks stats directly 2021-11-10 12:43:56 -08:00
James Sturtevant
c39945c116 Add unit tests to existing code 2021-11-10 11:50:04 -08:00
James Sturtevant
3564cd5beb Reduce calls to docker from dockershim for stats 2021-11-10 11:25:03 -08:00
Kubernetes Prow Robot
b56dc43458
Merge pull request #106282 from bobbypage/cadvisor-v043
vendor: Bump cAdvisor to v0.43.0
2021-11-10 08:17:38 -08:00
CatherineF-dev
8290400e9c format 2021-11-10 03:29:13 +00:00
CatherineF-dev
ef0b2dfbf4 Fix metrics AlreadyRegisteredError on TestRecordOperation and TestGetHistogramVecFromGatherer unit test 2021-11-10 03:23:54 +00:00
Kubernetes Prow Robot
5d60c8d857
Merge pull request #102393 from mengjiao-liu/fix-sysctl-regex
Upgrade preparation to verify sysctl values containing forward slashes by regex
2021-11-09 18:23:26 -08:00
David Porter
b6269ce5de kubelet: update cAdvisor usage for v0.43
* Change cAdvisor manager constructor
* Change call to adding AcceleratorUsageMetrics

Signed-off-by: David Porter <david@porter.me>
2021-11-09 17:09:12 -08:00
Kubernetes Prow Robot
6ac2d8edc8
Merge pull request #105967 from shivanshu1333/feature2/master/105841
Migrated scheduler files `preemption.go`, `stateful.go`, `resource_allocation.go` to structured logging
2021-11-09 10:28:01 -08:00
ravisantoshgudimetla
889d45d3fb [kubelet] Reject pods with OS field mismatch
Once kubernetes#104613 and kubernetes#104693
merge, we'll have OS field in pod spec. Kubelet should start rejecting pods
where pod.Spec.OS and node's OS(using runtime.GOOS) won't match
2021-11-08 19:18:15 -05:00
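The admission check described in the commit above might look roughly like this. It is a sketch under stated assumptions: the `podOS` type and the `rejectForOSMismatch` function are hypothetical stand-ins, not the kubelet's types, and treating an unset OS field as acceptable is an assumption of this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// podOS is a hypothetical stand-in for the OS field of a pod spec.
type podOS struct {
	Name string
}

// rejectForOSMismatch reports whether a pod should be rejected because it
// declares an OS that differs from the node's OS. A nil OS field means the
// pod made no declaration, which this sketch accepts.
func rejectForOSMismatch(os *podOS, nodeOS string) bool {
	return os != nil && os.Name != nodeOS
}

func main() {
	nodeOS := runtime.GOOS // the commit message names runtime.GOOS as the node's OS

	fmt.Println(rejectForOSMismatch(nil, nodeOS))                        // no declaration
	fmt.Println(rejectForOSMismatch(&podOS{Name: nodeOS}, nodeOS))       // matching OS
	fmt.Println(rejectForOSMismatch(&podOS{Name: "not-" + nodeOS}, nodeOS)) // mismatch
}
```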
Kubernetes Prow Robot
cda360c59f
Merge pull request #104613 from ravisantoshgudimetla/reconcile-labels
[kubelet]: Reconcile OS and arch labels periodically
2021-11-08 14:15:19 -08:00
Kubernetes Prow Robot
8b463cd141
Merge pull request #105406 from marosset/kubelet-metrics-for-host-process-containers
Adding kubelet metrics for started and failed to start HostProcess containers
2021-11-08 13:11:20 -08:00
Shivanshu Raj Shrivastava
f4aad52885
migrated preemption.go, stateful.go, resource_allocation.go to structured logging 2021-11-08 22:52:47 +05:30
Kubernetes Prow Robot
33de444861
Merge pull request #103095 from haircommander/podAndContainerStatsFromCRI-feature-gate
Kubelet: implement support for podAndContainerStatsFromCRI
2021-11-07 18:26:53 -08:00
yxxhero
4211826c3c add more msg when exec probe timeout
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-11-06 15:59:22 +08:00
ravisantoshgudimetla
21c5c2ec5c [kubelet][podadmission]: Validate and reject pods with mismatching labels 2021-11-05 18:47:43 -04:00
ravisantoshgudimetla
02c1bac0b6 [kubelet]: Sync label periodically 2021-11-05 18:47:43 -04:00
Mark Rossetti
ef324d6bbd Adding kubelet metrics for started and failed to start HostProcess containers
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2021-11-04 14:39:57 -07:00
Andy Pan
3033a64135 kubelet/eviction: eliminate redundant allocations when handling eventfd 2021-11-04 15:41:46 +08:00
Mengjiao Liu
275d832ce2 Upgrade preparation to verify sysctl values containing forward slashes by regex 2021-11-04 11:49:56 +08:00
Skyler Clark
e9766c2b81
adds pinned field to imageRecords 2021-11-03 14:47:37 -04:00
Patrick Ohly
3948cb8d1b component-base: move v/vmodule/log-flush-frequency into LoggingConfiguration
These three options are the ones from logs.AddFlags which are not deprecated.
Therefore it makes sense to make them available also via the configuration file
support in the one command which currently supports that (kubelet).

Long-term, all commands should use LoggingConfiguration, either with a
configuration file (as in kubelet) or via flags (kube-scheduler,
kube-apiserver, kube-controller-manager).

Short-term, both approaches have to be supported. As the majority of the
commands only use logs.AddFlags, that function by default continues to register
the flags and only leaves that to Options.AddFlags when explicitly requested.

A drive-by bug fix is done for log flushing: the periodic flushing called
klog.Flush and therefore missed explicit flushing of the newer logr
backend. This bug was never present in any Kubernetes release and therefore the
fix is not submitted in a separate PR.
2021-11-03 07:41:46 +01:00
Kubernetes Prow Robot
aa0ea62489
Merge pull request #104903 from ikeeip/storageobjectinuseprotection_feature_ga_cleanup
Remove StorageObjectInUseProtection feature gate logic
2021-11-02 20:22:57 -07:00
Kubernetes Prow Robot
359b722c19
Merge pull request #102882 from fromanirh/device-manager-checkpoints
devicemanager: checkpoint: support pre-1.20 data
2021-11-02 16:56:57 -07:00
Konstantin Misyutin
808c8f42d5 Remove StorageObjectInUseProtection feature gate logic
This feature has graduated to GA in v1.11 and will always be
enabled. So there is no longer a need to check whether it is enabled.

Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
2021-11-03 00:13:50 +03:00
Skyler Clark
d3ae0a381a
prevents garbage collection from removing pinned images 2021-11-02 14:43:02 -04:00
Jordan Liggitt
94d0c0f78e Simplify kubelet file config field allowlists 2021-11-02 10:23:54 -04:00
Kubernetes Prow Robot
08bf54678e
Merge pull request #101909 from nolancon/cpu-mgr-testing
Additional cases for reconcileState testing
2021-10-30 00:01:17 -07:00
Tim Hockin
11a25bfeb6
De-share the Handler struct in core API (#105979)
* De-share the Handler struct in core API

An upcoming PR adds a handler that only applies on one of these paths.
Having fields that don't work seems bad.

This never should have been shared.  Lifecycle hooks are like a "write"
while probes are more like a "read". HTTPGet and TCPSocket don't really
make sense as lifecycle hooks (but I can't take that back). When we add
gRPC, it is EXPLICITLY a health check (defined by gRPC) not an arbitrary
RPC - so a probe makes sense but a hook does not.

In the future I can also see adding lifecycle hooks that don't make
sense as probes.  E.g. 'sleep' is a common lifecycle request. The only
option is `exec`, which requires having a sleep binary in your image.

* Run update scripts
2021-10-29 13:15:11 -07:00
Peter Hunt
6b3f8e5662 kubelet: fallback to partial CRI stats if full fails
This is partially to allow the kube alpha tests to pass until CRI implementations have support, but also to handle this error situation a bit more elegantly

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-10-29 09:40:20 -04:00
Peter Hunt
feb5f5e0ed kubelet: use helper function to check for nil fields in sandbox stats
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-10-29 09:40:20 -04:00
Peter Hunt
85e8a4bf73 kubelet stats: use UsageNanoCores if available
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-10-29 09:40:20 -04:00
Peter Hunt
ffdb4b9c4a kubelet: slightly move around some cri stats functions
to reduce duplication and add clarity

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-10-29 09:40:20 -04:00
Peter Hunt
d2c436700e kubelet stats: add support for podAndContainerStatsFromCRI
This commit adds an initial implementation of translating from the new CRI fields
to the /stats/summary PodStats object

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-10-29 09:40:20 -04:00
Peter Hunt
7866287ba1 kubelet stats: wire up podAndContainerStatsFromCRI feature gate
though it is currently unused

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-10-29 09:40:20 -04:00
Kubernetes Prow Robot
c592bd40f2
Merge pull request #105609 from pohly/generic-ephemeral-volume-ga
generic ephemeral volume GA
2021-10-28 17:36:50 -07:00
Francesco Romani
2f426fdba6 devicemanager: checkpoint: support pre-1.20 data
The commit a8b8995ef2
changed the content of the data kubelet writes in the checkpoint.
Unfortunately, the checkpoint restore code was not updated,
so if we upgrade kubelet from pre-1.20 to 1.20+, the
device manager can no longer restore its state correctly.

The only trace of this misbehaviour is this line in the
kubelet logs:
```
W0615 07:31:49.744770    4852 manager.go:244] Continue after failing to read checkpoint file. Device allocation info may NOT be up-to-date. Err: json: cannot unmarshal array into Go struct field PodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type checkpoint.DevicesPerNUMA
```

If we hit this bug, the device allocation info is
indeed NOT up-to-date up until the device plugins register
themselves again. This can take up to a few minutes, depending
on the specific device plugin.

While the device manager state is inconsistent:
1. the kubelet will NOT update the device availability to zero, so
   the scheduler will send pods towards the inconsistent kubelet.
2. at pod admission time, the device manager allocation will not
   trigger, so pods will be admitted without devices actually
   being allocated to them.

To fix these issues, we add support to the device manager to
read pre-1.20 checkpoint data. We retroactively call this
format "v1".

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-26 09:54:11 +02:00
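One way to read both checkpoint layouts is to try the newer one first and fall back to the older one when decoding fails, as in the sketch below. The type names, the JSON layout, and the choice to file pre-1.20 devices under NUMA node 0 are all assumptions made for illustration; only the shape of the error (an array where a per-NUMA map is expected) comes from the log line quoted above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// devicesPerNUMA maps a NUMA node ID to device IDs (the newer layout).
type devicesPerNUMA map[int64][]string

// entryV2 is a hypothetical entry in the newer layout.
type entryV2 struct {
	DeviceIDs devicesPerNUMA
}

// entryV1 is a hypothetical entry in the pre-1.20 layout: a flat list.
type entryV1 struct {
	DeviceIDs []string
}

// decodeEntry tries the newer layout first. If that fails, for example
// because DeviceIDs is a JSON array, it retries with the older layout and
// converts the result. Placing all devices on node 0 is illustrative only.
func decodeEntry(data []byte) (devicesPerNUMA, error) {
	var v2 entryV2
	if err := json.Unmarshal(data, &v2); err == nil {
		return v2.DeviceIDs, nil
	}
	var v1 entryV1
	if err := json.Unmarshal(data, &v1); err != nil {
		return nil, fmt.Errorf("neither checkpoint layout matched: %w", err)
	}
	return devicesPerNUMA{0: v1.DeviceIDs}, nil
}

func main() {
	newer := []byte(`{"DeviceIDs":{"1":["dev-a"]}}`)
	older := []byte(`{"DeviceIDs":["dev-a","dev-b"]}`)

	fmt.Println(decodeEntry(newer))
	fmt.Println(decodeEntry(older))
}
```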
Kubernetes Prow Robot
17da6a2345
Merge pull request #105699 from yuzhiquan/remove-format-pods
Remove format.pods func, instead with klog.Kobjs
2021-10-25 15:53:30 -07:00
Yuan Chen
b99495d1d9 Fix and improve comments on kubelet metrics 2021-10-21 17:38:25 -07:00
Eric Ernst
2c0fad1f52 kuberuntime: populate sandbox resources, overhead
Populate Resources and Overhead fields, which are now part of
LinuxPodSandboxConfig.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-20 11:30:23 -07:00
Eric Ernst
ddcf815d12 kuberuntime: refactor linux resources for better reuse
Separate the CPU/Memory req/limit -> linux resource conversion into its
own function for better reuse.

Elsewhere in kuberuntime pkg, we will want to leverage this
requests/limits to Linux Resource type conversion.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-20 11:30:23 -07:00
Eric Ernst
b1361aed93 kuberuntime: augment linux container config unit test
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-20 11:30:23 -07:00
Eric Ernst
a73502a0be kuberuntime: augment linux container config unit test
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-20 11:29:22 -07:00
Kubernetes Prow Robot
b2c4269992
Merge pull request #105631 from klueska/upstream-distribute-cpus-across-numa
Add CPUManager policy option to distribute CPUs across NUMA nodes instead of packing them
2021-10-19 11:40:24 -07:00
Gunju Kim
3bce245279
Ensure there is one running static pod with the same full name 2021-10-19 16:30:18 +09:00
Kubernetes Prow Robot
1af8a8c026
Merge pull request #105465 from marosset/remove-host-process-contianer-kubelet-annotations
Stop passing WindowsHostProcessContainer annotations for CRI calls in kubelet
2021-10-18 15:50:02 -07:00
Kubernetes Prow Robot
e595d79dfc
Merge pull request #104574 from 249043822/br-repeat-package
fix duplicate package import in pod_worker
2021-10-18 15:49:46 -07:00
Kubernetes Prow Robot
5889fb4fbc
Merge pull request #105652 from wzshiming/feat/structure-shutdown-config
Refactor to use structure to pass parameters for GracefulNodeShutdown
2021-10-18 14:45:20 -07:00
Kevin Klues
86f9c266bc Add optimizations to reduce iterations in distributed NUMA algorithm
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-18 08:53:25 +00:00
Kevin Klues
70e0f47191 Support full-pcpus-only with the new NUMA distribution policy option
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
d54445a84d Generalize the NUMA distribution algorithm to take cpuGroupSize
This parameter ensures that CPUs are always allocated in groups of size
'cpuGroupSize'. This is important, for example, to ensure that all CPUs (i.e.
hyperthreads) from the same core are handed out together.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
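The effect of `cpuGroupSize` on an even distribution can be sketched as below. This is a deliberately simplified model, not the CPUManager algorithm: it ignores which CPUs are actually free on each NUMA node, and the function name `distributeAcrossNUMA` is hypothetical. It only shows that CPUs are handed out in whole groups and spread as evenly as the group size allows.

```go
package main

import (
	"errors"
	"fmt"
)

// distributeAcrossNUMA returns how many CPUs each NUMA node contributes when
// numCPUs are spread over numNodes in indivisible groups of cpuGroupSize.
// Leftover groups go to the lowest-numbered nodes.
func distributeAcrossNUMA(numCPUs, numNodes, cpuGroupSize int) ([]int, error) {
	if numCPUs <= 0 || numNodes <= 0 || cpuGroupSize <= 0 {
		return nil, errors.New("all arguments must be positive")
	}
	if numCPUs%cpuGroupSize != 0 {
		// A partial group would split, for example, two hyperthreads of one core.
		return nil, fmt.Errorf("%d CPUs cannot be split into groups of %d", numCPUs, cpuGroupSize)
	}
	groups := numCPUs / cpuGroupSize
	perNode := make([]int, numNodes)
	for i := range perNode {
		n := groups / numNodes
		if i < groups%numNodes {
			n++ // hand one leftover group to this node
		}
		perNode[i] = n * cpuGroupSize
	}
	return perNode, nil
}

func main() {
	fmt.Println(distributeAcrossNUMA(8, 2, 2)) // four groups over two nodes
	fmt.Println(distributeAcrossNUMA(6, 2, 2)) // three groups over two nodes
	fmt.Println(distributeAcrossNUMA(5, 2, 2)) // not a whole number of groups
}
```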
Kevin Klues
1436e33642 Add more extensive testing for NUMA distribution algorithm in CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
cf3afb8602 Add 2 distinguishing test cases between the 2 takeByTopology algorithms
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
eb78e2406b Add a new TestTakeByTopologyNUMADistributed() test to the CPUManager
As part of this, pull out all of the existing "TakeByTopology" tests and have
them be called by the original TestTakeByTopologyNUMAPacked() as well as the
new TestTakeByTopologyNUMADistributed() test. In a subsequent commit, we will
add some tests that should differ between these two algorithms.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
876dd9b078 Added algorithm to CPUManager to distribute CPUs across NUMA nodes
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
462544d079 Split CPUManager takeByTopology() into two different algorithms
The first implements the original algorithm which packs CPUs onto NUMA nodes if
more than one NUMA node is required to satisfy the allocation. The second
distributes CPUs across NUMA nodes if they can't all fit into one.

The "distributing" algorithm is currently a noop and just returns an error of
"unimplemented". A subsequent commit will add the logic to implement this
algorithm according to KEP 2902:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
Kevin Klues
0e7928edce Add new CPUManager policy option for "distribute-cpus-across-numa"
This commit only adds the option to the policy options framework. A
subsequent commit will add the logic to utilize it.

The KEP describing this new option can be found here:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
yuzhiquanlong
27fe56e916 remove unused import 2021-10-15 18:40:31 +08:00
Francesco Romani
4bae656835 cpumanager: test NUMA node support for CPU assign (2)
This batch of tests adds a fake topology on which each numa node
has multiple sockets. We have not yet found a real HW topology in the wild
like this, but we need one to fully exercise the code.

So, until we find a HW topology, we add a fake one flipping
the NUMA/socket config of the existing xeon dual gold 6320.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
547996f3f6 cpumanager: test NUMA node support for CPU assign (1)
This batch of tests adds a real topology on which each physical socket
has multiple NUMA zones. Taken from a real dual xeon 6320 gold.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
f6ccc4426a cpumanager: test: use proper subtests
The existing unit tests were performing subtests without
actually using the full features of the testing package
(https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks)

Update them with fairly minimal changes. The patch is deceptively
large because we need to move the code inside a new block.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
15caa134b2 cpumanager: topology: use rich cmp package
Use `cmp.Diff` in the unit tests, moving away from
`reflect.DeepEqual`. This gives us a clearer picture of the differences
when the tests fail.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Kevin Klues
aff54a0914 Abstract out whether NUMA or Sockets come first in the memory hierarchy
This allows us to get rid of the check for determining which one is higher all
throughout the code. Now we just check once and instantiate an interface of the
appropriate type that makes sure the ordering in the hierarchy is preserved
through the appropriate calls.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 10:29:15 +00:00
yuzhiquanlong
be9e1fda5e remove format pods func, instead with klog.Kobjs 2021-10-15 18:26:02 +08:00
Kevin Klues
17c7e86c6d Add NUMA support to the CPU assignment algorithm in the CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 08:35:59 +00:00
Shiming Zhang
e47c78a354 Add log for creating node shutdown manager 2021-10-15 11:16:21 +08:00
Shiming Zhang
b468c24e85 Refactor to use structure to pass parameters 2021-10-15 11:16:21 +08:00
Kubernetes Prow Robot
a923852ba0
Merge pull request #105215 from rphillips/add_probe_shutdown
kubelet: add probe termination to graceful shutdowns
2021-10-11 21:19:46 -07:00
Patrick Ohly
a8c930ef46 generic ephemeral volume: graduation to GA
The feature gate gets locked to "true", with the goal to remove it in two
releases.

All code now can assume that the feature is enabled. Tests for "feature
disabled" are no longer needed and get removed.

Some code wasn't using the new helper functions yet. That gets changed while
touching those lines.
2021-10-11 20:54:20 +02:00
nolancon
6bbb36df10 Additional cases for reconcileState testing 2021-10-11 16:17:21 +00:00
Kubernetes Prow Robot
dc9c571166
Merge pull request #105569 from pohly/generic-ephemeral-kubelet-volume-stats
kubelet: also provide filesystem stats for generic ephemeral volumes
2021-10-11 07:52:39 -07:00
Kubernetes Prow Robot
1f2813368e
Merge pull request #105542 from pohly/generic-ephemeral-volume-util-kubelet
kubelet: use generic ephemeral volume helper functions
2021-10-11 02:16:40 -07:00
Kubernetes Prow Robot
fb82a0d7eb
Merge pull request #104873 from pohly/json-output-stream
JSON output streams
2021-10-10 17:04:37 -07:00
Patrick Ohly
b22263d835 component-base: configurable JSON output
This implements the replacement of klog output to different files per level
with optionally splitting JSON output into two streams: one for info messages
on stdout, one for error messages on stderr. The info messages can get buffered
to increase performance. Because stdout and stderr might be merged by the
consumer, the info stream gets flushed before writing an error, to ensure that
the order of messages is preserved.

This also ensures that the following code pattern doesn't leak info messages:
   klog.ErrorS(err, ...)
   os.Exit(1)

Commands explicitly have to flush before exiting via logs.FlushLogs. Most
already do. But buffered info messages can still get lost during an unexpected
program termination, therefore buffering is off by default.

The new options get added to the v1alpha1 LoggingConfiguration with new command
line flags. Because it is an alpha field, changing it inside the v1beta kubelet
config should be okay as long as the fields are clearly marked as alpha.
2021-10-09 10:10:35 +02:00
Kubernetes Prow Robot
63f66e6c99
Merge pull request #105012 from fromanirh/cpumanager-policy-options-beta
node: graduate CPUManagerPolicyOptions to beta
2021-10-08 07:32:59 -07:00
Kubernetes Prow Robot
2face135c7
Merge pull request #97415 from AlexeyPerevalov/ExcludeSharedPoolFromPodResources
Return only isolated cpus in podresources interface
2021-10-08 05:58:58 -07:00
Patrick Ohly
b1ba381ef8 kubelet: also provide filesystem stats for generic ephemeral volumes
When checking for a reference to a PVC, the code also needs to consider that a
PVC might be referenced indirectly through an ephemeral volume source.
2021-10-08 12:11:52 +02:00
Kubernetes Prow Robot
dd650bd41f
Merge pull request #105527 from rphillips/fixes/filter_terminated_pods
kubelet: set terminated podWorker status for terminated pods
2021-10-07 22:19:51 -07:00
Ryan Phillips
0166d446b9 kubelet: set terminated podWorker status for terminated pods 2021-10-07 16:18:59 -05:00
Patrick Ohly
844662e7fa kubelet: use generic ephemeral volume helper functions
The name concatenation and ownership check were originally considered small
enough to not warrant dedicated functions, but the intent of the code is more
readable with them.
2021-10-07 17:31:54 +02:00
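The two helpers mentioned in the commit above, name concatenation and an ownership check, could look roughly like the following. The types and function names are hypothetical simplifications, and the naming rule (pod name, a hyphen, then the volume name) is assumed here from the generic ephemeral volume convention rather than taken from this log.

```go
package main

import "fmt"

// pod and claim are hypothetical, stripped-down stand-ins for the API objects.
type pod struct {
	Name string
	UID  string
}

type claim struct {
	Name     string
	OwnerUID string
}

// volumeClaimName builds the name of the claim that backs a generic
// ephemeral volume of the given pod.
func volumeClaimName(p pod, volumeName string) string {
	return p.Name + "-" + volumeName
}

// claimIsForPod checks that the claim was created for this pod, so that an
// unrelated claim that merely has the same name is not used by accident.
func claimIsForPod(p pod, c claim) error {
	if c.OwnerUID != p.UID {
		return fmt.Errorf("claim %q is not owned by pod %q", c.Name, p.Name)
	}
	return nil
}

func main() {
	p := pod{Name: "web", UID: "uid-1"}
	name := volumeClaimName(p, "scratch")
	fmt.Println(name)

	fmt.Println(claimIsForPod(p, claim{Name: name, OwnerUID: "uid-1"}))
	fmt.Println(claimIsForPod(p, claim{Name: name, OwnerUID: "uid-2"}))
}
```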
Alexey Perevalov
5d9032007a Return only isolated cpus in podresources interface
Co-Authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2021-10-07 15:34:08 +01:00
Kubernetes Prow Robot
c4d802b0b5
Merge pull request #103289 from AlexeyPerevalov/DoNotExportEmptyTopology
podresources: do not export empty NUMA topology
2021-10-07 07:11:46 -07:00
Kubernetes Prow Robot
907d62eac8
Merge pull request #105462 from ehashman/merge-terminal-phase
Ensure terminal pods maintain terminal status
2021-10-05 13:12:58 -07:00
Mark Rossetti
99e43bfa8c Stop passing WindowsHostProcessContainer annotations for CRI calls in kubelet
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2021-10-05 10:08:53 -07:00
Elana Hashman
3005ef34f2
Ensure terminal pods maintain terminal status 2021-10-05 09:26:27 -07:00
Kubernetes Prow Robot
c91f9bdc60
Merge pull request #104689 from cynepco3hahue/memory_manager_restricted_policy_fix
kubelet: memory manager: fix preferred topology hints calculation
2021-10-05 06:47:08 -07:00
Kubernetes Prow Robot
efa9029a0d
Merge pull request #104920 from tkashem/response-writer-cleanup
apiserver: decorate http.ResponseWriter correctly
2021-10-05 00:53:09 -07:00
Elana Hashman
5ff6c2396d
Do not sync Waiting statuses for Terminated pods 2021-10-04 11:05:54 -07:00
Abu Kashem
0d50c969c5
apiserver: wrap ResponseWriter using abstraction 2021-10-04 10:59:11 -04:00
Kubernetes Prow Robot
e414cf7641
Merge pull request #100482 from pohly/generic-ephemeral-volume-checks
generic ephemeral volume checks
2021-10-01 10:47:22 -07:00
Patrick Ohly
1e26115df5 consider ephemeral volumes for host path and node limits check
When adding the ephemeral volume feature, the special case for
PersistentVolumeClaim volume sources in kubelet's host path and node
limits checks was overlooked. An ephemeral volume source is another
way of referencing a claim and has to be treated the same way.
2021-10-01 17:03:44 +02:00
Kubernetes Prow Robot
883250145c
Merge pull request #104788 from 249043822/memorymanager-br
Fix initContainersReusableMemory delete bug in MemoryManager
2021-10-01 05:27:22 -07:00
Kubernetes Prow Robot
cab54856f1
Merge pull request #104933 from vikramcse/automate_mockery
conversion of tests from mockery to mockgen
2021-09-30 18:33:21 -07:00
Shuhei Kitagawa
ef0eff14ab
Add tests kubelet default config (#105116)
* Use utilpointer to get a pointer

* Add tests for kubelet default configs

* Change copyright year from 2015 to 2021

* Run gofmt

* Add all negative and all positive test cases
2021-09-30 17:29:33 -07:00
Francesco Romani
077c0aa1be node: graduate CPUManagerPolicyOptions to beta
We graduate the `CPUManagerPolicyOptions` feature to beta
in the 1.23 cycle, and we add new experimental feature gates
to guard new options which are planned in the 1.23 and in the
following cycles.

We introduce additional feature gates called `CPUManagerPolicyAlphaOptions` and
`CPUManagerPolicyBetaOptions`. The basic idea is to avoid the
cumbersome process of adding a feature gate for each option, and to have
feature gates which track the maturity level of _groups_ of options.
Besides this change, the graduation process, and the process in general,
for adding new policy options is still unchanged.

The `full-pcpus-only` option added in the 1.22 cycle is intentionally
moved into the beta policy options.

For more details:
- KEP: https://github.com/kubernetes/enhancements/pull/2933
- sig-arch discussion:
  https://groups.google.com/u/1/g/kubernetes-sig-architecture/c/Nxsc7pfe5rw

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-09-29 11:40:03 +02:00
Kubernetes Prow Robot
e138afc35d
Merge pull request #105213 from yxxhero/remove_StartedPodsErrorsTotal_metrice_message
Remove StartedPodsErrorsTotal metric message
2021-09-28 10:45:16 -07:00
Kubernetes Prow Robot
9005160245
Merge pull request #105272 from wojtek-t/add_jittering_for_kubelet
Add jittering for Kubelet status computing
2021-09-28 00:20:42 -07:00
wojtekt
65d8037ae3 Add jittering for Kubelet status computing 2021-09-27 19:39:50 +02:00
vikram Jadhav
0de4397490 mockery to mockgen conversion 2021-09-25 16:15:08 +00:00
Khaled Henidak (Kal)
a53e2eaeab
move IPv6DualStack feature to stable. (#104691)
* kube-proxy

* endpoints controller

* app: kube-controller-manager

* app: cloud-controller-manager

* kubelet

* app: api-server

* node utils + registry/strategy

* api: validation (comment removal)

* api:pod strategy (util pkg)

* api: docs

* core: integration testing

* kubeadm: change feature gate to GA

* service registry and rest stack

* move feature to GA

* generated
2021-09-24 16:30:22 -07:00
yxxhero
35df409a7e remove StartedPodsErrorsTotal metric message
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-23 22:18:56 +08:00
Kubernetes Prow Robot
2541fcf256
Merge pull request #104123 from fromanirh/podresources-not-report-unhealthy-devices
devicemanager: skip unhealthy devices in GetAllocatable
2021-09-23 05:39:21 -07:00
Ryan Phillips
e2e938066d kubelet: add probe termination to graceful shutdowns 2021-09-22 14:13:25 -05:00
Francesco Romani
1b6efa5e21 devicemanager: skip unhealthy devs in GetAllocatable
The GetAllocatableDevices, needed to support the podresources
API, doesn't take into account the device health when computing
its output.

In this PR we address this gap and add unit tests along the way
to prevent regressions. This gives us a good initial coverage,
E2E tests to cover this case are much harder to write, because
we would need to inject faults to trigger the unhealthy status.
We will evaluate adding these tests in later PRs.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-09-22 19:20:04 +02:00
Kubernetes Prow Robot
7c71e06cd1
Merge pull request #104959 from calvin0327/issue-test-dataRace
fix the test issue of node shutdown manager
2021-09-21 11:56:30 -07:00
Kubernetes Prow Robot
44d4d007bf
Merge pull request #103424 from 249043822/br-cadvisor-perf
Optimize kubelet stats provider for performance bottleneck
2021-09-21 11:56:18 -07:00
Kubernetes Prow Robot
353f0a5eab
Merge pull request #105095 from wojtek-t/migrate_clock_3
Unify towards k8s.io/utils/clock - part 3
2021-09-20 12:46:45 -07:00
Kubernetes Prow Robot
0d20f47c7a
Merge pull request #105090 from saad-ali/removeSubpathFeaturegate
Remove VolumeSubpath feature gate
2021-09-17 15:52:07 -07:00
wojtekt
d9b08c611d Migrate to k8s.io/utils/clock 2021-09-17 15:19:08 +02:00
Kubernetes Prow Robot
cb2ea4bf7c
Merge pull request #101161 from rikatz/move-sysctl-util
Move node and networking related helpers from pkg/util to component helpers
2021-09-17 02:11:00 -07:00
saad-ali
beb17fe10b Remove VolumeSubpath feature gate
Remove the VolumeSubpath feature gate.

Feature gate convention has been updated since this was introduced to
indicate that they "are intended to be deprecated and removed after a
feature becomes GA or is dropped.".
2021-09-17 01:59:23 -07:00
Ricardo Pchevuzinske Katz
37d11bcdaf Move node and networking related helpers from pkg/util to component helpers
Signed-off-by: Ricardo Katz <rkatz@vmware.com>
2021-09-16 17:00:19 -03:00
Clayton Coleman
d5719800bf
kubelet: Handle UID reuse in pod worker
If a pod is killed (no longer wanted) and then a subsequent create/
add/update event is seen in the pod worker, assume that a pod UID
was reused (as it could be in static pods) and have the next
SyncKnownPods after the pod terminates remove the worker history so
that the config loop can restart the static pod, as well as return
to the caller the fact that this termination was not final.

The housekeeping loop then reconciles the desired state of the Kubelet
(pods in pod manager that are not in a terminal state, i.e. admitted
pods) with the pod worker by resubmitting those pods. This adds a
small amount of latency (2s) when a pod UID is reused and the pod
is terminated and restarted.
2021-09-15 14:02:00 -04:00
KeZhang
a629ceeb58 Fix initContainersReusableMemory delete bug 2021-09-15 10:04:49 +08:00
Kubernetes Prow Robot
fa2657b8b2
Merge pull request #104624 from Haleygo/support-null-resolvConf-in-configFile
When resolvConf is "" in kubelet configuration, pod will be created with wrong dns policy
2021-09-14 14:18:59 -07:00
yxxhero
c1b94d27d9 fix typo
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-14 23:24:14 +08:00
Haleygo
46454ea9dc support null resolvConf in Kubelet Configuration 2021-09-14 16:12:52 +08:00
Kubernetes Prow Robot
047a6b9f86
Merge pull request #104874 from wojtek-t/migrate_clock_1
Unify towards k8s.io/utils/clock - part 1
2021-09-13 19:09:20 -07:00
Kubernetes Prow Robot
c79f7c1add
Merge pull request #104711 from claudiubelu/update-pause-3.6
update pause image references to use 3.6
2021-09-13 19:09:08 -07:00
yxxhero
20b3cd5198 fix typo
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-14 09:04:59 +08:00
yxxhero
5ba76eb911 fix typo
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-14 09:03:29 +08:00
Kubernetes Prow Robot
0e2acbe9a8
Merge pull request #104794 from wzshiming/fix/kubelet-cm-kv-pair
pkg/kubelet/cm/memorymanager: Fix ErrorS key/value pair
2021-09-13 15:44:04 -07:00
calvin0327
db82e282fc fix the test issue of data race to node shutdown manager 2021-09-13 18:12:19 +08:00
wojtekt
53ce79a18a Migrate to k8s.io/utils/clock in pkg/kubelet 2021-09-10 12:20:09 +02:00
Kubernetes Prow Robot
1dcea5cb02
Merge pull request #104817 from smarterclayton/pod_status
kubelet: Rejected pods should be filtered from admission
2021-09-09 22:15:59 -07:00
Kubernetes Prow Robot
5724484bda
Merge pull request #104069 from pacoxu/fix-data-race-104057
fix data race in kubelet volume test: add lock for ut
2021-09-09 21:09:59 -07:00
eggiter
20d3bc32ac fix(cpumanager): Do not release cpus of init containers while they are reused in app containers 2021-09-10 10:01:35 +08:00
Clayton Coleman
17d32ed0b8
kubelet: Rejected pods should be filtered from admission
A pod that has been rejected by admission will have status manager
set the phase to Failed locally, which may take some time to
propagate to the apiserver. The rejected pod will be included in
admission until the apiserver propagates the change back, which
was an unintended regression when checking pod worker state as
authoritative.

A pod that is terminal in the API may still be consuming resources
on the system, so it should still be included in admission.
2021-09-08 10:23:45 -04:00
Shiming Zhang
7706d3d281 pkg/kubelet/cm/memorymanager: Fix ErrorS key/value pair 2021-09-06 17:37:04 +08:00
vikram Jadhav
c10c92bda9 changes made by introducing mockgen command 2021-09-03 17:40:11 +00:00
Vikram Jadhav
5f674101bb Added update and verify scripts for automated mock generation 2021-09-03 17:40:11 +00:00
yxxhero
2f448a0789 fix oomkilled description
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-03 22:07:46 +08:00
yxxhero
71a91d55cb update func description 2021-09-03 07:20:28 +08:00
yxxhero
afde4c8bc4 fix init container oomkilled as a failure
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-03 07:04:57 +08:00
Kubernetes Prow Robot
0b4a793da2
Merge pull request #103941 from saschagrunert/seccomp-profile-root
Remove deprecated `--seccomp-profile-root`/`seccompProfileRoot` config
2021-09-02 08:52:57 -07:00
paco
ab055e9ba4 fix data race in kubelet volume test: add lock
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
Co-authored-by: Jian Zeng <zengjian.zj@bytedance.com>
2021-09-01 16:13:55 +08:00
Artyom Lukianov
9ea9798759 kubelet: memory manager: fix topology preferred topology hints calculation
Prevent starting pods with resources satisfied by a single NUMA node on multiple NUMA nodes.
The code returned before it updated the minimal amount of NUMA nodes that can satisfy the container
requests.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-08-31 17:46:59 +03:00
Sascha Grunert
46077e6be7
Remove deprecated --seccomp-profile-root/seccompProfileRoot configuration
The configuration is deprecated and targeted for removal in v1.23. Test
cases have been changed as well.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2021-08-31 09:55:28 +02:00
Kubernetes Prow Robot
bbbeceb6aa
Merge pull request #104577 from smarterclayton/smaller_filter_master
kubelet: Admission must exclude completed pods and avoid races
2021-08-30 13:17:13 -07:00
Claudiu Belu
18936d4785 updates pause image references
The pause:3.6 image has been published.

Also updates older / incorrect references.
2021-08-29 21:50:05 -07:00
Kubernetes Prow Robot
c262d09bb7
Merge pull request #104604 from wojtek-t/fix_secret_manager_2
Don't prematurely close reflectors in case of slow initialization in watch based manager
2021-08-26 06:11:23 -07:00
wojtekt
515106b795 Don't prematurely close reflectors in case of slow initialization in watch based manager 2021-08-26 11:34:24 +02:00
tiloso
2b86541313 Fix staticcheck failure in pkg/kubelet/cm/cpuset 2021-08-26 08:50:08 +02:00
Kubernetes Prow Robot
cbd0611d49
Merge pull request #104528 from kolyshkin/runc-1.0.2
vendor: bump runc to 1.0.2
2021-08-25 18:17:23 -07:00
Kubernetes Prow Robot
2f6b9166d7
Merge pull request #104039 from YanzhaoLi/extract-containerdid-from-various-cgrouppath
Get containerID from systemd-style cgroupPath in cri_stats_provider
2021-08-25 17:05:22 -07:00
Clayton Coleman
a2ca66d280
kubelet: Admission must exclude completed pods and avoid races
Fixes two issues with how the pod worker refactor calculated the
pods that admission could see (GetActivePods() and
filterOutTerminatedPods())

First, completed pods must be filtered from the "desired" state
for admission, which arguably should be happening earlier in
config. Exclude the two terminal pod states from GetActivePods().

Second, the previous check introduced with the pod worker lifecycle
ownership changes was subtly wrong for the admission use case.
Admission has to include pods that haven't yet hit the pod worker,
which CouldHaveRunningContainers was filtering out (because the
pod worker hasn't seen them). Introduce a weaker check -
IsPodKnownTerminated() - that returns true only if the pod is in
a known terminated state (no running containers AND known to pod
worker). This weaker check may only be called from components that
need admitted pods, not other kubelet subsystems.

This commit does not fix the long standing bug that force deleted
pods are omitted from admission checks, which must be fixed by
having GetActivePods() also include pods "still terminating".
2021-08-25 13:31:02 -04:00
KeZhang
dd4fd54427 fix duplicate package import in pod_worker 2021-08-25 21:16:38 +08:00
Stephen Augustus
481cf6fbe7
generated: Run hack/update-gofmt.sh
Signed-off-by: Stephen Augustus <foo@auggie.dev>
2021-08-24 15:47:49 -04:00
Alexey Perevalov
bb81101570 podresource: do not export NUMA topology if it's empty
If a device plugin returns a device without topology, keep it internally
as NUMA node -1. This lets the podresources level skip exporting NUMA
topology; otherwise topology is exported with NUMA node ID 0,
which is not accurate.

It is impossible to reveal this bug just by tracing json.Marshal(resp)
in the podresource client, because the ID field of NUMANodes has the json
property omitempty, so a node with ID=0 is shown as an empty NUMANode.
To reproduce it, it is better to iterate over the devices and
trace dev.Topology.Nodes[0].ID.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2021-08-24 15:38:21 +00:00
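The omitempty effect described in the commit above can be reproduced with a minimal sketch. The `numaNode` type and `marshalNode` helper are hypothetical stand-ins, not the actual podresources API types; only the `omitempty` tag on an integer ID field is taken from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// numaNode is a hypothetical stand-in for the podresources NUMA node message.
// Only the omitempty tag on ID matters for this illustration.
type numaNode struct {
	ID int64 `json:"ID,omitempty"`
}

// marshalNode returns the JSON encoding of a NUMA node with the given ID.
func marshalNode(id int64) string {
	b, err := json.Marshal(numaNode{ID: id})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// With omitempty, a zero ID is dropped from the output, so a node with
	// ID 0 cannot be told apart from a node with no ID in the JSON trace.
	fmt.Println(marshalNode(0))
	fmt.Println(marshalNode(1))
}
```

This is why the commit suggests reading the field directly instead of inspecting the marshalled response.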
Kir Kolyshkin
c06a851042 pkg/kubelet/cm: use SkipFreezeOnSet
This is a knob added by runc 1.0.2 specifically for kubernetes,
which tells runc/libcontainer/cgroups/systemd v1 manager to not
freeze the cgroup in Set().

We set this knob here because this code is only used for pods
(rather than containers) management, and in this place we create or
update the pod cgroup with no device limits set, so we can skip the
freeze.

If this knob is not set, libcontainer's cgroup v1 manager tries to
figure out whether the freeze is needed or not, but it's a somewhat
expensive check to perform, thus the knob is a shortcut.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-08-23 13:41:51 -07:00
Antonio Ojea
0cd75e8fec run hack/update-netparse-cve.sh 2021-08-20 10:42:09 +02:00
Kubernetes Prow Robot
8dbc33d649
Merge pull request #101081 from rphillips/add_graceful_shutdown_event
kubelet: add graceful shutdown events
2021-08-17 22:08:08 -07:00
Kubernetes Prow Robot
a779c58b16
Merge pull request #104330 from liggitt/defaulter-package
Change defaulter-gen input to package import path
2021-08-17 11:42:18 -07:00
Kubernetes Prow Robot
07b7afefbf
Merge pull request #103862 from tanjing2020/cleancode
Replace 'x.Sub(time.Now())' with 'time.Until(x)'
2021-08-17 11:42:01 -07:00
Kubernetes Prow Robot
d7c1663556
Merge pull request #103137 from wzshiming/fix/expected_inhibit_delay
Allow the actual inhibit delay to be greater than the expected inhibit delay
2021-08-17 11:41:49 -07:00
Kubernetes Prow Robot
a9aad7e034
Merge pull request #103107 from pacoxu/fix-93300
ResourceConfigForPod: check initContainers as other QoS func
2021-08-17 11:41:37 -07:00
Kubernetes Prow Robot
f4185318bc
Merge pull request #103048 from gy95/remove_static
remove not used IsStaticPod, prevent possible panic
2021-08-17 11:41:25 -07:00
Kubernetes Prow Robot
b559434c02
Merge pull request #103059 from rajaSahil/fix-error
Update github.com/pkg/errors to go native errors pkg
2021-08-17 10:29:25 -07:00
Kubernetes Prow Robot
db42b67f3c
Merge pull request #101962 from llhhbc/add-osinfo-logs
Add getOSInfo err info
2021-08-17 10:29:13 -07:00
Jordan Liggitt
87a4e082ac Change defaulter-gen input to package path 2021-08-14 11:00:18 -04:00
YanzhaoLi
545d898584 Extract containerID from systemd-style cgroupPath in cri_stats_provider
And fix test to generate UUID without dash
2021-08-11 19:03:56 -07:00
Ryan Phillips
30e9a420c4 kubelet: fix sandbox creation error suppression when pods are quickly deleted 2021-08-10 08:55:25 -05:00
Kubernetes Prow Robot
4b4d12f8a6
Merge pull request #102913 from pacoxu/upgrade-promotheus-common
upgrade prometheus/common to v0.28.0
2021-08-09 08:03:31 -07:00
longhui.li
4af506c989 Add getOSInfo err info 2021-08-09 11:04:53 +08:00
Artyom Lukianov
73a5cce3e6 device manager: do not clean admitted pods from the state
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-08-08 16:46:06 +03:00
Artyom Lukianov
93a237abd8 memory manager: do not clean admitted pods from the state
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-08-08 16:46:06 +03:00
Artyom Lukianov
66babd1a90 cpu manager: do not clean admitted pods from the state
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-08-08 16:46:06 +03:00
Elana Hashman
d2ed3b28b7
Revert "revert Bump DynamicKubeConfig metric deprecation to 1.23 by delta update" 2021-08-06 08:38:56 -07:00
Kubernetes Prow Robot
28990f7664
Merge pull request #103958 from liggitt/server-timeouts
Set idle and readheader timeouts
2021-08-05 14:11:02 -07:00
Kubernetes Prow Robot
3b84cc9e6b
Merge pull request #104075 from kerthcet/cleanup/revert-dynamickubeconfig-metric
revert Bump DynamicKubeConfig metric deprecation to 1.23 by delta update
2021-08-05 08:18:40 -07:00
Kubernetes Prow Robot
fa1d682bd7
Merge pull request #103353 from njuptlzf/fix_datarace
fix data race for Test_Run_Positive_VolumeMountControllerAttachEnabledRace
2021-08-04 19:00:23 -07:00
Kubernetes Prow Robot
a674fb496c
Merge pull request #103261 from markusthoemmes/kubelet-volume-logs
Add pod context to volume lifecycle logs
2021-08-04 19:00:15 -07:00
Kubernetes Prow Robot
4b2f2a0cd8
Merge pull request #102789 from haircommander/add-summary-stats-to-cri
CRI: add fields for pod level stats to satisfy the /stats/summary API
2021-08-04 18:59:43 -07:00
Wesley Williams
ff165c8823
Replace usage of Whitelist with Allowlist within Kubelet's sysctl package (#102298)
* Change uses of whitelist to allowlist in kubelet sysctl

* Rename whitelist files to allowlist in Kubelet sysctl

* Further renames of whitelist to allowlist in Kubelet

* Rename podsecuritypolicy uses of whitelist to allowlist

* Update pkg/kubelet/kubelet.go

Co-authored-by: Danielle <dani@builds.terrible.systems>

2021-08-04 18:59:35 -07:00
Markus Thömmes
c820824711 Add pod context to volume lifecycle logs 2021-08-03 13:12:22 +02:00
kerthcet
980cf85439 revert Bump DynamicKubeConfig metric deprecation to 1.23 by delta update
Signed-off-by: kerthcet <kerthcet@gmail.com>
2021-08-02 23:15:10 +08:00
Elana Hashman
b5f24c334e
Bump DynamicKubeConfig metric deprecation to 1.23 2021-07-28 09:29:57 -07:00
Jordan Liggitt
db48793269 Set idle and readheader timeouts 2021-07-27 11:58:45 -04:00
njuptlzf
1555877cc5 fix data race for Test_Run_Positive_VolumeMountControllerAttachEnabledRace 2021-07-26 17:17:16 +08:00
Kubernetes Prow Robot
47e1df8f4e
Merge pull request #103743 from kolyshkin/runc-1.0.1
vendor: bump runc to v1.0.1
2021-07-23 15:16:33 -07:00
tanjing2020
523b4c0918 Replace 'x.Sub(time.Now())' with 'time.Until(x)' 2021-07-23 10:03:36 +08:00
Kubernetes Prow Robot
9f47110aa2
Merge pull request #103785 from smarterclayton/preserve_reason
Ensure that Reason and Message are preserved on pod status
2021-07-20 15:21:26 -07:00
Kubernetes Prow Robot
6aa160f3ba
Merge pull request #103181 from 249043822/bugfix-volumemanager
Add sync reconstructed volume from desired state of world for volumemanager
2021-07-19 15:04:52 -07:00
Clayton Coleman
d7ee024cc5
kubelet: Make condition processing in one spot
The list of status conditions should be calculated all together,
this made review more complex. Readability only.
2021-07-19 17:56:22 -04:00
Clayton Coleman
c2a6d07b8f
kubelet: Avoid allocating multiple times during status
Noticed while reviewing this code path. We can assume the
temporary slice should be about the same size as it was previously.
2021-07-19 17:55:18 -04:00
Clayton Coleman
9efd40d72a kubelet: Preserve reason/message when phase changes
The Kubelet always clears reason and message in generateAPIPodStatus
even when the phase is unchanged. It is reasonable that we preserve
the previous values when the phase does not change, and clear it
when the phase does change.

When a pod is evicted, this ensures that the eviction message and
reason are propagated even in the face of subsequent updates. It also
preserves the message and reason if components beyond the Kubelet
choose to set that value.

To preserve the value we need to know the old phase, which requires
a change to convertStatusToAPIStatus so that both methods have
access to it.
2021-07-19 17:54:55 -04:00
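The rule stated in the commit body above (keep the previous reason and message while the phase is unchanged, clear them when it changes) can be sketched as follows. The `podStatus` type and `carryOver` function are hypothetical simplifications, not the kubelet's generateAPIPodStatus code.

```go
package main

import "fmt"

// podStatus is a hypothetical, reduced view of a pod status.
type podStatus struct {
	Phase   string
	Reason  string
	Message string
}

// carryOver applies the rule from the commit body: when the phase is
// unchanged, the old reason and message are kept; when the phase changes,
// they are cleared.
func carryOver(old, next podStatus) podStatus {
	if old.Phase == next.Phase {
		next.Reason = old.Reason
		next.Message = old.Message
		return next
	}
	next.Reason = ""
	next.Message = ""
	return next
}

func main() {
	evicted := podStatus{Phase: "Failed", Reason: "Evicted", Message: "node was low on memory"}
	// A later update that stays in the same phase keeps the eviction details.
	fmt.Printf("%+v\n", carryOver(evicted, podStatus{Phase: "Failed"}))
	// A phase change clears them.
	fmt.Printf("%+v\n", carryOver(podStatus{Phase: "Pending", Reason: "Waiting"}, podStatus{Phase: "Running"}))
}
```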
Kir Kolyshkin
e5b434e990 kubelet/cm: don't set Devices
Since runc 1.0.0 it is now sufficient to have SkipDevices: true.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-07-16 12:45:35 -07:00
Davanum Srinivas
75748c185e
enable verify-golangci-lint.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-07-14 08:53:33 -04:00
Davanum Srinivas
26cc8e40a8
fix deadcode issues
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-07-14 08:41:21 -04:00
Kubernetes Prow Robot
2da4d48e6d
Merge pull request #100567 from jingxu97/mar/mark
Mark volume mount as uncertain in case of volume expansion fails
2021-07-13 22:20:26 -07:00
KeZhang
1467065352 Optimize kubelet stats provider for perfomace bottleneck 2021-07-14 11:11:12 +08:00
Kubernetes Prow Robot
d6f2473d08
Merge pull request #103668 from smarterclayton/panic_in_pod_worker
kubelet: Prevent runtime-only pods from going into terminated phase
2021-07-13 17:42:26 -07:00
Clayton Coleman
de9cdab5ae
kubelet: Prevent runtime-only pods from going into terminated phase
If a pod is already in the terminated phase and the housekeeping loop sees an
out of date cache entry for a running container, the pod worker
should ignore that running pod termination request. Once the worker
completes, a subsequent housekeeping invocation will then invoke
terminating because the worker is no longer processing any pod
with that UID.

This does leave the possibility of syncTerminatedPod being blocked
if a container in the pod is started after killPod successfully
completes but before syncTerminatedPod can exit successfully,
perhaps because the terminated flow (detach volumes) is blocked on
that running container. A future change will address that issue.
2021-07-13 15:41:49 -04:00
rarashid
bf2ae14501 Move feature flag to beta (but leave as false) and remove the feature flag from Kubelet 2021-07-13 14:25:44 -05:00
KeZhang
65618bfd69 Add sync reconstructed volume from desired state of world for volumemanager 2021-07-13 12:51:37 +08:00
Kubernetes Prow Robot
04ef2b115d
Merge pull request #90216 from DataDog/nayef/fix-container-statuses-race
Avoid overwriting podStatus ContainerStatuses in convertToAPIContainerStatuses
2021-07-12 17:02:29 -07:00
pacoxu
abd8acc259 fix exec failure for gomock finish calling
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-07-12 10:10:01 +08:00
Elana Hashman
642eff0c69
Rename NodeSwapEnabled flag to NodeSwap 2021-07-09 11:39:52 -07:00
Kubernetes Prow Robot
a6c2cd7d18
Merge pull request #103291 from wzshiming/fix/nodeshutdown-restart
Fix Data Race in nodeshutdown restart
2021-07-09 08:43:14 -07:00
Kubernetes Prow Robot
617064d732
Merge pull request #101432 from swatisehgal/smtaware
node: cpumanager: add options to reject non SMT-aligned workload
2021-07-08 21:04:53 -07:00
Kubernetes Prow Robot
83baa708df
Merge pull request #103429 from saschagrunert/metrics-test-fix
Fix resource metrics e2e test
2021-07-08 17:58:53 -07:00
Kubernetes Prow Robot
dab6f6a43d
Merge pull request #102344 from smarterclayton/keep_pod_worker
Prevent Kubelet from incorrectly interpreting "not yet started" pods as "ready to terminate pods" by unifying responsibility for pod lifecycle into pod worker
2021-07-08 16:48:53 -07:00
Jing Xu
0fa01c371c Mark volume mount as uncertain in case of volume expansion fails
Mark the volume mount in the actual state even if volume expansion fails so that
the reconciler can tear down the volume when needed. To prevent pods from
using it, mark the volume as uncertain instead of mounted.

Will add unit test after the logic is reviewed.

Change-Id: I5aebfa11ec93235a87af8f17bea7f7b1570b603d
2021-07-08 16:00:34 -07:00
Kubernetes Prow Robot
57716897eb
Merge pull request #103434 from perithompson/windows-etchostcreate-skip
Explicitly skip host file mounting for Windows when HostProcess pod
2021-07-08 15:36:53 -07:00
Francesco Romani
23abdab2b7 smtalign: propagate policy options to policies
Consume in the static policy the cpu manager policy options from
the cpumanager instance.
Validate in the none policy if any option is given, and fail if so -
this is almost surely a configuration mistake.

Add new cpumanager.Options type to hold the options and translate from
user arguments to flags.

Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-07-08 23:15:37 +02:00
Francesco Romani
6dcec345df smtalign: cm: factor out admission response
Introduce a new `admission` subpackage to factor out the responsibility
of creating `PodAdmitResult` objects. This enables resource managers
to report specific errors in Allocate() and to bubble them up
in the relevant fields of the `PodAdmitResult`.

To demonstrate the approach we refactor TopologyAffinityError as a
proper error.

Co-authored-by: Kevin Klues <kklues@nvidia.com>
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-07-08 23:15:37 +02:00
Francesco Romani
c5cb263dcf smtalign: propagate policy options to cpumanager
The CPUManagerPolicyOptions received from the kubelet config/command line args
are propagated to the Container Manager.

We defer the consumption of the options to a later patch(set).

Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-07-08 23:15:35 +02:00
Francesco Romani
6dccad45b4 smtalign: add auto generated code
Files generated by running `make generated_files`.

Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-07-08 23:14:59 +02:00
Swati Sehgal
cc76a756e4 smtalign: add cpu-manager-policy-options flag in Kubelet
In this patch we enhance the kubelet configuration to support
cpuManagerPolicyOptions.

In order to introduce SMT-awareness in CPU Manager, we introduce a
new flag in Kubelet, `cpumanager-policy-options`, which lets the user
modify the behaviour of the static policy to strictly guarantee
allocation of whole cores.

Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2021-07-08 23:14:59 +02:00
Kubernetes Prow Robot
4d78db54a5
Merge pull request #103580 from tkestack/fix-version-format
fix kubelet panic when DynamicKubeletConfig enabled
2021-07-08 14:02:24 -07:00
Kubernetes Prow Robot
a9d7526864
Merge pull request #102970 from tkestack/feature-memory-qos
Feature: Support memory qos with cgroups v2
2021-07-08 14:01:36 -07:00
Kubernetes Prow Robot
7c84064a4f
Merge pull request #99000 from verb/1.21-kubelet-metrics
Add kubelet metrics for ephemeral containers
2021-07-08 14:00:55 -07:00
Peri Thompson
8e2b728c68
Explicitly skip host file mounting for windows 2021-07-08 19:38:49 +01:00
Peter Hunt
a9b7dcc8c2 kubelet: update remote runtimes for cri stat changes
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-07-08 13:17:04 -04:00
Li Bo
79e230ea21 fix kubelet panic when DynamicKubeletConfig enabled 2021-07-08 16:20:51 +08:00
Li Bo
c3d9b10ca8 feature: support Memory QoS for cgroups v2 2021-07-08 09:26:46 +08:00
Kubernetes Prow Robot
36a7426aa5
Merge pull request #99144 from bart0sh/PR0094-promote-HugePageStorageMediumSize-to-GA
promote huge page storage medium size to GA
2021-07-07 18:09:05 -07:00
Kubernetes Prow Robot
ebbe63f116
Merge pull request #92863 from AkihiroSuda/rootless-pr
kubelet & kube-proxy: ignore sysctl errors and rlimit errors when running in UserNS (for rootless)
2021-07-07 18:08:53 -07:00
Kubernetes Prow Robot
8e56a34195
Merge pull request #102966 from SergeyKanzhelev/deprecateDynamicKubeletConfig
deprecate and disable by default DynamicKubeletConfig feature flag
2021-07-07 17:05:15 -07:00
Nayef Ghattas
bb3fe633b4 add test for triggering race condition 2021-07-07 20:17:22 +02:00
Nayef Ghattas
ab1807f2bc copy podStatus.ContainerStatuses before sorting it 2021-07-07 20:14:53 +02:00
Akihiro Suda
26e83ac4d4
kubelet: ignore /dev/kmsg error when running in userns
oomwatcher.NewWatcher returns "open /dev/kmsg: operation not permitted" error,
when running with sysctl value `kernel.dmesg_restrict=1`.

The error is negligible for KubeletInUserNamespace.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-07-07 14:23:31 +09:00
Akihiro Suda
dbe0155139
kubelet/cm: ignore sysctl error when running in userns
Errors during setting the following sysctl values are ignored:
- vm.overcommit_memory
- vm.panic_on_oom
- kernel.panic
- kernel.panic_on_oops
- kernel.keys.root_maxkeys
- kernel.keys.root_maxbytes

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-07-07 14:23:29 +09:00
Kubernetes Prow Robot
2547c5bb97
Merge pull request #103307 from aojea/kubelet_podIPs
podIPs order match node IP family preference (Downward API)
2021-07-06 22:11:20 -07:00
Kubernetes Prow Robot
561959f682
Merge pull request #102823 from ehashman/kep-2400-swap
Alpha node swap support
2021-07-06 22:11:11 -07:00
Antonio Ojea
a7469cf680 sort and filter exposed Pod IPs
runtimes may return an arbitrary number of Pod IPs, however, kubernetes
only takes into consideration the first one of each IP family.

The order of the IPs is the one defined by the Kubelet:
- default prefer IPv4
- if NodeIPs are defined, matching the first nodeIP family

PodIP is always the first IP of PodIPs.

The downward API must expose the same IPs, in the same order, as
the pod.Status API object.
2021-07-07 00:15:31 +02:00
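The selection rule described in the commit above (keep only the first IP of each family, ordered by the preferred family) can be sketched as follows. `sortPodIPs` and its `preferIPv6` parameter are hypothetical names for illustration; the kubelet derives the preference from the node IPs, which this sketch reduces to a boolean.

```go
package main

import (
	"fmt"
	"net"
)

// sortPodIPs keeps the first valid IPv4 and the first valid IPv6 address from
// podIPs and returns them with the preferred family first. Unparseable
// entries are skipped.
func sortPodIPs(podIPs []string, preferIPv6 bool) []string {
	var v4, v6 string
	for _, s := range podIPs {
		ip := net.ParseIP(s)
		if ip == nil {
			continue
		}
		if ip.To4() != nil {
			if v4 == "" {
				v4 = s
			}
		} else if v6 == "" {
			v6 = s
		}
	}
	first, second := v4, v6
	if preferIPv6 {
		first, second = v6, v4
	}
	out := make([]string, 0, 2)
	if first != "" {
		out = append(out, first)
	}
	if second != "" {
		out = append(out, second)
	}
	return out
}

func main() {
	ips := []string{"10.0.0.1", "fd00::1", "10.0.0.2"}
	fmt.Println(sortPodIPs(ips, false)) // IPv4 preferred (the default)
	fmt.Println(sortPodIPs(ips, true))  // IPv6 preferred
}
```

In this sketch the first element of the result plays the role of PodIP, matching the statement that PodIP is always the first entry of PodIPs.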
Elana Hashman
5584725605
Explicitly set LimitedSwap case with fallthrough 2021-07-06 13:50:09 -07:00
Clayton Coleman
3eadd1a9ea
Keep pod worker running until pod is truly complete
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.

Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).

Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.

Removing containers no longer blocks final pod deletion in the
API server and is handled as background cleanup. Node shutdown
no longer marks pods as failed as they can be restarted in the
next step.

See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
2021-07-06 15:55:22 -04:00
Kubernetes Prow Robot
eae87bfe7e
Merge pull request #103483 from odinuge/revert-102508-runc-1.0
Revert "Update runc to 1.0.0"
2021-07-06 10:42:56 -07:00
Artyom Lukianov
bb6d5b1f95 memory manager: provide unittests for init containers re-use
- provide tests for static policy allocation, when init containers
requested memory bigger than the memory requested by app containers
- provide tests for static policy allocation, when init containers
requested memory smaller than the memory requested by app containers
- provide tests to verify that init containers removed from the state
file once the app container started

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-07-05 20:52:25 +03:00
Artyom Lukianov
960da7895c memory manager: remove init containers once app container started
Remove init containers from the state file once the app container has started.
This releases the memory allocated for the init containers and can increase
the density of containers on the NUMA node in cases where the memory allocated
for init containers is bigger than the memory allocated for app containers.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-07-05 20:52:25 +03:00
Artyom Lukianov
b965502c49 memory manager: re-use the memory allocated for init containers
The idea is that during the allocation phase, in calls to `Allocate` and
`GetTopologyHints`, we take into account the init containers' reusable memory,
which means that we re-use the memory and update container memory blocks accordingly.
For example, for a pod with two init containers that requested 1Gi and 2Gi,
and an app container that requested 4Gi, we can re-use 2Gi of memory.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-07-05 20:52:25 +03:00
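The arithmetic in the example from the commit above can be sketched as follows. This is a simplification under stated assumptions: `reusableMemory` and `additionalMemory` are hypothetical helpers, and the real memory manager tracks memory blocks per NUMA node rather than a single number. Because init containers run one at a time, the reusable amount is taken to be the largest init container request.

```go
package main

import "fmt"

// gi is one gibibyte in bytes.
const gi = int64(1) << 30

// reusableMemory returns the largest init container request. Init containers
// run sequentially, so this is the most memory held for them at any moment.
func reusableMemory(initRequests []int64) int64 {
	var largest int64
	for _, r := range initRequests {
		if r > largest {
			largest = r
		}
	}
	return largest
}

// additionalMemory returns how much memory must be newly allocated for an app
// container once the reusable init container memory has been taken into account.
func additionalMemory(appRequest, reusable int64) int64 {
	if appRequest <= reusable {
		return 0
	}
	return appRequest - reusable
}

func main() {
	reusable := reusableMemory([]int64{1 * gi, 2 * gi})
	extra := additionalMemory(4*gi, reusable)
	fmt.Printf("reusable: %dGi, newly allocated: %dGi\n", reusable/gi, extra/gi)
}
```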
Odin Ugedal
61d88af9e4
Revert "Update runc to 1.0.0" 2021-07-05 14:03:04 +02:00
Sascha Grunert
2d0f99fba1
Fix resource metrics e2e test
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2021-07-05 11:16:05 +02:00
Sergey Kanzhelev
dffc2a60a2 deprecate and disable by default DynamicKubeletConfig feature flag 2021-07-02 23:53:11 +00:00
Kubernetes Prow Robot
659c7e709f
Merge pull request #99494 from enj/enj/i/not_after_ttl_hint
csr: add expirationSeconds field to control cert lifetime
2021-07-01 23:02:12 -07:00
Monis Khan
cd91e59f7c
csr: add expirationSeconds field to control cert lifetime
This change updates the CSR API to add a new, optional field called
expirationSeconds.  This field is a request to the signer for the
maximum duration the client wishes the cert to have.  The signer is
free to ignore this request based on its own internal policy.  The
signers built-in to KCM will honor this field if it is not set to a
value greater than --cluster-signing-duration.  The minimum allowed
value for this field is 600 seconds (ten minutes).

This change will help enforce safer durations for certificates in
the Kube ecosystem and will help related projects such as
cert-manager with their migration to the Kube CSR API.

Future enhancements may update the Kubelet to take advantage of this
field when it is configured in a way that can tolerate shorter
certificate lifespans with regular rotation.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-07-01 23:38:15 -04:00
Kubernetes Prow Robot
062bc359ca
Merge pull request #102444 from sanwishe/resourceStartTime
Expose container start time in kubelet /metrics/resource endpoint
2021-07-01 14:27:51 -07:00
Kir Kolyshkin
ab5b77944e kubelet/cm: don't set Devices
Since runc 1.0.0 it is now sufficient to have SkipDevices: true.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-30 16:17:35 -07:00
Shiming Zhang
212ce7c287 Shorten test time 2021-06-30 09:48:26 +08:00
Elana Hashman
39f32d7286
Ensure MemorySwapConfig can't be set without feature flag 2021-06-29 12:08:25 -07:00
Elana Hashman
d4041cb80f
Add generated files for swap API changes 2021-06-29 12:08:25 -07:00
Elana Hashman
d3fd1362ca
Rename NoSwap to LimitedSwap as workloads may still swap
Also made the options a kubelet type, address API review feedback
2021-06-29 12:08:21 -07:00
Elana Hashman
0deef4610e
Set MemorySwapLimitInBytes for CRI when NodeSwapEnabled 2021-06-29 11:59:02 -07:00
Elana Hashman
7342acb0b8
Add validation for KubeletConfig MemorySwap 2021-06-29 11:59:01 -07:00
Elana Hashman
bda03b4818
API change: add MemorySwap to KubeletConfiguration 2021-06-29 11:58:59 -07:00
Kubernetes Prow Robot
01819dd322
Merge pull request #102028 from chrishenzie/read-write-once-pod-access-mode
ReadWriteOncePod access mode for PVs and PVCs
2021-06-29 10:04:40 -07:00
Kubernetes Prow Robot
756203fda0
Merge pull request #102576 from dobsonj/101911
kubelet: do not call RemoveAll on volumes directory for orphaned pods
2021-06-29 06:54:40 -07:00
Shiming Zhang
a42c066af7 Fix Data Race in nodeshutdown restart 2021-06-29 16:23:45 +08:00
Chris Henzie
2b98f8edc7 Enforce ReadWriteOncePod access mode during mount 2021-06-28 21:25:37 -07:00
Kubernetes Prow Robot
15d3c3a5e2
Merge pull request #102821 from ehashman/phase-fix
Ensure kubelet statuses can handle loss of container runtime state
2021-06-28 15:38:40 -07:00
pacoxu
f2eec0a816 ResourceConfigForPod: check initContainers as other QoS func
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-06-28 19:22:42 +08:00
Kubernetes Prow Robot
07358f1663
Merge pull request #103146 from tech-geek29/fix-95380
Change log level to Debug
2021-06-25 07:44:45 -07:00
Kubernetes Prow Robot
49ab9ac160
Merge pull request #103154 from jsafrane/fix-asw-mounter
Update mounter interface in volume manager
2021-06-24 14:18:05 -07:00
Kubernetes Prow Robot
2e93b3924a
Merge pull request #101943 from saschagrunert/seccomp-default
Add kubelet `SeccompDefault` alpha feature
2021-06-24 13:07:41 -07:00
Kubernetes Prow Robot
79494183b7
Merge pull request #102869 from mengjiao-liu/json-register-move
Remove default JSON logging format registration from k8s.io/component-base/logs package
2021-06-24 11:59:41 -07:00
Kubernetes Prow Robot
06dfe683ce
Merge pull request #103123 from dims/remove-fakefs-to-drop-spf13/afero-dependency
Remove fakefs to drop spf13/afero dependency
2021-06-24 07:57:41 -07:00
Davanum Srinivas
5feff280e1
remove fakefs to drop spf13/afero dependency
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-06-24 09:51:34 -04:00
Mengjiao Liu
af825b4357 Remove default JSON logging format registration from component-base/logs package 2021-06-24 20:37:09 +08:00
Jan Safranek
d3dfe124da Update mounter interface in volume manager
Update mounter interface in volume manager's ActualStateOfWorld every time.
Otherwise kubelet uses the first mounter it gets, which may not have the
latest information.

This fixes set up of CSI volumes, which store information about SELinux
support in their `mounter` interface implementation. With each MountVolume()
retry, a new mounter is instantiated and only the final mounter that succeeds
has the right info if the volume supports SELinux or not and can later
return the right attributes on GetAttributes() call.
2021-06-24 14:11:31 +02:00
Rishabh Jain
8f08db9164 Change log level to Debug 2021-06-24 14:23:06 +05:30
Kenta Tada
89a4d4b071 kubelet: modify the function of getCgroupSubsystemsV2 to use libcontainer API 2021-06-24 16:58:05 +09:00
Shiming Zhang
97bcfbd674 Allow the actual inhibit delay to be greater than the expected inhibit delay 2021-06-24 14:11:58 +08:00
Ryan Phillips
d9be5abc37 kubelet: add shutdown events 2021-06-23 16:44:19 -05:00
sanwishe
43f8f58895 add containers starttime metrics for metrics/resource endpoint
Signed-off-by: sanwishe <jiang.mingzhi35@zte.com.cn>
2021-06-24 02:53:21 +08:00
Sascha Grunert
8b7003aff4
Add SeccompDefault feature
This adds the gate `SeccompDefault` as a new alpha feature. Seccomp path
and field fallbacks are now passed to the helper functions, and unit
tests covering those code paths have been added as well.

Besides enabling the feature gate, the feature has to be enabled via the
`SeccompDefault` kubelet configuration or its corresponding
`--seccomp-default` CLI flag.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>

Apply suggestions from code review

Co-authored-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2021-06-23 10:22:57 +02:00
Sahil Raja
992993257d
Removed usage of github.com/pkg/errors
Signed-off-by: Sahil Raja <sahilraja242@gmail.com>
2021-06-23 08:07:05 +05:30
Kubernetes Prow Robot
985ac8ae50
Merge pull request #101030 from cynepco3hahue/pod_resources_memory_interface
Extend pod resource API response to return the information from memory manager
2021-06-22 06:35:58 -07:00
Artyom Lukianov
03830db82d Implement all necessary methods to provide memory manager data under pod resources metrics
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-06-22 13:06:32 +03:00
Artyom Lukianov
24023f9fcc Extend pod resource API response to return the memory manager information
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-06-22 12:59:04 +03:00
Kubernetes Prow Robot
e154a6d637
Merge pull request #102455 from lunhuijie/addTestHelpers
Add test cases to the addAllocatableThresholds function in pkg/kubelet/eviction/helpers.go
2021-06-21 19:23:57 -07:00
Kubernetes Prow Robot
3bd29bc53d
Merge pull request #102829 from snowplayfire/update-devicemanager
Add resource capacity to ListAndWatch grpc logging
2021-06-21 16:28:09 -07:00
Kubernetes Prow Robot
844fa00c5e
Merge pull request #102725 from 249043822/br-podworker
Fix:slow memory leak may be in kubelet podworkers.isWorking
2021-06-21 16:27:57 -07:00
Kubernetes Prow Robot
62fdaabe82
Merge pull request #102635 from charlesxsh/fix-linux-test
fix a potential deadlock in graceful node shutdown unit tests
2021-06-21 16:27:45 -07:00
Sahil Raja
8eee78a61f
Update github.com/pkg/errors to go native errors pkg
Signed-off-by: Sahil Raja <sahil.raja@mayadata.io>
2021-06-21 23:03:14 +05:30
gy95
7b98a0770f remove not used IsStaticPod, prevent possible panic 2021-06-21 19:38:40 +08:00
Kubernetes Prow Robot
4afb72a863
Merge pull request #100183 from jsafrane/fix-unstage-retry
Mark volume as uncertain after Unmount* fails
2021-06-18 11:04:06 -07:00
jingxueli
45d18acbcc add info for possible failed listAndWatch grpc call 2021-06-17 16:25:20 +08:00
Kubernetes Prow Robot
2d7a20fcd6
Merge pull request #102840 from Kissy/issue-102820
Improve terminated pod message when node is shutting down
2021-06-16 12:48:12 -07:00
Jan Safranek
d5da73032f Add unit test for DSWP with uncertain volume
desiredStateOfWorldPopulator.findAndRemoveDeletedPods() should remove
volumes from DSW when a pod is deleted on the API server and the volume is
uncertain in ASW.
2021-06-16 18:41:44 +02:00
Jan Safranek
f795b02f4f Refactor dswp unit tests
Change existing desiredStateOfWorldPopulator.findAndAddNewPods tests to use
a common initialization function.
2021-06-16 18:41:43 +02:00
Jan Safranek
2fcb5e9cf7 Add PodRemovedFromVolume
To know when a volume has been fully unmounted (incl. uncertain mounts).
2021-06-16 18:41:41 +02:00
Jan Safranek
ca934b8f5c Add GetPossiblyMountedVolumesForPod to let kubelet know all volumes were unmounted
podVolumesExist() should also consider uncertain volumes (where kubelet
does not know if a volume was fully unmounted) when checking for a pod's
volumes. Added GetPossiblyMountedVolumesForPod for that.

Adding uncertain mounts to GetMountedVolumesForPod would potentially break
other callers (e.g. `verifyVolumesMountedFunc`).
2021-06-16 18:39:12 +02:00
Elana Hashman
9469756b6c
Ensure kubelet statuses can handle loss of container runtime state 2021-06-15 11:12:55 -07:00
Lee Verberne
30d2ad576a Remove ManagedPod,ManagedContainer metrics
This replaces the generic ManagedPod and ManagedContainer kubelet
metrics with a gauge to track only ephemeral container usage.
2021-06-15 19:02:07 +02:00
Guillaume Le Biller
f1de598233
Improve terminated pod message when node is shutting down
Signed-off-by: Guillaume Le Biller <glebiller@Traveldoo.com>
2021-06-15 18:29:54 +02:00
Marek Siarkowicz
f9343f837d Use LoggingConfig within LogOptions
Co-authored-by: mengjiao.liu <mengjiao.liu@daocloud.io>
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
Co-authored-by: Heisenberg <yuzhiquanlong@gmail.com>
2021-06-15 17:14:43 +02:00
刁浩 10284789
be48f1d272 Add test cases to the addAllocatableThresholds function in pkg/kubelet/eviction/helpers.go
Signed-off-by: 刁浩 10284789 <diao.hao@zte.com.cn>
2021-06-15 11:32:44 +00:00
KeZhang
83ee5da75e Fix:slow memory leak may be in kubelet podworkers.isWorking 2021-06-15 15:26:30 +08:00
Kubernetes Prow Robot
4e7fc6df63
Merge pull request #100369 from wzshiming/fix/restart-dbus-for-graceful-node-shutdown
After DBus restarts, make GracefulNodeShutdown work again
2021-06-14 20:50:00 -07:00
Kubernetes Prow Robot
85f0931ab9
Merge pull request #102772 from saintube/patch-1
cleanup: fix kubelet cpuset typo
2021-06-14 19:00:13 -07:00
Francesco Romani
369416b763 cm: handle nil cpumanager avoiding segfault
If the cpumanager feature gate is disabled, the corresponding field
of the containerManager will be nil.
A couple of functions don't check for this occurrence and happily
dereference the pointer unconditionally, leading to possible segfaults.

The relevant functions were introduced to support the podresources API,
so to trigger this segfault all the following are needed:
- cpumanager feature gate has to be disabled explicitly
- any podresources API must be called

Worth pointing out that when the new functions were introduced (around
kubernetes 1.20) the default feature gate for cpumanager was already set
to true, hence this bug is expected to be triggered rarely.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-06-10 16:22:43 +02:00
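A minimal sketch of the kind of nil guard this commit message describes. The type and method names below (`cpuManager`, `containerManager`, `GetCPUs`) are simplified, hypothetical stand-ins chosen for illustration, not the actual kubelet definitions.

```go
package main

import "fmt"

// cpuManager is a hypothetical, simplified interface standing in for the
// real CPU manager; only the method needed for this sketch is declared.
type cpuManager interface {
	GetCPUs(podUID, containerName string) []int
}

// containerManager holds an optional cpuManager. In this sketch the field
// is nil when the (assumed) feature gate is disabled.
type containerManager struct {
	cpuManager cpuManager
}

// GetCPUs returns an empty slice instead of calling through a nil
// interface, which would otherwise panic at runtime.
func (cm *containerManager) GetCPUs(podUID, containerName string) []int {
	if cm.cpuManager == nil {
		return []int{}
	}
	return cm.cpuManager.GetCPUs(podUID, containerName)
}

func main() {
	// No cpuManager configured: the guard keeps this call safe.
	cm := &containerManager{}
	fmt.Println(len(cm.GetCPUs("pod-uid", "container")))
}
```

Returning an empty slice rather than nil is a design choice made for this sketch so that callers can iterate or serialize the result without a separate nil check.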
Frame
9255f2ccf3
Fix kubelet cpuset typo 2021-06-10 18:17:04 +08:00
Jonathan Dobson
484eb01822 kubelet: do not call RemoveAll on volumes directory for orphaned pods 2021-06-08 13:57:35 -06:00
Kubernetes Prow Robot
f7cff077d5
Merge pull request #102611 from ehashman/test-order
kubelet: Fix test order in verifyContainerStatuses
2021-06-08 00:29:11 -07:00
Kubernetes Prow Robot
db34c5a869
Merge pull request #102471 from wzshiming/clean/cap
Pre-allocated memory
2021-06-07 19:55:12 -07:00
Kubernetes Prow Robot
bd0196e8ba
Merge pull request #102568 from ehashman/init-container-coverage
Add unit test coverage for init container phases
2021-06-07 09:46:55 -07:00
Elana Hashman
cc2e9394be
kubelet: Fix test order in verifyContainerStatuses
Per https://pkg.go.dev/github.com/stretchr/testify/assert#Equal
expected goes before actual.
2021-06-04 16:04:10 -07:00
Shihao Xia
a2a4b50bc1 fixed deadlock 2021-06-03 18:03:17 -04:00
Kubernetes Prow Robot
9f7c9c322f
Merge pull request #101738 from matthyx/deflake-startupprobe
fix manual trigger of readinessProbe on startupProbe success
2021-06-03 14:34:42 -07:00
Elana Hashman
dfd67c7d79
Add unit test coverage for init container phases 2021-06-02 17:37:51 -07:00
Kubernetes Prow Robot
4eda493658
Merge pull request #101959 from lunhuijie/run-test5
Add test cases to the LoadClientConfig function
2021-06-02 13:42:55 -07:00
Kubernetes Prow Robot
4d50f2ace0
Merge pull request #101633 from llhuii/kubelet/remove-redundant-code
kubelet_pods.go: clean makeEnvironmentVariables
2021-06-02 13:42:43 -07:00
刁浩 10284789
ce08fd5976 Add test cases to the LoadClientConfig function
Signed-off-by: 刁浩 10284789 <diao.hao@zte.com.cn>
2021-06-02 15:22:00 +00:00
Kubernetes Prow Robot
1795a98eeb
Merge pull request #102221 from kikimo/add-hint-to-fake-topology-manager
Add hint to fake topology manager.
2021-06-02 03:40:05 -07:00
kikimo
86d68effc2 clean code 2021-06-02 09:07:53 +08:00
Kubernetes Prow Robot
38b94683c9
Merge pull request #101559 from njuptlzf/fsstore_test
Clean up tempDir after fsstore_test.go is executed
2021-06-01 16:02:07 -07:00
Kubernetes Prow Robot
7c7a0865cd
Merge pull request #102218 from kolyshkin/cgroup-cleanups
pkg/kubelet/cm: cgroup-related cleanups
2021-06-01 13:45:51 -07:00
Kubernetes Prow Robot
e5b54d0769
Merge pull request #102232 from MadhavJivrajani/mirror-client-log-line-fix
Change log line to print actual pod uid and not address of the pod uid
2021-06-01 10:51:52 -07:00
kikimo
9d2135f703 reuse fake topology manager 2021-06-02 01:35:00 +08:00
kikimo
8b3162d67b clean code 2021-06-02 01:17:04 +08:00
Shiming Zhang
582b492cc0 Pre-allocated memory 2021-06-01 15:19:44 +08:00
Kubernetes Prow Robot
49897ca156
Merge pull request #102268 from sanwishe/loggingformat1
cleanup: Optimization logging format for pkg/kubelet
2021-05-28 07:50:25 -07:00
njuptlzf
6738380a80 cleanup tempDir after fsstore_test.go 2021-05-27 10:10:08 +08:00
Gunju Kim
6317ce63c6 Add feature gate ExpandedDNSConfig
ExpandedDNSConfig allows Kubernetes to have an expanded DNS (Domain Name
System) configuration
2021-05-27 07:10:13 +09:00
Gunju Kim
819059f641 kubelet: Validate the length of the DNS search path 2021-05-27 07:09:46 +09:00
Madhav Jivrajani
d7a67a3b8e change log line to print actual pod uid instead of address of the pod uid
Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
2021-05-26 20:32:32 +05:30
Kubernetes Prow Robot
91656fa6eb
Merge pull request #101308 from pacoxu/doc-kubelet-running-pods
kubelet_running_pods shows number of pods that have a running pod sandbox
2021-05-26 03:17:20 -07:00
Sascha Grunert
b167fc24d7
Update pause image to v3.5
Update dependencies and the test images to use pause 3.5. We also
provide a changelog entry for the new container image version.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2021-05-25 09:04:46 +02:00
Kubernetes Prow Robot
13cafd5cb0
Merge pull request #101480 from yuzhiquan/little-nit-for-kubelet
Fix some nit for kubelet
2021-05-24 21:49:05 -07:00
Kubernetes Prow Robot
a49b4a1018
Merge pull request #100608 from pacoxu/fix/poststart-hook
correct messages in post start hook error handling
2021-05-24 21:48:32 -07:00
Kubernetes Prow Robot
63014d87cd
Merge pull request #101013 from pacoxu/mac-ut/skips
skip some UT on mac for dockershim
2021-05-24 20:10:20 -07:00
sanwishe
9e257ec194 Optimization logging format for pkg/kubelet
Signed-off-by: sanwishe <jiang.mingzhi35@zte.com.cn>
2021-05-25 08:52:08 +08:00
Kubernetes Prow Robot
f545438bd3
Merge pull request #101587 from nixpanic/in-tree/block-metrics
Fix a panic for in-tree drivers that partially support Block volume metrics
2021-05-24 16:18:47 -07:00
Kubernetes Prow Robot
88c0e8968b
Merge pull request #99680 from CaoDonghui123/fixissues4
fix error of setting negative value for containerLogMaxSize
2021-05-24 16:18:20 -07:00
Kubernetes Prow Robot
cf59c68e15
Merge pull request #102088 from wzshiming/fix/pod-devices-has-pod-lock
Add the missing RLock
2021-05-24 15:16:20 -07:00
Kir Kolyshkin
f1aee7e049 kubelet/cm: GetResourceStats -> MemoryUsage
Commit cc50aa9dfb introduced GetResourceStats, a method which collected
all the statistics from various cgroup controllers, only to discard all
of the info collected except a single value (memory usage).

While one may argue that this method can potentially be used from other
places, this did not happen since it was added 4+ years ago.

Let's streamline this code and only collect what we need, i.e. memory
usage. Rename the method accordingly.

While at it, fix pkg/kubelet/cm/cgroup_manager_unsupported.go to not
instantiate a new error every time a method is called.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-23 20:43:52 -07:00
Shiming Zhang
202a012093 Add restart unit test 2021-05-23 00:47:36 +08:00
kikimo
20c02357ca Add hint to fake topology manager. 2021-05-22 15:29:08 +08:00
Kubernetes Prow Robot
a2357f4516
Merge pull request #100136 from Danil-Grigorev/disable-cloud-providers-fg
Add feature gate to disable all in-tree cloud providers
2021-05-21 15:39:36 -07:00
Kir Kolyshkin
c299b8fc9a kubelet/cm: rm propagateControllers
This was added by commit a9772b2290.

In the current codebase, the cgroup being updated was created using
runc/opencontainers' manager.Apply(), which already does controllers
propagation, so there is no need to repeat that on every update.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-21 13:44:54 -07:00
Kubernetes Prow Robot
cb1775c73a
Merge pull request #101893 from kikimo/fix-numa-topo-error
Avoid undesirable allocation when device is associated with multiple …
2021-05-21 08:40:47 -07:00
Danil-Grigorev
5d57b3794c Add DisableCloudProviders FG
FeatureGate acts as a secondary switch to disable cloud-controller loops
in KCM, Kubelet and KAPI.

Provide comprehensive logging information to users, so they will be
guided in adoption of out-of-tree cloud provider implementation.
2021-05-21 16:09:44 +02:00
Kubernetes Prow Robot
5de1a754c8
Merge pull request #102147 from kolyshkin/update-runc-rc94-take-II
vendor: bump runc to rc95
2021-05-20 17:16:56 -07:00
Kubernetes Prow Robot
823d870725
Merge pull request #102014 from klueska/upstream-update-cpu-asssignment-algorithm
Refactor the algorithm used to decide CPU assignments in the CPUManager
2021-05-20 16:10:56 -07:00
Kubernetes Prow Robot
e259943f7f
Merge pull request #101265 from s-ito-ts/ut_kubelet_topology
Adds unit tests for pkg/kubelet/cm/cpumanager/topology
2021-05-20 14:16:28 -07:00
Kubernetes Prow Robot
6e4e32985a
Merge pull request #99576 from marosset/windows-host-process-work
Windows host process work
2021-05-20 14:16:15 -07:00
Niels de Vos
b997e0e4d6 Add SupportsMetrics() for Block-mode volumes
Volumes that are provisioned with `VolumeMode: Block` often have a
MetricsProvider interface declared in their type. However, the
MetricsProvider should implement a GetMetrics() function. In the cases
where the storage drivers do not implement GetMetrics(), a panic can
occur.

Usual type-assertions are not sufficient in this case. All assertions
assume the interface is present. There is no straightforward way to
verify that a valid GetMetrics() function is provided.

By adding SupportsMetrics(), storage driver implementations require
careful reviewing for metrics support.
2021-05-20 17:10:23 +02:00
kikimo
c0a7939cbb remove redundant test branch in sorting algorithm 2021-05-20 20:31:47 +08:00
Rancho Chen
9469ee7025 Add testcase for freeCPUs with three Sockets 2021-05-20 11:49:51 +00:00
pacoxu
75c19da843 correct messages in post start hook error handling
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-05-20 15:14:47 +08:00
Odin Ugedal
d312ef7eb6 Set cgroups via opencontainer
This sets cgroup config via libcontainer to make sure we apply the
correct values to the systemd slices and scopes.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-19 23:52:01 -07:00
Kir Kolyshkin
f3cdfc488e vendor: bump runc to rc95
runc rc95 contains a fix for CVE-2021-30465.

runc rc94 provides fixes and improvements.

One notable change is cgroup manager's Set now accepts Resources rather
than Cgroup (see https://github.com/opencontainers/runc/pull/2906).
Modify the code accordingly.

Also update runc dependencies (as hinted by hack/lint-dependencies.sh):

        github.com/cilium/ebpf v0.5.0
        github.com/containerd/console v1.0.2
        github.com/coreos/go-systemd/v22 v22.3.1
        github.com/godbus/dbus/v5 v5.0.4
        github.com/moby/sys/mountinfo v0.4.1
        golang.org/x/sys v0.0.0-20210426230700-d19ff857e887
        github.com/google/go-cmp v0.5.4
        github.com/kr/pretty v0.2.1
        github.com/opencontainers/runtime-spec v1.0.3-0.20210326190908-1c3f411f0417

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-19 23:51:59 -07:00
Kir Kolyshkin
029e6b6e3a pkg/kubelet/nodeshutdown/systemd: fix for dbus 5.0.4
dbus 5.0.4 adds StoreProperty method which needs to be implemented for
the mock.

Fixes the errors like

> pkg/kubelet/nodeshutdown/systemd/inhibit_linux_test.go:88:9: cannot use f.fakeDBusObject (variable of type *fakeDBusObject) as dbus.BusObject value in return statement: missing method StoreProperty

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-19 23:51:57 -07:00
Giuseppe Scrivano
12abc3b7c9 kubelet: reuse manager
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-05-19 23:51:49 -07:00
kikimo
445b9c0762 minor tweak on numa node sorting algorithm 2021-05-20 08:21:20 +08:00
marosset
fd94032b21 Kubelet updates for Windows HostProcess Containers 2021-05-19 16:24:14 -07:00
kikimo
ecfa609b71 simplify sorting comparator of numa nodes 2021-05-19 21:19:47 +08:00
kikimo
84a4b40526 fix incompatible interface in fakeTopologyManagerWithHint 2021-05-19 10:12:12 +08:00
kikimo
7d30bfecd5 simplify sorting comparator of numa nodes 2021-05-19 10:07:37 +08:00
kikimo
893ebf3a1c add a reusable fakeTopologyManagerWithHint{} 2021-05-19 10:07:37 +08:00
kikimo
2ef1f81076 Avoid undesirable allocation when device is associated with multiple NUMA Nodes
Suppose there are two devices, dev1 and dev2, associated with NUMA nodes as follows:
  dev1: numa1
  dev2: numa1, numa2

If we request a device from numa2 and the loop over available devices yields the
sequence [dev1, dev2], filterByAffinity() currently returns [], [dev1, dev2], [].
That is not desirable: what we truly expect is an allocation of dev2 from numa2.
2021-05-19 10:07:37 +08:00
Shiming Zhang
9c59e6c85f After dbus restarts, make GracefulNodeShutdown work again 2021-05-19 10:05:38 +08:00
Kubernetes Prow Robot
708a9a1c8c
Merge pull request #98583 from damemi/internal-helpers-noderesources-registry
Scheduler: remove pkg/features dependency from NodeResources plugins
2021-05-18 11:35:05 -07:00
Jordan Liggitt
4b45d0d921 Revert "Merge pull request 101888 from kolyshkin/update-runc-rc94"
This reverts commit b1b06fe0a4, reversing
changes made to 382a33986b.
2021-05-18 09:13:47 -04:00
Mike Dame
5a77ebe28b Scheduler: remove pkg/features dependency from NodeResources plugins 2021-05-18 08:59:02 -04:00
Shiming Zhang
bbed9d27b0 Add the missing RLock 2021-05-18 17:27:27 +08:00
caodonghui
a06ed1244e fix error of setting negative value for containerLogMaxSize 2021-05-18 10:28:15 +08:00
Kubernetes Prow Robot
3e588be763
Merge pull request #101712 from SergeyKanzhelev/disableAcceleratorUsageMetricsOnContainerd
disable collecting of accelerator metrics in cAdvisor
2021-05-17 13:39:51 -07:00
Kubernetes Prow Robot
003dd87cff
Merge pull request #100565 from lack/cpuset-validation
cpuset parsing:Fix more edge cases and add more unit tests
2021-05-17 13:39:30 -07:00
Kevin Klues
67c92a5cd4 Refactor / simplify logic for CPU assignment algorithm in CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-05-14 14:53:06 +00:00
s-ito-ts
1dea66439c Adds unit tests for pkg/kubelet/cm/cpumanager/topology 2021-05-12 07:13:04 +00:00
Kir Kolyshkin
b49744f177 vendor: bump runc to rc94
One notable change is cgroup manager's Set now accepts Resources rather
than Cgroup (see https://github.com/opencontainers/runc/pull/2906).
Modify the code accordingly.

Also update runc dependencies (as hinted by hack/lint-dependencies.sh):

	github.com/cilium/ebpf v0.5.0
	github.com/containerd/console v1.0.2
	github.com/coreos/go-systemd/v22 v22.3.1
	github.com/godbus/dbus/v5 v5.0.4
	github.com/moby/sys/mountinfo v0.4.1
	golang.org/x/sys v0.0.0-20210426230700-d19ff857e887
	github.com/google/go-cmp v0.5.4
	github.com/kr/pretty v0.2.1
	github.com/opencontainers/runtime-spec v1.0.3-0.20210326190908-1c3f411f0417

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-11 11:56:42 -07:00
Kir Kolyshkin
8167f83437 pkg/kubelet/nodeshutdown/systemd: fix for dbus 5.0.4
dbus 5.0.4 adds StoreProperty method which needs to be implemented for
the mock.

Fixes the errors like

> pkg/kubelet/nodeshutdown/systemd/inhibit_linux_test.go:88:9: cannot use f.fakeDBusObject (variable of type *fakeDBusObject) as dbus.BusObject value in return statement: missing method StoreProperty

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-11 11:11:02 -07:00
Jim Ramsay
a21179ae69 cpuset.Parse: Fix edge cases and add negative tests
The cpuset.Parse function missed a couple bad input cases, specifically
"1--3" and "10-6".  These were silently ignored when they should instead
be flagged as invalid.

This now catches these cases and expands the unit tests for cpuset to
cover them (and other negative test cases as well).

Signed-off-by: Jim Ramsay <jramsay@redhat.com>
2021-05-11 11:05:38 -04:00
Giuseppe Scrivano
fd7ecd3915 kubelet: reuse manager
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-05-10 17:34:53 -07:00
Kubernetes Prow Robot
160425640e
Merge pull request #101771 from klueska/upstream-only-uppdate-if-needed
Add logic to only call CPUManager Update() if state different than last Update()
2021-05-10 09:45:09 -07:00
Kubernetes Prow Robot
9f9d774eee
Merge pull request #101615 from aheng-ch/podTopologyHints
fix removing pods from podTopologyHints mapping
2021-05-10 08:13:26 -07:00
Ed Bartosh
c12aa0f6b7 promote HugePageStorageMediumSize to GA 2021-05-10 15:57:55 +03:00
aheng-ch
ff7b94fa5a fix removing pods from podTopologyHints mapping 2021-05-10 19:44:15 +08:00
Kevin Klues
6646039481 Add logic to only call Update() if state different than last Update()
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-05-06 23:38:08 +02:00
Matthias Bertschy
3916c00955 fix manual trigger of readinessProbe on startupProbe success 2021-05-05 11:21:40 +02:00
Kubernetes Prow Robot
ae3250165a
Merge pull request #101708 from joelsmith/master
Fix log spam for du failure on pod etc-hosts metrics
2021-05-04 11:08:14 -07:00
Kubernetes Prow Robot
a5235299ae
Merge pull request #101593 from rphillips/fix_kernel_move_message
kubelet: change cgroup move message to log level 3
2021-05-03 17:21:37 -07:00
Kubernetes Prow Robot
c0a991369d
Merge pull request #101400 from wangyx1992/fix-single-case-select
cleanup: use plain channel send or receive instead of single-case select
2021-05-03 17:21:29 -07:00
Kubernetes Prow Robot
cff652d951
Merge pull request #101369 from markusthoemmes/status-simplification
pkg/kubelet: Simplify status string generation on probes
2021-05-03 17:21:22 -07:00
Kubernetes Prow Robot
9fc32e57fb
Merge pull request #101364 from markusthoemmes/consistent-kubelet-log
Consistently use log.KObj to format pods in logs
2021-05-03 17:21:11 -07:00
Kubernetes Prow Robot
a238eb2fe8
Merge pull request #99748 from rphillips/fixes/check_log_path_for_restart_count
kubelet: fix log files being overwritten on container state loss
2021-05-03 16:14:19 -07:00
Joel Smith
1e0ca5bdc7 Fix log spam for du failure on pod etc-hosts metrics 2021-05-03 08:29:56 -06:00
Sergey Kanzhelev
e8ae653c1d disable collecting of accelerator metrics and exposing it for containerd 2021-04-30 22:16:34 +00:00
Kubernetes Prow Robot
6850e0abf2
Merge pull request #100218 from aojea/unitflakes1
unit test using metrics must reset the global registry
2021-04-29 23:01:57 -07:00
llhuii
afe28c6fc8 kubelet_pods.go: clean makeEnvironmentVariables
For simplicity and clarity, I think we can safely delete the
`delete(serviceEnv, envVar.Name)` call and the duplicate comments in
function makeEnvironmentVariables of kubelet_pods.go:774-779.

1. `delete(serviceEnv, envVar.Name)` and `if _, present := tmpEnv[k]; !present`
on line 796 implement the same logic: merging the keys of serviceEnv that
are not already present into tmpEnv.

2. The keys deleted from serviceEnv are guaranteed to be in tmpEnv, so
this doesn't affect mappingFunc.

3. The delete may miss some keys from container.EnvFrom.
2021-04-30 10:33:13 +08:00
Ryan Phillips
224a4db269 cleanup podkiller close 2021-04-29 11:49:58 -05:00
Ryan Phillips
1f81b44cc7 kubelet: do not cleanup volumes if pod is being killed 2021-04-29 11:49:58 -05:00
pacoxu
650666406e update kubelet_running_pods metrics comments: pods that have a running pod sandbox
Signed-off-by: pacoxu <paco.xu@daocloud.io>
Co-authored-by: Elana Hashman <ehashman@users.noreply.github.com>
2021-04-29 11:05:52 +08:00
Ryan Phillips
4488162bd9 kubelet: change cgroup move message to log level 3 2021-04-28 14:54:54 -05:00
Kubernetes Prow Robot
b9e86716b9
Merge pull request #101465 from ingvagabund/scheduler-drop-Resource-ResourceList-method
pkg/scheduler: drop Resource.ResourceList() method
2021-04-28 10:33:03 -07:00
Jan Chaloupka
7286f9712a pkg/scheduler: drop Resource.ResourceList() method
The method is used only for testing purposes. Given that the Resource data type
exposes all its fields, any invoker of ResourceList that still uses
the method outside of kubernetes/kubernetes can either
copy the original implementation or implement a custom method
that converts resources into the proper Quantity data type.

Given the hugepage resource is a scalar resource, it's sufficient for
the underlying code in fit_test.go to take into account any
extended resources. For predicate_test.go, the hugepage
resource does not play any role, as the General predicates test cases
do not set any scalar resource at all.

Additionally, by removing the ResourceList method, pkg/scheduler/framework
can get rid of its dependency on k8s.io/kubernetes/pkg/apis/core/v1/helper.
2021-04-28 16:26:33 +02:00
yuzhiquan
bebca30309 comment should have function name as prefix 2021-04-28 15:26:46 +08:00
Kubernetes Prow Robot
e213fb61ef
Merge pull request #100778 from Fish-pro/fix-delete-duplicate-log
delete duplicate logs
2021-04-26 12:53:37 -07:00
Kubernetes Prow Robot
afe567d0fc
Merge pull request #100750 from dabaooline/master
make clear PodConfigNotification's type
2021-04-26 12:53:29 -07:00
yuzhiquan
d483872d64 fix potential nil pointer 2021-04-26 15:31:34 +08:00
Kubernetes Prow Robot
5c34712a09
Merge pull request #101421 from yuzhiquan/typo
Fix typo for kubelet
2021-04-25 12:19:00 -07:00
Kubernetes Prow Robot
8b057cdfa4
Merge pull request #99095 from maxlaverse/fix_kubelet_stuck_in_diskpressure
Prevent Kubelet from getting stuck in DiskPressure when imagefs minReclaim is set
2021-04-23 18:23:14 -07:00
Kubernetes Prow Robot
520959060d
Merge pull request #97972 from nixpanic/csi/block-metrics
Add support for gathering metrics from CSI block-mode volumes
2021-04-23 12:33:40 -07:00
yuzhiquan
02c3d53a23 typo 2021-04-23 17:55:54 +08:00
pacoxu
fd7bb771f9 skip linux ut on mac in pkg/kubelet/dockershim
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-04-23 15:43:41 +08:00
wangyx1992
31d449bf57 cleanup: use plain channel send or receive instead of single-case select
Signed-off-by: wangyx1992 <wang.yixiang@zte.com.cn>
2021-04-23 11:17:12 +08:00
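A small sketch of the kind of cleanup named in this commit title: a `select` with a single case behaves the same as a plain channel receive, so the plain form is usually preferred. The function names `waitWithSelect` and `waitPlain` are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// waitWithSelect blocks until done is closed or receives a value, using a
// single-case select. This is the form the cleanup replaces.
func waitWithSelect(done <-chan struct{}) string {
	select {
	case <-done:
	}
	return "done"
}

// waitPlain has the same blocking behavior, written as a plain receive.
func waitPlain(done <-chan struct{}) string {
	<-done
	return "done"
}

func main() {
	done := make(chan struct{})
	close(done) // a closed channel makes every receive return immediately
	fmt.Println(waitWithSelect(done), waitPlain(done))
}
```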
Kubernetes Prow Robot
32b7c63e1b
Merge pull request #101213 from Elbehery/rephrase_volume_limit_log_msg
rephrase kubelet volume limit log msg from error to info
2021-04-22 13:19:32 -07:00
Kubernetes Prow Robot
62876fb406
Merge pull request #101085 from lojies/fixbrokenlinkindockersandbox
fix a broken link in docker_sandbox.go
2021-04-22 13:19:06 -07:00
Kubernetes Prow Robot
19d47ed330
Merge pull request #101037 from BodilessSleeper/master
Fixed the broken link
2021-04-22 13:18:58 -07:00
Kubernetes Prow Robot
ba4d8f7cc2
Merge pull request #101011 from lojies/fixbrokelinkinmirrorclient
fix broken link for issue 101008
2021-04-22 13:18:49 -07:00
Kubernetes Prow Robot
9cc6c2a82f
Merge pull request #100150 from bobbypage/remove-unused-limit-func
kubelet: remove unused applyLimits function
2021-04-22 13:18:22 -07:00
Kubernetes Prow Robot
27e23967f4
Merge pull request #99880 from Dragoncell/pleg-log
Add exit code log when container died
2021-04-22 13:18:01 -07:00
Niels de Vos
fb703b4cc1 Include metrics of BlockVolumes in volumeStatCalculator 2021-04-22 18:21:46 +02:00
Markus Thömmes
f00441d2ee pkg/kubelet: Simplify status string generation on probes 2021-04-22 14:06:18 +02:00
Markus Thömmes
168b6cf8a1 Consistently use log.KObj to format pods in logs 2021-04-22 12:14:44 +02:00
Niels de Vos
e22012950b Add Kubelet.ListBlockVolumesForPod() 2021-04-22 08:36:20 +02:00
Kubernetes Prow Robot
7ed02d61d1
Merge pull request #101235 from andyzhangx/azurefile-inline-ns-translation
fix: azure file inline volume namespace issue in csi migration translation
2021-04-21 19:15:44 -07:00
Kubernetes Prow Robot
5779fec3c4
Merge pull request #99959 from AliceZhang2016/nodeaffinity-cleanup
Move nodeaffinity helpers to component-helpers package
2021-04-21 17:03:53 -07:00
Lubomir I. Ivanov
7deac5e697 pkg/kubelet: improve the node informer sync check
GetNode() is called in a lot of places including a hot loop in
fastStatusUpdateOnce. Having a poll in it is delaying
the kubelet /readyz status=200 report.

If a client is available attempt to wait for the sync to happen,
before starting the list watch for pods at the apiserver.
2021-04-21 22:46:27 +03:00
elbehery
848ae095c8 fix_change_error_to_info 2021-04-21 10:35:23 +02:00
andyzhangx
e10d3948f5 fix: azure file namespace issue in csi translation
fix build failure

fix comments
2021-04-20 07:23:09 +00:00
Kubernetes Prow Robot
7552ca9f56
Merge pull request #101093 from wzshiming/fix/startup-probe
Fix `startupProbe` behaviour changed
2021-04-19 18:54:32 -07:00
Jiaming Xu
5f8dd349d1 Add exit code log when container died
update log exit code logic

adjust log exit code logic

fix invalid memory access in unit test

adjust log

update log message

address latest comment

change logging format

remove space in key of log

address latest comments

address comments
2021-04-20 00:19:16 +00:00
Kubernetes Prow Robot
bd67aeff26
Merge pull request #101012 from tnqn/kubelet-panic
Fix panic when killing container fails
2021-04-19 11:05:42 -07:00
Rastko Sarcevic
4a99a6eb12 Deleted deprecated lines 2021-04-16 13:24:43 +02:00
Shiming Zhang
6defb3657f Fix startupProbe behaviour changed
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
2021-04-16 10:11:52 +08:00
Shiming Zhang
44e9f6175d Fix test
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
2021-04-14 17:02:27 +08:00
Kubernetes Prow Robot
3c20c5aa2f
Merge pull request #100177 from wangyx1992/wrapped-error
fix errors in wrapped format
2021-04-13 23:24:42 -07:00
Kubernetes Prow Robot
87e0466e4e
Merge pull request #101006 from pacoxu/flake/fix-exec-probe
frequently flake ut: exec test should not run in Parallel as feature gate is not locked yet
2021-04-13 19:44:43 -07:00
卢振兴10069964
feab44c273 fix a broken link in docker_sandbox.go 2021-04-14 09:01:37 +08:00
Lee Verberne
29178fff1c Add kubelet managed pod metrics 2021-04-13 14:13:30 +02:00
pacoxu
49e7700cef exec test should not run in Parallel as feature gate is not locked
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-04-13 14:15:43 +08:00
Maxime Lagresle
63cba062eb
more sensible comment 2021-04-12 21:20:20 +02:00
Rastko Sarcevic
a245f73b34 Fixed the broken link 2021-04-12 19:23:41 +02:00
卢振兴10069964
70cf1fe882 fix broken link for issue 101008 2021-04-12 15:54:20 +08:00
Quan Tian
a90df057ac Fix panic when killing container fails
Use runningPod for logging as the pod passed in could be nil.
2021-04-12 14:02:53 +08:00
Kubernetes Prow Robot
0691157ec4
Merge pull request #99729 from ravisantoshgudimetla/low-oom-score
Only system-node-critical pods should be OOM Killed last
2021-04-11 11:28:01 -07:00
Kubernetes Prow Robot
3eac797255
Merge pull request #100200 from jackfrancis/ctx-respect-ExecProbeTimeout
respect ExecProbeTimeout=false for dockershim
2021-04-10 22:55:59 -07:00
Kubernetes Prow Robot
4959cd6339
Merge pull request #100671 from Niekvdplas/spelling-mistakes
Fixed several spelling mistakes
2021-04-09 05:19:45 -07:00
Kubernetes Prow Robot
12f8466459
Merge pull request #100267 from Jeffwan/support_arbitratry_resources
Expose resources overrides and maxPods conf in kubemark
2021-04-08 20:29:12 -07:00
Kubernetes Prow Robot
4a3e1b90c7
Merge pull request #100175 from changshuchao/testcase_utils
test case for pkg/kubelet/cri/remote/utils.go
2021-04-08 20:28:22 -07:00
Kubernetes Prow Robot
f72410d4c6
Merge pull request #100067 from changshuchao/testcase_status
Add test case for state.go
2021-04-08 17:11:21 -07:00
Kubernetes Prow Robot
4fae6ae5d2
Merge pull request #99839 from saschagrunert/portforward-stream-cleanup
Cleanup portforward streams after their usage
2021-04-08 15:59:51 -07:00
Kubernetes Prow Robot
86fdf7b56e
Merge pull request #99487 from chymy/fix-staticcheck0226
Fix staticcheck failures for pkg/controller/replicaset and pkg/kubelet/dockershim
2021-04-08 14:28:17 -07:00
Jack Francis
5a43067915 respect ExecProbeTimeout 2021-04-07 12:38:19 -07:00
chen zechun
d16d57b7d1 fix delete duplicate logs 2021-04-02 16:18:47 +08:00
dabaooline
a03db16c5f make clear PodConfigNotification's type 2021-04-01 18:53:16 +08:00
Niekvdplas
fec272a7b2 Fixed several spelling mistakes 2021-03-30 23:02:09 +02:00
wangyx1992
34c2b2360b fix errors in wrapped format
Signed-off-by: wangyx1992 <wang.yixiang@zte.com.cn>
2021-03-26 14:57:55 +08:00
Paco Xu
54606db1b4
Update pkg/kubelet/pleg/generic.go
Co-authored-by: Elana Hashman <ehashman@users.noreply.github.com>
2021-03-26 13:19:51 +08:00
pacoxu
3fc1e0891b Update the kubelet log status to level 6 as it is so big
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-03-26 10:09:20 +08:00
Kubernetes Prow Robot
bacce2eca6
Merge pull request #100215 from pacoxu/fix/data-race
fix a data race in volume reconciler ut #99815
2021-03-24 20:01:29 -07:00
Kubernetes Prow Robot
ea07644522
Merge pull request #99926 from gjkim42/deflake-TestWatchFileChanged
kubelet_test: Deflake TestWatchFileChanged
2021-03-23 16:30:05 -07:00
Kubernetes Prow Robot
bbb58fa085
Merge pull request #100465 from chrishenzie/nil-ptr-deref-in-logs
Fix nil ptr dereference in log line
2021-03-23 09:41:36 -07:00
Kubernetes Prow Robot
be2eb33b96
Merge pull request #100438 from dims/fix-providerless-kubelet
Ensure providerless kubelet does not pull cloud providers
2021-03-23 07:49:37 -07:00
Chris Henzie
f756bd5189 Fix nil ptr dereference in log line 2021-03-22 16:06:51 -07:00
Davanum Srinivas
ba56884d91
Ensure providerless kubelet does not pull cloud providers
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-03-21 21:36:38 -04:00
Aditi Sharma
a724a3df77 Fix structured logs for dns.go
Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
2021-03-19 19:05:30 +05:30
Elana Hashman
6af7eb6d49
Migrate missed log entries in kubelet
Co-Authored-By: pacoxu <paco.xu@daocloud.io>
2021-03-18 14:26:26 -07:00
Kubernetes Prow Robot
e9632d93f7
Merge pull request #99861 from navidshaikh/pr/slog-kubelet
Migrate pkg/kubelet/kubelet.go to structured logging
2021-03-17 19:03:18 -07:00
Jiaxin Shan
1b4dc87a1f Expose resources overrides and maxPods conf in kubemark 2021-03-17 16:31:58 -07:00
Kubernetes Prow Robot
862aa6d3a0
Merge pull request #99970 from krzysiekg/structured_logging_pkg_kubelet_kuberuntime
Migrate pkg/kubelet/kuberuntime to structured logging
2021-03-17 11:45:31 -07:00
Kubernetes Prow Robot
b5c6434f6b
Merge pull request #98850 from yangjunmyfm192085/run-test14
Structured Logging migration: modify volume and container part logs o…
2021-03-17 11:45:19 -07:00
Navid Shaikh
be91ea5bd1 Migrate pkg/kubelet/kubelet.go to structured logging 2021-03-17 14:39:08 +05:30
Kubernetes Prow Robot
80ff14a47d
Merge pull request #99855 from hexxdump/master
Migrating pkg/kubelet/winstats to structured logging
2021-03-17 00:46:56 -07:00
Kubernetes Prow Robot
c5680da8df
Merge pull request #99006 from yangjunmyfm192085/run-test17
Structured Logging migration: modify cri part logs of kubelet.
2021-03-16 20:10:56 -07:00
JunYang
01a4e4face Structured Logging migration: modify volume and container part logs of kubelet.
Signed-off-by: JunYang <yang.jun22@zte.com.cn>
2021-03-17 08:59:03 +08:00
Krzysztof Gibuła
629d5ab213 Migrate pkg/kubelet/kuberuntime to structured logging 2021-03-17 01:53:44 +01:00
Kubernetes Prow Robot
1d4777b798
Merge pull request #100163 from lala123912/kubelet_log_3
Migrate pkg/kubelet/cm/cpumanager/{topology/topology.go, policy_none.go, cpu_assignment.go} to structured logging
2021-03-16 15:57:08 -07:00
Kubernetes Prow Robot
045b5ddd0b
Merge pull request #100265 from ehashman/finish-100010
Migrate pkg/kubelet/kubeletconfig to structured logging
2021-03-16 14:50:51 -07:00
Kubernetes Prow Robot
1cd909606d
Merge pull request #100176 from pacoxu/structured-log-kubelet-last
Kubelet migration to structured logs: cpumanager/{cpu_manager.go, fake_cpu_manager.go, policy_static.go}
2021-03-16 14:50:31 -07:00
Kubernetes Prow Robot
f217f3c0f9
Merge pull request #100081 from utsavoza/ugo/issue-98976/10-03-2021
Migrate pkg/kubelet/cm/cgroup_manager_linux.go to structured logging
2021-03-16 14:50:22 -07:00
Kubernetes Prow Robot
21de277402
Merge pull request #100007 from utsavoza/ugo/issue-98976/09-03-2021
Migrate remaining pkg/kubelet/cm/ top level files to structured logging
2021-03-16 14:50:14 -07:00
Kubernetes Prow Robot
38fbecf0c8
Merge pull request #100001 from shiyajuan123/logs
migrate kubelet/cm/container logs to structured logging
2021-03-16 14:50:06 -07:00
Kubernetes Prow Robot
5ead6af84e
Merge pull request #99994 from AfrouzMashayekhi/sl-cmd-kubelet
Migrate cmd/kubelet and pkg/kubelet/cadvisor, pkg/kubelet/cri/remote/util, pkg/kubelet/images to structured logging
2021-03-16 14:49:56 -07:00
Kubernetes Prow Robot
e5309efbdf
Merge pull request #99974 from knabben/sl-memorymanager
Migrate pkg/kubelet/cm/memorymanager to structured logging
2021-03-16 14:49:47 -07:00
Kubernetes Prow Robot
81a1a793a1
Merge pull request #99969 from knabben/sl-topologymanager
Migrate pkg/kubelet/cm/topologymanager to structured logging
2021-03-16 14:49:39 -07:00
Kubernetes Prow Robot
97f59e9431
Merge pull request #99848 from qingwave/structred-log-kubelet-preemption
Migrate kubelet/preemption and kubelet/logs to structured logging
2021-03-16 14:49:21 -07:00