8604 Commits

Author SHA1 Message Date
mattjmcnaughton
9077603b97 Add richer unit tests for OomWatcher
Add unit tests for OomWatcher that actually test the logic defined in
the `Start` method. As a result of an earlier refactor, it's now trivial
to mock the OOMInstance events which the `oom_watcher` is supposed to be
watching.
2020-01-14 07:57:06 -05:00
mattjmcnaughton
ab7e0f58d5 Clean up rkt specific code in pkg/kubelet/pleg
Clean up code in PLEG which was only necessary for the `rkt` runtime.

Rkt is no longer a built-in runtime and docker(shim) uses the CRI, so
it's safe to remove this code entirely.

This diff removes the last mentions of `rkt` in the kubelet.
2020-01-14 07:42:30 -05:00
Kubernetes Prow Robot
be26fbc638 Merge pull request #86282 from RainbowMango/pr_refactor_resource_endpoint
Refactor kubelet resource metrics
2020-01-14 02:23:09 -08:00
Kubernetes Prow Robot
f4db8212be Merge pull request #76496 from danielqsj/metrics-2
Clean deprecated metrics
2020-01-13 20:53:09 -08:00
Kubernetes Prow Robot
61d36e4a43 Merge pull request #85850 from danwinship/kubelet-ipv6-node-ip
Allow "kubelet --node-ip ::" to mean prefer IPv6
2020-01-13 17:41:08 -08:00
Kubernetes Prow Robot
8467561f2c Merge pull request #86783 from mattjmcnaughton/mattjmcnaughton/remove-unnecessary-modification-container-pid-namespace
Remove no longer needed `modifyContainerPIDNamespaceOverrides`
2020-01-10 15:43:50 -08:00
Kubernetes Prow Robot
befc371364 Merge pull request #86702 from mattjmcnaughton/mattjmcnaughton/refactor-oom-watcher-to-allow-greater-test-coverage
Refactor oom watcher to allow greater test coverage
2020-01-10 15:43:37 -08:00
danielqsj
ab182552b4 clean SinceInMicroseconds, convert to SinceInSeconds 2020-01-10 17:05:38 +08:00
danielqsj
8ae3f80048 remove deprecated metrics of dockershim 2020-01-10 17:05:38 +08:00
danielqsj
1a9b121764 remove deprecated metrics of kubelet 2020-01-10 16:46:52 +08:00
Kubernetes Prow Robot
7a50fdb2a6 Merge pull request #85993 from chendotjs/fix-cidr
kubenet: replace gateway with cni result
2020-01-09 20:13:04 -08:00
Kubernetes Prow Robot
792fe793a1 Merge pull request #86946 from cchord/fix_typo
fix typo
2020-01-08 14:46:24 -08:00
Kubernetes Prow Robot
fd0358fd21 Merge pull request #86689 from klueska/upstream-fix-cpumanager-v1-state-checksum
Lock checksum calculation for v1 CPUManager state to pre 1.18 logic
2020-01-08 02:57:40 -08:00
Jiahao Zhu
680df17f39 fix typo 2020-01-08 15:48:58 +08:00
Kubernetes Prow Robot
8dca390262 Merge pull request #84927 from mattjmcnaughton/mattjmcnaughton/fix-kubelet-config-common
Fix golint failures for pkg/kubelet/config/...
2020-01-07 21:09:40 -08:00
mattjmcnaughton
8897c435ad Refactor oom watcher to allow greater test coverage
This diff contains a strict refactor; there are no behavioral changes.

Address a long-standing TODO in `oom_watcher_linux_test.go` around test
coverage. We refactor our `oom.Watcher` so it takes in a struct
fulfilling the `streamer` interface (i.e. defines `StreamOoms` method).
In production, we will continue to use the `oomparser` from `cadvisor`.
However, for testing purposes, we can now create our own `fakeStreamer`,
and control how it streams `oomparser.OomInstance`. With this fake, we
can implement richer unit testing for the `oom.Watcher` itself.

Actually adding the additional unit tests will come in a later commit.
2020-01-07 21:48:14 -05:00
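
The refactor described in 8897c435ad can be pictured with a minimal sketch. It is not the kubelet code: `OomInstance` here is a local stand-in for `oomparser.OomInstance`, the `watcher` and `collect` names are hypothetical, and the fake closes its channel only so that the example terminates.

```go
package main

import "fmt"

// OomInstance is a hypothetical stand-in for cadvisor's oomparser.OomInstance.
type OomInstance struct {
	Pid           int
	ProcessName   string
	ContainerName string
}

// streamer is the interface named in the commit message: anything that
// defines a StreamOoms method can feed the watcher.
type streamer interface {
	StreamOoms(out chan<- *OomInstance)
}

// fakeStreamer replays a fixed list of events, which is what makes the
// watcher testable without a real kernel log.
type fakeStreamer struct {
	events []*OomInstance
}

// StreamOoms sends every canned event and then closes the channel.
// Closing is a choice made for this sketch so the consumer loop ends.
func (f *fakeStreamer) StreamOoms(out chan<- *OomInstance) {
	for _, e := range f.events {
		out <- e
	}
	close(out)
}

// watcher is a hypothetical, simplified watcher that depends only on the
// streamer interface rather than on a concrete parser.
type watcher struct {
	s streamer
}

// collect consumes the stream and returns one message per OOM event.
// It stands in for the event recording done by the real Start method.
func (w *watcher) collect() []string {
	ch := make(chan *OomInstance, 10)
	go w.s.StreamOoms(ch)
	var msgs []string
	for e := range ch {
		msgs = append(msgs, fmt.Sprintf("Killed process %d (%s)", e.Pid, e.ProcessName))
	}
	return msgs
}

func main() {
	fake := &fakeStreamer{events: []*OomInstance{
		{Pid: 1000, ProcessName: "fakeProcess", ContainerName: "/"},
	}}
	w := &watcher{s: fake}
	for _, m := range w.collect() {
		fmt.Println(m)
	}
}
```

Because the watcher only sees the interface, a test can decide exactly which events arrive and in what order.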
Kubernetes Prow Robot
49e24adf3e Merge pull request #86832 from mattjmcnaughton/mattjmcnaughton/remove-dead-code-in-fake-docker-client
Remove dead code in fake docker client
2020-01-07 07:36:18 -08:00
Dan Winship
ce68edf700 Allow "kubelet --node-ip ::" to mean prefer IPv6 2020-01-07 07:53:21 -05:00
Kubernetes Prow Robot
f3df7a2fdb Merge pull request #86727 from mattjmcnaughton/mattjmcnaughton/remove-recorder-PastEventf
Remove `recorder.PastEventf` method
2020-01-07 04:38:49 -08:00
Kubernetes Prow Robot
dd5272b76f Merge pull request #86575 from gongguan/nkubemark
kubemark use remote cri
2020-01-07 03:20:46 -08:00
Kubernetes Prow Robot
195e8e3ad9 Merge pull request #86844 from mattjmcnaughton/mattjmcnaughton/update-cadvisor-stats-provider-comment
Correct comment around which integrations require cadvisor_stats
2020-01-07 01:13:14 -08:00
Kubernetes Prow Robot
8b8f2aa4a5 Merge pull request #85431 from irbull/api-doc
Add public documentation for kubelet/apis/config
2020-01-06 23:12:18 -08:00
louisgong
324e5ce7e3 hollow-node use remote CRI 2020-01-07 11:00:45 +08:00
Kubernetes Prow Robot
59b4933fb8 Merge pull request #86724 from gongguan/fix-fake-CRI
fix fake remote CRI
2020-01-06 18:06:57 -08:00
Kubernetes Prow Robot
49bc696614 Merge pull request #86251 from bboreham/pleg-last-seen-metric
Kubelet: add a metric to observe time since PLEG last seen
2020-01-06 18:06:18 -08:00
Kubernetes Prow Robot
d6412b856f Merge pull request #84345 from danielqsj/withdialer
replace grpc.WithDialer which is deprecated
2020-01-06 15:56:17 -08:00
Kubernetes Prow Robot
19ecd690fa Merge pull request #86646 from yutedz/client-protocol
Require client / server protocols
2020-01-06 13:34:18 -08:00
Kubernetes Prow Robot
b112ad4f0b Merge pull request #86845 from mattjmcnaughton/mattjmcnaughton/remove-rkt-from-runtime-options
Remove `rkt` from container runtime options
2020-01-06 11:12:29 -08:00
Kubernetes Prow Robot
9acf7d11fe Merge pull request #86344 from klueska/upstream-cm-approver
Add klueska as an approver in pkg/kubelet/cm/OWNERS
2020-01-06 09:54:16 -08:00
Ted Yu
906adbdfcd Require client / server protocols 2020-01-06 08:50:04 -08:00
mattjmcnaughton
794d0d9b4d Remove rkt from container runtime options
Part of efforts to clean up mentions of rkt in kubelet.

rkt was removed entirely in 1.11, in favor of using `rktlet` and CRI
instead. It should no longer be listed at all as a runtime.
2020-01-05 09:27:38 -05:00
mattjmcnaughton
06b44c76fd Correct comment around which integrations require cadvisor_stats
This commit is part of a larger effort to clean up references to `rkt`
in the kubelet.

Previously, this comment hard-coded which integrations required
the cadvisor stats provider. The comment has grown stale
(i.e. referenced rkt and did not reference cri-o).

Update the comment to instead point to the code which determines which
integrations need the cadvisor stats provider.
2020-01-05 09:23:09 -05:00
mattjmcnaughton
f2cb1f35fe Remove dead code in fake docker client
The `FakeDockerClient` had a number of methods defined on it which were
not being called anywhere. The majority were of the form `Assert...`.

In the spirit of removing dead code, remove the methods which aren't
being called.
2020-01-05 08:31:59 -05:00
louisgong
e8eb5c656b fix fake remote CRI 2020-01-04 08:43:17 +08:00
Aresforchina
2293b47346 add some comments for const variable 2020-01-03 23:28:21 +08:00
Bryan Boreham
cc0b3e82eb Kubelet: add a metric to observe time since PLEG last seen
Expose the measurement that kubelet uses to judge that "PLEG is
unhealthy". If we can observe the measurement growing then we can
alert before the node goes unhealthy.

Note that the existing metrics PLEGRelistInterval and
PLEGRelistDuration are poor for this, because when relist() gets
stuck they are never updated.

Signed-off-by: Bryan Boreham <bryan@weave.works>
2020-01-03 10:01:27 +00:00
mattjmcnaughton
d09fe8e247 Remove no longer needed modifyContainerPIDNamespaceOverrides
As of https://github.com/kubernetes/kubernetes/pull/72831/, the minimum
kubernetes version is now 1.13.1. As a result, this function becomes a
no-op. As the TODO indicates, we should delete it.
2020-01-02 09:09:02 -05:00
mattjmcnaughton
92940fa80d Remove recorder.PastEventf method
The `recorder.PastEventf` method wasn't actually working as advertised.
It was supposed to accept a timestamp, which would be used when
generating the event. However, as the
[source code](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/record/event.go#L316)
shows, this `timestamp` was never actually used.

In other words, `PastEventf` is identical to `Eventf`.

We have two options: one would be to fix `PastEventf` so that it works
as advertised. The other would be to delete `PastEventf` and only
support `Eventf`.

Ultimately, I could only find one use of `PastEventf` in the code base,
so I propose we just delete `PastEventf` and convert all uses to
`Eventf`.
2019-12-30 12:00:23 -05:00
Kevin Klues
b373121a14 Make CPUManagerCheckpointV2 type an alias of CPUManagerCheckpoint
This change is to prevent problems when we remove the V1->V2 migration
code in the future. Without this, the checksums of all checkpoints would
be hashed with the name CPUManagerCheckpointV2 embedded inside of them,
which is undesirable. We want the checkpoints to be hashed with the name
CPUManagerCheckpoint instead.
2019-12-28 19:29:13 +01:00
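
The effect that b373121a14 relies on is a general Go property and can be checked in isolation. In the sketch below the struct field and the `distinctV2` type are hypothetical, added only for contrast: an alias reports the name of the type it points to, while a separately defined type reports its own name.

```go
package main

import "fmt"

// CPUManagerCheckpoint uses a hypothetical field; the real fields are
// elided in the commit message.
type CPUManagerCheckpoint struct {
	PolicyName string
}

// CPUManagerCheckpointV2 is an alias, so it is the very same type and
// carries the name CPUManagerCheckpoint wherever the type name is printed.
type CPUManagerCheckpointV2 = CPUManagerCheckpoint

// distinctV2 is a hypothetical, separately defined type shown for contrast.
type distinctV2 struct {
	PolicyName string
}

// typeName returns the name Go reports for the dynamic type of v.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	fmt.Println(typeName(CPUManagerCheckpoint{}))
	fmt.Println(typeName(CPUManagerCheckpointV2{})) // same name as the line above
	fmt.Println(typeName(distinctV2{}))             // its own name
}
```

Any checksum derived from a printed representation that includes the type name therefore sees `CPUManagerCheckpoint` for values declared through the alias.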
Kevin Klues
5faf8f4c52 Lock checksum calculation for v1 CPUManager state to pre 1.18 logic
The updated CPUManager from PR #84462 implements logic to migrate the
CPUManager checkpoint file from an old format to a new one. To do so, it
defines the following types:

```
type CPUManagerCheckpoint = CPUManagerCheckpointV2
type CPUManagerCheckpointV1 struct {  ...  }
type CPUManagerCheckpointV2 struct {  ...  }
```

This replaces the old definition of just:

```
type CPUManagerCheckpoint struct {  ...  }
```

Code was put in place to ensure proper migration from checkpoints in V1
format to checkpoints in V2 format. However (and this is a big however),
all of the unit tests were performed on V1 checkpoints that were
generated using the type name `CPUManagerCheckpointV1` and not the
original type name of `CPUManagerCheckpoint`. As such, the checksum in
the checkpoint file uses the `CPUManagerCheckpointV1` type to calculate
its checksum and not the original type name of `CPUManagerCheckpoint`.

This causes problems in the real world since all pre-1.18 checkpoint
files will have been generated with the original type name of
`CPUManagerCheckpoint`. When verifying the checksum of the checkpoint
file across an upgrade to 1.18, the checksum is calculated assuming
a type name of `CPUManagerCheckpointV1` (which is incorrect) and the
file is seen to be corrupt.

This patch ensures that all V1 checksums are verified against a type
name of `CPUManagerCheckpoint` instead of `CPUManagerCheckpointV1`.
It also locks the algorithm used to calculate the checksum in place,
since it will never change in the future (for pre-1.18 checkpoint
files at least).
2019-12-28 14:17:55 +01:00
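
The mismatch described in 5faf8f4c52 can be reproduced with a small sketch. This is not the kubelet implementation: the struct fields are hypothetical, and the sketch assumes a checksum taken over the `%#v` representation with FNV-1a from the standard library. It only shows why a type name embedded in the hashed text changes the checksum, and how substituting the original name restores the match.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// CPUManagerCheckpoint plays the role of the pre-1.18 type (hypothetical fields).
type CPUManagerCheckpoint struct {
	PolicyName    string
	DefaultCPUSet string
}

// CPUManagerCheckpointV1 has the same layout under the new name (hypothetical fields).
type CPUManagerCheckpointV1 struct {
	PolicyName    string
	DefaultCPUSet string
}

// checksum hashes a textual representation with FNV-1a.
func checksum(repr string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(repr))
	return h.Sum32()
}

// naiveChecksum hashes the %#v form, which embeds the type name.
func naiveChecksum(v interface{}) uint32 {
	return checksum(fmt.Sprintf("%#v", v))
}

// lockedV1Checksum swaps the V1 type name for the original one before
// hashing, so the result matches files written under the original name.
func lockedV1Checksum(cp CPUManagerCheckpointV1) uint32 {
	repr := fmt.Sprintf("%#v", cp)
	repr = strings.Replace(repr, "CPUManagerCheckpointV1", "CPUManagerCheckpoint", 1)
	return checksum(repr)
}

func main() {
	onDisk := CPUManagerCheckpoint{PolicyName: "static", DefaultCPUSet: "0-3"}
	loaded := CPUManagerCheckpointV1{PolicyName: "static", DefaultCPUSet: "0-3"}

	stored := naiveChecksum(onDisk) // what a pre-upgrade writer would have stored
	fmt.Println("naive V1 matches stored: ", naiveChecksum(loaded) == stored)
	fmt.Println("locked V1 matches stored:", lockedV1Checksum(loaded) == stored)
}
```

Pinning the name substitution in one function is also what keeps the V1 verification stable if the surrounding types are renamed again later.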
danielqsj
19fe9f8d94 replace grpc.WithDialer which is deprecated 2019-12-26 17:46:59 +08:00
SataQiu
2497a1209b bump k8s.io/utils version 2019-12-21 14:54:44 +08:00
Kubernetes Prow Robot
03e90b80ce Merge pull request #86167 from yiyang5055/change-CounterVec-to-Counter
change CounterVec to use Counter in the Kubelet's Pod Lifecycle Event…
2019-12-19 11:33:56 -08:00
whypro
f4bd4e2e96 Return error instead of panic when CPU manager fails to start. 2019-12-19 21:56:23 +08:00
chenyaqi01
c5002a348e kubenet: replace gateway with cni result 2019-12-19 18:32:25 +08:00
Jacek Kaniuk
4303be3d9f Revert pull request #85879 "hollow-node use remote CRI" 2019-12-19 10:52:35 +01:00
sshukun
12a5bdec00 Fix go-lint issues in package pkg/kubelet/checkpointmanager/testing/example_checkpoint_formats/v1 2019-12-19 12:53:33 +09:00
Kubernetes Prow Robot
814fc34cde Merge pull request #85879 from gongguan/cri-kubemark
hollow-node use remote CRI
2019-12-18 06:01:57 -08:00
louisgong
e8e1cc9ee0 extract PreInitRuntimeService from NewMainKubelet 2019-12-18 11:48:29 +08:00
Kubernetes Prow Robot
40df9f82d0 Merge pull request #82492 from gnufied/fix-uncertain-mounts
Fix uncertain mounts
2019-12-17 14:49:57 -08:00