Add unit tests for OomWatcher that actually test the logic defined in
the `Start` method. As a result of an earlier refactor, its now trivial
to mock the OOMInstance events which the `oom_watcher` is supposed to be
watching.
Clean up code in PLEG which was only necessary for the `rkt` runtime.
Rkt is no longer a built-in runtime and docker(shim) uses the CRI, so
its safe to remove this code entirely.
This diff removes the last mentions of `rkt` in the kubelet.
This diff contains a strict refactor; there are no behavioral changes.
Address a long standing TODO in `oom_watcher_linux_test.go` around test
coverage. We refactor our `oom.Watcher` so it takes in a struct
fulfulling the `streamer` interface (i.e. defines `StreamOoms` method).
In production, we will continue to use the `oomparser` from `cadvisor`.
However, for testing purposes, we can now create our own `fakeStreamer`,
and control how it streams `oomparser.OomInstance`. With this fake, we
can implement richer unit testing for the `oom.Watcher` itself.
Actually adding the additional unit tests will come in a later commit.
Part of efforts to clean up mentions of rkt in kubelet.
rkt was removed entirely in 1.11, in favor of using `rktlet` and CRI
instead. It should no longer be listed at all as a runtime.
This commit is part of a larger effort to clean up references to `rkt`
in the kubelet.
Previously, this comment hard-coded which integrations required
the cadvisor stats provider. The comment has grown stale
(i.e. referenced rkt and did not reference cri-o).
Update the comment to instead point to the code which determines which
integrations need the cadvisor stats provider.
The `FakeDockerClient` had a number of methods defined on it which were
not being called anywhere. The majority were of the form `Assert...`.
In the spirit of removing dead code, remove the methods which aren't
being called.
Expose the measurement that kubelet uses to judge that "PLEG is
unhealthy". If we can observe the measurement growing then we can
alert before the node goes unhealthy.
Note that the existing metrics PLEGRelistInterval and
PLEGRelistDuration are poor for this, because when relist() gets
stuck they are never updated.
Signed-off-by: Bryan Boreham <bryan@weave.works>
The `recorder.PastEventf` method wasn't actually working as advertised.
It was supposed to accept a timestamp, which would be used when
generating the event. However, as the
[source code](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/record/event.go#L316)
shows, this `timestamp` was never actually used.
In other words, `PastEventf` is identical to `Eventf`.
We have two options: one would be to fix `PastEventf` so that it works
as advertised. The other would be to delete `PastEventf` and only
support `Eventf`.
Ultimately, I could only find one use of `PastEventf` in the code base,
so I propose we just delete `PastEventf` and convert all uses to
`Eventf`.
This change is to prevent problems when we remove the V1->V2 migration
code in the future. Without this, the checksums of all checkpoints would
be hashed with the name CPUManagerCheckpointV2 embedded inside of them,
which is undesirable. We want the checkpoints to be hashed with the name
CPUManagerCheckpoint instead.
The updated CPUManager from PR #84462 implements logic to migrate the
CPUManager checkpoint file from an old format to a new one. To do so, it
defines the following types:
```
type CPUManagerCheckpoint = CPUManagerCheckpointV2
type CPUManagerCheckpointV1 struct { ... }
type CPUManagerCheckpointV2 struct { ... }
```
This replaces the old definition of just:
```
type CPUManagerCheckpoint struct { ... }
```
Code was put in place to ensure proper migration from checkpoints in V1
format to checkpoints in V2 format. However (and this is a big however),
all of the unit tests were performed on V1 checkpoints that were
generated using the type name `CPUManagerCheckpointV1` and not the
original type name of `CPUManagerCheckpoint`. As such, the checksum in
the checkpoint file uses the `CPUManagerCheckpointV1` type to calculate
its checksum and not the original type name of `CPUManagerCheckpoint`.
This causes problems in the real world since all pre-1.18 checkpoint
files will have been generated with the original type name of
`CPUManagerCheckpoint`. When verifying the checksum of the checkpoint
file across an upgrade to 1.18, the checksum is calculated assuming
a type name of `CPUManagerCheckpointV1` (which is incorrect) and the
file is seen to be corrupt.
This patch ensures that all V1 checksums are verified against a type
name of `CPUManagerCheckpoint` instead of ``CPUManagerCheckpointV1`.
It also locks the algorithm used to calculate the checksum in place,
since it wil never change in the future (for pre-1.18 checkpoint
files at least).