Exploring termination revealed that we have race conditions in certain
parts of pod initialization and termination. To better catch these
issues, refactor the existing test so it can be reused, and then test
a number of alternate scenarios.
Some storage tests deploy DaemonSets which hard-code /var/lib/kubelet as the
root directory for kubelet registration and the pod directory. There was
already a parameter which allowed specifying the root directory, just with a
very confusing name ("--volume-dir") and a matching field name. A
--kubelet-root-dir parameter gets added because it may be easier to find,
with the old name preserved as an alias for the same field for backwards
compatibility.
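
A minimal sketch of the aliasing, assuming the standard library `flag`
package rather than whatever flag handling the test framework really uses.
The `config` struct, its `KubeletRootDir` field, and both helper functions
are hypothetical names chosen for illustration; only the two flag names and
the /var/lib/kubelet default come from the text above.

```go
package main

import (
	"flag"
	"fmt"
)

// config is a hypothetical stand-in for the test configuration struct.
type config struct {
	KubeletRootDir string
}

// registerFlags binds the new flag and the legacy alias to the same field,
// so whichever one is given on the command line sets the value.
func registerFlags(fs *flag.FlagSet, cfg *config) {
	const def = "/var/lib/kubelet"
	fs.StringVar(&cfg.KubeletRootDir, "kubelet-root-dir", def,
		"root directory of the kubelet")
	fs.StringVar(&cfg.KubeletRootDir, "volume-dir", def,
		"alias for --kubelet-root-dir, kept for backwards compatibility")
}

// parseRootDir parses args and returns the resulting root directory.
func parseRootDir(args []string) (string, error) {
	var cfg config
	fs := flag.NewFlagSet("e2e", flag.ContinueOnError)
	registerFlags(fs, &cfg)
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return cfg.KubeletRootDir, nil
}

func main() {
	for _, args := range [][]string{
		nil,
		{"--kubelet-root-dir=/opt/kubelet"},
		{"--volume-dir=/opt/legacy"},
	} {
		dir, err := parseRootDir(args)
		fmt.Println(dir, err)
	}
}
```

Because both flags write to one field, callers that still pass the old name
keep working without any extra translation step.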
This test has been part of the Conformance suite since at least
Kubernetes 1.2 (2015-10-xx). Some years later, around 2018-10-xx, we
drafted a rigorous set of rules for tests to follow in order to be
eligible for promotion to Conformance. We explicitly disallowed any
tests that check for specific Events, since they are not an API, and we
make no guarantees about their contents nor their delivery.
Unfortunately, we neglected to go through the existing corpus of
Conformance tests with a fine-toothed comb after drafting these rules.
The very nature of what this test is attempting to exercise and verify
is specific Events, and their delivery, thus making it ineligible for
Conformance. We should have caught and demoted this test back then.
Better late than never?
* De-share the Handler struct in core API
An upcoming PR adds a handler that only applies on one of these paths.
Having fields that don't work seems bad.
This never should have been shared. Lifecycle hooks are like a "write"
while probes are more like a "read". HTTPGet and TCPSocket don't really
make sense as lifecycle hooks (but I can't take that back). When we add
gRPC, it is EXPLICITLY a health check (defined by gRPC) not an arbitrary
RPC - so a probe makes sense but a hook does not.
In the future I can also see adding lifecycle hooks that don't make
sense as probes. E.g. 'sleep' is a common lifecycle request. The only
option is `exec`, which requires having a sleep binary in your image.
* Run update scripts
This test case requires special test-handler setup which is only done
for gce clusters created by kube-up scripts. Let's skip the test when
run under other providers.
We now use a host local exec instead of SSH commands to simplify the
test and make the result more robust.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
This assumes that SSH via bastion works if the `KUBE_SSH_BASTION`
environment variable is set, which is the case for
`pull-kubernetes-e2e-gce-correctness`.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.
Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).
Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.
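
The idea of a single owner for each pod's allowed state can be sketched
roughly as follows. This is not the kubelet implementation: the type, the
phase constants, and every method name here are hypothetical, and the real
pod worker tracks far more than a single phase per UID. The sketch only
shows how other loops could ask one component whether a pod is still in
the window where containers may start or may still be running.

```go
package main

import (
	"fmt"
	"sync"
)

// podPhase is a hypothetical name for the per-pod lifecycle state.
type podPhase int

const (
	phaseSetup       podPhase = iota // containers may be started
	phaseTerminating                 // should stop running, might have containers
	phaseTerminated                  // no running containers, none coming
)

// podWorkers is the only component allowed to change a pod's phase.
type podWorkers struct {
	mu     sync.Mutex
	phases map[string]podPhase // keyed by pod UID
}

func newPodWorkers() *podWorkers {
	return &podWorkers{phases: make(map[string]podPhase)}
}

// UpdatePod registers a pod or, when kill is true, requests termination.
// Phases only move forward, so a late sync cannot revive a killed pod.
func (w *podWorkers) UpdatePod(uid string, kill bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	cur, known := w.phases[uid]
	if !known {
		cur = phaseSetup
	}
	if kill && cur < phaseTerminating {
		cur = phaseTerminating
	}
	w.phases[uid] = cur
}

// ContainersStopped records that the terminate-containers phase finished.
func (w *podWorkers) ContainersStopped(uid string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if p, known := w.phases[uid]; known && p == phaseTerminating {
		w.phases[uid] = phaseTerminated
	}
}

// MayStartContainers reports whether new containers are still allowed.
func (w *podWorkers) MayStartContainers(uid string) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	p, known := w.phases[uid]
	return known && p == phaseSetup
}

// MayHaveRunningContainers reports whether resources must stay in place.
func (w *podWorkers) MayHaveRunningContainers(uid string) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	p, known := w.phases[uid]
	return known && p != phaseTerminated
}

func main() {
	w := newPodWorkers()
	w.UpdatePod("pod-a", false)
	fmt.Println(w.MayStartContainers("pod-a"), w.MayHaveRunningContainers("pod-a"))
	w.UpdatePod("pod-a", true) // kill request
	fmt.Println(w.MayStartContainers("pod-a"), w.MayHaveRunningContainers("pod-a"))
	w.ContainersStopped("pod-a")
	fmt.Println(w.MayStartContainers("pod-a"), w.MayHaveRunningContainers("pod-a"))
}
```
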
Removing containers no longer blocks final pod deletion in the API
server and is handled as background cleanup. Node shutdown no longer
marks pods as failed, as they can be restarted in the next step.
See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
The e2e test that creates and deletes pods rapidly and verifies their
status was reporting a very long timing in a few scenarios:
timings total=12.211347385s t=491ms run=2s execute=450402h8m25s
Add error checks that clarify when this happens and why. Report
p50/75/90/99 latencies on teardown, as observed from the test, as a
baseline for future changes.
Before assuming that a certain host runs an SSH server, we now test its
`SSHPort` for connectivity. This makes the test `should be able to
run crictl on the node` more robust, because it only checks hosts where
SSH actually runs. Besides that, we can now test all hosts and not only
the first one.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
This replaces the check of mount propagation to/from the host OS mount
namespace with a similar check against the mount namespace where kubelet
is running (which may or may not be the same mount namespace as the host
OS).
This addresses issue #100259