This starts ephemeral containers prior to init containers so that
ephemeral containers will still be started when init containers fail to
start.
Also improves tests and comments with review suggestions.
If kubelet never gets past sandbox creation (i.e., never attempted to
create containers for a pod), it should retry the sandbox creation on
failure, regardless of the restart policy of the pod.
If the container is not found, do not stop reading the log file
immediately but wait until we reach again EOF.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
it seems fsnotify can miss some read events, blocking the kubelet to
receive more data from the log file.
If we end up waiting for events with fsnotify, force a read from the
log file every second so that are sure to not miss new data for longer
than that.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if the runtime is configured to rotate the log file, we might end up
watching the old fd where there are no more writes.
When a fsnotify event other than Write is received, reopen the log
file and recreate the watcher.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
1) Add suffix (`seconds` or `total`) to metric name
2) Switch Summary metric to Histogram metric (Summary metrics are not
supported completely by prometheus-to-sd and can't be aggregated.)
The messages for container lifecycle events are subtly inconsistent
and should be unified.
First, the field format for containers is hard to parse for a human,
so include the container name directly in the message for create
and start, and for kill remove the container runtime prefix.
Second, the pulling image event has inconsistent capitalization, fix
that to be sentence without punctuation.
Third, the kill container event was unnecessarily wordy and inconsistent
with the create and start events. Make the following changes:
* Use 'Stopping' instead of 'Killing' since kill is usually reserved for
when we decide to hard stop a container
* Send the event before we dispatch the prestop hook, since this is an
"in-progress" style event vs a "already completed" type event
* Remove the 'cri-o://' / 'docker://' prefix by printing the container
name instead of id (we already do that replacement at the lower level
to prevent high cardinality events)
* Use 'message' instead of 'reason' as the argument name since this is a
string for humans field, not a string for machines field
* Remove the hash values on the container spec changed event because no
human will ever be able to do anything with the hash value
* Use 'Stopping container %s(, explanation)?' form without periods to
follow event conventions
The end result is a more pleasant message for humans:
```
35m Normal Created Pod Created container
35m Normal Started Pod Started container
10m Normal Killing Pod Killing container cri-o://installer:Need to kill Pod
10m Normal Pulling Pod pulling image "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-10-172026@sha256:3da5303d4384d24691721c1cf2333584ba60e8f82c9e782f593623ce8f83ddc5"
```
becomes
```
35m Normal Created Pod Created container installer
35m Normal Started Pod Started container installer
10m Normal Killing Pod Stopping container installer
10m Normal Pulling Pod Pulling image "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-10-172026@sha256:3da5303d4384d24691721c1cf2333584ba60e8f82c9e782f593623ce8f83ddc5"
```