The event parameter wasn't actually used when processing oom events,
likely because it's only ever available for reads.
Additionally clarify flush is for eventfds, and point to where the
buffer size of 8 is coming from.
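As a rough illustration of where the 8 comes from: eventfd(2) specifies that each read returns a single 8-byte unsigned integer in host byte order. The sketch below only decodes such a buffer; the names `eventfdBufSize` and `decodeCounter` are hypothetical, and it assumes a little-endian host.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// eventfdBufSize is the size of one eventfd read: eventfd(2) specifies that
// each read returns one 8-byte unsigned integer.
const eventfdBufSize = 8

// decodeCounter interprets one eventfd read as a counter value.
// Assumption: little-endian host (eventfd values use host byte order).
func decodeCounter(buf [eventfdBufSize]byte) uint64 {
	return binary.LittleEndian.Uint64(buf[:])
}

func main() {
	// Simulated result of reading an eventfd that was signaled three times.
	buf := [eventfdBufSize]byte{3, 0, 0, 0, 0, 0, 0, 0}
	fmt.Println(decodeCounter(buf))
}
```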
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
Collector.Collect is registered as the callback of ns.Collect, which is
invoked periodically while holding the namespace's internal lock.
Collector.Add also takes ns.Lock while holding Collector.Lock, which can
easily cause a deadlock.
Goroutine X:
ns.Collect
ns.Lock
Collector.Collect
Collector.RLock
Goroutine Y:
Collector.Add
Collector.Lock
ns.Lock
We should use ns.Lock without Collector.Lock in Add.
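The lock ordering above can be sketched as follows. This is a minimal illustration, not the actual containerd code: `namespace` and `collector` are hypothetical stand-ins, and the point is only that Add releases the collector lock before taking the namespace lock, so the two are never nested in the opposite order to Collect.

```go
package main

import (
	"fmt"
	"sync"
)

// namespace is a hypothetical stand-in for the metrics namespace ("ns").
type namespace struct {
	mu       sync.Mutex
	metrics  []string
	callback func()
}

// Add registers a metric under the namespace lock.
func (n *namespace) Add(name string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.metrics = append(n.metrics, name)
}

// Collect invokes the callback while holding the namespace lock.
func (n *namespace) Collect() {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.callback != nil {
		n.callback()
	}
}

// collector is a hypothetical stand-in for Collector.
type collector struct {
	mu    sync.RWMutex
	ns    *namespace
	tasks map[string]struct{}
}

func newCollector(ns *namespace) *collector {
	c := &collector{ns: ns, tasks: map[string]struct{}{}}
	ns.callback = c.collect
	return c
}

// collect runs under ns.mu (via ns.Collect) and takes the read lock.
func (c *collector) collect() {
	c.mu.RLock()
	defer c.mu.RUnlock()
	_ = len(c.tasks)
}

// Add updates collector state under c.mu, releases it, and only then
// takes the namespace lock, so the two locks are never nested here.
func (c *collector) Add(id string) {
	c.mu.Lock()
	c.tasks[id] = struct{}{}
	c.mu.Unlock()

	c.ns.Add(id)
}

func main() {
	ns := &namespace{}
	c := newCollector(ns)

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // Goroutine X: periodic collection.
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			ns.Collect()
		}
	}()
	go func() { // Goroutine Y: tasks being added.
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			c.Add(fmt.Sprintf("task-%d", i))
		}
	}()
	wg.Wait()
	fmt.Println("metrics registered:", len(ns.metrics))
}
```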
Fix: #6772
Signed-off-by: Wei Fu <fuweid89@gmail.com>
How to test (from https://github.com/opencontainers/runc/pull/2352#issuecomment-620834524):
(host)$ sudo swapoff -a
(host)$ sudo ctr run -t --rm --memory-limit $((1024*1024*32)) docker.io/library/alpine:latest foo
(container)$ sh -c 'VAR=$(seq 1 100000000)'
An event `/tasks/oom {"container_id":"foo"}` will be displayed in `ctr events`.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This makes the metrics package more extensible by allowing the default name of
`container_id` to be changed by the package caller.
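One way such an override could look is a package-level variable that the label-building code reads. The sketch below is an assumption about the shape of the change, not the package's confirmed API; `IDName` and `labelNames` are illustrative names.

```go
package main

import "fmt"

// IDName is the label name used for the ID of the collected object.
// Hypothetical: callers may override it before registering metrics.
var IDName = "container_id"

// labelNames returns the label set for a metric (hypothetical helper).
func labelNames(extra ...string) []string {
	return append([]string{IDName, "namespace"}, extra...)
}

func main() {
	fmt.Println(labelNames())

	// A caller that tracks something other than containers renames the label.
	IDName = "sandbox_id"
	fmt.Println(labelNames("device"))
}
```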
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
* only shim v2 runc v2 ("io.containerd.runc.v2") is supported
* only PID metrics are implemented. Others should be implemented in separate PRs.
* lots of code duplication between v1 metrics and v2 metrics. Deduplication should be a separate PR.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This will help to decouple the import in CRI from the cgroups package
directly by importing the type alias in containerd repo.
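For readers unfamiliar with the mechanism, a Go type alias declares a second name for the same type rather than a new type. The sketch below uses a hypothetical `upstreamStats` type defined in the same file to stand in for a type from an external package.

```go
package main

import "fmt"

// upstreamStats stands in for a type defined in an external package
// (hypothetical; in practice it would live in another module).
type upstreamStats struct {
	PidsCurrent uint64
}

// Stats is a type alias: it is the same type as upstreamStats, not a new
// one, so values are interchangeable without conversion.
type Stats = upstreamStats

// current accepts the upstream type directly.
func current(s *upstreamStats) uint64 { return s.PidsCurrent }

func main() {
	// Callers only name the alias and never import the upstream package.
	s := &Stats{PidsCurrent: 4}
	fmt.Println(current(s))
}
```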
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Since Go 1.7, context is a standard package, superseding the
"x/net/context". Since Go 1.9, the latter only provides a few type
aliases from the former. Therefore, it makes sense to switch to the
standard package.
This commit was generated by the following script (with a couple of
minor fixups to remove extra changes done by goimports):
#!/bin/bash
if [ $# -ge 1 ]; then
FILES=$*
else
FILES=$(git ls-files \*.go | grep -vF ".pb.go" | grep -v ^vendor/)
fi
for f in $FILES; do
printf .
sed -i -e 's|"golang.org/x/net/context"$|"context"|' $f
goimports -w $f
awk ' /^$/ {e=1; next;}
/[[:space:]]"context"$/ {e=0;}
{if (e) {print ""; e=0}; print;}' < $f > $f.new && \
mv $f.new $f
goimports -w $f
done
echo
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This linter checks for unnecessary type conversions.
Some conversions are whitelisted because their type is different
on 32-bit platforms.
Signed-off-by: Daniel Nephin <dnephin@gmail.com>
To avoid importing all of grpc when consuming events, the types of
events have been split into a separate package. This should allow a
reduction in memory usage in cases where a package is consuming events
but not using the gRPC service directly.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This commit adds a collection step in the Stop() task handler which will
retrieve the metrics available for this container at that time, and
store them until the next prometheus Collect() cycle.
This allows short-lived containers to be visible in prometheus, which
would otherwise be ignored (for example, running containerd-stress would
show something like 2 or 3 containers in the end, while now we can see
all of them). It also allows for more accurate collection when
long-running containers end (for example CPU usage could spike in the
last few seconds).
A simple case illustrating this with cpu usage would be:
ctr run -t --rm docker.io/library/alpine:latest mycontainer sh -c 'yes > /dev/null & sleep 3 && pkill yes'
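The idea of keeping a final snapshot until the next collection cycle can be sketched like this. All types here (`sample`, `task`, `collector`) are hypothetical simplifications and do not mirror the real prometheus collector.

```go
package main

import (
	"fmt"
	"sync"
)

// sample is a hypothetical metric snapshot for one task.
type sample struct {
	ID       string
	CPUUsage uint64
}

// task is a hypothetical source of metrics.
type task struct {
	id  string
	cpu uint64
}

type collector struct {
	mu      sync.Mutex
	tasks   map[string]*task
	stopped []sample // final snapshots of tasks stopped since the last Collect
}

func newCollector() *collector {
	return &collector{tasks: map[string]*task{}}
}

// Add starts tracking a running task.
func (c *collector) Add(t *task) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tasks[t.id] = t
}

// Stop takes a last snapshot of the task before forgetting it.
func (c *collector) Stop(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	t, ok := c.tasks[id]
	if !ok {
		return
	}
	c.stopped = append(c.stopped, sample{ID: t.id, CPUUsage: t.cpu})
	delete(c.tasks, id)
}

// Collect returns samples for running tasks plus any stored final
// snapshots, then clears the stored snapshots.
func (c *collector) Collect() []sample {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := c.stopped
	c.stopped = nil
	for _, t := range c.tasks {
		out = append(out, sample{ID: t.id, CPUUsage: t.cpu})
	}
	return out
}

func main() {
	c := newCollector()
	c.Add(&task{id: "mycontainer", cpu: 3000})
	c.Stop("mycontainer") // The task ends before any Collect cycle runs.

	fmt.Println(len(c.Collect())) // The final snapshot is still reported once.
	fmt.Println(len(c.Collect())) // Later cycles no longer see the task.
}
```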
Signed-off-by: Mathieu Pasquet <mathieu.pasquet@alterway.fr>
This adds an option for the cgroups monitor to include container metrics
in the prometheus output. We will have to use the plugin to emit oom
events via the events service, but when the `no_prom` setting is set for
the plugin, container metrics will not be included in the prom output.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This converts the oom metric to be a const metric so that deleted tasks
do not fill up the metric labels.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This reverts commit 06dc87ae59.
Revert "Change oom metric to const"
This reverts commit e800f08f9f.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This removes the metric vec that was holding onto all task id and
namespace combinations forever, until containerd was restarted. This
was causing a memory leak with many tasks.
This also removes the shim cmd, whose `Args` is quite large, from the
reaper after the shim has been started, cutting down on another leak.
This is the first pass through the reaper but more code is required to
fix all the issues when commands are added.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
In the course of setting out to add filters and address some cleanup, it
was found that we had a few problems in the events subsystem that needed
addressing before moving forward.
The biggest change was to move to the more standard terminology of
publish and subscribe. We make this terminology change across the Go
interface and the gRPC API, making the behavior more familiar. The
previous system was very context-oriented, which is no longer required.
With this, we've removed a large amount of dead and unneeded code. Event
transactions, context storage and the concept of `Poster` are gone. This
has been replaced in most places with a `Publisher`, which matches the
actual usage throughout the codebase, removing the need for helpers.
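A minimal in-memory sketch of the publish/subscribe shape described above. The `Event`, `Publisher`, and `Exchange` definitions here are hypothetical simplifications and do not reproduce the real interfaces; the point is that producers depend only on a narrow `Publisher`.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical envelope carrying a topic and a payload.
type Event struct {
	Topic   string
	Payload string
}

// Publisher is the only capability most event producers need.
type Publisher interface {
	Publish(topic, payload string)
}

// Exchange is a minimal in-memory publish/subscribe hub.
type Exchange struct {
	mu   sync.Mutex
	subs []chan Event
}

// Subscribe returns a buffered channel that receives subsequent events.
func (e *Exchange) Subscribe(buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	e.mu.Lock()
	e.subs = append(e.subs, ch)
	e.mu.Unlock()
	return ch
}

// Publish delivers the event to every subscriber, dropping it for
// subscribers whose buffer is full rather than blocking the producer.
func (e *Exchange) Publish(topic, payload string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	ev := Event{Topic: topic, Payload: payload}
	for _, ch := range e.subs {
		select {
		case ch <- ev:
		default:
		}
	}
}

// emitOOM depends only on Publisher, not on the whole exchange.
func emitOOM(p Publisher, id string) {
	p.Publish("/tasks/oom", id)
}

func main() {
	ex := &Exchange{}
	ch := ex.Subscribe(1)
	emitOOM(ex, "foo")
	ev := <-ch
	fmt.Println(ev.Topic, ev.Payload)
}
```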
There are still some questions around the way events are handled in the
shim. Right now, we've preserved some of the existing bugs which may
require more extensive changes to resolve correctly.
Signed-off-by: Stephen J Day <stephen.day@docker.com>