This brings in a ton of great improvements, most notably for the containerd
daemon is performance improvements for cgroups1 and 2 for gathering stats,
as well as some fixes for enabling controllers and deleting v1 cgroups.
Signed-off-by: Danny Canter <danny@dcantah.dev>
All of the tests using this didn't need stdin/err (one of them not even
stdout), so we can just leave them "empty" and change to a withStdout
naming to make it more obvious.
Signed-off-by: Danny Canter <danny@dcantah.dev>
I think NullIO is fine on Windows now. We have it as an option in ctr
and it's used for the pod sandbox container in CRI. Lets see if CI agrees..
Signed-off-by: Danny Canter <danny@dcantah.dev>
Microsoft announced the removal of nondistributable layers from their
images today. This makes the convert test fail since it assumes the
first layer is nondistributable on Windows during the test.
Signed-off-by: Phil Estes <estesp@amazon.com>
This change adds support for CDI devices to the ctr --device flag.
If a fully-qualified CDI device name is specified, this is injected
into the OCI specification before creating the container.
Note that the CDI specifications and the devices that they represent
are local and mirror the behaviour of linux devices in the ctr command.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Add new test cases for volumes on both Linux and Windows. These new
volumes will be used to test that we don't accidentally mangle volume
paths on Linux and that non-C volume mounts work properly when defined
in an image on Windows.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
release notes: https://github.com/opencontainers/runc/releases/tag/v1.1.7
full diff: https://github.com/opencontainers/runc/compare/v1.1.6...v1.1.7
This is the seventh patch release in the 1.1.z release of runc, and is
the last planned release of the 1.1.z series. It contains a fix for
cgroup device rules with systemd when handling device rules for devices
that don't exist (though for devices whose drivers don't correctly
register themselves in the kernel -- such as the NVIDIA devices -- the
full fix only works with systemd v240+).
- When used with systemd v240+, systemd cgroup drivers no longer skip
DeviceAllow rules if the device does not exist (a regression introduced
in runc 1.1.3). This fix also reverts the workaround added in runc 1.1.5,
removing an extra warning emitted by runc run/start.
- The source code now has a new file, runc.keyring, which contains the keys
used to sign runc releases.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
release notes: https://github.com/opencontainers/runc/releases/tag/v1.1.6
full diff: opencontainers/runc@v1.1.5...v1.1.6
This is the sixth patch release in the 1.1.z series of runc, which fixes
a series of cgroup-related issues.
Note that this release can no longer be built from sources using Go
1.16. Using a latest maintained Go 1.20.x or Go 1.19.x release is
recommended. Go 1.17 can still be used.
- systemd cgroup v1 and v2 drivers were deliberately ignoring UnitExist error
from systemd while trying to create a systemd unit, which in some scenarios
may result in a container not being added to the proper systemd unit and
cgroup.
- systemd cgroup v2 driver was incorrectly translating cpuset range from spec's
resources.cpu.cpus to systemd unit property (AllowedCPUs) in case of more
than 8 CPUs, resulting in the wrong AllowedCPUs setting.
- systemd cgroup v1 driver was prefixing container's cgroup path with the path
of PID 1 cgroup, resulting in inability to place PID 1 in a non-root cgroup.
- runc run/start may return "permission denied" error when starting a rootless
container when the file to be executed does not have executable bit set for
the user, not taking the CAP_DAC_OVERRIDE capability into account. This is
a regression in runc 1.1.4, as well as in Go 1.20 and 1.20.1
- cgroup v1 drivers are now aware of misc controller.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
We will use this in future commits to see if the kubelet requested idmap
mounts for volumes, that we don't yet support.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
- Rename test name
- Add a tag to the container image used in the tests instead of the latest tag
- Add a 5 second delay between container start and stop to ensure that the
container is fully initialized
Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
Update dependencies and remove the local bindfilter files. Those have
been moved to go-winio.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Add a public const for "containerd.io/distribution.source" in `labels`
package and replace hardcoded usages.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
This commit adds supports for the ArgsEscaped
value for the image got from the dockerfile.
It is used to evaluate and process the image
entrypoint/cmd and container entrypoint/cmd
options got from the podspec.
Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
By default, the child processes spawned by exec process will inherit standard
io file descriptors. The shim server creates a pipe as data channel. Both exec
process and its children write data into the write end of the pipe. And the
shim server will read data from the pipe. If the write end is still open, the
shim server will continue to wait for data from pipe.
So, if the exec command is like `bash -c "sleep 365d &"`, the exec process is
bash and quit after create `sleep 365d`. But the `sleep 365d` will hold the
write end of the pipe for a year! It doesn't make senses that CRI plugin
should wait for it.
For this case, we should use timeout to drain exec process's io instead of
waiting for it.
Fixes: #7802
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Update NRI plugin configuration to match that of NRI. Remove
option for the eliminated NRI configuration file. Add option
to disable connections from externally launched plugins. Add
options to override default plugin registration and request
timeouts.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Point NRI dependency to latest HEAD, commit b3cabdec0657. That
pulls in the necessary NRI fix for a recently discovered panic
and crash.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Create an in-memory exporter and global tracer provider
Pull image with client which should create spans
Validate spans in the exporter
Signed-off-by: Tony Fang <nhfang@amazon.com>
PR #7892 which supposed to fix issue on Linux introduced random failure
on Windows, this commit is to revert that change for Windows platform
Signed-off-by: Tony Fang <nenghui.fang@gmail.com>
The name of the GID 65534 differs across distros.
("nogroup" on Debian derivatives, "nobody" on Red Hat derivatives)
Fix the following test failure:
```
=== RUN TestVolumeOwnership
volume_copy_up_test.go:103: Create a sandbox
main_test.go:667: Pull test image "ghcr.io/containerd/volume-ownership:2.1"
volume_copy_up_test.go:108: Create a container with volume-ownership test image
volume_copy_up_test.go:117: Start the container
volume_copy_up_test.go:125: Check ownership of test directory inside container
volume_copy_up_test.go:146: Check ownership of test directory on the host
volume_copy_up_test.go:153:
Error Trace: /root/go/src/github.com/containerd/containerd/volume_copy_up_test.go:153
Error: Not equal:
expected: "nobody:nogroup\n"
actual : "nobody:nobody\n"
Diff:
--- Expected
+++ Actual
@@ -1,2 +1,2 @@
-nobody:nogroup
+nobody:nobody
Test: TestVolumeOwnership
--- FAIL: TestVolumeOwnership (3.45s)
```
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Moves the sandbox store plugin under the plugins packages and adds a
unique plugin type for other plugins to depend on it.
Updates the sandbox controller plugin to depend on the sandbox store
plugin.
Signed-off-by: Derek McGowan <derek@mcg.dev>
Due to when we were updating the pod sandboxes underlying container
object, the pointer to the sandbox would have the right info, but
the on-disk representation of the data was behind. This would cause
the data returned from loading any sandboxes after a restart to have
no CNI result or IP information for the pod.
This change does an additional update to the on-disk container info
right after we invoke the CNI plugin so the metadata for the CNI result
and other networking information is properly flushed to disk.
Signed-off-by: Danny Canter <danny@dcantah.dev>
OCI runtime spec defines memory.swap as 'limit of memory+Swap usage'
so setting them to equal should disable the swap. Also, this change
should make containerd behaviour same as other runtimes e.g
'cri-dockerd/dockershim' and won't be impacted when user turn on
'NodeSwap' (https://github.com/kubernetes/enhancements/issues/2400) feature.
Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
golang.org/x/net contains a fix for CVE-2022-41717, which was addressed
in stdlib in go1.19.4 and go1.18.9;
> net/http: limit canonical header cache by bytes, not entries
>
> An attacker can cause excessive memory growth in a Go server accepting
> HTTP/2 requests.
>
> HTTP/2 server connections contain a cache of HTTP header keys sent by
> the client. While the total number of entries in this cache is capped,
> an attacker sending very large keys can cause the server to allocate
> approximately 64 MiB per open connection.
>
> This issue is also fixed in golang.org/x/net/http2 v0.4.0,
> for users manually configuring HTTP/2.
full diff: https://github.com/golang/net/compare/c63010009c80...v0.4.0
other dependency updates (due to (circular) dependencies between them):
- golang.org/x/sys v0.3.0: https://github.com/golang/sys/compare/v0.2.0...v0.3.0
- golang.org/x/term v0.3.0: https://github.com/golang/term/compare/v0.1.0...v0.3.0
- golang.org/x/text v0.5.0: https://github.com/golang/text/compare/v0.4.0...v0.5.0
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add a common NRI 'service' plugin. It takes care of relaying
requests and respones to and from NRI (external NRI plugins)
and the high-level containerd namespace-independent logic of
applying NRI container adjustments and updates to actual CRI
and other containers.
The namespace-dependent details of the necessary container
manipulation operations are to be implemented by namespace-
specific adaptations. This NRI plugin defines the API which
such adaptations need to implement.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
This was updated in 470d3ee057, but we
only needed the ebpf update. As nothing depends on this module anymore,
other than for the stats package (which didn't change in between), we
can (for now) roll it back to v1.0.4, and just force the newer ebpf
package.
Things rolled back (doesn't affect vendored code);
https://github.com/containerd/cgroups/compare/7083cd60b721..v1.0.4
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
go.mod doesn't always do a great job on keeping the dependencies grouped in the
right block; 2b60770c4b added an extra "require"
block, after which things went downward.
This patch is grouping them back in the right block to nudge it in the right
direction.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:
```go
func BenchmarkSplit(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_ = strings.SplitN(s, "=", 2)[0]
}
}
}
func BenchmarkCut(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_, _, _ = strings.Cut(s, "=")
}
}
}
```
BenchmarkSplit
BenchmarkSplit-10 8244206 128.0 ns/op 128 B/op 4 allocs/op
BenchmarkCut
BenchmarkCut-10 54411998 21.80 ns/op 0 B/op 0 allocs/op
While looking at occurrences of `strings.Split()`, I also updated some for alternatives,
or added some constraints; for cases where an specific number of items is expected, I used `strings.SplitN()`
with a suitable limit. This prevents (theoretical) unlimited splits.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Swagat Bora <sbora@amazon.com>
Add spans around image unpack operations
Use image.ref to denote image name and image.id for the image config digest
Add top-level spand and record errors in the CRI instrumentation service
Remove nolint-comments that weren't hit by linters, and remove the "structcheck"
and "varcheck" linters, as they have been deprecated:
WARN [runner] The linter 'structcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'varcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [linters context] structcheck is disabled because of generics. You can track the evolution of the generics support by following the https://github.com/golangci/golangci-lint/issues/2649.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- fix "nolint" comments to be in the correct format (`//nolint:<linters>[,<linter>`
no leading space, required colon (`:`) and linters.
- remove "nolint" comments for errcheck, which is disabled in our config.
- remove "nolint" comments that were no longer needed (nolintlint).
- where known, add a comment describing why a "nolint" was applied.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
ArgsEscaped has now been merged into upstream OCI image spec.
This change removes the workaround we were doing in containerd
to deserialize the extra json outside of the spec and instead
just uses the formal spec types.
Signed-off-by: Justin Terry <jlterry@amazon.com>
Some minor improvements, but biggest for here is ErrPipeListenerClosed
is no longer an errors.New where the string matches the text of the now
exported net.ErrClosed in the stdlib, but is just assigned to net.ErrClosed
directly. This should allow us to get rid of the string check for "use of closed
network connection" here now..
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
For some shims (namely github.com/cpuguy83/containerd-shim-systemd-v1),
the shim cgroup test doesn't make sense since there is only a single
shim process for the entire node.
I use these integration tests to make sure the shim is compatible with
the runc shims and generally works as expected. This will let me skip
the shim cgroup test as there is no process for the shim to stick into
the cgroup... mostly.
There is a bootstrap process as well as a PTY copier proces which do use
the shim cgroup if provided, but the test is not able to check for
those (unless we enable tty on the test, which is a bit arbitrary and
not useful).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
github.com/AdaLogics/go-fuzz-headers and
github.com/AdamKorcz/go-118-fuzz-build have less dependencies in
the last versions.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
This comment was added in 09a0c9471b when the
Windows integration tests were enabled. The PR (microsoft/hcsshim#931) was
merged, and part of hcsshim v0.9.0, and support for resource limits on Windows
was added in 2bc77b8a28, so it looks like this
comment is no longer current.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>