release notes: https://github.com/opencontainers/runc/releases/tag/v1.1.7
full diff: https://github.com/opencontainers/runc/compare/v1.1.6...v1.1.7
This is the seventh patch release in the 1.1.z release of runc, and is
the last planned release of the 1.1.z series. It contains a fix for
cgroup device rules with systemd when handling device rules for devices
that don't exist (though for devices whose drivers don't correctly
register themselves in the kernel -- such as the NVIDIA devices -- the
full fix only works with systemd v240+).
- When used with systemd v240+, systemd cgroup drivers no longer skip
DeviceAllow rules if the device does not exist (a regression introduced
in runc 1.1.3). This fix also reverts the workaround added in runc 1.1.5,
removing an extra warning emitted by runc run/start.
- The source code now has a new file, runc.keyring, which contains the keys
used to sign runc releases.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
release notes: https://github.com/opencontainers/runc/releases/tag/v1.1.6
full diff: opencontainers/runc@v1.1.5...v1.1.6
This is the sixth patch release in the 1.1.z series of runc, which fixes
a series of cgroup-related issues.
Note that this release can no longer be built from sources using Go
1.16. Using a latest maintained Go 1.20.x or Go 1.19.x release is
recommended. Go 1.17 can still be used.
- systemd cgroup v1 and v2 drivers were deliberately ignoring UnitExist error
from systemd while trying to create a systemd unit, which in some scenarios
may result in a container not being added to the proper systemd unit and
cgroup.
- systemd cgroup v2 driver was incorrectly translating cpuset range from spec's
resources.cpu.cpus to systemd unit property (AllowedCPUs) in case of more
than 8 CPUs, resulting in the wrong AllowedCPUs setting.
- systemd cgroup v1 driver was prefixing container's cgroup path with the path
of PID 1 cgroup, resulting in inability to place PID 1 in a non-root cgroup.
- runc run/start may return "permission denied" error when starting a rootless
container when the file to be executed does not have executable bit set for
the user, not taking the CAP_DAC_OVERRIDE capability into account. This is
a regression in runc 1.1.4, as well as in Go 1.20 and 1.20.1
- cgroup v1 drivers are now aware of misc controller.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
We will use this in future commits to see if the kubelet requested idmap
mounts for volumes, that we don't yet support.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
- Rename test name
- Add a tag to the container image used in the tests instead of the latest tag
- Add a 5 second delay between container start and stop to ensure that the
container is fully initialized
Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
Update dependencies and remove the local bindfilter files. Those have
been moved to go-winio.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Add a public const for "containerd.io/distribution.source" in `labels`
package and replace hardcoded usages.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
This commit adds supports for the ArgsEscaped
value for the image got from the dockerfile.
It is used to evaluate and process the image
entrypoint/cmd and container entrypoint/cmd
options got from the podspec.
Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
By default, the child processes spawned by exec process will inherit standard
io file descriptors. The shim server creates a pipe as data channel. Both exec
process and its children write data into the write end of the pipe. And the
shim server will read data from the pipe. If the write end is still open, the
shim server will continue to wait for data from pipe.
So, if the exec command is like `bash -c "sleep 365d &"`, the exec process is
bash and quit after create `sleep 365d`. But the `sleep 365d` will hold the
write end of the pipe for a year! It doesn't make senses that CRI plugin
should wait for it.
For this case, we should use timeout to drain exec process's io instead of
waiting for it.
Fixes: #7802
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Update NRI plugin configuration to match that of NRI. Remove
option for the eliminated NRI configuration file. Add option
to disable connections from externally launched plugins. Add
options to override default plugin registration and request
timeouts.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Point NRI dependency to latest HEAD, commit b3cabdec0657. That
pulls in the necessary NRI fix for a recently discovered panic
and crash.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Create an in-memory exporter and global tracer provider
Pull image with client which should create spans
Validate spans in the exporter
Signed-off-by: Tony Fang <nhfang@amazon.com>
PR #7892 which supposed to fix issue on Linux introduced random failure
on Windows, this commit is to revert that change for Windows platform
Signed-off-by: Tony Fang <nenghui.fang@gmail.com>
The name of the GID 65534 differs across distros.
("nogroup" on Debian derivatives, "nobody" on Red Hat derivatives)
Fix the following test failure:
```
=== RUN TestVolumeOwnership
volume_copy_up_test.go:103: Create a sandbox
main_test.go:667: Pull test image "ghcr.io/containerd/volume-ownership:2.1"
volume_copy_up_test.go:108: Create a container with volume-ownership test image
volume_copy_up_test.go:117: Start the container
volume_copy_up_test.go:125: Check ownership of test directory inside container
volume_copy_up_test.go:146: Check ownership of test directory on the host
volume_copy_up_test.go:153:
Error Trace: /root/go/src/github.com/containerd/containerd/volume_copy_up_test.go:153
Error: Not equal:
expected: "nobody:nogroup\n"
actual : "nobody:nobody\n"
Diff:
--- Expected
+++ Actual
@@ -1,2 +1,2 @@
-nobody:nogroup
+nobody:nobody
Test: TestVolumeOwnership
--- FAIL: TestVolumeOwnership (3.45s)
```
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Moves the sandbox store plugin under the plugins packages and adds a
unique plugin type for other plugins to depend on it.
Updates the sandbox controller plugin to depend on the sandbox store
plugin.
Signed-off-by: Derek McGowan <derek@mcg.dev>
Due to when we were updating the pod sandboxes underlying container
object, the pointer to the sandbox would have the right info, but
the on-disk representation of the data was behind. This would cause
the data returned from loading any sandboxes after a restart to have
no CNI result or IP information for the pod.
This change does an additional update to the on-disk container info
right after we invoke the CNI plugin so the metadata for the CNI result
and other networking information is properly flushed to disk.
Signed-off-by: Danny Canter <danny@dcantah.dev>
OCI runtime spec defines memory.swap as 'limit of memory+Swap usage'
so setting them to equal should disable the swap. Also, this change
should make containerd behaviour same as other runtimes e.g
'cri-dockerd/dockershim' and won't be impacted when user turn on
'NodeSwap' (https://github.com/kubernetes/enhancements/issues/2400) feature.
Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
golang.org/x/net contains a fix for CVE-2022-41717, which was addressed
in stdlib in go1.19.4 and go1.18.9;
> net/http: limit canonical header cache by bytes, not entries
>
> An attacker can cause excessive memory growth in a Go server accepting
> HTTP/2 requests.
>
> HTTP/2 server connections contain a cache of HTTP header keys sent by
> the client. While the total number of entries in this cache is capped,
> an attacker sending very large keys can cause the server to allocate
> approximately 64 MiB per open connection.
>
> This issue is also fixed in golang.org/x/net/http2 v0.4.0,
> for users manually configuring HTTP/2.
full diff: https://github.com/golang/net/compare/c63010009c80...v0.4.0
other dependency updates (due to (circular) dependencies between them):
- golang.org/x/sys v0.3.0: https://github.com/golang/sys/compare/v0.2.0...v0.3.0
- golang.org/x/term v0.3.0: https://github.com/golang/term/compare/v0.1.0...v0.3.0
- golang.org/x/text v0.5.0: https://github.com/golang/text/compare/v0.4.0...v0.5.0
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add a common NRI 'service' plugin. It takes care of relaying
requests and respones to and from NRI (external NRI plugins)
and the high-level containerd namespace-independent logic of
applying NRI container adjustments and updates to actual CRI
and other containers.
The namespace-dependent details of the necessary container
manipulation operations are to be implemented by namespace-
specific adaptations. This NRI plugin defines the API which
such adaptations need to implement.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
This was updated in 470d3ee057, but we
only needed the ebpf update. As nothing depends on this module anymore,
other than for the stats package (which didn't change in between), we
can (for now) roll it back to v1.0.4, and just force the newer ebpf
package.
Things rolled back (doesn't affect vendored code);
https://github.com/containerd/cgroups/compare/7083cd60b721..v1.0.4
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
go.mod doesn't always do a great job on keeping the dependencies grouped in the
right block; 2b60770c4b added an extra "require"
block, after which things went downward.
This patch is grouping them back in the right block to nudge it in the right
direction.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:
```go
func BenchmarkSplit(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_ = strings.SplitN(s, "=", 2)[0]
}
}
}
func BenchmarkCut(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_, _, _ = strings.Cut(s, "=")
}
}
}
```
BenchmarkSplit
BenchmarkSplit-10 8244206 128.0 ns/op 128 B/op 4 allocs/op
BenchmarkCut
BenchmarkCut-10 54411998 21.80 ns/op 0 B/op 0 allocs/op
While looking at occurrences of `strings.Split()`, I also updated some for alternatives,
or added some constraints; for cases where an specific number of items is expected, I used `strings.SplitN()`
with a suitable limit. This prevents (theoretical) unlimited splits.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Swagat Bora <sbora@amazon.com>
Add spans around image unpack operations
Use image.ref to denote image name and image.id for the image config digest
Add top-level spand and record errors in the CRI instrumentation service
Remove nolint-comments that weren't hit by linters, and remove the "structcheck"
and "varcheck" linters, as they have been deprecated:
WARN [runner] The linter 'structcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'varcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [linters context] structcheck is disabled because of generics. You can track the evolution of the generics support by following the https://github.com/golangci/golangci-lint/issues/2649.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- fix "nolint" comments to be in the correct format (`//nolint:<linters>[,<linter>`
no leading space, required colon (`:`) and linters.
- remove "nolint" comments for errcheck, which is disabled in our config.
- remove "nolint" comments that were no longer needed (nolintlint).
- where known, add a comment describing why a "nolint" was applied.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
ArgsEscaped has now been merged into upstream OCI image spec.
This change removes the workaround we were doing in containerd
to deserialize the extra json outside of the spec and instead
just uses the formal spec types.
Signed-off-by: Justin Terry <jlterry@amazon.com>
Some minor improvements, but biggest for here is ErrPipeListenerClosed
is no longer an errors.New where the string matches the text of the now
exported net.ErrClosed in the stdlib, but is just assigned to net.ErrClosed
directly. This should allow us to get rid of the string check for "use of closed
network connection" here now..
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
For some shims (namely github.com/cpuguy83/containerd-shim-systemd-v1),
the shim cgroup test doesn't make sense since there is only a single
shim process for the entire node.
I use these integration tests to make sure the shim is compatible with
the runc shims and generally works as expected. This will let me skip
the shim cgroup test as there is no process for the shim to stick into
the cgroup... mostly.
There is a bootstrap process as well as a PTY copier proces which do use
the shim cgroup if provided, but the test is not able to check for
those (unless we enable tty on the test, which is a bit arbitrary and
not useful).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
github.com/AdaLogics/go-fuzz-headers and
github.com/AdamKorcz/go-118-fuzz-build have less dependencies in
the last versions.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
This comment was added in 09a0c9471b when the
Windows integration tests were enabled. The PR (microsoft/hcsshim#931) was
merged, and part of hcsshim v0.9.0, and support for resource limits on Windows
was added in 2bc77b8a28, so it looks like this
comment is no longer current.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
ForceRemoveAll was only used in tests/fuzzing, but added hcsshim as dependency
for the sys package. Moving this to integration/client makes the "sys" package
slightly more lightweight, and may help simplifying dependency-management.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These functions were originally copied from the docker / moby repository in
4a7a8efc2d. Migrating these functions to use the
github.com/moby/sys/sequential module allows them being shared between moby,
docker/cli, and containerd, and to allow using them without importing all of sys
which also depends on hcsshim and more.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Previously in the Windows shim, killing a task that has already exited
or a task that has not yet been started, yielded an ErrNotFound. We now
return nil, which is in line with how the linux runtime behaves, so remove
the special case we had in TestContainerdRestart for this.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
This contains quite a bit (also bumps google/uuid to 1.3.0). Some HostProcess
container improvements to get ready for whenever it goes to stable in
Kubernetes, Hyper-V (windows) container support for CRI, and a plethora of
other small additions and fixes.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
Add a test to the CRI suite to validate stats functions for hostprocess
containers. hcsshim v0.9.3 had a bug in stats collection so this is
mainly for sanity and to avoid another regression.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
This tag contains some fixes for hostprocess containers, mainly around
fixing task stats which regressed from a change in v0.9.3.
https://github.com/microsoft/hcsshim/releases/tag/v0.9.4
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
The ghcr.io/containerd/registry:2.7 image does not support the ppc64le
architecture, causing the TestCRIImagePullTimeout tests to fail when
executed on a ppc64le device.
Replace the ghcr.io/containerd/registry:2.7 image name and index with
the ghcr.io/containerd/volume-ownership:2.1 image and index in the
HoldingContentOpenWriter test.
Change the image pulled in the NoDataTransferred test to the
ghcr.io/containerd/volume-ownership:2.1 image.
Signed-off-by: James Jenkins <James.Jenkins@ibm.com>
This tag brings in a new field to fix an HNS issue in ws2019 as well as
an optimization for collecting Windows stats (memory, cpu, iops).
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
`ioutil` has been deprecated by golang. All the code in `ioutil` just
forwards functionality to code in either the `io` or `os` packages.
See https://github.com/golang/go/pull/51961 for more info.
Signed-off-by: Jeff Widman <jeff@jeffwidman.com>
Introduce cni-bridge-fp as CNI bridge plugin wrapper binary for CRI
testing.
With CNI `io.kubernetes.cri.pod-annotations` capability enabled, the user
can inject the failpoint setting by pod's annotation
`cniFailpointControlStateDir`, which stores each pod's failpoint setting
named by `${K8S_POD_NAMESPACE}-${K8S_POD_NAME}.json`.
When the plugin is invoked, the plugin will check the CNI_ARGS to get
the failpoint for the CNI_COMMAND from disk. For the testing, the user
can prepare setting before RunPodSandbox.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Added new runc shim binary in integration testing.
The shim is named by io.containerd.runc-fp.v1, which allows us to use
additional OCI annotation `io.containerd.runtime.v2.shim.failpoint.*` to
setup shim task API's failpoint. Since the shim can be shared with
multiple container, like what kubernetes pod does, the failpoint will be
initialized during setup the shim server. So, the following the
container's OCI failpoint's annotation will not work.
This commit also updates the ctr tool that we can use `--annotation` to
specify annotations when run container. For example:
```bash
➜ ctr run -d --runtime runc-fp.v1 \
--annotation "io.containerd.runtime.v2.shim.failpoint.Kill=1*error(sorry)" \
docker.io/library/alpine:latest testing sleep 1d
➜ ctr t ls
TASK PID STATUS
testing 147304 RUNNING
➜ ctr t kill -s SIGKILL testing
ctr: sorry: unknown
➜ ctr t kill -s SIGKILL testing
➜ sudo ctr t ls
TASK PID STATUS
testing 147304 STOPPED
```
The runc-fp.v1 shim is based on core runc.v2. We can use it to inject
failpoint during testing complicated or big transcation API, like
kubernetes PodRunPodsandbox.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This version contains the CRI changes for user namespaces support.
Future patches will use the new fields in the CRI.
Updating the module without using the new fields doesn't cause any
behaviour change.
Updates: #7063
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This change does a couple things to remove some cruft/unused functionality
in the Windows snapshotter, as well as add a way to specify the rootfs
size in bytes for a Windows container via a new field added in the CRI api in
k8s 1.24. Setting the rootfs/scratch volume size was assumed to be working
prior to this but turns out not to be the case.
Previously I'd added a change to pass any annotations in the containerd
snapshot form (containerd.io/snapshot/*) as labels for the containers
rootfs snapshot. This was added as a means for a client to be able to provide
containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb as an
annotation and have that be translated to a label and ultimately set the
size for the scratch volume in Windows. However, this actually only worked if
interfacing with the CRI api directly (crictl) as Kubernetes itself will
fail to validate annotations that if split by "/" end up with > 2 parts,
which the snapshot labels will (containerd.io / snapshot / foobarbaz).
With this in mind, passing the annotations and filtering to
containerd.io/snapshot/* is moot, so I've removed this code in favor of
a new `snapshotterOpts()` function that will return platform specific
snapshotter options if ones exist. Now on Windows we can just check if
RootfsSizeInBytes is set on the WindowsContainerResources struct and
then return a snapshotter option that sets the right label.
So all in all this change:
- Gets rid of code to pass CRI annotations as labels down to snapshotters.
- Gets rid of the functionality to create a 1GB sized scratch disk if
the client provided a size < 20GB. This code is not used currently and
has a few logical shortcomings as it won't be able to create the disk
if a container is already running and using the same base layer. WCIFS
(driver that handles the unioning of windows container layers together)
holds open handles to some files that we need to delete to create the
1GB scratch disk is the underlying problem.
- Deprecates the containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb
label in favor of a new containerd.io/snapshot/windows/rootfs.sizebytes label.
The previous label/annotation wasn't being used by us, and from a cursory
github search wasn't being used by anyone else either. Now that there is a CRI
field to specify the size, this should just be a field that users can set
on their pod specs and don't need to concern themselves with what it eventually
gets translated to, but non-CRI clients can still use the new label/deprecated
label as usual.
- Add test to cri integration suite to validate expanding the rootfs size.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
HostProcess containers require every container in the pod to be a
host process container and have the corresponding field set. The Kubelet
usually enforces this so we'd error before even getting here but we recently
found a bug in this logic so better to be safe than sorry.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
The regression in v1.22.2 has been resolved, so we can drop the
replace rule and use the latest v1.22.x version.
full diff: https://github.com/urfave/cli/compare/v1.22.1...v1.22.9
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These tests are launching containerd and pulling busybox there, while
other tests are using busybox from TestMain().
This commit shares busybox at least between TestRestartMonitor and
TestRestartMonitorWithOnFailurePolicy to reduce the chance of
throttling from ghcr.io.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
This test tends to fail under Cirrus CI + Vagrant. Skipping for now
since running the test on GitHub Actions would be suffice.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
Kubelet sends the PullImage request without timeout, because the image size
is unknown and timeout is hard to defined. The pulling request might run
into 0B/s speed, if containerd can't receive any packet in that connection.
For this case, the containerd should cancel the PullImage request.
Although containerd provides ingester manager to track the progress of pulling
request, for example `ctr image pull` shows the console progress bar, it needs
more CPU resources to open/read the ingested files to get status.
In order to support progress timeout feature with lower overhead, this
patch uses http.RoundTripper wrapper to track active progress. That
wrapper will increase active-request number and return the
countingReadCloser wrapper for http.Response.Body. Each bytes-read
can be count and the active-request number will be descreased when the
countingReadCloser wrapper has been closed. For the progress tracker,
it can check the active-request number and bytes-read at intervals. If
there is no any progress, the progress tracker should cancel the
request.
NOTE: For each blob data, the containerd will make sure that the content
writer is opened before sending http request to the registry. Therefore, the
progress reporter can rely on the active-request number.
fixed: #4984
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Previouslty "Size" was reserved by protoc-gen-gogoctrd and user-generated
"Size" was automatically renamed to "Size_" to avoid conflicts.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
This commit hides types.Any from the diff package's interface. Clients
(incl. imgcrypt) shouldn't aware about gogo/protobuf.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
This commit upgrades github.com/containerd/typeurl to use typeurl.Any.
The interface hides gogo/protobuf/types.Any from containerd's Go client.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>