Moves the sandbox store plugin under the plugins packages and adds a
unique plugin type for other plugins to depend on it.
Updates the sandbox controller plugin to depend on the sandbox store
plugin.
Signed-off-by: Derek McGowan <derek@mcg.dev>
Due to when we were updating the pod sandboxes underlying container
object, the pointer to the sandbox would have the right info, but
the on-disk representation of the data was behind. This would cause
the data returned from loading any sandboxes after a restart to have
no CNI result or IP information for the pod.
This change does an additional update to the on-disk container info
right after we invoke the CNI plugin so the metadata for the CNI result
and other networking information is properly flushed to disk.
Signed-off-by: Danny Canter <danny@dcantah.dev>
OCI runtime spec defines memory.swap as 'limit of memory+Swap usage'
so setting them to equal should disable the swap. Also, this change
should make containerd behaviour same as other runtimes e.g
'cri-dockerd/dockershim' and won't be impacted when user turn on
'NodeSwap' (https://github.com/kubernetes/enhancements/issues/2400) feature.
Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
golang.org/x/net contains a fix for CVE-2022-41717, which was addressed
in stdlib in go1.19.4 and go1.18.9;
> net/http: limit canonical header cache by bytes, not entries
>
> An attacker can cause excessive memory growth in a Go server accepting
> HTTP/2 requests.
>
> HTTP/2 server connections contain a cache of HTTP header keys sent by
> the client. While the total number of entries in this cache is capped,
> an attacker sending very large keys can cause the server to allocate
> approximately 64 MiB per open connection.
>
> This issue is also fixed in golang.org/x/net/http2 v0.4.0,
> for users manually configuring HTTP/2.
full diff: https://github.com/golang/net/compare/c63010009c80...v0.4.0
other dependency updates (due to (circular) dependencies between them):
- golang.org/x/sys v0.3.0: https://github.com/golang/sys/compare/v0.2.0...v0.3.0
- golang.org/x/term v0.3.0: https://github.com/golang/term/compare/v0.1.0...v0.3.0
- golang.org/x/text v0.5.0: https://github.com/golang/text/compare/v0.4.0...v0.5.0
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add a common NRI 'service' plugin. It takes care of relaying
requests and respones to and from NRI (external NRI plugins)
and the high-level containerd namespace-independent logic of
applying NRI container adjustments and updates to actual CRI
and other containers.
The namespace-dependent details of the necessary container
manipulation operations are to be implemented by namespace-
specific adaptations. This NRI plugin defines the API which
such adaptations need to implement.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
This was updated in 470d3ee057, but we
only needed the ebpf update. As nothing depends on this module anymore,
other than for the stats package (which didn't change in between), we
can (for now) roll it back to v1.0.4, and just force the newer ebpf
package.
Things rolled back (doesn't affect vendored code);
https://github.com/containerd/cgroups/compare/7083cd60b721..v1.0.4
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
go.mod doesn't always do a great job on keeping the dependencies grouped in the
right block; 2b60770c4b added an extra "require"
block, after which things went downward.
This patch is grouping them back in the right block to nudge it in the right
direction.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:
```go
func BenchmarkSplit(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_ = strings.SplitN(s, "=", 2)[0]
}
}
}
func BenchmarkCut(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_, _, _ = strings.Cut(s, "=")
}
}
}
```
BenchmarkSplit
BenchmarkSplit-10 8244206 128.0 ns/op 128 B/op 4 allocs/op
BenchmarkCut
BenchmarkCut-10 54411998 21.80 ns/op 0 B/op 0 allocs/op
While looking at occurrences of `strings.Split()`, I also updated some for alternatives,
or added some constraints; for cases where an specific number of items is expected, I used `strings.SplitN()`
with a suitable limit. This prevents (theoretical) unlimited splits.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Swagat Bora <sbora@amazon.com>
Add spans around image unpack operations
Use image.ref to denote image name and image.id for the image config digest
Add top-level spand and record errors in the CRI instrumentation service
Remove nolint-comments that weren't hit by linters, and remove the "structcheck"
and "varcheck" linters, as they have been deprecated:
WARN [runner] The linter 'structcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'varcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [linters context] structcheck is disabled because of generics. You can track the evolution of the generics support by following the https://github.com/golangci/golangci-lint/issues/2649.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- fix "nolint" comments to be in the correct format (`//nolint:<linters>[,<linter>`
no leading space, required colon (`:`) and linters.
- remove "nolint" comments for errcheck, which is disabled in our config.
- remove "nolint" comments that were no longer needed (nolintlint).
- where known, add a comment describing why a "nolint" was applied.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
ArgsEscaped has now been merged into upstream OCI image spec.
This change removes the workaround we were doing in containerd
to deserialize the extra json outside of the spec and instead
just uses the formal spec types.
Signed-off-by: Justin Terry <jlterry@amazon.com>
Some minor improvements, but biggest for here is ErrPipeListenerClosed
is no longer an errors.New where the string matches the text of the now
exported net.ErrClosed in the stdlib, but is just assigned to net.ErrClosed
directly. This should allow us to get rid of the string check for "use of closed
network connection" here now..
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
For some shims (namely github.com/cpuguy83/containerd-shim-systemd-v1),
the shim cgroup test doesn't make sense since there is only a single
shim process for the entire node.
I use these integration tests to make sure the shim is compatible with
the runc shims and generally works as expected. This will let me skip
the shim cgroup test as there is no process for the shim to stick into
the cgroup... mostly.
There is a bootstrap process as well as a PTY copier proces which do use
the shim cgroup if provided, but the test is not able to check for
those (unless we enable tty on the test, which is a bit arbitrary and
not useful).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
github.com/AdaLogics/go-fuzz-headers and
github.com/AdamKorcz/go-118-fuzz-build have less dependencies in
the last versions.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
This comment was added in 09a0c9471b when the
Windows integration tests were enabled. The PR (microsoft/hcsshim#931) was
merged, and part of hcsshim v0.9.0, and support for resource limits on Windows
was added in 2bc77b8a28, so it looks like this
comment is no longer current.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
ForceRemoveAll was only used in tests/fuzzing, but added hcsshim as dependency
for the sys package. Moving this to integration/client makes the "sys" package
slightly more lightweight, and may help simplifying dependency-management.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These functions were originally copied from the docker / moby repository in
4a7a8efc2d. Migrating these functions to use the
github.com/moby/sys/sequential module allows them being shared between moby,
docker/cli, and containerd, and to allow using them without importing all of sys
which also depends on hcsshim and more.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Previously in the Windows shim, killing a task that has already exited
or a task that has not yet been started, yielded an ErrNotFound. We now
return nil, which is in line with how the linux runtime behaves, so remove
the special case we had in TestContainerdRestart for this.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
This contains quite a bit (also bumps google/uuid to 1.3.0). Some HostProcess
container improvements to get ready for whenever it goes to stable in
Kubernetes, Hyper-V (windows) container support for CRI, and a plethora of
other small additions and fixes.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
Add a test to the CRI suite to validate stats functions for hostprocess
containers. hcsshim v0.9.3 had a bug in stats collection so this is
mainly for sanity and to avoid another regression.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
This tag contains some fixes for hostprocess containers, mainly around
fixing task stats which regressed from a change in v0.9.3.
https://github.com/microsoft/hcsshim/releases/tag/v0.9.4
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
The ghcr.io/containerd/registry:2.7 image does not support the ppc64le
architecture, causing the TestCRIImagePullTimeout tests to fail when
executed on a ppc64le device.
Replace the ghcr.io/containerd/registry:2.7 image name and index with
the ghcr.io/containerd/volume-ownership:2.1 image and index in the
HoldingContentOpenWriter test.
Change the image pulled in the NoDataTransferred test to the
ghcr.io/containerd/volume-ownership:2.1 image.
Signed-off-by: James Jenkins <James.Jenkins@ibm.com>
This tag brings in a new field to fix an HNS issue in ws2019 as well as
an optimization for collecting Windows stats (memory, cpu, iops).
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
`ioutil` has been deprecated by golang. All the code in `ioutil` just
forwards functionality to code in either the `io` or `os` packages.
See https://github.com/golang/go/pull/51961 for more info.
Signed-off-by: Jeff Widman <jeff@jeffwidman.com>
Introduce cni-bridge-fp as CNI bridge plugin wrapper binary for CRI
testing.
With CNI `io.kubernetes.cri.pod-annotations` capability enabled, the user
can inject the failpoint setting by pod's annotation
`cniFailpointControlStateDir`, which stores each pod's failpoint setting
named by `${K8S_POD_NAMESPACE}-${K8S_POD_NAME}.json`.
When the plugin is invoked, the plugin will check the CNI_ARGS to get
the failpoint for the CNI_COMMAND from disk. For the testing, the user
can prepare setting before RunPodSandbox.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Added new runc shim binary in integration testing.
The shim is named by io.containerd.runc-fp.v1, which allows us to use
additional OCI annotation `io.containerd.runtime.v2.shim.failpoint.*` to
setup shim task API's failpoint. Since the shim can be shared with
multiple container, like what kubernetes pod does, the failpoint will be
initialized during setup the shim server. So, the following the
container's OCI failpoint's annotation will not work.
This commit also updates the ctr tool that we can use `--annotation` to
specify annotations when run container. For example:
```bash
➜ ctr run -d --runtime runc-fp.v1 \
--annotation "io.containerd.runtime.v2.shim.failpoint.Kill=1*error(sorry)" \
docker.io/library/alpine:latest testing sleep 1d
➜ ctr t ls
TASK PID STATUS
testing 147304 RUNNING
➜ ctr t kill -s SIGKILL testing
ctr: sorry: unknown
➜ ctr t kill -s SIGKILL testing
➜ sudo ctr t ls
TASK PID STATUS
testing 147304 STOPPED
```
The runc-fp.v1 shim is based on core runc.v2. We can use it to inject
failpoint during testing complicated or big transcation API, like
kubernetes PodRunPodsandbox.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This version contains the CRI changes for user namespaces support.
Future patches will use the new fields in the CRI.
Updating the module without using the new fields doesn't cause any
behaviour change.
Updates: #7063
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This change does a couple things to remove some cruft/unused functionality
in the Windows snapshotter, as well as add a way to specify the rootfs
size in bytes for a Windows container via a new field added in the CRI api in
k8s 1.24. Setting the rootfs/scratch volume size was assumed to be working
prior to this but turns out not to be the case.
Previously I'd added a change to pass any annotations in the containerd
snapshot form (containerd.io/snapshot/*) as labels for the containers
rootfs snapshot. This was added as a means for a client to be able to provide
containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb as an
annotation and have that be translated to a label and ultimately set the
size for the scratch volume in Windows. However, this actually only worked if
interfacing with the CRI api directly (crictl) as Kubernetes itself will
fail to validate annotations that if split by "/" end up with > 2 parts,
which the snapshot labels will (containerd.io / snapshot / foobarbaz).
With this in mind, passing the annotations and filtering to
containerd.io/snapshot/* is moot, so I've removed this code in favor of
a new `snapshotterOpts()` function that will return platform specific
snapshotter options if ones exist. Now on Windows we can just check if
RootfsSizeInBytes is set on the WindowsContainerResources struct and
then return a snapshotter option that sets the right label.
So all in all this change:
- Gets rid of code to pass CRI annotations as labels down to snapshotters.
- Gets rid of the functionality to create a 1GB sized scratch disk if
the client provided a size < 20GB. This code is not used currently and
has a few logical shortcomings as it won't be able to create the disk
if a container is already running and using the same base layer. WCIFS
(driver that handles the unioning of windows container layers together)
holds open handles to some files that we need to delete to create the
1GB scratch disk is the underlying problem.
- Deprecates the containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb
label in favor of a new containerd.io/snapshot/windows/rootfs.sizebytes label.
The previous label/annotation wasn't being used by us, and from a cursory
github search wasn't being used by anyone else either. Now that there is a CRI
field to specify the size, this should just be a field that users can set
on their pod specs and don't need to concern themselves with what it eventually
gets translated to, but non-CRI clients can still use the new label/deprecated
label as usual.
- Add test to cri integration suite to validate expanding the rootfs size.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>