The containerd-shim creates pipes and passes them to the init container as
stdin, stdout, and stderr for logging purposes. By default, these pipes are
owned by the root user (UID/GID: 0/0). The init container can access them
directly through inheritance.
However, if the init container attempts to open any files pointing to these
pipes (e.g., /proc/1/fd/2, /dev/stderr), it will encounter a permission issue
since it is not the owner. To avoid this, we need to align the ownership of
the pipes with the init process.
Fixes: #10598
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Fun times: In grpc 1.63 grpc.Dial and a few of the options we use (WithBlock) are
deprecated in favor of the no-IO variant NewClient. The uses in the integration tests
should be easy to swap however as they don't use WithBlock anyways, so that's what this
change aims to do. This also removes some context.WithTimeout's as I don't see anywhere
the context is actually used in Dial if you don't also specify WithBlock (and it's
especially not used now with NewClient as it doesn't even take in a context).
Signed-off-by: Danny Canter <danny@dcantah.dev>
This commit addresses issue #7318 by introducing events broadcasting
to the current implementation. The integration/container_event_test.go
is extended to demonstrate the broadcasting capabilities
of two simultaneous connected clients.
Signed-off-by: Yury Gargay <yury.gargay@gmail.com>
This is effectively a revert of 2ac9968401, which
switched from os/exec to the golang.org/x/sys/execabs package to mitigate
security issues (mainly on Windows) with lookups resolving to binaries in the
current directory.
from the go1.19 release notes https://go.dev/doc/go1.19#os-exec-path
> ## PATH lookups
>
> Command and LookPath no longer allow results from a PATH search to be found
> relative to the current directory. This removes a common source of security
> problems but may also break existing programs that depend on using, say,
> exec.Command("prog") to run a binary named prog (or, on Windows, prog.exe) in
> the current directory. See the os/exec package documentation for information
> about how best to update such programs.
>
> On Windows, Command and LookPath now respect the NoDefaultCurrentDirectoryInExePath
> environment variable, making it possible to disable the default implicit search
> of “.” in PATH lookups on Windows systems.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:
```go
func BenchmarkSplit(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_ = strings.SplitN(s, "=", 2)[0]
}
}
}
func BenchmarkCut(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_, _, _ = strings.Cut(s, "=")
}
}
}
```
BenchmarkSplit
BenchmarkSplit-10 8244206 128.0 ns/op 128 B/op 4 allocs/op
BenchmarkCut
BenchmarkCut-10 54411998 21.80 ns/op 0 B/op 0 allocs/op
While looking at occurrences of `strings.Split()`, I also updated some for alternatives,
or added some constraints; for cases where an specific number of items is expected, I used `strings.SplitN()`
with a suitable limit. This prevents (theoretical) unlimited splits.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Remove nolint-comments that weren't hit by linters, and remove the "structcheck"
and "varcheck" linters, as they have been deprecated:
WARN [runner] The linter 'structcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'varcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [linters context] structcheck is disabled because of generics. You can track the evolution of the generics support by following the https://github.com/golangci/golangci-lint/issues/2649.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This change does a couple things to remove some cruft/unused functionality
in the Windows snapshotter, as well as add a way to specify the rootfs
size in bytes for a Windows container via a new field added in the CRI api in
k8s 1.24. Setting the rootfs/scratch volume size was assumed to be working
prior to this but turns out not to be the case.
Previously I'd added a change to pass any annotations in the containerd
snapshot form (containerd.io/snapshot/*) as labels for the containers
rootfs snapshot. This was added as a means for a client to be able to provide
containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb as an
annotation and have that be translated to a label and ultimately set the
size for the scratch volume in Windows. However, this actually only worked if
interfacing with the CRI api directly (crictl) as Kubernetes itself will
fail to validate annotations that if split by "/" end up with > 2 parts,
which the snapshot labels will (containerd.io / snapshot / foobarbaz).
With this in mind, passing the annotations and filtering to
containerd.io/snapshot/* is moot, so I've removed this code in favor of
a new `snapshotterOpts()` function that will return platform specific
snapshotter options if ones exist. Now on Windows we can just check if
RootfsSizeInBytes is set on the WindowsContainerResources struct and
then return a snapshotter option that sets the right label.
So all in all this change:
- Gets rid of code to pass CRI annotations as labels down to snapshotters.
- Gets rid of the functionality to create a 1GB sized scratch disk if
the client provided a size < 20GB. This code is not used currently and
has a few logical shortcomings as it won't be able to create the disk
if a container is already running and using the same base layer. WCIFS
(driver that handles the unioning of windows container layers together)
holds open handles to some files that we need to delete to create the
1GB scratch disk is the underlying problem.
- Deprecates the containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb
label in favor of a new containerd.io/snapshot/windows/rootfs.sizebytes label.
The previous label/annotation wasn't being used by us, and from a cursory
github search wasn't being used by anyone else either. Now that there is a CRI
field to specify the size, this should just be a field that users can set
on their pod specs and don't need to concern themselves with what it eventually
gets translated to, but non-CRI clients can still use the new label/deprecated
label as usual.
- Add test to cri integration suite to validate expanding the rootfs size.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
HostProcess containers require every container in the pod to be a
host process container and have the corresponding field set. The Kubelet
usually enforces this so we'd error before even getting here but we recently
found a bug in this logic so better to be safe than sorry.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
This test takes advantage of the fact that when you tell Windows to
mount the GUID_DEVINTERFACE_DISPLAY_ADAPTER class, it will also mount
the host's device store into the container, even if there is no real GPU
on the host.
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Adds an equivalent TestSandboxRemoveWithoutIPLeakage for Windows, in which
we assert that the IPs are not leaked when a Pod's HNS namespace dissapears
and the Pod is deleted afterwards.
Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
This offers a more reliable way of killing a process. The /IM flag
allows us to specify the "image name" of the process we're killing.
This means we can use wildcards, foce kill a process and all the child
processes it may have spawned.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Windows HostProcess containers can run containerized workloads on a Windows host.
These containers operate as normal processes but have access to the host network
namespace, storage, and devices when given the appropriate user privileges.
HostProcess containers support the ability to run as one of the following Windows
service accounts: LocalSystem, LocalService, NetworkService.
Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
The test sets container's Linux.SecurityContext.NamespaceOptions.Pid = NamespaceMode_CONTAINER,
which will ensure that the container keeps running even if the sandbox container dies. We do
not have that option on Windows.
Adds additional logging in the test, so it is easier to figure out which assertion failed.
Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
Go 1.15.7 contained a security fix for CVE-2021-3115, which allowed arbitrary
code to be executed at build time when using cgo on Windows. This issue also
affects Unix users who have “.” listed explicitly in their PATH and are running
“go get” outside of a module or with module mode disabled.
This issue is not limited to the go command itself, and can also affect binaries
that use `os.Command`, `os.LookPath`, etc.
From the related blogpost (ttps://blog.golang.org/path-security):
> Are your own programs affected?
>
> If you use exec.LookPath or exec.Command in your own programs, you only need to
> be concerned if you (or your users) run your program in a directory with untrusted
> contents. If so, then a subprocess could be started using an executable from dot
> instead of from a system directory. (Again, using an executable from dot happens
> always on Windows and only with uncommon PATH settings on Unix.)
>
> If you are concerned, then we’ve published the more restricted variant of os/exec
> as golang.org/x/sys/execabs. You can use it in your program by simply replacing
This patch replaces all uses of `os/exec` with `golang.org/x/sys/execabs`. While
some uses of `os/exec` should not be problematic (e.g. part of tests), it is
probably good to be consistent, in case code gets moved around.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Most of the tests creating and using Pod Sandboxes in the same way. We can create
a common function that will do that for us, so we can have less duplicated code,
and make it easier to add new tests in the future.
Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
There was a known issue regarding how the symlink files mounted as
volumes were being handled on Windows. This commit adds tests that
will check against those issue to ensure there won't be any
regressions.
Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
Currently, the cri-integration tests do not work on Windows due to various reasons.
One of the reasons is because all the tests are using Linux-specific images.
Previous commits refactored the image pulling / usage in the cri-integration tests,
making it easier to update, and easier to configure a custom registry to pull images
with Windows support.
For Windows runs, custom registries can be created, which will also contain Windows
images, and the cri-integration tests can be configured to use those registries by
specifying the "--repo-list" argument, a YAML file which will contain an alternative
mapping of the default registries. This is similar to how E2E tests are handled for
Windows runs in Kubernetes.
Some of the tests are Skipped, as they do not pass yet on Windows.
Windows does not collect inodes used stats, thus, the tests that were expecting non-zero
inodes stats were failing.
Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>