This version contains the CRI changes for user namespaces support.
Future patches will use the new fields in the CRI.
Updating the module without using the new fields doesn't cause any
behaviour change.
Updates: #7063
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This change does a couple things to remove some cruft/unused functionality
in the Windows snapshotter, as well as add a way to specify the rootfs
size in bytes for a Windows container via a new field added in the CRI api in
k8s 1.24. Setting the rootfs/scratch volume size was assumed to be working
prior to this but turns out not to be the case.
Previously I'd added a change to pass any annotations in the containerd
snapshot form (containerd.io/snapshot/*) as labels for the containers
rootfs snapshot. This was added as a means for a client to be able to provide
containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb as an
annotation and have that be translated to a label and ultimately set the
size for the scratch volume in Windows. However, this actually only worked if
interfacing with the CRI api directly (crictl) as Kubernetes itself will
fail to validate annotations that if split by "/" end up with > 2 parts,
which the snapshot labels will (containerd.io / snapshot / foobarbaz).
With this in mind, passing the annotations and filtering to
containerd.io/snapshot/* is moot, so I've removed this code in favor of
a new `snapshotterOpts()` function that will return platform specific
snapshotter options if ones exist. Now on Windows we can just check if
RootfsSizeInBytes is set on the WindowsContainerResources struct and
then return a snapshotter option that sets the right label.
So all in all this change:
- Gets rid of code to pass CRI annotations as labels down to snapshotters.
- Gets rid of the functionality to create a 1GB sized scratch disk if
the client provided a size < 20GB. This code is not used currently and
has a few logical shortcomings as it won't be able to create the disk
if a container is already running and using the same base layer. WCIFS
(driver that handles the unioning of windows container layers together)
holds open handles to some files that we need to delete to create the
1GB scratch disk is the underlying problem.
- Deprecates the containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb
label in favor of a new containerd.io/snapshot/windows/rootfs.sizebytes label.
The previous label/annotation wasn't being used by us, and from a cursory
github search wasn't being used by anyone else either. Now that there is a CRI
field to specify the size, this should just be a field that users can set
on their pod specs and don't need to concern themselves with what it eventually
gets translated to, but non-CRI clients can still use the new label/deprecated
label as usual.
- Add test to cri integration suite to validate expanding the rootfs size.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
HostProcess containers require every container in the pod to be a
host process container and have the corresponding field set. The Kubelet
usually enforces this so we'd error before even getting here but we recently
found a bug in this logic so better to be safe than sorry.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
The regression in v1.22.2 has been resolved, so we can drop the
replace rule and use the latest v1.22.x version.
full diff: https://github.com/urfave/cli/compare/v1.22.1...v1.22.9
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These tests are launching containerd and pulling busybox there, while
other tests are using busybox from TestMain().
This commit shares busybox at least between TestRestartMonitor and
TestRestartMonitorWithOnFailurePolicy to reduce the chance of
throttling from ghcr.io.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
This test tends to fail under Cirrus CI + Vagrant. Skipping for now
since running the test on GitHub Actions would be suffice.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
Kubelet sends the PullImage request without timeout, because the image size
is unknown and timeout is hard to defined. The pulling request might run
into 0B/s speed, if containerd can't receive any packet in that connection.
For this case, the containerd should cancel the PullImage request.
Although containerd provides ingester manager to track the progress of pulling
request, for example `ctr image pull` shows the console progress bar, it needs
more CPU resources to open/read the ingested files to get status.
In order to support progress timeout feature with lower overhead, this
patch uses http.RoundTripper wrapper to track active progress. That
wrapper will increase active-request number and return the
countingReadCloser wrapper for http.Response.Body. Each bytes-read
can be count and the active-request number will be descreased when the
countingReadCloser wrapper has been closed. For the progress tracker,
it can check the active-request number and bytes-read at intervals. If
there is no any progress, the progress tracker should cancel the
request.
NOTE: For each blob data, the containerd will make sure that the content
writer is opened before sending http request to the registry. Therefore, the
progress reporter can rely on the active-request number.
fixed: #4984
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Previouslty "Size" was reserved by protoc-gen-gogoctrd and user-generated
"Size" was automatically renamed to "Size_" to avoid conflicts.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
This commit hides types.Any from the diff package's interface. Clients
(incl. imgcrypt) shouldn't aware about gogo/protobuf.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
This commit upgrades github.com/containerd/typeurl to use typeurl.Any.
The interface hides gogo/protobuf/types.Any from containerd's Go client.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
- Upgrade github.com/containerd/imgcrypt to prepare for typeurl upgrade
(see https://github.com/containerd/imgcrypt/pull/72)
- Upgrade github.com/opencontainers/image-spec since imgcrypto needs at
least 1.0.2.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
full diff: 32db794688...3147a52a75
This version contains a fix for CVE-2022-27191 (not sure if it affects us).
From the golang mailing list:
Hello gophers,
Version v0.0.0-20220315160706-3147a52a75dd of golang.org/x/crypto/ssh implements
client authentication support for signature algorithms based on SHA-2 for use with
existing RSA keys.
Previously, a client would fail to authenticate with RSA keys to servers that
reject signature algorithms based on SHA-1. This includes OpenSSH 8.8 by default
and—starting today March 15, 2022 for recently uploaded keys.
We are providing this announcement as the error (“ssh: unable to authenticate”)
might otherwise be difficult to troubleshoot.
Version v0.0.0-20220314234659-1baeb1ce4c0b (included in the version above) also
fixes a potential security issue where an attacker could cause a crash in a
golang.org/x/crypto/ssh server under these conditions:
- The server has been configured by passing a Signer to ServerConfig.AddHostKey.
- The Signer passed to AddHostKey does not also implement AlgorithmSigner.
- The Signer passed to AddHostKey does return a key of type “ssh-rsa” from its PublicKey method.
Servers that only use Signer implementations provided by the ssh package are
unaffected. This is CVE-2022-27191.
Alla prossima,
Filippo for the Go Security team
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
This test takes advantage of the fact that when you tell Windows to
mount the GUID_DEVINTERFACE_DISPLAY_ADAPTER class, it will also mount
the host's device store into the container, even if there is no real GPU
on the host.
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Using a bytes buffer for this test increases the memory usage on Windows
to over 3 GB. Using a temporary file as a destination for the image
keeps memory usage at a reasonable level.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Windows needs a bit more time to finish the restarting containerd. With
the current 2 second timeout, we run the risk of exceeding that
deadline.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
There's no specific need mentioned at the points it was added, and it
makes the Windows-hosted test run setup slightly weird.
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
full diff: https://github.com/emicklei/go-restful/compare/v2.9.5...v3.7.3
- Switch to using go modules
- Add check for wildcard to fix CORS filter
- Add check on writer to prevent compression of response twice
- Add OPTIONS shortcut WebService receiver
- Add Route metadata to request attributes or allow adding attributes to routes
- Add wroteHeader set
- Enable content encoding on Handle and ServeHTTP
- Feat: support google custom verb
- Feature: override list of method allowed without content-type
- Fix Allow header not set on '405: Method Not Allowed' responses
- Fix Go 1.15: conversion from int to string yields a string of one rune
- Fix WriteError return value
- Fix: use request/response resulting from filter chain
- handle path params with prefixes and suffixes
- HTTP response body was broken, if struct to be converted to JSON has boolean value
- List available representations in 406 body
- Support describing response headers
- Unwrap function in filter chain + remove unused dispatchWithFilters
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: https://github.com/fsnotify/fsnotify/compare/v1.4.9...v1.5.1
Relevant changes:
- Fix unsafe pointer conversion
- Drop support/testing for Go 1.11 and earlier
- Update x/sys to latest
- add //go:build lines
- add go 1.17 to test matrix
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
With the release of hcsshim v0.9.2, this test should pass without
issues on Windows.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
POSIX guidelines describes; https://www.gnu.org/prep/standards/html_node/_002d_002dversion.html#g_t_002d_002dversion
> The program’s name should be a constant string; don’t compute it from argv[0].
> The idea is to state the standard or canonical name for the program, not its
> file name.
We don't have a const for this, but let's make a start and just remove the path info.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
ShimV2 has shim.Delete command to cleanup task's temporary resource,
like bundle folder. Since the shim server exits and no persistent store
is for task's exit code, the result of shim.Delete is always 137 exit
code, like the task has been killed.
And the result of shim.Delete can be used as task event only when the
shim server is killed somehow after container is running. Therefore,
dockerd, which watches task exit event to update status of container,
can report correct status.
Back to the issue #6429, the container is not running because the
entrypoint is not found. Based on this design, we should not send
137 exitcode event to subscriber.
This commit is aimed to remove shim instance first and then the
`cleanupAfterDeadShim` should not send event.
Similar Issue: #4769Fix#6429
Signed-off-by: Wei Fu <fuweid89@gmail.com>
The build number used to determine whether we need to pull the Windows
Server 2022 image for the integration tests was previously hardcoded as there
wasn't an hcsshim release with the build number. Now that there is and it's
vendored in, this change just swaps to it.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
This tag brings in some bug fixes related to waiting for containers to terminate and
trying to kill an already terminated process, as well as tty support (exec -it) for
Windows Host Process Containers.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
This is a effective revert of 294143bf38
The one thing that makes it not a total revert is this keeps the usage
of chain interceptors, which prevents us from overwriting interceptors
passed into client options.
The automatic trace injection is unnecessary overhead since callers of
this function can add the necessary interceptors when creating the
client.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Adds an equivalent TestSandboxRemoveWithoutIPLeakage for Windows, in which
we assert that the IPs are not leaked when a Pod's HNS namespace dissapears
and the Pod is deleted afterwards.
Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
The "notready-sandbox" array will only have a CONTAINER_CREATED
and a CONTAINER_EXITED in the sandbox. So there will be no running
task to send a Kill() to. This means that on Windows, it will always
return an ErrorNotFound.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This offers a more reliable way of killing a process. The /IM flag
allows us to specify the "image name" of the process we're killing.
This means we can use wildcards, foce kill a process and all the child
processes it may have spawned.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change fixes flakiness on Windows for TestContainerListStatsWithSandboxIdFilter
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The global &daemon{} object only gets an address assigned if you call
start() on it, which only gets called if you don't pass no-daemon. The
aftermath of this is that running the integration tests with no-daemon
just has them fail trying to create a client for containerd.
This change just assigns whatever address is passed to the binary even in
the no-daemon case so you can run the integration tests against an already
running containerd instance.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
OCI hooks aren't implemented on Windows. The test will, and has been,
actuallyrunning fine on Windows because the Github runners seem to have
a 'ps' binary in the users PATH, but there's not any actual hook
functionality being tested as any of the OCI fields are ignored for
Windows containers.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
The CRI-plugin will setup watcher for each container after
StartContainer or RunPodSandbox. It will cleanup task(container/sandbox)
if received the exit event from watcher.
The original test design is to `Delete` sandbox container to get
NOT_READY state and expect to receive NotFound error. It depends on that
CRI-plugin cleanups container after `Delete` API. If not, the shim will
be cleanup and test code will receive `ttrpc: closed: unknown` or other
unknown error. It is flaky.
In this patch, the test will only send the kill signal and wait for the
exit event. When sandbox exits, the state will and must be NOT_READY.
```plain
// test fail log
=== RUN TestContainerdRestart
restart_test.go:92: Make sure no sandbox is running before test
restart_test.go:97: Start test sandboxes and containers
common.go:115: Image "k8s.gcr.io/pause:3.6" already exists, not pulling.
common.go:115: Image "k8s.gcr.io/pause:3.6" already exists, not pulling.
restart_test.go:139:
Error Trace: restart_test.go:139
Error: Should be true
Test: TestContainerdRestart
Messages: delete should return not found error but returned failed to delete task: ttrpc: closed: unknown
--- FAIL: TestContainerdRestart (4.25s)
// containerd log
&TaskExit{ContainerID:4b4c1d1d303c14a2cc759631d163f153ba8536e9ea6821744a509e4a17346184,ID:4b4c1d1d303c14a2cc759631d163f153ba8536e9ea6821744a509e4a17346184,Pid:28430,ExitStatus:137,ExitedAt:2021-12-12 07:56:01.400753012 +0000 UTC,XXX_unrecognized:[],}"
time="2021-12-12T07:56:01.401120516Z" level=debug msg="event forwarded" ns=k8s.io topic=/tasks/exit type=containerd.events.TaskExit
time="2021-12-12T07:56:01.418934208Z" level=debug msg="event forwarded" ns=k8s.io topic=/tasks/delete type=containerd.events.TaskDelete
time="2021-12-12T07:56:01.419192910Z" level=info msg="shim disconnected" id=4b4c1d1d303c14a2cc759631d163f153ba8536e9ea6821744a509e4a17346184
time="2021-12-12T07:56:01.419235911Z" level=warning msg="cleaning up after shim disconnected" id=4b4c1d1d303c14a2cc759631d163f153ba8536e9ea6821744a509e4a17346184 namespace=k8s.io
time="2021-12-12T07:56:01.419247711Z" level=info msg="cleaning up dead shim"
time="2021-12-12T07:56:01.419235311Z" level=error msg="failed sending message on channel" error="write unix /run/containerd/s/18afde7fcde70236eb31b9f43f3bd92af1dc1186583c501aa1396255f87f95d4->@: write: broken pipe"
time="2021-12-12T07:56:01.419354712Z" level=debug msg="failed to delete task" error="ttrpc: closed" id=4b4c1d1d303c14a2cc759631d163f153ba8536e9ea6821744a509e4a17346184
```
CI Link: `https://pipelines.actions.githubusercontent.com/G4SighzWVVZ6vsyiz7FFMFjLjRzveJHseEnVyibkSq87Cl2x4O/_apis/pipelines/1/runs/9501/signedlogcontent/76?urlExpires=2021-12-12T08%3A42%3A08.0765750Z&urlSigningMethod=HMACV1&urlSignature=pH93isMSFdZUo1ndnZynJpZbPGrEyvt12MO03fgUU7I%3D`
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Skip this test until this error can be evaluated and the appropriate
test fix or environment configuration can be determined.
Signed-off-by: Derek McGowan <derek@mcg.dev>
The OCI image spec did a v1.0.2 security release for CVE-2021-41190, however
commit 09c9270fee, depends on MediaTypes that
have not yet been released by the OCI image-spec, so using current "main" instead.
full diff: 5ad6f50d62...693428a734
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This change enables the TestVolumeOwnership on Windows. The test
assumes that the volume-ownership image is built on Windows, thus
ensuring that Windows file security info (ACLs and ownership info)
are attached to the C:\volumes\test_dir path.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
I noticed we were using some different versions of the same test
images, so changing them to be the same (can help with find/replace
if we need to update them).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It seems that the default ACLs inherited from the parent folder
on Windows Server 2022, does not include "CREATOR OWNER" as it
does on Windows Server 2019. This sets explicit ACLs on test
files.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
GA for ws2022 github actions VMs launched a couple weeks ago so seems like
it's time to try out the CI on this new SKU.
This involved adding new ws2022 runs for the OS matrices in the CI, fixing up
a test in the platforms package and adding a mapping for the ws2022 container image in
integration/client.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>