containerd

Author	SHA1	Message	Date
Maksym Pavlenko	9c23fd71ed	Merge pull request #10760 from GabyCT/topic/runctrap script/setup/install-runc: Add trap statement to clean up tmp files	2024-10-02 23:26:10 +00:00
Saket Jajoo	d7f83034c2	Fix the race condition during GC of snapshots when client retries When an upstream client (e.g. kubelet) stops or restarts, the CRI connection to containerd gets interrupted which is treated as a cancellation of context which subsequently cancels an ongoing operation, including an image pull. This generally gets followed by containerd's GC routine that tries to delete the prepared snapshots for the image layer(s) corresponding to the image in the pull operation that got cancelled. However, if the upstream client immediately retries (or starts a new) image pull operation, containerd initiates a new image pull and starts unpacking the image layers into snapshots. This may create a race condition: the GC routine (corresponding to the failed image pull operation) trying to clean up the same snapshot that the new image pull operation is preparing, thus leading to the "parent snapshot does not exist: not found" error. Race Condition Scenario: Assume an image consisting of 2 layers (L1 and L2, L1 being the bottom layer) that are supposed to get unpacked into snapshots S1 and S2 respectively. During an image pull operation, containerd unpacks(L1) which involves Stat()'ing the chainID. This Stat() fails as the chainID does not exist and Prepare(L1) gets called. Once S1 gets prepared, containerd processes L2 - unpack(L2) which again involves Stat()'ing the chainID which fails as the chainID for S2 does not exist which results in the call to Prepare(L2). However, if the image pull operation gets cancelled before Prepare(L2) is called, then the GC routine tries to clean up S1. When the image pull operation is retried by the upstream client, containerd follows the same series of operations. unpack(L1) gets called which then calls Stat(chainID) for L1. However, this time, Stat(L1) succeeds as S1 already exists (from the previous image pull operation) and thus containerd goes to the next iteration to unpack(L2).
Now, GC cleans up S1 and when Prepare(L2) gets called, it returns back the "parent snapshot does not exist: not found" error. Fix: Removing the "Stat() + early return" fixes the race condition. Now during the image pull operation corresponding to the client retry, although the chainID (for L1) already exists, containerd does not return early and goes on to Prepare(L1). Since L1 is already prepared, it adds a new lease to S1 and then returns `ErrAlreadyExists`. This new lease prevents GC from cleaning up S1 when containerd processes L2 (unpack(L2) -> Prepare(L2)). Fixes: https://github.com/containerd/containerd/issues/3787 Signed-off-by: Saket Jajoo <saketjajoo@google.com>	2024-10-02 22:10:15 +00:00
Gabriela Cervantes	24fe444eb6	script/setup/install-runc: Add trap statement to clean up tmp files This PR adds the trap statement in the install runc script to clean up the temporary files and ensure we are not leaving them. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-02 19:52:02 +00:00
Brian Goff	6ffdabf725	Makefile: fix shim tags overwritten Go treats multiple `--tags` flags as overwriting the previously set ones, which is not what we want. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2024-10-02 19:17:56 +00:00
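The last-one-wins behavior mentioned in the commit above can be illustrated with Go's standard `flag` package, where a string flag given twice keeps only the final value. This is a minimal sketch, not the Makefile change itself: `parseTags`, `mergeTags`, and the tag names `tag_a` and `tag_b` are hypothetical, and the example does not claim to reproduce how the `go` command parses its own flags.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseTags parses args with a single string-valued -tags flag and returns
// the value that is left after parsing.
func parseTags(args []string) (string, error) {
	fs := flag.NewFlagSet("build", flag.ContinueOnError)
	tags := fs.String("tags", "", "comma-separated build tags")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *tags, nil
}

// mergeTags joins several comma-separated tag groups into one list so that
// they can be passed through a single -tags flag.
func mergeTags(groups ...string) string {
	var all []string
	for _, g := range groups {
		for _, t := range strings.Split(g, ",") {
			if t = strings.TrimSpace(t); t != "" {
				all = append(all, t)
			}
		}
	}
	return strings.Join(all, ",")
}

func main() {
	// Two separate flags: only the second value is kept.
	last, err := parseTags([]string{"-tags", "tag_a", "-tags", "tag_b"})
	if err != nil {
		panic(err)
	}
	fmt.Println("separate flags:", last)

	// One merged flag: both tags are kept.
	merged, err := parseTags([]string{"-tags", mergeTags("tag_a", "tag_b")})
	if err != nil {
		panic(err)
	}
	fmt.Println("merged flag:   ", merged)
}
```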
Maksym Pavlenko	01ca26f209	Merge pull request #10757 from cpuguy83/shim_deps Clean up some dependency trees for runc shim	2024-10-02 16:39:23 +00:00
lengrongfu	095131abf9	add use systemd cgroup e2e Signed-off-by: lengrongfu <lenronfu@gmail.com> Signed-off-by: rongfu.leng <lenronfu@gmail.com>	2024-10-03 00:37:29 +08:00
Derek McGowan	06dfa0c2f1	Merge pull request #10754 from containerd/dependabot/go_modules/github.com/intel/goresctrl-0.8.0 build(deps): bump github.com/intel/goresctrl from 0.7.0 to 0.8.0	2024-10-02 13:53:44 +00:00
Derek McGowan	4d65025d92	Merge pull request #10725 from kiashok/update-hcsshim-0.12.7 Update hcsshim to v0.12.7	2024-10-02 13:52:16 +00:00
Brian Goff	2123855eeb	Add build tag to omit grpc This is needed so we can build the runc shim without grpc as a transitive dependency. With this change the runc shim binary went from 14MB to 11MB. The RSS from an idle shim went from about 17MB to 14MB (back around where it was in 1.7). Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2024-10-02 01:50:48 +00:00
Brian Goff	64d29ebe5b	snapshots: core: Remove dependency on api types Core should not have a dependency on API types. This was causing a transitive dependency on grpc when importing the core snapshots package. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2024-10-02 01:46:19 +00:00
Brian Goff	11ffba3dc4	shim: Do not depend on pkg/oci pkg/oci is a general utility package with dependency chains that are unnecessary for the shim. The shim only actually used it for a convenience function for reading an oci spec file. Instead of pulling in those deps just re-implement that internally in the shim command. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2024-10-01 21:04:58 +00:00
Derek McGowan	05ee43a5fd	Merge pull request #10752 from dmcgowan/prepare-v2.0.0-rc.5 Prepare release notes for v2.0.0-rc.5	2024-10-01 15:20:19 +00:00
Kirtana Ashok	0d4e606bbc	Update hcsshim to v0.12.7 Signed-off-by: Kirtana Ashok <kiashok@microsoft.com>	2024-09-30 17:38:28 -07:00
dependabot[bot]	78e39f7c5b	build(deps): bump github.com/intel/goresctrl from 0.7.0 to 0.8.0 Bumps [github.com/intel/goresctrl](https://github.com/intel/goresctrl) from 0.7.0 to 0.8.0. - [Release notes](https://github.com/intel/goresctrl/releases) - [Commits](https://github.com/intel/goresctrl/compare/v0.7.0...v0.8.0) --- updated-dependencies: - dependency-name: github.com/intel/goresctrl dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2024-09-30 23:56:42 +00:00
Maksym Pavlenko	03e0e4c02f	Merge pull request #10186 from cpuguy83/propagate_traces_to_shim Propagate trace contexts to shims	2024-09-30 23:21:57 +00:00
Brian Goff	17d4a1357e	Propagate trace contexts to shims This adds trace context propagation over the grpc/ttrpc calls to a shim. It also adds the otlp plugin to the runc shim so that it will send traces to the configured tracer (which is inherited from containerd's config). It doesn't look like this is adding any real overhead to the runc shim's memory usage, however it does add 2MB to the binary size. As such this is gated by a build tag `shim_tracing` Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2024-09-30 21:44:16 +00:00
Maksym Pavlenko	03db11c3f2	Merge pull request #10744 from sameersaeed/sandbox-cni-plugins Add check for CNI plugins before tearing down pod network	2024-09-30 15:23:58 +00:00
Derek McGowan	bc4646067d	Prepare release notes for v2.0.0-rc.5 Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-09-30 07:13:14 -07:00
Fu Wei	363e50bee0	Merge pull request #10747 from lujinda/anno-image-user-specified [cri] use 'UserSpecifiedImage' to set the image-name annotation	2024-09-30 04:47:30 +00:00
jinda.ljd	ccb2a8d747	[cri] use 'UserSpecifiedImage' to set the image-name annotation However, when an image has multiple tags, the image originally obtained may not be the one actually specified by the user. Starting from cri-api v0.28.0, a UserSpecifiedImage field is added to ImageSpec. It is more appropriate to use UserSpecifiedImage. Signed-off-by: jinda.ljd <jinda.ljd@alibaba-inc.com>	2024-09-30 08:38:17 +08:00
Maksym Pavlenko	86e0f52e17	Merge pull request #10740 from zouyee/event Add timestamp to PodSandboxStatusResponse for kubernetes Evented PLEG	2024-09-28 00:11:16 +00:00
Sameer	b7b6b324b8	Add check for CNI plugins before tearing down pod network Signed-off-by: Sameer <sameer.saeed@live.ca>	2024-09-27 16:12:03 -04:00
Maksym Pavlenko	db97449598	Merge pull request #10730 from mxpv/features Move features section to a separate file	2024-09-27 14:42:50 +00:00
zouyee	b5290726d2	Add timestamp to PodSandboxStatusResponse for kubernetes Evented PLEG Signed-off-by: zouyee <zouyee1989@gmail.com>	2024-09-27 16:50:00 +08:00
Maksym Pavlenko	3df2cc1a6b	Merge pull request #10717 from containerd/dependabot/go_modules/github.com/klauspost/compress-1.17.10 build(deps): bump github.com/klauspost/compress from 1.17.9 to 1.17.10	2024-09-26 22:33:43 +00:00
Maksym Pavlenko	146a977f92	Move features section to a separate file Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>	2024-09-26 15:32:16 -07:00
Maksym Pavlenko	0dc66dab71	Merge pull request #10718 from containerd/dependabot/go_modules/google.golang.org/grpc-1.67.0 build(deps): bump google.golang.org/grpc from 1.66.2 to 1.67.0	2024-09-26 20:06:54 +00:00
Maksym Pavlenko	83fe5264e7	Merge pull request #10719 from containerd/dependabot/go_modules/github.com/prometheus/client_golang-1.20.4 build(deps): bump github.com/prometheus/client_golang from 1.20.3 to 1.20.4	2024-09-26 20:06:39 +00:00
Rodrigo Campos	30f2893351	core/mount: Only remove dirs if unmount succeeded The detached mount is less likely to fail in our case, but if we see any failure to unmount, we should just skip the removal of directories. Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>	2024-09-24 17:45:34 +02:00
Rodrigo Campos	f8d84ecf92	core/mount: Prevent accidental removal of rootfs files Using os.RemoveAll() is quite risky: if the unmount failed, we can delete files from the container rootfs. In fact, we were doing just that. Let's use os.Remove() to make sure we only delete empty dirs. Big kudos to @mbaynton for reporting this issue with lots of detail, nailing it down to containerd lines of code and showing all the log lines to understand the big picture. Fixes: #10704 Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>	2024-09-24 17:45:34 +02:00
Rodrigo Campos	004f3951d5	core/mount: Use MNT_DETACH for umount of tmp layers Overlayfs needs to do an idmap mount of each layer and the cleanup function just unmounts and deletes the directories. However, when the resource is busy, the umount fails. Let's make the unmount detached so the unmount will eventually be done when it's not busy anymore. Also, making it detached solves the issues with the unmount failing because it is busy. Big kudos to @mbaynton for reporting this issue with lots of detail, nailing it down to containerd lines of code and showing all the log lines to understand the big picture. Fixes: #10704 Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>	2024-09-24 17:34:52 +02:00
dependabot[bot]	f7ca91fa39	build(deps): bump github.com/prometheus/client_golang Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.20.3 to 1.20.4. - [Release notes](https://github.com/prometheus/client_golang/releases) - [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md) - [Commits](https://github.com/prometheus/client_golang/compare/v1.20.3...v1.20.4) --- updated-dependencies: - dependency-name: github.com/prometheus/client_golang dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2024-09-23 23:18:45 +00:00
dependabot[bot]	c75178d931	build(deps): bump google.golang.org/grpc from 1.66.2 to 1.67.0 Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.66.2 to 1.67.0. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.66.2...v1.67.0) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2024-09-23 23:18:36 +00:00
dependabot[bot]	519cbda1d8	build(deps): bump github.com/klauspost/compress from 1.17.9 to 1.17.10 Bumps [github.com/klauspost/compress](https://github.com/klauspost/compress) from 1.17.9 to 1.17.10. - [Release notes](https://github.com/klauspost/compress/releases) - [Changelog](https://github.com/klauspost/compress/blob/master/.goreleaser.yml) - [Commits](https://github.com/klauspost/compress/compare/v1.17.9...v1.17.10) --- updated-dependencies: - dependency-name: github.com/klauspost/compress dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2024-09-23 23:18:27 +00:00
Samuel Karp	2ca3ff8725	Merge pull request #10713 from wzshiming/enable-selinux-on-cri-test Enable the selinux on cri test	2024-09-23 17:44:05 +00:00
Fu Wei	906c23218c	Merge pull request #10307 from henry118/uidmap Support multiple uid/gid mappings [1/2]	2024-09-23 12:25:05 +00:00
Shiming Zhang	d72051036a	Enable the selinux on cri test Signed-off-by: Shiming Zhang <wzshiming@hotmail.com>	2024-09-21 16:22:25 +08:00
Akihiro Suda	a448047386	Merge pull request #10699 from containerd/dependabot/go_modules/k8s-82433053af build(deps): bump the k8s group with 4 updates	2024-09-19 22:57:49 +00:00
dependabot[bot]	b03a3c5a21	build(deps): bump the k8s group with 4 updates Bumps the k8s group with 4 updates: [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery), [k8s.io/client-go](https://github.com/kubernetes/client-go), [k8s.io/component-base](https://github.com/kubernetes/component-base) and [k8s.io/kubelet](https://github.com/kubernetes/kubelet). Updates `k8s.io/apimachinery` from 0.31.0 to 0.31.1 - [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.0...v0.31.1) Updates `k8s.io/client-go` from 0.31.0 to 0.31.1 - [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md) - [Commits](https://github.com/kubernetes/client-go/compare/v0.31.0...v0.31.1) Updates `k8s.io/component-base` from 0.31.0 to 0.31.1 - [Commits](https://github.com/kubernetes/component-base/compare/v0.31.0...v0.31.1) Updates `k8s.io/kubelet` from 0.31.0 to 0.31.1 - [Commits](https://github.com/kubernetes/kubelet/compare/v0.31.0...v0.31.1) --- updated-dependencies: - dependency-name: k8s.io/apimachinery dependency-type: direct:production update-type: version-update:semver-patch dependency-group: k8s - dependency-name: k8s.io/client-go dependency-type: direct:production update-type: version-update:semver-patch dependency-group: k8s - dependency-name: k8s.io/component-base dependency-type: direct:production update-type: version-update:semver-patch dependency-group: k8s - dependency-name: k8s.io/kubelet dependency-type: direct:production update-type: version-update:semver-patch dependency-group: k8s ... Signed-off-by: dependabot[bot] <support@github.com>	2024-09-19 17:25:38 +00:00
Maksym Pavlenko	ff67a67d72	Merge pull request #10700 from containerd/dependabot/go_modules/otel-910e354cca build(deps): bump the otel group with 8 updates	2024-09-19 16:45:26 +00:00
Akihiro Suda	8c64a2f6a1	Merge pull request #10607 from fuweid/pin-userns internal/cri: simplify netns setup with pinned userns	2024-09-19 01:05:41 +00:00
dependabot[bot]	017efe05a2	build(deps): bump the otel group with 8 updates Bumps the otel group with 8 updates: \| Package \| From \| To \| \| --- \| --- \| --- \| \| [go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc](https://github.com/open-telemetry/opentelemetry-go-contrib) \| `0.54.0` \| `0.55.0` \| \| [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) \| `0.54.0` \| `0.55.0` \| \| [go.opentelemetry.io/otel](https://github.com/open-telemetry/opentelemetry-go) \| `1.29.0` \| `1.30.0` \| \| [go.opentelemetry.io/otel/exporters/otlp/otlptrace](https://github.com/open-telemetry/opentelemetry-go) \| `1.29.0` \| `1.30.0` \| \| [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go) \| `1.29.0` \| `1.30.0` \| \| [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp](https://github.com/open-telemetry/opentelemetry-go) \| `1.29.0` \| `1.30.0` \| \| [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) \| `1.29.0` \| `1.30.0` \| \| [go.opentelemetry.io/otel/trace](https://github.com/open-telemetry/opentelemetry-go) \| `1.29.0` \| `1.30.0` \| Updates `go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc` from 0.54.0 to 0.55.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go-contrib/compare/zpages/v0.54.0...zpages/v0.55.0) Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.54.0 to 0.55.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md) - 
[Commits](https://github.com/open-telemetry/opentelemetry-go-contrib/compare/zpages/v0.54.0...zpages/v0.55.0) Updates `go.opentelemetry.io/otel` from 1.29.0 to 1.30.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.29.0...v1.30.0) Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace` from 1.29.0 to 1.30.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.29.0...v1.30.0) Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.29.0 to 1.30.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.29.0...v1.30.0) Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp` from 1.29.0 to 1.30.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.29.0...v1.30.0) Updates `go.opentelemetry.io/otel/sdk` from 1.29.0 to 1.30.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.29.0...v1.30.0) Updates `go.opentelemetry.io/otel/trace` from 1.29.0 to 1.30.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - 
[Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.29.0...v1.30.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc dependency-type: direct:production update-type: version-update:semver-minor dependency-group: otel - dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp dependency-type: direct:production update-type: version-update:semver-minor dependency-group: otel - dependency-name: go.opentelemetry.io/otel dependency-type: direct:production update-type: version-update:semver-minor dependency-group: otel - dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace dependency-type: direct:production update-type: version-update:semver-minor dependency-group: otel - dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc dependency-type: direct:production update-type: version-update:semver-minor dependency-group: otel - dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp dependency-type: direct:production update-type: version-update:semver-minor dependency-group: otel - dependency-name: go.opentelemetry.io/otel/sdk dependency-type: direct:production update-type: version-update:semver-minor dependency-group: otel - dependency-name: go.opentelemetry.io/otel/trace dependency-type: direct:production update-type: version-update:semver-minor dependency-group: otel ... Signed-off-by: dependabot[bot] <support@github.com>	2024-09-18 08:30:25 +00:00
Akihiro Suda	67b06872e7	Merge pull request #10695 from containerd/dependabot/go_modules/golang-x-cf6e4563c3 build(deps): bump golang.org/x/mod from 0.20.0 to 0.21.0 in the golang-x group across 1 directory	2024-09-18 07:48:38 +00:00
Maksym Pavlenko	4bdaebb9c8	Merge pull request #10701 from containerd/dependabot/go_modules/google.golang.org/grpc-1.66.2 build(deps): bump google.golang.org/grpc from 1.65.0 to 1.66.2	2024-09-17 22:38:13 +00:00
dependabot[bot]	7c89148a1c	build(deps): bump google.golang.org/grpc from 1.65.0 to 1.66.2 Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.65.0 to 1.66.2. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.65.0...v1.66.2) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2024-09-16 23:16:37 +00:00
dependabot[bot]	6e2c4d00dc	build(deps): bump golang.org/x/mod Bumps the golang-x group with 1 update in the / directory: [golang.org/x/mod](https://github.com/golang/mod). Updates `golang.org/x/mod` from 0.20.0 to 0.21.0 - [Commits](https://github.com/golang/mod/compare/v0.20.0...v0.21.0) --- updated-dependencies: - dependency-name: golang.org/x/mod dependency-type: direct:production update-type: version-update:semver-minor dependency-group: golang-x ... Signed-off-by: dependabot[bot] <support@github.com>	2024-09-12 20:12:29 +00:00
Maksym Pavlenko	15dd956915	Merge pull request #10687 from containerd/dependabot/go_modules/github.com/prometheus/client_golang-1.20.3 build(deps): bump github.com/prometheus/client_golang from 1.20.2 to 1.20.3	2024-09-12 19:34:57 +00:00
Phil Estes	ee9d950bdc	Merge pull request #10688 from containerd/dependabot/go_modules/github.com/checkpoint-restore/go-criu/v7-7.2.0 build(deps): bump github.com/checkpoint-restore/go-criu/v7 from 7.1.0 to 7.2.0	2024-09-12 19:05:37 +00:00
Wei Fu	ee0ed75d64	internal/cri: simplify netns setup with pinned userns Motivation: For pod-level user namespaces, it's impossible to force the container runtime to join an existing network namespace after creating a new user namespace. According to the capabilities section in [user_namespaces(7)][1], a network namespace created by containerd is owned by the root user namespace. When the container runtime (like runc or crun) creates a new user namespace, it becomes a child of the root user namespace. Processes within this child user namespace are not permitted to access resources owned by the parent user namespace. If the network namespace is not owned by the new user namespace, the container runtime will fail to mount /sys due to the [sysfs: Restrict mounting sysfs][2] patch. Referencing the [cap_capable][3] function in Linux, a process can access a resource if: * The resource is owned by the process's user namespace, and the process has the required capability. * The resource is owned by a child of the process's user namespace, and the owner's user namespace was created by the process's UID. In the context of pod-level user namespaces, the CRI plugin delegates the creation of the network namespace to the container runtime when running the pause container. After the pause container is initialized, the CRI plugin pins the pause container's network namespace into `/run/netns` and then executes the `CNI_ADD` command over it. However, if the pause container is terminated during the pinning process, the CRI plugin might encounter a PID cycle, leading to the `CNI_ADD` command operating on an incorrect network namespace. Moreover, rolling back the `RunPodSandbox` API is complex due to the delegation of network namespace creation. As highlighted in issue #10363, the CRI plugin can lose IP information after a containerd restart, making it challenging to maintain robustness in the RunPodSandbox API. 
Solution: Allow containerd to create a new user namespace and then create the network namespace within that user namespace. This way, the CRI plugin can force the container runtime to join both the user namespace and the network namespace. Since the network namespace is owned by the newly created user namespace, the container runtime will have the necessary permissions to mount `/sys` on the container's root filesystem. As a result, delegation of network namespace creation is no longer needed. NOTE: * The CRI plugin does not need to pin the newly created user namespace as it does with the network namespace, because the kernel allows retrieving a user namespace reference via [ioctl_ns(2)][4]. As a result, the podsandbox implementation can obtain the user namespace using the `netnsPath` parameter. [1]: <https://man7.org/linux/man-pages/man7/user_namespaces.7.html> [2]: <`7dc5dbc879`> [3]: <`2c85ebc57b/security/commoncap.c (L65)`> [4]: <https://man7.org/linux/man-pages/man2/ioctl_ns.2.html> Signed-off-by: Wei Fu <fuweid89@gmail.com>	2024-09-11 07:21:43 +08:00
Wei Fu	fd3f3d5a13	pkg/sys: add GetUsernsForNamespace interface Signed-off-by: Wei Fu <fuweid89@gmail.com>	2024-09-11 07:21:43 +08:00


14463 Commits