Commit Graph

44 Commits

Author SHA1 Message Date
Samuel Karp
b7a8a54141
Merge pull request #6995 from akhilerm/retry-on-reset 2022-09-28 10:36:55 -07:00
Derek McGowan
bc01f8fc05
Add reader option to local content ReaderAt
Allows optimized copying from a local content file into another file.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-09-20 20:50:04 -07:00
Akhil Mohan
8f4c23b69f
retry request on writer reset
When a put request is retried due to the response from the registry,
the body of the request must be seekable. A dynamic pipe is added
to the body so that its content can be read again.
Currently a maximum of 5 resets is allowed, beyond which the
request fails. A new error, ErrReset, is introduced to signal that a
reset has occurred and the request needs to be retried.

Also added tests for Copy() and push() to exercise the new functionality.

Signed-off-by: Akhil Mohan <makhil@vmware.com>
2022-09-20 22:09:11 +05:30
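
A minimal sketch of the retry loop the commit above describes, assuming a seekable body; `ErrReset` and the limit of 5 resets come from the commit text, while `doWithRetry` and everything else here is illustrative, not the exact containerd code:

```go
package remotes

import (
	"errors"
	"fmt"
	"io"
)

// ErrReset stands in for the error described above: the writer was reset
// and the request body must be replayed from the start.
var ErrReset = errors.New("writer has been reset")

const maxResets = 5

// doWithRetry replays a seekable body whenever the push fails with
// ErrReset, giving up after maxResets resets.
func doWithRetry(body io.ReadSeeker, push func(io.Reader) error) error {
	for attempt := 0; attempt <= maxResets; attempt++ {
		// Rewind so the retried request sees the full body again.
		if _, err := body.Seek(0, io.SeekStart); err != nil {
			return err
		}
		err := push(body)
		if err == nil || !errors.Is(err, ErrReset) {
			return err
		}
	}
	return fmt.Errorf("request failed after %d resets: %w", maxResets, ErrReset)
}
```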
Derek McGowan
0c2c289d4c
Fix seek error used without nil check
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-01-07 12:19:23 -08:00
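
The class of bug fixed above, in miniature; `currentOffset` is a hypothetical helper showing the safe pattern:

```go
package local

import (
	"io"
	"os"
)

// currentOffset returns the file's current position. The offset returned
// by Seek must not be used before checking the error.
func currentOffset(f *os.File) (int64, error) {
	offset, err := f.Seek(0, io.SeekCurrent)
	if err != nil {
		return 0, err // check err before trusting offset
	}
	return offset, nil
}
```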
haoyun
bbe46b8c43 feat: replace github.com/pkg/errors with errors
Signed-off-by: haoyun <yun.hao@daocloud.io>
Co-authored-by: zounengren <zouyee1989@gmail.com>
2022-01-07 10:27:03 +08:00
haoyun
c0d07094be feat: Errorf usage
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-12-13 14:31:53 +08:00
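
The two commits above move the repository from github.com/pkg/errors to the standard library. A minimal sketch of the typical rewrite, with illustrative names:

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("content: not found")

func lookup(ref string) error {
	// Before: errors.Wrapf(errNotFound, "ref %v", ref) from github.com/pkg/errors.
	// After: fmt.Errorf with the %w verb wraps the sentinel so callers can
	// still match it with errors.Is.
	return fmt.Errorf("ref %v: %w", ref, errNotFound)
}

func main() {
	if err := lookup("sha256:abc"); errors.Is(err, errNotFound) {
		fmt.Println("handled:", err)
	}
}
```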
Eng Zer Jun
50da673592
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in the io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-09-21 09:50:38 +08:00
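
A short sketch of the mechanical migration this commit describes; the replacements listed in the comment are the standard ones from the Go 1.16 release notes:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// Typical one-to-one replacements for the deprecated io/ioutil helpers:
//
//	ioutil.ReadAll   -> io.ReadAll
//	ioutil.ReadFile  -> os.ReadFile
//	ioutil.WriteFile -> os.WriteFile
//	ioutil.TempDir   -> os.MkdirTemp
//	ioutil.Discard   -> io.Discard
func main() {
	dir, err := os.MkdirTemp("", "example") // was ioutil.TempDir
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := dir + "/blob"
	if err := os.WriteFile(path, []byte("hello"), 0o644); err != nil { // was ioutil.WriteFile
		panic(err)
	}

	f, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	data, err := io.ReadAll(f) // was ioutil.ReadAll
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", data)
}
```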
Derek McGowan
2458afeb13
Fix content copy to not ignore unexpected EOF
Signed-off-by: Derek McGowan <derek@mcg.dev>
2021-09-09 10:01:51 -07:00
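
An illustration of the failure mode this fix addresses: a copy that ends early must surface io.ErrUnexpectedEOF rather than report success. `copyExactly` is a hypothetical helper, not the containerd function:

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// copyExactly copies exactly size bytes and treats a short source as an
// error instead of silently succeeding.
func copyExactly(dst io.Writer, src io.Reader, size int64) error {
	n, err := io.Copy(dst, src)
	if err != nil {
		return err
	}
	if n != size {
		return fmt.Errorf("copied %d of %d bytes: %w", n, size, io.ErrUnexpectedEOF)
	}
	return nil
}

func main() {
	err := copyExactly(io.Discard, strings.NewReader("short"), 10)
	fmt.Println(errors.Is(err, io.ErrUnexpectedEOF)) // true
}
```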
Amr Mahdi
b81917ee72 Add comments clarifying copyWithBuffer implementation
Signed-off-by: Amr Mahdi <amramahdi@gmail.com>
2020-11-03 04:25:42 +00:00
Amr Mahdi
f6834d4c0b replicate io.Copy optimizations
Signed-off-by: Amr Mahdi <amramahdi@gmail.com>
2020-10-28 05:50:14 +00:00
Amr Mahdi
289130b8a7 Improve image pull performance from http 1.1 container registries
Private registries that do not support HTTP 2.0, such as Azure Container Registry, stream back content in chunks of at most 16KB (the max TLS record size). The small chunks introduce overhead when copying the layers to the content store, since each chunk incurs the overhead of a grpc message that has to be sent to the content store.

This change reduces that overhead by buffering the chunks into 1MB blocks and only then writing a message to the content store.

Below is a performance comparison between the two approaches, using a couple of large images pulled from Docker Hub (HTTP 2.0) and a private Azure CR (HTTP 1.1), in seconds.

image                                                | Buffered copy | master
-----------------------------------------------------|---------------|-------
docker.io/pytorch/pytorch:latest                     | 55.63         | 58.33
docker.io/nvidia/cuda:latest                         | 72.05         | 75.98
containerdpulltest.azurecr.io/pytorch/pytorch:latest | 61.45         | 77.10
containerdpulltest.azurecr.io/nvidia/cuda:latest     | 77.13         | 85.47

Signed-off-by: Amr Mahdi <amramahdi@gmail.com>
2020-10-26 05:05:09 +00:00
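
A rough sketch of the buffering idea, assuming a hypothetical `bufferedCopy` helper; the real change batches gRPC messages to the content store, which `bufio.Writer` only approximates here:

```go
package main

import (
	"bufio"
	"io"
	"os"
	"strings"
)

// bufferedCopy accumulates the registry's ~16KB TLS-record-sized reads and
// flushes to the destination in ~1MB writes, amortizing per-message
// overhead on the content-store side.
func bufferedCopy(dst io.Writer, src io.Reader) (int64, error) {
	bw := bufio.NewWriterSize(dst, 1<<20) // 1MB write buffer
	n, err := io.Copy(bw, src)
	if err != nil {
		return n, err
	}
	return n, bw.Flush() // push any tail bytes smaller than 1MB
}

func main() {
	if _, err := bufferedCopy(os.Stdout, strings.NewReader("layer bytes...\n")); err != nil {
		panic(err)
	}
}
```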
Derek McGowan
dde436e65b Crypto library movement and changes to content helper interfaces
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2019-07-17 15:21:29 -04:00
Stefan Berger
bf8804c743 Implemented image encryption/decryption libraries and ctr commands
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Brandon Lum <lumjjb@gmail.com>
2019-07-17 15:19:58 -04:00
Edgar Lee
faf925ba25 Handle EOF from ReadAt in content.ReadBlob
Signed-off-by: Edgar Lee <edgarl@netflix.com>
2019-06-04 10:18:36 -07:00
Derek McGowan
a62be324b7
Unify docker and oci importer
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2018-09-17 14:41:43 -07:00
Akihiro Suda
d88de4a34f content: change Writer/ReaderAt to take OCI descriptor
This change allows implementations to resolve the location of the actual data
using OCI descriptor fields such as MediaType.

No OCI descriptor field is written to the store.

No change on gRPC API.

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-06-01 11:51:43 +09:00
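
A rough sketch of the interface shape after this change; signatures are simplified from the real containerd interfaces, which return richer types than io.ReaderAt and io.WriteCloser:

```go
package content

import (
	"context"
	"io"

	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// Provider implementations now receive the full OCI descriptor and may use
// fields such as MediaType to decide where the data actually lives.
type Provider interface {
	ReaderAt(ctx context.Context, desc ocispec.Descriptor) (io.ReaderAt, error)
}

// Ingester likewise takes the descriptor when opening a writer for ref.
type Ingester interface {
	Writer(ctx context.Context, ref string, desc ocispec.Descriptor) (io.WriteCloser, error)
}
```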
Derek McGowan
3a6825e1a0
content: remove unnecessary defer in helpers
Deferring the put-back of the buffer is not necessary, since
it can be done immediately after use, before checking the error.
This decreases the amount of time the buffer is out
of the pool and not in use.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2018-04-20 11:55:59 -07:00
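
A sketch of the pattern after the change, assuming a hypothetical `copyWithBuffer`; the put-back happens immediately after the copy rather than in a defer:

```go
package content

import (
	"io"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 32*1024)
		return &b
	},
}

// copyWithBuffer returns the buffer to the pool right after the copy,
// before inspecting err, minimizing the time it is checked out.
func copyWithBuffer(dst io.Writer, src io.Reader) (int64, error) {
	buf := bufPool.Get().(*[]byte)
	written, err := io.CopyBuffer(dst, src, *buf)
	bufPool.Put(buf) // put back immediately, not deferred
	return written, err
}
```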
Derek McGowan
5304ef294b
Add writer open helper to handle unavailable refs
Updates the blob writer helper to use the new open function and ensures
unavailable errors are always handled.
Removes duplicated unavailable-handling code.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2018-03-21 16:30:22 -07:00
Stephen Day
b3b95c0a2a
Merge pull request #2154 from dmcgowan/shared-content-ingests
content: shared content across namespaces
2018-03-12 16:11:32 -07:00
Derek McGowan
a1a67899f8
Shared content across namespaces
Update content ingests to use content from another namespace.
Ingests must be committed to make content available and the
client will see the sharing as an ingest which has already
been fully written to, but not completed.

Updated the database version to change the ingest record in
the database from a link key to an object with a link and
expected value. This expected value is used to indicate that
the content already exists and an underlying writer may
not yet exist.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2018-02-22 14:45:10 -08:00
Kunal Kushwaha
b12c3215a0 Licence header added
Signed-off-by: Kunal Kushwaha <kushwaha_kunal_v7@lab.ntt.co.jp>
2018-02-19 10:32:26 +09:00
Derek McGowan
a12f493bd3
Update copy to discard over truncate
Prevents the copy method from calling truncate on the
writer when the reader is not seekable. Instead,
the copy method will discard up to the offset.
Truncate is a more expensive operation, since any
bytes that are truncated have already had their hash calculated
and are stored on disk in the backend. Re-writing bytes
which were truncated requires transferring the data over
GRPC again and re-computing the hash up to the point of
truncation.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2018-01-26 11:44:50 -08:00
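
A minimal sketch of the seek-or-discard decision this commit describes; `seekOrDiscard` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// seekOrDiscard positions r at offset: seek if possible, otherwise read
// and throw away the first offset bytes rather than truncating the
// already-hashed, already-stored data on the writer side.
func seekOrDiscard(r io.Reader, offset int64) (io.Reader, error) {
	if s, ok := r.(io.Seeker); ok {
		if _, err := s.Seek(offset, io.SeekStart); err != nil {
			return nil, err
		}
		return r, nil
	}
	if _, err := io.CopyN(io.Discard, r, offset); err != nil {
		return nil, fmt.Errorf("discarding %d bytes: %w", offset, err)
	}
	return r, nil
}

func main() {
	r, err := seekOrDiscard(strings.NewReader("0123456789"), 4)
	if err != nil {
		panic(err)
	}
	rest, _ := io.ReadAll(r)
	fmt.Println(string(rest)) // 456789
}
```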
Daniel Nephin
9184908075 Fix image pull after a failure
When resuming from a failed pull, writer.Truncate() was not
seeking to the proper position in the file. This caused writes to
land after the previously written content, instead of at the start
of the file.

Signed-off-by: Daniel Nephin <dnephin@gmail.com>
2017-12-13 18:24:12 -05:00
Daniel Nephin
ee04cfa3f9 Add staticcheck linter
Fix issues with sync.Pool being passed an array and not a pointer.
See https://github.com/dominikh/go-tools/blob/master/cmd/staticcheck/docs/checks/SA6002

Add missing tests for content.Copy

Fix T.Fatal being called in a goroutine

Signed-off-by: Daniel Nephin <dnephin@gmail.com>
2017-11-28 13:05:30 -05:00
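
The SA6002 fix in miniature: putting a bare []byte into a sync.Pool copies the slice header into an interface{} and allocates on every Put; storing a pointer to the slice avoids that. Names here are illustrative:

```go
package main

import "sync"

// pool stores *[]byte rather than []byte, as staticcheck SA6002 advises.
var pool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 32*1024)
		return &b // pointer, not the slice itself
	},
}

func withBuffer(f func(buf []byte)) {
	bufp := pool.Get().(*[]byte)
	f(*bufp)
	pool.Put(bufp) // pointer round-trips without a fresh allocation
}

func main() {
	withBuffer(func(buf []byte) {
		_ = buf[:0] // use the pooled backing array
	})
}
```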
Stephen J Day
682151b166 remotes/docker: implement seekable http requests
To support resumable download, the fetcher for a remote must implement
`io.Seeker`. If implemented the `content.Copy` function will detect the
seeker and begin from where the download was terminated by a previous
attempt.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-11-16 16:13:06 -05:00
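
A minimal sketch of the seeker detection described above; `resumeAt` is a hypothetical helper, not the exact containerd code:

```go
package remotes

import "io"

// resumeAt resumes from offset when the fetched stream also implements
// io.Seeker; otherwise the caller must restart the download from zero.
func resumeAt(r io.Reader, offset int64) (resumed bool, err error) {
	if offset == 0 {
		return false, nil
	}
	s, ok := r.(io.Seeker)
	if !ok {
		return false, nil // not seekable; restart from the beginning
	}
	if _, err := s.Seek(offset, io.SeekStart); err != nil {
		return false, err
	}
	return true, nil
}
```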
Derek McGowan
b798d87bd4
Ensure manifests are marked as root during pull
For schema1, mark blobs as roots and remove the labels
once they are referenced by the created manifest.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-10-17 16:26:52 -07:00
Derek McGowan
7884707c2f
Add reference labels to snapshots and content
Ensure all snapshots and content are referenced on commit and
protected from cleanup.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-10-11 10:42:47 -07:00
Michael Crosby
f43b7acfd2 Update files based on go lint
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-10-02 10:15:28 -04:00
Derek McGowan
9613acb2ed
Add context to content commit
Content commit is updated to take in a context, allowing
content to be committed within the same context the writer
was in. This is useful when commit may be able to use more
context to complete the action rather than creating its own.
An example of this being useful is the metadata implementation
of content: having a context allows tests to fully create
content in one database transaction by making use of the context.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-09-06 10:19:12 -07:00
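
A sketch of the signature change described above, simplified from the real containerd interface:

```go
package content

import (
	"context"

	"github.com/opencontainers/go-digest"
)

// Writer now receives the caller's context on Commit, so implementations
// such as the metadata store can join an ongoing transaction instead of
// opening their own.
type Writer interface {
	Write(p []byte) (int, error)
	Commit(ctx context.Context, size int64, expected digest.Digest) error
	Close() error
}
```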
Stephen J Day
8be340e37b
content: remove Provider.Reader
After some analysis, it was found that Content.Reader was generally
redundant with an io.ReaderAt. This change removes `Content.Reader` in
favor of a `Content.ReaderAt`. In general, `ReaderAt` can perform better
over interfaces with indeterminate latency because it avoids remote
state for reads. Where a reader is required, a helper is provided to
convert it into an `io.SectionReader`.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-08-09 14:32:28 -07:00
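
Where a plain reader is still needed, the helper mentioned above can be approximated with io.SectionReader; a minimal illustration with an assumed blob size:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readerFromReaderAt adapts an io.ReaderAt plus the blob size into a
// sequential io.Reader, as the conversion helper does.
func readerFromReaderAt(ra io.ReaderAt, size int64) io.Reader {
	return io.NewSectionReader(ra, 0, size)
}

func main() {
	ra := strings.NewReader("blob contents") // strings.Reader implements io.ReaderAt
	data, err := io.ReadAll(readerFromReaderAt(ra, 13))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```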
Derek McGowan
442365248b
Move content store implementation
Move the filesystem content store implementation to a
subpackage of content.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-07-24 10:11:34 -07:00
Stephen J Day
a4fadc596b
errdefs: centralize error handling
Now that we have most of the services required for use with containerd,
it was found that common patterns were used throughout services. By
defining a central `errdefs` package, we ensure that services will map
errors to and from grpc consistently and cleanly. One can decorate an
error with as much context as necessary, using `pkg/errors` and still
have the error mapped correctly via grpc.

We make a few sacrifices. At this point, the common errors we use across
the repository all map directly to grpc error codes. While this seems
positively crazy, it actually works out quite well. The error conditions
that were specific weren't super necessary and the ones that were
necessary now simply have better context information. We lose the
ability to add new codes, but this constraint may not be a bad thing.

Effectively, as long as one uses the errors defined in `errdefs`, the
error class will be mapped correctly across the grpc boundary and
everything will be good. If you don't use those definitions, the error
maps to "unknown" and the error message is preserved.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-06-29 15:00:47 -07:00
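
A toy, stdlib-only version of the errdefs pattern described above (the real package decorates errors with pkg/errors, as the commit notes; names here are illustrative):

```go
package main

import (
	"errors"
	"fmt"
)

// A small set of sentinel errors, each mapping onto a grpc code.
var (
	ErrNotFound      = errors.New("not found")      // -> codes.NotFound
	ErrAlreadyExists = errors.New("already exists") // -> codes.AlreadyExists
)

// IsNotFound matches the class regardless of added context.
func IsNotFound(err error) bool { return errors.Is(err, ErrNotFound) }

func lookup(ref string) error {
	// Decorate with context while preserving the class.
	return fmt.Errorf("ref %v: %w", ref, ErrNotFound)
}

func main() {
	if err := lookup("sha256:abc"); IsNotFound(err) {
		fmt.Println("maps cleanly across the grpc boundary:", err)
	}
}
```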
Derek McGowan
ae24077d2c
Avoid fetch call to registry when blob already exists
Open up content store writer with expected digest before
opening up the fetch call to the registry. Add Copy method
to content store helpers to allow content copy to take
advantage of seeking helper code.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-05-22 13:24:45 -07:00
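
A hypothetical sketch of the control flow described above; `errAlreadyExists` and both callbacks are stand-ins for the content-store and registry calls:

```go
package remotes

import (
	"context"
	"errors"
)

var errAlreadyExists = errors.New("content already exists")

// fetchIfMissing opens the content-store writer (keyed by the expected
// digest) before dialing the registry, so an already-present blob
// short-circuits the fetch entirely.
func fetchIfMissing(ctx context.Context,
	openWriter func(context.Context) error,
	fetch func(context.Context) error) error {

	if err := openWriter(ctx); err != nil {
		if errors.Is(err, errAlreadyExists) {
			return nil // blob already present; skip the registry round trip
		}
		return err
	}
	return fetch(ctx)
}
```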
Stephen J Day
e53539c58f
cmd/dist, cmd/ctr: end to end image pull
With this changeset, we now have a proof of concept of end to end pull.
Up to this point, the relationship between subsystems has been somewhat
theoretical. We now leverage fetching, the snapshot drivers, the rootfs
service, image metadata and the execution service, validating the proposed
model for containerd. There are a few caveats, including the need to move some
of the access into GRPC services, but the basic components are there.

The first command we will cover here is `dist pull`. This is the analog
of `docker pull` and `git pull`. It performs a full resource fetch for
an image and unpacks the root filesystem into the snapshot drivers. An
example follows:

```console
$ sudo ./bin/dist pull docker.io/library/redis:latest
docker.io/library/redis:latest:                                                   resolved       |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:4c8fb09e8d634ab823b1c125e64f0e1ceaf216025aa38283ea1b42997f1e8059: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:3b281f2bcae3b25c701d53a219924fffe79bdb74385340b73a539ed4020999c4:    done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:e4a35914679d05d25e2fccfd310fde1aa59ffbbf1b0b9d36f7b03db5ca0311b0:   done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4b7726832aec75f0a742266c7190c4d2217492722dfd603406208eaa902648d8:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:338a7133395941c85087522582af182d2f6477dbf54ba769cb24ec4fd91d728f:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:83f12ff60ff1132d1e59845e26c41968406b4176c1a85a50506c954696b21570:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:693502eb7dfbc6b94964ae66ebc72d3e32facd981c72995b09794f1e87bac184:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:622732cddc347afc9360b4b04b46c6f758191a1dc73d007f95548658847ee67e:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:19a7e34366a6f558336c364693df538c38307484b729a36fede76432789f084f:    done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 1.6 s                                                                    total:   0.0 B (0.0 B/s)
INFO[0001] unpacking rootfs
```

Note that we haven't integrated rootfs unpacking into the status output, but we
pretty much have what is in docker today (:P). We can see the result of our pull
with the following:

```console
$ sudo ./bin/dist images
REF                            TYPE                                                 DIGEST                                                                  SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:4c8fb09e8d634ab823b1c125e64f0e1ceaf216025aa38283ea1b42997f1e8059 1.8 kB
```

The above shows that we have an image called "docker.io/library/redis:latest"
mapped to the given digest marked with a specific format. We get the size of
the manifest right now, not the full image, but we can add more as we need it.
For the most part, this is all that is needed, but a few tweaks to the model
for naming may need to be added. Specifically, we may want to index under a few
different names, including those qualified by hash or matched by tag versions.
We can do more work in this area as we develop the metadata store.

The name shown above can then be used to run the actual container image. We can
do this with the following command:

```console
$ sudo ./bin/ctr run --id foo docker.io/library/redis:latest /usr/local/bin/redis-server
1:C 17 Mar 17:20:25.316 # Warning: no config file specified, using the default config. In order to specify a config file use /usr/local/bin/redis-server /path/to/redis.conf
1:M 17 Mar 17:20:25.317 * Increased maximum number of open files to 10032 (it was originally set to 1024).
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 3.2.8 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

1:M 17 Mar 17:20:25.326 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 17 Mar 17:20:25.326 # Server started, Redis version 3.2.8
1:M 17 Mar 17:20:25.326 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 17 Mar 17:20:25.326 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 17 Mar 17:20:25.326 * The server is now ready to accept connections on port 6379
```

Wow! So, now we are running `redis`!

There are still a few things to work out. Notice that we have to specify the
command as part of the arguments to `ctr run`. This is because we are not yet
reading the image config and converting it to an OCI runtime config. With the
base laid in this PR, adding such functionality should be straightforward.

While this is a _little_ messy, this is great progress. It should be easy
sailing from here.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-03-21 13:08:23 -07:00
Stephen J Day
5da4e1d0d2 services/content: move service client into package
Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-02-28 17:12:24 -08:00
Stephen J Day
d99756a8a2
content: allow reset via Truncate
To make restarting after failed pull less racy, we define `Truncate(size
int64) error` on `content.Writer` for the zero offset. Truncating a
writer will dump any existing data and digest state and start from the
beginning. All subsequent writes will start from the zero offset.

For the service, we support this by defining the behavior for a write
that changes the offset. To keep this narrow, we only support writes out
of order at the offset 0, which causes the writer to dump existing data
and reset the local hash.

This makes restarting failed pulls much smoother when there was a
previously encountered error and the source doesn't support arbitrary
seeks or reads at arbitrary offsets. By allowing this to be done while
holding the write lock on a ref, we can restart the full download
without causing a race condition.

Once we implement seeking on the `io.Reader` returned by the fetcher,
this will be less useful, but it is good to ensure that our protocol
properly supports this use case for when streaming is the only option.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-02-28 10:40:02 -08:00
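
A simplified stand-in for the writer contract described above; only the zero offset is supported:

```go
package content

import "io"

// Writer is a sketch of the interface with the reset hook: Truncate(0)
// dumps any written data and digest state so a failed pull can restart
// cleanly while still holding the write lock on the ref.
type Writer interface {
	io.Writer
	Truncate(size int64) error
}

func restart(w Writer) error {
	// Only offset 0 is supported: the service discards previously written
	// bytes and resets its local hash.
	return w.Truncate(0)
}
```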
Stephen J Day
850f8addc2
content: close writer after opening
Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-02-24 16:08:31 -08:00
Stephen J Day
c062a85782
content: cleanup service and interfaces
After implementing pull, a few changes are required to the content store
interface to make sure that the implementation works smoothly.
Specifically, we work to make sure the predeclaration path for digests
works the same between remote and local writers. Before, we were
hesitant to require the size and digest up front, but it became
clear that having them up front provided significant benefit.

There are also several cleanups related to naming. We now call the
expected digest `Expected` consistently across the board and `Total` is
used to mark the expected size.

This whole effort comes together to provide a very smooth status
reporting workflow for image pull and push. This will be more obvious
when the bulk of pull code lands.

There are a few other changes to make `content.WriteBlob` more broadly
useful. In accordance with addition for predeclaring expected size when
getting a `Writer`, `WriteBlob` now supports this fully. It will also
resume downloads if provided an `io.Seeker` or `io.ReaderAt`. Coupled
with the `httpReadSeeker` from `docker/distribution`, we should only be
a few lines of code away from resumable downloads.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-02-22 13:30:01 -08:00
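
A sketch of the naming convention this commit settles on, simplified from the real containerd Status type:

```go
package content

import "github.com/opencontainers/go-digest"

// Status reports on a write in progress. The expected digest is always
// called Expected and the expected size Total, for both local and remote
// writers.
type Status struct {
	Ref      string
	Offset   int64
	Total    int64
	Expected digest.Digest
}

// complete is true once the offset reaches the predeclared size.
func complete(st Status) bool {
	return st.Total > 0 && st.Offset == st.Total
}
```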
Michael Crosby
3101be93bc Load runtimes dynamically via go1.8 plugins
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add registration for more subsystems via plugins

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Move content service to separate package

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-02-21 16:29:46 -08:00
Stephen J Day
621164bc84
content: refactor content store for API
After iterating on the GRPC API, the changes required for the actual
implementation are now included in the content store. The main change
is the move to a single, atomic `Ingester.Writer` method for locking
content ingestion on a key. From this come several new interface
definitions.

The main benefit here is the clarification between `Status` and `Info`
that came out of the GRPC API. `Status` tells the status of a write,
whereas `Info` is for querying metadata about various blobs.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-02-21 13:10:22 -08:00
Stephen J Day
f9cd9be61a
dist: expand functionality of the dist tool
With this change, we add the following commands to the dist tool:

- `ingest`: verify and accept content into storage
- `active`: display active ingest processes
- `list`: list content in storage
- `path`: provide a path to a blob by digest
- `delete`: remove a piece of content from storage

We demonstrate the utility with the following shell pipeline:

```
$ ./dist fetch docker.io/library/redis latest mediatype:application/vnd.docker.distribution.manifest.v2+json | \
    jq -r '.layers[] | "./dist fetch docker.io/library/redis "+.digest + "| ./dist ingest --expected-digest "+.digest+" --expected-size "+(.size | tostring) +" docker.io/library/redis@"+.digest' | xargs -I{} -P10 -n1 sh -c "{}"
```

The above fetches a manifest, pipes it to jq, which assembles a shell
pipeline to ingest each layer into the content store. Because the
transactions are keyed by their digest, concurrent downloads and
downloads of repeated content are ignored. Each process is then executed
in parallel using xargs.

Put shortly, this is a parallel layer download.

In a separate shell session, you can monitor the active downloads with the
following:

```
$ watch -n0.2 ./dist active
```

For now, the content is downloaded into `.content` in the current
working directory. To watch the contents of this directory, you can use
the following:

```
$ watch -n0.2 tree .content
```

This will help to understand what is going on internally.

To get access to the layers, you can use the path command:

```
$ ./dist path sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa
sha256:010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa /home/sjd/go/src/github.com/docker/containerd/.content/blobs/sha256/010c454d55e53059beaba4044116ea4636f8dd8181e975d893931c7e7204fffa
```

When you are done, you can clear out the content with the classic xargs
pipeline:

```
$ ./dist list -q | xargs ./dist delete
```

Note that this is mostly a POC. Things like failed downloads and
abandoned download cleanup aren't quite handled. We'll probably make
adjustments around how content store transactions are handled to address
this.

From here, we'll build out full image pull and create tooling to get
runtime bundles from the fetched content.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-01-27 10:29:10 -08:00
Stephen J Day
f27d43285a
content: cleanup package exports
The package exports are now cleaned up to remove a lot of stuttering in
the API. We also remove direct mapping of refs to the filesystem, opting
for a hash-based approach. This *does* affect insertion performance,
since it requires more individual file I/Os. A benchmark baseline has
been added and we can fix this later.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-01-23 20:11:35 -08:00
Gábor Lipták
d8aee18f6c Point digest import to opencontainers/go-digest
Signed-off-by: Gábor Lipták <gliptak@gmail.com>
2017-01-09 18:10:52 -05:00
Stephen J Day
3469905bbb
content: break up into multiple files
Break up the content store prototype into a few logical files. We have a
file for the store, the writer and helpers.

Also, the writer has been modified to remove write and exec permissions
on blobs in the store.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2016-11-03 17:18:45 -07:00