# Kubernetes test images

## Overview

All the images found here are used in Kubernetes tests that verify its features and functionality.
The images are built and published as manifest lists, allowing multi-arch and cross-platform support.

This guide explains how to make changes to images, bump their versions, build the new images,
test the changes made, and promote the newly built staging images.


## Prerequisites

In order to build the docker test images, a Linux node is required. The node will require `make`
and `docker (version 18.06.0 or newer)`. Manifest lists were introduced in 18.03.0, but 18.06.0
is recommended in order to avoid certain issues.
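
The version requirement above can be checked before starting a build. The following is a minimal sketch, not part of the build tooling: the helper names `version_ge` and `check_docker_version` are hypothetical, and it assumes a `sort` that supports `-V` (version sort), as in GNU coreutils.

```bash
# Minimum Docker version recommended by this guide.
MIN_DOCKER_VERSION="18.06.0"

# Return 0 if version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare an already-obtained version string against the minimum.
check_docker_version() {
  if version_ge "$1" "$MIN_DOCKER_VERSION"; then
    echo "docker $1: ok"
  else
    echo "docker $1: older than $MIN_DOCKER_VERSION"
    return 1
  fi
}
```

On a build node, the installed client version could be passed in with `check_docker_version "$(docker version --format '{{.Client.Version}}')"`.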

The node must be able to push the images to the desired container registry, so make sure you are
authenticated with the registry you're pushing to.

Windows Container images are not built by default, since they cannot be built on Linux. For
that, a Windows node with Docker installed and configured for remote management is required.


### Windows node(s) setup

In order to build the Windows container images, a node with Windows 10 or Windows Server 2019
with the latest updates installed is required. The node will have to have Docker installed,
preferably version 18.06.0 or newer.

Keep in mind that the Windows node might not be able to build container images for newer OS versions
than itself (even with `--isolation=hyperv`), so keeping the node up to date and / or upgrading it
to the latest Windows Server edition is ideal.

Windows test images must be built for Windows Server 2019 (1809) and Windows Server 1903, thus,
if the node does not have Hyper-V enabled, or it is not supported, multiple Windows nodes are required,
one per OS version.

Additionally, remote management must be configured for the node's Docker daemon. Exposing the
Docker daemon without requiring any authentication is not recommended, and thus, it must be
configured with TLS to ensure that only authorised people can interact with it. For this, the
following `powershell` script can be executed:

```powershell
mkdir .docker
docker run --isolation=hyperv --user=ContainerAdministrator --rm `
  -e SERVER_NAME=$(hostname) `
  -e IP_ADDRESSES=127.0.0.1,YOUR_WINDOWS_BUILD_NODE_IP `
  -v "c:\programdata\docker:c:\programdata\docker" `
  -v "$env:USERPROFILE\.docker:c:\users\containeradministrator\.docker" stefanscherer/dockertls-windows:2.5.5
# restart the Docker daemon.
Restart-Service docker
```

For more information about the above commands, you can check [here](https://hub.docker.com/r/stefanscherer/dockertls-windows/).

A firewall rule to allow connections to the Docker daemon is necessary:

```powershell
New-NetFirewallRule -DisplayName 'Docker SSL Inbound' -Profile @('Domain', 'Public', 'Private') -Direction Inbound -Action Allow -Protocol TCP -LocalPort 2376
```

If your Windows build node is hosted by a cloud provider, make sure the port `2376` is open for the node.
For example, in Azure, this is done by running the following command:

```console
az vm open-port -g GROUP-NAME -n NODE-NAME --port 2376
```

The `ca.pem`, `cert.pem`, and `key.pem` files that can be found in `$env:USERPROFILE\.docker`
will have to be copied to `~/.docker-${os_version}/` on the Linux build node, where `${os_version}`
is `1809` or `1903`.

```powershell
scp.exe -r $env:USERPROFILE\.docker ubuntu@YOUR_LINUX_BUILD_NODE:/home/ubuntu/.docker-$os_version
```

After all this, the Linux build node should be able to connect to the Windows build node:

```bash
docker --tlsverify --tlscacert ~/.docker-${os_version}/ca.pem --tlscert ~/.docker-${os_version}/cert.pem --tlskey ~/.docker-${os_version}/key.pem -H "$REMOTE_DOCKER_URL" version
```
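
If the connection fails, one quick thing to rule out is an incomplete copy of the TLS files. The sketch below is only an illustration (the helper name `missing_tls_files` is hypothetical); it lists which of the three expected files are absent from a given directory on the Linux build node.

```bash
# Print the names of the TLS files missing from the given directory.
# Prints nothing and returns 0 when ca.pem, cert.pem and key.pem all exist;
# returns 1 if at least one of them is missing.
missing_tls_files() {
  dir="$1"
  status=0
  for f in ca.pem cert.pem key.pem; do
    if [ ! -f "$dir/$f" ]; then
      echo "$f"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_tls_files ~/.docker-1809` prints nothing when all three files are in place for the `1809` node.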

For more information about enabling Docker remote management and troubleshooting it, see
[here](https://docs.microsoft.com/en-us/virtualization/windowscontainers/management/manage_remotehost).

Finally, the node must be able to push the images to the desired container registry, so make sure you are
authenticated with the registry you're pushing to.


## Making changes to images

There are several thousand tests in Kubernetes E2E testing. Not all of them are run on
new PRs, and thus, not all images are exercised, especially those that are not used by Conformance tests.

So, in order to prevent regressions in the images and failing jobs, any changes made to the image
itself or its binaries will require the image's version to be bumped. In the case of a regression
which cannot be immediately resolved, the image version used in E2E tests will be reverted to the
last known stable version.

The version can easily be bumped by modifying the file `test/images/${IMAGE_NAME}/VERSION`, which will
be used when building the image. Additionally, for the `agnhost` image, also bump the `Version` in
`test/images/agnhost/agnhost.go`.
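
As an illustration of the bump itself, here is a minimal sketch that increments the last numeric component of a version. It assumes the `VERSION` file holds a single dotted numeric version (such as `2.8`) on one line; the helper names are hypothetical, and which component to bump remains a judgment call for the change at hand.

```bash
# Print the version with its last dot-separated numeric component incremented.
# "2.8" becomes "2.9", and "2.9" becomes "2.10".
bump_version() {
  echo "$1" | awk -F. -v OFS=. '{ $NF = $NF + 1; print }'
}

# Rewrite a VERSION file in place with the bumped version and print it.
# Assumes the file contains only a single dotted numeric version.
bump_version_file() {
  file="$1"
  new="$(bump_version "$(cat "$file")")"
  echo "$new" > "$file"
  echo "$new"
}
```

For example, `bump_version_file test/images/agnhost/VERSION` would update the file and print the new version, which can then be mirrored in `agnhost.go` by hand.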

The typical image used in E2E testing is the `agnhost` image. It contains several subcommands with
different [functionalities](agnhost/README.md), used to validate different Kubernetes behaviours. If
a new functionality needs testing, consider adding an `agnhost` subcommand for it first, before
creating an entirely separate test image.

Some test images (`agnhost`) are used as bases for other images (`kitten`, `nautilus`). If the parent
image's `VERSION` has been bumped, also bump the version in the children's `BASEIMAGE` files in order
for base image changes to be reflected in the child images as well.
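
A stale `BASEIMAGE` file is easy to miss, so a quick check can help. The sketch below is only an illustration: the helper name is hypothetical, and it assumes that each relevant line of a `BASEIMAGE` file ends with a reference of the form `<parent>:<version>`, which may not match the actual file layout.

```bash
# Return 0 if at least one line of the given file ends with "<parent>:<version>".
# The "<parent>:<version>" line ending is an assumed format, for illustration.
baseimage_uses_version() {
  parent="$1"
  version="$2"
  file="$3"
  awk -v ref="${parent}:${version}" '
    { n = length($0); m = length(ref) }
    n >= m && substr($0, n - m + 1) == ref { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$file"
}

# Possible usage from the repository root:
#   version="$(cat test/images/agnhost/VERSION)"
#   for child in kitten nautilus; do
#     baseimage_uses_version agnhost "$version" "test/images/$child/BASEIMAGE" \
#       || echo "$child: BASEIMAGE does not reference agnhost:$version"
#   done
```

Matching on the end of the line, rather than anywhere in it, avoids treating `agnhost:2.1` as present when the file actually references `agnhost:2.10`.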

Keep in mind that the Kubernetes CI will not run with the image changes you've made. It is a good idea
to build the image and push it to your own registry first, and run some tests that are using that image.
For these steps, see the sections below.

After the desired changes have been made, the affected images will have to be built and published,
and then tested. After the pull request with those changes has been approved, the new images will be
built and published to the `gcr.io/kubernetes-e2e-test-images` registry as well.

Currently, the image building process has been automated with the Image Promoter, but *only* for the
Conformance images (`agnhost`, `jessie-dnsutils`, `kitten`, `nautilus`, `nonewprivs`, `resource-consumer`,
`sample-apiserver`).  After the pull request merges, a postsubmit job will be started with the new changes,
which can be tracked [here](https://testgrid.k8s.io/sig-testing-images#post-kubernetes-push-images).
After it passes successfully, the new image will be available at `gcr.io/k8s-staging-e2e-test-images/${IMAGE_NAME}:${VERSION}`
in the staging registry, from which it will have to be promoted by adding a line for it
[here](https://github.com/kubernetes/k8s.io/blob/master/k8s.gcr.io/images/k8s-staging-e2e-test-images/images.yaml).
For this, you will need the image manifest list's digest, which can be obtained by running:

```bash
manifest-tool inspect --raw gcr.io/k8s-staging-e2e-test-images/${IMAGE_NAME}:${VERSION} | jq '.[0].Digest'
```


## Building images

The images are built through `make`. Since some images (`agnhost`) are used as a base for other images,
it is recommended to build them first, if needed.

An image can be built by simply running the command:

```bash
make all WHAT=agnhost
```

To build AND push an image, the following command can be used:

```bash
make all-push WHAT=agnhost
```

By default, the images will be tagged and pushed under the `gcr.io/kubernetes-e2e-test-images`
registry. That can be changed by running this command instead:

```bash
REGISTRY=foo_registry make all-push WHAT=agnhost
```
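
When several related images need to be pushed, the order matters because base images should be built first. The following sketch (the helper name `print_push_commands` is hypothetical) only prints the commands in the order given, so they can be reviewed before anything is built.

```bash
# Print, one per line, the make commands that would build and push the given
# images to a registry, in the order given. List base images first.
print_push_commands() {
  registry="$1"
  shift
  for image in "$@"; do
    echo "REGISTRY=${registry} make all-push WHAT=${image}"
  done
}

# Example: agnhost is the base for kitten and nautilus, so it comes first.
#   print_push_commands foo_registry agnhost kitten nautilus
```

Once the list looks right, the output could be piped to `sh -e` to run the commands and stop at the first failure.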

In order to also include Windows Container images into the final manifest lists, the `REMOTE_DOCKER_URL` argument
in the form `tcp://[host]:[port][path]` (for more details, see [here](https://docs.docker.com/engine/reference/commandline/dockerd/#daemon-socket-option))
will also have to be specified:

```bash
REMOTE_DOCKER_URL_1909=remote_docker_url_1909 REMOTE_DOCKER_URL_1903=remote_docker_url_1903 REMOTE_DOCKER_URL_1809=remote_docker_url_1809 REGISTRY=foo_registry make all-push WHAT=test-webserver
```
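
A malformed URL tends to surface only once the build reaches the Windows step, so it can be worth validating the values up front. This is a minimal sketch with a hypothetical helper name; it requires an explicit host and numeric port, which is stricter than the general form, and it does not handle bracketed IPv6 addresses.

```bash
# Return 0 if the argument looks like tcp://host:port with a numeric port.
# An optional path after the port is accepted.
# Bracketed IPv6 hosts are not handled by this simple pattern.
is_tcp_docker_url() {
  printf '%s\n' "$1" | grep -Eq '^tcp://[^/:]+:[0-9]+(/.*)?$'
}
```

For example, `is_tcp_docker_url "$REMOTE_DOCKER_URL_1809" || echo "unexpected URL format"` flags a value that is missing the `tcp://` scheme or the port.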

*NOTE* (for test `gcr.io` image publishers): Some tests (e.g.: `should serve a basic image on each replica with a private image`)
require the `agnhost` image to be published in an authenticated repo as well:

```bash
REGISTRY=gcr.io/kubernetes-e2e-test-images make all-push WHAT=agnhost
REGISTRY=gcr.io/k8s-authenticated-test make all-push WHAT=agnhost
```


## Testing the new image

Once the image has been built and pushed to an accessible registry, you can run the tests using that image
by setting the environment variable `KUBE_TEST_REPO_LIST` before running the tests that use the
image:

```bash
export KUBE_TEST_REPO_LIST=/path/to/repo_list.yaml
```

`repo_list.yaml` is a configuration file used by the E2E tests, in which you can set alternative registries
to pull the images from. Sample file:

```yaml
dockerLibraryRegistry: your-awesome-registry
e2eRegistry: your-awesome-registry
gcRegistry: your-awesome-registry
sampleRegistry: your-awesome-registry
```
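
When every key should point at the same registry, the file can be generated rather than edited by hand. The sketch below uses a hypothetical helper, `write_repo_list`, and writes only the four keys shown in the sample above.

```bash
# Write a repo list file that points each registry key from the sample
# at a single registry.
write_repo_list() {
  registry="$1"
  out="$2"
  cat > "$out" <<EOF
dockerLibraryRegistry: ${registry}
e2eRegistry: ${registry}
gcRegistry: ${registry}
sampleRegistry: ${registry}
EOF
}
```

For example, `write_repo_list your-awesome-registry /tmp/repo_list.yaml` followed by `export KUBE_TEST_REPO_LIST=/tmp/repo_list.yaml` prepares the environment for a test run.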

Keep in mind that some tests are using multiple images, so it is a good idea to also build and push those images.

Finally, make sure to bump the image version used in E2E testing by modifying the file `test/utils/image/manifest.go`, and recompile afterwards:

```bash
./build/run.sh make WHAT=test/e2e/e2e.test
```

After all the above has been done, run the desired tests.


## Known issues and workarounds

`docker manifest create` fails due to permission denied on `/etc/docker/certs.d/gcr.io` (https://github.com/docker/for-linux/issues/396). This issue can be resolved by running:

```bash
sudo chmod o+x /etc/docker
```

`nc` is being used by some E2E tests, which is why we are including a Linux-like `nc.exe` into the Windows `busybox` image. The image could fail to build during that step with an error that looks like this:

```console
re-exec error: exit status 1: output: time="..." level=error msg="hcsshim::ImportLayer failed in Win32: The system cannot find the path specified. (0x3) path=\\\\?\\C:\\ProgramData\\...
```

The issue is caused by Windows Defender, which removes the `nc.exe` binary from the filesystem. For more details on this issue, see [here](https://github.com/diegocr/netcat/issues/6). To work around this, you can run the following PowerShell command to temporarily disable Windows Defender real-time monitoring:

```powershell
Set-MpPreference -DisableRealtimeMonitoring $true
```