A nil CRIImplementation field can cause a nil pointer dereference and
panic during startup recovery.
Prior to this change, the nri.API struct would have a nil cri
(CRIImplementation) field after nri.NewAPI until nri.Register was
called. Register is called mid-way through initialization of the CRI
plugin, but recovery for containers occurs prior to that. Container
recovery includes establishing new exit monitors for existing containers
that were discovered. When a container exits, NRI plugins are given the
opportunity to be notified about the lifecycle event, and this is done
by accessing that CRIImplementation field inside the nri.API. If a
container exits prior to nri.Register being called, access to the
CRIImplementation field can cause a panic.
Here's the call-path:
* The CRI plugin starts running
[here](ae71819c4f/pkg/cri/server/service.go (L222))
* It then [calls into](ae71819c4f/pkg/cri/server/service.go (L227))
`recover()` to recover state from previous runs of containerd
* `recover()` then attempts to recover all containers through
[`loadContainer()`](ae7d74b9e2/internal/cri/server/restart.go (L175))
* When `loadContainer()` finds a container that is still running, it waits
for the task (internal containerd object) to exit and sets up
[exit monitoring](ae7d74b9e2/internal/cri/server/restart.go (L391))
* Any exit that then happens must be
[handled](ae7d74b9e2/internal/cri/server/events.go (L145))
* Handling an exit includes
[deleting the Task](ae7d74b9e2/internal/cri/server/events.go (L188))
and specifying [`nri.WithContainerExit`](ae7d74b9e2/internal/cri/nri/nri_api_linux.go (L348))
to [notify](ae7d74b9e2/internal/cri/nri/nri_api_linux.go (L356))
any subscribed NRI plugins
* NRI plugins need to know information about the pod (not just the sandbox),
so before a plugin is notified the NRI API package
[queries the Sandbox Store](ae7d74b9e2/internal/cri/nri/nri_api_linux.go (L232))
through the CRI implementation
* The `cri` implementation member field in the `nri.API` struct is set as part of the
[`Register()`](ae7d74b9e2/internal/cri/nri/nri_api_linux.go (L66)) method
* The `nri.Register()` method is only called
[much further down in the CRI `Run()` method](ae71819c4f/pkg/cri/server/service.go (L279))
Signed-off-by: Samuel Karp <samuelkarp@google.com>
NRI is still newer and mostly used by CRI plugin. Keep the package in
internal to allow for interfaces as the project matures.
Signed-off-by: Derek McGowan <derek@mcg.dev>
so that we cri service don't have to get sandbox controller everytime it
needs to call sandbox controller api.
Signed-off-by: Abel Feng <fshb1988@gmail.com>