integration: add ShouldRetryShutdown case based on #7496

Since moby/moby can't handle duplicate exit events well, it's hard for containerd to retry shutdown when there is an error such as context canceled. To prevent a regression like #4769, this adds a skipped integration case as a TODO item; we should rethink how to handle the task/shim lifecycle.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-08-11 08:06:15 +00:00
parent 8dcb2a6e6d
commit 601699a184
2 changed files with 76 additions and 1 deletions
--- a/runtime/v2/shim.go
+++ b/runtime/v2/shim.go
@@ -458,6 +458,12 @@ func (s *shimTask) delete(ctx context.Context, sandboxed bool, removeTask func(c
 	// If not, the shim has been delivered the exit and delete events.
 	// So we should remove the record and prevent duplicate events from
 	// ttrpc-callback-on-close.
+	//
+	// TODO: It's hard to guarantee that the event is unique and sent only
+	// once. The moby/moby should not rely on that assumption that there is
+	// only one exit event. The moby/moby should handle the duplicate events.
+	//
+	// REF: https://github.com/containerd/containerd/issues/4769
 	if shimErr == nil {
 		removeTask(ctx, s.ID())
 	}
@@ -466,7 +472,11 @@ func (s *shimTask) delete(ctx context.Context, sandboxed bool, removeTask func(c
 	// Let controller decide when to shutdown.
 	if !sandboxed {
 		if err := s.waitShutdown(ctx); err != nil {
-			log.G(ctx).WithField("id", s.ID()).WithError(err).Error("failed to shutdown shim task")
+			// FIXME(fuweid):
+			//
+			// If the error is context canceled, should we use context.TODO()
+			// to wait for it?
+			log.G(ctx).WithField("id", s.ID()).WithError(err).Error("failed to shutdown shim task and the shim might be leaked")
 		}
 	}