Commit Graph

51 Commits

Author SHA1 Message Date
Sebastiaan van Stijn
4785c70883
switch to github.com/containerd/log for logs
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-06-19 23:19:35 +02:00
Kevin Parsons
1b4f6f8edb client: Fix deadlock when writing to pipe blocks
Use sendLock to guard the entire stream allocation + write to wire
operation, and streamLock to only guard access to the underlying stream
map. This ensures the following:
- We uphold the constraint that new stream IDs on the wire are always
  increasing, because whoever holds sendLock will be ensured to get the
  next stream ID and be the next to write to the wire.
- Locks are always released in LIFO order. This prevents deadlocks.

Taking sendLock before releasing streamLock means that if a goroutine
blocks writing to the pipe, it can make another goroutine get stuck
trying to take sendLock, and therefore streamLock will be kept locked as
well. This can lead to the receiver goroutine no longer being able to
read responses from the pipe, since it needs to take streamLock when
processing a response. This ultimately leads to a complete deadlock of
the client.

It is reasonable for a server to block writes to the pipe if the client
is not reading responses fast enough. So we can't expect writes to never
block.
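
A minimal sketch of the locking shape described above, not the actual ttrpc client code. The `client` type, its fields other than `sendLock` and `streamLock`, and the `send`, `deliver`, and `demoSend` helpers are hypothetical. It shows `sendLock` covering ID allocation plus the write, while `streamLock` is held only around the map access and is released before the write that may block.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// client is a hypothetical stand-in; only the locking shape matters here.
type client struct {
	sendLock   sync.Mutex // held for the whole "allocate ID + write to wire" operation
	streamLock sync.Mutex // held only while touching the streams map
	nextID     uint32
	streams    map[uint32]chan []byte
	w          io.Writer
}

func newClient(w io.Writer) *client {
	return &client{nextID: 1, streams: make(map[uint32]chan []byte), w: w}
}

// send allocates the next stream ID and writes under sendLock, so IDs reach
// the wire in increasing order. streamLock is released before the write.
func (c *client) send(payload []byte) (uint32, error) {
	c.sendLock.Lock()
	defer c.sendLock.Unlock()

	c.streamLock.Lock()
	id := c.nextID
	c.nextID += 2
	c.streams[id] = make(chan []byte, 1)
	c.streamLock.Unlock()

	_, err := c.w.Write(payload) // may block; the receiver can still take streamLock
	return id, err
}

// deliver is what a receiver goroutine would call; it needs only streamLock.
func (c *client) deliver(id uint32, msg []byte) bool {
	c.streamLock.Lock()
	ch, ok := c.streams[id]
	delete(c.streams, id)
	c.streamLock.Unlock()
	if ok {
		ch <- msg // buffered, so this does not block
	}
	return ok
}

// demoSend sends twice and then delivers to the first stream twice.
func demoSend() (uint32, uint32, bool, bool) {
	c := newClient(io.Discard)
	a, _ := c.send([]byte("one"))
	b, _ := c.send([]byte("two"))
	return a, b, c.deliver(a, nil), c.deliver(a, nil)
}

func main() {
	fmt.Println(demoSend()) // prints "1 3 true false"
}
```

Because a blocked writer holds only `sendLock`, other senders queue behind it while the receiver keeps draining responses.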

I have repro'd the hang with a simple ttrpc client and server. The
client spins up 100 goroutines that spam the server with requests
constantly. After a few seconds of running I can see it hang. I have set
the buffer size for the pipe to 0 to more easily repro, but it would
still be possible to hit with a larger buffer size (just may take a
higher volume of requests or larger payloads).

I also validated that I no longer see the hang with this fix, by leaving
the test client/server running for a few minutes. Obviously not 100%
conclusive, but before I could get a hang within several seconds of
running.

Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
2024-05-13 14:01:59 -07:00
Fu Wei
9c0db2b1c3
Merge pull request #152 from klihub/devel/unary-interceptor-chaining
Implement support for unary interceptor chaining.
2023-09-06 09:39:17 +08:00
Krisztian Litkey
f984c9b178 client: implement UnaryClientInterceptor chaining.
Add a WithChainUnaryClientInterceptor client option to allow
using more than one client call interceptor; the interceptors are
chained and invoked in the order given.

This should allow us to implement opentelemetry instrumentation
as interceptors while allowing users to keep intercepting their
client calls for other reasons at the same time.
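
A minimal sketch of how such chaining can work, assuming simplified interceptor types. `Invoker`, `Interceptor`, `chain`, `tag`, and `runChain` are hypothetical names for illustration and do not mirror ttrpc's real signatures. The first interceptor given ends up outermost, so interceptors run in the order given.

```go
package main

import (
	"context"
	"fmt"
)

// Invoker and Interceptor are simplified, hypothetical types.
type Invoker func(ctx context.Context, req string) (string, error)
type Interceptor func(ctx context.Context, req string, next Invoker) (string, error)

// chain folds several interceptors into one. It wraps from the last to the
// first, so the first interceptor in the list is invoked first.
func chain(interceptors ...Interceptor) Interceptor {
	return func(ctx context.Context, req string, final Invoker) (string, error) {
		next := final
		for i := len(interceptors) - 1; i >= 0; i-- {
			ic, inner := interceptors[i], next
			next = func(ctx context.Context, req string) (string, error) {
				return ic(ctx, req, inner)
			}
		}
		return next(ctx, req)
	}
}

// tag returns an interceptor that appends its name to the request before
// passing it on, which makes the invocation order visible.
func tag(name string) Interceptor {
	return func(ctx context.Context, req string, next Invoker) (string, error) {
		return next(ctx, req+">"+name)
	}
}

// runChain chains two interceptors in front of an echoing invoker.
func runChain() string {
	echo := func(ctx context.Context, req string) (string, error) { return req, nil }
	out, _ := chain(tag("a"), tag("b"))(context.Background(), "req", echo)
	return out
}

func main() {
	fmt.Println(runChain()) // prints "req>a>b"
}
```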

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
2023-08-25 15:57:52 +03:00
Krisztian Litkey
8ca4110ebc Fix comment for UserOnCloseWait.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
2023-07-28 20:02:17 +03:00
Iceber Gu
c51165f20d First process the pending messages in recv channel
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2023-05-09 11:40:38 +08:00
Derek McGowan
471297eed9 Add recvClose channel to stream
Prevent panic from closing recv channel, which may be written to after
close. Use a separate channel to signal recv has closed and check that
channel on read and write.
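
A minimal sketch of the pattern, assuming a simplified `stream` type. Apart from the `recv` and `recvClose` names taken from the message above, every identifier here is hypothetical. The data channel is never closed; a separate channel signals closure, and both the writer and the reader select on it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errStreamClosed is a hypothetical error for this sketch.
var errStreamClosed = errors.New("stream closed")

type stream struct {
	recv      chan string   // data channel; never closed, so late writes cannot panic
	recvClose chan struct{} // closed exactly once to signal that recv is done
	closeOnce sync.Once
}

func newStream() *stream {
	return &stream{recv: make(chan string, 1), recvClose: make(chan struct{})}
}

func (s *stream) closeRecv() {
	s.closeOnce.Do(func() { close(s.recvClose) })
}

// deliver is the write side. It refuses to write once recvClose is closed.
func (s *stream) deliver(msg string) error {
	select {
	case <-s.recvClose:
		return errStreamClosed
	default:
	}
	select {
	case s.recv <- msg:
		return nil
	case <-s.recvClose:
		return errStreamClosed
	}
}

// receive is the read side. It returns an error once the stream is closed.
func (s *stream) receive() (string, error) {
	select {
	case msg := <-s.recv:
		return msg, nil
	case <-s.recvClose:
		return "", errStreamClosed
	}
}

// demoStream delivers one message, closes, then tries both sides again.
func demoStream() (string, bool, bool) {
	s := newStream()
	_ = s.deliver("hello")
	msg, _ := s.receive()
	s.closeRecv()
	writeErr := s.deliver("late")
	_, readErr := s.receive()
	return msg, errors.Is(writeErr, errStreamClosed), errors.Is(readErr, errStreamClosed)
}

func main() {
	fmt.Println(demoStream()) // prints "hello true true"
}
```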

Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-05-08 12:26:34 -07:00
Vincent Batts
6eee73df5d
*.go: organize errors to one spot
And add a little documentation.

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2022-04-19 14:40:16 -04:00
Derek McGowan
80efa545d4 Unwrap syscall error and check
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-04-07 17:11:40 -07:00
Derek McGowan
d28bc92657 Introduce streaming to client and server
Implementation of the 1.2 protocol with support for streaming. Provides
the client and server interfaces for implementing services with
streaming.

Unary behavior is mostly unchanged and avoids extra stream tracking just
for unary calls. Streaming calls are tracked to route data to the
appropriate stream as it is received.

Stricter stream ID handling, disallowing unexpected re-use of stream
IDs.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-04-07 17:11:40 -07:00
Kazuyoshi Kato
d240c5005f Use google.golang.org/protobuf instead of github.com/gogo/protobuf
This change replaces github.com/gogo/protobuf with
google.golang.org/protobuf, except for the code generators.

All proto-encoded structs are now generated from .proto files,
which include ttrpc.Request and ttrpc.Response.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-02-16 23:11:27 +00:00
Derek McGowan
f7a2e09ef8 Fix lint issues
Cleanup server test

Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-01-21 12:04:30 -08:00
Kevin Parsons
4f0aeb590b client: Handle sending/receiving in separate goroutines
Changes the TTRPC client logic so that sending and receiving with the
server are in completely independent goroutines, with shared state
guarded by a mutex. Previously, sending/receiving were tied together by
reliance on a coordinator goroutine. This led to issues where if the
server was not reading from the connection, the client could get stuck
sending a request, causing the client to not read responses from the
server. See [1] for more details.

The new design sets up separate sending/receiving goroutines. These
share state in the form of the set of active calls that have been made
to the server. This state is encapsulated in the callMap type and access
is guarded by a mutex.

The main event loop in `run` previously handled a lot of state
management for the client. Now that most state is tracked by the
callMap, it mostly exists to notice when the client is closed and take
appropriate action to clean up.

Also did some minor code cleanup. For instance, the code was previously
written to support multiple receiver goroutines, though this was not
actually used. I've removed this for now, since the code is simpler this
way, and it's easy to add back if we actually need it in the future.

[1] https://github.com/containerd/ttrpc/issues/72

Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
2021-10-13 17:31:34 -07:00
zounengren
81faa3ee80 replace pkg/errors from vendor
Signed-off-by: Zou Nengren <zouyee1989@gmail.com>
2021-09-25 22:46:36 +08:00
Wei Fu
225de2c936 client: add UserOnCloseWait function
ttrpc provides the WithOnClose option, and ttrpc calls the
callback function when the connection is closed unexpectedly or the ttrpc
client's Close() method is called. The containerd runtime plugin uses it
to clean up the resources created by the containerd shim.

But the ttrpc client's Close() only triggers the callback; the shim's
resource cleanup callback runs asynchronously, which might leak some
resources. Here is an example from containerd-runtime-v2 for
runc:

```happy
[Task.Delete goroutine]         [cleanupCallback goroutine]

call ttrpc client.Close() -->

                                  read bundle and call runc delete

delete bundle
```

If the cleanupCallback is called after the bundle is deleted, the callback
will fail to call runc delete. If any processes are still running, their
resources are leaked.

```unhappy
[Task.Delete goroutine]         [cleanupCallback goroutine]

call ttrpc client.Close() -->

delete bundle

                                  failed to read bundle and call runc delete
```

To avoid this, introduce UserOnCloseWait, which makes sure that
the cleanupCallback has finished before the caller continues, like:

```
[Task.Delete goroutine]         [cleanupCallback goroutine]

call ttrpc client.Close() -->

wait for callback

                               read bundle and call runc delete

                          <--  finish sync

delete bundle
```
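
A minimal sketch of the synchronization in the diagram above, assuming a simplified client. The `client` type, its fields, and the `waitOnClose` and `demoClose` helpers are hypothetical and only model the idea behind UserOnCloseWait: Close triggers the callback, and the waiter blocks until the callback has returned.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"sync"
)

// client is a hypothetical stand-in for the ttrpc client.
type client struct {
	onClose   func()
	closed    chan struct{} // closed by Close to trigger cleanup
	userDone  chan struct{} // closed after onClose has returned
	closeOnce sync.Once
}

func newClient(onClose func()) *client {
	c := &client{onClose: onClose, closed: make(chan struct{}), userDone: make(chan struct{})}
	go c.run()
	return c
}

// run waits for Close and then invokes the user callback asynchronously.
func (c *client) run() {
	<-c.closed
	if c.onClose != nil {
		c.onClose()
	}
	close(c.userDone)
}

// Close only triggers the shutdown; it does not wait for the callback.
func (c *client) Close() {
	c.closeOnce.Do(func() { close(c.closed) })
}

// waitOnClose blocks until the callback has finished or ctx is done.
func (c *client) waitOnClose(ctx context.Context) error {
	select {
	case <-c.userDone:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demoClose records the order of the two cleanup steps.
func demoClose() string {
	var events []string
	c := newClient(func() { events = append(events, "runc delete") })
	c.Close()
	if err := c.waitOnClose(context.Background()); err != nil {
		return err.Error()
	}
	events = append(events, "delete bundle")
	return strings.Join(events, ", ")
}

func main() {
	fmt.Println(demoClose()) // prints "runc delete, delete bundle"
}
```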

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-09-07 23:09:55 +08:00
blade
df116954de fix bug, failed to assert net error due to error wrap
Signed-off-by: blade <blade.shen@ucloud.cn>
2020-08-07 00:57:23 +08:00
Michael Crosby
d4834b09f5 Revert "Copy codes and status from grpc project"
This reverts commit f02233564f.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-10-28 14:46:51 -04:00
Michael Crosby
f02233564f Copy codes and status from grpc project
This copies the codes and status packages from grpc as they are the only references
to the grpc project from ttrpc. This will help ensure that API breaking changes
in grpc do not affect ttrpc.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-10-28 12:33:50 -04:00
Wei Fu
6e416eafd2 return ErrClosed if read: connection reset by peer
When server.Close() is called, the server closes all listeners and notifies
in-flight connections to shut down. Connections are closed asynchronously.
In TestClientEOF, the client can send a request on a closing connection, but
the read for the reply returns an error once that connection is
shut down.

In this case, the client side should treat the `read:
connection reset by peer` error as ErrClosed.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2019-10-21 19:18:01 +08:00
Michael Crosby
0e0f228740 Handle ok status
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-08-28 11:45:14 -04:00
Sebastiaan van Stijn
17f4d32234
Client.Call(): do not return error if no Status is set (gRPC v1.23 and up)
To account for 5da5b1f225,
which is part of gRPC v1.23.0 and up, and after which gRPC no longer sets a
Status if no error occurred.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-08-26 19:18:48 +02:00
Phil Estes
1fb3814edf
Merge pull request #42 from crosbymichael/client
Refactor close handling for ttrpc clients
2019-06-13 14:33:16 -04:00
Michael Crosby
694de9d955 metadata as KeyValue type
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-13 18:06:27 +00:00
Michael Crosby
3afb82bd27 Fix error handling with server shutdown
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-13 17:19:47 +00:00
Michael Crosby
f3eb35b158 Refactor close handling for ttrpc clients
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-13 16:52:46 +00:00
Michael Crosby
de8faac08b Add godocs for interceptors
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-12 20:26:36 +00:00
Michael Crosby
819653f40c Add client and server unary interceptors
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-07 15:49:42 +00:00
Maksym Pavlenko
04523b9d2c Rename headers to metadata
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-05-24 17:02:38 -07:00
Maksym Pavlenko
5926a92b70 Support headers
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-05-23 13:34:04 -07:00
Lantao Liu
ba15956d22 Make onclose an option.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-04-11 10:57:14 -07:00
Phil Estes
6914432707
Merge pull request #33 from JoeWrightss/patch-1
Fix returns error message
2019-02-10 20:22:30 -08:00
zhoulin xie
ce5c1c4546 Fix returns error message
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-08 22:30:00 +08:00
Brian Goff
a364f44e55 Add support for request timeout propagation.
Adds a new field to the `Request` type which specifies a timeout (in
nanoseconds) for the request. This is propagated on method dispatch as a
context timeout.

There was some discussion here on supporting a broader "metadata" field
(similar to grpc) that can be used for other things, but we ended up
with a dedicated field because it is lighter weight and we expect it to be
used pretty heavily as is. Metadata may be added in the future, but
is not necessary for timeouts.

Also discussed using a deadline vs a timeout in the request and decided
to go with a timeout in order to deal with potential clock skew between
the client and server. This also has the side-effect of eliminating the
protocol/wire overhead from the request timeout.
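
A minimal sketch of the timeout-versus-deadline choice, assuming a simplified request. The `request` type, its `TimeoutNano` field name, and the helper functions are hypothetical and are not ttrpc's actual definitions. The client sends a relative duration, and the server rebuilds a context timeout against its own clock, so clock skew between the two does not matter.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// request is a hypothetical stand-in for the wire request.
type request struct {
	TimeoutNano int64 // relative timeout in nanoseconds; 0 means none
}

// clientTimeout converts the caller's deadline, if any, into a relative
// duration measured from now on the client's clock.
func clientTimeout(ctx context.Context, now time.Time) int64 {
	if dl, ok := ctx.Deadline(); ok {
		return dl.Sub(now).Nanoseconds()
	}
	return 0
}

// serverContext applies the relative timeout on the server's own clock.
func serverContext(parent context.Context, req request) (context.Context, context.CancelFunc) {
	if req.TimeoutNano > 0 {
		return context.WithTimeout(parent, time.Duration(req.TimeoutNano))
	}
	return context.WithCancel(parent)
}

// demoTimeout propagates a two second deadline from client to server.
func demoTimeout() (int64, bool) {
	now := time.Now()
	clientCtx, cancelClient := context.WithDeadline(context.Background(), now.Add(2*time.Second))
	defer cancelClient()

	req := request{TimeoutNano: clientTimeout(clientCtx, now)}

	serverCtx, cancelServer := serverContext(context.Background(), req)
	defer cancelServer()
	_, hasDeadline := serverCtx.Deadline()
	return req.TimeoutNano, hasDeadline
}

func main() {
	fmt.Println(demoTimeout()) // prints "2000000000 true"
}
```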

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2019-01-07 12:43:52 -08:00
Michael Crosby
d77f111e2e Add client side context.Done support
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-08-28 10:54:57 -04:00
Michael Crosby
0690b20898 Add apache license to files
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-06-27 17:49:06 -04:00
Arnaud Rebillout
87ac4c6f7a Log with sirupsen/logrus to avoid a circular dependency to containerd #6
Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
2018-02-22 13:28:57 +07:00
Michael Crosby
042635eccb Add onclose func
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-02-05 16:51:41 -05:00
Stephen Day
4d1bf6563c
Merge pull request #20 from stevvooe/pump-read-block
ttrpc: refactor channel to take a conn
2018-01-16 15:50:42 -08:00
Stephen J Day
c575201d9a
ttrpc: refactor channel to take a conn
Signed-off-by: Stephen J Day <stephen.day@docker.com>
2018-01-16 15:29:07 -08:00
Stephen J Day
2c96d0a152
ttrpc: return correct error on (*Client).Close
Because `shutdownErr` will likely be `nil` in the close select branch,
returning it to waiters will result in the waiting `(*Client).Call`
returning `(nil, nil)`. This should take whatever is set for the client
as the exit condition, which is likely to be `ErrClosed`.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2018-01-12 16:21:27 -08:00
Stephen J Day
e963fd5a12
ttrpc: return ErrClosed when client is shutdown
To gracefully handle scenarios where the connection is closed or the
client is closed, we now set the final error to be `ErrClosed`. Callers
can resolve it through using `errors.Cause` to detect this condition.
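
A minimal sketch of how a caller might detect this condition. The message above refers to `errors.Cause` from pkg/errors; this sketch swaps in the standard library's `errors.Is` with `%w` wrapping instead, and `errClosed`, `call`, and `isClosed` are hypothetical local stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// errClosed is a local stand-in for the client's sentinel error.
var errClosed = errors.New("ttrpc: closed")

// call is a hypothetical call that fails with a wrapped sentinel when the
// client has been shut down.
func call(closed bool) error {
	if closed {
		return fmt.Errorf("call failed: %w", errClosed)
	}
	return nil
}

// isClosed reports whether err is, or wraps, the closed sentinel.
func isClosed(err error) bool {
	return errors.Is(err, errClosed)
}

func main() {
	fmt.Println(isClosed(call(true)), isClosed(call(false))) // prints "true false"
}
```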

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2018-01-09 14:46:02 -08:00
Stephen J Day
5859cd7b45
ttrpc: return buffers to pool
Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-11-29 21:32:38 -08:00
Stephen J Day
b774f8872e
ttrpc: refactor client to better handle EOF
The request and response requests opened up a nasty race condition where
waiters could find themselves either blocked or receiving errant errors.
The result was low performance and inadvertent busy waits. This
refactors the client to have a single request into the main client loop,
eliminating the race.

The reason for the original design was to allow a sender to control
request and response individually to make unit testing easier. The unit
test has now been refactored to use a channel to ensure that requests
are serviced on graceful shutdown.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-11-29 21:00:50 -08:00
Stephen J Day
2a1ad5f6c7
ttrpc: increase maximum message length
This change increases the maximum message size to 4MB to be in line
with the grpc default. The buffer management approach has been changed
to use a pool to minimize allocations and keep memory usage low.
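
A minimal sketch of pooled message buffers with a size cap, assuming one possible design. The 4MB limit comes from the message above, while `bufPool`, `withBuffer`, `errTooLarge`, and `demoPool` are hypothetical names, and the real buffer management may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

const maxMessageSize = 4 << 20 // 4MB

// errTooLarge is a hypothetical error for this sketch.
var errTooLarge = errors.New("message exceeds maximum size")

// bufPool hands out reusable buffers so each message does not allocate.
// Pointers to slices are stored to avoid an allocation on every Put.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, maxMessageSize)
		return &b
	},
}

// withBuffer borrows a buffer of length n, runs fn, and returns the buffer
// to the pool. Oversized messages are rejected up front.
func withBuffer(n int, fn func([]byte)) error {
	if n > maxMessageSize {
		return errTooLarge
	}
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)
	fn((*bp)[:n])
	return nil
}

// demoPool borrows a small buffer and then attempts an oversized one.
func demoPool() (int, bool) {
	seen := 0
	_ = withBuffer(5, func(b []byte) { seen = len(b) })
	err := withBuffer(maxMessageSize+1, func([]byte) {})
	return seen, errors.Is(err, errTooLarge)
}

func main() {
	fmt.Println(demoPool()) // prints "5 true"
}
```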

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-11-29 13:30:41 -08:00
Stephen J Day
b1feeec836
ttrpc: implement Close and Shutdown
This applies logic to correctly Close a server, as well as implements
graceful shutdown. This ensures that inflight requests are not
interrupted and works similarly to the functionality in `net/http`.

This required a fair bit of refactoring around how the connection is
managed. The connection now has an explicit wrapper object, ensuring
that shutdown happens in a coordinated fashion, whether or not a
forceful close or graceful shutdown is called.

In addition to the above, hardening around the accept loop has been
added. We now correctly exit on non-temporary errors and debounce the
accept call when encountering repeated errors. This should address some
issues where `SIGTERM` was not honored when dropping into the accept
spin.
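
A minimal sketch of a debounced accept loop. The doubling backoff from 5ms up to 1s is borrowed from the pattern `net/http` uses in its own accept loop and is an assumption here, not a description of the actual ttrpc code; `acceptLoop`, `nextBackoff`, and the sentinel errors are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical sentinels standing in for temporary and fatal accept errors.
var (
	errTemporary = errors.New("temporary accept error")
	errFatal     = errors.New("listener closed")
)

// nextBackoff doubles the delay, starting at 5ms and capping at 1s.
func nextBackoff(prev time.Duration) time.Duration {
	if prev == 0 {
		return 5 * time.Millisecond
	}
	next := prev * 2
	if next > time.Second {
		next = time.Second
	}
	return next
}

// acceptLoop retries temporary errors with a growing delay, resets the delay
// after a success, and exits on any non-temporary error.
func acceptLoop(accept func() error, sleep func(time.Duration)) error {
	var delay time.Duration
	for {
		err := accept()
		switch {
		case err == nil:
			delay = 0
		case errors.Is(err, errTemporary):
			delay = nextBackoff(delay)
			sleep(delay)
		default:
			return err
		}
	}
}

// demoAccept feeds a scripted sequence of results and records the sleeps.
func demoAccept() ([]int64, bool) {
	results := []error{errTemporary, errTemporary, nil, errTemporary, errFatal}
	i := 0
	accept := func() error { err := results[i]; i++; return err }
	var delays []int64
	sleep := func(d time.Duration) { delays = append(delays, d.Milliseconds()) }
	err := acceptLoop(accept, sleep)
	return delays, errors.Is(err, errFatal)
}

func main() {
	fmt.Println(demoAccept()) // prints "[5 10 5] true"
}
```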

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-11-29 11:03:51 -08:00
Stephen J Day
bdb2ab7a81
ttrpc: use odd numbers for client initiated streams
Following the convention of http2, we now use odd stream ids for client
initiated streams. This makes it easier to tell who initiates the
stream. We enforce the convention on the server-side.

This allows us to upgrade the protocol in the future to have server
initiated streams.
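
A minimal sketch of the convention, with hypothetical names (`idAllocator`, `clientInitiated`, `demoIDs`). The client allocates odd stream IDs, and the server side can reject anything even.

```go
package main

import "fmt"

// clientInitiated reports whether a stream ID follows the odd-number
// convention for client initiated streams.
func clientInitiated(id uint32) bool {
	return id%2 == 1
}

// idAllocator hands out odd stream IDs: 1, 3, 5, and so on.
type idAllocator struct {
	next uint32
}

func newIDAllocator() *idAllocator {
	return &idAllocator{next: 1}
}

func (a *idAllocator) allocate() uint32 {
	id := a.next
	a.next += 2
	return id
}

// demoIDs allocates three IDs and checks one even ID against the rule.
func demoIDs() ([]uint32, bool) {
	a := newIDAllocator()
	ids := []uint32{a.allocate(), a.allocate(), a.allocate()}
	return ids, clientInitiated(2)
}

func main() {
	fmt.Println(demoIDs()) // prints "[1 3 5] false"
}
```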

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-11-27 18:18:25 -08:00
Stephen J Day
07cd4de2f2
ttrpc: correctly propagate error from response
Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-11-22 12:06:18 -08:00
Stephen J Day
7f752bf263
ttrpc: handle concurrent requests and responses
With this changeset, ttrpc can now handle multiple outstanding requests
and responses on the same connection without blocking. On the
server-side, we dispatch a goroutine per outstanding request. On the
client side, a management goroutine dispatches responses to blocked
waiters.

The protocol has been changed to support this behavior by including a
"stream id" that can be used to identify which request a response belongs
to on the client-side of the connection. With these changes, we should
also be able to support streams in the future.
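
A minimal sketch of the client-side idea, assuming simplified types. `response`, `dispatcher`, and `demoDispatch` are hypothetical and this is not the original implementation. Each caller registers a waiter under its stream ID, and a single management goroutine routes each incoming response to the matching waiter, even when responses arrive out of order.

```go
package main

import (
	"fmt"
	"sync"
)

// response is a hypothetical message carrying the stream id it answers.
type response struct {
	streamID uint32
	payload  string
}

// dispatcher tracks one waiter channel per outstanding request.
type dispatcher struct {
	mu      sync.Mutex
	waiters map[uint32]chan response
}

// register records a waiter for the given stream id.
func (d *dispatcher) register(id uint32) <-chan response {
	ch := make(chan response, 1)
	d.mu.Lock()
	d.waiters[id] = ch
	d.mu.Unlock()
	return ch
}

// run is the management goroutine: it routes each response by stream id.
func (d *dispatcher) run(incoming <-chan response) {
	for r := range incoming {
		d.mu.Lock()
		ch, ok := d.waiters[r.streamID]
		delete(d.waiters, r.streamID)
		d.mu.Unlock()
		if ok {
			ch <- r // buffered, so the loop never blocks on a slow waiter
		}
	}
}

// demoDispatch sends two responses in reverse order of their requests.
func demoDispatch() (string, string) {
	d := &dispatcher{waiters: make(map[uint32]chan response)}
	first, second := d.register(1), d.register(3)

	incoming := make(chan response)
	go d.run(incoming)
	incoming <- response{streamID: 3, payload: "second"}
	incoming <- response{streamID: 1, payload: "first"}
	close(incoming)

	return (<-first).payload, (<-second).payload
}

func main() {
	fmt.Println(demoDispatch()) // prints "first second"
}
```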

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-11-21 21:38:38 -08:00
Stephen J Day
2a81659f49
ttrpc: remove use of typeurl
Rather than employ the typeurl package, we now generate code to
correctly allocate the incoming types from the caller. As a side-effect
of this activity, the services definitions have been split out into a
separate type that handles the full resolution and dispatch of the
method, including correctly mapping the RPC status.

This work is a precursor to a larger protocol change that will allow us
to handle multiple, concurrent requests.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-11-21 18:03:52 -08:00
Stephen J Day
f147d6ca77
ttrpc: rename project to ttrpc
Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-11-15 17:04:16 -08:00