Fix a race between an asynchronous server.Serve() invoked in a
goroutine and an almost immediate server.Shutdown().
If Shutdown() finishes its locked closing of listeners before
Serve() gets around to adding the new one, Serve() blocks
forever in l.Accept(), unless the caller also closes the
listener in addition to calling Shutdown().
This is probably almost impossible to trigger in real life,
but some of the unit tests, which run the server and client
in the same process, occasionally do trigger it. A test that
then tries to verify the final ErrServerClosed error from
Serve() after Shutdown() hangs forever.
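A minimal sketch of the guarded registration this implies; the
names (server, addListener, done, listeners) are assumptions for
illustration, not the actual ttrpc internals:

    package server

    import (
        "errors"
        "net"
        "sync"
    )

    var errServerClosed = errors.New("ttrpc: server closed")

    // server is a hypothetical reduction of the ttrpc server:
    // Shutdown() closes all registered listeners under mu, so
    // Serve() must register its listener under the same lock and
    // check done first.
    type server struct {
        mu        sync.Mutex
        listeners map[net.Listener]struct{}
        done      chan struct{} // closed by Shutdown()
    }

    func (s *server) addListener(l net.Listener) error {
        s.mu.Lock()
        defer s.mu.Unlock()
        select {
        case <-s.done:
            // Shutdown() already closed the listeners; registering
            // now would leave Serve() blocked in l.Accept() forever.
            return errServerClosed
        default:
        }
        s.listeners[l] = struct{}{}
        return nil
    }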
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Use a dedicated, grpc Status-compatible error to wrap the
unique grpc status code, the size of the rejected message,
and the maximum allowed size when the sending side rejects
a message due to size limitations.
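A sketch of the shape such an error can take; the names
(oversizedMessageErr and friends) are assumptions rather than the
exact ttrpc API. GRPCStatus() is what makes an error visible to
grpc's status.FromError():

    package errdefs

    import (
        "google.golang.org/grpc/codes"
        "google.golang.org/grpc/status"
    )

    // oversizedMessageErr wraps the status code plus both sizes so
    // the sender can inspect them.
    type oversizedMessageErr struct {
        messageLength int
        maximumLength int
        err           error
    }

    func (e *oversizedMessageErr) Error() string { return e.err.Error() }

    // GRPCStatus makes the error compatible with status.FromError().
    func (e *oversizedMessageErr) GRPCStatus() *status.Status {
        return status.Convert(e.err)
    }

    func oversizedMessage(length, max int) error {
        return &oversizedMessageErr{
            messageLength: length,
            maximumLength: max,
            err: status.Errorf(codes.ResourceExhausted,
                "message length %d exceeds maximum message size of %d",
                length, max),
        }
    }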
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Reject oversized messages on the sender side, keeping the
receiver-side rejection intact. This provides minimal
low-level plumbing for clients to attempt application-level
corrective actions on the requester side, if the high-level
protocol is designed with this in mind.
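For illustration, a corrective action on the requester side could
look roughly like this (hedged sketch; the service and method names
are placeholders):

    package client

    import (
        "context"

        "github.com/containerd/ttrpc"
        "google.golang.org/grpc/codes"
        "google.golang.org/grpc/status"
    )

    // callWithFallback sketches an application-level corrective
    // action: if the sender-side size check rejects the request
    // before it goes on the wire, retry with a smaller payload.
    func callWithFallback(ctx context.Context, c *ttrpc.Client,
        req, small, resp interface{}) error {
        err := c.Call(ctx, "test.Service", "Do", req, resp)
        if err != nil && status.Code(err) == codes.ResourceExhausted {
            // The request was rejected locally for size; fall back
            // to a reduced payload if the protocol supports it.
            return c.Call(ctx, "test.Service", "Do", small, resp)
        }
        return err
    }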
Co-authored-by: Alessio Cantillo <cantillo.trd@gmail.com>
Co-authored-by: Qian Zhang <cosmoer@qq.com>
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Use sendLock to guard the entire stream allocation + write to wire
operation, and streamLock to only guard access to the underlying stream
map. This ensures the following:
- We uphold the constraint that new stream IDs on the wire are always
  increasing, because whoever holds sendLock is guaranteed to get the
  next stream ID and to be the next to write to the wire.
- Locks are always released in LIFO order. This prevents deadlocks.
Taking sendLock before releasing streamLock means that if a goroutine
blocks writing to the pipe, it can cause another goroutine to get
stuck trying to take sendLock, which in turn keeps streamLock held.
The receiver goroutine is then no longer able to read responses from
the pipe, since it needs to take streamLock when processing a
response. This ultimately deadlocks the client completely.
It is reasonable for a server to block writes to the pipe if the client
is not reading responses fast enough. So we can't expect writes to never
block.
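A reduced sketch of the resulting locking discipline (the types are
trimmed down; the real client carries more state):

    package ttrpc

    import "sync"

    type stream struct{ id uint32 }

    type client struct {
        sendLock     sync.Mutex // guards stream allocation + write
        streamLock   sync.Mutex // guards only the stream map
        streams      map[uint32]*stream
        nextStreamID uint32
        send         func(id uint32, p []byte) error // wire write
    }

    func (c *client) createStream(p []byte) (*stream, error) {
        // sendLock spans ID allocation and the write, so stream IDs
        // hit the wire in strictly increasing order.
        c.sendLock.Lock()
        defer c.sendLock.Unlock()

        c.streamLock.Lock()
        s := &stream{id: c.nextStreamID}
        c.nextStreamID += 2
        c.streams[s.id] = s
        c.streamLock.Unlock() // released (LIFO) before the write

        // If send blocks here, only sendLock is held; the receiver
        // goroutine can still take streamLock to process responses.
        if err := c.send(s.id, p); err != nil {
            return nil, err
        }
        return s, nil
    }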
I reproduced the hang with a simple ttrpc client and server. The
client spins up 100 goroutines that constantly spam the server with
requests. After a few seconds of running, I can see it hang. I set
the buffer size for the pipe to 0 to reproduce more easily, but it
would still be possible to hit with a larger buffer size (it may
just take a higher volume of requests or larger payloads).
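The repro has roughly this shape (the service and method names are
placeholders):

    package repro

    import (
        "context"

        "github.com/containerd/ttrpc"
        "google.golang.org/protobuf/types/known/emptypb"
    )

    // spam launches 100 goroutines that issue calls in a tight loop
    // against a server running in the same process.
    func spam(ctx context.Context, client *ttrpc.Client) {
        for i := 0; i < 100; i++ {
            go func() {
                for ctx.Err() == nil {
                    req, resp := &emptypb.Empty{}, &emptypb.Empty{}
                    _ = client.Call(ctx, "test.Service", "Ping", req, resp)
                }
            }()
        }
    }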
I also validated that I no longer see the hang with this fix, by
leaving the test client/server running for a few minutes. Obviously
not 100% conclusive, but before the fix I could get a hang within
several seconds of running.
Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
Fixes error "is a proto3 file that contains optional fields, but code generator protoc-gen-go-ttrpc hasn't been updated to support optional fields in proto3. Please ask the owner of this code generator to support proto3 optional."
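The standard way for a protoc plugin to declare that support is to
set the proto3-optional feature bit in its response; the fix
presumably amounts to something like:

    package main

    import (
        "google.golang.org/protobuf/compiler/protogen"
        "google.golang.org/protobuf/types/pluginpb"
    )

    func main() {
        protogen.Options{}.Run(func(gen *protogen.Plugin) error {
            // Tell protoc this generator understands proto3 optional
            // fields; without this, protoc fails with the error above.
            gen.SupportedFeatures = uint64(pluginpb.CodeGeneratorResponse_FEATURE_PROTO3_OPTIONAL)
            // ... generate ttrpc service code for each input file ...
            return nil
        })
    }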
Signed-off-by: Derek McGowan <derek@mcg.dev>
Add a WithChainUnaryServerInterceptor server option to allow
using more than one server-side interceptor; the interceptors
get chained and invoked in the order given.
This should allow us to implement OpenTelemetry instrumentation
as interceptors while users keep intercepting their server-side
calls for other reasons at the same time.
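The chaining itself is the usual fold over the interceptors; a
simplified sketch with stand-in types (ttrpc's real signatures also
carry unmarshalling and method info):

    package chain

    import "context"

    // method and interceptor are simplified stand-ins for ttrpc's
    // Method and UnaryServerInterceptor types.
    type method func(ctx context.Context) (interface{}, error)
    type interceptor func(ctx context.Context, next method) (interface{}, error)

    // chain folds the interceptors right to left so that invoking
    // the result runs them left to right, i.e. in the order given.
    func chain(interceptors ...interceptor) interceptor {
        return func(ctx context.Context, next method) (interface{}, error) {
            for i := len(interceptors) - 1; i >= 0; i-- {
                ic, nxt := interceptors[i], next
                next = func(ctx context.Context) (interface{}, error) {
                    return ic(ctx, nxt)
                }
            }
            return next(ctx)
        }
    }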
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Add a WithChainUnaryClientInterceptor client option to allow
using more than one client call interceptor; the interceptors
get chained and invoked in the order given.
This should allow us to implement OpenTelemetry instrumentation
as interceptors while users keep intercepting their client
calls for other reasons at the same time.
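Hypothetical usage on the client side (the interceptor names are
placeholders):

    package client

    import (
        "net"

        "github.com/containerd/ttrpc"
    )

    func newClient(conn net.Conn,
        tracing, metrics ttrpc.UnaryClientInterceptor) *ttrpc.Client {
        // Calls flow through tracing first, then metrics, then onto
        // the wire, matching the order given here.
        return ttrpc.NewClient(conn,
            ttrpc.WithChainUnaryClientInterceptor(tracing, metrics),
        )
    }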
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
In the test for client Call() failing with ErrClosed on a closed
server, wait for the client's OnClose handler to be triggered to
make sure the closed socket has been fully processed on the
client's side. Otherwise a new Call() might fail with an error
other than ErrClosed, for instance ENOTCONN.
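A sketch of that synchronization, using the client's WithOnClose
option:

    package testutil

    import (
        "net"

        "github.com/containerd/ttrpc"
    )

    // newTestClient returns a client plus a channel that is closed
    // once the client has observed the connection going away; tests
    // should wait on it before asserting that Call() fails with
    // ErrClosed.
    func newTestClient(conn net.Conn) (*ttrpc.Client, <-chan struct{}) {
        closed := make(chan struct{})
        client := ttrpc.NewClient(conn, ttrpc.WithOnClose(func() {
            close(closed)
        }))
        return client, closed
    }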
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Increase the timeout for the build+run tests CI workflow
job from 5 to 10 minutes. It regularly times out during
GitHub's busiest hours, typically on either the windows
or macos workers.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Prevent a panic from closing the recv channel, which may be
written to after close. Use a separate channel to signal that
recv has closed, and check that channel on both read and write.
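A minimal sketch of the pattern (the names are illustrative, not
the exact ttrpc fields):

    package stream

    import "sync"

    // receiver never closes the data channel itself; instead it
    // closes a separate signal channel and selects on it on both
    // the send and receive sides.
    type receiver struct {
        recv     chan []byte
        recvDone chan struct{}
        once     sync.Once
    }

    func (r *receiver) close() {
        // Closing recvDone instead of recv avoids the
        // send-on-closed-channel panic from writers racing with close.
        r.once.Do(func() { close(r.recvDone) })
    }

    func (r *receiver) put(p []byte) bool {
        select {
        case r.recv <- p:
            return true
        case <-r.recvDone:
            return false // closed: drop instead of panicking
        }
    }

    func (r *receiver) get() ([]byte, bool) {
        select {
        case p := <-r.recv:
            return p, true
        case <-r.recvDone:
            return nil, false
        }
    }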
Signed-off-by: Derek McGowan <derek@mcg.dev>