Re: RSocket Followup (post TOC meeting)


Ben Hale <bhale@...>
 

A pretty common way that application flow control is achieved in TCP-based distributed systems is for a message consumer to simply stop reading messages off a TCP connection if it wants the message producer to stop sending messages from the other end. TCP does all the rest automatically, compliments of sliding windows, receive buffers etc. If senders and receivers use fairly basic thread pools, back pressure through the system “just works”, in my experience. It sounds like a fairly significant part of Rsocket is an additional protocol to achieve much the same back-pressure-based application flow control? Other than to support non-TCP transports (like UDP), which I would assume are fairly uncommon, why did you feel it necessary to add an additional layer of application flow control on top of what TCP already provides?
TCP flow control is at the byte level and RSocket is flow control at the message level. It's often easier to reason at the message level when you build applications because you don't have to guess how many bytes a single message is. RSocket arbitrages between transport level flow control, and Reactive Streams semantic.

When you have asynchronous applications that don't have thread pools, the back pressure provided by TCP is often broken because the number of bytes in a message doesn't necessarily correspond to how expensive it is to process. You can have a one megabyte message processed swiftly and a one kilobyte message that is expensive. Most people already deal with this application-level flow control problem already using circuit breakers, rate limiters, request throttling and bounded thread pools as you mention. These are all examples of application back pressure but the common limitation is that all cause an error on the client side when capacity is exceeded. Essentially, you can only find out if you need to back off by attempting first and then failing. RSocket eliminates this failing attempt because a client will only request a volume _it_ knows it can handle, and then the server will only send that. This prevents things like retry storms and extra processing while resulting in better tail latency, etc.

To illustrate this, imagine you have three services A streaming to B, B streaming to C. Lets also say that they are streaming data at 25% of the capacity of the TCP buffer. C slows down with garbage collection, or expensive to process items. With TCP each buffer has to fill up before A (the root generator) slows down and this only happens after messages have been sent. This leads to cascading failures in services: C is going to get messages it can't handle regardless of the TCP back pressure. The typical way to deal with is add circuit breakers, and rate limiting that create errors when they reach a certain capacity. With RSocket you don't need circuit breakers; instead C stops sending requestN requests to B, and B stops sending requestN's A. Everything slows down to the rate that C can handle, and this happens uniquely and individually for each end-client's performance.

Besides application safety, Reactive Stream requestN semantics make it easier for the developer to create applications that scroll over data, page over data, or take a finite amount of data from a stream. It's difficult to reason about how many bytes 8 'user profiles' are, but it's very easy to ask for 8.


-Ben

Join cncf-toc@lists.cncf.io to automatically receive all group messages.