Hi all,

Thanks Klemens for submitting this library. I've taken a different approach for the review this time: I've tried to build something useful with it. I was already writing a web chat application with a C++ backend. It was using Boost.Context coroutines (the ones you get with boost::asio::spawn). I have rewritten the server to use Boost.Async coroutines. This email is not a review (I'll write that later), but a summary of my experience using the library, in case anyone finds it useful.

You can see the full source code here: https://github.com/anarthal/servertech-chat/tree/async-rewrite/server

The application
===============

It is a super-simplistic chat application, where users can send messages that get broadcast to other users in the same chat room. The application uses an HTTP web server to serve static files, websockets for message broadcasting, and Redis for message persistence. It uses Boost.Asio, Boost.Beast, Boost.Redis and now Boost.Async (together with other foundational Boost libraries).

My experience
=============

I've been able to rewrite it and make it work. I ran into some problems (described below), but I was able to solve them. Control flow was mostly identical to stackful coroutines, except in some key places I describe below.

I've used the use_op token extensively to interoperate with Beast and Redis. As proposed, it causes problems with Beast's websockets, because it moves the websocket's internal implementation. As a result, after the first initiation that uses use_op, the websocket is rendered unusable. I could overcome this by slightly modifying use_op's implementation. This is tracked by https://github.com/klemens-morgenstern/async/issues/68. It's still under discussion whether this is a bug in Async or in Beast.

use_op internally uses a per-operation, stack-based memory pool of 2KB. It creates a completion handler with an associated allocator that consumes this stack-based memory before falling back to operator new. The pool gets destroyed once the co_await expression finishes execution. I had a problem with Boost.Redis, where redis::connection::async_exec was calling complete() without releasing the memory allocated in the operation, which led to crashes. I think this is a problem in Boost.Redis, not in Async. Tracked by https://github.com/boostorg/redis/issues/140.

By default, use_op converts error codes to exceptions. This is consistent with Asio's use_awaitable, but it's a behavior I personally don't like. However, it's easy to adapt by using asio::as_tuple(async::use_op) as the completion token (a short sketch follows below). This should probably be mentioned in the docs.

I'm using async::with to clean up Redis connections on application exit. I think it's a very useful feature, and one that Asio doesn't offer. It would be a nice addition to be able to return values from the with call, though, to make code less verbose. Tracked by https://github.com/klemens-morgenstern/async/issues/69.

Like any Beast-based HTTP server, my code has a TCP acceptor that listens for connections and launches HTTP sessions. At first, I was running these sessions as detached promises (by using async::promise::operator+). This caused a deadlock on finalization: async::co_main does not cancel detached promises, but it does wait for them to finish after receiving a SIGINT. I solved this by placing my promises in an async::wait_group instead of detaching them (a sketch also follows below). However, I think this behavior is non-obvious and can cause problems for people less experienced with Asio.
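To make the as_tuple adaptation above concrete, this is roughly what a non-throwing websocket read looks like in my code (a sketch; `ws` and `buf` are placeholders for the session's websocket stream and Beast flat_buffer):

    // Errors come back as an error_code value instead of an exception
    auto [ec, bytes_read] = co_await ws.async_read(
        buf, boost::asio::as_tuple(boost::async::use_op));
    if (ec)
        co_return; // handle the error without stack unwinding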
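And this is roughly the shape of the acceptor loop after switching from detached promises to a wait_group. It's a sketch under my assumptions: run_http_session is a hypothetical per-connection promise, and I've elided error handling and the teardown of the group on exit:

    boost::async::promise<void> listen(boost::asio::ip::tcp::acceptor& acceptor)
    {
        // Sessions are owned by the wait_group instead of being detached with
        // operator+, so they can be cancelled and awaited on shutdown
        boost::async::wait_group sessions;
        for (;;)
        {
            auto sock = co_await acceptor.async_accept(boost::async::use_op);
            sessions.push_back(run_http_session(std::move(sock)));
        }
    }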
I've raised https://github.com/klemens-morgenstern/async/issues/72 to track this behavior.

I'm currently using detached promises to broadcast messages over websocket sessions. Compared to asio::spawn, spawning detached tasks in Boost.Async is significantly easier.

As you may know, when using Beast's websockets, you need to make sure that the application doesn't perform two concurrent async reads or two concurrent async writes on the same websocket. In this chat server, two concurrent writes can happen if two messages need to be broadcast at the same time. Thus, I was using a hand-rolled async_mutex, based on Asio channels, to guarantee mutual exclusion (unfortunately, there is no asio::mutex). I tried rewriting my async_mutex using async::channel, but I couldn't, because my implementation relied on asio::experimental::channel::try_send, which Async does not offer (a sketch of this mutex is included at the end of this mail). It would be great if this library could offer async synchronization primitives, like mutex, condition_variable or event, as Python's asyncio does. I don't think this is a requirement for inclusion, though.

I then decided to remove the mutex and base my strategy on channels. Websocket reads and writes are now managed by a single coroutine. If any other session needs to write through a different session's websocket, it sends a message through a channel. The coroutine running the websocket session reads from the websocket and the channel in parallel, using async::select (a sketch of this design is also at the end of this mail).

My first, naive implementation used async::select(websocket.async_read(..., use_op), channel.read()). This does not work as intended because, if the channel read completes first, select cancels the websocket read, which causes the websocket connection to close. This can be overcome easily with generators, as they implement the wait_interrupt mechanism: if the channel read completes first, the generator can be resumed later, without losing information. Although there exists an asio::experimental::parallel_group with similar functionality, I don't think this channel-based strategy could be achieved using plain Asio, since the non-destructive capabilities of the generator are vital. parallel_group is also almost undocumented, which makes it difficult to use.

My only concern with channels as they are is that channel.read() throws an exception on cancellation. This means that async::select(websocket_generator, channel.read()) will throw and catch an exception internally every time a message arrives on the websocket. I believe that exceptions should be exceptional, and that's not the case here. It makes debugging harder and may have a performance impact (which I haven't measured). https://github.com/klemens-morgenstern/async/issues/76 tracks this.

I've been able to successfully build this project using clang-17 under Linux. I had problems building wait_group under gcc-12 (https://github.com/klemens-morgenstern/async/issues/73).

Some other minor issues I found:

* https://github.com/klemens-morgenstern/async/issues/75
* https://github.com/klemens-morgenstern/async/issues/66
* https://github.com/klemens-morgenstern/async/issues/71

Some of these issues have already been fixed by Klemens in the develop branch. If you're trying the library, I'd advise you to use that branch.

Questions
=========

Q1. Is it possible to somehow address the deadlock problem?

Q2. Is it possible to get non-throwing overloads of the channel functions?

Q3. The docs mention that MSVC compilers have broken coroutine implementations. Which compilers and operating systems does this library target, and which has it been tested with?
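Finally, since a couple of points above are easier to show than to describe, here are two rough sketches. The first is the shape of the channel-based async_mutex I mentioned (not my exact implementation, just the idea): a capacity-1 channel holds a single "unlocked" token, locking receives it, and unlocking puts it back with try_send, which is the operation async::channel currently lacks.

    #include <boost/asio/any_io_executor.hpp>
    #include <boost/asio/experimental/channel.hpp>
    #include <utility>

    class async_mutex
    {
        boost::asio::experimental::channel<void(boost::system::error_code)> chan_;

    public:
        explicit async_mutex(boost::asio::any_io_executor ex) : chan_(std::move(ex), 1)
        {
            chan_.try_send(boost::system::error_code()); // start in the unlocked state
        }

        template <class CompletionToken>
        auto async_lock(CompletionToken&& token)
        {
            // Completes when the token becomes available, i.e. when unlock() is called
            return chan_.async_receive(std::forward<CompletionToken>(token));
        }

        void unlock()
        {
            // Never blocks: this is the try_send that async::channel doesn't offer
            chan_.try_send(boost::system::error_code());
        }
    };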
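The second is a sketch of the generator-plus-channel design. All names except the library's are placeholders (handle_message in particular is hypothetical), includes are omitted, and I'm assuming select returns a variant-like value indexed by the awaitable that completed, so treat this as an illustration rather than working code.

    // One coroutine owns all reads and writes on the websocket; other sessions
    // request a broadcast by writing into msg_channel.
    using websocket = boost::beast::websocket::stream<boost::asio::ip::tcp::socket>;

    boost::async::generator<std::string> read_messages(websocket& ws)
    {
        boost::beast::flat_buffer buf;
        for (;;)
        {
            co_await ws.async_read(buf, boost::async::use_op);
            auto msg = boost::beast::buffers_to_string(buf.data());
            buf.consume(buf.size());
            // If select picks the channel first, the generator is merely
            // interrupted: it can be resumed later and no frame is lost
            co_yield std::move(msg);
        }
    }

    boost::async::promise<void> run_session(websocket& ws, boost::async::channel<std::string>& msg_channel)
    {
        auto incoming = read_messages(ws);
        for (;;)
        {
            auto which = co_await boost::async::select(incoming, msg_channel.read());
            if (which.index() == 0)
            {
                // A frame arrived on the websocket
                handle_message(boost::variant2::get<0>(which));
            }
            else
            {
                // Another session asked us to broadcast a message
                co_await ws.async_write(
                    boost::asio::buffer(boost::variant2::get<1>(which)), boost::async::use_op);
            }
        }
    }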
Thanks Klemens for making the async world less painful.

Regards,
Ruben.