[boost] Review of Boost.MySQL

18 May 2022

      Hello Boost Mailing List, this is my review of Boost.MySQL,

------------
Introduction
------------

Full disclosure, Rubén initially approached me and asked if a MySQL Asio
implementation would be wortwhile and I emphatically said, "Yes!" so in some
sense, it's impossible for me to be unbiased here.

I strongly vote in favor of accepting this library into Boost, for the
following
set of reasons:
* it actually works
* it's incredibly idiomatic Asio code
* it enables developers to meet business needs
* everything else can be built on top of it

----------
Background
----------

I was one of the earlier users of this library, using it going back probably
about a year or so ago. I used it at my last job to store records in a
local DB.
There's probably still a tractor somewhere in a tomato field in California
that's running a version of Boost.MySQL on it.

Originally, our goal was to have replication working between our deployed
machines and a remote server that stored everything "for real". But
unfortunately, that requires a bit of DBA voodoo that I'm not capable of so
I
settled for implementing my own replication logic for the tables where we
had
to preserve the data.

It should also be known that I guess I fall into the "advanced" category of
Asio
user though that is somewhat contentious. I can write an iniating function
okay
and preserve executor/allocator correctness.

I've also written my fair share of SQL queries as well, working as a webdev
for
5+ years so I'm no stranger to MySQL.

---------------------------
Discussed Issues & Concerns
---------------------------

I've seen some recurring themes during review. Namely, no connection pool,
no
direct parse-to-struct, no fail-over/healthcheck support.

As far as a connection pool is concerned, this should, in my opinion, be
considered out-of-scope for an initial acceptance candidate, especially
considering the work that was already put into the candidate we have today.

It is nice to have one but it's also an open-ended design space and if one
was
included in the implementation, we'd all be bikeshedding it to death.
Instead, I
think it's better to let users write what works for them and then gradually
we
see what a community-driven approach looks like and add that to Boost
officially
at a later time.

It's also worth noting that if you're running against a localhost DB,
creating a
unix socket connection on something like `/var/run/mysqld/mysqld.sock` is
essentially free and you won't even need to use TLS. This helps ameliorate
some
of the need for a connection pool.

Now, in terms of a health-check, this is something the library could add to
its
`connection` class and have it function identically to how Beast has its
websocket streams with their automatic ping/pong semantics.

I also saw in a previous email the idea of `void connection::ping()`. I
think
this should be added before acceptance. If we don't add automatic
health-checks
for a connection, we should at least give users a proper tool for solving
that
problem themselves.

I will say, however, that it's actually common in Asio to handle
reconnection
and health-checking yourself. For example, I've delivered Asio-driven
applications that communicated with serial ports and CANbus (using
SocketCAN).
These connection types had reconnection and timeout logic that I manually
implemented. These things seem daunting at first but become second-nature
once
you're accustomed to Asio. Even moreso now that Asio's awaitable types
implement
`operator||`.

As for fail-over, I can't comment too directly as it's something I've never
had
to deal with in my career. On first thought, I don't know if it'd be
exactly the
most difficult thing to get a connection to realize it's disconnected and
then
reconnect to another database from a known whitelist. I'm also under the
impression that services like AWS' RDS will simply give you a static
hostname
or IP address to connect against which acts as a proxy and then the
fail-over
happens in the background. Maybe I misunderstand the intent of what
"fail-over"
is so I could simply be uninformed with an opinion here.

For parsing directly into a user-defined struct, this is always something
nice
to have but it shouldn't be a requirement as C++ offers no standardized
reflection interface so it's unreasonable to expect this library to
accomodate
that. This was similarly debated at length during the JSON review and we've
learned that a library can still be good while not parsing directly to a
structure.

-----------------
Personal Opinions
-----------------

There's a school of thought that only the lowest-level components should be
standardized on a first-pass. Higher-level components can always be built
on top
of solid primitives. I think Boost.MySQL creates this set of primitives and
I
think it succeeds.
...
From a connection, you derive prepared statements or queries. From those,
you
derive a resultset which is an interface for reading what the DB writes
back.
Rows derive from the resultset and then values derive from the rows. The
surface
area of the library is delightfully small and well-contained.
I think this library is "simple", for better or worse. I empathize with the
side
that expresses woes about this simplicity. I think perhaps I'm hardened by
my
own experiences with Asio so the pains of this library are nothing new to
me. To
this end, the simplicity of the library is a welcome breath of fresh air
from my
perspective. The for example, is actually small which is incredibly rare
these
days.

There is one other issue I'd still like to discuss and it's the infamous
usage
of `std::vector` in interfaces. This is not a problem as a convenience
interface
as 98% of users are going to use `std::vector` anyway. But it does inhibit
Allocators which is unsuitable for acceptance as a Boost library. Everywhere
`std::vector` appears, there should be an accompanying version that works
with a
`Vector&` out-param, i.e.

    my_company::vector<mysql::row> v;
    co_await resultset.async_read_all(v);

I don't think requiring `vector`-like operations on the type is problematic
but
preventing control over the allocation is.

`error_info&` as an out-param is slightly awkward but I do appreciate that
it at
least enables us to read the messages about why our queries are wrong. This
was
my only real complaint about the ergonomics of the library. I would've
expected
to somehow pull this information from the `error_code` so using an
`error_info&`
out-param is clunky here.

I also don't particularly mind that the backing data of a `value` is tied
to the
lifetime of a `row` but I can see why this is confusing so perhaps a change
in
naming is required.

---------
Summation
---------

In summary, the library is great, it's intuitive, it's idiomatic and I
think it
belongs in Boost aside other great Asio-based libraries like Beast. At some
point, Boost will have its own LAMP/MEAN stack which is the dream.

I also realize, I never critiqued the implementation and I think that's
because
parsing/serializing perf isn't a priority for me and simply working
correctly
and being clean under valgrind are all that matter so long as the library
isn't
horrendously slow.

If anyone has different opinions on my take, I'd love to hear them.

- Christian

[boost] Review of Boost.MySQL

Christian Mazakas