Re: [boost] Review of Boost.MySQL

19 May 2022

      On Thu, 19 May 2022 at 00:10, Christian Mazakas via Boost
<boost@lists.boost.org> wrote:
...
Hello Boost Mailing List, this is my review of Boost.MySQL,
Hi Christian, I'm happy to see that you found time to write your review.
Thanks also for supporting the idea of proposing this library to Boost,
it wouldn't be happening without it.
...
------------
Introduction
------------
Full disclosure, Rubén initially approached me and asked if a MySQL Asio
implementation would be wortwhile and I emphatically said, "Yes!" so in some
sense, it's impossible for me to be unbiased here.
I strongly vote in favor of accepting this library into Boost, for the
following
set of reasons:
* it actually works
* it's incredibly idiomatic Asio code
* it enables developers to meet business needs
* everything else can be built on top of it
----------
Background
----------
I was one of the earlier users of this library, using it going back probably
about a year or so ago. I used it at my last job to store records in a
local DB.
There's probably still a tractor somewhere in a tomato field in California
that's running a version of Boost.MySQL on it.
Originally, our goal was to have replication working between our deployed
machines and a remote server that stored everything "for real". But
unfortunately, that requires a bit of DBA voodoo that I'm not capable of so
I
settled for implementing my own replication logic for the tables where we
had
to preserve the data.
It should also be known that I guess I fall into the "advanced" category of
Asio
user though that is somewhat contentious. I can write an iniating function
okay
and preserve executor/allocator correctness.
I've also written my fair share of SQL queries as well, working as a webdev
for
5+ years so I'm no stranger to MySQL.
---------------------------
Discussed Issues & Concerns
---------------------------
I've seen some recurring themes during review. Namely, no connection pool,
no
direct parse-to-struct, no fail-over/healthcheck support.
As far as a connection pool is concerned, this should, in my opinion, be
considered out-of-scope for an initial acceptance candidate, especially
considering the work that was already put into the candidate we have today.
It is nice to have one but it's also an open-ended design space and if one
was
included in the implementation, we'd all be bikeshedding it to death.
Instead, I
think it's better to let users write what works for them and then gradually
we
see what a community-driven approach looks like and add that to Boost
officially
at a later time.
Some user input on use cases here would definitely help.
https://github.com/anarthal/mysql/issues/19 tracks the issue, though.
...
It's also worth noting that if you're running against a localhost DB,
creating a
unix socket connection on something like `/var/run/mysqld/mysqld.sock` is
essentially free and you won't even need to use TLS. This helps ameliorate
some
of the need for a connection pool.
Now, in terms of a health-check, this is something the library could add to
its
`connection` class and have it function identically to how Beast has its
websocket streams with their automatic ping/pong semantics.
I also saw in a previous email the idea of `void connection::ping()`. I
think
this should be added before acceptance. If we don't add automatic
health-checks
for a connection, we should at least give users a proper tool for solving
that
problem themselves.
https://github.com/anarthal/mysql/issues/62 tracks this. Just as a note,
I have been playing with the official MySQL C API and with the mysql
executable and I've grepped its source code and they don't seem
to automatically use the ping command. They expose the protocol function
as I proposed, too. They add an option to automatically reconnect the
client when calling ping() (and any other function). That could be
interesting but again, I don't think it's a common Asio pattern.
...
I will say, however, that it's actually common in Asio to handle
reconnection
and health-checking yourself. For example, I've delivered Asio-driven
applications that communicated with serial ports and CANbus (using
SocketCAN).
These connection types had reconnection and timeout logic that I manually
implemented. These things seem daunting at first but become second-nature
once
you're accustomed to Asio. Even moreso now that Asio's awaitable types
implement
`operator||`.
As for fail-over, I can't comment too directly as it's something I've never
had
to deal with in my career. On first thought, I don't know if it'd be
exactly the
most difficult thing to get a connection to realize it's disconnected and
then
reconnect to another database from a known whitelist. I'm also under the
impression that services like AWS' RDS will simply give you a static
hostname
or IP address to connect against which acts as a proxy and then the
fail-over
happens in the background. Maybe I misunderstand the intent of what
"fail-over"
is so I could simply be uninformed with an opinion here.
...
From what I've read about this topic (I'm far from an expert here,
unfortunately), MySQL/MariaDB implement the following replication
options:

* Basic replication (MySQL and MariaDB). This is a source/replica
architecture, where a single source node updates multiple replica nodes.
This is the oldest form of replication, failover must be explicitly invoked
via a CHANGE REPLICATION SOURCE TO command, and implies
that clients must know each specific node and failover manually.
Docs: https://dev.mysql.com/doc/refman/8.0/en/replication.html
* Group replication (MySQL). This features a group of MySQL servers
and a membership service that manages the group automatically.
There is usually a single primary server (handling reads and writes)
and several replica servers, handling only reads (although other configurations
are possible). The membership service gives you automatic failover
(i.e. when a node fails, it's marked as failed and removed from the group.
If it was a primary, a new primary is automatically elected). However,
clients need to know the available nodes and switch manually to a new node
if the one they were connected to becomes unavailable.
Docs: https://dev.mysql.com/doc/refman/8.0/en/group-replication.html
* InnoDB cluster (MySQL). This is built on top of group replication,
and provides a higher level abstraction - it's easier to manage.
As a client, you still need to know which host to connect to and
fail over manually.
Docs: https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-innodb-cluster.html
* InnoDB ReplicaSet (MySQL). This is similar to InnoDB cluster, but built
on top of basic replication. In general, documentation suggests that we
should prefer InnoDB cluster over InnoDB ReplicaSet.
Docs: https://dev.mysql.com/doc/refman/8.0/en/mysql-innodb-replicaset-introduction...
* MySQL router (MySQL). This is essentially a load balancer that can sit
before InnoDB clusters or InnoDB ReplicaSets, to which clients can connect to.
The router exposes the same protocol interface as a regular MySQL server, and
forwards packets to the actual servers in the cluster. There are
usually two open
ports in the router: one for read/write connections (which should be forwarded
to primaries) and other for read-only connections (which should be forwarded to
secondaries). This can be achieved by this library without problems.
When a server fails over, the router closes the connection to the client.
The client should then attempt to re-open the client to the router, as if a
network disconnect took place. The router will then forward traffic to a
different MySQL server.
To sum up, users using MySQL Router need only implement reconnection
logic, and will get failover handling for free.
Docs on failover:
https://dev.mysql.com/doc/mysql-router/8.0/en/mysql-router-general-using-dev...
Docs on router: https://dev.mysql.com/doc/mysql-router/8.0/en/
* NDB cluster (MySQL). This uses a different storage engine (NDB vs.
traditional InnoDB), and can be accessed either as a regular MySQL server
(i.e. using this or any other library) or a dedicated API (which should be
more performant). It doesn't seem possible to put a MySQL Router instance
before one of these clusters, so manual failover would be required.
* Galera cluster (MySQL and MariaDB). This is third-party software,
but seems to have been adopted as the default by MariaDB. Similar
to InnoDB cluster.
Docs in MariaDB: https://mariadb.com/kb/en/about-galera-replication/
Site: https://galeracluster.com/
* MariaDB MaxScale. This is similar to MySQL router, but for MariaDB.
I don't know what MaxScale clients see when the server they are connected
fails over, but it is likely that it works similarly as in MySQL Router.

If you're using the official C API, you've got two options:
* mysql_real_connect(), which is given a hostname and a port. This will
perform regular hostname resolution and then connect to each returned host
(same as boost::asio::connect() free function).
* mysql_real_connect_dns_srv(), which is given the name of a DNS SRV
record. It will retrieve a set of hostnames from that DNS query and then
call mysql_real_connect() for each of these hosts. This could be a nice-to-have
for replication, although I don't know if Asio supports DNS SRV queries.
This is a pretty recent addition to the API.

Disclaimer: I haven't tried these scenarios, only read the official docs.
A deeper investigation in this topic, as well as some documentation, would
be nice. I've raised https://github.com/anarthal/mysql/issues/93 to
address this.
I'd say an example on reconnection would be nice, and raised
https://github.com/anarthal/mysql/issues/92 to address it.
...
For parsing directly into a user-defined struct, this is always something
nice
to have but it shouldn't be a requirement as C++ offers no standardized
reflection interface so it's unreasonable to expect this library to
accomodate
that. This was similarly debated at length during the JSON review and we've
learned that a library can still be good while not parsing directly to a
structure.
My idea here would be adding support for std::tuple and structs using
Boost.Describe, as explained in https://github.com/anarthal/mysql/issues/60.
...
-----------------
Personal Opinions
-----------------
There's a school of thought that only the lowest-level components should be
standardized on a first-pass. Higher-level components can always be built
on top
of solid primitives. I think Boost.MySQL creates this set of primitives and
I
think it succeeds.
From a connection, you derive prepared statements or queries. From those,
you
derive a resultset which is an interface for reading what the DB writes
back.
Rows derive from the resultset and then values derive from the rows. The
surface
area of the library is delightfully small and well-contained.
I've had some discussions on the possibility of making prepared_statement
and resultset not be I/O objects, and moving their I/O functions to the
connection object. Doing this, you'd have simpler prepared_statement
and resultset types, and a type-erased connection object would be
simpler to implement. The interface would roughly go from the current:

prepared_statement::execute(const params<ValueIterator>&)
prepared_statement::close()
resultset::read_one(row&)

To these ones:

connection::execute_statement(const prepared_statement&, const
params<ValueIterator>&)
connection::close_statement(const prepared_statement&)
connection::read_row(resultset&, row&)

Do you think this interface would still be intuitive for the user?
...
I think this library is "simple", for better or worse. I empathize with the
side
that expresses woes about this simplicity. I think perhaps I'm hardened by
my
own experiences with Asio so the pains of this library are nothing new to
me. To
this end, the simplicity of the library is a welcome breath of fresh air
from my
perspective. The for example, is actually small which is incredibly rare
these
days.
Thanks. That was one of my design goals.
...
There is one other issue I'd still like to discuss and it's the infamous
usage
of `std::vector` in interfaces. This is not a problem as a convenience
interface
as 98% of users are going to use `std::vector` anyway. But it does inhibit
Allocators which is unsuitable for acceptance as a Boost library. Everywhere
`std::vector` appears, there should be an accompanying version that works
with a
`Vector&` out-param, i.e.
my_company::vector<mysql::row> v;
    co_await resultset.async_read_all(v);
I don't think requiring `vector`-like operations on the type is problematic
but
preventing control over the allocation is.
This will definitely be addressed before the library gets into Boost.
https://github.com/anarthal/mysql/issues/58 tracks it.
...
`error_info&` as an out-param is slightly awkward but I do appreciate that
it at
least enables us to read the messages about why our queries are wrong. This
was
my only real complaint about the ergonomics of the library. I would've
expected
to somehow pull this information from the `error_code` so using an
`error_info&`
out-param is clunky here.
That's a good shout. I don't like it either, but haven't found a better way to
transmit a server-provided error message back to the user. Other than storing
it in the connection and then calling something like GetLastError(),
but it feels
too "plain-old-C", and may be problematic for stuff like pipeline mode.
Supporting an extra overload for every async function is also a pain.
Any ideas are welcome.
...
I also don't particularly mind that the backing data of a `value` is tied
to the
lifetime of a `row` but I can see why this is confusing so perhaps a change
in
naming is required.
https://github.com/anarthal/mysql/issues/85 tracks it,
proposed names are field (for owning values) and
field_view (for non-owning ones).
...
---------
Summation
---------
In summary, the library is great, it's intuitive, it's idiomatic and I
think it
belongs in Boost aside other great Asio-based libraries like Beast. At some
point, Boost will have its own LAMP/MEAN stack which is the dream.
I also realize, I never critiqued the implementation and I think that's
because
parsing/serializing perf isn't a priority for me and simply working
correctly
and being clean under valgrind are all that matter so long as the library
isn't
horrendously slow.
If anyone has different opinions on my take, I'd love to hear them.
- Christian
Thank you.

Regards,
Ruben.

Re: [boost] Review of Boost.MySQL

Ruben Perez