On Thu, 19 May 2022 at 00:10, Christian Mazakas via Boost
Hello Boost Mailing List, this is my review of Boost.MySQL,
Hi Christian, I'm happy to see that you found time to write your review. Thanks also for supporting the idea of proposing this library to Boost, it wouldn't be happening without it.
------------ Introduction ------------
Full disclosure, Rubén initially approached me and asked if a MySQL Asio implementation would be wortwhile and I emphatically said, "Yes!" so in some sense, it's impossible for me to be unbiased here.
I strongly vote in favor of accepting this library into Boost, for the following set of reasons: * it actually works * it's incredibly idiomatic Asio code * it enables developers to meet business needs * everything else can be built on top of it
---------- Background ----------
I was one of the earlier users of this library, using it going back probably about a year or so ago. I used it at my last job to store records in a local DB. There's probably still a tractor somewhere in a tomato field in California that's running a version of Boost.MySQL on it.
Originally, our goal was to have replication working between our deployed machines and a remote server that stored everything "for real". But unfortunately, that requires a bit of DBA voodoo that I'm not capable of so I settled for implementing my own replication logic for the tables where we had to preserve the data.
It should also be known that I guess I fall into the "advanced" category of Asio user though that is somewhat contentious. I can write an iniating function okay and preserve executor/allocator correctness.
I've also written my fair share of SQL queries as well, working as a webdev for 5+ years so I'm no stranger to MySQL.
--------------------------- Discussed Issues & Concerns ---------------------------
I've seen some recurring themes during review. Namely, no connection pool, no direct parse-to-struct, no fail-over/healthcheck support.
As far as a connection pool is concerned, this should, in my opinion, be considered out-of-scope for an initial acceptance candidate, especially considering the work that was already put into the candidate we have today.
It is nice to have one but it's also an open-ended design space and if one was included in the implementation, we'd all be bikeshedding it to death. Instead, I think it's better to let users write what works for them and then gradually we see what a community-driven approach looks like and add that to Boost officially at a later time.
Some user input on use cases here would definitely help. https://github.com/anarthal/mysql/issues/19 tracks the issue, though.
It's also worth noting that if you're running against a localhost DB, creating a unix socket connection on something like `/var/run/mysqld/mysqld.sock` is essentially free and you won't even need to use TLS. This helps ameliorate some of the need for a connection pool.
Now, in terms of a health-check, this is something the library could add to its `connection` class and have it function identically to how Beast has its websocket streams with their automatic ping/pong semantics.
I also saw in a previous email the idea of `void connection::ping()`. I think this should be added before acceptance. If we don't add automatic health-checks for a connection, we should at least give users a proper tool for solving that problem themselves.
https://github.com/anarthal/mysql/issues/62 tracks this. Just as a note, I have been playing with the official MySQL C API and with the mysql executable and I've grepped its source code and they don't seem to automatically use the ping command. They expose the protocol function as I proposed, too. They add an option to automatically reconnect the client when calling ping() (and any other function). That could be interesting but again, I don't think it's a common Asio pattern.
I will say, however, that it's actually common in Asio to handle reconnection and health-checking yourself. For example, I've delivered Asio-driven applications that communicated with serial ports and CANbus (using SocketCAN). These connection types had reconnection and timeout logic that I manually implemented. These things seem daunting at first but become second-nature once you're accustomed to Asio. Even moreso now that Asio's awaitable types implement `operator||`.
As for fail-over, I can't comment too directly as it's something I've never had to deal with in my career. On first thought, I don't know if it'd be exactly the most difficult thing to get a connection to realize it's disconnected and then reconnect to another database from a known whitelist. I'm also under the impression that services like AWS' RDS will simply give you a static hostname or IP address to connect against which acts as a proxy and then the fail-over happens in the background. Maybe I misunderstand the intent of what "fail-over" is so I could simply be uninformed with an opinion here.
From what I've read about this topic (I'm far from an expert here, unfortunately), MySQL/MariaDB implement the following replication
options: * Basic replication (MySQL and MariaDB). This is a source/replica architecture, where a single source node updates multiple replica nodes. This is the oldest form of replication, failover must be explicitly invoked via a CHANGE REPLICATION SOURCE TO command, and implies that clients must know each specific node and failover manually. Docs: https://dev.mysql.com/doc/refman/8.0/en/replication.html * Group replication (MySQL). This features a group of MySQL servers and a membership service that manages the group automatically. There is usually a single primary server (handling reads and writes) and several replica servers, handling only reads (although other configurations are possible). The membership service gives you automatic failover (i.e. when a node fails, it's marked as failed and removed from the group. If it was a primary, a new primary is automatically elected). However, clients need to know the available nodes and switch manually to a new node if the one they were connected to becomes unavailable. Docs: https://dev.mysql.com/doc/refman/8.0/en/group-replication.html * InnoDB cluster (MySQL). This is built on top of group replication, and provides a higher level abstraction - it's easier to manage. As a client, you still need to know which host to connect to and fail over manually. Docs: https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-innodb-cluster.html * InnoDB ReplicaSet (MySQL). This is similar to InnoDB cluster, but built on top of basic replication. In general, documentation suggests that we should prefer InnoDB cluster over InnoDB ReplicaSet. Docs: https://dev.mysql.com/doc/refman/8.0/en/mysql-innodb-replicaset-introduction... * MySQL router (MySQL). This is essentially a load balancer that can sit before InnoDB clusters or InnoDB ReplicaSets, to which clients can connect to. The router exposes the same protocol interface as a regular MySQL server, and forwards packets to the actual servers in the cluster. There are usually two open ports in the router: one for read/write connections (which should be forwarded to primaries) and other for read-only connections (which should be forwarded to secondaries). This can be achieved by this library without problems. When a server fails over, the router closes the connection to the client. The client should then attempt to re-open the client to the router, as if a network disconnect took place. The router will then forward traffic to a different MySQL server. To sum up, users using MySQL Router need only implement reconnection logic, and will get failover handling for free. Docs on failover: https://dev.mysql.com/doc/mysql-router/8.0/en/mysql-router-general-using-dev... Docs on router: https://dev.mysql.com/doc/mysql-router/8.0/en/ * NDB cluster (MySQL). This uses a different storage engine (NDB vs. traditional InnoDB), and can be accessed either as a regular MySQL server (i.e. using this or any other library) or a dedicated API (which should be more performant). It doesn't seem possible to put a MySQL Router instance before one of these clusters, so manual failover would be required. * Galera cluster (MySQL and MariaDB). This is third-party software, but seems to have been adopted as the default by MariaDB. Similar to InnoDB cluster. Docs in MariaDB: https://mariadb.com/kb/en/about-galera-replication/ Site: https://galeracluster.com/ * MariaDB MaxScale. This is similar to MySQL router, but for MariaDB. I don't know what MaxScale clients see when the server they are connected fails over, but it is likely that it works similarly as in MySQL Router. If you're using the official C API, you've got two options: * mysql_real_connect(), which is given a hostname and a port. This will perform regular hostname resolution and then connect to each returned host (same as boost::asio::connect() free function). * mysql_real_connect_dns_srv(), which is given the name of a DNS SRV record. It will retrieve a set of hostnames from that DNS query and then call mysql_real_connect() for each of these hosts. This could be a nice-to-have for replication, although I don't know if Asio supports DNS SRV queries. This is a pretty recent addition to the API. Disclaimer: I haven't tried these scenarios, only read the official docs. A deeper investigation in this topic, as well as some documentation, would be nice. I've raised https://github.com/anarthal/mysql/issues/93 to address this. I'd say an example on reconnection would be nice, and raised https://github.com/anarthal/mysql/issues/92 to address it.
For parsing directly into a user-defined struct, this is always something nice to have but it shouldn't be a requirement as C++ offers no standardized reflection interface so it's unreasonable to expect this library to accomodate that. This was similarly debated at length during the JSON review and we've learned that a library can still be good while not parsing directly to a structure.
My idea here would be adding support for std::tuple and structs using Boost.Describe, as explained in https://github.com/anarthal/mysql/issues/60.
----------------- Personal Opinions -----------------
There's a school of thought that only the lowest-level components should be standardized on a first-pass. Higher-level components can always be built on top of solid primitives. I think Boost.MySQL creates this set of primitives and I think it succeeds.
From a connection, you derive prepared statements or queries. From those, you derive a resultset which is an interface for reading what the DB writes back. Rows derive from the resultset and then values derive from the rows. The surface area of the library is delightfully small and well-contained.
I've had some discussions on the possibility of making prepared_statement and resultset not be I/O objects, and moving their I/O functions to the connection object. Doing this, you'd have simpler prepared_statement and resultset types, and a type-erased connection object would be simpler to implement. The interface would roughly go from the current: prepared_statement::execute(const params<ValueIterator>&) prepared_statement::close() resultset::read_one(row&) To these ones: connection::execute_statement(const prepared_statement&, const params<ValueIterator>&) connection::close_statement(const prepared_statement&) connection::read_row(resultset&, row&) Do you think this interface would still be intuitive for the user?
I think this library is "simple", for better or worse. I empathize with the side that expresses woes about this simplicity. I think perhaps I'm hardened by my own experiences with Asio so the pains of this library are nothing new to me. To this end, the simplicity of the library is a welcome breath of fresh air from my perspective. The for example, is actually small which is incredibly rare these days.
Thanks. That was one of my design goals.
There is one other issue I'd still like to discuss and it's the infamous usage of `std::vector` in interfaces. This is not a problem as a convenience interface as 98% of users are going to use `std::vector` anyway. But it does inhibit Allocators which is unsuitable for acceptance as a Boost library. Everywhere `std::vector` appears, there should be an accompanying version that works with a `Vector&` out-param, i.e.
my_company::vectormysql::row v; co_await resultset.async_read_all(v);
I don't think requiring `vector`-like operations on the type is problematic but preventing control over the allocation is.
This will definitely be addressed before the library gets into Boost. https://github.com/anarthal/mysql/issues/58 tracks it.
`error_info&` as an out-param is slightly awkward but I do appreciate that it at least enables us to read the messages about why our queries are wrong. This was my only real complaint about the ergonomics of the library. I would've expected to somehow pull this information from the `error_code` so using an `error_info&` out-param is clunky here.
That's a good shout. I don't like it either, but haven't found a better way to transmit a server-provided error message back to the user. Other than storing it in the connection and then calling something like GetLastError(), but it feels too "plain-old-C", and may be problematic for stuff like pipeline mode. Supporting an extra overload for every async function is also a pain. Any ideas are welcome.
I also don't particularly mind that the backing data of a `value` is tied to the lifetime of a `row` but I can see why this is confusing so perhaps a change in naming is required.
https://github.com/anarthal/mysql/issues/85 tracks it, proposed names are field (for owning values) and field_view (for non-owning ones).
--------- Summation ---------
In summary, the library is great, it's intuitive, it's idiomatic and I think it belongs in Boost aside other great Asio-based libraries like Beast. At some point, Boost will have its own LAMP/MEAN stack which is the dream.
I also realize, I never critiqued the implementation and I think that's because parsing/serializing perf isn't a priority for me and simply working correctly and being clean under valgrind are all that matter so long as the library isn't horrendously slow.
If anyone has different opinions on my take, I'd love to hear them.
- Christian
Thank you. Regards, Ruben.