Re: [boost] Boost MySQL Review

18 May 2022

      My review of Boost Mysql

I'll loosely go by the guidelines
[here](https://www.boost.org/community/reviews.html#Comments)

# What is your evaluation of the potential usefulness of the library?

Highly useful. 

In fact, my feeling was mostly "where was this library back in 2016". Back then
we had to postpone necessary refactoring with big scalability benefits because
I wasn't ready to do the async initiation function myself, or figure out how
the mysql native client wants to do asynchrony.

# What is your evaluation of the design?

It is largely what I'd expect.

I might consider dropping the `*read_many` interfaces which has been been
subject of discussion (mainly for returning `vector<row>`). I guess they could
also be trivially improved by taking a reference to any mutable range of `row`
objects instead (obviating the need to specify a count).

In my use, I find a "cursor-style" approach to consuming results (reusing
the row buffer) the most natural way to async consume result-sets. Hence
`read_one()` and perhaps `read_all()` would be the only interfaces I like to
see.

I'll probably have some more notes below.

# What is your evaluation of the implementation?

Pleasantly surprised.

A good example is when I zoomed in on the "generic execute" algorithm
implementation. It looks completely readable, which is quite an accomplishment
considering the domain.

Little cues picked up elsewhere (e.g. in tests) confirmed the impression.

This will positively affect both technical the maintainability and chances of
community interest to support that.

Also, I think given a clean state of the implementation, the library could
serve as a inspiration for people wishing to implement other protocols. 

# What is your evaluation of the documentation?

The documentation gave me a good idea of what the library does and doesn't do.

I'm not convinced of the "user-friendliness" angle. That's fine for me. It
seems obvious that a library that implements a third-party wire-protocol from
scratch is not focusing on high-level abstraction. It is good to manage the
expectations in the documentation, though.

In particular, a number of features seem more user-hostile.

This is felt e.g. in the (lack) of user-defined conversion support and
interoperability (see below for concrete examples with e.g. `mysql::value`).

I have to mention the statically parameterized class hierarchy. To be clear, I
wouldn't want it any other way, but to end-users it does *not* seem
user-friendly: they have to template a good part of their DB layer on the
chosen stream type if they just want to e.g. optionally support SSL and UNIX
domain socket access. Most ecosystems would consider it standard to have a
connection-string representation with ditto parser that can dynamically spawn
the right implementation types.

I also feel the situation can be improved by providing the conventional nested
type names (see below)

# Did you try to use the library? With what compiler? Did you have any problems?
# How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I looked at this library last year around March, and again this week. Both times
I spent roughly 4 hours exercising the interfaces and accomplishing typical
tasks. This time around I spent also some time reading the docs.

I used my own existing databases (MySql 5.7), mainly asynchronously, focusing
on the `read_one()` 'cursor` approach, several Stream protocols and
simultaneous connections. 

I tried some DDL tasks and insertions vs. auto-increment fields.

I used GCC 10 on linux with CMake. I found I had to add the include path, which
doesn't seem mentioned in the documentation:

    INCLUDE_DIRECTORIES(${CMAKE_SOURCE_DIR}/deps/mysql/include/)

The docs (mysql.intro.requirements) gave me the impression that nothing should
have been required beyond:

    TARGET_LINK_LIBRARIES(reviewtest Boost::mysql)

# Are you knowledgeable about the problem domain?

I have experience with MySql, Asio and related libraries.

As alluded to before I've stopped short of creating an Asio-based async
interface for MySql some years ago, when we were migrating a distributed server
application to Asio (making the entirity of RPC paths non-blocking coroutines
except for the database operations). We stopped short lack of time of time and
experience.

We've assessed Amy (https://github.com/liancheng/amy) around that time - which
didn't seem ready for primetime.

Instead I've invested time using mysql client lib as well as MySQL++ wrapper to
get ORM-mapping to domain types from Mysql resultsets in that application. If
anything, I respect that this library doesn't try to provide these
functionalities (do one thing and do it well).

I have experience with using a variety of other databases (mainly Oracle,
MSSQL) in different tech stacks (.NET), which also adds some perspective.

# Do you think the library should be accepted as a Boost library?

I vote for acceptance. I think the library has a good foundation, focus and
implementation quality. 

There are areas for improvements. Some of these will involve breaking interface
changes, which would be best addressed before acceptance.

Notes from review process:

 1. The docs/examples need to show common usage patterns like INSERT with
    auto-increment fields. E.g. nothing the use of `last_insert_id()`. 

 2. The relational operations on mysql::value have technical meaning only. While
    the docs call this out, it forms a major pitfall.

         struct MyEntity {
             unsigned id;
         } entity{42};

         try { conn.query("CREATE TABLE MyEntity AS SELECT 42 as id"); } catch (...) { }
         auto actual = conn.query("SELECT id from MyEntity").read_all().front().values().front();

         M::value expected(entity.id);
         std::cout << actual << " == " << expected << "? " << std::boolalpha
                   << (actual == expected) << "\n";

    I'd say we need to remove the relational operations whole-sale OR implement them with Mysql semantics

         SELECT CAST(42 AS unsigned) = 42; -- MySQL: TRUE

    If the comparisons are required for internal purposes, they can be provided
    with custom comparator function objects.   

 3. The documentation claim "Values do not define `std::hash` yet, as
    `boost::string_view` does not support it" is out of date. `hash_value` has
    been added in 1.69.0

 4. On that note, consider `boost::core::string_view` for better interop
    ("lingua franca" string_view). If you're interested I just did an exhaustive
    side-by-side comparison of features and observable behaviour` which I can
    share.

 5. Others have mentioned multi-line queries. I'm wondering whether they add
    much seeing that sessions are already stateful:

         conn.query("set @greeting = 0x68656c6c6f20776f726c64;");

         std::cout << "retained: " << conn.query("SELECT @greeting;").read_all()[0].values()[0] << '\n'; 
         std::cout << "retained: " << conn.query("SELECT @greeting;").read_all()[0].values()[0] << '\n'; 

         conn.query("set @greeting = 0x62796520776f726c64;");
         std::cout << "modified: " << conn.query("SELECT @greeting;").read_all()[0].values()[0] << '\n'; 

    prints

         retained: hello world
         retained: hello world
         modified: bye world

    In fact it could be enough when `channel<>` does any required queuing.

    Unless queries are tiny and fragmented, there may not be a significant
    benefit. In fact, coalescing queries into a valid multi-line text form leads
    to   allocation/copying overhead - and potential new safety risks.

 6. The /one/ area where multi-line can shine is when sourcing hard-coded
    scripts, which begs the question: should a
    `connection::source(path/sqlscript)` feature be added instead?

 7. "Finally, note that `prepared_statement`'s destructor does not perform any
    server-side deallocation of the statement. This is because closing a
    statement involves a network operation that may block your code or fail"

    This tells me that probably connections should contain a list of owned
    statement implementations that `prepared_statements` are a handle to.

    This is important when connection pooling, so prepared statements can be
    cleared before returning a connection to the pool. Otherwise this is simply
    a resource leak, and that's a problem.

    (Though some applications might prefer to give each connection in the pool
    a fixed set of statements. This doesn't suit all applications -
    heterogenous connection pools)

 8. [mysql.other_streams.connection] If Windows named pipes is a common
    transport, I'd say providing first class connect/quit support is in order

 9. Working with the Stream-parameterized type hierarchy is harder than necessary.

    E.g. `connection` has no inner typedef for `prepared_statement`/`resultset`
    or even the `Stream` template argument. This leaves one with work-arounds
    like:

     template <typename Conn> void my_function(Conn& conn) {
         using Stream = typename std::remove_reference<decltype(conn.next_layer())>::type;
         using Stmt = M::prepared_statement<Stream>;

    In general interface types (arguments/return values of member functions)
    should be directly expressible in code, if not through nested typenames,
    then at least through traits.

 10. I did notice that in the integration tests there is a complete set of
     type-erased wrappers for resultset, connection, prepared_statement etc.
     Perhaps it is worth considering exposing this for users who are looking
     for convenience over raw speed.

    It could of course be that the type-erased wrappers have limitations
    (beyond efficiency) I'm not aware of.

 11. A loose observation: `initializer_list<mysql::value>` is not practical due
     to explicit constructors. 

     Of course, `make_values` facilitates that.

     (I'd keep the constructors explicit due to the fact that `mysql::value`
     has some surprising properties that make it bad idea to use as incidental
     type (ref strings and non-obvious relational ops, to name two))

 12. Seeing that Boost Mysql is header-only, what happens with the text
     segments (e.g. the error descriptions)?  Is it worth providing alternative
     linkage options - conditional header-only compilation?

     Note: I didn't measure anything here. It was just a thought that crossed
     my mind when I compared to the stock mysql client libs. (On the plus side,
     there's no per-thread `mysql_library_init` call anymore, yay!)

 13. tools/valgrind_suppressions.txt looks promising; I don't see where it is
     used. Is it up-to-date?

 14. I'm not going to repeat some of the observations made by others, re.
     specific interfaces. I'm confident that these are going to be addressed.

     I **do** want to provide counter-point for the recurring theme that
     `mysql::value`'s string element type should be owning. I disagree. To me
     the primary purpose of the `value` type is to be a lightweight view that
     has utility efficiently marshaling to/from the wire format.

     To me it is more important to keep the 99% of use patterns efficient.

     We can instead clearly document that `mysql::value` is not a vocabulary
     type (and why) and not well suited for use outside the context of IO
     operations.

Thanks Rubén for putting your considerable skill and time into this, and for
being very responsive.

Seth Heeren

On Sun, May 15, 2022, at 6:36 PM, Ruben Perez via Boost wrote:
...
Thanks to all of you who have found the time to write a review to this point.
I hope some more will be able to share your thoughts with us.
The process has already provided me with a lot of useful information.
I've done my best to answer all your messages and questions, but
the threads are sometimes difficult to follow. So if I've missed any
of your messages, please let me know, as it hasn't been intentional.
Many thanks,
Ruben.
On Sat, 14 May 2022 at 10:58, Richard Hodges via Boost
<boost@lists.boost.org> wrote:
...
Dear All,
We've had a good response for calls to review this much anticipated
library, and there is a lot of interest on Reddit:
https://www.reddit.com/r/cpp/comments/unik91/acceptance_review_for_boostmysq...
.
In general the public perception of utility libraries like this in Boost
seems popular, and Boost's emphasis on quality is often mentioned.
If you are able, I would be more than grateful if you could take a few
hours over this delightful weekend to offer a review. The more eyes on this
project the better, as if it is successful, it will no doubt prompt
submission of similar libraries targeting other popular database and
messaging systems.
The Boost formal review of the MySQL library started on May 10th, 2022 and
will conclude on May 19th, 2022 (inclusive) - In fact, I am able to accept
submissions up until the (GMT) morning of May 20th as I'll be traveling on
May19th.
The library is authored by Rubén Pérez Hidalgo (@anarthal in the CppLang
slack).
Documentation: https://anarthal.github.io/mysql/index.html
Source: https://github.com/anarthal/mysql/
The library is built on the bedrock of Boost.Asio and provides both
synchronous and asynchronous client connectors for the MySQL database
system.
Boost.MySQL is written from the ground up, implementing the entire protocol
with no external dependencies beyond the Boost library.
It is compatible with MariaDB.
Connectivity options include TCP, SSL and Unix Sockets.
For async interfaces, examples in the documentation demonstrate full
compatibility with all Asio completion handler styles, including:
Callbacks:-
https://anarthal.github.io/mysql/mysql/examples/query_async_callbacks.html
Futures :-
https://anarthal.github.io/mysql/mysql/examples/query_async_futures.html
Boost.Coroutine :-
https://anarthal.github.io/mysql/mysql/examples/query_async_coroutines.html
C++20 Coroutines :-
https://anarthal.github.io/mysql/mysql/examples/query_async_coroutinescpp20....
Rubén has also implemented the Asio protocols for deducing default
completion token types :-
https://anarthal.github.io/mysql/mysql/examples/default_completion_tokens.ht...
Reviewing a database connector in depth will require setting up an instance
of a MySQL database. Fortunately most (all?) Linux distributions carry a
MySQL and/or MariaDB package. MySQL community edition is available for
download on all platforms here:
https://dev.mysql.com/downloads/
Rubén has spent quite some time in order to bring us this library
candidate. The development process has no doubt been a journey of discovery
into Asio, its concepts and inner workings. I am sure he has become a fount
of knowledge along the way.
From a personal perspective, I was very happy to be asked to manage this
review. I hope it will be the first of many more reviews of libraries that
tackle business connectivity problems without further dependencies beyond
Boost, arguably one of the most trusted foundation libraries available.
Please provide in your review information you think is valuable to
understand your choice to ACCEPT or REJECT including Describe as a
Boost library. Please be explicit about your decision (ACCEPT or REJECT).
Some other questions you might want to consider answering:
- Will the library bring additional out-of-the-box utility to Boost?
  - What is your evaluation of the implementation?
  - What is your evaluation of the documentation?
  - Will the choice of API abstraction model ease the development of
software that must talk to a MySQL database?
  - Are there any immediate improvements that could be made after
acceptance, if acceptance should happen?
  - Did you try to use the library? With which compiler(s)? Did you
    have any problems?
  - How much effort did you put into your evaluation? A glance? A quick
    reading? In-depth study?
  - Are you knowledgeable about the problem domain?
More information about the Boost Formal Review Process can be found
at: http://www.boost.org/community/reviews.html
The review is open to anyone who is prepared to put in the work of
evaluating and reviewing the library. Prior experience in
contributing to Boost reviews is not a requirement.
Thank you for your efforts in the Boost community. They are very much
appreciated.
Richard Hodges
- review manager of the proposed Boost.MySQL library
Rubén is often available on CppLang Slack and of course by email should you
require any clarification not covered by the documentation, as am I.
--
Richard Hodges
hodges.r@gmail.com
office: +44 2032 898 513
home: +376 861 195
mobile: +376 380 212
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost