Re: [boost] Boost MySQL Review

18 May 2022

      Hi Seth,

Thank you for taking your time to write a review.

On Wed, 18 May 2022 at 04:43, Seth via Boost <boost@lists.boost.org> wrote:
...
My review of Boost Mysql
I'll loosely go by the guidelines
[here](https://www.boost.org/community/reviews.html#Comments)
# What is your evaluation of the potential usefulness of the library?
Highly useful.
In fact, my feeling was mostly "where was this library back in 2016". Back then
we had to postpone necessary refactoring with big scalability benefits because
I wasn't ready to do the async initiation function myself, or figure out how
the mysql native client wants to do asynchrony.
Glad to hear.
...
# What is your evaluation of the design?
It is largely what I'd expect.
I might consider dropping the `*read_many` interfaces which has been been
subject of discussion (mainly for returning `vector<row>`). I guess they could
also be trivially improved by taking a reference to any mutable range of `row`
objects instead (obviating the need to specify a count).
In my use, I find a "cursor-style" approach to consuming results (reusing
the row buffer) the most natural way to async consume result-sets. Hence
`read_one()` and perhaps `read_all()` would be the only interfaces I like to
see.
They will likely disappear or be changed somehow to allow memory re-use.
Tracked by https://github.com/anarthal/mysql/issues/58.
...
I'll probably have some more notes below.
# What is your evaluation of the implementation?
Pleasantly surprised.
A good example is when I zoomed in on the "generic execute" algorithm
implementation. It looks completely readable, which is quite an accomplishment
considering the domain.
Little cues picked up elsewhere (e.g. in tests) confirmed the impression.
This will positively affect both technical the maintainability and chances of
community interest to support that.
Also, I think given a clean state of the implementation, the library could
serve as a inspiration for people wishing to implement other protocols.
Thanks. Glad to hear.
...
# What is your evaluation of the documentation?
The documentation gave me a good idea of what the library does and doesn't do.
I'm not convinced of the "user-friendliness" angle. That's fine for me. It
seems obvious that a library that implements a third-party wire-protocol from
scratch is not focusing on high-level abstraction. It is good to manage the
expectations in the documentation, though.
It's something that has definitely caused confusion during the reviews,
so it may be worth rephrasing the docs to clearly state that this is a protocol
library which can't be as user-friendly as a higher level one.
https://github.com/anarthal/mysql/issues/70 tracks it.
...
I used GCC 10 on linux with CMake. I found I had to add the include path, which
doesn't seem mentioned in the documentation:
INCLUDE_DIRECTORIES(${CMAKE_SOURCE_DIR}/deps/mysql/include/)
The docs (mysql.intro.requirements) gave me the impression that nothing should
have been required beyond:
TARGET_LINK_LIBRARIES(reviewtest Boost::mysql)
This shouldn't be the case at all. The second line should be sufficient.
Could you please open an issue against the repo and post your
CMakeLists.txt, please?
...
# Do you think the library should be accepted as a Boost library?
I vote for acceptance. I think the library has a good foundation, focus and
implementation quality.
There are areas for improvements. Some of these will involve breaking interface
changes, which would be best addressed before acceptance.
Thanks. I will address all these potentially breaking changes before getting
the library into Boost (could take a while but necessary).
...
Notes from review process:
1. The docs/examples need to show common usage patterns like INSERT with
    auto-increment fields. E.g. nothing the use of `last_insert_id()`.
Seems fair. I've raised https://github.com/anarthal/mysql/issues/87
to track it.
...
2. The relational operations on mysql::value have technical meaning only. While
    the docs call this out, it forms a major pitfall.
struct MyEntity {
             unsigned id;
         } entity{42};
try { conn.query("CREATE TABLE MyEntity AS SELECT 42 as id"); } catch (...) { }
         auto actual = conn.query("SELECT id from MyEntity").read_all().front().values().front();
M::value expected(entity.id);
         std::cout << actual << " == " << expected << "? " << std::boolalpha
                   << (actual == expected) << "\n";
I'd say we need to remove the relational operations whole-sale OR implement them with Mysql semantics
SELECT CAST(42 AS unsigned) = 42; -- MySQL: TRUE
If the comparisons are required for internal purposes, they can be provided
    with custom comparator function objects.
The only reason they are there is to allow values to be used
in std::set, std::unordered_set and the like. As a user, I'd also find
useful operator==, at least, to know "is the type and value of these two
values the same"? The implementation doesn't rely on these operators.
Tests do, but can be changed easily.

Implementing MySQL semantics implies the following:

value("1") == value(1)  // true
value(1.0) == value(1)  // true

Which reminds me of the kind of patterns that have brought us
pain in PHP and JS, and would prefer to avoid. Do you think
"placing values in containers" is something that adds value?
...
3. The documentation claim "Values do not define `std::hash` yet, as
    `boost::string_view` does not support it" is out of date. `hash_value` has
    been added in 1.69.0
I can see a definition of hash_value, and from what I've read this makes
it compatible with Boost containers, but the std::hash specializations are
in a #ifdef 0 block. AFAIR std::hash doesn't use hash_value, and
std::unordered_set<boost::string_view> fails to compile. The same
applies for std::unordered_set<boost::mysql::value>.
...
4. On that note, consider `boost::core::string_view` for better interop
    ("lingua franca" string_view). If you're interested I just did an exhaustive
    side-by-side comparison of features and observable behaviour` which I can
    share.
I wasn't aware of this class. Doing a quick grep in Beast, I've found it does
use it, but it seems located in a detail/ folder
(<boost/core/detail/string_view.hpp>).
Could you please provide me with your comparison and any extra info I should
be aware of? (BTW std::unordered_set<boost::core::string_view> doesn't compile
either).
...
5. Others have mentioned multi-line queries. I'm wondering whether they add
    much seeing that sessions are already stateful:
conn.query("set @greeting = 0x68656c6c6f20776f726c64;");
std::cout << "retained: " << conn.query("SELECT @greeting;").read_all()[0].values()[0] << '\n';
         std::cout << "retained: " << conn.query("SELECT @greeting;").read_all()[0].values()[0] << '\n';
conn.query("set @greeting = 0x62796520776f726c64;");
         std::cout << "modified: " << conn.query("SELECT @greeting;").read_all()[0].values()[0] << '\n';
prints
retained: hello world
         retained: hello world
         modified: bye world
In fact it could be enough when `channel<>` does any required queuing.
Unless queries are tiny and fragmented, there may not be a significant
    benefit. In fact, coalescing queries into a valid multi-line text form leads
    to   allocation/copying overhead - and potential new safety risks.
Not following you here. What do you mean by channel<> doing any queueing?
It does have an internal buffer to re-use memory usage, but it doesn't implement
any sort of queuing.

I'd say that the pipeline interface
(https://github.com/anarthal/mysql/issues/76)
may be a better interface than multi-statement for this use case.
...
6. The /one/ area where multi-line can shine is when sourcing hard-coded
    scripts, which begs the question: should a
    `connection::source(path/sqlscript)` feature be added instead?
Agree. Raised https://github.com/anarthal/mysql/issues/88 to track it.
...
7. "Finally, note that `prepared_statement`'s destructor does not perform any
    server-side deallocation of the statement. This is because closing a
    statement involves a network operation that may block your code or fail"
This tells me that probably connections should contain a list of owned
    statement implementations that `prepared_statements` are a handle to.
This is important when connection pooling, so prepared statements can be
    cleared before returning a connection to the pool. Otherwise this is simply
    a resource leak, and that's a problem.
(Though some applications might prefer to give each connection in the pool
    a fixed set of statements. This doesn't suit all applications -
    heterogenous connection pools)
I'd say this is a problem specific to connection pooling and that should be
implemented together with it (i.e. with
https://github.com/anarthal/mysql/issues/19).
When connections are closed, statements are de-allocated automatically, even
if you didn't call prepared_statement::close explicitly (i.e. statements are
local to the session established by the connection). So this only becomes a
problem if you use the connection, prepare and execute some statements
within that session, and then return the connection to the pool.
I'd say to implement this as part of that issue.

Note that prepared_statement's get allocated a server-side ID, which
we expose via the prepared_statement::id() function. Something that
could be useful for that use case is closing a statement without a
prepared_statement object, just using its ID.

I've noted all this in https://github.com/anarthal/mysql/issues/19
to prevent information from being lost.
...
8. [mysql.other_streams.connection] If Windows named pipes is a common
    transport, I'd say providing first class connect/quit support is in order
It is a supported transport - not aware of how many clients would be using it.
There is two sides here:

1) Providing the relevant typedefs, docs and testing. This should be
relatively straightforward.
2) Supporting the connection::connect and connection::close
functions for named pipes. This involves opening a named pipe,
which doesn't seem to have an async mode. Other than that,
it should be doable.
...
9. Working with the Stream-parameterized type hierarchy is harder than necessary.
E.g. `connection` has no inner typedef for `prepared_statement`/`resultset`
    or even the `Stream` template argument. This leaves one with work-arounds
    like:
template <typename Conn> void my_function(Conn& conn) {
         using Stream = typename std::remove_reference<decltype(conn.next_layer())>::type;
         using Stmt = M::prepared_statement<Stream>;
In general interface types (arguments/return values of member functions)
    should be directly expressible in code, if not through nested typenames,
    then at least through traits.
Added the one for Stream as next_layer_type in develop. The other ones
addressed in https://github.com/anarthal/mysql/issues/73 (see below).
...
10. I did notice that in the integration tests there is a complete set of
     type-erased wrappers for resultset, connection, prepared_statement etc.
     Perhaps it is worth considering exposing this for users who are looking
     for convenience over raw speed.
It could of course be that the type-erased wrappers have limitations
    (beyond efficiency) I'm not aware of.
They are more limited than the regular interface, as they only account for
the use cases covered in the tests covering all the stream types. It should
be possible to tidy them up and ship them to the general public, though.
I hadn't really thought on this before.

Something I noticed while implementing these erased types is that
having prepared_statement and resultset be I/O objects makes
the erased interface much harder. As I've mentioned in other
review responses, I'm thinking on replacing the following signatures:

prepared_statement::execute(const statement_params<ValueIterator>&)
prepared_statement::close()
resultset::read_one(row&)

By functions in the connection:

connection::execute_statement(const prepared_statement&)
connection::close_statement(const prepared_statement&)
connection::read_row(resultset&, row&)

Where prepared_statement and resultset would be non-templated
types. What do you think of this API?
...
11. A loose observation: `initializer_list<mysql::value>` is not practical due
     to explicit constructors.
Of course, `make_values` facilitates that.
(I'd keep the constructors explicit due to the fact that `mysql::value`
     has some surprising properties that make it bad idea to use as incidental
     type (ref strings and non-obvious relational ops, to name two))
12. Seeing that Boost Mysql is header-only, what happens with the text
     segments (e.g. the error descriptions)?  Is it worth providing alternative
     linkage options - conditional header-only compilation?
Note: I didn't measure anything here. It was just a thought that crossed
     my mind when I compared to the stock mysql client libs. (On the plus side,
     there's no per-thread `mysql_library_init` call anymore, yay!)
It's something on the roadmap - that's why the implementation is split in .ipp
files. https://github.com/anarthal/mysql/issues/23 tracks it.
...
13. tools/valgrind_suppressions.txt looks promising; I don't see where it is
     used. Is it up-to-date?
If you look into cmake/test_utils.cmake, there is a function add_memcheck_test
that uses that file. All examples excepting the Boost.Coroutine one, as well
as a subset of integration tests, are memchecked. This is run in one CI job.
...
14. I'm not going to repeat some of the observations made by others, re.
     specific interfaces. I'm confident that these are going to be addressed.
I **do** want to provide counter-point for the recurring theme that
     `mysql::value`'s string element type should be owning. I disagree. To me
     the primary purpose of the `value` type is to be a lightweight view that
     has utility efficiently marshaling to/from the wire format.
To me it is more important to keep the 99% of use patterns efficient.
We can instead clearly document that `mysql::value` is not a vocabulary
     type (and why) and not well suited for use outside the context of IO
     operations.
Vinnie also made your same point, suggesting having an owning_value
as counterpart. The row type would expose the lightweight view,
and the user can then construct the owning_value from it.
https://github.com/anarthal/mysql/issues/85 tracks it.
...
Thanks Rubén for putting your considerable skill and time into this, and for
being very responsive.
Thanks.

Re: [boost] Boost MySQL Review

Ruben Perez