Re: [boost] Boost MySQL Review

19 May 2022

      On Thu, 19 May 2022 at 20:47, Julien Blanc via Boost
<boost@lists.boost.org> wrote:
...
My (small) review of Boost MySQL.
Hi Julien, thanks for taking your time to write your review.
...
First, i want to apologize for this not to be a real review. I just
could not find the time to make a proper one, so this is just some
thoughts that i want to share in the hope they are useful.
========
Are you knowledgeable about the problem domain?
========
Not really. I've used mySQL in the far past, not from C++, and more
recently used postgreSQL (via libpq) and sqlite3. I'm not unfamiliar
with asio, but never used it outside very small projects.
========
How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?
========
A quick reading. My review is solely based on reading the documentation
(which, by the way, is pretty good).
========
Will the library bring additional out-of-the-box utility to Boost?
========
That's a great YES. Definitely useful. I lacked such a library a few
years ago (was for postgreSQL, but the issue is the same).
Glad to hear.
...
========
What is your evaluation of the implementation?
========
Can't tell, i did not look at it.
========
What is your evaluation of the documentation?
========
Pretty good. I stumbled several times while navigating into a « This
Page Intentionally Left Blank » page. I have no doubt that this will be
fixed. Just a few issues i noted :
Could you please open an issue against the repository
pointing out where you found these?
...
* the documentation on encodings/collation could probably be improved.
From the current state of the documentation, if i have a table with
three columns, one in utf8, the other one in utf-16, and the last one
in iso8859-15, i'm not sure what the returned string_view contents will
be. Is this the raw data ? And if i'm filtering in a WHERE clause, what
is the expected encoding of the values ? The connection encoding or the
field encoding ? Is this different between prepared statements and
litteral queries ? (i have the intuition that it is, from a quick look
at MySQL doc).
The MySQL to C++ type reference briefly talks about this
(https://anarthal.github.io/mysql/mysql/types.html). The string_view
contents will be the raw bytes in all cases. In your example,
the first string_view will be utf8 encoded, the second utf16 encoded,
and the third one will be in iso8859-15. The WHERE clause is part
of the SQL statement you're issuing, so it should be encoded using
the connection's encoding (as given by connection_params::collation).

I agree the documentation on this topic is scarce, and it's a common
source of confusion for users. I've raised
https://github.com/anarthal/mysql/issues/94 to tackle this.
...
* the same goes for date storage and time zone handling. I can live
with the fact that the library only handles UTC dates, if that's
correctly handled. I have had very bad experiences with postgreSQL
trying to be smart and causing more pain than good when handling
timezones. I don't know how bad mysql is in this regard, but I'm pretty
sure there are some pitfalls here as well. While i understand this may
be a bit out of scope for the library, it does provides conversion
between mysql raw data and a c++ time_point, so it should at least make
sure that this time_point is an UTC one. Given that some mysql
date/time types are local or encode a time zone, i expect some issues
to raise here.
This is explained pretty thoroughly here:
https://anarthal.github.io/mysql/mysql/types.html.

MySQL DATETIME fields represent "naive" datetime objects (similar
to Python's datetimes with tzinfo=None). It's up to the user to interpret
which time zone they are in.

MySQL TIMESTAMP fields also represent datetime objects. However,
these are timezone aware, but the timezone is not stored with the value.
When you insert a value into a TIMESTAMP column, MySQL converts
the timestamp into UTC and stores it as UTC. When you query the field,
by default, it will convert the timestamp back to your local timezone.
"Your timezone", in this context, means the timezone given by the
session-local time_zone variable
(https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_...).
The time_point object you get, in this case, is a local time (and NOT UTC),
and the time zone is given by the time_zone variable. I haven't tried to be
smart and convert to UTC because the time_zone variable can change
programmatically, by issuing a `SET` SQL command, so its contents
can't be tracked reliably.

Full MySQL docs here: https://dev.mysql.com/doc/refman/8.0/en/datetime.html.

It may be worth adding an example on how to work with these types.
https://github.com/anarthal/mysql/issues/95 tracks it.
...
========
Will the choice of API abstraction model ease the development of
software that must talk to a MySQL database?
========
I believe yes, however, i have a few concerns with the API :
* the value variant type has a get_optional member (and the
get_std_optional variant). I would prefer that it follows the std
variant get_if pattern. That would also solve the need to handle both
std/boost optional.
These were chosen because of conversions. So the following works:

value v (uint64_t(200));
auto opt = v.get<int64_t>(); // This will perform a conversion because
200 fits in a int64_t

I need to reconsider this because many of you have raised it.
A get_if interface means that these conversions must go away
(or be reduced to just value::get).
...
* i'm not at ease with the fact that value get and get_optional handles
the same way a NULL value and a type mismatch. I would like to be able
to write get<optional<int>>(), and get an empty optional if value is
NULL, and an exception if value is a float. get_if<optional<int>> would
cause implementation issues, but i think it's just not worth it and we
can live without (a failure in that case means a database schema and
program mistake, so an exception is fine here). Since the user would be
in charge of providing the optional type, that may also solve the issue
of handling both optional types.
Thanks for offering a possible solution to this problem, as many have flagged it
already. I've updated https://github.com/anarthal/mysql/issues/85
with your proposal.
...
* for resultset, the async_read_all/many completion handlers takes a
vector<row>. I would prefer if it were an opaque iterable type (whose
iteration element is a row). This would probably gives far more room
for further optimizations without breaking user code.
* similarly, the row type exposes the fields via a vector of value. I
would prefer if it would itself be iterable (iteration element being a
value). IMHO it should also be "visitable", ie something like
row.for_each(visitor), visitor being a class with overloaded either
operator()(int field_index, int/float/string_view/...) or
operator()(string_view field_name, int/float/string_view/...). That
would also allow gives more room
This will be addressed as part of https://github.com/anarthal/mysql/issues/58,
which will be tackled before inclusion into Boost. I like your idea,
and other users have proposed that I use custom ranges here, too,
which I will likely do.

I'm not convinced of the row visiting mechanism - I would say this is
mixing concerns. I would say the rows should offer a mechanism to access
the values (e.g. by index or iteration), and then the values should be
visitable.
...
* the library define its own « days » type, and explicitly states that
it may not be the same as c++20 days. I may have missed the reason
(they seems equally defined), but i would like it to use c++20 days
when compiled under a c++20 compiler.
std::chrono::days doesn't specify which integer type is used to
represent the duration, it only states that it is signed. I used plain
int, which fits the purpose, but may be different to std's. The only
possibility here would be to detect if std::chrono::days is available
and define boost::mysql::days differently depending on whether it
is or it is not. But this can then cause ODR violation issues if
a user uses the library under two different C++ standard levels
(possibly without even knowing it, because of a dependency).
That's why this is defined like this.
...
========
Are there any immediate improvements that could be made after
acceptance, if acceptance should happen?
========
I think the API can be improved. I think some API changes would allow
some internal improvements. However, the library in its current state
seems already useful. The overall impression from reading other reviews
is that the quality of the implementation is very good.
My opinion is thus that the library should be ACCEPTED, but it should
be made quite clear that the library is still in a state where some API
changes can occur, and any form of source compatibility may be broken
by update.
Thanks. I will try to address as many of these changes before it is
available to the general public.
...
And great thanks to Ruben for writing this library, and taking the time
to get it into boost.
Thank you for your time.
...
Regards,
Julien
Regards,
Ruben.

Re: [boost] Boost MySQL Review

Ruben Perez