On Thu, 19 May 2022 at 20:47, Julien Blanc via Boost
My (small) review of Boost MySQL.
Hi Julien, thanks for taking your time to write your review.
First, i want to apologize for this not to be a real review. I just could not find the time to make a proper one, so this is just some thoughts that i want to share in the hope they are useful.
======== Are you knowledgeable about the problem domain? ========
Not really. I've used mySQL in the far past, not from C++, and more recently used postgreSQL (via libpq) and sqlite3. I'm not unfamiliar with asio, but never used it outside very small projects.
======== How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? ========
A quick reading. My review is solely based on reading the documentation (which, by the way, is pretty good).
======== Will the library bring additional out-of-the-box utility to Boost? ========
That's a great YES. Definitely useful. I lacked such a library a few years ago (was for postgreSQL, but the issue is the same).
Glad to hear.
======== What is your evaluation of the implementation? ========
Can't tell, i did not look at it.
======== What is your evaluation of the documentation? ========
Pretty good. I stumbled several times while navigating into a « This Page Intentionally Left Blank » page. I have no doubt that this will be fixed. Just a few issues i noted :
Could you please open an issue against the repository pointing out where you found these?
* the documentation on encodings/collation could probably be improved. From the current state of the documentation, if i have a table with three columns, one in utf8, the other one in utf-16, and the last one in iso8859-15, i'm not sure what the returned string_view contents will be. Is this the raw data ? And if i'm filtering in a WHERE clause, what is the expected encoding of the values ? The connection encoding or the field encoding ? Is this different between prepared statements and litteral queries ? (i have the intuition that it is, from a quick look at MySQL doc).
The MySQL to C++ type reference briefly talks about this (https://anarthal.github.io/mysql/mysql/types.html). The string_view contents will be the raw bytes in all cases. In your example, the first string_view will be utf8 encoded, the second utf16 encoded, and the third one will be in iso8859-15. The WHERE clause is part of the SQL statement you're issuing, so it should be encoded using the connection's encoding (as given by connection_params::collation). I agree the documentation on this topic is scarce, and it's a common source of confusion for users. I've raised https://github.com/anarthal/mysql/issues/94 to tackle this.
* the same goes for date storage and time zone handling. I can live with the fact that the library only handles UTC dates, if that's correctly handled. I have had very bad experiences with postgreSQL trying to be smart and causing more pain than good when handling timezones. I don't know how bad mysql is in this regard, but I'm pretty sure there are some pitfalls here as well. While i understand this may be a bit out of scope for the library, it does provides conversion between mysql raw data and a c++ time_point, so it should at least make sure that this time_point is an UTC one. Given that some mysql date/time types are local or encode a time zone, i expect some issues to raise here.
This is explained pretty thoroughly here: https://anarthal.github.io/mysql/mysql/types.html. MySQL DATETIME fields represent "naive" datetime objects (similar to Python's datetimes with tzinfo=None). It's up to the user to interpret which time zone they are in. MySQL TIMESTAMP fields also represent datetime objects. However, these are timezone aware, but the timezone is not stored with the value. When you insert a value into a TIMESTAMP column, MySQL converts the timestamp into UTC and stores it as UTC. When you query the field, by default, it will convert the timestamp back to your local timezone. "Your timezone", in this context, means the timezone given by the session-local time_zone variable (https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_...). The time_point object you get, in this case, is a local time (and NOT UTC), and the time zone is given by the time_zone variable. I haven't tried to be smart and convert to UTC because the time_zone variable can change programmatically, by issuing a `SET` SQL command, so its contents can't be tracked reliably. Full MySQL docs here: https://dev.mysql.com/doc/refman/8.0/en/datetime.html. It may be worth adding an example on how to work with these types. https://github.com/anarthal/mysql/issues/95 tracks it.
======== Will the choice of API abstraction model ease the development of software that must talk to a MySQL database? ========
I believe yes, however, i have a few concerns with the API :
* the value variant type has a get_optional member (and the get_std_optional variant). I would prefer that it follows the std variant get_if pattern. That would also solve the need to handle both std/boost optional.
These were chosen because of conversions. So the following works:
value v (uint64_t(200));
auto opt = v.get
* i'm not at ease with the fact that value get and get_optional handles the same way a NULL value and a type mismatch. I would like to be able to write get
(), and get an empty optional if value is NULL, and an exception if value is a float. get_if would cause implementation issues, but i think it's just not worth it and we can live without (a failure in that case means a database schema and program mistake, so an exception is fine here). Since the user would be in charge of providing the optional type, that may also solve the issue of handling both optional types.
Thanks for offering a possible solution to this problem, as many have flagged it already. I've updated https://github.com/anarthal/mysql/issues/85 with your proposal.
* for resultset, the async_read_all/many completion handlers takes a vector<row>. I would prefer if it were an opaque iterable type (whose iteration element is a row). This would probably gives far more room for further optimizations without breaking user code. * similarly, the row type exposes the fields via a vector of value. I would prefer if it would itself be iterable (iteration element being a value). IMHO it should also be "visitable", ie something like row.for_each(visitor), visitor being a class with overloaded either operator()(int field_index, int/float/string_view/...) or operator()(string_view field_name, int/float/string_view/...). That would also allow gives more room
This will be addressed as part of https://github.com/anarthal/mysql/issues/58, which will be tackled before inclusion into Boost. I like your idea, and other users have proposed that I use custom ranges here, too, which I will likely do. I'm not convinced of the row visiting mechanism - I would say this is mixing concerns. I would say the rows should offer a mechanism to access the values (e.g. by index or iteration), and then the values should be visitable.
* the library define its own « days » type, and explicitly states that it may not be the same as c++20 days. I may have missed the reason (they seems equally defined), but i would like it to use c++20 days when compiled under a c++20 compiler.
std::chrono::days doesn't specify which integer type is used to represent the duration, it only states that it is signed. I used plain int, which fits the purpose, but may be different to std's. The only possibility here would be to detect if std::chrono::days is available and define boost::mysql::days differently depending on whether it is or it is not. But this can then cause ODR violation issues if a user uses the library under two different C++ standard levels (possibly without even knowing it, because of a dependency). That's why this is defined like this.
======== Are there any immediate improvements that could be made after acceptance, if acceptance should happen? ========
I think the API can be improved. I think some API changes would allow some internal improvements. However, the library in its current state seems already useful. The overall impression from reading other reviews is that the quality of the implementation is very good.
My opinion is thus that the library should be ACCEPTED, but it should be made quite clear that the library is still in a state where some API changes can occur, and any form of source compatibility may be broken by update.
Thanks. I will try to address as many of these changes before it is available to the general public.
And great thanks to Ruben for writing this library, and taking the time to get it into boost.
Thank you for your time.
Regards,
Julien
Regards, Ruben.