Hi Seth,
Thank you for taking your time to write a review.
On Wed, 18 May 2022 at 04:43, Seth via Boost
My review of Boost Mysql
I'll loosely go by the guidelines [here](https://www.boost.org/community/reviews.html#Comments)
# What is your evaluation of the potential usefulness of the library?
Highly useful.
In fact, my feeling was mostly "where was this library back in 2016". Back then we had to postpone necessary refactoring with big scalability benefits because I wasn't ready to do the async initiation function myself, or figure out how the mysql native client wants to do asynchrony.
Glad to hear.
# What is your evaluation of the design?
It is largely what I'd expect.
I might consider dropping the `*read_many` interfaces which has been been subject of discussion (mainly for returning `vector<row>`). I guess they could also be trivially improved by taking a reference to any mutable range of `row` objects instead (obviating the need to specify a count).
In my use, I find a "cursor-style" approach to consuming results (reusing the row buffer) the most natural way to async consume result-sets. Hence `read_one()` and perhaps `read_all()` would be the only interfaces I like to see.
They will likely disappear or be changed somehow to allow memory re-use. Tracked by https://github.com/anarthal/mysql/issues/58.
I'll probably have some more notes below.
# What is your evaluation of the implementation?
Pleasantly surprised.
A good example is when I zoomed in on the "generic execute" algorithm implementation. It looks completely readable, which is quite an accomplishment considering the domain.
Little cues picked up elsewhere (e.g. in tests) confirmed the impression.
This will positively affect both technical the maintainability and chances of community interest to support that.
Also, I think given a clean state of the implementation, the library could serve as a inspiration for people wishing to implement other protocols.
Thanks. Glad to hear.
# What is your evaluation of the documentation?
The documentation gave me a good idea of what the library does and doesn't do.
I'm not convinced of the "user-friendliness" angle. That's fine for me. It seems obvious that a library that implements a third-party wire-protocol from scratch is not focusing on high-level abstraction. It is good to manage the expectations in the documentation, though.
It's something that has definitely caused confusion during the reviews, so it may be worth rephrasing the docs to clearly state that this is a protocol library which can't be as user-friendly as a higher level one. https://github.com/anarthal/mysql/issues/70 tracks it.
I used GCC 10 on linux with CMake. I found I had to add the include path, which doesn't seem mentioned in the documentation:
INCLUDE_DIRECTORIES(${CMAKE_SOURCE_DIR}/deps/mysql/include/)
The docs (mysql.intro.requirements) gave me the impression that nothing should have been required beyond:
TARGET_LINK_LIBRARIES(reviewtest Boost::mysql)
This shouldn't be the case at all. The second line should be sufficient. Could you please open an issue against the repo and post your CMakeLists.txt, please?
# Do you think the library should be accepted as a Boost library?
I vote for acceptance. I think the library has a good foundation, focus and implementation quality.
There are areas for improvements. Some of these will involve breaking interface changes, which would be best addressed before acceptance.
Thanks. I will address all these potentially breaking changes before getting the library into Boost (could take a while but necessary).
Notes from review process:
1. The docs/examples need to show common usage patterns like INSERT with auto-increment fields. E.g. nothing the use of `last_insert_id()`.
Seems fair. I've raised https://github.com/anarthal/mysql/issues/87 to track it.
2. The relational operations on mysql::value have technical meaning only. While the docs call this out, it forms a major pitfall.
struct MyEntity { unsigned id; } entity{42};
try { conn.query("CREATE TABLE MyEntity AS SELECT 42 as id"); } catch (...) { } auto actual = conn.query("SELECT id from MyEntity").read_all().front().values().front();
M::value expected(entity.id); std::cout << actual << " == " << expected << "? " << std::boolalpha << (actual == expected) << "\n";
I'd say we need to remove the relational operations whole-sale OR implement them with Mysql semantics
SELECT CAST(42 AS unsigned) = 42; -- MySQL: TRUE
If the comparisons are required for internal purposes, they can be provided with custom comparator function objects.
The only reason they are there is to allow values to be used in std::set, std::unordered_set and the like. As a user, I'd also find useful operator==, at least, to know "is the type and value of these two values the same"? The implementation doesn't rely on these operators. Tests do, but can be changed easily. Implementing MySQL semantics implies the following: value("1") == value(1) // true value(1.0) == value(1) // true Which reminds me of the kind of patterns that have brought us pain in PHP and JS, and would prefer to avoid. Do you think "placing values in containers" is something that adds value?
3. The documentation claim "Values do not define `std::hash` yet, as `boost::string_view` does not support it" is out of date. `hash_value` has been added in 1.69.0
I can see a definition of hash_value, and from what I've read this makes it compatible with Boost containers, but the std::hash specializations are in a #ifdef 0 block. AFAIR std::hash doesn't use hash_value, and std::unordered_setboost::string_view fails to compile. The same applies for std::unordered_setboost::mysql::value.
4. On that note, consider `boost::core::string_view` for better interop ("lingua franca" string_view). If you're interested I just did an exhaustive side-by-side comparison of features and observable behaviour` which I can share.
I wasn't aware of this class. Doing a quick grep in Beast, I've found it does
use it, but it seems located in a detail/ folder
(
5. Others have mentioned multi-line queries. I'm wondering whether they add much seeing that sessions are already stateful:
conn.query("set @greeting = 0x68656c6c6f20776f726c64;");
std::cout << "retained: " << conn.query("SELECT @greeting;").read_all()[0].values()[0] << '\n'; std::cout << "retained: " << conn.query("SELECT @greeting;").read_all()[0].values()[0] << '\n';
conn.query("set @greeting = 0x62796520776f726c64;"); std::cout << "modified: " << conn.query("SELECT @greeting;").read_all()[0].values()[0] << '\n';
prints
retained: hello world retained: hello world modified: bye world
In fact it could be enough when `channel<>` does any required queuing.
Unless queries are tiny and fragmented, there may not be a significant benefit. In fact, coalescing queries into a valid multi-line text form leads to allocation/copying overhead - and potential new safety risks.
Not following you here. What do you mean by channel<> doing any queueing? It does have an internal buffer to re-use memory usage, but it doesn't implement any sort of queuing. I'd say that the pipeline interface (https://github.com/anarthal/mysql/issues/76) may be a better interface than multi-statement for this use case.
6. The /one/ area where multi-line can shine is when sourcing hard-coded scripts, which begs the question: should a `connection::source(path/sqlscript)` feature be added instead?
Agree. Raised https://github.com/anarthal/mysql/issues/88 to track it.
7. "Finally, note that `prepared_statement`'s destructor does not perform any server-side deallocation of the statement. This is because closing a statement involves a network operation that may block your code or fail"
This tells me that probably connections should contain a list of owned statement implementations that `prepared_statements` are a handle to.
This is important when connection pooling, so prepared statements can be cleared before returning a connection to the pool. Otherwise this is simply a resource leak, and that's a problem.
(Though some applications might prefer to give each connection in the pool a fixed set of statements. This doesn't suit all applications - heterogenous connection pools)
I'd say this is a problem specific to connection pooling and that should be implemented together with it (i.e. with https://github.com/anarthal/mysql/issues/19). When connections are closed, statements are de-allocated automatically, even if you didn't call prepared_statement::close explicitly (i.e. statements are local to the session established by the connection). So this only becomes a problem if you use the connection, prepare and execute some statements within that session, and then return the connection to the pool. I'd say to implement this as part of that issue. Note that prepared_statement's get allocated a server-side ID, which we expose via the prepared_statement::id() function. Something that could be useful for that use case is closing a statement without a prepared_statement object, just using its ID. I've noted all this in https://github.com/anarthal/mysql/issues/19 to prevent information from being lost.
8. [mysql.other_streams.connection] If Windows named pipes is a common transport, I'd say providing first class connect/quit support is in order
It is a supported transport - not aware of how many clients would be using it. There is two sides here: 1) Providing the relevant typedefs, docs and testing. This should be relatively straightforward. 2) Supporting the connection::connect and connection::close functions for named pipes. This involves opening a named pipe, which doesn't seem to have an async mode. Other than that, it should be doable.
9. Working with the Stream-parameterized type hierarchy is harder than necessary.
E.g. `connection` has no inner typedef for `prepared_statement`/`resultset` or even the `Stream` template argument. This leaves one with work-arounds like:
template <typename Conn> void my_function(Conn& conn) { using Stream = typename std::remove_reference
::type; using Stmt = M::prepared_statement<Stream>; In general interface types (arguments/return values of member functions) should be directly expressible in code, if not through nested typenames, then at least through traits.
Added the one for Stream as next_layer_type in develop. The other ones addressed in https://github.com/anarthal/mysql/issues/73 (see below).
10. I did notice that in the integration tests there is a complete set of type-erased wrappers for resultset, connection, prepared_statement etc. Perhaps it is worth considering exposing this for users who are looking for convenience over raw speed.
It could of course be that the type-erased wrappers have limitations (beyond efficiency) I'm not aware of.
They are more limited than the regular interface, as they only account for the use cases covered in the tests covering all the stream types. It should be possible to tidy them up and ship them to the general public, though. I hadn't really thought on this before. Something I noticed while implementing these erased types is that having prepared_statement and resultset be I/O objects makes the erased interface much harder. As I've mentioned in other review responses, I'm thinking on replacing the following signatures: prepared_statement::execute(const statement_params<ValueIterator>&) prepared_statement::close() resultset::read_one(row&) By functions in the connection: connection::execute_statement(const prepared_statement&) connection::close_statement(const prepared_statement&) connection::read_row(resultset&, row&) Where prepared_statement and resultset would be non-templated types. What do you think of this API?
11. A loose observation: `initializer_listmysql::value` is not practical due to explicit constructors.
Of course, `make_values` facilitates that.
(I'd keep the constructors explicit due to the fact that `mysql::value` has some surprising properties that make it bad idea to use as incidental type (ref strings and non-obvious relational ops, to name two))
12. Seeing that Boost Mysql is header-only, what happens with the text segments (e.g. the error descriptions)? Is it worth providing alternative linkage options - conditional header-only compilation?
Note: I didn't measure anything here. It was just a thought that crossed my mind when I compared to the stock mysql client libs. (On the plus side, there's no per-thread `mysql_library_init` call anymore, yay!)
It's something on the roadmap - that's why the implementation is split in .ipp files. https://github.com/anarthal/mysql/issues/23 tracks it.
13. tools/valgrind_suppressions.txt looks promising; I don't see where it is used. Is it up-to-date?
If you look into cmake/test_utils.cmake, there is a function add_memcheck_test that uses that file. All examples excepting the Boost.Coroutine one, as well as a subset of integration tests, are memchecked. This is run in one CI job.
14. I'm not going to repeat some of the observations made by others, re. specific interfaces. I'm confident that these are going to be addressed.
I **do** want to provide counter-point for the recurring theme that `mysql::value`'s string element type should be owning. I disagree. To me the primary purpose of the `value` type is to be a lightweight view that has utility efficiently marshaling to/from the wire format.
To me it is more important to keep the 99% of use patterns efficient.
We can instead clearly document that `mysql::value` is not a vocabulary type (and why) and not well suited for use outside the context of IO operations.
Vinnie also made your same point, suggesting having an owning_value as counterpart. The row type would expose the lightweight view, and the user can then construct the owning_value from it. https://github.com/anarthal/mysql/issues/85 tracks it.
Thanks Rubén for putting your considerable skill and time into this, and for being very responsive.
Thanks.