Boost MySQL Review Starts Today

Richard Hodges

10 May 2022 10 May '22

7:14 a.m.

Dear All, The Boost formal review of the MySQL library starts Today, taking place from May 10th, 2022 to May 19th, 2022 (inclusive) - We are starting one day after the announced date and extending the period by one day to compensate. The library is authored by Rubén Pérez Hidalgo (@anarthal in the CppLang slack). Documentation: https://anarthal.github.io/mysql/index.html Source: https://github.com/anarthal/mysql/ The library is built on the bedrock of Boost.Asio and provides both synchronous and asynchronous client connectors for the MySQL database system. Boost.MySQL is written from the ground up, implementing the entire protocol with no external dependencies beyond the Boost library. It is compatible with MariaDB. Connectivity options include TCP, SSL and Unix Sockets. For async interfaces, examples in the documentation demonstrate full compatibility with all Asio completion handler styles, including: Callbacks:- https://anarthal.github.io/mysql/mysql/examples/query_async_callbacks.html Futures :- https://anarthal.github.io/mysql/mysql/examples/query_async_futures.html Boost.Coroutine :- https://anarthal.github.io/mysql/mysql/examples/query_async_coroutines.html C++20 Coroutines :- https://anarthal.github.io/mysql/mysql/examples/query_async_coroutinescpp20.... Rubén has also implemented the Asio protocols for deducing default completion token types :- https://anarthal.github.io/mysql/mysql/examples/default_completion_tokens.ht... Reviewing a database connector in depth will require setting up an instance of a MySQL database. Fortunately most (all?) Linux distributions carry a MySQL and/or MariaDB package. MySQL community edition is available for download on all platforms here: https://dev.mysql.com/downloads/ Rubén has spent quite some time in order to bring us this library candidate. The development process has no doubt been a journey of discovery into Asio, its concepts and inner workings. I am sure he has become a fount of knowledge along the way.

...

From a personal perspective, I was very happy to be asked to manage this review. I hope it will be the first of many more reviews of libraries that tackle business connectivity problems without further dependencies beyond Boost, arguably one of the most trusted foundation libraries available.

Please provide in your review information you think is valuable to understand your choice to ACCEPT or REJECT including Describe as a Boost library. Please be explicit about your decision (ACCEPT or REJECT). Some other questions you might want to consider answering: - Will the library bring additional out-of-the-box utility to Boost? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - Will the choice of API abstraction model ease the development of software that must talk to a MySQL database? - Are there any immediate improvements that could be made after acceptance, if acceptance should happen? - Did you try to use the library? With which compiler(s)? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain? More information about the Boost Formal Review Process can be found at: http://www.boost.org/community/reviews.html The review is open to anyone who is prepared to put in the work of evaluating and reviewing the library. Prior experience in contributing to Boost reviews is not a requirement. Thank you for your efforts in the Boost community. They are very much appreciated. Richard Hodges - review manager of the proposed Boost.MySQL library Rubén is often available on CppLang Slack and of course by email should you require any clarification not covered by the documentation, as am I. -- Richard Hodges hodges.r@gmail.com tg: @rhodges office: +44 2032 898 513 mobile: +376 380 212

Show replies by date

Rainer Deyke

10 May 10 May

2:59 p.m.

On 10.05.22 09:14, Richard Hodges via Boost wrote:

...

The Boost formal review of the MySQL library starts Today, taking place from May 10th, 2022 to May 19th, 2022 (inclusive) - We are starting one day after the announced date and extending the period by one day to compensate. I took a quick look, and my first impression is that the library doesn't do enough to prevent SQL injection attacks. Yes, text queries are convenient when the full query is known at compile-time. Yes, security is ultimately the responsibility of those who use the API. Yes, this is C++, where far worse security flaws are a constant threat. Even so, connection::query gives me shivers.

-- Rainer Deyke (rainerd@eldwood.com)

Ruben Perez

8:14 p.m.

On Tue, 10 May 2022 at 17:00, Rainer Deyke via Boost <boost@lists.boost.org> wrote:

...

On 10.05.22 09:14, Richard Hodges via Boost wrote:

...
The Boost formal review of the MySQL library starts Today, taking place from May 10th, 2022 to May 19th, 2022 (inclusive) - We are starting one day after the announced date and extending the period by one day to compensate. I took a quick look, and my first impression is that the library doesn't do enough to prevent SQL injection attacks. Yes, text queries are convenient when the full query is known at compile-time. Yes, security is ultimately the responsibility of those who use the API. Yes, this is C++, where far worse security flaws are a constant threat. Even so, connection::query gives me shivers.

Hi Rainer, The library exposes the MySQL client/server protocol primitives as they are. There are two of these protocol primitives: text queries (implemented by connection::query) and prepared statements (implemented by connection::prepare_statement and the prepared_statement class). When using SQL queries with parameters, there are two options: you may compose a SQL query client-side and then execute it with connection::query, or prepare a statement and leave the server do the heavy-lifting. We could offer a SQL query composition function in this library, like: string compose_sql_query(string_view query_template, const ValueCollection& parameters); So you could then pass the result of compose_sql_query to connection::query. However, I think it's more secure (and efficient) to just rely on prepared statements. This section in the docs covers this topic: https://anarthal.github.io/mysql/mysql/prepared_statements.html To sum up, I'd use connection::query for queries known at compile-time (e.g. "START TRANSACTION"), and prepared statements for anything with parameters. Please let me know if there is any use case I may be missing. Regards, Ruben.

...

-- Rainer Deyke (rainerd@eldwood.com)

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Rainer Deyke

11 May 11 May

5:52 a.m.

On 10.05.22 16:59, Rainer Deyke via Boost wrote:

...

On 10.05.22 09:14, Richard Hodges via Boost wrote:

...
The Boost formal review of the MySQL library starts Today, taking place from May 10th, 2022 to May 19th, 2022 (inclusive) - We are starting one day after the announced date and extending the period by one day to compensate. I took a quick look, and my first impression is that the library doesn't do enough to prevent SQL injection attacks. Yes, text queries are convenient when the full query is known at compile-time. Yes, security is ultimately the responsibility of those who use the API. Yes, this is C++, where far worse security flaws are a constant threat. Even so, connection::query gives me shivers.

So, I've been thinking about what the library can do to prevent SQL injection attacks. Ideas: - connection::query can be renamed to connection::unsafe_query. - A big red warning about SQL injection attacks can be added to the documentation. - Syntax sugar for a one-off parametrized query wouldn't hurt either. - As a nuclear option, the query string can be changed into a template argument to prevent its use with strings that aren't known at compile-time. Unfortunately this would also prevent some valid uses of connection::query. If I were to design an API, I might use a combination of the above: // Allowed: con.query("SELECT * FROM employee WHERE company_id = ?", "HGS"); con.query<"SELECT * FROM employee WHERE company_id = 'HGS'">(); con.unsafe_query("SELECT * FROM employee WHERE company_id = 'HGS'"); // Error: literals in runtime-supplied query string not allowed. con.query("SELECT * FROM employee WHERE company_id = 'HGS'"); But I'm just throwing out ideas random here. -- Rainer Deyke (rainerd@eldwood.com)

Daniela Engert

6:26 a.m.

Am 11.05.2022 um 07:52 schrieb Rainer Deyke via Boost:

...

On 10.05.22 16:59, Rainer Deyke via Boost wrote:

...
...
The Boost formal review of the MySQL library starts Today, taking place from May 10th, 2022 to May 19th, 2022 (inclusive) - We are starting one day after the announced date and extending the period by one day to compensate. I took a quick look, and my first impression is that the library doesn't do enough to prevent SQL injection attacks. Yes, text queries are convenient when the full query is known at compile-time. Yes, security is ultimately the responsibility of those who use the API. Yes, this is C++, where far worse security flaws are a constant

On 10.05.22 09:14, Richard Hodges via Boost wrote: threat. Even so, connection::query gives me shivers.

So, I've been thinking about what the library can do to prevent SQL injection attacks. Ideas: - As a nuclear option, the query string can be changed into a template argument to prevent its use with strings that aren't known at compile-time. Unfortunately this would also prevent some valid uses of connection::query.

Instead of going *that* nuclear, there are better options now with compile-time string inspections. Instead of accepting a 'string_view' (or heavens forbid, 'string') as query string, do the same as the standard library (or {fmt}) does (please see P2216r3, C++23 and DR to C++20, and the {fmt} API documentation): * accept e.g. a boost::query_string object with a consteval-only constructor from a 'string_view' to enable compile-time inspection, similar to (not-yet-)std::/fmt::basic_format_string * and optionally, as a second overload, accept e.g. a boost::runtime_query_string which wraps a 'string_view' for queries which are not known at compile time, similar to fmt::runtime With that in place, you can do magic as I know from my own explorations of that design space. Ciao Dani

Ruben Perez

9:07 p.m.

On Wed, 11 May 2022 at 08:29, Daniela Engert via Boost <boost@lists.boost.org> wrote:

...

Am 11.05.2022 um 07:52 schrieb Rainer Deyke via Boost:

...
On 10.05.22 16:59, Rainer Deyke via Boost wrote:

...
...
The Boost formal review of the MySQL library starts Today, taking place from May 10th, 2022 to May 19th, 2022 (inclusive) - We are starting one day after the announced date and extending the period by one day to compensate. I took a quick look, and my first impression is that the library doesn't do enough to prevent SQL injection attacks. Yes, text queries are convenient when the full query is known at compile-time. Yes, security is ultimately the responsibility of those who use the API. Yes, this is C++, where far worse security flaws are a constant

On 10.05.22 09:14, Richard Hodges via Boost wrote: threat. Even so, connection::query gives me shivers.

So, I've been thinking about what the library can do to prevent SQL injection attacks. Ideas: - As a nuclear option, the query string can be changed into a template argument to prevent its use with strings that aren't known at compile-time. Unfortunately this would also prevent some valid uses of connection::query.

Instead of going *that* nuclear, there are better options now with compile-time string inspections. Instead of accepting a 'string_view' (or heavens forbid, 'string') as query string, do the same as the standard library (or {fmt}) does (please see P2216r3, C++23 and DR to C++20, and the {fmt} API documentation):

* accept e.g. a boost::query_string object with a consteval-only constructor from a 'string_view' to enable compile-time inspection, similar to (not-yet-)std::/fmt::basic_format_string

This library targets C++11 and higher. Can this be achieved realistically in C++11?

...

* and optionally, as a second overload, accept e.g. a boost::runtime_query_string which wraps a 'string_view' for queries which are not known at compile time, similar to fmt::runtime

With that in place, you can do magic as I know from my own explorations of that design space.

Ciao Dani

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Daniela Engert

12 May 12 May

5:09 a.m.

Am 11.05.2022 um 23:07 schrieb Ruben Perez:
> On Wed, 11 May 2022 at 08:29, Daniela Engert via Boost
> <boost@lists.boost.org> wrote:
>> Am 11.05.2022 um 07:52 schrieb Rainer Deyke via Boost:
>>> On 10.05.22 16:59, Rainer Deyke via Boost wrote:
>>>> On 10.05.22 09:14, Richard Hodges via Boost wrote:
>>>>> The Boost formal review of the MySQL library starts Today, taking place
>>>>> from May 10th, 2022 to May 19th, 2022 (inclusive) - We are starting
>>>>> one day
>>>>> after the announced date and extending the period by one day to
>>>>> compensate.
>>>> I took a quick look, and my first impression is that the library
>>>> doesn't do enough to prevent SQL injection attacks.  Yes, text
>>>> queries are convenient when the full query is known at compile-time.
>>>> Yes, security is ultimately the responsibility of those who use the
>>>> API.  Yes, this is C++, where far worse security flaws are a constant
>>>> threat.  Even so, connection::query gives me shivers.
>>> So, I've been thinking about what the library can do to prevent SQL
>>> injection attacks.  Ideas:
>>>    - As a nuclear option, the query string can be changed into a
>>> template argument to prevent its use with strings that aren't known at
>>> compile-time.  Unfortunately this would also prevent some valid uses
>>> of connection::query.
>> Instead of going *that* nuclear, there are better options now with
>> compile-time string inspections. Instead of accepting a 'string_view'
>> (or heavens forbid, 'string') as query string, do the same as the
>> standard library (or {fmt}) does (please see P2216r3, C++23 and DR to
>> C++20, and the {fmt} API documentation):
>>
>>     * accept e.g. a boost::query_string object with a consteval-only
>> constructor from a 'string_view' to enable compile-time inspection,
>> similar to (not-yet-)std::/fmt::basic_format_string
> This library targets C++11 and higher. Can this be achieved
> realistically in C++11?

I can't imagine a C++11 implementation of meaningful compile-time query 
checking. C++11 and 'constexpr' is like a toddler at doing maths.

Please don't get me wrong - I appreciate every effort to improve the 
library landscape. But if the bar is that low I have a hard time seeing 
the benefit of this library given that there's sql11 out there for years.

>>     * and optionally, as a second overload, accept e.g. a
>> boost::runtime_query_string which wraps a 'string_view' for queries
>> which are not known at compile time, similar to fmt::runtime
>>
>> With that in place, you can do magic as I know from my own explorations
>> of that design space.
>>
>> Ciao
>>     Dani
>>
>>
>> _______________________________________________
>> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Vinnie Falco

6:29 a.m.

On Wed, May 11, 2022 at 10:09 PM Daniela Engert via Boost <boost@lists.boost.org> wrote:

...

...if the bar is that low I have a hard time seeing the benefit of this library given that there's sql11 out there for years.

Are you referring to this? <https://github.com/rbock/sqlpp11> Thanks

Daniela Engert

6:32 a.m.

Am 12.05.2022 um 08:29 schrieb Vinnie Falco:

...

On Wed, May 11, 2022 at 10:09 PM Daniela Engert via Boost <boost@lists.boost.org> wrote:

...
...if the bar is that low I have a hard time seeing the benefit of this library given that there's sql11 out there for years. Are you referring to this?

<https://github.com/rbock/sqlpp11>

Yes, I do. Ciao Dani

Ruben Perez

10:50 a.m.

...

I can't imagine a C++11 implementation of meaningful compile-time query checking. C++11 and 'constexpr' is like a toddler at doing maths.

Please don't get me wrong - I appreciate every effort to improve the library landscape. But if the bar is that low I have a hard time seeing the benefit of this library given that there's sql11 out there for years.

The main benefit of this library is providing the user an async API (plus being compatible with any user code already using Asio). The licensing also differs. sqlpp11 is a different kind of library: higher abstraction level, provides uniform access to different SQL flavors. It does not provide any async API though. If it did, Boost.MySQL could be used by sqlpp11 as a backend to access MySQL. What I mean is that I don't think that higher level libraries (like sqlpp11 or any ORM) are competitors, but can build on top of this library, instead.

Ruben Perez

11 May 11 May

9:06 p.m.

On Wed, 11 May 2022 at 07:52, Rainer Deyke via Boost <boost@lists.boost.org> wrote:

...

On 10.05.22 16:59, Rainer Deyke via Boost wrote:

...
On 10.05.22 09:14, Richard Hodges via Boost wrote:

...
The Boost formal review of the MySQL library starts Today, taking place from May 10th, 2022 to May 19th, 2022 (inclusive) - We are starting one day after the announced date and extending the period by one day to compensate. I took a quick look, and my first impression is that the library doesn't do enough to prevent SQL injection attacks. Yes, text queries are convenient when the full query is known at compile-time. Yes, security is ultimately the responsibility of those who use the API. Yes, this is C++, where far worse security flaws are a constant threat. Even so, connection::query gives me shivers.

So, I've been thinking about what the library can do to prevent SQL injection attacks. Ideas: - connection::query can be renamed to connection::unsafe_query.

That reminds me of React naming. It inspires me "don't use this function unless you're really knowledgeable of the library's internals and you know what you are doing". I don't think that's the case. For example, if you're using transactions, you'll be calling `conn.query("COMMIT")`. I'm not very keen on making this `conn.unsafe_query("COMMIT")`.

...

- A big red warning about SQL injection attacks can be added to the documentation.

I've raised https://github.com/anarthal/mysql/issues/66 to address it.

...

- Syntax sugar for a one-off parametrized query wouldn't hurt either.

This requires a decent amount of work, as it requires implementing SQL sanitizing client-side. I'm not very keen on it, as it's very possible to get it wrong and end up introducing a vulnerability that wouldn't have existed with prepared statements. I can have a look at how complex would this be if the community thinks it really adds a lot of value.

...

- As a nuclear option, the query string can be changed into a template argument to prevent its use with strings that aren't known at compile-time. Unfortunately this would also prevent some valid uses of connection::query.

As you already say, I think it's too nuclear - this is C++ and I think we should treat our users as adults.

...

If I were to design an API, I might use a combination of the above:

// Allowed: con.query("SELECT * FROM employee WHERE company_id = ?", "HGS"); con.query<"SELECT * FROM employee WHERE company_id = 'HGS'">(); con.unsafe_query("SELECT * FROM employee WHERE company_id = 'HGS'");

// Error: literals in runtime-supplied query string not allowed. con.query("SELECT * FROM employee WHERE company_id = 'HGS'");

But I'm just throwing out ideas random here.

-- Rainer Deyke (rainerd@eldwood.com)

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Rainer Deyke

12 May 12 May

5:59 a.m.

On 11.05.22 23:06, Ruben Perez via Boost wrote:

...

On Wed, 11 May 2022 at 07:52, Rainer Deyke via Boost <boost@lists.boost.org> wrote:

...
On 10.05.22 16:59, Rainer Deyke via Boost wrote:

...
I took a quick look, and my first impression is that the library doesn't do enough to prevent SQL injection attacks. Yes, text queries are convenient when the full query is known at compile-time. Yes, security is ultimately the responsibility of those who use the API. Yes, this is C++, where far worse security flaws are a constant threat. Even so, connection::query gives me shivers.

So, I've been thinking about what the library can do to prevent SQL injection attacks. Ideas: - connection::query can be renamed to connection::unsafe_query.

That reminds me of React naming. It inspires me "don't use this function unless you're really knowledgeable of the library's internals and you know what you are doing". I don't think that's the case. For example, if you're using transactions, you'll be calling `conn.query("COMMIT")`. I'm not very keen on making this `conn.unsafe_query("COMMIT")`.

Two possible workarounds: - Allow conn.query("COMMIT") but disallow conn.query("SELECT * FROM employee WHERE company_id = 'HGS'"), somehow. - Provide conn.commit() which calls conn.unsafe_query("COMMIT"). I'm not really too fond of either of them.

...

...
- Syntax sugar for a one-off parametrized query wouldn't hurt either.

This requires a decent amount of work, as it requires implementing SQL sanitizing client-side. I'm not very keen on it, as it's very possible to get it wrong and end up introducing a vulnerability that wouldn't have existed with prepared statements. I can have a look at how complex would this be if the community thinks it really adds a lot of value.

What if this worked by creating a prepared statement behind the scenes? The idea of a one-off query is that it is only used once per program invocation, so the performance cost of constructing a prepared statement should be negligible. -- Rainer Deyke (rainerd@eldwood.com)

Ruben Perez

10:57 a.m.

On Thu, 12 May 2022 at 07:59, Rainer Deyke via Boost <boost@lists.boost.org> wrote:

...

Two possible workarounds: - Allow conn.query("COMMIT") but disallow conn.query("SELECT * FROM employee WHERE company_id = 'HGS'"), somehow. - Provide conn.commit() which calls conn.unsafe_query("COMMIT"). I'm not really too fond of either of them.

I can see the second one. It's actually done this way in the Python DB2 API. Raised https://github.com/anarthal/mysql/issues/68 to track it.

...

...
...
- Syntax sugar for a one-off parametrized query wouldn't hurt either.

This requires a decent amount of work, as it requires implementing SQL sanitizing client-side. I'm not very keen on it, as it's very possible to get it wrong and end up introducing a vulnerability that wouldn't have existed with prepared statements. I can have a look at how complex would this be if the community thinks it really adds a lot of value.

What if this worked by creating a prepared statement behind the scenes? The idea of a one-off query is that it is only used once per program invocation, so the performance cost of constructing a prepared statement should be negligible.

I like the idea. Raised https://github.com/anarthal/mysql/issues/69 to track it.

...

-- Rainer Deyke (rainerd@eldwood.com)

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Phil Endecott

10 May 10 May

4:28 p.m.

Richard Hodges wrote:

...

Reviewing a database connector in depth will require setting up an instance of a MySQL database.

Here is how to set up a MySQL-compatible database in Amazon Web Services, in case it is useful to anyone: - Go to the AWS console and log on. Yes, you need an account. - Select an appropriate region. - Select "RDS" (Relational Database Service) from Services. - Select "Create database". - You have a choice of MySQL, MariaDB, and Amazon Aurora MySQL-compatible edition. I'm using Aurora. Subsequent settings may depend on the variant chosen. - I suggest turning on "Show versions that support Serverless v2" and choosing the one available version, currently "Aurora MySQL 3.02.0 (compatible with MySQL 8.0.23)". (Having said that, I don't think this qualifies for the free tier.) Skipping many settings where the defaults are OK and/or the choice is obvious... - In "Instance configuration", choose "Serverless" and set min and max memory sizes (I'm trying 0.5 and 2). - In "Connectivity", you can either choose "Public access - yes" in which case you'll be able to connect from a client outside AWS, or "Public access - no" and a VPC/subnet, in which case you'll be able to connect only from within AWS, e.g. from an EC2 instance. - Note that the "VPC security group" is essentially the firewall rules for the database. Choosing "create new" will open the default port 3306. Take care with this, it may have granted access only to the IP address of my web browser when I tried it just now. - I don't suggest letting it generate the password automatically, I don't think it told me what it was :-( You can change it later. - Click "Create database". - You should now see a simple tree of instances with a parent "Regional Cluster" with a single child "Writer Instance". Wait until both show Status Available. - Copy the "Endpoint name" for the "Writer instance". - On your client, try to connect using the endpoint name as the hostname: $ mysql -h xyz1234.region.rds.amazonaws.com -P 3306 -u admin Enter password: MySQL> (Beware of -p vs. -P on the command line there...) Now to see if I can make this library communicate with it! Regards, Phil.

Ruben Perez

8:19 p.m.

On Tue, 10 May 2022 at 18:29, Phil Endecott via Boost <boost@lists.boost.org> wrote:

...

Richard Hodges wrote:

...
Reviewing a database connector in depth will require setting up an instance of a MySQL database.

Here is how to set up a MySQL-compatible database in Amazon Web Services, in case it is useful to anyone:

Hi Phil, Thank you so much for the explanation, I'm sure it will be useful. In case you have docker available in your local machine, you can also make use of the Docker containers used by this library's CIs. There is a page in the docs explaining how to use them: https://anarthal.github.io/mysql/mysql/examples/setup.html In case anyone wants to run the integration test suite that is run by the CI, I'd recommend using these containers, as they contain the appropriate configuration for the tests. If you are curious, there is a guide on how to run them in the docs, too: https://anarthal.github.io/mysql/mysql/tests.html Please note that integration tests are *NOT* run by default when building the library (only unit tests are). If you have any questions on using these containers or running the tests, please let me know.

...

- Go to the AWS console and log on. Yes, you need an account.

- Select an appropriate region.

- Select "RDS" (Relational Database Service) from Services.

- Select "Create database".

- You have a choice of MySQL, MariaDB, and Amazon Aurora MySQL-compatible edition. I'm using Aurora. Subsequent settings may depend on the variant chosen.

- I suggest turning on "Show versions that support Serverless v2" and choosing the one available version, currently "Aurora MySQL 3.02.0 (compatible with MySQL 8.0.23)". (Having said that, I don't think this qualifies for the free tier.)

Skipping many settings where the defaults are OK and/or the choice is obvious...

- In "Instance configuration", choose "Serverless" and set min and max memory sizes (I'm trying 0.5 and 2).

- In "Connectivity", you can either choose "Public access - yes" in which case you'll be able to connect from a client outside AWS, or "Public access - no" and a VPC/subnet, in which case you'll be able to connect only from within AWS, e.g. from an EC2 instance.

- Note that the "VPC security group" is essentially the firewall rules for the database. Choosing "create new" will open the default port 3306. Take care with this, it may have granted access only to the IP address of my web browser when I tried it just now.

- I don't suggest letting it generate the password automatically, I don't think it told me what it was :-( You can change it later.

- Click "Create database".

- You should now see a simple tree of instances with a parent "Regional Cluster" with a single child "Writer Instance". Wait until both show Status Available.

- Copy the "Endpoint name" for the "Writer instance".

- On your client, try to connect using the endpoint name as the hostname:

$ mysql -h xyz1234.region.rds.amazonaws.com -P 3306 -u admin Enter password: MySQL>

(Beware of -p vs. -P on the command line there...)

Now to see if I can make this library communicate with it!

Please let me know if you encounter any trouble.

...

Regards, Phil.

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Klemens Morgenstern

11 May 11 May

5:13 a.m.

On Tue, 2022-05-10 at 09:14 +0200, Richard Hodges via Boost wrote: Here's my review

...

Reviewing a database connector in depth will require setting up an instance of a MySQL database. Fortunately most (all?) Linux distributions carry a MySQL and/or MariaDB package. MySQL community edition is available for download on all platforms here: https://dev.mysql.com/downloads/

I've spun up a small test program https://gist.github.com/klemens-morgenstern/0195a502340bd33029437d33eb914f2c which worked very well & was intuitive given my asio & sql experience. This library should definitely be accepted into boost. A few things are odd (like the .async_handshake on the ssl-conn; I would have expected that to require .next_layer().async_handshake). The ssl option also seems to be a runtime parameter in many functions, which confused me a bit. Is that so I can use an ssl::stream<> but turn it off at runtime? If so, why is that needed, why wouldn't I just us a tcp::socket directly? I think returning objects through completion handlers is absolutely fine and Asio-ish; but the library will surely grow to include more overloads in the future.

...

Some other questions you might want to consider answering:

- Will the library bring additional out-of-the-box utility to Boost?

Yes.

...

- What is your evaluation of the implementation?

It looks to be based on async_compose & asio::coroutine, which is a proven pattern . In addition, I am impressed and happy that it supports cancellation. The prepared statements are easy to use, so that one can prevent injection attacks properly.

...

- What is your evaluation of the documentation?

It desribes the library quite well. I would like a section on transactions, because I feel like there might be some sharp edges there.

...

- Will the choice of API abstraction model ease the development of software that must talk to a MySQL database?

It's exactly the right amount of detail for what it is: a database connector. It's not an ORM nor some wrapper that tries to prevent users from doing stupid things (like forwarding stdin to the DB). Either of those problems should be solved in another library that could be built on top of boost.mysql.

...

- Are there any immediate improvements that could be made after acceptance, if acceptance should happen?

I think additional overloads, depending on user requests. I also wonder if there is some useful transaction support, but this largely depends on the mysql protocol. It could be that there is no point to that.

...

- Did you try to use the library? With which compiler(s)? Did you have any problems?

Clang-13 with libc++ and C++20.

...

- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

Build a test program and read the doc for ~1h.

...

- Are you knowledgeable about the problem domain?

Yes. I vote to accept the library into boost. PS: Isn't mysql a registered trademark owned by oracle?

Ruben Perez

8:44 p.m.

On Wed, 11 May 2022 at 07:13, Klemens Morgenstern via Boost <boost@lists.boost.org> wrote:

...

On Tue, 2022-05-10 at 09:14 +0200, Richard Hodges via Boost wrote:

Here's my review

...
Reviewing a database connector in depth will require setting up an instance of a MySQL database. Fortunately most (all?) Linux distributions carry a MySQL and/or MariaDB package. MySQL community edition is available for download on all platforms here: https://dev.mysql.com/downloads/

I've spun up a small test program https://gist.github.com/klemens-morgenstern/0195a502340bd33029437d33eb914f2c

which worked very well & was intuitive given my asio & sql experience.

This library should definitely be accepted into boost.

Thanks for your time writing your review and your positive comments.

...

A few things are odd (like the .async_handshake on the ssl-conn; I would have expected that to require .next_layer().async_handshake).

This is an unfortunate oddity of the MySQL client/server protocol. The protocol defines an initial, unencrypted set of messages exchanged by the client and the server, where they negotiate whether TLS will be used or not. Once this is decided, the TLS handshake is invoked. After that, the handshake operation continues with a set of encrypted messages until it either succeeds or fails. Having the user call .next_layer().async_handshake directly would make the handshake operation more complex; it would need to be split in two: bool requires_ssl = connection.handshake_part1(params); if (requires_ssl) connection.next_layer().handshake(); connection.handshake_part2(params); I thought the added complexity wasn't worth it.

...

The ssl option also seems to be a runtime parameter in many functions, which confused me a bit. Is that so I can use an ssl::stream<> but turn it off at runtime? If so, why is that needed, why wouldn't I just us a tcp::socket directly?

This option controls TLS negotiation. By specifying ssl_mode::enable, you allow the client to not use TLS if the server doesn't support it. The default, ssl_mode::require, will make the handshake fail if the server doesn't support TLS, which is the most secure option. ssl_mode::disable is provided for consistency with the official mysql command line client. It isn't really required, and if there is consensus that it causes more confusion than help, I'm happy to remove it. Docs: https://anarthal.github.io/mysql/mysql/connparams.html#mysql.connparams.ssl

...

I think returning objects through completion handlers is absolutely fine and Asio-ish; but the library will surely grow to include more overloads in the future.

...
Some other questions you might want to consider answering:

- Will the library bring additional out-of-the-box utility to Boost?

Yes.

...
- What is your evaluation of the implementation?

It looks to be based on async_compose & asio::coroutine, which is a proven pattern . In addition, I am impressed and happy that it supports cancellation.

Thanks, but that's really something you get by using async_compose, and the library doesn't provide special support for it.

...

The prepared statements are easy to use, so that one can prevent injection attacks properly.

...
- What is your evaluation of the documentation?

It desribes the library quite well. I would like a section on transactions, because I feel like there might be some sharp edges there.

AFAIK transactions are just managed using regular SQL statements, so using `connection::query` with those statements should be enough. You aren't the first user asking about it though, so I've raised https://github.com/anarthal/mysql/issues/65 to add docs/examples on it.

...

...
- Will the choice of API abstraction model ease the development of software that must talk to a MySQL database?

It's exactly the right amount of detail for what it is: a database connector. It's not an ORM nor some wrapper that tries to prevent users from doing stupid things (like forwarding stdin to the DB). Either of those problems should be solved in another library that could be built on top of boost.mysql.

...
- Are there any immediate improvements that could be made after acceptance, if acceptance should happen?

I think additional overloads, depending on user requests. I also wonder if there is some useful transaction support, but this largely depends on the mysql protocol. It could be that there is no point to that.

AFAIK this is handled as regular SQL statements, so this shouldn't need special support. The integration test suite actually relies on transactions to maintain test runtime below adequate levels.

...

...
- Did you try to use the library? With which compiler(s)? Did you have any problems?

Clang-13 with libc++ and C++20.

...
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

Build a test program and read the doc for ~1h.

...
- Are you knowledgeable about the problem domain?

Yes.

I vote to accept the library into boost.

Thanks.

...

PS: Isn't mysql a registered trademark owned by oracle?

I'm no lawyer, so I don't know how much trouble this can cause. I chose this name because it made obvious what the library was about. I'm open to suggestions.

...

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Alan de Freitas

13 May 13 May

2:24 a.m.

Hi! Here's my review. Boost.MySql is an impressive library. I would like to thank Rubén for that. It implements the complete protocol from the ground up with async functionalities and certainly involved a lot of work. The documentation is extensive, goal-oriented, and helpful. The async operations support cancellation and the examples with coroutines are beautiful. During this review, I was impressed at how easy it works and I've noticed a few points in the API that have also been brought up in previous reviews and could be improved. Some other issues could also be better highlighted in the documentation, which would avoid many problems. The overall recommendation in this review is acceptance conditional on some fixes. ## Value

...

Will the library bring additional out-of-the-box utility to Boost?

The library is very good news considering our recent discussions about the future of Boost, where providing more protocol implementations comes up frequently. I wish more people would make this kind of contribution.

...

What is your evaluation of the potential usefulness of the library?

Others have questioned the benefit of the library when compared to sqlpp11 or any wrapper around the C API. The main difference is other libraries are high-level but this is a discussion still worth having from the point of view of users. I was thinking about the transition cost from Boost.MySQL to any other SQL database, since many applications have the requirement/necessity of allowing different SQL databases. In sqlpp11, the user can just change the backend. The user could use Boost.MySQL as an sqlpp11 backend and that would have the same effect. However, I think Rubén mentioned this is not possible at some point. I'm not sure whether this is just for the async functionality. In the same line, I wonder if a library for Postgres or Sqlite would be possible with a similar API, which could also solve the problem, although I'm not sure anyone would be willing to implement that. If we did, we could have the convenience of sqlpp11 and the ASIO async functionalities of Boost.Mysql for other DBs. The library really provides ease-of-use, when we consider what it provides and how low-level it is. However, unlike in other libraries like Boost.Beast, Boost.MySql users might not be sold into the Asio way of doing things. Applications that require access to databases might be making sparse database requests where the Asio asynchronous model is not as useful. Highlighting these differences in the docs is important. Asio takes some time to learn, and I guess for a user not used to Asio, already understanding Asio does not sound like the ease of use. The docs could focus on the protocol before moving on to the asynchronous functionality. I'm also a little worried about the maintainability of the library and protocol changes and how this could impact boost as a whole. Should we really announce it as compatible with MariaDB? What direction would the library take if they diverge? How often does the protocol change or is extended? Is Ruben going to stick around if the protocol changes? How hard would it be for someone to understand the code and implement extensions? Can a user be sure it's always going to provide the same features and be as reliable as the C API? I don't have the answer to these questions, but it's something that got me wondering. I guess this kind of question is going to come up for any library that is related to a protocol. I don't know if the name "MySql" can be used for the library, as it belongs to Oracle. I'm not saying it can't. I'm really saying I don't know. I'm not a lawyer and I don't understand the implications here. But this should be considered, investigated, and evidence should be provided. The library is also compatible with MariaDB and the name "MySql" might not reflect that. Maybe there's a small probability it might be compatible with some other similar DB protocol derived from MySql in the future? As others have mentioned, the protocol is strictly sequential for a single connection, and this might have some implications for the asynchronous operations the library provides. - No two asynchronous MySql query reads can happen concurrently. While this still has value among other Asio operations, like a server that needs the DB eventually, the user needs to be careful about that. Maybe it would be safer if all MySql operations were on some special kind of strand. Or maybe the library could provide some "mysql::transaction_strand" functionality to help ensure this invariant for individual queries in the future. - A second implication is that some applications might find the asynchronous functionalities in Boost.Mysql not as useful as asynchronous functionalities in other protocols, like the ones in Boost.Beast. This depends on how their applications are structured. Since this is the main advantage over the C API, these users may question the value of the library and the documentation should discuss this more explicitly. - These implications could become irrelevant if the library provides some kind of functionality to enable a non-blocking mode. I have no idea how the MySql client achieves that. ## API

...

What is your evaluation of the design? Will the choice of API abstraction model ease the development of software that must talk to a MySQL database?

I like how the API is very clean compared to the C API, even when including the asynchronous functionality. This would be a reason for using the library, even if I only used the synchronous functions. I'm worried about the lack of possibility of reusing memory for the results, as the interface depends on vector. This is not the usual Asio pattern. These vectors look even weirder in the asynchronous callbacks: - People have their containers/buffers and I would assume reading into some kind of existing row buffer would be the default interface, as is the case with other Asio read functions. In other words, read_many and read_all should work more like read_one. - Not returning these vectors is the common pattern in Asio: the initiating function receives a buffer for storage and the callback returns how many elements were read. Note that the buffer size already delimits how many elements we should read. - If we return vectors, async operations would need to instantiate the vector with the custom allocator for the operation. The callback wouldn't use std::vector<T> then. I would be std::vector<T, very_long_allocator_name> for every query callback, which is inconvenient. Using only std::vector<T> in the callback is not only inconvenient but dangerous because it would be the wrong allocator and everything would need to be copied. - One problem I see without the vectors is that each row would still need to allocate memory internally because of their dynamic size. I don't know the best solution for that second problem unless we have some sort of static size row/tuple, which does make sense since the user should know the number of columns at compile time. The library provides read_one, read_all, and read_many, but no "read_some". Besides the implication of a buffer to the initiating function, I would imagine reading as much as possible and parsing that to be the usual best strategy in this kind of an operation since we have a stream and we don't know how much data is available to read yet. Unfortunately, read_one, read_all, and read_many don't allow that. Unless read_many can already read less than the max number of rows in intermediary calls, which I don't think is the intention. It might also be useful to replicate the naming conventions Asio uses to read some or read all of the data. If "value" is intended to encapsulate the variant type, "value::to_variant" seems dangerous. The variant supports lots of types, which indicates the probability the internal design might need to change to fit new user demands is high. This could also need to change according to protocol extensions, and changing any of these types would break the API. If "value" already represents what it needs to represent, "value::to_variant" may not need to be exposed unless there's a lot of user demand for that. There is also an existential value/reference problem with "value": - The internal variant type includes a string_view whose string is owned by the row. This seems kind of dangerous or at least misleading. We have a reference type that's called "value", while this is more like a const "field view" or something. - At the same time, it's not a view either because I assume this is mutable, and updating numbers would not update the row. - The other values in the variant are value types, which makes "value" both a value and a reference type depending on the column type. - A reference type seems appropriate here because we don't want to copy all the data when iterating the results, but we need some other pattern for "value". A const view is especially useful because we might be able to keep value types for numbers, etc... Encapsulating the variant is also helpful here. - Calling a type "value" is a bad indication anyway. Any value is a "value". As others have mentioned, "value::get_std_optional" is unusual. It may not look problematic at first but it has a number of problems: 1) We have a dangerous precedent because we could also say we need to support all other boost/std equivalents in the library, such as the string_view the library uses at many places. 2) We are including and depending on two libraries when we know almost for sure the user is only using one of them: If there's no std::optional in the supported platform, we just need boost::optional. If you are requiring a C++ standard where std::optional is available, then we just need std::optional. 3) This is also not very useful if the user has both boost::optional and std::optional, because the code would still need to be refactored from get_optional to get_std_optional or vice-versa if the user decides to change it. This could be solved with some BOOST_MYSQL_USE_STD_OPTIONAL or BOOST_MYSQL_USE_STD_STRING_VIEW, etc., but this would also invalidate the distinction. 4) This would make the API inconsistent in case Boost moves to a C++17 minimum at some point in the future, replacing boost classes, as we have been discussing C++14 lately. Both get_optional and get_std_optional would return the same type. Also, if std::optional is the *standard* type, it should not be the one whose function needs clarification with a longer name "get_std_optional". 5) This doesn't need to be a member function, which indicates it shouldn't be a member function in this case. The class is internally representing only one of these and conversions will happen anyway. A user in the >C++17 intersection can easily implement their own to_std_optional free function outside the class. If I understood the documentation correctly, it's also problematic to use an empty optional for both a null value and an invalid result. They are simply not the same thing. Libraries always differentiate an invalid state from null in a field. This is especially relevant because the documentation warns us against using the "is_convertible_to" pattern over "get_optional" because "is_convertible_to" is unnecessarily slow. By the way, it would be nice to explain why "// WARNING!! Inefficient, do NOT do" is true. I wouldn't expect that to cost more than creating the optional value. The documentation states that the behavior of relational operators may change in the future. What do other libraries do about these comparisons? We probably don't want something like PHP, where "20" == 20 so that we need some kind of "20" === 20 in the future, but the documentation implies we also don't want "value(std::int64_t(200)) > value(std::uint64_t(200))" to fail. The library makes the whole thing particularly tricky with number types because not checking the proper number type in the variant would return invalid/null even when the number could be stored in a variable of another number type perfectly fine, which is even worse when both invalid and null are presented by the same empty optional. About the "resultset": - If "resultset" is a table, the "set" part of the name might be a little misleading in C++ as the results are in a deterministic sequence, even though this is what MySQL may call a similar concept. However, I imagine there's no equivalent concept in MySql for this specific Asio-async-lazy-fetching functionality itself. - This class is more like a "table", "frame", "vector", "list", or just "result". However, after all the steps of "connect", "prepare", and "execute", the name "result" gives the impression that the results are finally there, but this is just another stream object like a "result fetcher" or something. - Since the protocol is strictly sequential, it might be useful to not even have a resultset class at all. The connection could send the query and have some read function, like the usual read/write functions in Asio can operate on the same stream. Minor issues: - Maybe it's not clear from the examples, but why would the user need connection::handshake when connection::connection already does it? - Shouldn't `tcp_ssl_connection::connect` accept an EndpointSequence to be more like other Asio functions. - The names "tcp_ssl_prepared_statement", "tcp_ssl_<component>" are quite repetitive and verbose in the documentation. Maybe "tcp_ssl" could become a namespace? Isn't that what Asio does for the ssl related objects? Some names might be reduced if there's no alternative. For instance, "prepared_statement" might become "statement" since there's no "unprepared_statement". - Wouldn't some form of `statement::execute` with variadic args (different name perhaps) be a simpler alternative to `statement::execute(boost::mysql::make\_values(...))`. Or maybe statement::operator(), as Phil suggested. Or some `statement::execute_with(...)`. In the variadic args, the optional err/info can go into some specified positions, since they're not ambiguous with the field variant types. This is a little tricky but definitely useful, especially if suggestions of customization points are implemented in the future. ## Implementation

...

What is your evaluation of the implementation?

...

Did you try to use the library? With what compiler? Did you have any

I haven't analyzed the implementation very profoundly. I skimmed through the source code and couldn't find anything problematic. It would be useful if the experts could inspect the Asio composed ops more deeply. CMakeLists.txt: - I believe the CMakeLists.txt script is not in the format of other boost libraries in boost/libs so it won't work with the super-project as it is. - The example/CMakeLists.txt script refers to BOOST_MYSQL_INTEGRATION_TESTS. I don't think examples can be considered integration tests. Examples: - The examples are very nice. Especially the one with coroutines. They are also very limited. They are all about the same text queries, which shouldn't even be used in favor of prepared statements. - Many examples about continuation styles are not very useful because this is more of an Asio feature than a library feature. The library feature, so to speak, is supporting Asio tokens properly. The continuation styles could be exemplified in the exposition with some small snippets for users not used to Asio without the documentation losing any value. - Some examples are simple enough and don't require the reader to know the rest of the exposition. They are like a quick look into the library. These could come at the beginning, as in the Asio tutorials and Beast quick look section. - The first sync example could be simpler to involve just a hello world before moving on to other operations. - The page about the docker container should specify that the username and password are "root" and "". Tests: Some unit tests take a ***very*** long time. Enough to make coffee and a sandwich. And they seem to not be adding a lot of value in terms of coverage. For instance, "mysql/test/unit/detail/protocol/date.cpp(72): info: check '1974- 1-30' has passed" going through all possible dates multiple times took a long time. problems? No problems at all. GCC 11 and MSVC 19. ## Documentation

...

What is your evaluation of the documentation?

The documentation is complete. The main points that differentiate the library are - it's a complete rewrite of the protocol, - it's low-level and - it's based on Boost.Asio The documentation should emphasize these points as much as possible, especially the first one. This should be in the introduction, the motivation, slogans, logos, and wherever people can see it easily. The documentation should also provide arguments and evidence that these design goals are a good idea, as often discussed when the topic is the value of this library. Why is it worth rewriting the protocol? To what use cases are such a low-level library useful? Why should a person who already uses other libraries or the C API care about Asio now? Something that should also be highlighted is the difference between the library and other higher-level libraries, in particular, naming names. Minor issues: - There's no link in the documentation to the protocol specification. It would be interesting to know what the reference specification is. Or whether the protocol was inferred somehow. Is there any chance this protocol might change? What about divergences between MySql and MariaDB? How stable is the protocol? For what range of versions does it work? What's the policy when it changes? - Some links are broken (for instance, linking to https://anarthal.github.io/boost-mysql/index.html). - "All async operations in this library support per-operation cancellation". It's important to highlight this is per operation in the Asio sense of an operation but not in the MySql sense of an operation because the MySql connection is invalid after that. - "Boost.MySql has been tested with the following versions of MySQL". MariaDB is not a version of MySql. - Prepared statements should come first in the examples, to highlight them as the default pattern. - The documentation refers to topics that haven't been explained yet. Maybe "value" could be explained after "row", and "row" could be explained after "resultset" and "resultset" after "queries". - The section "Text Queries" is quite small in comparison to other sections. It could include some examples and snippets like other sections do. - "The following completion tokens can be used in any asyncrhonous operation within Boost.Mysql" -> "Any completion token..." - "When they fail, they throw a boost::system::system_error exception". Don't these functions just set the proper error_code, as usual with Asio and Beast? - The "MySQL to C++ mapping reference" section should be using a table. - A small subsection on transactions would be helpful even if there's no library functionality to help with that. - The documentation should include some comparisons that are not obvious to potential users. C/C++ APIs. The advantages of the Asio async model. Benchmarks if possible. ## Conclusion

...

How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I spent one day in this review. I read all the documentation, ran the tests, experimented with the examples, and had a reasonable look at the implementation.

...

Are you knowledgeable about the problem domain?

I'm reasonably educated about databases but not an expert. I've been working a lot with Asio.

...

Are there any immediate improvements that could be made after acceptance, if acceptance should happen?

While it's important to have a general variant type for row values, a simpler interface for tuples of custom types would be very welcome and would simplify things by a lot, while also avoiding allocations, since columns always have the same types. This feature is too obvious since users almost always know their column types at compile time and this demand is too recurrent in applications to ignore.

...

Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.

I believe it should be conditionally accepted with the same conditions stated in other reviews: allowing for memory reuse in the read_* functions and fixing the "value" type. Best, Em ter., 10 de mai. de 2022 às 04:14, Richard Hodges via Boost < boost@lists.boost.org> escreveu:

...

Dear All,

The Boost formal review of the MySQL library starts Today, taking place from May 10th, 2022 to May 19th, 2022 (inclusive) - We are starting one day after the announced date and extending the period by one day to compensate.

The library is authored by Rubén Pérez Hidalgo (@anarthal in the CppLang slack).

Documentation: https://anarthal.github.io/mysql/index.html Source: https://github.com/anarthal/mysql/

The library is built on the bedrock of Boost.Asio and provides both synchronous and asynchronous client connectors for the MySQL database system.

Boost.MySQL is written from the ground up, implementing the entire protocol with no external dependencies beyond the Boost library. It is compatible with MariaDB.

Connectivity options include TCP, SSL and Unix Sockets.

For async interfaces, examples in the documentation demonstrate full compatibility with all Asio completion handler styles, including:

Callbacks:- https://anarthal.github.io/mysql/mysql/examples/query_async_callbacks.html

Futures :- https://anarthal.github.io/mysql/mysql/examples/query_async_futures.html

Boost.Coroutine :- https://anarthal.github.io/mysql/mysql/examples/query_async_coroutines.html

C++20 Coroutines :-

https://anarthal.github.io/mysql/mysql/examples/query_async_coroutinescpp20....

Rubén has also implemented the Asio protocols for deducing default completion token types :-

https://anarthal.github.io/mysql/mysql/examples/default_completion_tokens.ht...

Reviewing a database connector in depth will require setting up an instance of a MySQL database. Fortunately most (all?) Linux distributions carry a MySQL and/or MariaDB package. MySQL community edition is available for download on all platforms here: https://dev.mysql.com/downloads/

Rubén has spent quite some time in order to bring us this library candidate. The development process has no doubt been a journey of discovery into Asio, its concepts and inner workings. I am sure he has become a fount of knowledge along the way.

From a personal perspective, I was very happy to be asked to manage this review. I hope it will be the first of many more reviews of libraries that tackle business connectivity problems without further dependencies beyond Boost, arguably one of the most trusted foundation libraries available.

Please provide in your review information you think is valuable to understand your choice to ACCEPT or REJECT including Describe as a Boost library. Please be explicit about your decision (ACCEPT or REJECT).

Some other questions you might want to consider answering:

- Will the library bring additional out-of-the-box utility to Boost? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - Will the choice of API abstraction model ease the development of software that must talk to a MySQL database? - Are there any immediate improvements that could be made after acceptance, if acceptance should happen? - Did you try to use the library? With which compiler(s)? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?

More information about the Boost Formal Review Process can be found at: http://www.boost.org/community/reviews.html

The review is open to anyone who is prepared to put in the work of evaluating and reviewing the library. Prior experience in contributing to Boost reviews is not a requirement.

Thank you for your efforts in the Boost community. They are very much appreciated.

Richard Hodges - review manager of the proposed Boost.MySQL library

Rubén is often available on CppLang Slack and of course by email should you require any clarification not covered by the documentation, as am I.

-- Richard Hodges hodges.r@gmail.com tg: @rhodges office: +44 2032 898 513 mobile: +376 380 212

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

-- Alan Freitas https://github.com/alandefreitas

Ruben Perez

15 May 15 May

1:49 p.m.

. On Fri, 13 May 2022 at 04:25, Alan de Freitas via Boost <boost@lists.boost.org> wrote:

...

Hi! Here's my review.

Hi Alan, thank you very much for taking your time to write your review.

...

Boost.MySql is an impressive library. I would like to thank Rubén for that. It implements the complete protocol from the ground up with async functionalities and certainly involved a lot of work. The documentation is extensive, goal-oriented, and helpful. The async operations support cancellation and the examples with coroutines are beautiful. During this review, I was impressed at how easy it works and I've noticed a few points in the API that have also been brought up in previous reviews and could be improved. Some other issues could also be better highlighted in the documentation, which would avoid many problems. The overall recommendation in this review is acceptance conditional on some fixes.

Thanks a lot.

...

Others have questioned the benefit of the library when compared to sqlpp11 or any wrapper around the C API. The main difference is other libraries are high-level but this is a discussion still worth having from the point of view of users. I was thinking about the transition cost from Boost.MySQL to any other SQL database, since many applications have the requirement/necessity of allowing different SQL databases. In sqlpp11, the user can just change the backend. The user could use Boost.MySQL as an sqlpp11 backend and that would have the same effect. However, I think Rubén mentioned this is not possible at some point. I'm not sure whether this is just for the async functionality. In the same line, I wonder if a library for Postgres or Sqlite would be possible with a similar API, which could also solve the problem, although I'm not sure anyone would be willing to implement that. If we did, we could have the convenience of sqlpp11 and the ASIO async functionalities of Boost.Mysql for other DBs.

It is indeed possible. My point was that sqlpp11 does not provide an async interface. You can surely write a sqlpp11 backend for MySQL using the proposed Boost.MySQL library. I haven't done it, but having a quick look at the interface, it should just work. However, you won't get the most of Boost.MySQL, because you will be using just sync code. It is possible to write a Postgres connector with a similar approach. I indeed wrote a POC available here: https://github.com/anarthal/postgres-asio The protocols are different, though, so if you follow my approach of providing primitives easy to use, yet close to the protocol, the APIs will be a little different. The Postgres protocol allows finer grain control over statement execution, AFAIR. I don't know much about SQLite - but being just a library and not a protocol, the connector would have to wrap libsqlite. I can't tell you whether exposing an Asio interface for it is possible or not.

...

The library really provides ease-of-use, when we consider what it provides and how low-level it is. However, unlike in other libraries like Boost.Beast, Boost.MySql users might not be sold into the Asio way of doing things. Applications that require access to databases might be making sparse database requests where the Asio asynchronous model is not as useful. Highlighting these differences in the docs is important. Asio takes some time to learn, and I guess for a user not used to Asio, already understanding Asio does not sound like the ease of use. The docs could focus on the protocol before moving on to the asynchronous functionality.

What actions do you think we can take here? Adding a section in the intro (https://anarthal.github.io/mysql/mysql/intro.html) about "knowing Asio is a requirement to use this library"? Do you think an overview on the protocol (handshake, query, metadata and row packets) would really be useful for the user at that point?

...

I'm also a little worried about the maintainability of the library and protocol changes and how this could impact boost as a whole. Should we really announce it as compatible with MariaDB? What direction would the library take if they diverge? How often does the protocol change or is extended? Is Ruben going to stick around if the protocol changes? How hard would it be for someone to understand the code and implement extensions? Can a user be sure it's always going to provide the same features and be as reliable as the C API? I don't have the answer to these questions, but it's something that got me wondering. I guess this kind of question is going to come up for any library that is related to a protocol.

The protocol it implements is the "client/server" protocol, which was in place before the MariaDB fork. From what I know, this is a pretty old protocol and remains pretty stable. MySQL is separately developing a new X-protocol`(which this library does *NOT* implement). Docs here: https://dev.mysql.com/doc/dev/mysql-server/latest/page_mysqlx_protocol.html MariaDB does not implement it and does not plan to do it. I don't really think MySQL is going to develop the client/server protocol a lot. I wouldn't say they will deprecate it in the next years, either, as there are millions of clients relying on the client/server protocol, and there aren't many connectors for the X protocol yet (no C client AFAIK). Does that answer part of your questions?

...

I don't know if the name "MySql" can be used for the library, as it belongs to Oracle. I'm not saying it can't. I'm really saying I don't know. I'm not a lawyer and I don't understand the implications here. But this should be considered, investigated, and evidence should be provided. The library is also compatible with MariaDB and the name "MySql" might not reflect that. Maybe there's a small probability it might be compatible with some other similar DB protocol derived from MySql in the future?

I'm as stuck as you here. What would be the best path here? For now, I've tried to contact a former user in the Boost ML who works at Oracle, to see whether he can provide me some guidance here. Any ideas are welcome.

...

- No two asynchronous MySql query reads can happen concurrently. While this still has value among other Asio operations, like a server that needs the DB eventually, the user needs to be careful about that. Maybe it would be safer if all MySql operations were on some special kind of strand. Or maybe the library could provide some "mysql::transaction_strand" functionality to help ensure this invariant for individual queries in the future. - A second implication is that some applications might find the asynchronous functionalities in Boost.Mysql not as useful as asynchronous functionalities in other protocols, like the ones in Boost.Beast. This depends on how their applications are structured. Since this is the main advantage over the C API, these users may question the value of the library and the documentation should discuss this more explicitly. - These implications could become irrelevant if the library provides some kind of functionality to enable a non-blocking mode. I have no idea how the MySql client achieves that.

At the end of the day, I'd say this is similar to HTTP (half-duplex). The client sends a request and gets a response. I think the docs here are not doing a great job, and that the interface may be a little limiting (not allowing the possibility of pipelining). This is something brought up in other reviews, too, and tracked by https://github.com/anarthal/mysql/issues/76 and https://github.com/anarthal/mysql/issues/75. Do you think such pipelining API would mitigate your concerns?

...

I'm worried about the lack of possibility of reusing memory for the results, as the interface depends on vector. This is not the usual Asio pattern. These vectors look even weirder in the asynchronous callbacks:

This is also an issue for resultsets, and tracked by https://github.com/anarthal/mysql/issues/59.

...

- People have their containers/buffers and I would assume reading into some kind of existing row buffer would be the default interface, as is the case with other Asio read functions. In other words, read_many and read_all should work more like read_one. - Not returning these vectors is the common pattern in Asio: the initiating function receives a buffer for storage and the callback returns how many elements were read. Note that the buffer size already delimits how many elements we should read. - If we return vectors, async operations would need to instantiate the vector with the custom allocator for the operation. The callback wouldn't use std::vector<T> then. I would be std::vector<T, very_long_allocator_name> for every query callback, which is inconvenient. Using only std::vector<T> in the callback is not only inconvenient but dangerous because it would be the wrong allocator and everything would need to be copied.

Brought up by another user during review, tracked by https://github.com/anarthal/mysql/issues/58. I think the best thing here is just passing the vectors by lvalue reference.

...

- One problem I see without the vectors is that each row would still need to allocate memory internally because of their dynamic size. I don't know the best solution for that second problem unless we have some sort of static size row/tuple, which does make sense since the user should know the number of columns at compile time.

Brought up by everyone, so I will give this high priority. Tracked by https://github.com/anarthal/mysql/issues/60.

...

The library provides read_one, read_all, and read_many, but no "read_some". Besides the implication of a buffer to the initiating function, I would imagine reading as much as possible and parsing that to be the usual best strategy in this kind of an operation since we have a stream and we don't know how much data is available to read yet. Unfortunately, read_one, read_all, and read_many don't allow that. Unless read_many can already read less than the max number of rows in intermediary calls, which I don't think is the intention. It might also be useful to replicate the naming conventions Asio uses to read some or read all of the data.

Not sure if I follow you. What's the interface you propose here? The connection object already holds an internal buffer for requests. The library currently performs boost::asio::read call per packet, but I'm working in optimizing that to use read_some instead. With that approach, read_one may get the row from the buffer without performing any read at all. Is that what you mean?

...

If "value" is intended to encapsulate the variant type, "value::to_variant" seems dangerous. The variant supports lots of types, which indicates the probability the internal design might need to change to fit new user demands is high. This could also need to change according to protocol extensions, and changing any of these types would break the API. If "value" already represents what it needs to represent, "value::to_variant" may not need to be exposed unless there's a lot of user demand for that.

I exposed it to avoid having to implement a visit function. Having such a visit function would have the same problems. I think visiting a variant-like is one of the most common things. It make sense not to expose the function if we end up with an alternate tuple-like interface, though.

...

There is also an existential value/reference problem with "value":

- The internal variant type includes a string_view whose string is owned by the row. This seems kind of dangerous or at least misleading. We have a reference type that's called "value", while this is more like a const "field view" or something. - At the same time, it's not a view either because I assume this is mutable, and updating numbers would not update the row. - The other values in the variant are value types, which makes "value" both a value and a reference type depending on the column type. - A reference type seems appropriate here because we don't want to copy all the data when iterating the results, but we need some other pattern for "value". A const view is especially useful because we might be able to keep value types for numbers, etc... Encapsulating the variant is also helpful here. - Calling a type "value" is a bad indication anyway. Any value is a "value".

If we go down the compile-time parsing strategy, together with the buffering I described above, making it owning may be the best option. Any naming ideas? field_value? field? sql_value?

...

As others have mentioned, "value::get_std_optional" is unusual. It may not look problematic at first but it has a number of problems:

First of all, I'd like to know your thoughts about the conversion functionality in value (https://anarthal.github.io/mysql/mysql/values.html#mysql.values.conversions). Do you think it's helpful for the user? If it wasn't because of conversions, we could replace value::get_optional and value::get_std_optional for something similar to variant2::get_if, returning a pointer to the underlying value, avoiding the problem altogether. This could make even more sense if we make the value owning, as value::get_optional<std::string> would end up copying the underlying string, which doesn't make a lot of sense.

...

If I understood the documentation correctly, it's also problematic to use an empty optional for both a null value and an invalid result. They are simply not the same thing. Libraries always differentiate an invalid state from null in a field. This is especially relevant because the documentation warns us against using the "is_convertible_to" pattern over "get_optional" because "is_convertible_to" is unnecessarily slow. By the way, it would be nice to explain why "// WARNING!! Inefficient, do NOT do" is true. I wouldn't expect that to cost more than creating the optional value.

The distinction can be made with something like: value v = /* get the value */ if (v.is_null()) { // handle the NULL case } else { double d = v.get<double>(); // will throw on type mismatch } So maybe what we're lacking is something in the documentation to address how to handle NULL values effectively. I've raised https://github.com/anarthal/mysql/issues/77 to address this issue. Calling value::is_convertible_to, then value::get will perform the conversion twice, while value::get_optional will do it just once. It isn't horribly inefficient, though - maybe this section is causing more confusion than aid. Considering all this, I'd make the following to the value class: 1) Make it owning. 2) Remove conversions. 3) Remove get_optional and get_std_optional, and provide a get_if function returning a pointer to the internal value, as variant2::get_if does. 4) Make value::get return a reference instead of a value. 5) Remove value::to_variant I've summarized these points in https://github.com/anarthal/mysql/issues/85.

...

The documentation states that the behavior of relational operators may change in the future. What do other libraries do about these comparisons? We probably don't want something like PHP, where "20" == 20 so that we need some kind of "20" === 20 in the future, but the documentation implies we also don't want "value(std::int64_t(200)) > value(std::uint64_t(200))" to fail. The library makes the whole thing particularly tricky with number types because not checking the proper number type in the variant would return invalid/null even when the number could be stored in a variable of another number type perfectly fine, which is even worse when both invalid and null are presented by the same empty optional.

Thanks to conversions, value(int64_t(200)).get_optional<uint64_t>() succeeds and returns a non-empty optional. The main use I see for inequality operators is being able to store values in containers such as std::set, so I don't think we should change the behavior in the future. Raised https://github.com/anarthal/mysql/issues/80.

...

About the "resultset":

- If "resultset" is a table, the "set" part of the name might be a little misleading in C++ as the results are in a deterministic sequence, even though this is what MySQL may call a similar concept. However, I imagine there's no equivalent concept in MySql for this specific Asio-async-lazy-fetching functionality itself. - This class is more like a "table", "frame", "vector", "list", or just "result". However, after all the steps of "connect", "prepare", and "execute", the name "result" gives the impression that the results are finally there, but this is just another stream object like a "result fetcher" or something.

I'm thinking on calling it query_operation or query_result.

...

- Since the protocol is strictly sequential, it might be useful to not even have a resultset class at all. The connection could send the query and have some read function, like the usual read/write functions in Asio can operate on the same stream.

Resultsets also hold whether they're complete or not, and some extra info like affected_rows and warning_count, which become available once the resultset has been completely read (or from the beginning, if the query was an INSERT, for example). Docs: https://anarthal.github.io/mysql/mysql/resultsets.html#mysql.resultsets.comp... If we implement pipeline mode, it will have to hold some state specific to the operation (i.e. a sequence number that the protocol dictates - although that's internal). It is also possible to make resultset (or whatever name we come up with) not an I/O object, and move the read functions to the connection.

...

Minor issues:

- Maybe it's not clear from the examples, but why would the user need connection::handshake when connection::connection already does it?

If your Stream object does not satisfy the SocketStream requirements (https://anarthal.github.io/mysql/mysql/ref/boost__mysql__SocketStream.html), you won't be able to use connection::connect or connection::close. This would be the case for Windows named pipes. Docs here: https://anarthal.github.io/mysql/mysql/other_streams.html#mysql.other_stream...

...

- Shouldn't `tcp_ssl_connection::connect` accept an EndpointSequence to be more like other Asio functions.

I think it should accept both. Tracked by https://github.com/anarthal/mysql/issues/72

...

- The names "tcp_ssl_prepared_statement", "tcp_ssl_<component>" are quite repetitive and verbose in the documentation. Maybe "tcp_ssl" could become a namespace? Isn't that what Asio does for the ssl related objects? Some names might be reduced if there's no alternative. For instance, "prepared_statement" might become "statement" since there's no "unprepared_statement".

Would having type aliases under connection (i.e. connection::resultset_type) alleviate your concerns?

...

- Wouldn't some form of `statement::execute` with variadic args (different name perhaps) be a simpler alternative to `statement::execute(boost::mysql::make\_values(...))`. Or maybe statement::operator(), as Phil suggested. Or some `statement::execute_with(...)`. In the variadic args, the optional err/info can go into some specified positions, since they're not ambiguous with the field variant types. This is a little tricky but definitely useful, especially if suggestions of customization points are implemented in the future.

Definitely. https://github.com/anarthal/mysql/issues/74 tracks it. How would variadic operator() work with async functions? The convention is having the _async suffix, but that's not possible with an operator overload.

...

## Implementation

...
What is your evaluation of the implementation?

I haven't analyzed the implementation very profoundly. I skimmed through the source code and couldn't find anything problematic. It would be useful if the experts could inspect the Asio composed ops more deeply.

CMakeLists.txt:

- I believe the CMakeLists.txt script is not in the format of other boost libraries in boost/libs so it won't work with the super-project as it is.

Not aware there was a standard format. Could you please point me out to the relevant documentation?

...

- The example/CMakeLists.txt script refers to BOOST_MYSQL_INTEGRATION_TESTS. I don't think examples can be considered integration tests.

They are from the perspective that they require a real server to run.

...

Examples:

- The examples are very nice. Especially the one with coroutines. They are also very limited. They are all about the same text queries, which shouldn't even be used in favor of prepared statements.

Tracked by https://github.com/anarthal/mysql/issues/81. May be worth reconsidering when we tackle https://github.com/anarthal/mysql/issues/69

...

- Many examples about continuation styles are not very useful because this is more of an Asio feature than a library feature. The library feature, so to speak, is supporting Asio tokens properly. The continuation styles could be exemplified in the exposition with some small snippets for users not used to Asio without the documentation losing any value.

I may move them into a separate section, instead of having all of them in a single section.

...

- The page about the docker container should specify that the username and password are "root" and "".

Raised https://github.com/anarthal/mysql/issues/82

...

Tests:

Some unit tests take a ***very*** long time. Enough to make coffee and a sandwich. And they seem to not be adding a lot of value in terms of coverage. For instance, "mysql/test/unit/detail/protocol/date.cpp(72): info: check '1974- 1-30' has passed" going through all possible dates multiple times took a long time.

True. They may have been useful during development but they don't give us a lot as regression tests. Raised https://github.com/anarthal/mysql/issues/83

...

The documentation is complete. The main points that differentiate the library are

- it's a complete rewrite of the protocol, - it's low-level and - it's based on Boost.Asio

The documentation should emphasize these points as much as possible, especially the first one. This should be in the introduction, the motivation, slogans, logos, and wherever people can see it easily.

Has been noted in several other reviews. https://github.com/anarthal/mysql/issues/70 tracks it.

...

The documentation should also provide arguments and evidence that these design goals are a good idea, as often discussed when the topic is the value of this library. Why is it worth rewriting the protocol? To what use cases are such a low-level library useful? Why should a person who already uses other libraries or the C API care about Asio now? Something that should also be highlighted is the difference between the library and other higher-level libraries, in particular, naming names.

https://github.com/anarthal/mysql/issues/50 tracks it.

...

Minor issues:

- There's no link in the documentation to the protocol specification. It would be interesting to know what the reference specification is. Or whether the protocol was inferred somehow. Is there any chance this protocol might change? What about divergences between MySql and MariaDB? How stable is the protocol? For what range of versions does it work? What's the policy when it changes?

Raised https://github.com/anarthal/mysql/issues/84. FYI, this is the specification: https://dev.mysql.com/doc/dev/mysql-server/latest/PAGE_PROTOCOL.html. Some mechanics are better explained here: https://dev.mysql.com/doc/internals/en/client-server-protocol.html (although it's an old page).

...

- Some links are broken (for instance, linking to https://anarthal.github.io/boost-mysql/index.html).

Just fixed it in develop.

...

- "All async operations in this library support per-operation cancellation". It's important to highlight this is per operation in the Asio sense of an operation but not in the MySql sense of an operation because the MySql connection is invalid after that.

Note added in develop.

...

- "Boost.MySql has been tested with the following versions of MySQL". MariaDB is not a version of MySql.

Fixed in develop.

...

- Prepared statements should come first in the examples, to highlight them as the default pattern.

Will be tackled as part of https://github.com/anarthal/mysql/issues/81

...

- The documentation refers to topics that haven't been explained yet. Maybe "value" could be explained after "row", and "row" could be explained after "resultset" and "resultset" after "queries".

The basic concepts have been covered in the tutorial, which goes first. So at this point the user should know the *basic* concept of what a value, a row and a resultset is. The rest of the sections go deeper in these concepts. Did you find section order counter-intuitive, even reading the tutorial first?

...

- The section "Text Queries" is quite small in comparison to other sections. It could include some examples and snippets like other sections do.

I'll probably expand it as part of https://github.com/anarthal/mysql/issues/69, because there will be more stuff to cover.

...

- "The following completion tokens can be used in any asyncrhonous operation within Boost.Mysql" -> "Any completion token..."

Fixed in develop

...

- "When they fail, they throw a boost::system::system_error exception". Don't these functions just set the proper error_code, as usual with Asio and Beast?

No. In Asio, you always get two overloads for sync operations. Let's take basic_stream_socket::read_some, for instance: https://www.boost.org/doc/libs/1_79_0/doc/html/boost_asio/reference/basic_st... https://www.boost.org/doc/libs/1_79_0/doc/html/boost_asio/reference/basic_st... Overload 1 throws boost::system::system_error, while overload 2 returns the error by setting the appropriate error_code. This library is consistent with that.

...

- The "MySQL to C++ mapping reference" section should be using a table.

I'd say this is personal preference. I will explore the tabular format when I have the chance.

...

- A small subsection on transactions would be helpful even if there's no library functionality to help with that.

https://github.com/anarthal/mysql/issues/65 tracks it.

...

- The documentation should include some comparisons that are not obvious to potential users. C/C++ APIs. The advantages of the Asio async model. Benchmarks if possible.

https://github.com/anarthal/mysql/issues/50

...

...
Are there any immediate improvements that could be made after acceptance, if acceptance should happen?

While it's important to have a general variant type for row values, a simpler interface for tuples of custom types would be very welcome and would simplify things by a lot, while also avoiding allocations, since columns always have the same types. This feature is too obvious since users almost always know their column types at compile time and this demand is too recurrent in applications to ignore.

...
Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.

I believe it should be conditionally accepted with the same conditions stated in other reviews: allowing for memory reuse in the read_* functions and fixing the "value" type.

Could you please be a little more explicit on your second point? Would adding a struct/tuple interface (https://github.com/anarthal/mysql/issues/60) and making it owning (https://github.com/anarthal/mysql/issues/85), which would in turn tackle all the optional stuff, be enough? Kind regards, Ruben.

1160

Age (days ago)

1165

Last active (days ago)

List overview

Download

18 comments

8 participants

participants (8)

Alan de Freitas
Daniela Engert
Klemens Morgenstern
Phil Endecott
Rainer Deyke
Richard Hodges
Ruben Perez
Vinnie Falco