Re: [boost] MySQL ASIO library

14 Apr 2020

      On Tue, 14 Apr 2020 at 14:16, Dominique Devienne via Boost <
boost@lists.boost.org> wrote:
...
On Mon, Apr 13, 2020 at 1:52 PM Ruben Perez via Boost <
boost@lists.boost.org>
wrote:
...
I have been working in a proof-of-concept implementation of PostgreSQL.
You
can find it here: https://github.com/anarthal/postgres-asio
Hi. Great, I've been following this thread from afar, because first I'm
using/learning ASIO
myself, and second because I call databases as part of the processing I do
in my ASIO-scheduled
"tasks". But I'm a PostgreSQL user, not a MySQL one, thus my relative lack
of interest until now :).

...
...
It supports connections, queries and prepared statements with an API
almost
identical to the one proposed for Boost.MySQL. I have only implemented
synchronous operations yet. I hope this helps prove that implementing
other
databases with a very similar API is feasible and helps push the MySQL
library forward.
I'm confused. What's the point of implementing synchronous at all rather
than use
libpq directly? libpq is "official" and synchronous, so the *only* reason
I'd prefer your
re-implementation of the protocol would be to gain async via ASIO.
Thanks for checking out! The point of that library was just a small proof
of concept. There were a considerable amount of people concerned about the
MySQL library being too MySQL specific. I just wanted to show that you
*could* write a similar kind of library for other database. So at this
moment you are right, there is absolutely no reason to choose my
implementation. Just to be clear, the MySQL implementation is indeed
complete and tested.
...
I haven't looked at the code, but I would also have assumed that any
synchronuous
API for a library that stated to the async, would be as a convenience layer
on top of
the async layer, doing the waiting/synchronization for you, to simplify
things when async
is not strictly necessary.
But what you wrote above seems to imply the contrary. Is the async API
implemented
on top of the sync API? Or you implement two separate APIs completely?
There is no async API at the moment. In the MySQL library both APIs are
implemented separately, with most of the code factored out into
functions/classes. But once you've got the synchronous implementation and
API, you can "just write" the async one, as I keep the sync and async API
consistent. So, for the proof of concept, I just did the sync one (which is
easier).
...
...
Note that the Postgres library is just a proof of concept; it is very far
from being a full implementation. At this point I lack the time to
transform this library into a full implementation. If somebody wants to
volunteer and progress this library into a full implementation (or just
write something from scratch), I would appreciate it a lot - just message
me.
It's hard to get traction on alternatives to official APIs (like libpq for
PostgreSQL),
w/o some pretty strong guarantees on testing. And testing with databases is
complicated
because of the need to access that remote DB; you can't easily mock a
database reliably.
I'm not sure how that would work within Boost. Are there precedent for
Boost libraries that
need "external" resources like full blown databases? Perhaps via VMs and/or
Docker?
For MySQL, I have set up CI testing (Travis for Linux and OSX, Appveyor for
Windows). I deploy the database (at this moment using the system package
manager, I am considering switching to Docker containers to test a wider
range of DB versions). Most of the tests run with a real database
connection. At this moment, The MySQL library database tests cover most of
the native types, both in text queries and statements, as well as the most
used authentication methods.

If I were to do a fully-fledged implementation for Postgres, I would
implement a similar strategy. But doing so for MySQL has proven to be
really time-consuming. I would prefer to push the MySQL library forward
before moving into another one.
...
...
As always, any feedback is welcome.
A few things that came to my mind reading this thread, from my DB
experience, is that
the re-implement-the-low-level-protocol approach applies to very few DB
systems,
given that most commercial DBs use proprietary protocols (I know Oracle
does for sure).
Or the most used DB in the world (SQLite) doesn't have a network protocol
at all. So that
pretty much only leaves MySQL and PostgreSQL, no? So at the very least,
both should be
supported by a library that wants to gain widespread appeal IMHO. And from
my POV, you
chose the wrong one to start with :)
I have read that SQLite has an async API (using the filesystem asynchronous
API), so maybe we could do something with it. I haven't explored that at
all, TBH.
...
Another remark is that PostgreSQL has two modes, text (the default) and
binary to send row
data back and forth. We've tested both and went the most difficult route of
using the binary
route for performance reasons, which complicates things. Do you handle both
modes at the
protocol level?
No, the proof of concept just does text.
...
I'd conclude by saying that from my perspective, the one-true-SQL-library
is a pipe dream
when you want to maximum performance, or when wanting to avoid the
lowest-common
denominator approach.

...
I've used Array-binding in Oracle, and XMLTYPE, and low-level C/C++ binding
of Spatial objects
in OCI, and pretty much no SQL abstraction libraries can do these things.
After doing lots of
Oracle and SQLite, and now doing PostgreSQL, the differences are actually
very large between
these DBs, to the point a lots of the optimization done for one are not
applicable to the other,
and many things just don't translate at all. If one stays within a *tiny*
subset of SQL and features,
maybe... But any non-trivial sophisticated DB use, you quickly find the
limits libraries that try
to be multi-DB IMHO.
I completely agree with this. My view on this is that there should be a
database for each library, so that you can expose the details specific to
each database. (E.g. Portals, the binary protocol and execution row count
for Postgres; Portals are not even a thing in MySQL).

Then, if you want a multi-database library, you use the database-specific
building blocks. However, there were people arguing that these building
blocks may not be valuable enough to be part of Boost.
...
So in your case, what matters most is the async ASIO-based design, and
having a framework
to develop async DB client libraries, but not necessarily exposing exactly
the same API for all
of them. What matters is to be able to start processing rows and sending
them downstream for
further processing, *BEFORE* the whole result set made it client-side.
Interesting feature, we could do that in MySQL, too.
...
Maybe what I'm saying
doesn't make sense, because I haven't studied your code and examples yet,
but FWIW,
these are the thoughts I had following this thread.
Thanks, --DD
All you said makes sense, thanks. Glad to know there would be interest for
a Postgres library, too.
...
_______________________________________________
Unsubscribe & other changes:
http://lists.boost.org/mailman/listinfo.cgi/boost