On Tue, 14 Apr 2020 at 14:16, Dominique Devienne via Boost < boost@lists.boost.org> wrote:
On Mon, Apr 13, 2020 at 1:52 PM Ruben Perez via Boost < boost@lists.boost.org> wrote:
I have been working in a proof-of-concept implementation of PostgreSQL. You can find it here: https://github.com/anarthal/postgres-asio
Hi. Great, I've been following this thread from afar, because first I'm using/learning ASIO myself, and second because I call databases as part of the processing I do in my ASIO-scheduled "tasks". But I'm a PostgreSQL user, not a MySQL one, thus my relative lack of interest until now :).
It supports connections, queries and prepared statements with an API almost identical to the one proposed for Boost.MySQL. I have only implemented synchronous operations yet. I hope this helps prove that implementing other databases with a very similar API is feasible and helps push the MySQL library forward.
I'm confused. What's the point of implementing synchronous at all rather than use libpq directly? libpq is "official" and synchronous, so the *only* reason I'd prefer your re-implementation of the protocol would be to gain async via ASIO.
Thanks for checking out! The point of that library was just a small proof of concept. There were a considerable amount of people concerned about the MySQL library being too MySQL specific. I just wanted to show that you *could* write a similar kind of library for other database. So at this moment you are right, there is absolutely no reason to choose my implementation. Just to be clear, the MySQL implementation is indeed complete and tested.
I haven't looked at the code, but I would also have assumed that any synchronuous API for a library that stated to the async, would be as a convenience layer on top of the async layer, doing the waiting/synchronization for you, to simplify things when async is not strictly necessary.
But what you wrote above seems to imply the contrary. Is the async API implemented on top of the sync API? Or you implement two separate APIs completely?
There is no async API at the moment. In the MySQL library both APIs are implemented separately, with most of the code factored out into functions/classes. But once you've got the synchronous implementation and API, you can "just write" the async one, as I keep the sync and async API consistent. So, for the proof of concept, I just did the sync one (which is easier).
Note that the Postgres library is just a proof of concept; it is very far from being a full implementation. At this point I lack the time to transform this library into a full implementation. If somebody wants to volunteer and progress this library into a full implementation (or just write something from scratch), I would appreciate it a lot - just message me.
It's hard to get traction on alternatives to official APIs (like libpq for PostgreSQL), w/o some pretty strong guarantees on testing. And testing with databases is complicated because of the need to access that remote DB; you can't easily mock a database reliably. I'm not sure how that would work within Boost. Are there precedent for Boost libraries that need "external" resources like full blown databases? Perhaps via VMs and/or Docker?
For MySQL, I have set up CI testing (Travis for Linux and OSX, Appveyor for Windows). I deploy the database (at this moment using the system package manager, I am considering switching to Docker containers to test a wider range of DB versions). Most of the tests run with a real database connection. At this moment, The MySQL library database tests cover most of the native types, both in text queries and statements, as well as the most used authentication methods. If I were to do a fully-fledged implementation for Postgres, I would implement a similar strategy. But doing so for MySQL has proven to be really time-consuming. I would prefer to push the MySQL library forward before moving into another one.
As always, any feedback is welcome.
A few things that came to my mind reading this thread, from my DB experience, is that the re-implement-the-low-level-protocol approach applies to very few DB systems, given that most commercial DBs use proprietary protocols (I know Oracle does for sure). Or the most used DB in the world (SQLite) doesn't have a network protocol at all. So that pretty much only leaves MySQL and PostgreSQL, no? So at the very least, both should be supported by a library that wants to gain widespread appeal IMHO. And from my POV, you chose the wrong one to start with :)
I have read that SQLite has an async API (using the filesystem asynchronous API), so maybe we could do something with it. I haven't explored that at all, TBH.
Another remark is that PostgreSQL has two modes, text (the default) and binary to send row data back and forth. We've tested both and went the most difficult route of using the binary route for performance reasons, which complicates things. Do you handle both modes at the protocol level?
No, the proof of concept just does text.
I'd conclude by saying that from my perspective, the one-true-SQL-library is a pipe dream when you want to maximum performance, or when wanting to avoid the lowest-common denominator approach.
I've used Array-binding in Oracle, and XMLTYPE, and low-level C/C++ binding of Spatial objects in OCI, and pretty much no SQL abstraction libraries can do these things. After doing lots of Oracle and SQLite, and now doing PostgreSQL, the differences are actually very large between these DBs, to the point a lots of the optimization done for one are not applicable to the other, and many things just don't translate at all. If one stays within a *tiny* subset of SQL and features, maybe... But any non-trivial sophisticated DB use, you quickly find the limits libraries that try to be multi-DB IMHO.
I completely agree with this. My view on this is that there should be a database for each library, so that you can expose the details specific to each database. (E.g. Portals, the binary protocol and execution row count for Postgres; Portals are not even a thing in MySQL). Then, if you want a multi-database library, you use the database-specific building blocks. However, there were people arguing that these building blocks may not be valuable enough to be part of Boost.
So in your case, what matters most is the async ASIO-based design, and having a framework to develop async DB client libraries, but not necessarily exposing exactly the same API for all of them. What matters is to be able to start processing rows and sending them downstream for further processing, *BEFORE* the whole result set made it client-side.
Interesting feature, we could do that in MySQL, too.
Maybe what I'm saying doesn't make sense, because I haven't studied your code and examples yet, but FWIW, these are the thoughts I had following this thread.
Thanks, --DD
All you said makes sense, thanks. Glad to know there would be interest for a Postgres library, too.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost