On Mon, Apr 13, 2020 at 1:52 PM Ruben Perez via Boost wrote:
> I have been working on a proof-of-concept implementation for PostgreSQL. You can find it here: https://github.com/anarthal/postgres-asio
Hi. Great! I've been following this thread from afar, first because I'm using/learning ASIO myself, and second because I call databases as part of the processing I do in my ASIO-scheduled "tasks". But I'm a PostgreSQL user, not a MySQL one, hence my relative lack of interest until now :).
> It supports connections, queries, and prepared statements with an API almost identical to the one proposed for Boost.MySQL. I have implemented only synchronous operations so far. I hope this helps prove that implementing other databases with a very similar API is feasible, and helps push the MySQL library forward.
I'm confused. What's the point of implementing a synchronous API at all, rather than using libpq directly? libpq is "official" and synchronous, so the *only* reason I'd prefer your re-implementation of the protocol would be to gain async via ASIO. I haven't looked at the code, but I would also have assumed that in a library that aims to be async, any synchronous API would be a convenience layer on top of the async layer, doing the waiting/synchronization for you to simplify things when async is not strictly necessary (see the sketch below for the kind of layering I mean). But what you wrote above seems to imply the contrary. Is the async API implemented on top of the sync API? Or did you implement two completely separate APIs?
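Something like this minimal sketch, assuming a hypothetical connection type whose async_query() accepts Asio completion tokens (this is not the actual API of either library, just the layering I'd expect):

```cpp
#include <boost/asio.hpp>
#include <boost/asio/use_future.hpp>
#include <future>
#include <string>

// Hypothetical: a sync query() as a thin blocking wrapper over the async
// operation, so only one protocol code path needs to exist and be tested.
template <class Connection>
auto query(Connection& conn, const std::string& stmt)
{
    // use_future turns the async operation into a std::future; some other
    // thread must be running the connection's io_context for it to complete.
    auto fut = conn.async_query(stmt, boost::asio::use_future);
    return fut.get();  // block until the asynchronous query finishes
}
```

That way sync comes "for free" once async works, rather than the other way around.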
> Note that the Postgres library is just a proof of concept; it is very far from being a full implementation, and at this point I lack the time to take it there. If somebody wants to volunteer and turn it into a full implementation (or just write something from scratch), I would appreciate it a lot - just message me.
It's hard to get traction on alternatives to official APIs (like libpq for PostgreSQL) without some pretty strong guarantees on testing. And testing with databases is complicated because of the need to access a remote DB; you can't easily mock a database reliably. I'm not sure how that would work within Boost. Is there precedent for Boost libraries that need "external" resources like full-blown databases? Perhaps via VMs and/or Docker? (Rough sketch of what I mean below.)
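For illustration, one (hypothetical) way to make such tests portable is to read the server's location from the environment, so CI can point the suite at whatever container or VM hosts the database:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical test-suite config: TEST_DB_HOST / TEST_DB_PORT are made-up
// variable names; a Dockerized CI job would set them to the container's
// address, while local runs fall back to a developer's default server.
struct db_test_config
{
    std::string host;
    std::string port;

    static db_test_config from_env()
    {
        const char* h = std::getenv("TEST_DB_HOST");
        const char* p = std::getenv("TEST_DB_PORT");
        return { h ? h : "localhost", p ? p : "5432" };
    }
};
```

But whether Boost's own test infrastructure allows that kind of external dependency, I don't know.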
> As always, any feedback is welcome.
A few things came to my mind reading this thread, from my DB experience. The re-implement-the-low-level-protocol approach applies to very few DB systems, given that most commercial DBs use proprietary protocols (I know Oracle does for sure), and the most used DB in the world (SQLite) doesn't have a network protocol at all. So that pretty much only leaves MySQL and PostgreSQL, no? At the very least, both should be supported by a library that wants to gain widespread appeal IMHO. And from my POV, you chose the wrong one to start with :)

Another remark is that PostgreSQL has two modes, text (the default) and binary, for sending row data back and forth. We've tested both and took the more difficult route of using the binary format for performance reasons, which complicates things. Do you handle both modes at the protocol level?

I'd conclude by saying that, from my perspective, the one-true-SQL-library is a pipe dream when you want maximum performance, or when you want to avoid the lowest-common-denominator approach. I've used array binding in Oracle, and XMLTYPE, and low-level C/C++ binding of Spatial objects in OCI, and pretty much no SQL abstraction library can do these things. After doing lots of Oracle and SQLite, and now doing PostgreSQL, the differences between these DBs are actually very large, to the point that a lot of the optimizations done for one are not applicable to the other, and many things just don't translate at all. If one stays within a *tiny* subset of SQL and features, maybe... But for any non-trivial, sophisticated DB use, you quickly find the limits of libraries that try to be multi-DB IMHO.

So in your case, what matters most is the async ASIO-based design, and having a framework to develop async DB client libraries, but not necessarily exposing exactly the same API for all of them. What matters is to be able to start processing rows and sending them downstream for further processing *BEFORE* the whole result set has made it client-side (rough sketch of that pattern in the PS below).

Maybe what I'm saying doesn't make sense, because I haven't studied your code and examples yet, but FWIW, these are the thoughts I had following this thread. Thanks, --DD
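PS: to make the streaming point concrete, here's a rough sketch of the pattern I mean. The API is hypothetical (async_fetch_one and the row type are my inventions, not necessarily either library's interface):

```cpp
#include <boost/system/error_code.hpp>

// Hand each row downstream as soon as it is decoded, instead of buffering
// the whole result set client-side before any processing can start.
template <class Resultset, class RowHandler>
void for_each_row(Resultset& rs, RowHandler on_row)
{
    // async_fetch_one is assumed to deliver a pointer to one decoded row,
    // or nullptr once the server signals the end of the result set.
    rs.async_fetch_one(
        [&rs, on_row](boost::system::error_code ec, const auto* row) {
            if (ec || !row)
                return;                // error or end of result set
            on_row(*row);              // push this row downstream now
            for_each_row(rs, on_row);  // chain the read for the next row
        });
}
```

The important property is that each fetch completes as soon as one row's worth of protocol data has arrived, so downstream processing overlaps with the network transfer.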