I've written a boost.sqlite library last year mainly to experiment, but it has been a feature-complete C++ wrapper for a while. Hence I'd like to gauge interest to see if it's worth proposing for boost. It has more features than most C++ sqlite wrappers and uses boost-isms such as system.result, tag_invoke and boost.json. Code: https://github.com/klemens-morgenstern/sqlite Docs: https://klemens.dev/sqlite/
On Thu, Jan 4, 2024 at 10:51 AM Klemens Morgenstern via Boost < boost@lists.boost.org> wrote:
I've written a boost.sqlite library last year mainly to experiment, [...] Hence I'd like to gauge interest to see if it's worth proposing for boost.
Sure, I'd be interested. I've used SQLite extensively for years, both with of-the-shelve wrappers, and internal ones (historical). I could use a good one, that went through a Boost review, from someone with good meta-programming experience like yourself. Note that we push lots of large text and blob values through SQLite's binding APIs, and I use no-copy binding (SQLITE_STATIC) extensively (memory needs to be "alive" only until the statement is step'd, not really static). Does your binding support that? In a separate PostgreSQL internal wrapper, I use std::ref() as a "marker" for no-copy binding. Do you do something similar? Would you support the idea? I also use std::optional for NULLs, on both the bind side, and the get side. Do you support it as well? For example, in my PostgreSQL wrapper, getting a value that's NULL is a runtime error, unless the value is an std::optional, i.e. the NULL is expected. A (very) quick look at the doc shows r.at(N).get_text() to get row values. Do you have an alternative that assigns to typed variables instead? Something like ``` bool flag; std::string text; std::vectorstd::byte blob; row >> flag >> text >> blob; ``` I my wrappers, I also support std::span, both on bind and get side as well. In both SQLite and PostgreSQL, on the get side, the memory is owned by them, (the result set in PostgreSQL, the statement in SQLite, until step'd at least, again) so there's no reason to always copy it out, often you just want to inspect at it, and do something with it, w/o an extra needless copy. We move GBs via SQLite ETLs, so that no-copy matters, and even though the gains are small, they exist. I'd expect a Boost quality wrapper allowing maximum performance. The vtable support is also interesting. We've wrapped it years ago, and I haven't touched it for years, and I sure would welcome something easier / better. One large app of ours uses vtables extensively, on top of Boost-Multi-Index containers, exposing the indexes to SQLite. If you haven't thought of that, that could make for a good example. It's tricky to implement xBestIndex, so a good showcase of your wrapper. The fact you mention subtypes shows you're aware of SQLite subtleties so that's good. Native C++ support for the new jsonb format might also be interesting. Richard Hipp says it's an internal format, but since it's just a blob to the outside, there's no reason not to process it separately from SQLite, outside SQL. This is fairly new and not even officially released perhaps. Are you aware of it? Anyways, I'll try to find some time to look at your wrapper. And I definitely endorse a production quality SQLite wrapper in Boost. Heck, I'd welcome a PostgreSQL one, I've peeved there's only MySQL support :) --DD PS: And with two SQL-DB-related wrappers in Boost, maybe an ORM isn't far? :) I'm personally not a fan of ORMs, but who knows, maybe I haven't found a good one? PPS: Perhaps with Peter's Boost.Describe (awaiting native C++ support), and the no-annotation variant for simple structs (PFR?), at least each wrapper could provide auto-magic assignments to structs from the SQLite stepped rows? I think I saw something like that for Boost.MySQL, no?
On Thu, Jan 4, 2024 at 6:43 PM Dominique Devienne
On Thu, Jan 4, 2024 at 10:51 AM Klemens Morgenstern via Boost
wrote: I've written a boost.sqlite library last year mainly to experiment, [...] Hence I'd like to gauge interest to see if it's worth proposing for boost.
Sure, I'd be interested. I've used SQLite extensively for years, both with of-the-shelve wrappers, and internal ones (historical).
I could use a good one, that went through a Boost review, from someone with good meta-programming experience like yourself.
Note that we push lots of large text and blob values through SQLite's binding APIs, and I use no-copy binding (SQLITE_STATIC) extensively (memory needs to be "alive" only until the statement is step'd, not really static). Does your binding support that?
If you use views with parameterized queries, yes: https://github.com/klemens-morgenstern/sqlite/blob/02824d508ff6e79a8432f3ee5...
In a separate PostgreSQL internal wrapper, I use std::ref() as a "marker" for no-copy binding. Do you do something similar? Would you support the idea?
I don't, it's just the type (string_view or blob_view). This is simpler (because it'll never be async) and since sqlite doesn't have a dynamic type system, I think anything else is overkill.
I also use std::optional for NULLs, on both the bind side, and the get side. Do you support it as well?
I don't, I ended up writing my own type wrapping values. But you can extend the bind side.
For example, in my PostgreSQL wrapper, getting a value that's NULL is a runtime error, unless the value is an std::optional, i.e. the NULL is expected.
A (very) quick look at the doc shows r.at(N).get_text() to get row values. Do you have an alternative that assigns to typed variables instead? Something like
No, on purpose. that's not how sqlite works and it's very easy to just wrap `sqlite::value` to do your own thing. That means I don't need to pick things like error handling for a user. Since you don't own the data, you'll get a blob or string_view, which is as fast as it gets.
``` bool flag; std::string text; std::vectorstd::byte blob; row >> flag >> text >> blob; ``` I my wrappers, I also support std::span, both on bind and get side as well. In both SQLite and PostgreSQL, on the get side, the memory is owned by them, (the result set in PostgreSQL, the statement in SQLite, until step'd at least, again) so there's no reason to always copy it out, often you just want to inspect at it, and do something with it, w/o an extra needless copy. We move GBs via SQLite ETLs, so that no-copy matters, and even though the gains are small, they exist.
I'd expect a Boost quality wrapper allowing maximum performance.
The vtable support is also interesting. We've wrapped it years ago, and I haven't touched it for years, and I sure would welcome something easier / better. One large app of ours uses vtables extensively, on top of Boost-Multi-Index containers, exposing the indexes to SQLite. If you haven't thought of that, that could make for a good example. It's tricky to implement xBestIndex, so a good showcase of your wrapper.
Doing that automatically wasn't within my skills, but I got an example: https://github.com/klemens-morgenstern/sqlite/blob/develop/example/multi_ind...
The fact you mention subtypes shows you're aware of SQLite subtleties so that's good.
Native C++ support for the new jsonb format might also be interesting. Richard Hipp says it's an internal format, but since it's just a blob to the outside, there's no reason not to process it separately from SQLite, outside SQL. This is fairly new and not even officially released perhaps. Are you aware of it?
I am aware of jsonb, but not of any C-API to process it. I just used boost.json to handle json data up to now, but that might be slow for jsonb.
Anyways, I'll try to find some time to look at your wrapper. And I definitely endorse a production quality SQLite wrapper in Boost. Heck, I'd welcome a PostgreSQL one, I've peeved there's only MySQL support :)
--DD
PS: And with two SQL-DB-related wrappers in Boost, maybe an ORM isn't far? :) I'm personally not a fan of ORMs, but who knows, maybe I haven't found a good one?
I don't like ORMs, but I think a boost.sql library that makes it easy to write "portable sql" would be cool.
PPS: Perhaps with Peter's Boost.Describe (awaiting native C++ support), and the no-annotation variant for simple structs (PFR?), at least each wrapper could provide auto-magic assignments to structs from the SQLite stepped rows? I think I saw something like that for Boost.MySQL, no?
That's possible, but would be something I'd put in above mentioned boost.sql.
PPS: Perhaps with Peter's Boost.Describe (awaiting native C++ support), and the no-annotation variant for simple structs (PFR?), at least each wrapper could provide auto-magic assignments to structs from the SQLite stepped rows? I think I saw something like that for Boost.MySQL, no?
Yes, Boost.MySQL supports parsing into user-defined Describe structs and tuples using what we call "static interface" (https://www.boost.org/doc/libs/master/libs/mysql/doc/html/mysql/static_inter...). There are plans on adding support for PFR types soon. In its simplest form, given a Describe struct type T, you can write: static_results<T> r; conn.execute("SELECT a, b FROM mytable", r); Which will parse all rows returned from the query into objects of type T and store them in r. More advanced uses (like reading row-by-row) also exist and are described in the link above. Regards, Ruben.
Klemens Morgenstern wrote:
I've written a boost.sqlite library last year mainly to experiment, but it has been a feature-complete C++ wrapper for a while.
Hence I'd like to gauge interest to see if it's worth proposing for boost.
It has more features than most C++ sqlite wrappers and uses boost-isms such as system.result, tag_invoke and boost.json.
Code: https://github.com/klemens-morgenstern/sqlite Docs: https://klemens.dev/sqlite/
That's interesting. I have also written an sqlite wrapper - as I bet lots of others have. What do you do about multi-threaded usage? FYI here is the readme for my wrapper. // Wrapper for SQLite. // // Usage: // // Open the database file: // sqlite::Database db("/path/to/sqlite/file"); // // Set a busy polling interval: // db.set_busy_timeout(1.0); // // * If no busy timeout is set, an attempt to access the database (for reading or // writing) can fail because another process has it locked for writing. If a busy // timeout has been set with this method, the process will sleep briefly when it // finds the database is locked and then retry, until it succeeds or the total time // (in seconds) has elapsed. // // Load an extension: // db.load_extension("/path/to/extension.so"); // // Create a prepared query. ?s are placeholders for paramters: // sqlite::Query q1(db, "insert into t values (?,?)"); // // Execute the query, substituting the parameters: // q1("hello","world"); // // * SQLite is dynamically typed; the types are INTEGER, REAL, TEXT, BLOB and NULL; // its API further distinguishes between int and int64. NULL is not currently // supported here. std::string_view is used for TEXT. float is promoted // to double for REAL. // // Execute a select query which returns results: // sqlite::Query q2(db, "select a from t where b=?"); // { // auto r = q2("hello"); // ... // } // // * IMPORTANT: note that the scope of the result object r must be limited. // Execution of the query occurs stepwise, with only the first step performed // in the invokation of q2(); subsequent steps occur as result data is read // from r, and the execution is "reset" in r's dtor. Each prepared query // can only have a single execution in progress at any time, so r must have // been destroyed before q2 is invoked again. // // Getting result data: // sqlite::Query q3(db, "select a,b,c from t where b=?"); // { // auto r = q3("hello"); // r.foreach_row( [&](std::string a, std::string b, int c) { // ... // }); // } // // * forach_row takes a callable, e.g. a lambda, and invokes it for each row of // the query results. The callable's parameter types can be std::string // (FIXME should that be string_view?), int, int64 or double, and they are // passed the corresponding column values; SQLite will cast them if necessary. // Alternatively if the callable accepts a single std::arraystd::string,N // (with suitable N) or std::vectorstd::string then the columns are all // extracted as strings into the container. // Note that a lambda with auto parameters cannot be used. // Note that foreach_row() can only be invoked once on a result. // // Simplified syntax that avoids the result scope issue: // sqlite::Query q3(db, "select a,b,c from t where b=?"); // q3("hello").foreach_row( [&](std::string a, std::string b, int c) { // ... // }); // // The result object also has an n_columns() method. // // It's also possible to execute miscellaneous SQL using the exec method: // db.exec("pragma foreign_key = on"); // // No results are returned from this method. Multiple statements can be passed in // a single call, separated by semicolons. // // It is possible to run queries inside transactions scoped: // { // sqlite::Transaction t // q(...); // t.commit(); // } // // If the transaction's dtor is reached without commit() having been invoked, because // q() has thrown an exception for example, the transaction is rolled back. // // Errors are indicated by throwing sqlite::Exception. Regards, Phil.
On Fri, Jan 5, 2024 at 2:15 PM Phil Endecott via Boost < boost@lists.boost.org> wrote:
What do you do about multi-threaded usage?
Hi. What is there to do Phil? SQLite itself deals with that. And even then, it depends how you build it, and even how you configure it at runtime. SQLite is inherently single-threaded, but can be thread-safe on its connection (sqlite3*), and can serialize access. Although most anything one does requires several API calls, even that serialization is probably not enough for correct usage. It's much better and recommended in MT apps to either use one connection per thread, or do higher level serialization (thus not at the per-API level as SQLite does). Klemens' API is rather high level, with .execute() hiding multiple bind calls, step()+reset() calls, so he could I guess lock around that set of calls, but even that is not enough for queries, since getting the values from the resultset accesses the very same statement that was executed. Really it's the client code that has the full context and can do the locking, if relevant. My $0.02. --DD
Dominique Devienne wrote:
On Fri, Jan 5, 2024 at 2:15 PM Phil Endecott via Boost
wrote: What do you do about multi-threaded usage?
Hi. What is there to do Phil?
I mentioned this because it is something that I got burned by recently. The sqlite docs are now improved after my feedback. Having the wrapper assert() if it detects non-thread-safe usage would be useful. I don't know how feasible that is.
[...] use one connection per thread
Yes, this is likely the best approach in most cases. You might also consider a connection pool. Could a generalised connection pool be something of value in Boost? I may have mentioned this before in relation to MySQL. On another subject, I note that the proposed API extensively uses string_view as a function return type. For example, field::column_name() returns a string_view. I consider this an anti-pattern. (In case anyone doesn't understand the issue, the danger is that the view will be dangling if the caller keeps it beyond the life of... some other object; I was going to write "the field object", but I'm not sure if that's right; maybe the row? The docs don't say.) In the case of e.g. column_name(), std::string's small buffer optimisation means that returning a std::string will not involve dynamic allocation unless the column name is more than maybe 23 characters long, which surely must be sufficient in practically all cases. If you really worry about long column names, please have a separate method (e.g. column_name_view()) whose name is a warning that it returns a view. There is more justification for using view-like types for the actual data. I'd still argue that the default should be to return safe types with value semantics, with those methods that return views named to indicate that. There is a lot of criticism of C++'s memory-unsafety. Dangling views are part of that. In newly-designed APIs, we should be going out of our way to offer memory safety by default. Note how I handle this in my sqlite wrapper: Query q(db, "select a,b from t where c=?"); q("hello").foreach_row( [&](std::string a, std::string_view b) { // do something with a and b }); Note that I can use strings or string_views as the parameters to the per-row lambda. If I choose to use a string_view I have to explicitly write "string_view", making it clear what the type is. If I were writing this today, I'd investigate whether generator co-routines could be used instead of the lambda. Regards, Phil.
[...] use one connection per thread
Yes, this is likely the best approach in most cases. You might also consider a connection pool. Could a generalised connection pool be something of value in Boost? I may have mentioned this before in relation to MySQL.
Boost.MySQL has recently added dedicated connection_pool functionality. I don't see how sqlite and MySQL could share pooling functionality, since one is sync and the other is async. Additionally, efficient pooling relies on database-specific details, which are not exposed to the end user but are used internally to boost performance.
On another subject, I note that the proposed API extensively uses string_view as a function return type. For example, field::column_name() returns a string_view. I consider this an anti-pattern. (In case anyone doesn't understand the issue, the danger is that the view will be dangling if the caller keeps it beyond the life of... some other object; I was going to write "the field object", but I'm not sure if that's right; maybe the row? The docs don't say.) In the case of e.g. column_name(), std::string's small buffer optimisation means that returning a std::string will not involve dynamic allocation unless the column name is more than maybe 23 characters long, which surely must be sufficient in practically all cases. If you really worry about long column names, please have a separate method (e.g. column_name_view()) whose name is a warning that it returns a view.
MSVC standard lib and stdlibc++ std::string's have 15 characters of static capacity. Regards, Ruben.
On Jan 6, 2024, at 7:50 AM, Ruben Perez via Boost
[...] use one connection per thread
Yes, this is likely the best approach in most cases. You might also consider a connection pool. Could a generalised connection pool be something of value in Boost? I may have mentioned this before in relation to MySQL.
Boost.MySQL has recently added dedicated connection_pool functionality. I don't see how sqlite and MySQL could share pooling functionality, since one is sync and the other is async. Additionally, efficient pooling relies on database-specific details, which are not exposed to the end user but are used internally to boost performance.
On another subject, I note that the proposed API extensively uses string_view as a function return type. For example, field::column_name() returns a string_view. I consider this an anti-pattern. (In case anyone doesn't understand the issue, the danger is that the view will be dangling if the caller keeps it beyond the life of... some other object; I was going to write "the field object", but I'm not sure if that's right; maybe the row? The docs don't say.) In the case of e.g. column_name(), std::string's small buffer optimisation means that returning a std::string will not involve dynamic allocation unless the column name is more than maybe 23 characters long, which surely must be sufficient in practically all cases. If you really worry about long column names, please have a separate method (e.g. column_name_view()) whose name is a warning that it returns a view.
MSVC standard lib and stdlibc++ std::string's have 15 characters of static capacity.
libc++’s SSO string holds 22 chars on a 64 bit platforms. — Marshall #include <iostream> #include <string> int main () { std::cout << sizeof (std::string) << std::endl; std::string s; for (int i = 0; i < 30; ++i) { s.push_back(' '); std::cout << s.size() << " " << s.capacity() << std::endl; } }
libc++’s SSO string holds 22 chars on a 64 bit platforms.
A summary of SBO sizes: https://godbolt.org/z/5EchKdacK gcc/clang with stdlibc++: 15 MSVC: 15 clang with -std=libc++: 23 Regards, Ruben.
participants (5)
-
Dominique Devienne
-
Klemens Morgenstern
-
Marshall Clow
-
Phil Endecott
-
Ruben Perez