Hi all, I've been thinking of writing a base64 encoder/decoder library for some time. I first found the need when writing ServerTech chat (https://github.com/anarthal/servertech-chat), and I couldn't find a library that suited my needs. All the libraries I could find were either forgotten one-day projects with no tests, or private parts of something bigger.
From this issue from Beast (https://github.com/boostorg/beast/issues/1710) it looks like this could be useful to people.
My idea would be to write something that is:

* Boost quality - extensive testing and fuzzing.
* Configurable. My use case in ServerTech required me to not output padding chars, for instance.
* Support for a streaming API.
* Support for calling in a constexpr context.
* I'd prefer to focus on doing base64 only and doing it well, rather than targeting arbitrary bases.

I don't have any sample code yet. Following our recommendations, I'm only determining interest. My questions are:

* Do you think a library like this could be useful to the C++ community?
* If the answer to the above is yes, do you think it could belong to Boost, or would it be better as a standalone library?

Regards,
Ruben.
On Mar 19, 2024, at 10:46 AM, Ruben Perez via Boost wrote:
I wrote something like this back in the day: https://github.com/mclow/snippets/blob/master/Base64.hpp If you look in that repo, you can find tests, and fuzzing tests. No docs, though. I’d be interested in hearing why (or why not) it fits your needs. — Marshall
I’d be interested in hearing why (or why not) it fits your needs.
It wouldn't because:

* It's hardwired to use exceptions. My use case is a parser for a data format with several base64 fields. It's not exceptional for them to be malformed, so I'd rather use error codes.
* The format I'm parsing (P-H-C strings, a way to store hashed passwords - see https://github.com/P-H-C/phc-string-format/blob/master/phc-sf-spec.md) mandates to not output padding characters.

Also, I don't see any tests for error cases.

Regards,
Ruben.
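P.S. To make the first point concrete, a rough sketch of the calling convention I would aim for (all names here are invented for illustration, nothing is implemented yet):

    #include <cstddef>
    #include <string_view>
    #include <vector>
    #include <boost/system/error_code.hpp>

    namespace b64 {  // hypothetical

    struct options {
        bool padding = true;  // whether '=' padding is emitted/required
    };

    // Decodes `input` into `output`, reporting malformed input through `ec`
    // instead of throwing, so a parser can treat bad fields as a normal case.
    std::size_t decode(std::string_view input,
                       std::vector<unsigned char>& output,
                       boost::system::error_code& ec,
                       options opts = {});

    } // namespace b64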
On Tue, Mar 19, 2024 at 10:46 AM Ruben Perez via Boost wrote:
* Do you think a library like this could be useful to the C++ community?
I'm interested in this, whether or not there is general interest for Boost, as it is an area of API research that could yield general benefits. With respect to base64 though, I would imagine we want it to at least be comparable in performance to this: https://github.com/aklomp/base64 Thanks
Jumping in real quick here. Base64 and other encoding algorithms are also part of Boost.Crypto3, proposed about 4 years ago (https://github.com/nilfoundation/boost-crypto3). Some usage examples are here: https://github.com/NilFoundation/boost-crypto3/blob/master/test/codec/base.c.... Sincerely yours, Misha Komarov nemo@nil.foundation
On 19 Mar 2024, at 20:08, Vinnie Falco via Boost wrote:
I'm interested in this, whether or not there is general interest for Boost, as it is an area of API research that could yield general benefits. With respect to base64 though, I would imagine we want it to at least be comparable in performance to this:
https://github.com/aklomp/base64
Looks like a great library - and pretty optimized. I've been looking into their source code and it looks like they dynamically query CPU capabilities and store them in a global struct without any protection - I'd say this makes the library not thread-safe. Looks like a good library to take inspiration from, especially regarding performance. Thanks, Ruben.
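Not their code, just for reference: the usual way to make that kind of one-time CPU capability detection thread-safe in C++11 and later is a function-local static (or std::call_once). A minimal sketch:

    struct cpu_caps {
        bool has_ssse3 = false;
        bool has_avx2 = false;
    };

    inline const cpu_caps& detected_cpu_caps() {
        // Initialization of a function-local static is thread-safe since
        // C++11, so the detection below runs exactly once.
        static const cpu_caps caps = [] {
            cpu_caps c;
            // ... query CPUID here and fill in the flags ...
            return c;
        }();
        return caps;
    }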
From this issue from Beast (https://github.com/boostorg/beast/issues/1710) it looks like this could be useful to people.
I was in a similar situation yesterday: I looked at the same ticket, and probably the same search results that took me there. I was close to including the Beast detail implementation, like some SO post suggested, but in the end Vinnie's comments about not doing that made me look further.

Daniel Lemire has a post describing how to do fast base64 encoding/decoding (https://lemire.me/blog/2018/01/17/ridiculously-fast-base64-encoding-and-deco...), which is implemented in this library, which is still maintained (last commit 3 weeks ago): https://github.com/aklomp/base64. To be honest, I've not yet made the tests to verify that it works, but it is for sure easy to work with, and supposedly very fast.

I am lazy and would have used the Boost version if something neat like that existed, but maybe there isn't a real need for it to be in Boost.

Kind Regards
On Tue, Mar 19, 2024 at 12:46 PM Ruben Perez via Boost wrote:
* Do you think a library like this could be useful to the C++ community?
Yes, but.. I would like a library that handles various types of encoding/decoding with the "same" interface. Encodings that come to mind: base-64, url, html, radix-64, base-16, base-32, custom base-x alphabet table, base-36, base-62, and so on.
* If the answer to the above is yes, do you think it could belong to Boost, or would be better as a standalone library?
If it's more than base-64, yes, it could be a Boost library. -- -- René Ferdinand Rivera Morell -- Don't Assume Anything -- No Supone Nada -- Robot Dreams - http://robot-dreams.net
PS. Yes, I do realize you said you would want to concentrate on base64 only. But it's the concepts of encoding/decoding for arbitrary kinds that would interest me. -- -- René Ferdinand Rivera Morell -- Don't Assume Anything -- No Supone Nada -- Robot Dreams - http://robot-dreams.net
I've been thinking of writing a base64 encoder/decoder library

Absolutely! But I fear we might need base-whatever, where the base radices can/should be judiciously selected.

I wrote something like this back in the day

As did so many others, including myself, and then forgot about it.

There was another comment about including base-64 conversions in cryptographic domains. Yes, we need base-whatever conversions *and* cryptography. But these are different. I need base conversions for cryptography, but that is not the only place I convert bases. And vice-versa. I'd like to have cryptography (if it ever gets there) include a header-only base conversion.
Chris
On Tue, Mar 19, 2024 at 11:17 AM René Ferdinand Rivera Morell via Boost wrote:
Yes, but.. I would like a library that handles various types of encoding/decoding with the "same" interface. ... url ...
Hmm.... I disagree. There are often unique qualities of an encoding that complicate creating a generic API. For example, with URL-encoding, there is the concept of the "reserved set." That is, the set of characters for which escaping is required. Different parts of a URL have different reserved sets. The target for example reserves the forward slash (among other things). The query reserves the hashtag '#' but not the forward slash. On the other hand base64 has no concept of reserved sets as it operates on unsigned integers of arbitrary bit width. One is a numeric encoding, the other is a character encoding.
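To illustrate the difference with a toy example (not a proposed API): a percent-encoder needs the reserved set as a parameter, while a base64 encoder has nothing comparable to parameterize on.

    #include <string>
    #include <string_view>

    // Toy percent-encoder: which characters get escaped depends entirely on
    // the caller-supplied reserved set.
    template <class IsReserved>
    std::string pct_encode(std::string_view s, IsReserved is_reserved) {
        static constexpr char hex[] = "0123456789ABCDEF";
        std::string out;
        for (unsigned char c : s) {
            if (is_reserved(c)) {
                out += '%';
                out += hex[c >> 4];
                out += hex[c & 0xF];
            } else {
                out += static_cast<char>(c);
            }
        }
        return out;
    }

    // Usage: the query part of a URL reserves '#' but not '/'.
    // pct_encode("a/b#1", [](unsigned char c) { return c == '#'; });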
If it's more than base-64, yes, it could be a Boost library.
It isn't clear why only offering base-64 functionality is insufficient. In fact, as a proponent of "modular boost" surely you see value in isolating each radix to its own library. Thanks
It isn't clear why only offering base-64 functionality is insufficient

Good point. In light of <charconv>-like conversions, maybe base-64 could (or even should) stand alone.
This seems deep. Hmmm. Must consider.
On Tue, Mar 19, 2024 at 1:26 PM Vinnie Falco wrote:

Yes, but.. I would like a library that handles various types of encoding/decoding with the "same" interface. ... url ...
Hmm.... I disagree. There are often unique qualities of an encoding that complicate creating a generic API. For example, with URL-encoding, there is the concept of the "reserved set." That is, the set of characters for which escaping is required. Different parts of a URL have different reserved sets. The target for example reserves the forward slash (among other things). The query reserves the hashtag '#' but not the forward slash.
On the other hand base64 has no concept of reserved sets as it operates on unsigned integers of arbitrary bit width. One is a numeric encoding, the other is a character encoding.
Understood.. But having a generic API that models the reserved context independent of the encoding would allow composition of the encoder for url-target, query-target, host-target, etc.
If it's more than base-64, yes, it could be a Boost library.
It isn't clear why only offering base-64 functionality is insufficient. In fact, as a proponent of "modular boost" surely you see value in isolating each radix to its own library.
Yes :-) But I was also thinking that a user is more likely to use a library if it overcomes the download "cost". If it's just base64 they are more likely to either write their own (by copy-pasting from somewhere) or use some single header standalone library instead of reaching for the Boost solution. -- -- René Ferdinand Rivera Morell -- Don't Assume Anything -- No Supone Nada -- Robot Dreams - http://robot-dreams.net
On Tue, Mar 19, 2024 at 11:35 AM René Ferdinand Rivera Morell wrote:
Understood.. But having a generic API that models the reserved context independent of the encoding would allow composition of the encoder for url-target, query-target, host-target, etc.
Again the benefits are unclear. Creating a generic API is a tradeoff, as it constrains a specific implementation into meeting the requirements of the generic API. This might be ok if all the implementations are largely the same, but that becomes less and less true as you add more encodings and especially decodings.

For example, the API for an encoder that supports streaming is going to be necessarily different from one that does not support streaming. Trying to abstract the "streaming" feature of the encoder will be an exercise in futility (as I know from experience). The implementation is also going to be different between them, with different performance profiles. The implementation for a decoder which can assume the output buffer has sufficient size looks radically different than when it has to handle the potential for insufficient space. The API also looks different, as in how the "insufficient space" condition is communicated. Trying to make this generic is also going to be weird.

There are actually two problems proposed here:

1. High-quality implementations of numerical radix conversions (I leave out character encodings since those are really a completely different thing)
2. Designing a generic API for parameterizing radix conversions

There is a clear need for 1, but there is little evidence that a Boost-quality solution for 2 exists. A quick search shows no shortage of implementations (that is, solutions to number 1 above) for a pretty good number of different bases. If you believe that a workable generic API exists for encoding and decoding, that is a separate problem that can be approached in terms of design. And you can explore that by downloading and using the already-existing, non-Boost libraries which perform radix conversions and adapting them to your prototyped API.

FYI, I have explored the design space of "generic APIs for radix conversions" and those experiments didn't go well. Maybe someone else can show me how it's done... but it seems like the more generic you make the interface the less opportunity you have for something that is optimized for a particular base (like 64 as we are contemplating).

Thanks
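As a concrete illustration of that last point (signatures invented, not from any existing library), the two decoders naturally end up with different shapes:

    #include <cstddef>

    // Caller guarantees `out` is large enough (e.g. sized via a
    // decoded_size() helper), so the decoder never has to stop early.
    std::size_t decode_all(unsigned char* out,
                           const char* in, std::size_t in_len);

    struct decode_result {
        std::size_t bytes_read;     // consumed from `in`
        std::size_t bytes_written;  // produced into `out`
        bool need_more_output;      // true if it stopped because `out` was full
    };

    // Has to track partial progress and report how the run ended.
    decode_result decode_some(unsigned char* out, std::size_t out_len,
                              const char* in, std::size_t in_len);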
Yes, but.. I would like a library that handles various types of encoding/decoding with the "same" interface. Encodings that come to mind: base-64, url, html, radix-64, base-16, base-32, custom base-x alphabet table, base-36, base-62, and so on.
As a user, I've never felt this need - I usually need to encode something to a particular base or encoding, and I'd like to have a component that does it and does it well. As others have pointed out, the more general we make it, the less optimization opportunities we have. Aside from URL and HTML, I've found the need for base-64 and base-16 in my day to day. What use cases do the other cases cover? Thanks, Ruben.
On Wed, Mar 20, 2024 at 12:29 PM Ruben Perez via Boost <boost@lists.boost.org> wrote:
Aside from URL and HTML, I've found the need for base-64 and base-16 in my day to day. What use cases do the other cases cover?
As an anecdotal counterpoint to Ruben's, I needed base-62 (by choice, for identifiers), and base-64 of course (in various variants), and ascii-85. So I'm more in Andrey's camp, in that a more generic library would suit me better. But that doesn't preclude having a more optimized base-64 variant, since the fact it's "bit-aligned" (to 6 bits) makes it more open to optimization (SIMD).

I've also looked at aklomp (new to me, thanks for the link Vinnie), which looks impressive, but sometimes all you want is a small and well tested impl. But they claim to have a fast non-SIMD variant too (as a fallback), so maybe good enough? Anyone who wants to squeeze out the last bit of performance can use it directly, the same way one would use simdjson directly instead of Boost.JSON.

My $0.02. --DD
On 19.03.24 19:16, René Ferdinand Rivera Morell via Boost wrote:
Yes, but.. I would like a library that handles various types of encoding/decoding with the "same" interface. Encodings that come to mind: base-64, url, html, radix-64, base-16, base-32, custom base-x alphabet table, base-36, base-62, and so on.
Depends on what you mean by "same" interface. I would expect separate functions for each encoding with different customization parameters. This does not require that the different encodings come from the same library; it just requires that they use the same API conventions. For example, this is good (in terms of parallelism, not necessarily in terms of the specifics of the API):

    result = base64_encode(source, base64_options::no_padding);
    result2 = base16_encode(source, base16_options::lower_case);

This is not so good, because it mixes the options of different encodings, resulting in potentially nonsensical combinations:

    result = baseX_encode<64>(source, encoding_options::no_padding);
    result2 = baseX_encode<16>(source, encoding_options::lower_case);

    // The lower_case option is non-sensical for base 64; can this
    // error be caught at compile time?
    // result3 = baseX_encode<64>(source, encoding_options::lower_case);

This is just bad, because it sacrifices performance and type safety for the dubious flexibility of specifying the encoding at runtime:

    result = baseX_encode(64, source, encoding_options::no_padding);
    result2 = baseX_encode(16, source, encoding_options::lower_case);

-- Rainer Deyke (rainerd@eldwood.com)
On 3/21/24 11:35, Rainer Deyke via Boost wrote:
On 19.03.24 19:16, René Ferdinand Rivera Morell via Boost wrote:
Yes, but.. I would like a library that handles various types of encoding/decoding with the "same" interface. Encodings that come to mind: base-64, url, html, radix-64, base-16, base-32, custom base-x alphabet table, base-36, base-62, and so on.
This is just bad, because it sacrifices performance and type safety for the dubious flexibility of specifying encoding at runtime:
result = baseX_encode(64, source, encoding_options::no_padding); result2 = baseX_encode(16, source, encoding_options::lower_case);
I agree. In every case I have, I know exactly what encoding I want in every instance; using a different encoding or allowing it to be determined at run time wouldn't make sense. I think, runtime configurability is a rare use case, enough to maybe provide it as a separate layer on top of the specialized algorithms, if at all.
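For example, such a separate runtime layer could be a thin dispatcher over hypothetical fixed-encoding functions (sketch only, names invented):

    #include <stdexcept>
    #include <string>
    #include <string_view>

    // Hypothetical compile-time-specialized functions, one per encoding.
    std::string base16_encode(std::string_view input);
    std::string base64_encode(std::string_view input);

    enum class encoding { base16, base64 };

    // Separate runtime-dispatch layer for the rare caller that needs it.
    inline std::string encode(encoding e, std::string_view input) {
        switch (e) {
        case encoding::base16: return base16_encode(input);
        case encoding::base64: return base64_encode(input);
        }
        throw std::invalid_argument("unknown encoding");
    }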
On Thu, Mar 21, 2024 at 3:35 AM Rainer Deyke via Boost wrote:
On 19.03.24 19:16, René Ferdinand Rivera Morell via Boost wrote:
Yes, but.. I would like a library that handles various types of encoding/decoding with the "same" interface. Encodings that come to mind: base-64, url, html, radix-64, base-16, base-32, custom base-x alphabet table, base-36, base-62, and so on.
Depends on what you mean by "same" interface.
It certainly does. :-)
I would expect separate functions for each encoding with different customization parameters. This does not require that the different encodings come from the same library; it just requires that they use the same API conventions. For example, this is good (in terms of parallelism, not necessarily in terms of the specifics of the API):
result = base64_encode(source, base64_options::no_padding); result2 = base16_encode(source, base16_options::lower_case);
I wouldn't consider that good. Passable, sure. More..
This is not so good, because it mixes the options of different encodings, resulting in potentially nonsensical combinations:
result = baseX_encode<64>(source, encoding_options::no_padding); result2 = baseX_encode<16>(source, encoding_options::lower_case);
// The lower_case option is non-sensical for base 64; can this // error be caught at compile time? // result3 = baseX_encode<64>(source, encoding_options::lower_case);
It can be caught at compile time. But not with that interface.
This is just bad, because it sacrifices performance and type safety for the dubious flexibility of specifying encoding at runtime:
result = baseX_encode(64, source, encoding_options::no_padding); result2 = baseX_encode(16, source, encoding_options::lower_case);
Yeah, as Andrey mentions, making this a runtime choice is sufficiently rare that it's not worth even thinking about it. I certainly never had a use for a runtime choice for that. As for a sane interface.. I would think having encoder/decoder templates (perhaps as functors) is the way to go. For example:

    auto base64enc = boost::thing::base_encoder<64, boost::thing::encoding_options::no_padding>();
    auto encoded = base64enc.encode(data);
    auto decoded = base64enc.decode(encoded);

This makes it possible to pass the encoder object to generic code without worrying about calling some specific base64 or base16 functions. Having the template args also makes it possible to check valid combinations. It also makes it easier to specialize performant combinations and still cover everything else with a not-so-performant default implementation.

Note, don't take my saying that I would like more encoding coverage as a requirement for acceptance of such a hypothetical library. Having a couple to start would be good enough to be useful and address most API design issues.

-- -- René Ferdinand Rivera Morell -- Don't Assume Anything -- No Supone Nada -- Robot Dreams - http://robot-dreams.net
Regarding the unified interface, you mean something like this? Base32 Encoding: https://github.com/NilFoundation/boost-crypto3/blob/master/test/codec/base.c... Base32 Decoding: https://github.com/NilFoundation/boost-crypto3/blob/master/test/codec/base.c... Same interface for Base56: https://github.com/NilFoundation/boost-crypto3/blob/master/test/codec/base.c... Sincerely yours, Misha Komarov nemo@nil.foundation
On Thu, Mar 21, 2024 at 6:51 AM Misha Komarov via Boost wrote:
Regarding the unified interface, you mean something like this?
If I were writing such a library I would:

* minimize the use of templates
* place as few function definitions as possible in the headers
* make the library a compiled lib (NOT header-only)
* require all input and output passed in contiguous buffers of char

I would not:

* use generic input iterators and output iterators
* try to make everything header-only

This is the philosophy I have adopted ever since writing Boost.JSON. Having extensively written libraries both ways, I have concluded that for most of the domains I work in, the purported benefits of templating everything (muh generic algorithm) do not outweigh the downsides. I especially don't see the value in templating the input and output iterators.

I realize this will come across as an unpopular opinion, given the ingrained dogma of writing things "STL style."

Thanks
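A rough sketch of what that would look like for base64 (names invented; the definitions would live in a separately compiled .cpp, not in the header):

    #include <cstddef>

    namespace base64 {  // hypothetical

    // Size helpers so the caller can allocate contiguous char buffers up front.
    std::size_t encoded_size(std::size_t input_size) noexcept;
    std::size_t max_decoded_size(std::size_t input_size) noexcept;

    // Plain, non-template functions over contiguous buffers of char.
    std::size_t encode(char* dest, const void* src, std::size_t src_size) noexcept;
    std::size_t decode(void* dest, std::size_t dest_size,
                       const char* src, std::size_t src_size, bool& ok) noexcept;

    } // namespace base64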
On Thu, 21 Mar 2024 at 15:11, Vinnie Falco via Boost wrote:
I feel compelled to agree with Vinnie here. Also, the library should contain exactly two functions at inception:

    base64::encode
    base64::decode

And none of the other "oh but what if we want base 63?". Nobody wants that or will ever want or need it. Base64 is mandated in a number of specifications and RFCs. It is used in these because it allows binary data to be encoded in such a way that it can be transmitted as 7-bit ASCII without causing terminals to go haywire when printed.
On 21.03.24 14:43, René Ferdinand Rivera Morell via Boost wrote:
On Thu, Mar 21, 2024 at 3:35 AM Rainer Deyke via Boost wrote:

This is not so good, because it mixes the options of different encodings, resulting in potentially nonsensical combinations:
result = baseX_encode<64>(source, encoding_options::no_padding); result2 = baseX_encode<16>(source, encoding_options::lower_case);
// The lower_case option is non-sensical for base 64; can this // error be caught at compile time? // result3 = baseX_encode<64>(source, encoding_options::lower_case);
It can be caught at compile time. But not with that interface.
Actually it can be caught at compile time even with that interface, if encoding_options is a namespace instead of an enum type and encoding_options::lower_case has a distinct type from encoding_options::no_padding.
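For example, one possible shape (sketched here with illustrative names, not taken from any existing library):

    #include <string>
    #include <string_view>

    namespace encoding_options {
        struct no_padding_t {};
        struct lower_case_t {};
        inline constexpr no_padding_t no_padding{};
        inline constexpr lower_case_t lower_case{};
    }

    // Each encoder only provides overloads for the options that make sense
    // for it, so nonsensical combinations simply do not compile.
    std::string base64_encode(std::string_view src);
    std::string base64_encode(std::string_view src, encoding_options::no_padding_t);
    std::string base16_encode(std::string_view src);
    std::string base16_encode(std::string_view src, encoding_options::lower_case_t);

    // base64_encode(src, encoding_options::lower_case);  // error: no such overload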
Yeah, as Adrey mentions, making this a runtime choice is sufficiently rare that it's not worth even thinking about it. I certainly never had a use for a runtime choice for that. As for a sane interface.. I would think having encoder/decoder templates (perhaps as functors) is the way to go. For example:
auto base64enc = boost::thing::base_encoder<64, boost::thing::encoding_options::no_padding>(); auto encoded = base64enc.encode(data); auto decoded = base64enc.decode(encoded);
This makes it possible to pass the encoder object to generic code without worrying about calling some specific base64 or base16 functions. Having the template args also makes it possible to check valid combinations. It also makes it easier to specialize performant combinations and still cover everything else with not-so-performant default implementation.
Having an encoder object makes sense. Defining a reusable concept for this encoder object makes sense. Defining a single class template for the encoder object does not make sense, because the set of encoder objects I might want to use is open. -- Rainer Deyke (rainerd@eldwood.com)
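Something along these lines, perhaps (C++20 sketch, names invented):

    #include <concepts>
    #include <string>
    #include <vector>

    // A reusable concept: anything with matching encode/decode members
    // qualifies, so the set of encoder types stays open.
    template <class E>
    concept Encoder = requires(const E& e,
                               const std::vector<unsigned char>& bytes,
                               const std::string& text) {
        { e.encode(bytes) } -> std::convertible_to<std::string>;
        { e.decode(text) } -> std::convertible_to<std::vector<unsigned char>>;
    };

    // Generic code constrains on the concept rather than on one class template.
    template <Encoder E>
    std::string to_text(const E& enc, const std::vector<unsigned char>& payload) {
        return enc.encode(payload);
    }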
FWIW the boost serialization library has included a module for handling this for over 20 years.
I've been looking at the implementation and it wouldn't fit my use case (parsing P-H-C strings as per https://github.com/P-H-C/phc-string-format/blob/master/phc-sf-spec.md) because:

* It's hardwired to use exceptions. My use case is a parser for a data format with several base64 fields. It's not exceptional for them to be malformed, so I'd rather use error codes.
* The format I'm parsing mandates to not output padding characters.

I don't see any configuration option in the documentation to disable these. How does Serialization compare in performance to other libraries when doing base64?
On 3/19/24 20:46, Ruben Perez via Boost wrote:
I don't have any sample code yet. Following our recommendations, I'm only determining interest. My questions are: * Do you think a library like this could be useful to the C++ community?
I think, a BaseN library (with N being at least 16 and 64) with high performance (i.e. with SIMD support) and configurability would be useful. The particular points of configuration and capabilities I'm interested in:

- Character set. For Base16 - upper or lower-case letters. For Base64 - normal or URL-safe[1] character set.
- For Base64, whether to include trailing padding on encoding.
- For decoding, support error indication via exception or an error code.
- Support output into an externally provided buffer. This also implies that the library must provide means to estimate the size of that buffer for a given input. Support for output in unallocated buffer (e.g. via std::back_inserter) is also welcome, but not a strong requirement.
- Zero allocation, context-less mode. I.e. a function taking inputs and outputs and doing the whole job in one go.

I have my own implementation in my project, and having a well-tested Boost library with these capabilities, I think, would be useful.
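For reference, the buffer size estimation mentioned above boils down to simple arithmetic; a sketch (with and without padding):

    #include <cstddef>

    // 3 input bytes -> 4 output chars; with padding the result is always a
    // multiple of 4, without padding it is ceil(4*n/3).
    constexpr std::size_t base64_encoded_size(std::size_t n, bool padding = true) {
        return padding ? 4 * ((n + 2) / 3) : (4 * n + 2) / 3;
    }

    constexpr std::size_t base16_encoded_size(std::size_t n) {
        return 2 * n;  // one byte -> two hex characters
    }

    static_assert(base64_encoded_size(3) == 4);
    static_assert(base64_encoded_size(5) == 8);         // padded to a multiple of 4
    static_assert(base64_encoded_size(5, false) == 7);  // 40 bits -> 7 chars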
* If the answer to the above is yes, do you think it could belong to Boost, or would be better as a standalone library?
I would be interested in a Boost library, not a standalone version. There are plenty of implementations out there (for example, in OpenSSL), so the proposed library will need to have a comparison, including performance, with popular alternatives in the docs.

[1]: https://datatracker.ietf.org/doc/html/rfc4648#section-5
I think, a BaseN library (with N being at least 16 and 64) with high performance (i.e. with SIMD support) and configurability would be useful. The particular points of configuration and capabilities I'm interested in:
Looks like the two more widespread bases. Seems reasonable.
- Character set. For Base16 - upper or lower-case letters. For Base64 - normal or URL-safe[1] character set.
Definitely.
- For Base64, whether to include trailing padding on encoding.
This is actually one of the requirements that motivated me to think of this project.
- For decoding, support error indication via exception or an error code.
And this is the other one. I might go with system::result, which combines the best of both worlds.
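Something like this, roughly (the decode signature is hypothetical):

    #include <boost/system/result.hpp>
    #include <string_view>
    #include <vector>

    // Hypothetical decode returning boost::system::result: a value on success,
    // an error_code on malformed input, and .value() still throws for callers
    // who prefer exceptions.
    boost::system::result<std::vector<unsigned char>>
    decode_base64(std::string_view input);

    void parse_field(std::string_view field) {
        auto r = decode_base64(field);
        if (!r.has_value()) {
            // handle r.error() without any exception being thrown
            return;
        }
        const std::vector<unsigned char>& bytes = *r;
        (void)bytes;
    }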
- Support output into an externally provided buffer. This also implies that the library must provide means to estimate the size of that buffer for a given input. Support for output in unallocated buffer (e.g. via std::back_inserter) is also welcome, but not a strong requirement.
Yes, that's mandatory. I'm not convinced of the std::back_inserter one, though.
- Zero allocation, context-less mode. I.e. a function taking inputs and outputs and doing the whole job in one go.
Seems reasonable, too.
I would be interested in a Boost library, not a standalone version.
There are plenty implementations out there (for example, in OpenSSL), so the proposed library will need to have a comparison, including performance, with popular alternatives in the docs.
I attempted to use the OpenSSL implementation and found it quite frustrating. Regards, Ruben.
On 19.03.24 18:46, Ruben Perez via Boost wrote:
I don't have any sample code yet. Following our recommendations, I'm only determining interest. My questions are: * Do you think a library like this could be useful to the C++ community?
Yes. Base-64 encoding is a problem that comes up occasionally, and while it's fairly easy to write one's own implementation, having a well-designed and well-tested implementation available would save some time and effort.
* If the answer to the above is yes, do you think it could belong to Boost, or would be better as a standalone library?
As a part of Boost, I would probably use it. As a standalone library, I probably wouldn't bother. Each additional standalone library I use is an additional risk, and the task of base-64 coding is not difficult enough to justify that risk. -- Rainer Deyke (rainerd@eldwood.com)
I've been thinking of writing a base64 encoder/decoder library for some time.
Are you aware of the SIMD base64 codec in this library? https://github.com/simdutf/simdutf?tab=readme-ov-file#base64 -- "The Direct3D Graphics Pipeline" free book http://tinyurl.com/d3d-pipeline The Terminals Wiki http://terminals-wiki.org The Computer Graphics Museum http://computergraphicsmuseum.org Legalize Adulthood! (my blog) http://legalizeadulthood.wordpress.com
participants (13)

- Andrey Semashev
- Christopher Kormanyos
- Dominique Devienne
- Jakob Lövhall
- Marshall Clow
- Misha Komarov
- Rainer Deyke
- René Ferdinand Rivera Morell
- Richard
- Richard Hodges
- Robert Ramey
- Ruben Perez
- Vinnie Falco