Request for comments on a new format library


Roberto Hinz

21 Feb 2018

2:24 p.m.

Dear Boost community,

Some time ago I announced a format library I'm working on. At that time it was probably still too immature for anyone to judge whether it looked promising, though a few very useful observations were made (one regarding i18n and another regarding code bloat) that made me change many things. The library has evolved since then, and I think it will be ready for review sometime in the second half of this year, if no big change happens.

This is a C++14 text format library with some unique features:

- designed to be UTF-8 friendly
- easy to extend to new destination types (such as a different string type)
- able to align (justify) a set of sub-arguments as one (see joins <https://robhz786.github.io/stringify/doc/html/special_input_types/special_input_types.html>)
- it enables the use of translation tools (like gettext), yet the formatting is not specified by a format string. The problem with format strings is that they lead to run-time errors (instead of compilation errors) when they are incorrect.

Hence I would much appreciate any comments, especially if you can identify a flaw I have not noticed.

repo: https://github.com/robhz786/stringify
doc: https://robhz786.github.io/stringify/doc/html/

Best regards,
Roberto


Frédéric

22 Feb

9:46 p.m.

Hi Roberto,

After a long time using boost::format, I searched for an efficient formatting library and ended up using fmtlib, which I found extremely quick and easy to use. From your benchmark, fmt is almost always the quickest solution. I also very much like the fact that I can provide some kind of printf format string, as this is what is most suitable for translations. What would be the advantage of using your library instead?

Side question: I am impressed by the very bad timing (x2) of the tests on Windows compared to Linux. Are they the same kind of machines/processors? If yes, why such a difference?

Regards,
F

Roberto Hinz

11:01 p.m.

Hi Frédéric

On Thu, Feb 22, 2018 at 6:46 PM, Frédéric <ufospoke@gmail.com> wrote:

...

Hi Roberto,

After a long time using boost::format, I searched for an efficient formatting library and ended up using fmtlib, which I found extremely quick and easy to use. From your benchmark, fmt is almost always the quickest solution. I also very much like the fact that I can provide some kind of printf format string, as this is what is most suitable for translations. What would be the advantage of using your library instead?

I think the advantages are subtle:

- You can extend it to write into your own output types (for example, if you use some string type other than std::string).
- Compile errors instead of runtime errors: fmt throws an exception if there is something wrong in the format string. Since stringify uses a composition of format functions <https://robhz786.github.io/stringify/doc/html/format_functions0/format_functions.html>, you get compilation errors instead.
- In order to customize numeric punctuation, fmt forces you to change the current locale. I find this bad because it means modifying global state. Also, I presume that fmt delegates the job to std::ostream in this case, which I suspect must reduce the performance considerably.
- stringify decouples formatting from translation, for instance:

namespace strf = boost::stringify::v0;

auto str = strf::make_string [ gettext("your login is: {0}\n your access code is: {1}") ] &= { strf::right(login, 40) , strf::hex(code) > 40 };

The message to be translated does not contain formatting. Hence:

- There is less chance that the translation team (which is usually not composed of programmers) makes a mistake.
- You can change the formatting without asking the translators to update the translated strings.
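The decoupling described above can be illustrated without the library. The following is a minimal sketch in standard C++14, not stringify's implementation: `translate`, `right`, `hex`, and `assemble` are hypothetical helpers invented for this example. The translatable message contains only positional placeholders, while width and base are chosen in code.

```cpp
#include <cstddef>
#include <initializer_list>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for gettext(): a real program would look the
// message up in a translation catalog.
inline const char* translate(const char* msgid) { return msgid; }

// Right-align text in a field of the given width.
inline std::string right(const std::string& text, int width) {
    std::ostringstream out;
    out << std::setw(width) << std::right << text;
    return out.str();
}

// Format an unsigned integer in hexadecimal.
inline std::string hex(unsigned value) {
    std::ostringstream out;
    out << std::hex << value;
    return out.str();
}

// Replace each "{N}" (N a single digit) in the template with args[N].
// Malformed or out-of-range placeholders are copied through unchanged.
inline std::string assemble(const std::string& tmpl,
                            std::initializer_list<std::string> args) {
    const std::vector<std::string> values(args);
    std::string result;
    for (std::size_t i = 0; i < tmpl.size(); ++i) {
        const bool is_placeholder =
            tmpl[i] == '{' && i + 2 < tmpl.size() &&
            tmpl[i + 1] >= '0' && tmpl[i + 1] <= '9' && tmpl[i + 2] == '}';
        const std::size_t index =
            is_placeholder ? static_cast<std::size_t>(tmpl[i + 1] - '0') : 0;
        if (is_placeholder && index < values.size()) {
            result += values[index];
            i += 2;  // skip the digit and the closing brace
        } else {
            result += tmpl[i];
        }
    }
    return result;
}
```

A call such as `assemble(translate("login: {0}, code: {1}"), {right("bob", 5), hex(255)})` keeps the field width and the hexadecimal base out of the translated string, so a translator can reorder `{0}` and `{1}` without touching any formatting.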

...

Side question: I am impressed by the very bad timing (x2) of the tests on Windows compared to linux. Are they the same kind of machines/processors? If yes, why such a difference?

It's the same machine (an Intel NUC6i5SYH). I'm impressed too. Maybe there is some inaccuracy in my benchmarks that depends on the operating system, but it can't be just that. Also, in order to copy strings, stringify internally uses std::char_traits::copy. I think the gcc implementation uses some parallelism there, while the msvc one does not.

Best regards,
robhz

Frédéric

25 Feb

7:21 a.m.

...

- In order to customize numeric punctuation, fmt forces you to change the current locale. I find this bad because it means modifying global state. Also, I presume that fmt delegates the job to std::ostream in this case, which I suspect must reduce the performance considerably.

Yes, this is an issue I have. They use vsnprintf, if I remember correctly, and there is no way to pass the locale as an argument, which is a big issue in multithreaded programs.
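For comparison, the standard library does allow numeric punctuation to be customized per stream, without touching the global locale. The sketch below shows that alternative using only standard facilities; it is not how fmt or stringify works, and `dot_grouping` and `format_grouped` are names made up for this example. It goes through std::ostream, so it does not address the performance concern raised above.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Punctuation facet: '.' groups thousands, ',' marks the decimal point.
class dot_grouping : public std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return '.'; }
    char do_decimal_point() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Format a number using a locale that belongs to this stream only.
// std::locale::global() is never called, so other threads are unaffected.
template <typename Number>
std::string format_grouped(Number value) {
    std::ostringstream out;
    // The new locale takes ownership of the facet and deletes it later.
    out.imbue(std::locale(std::locale::classic(), new dot_grouping));
    out << value;
    return out.str();
}
```

Here `format_grouped(1234567)` yields `1.234.567` while `std::locale()` still compares equal to the classic locale afterwards.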

...

- stringify decouples formatting from translation, for instance:

namespace strf = boost::stringify::v0;

auto str = strf::make_string [ gettext("your login is: {0}\n your access code is: {1}") ] &= { strf::right(login, 40) , strf::hex(code) > 40 };

I like this idea. This keeps the possibility to have easy translations and allows for compile time checks.

...

The message to be translated does not contain formatting. Hence:

- There is less chance that the translation team (which is usually not composed of programmers) makes a mistake.
- You can change the formatting without asking the translators to update the translated strings.

good point. Thanks, F

Gavin Lambert

23 Feb

12:49 a.m.

On 22/02/2018 03:24, Roberto Hinz wrote:

...

Some time ago I announced a format library I'm working on. At that time it was probably still too immature for anyone to judge whether it looked promising, though a few very useful observations were made (one regarding i18n and another regarding code bloat) that made me change many things.

https://robhz786.github.io/stringify/doc/html/

Do you assume that "char" is UTF-8 or is it encoding-agnostic? I suspect the former given use of char32_t elsewhere. While this might be typical on Posix it is not universal, and on Windows it is rarely the case.

https://robhz786.github.io/stringify/doc/html/general_syntax/syntax.html

= vs. &= to denote exception throwing or not seems a bit opaque. Have you considered overloads taking nothrow_t or error_code& instead?

https://robhz786.github.io/stringify/doc/html/assembly_string/the_assembly_s... :

{/ as an escape is highly peculiar. \{ or {{ would be more typical.

"destinated" => "intended"

https://robhz786.github.io/stringify/doc/html/facets/facets.html

"octadecimal" != "octal". The latter is correct.

Also, while I've never particularly been a fan of the STL formatting facets, it does bother me a bit that you're trying to reinvent the wheel here.

https://robhz786.github.io/stringify/doc/html/encoding_facets/facets_for_enc...

I guess this confirms the "char == UTF-8" assumption. Admittedly the STL seems to have poor support in this area too.

Regarding the BOOST_STRINGIFY_DONT_ASSUME_WCHAR_ENCODING macro definition, you should try to avoid having these break ABI in different compilation units -- have it select in the headers which of two unique implementations in different namespaces is used, such that both can coexist in different translation units without conflict. The compiled library, if there is one, would always provide both.
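The last suggestion, having the macro select between two coexisting implementations, might look roughly like the sketch below. This is only one way to read the advice; `mylib`, `MYLIB_DONT_ASSUME_WCHAR_ENCODING`, the namespace names, and `wchar_policy` are all hypothetical and are not taken from the library.

```cpp
#include <string>

namespace mylib {

// Both variants are always declared, each in its own namespace, so their
// symbols never collide even when translation units disagree on the macro.
namespace wchar_assumed {
inline std::string wchar_policy() { return "assume wchar_t encoding"; }
}  // namespace wchar_assumed

namespace wchar_unassumed {
inline std::string wchar_policy() { return "do not assume wchar_t encoding"; }
}  // namespace wchar_unassumed

// The macro only chooses which variant the plain mylib:: names refer to
// in the current translation unit.
#if defined(MYLIB_DONT_ASSUME_WCHAR_ENCODING)
using namespace wchar_unassumed;
#else
using namespace wchar_assumed;
#endif

}  // namespace mylib
```

Because each variant has its own mangled names, a translation unit compiled with the macro and one compiled without it can be linked together without an ODR conflict, and either variant remains reachable by its fully qualified name.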

Roberto Hinz

7:14 a.m.

On Thu, Feb 22, 2018 at 9:49 PM, Gavin Lambert via Boost <boost@lists.boost.org> wrote:

...

On 22/02/2018 03:24, Roberto Hinz wrote:

...
Some time ago I announced a format library I'm working on. At that time it was probably still too immature for anyone to judge whether it looked promising, though a few very useful observations were made (one regarding i18n and another regarding code bloat) that made me change many things.

https://robhz786.github.io/stringify/doc/html/

Do you assume that "char" is UTF-8 or is it encoding-agnostic? I suspect the former given use of char32_t elsewhere. While this might be typical on Posix it is not universal, and on Windows it is rarely the case.

By default it is UTF-8, but this is customizable through facets. However, if one wants another encoding, one needs to implement one's own encoder and decoder, as explained at:

https://robhz786.github.io/stringify/doc/html/encoding_facets/facets_for_encoding_convertion.html#encoding_facets.facets_for_encoding_convertion.how_to_implement_your_own_decode
https://robhz786.github.io/stringify/doc/html/encoding_facets/facets_for_encoding_convertion.html#encoding_facets.facets_for_encoding_convertion.how_to_implement_your_own_encode

...

https://robhz786.github.io/stringify/doc/html/general_syntax/syntax.html

= vs. &= to denote exception throwing or not seems a bit opaque. Have you considered overloads taking nothrow_t or error_code& instead?

Perhaps we could do something like:

auto s = make_string .error_code("blah blah {} blah {}") = {arg1, arg2};

...

https://robhz786.github.io/stringify/doc/html/assembly_string/the_assembly_string.html :

{/ as an escape is highly peculiar. \{ or {{ would be more typical.

I'm against \{ because using an escape character before the '{' instead of after it would force the parser to be less efficient, and also because the user would actually need to type "\\{".

My only reservation about {{ is that it may wrongly suggest that there must be an enclosing }}, especially to users coming from fmtlib, since this is how it is done there. And I really don't want to require an enclosing }}, because the user may want to print a { without an enclosing }.

Let's see what others think.
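The efficiency argument can be made concrete with a small scanner. The sketch below is written for this discussion and is not the library's parser; `expand` is a hypothetical function that handles only `{}` placeholders and the `{/` escape. The point is that the scanner searches for a single special character, '{', and only then looks at what follows, whereas a leading backslash escape would add a second character to search for.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>

// Expand "{}" placeholders in order; "{/" produces a literal '{'.
// A '}' outside a placeholder is ordinary text and needs no escaping.
inline std::string expand(const std::string& tmpl,
                          std::initializer_list<std::string> args) {
    std::string result;
    auto next_arg = args.begin();
    std::size_t pos = 0;
    while (pos < tmpl.size()) {
        // Fast path: copy everything up to the next '{' in one step.
        const std::size_t brace = tmpl.find('{', pos);
        if (brace == std::string::npos) {
            result.append(tmpl, pos, std::string::npos);
            break;
        }
        result.append(tmpl, pos, brace - pos);
        if (brace + 1 < tmpl.size() && tmpl[brace + 1] == '/') {
            result += '{';  // "{/" is the escape for a literal brace
            pos = brace + 2;
            continue;
        }
        // Otherwise it is a placeholder: skip to its closing '}'.
        const std::size_t close = tmpl.find('}', brace);
        if (next_arg != args.end()) {
            result += *next_arg++;
        }
        pos = (close == std::string::npos) ? tmpl.size() : close + 1;
    }
    return result;
}
```

With this scheme `expand("{/} {} and {}", {"a", "b"})` yields `{} a and b`: the escaped brace needs no closing counterpart.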

...

"destinated" => "intended"

https://robhz786.github.io/stringify/doc/html/facets/facets.html

"octadecimal" != "octal". The latter is correct.

I really appreciate these little pieces of advice. Thanks. Being a non-native English speaker, I'm always afraid of making some linguistic gaffe. By the way, which is better: "format library" or "formatting library"? "format functions" or "formatting functions"?

...

Also, while I've never particularly been a fan of the STL formatting facets, it does bother me a bit that you're trying to reinvent the wheel here.

I can't avoid this. The std facets just don't combine well with this library. Facets of boost.stringify are designed to be used with the ftuple class template, while the facets from std are designed to be used with std::locale. And we can't use std::locale here, because we need a really fast getter, since it is called several times for each input argument (get_facet of ftuple is very fast, while std::use_facet is not). (Btw, I'm thinking about renaming the ftuple class template, perhaps to facets_pack or facets_bundle.)

Another reason is that each facet category is designed to work well specifically in this library. Using, for instance, std::codecvt instead of the decode and encode facets would be too cumbersome and inefficient. It is too oriented toward std streams.

There are also some differences in design philosophy. For example, note that the numpunct facet is based on char32_t, so that the same facet object works for different encodings. The std::numpunct, on the other hand, is a class template parametrized on CharT.

And anyway, creating a format library is already reinventing the wheel. I also needed to reinvent the formatting functions (hex, oct, right, left, etc.).
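To show why a tuple-like container can offer a fast getter, here is a minimal sketch of a facet pack whose lookup is resolved by type at compile time. It is an illustration only and not the library's ftuple; `facets_pack`, `make_facets_pack`, `digit_grouping`, and `decimal_mark` are hypothetical names, and the facets use char32_t in the spirit of the encoding-independent numpunct described above.

```cpp
#include <tuple>

// Two hypothetical facet types; the names are illustrative only.
struct digit_grouping { char32_t separator; int group_size; };
struct decimal_mark { char32_t symbol; };

// A pack of facets stored by value. Lookup by type goes through
// std::get<Facet>, which is resolved at compile time, so fetching a
// facet involves no run-time search.
template <typename... Facets>
class facets_pack {
public:
    constexpr explicit facets_pack(Facets... facets) : facets_(facets...) {}

    template <typename Facet>
    constexpr const Facet& get() const { return std::get<Facet>(facets_); }

private:
    std::tuple<Facets...> facets_;
};

// Deduce the facet types from the arguments.
template <typename... Facets>
constexpr facets_pack<Facets...> make_facets_pack(Facets... facets) {
    return facets_pack<Facets...>(facets...);
}
```

Given `const auto pack = make_facets_pack(digit_grouping{U'.', 3}, decimal_mark{U','});`, the call `pack.get<decimal_mark>()` compiles down to a plain member access, and asking for a facet type that is not in the pack is a compile-time error.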

...

Regarding the BOOST_STRINGIFY_DONT_ASSUME_WCHAR_ENCODING macro definition, you should try to avoid having these break ABI in different compilation units -- have it select in the headers which of two unique implementations in different namespaces is used, such that both can coexist in different translation units without conflict. The compiled library, if there is one, would always provide both.

I need to think about that. I will come back with an answer later.

I can see you really spent some time reading the documentation, and I much appreciate your effort. Thanks and best regards.

Robz

Frédéric

28 Feb

8:04 a.m.

Hi,

Why did you choose this syntax:

strf::write_to(output) [assembly_string] = {name, age};

and not

strf::write_to(output, assembly_string, {name, age});

or

strf::write_to(output, assembly_string, name, age);

F

Roberto Hinz

3:02 p.m.

On Wed, Feb 28, 2018 at 5:04 AM, Frédéric <ufospoke@gmail.com> wrote:

...

Hi,

Why did you choose this syntax:

strf::write_to(output) [assembly_string] = {name, age};

and not

strf::write_to(output, assembly_string, {name, age});

Because of the optional functions like .with(facets), .reserve(size), etc. (and I will possibly add some others). Although it might be possible to specify everything as arguments of a single function call, like this:

strf::write_to ( output , strf::no_reserve() , strf::make_ftuple(facets) , assembly_string , {args...} );

that would require a function overload for each possible combination. That would complicate things for me, and would especially complicate things for the user when he/she extends the library to a new output type. The way it is now, the user only needs to write one simple function:

// see section "How to add support to a new destination type"
auto write_to(user_own_output_type& output)
{
    return boost::stringify::make_args_handler
        < user_own_output_type_writer
        , user_own_output_type& >
        ( output );
}

So the user only needs to provide what I call a new "leading expression". He/she doesn't need to implement all the optional syntax variations, since these are already provided by the return value of boost::stringify::make_args_handler.

However, I don't see any problem with:

write_to(output) /*optional funcs here*/ (assembly_string, {args});

And actually I think it looks better, though I find the syntax with ={args} easier to indent when breaking it into multiple lines. But it is also possible to provide both options, so that the user can choose whichever he/she prefers. I think I'll do that!
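The design argument, that one handler type implements the optional steps once and each destination only supplies a small leading function, can be sketched as follows. This is a simplified illustration and not the library's make_args_handler: `args_handler` here is a hypothetical class that writes only to std::string, accepts only string arguments, and understands only `{}` placeholders.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical stand-in for the object a leading expression returns.
// It implements the optional steps and the argument syntax once.
class args_handler {
public:
    explicit args_handler(std::string& out) : out_(&out) {}

    // Optional step: pre-allocate the destination.
    args_handler& reserve(std::size_t size) {
        out_->reserve(size);
        return *this;
    }

    // Select the template (the "assembly string").
    args_handler& operator[](const char* tmpl) {
        tmpl_ = tmpl;
        return *this;
    }

    // Final step: substitute each "{}" with the next argument.
    void operator=(std::initializer_list<std::string> args) {
        auto next = args.begin();
        std::size_t pos = 0;
        for (;;) {
            const std::size_t brace = tmpl_.find("{}", pos);
            const bool last = (brace == std::string::npos);
            out_->append(tmpl_, pos, last ? std::string::npos : brace - pos);
            if (last) break;
            if (next != args.end()) out_->append(*next++);
            pos = brace + 2;
        }
    }

private:
    std::string* out_;
    std::string tmpl_;
};

// The only function a new destination type has to provide.
inline args_handler write_to(std::string& out) { return args_handler(out); }
```

Usage mirrors the syntax under discussion: `write_to(out).reserve(64)["{} is {} years old"] = {"Ann", "30"};` appends `Ann is 30 years old` to `out`. Adding another optional step means adding one member function to the handler, not a new set of overloads per destination.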

...

or strf::write_to(output, assembly_string, name, age);

The arguments have to go into an std::initializer_list. A variadic template would make it impossible to pass some special input types (like joins). Though I think it could be possible to use "Fixed sized parameter packs": http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4072.html But unfortunately I think this will never become part of C++.

Greetings,
robhz
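One plausible reading of the point about joins is that a nested braced list has no type of its own, so a variadic template cannot deduce it, whereas a list with a fixed element type can interpret it. The sketch below illustrates that under stated assumptions: `arg` and `concat` are hypothetical, and the brace-based join shape `{width, {parts...}}` is invented for this example rather than taken from the library.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical argument type. Every element of the argument list converts
// to it, including a "join": a width plus a nested list aligned as a unit.
struct arg {
    std::string text;

    arg(const char* s) : text(s) {}
    arg(int value) : text(std::to_string(value)) {}

    // Join: concatenate the parts, then right-align the whole in `width`.
    arg(int width, std::initializer_list<arg> parts) {
        std::string joined;
        for (const arg& part : parts) joined += part.text;
        const std::size_t w = width > 0 ? static_cast<std::size_t>(width) : 0;
        if (joined.size() < w) text.assign(w - joined.size(), ' ');
        text += joined;
    }
};

// Because the element type is fixed, the compiler knows how to interpret
// each nested brace. A signature such as
//     template <typename... Args> std::string concat(Args&&... args);
// could not deduce a type for a braced argument like {6, {"a", 1}}.
inline std::string concat(std::initializer_list<arg> args) {
    std::string result;
    for (const arg& a : args) result += a.text;
    return result;
}
```

For example, `concat({"id=", 7, {6, {"a", 1}}})` treats the inner braces as one argument and right-aligns `a1` in a field of width 6.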
