[program_options] Proposal: self-contained, header-only port of Boost Program Options library

Vicram Rajagopalan

12 Sep 2019

3:02 a.m.

I am interested in creating a header-only implementation of the Boost Program Options library that only depends on the C++ standard library. Program Options uses several other Boost libraries, so I would have to re-implement some of it using standard library constructs.

I have 2 questions for the community:
1. Would you use something like this if it were available?
2. Do you know of any implementation details of Program Options which might make some part of this difficult or impossible?

To be clear, I do not intend for this to be merged into Boost in any form.

Rationale: There is no portable command-line argument-parsing capability in the C++ standard library. There's getopt, but that's in unistd.h which is only available on Unix-based systems. The only widely-used C++ command-line parsing library I am aware of is Program Options, but that requires adding a dependency on Boost to your project, which seems like overkill to me. I would like to be able to simply add a project as a submodule in my Git repo and #include it without even having to add anything to my build files. The goal is to ensure that the library is as portable and easy to include as possible, because it shouldn't be difficult to parse command-line options.

I appreciate any thoughts, comments, or criticisms!

-Vicram Rajagopalan

degski

12 Sep

5:21 a.m.

On Thu, 12 Sep 2019 at 08:04, Vicram Rajagopalan via Boost < boost@lists.boost.org> wrote:

...

I would like to be able to simply add a project as a submodule in my Git repo and #include it without even having to add anything to my build files.

Why not try the recently proposed https://bfgroup.github.io/Lyra/ , released under the Boost Software License. Lyra is forked (from dormant Clara) and maintained by Rene Rivera, who is also a contributor to Boost (and Conan I believe).

degski
--
@realdegski
https://brave.com/google-gdpr-workaround/
"We value your privacy, click here!" Sod off! - degski
"Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding
"Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey

Peter Dimov

2:30 p.m.

degski wrote:

...

Why not try the recently proposed https://bfgroup.github.io/Lyra/ , released under the Boost Software License. Lyra is forked (from dormant Clara) and maintained by Rene Rivera, who is also a contributor to Boost (and Conan I believe).

Lyra looks pretty good.

Zach Laine

4 p.m.

On Thu, Sep 12, 2019 at 7:30 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

degski wrote:

...
Why not try the recently proposed https://bfgroup.github.io/Lyra/ , released under the Boost Software License. Lyra is forked (from dormant Clara) and maintained by Rene Rivera, who is also a contributor to Boost (and Conan I believe).

Lyra looks pretty good.

I agree with a lot of the points raised above about the problematic nature of Boost.ProgramOptions. I also think Lyra looks interesting.

If you're interested in solving problems in this space, rather than doing a straight port, here are some things I would find very helpful, not all of which Lyra provides:

- An options-specifying API similar to Python's argparse library ( https://docs.python.org/2/library/argparse.html). That covers all the permutations I've ever needed, and then some.

- The ability to serialize the options, so that I can easily use "response files" (files containing command line options or some serialized form of them), and/or hand-editable config files. I find YAML to be an attractive format for saving such things. YMMV.

Zach

Vicram Rajagopalan

17 Sep

5:02 a.m.

On Thu, Sep 12, 2019 at 11:01 AM Zach Laine via Boost <boost@lists.boost.org> wrote:

...

I agree with a lot of the points raised above about the problematic nature of Boost.ProgramOptions. I also think Lyra looks interesting.

Regarding Lyra, I took a look at the Github repo a few weeks ago, but as far as I could tell, it hasn't gained much traction. That was just my impression from the low number of stars, watches, issues, and pull requests. Are there any particular reasons that y'all recommend Lyra in particular? Granted, development/maintenance does seem to be active, which is a good sign in my book.

...

If you're interested in solving problems in this space, rather than doing a straight port, here are some things I would find very helpful, not all of which Lyra provides:

I suppose what I'd really like to see is a de-facto standard; right now, it doesn't seem that one exists. Given that Boost.ProgramOptions is not a particularly good example to follow, the best use of my time may be to contribute to a healthy project. cxxopts is one that caught my eye, as it seems more well-known than most other similar projects. Does anyone have any impressions of cxxopts (or others)?

...

- The ability to serialize the options, so that I can easily use "response files" (files containing command line options or some serialized form of them), and/or hand-editable config files.

Judging from the documentation, the Gflags library (Google's command-line flags library) supports something like this, which they call a "flagfile": https://gflags.github.io/gflags/ I've never used Gflags so I can't speak to whether it's any good.

-Vicram Rajagopalan

Zach Laine

4:05 p.m.

On Tue, Sep 17, 2019 at 12:02 AM Vicram Rajagopalan via Boost < boost@lists.boost.org> wrote:

...

On Thu, Sep 12, 2019 at 11:01 AM Zach Laine via Boost <boost@lists.boost.org> wrote:

...
I agree with a lot of the points raised above about the problematic nature of Boost.ProgramOptions. I also think Lyra looks interesting.

Regarding Lyra, I took a look at the Github repo a few weeks ago, but as far as I could tell, it hasn't gained much traction. That was just my impression from the low number of stars, watches, issues, and pull requests. Are there any particular reasons that y'all recommend Lyra in particular?

I only just heard about it in this thread.

...

Granted, development/maintenance does seem to be active, which is a good sign in my book.

...
If you're interested in solving problems in this space, rather than doing a straight port, here are some things I would find very helpful, not all of which Lyra provides:

I suppose what I'd really like to see is a de-facto standard; right now, it doesn't seem that one exists. Given that Boost.ProgramOptions is not a particularly good example to follow, the best use of my time may be to contribute to a healthy project. cxxopts is one that caught my eye, as it seems more well-known than most other similar projects. Does anyone have any impressions of cxxopts (or others)?

IMO, what made libfmt (which became C++20's std::format) a success is that it took an existing and popular API for string formatting (from Python) and implemented it efficiently. If you were to do the same thing with Python's argparse, I think the result would be similar.

I say this because all the libraries above, and probably others besides, are each taking a particular point of view, API-wise, that has not necessarily caught on. Perhaps one of them will, I don't know. I *do* know that the argparse API has been stable for years, and covers every scenario for handling command line arguments that I have seen.

Zach
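To make the argparse comparison concrete, here is a rough sketch of what a heavily reduced argparse-flavoured interface could look like in standard C++. The class name `ArgumentParser` and the members `add_argument`, `add_flag`, `parse_args`, `get`, `flag` and `positionals` are hypothetical names loosely modelled on the Python module; they do not come from any existing C++ library, and the sketch covers only a tiny fraction of what argparse offers.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily reduced argparse-style parser.
class ArgumentParser {
public:
    // Register a "--name value" option with a default value.
    void add_argument(const std::string& name, const std::string& default_value = "") {
        values_[name] = default_value;
    }

    // Register a boolean "--name" flag that defaults to off.
    void add_flag(const std::string& name) { flags_[name] = false; }

    // Parse arguments (without the program name). Unknown "--" options
    // and options missing their value throw std::invalid_argument.
    void parse_args(const std::vector<std::string>& args) {
        for (std::size_t i = 0; i < args.size(); ++i) {
            const std::string& a = args[i];
            if (flags_.count(a)) {
                flags_[a] = true;
            } else if (values_.count(a)) {
                if (i + 1 >= args.size())
                    throw std::invalid_argument("missing value for " + a);
                values_[a] = args[i + 1];
                ++i;  // skip the value we just consumed
            } else if (a.rfind("--", 0) == 0) {
                throw std::invalid_argument("unknown option " + a);
            } else {
                positionals_.push_back(a);
            }
        }
    }

    const std::string& get(const std::string& name) const { return values_.at(name); }
    bool flag(const std::string& name) const { return flags_.at(name); }
    const std::vector<std::string>& positionals() const { return positionals_; }

private:
    std::map<std::string, std::string> values_;
    std::map<std::string, bool> flags_;
    std::vector<std::string> positionals_;
};
```

Typical use would be to register options up front, call `parse_args` once with `argv[1]` onward, and then query results by name. Typed values, help text, and subcommands, which argparse provides, are deliberately left out.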

Rainer Deyke

12 Sep

7:57 a.m.

On 12.09.19 05:02, Vicram Rajagopalan via Boost wrote:

...

1. Would you use something like this if it were available?

I would not use it because I do not use Boost Program Options, and I do not expect a straight port to solve the problems I have with Boost Program Options. These problems are:

1. Unicode support is based on wchar_t instead of utf8. wchar_t has an implementation-defined width which makes it unsuitable for portable Unicode code. The correct way to handle Unicode in general is to use narrow strings encoded as utf-8. The correct way to handle Unicode on Unix systems is to accept narrow strings and to assume that they are in utf-8, regardless of locale. The correct way to handle Unicode on Windows is to accept wide strings and convert them to utf-8 immediately when received. I could, of course, perform my own conversion to utf-8 and pass the result to Boost Program Options, but that approach seems brittle given that Boost Program Options assumes that 8-bit strings are in the "local 8-bit encoding".

2. I have found that code that uses Boost Program Options is neither easier to write nor more maintainable than code which parses command line options manually.

-- Rainer Deyke (rainerd@eldwood.com)

Vicram Rajagopalan

17 Sep

4:10 a.m.

On Thu, Sep 12, 2019 at 2:57 AM Rainer Deyke via Boost <boost@lists.boost.org> wrote:

...

I would not use it because I do not use Boost Program Options, and I do not expect a straight port to solve the problems I have with Boost Program Options.

My initial thinking was that to increase likelihood of adoption, it would be a good idea to provide an interface that people are familiar with, but it's true that the API has a lot of issues. This proposal may be a non-starter, which is fine; I'm glad to get constructive feedback.

...

1. Unicode support is based on wchar_t instead of utf8. wchar_t has an implementation-defined width which makes it unsuitable for portable Unicode code. The correct way to handle Unicode in general is to use narrow strings encoded as utf-8. The correct way to handle Unicode on Unix systems is to accept narrow strings and to assume that they are in utf-8, regardless of locale. The correct way to handle Unicode on Windows is to accept wide strings and convert them to utf-8 immediately when received.

I'm not too familiar with dealing with non-ASCII character encodings in argv. Is it portable to assume that the input is UTF-8, regardless of locale?

-Vicram Rajagopalan

Gavin Lambert

6:32 a.m.

On 17/09/2019 16:10, Vicram Rajagopalan wrote:

...

I'm not too familiar with dealing with non-ASCII character encodings in argv. Is it portable to assume that the input is UTF-8, regardless of locale?

It is not.

I'm probably ignorant of several things in this area myself, but the basic version is:

* On Windows, argv is converted to the current system codepage unless you are using the wmain/wWinMain entrypoints to get wchar_t strings instead. (And you should never ever use the converted values, as they will only sometimes work, due to being a lossy conversion.) It will never be UTF-8, but you can rely on it being UTF-16 (when using wmain/wWinMain).

* On Unixes, argv contains whatever byte sequence the shell/caller put there. This might be the actual filename on disk (if they used tab completion) or it might be something subtly different (if they typed it themselves using some kind of IME), or even a binary blob. In the first two cases, while it is fairly *likely* to be UTF-8 (especially in modern systems), it is not guaranteed to be -- the user could be running a non-UTF-8 locale, or be accessing a filesystem created by someone who was. Ideally, treat them as an opaque blob that can only be passed to open() etc and never manipulated as text. (Obviously, this is frequently impractical.)

So, on Windows, you must use the wchar_t as input, and while you *could* convert this to UTF-8 for internal use you still have to convert it back to UTF-16 to actually make use of it with the OS. Which is fine if you're doing a lot of string manipulation (including option parsing) but seems a bit wasteful if you're only using it as an opaque filename token. (And if you forget to convert back to UTF-16, it may interpret your UTF-8 string as a local-codepage-ANSI string, and hilarity ensues.)

Whereas on Linux you can often get away with assuming that it's UTF-8, but some valid filenames will break encoder-savvy code, and any string conversions might output a no-longer-valid filename.
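One way to act on the "opaque blob" advice on Unix-like systems is to check whether an argument happens to be well-formed UTF-8 before manipulating it as text, and to pass it through untouched otherwise. The following is a minimal sketch of such a check; the function name `is_valid_utf8` is hypothetical and the policy of what to do with invalid input is left to the caller.

```cpp
#include <cstddef>
#include <string>

// Return true if `s` is well-formed UTF-8. Rejects stray continuation
// bytes, truncated sequences, overlong encodings, surrogate code points
// and values above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                     // invalid lead byte
        if (i + len > s.size()) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // out of range
        i += len;
    }
    return true;
}
```

An argument that fails this check can still be a perfectly valid filename, so a program would typically keep the original bytes for calls such as open() and only skip text-level processing.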

Jonathan Coe

7:25 a.m.

abseil flags is good and well supported https://abseil.io/docs/cpp/guides/flags J

...

On 17 Sep 2019, at 07:32, Gavin Lambert via Boost <boost@lists.boost.org> wrote:

...
On 17/09/2019 16:10, Vicram Rajagopalan wrote:
I'm not too familiar with dealing with non-ASCII character encodings in argv. Is it portable to assume that the input is UTF-8, regardless of locale?

It is not.

I'm probably ignorant of several things in this area myself, but the basic version is:

* On Windows, argv is converted to the current system codepage unless you are using the wmain/wWinMain entrypoints to get wchar_t strings instead. (And you should never ever use the converted values, as they will only sometimes work, due to being a lossy conversion.) It will never be UTF-8, but you can rely on it being UTF-16 (when using wmain/wWinMain).

* On Unixes, argv contains whatever byte sequence the shell/caller put there. This might be the actual filename on disk (if they used tab completion) or it might be something subtly different (if they typed it themselves using some kind of IME), or even a binary blob. In the first two cases, while it is fairly *likely* to be UTF-8 (especially in modern systems), it is not guaranteed to be -- the user could be running a non-UTF-8 locale, or be accessing a filesystem created by someone who was. Ideally, treat them as an opaque blob that can only be passed to open() etc and never manipulated as text. (Obviously, this is frequently impractical.)

So, on Windows, you must use the wchar_t as input, and while you *could* convert this to UTF-8 for internal use you still have to convert it back to UTF-16 to actually make use of it with the OS. Which is fine if you're doing a lot of string manipulation (including option parsing) but seems a bit wasteful if you're only using it as an opaque filename token. (And if you forget to convert back to UTF-16, it may interpret your UTF-8 string as a local-codepage-ANSI string, and hilarity ensues.)

Whereas on Linux you can often get away with assuming that it's UTF-8, but some valid filenames will break encoder-savvy code, and any string conversions might output a no-longer-valid filename.


Rainer Deyke

11:27 a.m.

On 17.09.19 08:32, Gavin Lambert via Boost wrote:

...

* On Unixes, argv contains whatever byte sequence the shell/caller put there. This might be the actual filename on disk (if they used tab completion) or it might be something subtly different (if they typed it themselves using some kind of IME), or even a binary blob. In the first two cases, while it is fairly *likely* to be UTF-8 (especially in modern systems), it is not guaranteed to be -- the user could be running a non-UTF-8 locale, or be accessing a filesystem created by someone who was.

Or the user could be running a non-UTF-8 locale, but accessing a filesystem created by somebody who was using UTF-8 - in which case any filenames should be in UTF-8, even if the user's locale disagrees.

It is because of this last possibility that I recommend treating all command-line arguments as UTF-8 on Unix systems, even if running a non-UTF-8 locale, for all cases where treating them as binary blobs is impractical. Unix filenames are binary blobs, but the de-facto standard for interpreting these binary blobs as text is to use UTF-8.

How can two users, running two different locales, share a filesystem? By using UTF-8 for all filenames, regardless of locale.

How should a program convert command-line arguments into UTF-8 filenames? By assuming that they are already in UTF-8, because performing any kind of conversion will cause more problems than it will fix.

-- Rainer Deyke (rainerd@eldwood.com)

Peter Dimov

1:16 p.m.

Rainer Deyke wrote:

...

Or the user could be running a non-UTF-8 locale, but accessing a filesystem created by somebody who was using UTF-8 - in which case any filenames should be in UTF-8, even if the user's locale disagrees.

It is because of this last possibility that I recommend treating all command-line arguments as UTF-8 on Unix systems, even if running a non-UTF-8 locale, for all cases where treating them as binary blobs is impractical. Unix filenames are binary blobs, but the de-facto standard for interpreting these binary blobs as text is to use UTF-8. [...]

How does any of this affect the library? It just gives you whatever you passed as `argv`, without needing to interpret it. Windows is a different story.

Zach Laine

4:14 p.m.

On Tue, Sep 17, 2019 at 8:17 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Rainer Deyke wrote:

...
Or the user could be running a non-UTF-8 locale, but accessing a filesystem created by somebody who was using UTF-8 - in which case any filenames should be in UTF-8, even if the user's locale disagrees.

It is because of this last possibility that I recommend treating all command-line arguments as UTF-8 on Unix systems, even if running a non-UTF-8 locale, for all cases where treating them as binary blobs is impractical. Unix filenames are binary blobs, but the de-facto standard for interpreting these binary blobs as text is to use UTF-8. [...]

How does any of this affect the library? It just gives you whatever you passed as `argv`, without needing to interpret it.

Windows is a different story.

Indeed, you can just use UTF-8 (as long as you document this!) for everything except Windows. With Windows, you need to provide a wchar_t/UTF-16 overload for every char/UTF-8 overload in your lib.

If you want 100% correctness, you are not allowed to arbitrarily convert the wchar_t strings. In particular, you are not allowed to convert them to UTF-8, because it is possible that one of them is a filename, and it is possible to construct filenames on the Windows platform that are not properly UTF-16-encoded. This means that the UTF-16 -> UTF-8 conversion is lossy, if you follow the Unicode guidelines for that conversion -- you should produce a replacement character (U+FFFD) where you encounter the broken UTF-16.

Though such broken-UTF-16-named files are possible to create, they do not come up often in practice; they almost never do. So, if you don't care about this case that prevents 100% correctness, just provide wchar_t overloads, and implement each one by converting to UTF-8 and calling your UTF-8 overload, and only define the wchar_t overloads when building on Windows.

Zach
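The lossy conversion described here can be sketched as follows. To keep the example portable it takes a `std::u16string` rather than a `std::wstring`; on Windows, where wchar_t is 16 bits wide, the same loop would apply to wchar_t input. The names `append_utf8` and `utf16_to_utf8` are hypothetical, and this is an illustration of the replacement-character behaviour, not code from any library in the thread.

```cpp
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of code point `cp` to `out`.
static void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 to UTF-8. Unpaired surrogates become U+FFFD, so the
// conversion is lossy for ill-formed input and cannot be reversed.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t u = in[i];
        const bool is_high = u >= 0xD800 && u <= 0xDBFF;
        const bool is_surrogate = u >= 0xD800 && u <= 0xDFFF;
        if (is_high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            const char32_t lo = in[i + 1];
            ++i;  // consume the low surrogate as well
            append_utf8(out, 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
        } else if (is_surrogate) {
            append_utf8(out, 0xFFFD);  // lone surrogate: replacement character
        } else {
            append_utf8(out, u);
        }
    }
    return out;
}
```

Two different ill-formed inputs can map to the same output here, which is exactly why a filename containing a lone surrogate cannot be recovered after the round trip.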

Andrey Semashev

12 Sep

8:25 a.m.

On 2019-09-12 06:02, Vicram Rajagopalan via Boost wrote:

...

I am interested in creating a header-only implementation of the Boost Program Options library that only depends on the C++ standard library. Program Options uses several other Boost libraries, so I would have to re-implement some of it using standard library constructs.

I have 2 questions for the community: 1. Would you use something like this if it were available?

No.

...

2. Do you know of any implementation details of Program Options which might make some part of this difficult or impossible?

Even if you require C++11, there is a considerable amount of Boost used in ProgramOptions: https://pdimov.github.io/boostdep-report/develop/program_options.html It is not impossible to reimplement all that or redesign the library to not require some of the components, but that would be a considerable amount of work.

...

To be clear, I do not intend for this to be merged into Boost in any form.

Rationale: There is no portable command-line argument-parsing capability in the C++ standard library. There's getopt, but that's in unistd.h which is only available on Unix-based systems. The only widely-used C++ command-line parsing library I am aware of is Program Options, but that requires adding a dependency on Boost to your project, which seems like overkill to me. I would like to be able to simply add a project as a submodule in my Git repo and #include it without even having to add anything to my build files. The goal is to ensure that the library is as portable and easy to include as possible, because it shouldn't be difficult to parse command-line options.

I appreciate any thoughts, comments, or criticisms!

Boost is almost an implicit dependency of any of my projects, I find myself using it extensively, so the dependency on it is not a problem. Adding yet another dependency might be problematic, especially given that there is Boost.ProgramOptions already.

I understand there probably are projects that need nothing but Boost.ProgramOptions, where a standalone version might be useful. However, I do not believe reimplementing well-known components, like boost::any or boost::function or type traits for example, is a good approach. As I said, you can mitigate some of this by raising the minimum C++ version you require, but I don't believe raising it to e.g. C++17 would ease the library adoption.

Another point is that I'm not quite happy with the API Boost.ProgramOptions provides. If there is a new library, I would probably prefer a simpler API, possibly employing C++11 features, rather than a straight reimplementation. The new library should offer something new compared to the existing solutions.
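For a sense of what raising the minimum standard buys, here is a small sketch that uses std::any (C++17) and std::function (C++11) in the roles where a port might otherwise need boost::any and boost::function. The names `value_map`, `get_as` and `notifier` are hypothetical and are not the Program Options API; the sketch needs a C++17 compiler.

```cpp
#include <any>
#include <functional>
#include <map>
#include <string>

// Hypothetical parsed-options store: option name -> value of any type.
using value_map = std::map<std::string, std::any>;

// Fetch a stored value as T. Throws std::out_of_range if the option is
// missing and std::bad_any_cast if it holds a different type.
template <class T>
const T& get_as(const value_map& values, const std::string& name) {
    return std::any_cast<const T&>(values.at(name));
}

// Hypothetical callback type invoked with an option's stored value.
using notifier = std::function<void(const std::any&)>;
```

With these aliases, `values["jobs"] = 4;` followed by `get_as<int>(values, "jobs")` yields 4, while asking for the wrong type raises std::bad_any_cast instead of silently converting.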

Mike

1:35 p.m.

...

Sent: Thursday, 12 September 2019 at 05:02 From: "Vicram Rajagopalan via Boost" <boost@lists.boost.org>

I am interested in creating a header-only implementation of the Boost Program Options library that only depends on the C++ standard library. Program Options uses several other Boost libraries, so I would have to re-implement some of it using standard library constructs.

I have 2 questions for the community: 1. Would you use something like this if it were available?

Probably not. If boost is a dependency anyway, I'd probably stick to the boost version and if I don't use boost, I'm perfectly happy with one of the alternatives (cxxopts, clara - to name two). A modernized version of Boost.ProgramOptions that is part of boost would probably be more appealing to me.

...

2. Do you know of any implementation details of Program Options which might make some part of this difficult or impossible?

Depends on what C++ standard you are targeting. I tried this myself some time ago (nothing production ready, just a quick POC). I can't remember the details, but I think I used a lot of C++17 features, so if you target a lower standard you probably have to internalize a lot of other Boost facilities.

Best
Mike
