Delfin Rojas wrote:
It seems most people post here at night PST. I never thought my posting would generate so many interesting discussions.
Well.. night PST is evening GMT+3, which explains at least my postings ;-)
I have been taking a look at the library code, and certainly the only thing that would need to change is to use a preprocessor define to turn wide-character strings on or off, and to use TChar strings everywhere in the code. When the code is being compiled for POSIX systems, this Unicode define would be turned off. In the Windows-specific code, all the calls to the Windows API would need to change from "FunctionCallA" to "FunctionCall", since the Windows API also works with TChar internally.
Yes, that would work. But note that you might want to use wide string even on Linux -- so you get two versions, narrow and wide.
The caller could also use the TChar idea to have its code talk to the library seamlessly.
Yes, that's OK for an application, where the decision to use Unicode is global. But suppose you write another library which uses the first one: then it also must come in two variants. This is what bothers me: every library would need a Unicode and a non-Unicode variant, even though the differences could probably be hidden somewhere inside the implementation.
String constants can also be expressed in TChars (_T("my string") in Windows).
If I understand correctly, this expands to L"my string" -- i.e. a wide string constant -- when Unicode is on. Then I think a portable string->wstring conversion which respects the current locale is still needed.
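One way to sketch such a locale-respecting conversion is with the C library's mbstowcs, which honours the LC_CTYPE category of the current locale (the helper name to_wstring_locale is invented for this example; it is not a Boost or standard function):

```cpp
#include <clocale>
#include <cstdlib>
#include <stdexcept>
#include <string>

std::wstring to_wstring_locale(const std::string& s) {
    std::setlocale(LC_CTYPE, "");  // adopt the user's environment locale
    // First call with a null destination just measures the converted length.
    std::size_t n = std::mbstowcs(nullptr, s.c_str(), 0);
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");
    std::wstring w(n, L'\0');
    std::mbstowcs(&w[0], s.c_str(), n);  // second call does the conversion
    return w;
}
```

This is only a sketch: a production version would save and restore the locale rather than setting it on every call, and would report which byte failed to convert.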
As far as a library that can be passed both single char and double char strings it is also a possibility that would play along well with the scenario I just described. The library can perform a string_cast<TChar> always to make sure the string is converted to the string type being used by the library. If the library is compiled to use wide strings internally then string_cast<TChar> would convert char strings to wchar_t strings and wchar_t strings would remain unchanged. The contrary occurs when Unicode define is turned off.
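The string_cast<TChar> idea can be sketched as a single function template: when source and target character types already match it degenerates to a copy, otherwise it converts. The element-wise conversion below is deliberately naive (ASCII only) just to show the shape of the interface; a real implementation would do a proper locale/encoding-aware conversion:

```cpp
#include <string>

// Convert a basic_string of one character type to another.
// With To == From this is just a copy; otherwise each element is
// converted (naively, character by character -- illustration only).
template <class To, class From>
std::basic_string<To> string_cast(const std::basic_string<From>& s) {
    return std::basic_string<To>(s.begin(), s.end());
}
```

Inside the library, every incoming string would then pass through `string_cast<TChar>(input)`, so the rest of the code only ever sees the internally chosen character type.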
Yes, that's what I find right. The question is whether you ever need two version of the library. Supposing that conversions are optimized enough, or that the performance does not matter much (e.g. for boost::path access to files via OS might cost must more than any conversion), then you can have just one version of the compiled library. The users don't have to worry which one to obtain/install/link to.
However, I feel this interface is not the best, since it would allow the caller to mix single-char strings and double-char strings, which is not good practice in general. Converting strings back and forth is not fast, and conversions may not always produce what you expect, especially if you are a novice working with encodings.
This is where we disagree. For example, I want to support Unicode on Linux. All filesystem functions accept char*, so I *have* to do conversion. Another issue is that many other functions only return char*, so again I need conversions. Why can't they be done by boost::path? E.g.:

    boost::path p(L".......");
    p /= argv[1];

versus

    p /= to_wstring(argv[1]);

I don't really think the latter is better than the former.

- Volodya
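The point being argued -- that the path class can accept both narrow and wide strings and hide the conversion internally -- can be sketched like this (path_sketch is an illustration invented here, not the actual boost::path interface, and the widening shown is naive ASCII-only):

```cpp
#include <string>

class path_sketch {
    std::wstring p_;  // one internal representation, chosen by the library
public:
    path_sketch(const std::wstring& s) : p_(s) {}
    // Narrow input is accepted too; the conversion happens here,
    // inside the library, so the caller never writes to_wstring().
    path_sketch(const std::string& s) : p_(s.begin(), s.end()) {}
    path_sketch& operator/=(const std::string& s) {
        p_ += L'/';
        p_.append(s.begin(), s.end());
        return *this;
    }
    const std::wstring& native() const { return p_; }
};
```

With such overloads, `p /= argv[1];` works directly on a wide-internal path, which is exactly the ergonomics Volodya is arguing for.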
Somebody mentioned that Java doesn't have this problem. That is because all strings in Java are UTF-16 (comparable to wchar_t on Windows).
Let me know what you guys think of all this.
Thanks
-delfin
-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of David Abrahams
Sent: Thursday, July 01, 2004 9:46 AM
To: boost-users@lists.boost.org
Subject: [Boost-users] Re: Feature request for boost::filesystem
Vladimir Prus writes:
David Abrahams wrote:
1. Make the library interface templated.
2. Use narrow classes: e.g. string.
3. Use wide classes: e.g. wstring.
4. Have some class which works with ASCII and Unicode.
The first approach is bad for code size reasons.
It doesn't have to be. There can be a library object with explicit instantiations of the wide and narrow classes.
Which doubles the size of the shared library itself.
It depends; the narrow specialization might be implemented in terms of the wide one ;-)