Delfin Rojas wrote:
It seems most people post here at night PST. I never thought my posting would generate so many interesting discussions.
Well.. night PST is evening GMT+3, which explains at least my postings ;-)
I have been taking a look at the library code, and certainly the only thing that would need to change is to use a preprocessor define to turn wide-character strings on or off, and to use TChar strings everywhere in the code. When the code is being compiled for POSIX systems, this Unicode define would be turned off. In the Windows-specific code, all the calls to the Windows API would need to change from "FunctionCallA" to "FunctionCall", since the Windows API also works with TChar internally.
Yes, that would work. But note that you might want to use wide string even on Linux -- so you get two versions, narrow and wide.
The caller could also use the TChar idea to have its code talk to the library seamlessly.
Yes, that's OK for an application, where the decision to use Unicode is global. But suppose you write another library which uses the first one: then it also must come in two variants. This is what bothers me: every library would need a Unicode and a non-Unicode variant, even though the differences could probably be hidden somewhere inside the implementation.
String constants can also be expressed in TChars (_T("my string") in Windows).
If I understand correctly, this expands to L"my string" -- i.e. a wide string constant -- when Unicode is on. Then I think a portable string->wstring conversion which respects the current locale is still needed.
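One way to sketch such a locale-respecting conversion is with the C library's mbstowcs, which honours the LC_CTYPE category of the current locale (the helper name to_wstring_locale is invented for this example; it is not a Boost or standard function):

```cpp
#include <clocale>
#include <cstdlib>
#include <stdexcept>
#include <string>

std::wstring to_wstring_locale(const std::string& s) {
    std::setlocale(LC_CTYPE, "");  // adopt the user's environment locale
    // First call with a null destination just measures the converted length.
    std::size_t n = std::mbstowcs(nullptr, s.c_str(), 0);
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");
    std::wstring w(n, L'\0');
    std::mbstowcs(&w[0], s.c_str(), n);  // second call does the conversion
    return w;
}
```

This is only a sketch: a production version would save and restore the locale rather than setting it on every call, and would report which byte failed to convert.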
As far as a library that can be passed both single char and double char strings it is also a possibility that would play along well with the scenario I just described. The library can perform a string_cast<TChar> always to make sure the string is converted to the string type being used by the library. If the library is compiled to use wide strings internally then string_cast<TChar> would convert char strings to wchar_t strings and wchar_t strings would remain unchanged. The contrary occurs when Unicode define is turned off.
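The string_cast<TChar> idea can be sketched as a single function template: when source and target character types already match it degenerates to a copy, otherwise it converts. The element-wise conversion below is deliberately naive (ASCII only) just to show the shape of the interface; a real implementation would do a proper locale/encoding-aware conversion:

```cpp
#include <string>

// Convert a basic_string of one character type to another.
// With To == From this is just a copy; otherwise each element is
// converted (naively, character by character -- illustration only).
template <class To, class From>
std::basic_string<To> string_cast(const std::basic_string<From>& s) {
    return std::basic_string<To>(s.begin(), s.end());
}
```

Inside the library, every incoming string would then pass through `string_cast<TChar>(input)`, so the rest of the code only ever sees the internally chosen character type.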
Yes, that's what I find right. The question is whether you ever need two version of the library. Supposing that conversions are optimized enough, or that the performance does not matter much (e.g. for boost::path access to files via OS might cost must more than any conversion), then you can have just one version of the compiled library. The users don't have to worry which one to obtain/install/link to.
However, I feel this interface is not the best, since it would allow the caller to mix single-char strings and double-char strings, which is not good practice in general. Converting strings back and forth is not fast, and conversions may not always produce what you expect, especially if you are a novice working with encodings.
This is where we disagree. For example, I want to support Unicode on Linux. All filesystem functions accept char*, so I *have* to do conversion. Another issue is that many other functions only return char*, so again I need conversions. Why can't they be done by boost::path? E.g.:

    boost::path p(L".......");
    p /= argv[1];

versus

    p /= to_wstring(argv[1]);

I don't really think the latter is better than the former.

- Volodya
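The point being argued -- that the path class can accept both narrow and wide strings and hide the conversion internally -- can be sketched like this (path_sketch is an illustration invented here, not the actual boost::path interface, and the widening shown is naive ASCII-only):

```cpp
#include <string>

class path_sketch {
    std::wstring p_;  // one internal representation, chosen by the library
public:
    path_sketch(const std::wstring& s) : p_(s) {}
    // Narrow input is accepted too; the conversion happens here,
    // inside the library, so the caller never writes to_wstring().
    path_sketch(const std::string& s) : p_(s.begin(), s.end()) {}
    path_sketch& operator/=(const std::string& s) {
        p_ += L'/';
        p_.append(s.begin(), s.end());
        return *this;
    }
    const std::wstring& native() const { return p_; }
};
```

With such overloads, `p /= argv[1];` works directly on a wide-internal path, which is exactly the ergonomics Volodya is arguing for.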
Somebody mentioned that Java doesn't have this problem. That is because all strings in Java are UTF-16 (comparable to wchar_t on Windows).
Let me know what you guys think of all this.
Thanks
-delfin
-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of David Abrahams
Sent: Thursday, July 01, 2004 9:46 AM
To: boost-users@lists.boost.org
Subject: [Boost-users] Re: Feature request for boost::filesystem
Vladimir Prus writes:
David Abrahams wrote:
1. Make the library interface templated.
2. Use narrow classes: e.g. string.
3. Use wide classes: e.g. wstring.
4. Have some class which works with ASCII and Unicode.
The first approach is bad for code size reasons.
It doesn't have to be. There can be a library object with explicit instantiations of the wide and narrow classes.
Which doubles the size of the shared library itself.
It depends; the narrow specialization might be implemented in terms of the wide one ;-)