Re: [boost] Interest in Unicode library for Boost?

23 Sep 2018

On Sun, Sep 23, 2018 at 4:57 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:
...
On 9/23/18 7:45 AM, Zach Laine via Boost wrote:
> I think a Unicode library is very much needed in Boost.
> Out of curiosity, it looks like you implemented Unicode algorithms
> yourself. Why not use a specialized library, like ICU?

It's partly a question of the size of ICU, which is several megabytes,
whereas Boost.Text is only 1.2-2MB depending on your compiler.

I built HEAD of ICU just now, and here are the resulting .so's:

-rwxrwxr-x 1 tzlaine tzlaine  26M Sep 23 10:29 ./lib/libicudata.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine 3.6M Sep 23 10:28 ./lib/libicui18n.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine  65K Sep 23 10:28 ./lib/libicuio.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine  66K Sep 23 10:28 ./lib/libiculx.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine 234K Sep 23 10:28 ./lib/libicutu.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine 2.2M Sep 23 10:28 ./lib/libicuuc.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine 5.3K Sep 23 10:28 ./stubdata/libicudata.so.62.1
-rwxrwxr-x 1 tzlaine tzlaine  83K Sep 23 10:28 ./tools/ctestfw/libicutest.so.62.1

So, I don't know how many of those you need, but if you require data (and
you do!), 26MB is a lot.  Note that I put collation data into headers, so
your runtime memory footprint might be much larger than 1.2-2MB, but the
minimum requirement is still only that small.  Requiring the user to pay
more than this minimum is a classic "Don't pay for what you don't use"
violation.

Another thing is that ICU allocates memory all over the place, in some
cases needlessly.

ICU also has, IMO, a poor (overly complicated and confusing) API; there are
way too many types and functions, and the types it emphasizes are often the
wrong ones, such as UTF-16 strings.  The algorithms should be C++-style
algorithms if this is something we're going to standardize.
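
To make "C++-style algorithm" concrete, here is a minimal sketch of a
UTF-8 decoder in the shape of a standard algorithm: it takes an iterator
pair and an output iterator, and allocates nothing itself.  The name
utf8_to_utf32 is hypothetical; it is not the Boost.Text or ICU interface.
Error handling is simplified to one U+FFFD per rejected sequence, which
is an assumption and not necessarily what either library does.

```cpp
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical illustration only; not an API of Boost.Text or ICU.
// Decodes UTF-8 from [first, last) and writes code points through out.
// The function never allocates; the caller chooses the destination.
template <typename InputIt, typename OutputIt>
OutputIt utf8_to_utf32(InputIt first, InputIt last, OutputIt out)
{
    constexpr std::uint32_t replacement = 0xFFFD;

    while (first != last) {
        std::uint32_t const lead = static_cast<unsigned char>(*first);
        ++first;

        std::uint32_t cp = 0;
        std::uint32_t min = 0;  // smallest value allowed for this length
        int trailing = 0;       // number of continuation bytes expected

        if (lead < 0x80) {
            *out++ = lead;      // ASCII fast path
            continue;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F; trailing = 1; min = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F; trailing = 2; min = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07; trailing = 3; min = 0x10000;
        } else {
            *out++ = replacement;  // stray continuation or invalid lead byte
            continue;
        }

        bool ok = true;
        for (int i = 0; i < trailing; ++i) {
            if (first == last) { ok = false; break; }  // truncated input
            std::uint32_t const c = static_cast<unsigned char>(*first);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
            ++first;
        }

        // Reject overlong forms, surrogates, and values past U+10FFFF.
        if (!ok || cp < min || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = replacement;
        }
        *out++ = cp;
    }
    return out;
}
```

Because the destination is just an output iterator, the caller decides
whether anything is allocated: pass std::back_inserter(some_vector) to
grow a container, or a pointer into a fixed-size buffer to avoid the heap
entirely.  The input can be a std::string, a char array, or any other
range of bytes.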

Zach