On Sun, Sep 23, 2018 at 4:57 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:
On 9/23/18 7:45 AM, Zach Laine via Boost wrote:
I think a Unicode library is very much needed in Boost.
Out of curiosity, it looks like you implemented Unicode algorithms yourself. Why not use a specialized library, like ICU?
It's partly a question of the size of ICU, which is several megabytes, whereas Boost.Text is only 1.2-2MB depending on your compiler. I built HEAD of ICU just now, and here are the resulting .so's:

    -rwxrwxr-x 1 tzlaine tzlaine  26M Sep 23 10:29 ./lib/libicudata.so.62.1
    -rwxrwxr-x 1 tzlaine tzlaine 3.6M Sep 23 10:28 ./lib/libicui18n.so.62.1
    -rwxrwxr-x 1 tzlaine tzlaine  65K Sep 23 10:28 ./lib/libicuio.so.62.1
    -rwxrwxr-x 1 tzlaine tzlaine  66K Sep 23 10:28 ./lib/libiculx.so.62.1
    -rwxrwxr-x 1 tzlaine tzlaine 234K Sep 23 10:28 ./lib/libicutu.so.62.1
    -rwxrwxr-x 1 tzlaine tzlaine 2.2M Sep 23 10:28 ./lib/libicuuc.so.62.1
    -rwxrwxr-x 1 tzlaine tzlaine 5.3K Sep 23 10:28 ./stubdata/libicudata.so.62.1
    -rwxrwxr-x 1 tzlaine tzlaine  83K Sep 23 10:28 ./tools/ctestfw/libicutest.so.62.1

So, I don't know how many of those you need, but if you require data (and you do!), 26MB is a lot.

Note that I put collation data into headers, so your runtime memory footprint might be much larger than 1.2-2MB, but the minimum requirement is still only that small. Requiring the user to pay more than this minimum is a classic "Don't pay for what you don't use" violation.

Another thing is that ICU allocates memory all over the place, in some cases needlessly. ICU also has, IMO, a poor (too complicated and confusing) API; there are way too many types and functions, and the types that are emphasized are often the wrong ones, like UTF-16 strings. The algorithms should be C++-style algorithms if this is something we're going to standardize.

Zach
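To make the "C++-style algorithms" point concrete, here is a rough sketch of what such an algorithm might look like. It is illustrative only: the name `utf8_to_utf32` is hypothetical and is not Boost.Text's or ICU's actual API. The shape is the point. It takes an iterator range of UTF-8 code units, writes code points through an output iterator, performs no allocation of its own, and is not tied to any particular string type (UTF-16 or otherwise). Error handling is deliberately simplified: each ill-formed sequence becomes a single U+FFFD, which is not exactly the Unicode "maximal subpart" replacement practice.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Hypothetical STL-style transcoder (not Boost.Text's or ICU's API).
// Decodes UTF-8 code units in [first, last) and writes char32_t code points
// to `out`. The algorithm itself allocates nothing; the caller picks the
// destination (a vector via back_inserter, a fixed buffer, a stream, ...).
// Simplified error policy: each ill-formed sequence yields one U+FFFD.
template <typename ForwardIt, typename OutputIt>
OutputIt utf8_to_utf32(ForwardIt first, ForwardIt last, OutputIt out)
{
    constexpr char32_t replacement = 0xFFFD;
    while (first != last) {
        const auto lead = static_cast<unsigned char>(*first);
        ++first;

        int trailing = 0;     // number of continuation bytes expected
        char32_t cp = 0;      // code point being assembled
        char32_t min_cp = 0;  // smallest value allowed for this length (rejects overlongs)
        if (lead < 0x80) {
            cp = lead;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            cp = lead & 0x1F; trailing = 1; min_cp = 0x80;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            cp = lead & 0x0F; trailing = 2; min_cp = 0x800;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            cp = lead & 0x07; trailing = 3; min_cp = 0x10000;
        } else {
            *out++ = replacement;  // invalid lead byte or stray continuation byte
            continue;
        }

        bool ok = true;
        for (int i = 0; i < trailing; ++i) {
            if (first == last) { ok = false; break; }  // truncated at end of input
            const auto cont = static_cast<unsigned char>(*first);
            // Not a continuation byte: stop, and leave it for the next iteration.
            if ((cont & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (cont & 0x3F);
            ++first;
        }

        // Reject truncated sequences, overlong forms, surrogates, and values above U+10FFFF.
        if (!ok || cp < min_cp || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = replacement;
        *out++ = cp;
    }
    return out;
}

// Usage sketch:
//   std::string utf8 = ...;
//   std::vector<char32_t> cps;
//   utf8_to_utf32(utf8.begin(), utf8.end(), std::back_inserter(cps));
```

Because the output goes through an iterator, the same function works whether the caller wants a `std::vector<char32_t>`, a preallocated array, or a counting iterator, which is the kind of "don't pay for what you don't use" flexibility that a type-centric API built around a particular string class makes harder.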