Boost or Standard when there is the choice?
Dear list,

I was wondering, what is the best practice when you can select between a standard solution and a Boost one? For example, std::tie or boost::tie? std::unordered_map or boost::unordered_map?

Personally I tend to use the std solutions, but does that make sense?

Yours sincerely,
Paolo Bolzoni
It depends what your objective is. If it is to require no extra library, then the standard is your only choice. If it is portability, then either will work, but keep in mind that although the standard is well defined, its implementation can change from one compiler to another, and bugs do exist. With Boost, you have a single provider, for whatever compiler. Keep in mind also that the std:: and boost:: versions do not necessarily offer exactly the same features.

My 2 cents.

Maxime Boissonneault

On 2014-02-11 14:23, Paolo Bolzoni wrote:
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
On 11 Feb 2014 at 20:23, Paolo Bolzoni wrote:
I was wondering, what is the best practice when you can select between a standard solution or a Boost one? For example, std::tie or boost::tie? std::unordered_map or boost::unordered_map?
Personally I tend to use std solutions, but does it make sense?
A standard library implementation is /probably/ more stable, better tested and more micro-optimised than Boost *eventually*. I stress the eventually - right now, C++11 standard libraries are still getting the bugs wrung out of them, but they are settling down, and in the past year I've certainly found bugs more frequently in Boost implementations of C++11 features than in, say, Dinkumware's STL.

Regarding the micro-optimisation remark, I make this purely based on my experience with Visual Studio, where the Dinkumware implementation almost always outperforms or matches a Boost implementation, especially in VS2013. This is partially because a Boost implementation usually has a lot more features and therefore a larger code footprint, but it's also because Dinkumware do a lot of micro-optimisation once code is known stable.

Lastly, using STL implementations is probably more future proof over, say, the next twenty years than using Boost. I can certainly see that once C++11 is ubiquitous, the incentive to maintain Boost implementations of standard C++11 features will drop substantially. Future users of your code base will then have to port code away from Boost to gain timely access to fixes.

All that said, I still choose Boost.Thread primitives over STL primitives, mainly due to the much superior feature set, and I know many of those superior features will end up in C++1y, so why wait till then? Note that I use the namespace mapping technique to map /some/ implementation into a private namespace, which makes switching to an STL implementation later fairly trivial.

Niall

--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/
From: s_sourceforge@nedprod.com To: boost-users@lists.boost.org Date: Tue, 11 Feb 2014 20:08:59 +0000 Subject: Re: [Boost-users] Boost or Standard when there is the choice?
A standard library implementation is /probably/ more stable, better tested and more micro-optimised than Boost *eventually*. I stress the eventually - right now, C++11 standard libraries are still getting the bugs wrung out of them, but they are settling down, and in the past year I've certainly found bugs more frequently in Boost implementations of C++11 features than in, say, Dinkumware's STL.
Regarding the micro optimisation remark, I make this purely based on my experience with Visual Studio where the Dinkumware implementation almost always outperforms or is similar to a Boost implementation, especially in VS2013. This is partially because a Boost implementation usually has a lot more features and therefore a larger code footprint, but it's also because Dinkumware do a lot of micro optimisation once code is known stable.
Since I'm pedantic, I think there's a phenomenon here that you're seeing but not ascribing properly. Dinkumware's initial implementation tends to not have much concern for performance at all, and Boost's solution tends to be more mature by the time it's standardized and gets implemented by Dinkumware, which usually results in Boost being faster.

The reason that Microsoft's STL ends up performing better than both Dinkumware's and Boost's implementations is because Microsoft cares about performance and implements most of the optimizations and because, like you said, Boost has more surface area. Microsoft currently cares and they know their compiler/platform better.

Note: the STL that Dinkumware ships is not the same as the one Microsoft ships, so perhaps I'm just being pedantic about naming.

As an example, Dinkumware's STL didn't have the make_shared optimization that Boost implemented until Microsoft rewrote make_shared to include it. That's a case where Boost did better until the standard provider improved their implementation. Granted, I don't think shared_ptr is complex enough that there is any difference between optimal implementations.

PS. This ignores optional parts of the standard like advanced math functions, which Boost provides but most standard library vendors will not, or if they do, it will only be to get the 'complete implementation' rubber stamp.
CC: Stephan @ Microsoft; for the reason, see the end of this email.

On 12 Feb 2014 at 9:30, Ahmed Charles wrote:
Regarding the micro optimisation remark, I make this purely based on my experience with Visual Studio where the Dinkumware implementation almost always outperforms or is similar to a Boost implementation, especially in VS2013. This is partially because a Boost implementation usually has a lot more features and therefore a larger code footprint, but it's also because Dinkumware do a lot of micro optimisation once code is known stable.
Since I'm pedantic, I think there's a phenomenon here that you're seeing but not ascribing properly. Dinkumware's initial implementation tends to not have much concern for performance at all and Boost's solution tends to be more mature by the time it's standardized and gets implemented by Dinkumware, which usually results in Boost being faster.
So far I agree.
The reason that Microsoft's STL ends up performing better than both Dinkumware's and Boost's implementations is because Microsoft cares about performance and implements most of the optimizations and because, like you said, Boost has more surface area. Microsoft currently cares and they know their compiler/platform better.
I agree that Microsoft do a ton of customisation of the Dinkumware STL - or rather, they /used/ to do a ton of customisation, up till VS2013. From my reading of the Dinkumware STL headers, most of the customisation was MSVC-specific workarounds rather than optimisations per se. When running a diff between VS2013's STL and the vanilla one we had at BlackBerry, I saw a large drop in the size of the patchset compared to VS2012. I didn't investigate, so in fairness the cause could be anything.
Note: the STL that Dinkumware ships is not the same as the one Microsoft ships, so perhaps I'm just being pedantic about naming.
It's pretty close. The BB10 NDK also uses the Dinkumware STL, and it was not unheard of for us to compare the two when a bug turned up to see our STL had been fixed without us realising.
As an example, Dinkumware's STL didn't have the make_shared optimization that Boost implemented until Microsoft rewrote make_shared to include it. That's a case where Boost did better until the standard provider improved their implementation. Granted, I don't think shared_ptr is complex enough that there is any difference between optimal implementations.
Historically you are right; however, Dinkumware's STL had a long head start on other STLs in implementing C++11 and indeed, to my knowledge, was the first STL to hit full C++11 standards compliance (libc++ wasn't portable until very recently, and I might be right in thinking it was slightly later than Dinkumware's).

But I'll tell you what, Ahmed: probably neither you nor I is as knowledgeable on this as Stephan @ Microsoft, so I've CC'd him to ask these simple questions: (i) how close is vanilla Dinkumware to the STL shipped in VS2013? (ii) would you, Stephan, say that the VS2013 STL is likely on average to perform better than a Boost implementation of some given C++11 feature, and if yes/no, why in your opinion - is it due to Dinkumware, Boost, or other factors?

Niall
[Niall]
When running a diff between VS2013's STL and the vanilla one we had at BlackBerry I saw a large drop in size of the patchset as compared to VS2012. I didn't investigate, so in fairness the cause could be anything.
Probably the elimination of faux variadics.

[Ahmed]
As an example, Dinkumware's STL didn't have the make_shared optimization that Boost implemented until Microsoft rewrote make_shared to include it.
This is incorrect (or at least phrased in an ambiguous way; I am objecting to the timing implications). I explained the story at GoingNative 2012, so I can explain it again here.

First, Boost invented shared_ptr and make_shared. Then Dinkumware implemented shared_ptr and make_shared, which Microsoft picked up (shared_ptr in 2008 SP1, make_shared in 2010). When I saw Dinkumware's implementation of make_shared, which achieved Boost/C++0x's recommended single allocation with a special deleter, I rewrote it (in faux variadics) to use a dedicated control block. While doing so, I paid attention to the layout and developed what I referred to as the "We Know Where You Live" optimization, squeezing out an unnecessary pointer. VC 2010 shipped with 5 control blocks (traditional, deleter, deleter/allocator, make_shared, allocate_shared), which were optimal (except that they did not attempt to compress the storage of provided but empty deleters/allocators; that's on my todo list). This optimization has been maintained in 2012 and 2013 (now true variadic).

At GN 2012, I observationally determined that both Boost and libstdc++ had not implemented We Know Where You Live, with significant impacts on total memory consumption, so I explained the optimization without code, as a gift to Boost (who gave the world shared_ptr/make_shared). I don't think anyone figured this out before I did, or if they did, they didn't ship it, which is what matters.

Since then, it has been explained to me that Boost paid the cost of an extra pointer because make_shared was implemented "non-intrusively", while libstdc++ accidentally paid an extra pointer (not noticing the control block redundancy). I don't know when Boost implemented the optimization, or whether libstdc++ has yet, but they are both aware, and I hope libc++ is too (I should ask Marshall/Howard).
(Note: make_shared didn't ship in 2008 SP1 as it needed rvalue references, and at no time - even prerelease - did VC have a make_shared that performed two allocations.)

[Ahmed]
That's a case where Boost did better until the standard provider improved their implementation.
It's the reverse. See http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/STL11-Magic-Sec... slide 7, comparing VC10 SP1, Boost 1.48.0, and GCC 4.6.1. Boost is difficult to beat in correctness and performance for the reason that you mentioned (Boost gets a head start, when their features are standardized), so please, give us credit for this one thing! :-> (And for contributing the optimization back. I didn't have to tell anyone about that.)

[Niall]
(i) how close is vanilla Dinkumware to the STL shipped in VS2013
I am unable to answer this question, as I have never seen Dinkumware's vanilla library. I can tell you that we work closely with Dinkumware, and that (in 2010 and above) what may appear to be Microsoft-specific edits, have actually been sent upstream to Dinkumware's master sources, which are capable of generating drops for Microsoft or other clients (whose existence, but not identity, I am aware of). Between 2005 and 2010 the sources were divergent, which caused difficulty in picking up new features (starting with TR1). Now we can rapidly acquire new features.
(ii) would you, Stephan, say that the VS2013 STL is likely to be on average better performance than a Boost implementation of some given C++ 11 feature,
It varies. Some regexes we do better, some we do worse (e.g. many alternations). make_shared, we did better, because (oddly enough) I had the luxury of (re)writing it from scratch, without any attempt to be "non-intrusive". Our shared_ptr on ARM is likely to be better than Boost's, since we go to the trouble of using weaker-than-SC atomics. Our <atomic> is probably better in 2013 since we've figured out how to use intrinsics for almost everything (and even better in the next major version), although maybe not if you've looked at our intrinsics carefully too. Our threading stuff, we've tried to make fast by powering our futures with ConcRT, but I also know where a few bodies are buried (e.g. call_once).

Correctness-wise is another story (e.g. I need to rewrite <functional>; Boost.Bind is more solid), although I will say that for the type traits, VC is likely to give more correct answers because I can request compiler hooks as I need them (and, more importantly, compiler fixes).

Hope this helps,
STL
On 13 Feb 2014 at 7:24, Stephan T. Lavavej wrote:
Since then, it has been explained to me that Boost paid the cost of an extra pointer as make_shared was implemented "non-intrusively", while libstdc++ accidentally paid an extra pointer (not noticing the control block redundancy). I don't know when Boost implemented the optimization, and whether libstdc++ has yet, but they are both aware, and I hope libc++ is too (I should ask Marshall/Howard).
Firstly, Stephan, this is a hugely useful post, and thank you for taking the time to write it. The subject of whether to use STL or Boost implementations of C++11 facilities is likely to come up again and again, and unlike most speculators, including myself, you actually know the answer.

I will mention with regard to the above that the non-intrusiveness of Boost implementations is - as you acknowledged - deliberate. This can make maintenance in the face of refactors easier to grok, something important for an open source project on which most work, especially in recent years, is done outside a for-pay context. By comparison, any of the major STLs is vastly better resourced than Boost, one of the big reasons Boost had to bring in a community maintenance programme.

As you know, there are some big enhancements to the C++ threading primitives planned to come soon to Boost, with the expectation that they provide a testing ground for C++1y directions, as some of the papers which have gone before WG21 recently have been ... perhaps insufficiently ambitious and overly ambitious simultaneously. Certainly I think everyone here can see that a metaprogrammed, reusable monadic framework implementing continuations is very desirable - that way Boost.Thread's facilities and Boost.AFIO's facilities, plus several other Boost libraries, can all reuse the same continuations metaprogramming rather than going it alone individually. We're hoping for a really great student in this year's GSoC to help drive forward that generic framework.
[Niall]
(i) how close is vanilla Dinkumware to the STL shipped in VS2013
I am unable to answer this question, as I have never seen Dinkumware's vanilla library.
I can tell you that we work closely with Dinkumware, and that (in 2010 and above) what may appear to be Microsoft-specific edits, have actually been sent upstream to Dinkumware's master sources, which are capable of generating drops for Microsoft or other clients (whose existence, but not identity, I am aware of). Between 2005 and 2010 the sources were divergent, which caused difficulty in picking up new features (starting with TR1). Now we can rapidly acquire new features.
I suspected exactly that, simply through comparing the BB10 NDK and VS2013. The NDK's STL seemed to have a lot of Microsoft-formatted code (it looks distinctive somehow; I can't quite explain how).
(ii) would you, Stephan, say that the VS2013 STL is likely to be on average better performance than a Boost implementation of some given C++ 11 feature,
It varies. Some regexes we do better, some we do worse (e.g. many alternations).
The <random> generator implementations on ARM in libstdc++ are a good example of pathologically awful performance.
"non-intrusive". Our shared_ptr on ARM is likely to be better than Boost, since we go to the trouble of using weaker-than-SC atomics. Our <atomic> is probably better on 2013 since we've figured out how to use intrinsics for almost everything (and even better in the next major version), although maybe not if you've looked at our intrinsics carefully too. Our threading stuff, we've tried to be fast by powering our futures with ConcRT, but I also know where a few bodies are buried (e.g. call_once).
One GSoC project proposal I added is to patch memory transaction support into Boost wherever it makes sense. We have an implementation (which I wrote); we just need someone to do the donkey work of the refactor. It still doesn't gain us the automatic use of cooperative scheduling via ConcRT, but it should help.
Correctness-wise is another story (e.g. I need to rewrite <functional>, Boost.Bind is more solid), although I will say that for the type traits, VC is likely to give more correct answers because I can request compiler hooks as I need them (and more importantly, compiler fixes).
I am surprised ... I had fewer issues getting std::bind() to work mostly as expected across all compilers than with Boost.Bind, which has a particular knack of breaking at least one compiler and none of the others (in particular, it ICEs VS2010 very frequently when used with C++11). In my particular use-case scenarios only, of course; others almost certainly have a better experience.

Thanks Stephan.

Niall
[Niall]
I will mention with regard to the above that the non-intrusiveness of Boost implementations is - as you acknowledged - done deliberately. This can make maintenance in the face of refactors easier to grok,
As a library maintainer, I can say that I appreciate maintainability, but performance for users is more important. I am willing to add a fair amount of complexity in the pursuit of performance (e.g. I recently added a bunch of code to is_permutation() to optimize various things). In the specific case of make_shared, the dedicated control block was about the same amount of code as the special deleter, and it ended up being more robust (in particular, wrt enable_shared_from_this).

That said, in Issaquah I learned (via Peter Dimov's proposal) that the shared_ptr casts can now be implemented non-intrusively via the aliasing constructors, which is obvious in hindsight, so I'll probably do that soon.

STL
----------------------------------------
From: stl@exchange.microsoft.com To: s_sourceforge@nedprod.com; boost-users@lists.boost.org Date: Thu, 13 Feb 2014 07:24:59 +0000 Subject: Re: [Boost-users] Boost or Standard when there is the choice?
[Niall]
When running a diff between VS2013's STL and the vanilla one we had at BlackBerry I saw a large drop in size of the patchset as compared to VS2012. I didn't investigate, so in fairness the cause could be anything.
Probably the elimination of faux variadics.
[Ahmed]
As an example, Dinkumware's STL didn't have the make_shared optimization that Boost implemented until Microsoft rewrote make_shared to include it.
This is incorrect (or at least phrased in an ambiguous way; I am objecting to the timing implications). I explained the story at GoingNative 2012, so I can explain it again here.
Sorry for being incorrect. I was attempting to relay the story below, but apparently misremembered or misheard it the first time. No offense intended. :)
First, Boost invented shared_ptr and make_shared. Then Dinkumware implemented shared_ptr and make_shared, which Microsoft picked up (shared_ptr in 2008 SP1, make_shared in 2010). When I saw Dinkumware's implementation of make_shared, which achieved Boost/C++0x's recommended single allocation with a special deleter, I rewrote it (in faux variadics) to use a dedicated control block. While doing so, I paid attention to the layout and developed what I referred to as the "We Know Where You Live" optimization, squeezing out an unnecessary pointer. VC 2010 shipped with 5 control blocks (traditional, deleter, deleter/allocator, make_shared, allocate_shared), which were optimal (except that they did not attempt to compress the storage of provided but empty deleters/allocators; on my todo list). This optimization has been maintained in 2012 and 2013 (now true variadic).
At GN 2012, I observationally determined that both Boost and libstdc++ had not implemented We Know Where You Live, with significant impacts on total memory consumption, so I explained the optimization without code, as a gift to Boost (who gave the world shared_ptr/make_shared). I don't think anyone figured this out before I did, or if they did, they didn't ship it, which is what matters.
Since then, it has been explained to me that Boost paid the cost of an extra pointer as make_shared was implemented "non-intrusively", while libstdc++ accidentally paid an extra pointer (not noticing the control block redundancy). I don't know when Boost implemented the optimization, and whether libstdc++ has yet, but they are both aware, and I hope libc++ is too (I should ask Marshall/Howard).
(Note: make_shared didn't ship in 2008 SP1 as it needed rvalue references, and at no time - even prerelease - did VC have a make_shared that performed two allocations.)
[Ahmed]
That's a case where Boost did better until the standard provider improved their implementation.
It's the reverse. See http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/STL11-Magic-Sec... slide 7, comparing VC10 SP1, Boost 1.48.0, and GCC 4.6.1.
Thanks for the link. Unfortunately, I didn't go back to the presentation before sending the mail. :/
----------------------------------------
From: s_sourceforge@nedprod.com To: boost-users@lists.boost.org Date: Wed, 12 Feb 2014 19:50:47 +0000 CC: stl@microsoft.com Subject: Re: [Boost-users] Boost or Standard when there is the choice?
The reason that Microsoft's STL ends up performing better than both Dinkumware's and Boost's implementations is because Microsoft cares about performance and implements most of the optimizations and because, like you said, Boost has more surface area. Microsoft currently cares and they know their compiler/platform better.
I agree that Microsoft do a ton of customisation of the Dinkumware STL, or rather, they /used/ to do a ton of customisation up till VS2013. From my reading of the Dinkumware STL headers, most of the customisation was MSVC specific workarounds rather than optimisations per se. When running a diff between VS2013's STL and the vanilla one we had at BlackBerry I saw a large drop in size of the patchset as compared to VS2012. I didn't investigate, so in fairness the cause could be anything.
My point here, though apparently I didn't get it across well at all, was to give credit to Microsoft and STL for caring about performance and pushing to make it better. My example of make_shared was wrong, but I think the pattern still holds that Boost starts with a 'better' implementation and the standard implementers catch up and sometimes 'beat' Boost in some respect.
Note: the STL that Dinkumware ships is not the same as the one Microsoft ships, so perhaps I'm just being pedantic about naming.
It's pretty close. The BB10 NDK also uses the Dinkumware STL, and it was not unheard of for us to compare the two when a bug turned up to see our STL had been fixed without us realising.
Correct, they are likely to be very close because Microsoft gives fixes back, which for portable fixes, means everyone benefits.
my experience with Visual Studio where the Dinkumware implementation almost always outperforms or is similar to a Boost implementation
I had one big bad experience with std::unordered_map in the past; see the topic 'Performance difference between std::unordered_map and boost::unordered_map' (https://groups.google.com/forum/#!topic/comp.lang.c++.moderated/pFBjDiW6mW0). Also a factor is that bugs get fixed more readily in Boost, while with VS you mostly have to wait for a new version (if it is fixed at all).
On 15 February 2014 21:11, gast128 wrote:
my experience with Visual Studio where the Dinkumware implementation almost always outperforms or is similar to a Boost implementation
I had one big bad experience with std::unordered_map in the past, see topic 'Performance difference between std::unordered_map and boost::unordered_map' (https://groups.google.com/forum/#!topic/comp.lang.c++.moderated/pFBjDiW6mW0).
According to Joaquin's recent benchmarks the recent standard implementations are generally faster than the boost implementation, which IMO is how it should be. I'm looking forward to Boost.Unordered's obsolescence. http://bannalia.blogspot.co.uk/
On 16 Feb 2014 at 13:04, Daniel James wrote:
On 15 February 2014 21:11, gast128 wrote:
my experience with Visual Studio where the Dinkumware implementation almost always outperforms or is similar to a Boost implementation
I had one big bad experience with std::unordered_map in the past, see topic 'Performance difference between std::unordered_map and boost::unordered_map' (https://groups.google.com/forum/#!topic/comp.lang.c++.moderated/pFBjDiW6mW0).
According to Joaquin's recent benchmarks the recent standard implementations are generally faster than the boost implementation, which IMO is how it should be. I'm looking forward to Boost.Unordered's obsolescence.
It seemed to me that the blog says libstdc++ is both much faster and much slower than Boost depending on the test, with a huge variance (sometimes 4x). libc++ is always slower, sometimes much slower. Dinkumware is usually faster than Boost, but not always.

Besides, unordered_map<>, while important, is not the be-all and end-all of STL quality of implementation. A 4x performance difference one can usually live with; a 400x difference less so.

Also, performance isn't everything; memory footprint is also very important. It's unfortunate that the blog didn't show differences in unordered_map<> memory consumption. I'd warrant that Dinkumware's unordered_map<> is very, very good on memory footprint. For the other STLs, I wouldn't like to guess.

Niall
Niall Douglas wrote:
On 16 Feb 2014 at 13:04, Daniel James wrote:
According to Joaquin's recent benchmarks the recent standard implementations are generally faster than the boost implementation, which IMO is how it should be. I'm looking forward to Boost.Unordered's obsolescence.
[...]
Also, performance isn't everything, memory footprint is also very important. It's unfortunate that blog didn't show differences in unordered_map<> memory consumption. I'd warrant that Dinkumware's unordered_map<> is very, very good on memory footprint. For the other STLs, I wouldn't like to guess.
No practical measurement was taken, but one can guess from the layout of the different data structures as described in:

http://bannalia.blogspot.com/2013/10/implementation-of-c-unordered.html
http://bannalia.blogspot.com/2013/10/implementation-of-c-unordered_25.html

The overhead (memory minus sizeof(value_type)), in words, for a container of N elements and B buckets is:

* Dinkum: 2B + 2N
* libstdc++: 1B + 2N (1B + 1N if the hash is fast and does not throw)
* libc++: 1B + 2N
* Boost.Unordered:
  - unique elements: 1B + 2N
  - duplicate elements: 1B + 3N
* Boost.MultiIndex: 1B + 2N

So, in fact, Dinkumware and Boost.Unordered-with-duplicates are the ones that use the most memory.

Joaquín M López Muñoz
Telefónica Digital
On 16 Feb 2014 at 17:34, Joaquin M Lopez Munoz wrote:
Also, performance isn't everything, memory footprint is also very important. It's unfortunate that blog didn't show differences in unordered_map<> memory consumption. I'd warrant that Dinkumware's unordered_map<> is very, very good on memory footprint. For the other STLs, I wouldn't like to guess.
The overhead (memory minus sizeof(value_type)) in words for a container of N elements and B buckets is:

* Dinkum: 2B + 2N
* libstdc++: 1B + 2N (1B + 1N if the hash is fast and does not throw)
* libc++: 1B + 2N
* Boost.Unordered
  - unique elements: 1B + 2N
  - duplicate elements: 1B + 3N
* Boost.MultiIndex: 1B + 2N
So, in fact Dinkumware and Boost.Unordered-duplicate are the ones that use the most memory.
I have to admit I am surprised that Dinkumware would be an outlier here. But I'm no expert; I was simply judging from fairly fuzzy indirect benchmarking. Thanks for the detail, Joaquín; it was very informative. Niall
Daniel James
On 15 February 2014 21:11, gast128
wrote: my experience with Visual Studio where the Dinkumware implementation almost always outperforms or is similar to a Boost implementation
I had one big bad experience with std::unordered_map in the past[...]
According to Joaquin's recent benchmarks the recent standard implementations are generally faster than the boost implementation, which IMO is how it should be.
I think this is not the story the benchmarks tell. 15 different scenarios were tested:
* Insertion without rehashing, unique elements
* Insertion with rehashing, unique elements
* Insertion without rehashing, duplicate elements
* Insertion without rehashing, duplicate elements, high load factor
* Insertion with rehashing, duplicate elements
* Insertion with rehashing, duplicate elements, high load factor
* Erasure, unique elements
* Erasure, duplicate elements
* Erasure, duplicate elements, high load factor
* Successful lookup, unique elements
* Unsuccessful lookup, unique elements
* Successful lookup, duplicate elements
* Unsuccessful lookup, duplicate elements
* Successful lookup, duplicate elements, high load factor
* Unsuccessful lookup, duplicate elements, high load factor
and results do not show any definite winner (< means "better than"):
* VS 2012
http://bannalia.blogspot.com/2014/01/a-better-hash-table.html
- insert_norehash_unique: BMI < Dinkum < BU
- insert_rehash_unique: BMI < Dinkum < BU
- insert_norehash_duplicate: Dinkum < BMI < BU
- insert_norehash_duplicate_highlf: Dinkum < BMI < BU
- insert_rehash_duplicate: Dinkum < BMI ~ BU
- insert_rehash_duplicate_highlf: Dinkum ~ BMI ~ BU
- erase_unique: BMI < Dinkum < BU
- erase_duplicate: Dinkum < BMI < BU
- erase_duplicate_highlf: Dinkum < BMI < BU
- ok_lookup_unique: BMI < Dinkum < BU
- nok_lookup_unique: Dinkum < BMI < BU
- ok_lookup_duplicate: BMI < Dinkum < BU
- nok_lookup_duplicate: BMI < BU < Dinkum
- ok_lookup_duplicate_highlf: BMI < BU < Dinkum
- nok_lookup_duplicate_highlf: BU ~ BMI < Dinkum
* GCC 4.8.2
http://bannalia.blogspot.com/2014/01/a-better-hash-table-gcc.html
- insert_norehash_unique: BMI < BU < libstdc++
- insert_rehash_unique: BMI < BU < libstdc++
- insert_norehash_duplicate: libstdc++ < BMI < BU
- insert_norehash_duplicate_highlf: libstdc++ < BMI < BU
- insert_rehash_duplicate: libstdc++ < BMI < BU
- insert_rehash_duplicate_highlf: libstdc++ < BMI < BU
- erase_unique: libstdc++ < BMI < BU
- erase_duplicate: libstdc++ < BMI < BU
- erase_duplicate_highlf: libstdc++ < BMI < BU
- ok_lookup_unique: BMI < BU < libstdc++
- nok_lookup_unique: BMI < BU
On 16 February 2014 17:25, Joaquin M Lopez Munoz
I think this is not the story the benchmarks tell. 15 different scenarios were tested:
* Insertion without rehashing, unique elements
* Insertion with rehashing, unique elements
* Insertion without rehashing, duplicate elements
* Insertion without rehashing, duplicate elements, high load factor
* Insertion with rehashing, duplicate elements
* Insertion with rehashing, duplicate elements, high load factor
* Erasure, unique elements
* Erasure, duplicate elements
* Erasure, duplicate elements, high load factor
* Successful lookup, unique elements
* Unsuccessful lookup, unique elements
* Successful lookup, duplicate elements
* Unsuccessful lookup, duplicate elements
* Successful lookup, duplicate elements, high load factor
* Unsuccessful lookup, duplicate elements, high load factor
and results do not show any definite winner (< means "better than"):
Well, of course I mostly noticed the results where Boost.Unordered did worse. I also wasn't paying much attention to the results for duplicate elements, as I've found that most people don't care about them at all. One of the annoying things I've found is that the things I've spent time optimising are the things that users never benchmark. I think yours are the first benchmarks I've seen to deal with elements with equivalent keys. But what I'm not sure about is how closely benchmarks reflect actual use.

A minor dilemma that I've got at the moment is that on 64-bit machines the newer versions use a different bucket placement strategy to avoid the costly prime modulus. It's typically faster, but for certain cases (such as inserting consecutive integers) the old version is much faster as there are no collisions at all. The problem is I have no idea how often this happens in real-world cases. It seems like something that will happen fairly often, so I could put in a special case for integers, but that would result in a performance hit for less regular numbers.
Daniel James
On 16 February 2014 17:25, Joaquin M Lopez Munoz
wrote: I think this is not the story the benchmarks tell. 15 different scenarios were tested:
[...]
and results do not show any definite winner (< means "better than"):
[...] I think yours are the first benchmarks I've seen to deal with elements with equivalent keys. But what I'm not sure about, is how closely benchmarks reflect actual use.
Neither am I :-/
A minor dilemma that I've got at the moment is that on 64-bit machines the newer versions use a different bucket placement strategy to avoid the costly prime modulus. It's typically faster, but for certain cases (such as inserting consecutive integers) the old version is much faster as there are no collisions at all. The problem is I have no idea how often this happens in real world cases.
My gut feeling is that inserting more or less consecutive ranges of integers is a fairly common scenario. In order to speed up modulo calculations, I do this trick: instead of having

size_t hash_to_bucket(size_t h)
{
  return h%bucket_size;
}

I write the following:

size_t hash_to_bucket(size_t h)
{
  switch(bucket_size_index){
    case 0: return h%53;
    case 1: return h%97;
    case 2: return h%193;
    case 3: return h%389;
    ...
    case 59: return h%18446744073709551557ul;
  }
}

Each h%constant operation the compiler is able to heavily optimize by emulating integer division with precalculated multiplications (there's a section on this in Hacker's Delight, I believe). I took a look at the generated assembly code and the integer divisions were in fact gone, and the thing is considerably faster despite the huge number of cases (switching on [0,...,n) can be coded very efficiently, and I think CPU branch prediction plays a role here also).

See http://tinyurl.com/okou28x for the actual implementation. You might want to try something similar in Boost.Unordered.

Joaquín Mª López Muñoz
Telefónica Digital
El 16/02/2014 20:34, Joaquin M Lopez Munoz escribió:
Each h%constant operation the compiler is able to heavily optimize by emulating integer division with precalculated multiplications (there's a section in Hacker's Delight on this I guess). I took a look at the generated assembly code and integer divisions were in fact gone, and the thing is considerably faster despite the huge number of cases (switching on [0,...n) can be coded very efficiently and I think CPU branch prediction plays a role here also).
Awesome Joaquín. Thanks for sharing this. Best, Ion
On 16 February 2014 19:34, Joaquin M Lopez Munoz
See http://tinyurl.com/okou28x for the actual implementation. You might want to try someting similar in Boost.Unordered.
I saw this in your code a little while ago and tried it, and it is faster than the old implementation, but it's still slower than the newer version because of the way I find the bucket for a node. I cache the mixed hash value in a node, rather than the original hash value, so that the bucket for a node can be found with a single 'and' operation. This can happen several times when searching for a node, so even an optimised modulus is noticeably slower. Of course this doesn't apply to MultiIndex, as you use a different data structure.
participants (9)
- Ahmed Charles
- Daniel James
- gast128
- Ion Gaztañaga
- Joaquin M Lopez Munoz
- Maxime Boissonneault
- Niall Douglas
- Paolo Bolzoni
- Stephan T. Lavavej