Firstly, you don't invest the multi-year effort to get a library into Boost because it's superior, as it probably isn't.
You do it because of what you will become after you succeed: a Boost library author who has passed the judgement of his or her peers.
Personally, I want to do it to help others, as that is the focal point of my project. I'm not greatly looking for personal gain, but I know that this algorithm gives much greater efficiency under the circumstances described - and I believe efficiency is important. Also, it solves some problem scenarios that cannot be solved - at least not easily - with current algorithms.
I am struggling to see the gain over other combinations of STL containers. If you need very fast deletion and never-invalidated iterators, unordered_map is very hard to beat - the cost is insertion performance, but even that goes away with Hinnant's proposed node-handle extensions to the standard. Dense hash maps will probably be very competitive, and we sorely need one of those in Boost and the STL.
The problem with map and unordered_map, which you can see from the first benchmark on the colony page (map currently outperforms unordered_map under GCC, so I didn't include the latter), is that they have terrible iteration and search performance. A colony has better erase and insert performance, and the best iteration speed of all the std:: containers, in the context of requiring valid pointers/iterators. http://www.plflib.org/colony.htm#benchmarks
Don't do it because of that. Do it for yourself personally, or not at all. Trust me: my library is up for review this Friday. I started the process of getting it accepted in 2012. That's the commitment you need to make.
Gotcha, that is good to know - thanks.
I'd have thought there is surely some space wastage going on here too, if I understand the algorithm correctly. That can have cache-spillage effects under multithreaded use.
If you take a look at the diagram at the top of the colony page, those gaps are filled in whenever a new insert is called - basically, erased slots get added to a 'recycle' stack and popped off at the next insertion. All empty slots are reused before adding to the end of the colony. Hence the unordered schema. However, you are correct in that if many elements are deleted and no inserts are performed, iteration will suffer - slightly. It doesn't affect cache performance, but obviously skipping over many empty element slots takes more time than if those slots didn't exist at all! Also, when a memory block is 100% empty (i.e. all its elements have been erased), the block is removed from the chain, so you don't have that wastage anymore. Those blocks can be tuned for size in the constructor.
I'd benchmark your algorithm when running four benchmarks in parallel. A lot of implementations which look faster single threaded look very slow when multithreaded due to LL cache pressure. The STL tries to balance those two off, and it generally succeeds, sacrificing some single threaded performance for much improved cache load.
To be fair, I have not looked into multithreaded performance. I pretty much followed the STL guidelines as per under what conditions containers can be used concurrently. I am potentially looking at implementing colonies on CUDA however.
FYI dense hash maps and bitwise trie maps have *excellent* multithreaded performance, almost linear speedup with cores, thanks to their low LL cache loading.
You may correct me here, but my understanding is that vector pretty much outperforms the other std:: containers in terms of iteration, as cache performance is better due to the contiguous data layout. According to Chandler Carruth, this also holds true for key-value searches vs map, as the same rules apply (a ~400ns cache-resident linear search beats multiple ~200ns main-memory fetches). www.youtube.com/watch?v=fHNmRkzxHWs If I'm missing something, do let me know. Genuine thanks, M