Date: Thu, 3 Apr 2014 16:39:29 -0700 From: Glen Fernandes
I wasn't sure at first why my name was in parenthesis in the e-mail subject. :-) Glen It's my fault, sorry about it. Date: Fri, 04 Apr 2014 13:41:04 +0200 From: Adam Wulkiewicz Subject: Re: [boost] Proposal: Vector-like container, which takes O(log(N)) for insert/erase AFAIU just like a B-tree can be seen as a generalization of a binary search tree in that a node can have more than two children, your container can be seen as a similar generalization of a Rope. Or it can be seen as an augumented B-tree (not really because the values are no longer stored in internal nodes and the tree isn't sorted, correct me if I'm wrong). Your You understood right. proposal is very similar to Counted B-tree (http://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html), which is a B-tree with random access. The difference is that your container can't be used to search for a stored value in O(logN) without sorting and binary search. This may be ok if someone require only random access, the size of a node is smaller which probably improves the cacheing. Again, right. There are a lot of situations when only random access is necessary. You probably will not sort characters in string. In general, B-tree was introduced to improve the performance of the access to data stored in persistent memory (lower performance). In this case it was better to load the whole page into memory (and cache it) than load many small, scatered parts. But in the case of RAM-only-access, a binary search tree should be faster. As far as we don't consider spatial indexing, interval trees, etc. where values/nodes may "overlap" somehow and we must search in more than one subtree. Therefore I'm wondering if a classic Rope would be faster than your container in the use-case you're testing - simple insert and random It outperformed Rope 10 times. It's expected result. Please visit my
Hi, everybody! The most important feature of my container is performance. However, other features will appear later. https://github.com/alexkupri/array/blob/master/description/description.txt Today I compared my container with rope. My container outperformed it 10 times. link, I added verbose explanation under "ROPE TEST" section. Shortly speaking, four reasons exist: cache, pages, balancing and malloc/free.
access to a in-memory container. Have you test it? I'd perform 2 tests: 1. Very big number of values + truly random insert/access to prevent cacheing 2. insert/access in the same area I've tested with truly random tests. Please see results and code of SingleOperationPerformanceCheck and MultipleOperationsTest functions. Tell me, if I'm wrong.
IMHO the benefits related to the use of this container should be more noticable if the persistent storage were used. In memory/cache there are the same relationship as in HD/memory.
As for the name, it's definietly not an array. I'd call it B-rope. Thanks, your option is accepted. I'll collect more options and we'll vote. I like brope.
Regards, Adam
Message: 9 From: Lars Viklund
Date: Fri, 4 Apr 2014 14:10:06 +0200 If it doesn't have contiguous storage, it's not a replacement for vector, however much you want it to be the universal container that everyone uses for everything. It may be a better general-purpose container compared to the usual default containers, deque and vector, but it isn't a replacement for either of them. Well, I would see it as first choise container for non-sorted data. It's comparable with vector/deque on small arrays and outperforms them on large sizes. It's more preferrable than list at least in terms of malloc/free calls. And it's probably better for strings. So one should use it by default in all cases with unsorted data and use list, vector, deque and string only if he/she really understands what he/she does. You use the term 'intrusive' in the documentation, but it doesn't seem to carry the usual meaning that it requires the types involved to participate in the container structure. You are right. I've corrected the documentation. Now I call "fat leaves" leaves which contain elements, and "thin leaves" leaves which contain pointers to elements. Your tests doesn't seem to exercise much else than container<int> either. Suggest your tests. I'm not quite sure I like the explicit L/M non-type template parameters either, considering that it leaks implementation detail into every single user of the type. Well, you are right in principle. It must have default parameters. But it's platform dependent, they should be chosen so that branch and leaf would take roughly 1 cache line, minus information required for malloc overhead. As for the Copy-on-Write semantics, isn't the general consensus that CoW is a horrible horrible thing and is very hard to get right+fast in the modern world with concurrency? I disagree with your estimation. There are different situations, different machines. Embedded and mobile machines want to save memory. Large machines can also want to save memory, if they amount of data is large. Finally, they can even speed-up by avoiding copying the whole array. Imagine you are writing text editor. (DNA editor, sound editor, etc.) You make backup copy and continue editing, again backup copy and continue editing. And you keep only forward commands (without undo). You can easily rollback to any state you need. Ofcourse, user should decide, whether CoW option is necessary for him/her, because it wastes resources when enabled and not used. Message: 11 Date: Fri, 4 Apr 2014 23:23:09 +1000 From: Vadim Stadnik
Could you please provide a figure that shows data association with nodes in your data structure? I updated the doc, follow the link above.