[boost] Proposal: Vector-like container, which takes O(log(N)) for insert/erase, feedback

5 Apr 2014

      ...
Date: Thu, 3 Apr 2014 16:39:29 -0700
From: Glen Fernandes <glen.fernandes@gmail.com>
I wasn't sure at first why my name was in parenthesis in the e-mail 
subject. :-)
Glen
It's my fault, sorry about it.
Date: Fri, 04 Apr 2014 13:41:04 +0200
From: Adam Wulkiewicz <adam.wulkiewicz@gmail.com>
Subject: Re: [boost] Proposal: Vector-like container, which takes 
O(log(N)) for insert/erase
AFAIU just like a B-tree can be seen as a generalization of a binary 
search tree in that a node can have more than two children, your 
container can be seen as a similar generalization of a Rope. Or it can 
be seen as an augumented B-tree (not really because the values are no 
longer stored in internal nodes and the tree isn't sorted, correct me 
if I'm wrong). Your 
You understood right.
proposal is very similar to Counted B-tree 
(http://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html), 
which is a B-tree with random access. The difference is that your 
container can't be used to search for a stored value in O(logN) 
without sorting and binary search. This may be ok if someone require 
only random access, the size of a node is smaller which probably 
improves the cacheing.
Again, right. There are a lot of situations when only random access is 
necessary. You probably will not sort characters in string.
In general, B-tree was introduced to improve the performance of the 
access to data stored in persistent memory (lower performance). In 
this case it was better to load the whole page into memory (and cache 
it) than load many small, scatered parts. But in the case of 
RAM-only-access, a binary search tree should be faster. As far as we 
don't consider spatial indexing, interval trees, etc. where 
values/nodes may "overlap" somehow and we must search in more than one 
subtree. Therefore I'm wondering if a classic Rope would be faster 
than your container in the use-case you're testing - simple insert and 
random 
It outperformed Rope 10 times. It's expected result. Please visit my
Hi, everybody!
The most important feature of my container is performance. However, 
other features will appear later.
https://github.com/alexkupri/array/blob/master/description/description.txt
Today I compared my container with rope. My container outperformed it 10 
times.

link, I added verbose explanation under "ROPE TEST" section. Shortly 
speaking, four reasons exist: cache, pages, balancing and malloc/free.
...
access to a in-memory container. Have you test it? I'd perform 2 tests:
1. Very big number of values + truly random insert/access to prevent 
cacheing
2. insert/access in the same area
I've tested with truly random tests. Please see results and code of 
SingleOperationPerformanceCheck and MultipleOperationsTest functions. 
Tell me, if I'm wrong.
IMHO the benefits related to the use of this container should be more 
noticable if the persistent storage were used. 
In memory/cache there are the same relationship as in HD/memory.
As for the name, it's definietly not an array. I'd call it B-rope.
Thanks, your option is accepted. I'll collect more options and we'll 
vote. I like brope.
Regards,
Adam
...
Message: 9
From: Lars Viklund <zao@acc.umu.se>
Date: Fri, 4 Apr 2014 14:10:06 +0200
If it doesn't have contiguous storage, it's not a replacement for
vector, however much you want it to be the universal container that
everyone uses for everything.
It may be a better general-purpose container compared to the
usual default containers, deque and vector, but it isn't a replacement
for either of them.
Well, I would see it as first choise container for non-sorted data. It's 
comparable with vector/deque
on small arrays and outperforms them on large sizes. It's more 
preferrable than list at least
in terms of malloc/free calls. And it's probably better for strings.
So one should use it by default in all cases with unsorted data and use 
list, vector, deque and string only
if he/she really understands what he/she does.
You use the term 'intrusive' in the documentation, but it doesn't seem
to carry the usual meaning that it requires the types involved to
participate in the container structure.
You are right. I've corrected the documentation. Now I call "fat leaves" 
leaves which contain elements, and
"thin leaves" leaves which contain pointers to elements.
Your tests doesn't seem to exercise much else than container<int>
either.
Suggest your tests.
I'm not quite sure I like the explicit L/M non-type template parameters
either, considering that it leaks implementation detail into every
single user of the type.
Well, you are right in principle. It must have default parameters. But 
it's platform dependent,
they should be chosen so that branch and leaf would take roughly 1 cache 
line, minus information
required for malloc overhead.
As for the Copy-on-Write semantics, isn't the general consensus that CoW
is a horrible horrible thing and is very hard to get right+fast in the
modern world with concurrency?
I disagree with your estimation. There are different situations, 
different machines. Embedded and mobile machines
want to save memory. Large machines can also want to save memory, if 
they amount of data is large. Finally, they
can even speed-up by avoiding copying the whole array.
Imagine you are writing text editor. (DNA editor, sound editor, etc.) 
You make backup copy and continue editing,
again backup copy and continue editing. And you keep only forward 
commands (without undo). You can easily rollback
to any state you need.
Ofcourse, user should decide, whether CoW option is necessary for 
him/her, because it wastes resources when enabled
and not used.
Message: 11
Date: Fri, 4 Apr 2014 23:23:09 +1000
From: Vadim Stadnik <vadimstdk@gmail.com>
Could you please provide a figure that shows data association with 
nodes in
your data structure?
I updated the doc, follow the link above.

[boost] Proposal: Vector-like container, which takes O(log(N)) for insert/erase, feedback

Alexander Kuprijanov