Re: [boost] Review managers wanted for augmented data structures
Vadim Stadnik wrote
Hi all,
At present, neither Boost library nor STL provides containers using augmented data structures. I suggest that the Boost community consider solving the challenging problem – how to extend STL by integrating advanced data structures.
...
Augmented trees are non-standard due to different performance guarantees. At the same these data structures are beneficial for complex algorithms, since they offer a wider set of more efficient operations compared to standard containers. The typical performance improvement for one operation is O(log N) against O(N). For less efficient operations these trees normally provide running time O(log N) against O(1), which is good enough for many applications.
This is very similar to a project I have been working on, with the intention of submitting to boost. The new data structure I have implemented is called a skip list. They can be used in place of balanced trees, and according to the inventor, insertion and deletion are significantly faster than for balanced trees (I have not tested this yet). See, e.g. William Pugh. 1990. Skip lists: a probabilistic alternative to balanced trees. Commun. ACM 33, 6 (June 1990), 668-676. DOI=10.1145/78973.78977 http://doi.acm.org/10.1145/78973.78977 The difference in performance guarantees are much more subtle than you describe above. For example, skip lists have expected O(log n) insert, delete and search, versus O(log n). In practice, the expected O(log n) time is just as good as true O(log n), unless a performance bound is required with certainty, because the probability of the skip list having height much larger than log n is extremely low. I also implemented what I called augmented skip lists, which allow for expected O(log n) rank lookup (i.e. random access). I wasn't aware of the counter tree library, or else I would probably have just used it instead of implementing skip lists. I used skip lists to provide "drop in" STL replacements. Now I have skip_list::set and skip_list::multiset, and I could similarly make skip_list::map and skip_list::multimap. There is also augmented_skip_list::set and augmented_skip_list::multiset. For now the augmented versions have an additional member function nth_element providing E[O(log n)] random access. Hopefully, this will be helpful towards the goals you have described.
There are two libraries in the Boost review schedule:
http://www.boost.org/community/review_schedule.html
"STL Extensions" and "Countertree". None of them has as yet a review manager. The order of the reviews may not be significant. It is more important that at least one library pass the review. This would help integrating other advanced data structures.
Are there Boost experts who could volunteer to take responsibility of review manager for any of these two libraries?
Data structures are very interesting to me, and I am in between jobs, so I have time to review one of these. If I take the time to review, I hope someone will be willing to review my skip list library, if I complete it and submit it to boost. -- View this message in context: http://boost.2283326.n4.nabble.com/Review-managers-wanted-for-augmented-data... Sent from the Boost - Dev mailing list archive at Nabble.com.
On Sat, Apr 6, 2013 at 1:39 AM, James Hirschorn wrote: Vadim Stadnik wrote ... Augmented trees are non-standard due to different performance guarantees.
At the same these data structures are beneficial for complex algorithms,
since they offer a wider set of more efficient operations compared to
standard containers. The typical performance improvement for one operation is O(log N) against O(N). For less efficient operations these trees
normally provide running time O(log N) against O(1), which is good enough
for many applications. This is very similar to a project I have been working on, with the
intention
of submitting to boost. The new data structure I have implemented is called
a skip list. They can be used in place of balanced trees, and according to
the inventor, insertion and deletion are significantly faster than for
balanced trees (I have not tested this yet). See, e.g. William Pugh. 1990. Skip lists: a probabilistic alternative to balanced
trees. Commun. ACM 33, 6 (June 1990), 668-676. DOI=10.1145/78973.78977
http://doi.acm.org/10.1145/78973.78977 The difference in performance guarantees are much more subtle than you
describe above. For example, skip lists have expected O(log n) insert,
delete and search, versus O(log n). In practice, the expected O(log n) time
is just as good as true O(log n), unless a performance bound is required
with certainty, because the probability of the skip list having height much
larger than log n is extremely low. I also implemented what I called
augmented skip lists, which allow for expected O(log n) rank lookup (i.e.
random access). I wasn't aware of the counter tree library, or else I would
probably have just used it instead of implementing skip lists. I used skip lists to provide "drop in" STL replacements. Now I have
skip_list::set and skip_list::multiset, and I could similarly make
skip_list::map and skip_list::multimap. There is also
augmented_skip_list::set and augmented_skip_list::multiset. For now the
augmented versions have an additional member function nth_element providing
E[O(log n)] random access. Hopefully, this will be helpful towards the goals you have described. IMO, the basic and augmented skip lists that support interfaces of STL
containers would be a valuable addition to Boost library. Skip lists have
an important advantage of relatively simple search and update operations.
Quite obviously, you when you propose these data structures, you will have
the problem of explaining the usefulness of these data structures as they
have non-standard computational complexities of some STL interface
functions.
I am interested in performance of various augmented data structures.
Comparison of deterministic data structures with randomized should be
particularly interesting. Please let me know when you finish and publish
your project. I will be happy to analyze your data structures and test
their performance.
... Are there Boost experts who could volunteer to take responsibility of
review manager for any of these two libraries? Data structures are very interesting to me, and I am in between jobs, so I
have time to review one of these. If I take the time to review, I hope someone will be willing to review my
skip list library, if I complete it and submit it to boost. If you mean the role of the review manager, then I really appreciate that
you agreed. However, there are Boost rules for a formal review and I am not
sure if you are eligible for this role. Hopefully, one of experienced Boost
members will help answer this question.
As for just review, I am interested in any helpful comments and suggestions
concerning advanced data structures.
Regards,
Vadim Stadnik
Zitat von Vadim Stadnik
As for just review, I am interested in any helpful comments and suggestions concerning advanced data structures.
whichever data structures you'll decide are useful to boost users, would you please consider implementing them as Boost.Intrusive containers from the get-go? Non-Intrusive containers can be implemented very easily in terms of Intrusive containers, as opposed to the other way around. Regards
On Sat, Apr 6, 2013 at 11:37 PM, Stefan Strasser
Zitat von Vadim Stadnik
: As for just review, I am interested in any helpful comments and
suggestions concerning advanced data structures.
whichever data structures you'll decide are useful to boost users, would you please consider implementing them as Boost.Intrusive containers from the get-go? Non-Intrusive containers can be implemented very easily in terms of Intrusive containers, as opposed to the other way around.
First of all, I cannot speak on behalf of boost users. I can only express my own opinion. In the general case the question is not trivial at all, since there are at least two significant parameters to be taken into account: - the cost of development of a data structure and/or container; - the total computational cost or the running time of a user algorithm; The first parameter is normally more important for a small project: a single developer or a small team of developers. The second parameter has top priority for applications with thousands and millions of users. As usual, there is a grey area. This is the case of properly formulated requirements. If Boost.Intrusive containers (and containers based on Boost.Intrusive containers) outperform all other containers, they are the best possible choice. They are winners by both of these important parameters. Otherwise their advantage is not so obvious. Regards, Vadim Stadnik
Zitat von Vadim Stadnik
If Boost.Intrusive containers (and containers based on Boost.Intrusive containers) outperform all other containers, they are the best possible choice. They are winners by both of these important parameters. Otherwise their advantage is not so obvious.
a non-intrusive container that is implemented in terms of an intrusive container is as efficient as any other non-intrusive container. it is a little more work to implement, to provide the Intrusive public interface. However, you can not implement an Intrusive container in terms of a non-intrusive container but have to reimplement the entire data structure. That's why Boost.Intrusive is not based on namespace std containers but are a reimplementation of the algorithms. The implementation of a non-intrusive container in terms of an intrusive one is as simple as: struct value_holder : intrusive::hook{ T value; }; and forwarding all calls of the STL container interface.
participants (3)
-
James Hirschorn
-
Stefan Strasser
-
Vadim Stadnik