[Container] Priority Deque - Refinement
Hi. I've recently made a double-ended priority queue (priority deque) publicly available at https://github.com/nmcclatchey/Priority-Deque , with the intent of submitting it to Boost (more specifically, I wish to make it part of Boost.Container). I would appreciate any comments, discussion, or refinements. Goals of the project: * Provide an efficient double-ended priority queue, with interface similar to std::priority_queue. * Keep requirements to a minimum. In particular, require nothing except what is already required by std::priority_queue. * Make it fast. Performance should be competitive with std::priority_queue. How goals are met: * The priority deque is implemented as a single class template, mimicking std::priority_queue * The interface of priority_deque is a strict superset of the interface of std::priority_queue. New functions follow the STL naming scheme. * The underlying container is assumed to be a sequence container supporting random-access iterators, front(), back(), and push_back(), as in std::priority_queue. No other assumptions are made. * The underlying data structure is an interval heap, which allows finding min/max in O(1) time, and single-element changes in O(log n) time. * Benchmarks (on queues with 20,000,000+ elements) indicated that the difference in performance between std::priority_queue and the priority deque is negligible when compiled by GCC with the O3 option. A specific question for anyone more experienced with development of generalized code: * I added several functions for controlled access to arbitrary elements. Is this overkill? If so, should they be removed? Thank-you for your time, Nate
AMDG On 11/23/2013 10:20 AM, Nathaniel McClatchey wrote:
I've recently made a double-ended priority queue (priority deque) publicly available at https://github.com/nmcclatchey/Priority-Deque , with the intent of submitting it to Boost (more specifically, I wish to make it part of Boost.Container). I would appreciate any comments, discussion, or refinements.
- Please consider (and document) the exception safety guarantees that you provide. From a brief glance at the code, I suspect that merge and push, at least can leave an invalid heap if they throw. - You're going to need a lot more tests. In particular, you need to check that pushing and popping values gives the correct results, not just that the priority_deque's invariants hold.
<snip>
A specific question for anyone more experienced with development of generalized code: * I added several functions for controlled access to arbitrary elements. Is this overkill? If so, should they be removed?
I assume that you're referring to begin_mutable, end_mutable, and make_valid? What I'd actually prefer to see is for this to be implemented as algorithms akin to std::push_heap and std::pop_heap. That way priority_deque is just a thin wrapper, and those who need more functionality can easily build their own. In Christ, Steven Watanabe
23.11.2013 22:20, Nathaniel McClatchey:
[...] I wish to make it part of Boost.Container [...] [...] * The underlying data structure is an interval heap [...]
I think it is worth to consider to make it part of Boost.Heap library ( http://www.boost.org/doc/libs/1_55_0/doc/html/heap.html ). In particular there is boost::heap::priority_queue ( http://www.boost.org/doc/libs/1_55_0/doc/html/boost/heap/priority_queue.html ) which is wrapper around STL heap functions (make_heap, etc) but with interface similar to other Boost.Heap containers. If your implementation would have std::make_heap/etc functions (which are desirable on their own, as Steven mentioned) then it is even possible to have two interfaces: Boost.Heap-like and std::priority_queue-like. -- Evgeny Panasyuk
[...] I wish to make it part of Boost.Container [...] [...] * The underlying data structure is an interval heap [...]
I think it is worth to consider to make it part of Boost.Heap library ( http://www.boost.org/doc/libs/1_55_0/doc/html/heap.html ).
i'd welcome this as contribution to boost.heap, if you want to integrate it there. however it provides two features for its priority_queue implementation, which are currently missing in the priority deque: * configuration for stable heap order * mutability the lack of this functionality may be intended if the priority_deque is supposed to mimic std::priority_queue ... while boost.heap's data structures offer more functionality, which may be required by some use cases ... cheers, tim
24.11.2013 13:22, tim:
[...] I wish to make it part of Boost.Container [...] [...] * The underlying data structure is an interval heap [...]
I think it is worth to consider to make it part of Boost.Heap library ( http://www.boost.org/doc/libs/1_55_0/doc/html/heap.html ).
i'd welcome this as contribution to boost.heap, if you want to integrate it there. however it provides two features for its priority_queue implementation, which are currently missing in the priority deque:
* mutability
Could you please clarify, does boost::heap::priority_queue support mutable interface? (I know that other heaps support it, like fibonacci_heap) -- Evgeny Panasyuk
Stabilizing a priority deque is not as simple as stabilizing a priority queue. The priority queue can be stabilized by appending to each element of the heap an integer "tie breaker" such that each new element's tie breaker is greater than that of any previous element. Problems with overflow can be dealt with (in O(n) time) by adjusting all tie breakers simultaneously and reconstructing the heap. The same method will work for priority deques, but with one important difference: if the maximum element is stable (FIFO), the minimum element would have the reverse order (LIFO). Correcting this is likely to require searching through all elements indistinguishable from the minimum, which is potentially linear rather than logarithmic. Personally, I would prefer to create the stabilized priority deque (with notes about the behavior) as a thin wrapper around the unstable deque. This is primarily because the stabilized deque may not be able to offer the same options as the unstable deque (such as choice of underlying container).
I assume that you're referring to begin_mutable, end_mutable, and make_valid?
I was thinking of begin, end, set, and erase actually. Exposing (as protected member functions) the ability to make large-scale changes to the deque has immediately apparent advantages. In particular, it would allow a (mostly) stabilized heap to operate in amortized O(log n) time, O(n) worst case. Editing or removing a single element is highly situational, especially when the iterators used to find that element must be discarded immediately afterward.
Please consider (and document) the exception safety guarantees that you provide. From a brief glance at the code, I suspect that merge and push, at least can leave an invalid heap if they throw.
Good point. I hadn't considered exception-safety until you suggested it. If movement and comparison do not throw, I believe I will be able to offer as strong a guarantee as offered by the operations on the underlying container. If they can throw, I can offer the basic guarantee, but only because an empty heap is a vacuously valid heap. I'll keep working to try to improve these.
You're going to need a lot more tests. In particular, you need to check that pushing and popping values gives the correct results, not just that the priority_deque's invariants hold.
You are correct. I do need more tests. My original plan was to wait to create unit tests until the revision stage was done, because testing preservation of invariants is more sensitive to failures than most other tests, but I can see your point (need to make sure that the elements added are actually in the deque, and that the popped elements are indeed removed from it).
What I'd actually prefer to see is for this to be implemented as algorithms akin to std::push_heap and std::pop_heap. That way priority_deque is just a thin wrapper
If your implementation would have std::make_heap/etc functions (which are desirable on their own, as Steven mentioned) then it is even >possible to have two interfaces: Boost.Heap-like and std::priority_queue-like.
I think it is worth to consider to make it part of Boost.Heap library ( http://www.boost.org/doc/libs/1_55_0/doc/html/heap.html ).
I had considered this possibility previously, but I thought the interfaces of the private functions were too complicated to expose to an average user. There are also some concerns over whether an average user would find my internal design choices to be intuitive. For example, I realized early on that either the minimum or maximum would have a slight performance advantage over the other, depending on how I constructed my intervals. Because the priority deque is so similar to the priority queue, I decided to give the advantage to the maximum; this has resulted in intervals of the form [max, min], rather than the intuitive [min, max]. It would not be difficult to change this, but it is still something to consider. Nevertheless, this is worth revisiting. I will consider ways to simplify, separate, and make intuitive the interval-heap functions.
* mutability
I would appreciate a clarification of what was meant. I included functions for traversing the deque, editing individual elements, and editing large numbers of elements, so I must assume that my intuitive definition of the meaning of "mutability" differs from yours.
hi,
Stabilizing a priority deque is not as simple as stabilizing a priority queue. The priority queue can be stabilized by appending to each element of the heap an integer "tie breaker" such that each new element's tie breaker is greater than that of any previous element. Problems with overflow can be dealt with (in O(n) time) by adjusting all tie breakers simultaneously and reconstructing the heap. The same method will work for priority deques, but with one important difference: if the maximum element is stable (FIFO), the minimum element would have the reverse order (LIFO).
having fifo/max lifo/min would be quite reasonable. a fifo of both min and max sides could probably be achieved by using use stability counters, no?
I think it is worth to consider to make it part of Boost.Heap library ( http://www.boost.org/doc/libs/1_55_0/doc/html/heap.html ).
I had considered this possibility previously, but I thought the interfaces of the private functions were too complicated to expose to an average user. There are also some concerns over whether an average user would find my internal design choices to be intuitive. For example, I realized early on that either the minimum or maximum would have a slight performance advantage over the other, depending on how I constructed my intervals. Because the priority deque is so similar to the priority queue, I decided to give the advantage to the maximum; this has resulted in intervals of the form [max, min], rather than the intuitive [min, max]. It would not be difficult to change this, but it is still something to consider.
boost.heap allows the user to configure the data structures via template arguments. these design decisions could be exposed to the user.
* mutability
I would appreciate a clarification of what was meant. I included functions for traversing the deque, editing individual elements, and editing large numbers of elements, so I must assume that my intuitive definition of the meaning of "mutability" differs from yours.
if you change the key (priority) of an element, the heap order may be violated, so you may have to update the heap structure. http://www.boost.org/doc/libs/1_55_0/doc/html/heap/concepts.html#heap.concep... cheers, tim
On Nov 25, 2013, at 4:08 AM, tim
having fifo/max lifo/min would be quite reasonable. a fifo of both min and max sides could probably be achieved by using use stability ^^^ two
This didn't translate well to HTML-based viewers without font control, unfortunately. I think you were trying to append "two", but then I wonder if you didn't mean "too". ___ Rob (Sent from my portable computation engine)
On 25 November 2013 12:30, Rob Stewart wrote:
On Nov 25, 2013, at 4:08 AM, tim
wrote: having fifo/max lifo/min would be quite reasonable. a fifo of both min and max sides could probably be achieved by using use stability ^^^ two
This didn't translate well to HTML-based viewers without font control, unfortunately. I think you were trying to append "two", but then I wonder if you didn't mean "too".
He meant s/use/two/ i.e. "using two stability counters"
Here is a tentative implementation of priority_deque as a thin wrapper around STL-like interval heap functions (provided in interval_heap.hpp): https://github.com/nmcclatchey/Priority-Deque/tree/Thin-Wrapper The function interfaces in interval_heap.hpp are likely to change as I more thoroughly examine the heaps in STL and Boost.Heap.
having fifo/max lifo/min would be quite reasonable. a fifo of both min and max sides could probably be achieved by using use stability
An implementation using a single stability counter modifies the comparison (which must be a strict weak ordering) to break ties according to the order in which the elements were added. For a priority queue, newer objects are considered "less" than older objects. For a priority deque, breaking ties in favor of one direction precludes breaking of ties in the other. Thus, simultaneous FIFO max and LIFO min is trivial (use the same stability counter as used for the priority queue), but that particular method cannot be used efficiently for simultaneous FIFO max and FIFO min. I don't believe that using two stability counters would help, but if you've got ideas or (preferably) details about how it could be accomplished, I would be happy to read them. While I'm on the subject of stability counters, I believe I can improve on the method used in Boost.Heap's stable priority queue. In particular, the Boost.Heap queue does not handle integer overflow except by throwing an exception. A better way to handle overflow would, I believe, be to sort the underlying container according to its counters (O(n^2), O(n log n), or O(n), depending on sort used). Follow this by editing the counters to be consecutive from the minimum possible value (O(n)), and by re-building the heap (O(n)).
hey,
having fifo/max lifo/min would be quite reasonable. a fifo of both min and max sides could probably be achieved by using use stability
An implementation using a single stability counter modifies the comparison (which must be a strict weak ordering) to break ties according to the order in which the elements were added. For a priority queue, newer objects are considered "less" than older objects. For a priority deque, breaking ties in favor of one direction precludes breaking of ties in the other. Thus, simultaneous FIFO max and LIFO min is trivial (use the same stability counter as used for the priority queue), but that particular method cannot be used efficiently for simultaneous FIFO max and FIFO min.
I don't believe that using two stability counters would help, but if you've got ideas or (preferably) details about how it could be accomplished, I would be happy to read them.
one will have to change the way the stability counters are used: one possibility is to have a list of elements for each key (priority) and look for the maximum stability count for the minimum or maximum side. this is more a node-based approach, which probably does not translate very well to a container adaptor, though ... i had been thinking about the chaining of identical keys when implementing boost.heap ... it has the nice advantage that it slightly simplifies the tree structures
While I'm on the subject of stability counters, I believe I can improve on the method used in Boost.Heap's stable priority queue. In particular, the Boost.Heap queue does not handle integer overflow except by throwing an exception. A better way to handle overflow would, I believe, be to sort the underlying container according to its counters (O(n^2), O(n log n), or O(n), depending on sort used). Follow this by editing the counters to be consecutive from the minimum possible value (O(n)), and by re-building the heap (O(n)).
yes, the handling of the stability count is a bit fragile, though 64bit integers are probably enough to prevent overflows in most applications. a simple way to prevent the overflow could also be to simply subtract the minimum value of all stability counts, which could be done in O(n) by traversing the heap twice. cheers, tim
AMDG On 11/25/2013 11:14 PM, tim wrote:
yes, the handling of the stability count is a bit fragile, though 64bit integers are probably enough to prevent overflows in most applications. a simple way to prevent the overflow could also be to simply subtract the minimum value of all stability counts, which could be done in O(n) by traversing the heap twice.
That's unreliable, because there's no guarantee that any particular counter is ever removed. It's quite possible still to have 0 in the queue when you reach 2^64. In Christ, Steven Watanabe
yes, the handling of the stability count is a bit fragile, though 64bit integers are probably enough to prevent overflows in most applications. a simple way to prevent the overflow could also be to simply subtract the minimum value of all stability counts, which could be done in O(n) by traversing the heap twice.
That's unreliable, because there's no guarantee that any particular counter is ever removed. It's quite possible still to have 0 in the queue when you reach 2^64.
in theory, yes ... though i doubt this is an issue in the real world ... especially one can do quite a lot of work before a 64bit counter overflows and if this really turns out to be a problem ... just use a larger counter ;)
I have updated the main branch ( https://github.com/nmcclatchey/Priority-Deque ). - More appropriate names have been chosen. interval_heap.hpp, for example, provides make_interval_heap, push_interval_heap, etc. In priority_deque.hpp, the member function "set" was changed to "update" to be more consistent with the way Boost.Heap names things. - Exception-safety guarantees have now been documented for both priority_deque.hpp and interval_heap.hpp. Where possible (push/pop/update with non-throwing move or copy), the strong guarantee has been provided. The multiple-element functions, such as make_interval_heap, provide only the basic guarantee because the strong guarantee would cripple performance. If you think a guarantee can be strengthened, that a better nomenclature is available, please let me know. As always, any other comments are also welcome. Now that I have more thoroughly examined Boost.Heap, I can conclude that implementing the following concepts and interfaces would impractical when mimicking the STL priority_queue interface, which allows the user to specify the type of underlying container: - Template options. These would require that I depart from the interface of std::priority_queue. - Stability. Adding a stability counter would require modifying the type stored in the underlying container and would (slightly) decrease performance. - Mutability. I'd need to implement handles, which have the same objections as stability. I have provided a more limited version of Boost.Heap's mutability, which modifies at an iterator. - Ordered Iterators. I can think of at least one way to add these, even without departing from STL's interface, but it is not truly practical because it would require O(n) additional memory. Of these concepts, at least mutability can be created with relative ease in a separate class which does not let the user choose the underlying container. Stability can be added, with the caveat that if one end is FIFO, the other is necessarily LIFO. As such, I think Panasyuk's suggestion is the wisest option in this case. The project to-do list is currently: - Tests. I need to write lots and lots of tests for this class, including tests of exception-safety guarantees. - A separate priority deque class which fully conforms to the Boost.Heap interface, rather than STL interface. This class can then provide stability and mutability. - See if I can make any functions in interval_heap.hpp more elegant.
in theory, yes ... though i doubt this is an issue in the real world ... especially one can do quite a lot of work before a 64bit counter overflows and if this really turns out to be a problem ... just use a larger counter ;)
Yes, it's true that a 64-bit counter does effectively solve the problem. However, using a large counter may decrease performance (especially if it increases the likelihood of a page fault), and increase required memory. Thus it may be worthwhile to have the failsafe fix all problems, if for no reason other than permitting safe use of smaller counters on mobile or low-memory devices.
hey, sry for the delay, this got lost in my inbox :/
Now that I have more thoroughly examined Boost.Heap, I can conclude that implementing the following concepts and interfaces would impractical when mimicking the STL priority_queue interface, which allows the user to specify the type of underlying container: - Template options. These would require that I depart from the interface of std::priority_queue.
std::priority_queue provides a very limited interface. extending or using a different interface is not necessarily a disadvantage ;)
- Stability. Adding a stability counter would require modifying the type stored in the underlying container and would (slightly) decrease performance.
boost.heap follows the principle that you only pay for what you use: if you don't configure the data structure to have a stable heap order, you won't have any performance overhead.
- Mutability. I'd need to implement handles, which have the same objections as stability. I have provided a more limited version of Boost.Heap's mutability, which modifies at an iterator.
see above: you only pay for what you use. for d-ary heaps (which keep the tree in an array) mutability requires an indirection: the values are stored in std::list and the actual heap only contains list iterators. side note: this indirection may actually be a performance benefit if copying a value is (much) more expensive than copying an interator. the reason or choosing an opaque handle instead of using iterators is that handles are never invalidated, while iterators can be.
- Ordered Iterators. I can think of at least one way to add these, even without departing from STL's interface, but it is not truly practical because it would require O(n) additional memory.
the idea behind boost.heap is to provide ordered iterators, no matter if their implementation is efficient or not. in some cases (check ordered_iterator_adaptor.hpp) they need to keep track of the unvisited nodes. however there are use cases for traversing all elements in heap order.
in theory, yes ... though i doubt this is an issue in the real world ... especially one can do quite a lot of work before a 64bit counter overflows and if this really turns out to be a problem ... just use a larger counter ;)
Yes, it's true that a 64-bit counter does effectively solve the problem. However, using a large counter may decrease performance (especially if it increases the likelihood of a page fault), and increase required memory. Thus it may be worthwhile to have the failsafe fix all problems, if for no reason other than permitting safe use of smaller counters on mobile or low-memory devices.
does the space overhead really matter on these devices? iac, i do have a prototype for re-assigning the stability counter if an overflow will occur ... but this requires ordered iterators ;) cheers, tim
i'd welcome this as contribution to boost.heap, if you want to integrate it there. however it provides two features for its priority_queue implementation, which are currently missing in the priority deque:
* mutability
Could you please clarify, does boost::heap::priority_queue support mutable interface?
no, but boost::heap's priority_queue is only a wrapper around the poor stl interface
participants (7)
-
Evgeny Panasyuk
-
Jonathan Wakely
-
Nathaniel McClatchey
-
Rob Stewart
-
Steven Watanabe
-
tim
-
Tim Blechmann