Merging sorted iterators?
Hi, I have a bunch of iterators over some sorted input from several big files which do not fit into memory. What I would like to have is one iterator which allows me to iterate over the combined sorted input (of course, the result should be sorted, too). Something like this: -------\ --------\ ... ---- --------/ -------/ Is there a class in boost which allows me to do this in a simple way? Thanks and regards, Roland
G'day Roland.
Quoting "Dr. Roland Bock"
I have a bunch of iterators over some sorted input from several big files which do not fit into memory.
What I would like to have is one iterator which allows me to iterate over the combined sorted input (of course, the result should be sorted, too).
I wrote the following iterator some time ago. It's for ints, it
only handles two input iterators, and doesn't handle end conditions
(this wasn't an issue in the application), but it illustrates the idea:
template
Hi Andrew, thanks! Not being a template guru (yet), it took some time to get a grasp on that, but I am beginning to understand :-) When I come up with some nice variant, I am certainly going to contribute it to the vault as you suggested. Regards, Roland ajb@spamcop.net wrote:
G'day Roland.
Quoting "Dr. Roland Bock"
: I have a bunch of iterators over some sorted input from several big files which do not fit into memory.
What I would like to have is one iterator which allows me to iterate over the combined sorted input (of course, the result should be sorted, too).
I wrote the following iterator some time ago. It's for ints, it only handles two input iterators, and doesn't handle end conditions (this wasn't an issue in the application), but it illustrates the idea:
template
class merge_iterator : public boost::iterator_facade< merge_iterator , const int , incrementable_traversal_tag > { public: merge_iterator(It1 p_it1, It2 p_it2) : m_it1(p_it1), m_it2(p_it2) { } private: friend class iterator_core_access;
void increment() { int x1 = *m_it1; int x2 = *m_it2; if (x1 <= x2) { ++m_it1; } if (x1 >= x2) { ++m_it2; } }
const int& dereference() const { return std::min(*m_it1, *m_it2); }
It1 m_it1; It2 m_it2; };
As you can see, these things are actually pretty simple to write in Boost. Indeed, it's hard to see how this could be made any simpler and still retain C++ syntax.
For your problem, we'd have to assume that all of the input iterators have the same time, and store them in a priority_queue ordered on initial element. This is the sort of thing I'd be looking to do:
// WARNING: Untested code follows. This may not compile, let // alone work. There are probably const-correctness issues, for // example. Usual disclaimers about using at own risk apply as // if incorporated herein.
template<typename It> struct iterator_pair { It m_cur; It m_end;
bool at_end() { return m_cur == m_end; }
iterator_pair<It>& increment() { ++m_cur; return *this; }
const It& current() { return m_cur; } };
template<typename It> struct iterator_pair_comparator { bool operator()(const It& p_i1, const It& p_i2) { return std::less_than(*p_i1.current(), *p_i2.current()); } };
template
class merge_iterator : public boost::iterator_facade< merge_iterator , const T , incrementable_traversal_tag > { public: // Constructor and/or stuff to add iterators NYI. Also need // to implement some way to signal end conditions. private: friend class iterator_core_access;
void increment() { It top = m_iterators.top(); if (!top.increment().at_end()) { m_iterators.push(top); } }
const T& dereference() const { return *m_iterators.top().current(); }
std::priority_queue
, iterator_pair_comparator<It> > m_iterators; }; If you come up with a nice variation on this, please add it to the vault.
Cheers, Andrew Bromage _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
participants (3)
-
ajb@spamcop.net
-
Dr. Roland Bock
-
Roland Bock