Boost range - Add variadic join/zip - Boost - lists.stage.boost.cppalliance.org



Gonzalo BG

9 May 2013

3:01 p.m.

Code available here: https://gist.github.com/gnzlbg/5547905

Let

    std::array<double,4> a = {{ 1, 2, 3, 4 }};
    std::list<int>       b = { 11, 22, 33, 44 };
    std::deque<int>      c = { 111, 222, 333, 444 };
    std::vector<int>     d = { 1111, 2222, 3333, 4444 };

If the values stored in several containers have the same type, I would like to iterate over the different containers as if they were a single one, and:

    /// - modify its values
    for(auto&& i : join(b,c,d)) { i += 1; }

    /// - use it with boost algorithms
    boost::transform(join(b,c,d), begin(join(b,c,d)), [&](int j){ return j * 2; });

    /// - and read it
    for(const auto& i : join(b,c,d)) { std::cout << i << "\n"; }

Another problem I usually face is manipulating data that has different types. This data is usually stored in different containers for efficiency, but it does belong together. Zip can be used to iterate through a row of this kind of table-like data structure:

    for(const auto& t : zip(a,b,c,d)) { std::cout << t << "\n"; }

However, boost's zip_iterators are not writable. I don't really think it would be possible to make them writable, but that would allow code like this:

    typedef decltype(*begin(zip(a,c,d))) tIt;
    boost::sort(zip(a, c, d), [](const tIt& i, const tIt& j){
      return boost::get<0>(i) > boost::get<0>(j);
    });

which sorts the three containers in lock-step by the values in the first container (a).

Some parts of this code can be implemented with minor extensions to boost range. A variadic join might look like this:

    template<class C>
    auto join(C&& c) -> decltype(boost::make_iterator_range(c))
    { return boost::make_iterator_range(c); }

    template<class C, class D, class... Args>
    auto join(C&& c, D&& d, Args&&... args)
      -> decltype(boost::join(boost::join(boost::make_iterator_range(std::forward<C>(c)),
                                          boost::make_iterator_range(std::forward<D>(d))),
                              join(std::forward<Args>(args)...)))
    {
      return boost::join(boost::join(boost::make_iterator_range(std::forward<C>(c)),
                                     boost::make_iterator_range(std::forward<D>(d))),
                         join(std::forward<Args>(args)...));
    }

A variadic zip is more complicated. If we want write access we cannot use boost zip_iterator. Still, one can use Anthony Williams' TupleIterator:

    template <class... T>
    auto zip(T&&... c)
      -> boost::iterator_range<decltype(iterators::makeTupleIterator(std::begin(std::forward<T>(c))...))>
    {
      return boost::make_iterator_range(
          iterators::makeTupleIterator(std::begin(std::forward<T>(c))...),
          iterators::makeTupleIterator(std::end(std::forward<T>(c))...));
    }

For read-only access one could use boost::zip_iterator, but I think write access is _really_ important (e.g. sort wouldn't work).

Would it be possible to add similar functionality to boost range?

This code wouldn't work without the help of everyone who participated in the following SO discussions:

- http://stackoverflow.com/questions/14366576/boostrangejoin-for-multiple-rang...
- http://stackoverflow.com/questions/13840998/sorting-zipped-locked-containers...

and without Anthony Williams's tupleIterator. See also http://www.justsoftwaresolutions.co.uk/articles/pair_iterators.pdf

Bests,
Gonzalo BG
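For readers without Boost at hand, the element-visiting part of join(b,c,d) can be sketched in standard C++17 with a fold expression. This is a minimal sketch, not a replacement for join(): for_each_joined and for_each_in are hypothetical names (not Boost.Range API), and the result is not a range, so it cannot be handed to boost::transform. It also avoids a base-case gap in the recursive join above: called with exactly two ranges, that version recurses into join() with an empty pack, and there is no zero-argument overload, so join(b, c) does not compile as written.

```cpp
#include <cassert>
#include <deque>
#include <list>
#include <vector>

// Hypothetical helper: visit every element of one range, in order.
template <class F, class Range>
void for_each_in(F& f, Range& r)
{
    for (auto& x : r) f(x);
}

// Hypothetical helper (not Boost.Range API): visit the ranges left to right,
// as if they were one concatenated sequence. Requires C++17 (fold expression).
// f is taken by value and passed on by reference, so a stateful functor
// accumulates across all ranges and is returned, like std::for_each.
template <class F, class... Ranges>
F for_each_joined(F f, Ranges&... ranges)
{
    (for_each_in(f, ranges), ...);
    return f;
}
```

Usage mirrors the first loop in the post: `for_each_joined([](int& i) { i += 1; }, b, c, d);` modifies every element of b, c and d in turn.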


Neil Groves

10 May 2013

8:37 a.m.

...

> However, boost's zip_iterators are not writable. I don't really think it would be possible to make them writable, but that would allow code like this:

...

> template<class C, class D, class... Args>
> auto join(C&& c, D&& d, Args&&... args)
>   -> decltype(boost::join(boost::join(boost::make_iterator_range(std::forward<C>(c)),
>                                       boost::make_iterator_range(std::forward<D>(d))),
>                           join(std::forward<Args>(args)...)))
> {
>   return boost::join(boost::join(boost::make_iterator_range(std::forward<C>(c)),
>                                  boost::make_iterator_range(std::forward<D>(d))),
>                      join(std::forward<Args>(args)...));
> }

> A variadic zip is more complicated. If we want write access we cannot use boost zip_iterator. Still one can use Anthony Williams' TupleIterator:
>
> template <class... T>
> auto zip(T&&... c)
>   -> boost::iterator_range<decltype(iterators::makeTupleIterator(std::begin(std::forward<T>(c))...))>
> {
>   return boost::make_iterator_range(
>       iterators::makeTupleIterator(std::begin(std::forward<T>(c))...),
>       iterators::makeTupleIterator(std::end(std::forward<T>(c))...));
> }
>
> For read-only access one could use boost::zip_iterator, but I think write-access is _really_ important (e.g. sort wouldn't work).
>
> Would it be possible to add similar functionality to boost range?

Yes, and I shall do so when I find time. If you find time to create a patch it would be even quicker, of course, but I understand if this is not possible. It seems we are all running with less and less time to work on these extra-curricular projects!

...

> This code wouldn't work without the help of everyone who participated in the following SO discussions:
> - http://stackoverflow.com/questions/14366576/boostrangejoin-for-multiple-rang...
> - http://stackoverflow.com/questions/13840998/sorting-zipped-locked-containers...
> and without Anthony Williams's tupleIterator. See also http://www.justsoftwaresolutions.co.uk/articles/pair_iterators.pdf

I'll have to see if I can persuade Anthony to be charitable enough to allow us to include his tuple iterator. Ideally this would be made a public feature of Boost.Iterator I think.

...

> Bests, Gonzalo BG

Thank you for your feedback. It's great to have positive suggestions that are actionable like this. Regards, Neil Groves

Jonathan Wakely

10:28 a.m.

On 9 May 2013 16:01, Gonzalo BG wrote:

...

> A variadic zip is more complicated. If we want write access we cannot use boost zip_iterator. Still one can use Anthony Williams' TupleIterator:
>
> template <class... T>
> auto zip(T&&... c)
>   -> boost::iterator_range<decltype(iterators::makeTupleIterator(std::begin(std::forward<T>(c))...))>
> {
>   return boost::make_iterator_range(
>       iterators::makeTupleIterator(std::begin(std::forward<T>(c))...),
>       iterators::makeTupleIterator(std::end(std::forward<T>(c))...));
> }
>
> For read-only access one could use boost::zip_iterator, but I think write-access is _really_ important (e.g. sort wouldn't work).

It looks as though it's undefined behaviour to zip ranges of different lengths, because you'll walk off the end of the shorter ranges. My variadic zip stops at the end of the shortest range, which seems to be consistent with zip functions in most other languages I've looked at. Your adaptors also get dangling references if used with rvalue ranges, although this is a problem with the existing boost range adaptors too.
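The stop-at-the-shortest semantics Jonathan describes can be sketched in C++17 as a function template rather than an iterator, which sidesteps the writability question. This is a sketch under stated assumptions: zip_for_each_shortest and zip_shortest_impl are hypothetical names, not Boost API. Each step compares every iterator with its own end, so the per-step cost grows with the number of zipped ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical implementation detail: its/ends hold one iterator per range.
template <class F, class Its, class Ends, std::size_t... I>
F zip_shortest_impl(F f, Its its, Ends ends, std::index_sequence<I...>)
{
    // Continue only while EVERY range still has elements: one comparison per
    // range per step.
    while (((std::get<I>(its) != std::get<I>(ends)) && ...)) {
        f(*std::get<I>(its)...);    // dereference each iterator and pass the elements to f
        (++std::get<I>(its), ...);  // advance all ranges in lock-step
    }
    return f;
}

// Hypothetical zip-style loop (not Boost API): truncates to the shortest range.
template <class F, class... Ranges>
F zip_for_each_shortest(F f, Ranges&... ranges)
{
    static_assert(sizeof...(Ranges) > 0, "need at least one range");
    return zip_shortest_impl(f, std::make_tuple(std::begin(ranges)...),
                             std::make_tuple(std::end(ranges)...),
                             std::index_sequence_for<Ranges...>{});
}
```

Because the elements are passed as lvalues, a callable taking `int&` parameters can write through to every zipped container.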

Neil Groves

11:55 a.m.

On Fri, May 10, 2013 at 11:28 AM, Jonathan Wakely <jwakely.boost@kayari.org> wrote:

...

> On 9 May 2013 16:01, Gonzalo BG wrote:

...

>> A variadic zip is more complicated. If we want write access we cannot use boost zip_iterator. Still one can use Anthony Williams' TupleIterator:
>>
>> template <class... T>
>> auto zip(T&&... c)
>>   -> boost::iterator_range<decltype(iterators::makeTupleIterator(std::begin(std::forward<T>(c))...))>
>> {
>>   return boost::make_iterator_range(
>>       iterators::makeTupleIterator(std::begin(std::forward<T>(c))...),
>>       iterators::makeTupleIterator(std::end(std::forward<T>(c))...));
>> }
>>
>> For read-only access one could use boost::zip_iterator, but I think write-access is _really_ important (e.g. sort wouldn't work).

> It looks as though it's undefined behaviour to zip ranges of different lengths, because you'll walk off the end of the shorter ranges.
>
> My variadic zip stops at the end of the shortest range, which seems to be consistent with zip functions in most other languages I've looked at.

I like being able to avoid the cost of checking for the end of every item in the zip especially for non-random access iterators. In anything I put into Boost.Range I think it of paramount importance to obey the zero overhead principle. It seems that it would be simple to allow both end detection mechanisms.
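The zero-overhead alternative can be sketched in the same style: only the first range's end is tested, as with the three-iterator std::mismatch, so each step costs one comparison however many ranges are zipped. This is a sketch assuming C++17; zip_for_each_unchecked is a hypothetical name, and its precondition (every other range is at least as long as the first) is the caller's responsibility. Violating it is undefined behaviour.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <tuple>
#include <vector>

// Hypothetical unchecked zip (not Boost API). Only the FIRST range's end is
// tested. Precondition, NOT checked: every other range has at least as many
// elements as the first.
template <class F, class First, class... Rest>
F zip_for_each_unchecked(F f, First& first, Rest&... rest)
{
    auto its = std::make_tuple(std::begin(rest)...);
    for (auto& x : first) {
        // Pass x plus the current element of every other range, then advance them.
        std::apply([&](auto&... it) { f(x, *it...); (++it, ...); }, its);
    }
    return f;
}
```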

...

> Your adaptors also get dangling references if used with rvalue ranges, although this is a problem with the existing boost range adaptors too.

Yes, this has come up numerous times. It's a problem far beyond just ranges and range adaptors. Knowing you a little, I suspect you have a solution I have not thought of to better deal with the issue. Is the variadic zip iterator you implemented public? Thanks, Neil Groves

Jonathan Wakely

12:44 p.m.

On 10 May 2013 12:55, Neil Groves wrote:

...

> On Fri, May 10, 2013 at 11:28 AM, Jonathan Wakely

...

>> It looks as though it's undefined behaviour to zip ranges of different lengths, because you'll walk off the end of the shorter ranges.
>>
>> My variadic zip stops at the end of the shortest range, which seems to be consistent with zip functions in most other languages I've looked at.

> I like being able to avoid the cost of checking for the end of every item in the zip especially for non-random access iterators. In anything I put into Boost.Range I think it of paramount importance to obey the zero overhead principle. It seems that it would be simple to allow both end detection mechanisms.

Hi Neil, That makes sense. My implementation always truncates the ranges to the shortest length but I should make it unchecked and then provide a second interface to do the checking and truncating if needed.

...

...
>> Your adaptors also get dangling references if used with rvalue ranges, although this is a problem with the existing boost range adaptors too.

> Yes, this has come up numerous times. It's a problem far beyond just ranges and range adaptors. Knowing you a little, I suspect you have a solution I have not thought of to better deal with the issue.

> Is the variadic zip iterator you implemented public?

I don't have my own zip_iterator (well, I do, but I'm still working on it :-) but my zip() function is at https://gitorious.org/redistd/redistd/blobs/master/include/redi/zip.h and just uses boost::zip_iterator.

The solution to the dangling reference problem is surprisingly simple in C++11:

    #include <utility>  // std::forward

    template<typename Traversable>
    struct adaptor
    {
      // Forward the argument so an rvalue is moved into the member, not copied.
      explicit adaptor(Traversable&& r) : range(std::forward<Traversable>(r)) { }

      Traversable range;

      auto begin() const -> decltype(range.begin()) { return range.begin(); }
      auto end() const -> decltype(range.end()) { return range.end(); }
    };

    // When called with lvalue, Traversable deduces to R&
    // When called with rvalue, Traversable deduces to R
    template<typename Traversable>
    adaptor<Traversable> adapt(Traversable&& t)
    {
      return adaptor<Traversable>(std::forward<Traversable>(t));
    }

When you call adapt(lvalue) you get an adaptor<R&>, and the member adaptor<R&>::range is a reference to the lvalue. This makes the lvalue case cheap, as there's no copying. (This is what your adaptors do today.)

When you call adapt(rvalue) you get an adaptor<R>, and the member adaptor<R>::range is a copy of the rvalue, initialized by a move. The move construction is not as cheap as binding a reference, but it's safe and avoids a dangling reference.

This would be very hard to do without rvalue references, and it doesn't play nicely with make_iterator_range, because you don't want to make an iterator_range that refers to a temporary range, or you're back to the dangling reference problem. I've used this solution in isolated cases, but haven't got a generic solution to the problem.
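A self-contained check of the two cases, assuming C++17. The adaptor and adapt() follow the message above, written with std::forward in the constructor so that the rvalue case is moved rather than copied into the member; make_values() is a hypothetical helper that returns a temporary container.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

template <typename Traversable>
struct adaptor
{
    // Forward so an rvalue argument is moved into the member.
    explicit adaptor(Traversable&& r) : range(std::forward<Traversable>(r)) {}

    Traversable range;  // R& for lvalue arguments, an owned R for rvalues

    auto begin() const -> decltype(range.begin()) { return range.begin(); }
    auto end() const -> decltype(range.end()) { return range.end(); }
};

// Lvalue argument: Traversable = R&.  Rvalue argument: Traversable = R.
template <typename Traversable>
adaptor<Traversable> adapt(Traversable&& t)
{
    return adaptor<Traversable>(std::forward<Traversable>(t));
}

// Hypothetical helper for the demonstration: returns a temporary.
std::vector<int> make_values() { return {1, 2, 3}; }
```

With v an lvalue, adapt(v) has type adaptor<std::vector<int>&> and its range member refers to v; adapt(make_values()) has type adaptor<std::vector<int>>, owns the moved-in vector, and can be iterated safely after the temporary's full-expression has ended.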

Jeff Flinn

2:31 p.m.

On 5/10/2013 8:44 AM, Jonathan Wakely wrote:

...

> On 10 May 2013 12:55, Neil Groves wrote:

...

>> On Fri, May 10, 2013 at 11:28 AM, Jonathan Wakely

...

>>> It looks as though it's undefined behaviour to zip ranges of different lengths, because you'll walk off the end of the shorter ranges.
>>>
>>> My variadic zip stops at the end of the shortest range, which seems to be consistent with zip functions in most other languages I've looked at.

>> I like being able to avoid the cost of checking for the end of every item in the zip especially for non-random access iterators. In anything I put into Boost.Range I think it of paramount importance to obey the zero overhead principle. It seems that it would be simple to allow both end detection mechanisms.

The closest existing practice I see is the std::mismatch algorithm, which requires the first of the two input sequences to be the longest, and only checks first1 != last1. boost::range::mismatch pays the overhead of checking both input sequence iterators against their corresponding end iterators. Perhaps the std::mismatch approach is appropriate for zip iterator as well. In accommodating std::mismatch requirements when I don't know that the first sequence is longest, I've a wrapped mismatch that checks the sizes and swaps argument references, though this is limited in its genericity.

Jeff
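A sketch of the kind of wrapper Jeff describes, assuming C++17 for std::size. One detail is worth stating precisely: the three-iterator std::mismatch only tests first1 != last1, so the standard requires the second sequence to be at least as long as the first. The wrapper therefore passes the shorter range first and swaps the resulting iterators back. mismatch_any_length is a hypothetical name; as Jeff notes, the approach is limited in its genericity, since it relies on std::size and so only works for ranges that provide size().

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical wrapper (not Boost API). The three-iterator std::mismatch reads
// the second range as far as the first one is long, so the shorter (or equal)
// range must go first. Compare sizes, call accordingly, and return the
// iterators in the caller's original order.
template <class R1, class R2>
auto mismatch_any_length(R1& r1, R2& r2)
    -> std::pair<decltype(std::begin(r1)), decltype(std::begin(r2))>
{
    if (std::size(r1) <= std::size(r2)) {
        return std::mismatch(std::begin(r1), std::end(r1), std::begin(r2));
    }
    auto p = std::mismatch(std::begin(r2), std::end(r2), std::begin(r1));
    return {p.second, p.first};  // swap back: first refers to r1, second to r2
}
```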

Jonathan Wakely

2:51 p.m.

On 10 May 2013 15:31, Jeff Flinn wrote:

...

> The closest existing practice I see is the std::mismatch algorithm which requires the first of the two input sequences to be the longest, and only checks first1 != last1.

As an aside, see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3607.html, which was voted into C++14.
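N3607 added overloads of std::equal, std::mismatch and std::is_permutation that also take the second range's end iterator, so neither range can be overrun whichever one is shorter. A minimal C++14 sketch using the four-iterator std::mismatch; common_prefix_length is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: length of the common prefix of two ranges of ANY lengths.
// The four-iterator std::mismatch (C++14) stops at whichever end comes first,
// so neither range is read past its end.
template <class R1, class R2>
std::size_t common_prefix_length(const R1& r1, const R2& r2)
{
    auto p = std::mismatch(std::begin(r1), std::end(r1), std::begin(r2), std::end(r2));
    return static_cast<std::size_t>(std::distance(std::begin(r1), p.first));
}
```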

Neil Groves

7:39 p.m.

On Fri, May 10, 2013 at 3:31 PM, Jeff Flinn <Jeffrey.Flinn@gmail.com> wrote:

...

> On 5/10/2013 8:44 AM, Jonathan Wakely wrote:

...

>> On 10 May 2013 12:55, Neil Groves wrote:

...

>>> On Fri, May 10, 2013 at 11:28 AM, Jonathan Wakely

...

>>>> It looks as though it's undefined behaviour to zip ranges of different lengths, because you'll walk off the end of the shorter ranges.
>>>>
>>>> My variadic zip stops at the end of the shortest range, which seems to be consistent with zip functions in most other languages I've looked at.

>>> I like being able to avoid the cost of checking for the end of every item in the zip especially for non-random access iterators. In anything I put into Boost.Range I think it of paramount importance to obey the zero overhead principle. It seems that it would be simple to allow both end detection mechanisms.

> The closest existing practice I see is the std::mismatch algorithm which requires the first of the two input sequences to be the longest, and only checks first1 != last1.
>
> boost::range::mismatch pays the overhead of checking both input sequence iterators against their corresponding end iterators.

It does indeed pay the price under all circumstances, with no option to opt out. This is my design error. I have renewed my effort to obey the zero-overhead principle over the last year or so; I've found design decisions in libraries I have used that do not provide zero-overhead options extremely limiting. I shall go back and review all of the Boost.Range code and provide zero-overhead options wherever I have failed to do so. For a zip iterator, of course, the overhead could be considerably greater depending on how many ranges were zipped together. I don't believe there is a need for me to choose the right solution for my clients. It's trivial to allow both.

...

> Perhaps the std::mismatch approach is appropriate for zip iterator as well. In accommodating std::mismatch requirements when I don't know that the first sequence is longest, I've a wrapped mismatch that checks the sizes and swaps argument references, though this is limited in its genericity.

Ah yes, this comes back to the boost::size modifications I also have to do! I could utilise an optimised boost::size() that provides O(1) for containers such as list to provide optimised implementations under more scenarios. It looks like I've got some work to do!
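The size-based idea can also give a zip that truncates without per-step end checks: compute the shortest length once, then run a counted loop. This is the "checked interface over an unchecked core" that Jonathan suggests, sketched in C++17 under the assumption that every range has an O(1) size(), which holds for the standard containers (std::list::size() has been required to be constant time since C++11). zip_for_each_counted is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <tuple>
#include <vector>

// Hypothetical truncating zip (not Boost API): sizes are compared ONCE, up
// front, then a counted loop runs with no per-step end comparisons.
// Assumes every range has an O(1) size(), true for the standard containers.
template <class F, class... Ranges>
F zip_for_each_counted(F f, Ranges&... ranges)
{
    static_assert(sizeof...(Ranges) > 0, "need at least one range");
    const std::size_t n = std::min({static_cast<std::size_t>(std::size(ranges))...});
    auto its = std::make_tuple(std::begin(ranges)...);
    for (std::size_t i = 0; i != n; ++i) {
        // Pass the current element of every range, then advance all iterators.
        std::apply([&](auto&... it) { f(*it...); (++it, ...); }, its);
    }
    return f;
}
```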

...

> Jeff

Thank you for pointing this out since I had honestly forgotten that I'd made this design decision for mismatch. Regards, Neil Groves

Dave Abrahams

12 May 2013

7:33 p.m.

on Thu May 09 2013, Gonzalo BG <gonzalobg88-AT-gmail.com> wrote:

...

> However, boost's zip_iterators are not writable.

That's surprising to hear. IIRC, at least, they used to be writable. You *can* assign into a tuple of references. -- Dave Abrahams
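Dave's point can be checked with the standard library alone. In this sketch, row() is a hypothetical helper that returns a prvalue tuple of references, the same shape as the reference type of a zip-style iterator (a tuple of the underlying iterators' reference types). Assigning a tuple of values to it writes through to both containers, just as `std::tie(x, y) = ...` does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical helper: "row" i of two parallel columns, returned as a prvalue
// tuple of references. Assigning to the returned tuple assigns element-wise
// through the references, i.e. into the vectors themselves.
inline std::tuple<int&, std::string&>
row(std::vector<int>& ids, std::vector<std::string>& names, std::size_t i)
{
    return std::tie(ids[i], names[i]);
}
```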

Jonathan Wakely

9:34 p.m.

On 12 May 2013 20:33, Dave Abrahams wrote:

...

> on Thu May 09 2013, Gonzalo BG <gonzalobg88-AT-gmail.com> wrote:

...

>> However, boost's zip_iterators are not writable.

> That's surprising to hear. IIRC, at least, they used to be writable.

They still are. The docs even make note of it: http://www.boost.org/doc/libs/1_53_0/libs/iterator/doc/zip_iterator.html#zip... "The fact that the zip_iterator models only Readable Iterator does not prevent you from modifying the values that the individual iterators point to."
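Writability is only part of what the boost::sort example in the original post needs: std::sort also has to move and swap whole elements through the iterator, which is a separate requirement and one that proxy references such as a tuple of references do not necessarily meet. A common workaround that needs no zip iterator is to sort an index permutation by the key column and then apply it to every column. This is a sketch assuming std::vector columns of equal length; sorted_order and apply_order are hypothetical names, and the descending comparison mirrors the `boost::get<0>(i) > boost::get<0>(j)` predicate in the post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: indices of `key` ordered by descending key value.
// stable_sort keeps rows with equal keys in their original relative order.
template <class Key>
std::vector<std::size_t> sorted_order(const std::vector<Key>& key)
{
    std::vector<std::size_t> idx(key.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t i, std::size_t j) { return key[i] > key[j]; });
    return idx;
}

// Hypothetical helper: reorder one column by the permutation.
// Assumes column.size() == idx.size().
template <class T>
void apply_order(std::vector<T>& column, const std::vector<std::size_t>& idx)
{
    std::vector<T> tmp;
    tmp.reserve(idx.size());
    for (std::size_t i : idx) tmp.push_back(std::move(column[i]));
    column = std::move(tmp);
}
```

The permutation must be computed before any column, including the key column, is reordered; after that, each column is permuted independently and the rows stay in lock-step.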


9 comments


participants (5)

Dave Abrahams
Gonzalo BG
Jeff Flinn
Jonathan Wakely
Neil Groves