I have exactly the same problem. The iterator approach to treat the range as a pair of iterators is not sufficient for me, since it bloats the last iterator. I would like to store all informations in the range and just pass pointers to the range to the iterators, like Erik Niebler explains in his posts single pass ranges.
There is much merit in having state in a range rather than in bloated iterators. It is much more general than just the "last" iterator. It is always horrible to see the predicates and functors held in iterators. It is not a panacea however for the lifetime issues. If applied universally state in the ranges would introduce a lot more lifetime issues. Currently many scenarios work with ranges that die before the end of an expression because the iterators are copied outside of the adaptor expressions therefore the iterators are copied into a new range. It is possible to imagine various mechanisms for passing ownership of ranges or mandating shallow copying, but these all work less consistently with standard containers. It is not the case that one solution is obviously always better than the other.
I have done extensive performance measurements upon various arrangements of having data in iterators versus ranges. I have been surprised how many times a bloated iterator outperforms the indirection to a separate range object. The performance impact varies between processor architectures and of course the size of the iterator and range. I am not allowed to publish my results due to this being under an NDA. I therefore don't expect my assertion to be taken as fact. I encourage experimentation.
What kind of algorithms have you used for your experiments? Most of the algorithms for single pass ranges are quite easy. I am experimenting with odeint, where the range abstracts a mathematical algorithm which solves an ODE iteratively. Using pairs of iterators immediately means that the end iterator needs to know the algorithms and the ode. Even if this does not influence the performance negatively it induces a very unintuitive interface, since your end iterator must be instantiated with these types. I think I can provide some performance measures in the next weeks. Nevertheless, I think that the requirement of obtaining a iterator from const single pass range could be relaxed.