On Thu, Oct 31, 2019 at 6:28 PM Gavin Lambert via Boost-users < boost-users@lists.boost.org> wrote:
On 30/10/2019 18:03, Zach Laine wrote:
Returning end of range on failure is incredibly inconvenient (for the consumer; granted it's usually more convenient for the algorithm implementer), and I'd be happier if STL algorithms didn't do that either.
I see that as an unfortunate consequence of using generic iterators
as
input parameters and return types, and not an otherwise desirable design choice.
(ie. the STL algorithms do it because they couldn't do anything
better.
string doesn't do it because it can do something better [since it
knows
the iterator type and class, and can consequently choose to return something other than an iterator].)
I heartily disagree, but I'm also very curious about this. As an example, could you take one of the simple std algorithms (std::find would be a very simple candidate), and show its definition in the style you have in mind?
Ok: wall of text incoming.
There are many options (especially when you have Outcome and/or ranges), but a very straightforward update using only existing STL concepts would be to take this:
template
InputIt find(InputIt first, InputIt last, T const& value); And change its return type:
template
optional<InputIt> my_find(InputIt first, InputIt last, T const& value); It now either returns an iterator in the range [first, last) or it returns nullopt. It will never return last.
This simple change means that code following this can easily detect search failure without having to refer back to the actual input parameters (ie. having to know the value of last).
This can be useful when searching a subset of the container (you don't need to save a separate copy of "list.begin() + 5" just so that you can use it as both the last parameter and for failure checks -- or worse, write it twice and risk bugs if someone edits one but not the other).
It also can be useful when the container as a whole is a temporary rvalue -- which is *rarely* useful for std::find (since after all a successful iterator return is almost useless if the container itself is about to be destroyed), but not completely so. Sometimes you only want to detect existence without needing to actually consume the iterator. Sometimes you can use the successful return iterator within the same whole-expression, so the container's lifetime hasn't ended yet.
This is a red herring; if you only care about existence, why are you using find()? Use any_of() or C++20's includes() (for detecting subranges) instead. They each return a bool. Moreover, just knowing whether a value is found at all within a subrange via linear search is a corner case -- usually you will use something with O(log(N)) or faster access if you need to do that operation a lot.
As for "last" itself, while having to pass both begin and end iterators *usually* precludes using rvalue containers, that's also not always true. Some containers/iterators will accept a default-constructed iterator to mean "the end of the sequence" (especially when underlying storage is a linked-list, but other containers/iterators use this model too). This means that "first" can be passed "method().begin()" or some other rvalue and "last" can be passed "iterator()". And in range-based algorithms, there's only a single parameter to worry about anyway, so it's even more likely.
As a demonstration, let's imagine a simple span-based contains check (or replace span with your better range type of choice) -- it doesn't care about the iterator other than to assure that there was a successful return:
template<typename T> bool contains(std::span<T> range, T const& value) { return my_find(range.begin(), range.end(), value).has_value(); // or you can just cast to bool if you prefer // alternatively, if my_find itself took a range: return my_find(range, value).has_value(); }
This seems more readable (and less error prone) than an explicit "== range.end()" check. And, given the range-based version of my_find, you could even have code that does this:
if (my_find(method(), 42)) { /* method included 42 */ }
None of those is better to me than using any_of().
Or this:
return resolve(my_find(method(), 42), 0);
Where resolve(x, y) returns "x ? **x : y" -- somewhat like value_or, but including the iterator dereference.
(That makes more sense with a map find, or predicate find, or where the element type is a class that only checks key equality (so the above would return a fully populated object if it exists or a default object if not). Obviously it's a bit silly with plain ints, but you get the idea.)
Now we're getting somewhere. I fully agree that what you wrote above is more easily followed than using std::find(). However, consider this use: out = std::copy(c.begin(), std::find(c.begin(), c.end(), 42), out); Or, in the near future: out = std::copy(c.begin(), std::ranges::find(c, 42), out); That says I want to copy *until* I find 42, and if there is no 42, I want to copy the rest. I find this pattern of code comes up pretty often. I find that I write something like this vs. something like the did-I-find-it style code you wrote above vaguely half the time. That is, I want to know where an element is -- *and* whether it is found -- about as often as I want to get a reference to the first one. This is true because when I frequently want to find just the first one, I tend to reach for something sorted, for efficiency reasons. If I had to write that code using your approach, it would suffer. All I'm pointing out here is that the change you propose is not universally better. In fact, it is universally worse if what you want to do is search for a subrange: auto lower = std::lower_bound(c.begin(), c.end(), 42); out = std::copy(lower, std::upper_bound(lower, c.end(), 42), out); Or: out = std::ranges::copy(std::ranges::equal_range(c, 42), out); That turns in to a real mess when the iterators returned are optionals. Zach