On 26. Nov 2017, at 12:51, Bjorn Reese via Boost
wrote: On 11/20/2017 12:51 PM, Hans Dembinski wrote:
I totally see the point for partial_sum. Could you also provide a code example for how you would like to compute the cosine similarity?
A working example using cosine similarity is too large to include here, but the crux of it is to calculate the cosine similarity between a histogram h1 and all other histograms in order to determine which of these other histograms is closest to h1. Anyways, it was just an example of where it makes sense to use std::inner_product() on two histograms.
Ok, I just wanted to know whether this example would require the same kind of interface.
I committed a change yesterday which implements value iterators for the histograms. This allows code like this:
auto h = /* make a histogram */
std::vector<double> csum;
std::partial_sum(h.begin(), h.end(), std::back_inserter(csum));
The forward iterator returned here iterates over the values of the histogram in implementation-defined order, skipping under-/overflow bins. For 1d-histograms, this is guaranteed to be the normal order you would expect from any 1d container.
This code works well for 1d-histograms. So what if I have 2d? The forward iterator can tell you the current 2d index of the value it points to. To fill a boost::multi_array, you would do
auto h = /* make 2d histogram, 2x3 */
boost::multi_array
I think we need several iterators then. The current axis iterators iterate over the bins definition, not really over the bin content of the histogram itself. I chose the current iterators in this way, because I believe this scheme generalises well to the multi-dimensional case. I wouldn't replace these iterators, because they seem useful to me, but additional iterators that iterate over the bin values and play well with algorithms are also clearly needed. I would rename the current indexing iterator so something like bin_iterator because it does something special:
using bin_iterator = unspecified; bin_iterator bin_begin() const; bin_iterator bin_end() const;
I suppose you suggest this to provide more clarity on what the axis iterators iterate over, but this solution has some drawbacks. The philosophy in this library is to regard an axis as a logical collection of bins (the individual bins do not really exist physically as objects in memory). If you see an axis like that, then you would expect `begin()` and `end()` to provide iterators over the bins. The axis does not know anything about the histogram it belongs to, it is a sub-structure without a callback into the super-structure. Therefore, it technically cannot even provide an iterator over the histogram values in the current design. This hierarchical separation is a nice feature. I would argue that iterators should not be named for documentation purposes, there are algorithms which expect the standard names const_iterator and iterator, and most of the time you would write `auto` nowadays, so that the typename does not appear in the source code and thus you loose it as a source of self-documentation. The case against `bin_begin()` and `bin_end()` is based on the lost compatibility with range-based for loops, which expect the standard names `begin()` and `end()`. I think it is better to emphasise the philosophy behind the design in the documentation to clear up these misunderstandings, no? I am also thinking of changing the bin iterator to make it less surprising, by reusing the design of the value_iterator above. Example code: auto a = /* make axis */ for (auto iter = a.begin(); iter != a.end(); ++iter) { auto index = iter.idx(); // special method to access current bin index auto bin = *iter; // the actual bin object, implemented as an interval with methods lower() and upper() for most axis types } The range-based for loop would then only iterate over the bins, not providing indices for (auto bin : a) { // do something with bin } Since range-based for loops are so nice, I am thinking of providing an adaptor called "enumerate", inspired by Python for (auto index_and_bin : enumerate(a)) { auto index = index_and_bin.first; auto bin = index_and_bin.second; // do something with index and bin } I think this design is less surprising then the current one.
The iterator (and reverse_iterator) types can then be used to traverse the content.
I put rbegin() and rend() for axis types on the to-do list.
- What should this do if h is 2d? The same? If yes, in which order should the 2d array be iterated over?
I would let axis iterators traverse only along the axis. If we need something to traverse all elements within a multi-dimensional array, then we probably need something else, such as mdspan [1].
[1] https://github.com/kokkos/array_ref/blob/master/proposals/P0009.rst
Yeah, mdspan would fit the needs of this library better than multi_array, because the rank can be a dynamic property, while the rank in multi_array is a compile-time constant. Since mdspan is not ready yet, we should still try to provide at least something to iterate over the values, hence my proposal of a value_iterator above. Best regards, Hans