[boost] [histogram] discussion of accessor interface design

15 Jan 2019

Dear all,

Work on boost.histogram is progressing fast. I am still implementing feedback from the review: simplifying the interface and adding requested features, such as axes that can grow. STL compatibility was further improved; you can now also write to histograms via iterators.

The normal iterators make sense when you iterate over a 1D histogram. When you iterate over a multi-dimensional histogram, you also want to know the current multi-dimensional index.

After considering many options (and I really thought about this a lot), I went for the following design. It is a bit unusual, so I would appreciate feedback. I think it is great once you overcome an initial feeling of awkwardness.

The accessor class is "polymorphic": it behaves like a pointer to the histogram value and like an array for the multi-dimensional index. In code, this is how you use it:

auto h = make_histogram(…); // make 2D histogram

for (auto && x : indexed(h)) { // indexed produces a range of accessors
  // x is a special accessor type, combining two non-overlapping concepts
  // - it acts like a pointer to the histogram value
  // - it acts like an array to the current index
  std::cout << "current value " << *x << std::endl; // "dereference" to get value
  std::cout << "current index " << x[0] << " " << x[1] << std::endl; // use subscript operator to get index
}

This syntax is beautifully terse; see
https://github.com/HDembinski/histogram/blob/develop/examples/guide_access_b...
for a full example, especially line 66.

Pros:
- really terse
- you can access methods on the pointee with x->method() as well! (useful when histogram counters are not PODs)
- you can iterate over x to get the indices, for (auto i : x) { … } works
- since x acts like an array to the indices, you can pass it to functions which accept ranges or iterators (it has .begin() and .end())
- x can be (and is) enhanced with other useful methods
  * x.bin(N) returns the current bin interval for the N-th axis, allowing you to access the central value, width, edges
  * x.density() returns the current density (bin value divided by product of current bin widths)

Cons:
- *x and x[0] do completely different things: *x gives you the bin value, x[0] gives you the first index

The last point is, of course, where people have a problem. But if you take C++ concepts seriously, then the accessor is a perfect model of a pointer and a perfect model of an array. These two roles are non-overlapping and they have non-overlapping sets of interfaces, which I exploit here.

If you expect *x and x[0] to do the same thing, that expectation comes from C. C has a much weaker type system than C++ and blurs the line between arrays and pointers: an array decays to a pointer to its first element in most expressions, and x[0] is defined as *(x + 0). Conceptually, though, these are very different things. A pointer points to a value, and an array is a collection of values. "Dereferencing" a collection of values makes no sense; we are just used to it because of our C heritage.

C++ has a better type system, and better classes for pointers and arrays than raw pointers. The stdlib authors recognize that *x has no meaning when x is a sequence of values: *x does not compile when x is a std::vector, std::deque, std::list, or any other container in the stdlib. Even for std::unique_ptr, they made sure that the interfaces of the single-object version and the array specialization differ:

```
#include <memory>

int main() {
  std::unique_ptr<int> p(new int(0)); // single-object version
  // p[0]; // does not compile: no operator[] here
  *p; // OK
  std::unique_ptr<int[]> a(new int[3]()); // array version
  // *a; // does not compile: no operator* here
  a[0]; // OK
}
```
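
The same claim can be checked at compile time instead of by commenting lines in and out. The following is only a sketch: the traits has_deref and has_subscript are names I made up for this mail, they are not part of the stdlib or of boost.histogram. They use the usual detection idiom, with a hand-rolled void_t so that it also compiles as C++14.

```cpp
#include <list>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hand-rolled void_t (std::void_t only exists since C++17).
template <class...> struct make_void { using type = void; };
template <class... Ts> using void_t = typename make_void<Ts...>::type;

// has_deref<T> (hypothetical name): true if *t compiles for an lvalue t of type T.
template <class T, class = void>
struct has_deref : std::false_type {};
template <class T>
struct has_deref<T, void_t<decltype(*std::declval<T&>())>> : std::true_type {};

// has_subscript<T> (hypothetical name): true if t[0] compiles for an lvalue t of type T.
template <class T, class = void>
struct has_subscript : std::false_type {};
template <class T>
struct has_subscript<T, void_t<decltype(std::declval<T&>()[0])>> : std::true_type {};

// Containers are array-like, not pointer-like.
static_assert(!has_deref<std::vector<int>>::value, "vector has no operator*");
static_assert(has_subscript<std::vector<int>>::value, "vector has operator[]");
static_assert(!has_deref<std::list<int>>::value, "list has no operator*");

// std::unique_ptr keeps the two interfaces apart.
static_assert(has_deref<std::unique_ptr<int>>::value, "single-object version: *p");
static_assert(!has_subscript<std::unique_ptr<int>>::value, "single-object version: no p[0]");
static_assert(!has_deref<std::unique_ptr<int[]>>::value, "array version: no *a");
static_assert(has_subscript<std::unique_ptr<int[]>>::value, "array version: a[0]");

// Only the raw pointer inherited from C offers both at once.
static_assert(has_deref<int*>::value && has_subscript<int*>::value, "raw pointer: both");
```

If any of these assertions were wrong, the file would not compile, so the snippet doubles as a small test of the argument.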

Once you accept that the two concepts of pointers and arrays have non-overlapping interfaces in C++, it becomes possible to make a combined object which has both interfaces, and uses these two sets to return different information.
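
To make the idea of a combined object concrete, here is a minimal sketch. It is not the code in the library: toy_accessor, toy_counter and toy_accessor_demo are hypothetical names, and the index is simply stored as a std::array<int, Rank>, which is an assumption made for brevity. The point is only that the pointer-like members and the array-like members can sit side by side without ever colliding.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

// Hypothetical stand-in for the accessor discussed above; NOT the library's
// actual implementation. Value is the bin value type, Rank the number of axes.
template <class Value, std::size_t Rank>
class toy_accessor {
public:
  using index_array = std::array<int, Rank>;

  toy_accessor(Value& value, const index_array& indices)
      : value_(&value), indices_(indices) {}

  // Pointer-like interface: reaches the bin value.
  Value& operator*() const { return *value_; }
  Value* operator->() const { return value_; }

  // Array-like interface: reaches the multi-dimensional index.
  int operator[](std::size_t d) const { return indices_[d]; }
  typename index_array::const_iterator begin() const { return indices_.begin(); }
  typename index_array::const_iterator end() const { return indices_.end(); }

private:
  Value* value_;
  index_array indices_;
};

// Hypothetical non-POD counter, used to exercise operator->.
struct toy_counter {
  double sum = 0;
  double value() const { return sum; }
};

// Small self-check showing that the two interfaces do not interfere.
inline void toy_accessor_demo() {
  toy_counter c;
  c.sum = 2.5;
  toy_accessor<toy_counter, 2> x(c, {{3, 7}});

  assert(x->value() == 2.5);       // pointer-like: member access on the pointee
  (*x).sum += 1.0;                 // pointer-like: write through to the value
  assert(c.sum == 3.5);
  assert(x[0] == 3 && x[1] == 7);  // array-like: subscript gives the indices
  assert(std::accumulate(x.begin(), x.end(), 0) == 10);  // array-like: iteration
}
```

Because begin() and end() are there, the range-based loop for (auto i : x) { … } also works on this toy class, and convenience methods could be added as further members.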

If you have read this far, I hope your initial reaction of "woah, this looks really inconsistent and confusing!" has turned into "hmm, maybe it is not inconsistent after all".

I am looking forward to hearing your thoughts.

Best regards,
Hans

PS: The alternative would be to return a std::pair<index_type, value_reference_type>, but this has disadvantages. Unpacking the pair is going to be nice in C++17 with structured bindings, but not so much in C++14. Also, it prevents me from adding convenience methods, like the above-mentioned bin(N) and density() methods.
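
For comparison, this is roughly what user code for the pair alternative would look like. It is a sketch under assumptions: the definitions of index_type and value_reference_type below are placeholders I picked for illustration, and cell and sum_first_row are hypothetical names. The same loop is written once with C++17 structured bindings and once in the C++14 style, selected via the standard feature-test macro.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Placeholder definitions (assumptions): a 2D index and a reference to a
// double-valued bin.
using index_type = std::array<int, 2>;
using value_reference_type = double&;
using cell = std::pair<index_type, value_reference_type>;

// Sums all values whose first index is 0.
inline double sum_first_row(std::vector<cell>& cells) {
  double total = 0;
#if defined(__cpp_structured_bindings)
  // C++17: index and value get names at the point of use.
  for (auto&& [index, value] : cells) {
    if (index[0] == 0) total += value;
  }
#else
  // C++14: .first and .second everywhere.
  for (auto&& c : cells) {
    if (c.first[0] == 0) total += c.second;
  }
#endif
  return total;
}
```

Either way, a std::pair offers no place to hang extra members, so something like bin(N) or density() would have to become a free function that takes the pair.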

Hans Dembinski