Hi Andrzej, (and all),
Thank you for taking the time to write your answer and for the quick
first impressions.
On Mon, Sep 16, 2024 at 7:17 AM Andrzej Krzemienski wrote:
On Wed, Apr 26, 2023 at 23:38, Alfredo Correa via Boost wrote:
The library is available here, https://gitlab.com/correaa/boost-multi.
Hi Alfredo,
Thank you for sharing your library. This has been more than a year now, and I am sorry for the delayed response. Thank you for reminding us of it in the slack channel. From this, I gather that the game is still afoot.
yes, it is. No problem about the delayed response.
I personally never needed to manipulate big multidimensional arrays, so I cannot immediately appreciate the usefulness of the library. I need a good introductory part. When I read the high-level description, I immediately think, "it is the same as std::mdspan". The docs say that it is different from the std::mdspan, but then I think, "no, it is the same as std::mdspan".
In my experience, manipulating (big) multidimensional arrays boils down to three things: 1) managing allocations carefully; 2) resolving the tension between 1D access and an n-dimensional space (handling logical access, but also fusing loops when performance demands it); 3) a good separation between value and reference semantics, to avoid unnecessary copies when possible and to ensure true value semantics when needed. Well-defined semantics in generic settings avoid the need for "defensive" copies. None of this is directly tackled by std::mdspan. std::mdarray is newer, and I haven't had time to experiment with it. My understanding is that mdarray doesn't tackle these problems either, except 1) partially, since it is going to be a container adaptor (it will rely on an underlying container).
From the comparison table, I gather that Multi offers both the container and the views (sort of references), and that std::mdspan is only a view. Am I right?
yes
The docs say that Multi provides value semantics, but I guess it is not a fair statement.
Multi provides value semantics that no other library provided so far IMO, and that is a fair statement.
I guess (and correct me if I got this wrong) that the container is value-semantic, but the views are not.
That is the nature of the views, that you want them *not* to have value
semantics.
I don't like to use the term "views" because it can mean many things, especially because ranges and spans abuse the term. It means so many things that even the term "owning view" is used now, which is the opposite of your definition ("the nature of the views, that you want them *not* to have value semantics"). In my opinion, the current use of "view" has no well-defined reference semantics, no well-defined constness propagation, and no well-defined lifetime. Access to individual elements can return basically anything: l-values, r-values, or proxies. It seems that "view" nowadays means anything that is not strictly a container or a value but is related to one. For this reason, I stopped using the term "view" for my library. I tend to use the terms "subarrays" and "reference objects" instead (not necessarily language references).
A fair comparison should compare Multi's views to std::mdspan.
I touch on this in the section "Substitutability with standard vector and span". With respect to semantics, mdspan is the same as span. In a few words, "Multi's views" are proper references (as much as the language allows), and std::mdspan is a mix of things (which has lately been accepted as good enough under the "view" wording).
Now, the Standard library also has a proposal in flight to add a container for multi-dimensional arrays: std::mdarray: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1684r2.html Could you also include it in your comparison? And I would expect that std::mdspan and std::mdarray are treated as one in this case.
Yes, I could do that. I can add mdarray in the same column as mdspan and bump its specified requirement to C++26 (Multi is C++17). Take into account that mdarray is very recent; I only had access to an experimental implementation of it.
I read that Multi's types have an STL-compatible interface (range).
Yes, because it provides begin() and end() iterators that are random access and that traverse the multidimensional structure. The access can be recursive (A2D.begin(), A2D.end(), A2D.begin()->begin(), etc.) or flattened (A.elements().begin(), A.elements().end()). I will add this explicitly early in the documentation.
But it is far from obvious what this means in the context of multi-dimensional arrays. The range/iterator interface was tailored for one-dimensional data structures. There is no obvious generalization to multiple dimensions.
There are two "canonical" generalizations to multiple dimensions, and the library handles both in two different, clear ways: recursive and flattened. The first consists of regarding a multidimensional array as *nested* 1D ranges, where the order of nesting corresponds to the ordering of the indices. In this view, given a multidimensional object A (of dimensionality larger than one), A[0], A[1], A[2], ... is a one-dimensional sequence of ranges of lower dimension than A. If done right, all algorithms that work on 1D ranges should work on the range A.begin() ... A.end(). If you write a function that is agnostic of the ultimate (true) dimension of A, you are writing dimension-generic code. The second is to see the whole multidimensional object as a 1D range of all the "terminal" (zero-dimensional) elements, that is, an unravelled version of the array. Both generalizations are useful. The first is accessed through indices or iterators: A[i], A.begin(), and A.end(). The interesting thing is that A itself is regarded as a 1D object by algorithms that expect one, for example the std::ranges algorithms. The other generalization is accessed through the .elements() member: A.elements() gives all the elements across all dimensions as a linear range.
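To make the two pictures concrete, here is a minimal sketch in plain C++17. It does not use Multi at all: a std::vector of std::vector stands in for a 2D array, and the names nested_sum, sort_outer, unravel_into and unravel are hypothetical helpers invented for this illustration. Unlike A.elements(), which is described above as a range, unravel() simply copies the elements out to keep the sketch short.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Terminal case of the nested picture: a single (zero-dimensional) element.
inline int nested_sum(int x) { return x; }

// Any other case: a 1D range whose items are lower-dimensional ranges
// (or elements). The same function body serves 1D, 2D, 3D, ... inputs,
// which is what "dimension-generic" means here.
template <class Range>
int nested_sum(Range const& r) {
    int total = 0;
    for (auto const& sub : r) { total += nested_sum(sub); }
    return total;
}

// The outer level is an ordinary 1D range, so a standard algorithm can be
// applied to it directly (here: sort the rows lexicographically).
template <class Range>
void sort_outer(Range& r) { std::sort(r.begin(), r.end()); }

// Flattened picture: visit every terminal element in index order.
inline void unravel_into(int x, std::vector<int>& out) { out.push_back(x); }

template <class Range>
void unravel_into(Range const& r, std::vector<int>& out) {
    for (auto const& sub : r) { unravel_into(sub, out); }
}

// Convenience wrapper that returns the unravelled elements as a new vector.
template <class Range>
std::vector<int> unravel(Range const& r) {
    std::vector<int> out;
    unravel_into(r, out);
    return out;
}
```

With a two-row stand-in such as {{4, 5, 6}, {1, 2, 3}}, nested_sum sees a range of rows, sort_outer reorders whole rows, and unravel sees six plain elements.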
I wouldn't even expect a multi-dimensional array to give me an STL interface (whatever that means).
And yet it does. You know what it means now. Imagine it: you have a multidimensional object, and all the algorithms of the STL and std::ranges and (if you wrote your generic functions carefully) all your functions that deal with 1D random-access containers would work!
Maybe you mean that the library offers a view where you can see the entire multidimensional array as a long string of values? This would make sense, but if it is the case, I expect the introduction to say exactly this.
Presenting the multidimensional array as a long string is something that fundamentally breaks the abstraction of the multidimensional object, so I delayed referring to it. It is mentioned in the "comparison table" in the row "flattening of arrays". I am going to make this distinction more prominent.
In the case of std::mdspan, it has been said that it has been tailored to efficiently represent both huge datasets as well as tiny 4x4 matrices. I am not sure if this is the case, but I request that the docs for Multi say what use case they have been designed and optimized for.
In this sense, it is designed like std::vector: it is optimized for the large-n case. It is not optimized (amortized) for insertions or push_backs, because of the nature of multidimensional arrays, because of space and time efficiency constraints, and to maintain symmetry among subdimensions. Except for the fact that it wasn't programmed for compile-time dimensions (like mdspan was), the small-n case shouldn't be bad either. Also, there is no small-array optimization. Since the library is very good at interfacing with allocators, it gives the option of using a stack-based allocator for small arrays. The other optimized case is dimensionality, in the sense that it is generic: dimensionality is handled recursively.
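As a rough illustration of what "stack-based allocator" can mean, the sketch below uses only the standard C++17 std::pmr facilities with a std::pmr::vector. It is not Multi code, and how such a resource is passed to a Multi array is not shown here. The function name sum_with_stack_buffer and the 1024-byte buffer size are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <numeric>
#include <vector>

// Hypothetical example: sum 0..n-1 using a vector whose memory comes from a
// fixed buffer on the stack instead of the heap.
inline int sum_with_stack_buffer(int n) {
    std::array<std::byte, 1024> buffer;  // automatic storage, no heap involved

    // null_memory_resource() as the upstream resource means that running out
    // of buffer space throws std::bad_alloc instead of silently falling back
    // to the heap.
    std::pmr::monotonic_buffer_resource pool(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());

    // The container allocates through the pool (value-initialized ints).
    std::pmr::vector<int> v(static_cast<std::size_t>(n), &pool);
    std::iota(v.begin(), v.end(), 0);
    return std::accumulate(v.begin(), v.end(), 0);
}
```

The design point is that the container code does not change. Only the allocator decides where the memory lives, which is why good allocator support can stand in for a built-in small-array optimization.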
Does the library only represent dense matrices, or can it also represent sparse data?
Who talked about "matrices"? :)
(yes, I mistakenly wrote it once in the documentation)
The point is that the terms 'matrices' (and 'tensors') carry semantic
meaning, such as algebraic operations, related to linear algebra (and
geometry).
If someone wants to implement matrices using Multi they are welcome, and
of course, as you said, using multi::array
From the intro paragraph: "The library's primary concern is with the storage and logic structure of data; it doesn't make algebraic or geometric assumptions about the arrays and their elements. In this sense, it is instead a building block to implement algorithms to represent mathematical operations, specifically on numeric data. Although most of the examples use numeric elements for conciseness, the library is designed to hold general types (e.g. non-numeric, non-trivial types, like std::string, other containers or, in general, user-defined value-types.)"
The term "stride-based". It is not clear to me what it means.
It refers to the main data-structure layout that the library supports. Ultimately, it says that the data of any Multi object is arranged as base + i1*stride1 + i2*stride2 + i3*stride3 + ... mdspan/mdarray give (completely?) general layouts, but at the price of no iterators and fewer complexity guarantees. If there is a better name for it, please let me know.
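The formula can be sketched in a few lines of plain C++17. This is an illustration of the addressing rule only, not Multi's implementation; strided_offset and row_major_strides are hypothetical helper names, and the row-major choice of strides is just one example of a stride-based layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: offset of element (i1, i2, ..., iD) relative to the
// base pointer, computed as i1*stride1 + i2*stride2 + ... + iD*strideD.
template <std::size_t D>
std::ptrdiff_t strided_offset(std::array<std::ptrdiff_t, D> const& indices,
                              std::array<std::ptrdiff_t, D> const& strides) {
    std::ptrdiff_t offset = 0;
    for (std::size_t d = 0; d != D; ++d) {
        offset += indices[d] * strides[d];
    }
    return offset;
}

// Strides of a contiguous row-major array with the given sizes: the last
// index moves by 1 element, the one before it by sizes[D-1] elements, etc.
template <std::size_t D>
std::array<std::ptrdiff_t, D> row_major_strides(
    std::array<std::ptrdiff_t, D> const& sizes) {
    std::array<std::ptrdiff_t, D> strides{};
    std::ptrdiff_t s = 1;
    for (std::size_t d = D; d != 0; --d) {
        strides[d - 1] = s;
        s *= sizes[d - 1];
    }
    return strides;
}
```

For sizes 2x3x4 this gives strides (12, 4, 1), so element (1, 2, 3) sits at offset 1*12 + 2*4 + 3*1 = 23 from the base. In a layout of this kind, a different arrangement of the same data (for instance, swapping two strides to transpose) needs only a different set of strides, not a copy.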
I cannot see from the introduction if this library will throw exceptions.
It does not. It embraces the basic exception guarantee in general. Operations that allocate may throw exceptions from the allocator, which is provided by other libraries. Logical errors when using the library result in UB or assertions when possible. I will make this clear in the documentation.
The comparison between Multi and std::mdspan in the row "const-propagation semantics" is unfair. I guess you are comparing Multi's container with a view.
Not in particular; I am comparing all aspects of Multi. All aspects of Multi should propagate constness (modulo bugs).
A view is not expected to propagate constness.
(It seems that nothing is expected from "views" after all, so everything is allowed.) Multi subarrays and iterators propagate constness, IMO in the way that ranges should have propagated constness. I chose to propagate constness because it makes const useful.
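The difference can be shown with two toy types in plain C++17. Both are hypothetical and written only for this illustration; neither is a Multi type or a standard one. shallow_view behaves like a pointer (const on the object says nothing about the elements), while propagating_ref behaves like a reference (const on the object means const elements).

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical pointer-like view: element access ignores the constness of
// the view object itself.
struct shallow_view {
    int* data;
    std::size_t size;
    int& operator[](std::size_t i) const { return data[i]; }
};

// Hypothetical reference-like subarray: constness of the object propagates
// to the elements it refers to.
struct propagating_ref {
    int* data;
    std::size_t size;
    int&       operator[](std::size_t i)       { return data[i]; }
    int const& operator[](std::size_t i) const { return data[i]; }
};

// A function taking the shallow view by const& can still modify elements,
// so the signature does not tell the caller whether data is safe.
inline void sneaky_write(shallow_view const& v) { v[0] = 99; }

// The same body would not compile with `propagating_ref const&`, because
// element access on a const object yields `int const&`.
static_assert(std::is_same<decltype(std::declval<shallow_view const&>()[0]),
                           int&>::value,
              "shallow view hands out mutable elements even when const");
static_assert(std::is_same<decltype(std::declval<propagating_ref const&>()[0]),
                           int const&>::value,
              "propagating reference hands out const elements when const");
```

With propagation, a parameter declared const& is a promise the compiler enforces, which is the sense in which it "makes const useful".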
So this is the very initial feedback, I hope it helps.
It helps a lot actually; I appreciate your time and effort. I will proceed to improve the documentation based on your comments.
Thank you,
Alfredo