On Mon, Sep 25, 2017 at 12:17 PM, Thorsten Ottosen via Boost < boost@lists.boost.org> wrote:
Hi Zach,
Thanks for spending a lot of time thinking about the library. Some minor points for discussion follow.
My first concern is that this library is centered around runtime
performance, yet there appears to be no measurement done to determine what -- or even whether -- optimization occurs when using this library instead of the standard containers.
For devector: -------------
Is it not obvious that an O(1) push_front performs better than an O(n) vec.insert( vec.begin(), x ) ?
I find the use of the word "obvious" when applied to optimization to be a little suspect. Also, that statement is not quite true: push_front() is *amortized* O(1), not O(1); a particular call to push_front() may actually be O(N).

But the larger point to me is that I can use vector when I care about amortized O(1) growth in either direction. I only care about devector or deque when I need amortized O(1) growth in *both* directions. Then, it is a question of whether the relative performance recommends devector or deque. As of this writing, I have to write benchmarks or do a lot of custom profiling to figure out which works better.

My strong preference would be that the author provide sufficient measurements of a few use cases of both of these data structures at various values of N, so that I can make a reasonably educated guess when first selecting the data structure to use. It won't allow me to elide profiling altogether, but it will get me going in the right direction more often, with less effort for me as a user.
Is it not obvious that having a small buffer optimization can be useful? boost::small_vector does it. Nearly all std::string implementations do it. stlsoft::auto_buffer has done it for a decade, described in detail in Matthew Wilson's book. Chandler Carruth gave the following presentation last year
https://www.youtube.com/watch?v=vElZc6zSIXM
about small buffer optimizations use in Clang data structures.
You don't need to justify SBO to me; I'm all for it. We already have boost::small_vector though, as you point out. Furthermore, that particular optimization only pays off when N is small. When N is small, amortization doesn't really pay off. So, again, since I would only use devector/deque when I know I need growth in *both* directions, I would want to see the relative strengths and weaknesses of deque vs. devector to make an informed decision.
For batch_deque: ----------------
With std::deque you cannot control the segment size, and implementations differ widely. Is it not obvious that having a block size of 1 or 2 (as e.g. Visual C++ does) is very slow?
Right. Control of the segment size is the one thing I called out in my review as an unambiguous positive.
That iteration over a deque can be slow has been known for two decades, Matt Austern wrote an article about it.
... and this should have been the other. I'm aware of that work, and I'd like to see a deque that features configurable segment size and segmented iteration support. As an aside, my experiments with segmented iteration have been of mixed utility. The set of standard algorithms usable on segments is limited, since they cannot see across segment boundaries. For instance, adjacent_difference() does not work without some gymnastics.
My second concern is that the containers in this library are difficult to
reason about.
Look at it this way: devector is like vector with O(1) front insertion; batch_deque is like deque with custom control over the segment size.
Are you opposed to such containers or is it "all the extra" stuff (which you don't have to use btw) that makes you uneasy?
Up until now, in this post all my objections have been related to runtime efficiency. But now you've brought back up a major complaint that I have, which is the unsafety of the interface. To say that I don't have to use it is a bit disingenuous to me. Adding an unsafe API weakens the invariants of the type, making it more difficult to reason about and use. The idea that I can add elements that will be manually destructed but that are not yet constructed is a serious defect in my opinion. Any throwing operation I do between the creation of such elements and their construction is now unsafe.
Serialization: --------------
std types are over-encapsulated, which means that they can't avoid double initialization, nor process elements in batches. Is it not obvious that double initialization is more expensive than single initialization?
In the abstract, of course you cannot argue with that. I don't find it to be a useful question, though. The questions I have are:

1) In my specific case, do I care (that is, is it measurable)? In many cases, a type will be expensive to copy or non-default-construct, but trivial to default-construct.
2) If the answer to #1 is yes, is reserve() not sufficient?

I would expect the answer to one of those two questions to be "no" in almost all cases. It's not that you'd *never* answer "yes" to both, it's just that this represents a low-frequency corner.
For that matter, is the performance of devector better than std::deque when
I don't know the relative frequency of front- and back-pushes? It seems in many cases that it would be quite a bit worse.
You are comparing apples and oranges. push_back on vector may be worse than on deque (just try to store std::array).
Re-reading that, I think I didn't state my point very clearly. What I was trying to say is that the zero-point implied in having separate front and back capacities means that I need to guess the relative frequency of front and back pushes if I want the benefits of reserving space. When you combine that with the fact that growing a devector requires copying every element, it may be a pessimization to use one over a deque. To me, that's not an apples/oranges comparison, because my use of the entire library boils down to the case where I know I need growth in *both* directions, and now I must determine whether devector or deque is a better fit.

It will depend on the situation, how expensive move operations are, how expensive allocation is, etc. But if you need or want a contiguous layout, you can't use deque.
Another example from the docs:
de::devector<std::string> reversed_lines;
reversed_lines.reserve_front(24);
[snip]
? If it's just to avoid the verbose for-loop, that's not enough for me.
If it's an efficiency claim, I have no numbers to back that up, and the use of IO in the example makes it moot anyway.
It's not about rewriting a loop. It's about having a different layout for your container's sequence, because that layout is either more natural or convenient or simply needed.
Fair enough. Sometimes contiguity is very important.
The first use case for devector is actually in the implementation of batch_deque. One could use some deque implementation, but that would not be optimal for performance.
For small objects or large? For what values of N? For what pattern of calls? Only for double-ended growth, or for single-ended growth as well? I don't necessarily doubt your claim, I just don't have the relative superiority of devector quantified in any way that I can use.
Being able to use devector there has made the code so much simpler.
It's not obvious to me why. Could you elaborate?
I need a justification for unsafe_push_front() besides "avoiding a check,
thus sparing a few instructions". My intuition is that it does not matter, since branch prediction is quite good these days, and a single allocation will swallow all the gains if there are any. However, *measurement* is the only thing that can make this justification, pro or con.
Yes, and what do you do when you actually measure and find out push_back is the bottleneck?
vector has both operator[] and at(), and I'm pretty sure people would go mad if they were forced to use an operator[] that did what at() does. So why is push_back any different? You have basically the same situation: two functions that differ only slightly in their precondition.
First, please demonstrate that I should care about those few instructions. Second, I agree that the difference is slight, but I disagree that it's ok to make that slight change. One of the chief invariants of a standard container is that it manages its storage. This breaks that. Dangerous interfaces are not necessarily evil, but they certainly need to be justified. I would need significant use cases with significant performance benefits (not night-and-day, but significant) to tolerate this unsafety.
Again, from the example docs:
// int sockfd = socket(...); connect(...);
de::devector<char> buffer(256, de::unsafe_uninitialized_tag{});
long recvSize = recv(sockfd, buffer.data(), buffer.size(), 0);

if (recvSize >= 0)
{
    buffer.unsafe_uninitialized_resize_back(recvSize);
    // process contents of buffer
}
else
{
    /* handle error */
}
The unsafe resize does nothing in this case, since char has a trivial dtor.
It changes the size of the container so you can't read uninitialized values.
That's true, though some sort of span would accomplish the same thing with no loss of safety. In the case that the unsafe uninitialized resize grows the container, leaving storage around that is not yet constructed but that the dtor will destruct is very dangerous.

If I distill all my objections down, it comes to these three points, I think:

1) This library seems to cover small corner cases.
2) The API is unsafe.
3) I don't have any reason to expect that the efficiency gains in the corner cases are substantial enough to justify the unsafety, or even that there are positive gains in my particular corner case.

Zach