Time Series review results

17 Aug 2007

      I am pleased to announce that the Time Series library, submitted by Eric
Neibler and developed by him along with Daniel Egloff, Matthias Troyer,
David Abrahams and Daniel Wallin using funding provided by Zurcher
Kantonalbank has been accepted into boost. As seen in the review, there are
some important issues that need to be addressed before the library is ready
for boost distribution, however I am confident that this is a very good
library. Although the donation of time and funding for Time Series by
Zurcher Kantonalbank is laudable and something we would like to encourage
more companies to do, the fact that it was donated is not important to the
review process.

  Something that is important is the well-established reputation of the
involved lead developer. Eric has a number of very high quality libraries
already in boost and has shown his willingness to support them thoroughly
and improve them continually. He has been very responsive to user feedback
and has an obvious desire to have only high quality work committed to boost
with his name on it. For this reason, I am giving him a little more leeway
than might be given to a developer without his reputation to address the
issues in this review and enter the revised version into boost without a
fast tracked second review.

  Many thanks to the participants in the review: Tom Brinkman, Hugo Duncan,
Andrey Tcherepanov, Matthias Schabel, Steven Watanabe, Stjepan Rajko, Matias
Capeletto, Matthias Schabel, Lewis Hyatt, Paul Bristow, Tobias Schwinger,
Matthias Troyer, Phil Endecott, Zach Laine, Dave Abrahams, Michael Marcin,
and Michael Fawcett for their time and attention. The discussions held in
this review provided a strong foundation for improving the library, and the
variety of domains of expertise represented by the participants shows how
broad the desire for a library of this sort is.

  Most of the points raised in the review have been directly addressed by
Eric and discussed until a solution that is acceptable to all involved
parties was reached. Eric has acknowledged many as now either fixed in his
version or on his To-Do list. However, for the sake of easy reference I will
list the major points and my current understanding of what to do about them
below. The order in which the points are presented should not be taken as a
rating of importance: it is just the order I made notes about them as I
reviewed the discussion.

1)   Fix the floating-point offset issues  The answer to this may be a
change in how they are implemented or a complete scrapping of the floating
point offsets as an indexing type. In either case, it would be a good idea
to provide a view facade for time series that applies a constant
multiplicative factor (or possibly a more general function) to an offset.
This may even be a sufficient answer to all the floating-point issues, but
it is not yet clear to me (or to anyone else that I noticed in the review)
what the best answer is. I expect there will be continued discussion on the
developer list to clarify this issue.

2)   Polish and include the rolling window average example sent to the list
 This is a very good example of a very potent design idiom for this
library, and one that would help a lot of people better understand the
potential of the library. The use of the circular buffer to collect multiple
points is necessary for many filters. It is such a good case that you should
consider not only including it in the library, but also making a tutorial
out of it that shows how new algorithms can be added to the library.
Ideally, many algorithms that are specific to domains that the authors are
not specialists in will be donated by users who develop and polish them for
inclusion. A very clear tutorial will make this more likely.

3)   If the rolling average is not developed as a tutorial example, some
other filter should be  There is no way the library can include all the
filters in use by programmers in all the various use domains, so the
expectation is that writing your own filters will be common. The
documentation needs to support this very well.

4)   The documentation needs improvement in a number of ways 
Reorganization to build from foundational ideas and uses to the more complex
issues is important, as is the addition of overview, tutorial and rationale
sections, as well as a substantial increase in the number of examples. The
learning curve is currently too steep and that seems to be discouraging
users who might find the library a very good fit. An important step along
the way is to add good examples and good pictures in many places in the
documentation. This general note is reinforced in some of the other
comments.

5)   Small toy examples for each of the algorithms would make the
documentation of them far more clear

6)   Since you maintain both libraries, it would be useful to add to the
documentation for both the Time Series and the Accumulators libraries some
notes on how a user should select between them for specific jobs. This came
up in discussions before the review, so it is likely to be a problem for
more future users.

7)   Well chosen pictures can help the readers of this documentation
immensely  Especially in your attempts to clarify the documentation of the
discretization, the offset and sample values, good pictures will be
invaluable.

8)   Expand the focus of the documentation so it doesn't appear to a casual
reader that this is purely for econometric/financial time series  The
library has potential uses in a wide variety of fields, so it is important
that potential future users can tell that from even a light skimming of the
documentation.

9)   The prenamed discretizations seem to cause more problems than they
solve  Initial testing of the boost::units library to provide
discretization units looks hopeful. If this stands up under testing, not
only remove the current prenamed discretizations, but also document using
boost::units. A good example could do for this.

10) It is only possible to assign across series with the same discretization
type  This should be made more explicit in the documentation and also since
a good example of trouble that would be caused by trying to assign series
with different discretization types showed up during the review, it might be
added to the rationale section so users can see why the choice is a good
one.

11) The reasoning behind having unit series for types that have no obvious
unit member should also appear in the rationale  In general, there were
many good explanations for design choices that should appear in the
rationale section. Many of these reasons are compelling but not obvious to a
developer who has not tried writing a time series library. Thus they are
ideal rationale entries.

12) Currently non-entries in a dense time series are recorded as zeros. This
does not seem to be the best idea in all cases. In specific, in some time
series, the difference between a value of zero an unknown is very important.
Investigate the best choice for a non-entry. Zero gives simple arithmetic
for multiplication (though not addition) that gives the unknown value
placeholder when multiplied with anything. Some non-zero value would have to
have special arithmetic rules, probably. A universal answer may not be
feasible. If not, consider making the unknown value a parameter available
for the user to set.

13) Consider a name change from delta_series to dirac_delta_series and a
link in the documentation to what a Dirac delta function is.

14) Consider renaming the inverse_series to reciprocal_series or some other
more illustrative name.

15) Consider reorganizing to expose the storage containers to outside
developers  They are likely to be useful for a number of applications aside
from time series and this would prevent others from needing to reinvent the
wheel as often.

16) Since the usage of time series is spread across a variety of different
problem domains, there will be terms that aren't familiar to many of the
readers of the documentation. Consider including links to definitions and
descriptions for many of the terms used. Paul's example of users who don't
know what amortized constant complexity implies is unfortunately not extreme
for what you should expect.

17) ordered_inserter is not a great name for the operation it performs -
Determine a better name and change to that. Also consider moving to named
parameters for the start, stop and value arguments to the ordered_inserter.
After all this, the ordered_inserter is a lower level interface than most
users want in most circumstances. Add another layer for doing appends on top
of this interface, and include a push_back method where sensible for more
stl-like syntax.

18) There is a confusion between the template parameter discretization and
the constructor parameter discretization. Consider changing the name for the
template parameter to something like Discretization_type.

19) Include in the documentation a clear description of what it means for
two discretizations to be the same, and point out that the runtime check for
this is an assertion, not an exception. Explain why in the rationale
section, if nowhere else.

20) A number of reviewers found the concepts of discretization, offset and
run to be hard to grasp from the documentation. Work to improve this.

21) The comparison operators for the time_series_facade currently are tied
to the time_series_base class. This could be generalized to remove that
dependency. If this is done, the documentation should no longer say that
time_series_base is the base for all time_series classes, but instead that
it provides a convenient base for implementing time_series classes. If this
isn't done, the documentation should clearly state the need to inherit from
time_series_base.

22) A large number of specific documentation edits were provided in posts by
Steven, Paul and Zach.

23) Fix the documentation to clearly show what kind of iterator the
series_stl_iterator is.

24) This is outside the scope of this review for a complete answer, but
there are questions about the behavior of non-MSVC compilers that define the
_MSC_VER macro to masquerade as MS compilers. Is there a way to know which
(if any) share the behaviors of the actual MS compilers they pose as, and
what is the best way to test for them in code. While the pursuit of this
answer is not part of this library, using it is, so if a good answer is
found it should become part of this library.

25) Provide versions of the fine_grain and coarse_grain functions that allow
for the user to easily provide a sampling function - This is likely to be a
very common use case for this library, so it should be well supported. An
example in the documentation of how to do this is a good idea.

26) If the review of floating point offsets concludes that they should
remain in the library, the dense series reference page should explicitly
point out that floating point offsets are not usable with dense series.

27) What is the appropriate handling of zero width points. They are a part
of the library in the delta_series, so having them is unavoidable. They
should be on firm conceptual ground if possible.

28) Make the existence and use of the set_at function more obvious in the
documentation  Reviewers didn't notice it and missed the functionality, so
others will, as well.

29) Consider whether it is valuable to provide versions of the transform
algorithm that take more than two series as arguments. I can't think of a
good application off hand, but there may be some. However, the same reviewer
who requested the feature said earlier in the review that he almost never
works on more than one series at a time.

30) Provide an example of using a single pass data stream with a circular
buffer based algorithm.

31) Consider a better name for the scaled_storage_tag.

32) Consider whether users need access to the metafunctions used to
determine return types. If so, consider how they could be exposed in a user
friendly way.

  Once again, my thanks to the reviewers and the developers for the work,
time and attention put into this library and congratulations to Eric for the
accepted work.

   John Phillips
   Review Manager

-- Those only are happy, who have their minds fixed on some object other
than their own happiness; on the happiness of others, on the improvement of
mankind, even on some art or pursuit, followed not as a means, but as itself
an ideal end. Aiming thus at something else, they find happiness by the way.

John Stuart Mill

John Phillips

tags

participants (1)