Time Series review results
I am pleased to announce that the Time Series library, submitted by Eric Neibler and developed by him along with Daniel Egloff, Matthias Troyer, David Abrahams and Daniel Wallin using funding provided by Zurcher Kantonalbank has been accepted into boost. As seen in the review, there are some important issues that need to be addressed before the library is ready for boost distribution, however I am confident that this is a very good library. Although the donation of time and funding for Time Series by Zurcher Kantonalbank is laudable and something we would like to encourage more companies to do, the fact that it was donated is not important to the review process. Something that is important is the well-established reputation of the involved lead developer. Eric has a number of very high quality libraries already in boost and has shown his willingness to support them thoroughly and improve them continually. He has been very responsive to user feedback and has an obvious desire to have only high quality work committed to boost with his name on it. For this reason, I am giving him a little more leeway than might be given to a developer without his reputation to address the issues in this review and enter the revised version into boost without a fast tracked second review. Many thanks to the participants in the review: Tom Brinkman, Hugo Duncan, Andrey Tcherepanov, Matthias Schabel, Steven Watanabe, Stjepan Rajko, Matias Capeletto, Matthias Schabel, Lewis Hyatt, Paul Bristow, Tobias Schwinger, Matthias Troyer, Phil Endecott, Zach Laine, Dave Abrahams, Michael Marcin, and Michael Fawcett for their time and attention. The discussions held in this review provided a strong foundation for improving the library, and the variety of domains of expertise represented by the participants shows how broad the desire for a library of this sort is. Most of the points raised in the review have been directly addressed by Eric and discussed until a solution that is acceptable to all involved parties was reached. Eric has acknowledged many as now either fixed in his version or on his To-Do list. However, for the sake of easy reference I will list the major points and my current understanding of what to do about them below. The order in which the points are presented should not be taken as a rating of importance: it is just the order I made notes about them as I reviewed the discussion. 1) Fix the floating-point offset issues The answer to this may be a change in how they are implemented or a complete scrapping of the floating point offsets as an indexing type. In either case, it would be a good idea to provide a view facade for time series that applies a constant multiplicative factor (or possibly a more general function) to an offset. This may even be a sufficient answer to all the floating-point issues, but it is not yet clear to me (or to anyone else that I noticed in the review) what the best answer is. I expect there will be continued discussion on the developer list to clarify this issue. 2) Polish and include the rolling window average example sent to the list This is a very good example of a very potent design idiom for this library, and one that would help a lot of people better understand the potential of the library. The use of the circular buffer to collect multiple points is necessary for many filters. It is such a good case that you should consider not only including it in the library, but also making a tutorial out of it that shows how new algorithms can be added to the library. Ideally, many algorithms that are specific to domains that the authors are not specialists in will be donated by users who develop and polish them for inclusion. A very clear tutorial will make this more likely. 3) If the rolling average is not developed as a tutorial example, some other filter should be There is no way the library can include all the filters in use by programmers in all the various use domains, so the expectation is that writing your own filters will be common. The documentation needs to support this very well. 4) The documentation needs improvement in a number of ways Reorganization to build from foundational ideas and uses to the more complex issues is important, as is the addition of overview, tutorial and rationale sections, as well as a substantial increase in the number of examples. The learning curve is currently too steep and that seems to be discouraging users who might find the library a very good fit. An important step along the way is to add good examples and good pictures in many places in the documentation. This general note is reinforced in some of the other comments. 5) Small toy examples for each of the algorithms would make the documentation of them far more clear 6) Since you maintain both libraries, it would be useful to add to the documentation for both the Time Series and the Accumulators libraries some notes on how a user should select between them for specific jobs. This came up in discussions before the review, so it is likely to be a problem for more future users. 7) Well chosen pictures can help the readers of this documentation immensely Especially in your attempts to clarify the documentation of the discretization, the offset and sample values, good pictures will be invaluable. 8) Expand the focus of the documentation so it doesn't appear to a casual reader that this is purely for econometric/financial time series The library has potential uses in a wide variety of fields, so it is important that potential future users can tell that from even a light skimming of the documentation. 9) The prenamed discretizations seem to cause more problems than they solve Initial testing of the boost::units library to provide discretization units looks hopeful. If this stands up under testing, not only remove the current prenamed discretizations, but also document using boost::units. A good example could do for this. 10) It is only possible to assign across series with the same discretization type This should be made more explicit in the documentation and also since a good example of trouble that would be caused by trying to assign series with different discretization types showed up during the review, it might be added to the rationale section so users can see why the choice is a good one. 11) The reasoning behind having unit series for types that have no obvious unit member should also appear in the rationale In general, there were many good explanations for design choices that should appear in the rationale section. Many of these reasons are compelling but not obvious to a developer who has not tried writing a time series library. Thus they are ideal rationale entries. 12) Currently non-entries in a dense time series are recorded as zeros. This does not seem to be the best idea in all cases. In specific, in some time series, the difference between a value of zero an unknown is very important. Investigate the best choice for a non-entry. Zero gives simple arithmetic for multiplication (though not addition) that gives the unknown value placeholder when multiplied with anything. Some non-zero value would have to have special arithmetic rules, probably. A universal answer may not be feasible. If not, consider making the unknown value a parameter available for the user to set. 13) Consider a name change from delta_series to dirac_delta_series and a link in the documentation to what a Dirac delta function is. 14) Consider renaming the inverse_series to reciprocal_series or some other more illustrative name. 15) Consider reorganizing to expose the storage containers to outside developers They are likely to be useful for a number of applications aside from time series and this would prevent others from needing to reinvent the wheel as often. 16) Since the usage of time series is spread across a variety of different problem domains, there will be terms that aren't familiar to many of the readers of the documentation. Consider including links to definitions and descriptions for many of the terms used. Paul's example of users who don't know what amortized constant complexity implies is unfortunately not extreme for what you should expect. 17) ordered_inserter is not a great name for the operation it performs - Determine a better name and change to that. Also consider moving to named parameters for the start, stop and value arguments to the ordered_inserter. After all this, the ordered_inserter is a lower level interface than most users want in most circumstances. Add another layer for doing appends on top of this interface, and include a push_back method where sensible for more stl-like syntax. 18) There is a confusion between the template parameter discretization and the constructor parameter discretization. Consider changing the name for the template parameter to something like Discretization_type. 19) Include in the documentation a clear description of what it means for two discretizations to be the same, and point out that the runtime check for this is an assertion, not an exception. Explain why in the rationale section, if nowhere else. 20) A number of reviewers found the concepts of discretization, offset and run to be hard to grasp from the documentation. Work to improve this. 21) The comparison operators for the time_series_facade currently are tied to the time_series_base class. This could be generalized to remove that dependency. If this is done, the documentation should no longer say that time_series_base is the base for all time_series classes, but instead that it provides a convenient base for implementing time_series classes. If this isn't done, the documentation should clearly state the need to inherit from time_series_base. 22) A large number of specific documentation edits were provided in posts by Steven, Paul and Zach. 23) Fix the documentation to clearly show what kind of iterator the series_stl_iterator is. 24) This is outside the scope of this review for a complete answer, but there are questions about the behavior of non-MSVC compilers that define the _MSC_VER macro to masquerade as MS compilers. Is there a way to know which (if any) share the behaviors of the actual MS compilers they pose as, and what is the best way to test for them in code. While the pursuit of this answer is not part of this library, using it is, so if a good answer is found it should become part of this library. 25) Provide versions of the fine_grain and coarse_grain functions that allow for the user to easily provide a sampling function - This is likely to be a very common use case for this library, so it should be well supported. An example in the documentation of how to do this is a good idea. 26) If the review of floating point offsets concludes that they should remain in the library, the dense series reference page should explicitly point out that floating point offsets are not usable with dense series. 27) What is the appropriate handling of zero width points. They are a part of the library in the delta_series, so having them is unavoidable. They should be on firm conceptual ground if possible. 28) Make the existence and use of the set_at function more obvious in the documentation Reviewers didn't notice it and missed the functionality, so others will, as well. 29) Consider whether it is valuable to provide versions of the transform algorithm that take more than two series as arguments. I can't think of a good application off hand, but there may be some. However, the same reviewer who requested the feature said earlier in the review that he almost never works on more than one series at a time. 30) Provide an example of using a single pass data stream with a circular buffer based algorithm. 31) Consider a better name for the scaled_storage_tag. 32) Consider whether users need access to the metafunctions used to determine return types. If so, consider how they could be exposed in a user friendly way. Once again, my thanks to the reviewers and the developers for the work, time and attention put into this library and congratulations to Eric for the accepted work. John Phillips Review Manager -- Those only are happy, who have their minds fixed on some object other than their own happiness; on the happiness of others, on the improvement of mankind, even on some art or pursuit, followed not as a means, but as itself an ideal end. Aiming thus at something else, they find happiness by the way. John Stuart Mill
participants (1)
-
John Phillips