[Boost-announce] String Algo Review Results

31 Oct 2003

      Dear All,

The String Algorithm Library submission by Pavol Droba should be accepted
into boost given
that the issues that came up during the review are fixed.

I know that my decision will please some and disapoint others, but I will
try explain my decision later.
I studied many former decisions made by review managers to see if I could
find a pattaern of when to accept or reject
a library. Rejection was done for these reasons:

* too poor documentation
* design does not meet enough user's needs
* design was flawed

When I summarize a decision on a certain issue,
then I have read the whole thread, but sometimes the conclusion is not too
easy to read from the discussion; so
please say if you think I have made a mistake.

Here's an overview of the rest of this mail:

(1) Statistics
(2) Critique of the library
(3) Improvements that must be adressed before the inclusion of the library

(1) Statistics
--------------
Let my first show the statistics of this review:

people doing a review: 7
people voting yes:        6
people voting no:         1
people with general comments: many

In general it is understanding that most people payed most attention to the
algortihms themselves; the traits, iterator_range, and find/formatter
concepts
and generating algorithms where not studied in detail.

For future reference, I think it would be nice if people stated more
explicitly what they tried and how long time they used on it. This is part
of the checklist that I gave a link to in the mail that started the review.

Many had wishes for extra algorithms and I think Pavol has them on his todo
list.

(2) Critique of the library
--------------------------
In what follows I only discuss the major concerns during the review.

(a) Two libraries are intertwined. On one side there a string algorithms and
on the otherside
     support components for implementing these: container_traits,
sequence_traits, iterator_range.

This is indeed a relevant isssue that we must discuss. I think most agree
that container_traits, sequence_traits and
iterator_range (or half_open_range) should be a separate libraries. However,
as long as the documentation
clearly states that these sub-libs are just waiting for somebody to replace
them, it shouldn't be any problem. On the contrary,
the string library gives us a good starting point for evaluating the
usefulness of these features, and I think we should thank Pavol
for his courage to do this even though they might not be perfect yet.

If we look at how other libraries evolve in boost, then they too change
quite a bit as we get more experience with something. Examples
like iterator_adaptors and regex come into mind.

(b) Documentation is inadequate; imprecise.

There was very different opinions on this. Some said the documentation was
very good; others we're more pessimistic.
Among those who actually did a review, most felt the docs where ok, but they
did not mind more precise specifications and
more examples.

Some libraries have been rejected because of very poor documentation, but I
don't think it would be fair to do it here.

It counts that most reviewers were able to use the library without this
extra documentation. The algorithms seems
quite intuitive to use, and the docs wasn't really a show-stopper.

Also, if we were to evaluate the documentation of other boost libraries just
as harshly, they probably wouldn't make it into
boost. Several other libraries are somewhat more demanding to use than the
string algorithm library.

(c)  The library is more than a string library; ie. it will work with
vector<foobar> even though it is not even close to
      a string. AFAICT, this is true for all functions not taking a locale
parameter.

When the review started, I mailed Craig Henderson because I know he has
boyer more algorithms in the sandbox. Here is what he wrote:

"WRT overlaps with my algorithms in the sequence_algo sandbox, I have
already
considered this, but concluded that the string algorithms are very specific

to sequences of characters. The Boyer Moore, Edit Distance and LCS
algorithms work on any sequence of data. I think they would loose
flexibility and impact if I included them in the specific string algos."

At that time I actually thought that Pavol's library was "merely" a generic
string algorithm library.
But in fact it is much more general than that: it searches and replaces and
erases for subranges in a bigger
range and strings are only one example of where this is handy. It's too
early to say if this will cause changes
to the Formatter/Finder concepts.

I asked myself the question "is this reason enough for rejecting the
library?". After some thought my answer was
'no' because

1) the algorithms that might be affected by this don't take a locale
argument; the ones that do often have a similar name with an 'i' prefix.
When and if
    the generic versions have their own library, the user should (mostly) be
able to switch namespace and include a new header and everything works as
before.
2) it doesn't hurt anyone that the implementation is a bit more general than
it has to be
3) it will be easier to test if these function can actually be used as
generic algorithms on their own

How Craig's algorithms will fit into all this I cannot answer, but it should
be dealt with during his review.

(3) Improvements that must be adressed before the inclusion of the library
--------------------------------------------------------------------------

Some of these changes are probably already incorporated by Pavol, but I'll
mention them here for sake of completeness.
If Pavol thinks that some of the points should be given a mini-review once
again, he is free to do so on the list.
It might be that some doc issues cannot be resolved before boost-book is
updated; in that case there is not much to do but wait.

(a) The documentation should have an overview table with all the libraries
functions
(b) implementation details should be better hided by using <i>Implementation
defined</i> in docs
(c) the namespace should be changed to boost::algorithm::string
(d) it should be mentioned explicitly that traits and ranges are just
temporary parts of the library and theit use should be discouraged. If there
is no
     alternative under way, the docs could state that too.
(e) concepts should be provided so it is clear what a string is or what type
of containers that the algorithm works with
(f) a general description about exception safety when using the library
should be made
(g) the header policy should changed so there is eg. only one find.hpp and
not find.hpp and find2.hpp
(h) grammar should be checked in docs
(i) naming throughout the library should be made consistent; functions that
take a predicate ends with _if; make_range should
    be renamed make_iterator_range or vice versa.
(j) a safe-bool implicit conversion of the iterator range should be added
(k) mutating algorithms should return void

That's it folks! One thing that you might consider is critique of the review
manager; should I have done something different? Earlier? etc. Don't be
afraid
to criticise me...the worst thing that can happen is that I might actually
learn something.

best regards

Thorsten Ottosen, String Algo review manager