[algorithm] Musser-Nishanov sequence search algorithm

15 Sep 2016

      Dear Boost community,

most of you are probably familiar with Dave Musser, well known for his work
on generic programming, introsort and more. What you maybe didn't know, and
I didn't until not that long ago, is that he and Gor Nishanov also designed
(based off KMP) and wrote a generic sequence search algorithm to compete
with Boyer-Moore, etc, which apart from being competitively fast is
ostensibly more generic than the others.

You can read for yourself on Dave Musser's website:
http://www.cs.rpi.edu/~musser/gp/algorithms.html

By "more generic" I mean it has a fallback algorithm if the corpus does not
have random-access iterators, and its behaviour can easily be tuned for the
alphabet size to dramatic effect. (I haven't looked into the 'tunability'
of BM, BMH, KMP, etc, so I'm not sure if this is a unique feature.)

But at face value, it appears to have powerful features that the others
lack without compromising performance, which is pretty neat.

So I've been in contact with the original authors and with their permission
am just playing the messenger to bring the algorithm to wider attention
with the expectation that it is something we could really benefit from.

What I hope for now is to have discussion on whether indeed it is an
algorithm that we want, presumably alongside the others in Boost.Algorithm.
My slightly more controversial thought is that, if no shortcomings or
weaknesses are revealed, this could up being the superior search algorithm.

I've opened a pull request on Boost.Algorithm with the code so everyone can
play around with it if they want.
https://github.com/boostorg/algorithm/pull/25
Looking forward to hearing what everyone thinks.

Cheers.

Jeremy

Jeremy Murphy

tags

participants (1)