[algorithm] Musser-Nishanov sequence search algorithm
Dear Boost community, most of you are probably familiar with Dave Musser, well known for his work on generic programming, introsort and more. What you maybe didn't know, and I didn't until not that long ago, is that he and Gor Nishanov also designed (based off KMP) and wrote a generic sequence search algorithm to compete with Boyer-Moore, etc, which apart from being competitively fast is ostensibly more generic than the others. You can read for yourself on Dave Musser's website: http://www.cs.rpi.edu/~musser/gp/algorithms.html By "more generic" I mean it has a fallback algorithm if the corpus does not have random-access iterators, and its behaviour can easily be tuned for the alphabet size to dramatic effect. (I haven't looked into the 'tunability' of BM, BMH, KMP, etc, so I'm not sure if this is a unique feature.) But at face value, it appears to have powerful features that the others lack without compromising performance, which is pretty neat. So I've been in contact with the original authors and with their permission am just playing the messenger to bring the algorithm to wider attention with the expectation that it is something we could really benefit from. What I hope for now is to have discussion on whether indeed it is an algorithm that we want, presumably alongside the others in Boost.Algorithm. My slightly more controversial thought is that, if no shortcomings or weaknesses are revealed, this could up being the superior search algorithm. I've opened a pull request on Boost.Algorithm with the code so everyone can play around with it if they want. https://github.com/boostorg/algorithm/pull/25 Looking forward to hearing what everyone thinks. Cheers. Jeremy
participants (1)
-
Jeremy Murphy