[boost] [gsoc 2013] Approximate string matching

12 Apr 2013

      Hi, I'm Jan Strnad, currently a Master's student at CTU in Prague, Czech 
Republic. This summer, I would like to participate in GSOC and I believe 
that approximate string matching is a good match for me, since my 
background i this area is quite strong.

Now more specifically. I would like to implement approximate string 
matching algorithm(s) based on the NFA. I would like to support various 
approximate distances, such as Hamming distance [1], Levenstein distance 
[2], Damerau-Levenstein distance [3], Delta distance, Gamma distance and 
finally (delta, gamma) distance. Sorry no explanation of the last three 
found on the Web, but I can provide one if interested.

In the matter of NFA implementation, I would probably choose dynamic 
programming approach, but I'm thinking of shift-or algorithm as well.

Can I ask you for any kind of thoughts or feedback regarding this topic?

Best regards,
Jan Strnad

[1] http://en.wikipedia.org/wiki/Hamming_distance
[2] http://en.wikipedia.org/wiki/Levenstein_distance
[3] http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance