[boost] [gsoc-2013]Approximate string matching

23 Apr 2013

      Hello, I'm Yu Jiang, currently a master student from Tsinghua University in
China. I'm attracted by the idea of "approximate string matching" since
it's highly related about my research and also my interest.

I have seen in a previous discussion, another student proposed to implement
some basic matching functions such as "edit distance" for boost. He said by
using dynamic programming, the time complexity is O(m*n). However, if we
give a threshold T and only report the exact edit distance if it's no
larger than T, the complexity can be O(T * min(m,n)). I think sometimes we
just need to decide whether the edit distance is small enough.

Furthermore, I'm also interested in approximate string search. That is,
given a query string word Q and a candidate word set S, find all the
strings in S who have the edit distance no larger than a threshold T with
the query string. This is very useful in many area such as data
integration, and I have strong knowledge in fast algorithms to solve this.

So, is it compatible for gsoc 2013 and for boost? Looking forward to your
opinions.

Thanks,
Yu