
Hello, I'm Yu Jiang, currently a master student from Tsinghua University in China. I'm attracted by the idea of "approximate string matching" since it's highly related about my research and also my interest. I have seen in a previous discussion, another student proposed to implement some basic matching functions such as "edit distance" for boost. He said by using dynamic programming, the time complexity is O(m*n). However, if we give a threshold T and only report the exact edit distance if it's no larger than T, the complexity can be O(T * min(m,n)). I think sometimes we just need to decide whether the edit distance is small enough. Furthermore, I'm also interested in approximate string search. That is, given a query string word Q and a candidate word set S, find all the strings in S who have the edit distance no larger than a threshold T with the query string. This is very useful in many area such as data integration, and I have strong knowledge in fast algorithms to solve this. So, is it compatible for gsoc 2013 and for boost? Looking forward to your opinions. Thanks, Yu