RFC: edit_distance / edit_alignment library

26 Jun 2013

      Boost Community,

I cooked up a library that provides a pair of functions: edit_distance() and edit_alignment().  The library currently lives here:
https://github.com/erikerlandson/edit_distance

Some short example programs are here:
https://github.com/erikerlandson/edit_distance/tree/master/examples

A few example expressions:

// the edit distance (aka levenshtein distance)
unsigned dist = edit_distance("abc", "bcd");

// using a custom cost function for insertion, deletion, substitution
unsigned dist = edit_distance("abc", "bcd", custom_cost_function());

// accepts arbitrary sequences, ranges, range adaptors
unsigned dist = edit_distance(as_vector("abc"), as_list("bcd") | boost::adaptors::reversed);

// obtain the edit distance and the list of the edit operation codes (in output iterator)
unsigned dist = edit_alignment("abc", "bcd", std::ostream_iterator<edit_opcode>(std::cout, "")).second;

// adaptors to add edit operation costs and sequence elements to the edit operation output
unsigned dist = acquire<elements>(acquire<costs>(edit_alignment))("abc", "bcd", std::ostream_iterator<boost::tuple<edit_opcode, unsigned, char, char> >(std::cout, "")).second;

I'm interested in community feedback on whether these would be a useful addition to boost::algorithms.  I envision these as the first additions to an eventually-larger library of other kinds of string distances, dynamic programming, and other algorithms for sequence alignment.

Erik Erlandson

Phil Endecott

Erik Erlandson

Rene Rivera

Erik Erlandson

Phil Endecott

Erik Erlandson

Phil Endecott

Erik Erlandson

Erik Erlandson

Erik Erlandson

Erik Erlandson

Bjorn Reese

Erik Erlandson

Erik Erlandson

Bjorn Reese

Erik Erlandson

Bjorn Reese

tags

participants (4)