-----Original Message-----
From: Marco Correia [mailto:mvc@netcabo.pt]
Thanks!
Today I managed to get a little bit further, let me tell you where I
am.
First I aliased every symbol of the alphabet as an integer.
#define A 1
#define B 2
...
This is a good start, but the wrong way to do this part of it. Instead, you
should do something like:
#define CONVERT(sym) BOOST_PP_CAT(CONVERT_, sym) #define CONVERT_A 1 #define
CONVERT_B 2 ...
#define NOT_EQ(s,E1,E2) \
BOOST_PP_NOT_EQUAL(E1,E2)
#define REMOVE_ALL(Elem,Seq) \
BOOST_PP_SEQ_FILTER(NOT_EQ, Elem, Seq)
#define REMOVE_ELEM(s,Seq,Elem) REMOVE_ALL(Elem,Seq)(Elem)
#define REMOVE_DUPLICATES(Seq) \
BOOST_PP_SEQ_FOLD_LEFT(REMOVE_ELEM, Seq, Seq)
This starts with the full sequence, and for each element, removes all
occurences and add it in the end.
I like that you're thinking about this algorithmically. This is an interesting
solution. The downside is that it is redundantly adding and later removing
elements of the result sequence when the input sequence contains the same
element again.
Do you see any other way, more efficient than this?
This is how I did it:
#include
#include
#include
#include
#include #include
#define CONVERT(sym) BOOST_PP_CAT(CONVERT_, sym)
#define CONVERT_0 0
#define CONVERT_A 1
#define CONVERT_B 2
#define CONVERT_C 3
#define CONVERT_D 4
#define CONVERT_E 5
#define CONVERT_F 6
#define CONVERT_G 7
#define CONVERT_H 8
#define CONVERT_I 9
#define CONVERT_J 10
// ...
#define ID(x) (x)
#define PRED_1(d, tuple) \
CONVERT( \
BOOST_PP_SEQ_HEAD(BOOST_PP_TUPLE_ELEM(2, 1, tuple)) \
) \
/**/
#define PRED_2(s, x, y) \
BOOST_PP_NOT_EQUAL(CONVERT(x), CONVERT(y)) \
/**/
#define OP_1(d, tuple) OP_2 tuple
#define OP_2(result, seq) \
( \
result(BOOST_PP_SEQ_HEAD(seq)), \
BOOST_PP_SEQ_FILTER( \
PRED_2, BOOST_PP_SEQ_HEAD(seq), \
BOOST_PP_SEQ_TAIL(seq) \
) \
) \
/**/
#define REMOVE_DUPLICATES(seq) \
BOOST_PP_TUPLE_ELEM( \
2, 0, \
BOOST_PP_WHILE( \
PRED_1, OP_1, \
(ID, seq(0)) \
) \
) \
/**/
REMOVE_DUPLICATES( (A)(B)(A)(C)(B)(B)(A)(C) ) // (A)(B)(C)
Instead of using FOLD_LEFT, this one works by using WHILE to iterate over the
elements of the input sequence. This allows you to *change* the sequence that
you're iterating over. At the beginning of the algorithm, a (0) is added as a
rogue element to the end of the sequence (since you can't have nil sequences).
An extra CONVERT_ macro is defined that converts 0 to 0 (i.e. a no-op).
The predicate passed to WHILE tests whether the head of the to-be-processed
sequence is non-zero after being CONVERT'ed.
The operation passed to WHILE appends the first element of the sequence to the
result, and removes all occurrences of that element in the to-be-processed
sequence.
Note that we "initialize" the result sequence to ID, but append to the tail of
the result (and thus ID removes itself when the first element gets added to the
result).
With each application of the operation, the tuple changes:
( ID , (A)(B)(A)(C)(B)(B)(A)(C)(0) ) // start
( (A) , (B)(C)(B)(B)(C)(0) )
( (A)(B) , (C)(C)(0) )
( (A)(B)(C) , (0) ) // end
At the end of the loop, we simply extract the first element of tuple. From
there, if you want to concatenate all the elements of the result together, just
use SEQ_CAT.
The worst-case scenario for any given length of input sequence occurs when there
are no duplicates. E.g.
REMOVE_DUPLICATES( (A)(B)(C)(D)(E)(F)(G)(H)(I)(J) )
Even though this version is more complex, it is consistently faster because it
doesn't do any redundant work.
Regards,
Paul Mensonides