As an aside, with the following match flags:

    boost::regex_constants::match_flag_type mflags =
        boost::match_default | boost::match_not_dot_newline | boost::match_continuous;

the following is an irritating feature of the regex package:

    expression.assign("[ \t\n]", boost::regex::extended);     // does not match tab or newline
    expression.assign("[ \t\\n]", boost::regex::extended);    // does not match tab or newline
    expression.assign("[ \\t\\n]", boost::regex::extended);   // does not match tab or newline

The full expression I'm using is:

    expression.assign("(keyword)|([a-zA-Z][a-zA-Z0-9_-]*)|([ \\t\\n]+)|([0-9]+)|(\"[^\"]*\")|(.)", sflags);
                                                           ^^^^^^^^^

Presumably I need to set regbase::escape_in_lists, but why aren't TAB and NL allowed as raw characters in bracket expressions?

David
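
P.S. For anyone trying to reproduce this, it may help to see what each of the three string literals hands to the regex parser once the C++ compiler has processed the escapes. The sketch below makes TAB, NL and backslash visible. The helper name visible() is made up for this illustration and is not part of Boost; no regex calls are involved, so it says nothing about how the library then interprets the characters.

```cpp
#include <string>

// Hypothetical helper (not a Boost function): returns a copy of the pattern
// with TAB, NL and backslash spelled out, i.e. the characters the regex
// parser actually receives after C++ string-literal escaping.
std::string visible(const std::string& pattern)
{
    std::string out;
    for (char c : pattern) {
        switch (c) {
        case '\t': out += "<TAB>";       break;
        case '\n': out += "<NL>";        break;
        case '\\': out += "<BACKSLASH>"; break;
        default:   out += c;             break;
        }
    }
    return out;
}
```

With this, visible("[ \t\n]") gives "[ <TAB><NL>]", visible("[ \t\\n]") gives "[ <TAB><BACKSLASH>n]", and visible("[ \\t\\n]") gives "[ <BACKSLASH>t<BACKSLASH>n]". So only the first form passes both TAB and NL as raw characters; the third passes no raw whitespace beyond the space and relies entirely on the library honouring backslash escapes inside the brackets.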