2015-07-21 15:11 GMT+03:00 Andy Thomason
> Hi All,
> I am recruiting users for the putative genetics library.
Hi,

I like the idea of a genetics library in Boost! However, the code misses some essential optimizations and suffers from premature optimization elsewhere.

* dna_string does not call reserve() in assignment. This makes some of the push_back() calls slow.

* My attempt to understand the exact search rewarded me with a headache (cool hack, I enjoyed it!). There are too many magic constants and variables, which makes the algorithm hard to maintain. I also doubt that the algorithm is optimal. You are comparing 4 nucleotides at a time, and there are 4^4 = 256 possible combinations of 4 nucleotides. Let's assume for simplicity that nucleotides are uniformly distributed. Then the algorithm will often give false positives: one will be triggered roughly once every 256 comparisons. You are doing some kind of vectorization, so the algorithm will give a false positive every ~8 loop iterations. Comparing longer nucleotide chains would trigger compare_inexact less often. For example, comparing 8 nucleotides at a time would trigger a false positive once per ~65500 comparisons (4^8 = 65536).

* The comparison operators need improvement. Compare sizes first (it's cheap!). Use memcmp in cases like `values < rhs.values || values == rhs.values`. memcmp returns an integer that already tells you whether the value is bigger, smaller, or equal, without the need to iterate over the data a second time.

* `const auto str_values = str.get_values();` must be `const auto& str_values = str.get_values();`

* Provide an enum for nucleotides { nA = 0, nT = ...}. This would make the library more user friendly.

There's more. If you're interested, I can investigate further.

--
Best regards,
Antony Polukhin
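
P.S. To illustrate the memcmp point above, here is a minimal sketch of a single-pass three-way comparison. It is not taken from the library: `packed_bytes`, `compare3`, `is_less_or_equal` and `is_equal` are hypothetical names, and I am assuming byte-wise storage. If the real storage uses wider words, memcmp order would not match numeric word order on little-endian machines, so treat this only as an outline of the idea.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the library's packed storage; the real
// dna_string layout may differ.
using packed_bytes = std::vector<unsigned char>;

// Three-way compare in a single pass over the data: returns a negative
// value, zero, or a positive value, following lexicographic byte order.
inline int compare3(const packed_bytes& lhs, const packed_bytes& rhs) {
    const std::size_t common = std::min(lhs.size(), rhs.size());
    if (common != 0) {
        const int r = std::memcmp(lhs.data(), rhs.data(), common);
        if (r != 0) return r;
    }
    // The common prefix is equal, so the shorter sequence orders first.
    if (lhs.size() < rhs.size()) return -1;
    if (lhs.size() > rhs.size()) return 1;
    return 0;
}

// Replaces the two-pass `values < rhs.values || values == rhs.values`.
inline bool is_less_or_equal(const packed_bytes& lhs, const packed_bytes& rhs) {
    return compare3(lhs, rhs) <= 0;
}

// Sizes are checked first, so sequences of different length never
// touch the data at all.
inline bool is_equal(const packed_bytes& lhs, const packed_bytes& rhs) {
    return lhs.size() == rhs.size() &&
           (lhs.empty() ||
            std::memcmp(lhs.data(), rhs.data(), lhs.size()) == 0);
}
```

The remaining relational operators can be written the same way on top of `compare3`, so each comparison walks the data at most once.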