On May 12, 2013, at 5:44 AM, "Vicente J. Botet Escriba" wrote:
After making validation *constexpr* I'm getting better results, but there is still a difference.
checked: volatile ymd_date dt = ymd_date(year(y), month(m, check), day(d,check), check);
unchecked: volatile ymd_date dt = ymd_date(year(y), month(m), day(d));
clang 3.2
checked ymd 1.75081
unchecked ymd 0.0923393
ENCODE empty 0.0895197
gcc-4.8.0
checked ymd 1.42277
unchecked ymd 0.116654
ENCODE empty 0.113943
I haven't tracked this down yet. But here is what I have done:
I changed my test to look like this:
    volatile int k;
    constexpr int count = 100000000;
    std::mt19937 eng;
    std::uniform_int_distribution<> get_year(-30000, 30000);
    std::uniform_int_distribution<> get_month(1, 12);
    auto t0 = std::chrono::high_resolution_clock::now();
    for (int c = 0; c < count; ++c)
    {
#ifdef cy
        constexpr year y(2013);
#else
        const year y(get_year(eng));
#endif
#ifdef cm
        constexpr month m(5);
#else
        const month m(get_month(eng));
#endif
#ifdef cd
        constexpr day d(12);
#else
#  if defined(cy) && defined(cm)
        constexpr year_month ym(y, m);
#  else
        const year_month ym(y, m);
#  endif
        std::uniform_int_distribution<> get_day(1, ym.days_in_month());
        const day d(get_day(eng));
#endif
        ymd_date ymd{y, m, d};
        k = days_from(ymd.year(), ymd.month(), ymd.day());
    }
    auto t1 = std::chrono::high_resolution_clock::now();
    // Assumed: fractional nanoseconds, to match the ns figures reported below.
    typedef std::chrono::duration<double, std::nano> sec;
    auto encode = t1 - t0;
    std::cout << sec(encode).count() / count << '\n';
I also put a macro in my validation checking to turn it on and off. And I tested the "empty" case with validation off and:
    k = days_from_empty(ymd.year(), ymd.month(), ymd.day());
                 ^^^^^^
which is a function in another translation unit that does nothing but return 0.
I decided I had a better chance at avoiding unwanted optimizations by using uniform_int_distribution to pick my ymd triples instead of looping through them in a predictable manner. This greatly raises the cost of my "empty" test, but that cost gets subsequently subtracted back out.
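The subtraction described above can be sketched roughly as follows. This is only an illustration, not the harness I actually ran: ns_per_iteration and net_cost_ns are hypothetical names, and std::chrono::steady_clock is used here in place of the high_resolution_clock in the test.

```cpp
#include <chrono>

// Hypothetical helper: average nanoseconds per call of f(i) over `count` iterations.
template <class F>
double ns_per_iteration(F f, int count)
{
    volatile int sink = 0;  // volatile store keeps the calls from being optimized away
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i)
        sink = f(i);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / count;
}

// Hypothetical helper: the measured time minus a baseline that carries the same
// loop and random-number overhead (for example the "empty" run).
inline double net_cost_ns(double measured_ns, double baseline_ns)
{
    return measured_ns - baseline_ns;
}
```

Because the same overhead sits in both runs, it cancels in the difference, which is why an expensive "empty" loop is tolerable.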
I ran the test with all combinations of -Dcy -Dcm and -Dcd (each either defined or not, 8 combinations in all).
I changed my year to store a short, and my month and day to store an unsigned char. This brings the ymd_date object down to 32 bits on my platform.
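A minimal sketch of such a layout, assuming hypothetical member names (this is not the actual ymd_date class). A short plus two unsigned chars packs into 32 bits on the common platforms I know of, though the standard does not guarantee it.

```cpp
#include <climits>

// Hypothetical stand-in for ymd_date; the member names are illustrative only.
struct packed_ymd
{
    short         y;  // year: the -30000 .. 30000 test range fits in 16 bits
    unsigned char m;  // month: 1 .. 12
    unsigned char d;  // day: 1 .. 31
};

// Holds on mainstream ABIs (16-bit short, 8-bit char); not guaranteed by the standard.
static_assert(sizeof(packed_ymd) * CHAR_BIT == 32, "expected a 32-bit date object");
```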
Unlike Vicente's test, I'm doing a field->serial conversion in this test. I'm doing this to model the "Creators Benchmark" from N3344.
I expected validation to be most expensive when none of -Dcy, -Dcm and -Dcd are defined, but that isn't what I measured. I am, however, still getting a sub-ns cost of validation, which is a small percentage of the cost of the field->serial conversion.
I've tried to eliminate variability from my test results:
I run each test 4 times, and then take the average of the fastest 3 of the 4 as the test time. I'm seeing that this noticeably reduces the variability in the timing results. I'm running on a Core i7 2.5GHz with nothing else running (no mail, no chat, no browser etc.).
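The "fastest 3 of 4" reduction could be written along these lines. This is a sketch with a hypothetical helper name, average_of_fastest, not the script I used; it assumes 0 < keep <= runs.size().

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Average of the fastest `keep` timings in `runs` (e.g. the fastest 3 of 4).
// Hypothetical helper; assumes 0 < keep <= runs.size().
inline double average_of_fastest(std::vector<double> runs, std::size_t keep)
{
    std::sort(runs.begin(), runs.end());  // smallest (fastest) times first
    return std::accumulate(runs.begin(), runs.begin() + keep, 0.0) / keep;
}
```

With made-up timings of 6.4, 6.5, 6.6 and 9.0, the slow outlier 9.0 is dropped and the other three are averaged.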
Here are my current results for all 8 tests in units of ns, for the cost of the validation:
-Ucy -Ucm -Ucd 0.2ns
-Dcy -Ucm -Ucd 0.6ns
-Ucy -Ucm -Dcd 0.3ns
-Dcy -Ucm -Dcd 0.4ns
-Ucy -Dcm -Ucd 0.2ns
-Dcy -Dcm -Ucd 0.2ns
-Ucy -Dcm -Dcd 0.0ns
-Dcy -Dcm -Dcd 0.0ns
For the -Ucy -Ucm -Ucd case this is a 3.9% hit: 6.6ns vs 6.4ns. The cost goes up to 10.5% for the -Dcy -Ucm -Ucd case: 5.9ns vs 5.3ns.
There are still mysteries I'm trying to unravel in these timing tests. For example, I don't know why I'm measuring a higher cost of validation for the -Dcy -Ucm -Ucd case than for the -Ucy -Ucm -Ucd case. I would have expected these two cases to have the same cost of validation (since I'm doing no validation work for year). It is difficult to get accurate timings for events as brief as validation. On this machine the difference between 0.6ns and 0.2ns is exactly 1 clock cycle.
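The one-cycle remark is just arithmetic on the 2.5GHz clock rate. A small sketch (cycle_ns, one_cycle and gap_in_cycles are names made up for the illustration):

```cpp
// Duration of one clock cycle in nanoseconds, for a clock rate given in GHz.
constexpr double cycle_ns(double ghz) { return 1.0 / ghz; }

// 2.5 GHz is the clock rate stated above; 0.6 ns and 0.2 ns are the two measured costs.
constexpr double one_cycle     = cycle_ns(2.5);              // 0.4 ns per cycle
constexpr double gap_in_cycles = (0.6 - 0.2) / one_cycle;    // about 1 cycle
```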
However what I am seeing is somewhat consistent with the Creators Benchmark from N3344 for the YMD_4 / Linux test case.
And if you average each pair in the above list (on the theory that a constexpr year makes little difference in validation), then you do see something closer to what I expected. Though I'm not positive this isn't just squinting until the results come out the way I expected:
-Ucm -Ucd 0.40ns
-Ucm -Dcd 0.35ns
-Dcm -Ucd 0.20ns
-Dcm -Dcd 0.00ns
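For reference, the pairing above can be reproduced from the eight-row table like this. It is a sketch only: pair_cost and average are hypothetical names, and the numbers are copied from the table.

```cpp
// Validation cost in ns for one (cm, cd) setting, without and with -Dcy.
struct pair_cost { double without_cy; double with_cy; };

// Mean of the -Ucy and -Dcy measurements for the same (cm, cd) setting.
constexpr double average(pair_cost p) { return (p.without_cy + p.with_cy) / 2; }

constexpr pair_cost ucm_ucd = {0.2, 0.6};  // -Ucm -Ucd
constexpr pair_cost ucm_dcd = {0.3, 0.4};  // -Ucm -Dcd
constexpr pair_cost dcm_ucd = {0.2, 0.2};  // -Dcm -Ucd
constexpr pair_cost dcm_dcd = {0.0, 0.0};  // -Dcm -Dcd
```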
Howard