On 12/05/13 00:22, Howard Hinnant wrote:
On May 10, 2013, at 1:45 PM, "Vicente J. Botet Escriba"
wrote:

When I add validation on the source date format I get

clang 3.2
* empty field->serial ~6.3ns.
* field->serial ~13.4ns.
* empty serial->field ~1ns.
* serial->field ~17.9ns.

gcc-4.8.0
* empty field->serial ~7.5ns.
* field->serial ~15.7ns.
* empty serial->field ~1ns.
* serial->field ~21.7ns.

I've been experimenting with adding validation today. I'm guessing that all of your validation is in a translation unit hidden from the testing loop. Is that correct?

Right.

I've been putting my validation in a header because I want to make it constexpr, and constexpr stuff has weak linkage. The motivation for making it constexpr is that for any part of the validation that involves compile-time information, the validation happens at compile time.

Agreed. I would like to make as many functions constexpr as possible, but the current limitations force one to write too many auxiliary functions. Anyway, I have started to move in this direction.

And my first experiments today involve putting some of the validation back into the unit specifiers, in contrast to the direction I was heading earlier.

Glad to see you coming back. I suspect that we will be forced to implement both to get real performance tests.

Specifically:
// invariants:
//    1 <= d_
class day
{
    int d_;

    static
    constexpr
    int __attribute__((__always_inline__))
    check_invariants(int d)
    {
        return 1 <= d ? d : throw bad_date{};
    }

Yeah. This is the kind of auxiliary function that is needed.

public:
    constexpr explicit __attribute__((__always_inline__))
    day(int d) : d_(check_invariants(d)) {}

    constexpr __attribute__((__always_inline__))
    operator int() const {return d_;}
};
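For anyone who wants to try the day class in isolation, here is a minimal self-contained sketch of it. Assumptions: bad_date is not defined in the thread, so here it is a hypothetical exception type derived from std::runtime_error, and the clang-specific __always_inline__ workaround is omitted. It shows the two faces of the same check: a constexpr day is validated at compile time, while a day built from a run-time value throws.

```cpp
#include <stdexcept>

// Hypothetical: bad_date is not defined in the thread; assumed here to be
// a simple exception type derived from std::runtime_error.
struct bad_date : std::runtime_error
{
    bad_date() : std::runtime_error("bad_date") {}
};

// invariants:
//    1 <= d_
// (The __always_inline__ attribute from the original is omitted here.)
class day
{
    int d_;

    // Returns d unchanged if it satisfies the invariant, otherwise throws.
    // A throw cannot be evaluated during constant evaluation, so an invalid
    // constexpr day is a compile-time error instead of a run-time one.
    static constexpr int check_invariants(int d)
    {
        return 1 <= d ? d : throw bad_date{};
    }

public:
    constexpr explicit day(int d) : d_(check_invariants(d)) {}
    constexpr operator int() const { return d_; }
};

// Validated entirely at compile time.
constexpr day fifth(5);
static_assert(fifth == 5, "day(5) is valid");

// constexpr day bad(0);  // ill-formed: would evaluate a throw at compile time
```

At run time, `int n = 0; day d(n);` throws bad_date, so the same class serves both compile-time and run-time inputs.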
Because of a bug in clang (http://llvm.org/bugs/show_bug.cgi?id=12848) I've had to mark everything with always_inline to get the compiler to optimize it properly. But once that's done, it optimizes nicely.

OK. I will add this attribute to my code.
Now the ymd_date (or whatever name) constructors can be carefully crafted not to re-validate information that is already known. For example, if the ymd_date constructor takes a month (not an int), then there is no need for it to re-validate the month at that point. The month is already known to be valid.
I've removed what I call "range checking", which means there is no validation on year.
Here is a partial implementation of what I'm testing for ymd_date:
<snip>
The class holds objects of type year, month and day instead of 3 ints (or whatever) so that the invariants of the individual components are not compromised when storing into, or returning from, the ymd_date (i.e. they don't have to undergo unnecessary re-validation).

As the sizeof such a ymd_date is 3 times the minimum needed (32 bits), shouldn't a ymd_date be passed by const reference instead of by value?
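On the sizeof question, one option is to shrink the storage rather than change the calling convention. The sketch below is a hypothetical compact_ymd (not Howard's class) that packs the three fields into 32 bits, assuming every year of interest fits in 16 bits, and it omits validation to keep the layout point clear. Whether three ints are actually expensive to pass depends on the ABI: on the x86-64 System V ABI a trivially copyable 12-byte object is normally passed in registers, while on Windows x64 objects whose size is not 1, 2, 4 or 8 bytes are passed through a hidden pointer.

```cpp
#include <cstdint>

// Hypothetical compact layout (not the ymd_date from the thread).
// Assumes every year of interest fits in std::int16_t (-32768..32767).
// No validation is performed here; only the storage size is illustrated.
class compact_ymd
{
    std::int16_t y_;
    std::uint8_t m_;  // 1..12
    std::uint8_t d_;  // 1..31
public:
    constexpr compact_ymd(int y, int m, int d)
        : y_(static_cast<std::int16_t>(y)),
          m_(static_cast<std::uint8_t>(m)),
          d_(static_cast<std::uint8_t>(d)) {}

    constexpr int year()  const { return y_; }
    constexpr int month() const { return m_; }
    constexpr int day()   const { return d_; }
};

// Trivially copyable and typically 4 bytes, so taking it by value
// costs about the same as taking a single int.
inline int day_of(compact_ymd x) { return x.day(); }
```

The trade-off is a narrower year range in exchange for a value type that is cheap to pass by value on every common ABI.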
The ymd_date validator taking year, month and day doesn't have to validate the month; it is known to be valid. It doesn't have to validate the year; there is nothing to validate. It only has to validate the day. And it doesn't need to check that day >= 1; the day constructor already took care of that.

The problem with this approach, as we discussed previously, is that the day and month constructors will still check for a valid range. constexpr objects help reduce the cost of the check, but don't avoid it in general.
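A self-contained sketch of that division of labour follows. It is a reconstruction under assumptions, not Howard's snipped code: bad_date, year, month and day are modelled on the day class earlier in the message, and last_day is a hypothetical helper name. The accessors year(), month() and day() match the `days_from(ymd.year(), ymd.month(), ymd.day())` call in the benchmark loop; inside ymd_date the type names are written as ::year etc. because a member function named year would otherwise change the meaning of the type name within the class.

```cpp
#include <stdexcept>

// Hypothetical reconstruction, modelled on the day class in the thread.
struct bad_date : std::runtime_error
{
    bad_date() : std::runtime_error("bad_date") {}
};

// No range checking on year, only a leap-year query.
class year
{
    int y_;
public:
    constexpr explicit year(int y) : y_(y) {}
    constexpr operator int() const { return y_; }
    constexpr bool is_leap() const
    {
        return y_ % 4 == 0 && (y_ % 100 != 0 || y_ % 400 == 0);
    }
};

// invariants: 1 <= m_ <= 12
class month
{
    int m_;
    static constexpr int check_invariants(int m)
    {
        return 1 <= m && m <= 12 ? m : throw bad_date{};
    }
public:
    constexpr explicit month(int m) : m_(check_invariants(m)) {}
    constexpr operator int() const { return m_; }
};

// invariants: 1 <= d_
class day
{
    int d_;
    static constexpr int check_invariants(int d)
    {
        return 1 <= d ? d : throw bad_date{};
    }
public:
    constexpr explicit day(int d) : d_(check_invariants(d)) {}
    constexpr operator int() const { return d_; }
};

// Hypothetical helper: days in m, given that m is already in [1, 12].
constexpr int last_day(year y, month m)
{
    return m == 2 ? (y.is_leap() ? 29 : 28)
         : (m == 4 || m == 6 || m == 9 || m == 11) ? 30 : 31;
}

class ymd_date
{
    ::year  y_;
    ::month m_;
    ::day   d_;

    // Only the day's upper bound is checked: year has nothing to validate,
    // and month and day >= 1 were validated by their own constructors.
    static constexpr ::day check_invariants(::year y, ::month m, ::day d)
    {
        return d <= last_day(y, m) ? d : throw bad_date{};
    }
public:
    constexpr ymd_date(::year y, ::month m, ::day d)
        : y_(y), m_(m), d_(check_invariants(y, m, d)) {}

    constexpr ::year  year()  const { return y_; }
    constexpr ::month month() const { return m_; }
    constexpr ::day   day()   const { return d_; }
};
```

When every argument is a constant, the whole construction, including the remaining day check, can be evaluated at compile time.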
My experiments looking at the assembly generated at -O3 show that if either the month or the day is a compile-time object, the validation code is reduced. For example, it is common for the day to be the first of the month, or perhaps the 5th, or any other fixed number <= 28. When this happens, and I construct:
ymd_date ymd(year(y), month(m), day(1));
I can see in the generated assembly that everything disappears except ensuring that 1 <= m <= 12. Similarly, when only the month is compile-time information, I'm seeing the constraint checking on d simplified, especially when the month is not February.
But even when all three unit specifiers are run time information, when I run this through a field->serial conversion:
const int Ymin = 1900;
const int Ymax = 2100;
volatile int k;
int count = 0;
auto t0 = std::chrono::high_resolution_clock::now();
for (int y = Ymin; y <= Ymax; ++y)
{
    for (int m = 1; m <= 12; ++m)
    {
        int last = days_in_month(y, m);
        for (int d = 1; d <= last; ++d)
        {
            ymd_date ymd{year(y), month(m), day(d)};
            k = days_from(ymd.year(), ymd.month(), ymd.day());
            ++count;
        }
    }
}
auto t1 = std::chrono::high_resolution_clock::now();
typedef std::chrono::duration<double> sec;
auto encode = t1 - t0;
std::cout << encode.count() / count << '\n';
std::cout << sec(encode).count() / count << '\n';

I'm seeing times that are only 0.1ns to 0.2ns slower. This information is preliminary; my optimizer may again be getting the best of me. But in this case, I don't believe I have the option of moving the validation out of the translation unit containing the test loop, since I believe it really must be constexpr to take advantage of common cases like:
ymd_date ymd{year(y), month(m), day(1)};

I will come back with my own results once I make my validation checks constexpr.

If the validation really is this cheap, this removes the motivation for the unchecked field types. And I currently don't see a motivation for a checked serial type. Only an unchecked serial type makes sense to me, since the only thing that can go wrong with it is for it to move out of range. And that range can easily be made ridiculously large (+/- tens of thousands of years, if not millions of years).

See my other post.

This style of validation checking has renewed my interest in the month_day type. A month_day can be created and validated once, and then ymd_date objects can be constructed repeatedly from a run-time year and the fixed month_day, with a cheaper validation check than with separate year, month and day components:
static
constexpr
day __attribute__((__always_inline__))
check_invariants(year y, month_day md)
{
    return md.month() != 2 || md.day() <= 28 || y.is_leap() ?
           md.day() : throw bad_date{};
}

constexpr __attribute__((__always_inline__))
ymd_date(year y, month_day md)
    : y_(y), m_(md.month()), d_(check_invariants(y_, md)) {}
And if the month_day happens to be constexpr, and the day happens to be <= 28 or the month happens not to be February, this validation completely disappears at compile time. This is possible because the month_day constructor has already performed the other parts of the validation, non-redundantly (and at compile time if the month_day is constexpr).
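To make the split concrete, here is a self-contained sketch of one possible month_day under stated assumptions: the thread does not show month_day's definition, so its invariant (the day never exceeds the longest length the month can have, 29 for February) and the helper names max_days and check are hypothetical. With that invariant in place, the ymd_date check_invariants above only has to reject February 29 in non-leap years.

```cpp
#include <stdexcept>

// Hypothetical reconstruction for illustration only.
struct bad_date : std::runtime_error
{
    bad_date() : std::runtime_error("bad_date") {}
};

class year
{
    int y_;
public:
    constexpr explicit year(int y) : y_(y) {}
    constexpr bool is_leap() const
    { return y_ % 4 == 0 && (y_ % 100 != 0 || y_ % 400 == 0); }
};

class month  // invariants: 1 <= m_ <= 12
{
    int m_;
    static constexpr int check(int m)
    { return 1 <= m && m <= 12 ? m : throw bad_date{}; }
public:
    constexpr explicit month(int m) : m_(check(m)) {}
    constexpr operator int() const { return m_; }
};

class day  // invariants: 1 <= d_
{
    int d_;
    static constexpr int check(int d)
    { return 1 <= d ? d : throw bad_date{}; }
public:
    constexpr explicit day(int d) : d_(check(d)) {}
    constexpr operator int() const { return d_; }
};

// invariants: d_ <= longest length of m_ in any year (29 for February)
class month_day
{
    ::month m_;
    ::day   d_;
    static constexpr int max_days(::month m)
    { return m == 2 ? 29 : (m == 4 || m == 6 || m == 9 || m == 11) ? 30 : 31; }
    static constexpr ::day check(::month m, ::day d)
    { return d <= max_days(m) ? d : throw bad_date{}; }
public:
    constexpr month_day(::month m, ::day d) : m_(m), d_(check(m, d)) {}
    constexpr ::month month() const { return m_; }
    constexpr ::day   day()   const { return d_; }
};

class ymd_date
{
    ::year  y_;
    ::month m_;
    ::day   d_;
    // Only February 29 depends on the year; month_day checked the rest.
    static constexpr ::day check_invariants(::year y, month_day md)
    {
        return md.month() != 2 || md.day() <= 28 || y.is_leap()
                   ? md.day() : throw bad_date{};
    }
public:
    constexpr ymd_date(::year y, month_day md)
        : y_(y), m_(md.month()), d_(check_invariants(y_, md)) {}
    constexpr ::year  year()  const { return y_; }
    constexpr ::month month() const { return m_; }
    constexpr ::day   day()   const { return d_; }
};
```

A constexpr month_day such as `month_day(month(2), day(29))` can then be reused across many run-time years, paying only the leap-year test each time.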
Yes, validated month_days could greatly reduce the validation cost of a checked ymd_date. If the number of days in a year_month can be known at compile time (via constexpr), the constructor from year_month + day could also be improved.

I suspect that we need to implement both checked and unchecked dates; users are asking for date construction without any checks. I will add a test of field construction with and without validation to see the cost.

Best,
Vicente
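The proposed test of field construction with and without validation could look roughly like the sketch below. It is only an approximation of the loop in the thread: to stay short it works on plain ints rather than the year/month/day types, it uses steady_clock instead of high_resolution_clock, and it substitutes the widely used days_from_civil algorithm for the thread's unshown days_from(). The names checked_days_from, unchecked_days_from and ns_per_conversion are hypothetical. It needs C++14 for the multi-statement constexpr function.

```cpp
#include <chrono>
#include <ratio>
#include <stdexcept>

struct bad_date : std::runtime_error
{
    bad_date() : std::runtime_error("bad_date") {}
};

constexpr bool is_leap(int y)
{
    return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
}

constexpr int days_in_month(int y, int m)
{
    return m == 2 ? (is_leap(y) ? 29 : 28)
         : (m == 4 || m == 6 || m == 9 || m == 11) ? 30 : 31;
}

// Stand-in for days_from(): days since 1970-01-01 in the proleptic
// Gregorian calendar (the days_from_civil algorithm).
constexpr int days_from_civil(int y, unsigned m, unsigned d)
{
    y -= m <= 2;
    const int era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + static_cast<int>(doe) - 719468;
}

// Checked conversion: validates all three fields first.
inline int checked_days_from(int y, int m, int d)
{
    if (m < 1 || m > 12 || d < 1 || d > days_in_month(y, m))
        throw bad_date{};
    return days_from_civil(y, static_cast<unsigned>(m), static_cast<unsigned>(d));
}

// Unchecked conversion: trusts the caller.
inline int unchecked_days_from(int y, int m, int d)
{
    return days_from_civil(y, static_cast<unsigned>(m), static_cast<unsigned>(d));
}

// Runs the field->serial loop over [ymin, ymax] and returns the mean time
// per conversion in nanoseconds; count receives the number of dates.
template <class F>
double ns_per_conversion(F f, int ymin, int ymax, long& count)
{
    volatile int k = 0;  // keeps the results from being optimized away
    count = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (int y = ymin; y <= ymax; ++y)
        for (int m = 1; m <= 12; ++m)
        {
            const int last = days_in_month(y, m);
            for (int d = 1; d <= last; ++d)
            {
                k = f(y, m, d);
                ++count;
            }
        }
    auto t1 = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = t1 - t0;
    return count > 0 ? elapsed.count() / count : 0.0;
}
```

Running ns_per_conversion once with a lambda wrapping checked_days_from and once with one wrapping unchecked_days_from, over 1900 to 2100, gives the per-date difference; lambdas are easier for the optimizer to inline than function pointers. As noted earlier in the thread, the optimizer can still distort such micro-benchmarks, so the generated assembly is worth checking.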