[chrono/date] Performance goals and design summary
Hi,

After the discussion on other threads I would like to summarize some of my performance goals while refactoring the H.H chrono/date library, just to see whether I am going in the wrong direction:

1. *Be able to build dates without validating them.*
2. *The library should be as efficient as possible when the user uses constant objects or literals: there is no need to validate at run time what has already been validated by the compiler.*
3. *Possibility to reassign a date if the parameters represent a valid date.*
4. *No single date class is better than the others on all the algorithms.*
5. *No additional meta-data in absolute dates, so that you don't pay for the relative-date overhead when using absolute dates.*
6. *The user must be able to get all the date-related information at once.*

Do you have others?

The way I have addressed these goals is:

1. Provide an unchecked date construction interface with a no_check_t parameter and an is_valid() function.
2. The date parts day/month/year need to be validated or constructed with no_check_t.
3. dt.set_if_valid(year, month, day)
4. Define the Date requirements that all the dates must satisfy; implement ymd_date, days_date, ...
5. Separate the absolute and the relative Date requirements into two Date concepts, AbsoluteDate and RelativeDate. Provide at least a relative_date class based on the ymd representation.
6. Conversion between the different concrete date classes.

The H.H approach responds to these goals in a different way:

1. date and its parts provide constructors that are not validated; the validation is provided by factories such as operator/.
2. I don't think this can be achieved when the date must be validated.
3. No problem, this could be added to the H.H approach (at least for its YMD date).
4. Same approach. H.H proposes two classes: chrono::day_point (only unchecked) and a YMD date.
5. I'm not sure the YMD date class doesn't include meta-data (Howard, could you clarify your point here?).
6. Same approach.
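To make the first three points concrete, here is a minimal sketch of a checked/unchecked split. The names no_check_t, is_valid() and set_if_valid() come from the message above; the class layout, the helpers is_leap, days_in_month and is_valid_ymd, and the choice of std::invalid_argument are assumptions made only for illustration, not the interface of either library under discussion.

```cpp
#include <cassert>
#include <stdexcept>

// Tag type selecting the unchecked constructor (name taken from the thread).
struct no_check_t {};
constexpr no_check_t no_check{};

constexpr bool is_leap(int y) {
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

constexpr int days_in_month(int y, int m) {
    return m == 2 ? (is_leap(y) ? 29 : 28)
         : (m == 4 || m == 6 || m == 9 || m == 11) ? 30 : 31;
}

constexpr bool is_valid_ymd(int y, int m, int d) {
    return m >= 1 && m <= 12 && d >= 1 && d <= days_in_month(y, m);
}

// Hypothetical field-based date, for illustration only.
class ymd_date {
    int y_, m_, d_;
public:
    // Checked construction: pays for validation and throws on bad input.
    ymd_date(int y, int m, int d) : y_(y), m_(m), d_(d) {
        if (!is_valid_ymd(y, m, d)) throw std::invalid_argument("bad date");
    }
    // Unchecked construction: the caller vouches for the values.
    constexpr ymd_date(int y, int m, int d, no_check_t) : y_(y), m_(m), d_(d) {}

    constexpr bool is_valid() const { return is_valid_ymd(y_, m_, d_); }

    // Reassign only when the fields form a valid date; report success.
    bool set_if_valid(int y, int m, int d) {
        if (!is_valid_ymd(y, m, d)) return false;
        y_ = y; m_ = m; d_ = d;
        return true;
    }

    constexpr int year() const { return y_; }
    constexpr int month() const { return m_; }
    constexpr int day() const { return d_; }
};
```

With this shape, a date built with no_check costs three stores, and a failed set_if_valid() leaves the previous value untouched.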
I have some questions, each followed by my personal answer:

* *Do you see some unnecessary goals? In particular, is goal 2 necessary?*
  IMO, yes. Not providing an interface that allows compiler optimizations is against the C++ philosophy. The question is, as always, what would be the gain? In the best case we could avoid 6 comparisons.
* *Does the introduction of no_check to achieve goals 1 and 2 result in an uglier date interface?*
  IMO, yes, but I don't know how to achieve goal 2 without it.
* *Should a concrete date class provide just the functionality it is able to implement efficiently?*
  IMO, yes. This would keep the user away from inefficient services. (Note that my current design doesn't follow this.)
* *Do we need a Date concept that a concrete date must model?*
  IMO, yes. So we need two levels: level one provides the most efficient services, level two provides a homogeneous interface.
* *Do we need to make the separation between absolute and relative dates?*
  IMO, yes. While relative dates are powerful, they incur unwanted overhead when absolute dates are needed.

Best,
Vicente
On May 4, 2013, at 9:56 AM, Vicente J. Botet Escriba wrote:
> 5. I'm not sure the YMD date class doesn't include meta-data (Howard, could you clarify your point here?)
I've left that question open. However if I had to make a decision today, I would say definitely not. That aspect of my 2011 paper was almost universally disliked. There are more ways to explore for getting the same or similar functionality. Or simply dropping that functionality is a consideration.
> * *Do we need to make the separation between absolute and relative dates?* IMO, yes. While relative dates are powerful, they incur unwanted overhead when absolute dates are needed.
I can guess what you mean by relative and absolute dates, but I'd rather not. Can you briefly describe what these types of dates are? Thanks, Howard
On 04/05/13 17:02, Howard Hinnant wrote:
> > * *Do we need to make the separation between absolute and relative dates?* IMO, yes. While relative dates are powerful, they incur unwanted overhead when absolute dates are needed.
>
> I can guess what you mean by relative and absolute dates, but I'd rather not. Can you briefly describe what these types of dates are?
The date class of your proposal is what I call a relative date, as it contains meta-data such as "last day of the month", "2nd Sunday of the month", ... Arithmetic on relative dates is often quite powerful. We could call them contextual dates, or a better name if you find one. What I call an absolute date is a date that has no meta-data; it defines exactly a date without a context.

28/feb/2013 is an absolute date, while last/feb/2013 is a relative/contextual date.

Taking one of the examples in your original proposal:

    // Print Feb. 28 for each year in the decade
    for (date d = feb/day(28)/2010, e = feb/day(28)/2020; d != e; d += years(1))
        std::cout << d << '\n';

    // Print the last day in Feb. for each year in the decade
    date e = feb/last/2020;
    for (rel_date d = feb/last/2010; d != e; d += years(1))
        std::cout << d << '\n';

rel_date implicitly converts to date of course, so that the comparison works as expected; the opposite is not true. So the following is valid:

    date dt = feb/last/2020;

It is also true that this implicit conversion could result in surprising behavior (as usual):

    // Prints Feb. 28, and not the last day in Feb., for each year in the decade
    for (date d = feb/last/2010, e = feb/last/2020; d != e; d += years(1))
        std::cout << d << '\n';

Maybe this implicit conversion should be explicit.

In your proposal, day arithmetic didn't satisfy the following:

    assert((aug/last/2011 - day(1) + day(1)) == aug/last/2011);

IMO this was because there was only one date class, which needed to store both the absolute date and the meta-data, and using more meta-data on a general date class was not an option. If we have two separate classes, the relative_date could contain more meta-data and so be able to satisfy the preceding assertion. Of course this is only a possibility, and we don't need to have a design that preserves the assertion.

There is a last point about relative/contextual dates: how relative dates compare. IMO, the meta-data should be part of the comparison.
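The difference between the two kinds of date can be sketched with two tiny types. Everything here is hypothetical and chosen for illustration: the struct names date and last_day_of_month, and the free functions add_years and to_date, are not part of either proposal. The point is only that the contextual value re-evaluates "last" after year arithmetic, while the absolute value keeps its day number.

```cpp
#include <cassert>

constexpr bool is_leap(int y) {
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

constexpr int days_in_month(int y, int m) {
    return m == 2 ? (is_leap(y) ? 29 : 28)
         : (m == 4 || m == 6 || m == 9 || m == 11) ? 30 : 31;
}

// Absolute date: plain fields, no context (hypothetical type).
struct date {
    int year, month, day;
};

inline bool operator==(const date& a, const date& b) {
    return a.year == b.year && a.month == b.month && a.day == b.day;
}

// Year arithmetic on an absolute date keeps the day number as it is.
// The caller is responsible for the result being a valid date.
inline date add_years(date d, int n) {
    d.year += n;
    return d;
}

// Contextual date: stores "the last day of this month", not a day number.
struct last_day_of_month {
    int year, month;
};

// Year arithmetic on the contextual date keeps the context.
inline last_day_of_month add_years(last_day_of_month d, int n) {
    d.year += n;
    return d;
}

// Conversion to an absolute date resolves the context and then drops it.
inline date to_date(last_day_of_month d) {
    return date{d.year, d.month, days_in_month(d.year, d.month)};
}
```

Starting from the end of February 2011, adding one year to the absolute date gives 28 February 2012, while adding one year to the contextual date and then converting gives 29 February 2012.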
IMO, a date library should provide both, with no additional performance cost when working with absolute dates. Best, Vicente
> Maybe this implicit conversion should be explicit.
I would suggest we stick to contextual dates. Relative dates are a different concept, one that can be handled competently by the already existing chrono library.

So, as per the contextual dates, this actually depends on how much staying power we give to the word 'last'. I think the 'last' attribute ends when the date is made. As you rightly say, this should be made explicit. I have no other suggestion than that we adapt the make_date() syntax for this use-case scenario too.

And on a tangential note, I eat my old words. In this same thread, I said that we should make year_month implementation defined. Now, I don't think so. Let me give an example.

If we want to enumerate the last dates from {jul, 2013} to {jan, 2014}, the current API (names may change; what matters is functionality) allows us to write:

    for (date d = date(2013, aug, 1), e = date(2014, feb, 1); d <= e; d += month(1))
    {
        std::cout << d - day(1) << '\n';
    }

I wrote it like this because this was the cleanest I could manage. The comparison between months of different years is clumsy. We do need a year_month class for it. If there were one, it would look like:

    for (year_month ym(2013, jul), jan14(2014, jan); ym <= jan14; ym += month(1))
    {
        std::cout << date(ym, last);
    }

Suddenly I understand Howard's desire to include it. And I agree now. It just makes the whole syntax easier.

This leads me to my next question: are there uses for a similar class month_day?
> In your proposal, day arithmetic didn't satisfy the following
>
>     assert((aug/last/2011 - day(1) + day(1)) == aug/last/2011);

How come it does not? The fact that the 'last' attribute gives us an absolute date guarantees it. Am I missing something?

Regards,
Anurag.
On May 4, 2013, at 6:05 PM, Anurag Kalia wrote:
> I wrote it like this because this was the cleanest I could manage. The comparison between months of different years is clumsy. We do need a year_month class for it. If there were one, it would look like:
>
>     for (year_month ym(2013, jul), jan14(2014, jan); ym <= jan14; ym += month(1)) { std::cout << date(ym, last); }
>
> Suddenly I understand Howard's desire to include it. And I agree now. It just makes the whole syntax easier.
The nice thing about year and month arithmetic on a ym type is that it gets rid of the ambiguous semantics of month and year arithmetic, especially if the month and year arithmetic is removed from the ymd type. The burden on defining the semantics of month and year arithmetic when days are involved now falls explicitly on the shoulders of the client (just as you demonstrate above).
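As a rough illustration of why month arithmetic is unambiguous on a ym type, here is a small sketch. The struct year_month with plain int fields, add_months, and last_days are hypothetical names invented for this example; neither proposal defines them this way. Because there is no day field, adding a month never has to decide what to do with a day that does not exist in the target month.

```cpp
#include <cassert>
#include <vector>

constexpr bool is_leap(int y) {
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

constexpr int days_in_month(int y, int m) {
    return m == 2 ? (is_leap(y) ? 29 : 28)
         : (m == 4 || m == 6 || m == 9 || m == 11) ? 30 : 31;
}

// Hypothetical year/month value: month is in [1, 12], there is no day.
struct year_month {
    int year;
    int month;
};

inline bool operator<=(year_month a, year_month b) {
    return a.year < b.year || (a.year == b.year && a.month <= b.month);
}

// Adds n months (n may be negative) by counting months linearly.
inline year_month add_months(year_month ym, int n) {
    const int total = ym.year * 12 + (ym.month - 1) + n;
    // Floor division, so that negative totals also map to month 1..12.
    const int y = total >= 0 ? total / 12 : -((-total + 11) / 12);
    return year_month{y, total - y * 12 + 1};
}

// Collects the last day-of-month number for every month in [first, last].
// The client decides explicitly which day to attach to each year_month.
inline std::vector<int> last_days(year_month first, year_month last) {
    std::vector<int> out;
    for (year_month ym = first; ym <= last; ym = add_months(ym, 1))
        out.push_back(days_in_month(ym.year, ym.month));
    return out;
}
```

Calling last_days(year_month{2013, 7}, year_month{2014, 1}) walks the seven months from July 2013 to January 2014, which is the enumeration discussed above.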
> This leads me to my next question: are there uses for a similar class month_day?

I don't see as strong a case for the md type. It could be a shortcut. I even demonstrate such a use in my 2011 paper:

    month_day jan4 = jan/_4th;

but it isn't as compelling.
> > In your proposal, day arithmetic didn't satisfy the following
> >
> >     assert((aug/last/2011 - day(1) + day(1)) == aug/last/2011);
>
> How come it does not? The fact that the 'last' attribute gives us an absolute date guarantees it. Am I missing something?
I believe what Vicente was referring to is the meta data that was stored by my 2011 proposal. Day arithmetic would cause the date to forget that it was constructed with last. Though it would still compare equal to a date constructed with last as long as the two dates had the same serial value. Howard
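The behavior described here can be sketched as follows. This is not the code of the 2011 proposal; the class name, the from_last flag and the use of a plain long as the serial value are assumptions made only to illustrate "day arithmetic forgets the context, equality looks at the serial value".

```cpp
#include <cassert>

// Hypothetical single date class holding a serial day count plus a flag
// that remembers whether it was constructed from "last".
class date {
    long serial_;     // days since some epoch
    bool from_last_;  // context remembered from construction
public:
    explicit date(long serial, bool from_last = false)
        : serial_(serial), from_last_(from_last) {}

    // Day arithmetic moves the serial value and forgets the context.
    date& operator+=(long days) {
        serial_ += days;
        from_last_ = false;
        return *this;
    }
    date& operator-=(long days) { return *this += -days; }

    bool from_last() const { return from_last_; }

    // Equality ignores the context and compares serial values only.
    friend bool operator==(const date& a, const date& b) {
        return a.serial_ == b.serial_;
    }
};
```

Under these assumptions, subtracting one day and adding it back yields a value that compares equal to the original but no longer reports from_last(), so later month or year arithmetic on it would behave differently.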
On 05/05/13 00:05, Anurag Kalia wrote:

> > Maybe this implicit conversion should be explicit.
>
> I would suggest we stick to contextual dates. Relative dates are a different concept, one that can be handled competently by the already existing chrono library.

Agreed. I will use "contextual dates" from now on, until someone finds a better name.

> So, as per the contextual dates, this actually depends on how much staying power we give to the word 'last'. I think the 'last' attribute ends when the date is made.

In Howard's implementation the 'last' attribute is lost after day arithmetic. Is this what you meant? What I want is a contextual date that always keeps its context, independently of year/month or day arithmetic. And of course, when a contextual date is converted to a (non-contextual) date, the context is lost.

> As you rightly say, this should be made explicit.

Well, it is up to us to decide. For the time being I accept the drawback of the implicit conversion. I can change it after more experimentation.

> I have no other suggestion than that we adapt the make_date() syntax for this use-case scenario too.

The make_date and operator/ factories should work for both contextual and non-contextual dates.
>     for (year_month ym(2013, jul), jan14(2014, jan); ym <= jan14; ym += month(1)) { std::cout << date(ym, last); }

I had never thought about doing it this way, using year_month. I like it.

There is a single point that I intended to raise: would a non-contextual date accept a contextual parameter as the day of the month, or should we use the factory

    std::cout << make_date(ym, last);

But how should a contextual date be output? Hmm, I really don't know yet. So, to be sure, we would need an explicit conversion

    std::cout << date(make_date(ym, last));

This is not very nice. I think it is useful to build a non-contextual date from a context that is lost after construction. So

    date dt(y, m, last);
    date dt(y, m, monday[_3rd]);

should be correct but not contextual.

> Suddenly I understand Howard's desire to include it.

What does 'it' stand for here?

> And I agree now. It just makes the whole syntax easier. This leads me to my next question: are there uses for a similar class month_day?

I have a class month_day, but I didn't implement month/year arithmetic on it. I guess this could be useful.

> > In your proposal, day arithmetic didn't satisfy the following
> >
> >     assert((aug/last/2011 - day(1) + day(1)) == aug/last/2011);
>
> How come it does not? The fact that the 'last' attribute gives us an absolute date guarantees it. Am I missing something?

See Howard's proposal; this is explained there. The reason is that day arithmetic on Howard's date class loses its meta-data context.

Best,
Vicente
Fwiw, I went to http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3344.pdf to experiment with the "Modified Following" benchmark which has some code shown for it.

I wanted to see what it would look like with two date types: field and serial, instead of just one (date) which has been implemented as both.

Here is the original code from N3344:

    bool
    isNonBusinessDay(const Date& targetDate, const Date& startDate, const int* calendar)
    {
        int offset = targetDate - startDate;
        int wordIndex = offset / 8 / sizeof(int);
        int bitIndex = offset - wordIndex * sizeof(int);
        return (1 == (calendar[wordIndex] & (1 << bitIndex)));
    }

    Date
    modifiedFollowing(const Date& targetDate, const Date& startDate, const int* calendar)
    {
        Date date(targetDate);
        while (isNonBusinessDay(date, startDate, calendar))
            ++date;
        if (targetDate.month() == date.month())
            return date;
        date = targetDate;
        do {
            --date;
        } while (isNonBusinessDay(date, startDate, calendar));
        return date;
    }

For purposes of discussion, assume we now have two date types:

1. Serial type named day_point, which is a chrono::time_point.
2. Field type named ymd (not a great name, just trying to differentiate it).

All isNonBusinessDay does is subtract two Dates. This is clearly the domain of a serial type:

    bool
    isNonBusinessDay(const day_point& targetDate, const day_point& startDate, const int* calendar)
    {
        days offset = targetDate - startDate;
        days wordIndex = offset / (8 * sizeof(int));
        days bitIndex = offset - wordIndex * sizeof(int);
        return (1 == (calendar[wordIndex.count()] & (1 << bitIndex.count())));
    }

The modifications are trivial and expense is not compromised. The code is perhaps a little more type safe, utilizing the days units, but nothing spectacular. I would expect the exact same assembly to be generated.

modifiedFollowing is more interesting. If it takes two day_points (two serial dates), it might look like this:

    day_point
    modifiedFollowing(const day_point& targetDate, const day_point& startDate, const int* calendar)
    {
        day_point date(targetDate);
        while (isNonBusinessDay(date, startDate, calendar))
            ++date;
        if (ymd(targetDate).month() == ymd(date).month())  // serial->field twice
            return date;
        date = targetDate;
        do {
            --date;
        } while (isNonBusinessDay(date, startDate, calendar));
        return date;
    }

The only difference is the need to convert the serial dates to a ymd type so that the month can be extracted. This is done exactly twice, on the commented line. Otherwise the code is remarkably similar and, I would argue, has the exact same efficiency.

<disclaimer> std::chrono::time_point is currently missing operator-- and operator++. I view this as a defect that should be corrected. </disclaimer>

One could explore passing in targetDate as a ymd type instead:

    day_point
    modifiedFollowing(const ymd& targetDate, const day_point& startDate, const int* calendar)
    {
        day_point date(targetDate);  // field->serial
        day_point sdate = date;
        while (isNonBusinessDay(date, startDate, calendar))
            ++date;
        if (targetDate.month() == ymd(date).month())  // serial->field
            return date;
        date = sdate;
        do {
            --date;
        } while (isNonBusinessDay(date, startDate, calendar));
        return date;
    }

This rewrite trades one serial->field conversion for one field->serial conversion. It might be a win if the client actually has a ymd already for input, as my measurements are showing that serial->field conversions are more expensive than field->serial. My measurements are raw and new, so I could be off on that. And no doubt such a measurement is going to depend upon things like hardware and algorithms (caches).

But the main point is that having two date types is not disruptive, and mainly serves to give the client more options in optimizing his date algorithms.

Howard
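For readers who want to experiment with the calendar lookup on its own, here is a self-contained sketch of a bitmap test driven by a serial day offset. It is not the N3344 code: the function names are hypothetical, the words are unsigned, the bit index is taken modulo the number of bits per word, and the test is "bit is set" rather than a comparison with 1. Those choices are assumptions of this sketch.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Number of bits stored in one calendar word.
constexpr std::size_t bits_per_word = CHAR_BIT * sizeof(unsigned);

// Returns true when the bit for the day lying `offset` days after the
// calendar's start date is set. `offset` must be non-negative and must
// fall inside the bitmap; no bounds checking is done here.
inline bool is_non_business_day(long offset, const unsigned* calendar) {
    const std::size_t index = static_cast<std::size_t>(offset);
    const std::size_t word = index / bits_per_word;
    const std::size_t bit = index % bits_per_word;
    return ((calendar[word] >> bit) & 1u) != 0;
}

// Marks the day lying `offset` days after the start date as non-business.
inline void mark_non_business_day(long offset, unsigned* calendar) {
    const std::size_t index = static_cast<std::size_t>(offset);
    calendar[index / bits_per_word] |= 1u << (index % bits_per_word);
}
```

With a serial date type, the offset is simply the difference of two day counts, so the lookup needs no field extraction at all.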
On 05/05/13 04:31, Howard Hinnant wrote:
I agree completely that we need several dates, and that the standard (the
library) must describe the performance provided by each one.
Here is the same function using two date classes, days_date and ymd_date,
with the expected interfaces.
    days_date
    modifiedFollowing(const ymd_date& targetDate,
                      const days_date& startDate,
                      const int* calendar)
    {
        days_date date(targetDate);  // field->serial
        days_date sdate = date;
        while (isNonBusinessDay(date, startDate, calendar))
            ++date;
        if (targetDate.month() == date.month())  // serial->field ++
            return date;
        date = sdate;
        do
        {
            --date;
        } while (isNonBusinessDay(date, startDate, calendar));
        return date;
    }

++ No need to convert explicitly
On 05/05/13 09:31, Vicente J. Botet Escriba wrote:
> I agree completely that we need several dates, and that the standard (the library) must describe the performance provided by each one.
<snip>

For the purpose of showing the Date concept I will use a template:

    template <class Date1, class Date2>
    days_date
    modifiedFollowing(const Date1& targetDate, const Date2& startDate, const int* calendar)
    {
        ymd_date ymdTargetDate(targetDate);  // serial->field or nothing (so that month() is efficient; we could also store just the month)
        days_date date(targetDate);          // field->serial or nothing, but at least there is a conversion (as we are doing day arithmetic)
        days_date sdate = date;
        while (isNonBusinessDay(date, startDate, calendar))
            ++date;
        if (ymdTargetDate.month() == date.month())  // serial->field ++ No need to convert explicitly
            return date;
        date = sdate;
        do {
            --date;
        } while (isNonBusinessDay(date, startDate, calendar));
        return date;
    }

As you can see, having two dates with the same interface does not only have drawbacks ;-) Providing the same interface and being convertible to each other helps the user.

One additional advantage of using date.month() rather than a specific conversion:

* class days_date could have a better algorithm than converting the days_date to a ymd_date, e.g. it can convert to an ordinal_date and use a table to get the month from is_leap() and day_of_year().

Who knows better how to make these operations efficient: the class days_date itself, or the user?

Best,
Vicente
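The table idea mentioned above (getting the month from is_leap() and day_of_year()) could look roughly like this. The table name and the two function names are hypothetical, and the linear scan over at most twelve entries is just one simple way to use such a table; a real implementation might index it differently.

```cpp
#include <cassert>

// Cumulative number of days before the end of each month.
// Row 0 is a common year, row 1 is a leap year; column 0 is a sentinel.
constexpr int days_through_month[2][13] = {
    {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365},
    {0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366}
};

// day_of_year is 1-based (1 == January 1) and must be valid for the year.
// Returns the month in [1, 12].
inline int month_from_day_of_year(bool leap, int day_of_year) {
    const int* t = days_through_month[leap ? 1 : 0];
    int m = 1;
    while (day_of_year > t[m])
        ++m;
    return m;
}

// Returns the day of the month in [1, 31] for the same inputs.
inline int day_of_month_from_day_of_year(bool leap, int day_of_year) {
    const int m = month_from_day_of_year(leap, day_of_year);
    return day_of_year - days_through_month[leap ? 1 : 0][m - 1];
}
```

Note that this step only starts once the year, its leap status and the day of the year are known; it says nothing about how expensive it is to obtain those from a serial value.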
On May 5, 2013, at 4:15 AM, "Vicente J. Botet Escriba" wrote:
> One additional advantage of using date.month() rather than a specific conversion:
> * class days_date could have a better algorithm than converting the days_date to a ymd_date, e.g. it can convert to an ordinal_date and use a table to get the month from is_leap() and day_of_year().
>
> Who knows better how to make these operations efficient: the class days_date itself, or the user?
This concerns me. The client of days_date is going to silently walk beyond the range where days_date.month() is efficient if it is implemented with a Bloomberg-style cache. The only answer from the vendor's point of view to make this less likely to happen is to make the cache bigger. The Bloomberg cache (64Kb) is already larger than some vendors can ship (myself included). And it only spans 56 years. The cache solution is great for a special purpose implementation (financial calculations on large computers connected to permanent power sources). But is unacceptable for a history application running on a ram-constrained mobile device. In the serial->field conversion formulas I'm currently using (which are different from those shown in my 2011 paper), by the time the ordinal_date (day_of_year) and is_leap are computed, the expensive part of the serial->field conversion is done. At that point computing the month and day_of_month is small and fast table lookups (just as you describe above). The only thing that could be eliminated by days_date.month(), without a Bloomberg-sized cache, is the table lookup which converts month and is_leap to day_of_month. Said differently, days_date.month() is 98% the expense of a serial->field conversion. Not a single division operation is eliminated. days_date.day() is 100% of the cost of serial->field conversion. days_date.year() does eliminate 1 out of 6 divisions (I'm not counting divisions by powers of 2) from a full serial->field conversion, and so maybe days_date.year() is 20% cheaper than a full conversion. I note that in my 2011 paper, to_year_and_doy() contains 5 divisions (not counting division by powers of 2), but the very first of those must be a 64 bit division (This is significantly more expensive than a 32 bit division on a 32 bit machine). In that implementation (DESIGN == 2), all of day(), month() and year() are about the same cost as a full serial->field conversion. to_year_and_doy() is nearly the entire expense. 
Putting month() (and day() and year()) on days_date (which I myself am guilty of in my 2011 paper) is directly analogous to giving std::forward_list a size() member, or giving std::list a size() member and saying: it /should/ have constant complexity, wink, wink. It tricks people into writing inefficient code.
Survey evidence of this phenomenon, written concerning list::size() in 2005 (long before C++11 changed "should" to "shall"):
http://home.roadrunner.com/~hinnant/On_list_size.html
Howard
On 05/05/13 18:39, Howard Hinnant wrote:
On May 5, 2013, at 4:15 AM, "Vicente J. Botet Escriba"
wrote: I agree completely that we need several dates and the standard (the library) must describe the performance provided by each one. <snip>
For the purpose of showing the Date concept I will use a template
template <class Date1, class Date2>
days_date modifiedFollowing(const Date1& targetDate, const Date2& startDate, const int* calendar)
{
    ymd_date ymdTargetDate(targetDate); // serial->field or nothing (so that month() is efficient; we could also store the month)
    days_date date(targetDate); // field->serial or nothing, but at least there is a conversion (as we are doing day arithmetic)
    days_date sdate = date;
    while (isNonBusinessDay(date, startDate, calendar))
        ++date;
    if (ymdTargetDate.month() == date.month()) // serial->field; no need to convert explicitly
        return date;
    date = sdate;
    do {
        --date;
    } while (isNonBusinessDay(date, startDate, calendar));
    return date;
}
As you can see, two dates with the same interface do not have only drawbacks ;-) Providing the same interface and being convertible to each other helps the user.
One additional advantage of using date.month() rather than a specific conversion: class days_date could have a better algorithm than converting the days_date to a ymd_date, e.g. it can convert to an ordinal_date and use a table to get the month from is_leap() and day_of_year().
I understand your concern and the parallel with list::size(). From the performance point of view, the operations provided would be only the ones that are efficient enough, and maybe this is the only part that can be standardized.
However, if I take your days_date year() example, the user must know that in order to get the year it is better to go through an ordinal_date instead of a ymd_date. I don't know how yet, but as an end-user I would like to get this 20% gain in a less intrusive way.
Maybe days_date shouldn't provide a year() function but could provide another function that does not guarantee the minimal performance.
year calculate_year();
I don't know what others think, but I would prefer to type
dd.calculate_year()
than
ordinal_date(dt).year()
The use of calculate_ could be a symptom of possible deficiencies of the algorithm.
Suggestions for a better name for the calculate_ approach are welcome.
Best, Vicente
On May 5, 2013, at 2:45 PM, "Vicente J. Botet Escriba"
However, if I take your days_date year() example, the user must know that in order to get the year it is better to go through an ordinal_date instead of a ymd_date. I don't know how yet, but as an end-user I would like to get this 20% gain in a less intrusive way.
Maybe days_date shouldn't provide a year() function but could provide another function that does not guarantee the minimal performance.
year calculate_year();
I don't know what others think, but I would prefer to type
dd.calculate_year()
than
ordinal_date(dt).year()
The use of calculate_ could be a symptom of possible deficiencies of the algorithm.
We're accustomed to converting among time_t, tm, etc. to get what we need. I don't find the explicit conversion necessarily a problem. However, the general idea that we should omit inefficient member functions so users don't pessimize their code smacks of nannyism. Profiling is the proper way to discover overhead in a program. If using day() or month() is inefficient, but has an insignificant effect on the overall program, then the user wins through ease of use. If the overhead is an issue, then the user can make the type conversion or add caching, or even do something algorithmically different, based upon the context, and might do better than any library magic you devise. IOW, I'm favoring similar interfaces with differing complexity requirements, leaving optimization to the user. ___ Rob (Sent from my portable computation engine)
On May 5, 2013, at 2:45 PM, Vicente J. Botet Escriba wrote:
Suggestions for a better name for the calculate_ approach are welcome.
One possibility (untested) would be to make calculate_year a namespace scope function with the following default implementation:
year calculate_year(ymd_field_type ymd) {return ymd.year();}
So if the serial date will implicitly convert to ymd_field_type, then you can say:
if (ymdTargetDate.month() == calculate_month(date))
And if an implementation thinks it can do better than that, then it can overload calculate_xxx(serial_type). This is very analogous to how we currently handle swap.
Agreed it would be nice to find a better (shorter) name, but one hasn't immediately come to mind.
Howard
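A minimal sketch of that namespace-scope idea, under stated assumptions: ymd_date and days_date here are hypothetical stand-ins for the field and serial types, plain int stands in for the year and month types, and split_serial is an invented helper. The conversion arithmetic is a commonly used days-to-civil computation based on a year starting on 1 March; it is not taken from this thread. The default calculate_ functions take the field type, so a serial date reaches them through its implicit conversion, while the calculate_year overload for the serial type skips the month and day work.

```cpp
#include <cassert>

// Hypothetical stand-in for the field (year/month/day) date type.
struct ymd_date {
    int y, m, d;
    int year() const { return y; }
    int month() const { return m; }
    int day() const { return d; }
};

// Invented helper: serial (days since 1970-01-01) -> year starting on 1 March,
// plus the day within that year (0 = 1 March).
struct march_year { long long year; unsigned doy; };

inline march_year split_serial(long long z) {
    z += 719468;                                                   // shift epoch to 0000-03-01
    const long long era = (z >= 0 ? z : z - 146096) / 146097;      // floor division by the 400-year cycle
    const unsigned doe = static_cast<unsigned>(z - era * 146097);  // [0, 146096]
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);  // [0, 365]
    return {static_cast<long long>(yoe) + era * 400, doy};
}

// Hypothetical stand-in for the serial date type; converts implicitly to ymd_date.
struct days_date {
    long long days;
    operator ymd_date() const {
        const march_year s = split_serial(days);
        const unsigned mp = (5 * s.doy + 2) / 153;                 // month index, 0 = March
        const unsigned d = s.doy - (153 * mp + 2) / 5 + 1;
        const unsigned m = mp < 10 ? mp + 3 : mp - 9;
        return ymd_date{static_cast<int>(s.year) + (m <= 2 ? 1 : 0),
                        static_cast<int>(m), static_cast<int>(d)};
    }
};

// Defaults: anything implicitly convertible to ymd_date can use these.
inline int calculate_year(const ymd_date& x) { return x.year(); }
inline int calculate_month(const ymd_date& x) { return x.month(); }

// Overload for the serial type: an exact match, so it wins over the conversion.
// It stops once the year is known and never computes month or day.
inline int calculate_year(const days_date& x) {
    const march_year s = split_serial(x.days);
    return static_cast<int>(s.year) + (s.doy >= 306 ? 1 : 0);     // doy 306 is 1 January
}
```

With these definitions, calculate_month(date) on a days_date goes through the full conversion, while calculate_year(date) picks the cheaper overload, and the call site looks the same in both cases.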
On May 5, 2013, at 7:33 PM, Howard Hinnant
On May 5, 2013, at 2:45 PM, Vicente J. Botet Escriba
wrote: One possibility (untested) would be to make calculate_year a namespace scope function with the following default implementation:
year calculate_year(ymd_field_type ymd) {return ymd.year();}
So if the serial date will implicitly convert to ymd_field_type, then you can say:
if (ymdTargetDate.month() == calculate_month(date))
And if an implementation thinks they can do better than that then they can overload calculate_xxx(serial_type). This is very analogous to how we currently handle swap.
Agreed it would be nice to find a better (shorter) name, but one hasn't immediately come to mind.
get_month? ___ Rob (Sent from my portable computation engine)
On May 5, 2013, at 7:39 PM, Rob Stewart wrote:
get_month?
I thought about that. And that may be best. But someone is going to ask: Why can't I do this:
serial_date x = ...
set_month(x, dec);
We could just say, sorry, that doesn't make sense. But the nice thing about calculate_month, or compute_month, is that the reader will intuitively understand that this isn't necessarily a reversible computation. I.e. get implies returning the value of a field, and it would be nice to not imply that.
On the plus side, get_month is a lot easier to type!
Howard
participants (4)
- Anurag Kalia
- Howard Hinnant
- Rob Stewart
- Vicente J. Botet Escriba