Re: [boost] [review] Formal review period for Sort library continuing through November 19

20 Nov 2014

      Steven Ross wrote:

>> How is Get_char
>> supposed to ensure all by itself that all first characters of all book
>> titles are considered in the same radix pass, i.e. the one after the last
>> character of the longest author name?
>> 
> By alternating between returning payload radix characters and a "does this
> field continue" byte.  A 0 will specify an end of field, and sort lower in
> the order than a 1, thus making all shorter names sort lower than names
> that are equivalent to the same length but longer.
> Thus, Author: Jon Title: bat will have GetChar output these values:
> 1,'J',1,'o',1,'n',0 ,1,'b',1,'a',1,'t',0
> 
> Author: Jonathan Title:(empty), will output these values:
> 1,'J',1,'o',1,'n',1,'a',1,'t',1,'h',1,'a',1,'n',0, 0
> 
> The 0 vs 1 will cause Jon to appear before Jonathan in order, because it is
> shorter.  Actually, the 0s and 1s aren't necessary for the last subkey (the
> title), but I included them to show this approach.
> 
> There is also a simple GetLength functor that lets string_sort know when
> all characters are exhausted.
> 
>> 
>> By contrast, it is very clear how refining the comparison function would
>> be sufficient for a comparison sort. Just compare by author; in case of a
>> tie, compare by title.
>> 
> 
> That's exactly what the Get_char method I explained is doing, but it's less
> intuitive.  The radix equivalent requires encoding the same information the
> comparison function uses, but sometimes in a different byte format to
> include factors such as sizes and signs.  As I said, doable, but may not be
> worth the effort.
> 
> 
>> To the best of my knowledge, it is common wisdom that radix sorts are not
>> as general as comparison sorts. I can be wrong. I may have missed an
>> important (new?) insight. If that is the case, please explain to me how
>> refining the Get_char function is sufficient to deal with any strict weak
>> ordering (just repeating the claim does not help me understand).
>> 
>> I agree with your statement that this is conventional wisdom.
> 2 major factors are included in this conventional wisdom:
> 1) Most Radix sorts perform poorly relative to comparison sorting for small
> N, large K, or variable-length data.  Hybrid-radix sorting resolves this
> problem.
> 2) The mapping from a sorting requirement to a comparison function is more
> intuitive and generally simpler than the mapping to a Get_char-type radix
> implementation.  With #1 limiting performance, and comparison-based sorts
> being fast enough for most problems, I think most people just didn't put in
> the effort to examine the possibility of generalized radix sorting.
> string_sort provides this capability.  It won't be optimal for many types
> of problems, but it can be used for them and provides worst-case
> performance guarantees.

Thank you, Steven. It clicked. I guess I just needed a “explain like I’m 
10” type of explanation.

Issue resolved, as far as I’m concerned.

Cheers,
Julian

Re: [boost] [review] Formal review period for Sort library continuing through November 19

Julian Gonggrijp