20 Nov
2014
20 Nov
'14
6:03 p.m.
Steven Ross wrote: >> How is Get_char >> supposed to ensure all by itself that all first characters of all book >> titles are considered in the same radix pass, i.e. the one after the last >> character of the longest author name? >> > By alternating between returning payload radix characters and a "does this > field continue" byte. A 0 will specify an end of field, and sort lower in > the order than a 1, thus making all shorter names sort lower than names > that are equivalent to the same length but longer. > Thus, Author: Jon Title: bat will have GetChar output these values: > 1,'J',1,'o',1,'n',0 ,1,'b',1,'a',1,'t',0 > > Author: Jonathan Title:(empty), will output these values: > 1,'J',1,'o',1,'n',1,'a',1,'t',1,'h',1,'a',1,'n',0, 0 > > The 0 vs 1 will cause Jon to appear before Jonathan in order, because it is > shorter. Actually, the 0s and 1s aren't necessary for the last subkey (the > title), but I included them to show this approach. > > There is also a simple GetLength functor that lets string_sort know when > all characters are exhausted. > >> >> By contrast, it is very clear how refining the comparison function would >> be sufficient for a comparison sort. Just compare by author; in case of a >> tie, compare by title. >> > > That's exactly what the Get_char method I explained is doing, but it's less > intuitive. The radix equivalent requires encoding the same information the > comparison function uses, but sometimes in a different byte format to > include factors such as sizes and signs. As I said, doable, but may not be > worth the effort. > > >> To the best of my knowledge, it is common wisdom that radix sorts are not >> as general as comparison sorts. I can be wrong. I may have missed an >> important (new?) insight. If that is the case, please explain to me how >> refining the Get_char function is sufficient to deal with any strict weak >> ordering (just repeating the claim does not help me understand). >> >> I agree with your statement that this is conventional wisdom. > 2 major factors are included in this conventional wisdom: > 1) Most Radix sorts perform poorly relative to comparison sorting for small > N, large K, or variable-length data. Hybrid-radix sorting resolves this > problem. > 2) The mapping from a sorting requirement to a comparison function is more > intuitive and generally simpler than the mapping to a Get_char-type radix > implementation. With #1 limiting performance, and comparison-based sorts > being fast enough for most problems, I think most people just didn't put in > the effort to examine the possibility of generalized radix sorting. > string_sort provides this capability. It won't be optimal for many types > of problems, but it can be used for them and provides worst-case > performance guarantees. Thank you, Steven. It clicked. I guess I just needed a “explain like I’m 10” type of explanation. Issue resolved, as far as I’m concerned. Cheers, Julian