
Re: [FWP] sorting text in human-order



Ilmari Karonen <iltzu@sci.fi> writes:

> On Tue, 28 Nov 2000, Yitzchak Scott-Thoennes wrote:
> > I just wrote a script to help my wife keep her bookmarks sorted.  In
> > the process, I found that how she was sorting them by hand wasn't
> > anything like a simple C<cmp>.  By trial and error, I came up with the
> > following sort key calculation to mimic her idea of a "natural" sort
> > order.
> 
> Knuth's _Art of Computer Programming,_ volume 3 details a hideously
> complicated set of rules used by libraries to produce an intuitive (for
> some values of the word, anyway) ordering of book titles.  Implementing
> that in Perl could be interesting, if I can only find a copy..

I think that'd require strong AI, wouldn't it? Some of the rules are
splendid:

'1066 et tout la' would get sorted as 'mille et soixante six ans et
tout la', whereas '1066 and all that' would be sorted as 'ten sixty six
and all that'. And, just to make things even more confusing, something
like '1066 years of solitude' would get sorted as 'one thousand and
sixty six years of solitude'.
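
To make the number rule concrete, here's a rough sketch. The %spoken
table, the reading labels ('fr', 'en-year', 'en-count') and title_key()
are all made up for illustration. Note that a human still has to tell it
which reading applies to each title, and that is exactly the part that
wants strong AI.

```perl
use strict;
use warnings;

# Hypothetical hand-maintained table: how a leading number is read aloud
# under each reading convention. Only the 1066 examples are filled in.
my %spoken = (
    'fr'       => { 1066 => 'mille et soixante six ans' },
    'en-year'  => { 1066 => 'ten sixty six' },
    'en-count' => { 1066 => 'one thousand and sixty six' },
);

# Build a sort key: lowercase the title and replace a leading number
# with its spoken form, if the table knows one for this reading.
sub title_key {
    my ($title, $reading) = @_;
    my $key = lc $title;
    $key =~ s{^(\d+)}{
        exists $spoken{$reading}{$1} ? $spoken{$reading}{$1} : $1
    }e;
    return $key;
}

# Each entry is [title, reading]; the reading is supplied by hand.
my @titles = (
    [ '1066 et tout la',        'fr'       ],
    [ '1066 and all that',      'en-year'  ],
    [ '1066 years of solitude', 'en-count' ],
);

# Schwartzian transform: compute each key once, sort on it, unwrap.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ title_key(@$_), $_->[0] ] } @titles;

print "$_\n" for @sorted;
```

With those three titles the keys start 'mille', 'ten' and 'one', so the
French title comes first, then '1066 years of solitude', then '1066 and
all that'.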

And that's just the numbers. Then you've got the rules for weird stuff
like:

'Tom Jones' gets sorted with the Ts, whereas the hypothetical 'Tom
Jones, the non fictional character, a biography' would get sorted
under 'Jones, Tom'...

It all gets very scary. Of course, if you're actually sorting
bibliographic data then you will hopefully have more data to go on
beyond just the title and author, and you could (say) sort biographies
with different rules from novels, but even so, it gets painful.
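
Something like the following is what I have in mind for "different
rules per kind of work". It's only a sketch: the 'kind' field, the
%key_for table and sort_key() are hypothetical names, and the biography
rule naively assumes the title starts with "Forename Surname".

```perl
use strict;
use warnings;

# Hypothetical records; 'kind' stands in for whatever extra
# bibliographic data a real catalogue would carry.
my @books = (
    { title => 'Tom Jones', kind => 'novel' },
    { title => 'Tom Jones, the non fictional character, a biography',
      kind  => 'biography' },
);

# One key-building rule per kind of work.
my %key_for = (
    # Novels: sort on the title as written.
    novel => sub { lc $_[0] },
    # Biographies: invert the first two words to "surname, forename".
    biography => sub {
        my $t = lc $_[0];
        $t =~ s/^(\w+)\s+(\w+)/$2, $1/;
        return $t;
    },
);

# Pick the rule for this record, falling back to the novel rule.
sub sort_key {
    my ($book) = @_;
    my $rule = $key_for{ $book->{kind} } || $key_for{novel};
    return $rule->( $book->{title} );
}

# Compute each key once, sort on it, then unwrap the records.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ sort_key($_), $_ ] } @books;

print "$_->{title}\n" for @sorted;
```

The biography ends up keyed as 'jones, tom, ...' and so sorts ahead of
the novel, which stays with the Ts. Real titles will break the
two-word assumption almost immediately, of course.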

Sadly I don't actually have Knuth to hand (it's at home), but the full
list is quite scary. ISTR that his response to the problem is to not
even try to solve it.

Solving it would be *good* though.

-- 
Piers

