
[FWP] sorting text in human-order



I just wrote a script to help my wife keep her bookmarks sorted.  In
the process, I found that the way she sorted them by hand was nothing
like a simple C<cmp>.  By trial and error, I came up with the
following sort key calculation to mimic her idea of a "natural" sort
order.

   if (m!^<LI><a href="[^"]+">([^<]+)</a>\s+\z!) {
      my $srt = $1;  # the link text is the raw title

      $srt =~ s/^The //i;  # squash leading 'The '
      $srt =~ s/&(?:\B|amp;)/AND/g; # Translate & or &amp;
      $srt =~ s/&\w+;//g; # but kill other entities
      $srt =~ tr/0-9a-z\xe9/a-jA-ZE/;  # uc & sort nums after letters
                                       # MORE -- xlat more latin1 chars
      $srt =~ s/\b'(?=[ST]\b)//g; # remove apostrophes in 'S and 'T
      $srt =~ tr/A-Za-j/ /cs; # squeeze everything but letters/numbers
                              # down to a single space
      $srt =~ s/^ //; # squash leading and trailing spaces (ook ook!)
      $srt =~ s/ \z//;
      # $srt now holds the sort key
   }

Here are some sample keys:

WTM Parent's Forum => WTM PARENTS FORUM
WTM Sale & Swap Board => WTM SALE AND SWAP BOARD
Advanced Book Exchange => ADVANCED BOOK EXCHANGE
Amazon.com => AMAZON COM
Bestbookbuys.com => BESTBOOKBUYS COM
Powells Books => POWELLS BOOKS
Catalog Search (homeeducation.com) => CATALOG SEARCH HOMEEDUCATION COM
Design-A-Study => DESIGN A STUDY
Greenleafpress => GREENLEAFPRESS
Jackdaws => JACKDAWS
Sing 'N Learn => SING N LEARN
Used Curriculum => USED CURRICULUM
Artscroll => ARTSCROLL
10 Steps- Spelling Power => ba STEPS SPELLING POWER
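
For anyone who wants to try the keys out, here is a rough sketch of
how they might be used in an actual sort.  The sub name sort_key and
the @titles list are my own illustration, not part of the original
script; the substitutions are the ones from the code above, and the
titles come from the sample list.  It uses a Schwartzian transform so
each key is computed only once.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): compute the sort key
# for one bookmark title using the same steps as the code above.
sub sort_key {
    my ($srt) = @_;

    $srt =~ s/^The //i;               # squash leading 'The '
    $srt =~ s/&(?:\B|amp;)/AND/g;     # translate & or &amp;
    $srt =~ s/&\w+;//g;               # but kill other entities
    $srt =~ tr/0-9a-z\xe9/a-jA-ZE/;   # uc & sort nums after letters
    $srt =~ s/\b'(?=[ST]\b)//g;       # drop apostrophes in 'S and 'T
    $srt =~ tr/A-Za-j/ /cs;           # anything else becomes one space
    $srt =~ s/^ //;                   # trim leading space
    $srt =~ s/ \z//;                  # trim trailing space
    return $srt;
}

# Sample titles taken from the list above, deliberately out of order.
my @titles = (
    "WTM Parent's Forum",
    "10 Steps- Spelling Power",
    "Amazon.com",
    "Design-A-Study",
    "Advanced Book Exchange",
);

# Schwartzian transform: pair each title with its key, sort the pairs
# on the key with plain cmp, then keep only the titles.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ sort_key($_), $_ ] } @titles;

print "$_\n" for @sorted;
```

With those five titles, "10 Steps- Spelling Power" should come out
last, since its key starts with lowercase "ba" and lowercase sorts
after uppercase under a plain C<cmp>.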
