Re: [MacPerl] Searching help

To: macperl@macperl.org
Subject: Re: [MacPerl] Searching help
From: robinmcf@altern.org
Date: Wed, 16 Aug 2000 16:45:19 +0800
In-Reply-To: <p04320400b5bee0077262@[192.168.0.77]>
References: <200008150336.UAA01339@cfcl.com><200008150336.UAA01339@cfcl.com>

>At 20:42 -0700 2000.08.14, Michael Eggleston wrote:
>>I need some help.  Here is the situation.  I have joined an entire HTML
>>document together to make it searchable using s/// and m// and now I have
>>a problem.

Just as a general tip on searching HTML docs: users usually aren't too keen
on waiting while a CGI sifts through existing documents, so why not write a
small program that sifts through the docs on your HD and creates an index
file, which you then search instead? Having the index precomputed saves
server overhead.
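
As a rough sketch of the "search the index" half, the lookup could be as
simple as the following. It assumes the index file holds one
"word<TAB>filename" pair per line; the sub name lookup and the command-line
usage are only illustrative, not part of any existing script.

```perl
#! perl -w
use strict;

# Hypothetical helper: return every file listed for $word in an index
# whose lines look like "word<TAB>filename" (assumed format).
sub lookup {
    my ($index_file, $word) = @_;
    $word = lc $word;                  # the index stores lower-case words
    my @files;
    open(INDEX, $index_file) or die "Can't open $index_file: $!";
    while (<INDEX>) {
        chomp;
        # Split into word and filename; tolerate a space after the tab.
        my ($entry, $file) = split /\t\s*/, $_, 2;
        push @files, $file if defined $file and $entry eq $word;
    }
    close(INDEX);
    return @files;
}

# Usage: perl lookup.pl <index file> <word>
if (@ARGV == 2) {
    print "$_\n" for lookup(@ARGV);
}
```

A linear scan like this is fine for a small site; because the index is
sorted, a bigger one could stop reading as soon as it passes the word.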

The script below splits a document into an alphabetically ordered list of
words, with the name of the file it came from appended to each word. As it
stands it has a document hard-coded into it; with a little adaptation you
could use it to parse a directory and write the output.

#! perl -w

#=========== declare includes =============

use strict;
use diagnostics -verbose;

#========== declare variables =============

my($file,@words,@temp,%data);

#============= script body ================
$file='Macintosh HD:Desktop Folder:temp';
open (IN, $file) or die "Can't open $file: $!";
while (<IN>) {
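    # Dehyphenate words split over 2 lines: drop the trailing hyphen and
    # newline, then pull in the next line so the two halves are rejoined.
    while (s/-\n$// and defined(my $next = <IN>)) { $_ .= $next }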
    tr/A-Z/a-z/;            # text to L/C
    # kill any non words (punctuation, brackets etc.)
    tr/\"\'\(\)\~\[\]\@\.\,\;\:\&\%\-\=©®//ds;
    tr/1234567890//ds;      # kill numbers
    chomp;                  # kill return chars
    @temp=split;            # break input into individual words

    for (@temp) {

	$data{$_}++; #attach a counter to the word
    }
}

@words=sort(keys(%data)); #put into alphabetical order
for (@words) {

        # Skip any entry with no word characters left in it.
        print "$_\t $file\n" if /\w/;

}

close (IN);

__END__
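
For the directory-parsing adaptation, something along these lines might
do as a starting point. It is only a sketch: the sub names words_in and
index_dir are made up for illustration, the word filter is simplified to
"runs of letters a-z" rather than the tr/// list used in the script, and
File::Spec is used so the path separator isn't hard-coded.

```perl
#! perl -w
use strict;
use File::Spec;

# Hypothetical helper: return the distinct lower-cased words in one file.
# Simplification: a "word" here is any run of the letters a-z.
sub words_in {
    my ($path) = @_;
    my %seen;
    open(DOC, $path) or die "Can't open $path: $!";
    while (<DOC>) {
        tr/A-Z/a-z/;                   # text to lower case
        $seen{$_}++ for /[a-z]+/g;     # count each word found on the line
    }
    close(DOC);
    return sort keys %seen;
}

# Hypothetical helper: index every plain file in $dir and return the
# "word<TAB> filename" lines in alphabetical order.
sub index_dir {
    my ($dir) = @_;
    my @lines;
    opendir(DIR, $dir) or die "Can't open directory $dir: $!";
    my @names = sort grep { !/^\./ } readdir(DIR);
    closedir(DIR);
    for my $name (@names) {
        my $path = File::Spec->catfile($dir, $name);
        next unless -f $path;          # skip sub-directories
        push @lines, map { "$_\t $name\n" } words_in($path);
    }
    return sort @lines;
}

# Usage: perl makeindex.pl <directory> <output file>
if (@ARGV == 2) {
    my ($dir, $out) = @ARGV;
    open(OUT, ">$out") or die "Can't write $out: $!";
    print OUT index_dir($dir);
    close(OUT);
}
```

Run it whenever the documents change, and point the search CGI at the
output file instead of the documents themselves.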


References:
- [MacPerl] Searching help
  - From: Michael Eggleston <nghtstr@michaelsmacshack.com>
- Re: [MacPerl] Searching help
  - From: Chris Nandor <pudge@pobox.com>
