
Re: [MacPerl-WebCGI] Search Engine Questions



>At 21:11 -0400 4/12/1999, Kevin Reid wrote:
>>I once wrote a search engine that searched a large tab-delimited text
>>file. The way I handled limiting results was that I simply stopped
>>searching after N hits, and put a link at the bottom that included the
>>byte position in the file at which the search stopped so that the search
>>could resume from that point.
>>
>>The limitation of this method is that the results appear in their order
>>in the file.
Richard Gordon continued:
>That would probably be acceptable if I just had a single document to
>deal with, but I've got around 4000 pages of stuff scattered over 3
>directories in maybe 70 files. It's beginning to look like maybe you
>first write out a file with all of the hits, then open it and send
>the first 20 along with a link back into the file that is supposed to
>extract the next 20 or something. This seems pretty messy since I
>guess you'd have to use some kind of serialization scheme to name the
>files to keep them straight and you've still got to get rid of all of
>them at some point.
>
>With a stateless browser connection, I don't know of another way to
>avoid having to conduct a new search from scratch. Thanks.

And I wonder:

What if your initial search script yields a single file that is an _index_
of all of the hits? The script also returns an HTML page with the first 20
(or whatever; it could be a setting), and assigns an id number based on
some scheme like the process id concatenated with the time. The id is used
to name the index file and is therefore incorporated into your "next 20
hits" link. The link is another call to the CGI, with
"?id=$idnum&index=$indexnum" (no quotes) appended. On a call with these
fields present, the CGI builds links from the next 20 indexed hits rather
than doing a new search.
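
A rough sketch of that flow -- run_search() and the tab-delimited
"file, position, title" hit format are placeholders for whatever your
real search does:

use CGI qw(param);

my $page_size = 20;
my $tmp_dir   = 'search_index';       # wherever the index files live

my ($id)  = (param('id') || '') =~ /^([\d.]+)$/;   # accept digits/dots only
my $index = param('index') || 0;

unless ($id) {
    # First call: run the full search once, writing one
    # "file <tab> position <tab> title" line per hit.
    $id = "$$." . time;               # process id concatenated with time
    open my $out, '>', "$tmp_dir/$id" or die "index file: $!";
    for my $hit (run_search(param('query'))) {     # run_search() is your code
        print $out "$hit->{file}\t$hit->{pos}\t$hit->{title}\n";
    }
    close $out;
}

# Every call, first or later: pull one page of hits out of the index.
open my $in, '<', "$tmp_dir/$id" or die "index file $id: $!";
my @hits = <$in>;
close $in;

for my $line (@hits[$index .. $index + $page_size - 1]) {
    last unless defined $line;
    chomp $line;
    my ($file, $pos, $title) = split /\t/, $line;
    print qq(<a href="$file">$title</a><br>\n);
}

my $next = $index + $page_size;
print qq(<a href="search.cgi?id=$id&index=$next">Next $page_size hits</a>\n)
    if $next < @hits;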

The script could have housekeeping functions to delete files older than 24
hours (or whatever). With this method there'd be only one file created per
search, and only one pass through all of your original files per query;
every "next 20" request is served from the index.
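
The housekeeping part could be as simple as this, assuming the index
files all live in the one directory used above:

my $tmp_dir = 'search_index';                     # same directory as above

# Sweep index files older than 24 hours; -M gives the age in days.
opendir my $dh, $tmp_dir or die "$tmp_dir: $!";
for my $file (readdir $dh) {
    next if $file =~ /^\./;                       # skip . and ..
    unlink "$tmp_dir/$file" if -M "$tmp_dir/$file" > 1;
}
closedir $dh;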

This is all conceptual, but it doesn't seem _that_ messy to map the
locations in the original files and use some simple indexing scheme.
- Bruce

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Bruce Van Allen
bva@cruzio.com
408/429-1688
P.O. Box 839
Santa Cruz, CA  95061
