>At 21:11 -0400 4/12/1999, Kevin Reid wrote:
>>I once wrote a search engine that searched a large tab-delimited text
>>file. The way I handled limiting results was that I simply stopped
>>searching after N hits, and put a link at the bottom that included the
>>byte position in the file at which the search stopped, so that the
>>search could resume from that point.
>>
>>The limitation of this method is that the results appear in their
>>order in the file.

Richard Gordon continued:

>That would probably be acceptable if I just had a single document to
>deal with, but I've got around 4000 pages of stuff scattered over 3
>directories in maybe 70 files. It's beginning to look like maybe you
>first write out a file with all of the hits, then open it and send
>the first 20 along with a link back into the file that is supposed to
>extract the next 20 or something. This seems pretty messy, since I
>guess you'd have to use some kind of serialization scheme to name the
>files to keep them straight, and you've still got to get rid of all
>of them at some point.
>
>With a stateless browser connection, I don't know of another way to
>avoid having to conduct a new search from scratch. Thanks.

And I wonder: what if your initial search script yields a single file
that is an _index_ of all of the hits? The script also returns an HTML
page with the first 20 (or whatever; the page size could be a setting),
and assigns an ID number based on some scheme like the process ID
concatenated with the time. The ID is used to name the index file and
is therefore incorporated into your "next 20 hits" link.

The link is another call to the CGI, with "?id=$idnum&index=$indexnum"
(no quotes) appended. Upon a call to the CGI with these fields present,
the script creates links from the next 20 indexed hits rather than
doing a new search.

The script could have a housekeeping function to delete index files
older than 24 hours (or whatever). With this method there'd be only one
file created per search, and only one pass through all of your original
files per query; the "next 20" requests just read the index.

This is all conceptual, but it doesn't seem _that_ messy to map the
locations in the original files and use some simple indexing scheme.

- Bruce

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bruce Van Allen
bva@cruzio.com
408/429-1688
P.O. Box 839
Santa Cruz, CA 95061
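
A rough, untested sketch of the above scheme in Perl. The script name
(search.cgi), the index directory (/tmp/searchidx), the tab-delimited
"file, line number, text" hit format, and the literal substring match
are all assumptions, not anything from the thread; it also uses
three-arg open and lexical filehandles, so older Perls would need the
bareword-filehandle equivalents.

  #!/usr/bin/perl -w
  use strict;
  use CGI qw(:standard escapeHTML);

  my $IDX_DIR     = '/tmp/searchidx';          # per-search index files (assumed)
  my $PAGE_SIZE   = 20;                        # hits per page
  my @SEARCH_DIRS = ('dir1', 'dir2', 'dir3');  # the three directories (placeholders)

  -d $IDX_DIR or mkdir $IDX_DIR, 0700 or die "can't create $IDX_DIR: $!";

  # Housekeeping: delete index files older than 24 hours (-M is age in days).
  unlink grep { -M $_ > 1 } glob "$IDX_DIR/*";

  my $query  = param('query') || '';
  my $id     = param('id')    || '';   # present on "next 20 hits" requests
  my $offset = param('index') || 0;

  print header;
  if ($id =~ /^\w+$/) {            # also keeps the id from escaping $IDX_DIR
      show_page($id, $offset);     # reuse the saved index; no new search
  } elsif ($query ne '') {
      show_page(run_search($query), 0);
  } else {
      print "<p>No query given.</p>\n";
  }

  # Search every file once, writing one tab-delimited index line per hit.
  # Returns the id (process id concatenated with time) that names the file.
  sub run_search {
      my ($q) = @_;
      my $new_id = $$ . time;
      open my $idx, '>', "$IDX_DIR/$new_id" or die "can't write index: $!";
      for my $dir (@SEARCH_DIRS) {
          opendir my $dh, $dir or next;
          for my $file (grep { -f "$dir/$_" } readdir $dh) {
              open my $fh, '<', "$dir/$file" or next;
              while (my $line = <$fh>) {
                  print $idx "$dir/$file\t$.\t$line"
                      if index($line, $q) >= 0;    # crude literal match
              }
              close $fh;
          }
          closedir $dh;
      }
      close $idx;
      return $new_id;
  }

  # Print $PAGE_SIZE hits starting at $offset; if one more hit exists
  # beyond the page, emit the link that resumes from the new offset.
  sub show_page {
      my ($search_id, $offset) = @_;
      open my $idx, '<', "$IDX_DIR/$search_id"
          or do { print "<p>Search expired; please start over.</p>\n"; return };
      my ($n, $shown) = (0, 0);
      while (my $hit = <$idx>) {
          next if ++$n <= $offset;         # skip hits on earlier pages
          if ($shown == $PAGE_SIZE) {      # a 21st hit => more pages remain
              my $next = $offset + $PAGE_SIZE;
              printf qq{<a href="search.cgi?id=%s&index=%d">next %d hits</a>\n},
                     $search_id, $next, $PAGE_SIZE;
              last;
          }
          chomp $hit;
          my ($file, $lineno, $text) = split /\t/, $hit, 3;
          print '<p>', escapeHTML("$file line $lineno: $text"), "</p>\n";
          $shown++;
      }
      close $idx;
  }

The one-hit lookahead in show_page is what decides whether a "next 20
hits" link is needed without counting the whole index file, and the -M
test is the 24-hour housekeeping mentioned above.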