At 10:08 +0000 10.08.97, Stephen Eastham wrote:
>hopefully, i'm asking the right question:
>
>imagine a file containing a list of words and another file containing
>a text which has these words in.
[...]

Stephen,

If your list of words isn't too long, I'd put them all into one regexp.
Otherwise, if the list is long but doesn't change too often, I'd first
build a control structure of if-then-elses with MacPerl to speed up the
matches.

>the first approach seems to be either to read in the whole text document
>(which sounds dangerous - just how big could the document be before things
>go hay-wire?) or read in a paragraph at a time. thinking about
>the paragraph-oriented approach, how does a mac/macperl distinguish lines from
>paragraphs in a text file?

Reading in the whole file isn't dangerous as long as you set MacPerl's
partition size large enough or check the file's size first with something
like

    die "File $file too large\n" if (-s $file > 100_000);

Set

    $/ = '';

to tell Perl to read paragraphs (i.e. blocks of lines separated by one or
more blank lines, \n\n) and set

    $/ = undef;

to tell Perl to read in the whole text in one chunk. Check the
documentation of $/ in perlvar for details.

After reading paragraphs, all you have to do is split() your paragraphs
into sentences. I'd expect that to be the hard part, because parsing
general text into SENTENCES isn't trivial (the definition of 'sentence'
isn't all that regular, just think about abbrevs like "Dr." or "Mr.").
I would do something like split() paragraphs at [!?.] and then concatenate
a "sentence" with the next one again if it ends in something like
"Dr|Mr|Mrs|...".

--jc

--
Ju:rgen Christoffel, GMD - Forschungszentrum Informationstechnik GmbH
E-Mail: christoffel@gmd.de or one of {ftp,news,web}master@gmd.de
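To make the suggestions above a bit more concrete, here is a minimal sketch of how the pieces might fit together: one regexp built from the word list, paragraph-mode reading via $/, and a split() at [!?.] that rejoins pieces ending in an abbreviation. The sub names (build_word_regexp, split_sentences, matching_sentences) are made up for illustration, the abbreviation list is deliberately incomplete, and I'm assuming the goal is to report the sentences that contain any listed word. It also assumes a reasonably recent Perl 5 (qr//, lookbehind and in-memory filehandles), so an older MacPerl may need adjustments.

```perl
use strict;
use warnings;

# Hypothetical helper: combine a word list into one case-insensitive regexp.
sub build_word_regexp {
    my @words = @_;
    # Longest words first, so a longer word wins over its own prefix;
    # quotemeta() protects any regexp metacharacters in the words.
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @words;
    return qr/\b(?:$alternation)\b/i;
}

# Hypothetical helper: split a paragraph at [!?.] and glue a piece back
# onto the next one if it ends in a known abbreviation.
sub split_sentences {
    my ($paragraph) = @_;
    my $abbrev = qr/\b(?:Dr|Mr|Mrs)\.\z/;    # assumed, incomplete list
    my @sentences;
    my $pending = '';
    for my $piece (split /(?<=[!?.])\s+/, $paragraph) {
        $pending = length($pending) ? "$pending $piece" : $piece;
        next if $pending =~ $abbrev;    # "Dr." etc.: not a sentence end yet
        push @sentences, $pending;
        $pending = '';
    }
    push @sentences, $pending if length $pending;
    return @sentences;
}

# Hypothetical helper: read paragraphs from a filehandle and return the
# sentences that contain at least one of the words.
sub matching_sentences {
    my ($fh, $word_re) = @_;
    local $/ = '';                      # paragraph mode
    my @hits;
    while (my $paragraph = <$fh>) {
        $paragraph =~ s/\s+\z//;        # drop the trailing blank lines
        push @hits, grep { $_ =~ $word_re } split_sentences($paragraph);
    }
    return @hits;
}

# Small demo on an in-memory "file"; a real script would open the text
# file instead (and could do the -s size check before slurping).
my $demo_text = "Dr. Smith uses Perl daily. He likes it!\n\n"
              . "Nothing here. Is the Mac fast?\n";
open my $demo_fh, '<', \$demo_text or die "Cannot open in-memory text: $!";
print "$_\n" for matching_sentences($demo_fh, build_word_regexp('perl', 'mac'));
close $demo_fh;
```

Splitting first and repairing afterwards keeps the split() pattern simple; the price is that every abbreviation you care about has to be listed in $abbrev, so treat that list as a starting point rather than a complete answer.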