Dick Karpinski wrote:
>If I were approaching this, I'd start with the hard part, namely deciding
>what makes a sentence. If you are very lucky, then the rule might be to
>consider that every period followed by two spaces

That typographical custom is not followed consistently any more. If the
text to be processed needs it, you should also think about:

    ellipses...
    superlative exclamation or question marks!?!?!??
    section headings, or program version numbers: 2.1.2
    titles: Mr. Dr. Sr.
    abbreviations: etc. et al.
    computery words: command.com, a != b

Perhaps this is very daunting. But think of the fame and awards and
sexual favors a grateful world might bestow on you, if you posted the
regexp here. :)

> should i place carriage returns between sentences and
>a newline between paragraphs,

Why not make it really obvious? If you're manually preprocessing, use
something that would never be in the text, like '%%%'.

Here's one way to do it: use BBEdit to search for periods and go through
the text manually, changing the real sentence-enders to '. %%%'. Do the
same for ? and !. Then in Perl, slurp it in and split on ' %%%'.

--
Neil Kandalgaonkar   njk@odyssee.net   http://www.odyssee.net/~njk/

***** Want to unsubscribe from this list?
***** Send mail with body "unsubscribe" to mac-perl-request@iis.ee.ethz.ch
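
P.S. For the "slurp it in and split on ' %%%'" step, here is a rough
sketch of how that might look. It is only an illustration under a few
assumptions: the sample text and the name split_sentences are made up,
the in-memory filehandle stands in for a real file and needs a
reasonably modern Perl 5, and the split pattern is a bit more forgiving
about whitespace around the marker than a literal ' %%%' would be.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input: text whose real sentence-enders were
# hand-marked with ' %%%'. In practice this would come from a file.
my $marked = 'Dr. Smith runs command.com v2.1.2. %%% Does it work? %%% '
           . 'Yes... mostly! %%% See Mr. Jones et al. for details. %%%';

# Split marked text on the '%%%' marker and tidy up each piece.
# (split_sentences is an illustrative name, not an existing library call.)
sub split_sentences {
    my ($text) = @_;
    my @sentences;
    for my $piece (split /\s*%%%\s*/, $text) {
        $piece =~ s/\s+/ /g;          # fold line breaks and runs of spaces
        $piece =~ s/^\s+|\s+$//g;     # trim leading and trailing space
        push @sentences, $piece if length $piece;
    }
    return @sentences;
}

# Slurp: with the input record separator undefined, <$fh> returns
# the whole input in one read instead of one line at a time.
open my $fh, '<', \$marked or die "cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };
close $fh;

my @sentences = split_sentences($text);
print "$_\n" for @sentences;
```

With the sample above this should print four sentences, one per line,
and the periods inside "Dr.", "command.com", "2.1.2" and "et al." are
left alone because only the hand-placed marker is treated as a
boundary. To read a real file instead, replace \$marked with the file
name in the open call.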