
Re: [MacPerl] how to prepare (mac) multi-line (text) file, then match ...



Dick Karpinski wrote:
>If I were approaching this, I'd start with the hard part, namely deciding
>what makes a sentence.  If you are very lucky, then the rule might be to
>consider that every period followed by two spaces

That typographical custom is not followed consistently any more.


If the text to be processed needs it, you should also think about:

ellipses...
superlative exclamation or question marks!?!?!??
section headings, or program version numbers: 2.1.2
titles: Mr. Dr. Sr.
abbreviations: etc. et al.
computery words: command.com, a != b

Perhaps this is very daunting. But think of the fame and awards and sexual
favors a grateful world might bestow on you, if you posted the regexp here.
:)
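
To make the problem concrete, here is a rough sketch of the naive rule
(not the prize-winning regexp). It assumes a sentence ends at '.', '?' or
'!' followed by whitespace; the name naive_split and the sample text are
made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Naive rule (an assumption for illustration only): a sentence ends at
# '.', '?' or '!' when whitespace follows it.
sub naive_split {
    my ($text) = @_;
    # Split on the whitespace, keeping the punctuation with the sentence.
    return split /(?<=[.?!])\s+/, $text;
}

my $text = "Mr. Smith ran version 2.1.2 of command.com. It failed!?! Why?";

# "Mr." comes out as its own "sentence", which is wrong.
# "2.1.2" and "command.com" survive only because no whitespace
# follows their internal periods.
print "[$_]\n" for naive_split($text);
```

Titles and abbreviations are what break it first, so any real solution
would probably need a list of exceptions on top of the pattern.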


> should i place carriage returns between sentences and
>a newline between paragraphs,

Why not make it really obvious? If you're manually preprocessing, use
something that would never be in the text, like '%%%'.

Here's one way to do it: use BBEdit to search for periods, go through
the text manually, and change the real sentence-enders to '. %%%'. Do the
same for ? and !. Then in Perl, slurp it in and split on ' %%%'.
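
The Perl half might look something like this. It is only a sketch: the
helper names slurp and split_marked and the sample text are my own
inventions, and it assumes the markers were typed exactly as ' %%%'.

```perl
#!/usr/bin/perl -w
use strict;

# Read a whole file into one string ("slurp" it).
sub slurp {
    my ($path) = @_;
    local $/;    # undefine the input record separator: read everything at once
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Split hand-marked text into sentences on the ' %%%' marker.
sub split_marked {
    my ($text) = @_;
    # Also eat any whitespace (spaces, line breaks) after the marker.
    my @sentences = split / %%%\s*/, $text;
    # Drop pieces that contain nothing but whitespace.
    return grep { /\S/ } @sentences;
}

# Hypothetical sample; pass a file name on the command line to use real text.
my $sample = "Dr. Jones uses version 2.1.2. %%% Does it work? %%% It does! %%%\n";
my $text   = @ARGV ? slurp($ARGV[0]) : $sample;

print "$_\n" for split_marked($text);
```

Because the marker only appears where you put it by hand, "Dr." and
"2.1.2" stay inside their sentence without any special handling.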

--
Neil Kandalgaonkar    njk@odyssee.net    http://www.odyssee.net/~njk/


