
Re: [MacPerl] escape oddities...



At 09.18 -0500 1999.02.08, Scott Prince wrote:
>I thought that the sequence below would clean up any nasties from form
>data submitted through my cgi's.
>
>$form_data{$user_entry} =~ s/<(.|\n)*>//g;        # html

Three things.  First, the /s modifier makes . match newlines, so you don't
need (.|\n).  Second, your regex is greedy; it will swallow everything from
the first < to the last >, including any text between the tags.  You
probably want the non-greedy *? instead.

  s/<.*?>//gs

Also, this will not work for HTML tags with embedded >'s in them:

  <IMG SRC="a.gif" ALT=" < my image > " BORDER=1>
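
To make the difference concrete, here is a small sketch comparing the
greedy and non-greedy forms, and showing how the IMG tag above gets cut
short.  The sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Sample input, invented for this illustration.
my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .* runs from the first < to the last >, taking the text with it.
(my $greedy = $html) =~ s/<.*>//gs;

# Non-greedy: .*? stops at the first > after each <.
(my $lazy = $html) =~ s/<.*?>//gs;

# A > inside an attribute value ends the match too early.
my $img = '<IMG SRC="a.gif" ALT=" < my image > " BORDER=1>';
(my $broken = $img) =~ s/<.*?>//gs;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
print "broken: [$broken]\n";
```

The last line prints the leftover tail of the tag, which is why a simple
regex is not enough if such tags can show up in your form data.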


>$form_data{$user_entry} =~ s/\t|\n|\r|\|/ /g;     # ht, nl, cr, pipe

A character class would work better (faster and easier to read):

  s/[\t\n\r|]/ /g;


>$form_data{$user_entry} =~ s/ +/ /g;              # multi spaces
>$form_data{$user_entry} =~ s/^ +| +$//g;          # starting & ending spaces
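
Put together, the cleanup steps might look something like the sketch
below.  The sub name clean_field and the sample string are my own, not
from your script; inside a character class the | needs no backslash.

```perl
use strict;
use warnings;

# clean_field is a hypothetical helper name used only for this sketch.
sub clean_field {
    my ($text) = @_;
    $text =~ s/[\t\n\r|]/ /g;   # tab, newline, carriage return, pipe -> space
    $text =~ s/ +/ /g;          # collapse runs of spaces into one
    $text =~ s/^ +| +$//g;      # trim leading and trailing spaces
    return $text;
}

my $cleaned = clean_field("  one\ttwo|three\r\n  ");
print "[$cleaned]\n";
```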



Note for what follows: escapes are written with a backslash (\), not a
forward slash (/), so you mean \r and \n where you wrote /r and /n.

>One concern being corruption of database files with \n or | characters.
>But after retrieving a db file via ftp (mac), I noticed what seemed to be
>newlines( /r's after fetch ingests them for my mac) breaking my records.
>The odd thing is that the unix server is ignoring the character and not
>seeing /n. - which is a good thing :)

Assuming you mean \r and \n, realize that in MacPerl-speak, \n is newline,
which is \015 (and in CodeWarrior is \r).

But I don't know what you are saying the problem is.  You're saying that
\n's or \r's are in the text?  They shouldn't be.  Whatever your newline
characters are, for Mac, Unix, or Windows, s/\r|\n/ /g or s/[\r\n]/ /g
would remove them all.



>The book, "Perl5 by Example" has a table listing all the usual escape
>char's, but, further into the book there is a code example using the
>escape /cM. A quick test verifies that MacPerl recognizes this as a /r.

No.  MacPerl sees \cM as \n, and \cJ as \r.  Unix and Windows do the opposite.


>The obvious solution for cgi's is to replace any whitespace char with a
>space. But is there any way to predict the way these things bounce from
>platform to platform?

Look for nonambiguous characters.  Look for \015 and \012 instead of \n and
\r (though if you are looking for _both_, at the same time, and in no
particular order, then it probably won't matter, since [\r\n] == [\015\012]
== [\n\r] == [\012\015] on every ASCII platform I know of).
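
As a rough sketch of matching by value rather than by name, something
like the following should behave the same under MacPerl, Unix, and
Windows.  The sub name strip_line_breaks and the test strings are
invented here, and I have added a + so that a CR/LF pair becomes one
space instead of two, which may or may not be what you want.

```perl
use strict;
use warnings;

# strip_line_breaks is a hypothetical helper name used only for this sketch.
sub strip_line_breaks {
    my ($text) = @_;
    $text =~ s/[\015\012]+/ /g;   # CR and LF by octal value, not \r and \n
    return $text;
}

my $mac  = "one\015two";          # bare CR (classic Mac line ending)
my $unix = "one\012two";          # bare LF (Unix line ending)
my $dos  = "one\015\012two";      # CR LF (Windows line ending)

print strip_line_breaks($_), "\n" for $mac, $unix, $dos;

# \cM is control-M, character 13 (octal 015), whatever \n and \r mean locally.
print "cM is octal 015\n" if "\cM" eq "\015";
```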

Anyway, I'm still not clear on what problem you're having, but see
perlport.pod on CPAN or my MacPerl site, the section on newlines, and see
if that helps.  If it doesn't, ask again in a different way (unless someone
else can see what problem you have).

--
Chris Nandor          mailto:pudge@pobox.com         http://pudge.net/
%PGPKey = ('B76E72AD', [1024, '0824090B CE73CA10  1FF77F13 8180B6B6'])
