At 09.18 -0500 1999.02.08, Scott Prince wrote:
>I thought that the sequence below would clean up any nasties from form
>data submitted through my cgi's.
>
>$form_data{$user_entry} =~ s/<(.|\n)*>//g; # html

Three things. First, the /s modifier makes . match newlines too, so you
can drop the (.|\n) and just use . with /s. Second, your regex is greedy:
it will swallow everything from the first < to the last >, including any
text between two sets of HTML tags. You probably want the non-greedy *?:

  s/<.*?>//gs

Third, this will still not work for HTML tags with embedded >'s in them:

  <IMG SRC="a.gif" ALT=" < my image > " BORDER=1>

>$form_data{$user_entry} =~ s/\t|\n|\r|\|/ /g; # ht, nl, cr, pipe

A character class would work better (it is faster and easier to read):

  s/[\t\n\r|]/ /g;

>$form_data{$user_entry} =~ s/ +/ /g; # multi spaces
>$form_data{$user_entry} =~ s/^ +| +$//g; # starting & ending spaces

A note on what follows: you should be using \ where you use / (\r, \n
and \cM, not /r, /n and /cM).

>One concern being corruption of database files with \n or | characters.
>But after retrieving a db file via ftp (mac), I noticed what seemed to be
>newlines( /r's after fetch ingests them for my mac) breaking my records.
>The odd thing is that the unix server is ignoring the character and not
>seeing /n. - which is a good thing :)

Assuming you mean \r and \n, realize that in MacPerl-speak, \n is
newline, which is \015 (and in CodeWarrior is \r). But I don't
understand what you are saying the problem is. Are you saying that \n's
or \r's are in the text? They shouldn't be. Whatever your newline
characters are, on Mac, Unix, or Windows, s/\r|\n/ /g or s/[\r\n]/ /g
would replace them all with spaces.

>The book, "Perl5 by Example" has a table listing all the usual escape
>char's, but, further into the book there is a code example using the
>escape /cM. A quick test verifies that MacPerl recognizes this as a /r.

No. MacPerl sees \cM as \n, and \cJ as \r: \cM is always \015 and \cJ is
always \012, and in MacPerl \n is \015. Unix and Windows perls do the
opposite.

>The obvious solution for cgi's is to replace any whitespace char with a
>space. But is there any way to predict the way these things bounce from
>platform to platform?
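Whatever the platform, \cM and \cJ name fixed octets. Only \n and \r change meaning between MacPerl and Unix/Windows perls. The sketch below is a minimal check you can run on any perl. The sample string and variable names are illustrative and do not come from the thread. It prints the octets behind each escape and shows that [\r\n] and [\015\012] replace the same characters.

```perl
use strict;
use warnings;

# \cM (Ctrl-M) is always octet 13 (\015, CR) and \cJ (Ctrl-J) is always
# octet 10 (\012, LF). What differs between MacPerl and Unix/Windows perls
# is which of those two octets \n and \r stand for.
printf "\\cM = %d, \\cJ = %d\n", ord("\cM"), ord("\cJ");
printf "\\n = %d, \\r = %d on this perl\n", ord("\n"), ord("\r");

# Illustrative input mixing CRLF, a lone CR and a lone LF.
my $text = "line one\015\012line two\015line three\012end";

# The octal class and the \r\n class name the same two octets, so both
# substitutions give the same result on any ASCII platform.
(my $by_octal  = $text) =~ s/[\015\012]/ /g;
(my $by_escape = $text) =~ s/[\r\n]/ /g;

print "$by_octal\n";
print $by_octal eq $by_escape ? "same result\n" : "different result\n";
```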
Look for unambiguous characters. Look for \015 and \012 instead of \n and
\r (though if you are looking for _both_, at the same time, and in no
particular order, then it probably won't matter, since
[\r\n] == [\015\012] == [\n\r] == [\012\015] on every ASCII platform I
know of).

Anyway, I'm still not clear on what problem you're having, but read the
section on newlines in perlport.pod, available on CPAN or on my MacPerl
site, and see if that helps. If it doesn't, ask again in a different way
(unless someone else can see what problem you have).

--
Chris Nandor          mailto:pudge@pobox.com         http://pudge.net/
%PGPKey = ('B76E72AD', [1024, '0824090B CE73CA10 1FF77F13 8180B6B6'])
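The sketch below pulls together the substitutions suggested in this reply into a per-field cleanup routine. It is a minimal illustration only. The name clean_field is hypothetical, as is the sample input. The routine uses the non-greedy /s tag regex and a single character class, and it writes CR and LF as the unambiguous \015 and \012 instead of \r and \n.

```perl
use strict;
use warnings;

# clean_field is a hypothetical helper name, not something from the post.
# It strings together the substitutions suggested in this reply.
sub clean_field {
    my ($value) = @_;

    # Remove tags. *? is non-greedy, so each match stops at the first >;
    # /s lets . match newlines inside a tag. Still naive: a > inside an
    # attribute value ends the match early.
    $value =~ s/<.*?>//gs;

    # Tab, CR, LF and pipe become spaces. \015 and \012 name CR and LF
    # the same way on MacPerl, Unix and Windows perls.
    $value =~ s/[\t\015\012|]/ /g;

    # Collapse runs of spaces, then trim leading and trailing spaces.
    $value =~ s/ +/ /g;
    $value =~ s/^ +| +$//g;

    return $value;
}

my $raw = "  <b>Hello</b>\015\012world|and\tmore  ";
print clean_field($raw), "\n";

# The embedded-> case from the post: the match stops at the > inside ALT,
# so part of the tag survives.
print clean_field('<IMG SRC="a.gif" ALT=" < my image > " BORDER=1>x'), "\n";
```

The second call shows the limitation mentioned earlier. The regex cannot tell a > inside a quoted attribute value from the > that closes the tag, so the text `" BORDER=1>x` is left behind.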