Scott Prince wrote:
>
> Hello all...
>
> I thought that the sequence below would clean up any nasties from form
> data submitted through my cgi's.
>
> $form_data{$user_entry} =~ s/<(.|\n)*>//g; # html

    s/<.*>//gs;    # /s on a m// or s/// makes . match newlines.

But... I <B>don't</B> think this <I>substitution</I> will work <U>very</U>
well... (.* is greedy, so it swallows everything from the first < to the
last >.)

    s/<.*?>//gs;

would be closer. But to parse HTML safely, you really should use the
HTML::Parse module.

> $form_data{$user_entry} =~ s/\t|\n|\r|\|/ /g; # ht, nl, cr, pipe

    s/[\t\n\r|]/ /g;

Don't use alternation where a character class will do.

    tr/\t\n\r|/ /;

Don't use substitution where a translation will do.

> $form_data{$user_entry} =~ s/ +/ /g; # multi spaces

    tr/\t\n\r |/ /s;

/s on a translation squashes runs of characters to a single character.

> $form_data{$user_entry} =~ s/^ +| +$//g; # starting & ending spaces

This is covered in the FAQ. It is more efficient to do two substitutions:

    s/^ +//;
    s/ +$//;

> One concern being corruption of database files with \n or | characters.
> But after retrieving a db file via ftp (mac), I noticed what seemed to be
> newlines( /r's after fetch ingests them for my mac) breaking my records.
> The odd thing is that the unix server is ignoring the character and not
> seeing /n. - which is a good thing :)

Are you retrieving the db file in TEXT/ASCII mode or in BINARY mode?
If it's a plaintext database file, you should transfer it in TEXT mode.
If it's a binary database file, you should transfer it in BINARY mode.

> The book, "Perl5 by Example" has a table listing all the usual escape
> char's, but, further into the book there is a code example using the
> escape /cM. A quick test verifies that MacPerl recognizes this as a /r.

\r and \n in Perl are platform dependent. \cM (also \x0D or \015) and
\cJ (also \x0A or \012) are platform independent.

> The obvious solution for cgi's is to replace any whitespace char with a
> space. But is there any way to predict the way these things bounce from
> platform to platform?

Over FTP, using TEXT mode will translate line endings for the system
receiving the file. The standard line ending for text being sent between
systems is \015\012.

Ronald
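
Putting the pieces above together, a rough sketch might look like the following. The sub name clean_field and the sample strings are made up for illustration and are not from the original script. The tag stripping is only the "closer" regex, not a real HTML parser, and \012 and \015 are used in place of \n and \r so the translation means the same thing on every platform.

```perl
use strict;
use warnings;

# Hypothetical helper: clean one form field before writing it to a
# pipe-delimited, one-record-per-line text database.
sub clean_field {
    my ($text) = @_;
    return '' unless defined $text;

    # Rough tag removal: non-greedy, and /s lets . match newlines.
    # This is an approximation, not a substitute for an HTML parser.
    $text =~ s/<.*?>//gs;

    # Turn tab, LF, CR, space and pipe into a space; /s squashes each
    # run of translated characters down to a single space.
    $text =~ tr/\t\012\015 |/ /s;

    # Trim leading and trailing spaces with two separate substitutions.
    $text =~ s/^ +//;
    $text =~ s/ +$//;

    return $text;
}

# Greedy versus non-greedy on the sample sentence from the reply.
my $sample = "I <B>don't</B> think this <I>substitution</I> will work";
(my $greedy = $sample) =~ s/<.*>//gs;     # removes first < through last >
(my $lazy   = $sample) =~ s/<.*?>//gs;    # removes each tag on its own

print "greedy: $greedy\n";   # greedy: I  will work
print "lazy:   $lazy\n";     # lazy:   I don't think this substitution will work

# A field with tags, a CRLF, a tab, a pipe and stray spaces.
print clean_field("  <B>Hello</B>,\015\012\tworld | again  "), "\n";
# prints: Hello, world again
```

In a CGI you would presumably call something like this on each value in %form_data before appending the record to the file.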