
Re: Re: [MacPerl] Searching help



>Hi everyone.
>
>Okay, I am sorry about that.  Here is what I want it to do.  I would like
>to replace the <input *> tags with the value that is stored in the %data
>hash.  I get the field name correctly, however the replace does not
>happen.  I think it has to do with the <input> tag part, but I am not
>sure.  I will attach the bit of code with this email again along with a
>small bit of HTML code to search and replace with.
>
>Oh, the %data hash is generated by a data file created by a different
>script.  I have made sure that the right info is in there.
>
>Perl code:
>
>while ($done ne "true") {
>      $done = "true";
>      if ($TheHTML =~ m#(<input.*?>)#i) {
>           $done = "false";
>           $newline = $1;
>           if ($newline =~ m#(hidden|submit|reset)#i) {
>                $TheHTML =~ s/$newline//g;
>           } elsif ($newline =~ m#checkbox#i) {
>                $newline =~ m#(name=".*?")#i;
>                $thename = $1;
>                $thename =~ s/name="//i;
>                chop $thename;
>                if ($data{$thename}) {$TheHTML =~ s/$newline/$yes/g;}
>                     else {$TheHTML =~ s/$newline/$yes/g;}
>           } else {
>                $newline =~ m#(name=".*?")#i;
>                $thename = $1;
>                $thename =~ s/name="//i;
>                chop $thename;
>                $TheHTML =~ s/$newline/$data{$thename}/g;
>           }
>      }
>}
>
>HTML code:
>
><input type="checkbox" name=test1 value="[Y]">test 1 <br><input
>type="checkbox" name=test2 value="[Y]">test 2
>name <input type="text" name=name>
>
>
>There, that ought to be helpful, I think.
>
>Thanks again!!!!
>===========================================================================

Hello Mike,

Parsing HTML "by hand" isn't the best solution to your problem. For
example, some of your regular expressions rely on the /name="/
pattern, but in your HTML example the name attributes are unquoted
(name=test1), so that pattern will never match. The attribute values
could also be enclosed in single quotes, and so on...
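A quick way to see the problem: a real parser treats unquoted, double-quoted and single-quoted attribute values alike, while the /name="/ pattern only matches one of them. This is a small sketch that assumes HTML::TokeParser (recommended below) is installed. It feeds the parser a string through a scalar reference instead of a file, and the sample HTML is made up for illustration.

```perl
#!perl -w
use strict;
use HTML::TokeParser;

# The same attribute written three ways: unquoted, double-quoted
# (with upper-case tag and attribute names), and single-quoted.
my $html = q{<input type="checkbox" name=test1>}
         . q{<INPUT TYPE="checkbox" NAME="test2">}
         . q{<input type='text' name='name'>};

# new() takes a filename, a file handle, or a reference to the document text.
my $p = HTML::TokeParser->new(\$html);

my @names;
while (my $tag = $p->get_tag('input')) {   # skip ahead to each <input> start tag
    # $tag->[1] is the attribute hash; tag and attribute names are lowercased.
    push @names, $tag->[1]{name};
}
print join(',', @names), "\n";            # test1,test2,name

# For comparison, the hand-written pattern only catches the double-quoted form.
my @regex_names = $html =~ m#name="(.*?)"#gi;
print join(',', @regex_names), "\n";      # test2
```

The parser finds all three names; the regex finds only one.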

Before reinventing the wheel, you are better off using the
HTML::Parser or the HTML::TokeParser module. Because HTML::TokeParser
doesn't need subclassing, it's a bit easier to use. I've written a
little script that uses this module and (hopefully) solves your
problem:

___________

#! perl -w
use strict;
use HTML::TokeParser;
use HTML::Entities;

my $html_file = 'testfile.html';
# for safety, make a backup (don't exceed 31 chars filename length)
my $backup_file = $html_file . '.bak';
my $tmp_file = 'html.tmp';


print "Parsing HTML file $html_file ...\n";

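open (TMP, ">$tmp_file") or die "Cannot open $tmp_file: $!";

my $p = HTML::TokeParser->new($html_file) or die "Cannot open $html_file: $!";
$p->{textify} = {}; # don't let get_text() turn <img> tags into their alt text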

my $text = $p->get_text(); # get text up to first tag, should be empty
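#
# Note: get_text() expands any entities to their corresponding characters.
#       We use encode_entities(...) to reverse this decoding. It also
#       encodes quotes and non-ASCII characters, so the output may contain
#       entities where the original file had literal characters.
#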
print TMP encode_entities($text); # write the text to our tmp-file

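# Note: comments and declarations (<!-- ... -->, <!DOCTYPE ...>) are skipped
# by get_text() and get_tag(), so they are not copied to TMP.
while (my $token = $p->get_tag()) {
   if ( $token->[0] eq "input" ) { # Ha! We found an input tag
       my $thename = $token->[1]{name}; # see TokeParser.pm pod for details
       my $type = lc($token->[1]{type} || 'text'); # no type attribute means a text field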
       if ( ($type eq 'hidden') or ($type eq 'submit') or ($type eq 'reset') ) {
          #
          # do your hidden-submit-reset stuff here, i.e. print '' to TMP;
          # because this is an empty string, you need to do nothing in this case
          # Note: $newline = $origtext = $token->[3]
          #
          my $empty = '---hidden-submit-reset---';
          print TMP encode_entities($empty); # I will print this to TMP, to indicate the substitution

       } elsif ($type eq 'checkbox') {
          #
          # do your checkbox stuff here, i.e. print $yes to TMP
          # Note: $newline = $origtext = $token->[3]
          #
          my $yes = '---checkbox---';
          print TMP encode_entities($yes); # I will print this to TMP, to indicate the substitution
       } else {
          #
          # do your else stuff here, i.e. print  $data{$thename} to TMP
          # Note: $newline = $origtext = $token->[3]
          #
          my $data = '---$data{$thename}---';
          print TMP encode_entities($data); # I will print this to TMP, to indicate the substitution
       }#if

   } else {# any other tag (not input)
       if ( $token->[0] =~ m|^/| ) { # it is an end tag
           print TMP $token->[1]; # origtext
       } else { # it is a start tag
          print TMP $token->[3]; # origtext
       }#if
   }#if
   $text = $p->get_text(); # get text up to next tag, may be empty
   print TMP encode_entities($text);
}#while

close (TMP);

# rename OLDNAME,NEWNAME
rename($html_file, $backup_file) or die "Cannot rename $html_file to $backup_file: $!";
rename($tmp_file, $html_file) or die "Cannot rename $tmp_file to $html_file: $!";

print "Done.\n";

__END__


The script hasn't been tested much. Use it with care. Enjoy.  :-)
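The three placeholder branches in the loop could be filled in along these lines. This is a minimal sketch: `replacement_for`, the sample `%data` contents and the `$yes`/`$no` markers are hypothetical stand-ins for your own values. It also assumes a checkbox should become `$yes` when `$data{name}` is true and `$no` otherwise (your original code used `$yes` in both branches, which looks like a typo).

```perl
#!perl -w
use strict;

# Hypothetical stand-ins: %data would come from your data file, and
# $yes/$no are whatever markers you want for a checked/unchecked box.
my %data = (test1 => 'Y', test2 => '', name => 'Mike');
my ($yes, $no) = ('[X]', '[ ]');

# Decide what replaces one <input> tag, given the attribute hash that
# HTML::TokeParser returns in $token->[1].
sub replacement_for {
    my ($attr) = @_;
    my $type = lc($attr->{type} || 'text');   # no type attribute means a text field
    my $name = defined $attr->{name} ? $attr->{name} : '';

    # hidden, submit and reset fields are simply dropped
    return '' if $type eq 'hidden' or $type eq 'submit' or $type eq 'reset';
    # checkboxes become $yes or $no depending on the stored value
    return $data{$name} ? $yes : $no if $type eq 'checkbox';
    # everything else is replaced by the stored value (or nothing)
    return defined $data{$name} ? $data{$name} : '';
}

print replacement_for({type => 'checkbox', name => 'test1'}), "\n";  # [X]
print replacement_for({type => 'checkbox', name => 'test2'}), "\n";  # [ ]
print replacement_for({type => 'text',     name => 'name'}),  "\n";  # Mike
```

Inside the `while` loop, the whole `if ... elsif ... else` block for input tags would then reduce to `print TMP encode_entities(replacement_for($token->[1]));`, which still escapes any `<` or `&` that happen to be in your data.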

Hope that helps.

Best regards

--Thomas
