
Re: [MacPerl-AnyPerl] good idea or bad idea?



>Sorry for taking so long to reply and thank you, but here it is:  thanks.
>
>I decided to stick with my own implementation instead of using the 
>single regex you had.  The reason is this: I tried your regex and it 
>kept failing.  I couldn't figure it out until I looked at the 
>original HTML page.  I'm looking at the value of a particular META 
>tag and they had reversed the name= and content= from what I 
>typically see (name first then content).  Obviously I could just 
>reverse the order in your regex, but then I got paranoid that they 
>might decide to do it the "normal" way and break that regex too.
>
>My way is position independent (not that that is the reason I wrote 
>it that way 8-) and I couldn't think of a way to write a single 
>regex that was position independent to strip out the data I'm 
>looking for.
>
>Thanks again,
>Kevin
>
>
>At 2:02 PM -0400 7/13/2000, Ronald J Kimball wrote:
>>On Thu, Jul 13, 2000 at 12:43:18PM -0500, Kevin van Haaren wrote:
>>  > I have a couple of questions about the following code snippet:
>>  >
>>  > my $story_time;
>>  > foreach my $line (split(/\015/,$sci_page_head)) {
>>  >	if ($line =~ /OriginalPublicationDate/) {
>>  >		($story_time) = ($line =~ /content=\"(.*)\"/);
>>  >	}
>>  > }
>>  > print $story_time;
>>  >
>>  > First, is declaring the my $line in the foreach line a good idea/bad
>>  > idea/doesn't matter?  I know if I declare the $story_time as my
>>  > inside the loop it doesn't seem to exist outside the loop.  The code
>>  > as written works fine (I don't need the $line variable outside the
>>  > loop).
>>
>>Doesn't matter, but if you don't need $line outside the loop, it makes more
>>sense to declare it as you did.
>>
>>
>>  > Second, is this a good way of doing this loop?  $sci_page_head is
>>  > just the header info from a web page sucked in via LWP::Simple.  If
>>  > it were a text file I'd probably just do a while (<>) {}.  I guess I'm
>>  > asking if there is a way to get rid of the $line variable altogether.
>>
>>I don't think you need a loop at all:
>>
>>my($story_time) =
>>   $sci_page_head =~ /OriginalPublicationDate.*content="(.*)"/;
>>print $story_time;
>>
>>Ronald


Hi Kevin,

I've expanded Ronald's regex so that it no longer depends on the order
of name= and content=, which is what you are looking for:

my($story_time) =
   $sci_page_head =~
   /(?:OriginalPublicationDate.*)?content="([^"]*)"(?:.*OriginalPublicationDate)?/;

Comments:

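(a) (?:OriginalPublicationDate.*)? and (?:.*OriginalPublicationDate)?

     Each matches its (sub-)pattern zero or one time. The parentheses
     delimit the (sub-)pattern, and the '?:' immediately following the
     opening '(' ensures that this (sub-)pattern is not saved (see
     perlre.pod). Since both groups are optional, the regex also
     matches a content= that has no OriginalPublicationDate near it,
     so it can pick up the content= of a different META tag when the
     header contains more than one.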

(b) ([^"]*) instead of (.*)

     [^"] matches any character except a double quote. This stops the
     capture at the closing quote, so only the attribute value is
     saved.
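
To see how the pattern behaves, here is a small sketch that runs it
against both attribute orders. The sample META tags and the sub names
loose_time and strict_time are made up for illustration; they are not
taken from the real page. The second sub is only one possible way to
tie content= to the same tag as OriginalPublicationDate, and it assumes
the attribute is written as name="..." with double quotes:

```perl
use strict;

# The pattern from above, wrapped in a sub (hypothetical name).
sub loose_time {
    my ($head) = @_;
    my ($time) = $head =~
        /(?:OriginalPublicationDate.*)?content="([^"]*)"(?:.*OriginalPublicationDate)?/;
    return $time;
}

# A stricter sketch (an assumption, not part of the original pattern):
# the lookahead requires name="OriginalPublicationDate" inside the same
# <meta ...> tag, and [^>]* never crosses the end of that tag.
sub strict_time {
    my ($head) = @_;
    my ($time) = $head =~
        /<meta\b(?=[^>]*\bname="OriginalPublicationDate")[^>]*\bcontent="([^"]*)"/i;
    return $time;
}

# Made-up sample headers, not taken from the real page.
my $name_first    = '<META name="OriginalPublicationDate" content="2000/07/13">';
my $content_first = '<META content="2000/07/13" name="OriginalPublicationDate">';
my $two_tags      = '<META name="description" content="News">' . "\n" . $name_first;

foreach my $head ($name_first, $content_first, $two_tags) {
    print "loose:  ", loose_time($head), "\n";
    print "strict: ", strict_time($head), "\n";
}
```

Both subs return 2000/07/13 for the two single-tag samples. For the
last sample the first sub returns News, because its optional groups let
it settle on the earlier content=, while the second returns 2000/07/13.
The stricter version is still a regex over HTML, so unusual quoting or
a '>' inside an attribute value would defeat it.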

HTH

Best regards

--Thomas



