
Re: [MacPerl] Bug in Mac libwww-perl-5.10 LWP::RobotUA?



John Irving <john@hyperconnect.com> writes:
}Hi,
}
}Has anyone noticed the following strange behaviour when using the MacPerl
}port of libwww-perl-5.10 LWP::RobotUA?:

I have now.  It seems to be the default behavior.

}
}It seems that any request generates a 503 error (Service Unavailable)
}regardless of whether there's a robots.txt file or not (on our own servers
}the robots.txt request is registered in the logs). The 503 error (which
}seems to be incorrect), of course, halts the process. I presume that,
}normally, the original or any acceptable requests (after parsing the
}robots.txt file if one exists) would be processed. BTW: I tried fetching
}files from numerous servers (including the evil empire which does have a
}robots.txt file) with the same result.

It's caused by the following in LWP::RobotUA (line 233):

	if ($self->{'use_alarm'}) {
	    sleep($wait)
	} else {
	    my $res = new HTTP::Response
	      &HTTP::Status::RC_SERVICE_UNAVAILABLE, 'Please, slow down';
	    $res->header('Retry-After', time2str(time + $wait));
	    return $res;
	}
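In other words, with use_alarm off, RobotUA doesn't sleep itself; it hands the
required delay back to the caller as a 503 with a Retry-After header.  A caller
can recover from that instead of halting.  Here's a sketch (I'm constructing
the same kind of response RobotUA builds, rather than hitting the network):

```perl
use HTTP::Response;
use HTTP::Date qw(time2str str2time);

# Build the kind of response RobotUA returns when use_alarm is off.
my $res = HTTP::Response->new(503, 'Please, slow down');
$res->header('Retry-After', time2str(time + 5));

# A caller can honor the request: parse Retry-After, wait, then retry.
my $wait = 0;
if ($res->code == 503 && defined(my $after = $res->header('Retry-After'))) {
    $wait = str2time($after) - time;
    # sleep($wait) if $wait > 0;   # ...then re-issue the original request
    print "would wait $wait seconds before retrying\n";
}
```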


}
}It could be a misconfiguration on my end but I think I have everything
}configured properly and most of the other modules/classes seem to work OK
}after some brief initial testing.
}
}Sample script included below. Any pointers/fixes appreciated.
}
}Thanks,
}
}John
}
}
}use LWP::RobotUA;
}use HTTP::Request;
}use HTTP::Response;
}
}$url = 'http://www.foobar.com/';
}
}my $ua = new LWP::RobotUA('My_RobotUA', 'my@email.com');

The simple fix is to insert right here:

$ua->use_alarm(1);

This is documented in the LWP::RobotUA pod, although it's easy to miss.

By default, since $Config::Config{d_alarm} is 'undef' on MacPerl, alarm
isn't used.
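You can check what your own port will default to with the Config module (this
is just the same test RobotUA's default relies on, nothing LWP-specific):

```perl
use Config;

# On MacPerl $Config{d_alarm} is undef, so LWP::RobotUA leaves use_alarm
# off by default and returns the 503 instead of sleeping.
my $have_alarm = defined $Config{d_alarm};
print $have_alarm
    ? "alarm() available: use_alarm defaults on\n"
    : "no alarm(): call \$ua->use_alarm(1) yourself\n";
```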

}
}my $request = new HTTP::Request('GET', $url);
}my $response = $ua->request($request);
}if ($response->is_success) {
}    print $response->content;
}} else {
}    print $response->error_as_HTML;
}}
}
}

After turning alarm on, I tried your script on mors.gsfc.nasa.gov.  There
seems to be a problem with the rules parser, which assumes that the
robots.txt file is returned as native (i.e. in this case, Mac) text.  On
line 97 of WWW::RobotRules:

   for(split(/\n/, $txt)) {

change \n to \015\012|\012|\015, which handles DOS (CRLF), Unix (LF), and
Mac (CR) line endings. I'll report this to Gisle Aas, and I'll put a fixed version
out on ftp://mors.gsfc.nasa.gov/pub/MacPerl/Scripts once I've verified that
the fix is working properly.
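A quick illustration with a made-up robots.txt fragment (CR, LF, and CRLF
deliberately mixed):

```perl
# robots.txt may arrive with DOS (CRLF), Unix (LF), or Mac (CR) line
# endings.  \015\012 is listed first in the alternation so a CRLF pair
# matches as one separator instead of producing an empty "line".
my $txt = "User-agent: *\015\012Disallow: /private\015Disallow: /tmp\012";

my @lines = split(/\015\012|\012|\015/, $txt);
print "$_\n" for @lines;
```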

}
}.....................................................................
}John Irving
}HyperConnect Online Communications
}john@hyperconnect.com
}http://www.hyperconnect.com/
}
}

---
Paul J. Schinder
NASA Goddard Space Flight Center
Code 693, Greenbelt, MD 20771
schinder@pjstoaster.pg.md.us



***** Want to unsubscribe from this list?
***** Send mail with body "unsubscribe" to mac-perl-request@iis.ee.ethz.ch