
[MacPerl] Web crawler in Perl?



Hello,

For some time now, I've been working on a Frontier- and FileMaker-based
web crawler. My intention is to index a collection of Denver-related
sites, including the two daily newspapers. I hope to provide a better
service than the big search engines by indexing the sites more often
(daily, in the case of the newspapers) and by using a bit of
intelligence to account for the idiosyncrasies of the various sites.
The Denver Post, for instance, titles every page "Denver Post Online"
-- not very helpful in a list of search results -- so my index will
extract a more useful title from the page text.
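
To give a rough idea, here is a sketch of the title extraction I have
in mind (the <h1> fallback is just a guess at the Post's markup; the
real patterns would be tuned per site):

    # Prefer a real headline over a boilerplate <title>.
    sub extract_title {
        my ($html) = @_;
        my ($title) = $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is;
        if (!defined $title or $title =~ /Denver Post Online/i) {
            ($title) = $html =~ m{<h1[^>]*>\s*(.*?)\s*</h1>}is;
        }
        return '(untitled)' unless defined $title;
        $title =~ s/<[^>]+>//g;    # strip any markup inside the heading
        return $title;
    }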

Because of the volume of pages to index, performance is crucial. It's my
hope that MacPerl's text-processing tools and built-in TCP functions will
provide better performance than Frontier (we'll worry about the database end
of things later).
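
For instance, something along these lines (assuming libwww-perl is
installed under MacPerl; the URL is just an example) is the sort of
fetch-and-scan step I would be timing against Frontier:

    use LWP::Simple qw(get);

    my $url  = 'http://www.denverpost.com/';    # example URL
    my $html = get($url);
    die "couldn't fetch $url\n" unless defined $html;

    # Harvest links to queue for the crawl -- the kind of text
    # munging that Perl's regexes make cheap.
    my @links = $html =~ m{<a\s[^>]*href\s*=\s*["']?([^"'\s>]+)}gi;
    print "$_\n" for @links;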

So, I'm looking for input on two fronts. First, do the Perl mavens on
this list think my hopes for Perl are justified? And second, lest I
reinvent the wheel, can anyone point me to any Perl-based robots that
I could use as a starting point? (I realize that libwww-perl provides
robot modules such as LWP::RobotUA, but I'm looking for help with the
total solution.)
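
For reference, here is about as far as that module alone gets me -- it
handles robots.txt and request pacing. A minimal sketch (the agent name
and contact address are placeholders):

    use LWP::RobotUA;
    use HTTP::Request;

    # A polite agent: fetches and honors robots.txt, and paces requests.
    my $ua = LWP::RobotUA->new('DenverIndexer/0.1', 'you@example.com');
    $ua->delay(1/60);    # delay() is in minutes; 1/60 = one second between hits

    my $res = $ua->request(HTTP::Request->new(GET => 'http://www.denverpost.com/'));
    if ($res->is_success) {
        print $res->content;
    } else {
        print 'fetch failed: ', $res->status_line, "\n";
    }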

Many thanks.

-Dan

