Re: mmap was Re: [FWP] rewrite and simplify (out of memory)



On Thu, 1 Jul 1999, David L. Nicol wrote:

> > It strikes me that there should be a fun way to mmap the file. 
> > ...
> > Anybody ever done this?
> > 
> 
> It strikes me that mmapping it won't help any; the problem is
> that you're duplicating your entire file (newsgroups, is it?  That can
> easily be thousands of lines.  Mine is 5136, a third of the length
> of the "active" file which holds the list of all newsgroups innd
> is carrying, and we're only carrying things in English).
> 
> The problem is trying to slurp that file.
> 
> An obvious solution is to NOT SLURP THINGS BIGGER THAN YOU CAN STORE
> IN MEMORY and a way to do that is by respecting the file pointer idiom
> and processing way-big files a line at a time.
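A minimal sketch of the line-at-a-time idiom described above, written in Python rather than Perl (in Perl, the usual form is a `while (<FH>) { ... }` loop). The function names and the newsgroup-style sample lines are hypothetical, chosen only for illustration.

```python
def matching_lines(lines, needle):
    """Yield each line containing `needle`, with its newline stripped.

    `lines` can be any iterable of strings. An open text file qualifies:
    iterating a file object reads one line per step, so only the current
    line (plus a read buffer) is held in memory.
    """
    for line in lines:
        if needle in line:
            yield line.rstrip("\n")


def count_matches(path, needle):
    """Count matching lines in a file of any size without slurping it."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return sum(1 for _ in matching_lines(fh, needle))
```

Because `matching_lines` accepts any iterable, the same function works on a real file handle or on a plain list of strings when testing.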

Unfortunately there are a number of cases where I would have loved to
mmap a $string. Take the output of MS-Word's doc-to-HTML converter, for
example. It's mostly broken HTML, so you can't really parse it with a
standard HTML parser. (The point of the exercise is to fix and clean
up the broken stuff...)

Now I would like to match tags using regexes, but the elements are spread
across many lines, and unless you sluuurp (as Todd puts it) you can't
match them.

I dare say you could tokenize, build a broken parse tree and try to
fix it. Gaargh! I just want to clean up the mess MS-Word makes
sufficiently to slap the document onto my server. I will definitely
play with the Mmap module Adam Rice pointed out.
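A minimal sketch of matching tags that span line breaks against a memory-mapped file, so the regex sees the whole document without it first being read into a string. It uses Python's standard `mmap` and `re` modules, not the Perl Mmap module mentioned above (whose interface is not shown here). The tag pattern is a simplified assumption, not a real HTML grammar, and the function names are hypothetical.

```python
import mmap
import os
import re

# Simplified opening-tag pattern (an assumption, not a real HTML grammar).
# The negated class [^>] also matches "\n", so a tag whose attributes are
# split across lines still matches without re.DOTALL.
TAG = re.compile(rb"<([A-Za-z][A-Za-z0-9]*)[^>]*>")


def find_tags(buf):
    """Return opening-tag names found in any bytes-like buffer."""
    return [m.group(1).decode("ascii") for m in TAG.finditer(buf)]


def tags_in_file(path):
    """Scan a file through a read-only memory map instead of slurping it."""
    if os.path.getsize(path) == 0:
        return []  # mmap refuses to map an empty file
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # re accepts the mmap object directly; the OS pages the file in
            # as the regex engine walks through it.
            return find_tags(mm)
```

Splitting the scan (`find_tags`) from the mapping (`tags_in_file`) keeps the matching logic testable on an ordinary `bytes` value, while large files go through the map.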

I also fiddle with large satellite images (400 MB), and mmap'ing is such a nice
way of doing neighbourhood operations. mmmmmmmmmm........... luffly!
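A minimal sketch of a neighbourhood operation (a mean over a square window, clipped at the edges) on a memory-mapped raster. The file layout is an assumption for illustration: a headerless, row-major, single-band image with one unsigned byte per pixel and a width and height known in advance. The function names are hypothetical.

```python
import mmap


def neighbourhood_mean(buf, width, height, row, col, radius=1):
    """Mean of the window of half-size `radius` around (row, col).

    `buf` is a bytes-like raster (bytes or mmap), assumed headerless,
    row-major, one unsigned byte per pixel, so pixel (r, c) sits at
    offset r * width + c. The window is clipped at the image edges.
    """
    total = 0
    count = 0
    c0 = max(0, col - radius)
    c1 = min(width, col + radius + 1)
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        start = r * width
        # Slicing copies just this row segment; with an mmap, only the
        # pages holding it have to be read from disk.
        segment = buf[start + c0:start + c1]
        total += sum(segment)  # iterating bytes yields ints 0..255
        count += len(segment)
    return total / count


def mean_at(path, width, height, row, col, radius=1):
    """Open a raw raster read-only via mmap and sample one neighbourhood."""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            if len(mm) < width * height:
                raise ValueError("file is smaller than width * height bytes")
            return neighbourhood_mean(mm, width, height, row, col, radius)
```

The offset arithmetic is what makes mmap attractive here: each window touches only a few short runs of bytes, one per row, no matter how large the whole image is.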

> ________________________________________________________________________
>   David Nicol 816.235.1187 UMKC Network Operations david@news.umkc.edu
>      "on a 80x24 character cell terminal in a damp basement, under
>        a bare light bulb, perched atop a backless wooden stool."

Cor lumme mate! No wonder you're a depressive! ;-)

John Carter                    EMail: ece@dwaf-hri.pwv.gov.za
Telephone : 27-12-808-0374x194 Fax:- 27-12-808-0338
<http://www.geocities.com/SoHo/Cafe/5947> or <http://iwqs.pwv.gov.za>

Despite all cheerful predictions, I'm moving seamlessly from being a
confused and angry young man to becoming an angry and confused old
man.
