[Date Prev][Date Next][Thread Prev][Thread Next] [Search] [Date Index] [Thread Index]

Re: [MacPerl] Counting the number of lines in a document




On Sun, 5 May 1996 17:27:51 +0200 (MET DST)
Bart Lateur <color@pophost.eunet.be> wrote:

> Personally, I detest the use of fixed length records, unless the length of
> the fields is garanteed to fall between some narrow limits. If you provide,
> say, 40 characters for a name, this will be far to much in most cases, while
> sometimes it still isn't enough.
>...
> Why don't you just *ask* Perl where the current line is
> (before reading it) using tell(), and store *this* value instead. 

If you insist on indexing by line numbers there is a simple scheme that
has a fixed auxiliary storage requirement and degrades gracefully with
increasing file size.  It is called "Accordion Index Sequential" and
was kicked around quite a bit here at Maryland in the mainframe days.

The basic idea is you keep a table of seek positions.  Initially it is
every line, but when you fill the table you throw away every odd
pointer, keeping only 2, 4, 6, 8 etc and make a note that you now
only have half the pointers.  When you seek you might have to read
ahead one record (that is, when you seek record 9 you follow the
pointer to record 8 then read one more record).  When you fill the
table again, again you throw away every other pointer, now keeping
pointers to records 4, 8, 12, 16 etc.  Now you might have to read
up to three more records on each seek (but they're probably in the
same block anyway so you might not have to do any real IO at all).

It only requires one fixed-length pointer array and a scalar
variable to keep track of how many lines are currently represented
by each current pointer entry.

Unfortunately I am up to my ears in computational linguistics and
don't have time to code this up.  Maybe somebody will do a good job
and add it to the Perl toolkit...

> --- Embracing the KIS principle: Keep It Simple

Albert Einstein was supposed to have said:

"Everything should be made as simple as possible AND NO SIMPLER!"

----------------------
Charles B. Cranston <zben@ni.umd.edu>
http:"//www.wam.umd.edu/~zben"