
Re: [MacPerl] Got it...



>until ($#data < 0) {
>	$line = pop( @data ); # take last record from @data, put into $line
>	next if $line eq ""; # skip the rest if it's empty, go to next
>	for $i (0 .. $#data) { $data[$i] = "" if $data[$i] eq $line; }
>		# read through @data, make empty lines that are duplicates of the current one
>	print OUT $line; # dumps current line into output file
>}

ack...why not just read it into an associative array (a hash)? if you use 
each line as a key, duplicates disappear immediately, and all the blank 
lines collapse into a single key (so you're left with at most one blank 
line)...
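here's a minimal sketch of the hash approach, assuming the lines are 
already in @data as in the code above (the sub name dedupe and the sample 
data are made up for illustration). it skips blank lines outright instead 
of leaving one behind, and it keeps the first copy of each line in its 
original order:

```perl
use strict;
use warnings;

# Drop blank lines and duplicates in a single pass. %seen is the
# associative array: each line is a key, so spotting a repeat is one
# hash lookup instead of a rescan of the whole array.
sub dedupe {
    my @out;
    my %seen;
    for my $line (@_) {
        next if $line =~ /^\s*$/;    # skip blank (or whitespace-only) lines
        next if $seen{$line}++;      # skip lines we've already kept
        push @out, $line;            # first occurrence, original order
    }
    return @out;
}

my @data = ("apple\n", "\n", "pear\n", "apple\n", "\n", "plum\n");
print dedupe(@data);    # prints apple, pear, plum, one per line
```

each line is looked at once, so the work grows with the number of lines 
instead of with its square.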

the reason this is so slow is that for each line, you go back through 
_each and every remaining line_ to see if there is an equal one, so n 
lines cost roughly n*n/2 comparisons...that seems pretty lame. ;-) sorting 
the data first (a swap sort would do; search the web, there are plenty of 
references to this) would be a better method, because it groups all 
identical lines next to each other in the array, so you only have to 
compare each line with its neighbor (although an associative array would 
probably be the best method, since you wouldn't need to sort first)...
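a rough sketch of the sort-then-compare idea (the sub name dedupe_sorted 
is hypothetical). note it uses perl's built-in sort rather than a 
hand-written swap sort, and the output comes back in sorted order, not the 
original order:

```perl
use strict;
use warnings;

# Sort first so identical lines end up adjacent; then one pass only
# has to compare each line with the last one kept.
sub dedupe_sorted {
    # drop blank lines, then sort (explicit block avoids sort/grep parsing pitfalls)
    my @sorted = sort { $a cmp $b } grep { !/^\s*$/ } @_;
    my @out;
    for my $line (@sorted) {
        # keep the line only if it differs from its predecessor
        push @out, $line unless @out && $out[-1] eq $line;
    }
    return @out;
}

my @data = ("pear\n", "\n", "apple\n", "pear\n", "apple\n");
print dedupe_sorted(@data);    # prints apple, pear
```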

since memory is a concern, one thing you could do is write directly to a 
dbm database...(which is essentially a disk-based associative array)...

try this:

open the input file for reading
open a dbm file for writing
read each line, skipping it if it's blank (i.e. "" once the newline is chomped)
store the line as a key in the dbm database
close both files
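the steps above could look something like this. it's a sketch under some 
assumptions: it uses SDBM_File (the dbm that ships with perl; your MacPerl 
setup may have a different one), and the sub names, file names, and temp 
directory are all made up for the demo. SDBM also has a small size limit 
per entry, so very long lines might need a different dbm:

```perl
use strict;
use warnings;
use Fcntl;                  # O_RDONLY, O_RDWR, O_CREAT
use SDBM_File;              # disk-based hash bundled with perl
use File::Temp qw(tempdir);

# Copy the non-blank lines of $infile into the dbm $dbmname. Each line
# is stored as a key, so a duplicate just overwrites the existing entry.
sub dedupe_to_dbm {
    my ($infile, $dbmname) = @_;
    open(my $in, '<', $infile) or die "can't read $infile: $!";
    tie(my %db, 'SDBM_File', $dbmname, O_RDWR | O_CREAT, 0666)
        or die "can't open dbm $dbmname: $!";
    while (my $line = <$in>) {
        next if $line =~ /^\s*$/;    # skip blank lines
        $db{$line} = 1;              # value is unused; only the key matters
    }
    close($in);
    untie(%db);                      # flush to disk and close
}

# Read the distinct lines back out of the dbm into an ordinary array.
sub read_dbm_lines {
    my ($dbmname) = @_;
    tie(my %db, 'SDBM_File', $dbmname, O_RDONLY, 0666)
        or die "can't open dbm $dbmname: $!";
    my @lines = sort keys %db;       # dbm key order is arbitrary, so sort
    untie(%db);
    return @lines;
}

# demo with a throwaway input file
my $dir   = tempdir(CLEANUP => 1);
my $input = "$dir/input.txt";        # hypothetical input file
open(my $out, '>', $input) or die "can't write $input: $!";
print $out "pear\n\napple\npear\n\n";
close($out);

dedupe_to_dbm($input, "$dir/lines");
print read_dbm_lines("$dir/lines");  # prints apple, pear
```

only the dbm file grows with the data; the script itself holds one line 
at a time while writing.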

now you will have a file on disk that contains no blank lines and no 
duplicate lines (keep in mind a dbm doesn't preserve the original line 
order). you can open it back up and read its keys into another array to 
do the rest of your string comparisons on it.

-jon

Jon (no h) S. Stevens
Web Engineer
j@clearink.com
Clear Ink and The Internet Weather Report
<http://www.clearink.com/> | <http://www.internetweather.com/>


***** Want to unsubscribe from this list?
***** Send mail with body "unsubscribe" to mac-perl-request@iis.ee.ethz.ch