According to Strider:
> This is frustrating. I've done everything here, and I can't get this script
<snip>
> Input File (sum.tab) :
>
> S 9/5/97 554 1 0 c3
> t 9/2/97 14403 1 0 w2
> t 9/3/97 14404 1 0 w2
> t 9/5/97 33059 3 0 w2
> c 9/1/97 652 1 0 b4
> c 9/2/97 24123 13 2 b4
> c 9/3/97 5758 6 1 b4
> c 9/4/97 23898 17 0 b4
> c 9/5/97 104 1 0 b4
> r 9/2/97 355 1 0 w2
> G 9/2/97 1897 1 0 c3
> s 9/5/97 1539 1 0 o2
> a 9/2/97 2569 1 0 p6
> a 9/3/97 1273 1 0 o2
> a 9/4/97 2460 1 0 p6
> a 9/5/97 4465 1 0 p6
> r 9/2/97 1678 1 0 p6
> r 9/3/97 1238 1 0 p6
> r 9/4/97 446 1 0 p6
> s 9/2/97 1840 1 0 w2
> s 9/4/97 1326 1 0 b4
> s 9/5/97 10466 3 0 w2
> e 9/2/97 1692 1 0 w2
> e 9/4/97 6097 4 1 c3
> e 9/5/97 3927 2 0 w2
> s 9/2/97 3089 1 0 c3
> s 9/3/97 2726 1 0 c3
> s 9/4/97 2283 1 0 c3
> s 9/5/97 7027 1 0 c3
> R 9/2/97 177 1 0 w2
> R 9/3/97 3365 1 0 w2
> R 9/4/97 6291 2 0 w2
> W 9/2/97 677 1 0 w2
> W 9/3/97 2710 1 0 w2
> W 9/5/97 1079 1 0 w2

I looked at your script and I believe you are doing a bit of overkill. Why not something like this:

#
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#
#   Assuming you have read whatever amount of data you want into
#   an array called @theArray (one line per element).
#
for( $i=0; $i<=$#theArray; $i++ ){
#
#   Split up the line on runs of whitespace.
#
    @theLine = split( ' ', $theArray[$i] );
#
#   Re-arrange the information (date first, then the code letter,
#   then the remaining fields) and store it back into @theArray
#   so that the sort below sees the new order.
#
    $theArray[$i] = join( "\t", $theLine[1], $theLine[0], @theLine[2..$#theLine] ) . "\n";
}
#
#   Now sort it.
#
@newArray = sort @theArray;
#
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#

The above rearranges the fields of each line so that the ones you want to sort on come first. That lets you use the built-in sort function directly. However, there are a couple of problems.

The first is that the date field will vary from 1/1/97 to 12/31/97, so the month and day are not always the same width. A plain string sort compares character by character, so it will put 12/31/97 before 9/1/97 ("1" comes before "9"). You need to convert each date from 1/1/97 to 01/01/97 so that every part of the date has the same width. You can do this with the sprintf function.
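Here is a minimal sketch of the sprintf fix. It assumes every record has the six whitespace-separated fields shown in sum.tab, and that all dates fall in the same year, so month/day order is enough. pad_date and rearrange are hypothetical helper names. The 12/31/97 record is made up to show why the padding matters.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "9/5/97" into "09/05/97" so every date has
# the same width and a string sort orders them correctly
# (assumes all records share the same year).
sub pad_date {
    my ($date) = @_;
    my ($m, $d, $y) = split m{/}, $date;
    return sprintf("%02d/%02d/%02d", $m, $d, $y);
}

# Hypothetical helper: move the padded date to the front, then the
# code letter, then every remaining field, joined with tabs.
sub rearrange {
    my ($line) = @_;
    my @f = split ' ', $line;            # split on runs of whitespace
    $f[1] = pad_date($f[1]);
    return join("\t", @f[1, 0], @f[2 .. $#f]) . "\n";
}

# Three records from sum.tab plus one made-up record dated 12/31/97.
my @theArray = (
    "s 9/5/97 1539 1 0 o2\n",
    "x 12/31/97 100 1 0 w2\n",           # hypothetical record
    "c 9/1/97 652 1 0 b4\n",
    "t 9/2/97 14403 1 0 w2\n",
);

my @newArray = sort map { rearrange($_) } @theArray;
print @newArray;
```

After padding, each line starts with 09/01/97, 09/02/97, 09/05/97 or 12/31/97, so a plain sort puts them in date order. Without padding, the 12/31/97 line would sort first. If the data ever spans more than one year, putting the year first in the key would be the safer choice.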
The second problem is the size of the database. 300mb won't all fit into memory, so you will have to sort the information a different way. My suggestion is to break the file up into a number of smaller files that you can pull into memory individually. These files should be no larger than 1/3 of your available memory. Thus, if you have MacPerl set to 8192 and you have 2mb of memory available to use, your files should not be any larger than about 500k. Then the easiest thing to do (although it is a bit slow) is the following:

#
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#
#   $numFiles is the number of the last chunk, so the files are
#   file.0 through file.$numFiles.
#
for( $i=0; $i<=$numFiles; $i++ ){
    open( THEFILE, "file.$i" ) || die $!;
    @theInfo = <THEFILE>;
    close( THEFILE );
    @theInfo = sort @theInfo;    # covers the single-file case
#
#   Pair file.$i with every LATER file (starting at $i+1, so a file is
#   never merged with itself). The smallest lines of the pair stay
#   with file.$i; the rest go back into file.$j.
#
    for( $j=$i+1; $j<=$numFiles; $j++ ){
        open( THEFILE, "file.$j" ) || die $!;
        @moreInfo = <THEFILE>;
        close( THEFILE );

        @newArray = sort( @theInfo, @moreInfo );
        @theInfo  = @newArray[0 .. $#theInfo];
        @moreInfo = @newArray[$#theInfo+1 .. $#newArray];

        open( THEFILE, ">file.$j" ) || die $!;
        print THEFILE @moreInfo;
        close( THEFILE );
        undef @newArray;
    }
    open( THEFILE, ">file.$i" ) || die $!;
    print THEFILE @theInfo;
    close( THEFILE );
}
#
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#

This does a sort/merge in which two of the files are opened at a time and their combined contents are sorted. The smallest lines go back into the earlier file and the rest into the later one. By the time the outer loop is done with file.$i, that file holds the smallest lines from all of the files at or after it. It is basically a bubble-style exchange sort expanded to work on whole files. I'm sure there are more efficient ways to do this, but this should work.

If this is a continuation of the single-entry problem you wrote about earlier, then you will probably want to keep using the hash entries. However, 300mb will not fit into your computer's memory unless you have about 450mb of RAM.
This is due to the overhead of creating the strings, the hash entries, and so on. So unless you have that much memory, you need to go to a disk-based approach. Believe me, disk-based solutions are slow. But they do work. :-)

***** Want to unsubscribe from this list? ***** Send mail with body "unsubscribe" to mac-perl-request@iis.ee.ethz.ch
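The reply suggests breaking the 300mb file into smaller files but doesn't show that step. Below is a minimal sketch of it. It assumes every record is one line and uses the file.0, file.1, ... naming from the pairwise loop. split_into_chunks, merge_chunks and the $maxBytes limit are hypothetical names.

Because each chunk is sorted before it is written, the chunks can also be combined with a k-way merge. That is the usual external merge sort, a different technique from the pairwise loop, and it reads each chunk only once instead of once per pair.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $bigFile into $dir/file.0, $dir/file.1, ...
# starting a new chunk whenever the current one would grow past
# $maxBytes. Each chunk is sorted in memory before it is written.
# Returns the number of chunk files created.
sub split_into_chunks {
    my ($bigFile, $dir, $maxBytes) = @_;
    open(my $in, '<', $bigFile) or die "$bigFile: $!";
    my ($count, $size, @chunk) = (0, 0);

    # Write out the current chunk (if any) and start a new one.
    my $flush = sub {
        return unless @chunk;
        my @sorted = sort @chunk;
        open(my $out, '>', "$dir/file.$count") or die "$dir/file.$count: $!";
        print $out @sorted;
        close($out);
        $count++;
        $size  = 0;
        @chunk = ();
    };

    while (my $line = <$in>) {
        $line .= "\n" unless $line =~ /\n\z/;    # keep records separate
        $flush->() if $size + length($line) > $maxBytes;
        push @chunk, $line;
        $size += length($line);
    }
    $flush->();
    close($in);
    return $count;
}

# Hypothetical helper: k-way merge of the sorted chunks into $outFile.
# Only the current first line of each chunk is kept in memory; the
# smallest one (same string order as sort) is written out each time.
sub merge_chunks {
    my ($dir, $count, $outFile) = @_;
    my (@fh, @head);
    for my $i (0 .. $count - 1) {
        open($fh[$i], '<', "$dir/file.$i") or die "$dir/file.$i: $!";
        $head[$i] = readline($fh[$i]);
    }
    open(my $out, '>', $outFile) or die "$outFile: $!";
    while (1) {
        my $min;
        for my $i (0 .. $#head) {
            next unless defined $head[$i];
            $min = $i if !defined($min) || $head[$i] lt $head[$min];
        }
        last unless defined $min;    # every chunk is exhausted
        print $out $head[$min];
        $head[$min] = readline($fh[$min]);
    }
    close($_) for @fh;
    close($out);
}
```

For example, first call split_into_chunks("sum.tab", ".", 500_000), which returns $count. Then call merge_chunks(".", $count, "sorted.tab"). The chunk-size limit only matters while splitting, since the merge holds one line per chunk. To use the pairwise loop instead, set $numFiles to $count - 1.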