[Date Prev][Date Next][Thread Prev][Thread Next] [Search] [Date Index] [Thread Index]

[MacPerl] Change "WORD" to "Word"



According to Ken Cartner:
> 
> I'm trying to change words that are in all upercase to upper and lower.
> And i can'rt seem to come up with a logical way to achieve this.

How about this:

1. Create a file called SUBS.DAT.
2. Write a program to scan the information to be changed.
2a. Each time you do, you read in SUBS.DAT and use it to do
	your substitutions.

2b. Each time you do, you also write out a new SUBS.DAT
	file which contains the old list plus any new words you've
	found in the current document.

3. Then you just edit the SUBS.DAT file to put in the new changes.

Example:

SUBS.DAT
L.A. LAKERS:L.A. Lakers
SAN ANTONIO:San Antonio
etc....


myProg.pl
#!perl
#
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#
#	Get the list
#
	@myList = ();
	open( THEFILE, "SUBS.DAT" ) || die $!;
	while( <THEFILE> ){
		chomp;
		$myList[++$#myList] = $_;
		}

	close( THEFILE );
#
#	Get the document
#
	@myDoc = ();
	open( THEFILE, "myDoc.dat" ) || die $!;
	while( <THEFILE> ){
		chomp;
		$myDoc[++$#myDoc] = $_;
		}

	close( THEFILE );
#
#	Scan it
#
	for( $i=0; $i<=$#myDoc; $i++ ){
		for( $j=0; $j<=$#myList; $j++ ){
			@theLine = split( /:/, $myList[$j] );
			$myDoc[$i] =~ s/$theLine[0]/$theLine[1]/g;
			}
#
#	Begin looking for new UPPERCASE words to add to the list.
#
		@theLine = split( /\s/, $myDoc[$i] );
		for( $j=0; $j<=$#theLine; $j++ ){
#
#	If there is a trailing non-alphanumeric character - get rid of it.
#
			if( $theLine[$j] =~ /[\,\;\:\-\.\!\?]$/ ){
				chop( $theLine[$j] );
				}
#
#	Is this word all UPPERCASE characters?
#
#	If so, then check to see if we need to add it to the list as WORD:WORD.
#
			if( $theLine[$j] =~ /^[A-Z]+$/ ){
				$theFlag = 0;
				for( $k=0; $k<=$#myList; $k++ ){
					if( $myList[$k] =~ /$theLine[$j]/ ){
						$theFlag = 1;
						last;
						}
					}

				if( $theFlag < 1 ){
					$myList[++$#myList] = "$_:$_";
					}
				}
			}
		}
#
#	Write out the new document.
#
	open( THEFILE, ">myDoc.new" ) || die $!;
	for( $i=0; $i<=$#myDoc; $i++ ){
		print THEFILE $myDoc[$i], "\n";
		}

	close( THEFILE );
#
#	Update the substitution list.
#
	open( THEFILE, ">SUBS.DAT" ) || die $!;
	for( $i=0; $i<=$#myList; $i++ ){
		print THEFILE $myList[$i], "\n";
		}

	close( THEFILE );
	exit( 0 );
#
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#

The problem with the above is that things like L.A. LAKERS
will wind up in the file as:

L:L
A:A
LAKERS:LAKERS

or

L.A:L.A
LAKERS:LAKERS

But seeing as how you only have to correct this information
once and then the program does the corrections from then on
it's not all that bad.  Also, you can work on a way of
flagging multiple UPPERCASE words.  A simple way to do this
is to just have a secondary flag which is checked each time
a new word is found to be all in uppercase.  The flag is
unset each time a non-all uppercase word is located too.
Then all you have to do is IF the secondary flag is set,
then pop off the last entry on the @myList array, split it
up, tack on the new word (with a space between the words),
and then put it back together again.  In this case, your
array would contain something like:

L:L

then

L A:L A

then

L A LAKERS:L A LAKERS

(And hopefully they don't send something like: THE L A
LAKERS WIN THE GAME! - or some such.)

***** Want to unsubscribe from this list?
***** Send mail with body "unsubscribe" to mac-perl-request@iis.ee.ethz.ch