I do most of my web page authoring with BBEdit. Spelling and typing mistakes occur frequently, and I needed a means of checking the spelling of files containing HTML. Unfortunately, the HTML tags get in the way of a word-processor spell checker.

Solution: a MacPerl program that scans all of the HTML files in a specified directory, strips out the HTML tags (anything between a < and a > character) and then looks up each word in a dictionary.

The dictionary is kept as a plain text file, loaded at program start-up and written out when the program finishes. This means the dictionary itself can be checked with a word processor's spelling checker, which is a useful step! A report file is created in the directory for later review. Of course, I could learn AppleScript and fire up the scriptable text editor to make the changes, but that will be in a later version! Plurals of words currently have to be stored separately, and I need to add some logic to handle this.

As the program executes, a dialog is displayed for each unknown word, with three options: write the details to the error log file, add the word to the dictionary, or ignore it. The line in question is displayed along with its line number.

I've found the program useful, and I offer it to the MacPerl community for educational purposes.

May your web pages be correctly spelled!

Charles

# spell_html.pl  18th October 1996
# This program spell checks an HTML file against a file
# of words loaded into an associative array.
# The HTML tags are stripped out ... everything between
# < and > on a line (assumes that tags don't cross line boundaries)
# If words are not found in the list of words, then options are given:
# 1. Ignore the error
# 2. Add the word to the list of words
# 3. Flag as an error (and write to the error log)

require "GUSI.ph" ;

%goodword = ();   # run time copy of dictionary

&loaddictionary;

$folderchoice = &MacPerl'Choose(&GUSI'AF_FILE, 0, "", "", &GUSI'CHOOSE_DIR);

open(ERRS,">".$folderchoice.": Spelling Errors") || die "couldn't make error file\n";
print "Analysing the folder $folderchoice\n\n";
chdir($folderchoice);

foreach $file (<*htm*>) {
    &spellcheck($file);
}
close(ERRS);
&writedictionary;

# --------------------------------------------------------------------------
sub spellcheck {
    local($file) = @_;
    open(HTM,$file) || die "could not open $file\n";
    print ERRS "\n\n$file\n";
    $lineno = 0;
    while(<HTM>) {
        $lineno += 1;
        chop;
        $origtext = $_;          # keep the untouched line for the dialog
        s/<[^>]*>//g;            # strip HTML tags
        s/\(/ /g;
        s/\)/ /g;
        s/'s / /g;               # remove possessive case eg the cat's food --> cat
        tr/A-Z/a-z/;             # convert uppercase to lower case
        tr/\,\!\.\:\"\;\/\-\?/ /;  # remove punctuation characters
        tr/0-9/ /;               # remove numbers
        @words = split(/ /);
        foreach $word (@words) {
            if ($word ne "") {
                if ($goodword{$word} ne "Y") {
                    # at this point remove a trailing s (plurals) and try again
                    # an enhancement yet to be done!
                    $action = &MacPerl'Answer("[".$word."] $origtext (Line $lineno)",
                                              "Ignore","Add","Error");
                    # as used below: "Add" returns 1, "Error" returns 0;
                    # "Ignore" falls through and nothing is recorded
                    if ($action == 1) {
                        $goodword{$word} = "Y";   # add to dictionary
                    }
                    if ($action == 0) {
                        # $_ is the line after tags and punctuation were stripped
                        print ERRS " line $lineno\n $word: $_\n";
                    }
                }
            }
        }
    }
    close(HTM);
    return;
}

# --------------------------------------------------------------------------
sub loaddictionary {
    print "Loading the dictionary ....";
    local($wc) = 0;
    # you will need to change the pathname on the next line and of
    # course create the file with a few words to get started.
    open(WORDS,"MacintoshHD:words.txt") || die "no words file!\n";
    while(<WORDS>) {
        chop;
        $goodword{$_} = 'Y';
        $wc += 1;
    }
    print "Done\n$wc words loaded\n";
    close(WORDS);
    return;
}

# ---------------------------------------------------------------------------
sub writedictionary {
    local($wc) = 0;
    open(WORDS,">MacintoshHD:words.txt") || die "no words file!\n";
    foreach $key (sort keys(%goodword)) {
        if ($goodword{$key} eq "Y") {
            print WORDS "$key\n";
            $wc += 1;
        }
    }
    close(WORDS);
    print "$wc words written to dictionary\n";
    return;
}

------------------------------------------------------
Charles Cave
Sydney, Australia
Email: charles@jolt.mpx.com.au
URL: http://www.ozemail.com.au/~caveman
------------------------------------------------------
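
The plural handling that the comment in spellcheck flags as "an enhancement yet to be done" could be approached along these lines. This is only a minimal sketch, not part of spell_html.pl: the helper name known_word and the three suffix rules are assumptions made for illustration, and it is written for Perl 5 (my, hash references) rather than in the older style of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns 1 if $word,
# or a singular form guessed by stripping a plural ending, is marked "Y"
# in the dictionary hash referenced by $dict; returns 0 otherwise.
sub known_word {
    my ($dict, $word) = @_;
    return 1 if ($dict->{$word} || '') eq 'Y';

    # Guess possible singular forms. These rules are a rough heuristic.
    my @candidates;
    push @candidates, $1 . 'y' if $word =~ /^(.+)ies$/;   # ponies -> pony
    push @candidates, $1       if $word =~ /^(.+)es$/;    # boxes  -> box
    push @candidates, $1       if $word =~ /^(.+)s$/;     # cats   -> cat

    foreach my $singular (@candidates) {
        return 1 if ($dict->{$singular} || '') eq 'Y';
    }
    return 0;
}

# Small demonstration with a made-up dictionary.
my %goodword = map { $_ => 'Y' } qw(cat box pony page);
foreach my $w (qw(cats boxes ponies pages dogs)) {
    printf "%-7s %s\n", $w, known_word(\%goodword, $w) ? 'ok' : 'unknown';
}
```

With this sample dictionary the first four words are reported as ok and "dogs" as unknown. In the script, a test like this could stand in for the direct $goodword{$word} lookup, so that the dialog only appears when neither the word nor a guessed singular is known. The trade-off is that a plain suffix rule will also accept some misspellings (for example "thiss" when "this" is in the dictionary), so storing irregular plurals separately would still be needed.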