
[MacPerl] An HTML spell checker



I do most of my web page authoring with BBEdit. Spelling
and typing mistakes occur frequently, and I needed a means
of checking the spelling of files containing HTML.

Unfortunately, the HTML tags get in the way of a word-processor
spell checker.

Solution: a MacPerl program that scans all of the HTML files
in a specified directory, strips out the HTML tags (anything
between a < and a > character) and then checks each word
against a dictionary.

The program keeps the dictionary as a plain text file (loaded
at program start-up and written out when the program finishes).
This means the dictionary itself can be checked with a word
processor's spelling checker, which is a useful step!
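Because the dictionary is just one word per line, the load and save
steps are easy to try outside MacPerl. The following is only a sketch,
assuming a Perl recent enough for three-argument open and lexical
filehandles (5.6 or later). The names load_words and save_words are
made up for illustration and take the path as an argument rather than
hardcoding it.

```perl
use strict;
use warnings;

# Hypothetical helper: load one word per line into a hash and
# return a reference to it. Blank lines are skipped.
sub load_words {
    my ($path) = @_;
    my %good;
    open(my $in, '<', $path) or die "cannot read $path: $!\n";
    while (my $line = <$in>) {
        chomp $line;    # removes only the line ending, unlike chop
        $good{$line} = 'Y' if $line ne '';
    }
    close($in);
    return \%good;
}

# Hypothetical helper: write the words back out, sorted, one per
# line. Returns the number of words written.
sub save_words {
    my ($path, $good) = @_;
    open(my $out, '>', $path) or die "cannot write $path: $!\n";
    print $out "$_\n" for sort keys %$good;
    close($out) or die "cannot close $path: $!\n";
    return scalar keys %$good;
}
```

Writing the file sorted keeps it easy to scan by eye or to paste into
a word processor for checking.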

A report file is created in the directory for later review. Of course,
I could learn AppleScript and drive a scriptable text editor to make the
corrections, but that will be in a later version! Plurals of words
currently have to be stored separately, and I need to add some logic to
handle this.
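One possible shape for the plural logic is sketched below. It is not
part of the program as posted: known_word is a hypothetical helper, and
the three suffix rules are only a rough guess at common English plurals.
It looks the word up as-is, then retries with a few likely singular
forms.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word, or a likely singular form of
# it, is marked "Y" in the dictionary hash referenced by $good.
sub known_word {
    my ($good, $word) = @_;
    return 1 if ($good->{$word} || '') eq 'Y';

    my @candidates;
    # ponies -> pony
    push @candidates, $1 . 'y' if $word =~ /^(.+)ies$/;
    # boxes -> box, dishes -> dish
    push @candidates, $1 if $word =~ /^(.+(?:s|x|z|ch|sh))es$/;
    # cats -> cat (but leave words ending in "ss" alone)
    push @candidates, $1 if $word =~ /^(.+[^s])s$/;

    foreach my $singular (@candidates) {
        return 1 if ($good->{$singular} || '') eq 'Y';
    }
    return 0;
}
```

The trade-off is that a rule this loose will also accept some typing
mistakes (for example "thes" if "the" is in the dictionary), so it may
be worth logging such matches rather than passing them silently.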

As the program executes, a form is displayed for any unknown word,
and three options are given: write details to the error log file,
add the word to the dictionary, or ignore. The line in question
is displayed with the line number.

I've found the program to be useful, and I offer it to the MacPerl
community for educational purposes. May your web pages be correctly
spelled!


Charles



# spell_html.pl   18th October 1996

# This program spell checks an HTML file against a file
# of words loaded into an associative array.
# The HTML tags are stripped out ... everything between
# < and > on a line (assumes that tags don't cross line boundaries)
# If words are not found in the list of words, then options are given:

# 1. Ignore the error
# 2. Add the word to the list of words
# 3. Flag as an error (and write to the error log)

require "GUSI.ph" ;

%goodword = ();      # run time copy of dictionary
&loaddictionary;

$folderchoice = &MacPerl'Choose(&GUSI'AF_FILE, 0, "", "", &GUSI'CHOOSE_DIR);
open(ERRS,">".$folderchoice.": Spelling Errors")
    || die "couldn't make error file: $!\n";

print "Analysing the folder $folderchoice\n\n";
chdir($folderchoice);
foreach $file (<*htm*>) {
   &spellcheck($file);
}

close(ERRS);

&writedictionary;

# --------------------------------------------------------------------------
sub spellcheck {
  local($file) = @_;
  open(HTM,$file) || die "could not open $file\n";
  print ERRS "\n\n$file\n";
  $lineno = 0;
  while(<HTM>) {
    $lineno += 1;
    chomp;            # strip the line ending only (chop removes any last char)
    $origtext = $_;
    s/<[^>]*>//g;
    s/\(/ /g;
    s/\)/ /g;
    s/'s / /g;     # remove possessive case  eg the cat's food --> cat
    tr/A-Z/a-z/;      # convert uppercase to lower case
    tr/\,\!\.\:\"\;\/\-\?/ /;   # remove punctuation characters
    tr/0-9/ /;        # remove numbers

    @words = split(/ /);
    foreach $word (@words) {
      if ($word ne "") {
         if ($goodword{$word} ne "Y") {
# at this point remove a trailing s (plurals) and try again
# an enhancement yet to be done!

           $action = &MacPerl'Answer("[".$word."] $origtext (Line $lineno)",
                    "Ignore","Add","Error");
           if ($action == 1) {
               $goodword{$word} = "Y";   # add to dictionary
           }
           if ($action == 0) {
               print ERRS " line $lineno\n  $word: $_\n";
           }
         }
       }
    }
  }
  close(HTM);
  return;
}


# --------------------------------------------------------------------------
sub loaddictionary {
    print "Loading the dictionary ....";
    local($wc) = 0;
# you will need to change the pathname on the next line and of
# course create the file with a few words to get started.
    open(WORDS,"MacintoshHD:words.txt") || die "no words file!\n";
    while(<WORDS>) {
       chomp;         # chop would eat a letter if the last line has no newline
       $goodword{$_} = 'Y';
       $wc += 1;
    }
    print "Done\n$wc words loaded\n";
    close(WORDS);
    return;
}

# ---------------------------------------------------------------------------
sub writedictionary {
  local($wc) = 0;
  open(WORDS,">MacintoshHD:words.txt") || die "couldn't write words file: $!\n";
  foreach $key (sort keys(%goodword)) {
    if ($goodword{$key} eq "Y")  {
       print WORDS "$key\n";
       $wc += 1;
     }
    }
  close(WORDS);
  print "$wc words written to dictionary\n";
  return;
}
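The listing above assumes that a tag never crosses a line boundary. One
way to lift that assumption, sketched here and not tested under MacPerl,
is to read the whole file into a single string and strip tags from that.
The helper name strip_tags_keep_lines is made up for illustration. Like
the s/<[^>]*>//g in spellcheck it removes each tag outright, but it
keeps any newlines found inside a tag so that line numbers in the report
still match the file.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every <...> tag from $html, including
# tags that span several lines. Newlines inside a tag are preserved
# so the line count of the text does not change.
sub strip_tags_keep_lines {
    my ($html) = @_;
    $html =~ s{(<[^>]*>)}{
        (my $newlines = $1) =~ tr/\n//cd;   # keep only the newlines
        $newlines;
    }ge;
    return $html;
}
```

The result can then be split on newlines and walked with a counter in
place of the while(<HTM>) loop, with the rest of the per-line clean-up
left unchanged.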







------------------------------------------------------
Charles Cave
Sydney, Australia
Email: charles@jolt.mpx.com.au
URL:   http://www.ozemail.com.au/~caveman
------------------------------------------------------