
[MacPerl] Introduction to MacPerl article



Hello,

I'm a quiet subscriber to this list, and my only posting so far has
been a request for information on Applescript.

I'm writing an article on MacPerl for the magazine of Sydney's
Macintosh User group called "Club Mac". Their magazine is called
Macinations and has a circulation of around 1500 copies.

I thought I would send this article to the list for your comments
and suggestions. Please respond quickly as I need to submit the
final copy in about 72 hours' time.

Other MacPerl projects I have on the go are programs to check the
internal link integrity of a web site and to index all HTML
documents, and a Unix-like shell.


Thanks in advance for your comments,

Charles Cave
charles@jolt.mpx.com.au


The article was written in Word 5.1, so all the nice formatting
and bolding has disappeared. Let me know if you want the Word
version. The Figures refer to screen shots.


Introducing MacPerl

Charles Cave © 1996

Perl is a scripting language written by Larry Wall, initially for the
Unix environment. It was developed for scanning arbitrary text
files, extracting information from those files, and printing
reports. The language is intended to be practical
(easy to use, efficient, complete) rather than beautiful (tiny,
elegant, minimal).

The standard reference for the language is the book "Programming
Perl" by Larry Wall, published by O'Reilly and Associates. Books
from this publisher are distinguished by the animals on the front
cover, and "Programming Perl" is notable for its camel.
Consequently, the camel has become the icon and symbol of
anything to do with perl.

I learnt perl on several Unix machines, and was delighted to find
an implementation on the Macintosh, not surprisingly named
MacPerl. What I like about perl is the ease with which scripts can be
written to do a variety of tasks: reading the contents of files and
searching for patterns of characters; processing the contents of a
directory; and making my own utilities.

A major benefit of using perl is its availability on Unix, DOS,
Windows 95 and Macintosh. The same code can be used on all
platforms, apart from platform-specific system routines. If
programming is not your strength, you will find that perl is very
powerful without the complexity of C or C++. Perl has become the
default language of Web server scripts (gateway scripts), and
MacPerl can be used with Macintosh Web servers.


MacPerl

The MacPerl implementation (by Matthias Neeracher in
Switzerland) provides an interface to some components of the
toolbox, such as dialogue boxes (similar to the Ask and Answer
commands of HyperTalk), an interface to AppleScript and the ability
to save a script as a droplet for drag and drop support. Technical
support for MacPerl is excellent and a mailing list exists where
users (and Matthias) help each other.

My primary usage for MacPerl is the conversion of spreadsheets
and FileMaker Pro databases into HTML documents for the
World Wide Web. The remainder of this article is a tutorial in
MacPerl and a demonstration of how a spreadsheet can be
converted into HTML.


The Spreadsheet

[Figure 1: Screen shot of the spreadsheet]

A spreadsheet is a useful method of storing information as a
database, with each column representing a field. In this example, I
have a spreadsheet of book titles that make up my reading wish
list. The sheet has the following fields:

 Classification Code
 Book Title
 Author
 Notes about the book
 A flag (Y) indicating if I own the book

The information is sorted first by classification code, then author
followed by book title.

[Figure 2: Save Spreadsheet as Text]

In order to process the spreadsheet with Perl, it is necessary to
save a copy in Text format. This will reduce each row to a line in
the file with each column separated by a tab character. A similar
thing can be done with FileMaker Pro, so this tutorial is equally
applicable to the conversion of databases.

The goal of this program is to display the categories of books as a
list of links to separate files for each category.


The Tutorial

Perl has three types of variables: scalar variables, arrays or lists,
and associative arrays. Together they allow very sophisticated data
processing.

Scalar variables are used for storing numbers and strings of
characters and these variable names always begin with a dollar
sign:

 $myname = "Charles";
 $children = 2;
 $cost = 15.5;

Arrays or lists are sets of scalar variables where each element can
be independently accessed. Array variables always start with an @
sign, but a single element is a scalar and so is written with a
dollar sign.

 @animals = ('cat', 'dog', 'mouse');
 print "First animal is $animals[0]\n";      # prints "First animal is cat"
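
As a small illustration of working with arrays, here is a sketch
that is not part of the program described later. It shows element
access, the array length and a loop. The extra animal and the
variable names are made up for the example, and it assumes Perl 5,
where "my" declares a variable.

```perl
use strict;    # Perl 5: variables must be declared with "my"

my @animals = ('cat', 'dog', 'mouse');

# A single element is a scalar, so it is written with a dollar sign.
my $first = $animals[0];
my $last  = $animals[$#animals];  # $#animals is the index of the last element

# An array assigned to a scalar gives the number of elements.
my $count = @animals;

# Add an element to the end, then join everything into one string.
push(@animals, 'camel');
my $all = join(', ', @animals);

print "First: $first, last: $last, count before push: $count\n";
print "All animals: $all\n";

# Visit each element in turn.
foreach my $animal (@animals) {
    print "Animal: $animal\n";
}
```

Note that the index of the first element is 0, not 1.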

Associative arrays are a more powerful form of array where a
string of characters can be used as the index. This is very useful for
creating simple database systems, lookup tables and
concordances.

 $capital{"NSW"} = "Sydney";
 $capital{"Victoria"} = "Melbourne";
 print "The capital of NSW is $capital{\"NSW\"}\n";
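
A lookup table of this kind can also be filled in one statement and
then listed in full. The following is only a sketch, assuming
Perl 5. The Queensland entry and the variable names are my own
additions for the example.

```perl
use strict;    # Perl 5: variables must be declared with "my"

# The whole associative array is written with a percent sign.
my %capital = (
    'NSW'        => 'Sydney',
    'Victoria'   => 'Melbourne',
    'Queensland' => 'Brisbane',
);

# Look up a single entry by its index string (its key).
my $nsw = $capital{'NSW'};

# Test whether a key is present before relying on it.
my $known = exists $capital{'Tasmania'} ? 'yes' : 'no';

# keys returns the index strings in no particular order, so sort them.
my @states = sort keys %capital;

foreach my $state (@states) {
    print "The capital of $state is $capital{$state}\n";
}
print "Is Tasmania in the table? $known\n";
```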


The program

My program to process the text file version of the spreadsheet
will need to open the file and process each line of the file. The
individual cells or fields of the spreadsheet will be separated by
tab characters, so it will be necessary to break up the line into
components. Each category of books needs to be written to a
separate file, so logic is required to check the categories.

The logic of the program is expressed in "pseudo-code" which is
meant to be a human-readable version of the program.

Open the text file for processing
Load the category descriptions into an associative array
Read each line of the input file and do the following for each line:
        If the book category is different to the previous line
                Write the new category details to the category listing
                Close the previously opened file of book details if appropriate
                Open a new file for book details
When the end of the input file is reached, close both files.



# MacPerl program written by Charles Cave to convert a tab delimited
# text file into HTML.

require "GUSI.ph" ;

$infile = &MacPerl'Choose(
     &GUSI'AF_FILE, 0, "", "");

open(IN,$infile) || die "could not open input file";
open(NDX,">index.html") || die "could not create index.html";
print NDX "<HTML><HEAD>\n<TITLE>";
print NDX "Books I would like to read in my lifetime</TITLE></HEAD>\n";
print NDX "<BODY BGCOLOR=\"#FFFFFF\"><HR>";
print NDX "<H2>Lifetime Reading</H2>\n<ul>\n";

$firsttime = "Y";
$docfooter = "\n\n<HR></BODY></HTML>\n";
# load the codetable.txt file into an associative array

open(CTC,"codetable.txt") || die "could not open codetable.txt file";
while(<CTC>) {
  chop;
  ($bcat, $codedesc) = split(/\t/, $_);
  $bookcategory{$bcat} = $codedesc;
}
close(CTC);

$lastcode = 7777;

while (<IN>) {
  chop;
  s/"//g;     # get rid of extraneous double quotes generated by Excel
  ($bcat, $title, $author, $notes, $sortseq, $library) = split(/\t/, $_);

 if ($lastcode ne $bcat) {
     $lastcode = $bcat;
     if ($firsttime ne "Y") {
        print OUT "$docfooter";
        close(OUT);        # close the previously opened file
     }
     $firsttime = "N";      # from now on there is a file to close
     open(OUT, ">books".$bcat.".htm") || die "couldnt  create booksnn.htm file";
     print OUT "<HTML><HEAD>\n<TITLE>";
     print OUT "Category:  $bookcategory{$bcat}</TITLE></HEAD>\n";
     print OUT "<BODY BGCOLOR=\"#FFFFFF\"><HR><H2>$bookcategory{$bcat}</H2>\n";
     print NDX "<li><A HREF=\"books".$bcat.".htm\">$title</A>\n";
    } else {
         print OUT "<strong>$author</strong>, $title\n";
         if ($library ne "") {
             print OUT " <FONT COLOR=red>***</FONT> ";
         }
         print OUT "<BR>";
         if ($notes ne "") {
             print OUT "$notes\n";
         }
          print OUT "<P>\n";
    }

} ;
print OUT "$docfooter";
print NDX "</ul>\n$docfooter";
close(OUT);
close(IN);
close(NDX);

Commentary on the code:

1. The &MacPerl'Choose function will prompt the user for the
choice of input file using the standard Macintosh dialogue box.
The variable $infile will be set to the full pathname of the chosen
file.

2. The open command is used to open files for input and output.
The first parameter is the name of a file variable used to reference
the file in the program. No special characters are used in the name
and the convention is to use upper case. To open a file for output,
the file name is prefixed with a greater-than sign (>).

3. The double vertical bars (||) are an OR operator. If the open
statement fails, the die operator is invoked, which prints its
message and terminates the script.

4. The print statement is used to write output to a named file and
takes as arguments, a file variable and the string to print.
Embedded variables are allowed in the string, along with special
characters such as \n for new line.

[Figure 3 - Book category file display]

5. The while loop shows several features of perl. This part of the
program is used to read a file of book codes and descriptions into
an associative array. The <CTC> construct returns the next line
from the file opened as CTC and, when used as the condition of a
while loop, places it in a special variable called $_. When the end
of the file is reached, <CTC> returns an undefined value and the
while loop terminates.

The line read from the file includes the carriage return character
and the chop command is used to remove the last character from a
string. In this case, the chop command operates on the $_
variable.

The category description file contains book codes followed by a tab
character and the category description. The split function creates
a list by splitting up a string ($_) using the tab character
(specified as \t) as the delimiter. In this case, I have specified
two scalar variables in parentheses as my list. On the following
line, the description is stored in the %bookcategory associative
array, using the category code as the index.
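
The loading step can be tried without any files. This is a minimal
sketch, assuming Perl 5. The two sample lines are typed into the
script rather than read from codetable.txt, and their contents are
assumed from the index.html listing in this article. It uses chomp
in place of chop, because chomp removes only a trailing line ending.

```perl
use strict;    # Perl 5: variables must be declared with "my"

# Sample lines in the code table layout: code, tab, description.
# The contents are assumptions made for this illustration.
my @lines = ("1\tGreece\n", "2\tRome\n");

my %bookcategory;
foreach my $line (@lines) {
    chomp($line);                                # remove the line ending
    my ($bcat, $codedesc) = split(/\t/, $line);  # two fields, two variables
    $bookcategory{$bcat} = $codedesc;            # code is the key
}

# List the table in numeric order of category code.
foreach my $code (sort { $a <=> $b } keys %bookcategory) {
    print "Category $code is $bookcategory{$code}\n";
}
```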

The main program logic occurs in the while (<IN>) loop where
each line of the text version of the spreadsheet is split up into six
variables. The s/"//g; command is a string replacement command
to remove the double quotation marks created by Excel in Save As
Text mode. The g denotes a global replace, in this case on the $_
variable.
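
The effect of the replacement can be seen on a single line. This is
a sketch only, assuming Perl 5. The sample line is my own guess at
what Excel might write when a cell contains a comma, and the
variable names are invented. The =~ operator applies the
replacement to a named variable instead of $_.

```perl
use strict;    # Perl 5: variables must be declared with "my"

# A line as Excel might save it, with the author cell wrapped in
# double quotes. The layout is an assumption for this example.
my $line = "31\tThe Blind Watchmaker\t\"Dawkins, Richard\"";

# With the g option, s/// returns the number of replacements made.
my $removed = ($line =~ s/"//g);

# Break the cleaned line into its tab-separated fields.
my @fields = split(/\t/, $line);

print "Removed $removed quotation marks\n";
print "Author field is now: $fields[2]\n";
```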

The remainder of the loop shows the use of if statements to
process the file and create a set of files based on the category
codes. The open(OUT, ">books".$bcat.".htm") code shows how
file names can be generated from variables.
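
The test for a change of category can be tried on its own, without
creating any files. This is a sketch, assuming Perl 5. The
subroutine category_files and the sample rows are hypothetical and
are not part of the program above. Like the program, it relies on
the input being sorted by category code.

```perl
use strict;    # Perl 5: variables must be declared with "my"

# Return one output file name for each new category code met in the
# rows. The rows are assumed to be sorted by category code already.
sub category_files {
    my @rows = @_;
    my $lastcode = '';
    my @files;
    foreach my $row (@rows) {
        my ($bcat) = split(/\t/, $row);    # first field is the code
        if ($bcat ne $lastcode) {          # the category has changed
            push(@files, "books" . $bcat . ".htm");
            $lastcode = $bcat;
        }
    }
    return @files;
}

# Invented sample rows: category code, tab, title.
my @rows = ("1\tFirst title", "1\tSecond title",
            "2\tThird title", "31\tFourth title");

my @files = category_files(@rows);
print join("\n", @files), "\n";
```

Four rows in three categories give three file names, one per
category.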

Each file created has a TITLE tag containing the category
description, which is repeated in an H2 heading tag. If the book is
in my library, three red asterisks are printed with the FONT tag.

Once the program has been run, there will be a file called
index.html and many files such as books2.htm. The index.html will
have been generated as follows:

<HTML><HEAD>
<TITLE>Books I would like to read in my lifetime</TITLE></HEAD>
<BODY BGCOLOR="#FFFFFF"><HR><H2>Lifetime Reading</H2>
<ul>
<li><A HREF="books1.htm">Greece</A>
<li><A HREF="books2.htm">Rome</A>

..lines skipped....

<li><A HREF="books31.htm">Science, Mathematics & Technology</A>
<li><A HREF="books33.htm">Computer Science</A>
</ul>


<HR></BODY></HTML>

Figure 4 - The index.html file displayed with Netscape

<HTML><HEAD>
<TITLE>Category: Science, Mathematics & Technology</TITLE></HEAD>
<BODY BGCOLOR="#FFFFFF"><HR><H2>Science, Mathematics & Technology</H2>
<strong>Brand, Stuart</strong>, The Media Lab: Inventing the Future at MIT
 <FONT COLOR=red>***</FONT> <BR><P>
<strong>Crick, Francis</strong>, What Mad Pursuit: A personal view of scientific discovery
 <FONT COLOR=red>***</FONT> <BR><P>
<strong>Dawkins, Richard</strong>, The Blind Watchmaker
<BR><P>
<strong>French (ed)</strong>, Einstein - A Centenary Volume
 <FONT COLOR=red>***</FONT> <BR>A treasury of material on the life, work and worth of Einstein as a scientist and a human being. Its value is enormously enhanced by perceptive contributions from illustrious contemporaries of Einstein
<P>
<P>
....some stuff deleted...

<HR></BODY></HTML>


Figure 5 - Book details on display

What next?

I hope I haven't lost you completely! Perl is a very versatile tool
and worth spending the time to master. Later, I will show the
tools I have developed to create an index of my web documents
and to check the integrity of the links on my Web site.

I recommend the books "Learning Perl" and "Programming Perl"
from O'Reilly and Associates, but there are more books on the
market and a Perl for Dummies is due for release next year. Be
aware that these books are usually written for Unix, but this is not
a problem as long as you are aware of the differences.

A final note: Full documentation of MacPerl is supplied in HTML
format including the Mac specific information.


Further Details:

O'Reilly and Associates, Publishers
        http://www.ora.com
The main Perl web page
        http://www.perl.com
The MacPerl page
        http://www.iis.ee.ethz.ch/~neeri/macintosh/perl.html

MacPerl is now available from the Club Mac BBS along with the
examples from this article.