Hello,

I'm a quiet subscriber to this list, and my only posting so far has been a request for information on AppleScript.

I'm writing an article on MacPerl for the magazine of Sydney's Macintosh user group, called "Club Mac". Their magazine is called Macinations and has a circulation of around 1500 copies. I thought I would send this article to the list for your comments and suggestions. Please respond quickly, as I need to submit the final copy in about 72 hours' time.

Other MacPerl projects I have on the go are programs to check the internal link integrity of a web site, to index all the HTML documents on a site, and a Unix-like shell.

Thanks in advance for your comments,

Charles Cave
charles@jolt.mpx.com.au

The article was written in Word 5.1, so all the nice formatting and bolding have disappeared. Let me know if you want the Word version. The Figures refer to screen shots.


Introducing MacPerl
Charles Cave © 1996

Perl is a scripting language written by Larry Wall, initially for the Unix environment. It was developed for scanning arbitrary text files, extracting information from those files, and printing reports. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal).

The standard documentation for the language is the book "Programming Perl" by Larry Wall, published by O'Reilly and Associates. Books from this publisher are distinguished by the animals on the front cover, and "Programming Perl" is best known for its camel. Consequently, the camel has become the icon and symbol of anything to do with perl.

I learnt perl on several Unix machines, and was delighted to find an implementation on the Macintosh, not surprisingly named MacPerl. What I like about perl is the ease with which scripts can be written to do a variety of tasks: reading the contents of files and searching for patterns of characters; processing the contents of a directory; and making my own utilities.

A major benefit of using perl is its availability on Unix, DOS, Windows 95 and Macintosh. The same code can be used on all platforms, with the exception of system-specific routines. If programming is not your strength, you will find that perl is very powerful without the complexity of C or C++. Perl has become the default language of Web server scripts (gateway scripts), and MacPerl can be used with Macintosh Web servers.

MacPerl

The MacPerl implementation (by Matthias Neeracher in Switzerland) provides an interface to some components of the toolbox, such as dialogue boxes (similar to the Ask and Answer commands of HyperTalk), interfaces to AppleScript, and the ability to save a script as a droplet for drag and drop support. Technical support for MacPerl is excellent, and a mailing list exists where users (and Matthias) help each other.

My primary use for MacPerl is the conversion of spreadsheets and FileMaker Pro databases into HTML documents for the World Wide Web. The remainder of this article is a tutorial in MacPerl and a demonstration of how a spreadsheet can be converted into HTML.

The Spreadsheet

[Figure 1: Screen shot of the spreadsheet]

A spreadsheet is a useful method of storing information as a database, with each column representing a field. In this example, I have a spreadsheet of book titles that make up my reading wish list. The sheet has the following fields:

    Classification Code
    Book Title
    Author
    Notes about the book
    A flag (Y) indicating if I own the book

The information is sorted first by classification code, then by author, and finally by book title.

[Figure 2: Save Spreadsheet as Text]

In order to process the spreadsheet with Perl, it is necessary to save a copy in Text format. This reduces each row to one line in the file, with the columns separated by tab characters. A similar thing can be done with FileMaker Pro, so this tutorial is equally applicable to the conversion of databases.

The goal of this program is to display the categories of books as a list of links to separate files, one for each category.

The Tutorial

Perl has three types of variables: scalar variables, arrays (or lists), and associative arrays. Together they allow very sophisticated data processing.

Scalar variables are used for storing numbers and strings of characters, and their names always begin with a dollar sign:

    $myname = "Charles";
    $children = 2;
    $cost = 15.5;

Arrays or lists are sets of scalar variables where each element can be independently accessed. Array variables always start with an @ sign, but a single element is a scalar, so it is written with a dollar sign and its index in square brackets. Indexes start at zero.

    @animals = ('cat', 'dog', 'mouse');
    print "First animal is $animals[0]\n";    # prints: First animal is cat

Associative arrays are a more powerful form of array where a string of characters can be used as the index. This is very useful for creating simple database systems, lookup tables and concordances.

    $capital{"NSW"} = "Sydney";
    $capital{"Victoria"} = "Melbourne";
    print "The capital of NSW is $capital{'NSW'}\n";

The program

My program to process the text file version of the spreadsheet will need to open the file and process each line of the file. The individual cells or fields of the spreadsheet will be separated by tab characters, so it will be necessary to break up each line into its components. Each category of books needs to be written to a separate file, so logic is required to detect when the category changes.

The logic of the program is expressed in "pseudo-code", which is meant to be a human-readable version of the program.

    Open the text file for processing
    Load the category descriptions into an associative array
    Read each line of the input file and do the following for each line:
        If the book category is different to the previous line
            Write the new category details to the category listing
            Close the previously opened file of book details if appropriate
            Open a new file for book details
    When the end of the input file is reached, close both files.
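
The test for a change of category in the pseudo-code above can be tried on its own before any files are involved. The following is only a minimal sketch: it assumes a Perl 5 interpreter (for "use strict" and "my"), and the rows, titles and variable names are made-up sample data rather than anything taken from the real spreadsheet.

```perl
use strict;

# Hypothetical sample rows: category code, a tab, then a book title.
# The rows are already sorted by category, as the spreadsheet is.
my @rows = ("1\tThe Iliad", "1\tThe Odyssey", "2\tThe Aeneid");

my $lastcode = "";      # category code seen on the previous row
my @report   = ();      # output lines, collected so they can be inspected

foreach my $row (@rows) {
    my ($bcat, $title) = split(/\t/, $row);
    if ($bcat ne $lastcode) {
        # The category has changed: start a new group.
        push(@report, "Category $bcat");
        $lastcode = $bcat;
    }
    push(@report, "  $title");
}

print join("\n", @report), "\n";
```

Because the rows are sorted, comparing each category code with the previous one is enough to find the start of every group; no second pass over the data is needed.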

    # MacPerl program written by Charles Cave to convert a tab delimited
    # text file into HTML.

    require "GUSI.ph";

    # ask the user to choose the input file
    $infile = &MacPerl'Choose(&GUSI'AF_FILE, 0, "", "");
    open(IN, $infile) || die "could not open input file";
    open(NDX, ">index.html") || die "could not create index.html";

    print NDX "<HTML><HEAD>\n<TITLE>";
    print NDX "Books I would like to read in my lifetime</TITLE></HEAD>\n";
    print NDX "<BODY BGCOLOR=\"#FFFFFF\"><HR>";
    print NDX "<H2>Lifetime Reading</H2>\n<ul>\n";

    $firsttime = "Y";
    $docfooter = "\n\n<HR></BODY></HTML>\n";

    # load the codetable.txt file into an associative array
    open(CTC, "codetable.txt") || die "could not open codetable.txt file";
    while (<CTC>) {
        chop;
        ($bcat, $codedesc) = split(/\t/, $_);
        $bookcategory{$bcat} = $codedesc;
    }
    close(CTC);

    $lastcode = 7777;    # a value that never matches a real category code

    while (<IN>) {
        chop;
        s/"//g;    # get rid of extraneous double quotes generated by Excel
        ($bcat, $title, $author, $notes, $sortseq, $library) = split(/\t/, $_);

        if ($lastcode ne $bcat) {
            # the category has changed; this row supplies the link text
            # ($title) for the index and is not written as a book entry
            $lastcode = $bcat;
            if ($firsttime ne "Y") {
                print OUT "$docfooter";
                close(OUT);    # close the previously opened file
            }
            $firsttime = "N";
            open(OUT, ">books".$bcat.".htm") || die "couldnt create booksnn.htm file";
            print OUT "<HTML><HEAD>\n<TITLE>";
            print OUT "Category: $bookcategory{$bcat}</TITLE></HEAD>\n";
            print OUT "<BODY BGCOLOR=\"#FFFFFF\"><HR><H2>$bookcategory{$bcat}</H2>\n";
            print NDX "<li><A HREF=\"books".$bcat.".htm\">$title</A>\n";
        }
        else {
            print OUT "<strong>$author</strong>, $title\n";
            if ($library ne "") {
                print OUT " <FONT COLOR=red>***</FONT> ";
            }
            print OUT "<BR>";
            if ($notes ne "") {
                print OUT "$notes\n";
            }
            print OUT "<P>\n";
        }
    }

    print OUT "$docfooter";
    print NDX "</ul>\n$docfooter";
    close(OUT);
    close(IN);
    close(NDX);

Commentary on the code:

1. The &MacPerl'Choose function prompts the user to choose the input file using the standard Macintosh dialogue box. The variable $infile is set to the full pathname of the chosen file.

2.
The open command is used to open files for input and output. The first parameter is the name of a file variable used to refer to the file in the rest of the program. No special characters are used in the name, and the convention is to use upper case. The second parameter is the name of the file; for an output file, the name needs a greater-than sign (>) as a prefix.

3. The double vertical bars are an OR operator. If the open statement fails, the die operator on the right-hand side is invoked; it prints the message and terminates the script.

4. The print statement is used to write output to a named file and takes as arguments a file variable and the string to print. Variables embedded in the string are replaced by their values, and special characters such as \n (new line) are allowed.

[Figure 3 - Book category file display]

5. The while loop shows several features of perl. This part of the program reads a file of book codes and descriptions into an associative array. The <CTC> construct reads the next line from the file opened as CTC and, inside a while condition, places it in a special variable called $_. When the end of the file is reached, the value returned is null and the while loop terminates.

The line read from the file includes the carriage return character at its end, and the chop command is used to remove the last character from a string. In this case, the chop command operates on the $_ variable.

The category description file contains book codes followed by a tab character and the category description. The split function creates an array/list by splitting up a string ($_) using the tab character (specified as \t) as the delimiter. In this case, I have specified two scalar variables in parentheses as my list. On the following line, the description is stored in the %bookcategory associative array, using the book code as the index.

The main program logic occurs in the while (<IN>) loop, where each line of the text version of the spreadsheet is split up into six variables.
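
The code-table loop described in item 5 can be tried without any files by feeding it a few lines held in an array. This is a rough sketch only: the codes and descriptions are hypothetical sample data, and it assumes Perl 5, where chomp is available as a more cautious relative of chop that removes only a trailing line ending rather than whatever the last character happens to be.

```perl
use strict;

# Hypothetical contents of a code table: code, tab, description.
my @lines = ("1\tGreece\n", "2\tRome\n", "33\tComputer Science\n");

my %bookcategory;
foreach (@lines) {            # each line is placed in $_, as with <CTC>
    chomp;                    # remove the trailing line ending from $_
    my ($bcat, $codedesc) = split(/\t/, $_);
    $bookcategory{$bcat} = $codedesc;    # index the description by its code
}

# Print the table in numeric order of category code.
foreach my $code (sort { $a <=> $b } keys %bookcategory) {
    print "$code => $bookcategory{$code}\n";
}
```

Once the table is loaded, a description can be looked up directly from its code, for example $bookcategory{"2"}, which is how the main loop produces the TITLE and H2 text for each category file.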

The s/"//g; command is a string replacement command that removes the double quotation marks created by Excel in Save As Text mode. The g denotes a global replace, so every double quote is removed, not just the first one. Like chop, it operates on the $_ variable.

The remainder of the loop shows the use of if statements to process the file and create a set of files based on the category codes. The open(OUT, ">books".$bcat.".htm") code shows how file names can be generated from variables. Each file created has a TITLE tag containing the category description, which is repeated in an H2 heading tag. If the book is in my library, three red asterisks are printed with the FONT tag.

Once the program has been run, there will be a file called index.html and many files such as books2.htm. The index.html file will have been generated as follows:

    <HTML><HEAD>
    <TITLE>Books I would like to read in my lifetime</TITLE></HEAD>
    <BODY BGCOLOR="#FFFFFF"><HR><H2>Lifetime Reading</H2>
    <ul>
    <li><A HREF="books1.htm">Greece</A>
    <li><A HREF="books2.htm">Rome</A>
    ..lines skipped....
    <li><A HREF="books31.htm">Science, Mathematics & Technology</A>
    <li><A HREF="books33.htm">Computer Science</A>
    </ul>

    <HR></BODY></HTML>

[Figure 4 - The index.html file displayed with Netscape]

A category file such as books31.htm will look like this:

    <HTML><HEAD>
    <TITLE>Category: Science, Mathematics & Technology</TITLE></HEAD>
    <BODY BGCOLOR="#FFFFFF"><HR><H2>Science, Mathematics & Technology</H2>
    <strong>Brand, Stuart</strong>, The Media Lab: Inventing the Future at MIT
     <FONT COLOR=red>***</FONT> <BR><P>
    <strong>Crick, Francis</strong>, What Mad Pursuit: A personal view of scientific discovery
     <FONT COLOR=red>***</FONT> <BR><P>
    <strong>Dawkins, Richard</strong>, The Blind Watchmaker
    <BR><P>
    <strong>French (ed)</strong>, Einstein - A Centenary Volume
     <FONT COLOR=red>***</FONT> <BR>A treasury of material on the life, work and worth of Einstein as a scientist and a human being. its value is enormously enhanced by perceptive contributions from illustrious contemporaries of Einstein
    <P>
    ....some stuff deleted...
    <HR></BODY></HTML>

[Figure 5 - Book details on display]

What next?

I hope I haven't lost you completely! Perl is a very versatile tool and worth spending the time to master. Later, I will show the tools I have developed to create an index of my web documents and to check the integrity of the links on my Web site.

I recommend the books "Learning Perl" and "Programming Perl" from O'Reilly and Associates, but there are more books on the market, and a "Perl for Dummies" is due for release next year. These books are usually written for Unix, but this is not a problem as long as you are aware of the differences.

A final note: full documentation of MacPerl is supplied in HTML format, including the Mac-specific information.

Further Details:

O'Reilly and Associates, Publishers
http://www.ora.com

The main Perl web page
http://www.perl.com

The MacPerl page
http://www.iis.ee.ethz.ch/~neeri/macintosh/perl.html

MacPerl is now available from the Club Mac BBS along with the examples from this article.