[Date Prev][Date Next][Thread Prev][Thread Next] [Search] [Date Index] [Thread Index]

Re: [FWP] RecDescent, perhaps? math expression parsing

To: Jeff Pinyan <jeffp@crusoe.net>
Subject: Re: [FWP] RecDescent, perhaps? math expression parsing
From: abigail@foad.org
Date: Thu, 31 Aug 2000 15:08:59 -0400
Cc: fwp@technofile.org
In-Reply-To: <Pine.GSO.4.21.0008311139040.58-100000@crusoe.crusoe.net>; from jeffp@crusoe.net on Thu, Aug 31, 2000 at 01:15:50PM -0400
References: <Pine.GSO.4.21.0008311139040.58-100000@crusoe.crusoe.net>

On Thu, Aug 31, 2000 at 01:15:50PM -0400, Jeff Pinyan wrote:
> I've been working on a mathematical expression parser off and on for a
> year or so -- the goal is to make a module to break an expression down
> into nodes, and (among other things) take the derivative.
> 
> I don't know yet what tool I'll be using to lex the expression.  I run
> into much doodoo because of the following (abbreviated) problem:
> 
>   # OPEN = (
>   # CLOSE = )
>   # OP_H = ^
>   # OP_M = [*/]
>   # OP_L = [+-]
>   # other names should be self-explanatory
> 
>   # ex: $expr is "3x^2"
> 
>   TERM ::=
>     OPEN TERM CLOSE |
>     FUNC OPEN TERM CLOSE |
>     TERM OP_H TERM |
>     NUM TERM |
>     CONST TERM |
>     VAR TERM |
>     TERM OP_M TERM |
>     TERM OP_L TERM |
>     NUM |
>     CONST |
>     VAR
> 
> As you might be able to see, "3x^2" would match at TERM OP_H
> TERM.  However, the first TERM in 'TERM OP_H TERM' matches at TERM OP_H
> TERM again.  A lot of ugly recursion.  Blegh.
> 
> How do I get around this nasty recursion?  I'm at a bit of a loss.

Yes, there is, but giving the grammar alone there isn't a good answer.
It depends on your parser. Parsers tend to be right or left recursive.
Parse::RecDescent is left recursive, yacc is right-recursive. Your grammar
is both. ;-)

Here's part of the grammar I used in my Parse::RecDescent talk; it includes
a grammar to parse expressions:

    expression:     term '+' expression
                  | term '-' expression
                  | term

    term:           factor '*' term
                  | factor '/' term
                  | factor

    factor:         number
                  | variable
                  | '(' statement ')'

    number:         /\d+/
    variable:       /[a-z]+/i

However, this has several problems. First is that to parse a simple term
or factor, the Parser has to try (and reject) two alternatives of expression
or term first, hence doing to much work. Furthermore, it makes the operators
right associative, instead of the expected left. That is:

    1 - 2 + 3

would be parsed as:

    1 - (2 + 3)

Hence, Parse::RecDescent has some "directives" to make life much easier.
One can write the above grammar as:

    expression:     <leftop: term   ('+' | '-') term>

    term:           <leftop: factor ('*' | '/') factor>

    factor:         number
                  | variable
                  | '(' statement ')'

    number:         /\d+/
    variable:       /[a-z]+/i

This reduces the amount of backtracking involved, fixes the associativeness
(there's (of course) also a right associative directive), and makes life
in general much simpler. 

Of course, if you use a yacc like tool, you need to do worry about right
recursive ness. (You mentioned lexing, lexing is the process of turning
your raw data into a token stream. Some language, like C, use context
free tokenizers, which is why you can't parse "i++++j". Parse::RecDescent
doesn't use a separate tokenizer which makes life sometimes easier,
sometimes harder. (Try having a grammar for a language that allows nested
comments between tokens - as the grammar in RFC::RFC822::Address has.))

Also note that your attempt to put all case in a single rule means there's
no precedence between operators. The expression/term/factor split is what
makes precedence.

Abigail

==== Want to unsubscribe from Fun With Perl?  Well, if you insist...
==== Send email to <fwp-request@technofile.org> with message _body_
====   unsubscribe

References:
- [FWP] RecDescent, perhaps? math expression parsing
  - From: Jeff Pinyan <jeffp@crusoe.net>

Prev by Date: RE: [FWP] Traffic Signals Should be Rationalized
Next by Date: Re: [FWP] another word for object?
Prev by thread: [FWP] RecDescent, perhaps? math expression parsing
Next by thread: Re: [FWP] RecDescent, perhaps? math expression parsing
Navigation: Date Index | Thread Index | Search | Other lists at bumppo.net