On Thu, Aug 31, 2000 at 01:15:50PM -0400, Jeff Pinyan wrote: > I've been working on a mathematical expression parser off and on for a > year or so -- the goal is to make a module to break an expression down > into nodes, and (among other things) take the derivative. > > I don't know yet what tool I'll be using to lex the expression. I run > into much doodoo because of the following (abbreviated) problem: > > # OPEN = ( > # CLOSE = ) > # OP_H = ^ > # OP_M = [*/] > # OP_L = [+-] > # other names should be self-explanatory > > # ex: $expr is "3x^2" > > TERM ::= > OPEN TERM CLOSE | > FUNC OPEN TERM CLOSE | > TERM OP_H TERM | > NUM TERM | > CONST TERM | > VAR TERM | > TERM OP_M TERM | > TERM OP_L TERM | > NUM | > CONST | > VAR > > As you might be able to see, "3x^2" would match at TERM OP_H > TERM. However, the first TERM in 'TERM OP_H TERM' matches at TERM OP_H > TERM again. A lot of ugly recursion. Blegh. > > How do I get around this nasty recursion? I'm at a bit of a loss. Yes, there is, but giving the grammar alone there isn't a good answer. It depends on your parser. Parsers tend to be right or left recursive. Parse::RecDescent is left recursive, yacc is right-recursive. Your grammar is both. ;-) Here's part of the grammar I used in my Parse::RecDescent talk; it includes a grammar to parse expressions: expression: term '+' expression | term '-' expression | term term: factor '*' term | factor '/' term | factor factor: number | variable | '(' statement ')' number: /\d+/ variable: /[a-z]+/i However, this has several problems. First is that to parse a simple term or factor, the Parser has to try (and reject) two alternatives of expression or term first, hence doing to much work. Furthermore, it makes the operators right associative, instead of the expected left. That is: 1 - 2 + 3 would be parsed as: 1 - (2 + 3) Hence, Parse::RecDescent has some "directives" to make life much easier. One can write the above grammar as: expression: <leftop: term ('+' | '-') term> term: <leftop: factor ('*' | '/') factor> factor: number | variable | '(' statement ')' number: /\d+/ variable: /[a-z]+/i This reduces the amount of backtracking involved, fixes the associativeness (there's (of course) also a right associative directive), and makes life in general much simpler. Of course, if you use a yacc like tool, you need to do worry about right recursive ness. (You mentioned lexing, lexing is the process of turning your raw data into a token stream. Some language, like C, use context free tokenizers, which is why you can't parse "i++++j". Parse::RecDescent doesn't use a separate tokenizer which makes life sometimes easier, sometimes harder. (Try having a grammar for a language that allows nested comments between tokens - as the grammar in RFC::RFC822::Address has.)) Also note that your attempt to put all case in a single rule means there's no precedence between operators. The expression/term/factor split is what makes precedence. Abigail ==== Want to unsubscribe from Fun With Perl? Well, if you insist... ==== Send email to <fwp-request@technofile.org> with message _body_ ==== unsubscribe