2. Bottom-up recursion in LexAna
Although LexAna is bottom up parser, special type of recursion is implemented anyway. Parser rule of the form:
@ ->Regex
has special meaning. Regex expression must have at least one capturing group. For instance, the parsing rule:
@ -><([^<>]+)>
sends the piece of token string between angle brackets to the parser (calling the same parser with string part as argument). After that, Lexana deletes brackets and replaces the piece of token string with given result.
Problem 1: Parsing
numerical expressions using recursion
Input string:
"-1+8*(4+20/10)*(-3-2)-2+10*(-1+2)"
|
Scanner:
n ->\d+
. ->[()]
m ->[*/]
a ->[+-]
|
Parser:
# Recursion call syntax is: @->X(...)Y, where X and Y are regexes
and (...) is a capturing group
@ ->\(([^()]+)\)
# T - Term is number or expression (or something else, see below)
T ->n|S
# F - Factor is term [*/] term, providing operator
precedence (first */ and then +-)
F ->T(mT)*
# S - Expression is factor [+-] factor
S ->a?F(aF)*
|
Result of the scanning is the token string:
ananm(nanmn)m(anan)ananm(anan)
Recursion rule @ deletes brackets and calls Parser again, where the
argument is string inside. For example, content in leftmost bracket is :
nanmn
LexaAna Parser with this argument returns token S. Result after recursion rule @ is:
ananmSmSananmS
After that, Parser continues and final result is S,
expression is correct. Parsing tree is:
(-, 1, +, (8, *, (4, +, (20, /, 10)), *, (-, 3, -, 2)), -, 2, +, (10,
*, (-, 1, +, 2)))
For nested brackets we have to put sufix + after symbol @. There is no backtracking in LexaAna parser. After recursion call
result will be replaced, no matter what that
result is.
This example is very significant and useful, especially for operators precedence. It is easy to expand it. If
we add, for instnce, another Scanner rule:
v ->[a-zA-Z]+
and change term rule of Parser into:
T->n|S|v
our parser can receive and variables too. Furthermore,
adding vS of the Term rule,
parser recognises and functional calls.
T->n|S|vS|v
Remember, vS
must be before of v
in alternation rule. Also, adding more operators (power, relations < = >, logical operations... ) is
almost trivial:
Input string:
"2+3*(a-5)+f(x-2)>0"
# "(a>b|a!=2) & (x<=(y+5)*2 | 2+2>5)"
|
Scanner:
n->\d+
v ->[a-zA-Z]+
. ->[()]
m ->[*/]
a ->[+-]
p ->\^
c->[=!<>]=|[<>]
|
Parser:
# @+ means: do @ multiple times (for nested brackets)
@+->\(([^()]+)\)
T->n|S|vS|v
P ->T(pT)*
F ->P(mP)*
S->a?F(aF)*
C->ScS
|
Result:
C, ((2, +, (3, *, (a, -, 5)), +, (f, (x, -, 2))),
>, 0)
or L, (((a, >, b), |, (a,
!=, 2)), &, ((x, <=, ((y, +, 5), *, 2)), |, ((2, +, 2), >, 5)))
|
Adding Scanner rule:
l ->[&|]
and Parser rule at the end of the list:
L+ ->[LC](l[LC])*
we allowed our parser to recognizes and logical expressions
(with brackets) too.
LexAna Parser with implemented recursion, process inner parentheses first (like a human). Without recursion, LexAna parser would be just an interesting idea. Altough very powerful, there are some problems:
LexAna Parser with implemented recursion, process inner parentheses first (like a human). Without recursion, LexAna parser would be just an interesting idea. Altough very powerful, there are some problems:
- Difficult to debugging
- Recursion can’t determinates beginning (end) of token string, just beginning of argument string.
- All parsing rules response equaly in recursion call and regular execution.
- During expression evaluation, this process corupts every identificators and numbers assiging it to S symbol. Before recursion call, we need to hide it.
Example: Very simple parser for very simple regular expressions (character classes, capturing groups, simple quantifiers, alternation)
Input string:
"\-?(0|[1-9]\d*)(\.\d+)?([eE][+-]=\d+)?"
#
|
Scanner:
m->\\[.^$*+?{}[\]\\|()-]
c->\\[dDwWsSrntv]|\^|\$ .->[*+?[\]|()-] s->[\s\S] |
Parser:
C ->\[[^\]]*\] @+ ->\(([^()]*)\) E ->[RmcsC\]-] S ->(E[*+?]?)+ R ->S(\|S)* |
Result:
R, (\-, ?, (0, |, (([, 1, -, 9, ]), \d, *)), (\., \d, +), ?, (([, e, E, ]), ([, +, -, ]), =, \d, +), ?)
#
|
Example: Multifunctional call in variable expression
Input string:
"2*(f(x-2,2*g(x+1))-3)"
# "f(1,2),3"
|
Scanner:
n->\d+
w ->[a-zA-Z]+
m ->[*/]
s ->[+-]
. ->[(,)]
|
Parser:
@+ ->\(([^()]+)\)
T ->n|w[LS]|w|S
F ->T(mT)* S ->s?F(sF)* # L as a list of expressions L ->S(,S)+ |
Result:
S, (2, *, ((f, (x, -, 2), (2, *, (g, (x, +, 1)))) , -, 3))
or L, ((f, (1, 2)), 3) This is an error
|
Input string:
"2*g(x+1,a[0,s(0)])-1"
|
Scanner:
n->\d+
f
->[a-zA-Z]+(?=\()
b->[a-zA-Z]+(?=\[)
w
->[a-zA-Z]+
m ->[*/]
s ->[+-]
.
->[(,)[\]]
|
Parser:
@+
->\[([^[\]]+)\]
@+
->\(([^()]+)\)
T ->n|w|[fb][LS]|S
F ->T(mT)* S ->s?F(sF)* L ->S(,S)+ |
Result:
S, ((2, *, (g, ((x, +, 1), (a, (0, (s, 0)))))), -, 1)
|
After recursion call brackets are gone, how to distinct array from function? So, we can use Pebbles mode.
Pebbles mode in LexAna
Example: If we parse simple numerical expression in Pebbles mode with Parser like this one:
Input string:
"-2/1-7*8/2+9-1"
|
Scanner:
n->\d+
m ->[*/] a ->[+-] |
Parser:
F->n(mn)*
S->a?F(aF)* |
Parsing array looks like this:
Or like this:
S, (S:aFaFaFaF, -, (F:nmn, 2, /, 1), -, (F:nmnmn, 7, *, 8, /, 2), +, 9, -, 1)
The first cell of an array contains Pebbles informations. For instance, the cell with index 2 is a factor, containing lexical units nmn, while all array has Pebbles information S:aFaFaFaF.
It could be very useful to distinct different lexical and parsing elements after Parsing.
Example: Parsing a JSON string with pebbles (see previous chapter at the end)
Input string:
'''
{ "id": 1, "Surname": [], "price": 100, "Handler": [ "Bar", "Foo", {}, "Q",1], "Stock": {"Market": 300, "goods": [1], "Retail": {"old":1} }} ''' |
Scanner:
# No scanner. If you avoid it, Lexana works with it's internal scanner
|
Parser:
# Grammar: J = { (str : J) , (str : J)*} | [ J(,J)* ] | str | num | label _->; J ->d|l|q(?!:)|\{\}|\[\] J+ ->\[J(,J)*\]|\{q:J(,q:J)*\} |
J
.J:qJqJqJqJqJ .id .1 .Surname .[] .price .100 .Handler ..J:JJJJJ ..Bar ..Foo ..{} ..Q ..1 .Stock ..J:qJqJqJ ..Market ..300 ..goods ...J:J ...1 ..Retail ...J:qJ ...old ...1 # All pebbles are bolded |
Now it's easy to recognize what kind of syntactic unit is after "Stock", for example (An object).
Summary: Let T ->regex be a parsing rule and string XYZ is the regex match. After replacing group of tokens XYZ with token T, LexAna in Pebbles mode adds, at the beggining of wrapped array, string T:XYZ.
0 Comments:
Post a Comment
Subscribe to Post Comments [Atom]
<< Home