Peggy - Syntax
Syntax of Peggy
Peggy support PEG and a notation based on this wikipedia page.
Self definition of Peggy syntax
Here is complete definition of Peggy grammer which is used for bootstrapping.
Lexical Rule
Identifier of Peggy syntax is alphabet and digit sequence starting lower-case letter.
ident ::: String = [a-z] [0-9a-zA-Z_]* { $1 : $2 }
Peggy syntax is ignore spaces. Space is just a delimiter. Any code layout can be allowd.
Comment
Haskell-like comment are allowed.
-- This is comment hoge = ... {- region comment {- and nested comment -} are also usable -} moge = ...
Definition
Peggy syntax is some definitions of nonterminal symbol. Each definition is looks like this:
<name> :: <Type> = <expr>
or another syntax of definition:
<name> ::: <Type> = <expr>
Here is typical form of a nonterminal definition.
<name> :: <Type> = sequence ... / sequence ... ...
Left recursion
Peggy allows left recursion by transforming a grammer to a grammer which does not contain left recursions.
Expression
Syntax of expression is based on PEG’s one. An expression is constructed by following rules.
String Literal
String literal matches exact same sequence of chars.
'hoge' -- char sequence "hoge" -- string token
A string quoted by single-quote (’) is raw-literal, which matches exact same char sequence. A string quoted by double-quote (") is string token. It means delimited string. A particular behaviour of token is descrived here.
String literal has no value.
Charset
An usual character notation like regular expression.
[a-zA-Z0-9_] -- lower letters, upper letters, digits and under-score (_) [^0-9] -- a char except digits
Charset has a character value.
Ordered Choice
Ordered choice is sequence of expression separated by ‘/’.
<e1> / <e2> / ...
It accepts any alternative. In contrast to CFG, PEG select a most left match always. So the order of choice is meaningfully. The type of all sub-expression must be same.
Sequence
Sequence is a sequence of expression.
<e1> <e2> ...
Semantic Annotation
Each sequence can has semantic rule. It specify a return value of the term.
<e1> <e2> ... { Haskell Expression ... }
Haskell expression specify the return value of expression. It must be a single Haskell expression. Here is an example:
expr :: Int = expr "+" term { $1 + $2 }
Haskell expression can contain placeholder which forms $<num>. The number means the n-th value of sub-expression. Literal and predicate do not has a value, so it is skipped. In above example, $1 indicates expr’s value and $2 indicates term’s value.
Sequence without semantic annotation has a value of tupple of subterms.
hoge :: (Int, Int, Int) = integer "," integer "," integer
Zero or more, One or more, Optional (Zero or One)
The suffix ‘*’ means zero or more repetition.
<e>*
The suffix ‘+’ means one or more repetition.
<e>+
The suffix ’?’ means one or zero occursion.
<e>?
Unlike in CFG, these operator always behave greedily, these match as long as possible.
‘*’ and ‘+’ returns a list of the value. ’?’ returns a value typed Maybe a.
And predicate, Not predicate
The prefix ’?’ means syntax predicates. It look-ahead an expression and if fail to parse, the predicate fails, too. When it success, predicate successes too, but it consume no input.
&<e>
The prefix ’!’ is inverse of and predicate. It fails when look-ahead success.
!<e>
Here is an example of usage of predicate, it parses C-like nested comment.
regionComment :: () = '/*' (regionComment / (!"*/" . { () }))* '*/' { () }
Token behaviour
Peggy recognizes (implicit) token. Token is self delimited character sequence. It ignores pre/post white spaces. It must be delimited (token “for” does not match to character sequence “foreach”).
White space and delimtier semantic can cahnge by defining two special nonterminals, which are “space” and “delimiter”.
space :: () = [ \r\n\t] { () } / lineComment / recionComment delimiter :: () = [()[]{}<>;:,./\] { () }
If these nonterminals are not defined, default implementation is used.