The pState Parser
Parsing Labels
State and transition labels are strings of Unicode characters that need to be converted to objects representing their structure. Both the state label parser and the transition label parser read a label, analyze if it is correct according to the grammar, construct a representation, simplify the representation, and check certain well-formedness conditions, all in one pass. The grammar of labels is defined in two steps:
- one grammar defines the syntax symbols in terms of characters
- one grammar defines the syntax of labels in terms of symbols
The scanner conceptually takes a string and returns a sequence of symbols. It is implemented here as a procedure, getSym()
, that is called by each parser whenever the parser needs the next symbol. Both parsers are a set of mutually recursive procedures according to the technique of recursive descent parsing with one symbol lookahead. The scanner and some parsing procedures are shared by both parsers.
Scanning Labels
A symbol is either an integer literal, a fraction literal, an exponent literal, an identifier, or an operator.
- An integer literal is a (non-empty) sequence of digits
0
, …,9
- A fraction literal is a (non-empty) sequence of digits with possibly a dot
.
at the beginning, in between, or end; a fraction literal may also be a Unicode vulgar fraction½
,⅓
, … - An exponent literal is a (non-empty) sequence of superscript digits
⁰
, …,⁹
, possibly preceded by a superscript minus⁻
; - An identifier may contain uppercase letters
A
, …,Z
, lowercase lettera
, …,z
, digits, and subscripted digits₀
, …,₉
, but must not start with a number. (For code generation purposes, identifiers may not contain an underscore_
.) - Operators are all mathematical symbols that can appear in types, expressions, statements, and labels.
Identifiers are used to name states, variables, constants, and events. Certain identifiers, like if
and max
, have a predetermined meaning and are treated differently by the parser, but are not treated differently by the scanner. The syntax of symbols is defined as:
Symbol ::= Integer | Fraction | Exponent | Identifier | Operator | Keyword
Integer ::= Digit {Digit}
Fraction ::= Digit {Digit} [. {Digit}] | . Digit {Digit} |
½ | ⅓ | ⅔ | ¼ | ¾ | ⅕ | ⅖ | ⅗ | ⅘ | ⅙ | ⅚ | ⅐ |
⅛ | ⅜ | ⅝ | ⅞ | ⅑ | ⅒
Exponent ::= [⁻] Supdigit {Supdigit}
Identifier ::= Letter {Letter | Digit | Subdigit}
Digit ::= 0 | … | 9
Supdigit ::= ⁰ | … | ⁹
Subdigit ::= ₀ | … | ₉
Operator ::= + | − | × | / | ∑ | ∏ | ( | ) | [ | ] | { | } |
# | ∈ | ∉ | ⊂ | ⊆ | ⊃ | ⊇ | ∩ | ∪ | \ |
¬ | ∧ | ∨ | ≡ | ≢ | ⇒ | ⇐ |
= | ≠ | < | ≤ | > | ≥ |
. | ; | , | ? | : | $ | ^ | ┃ | ‥ | ∆ | @ | ≔ | ‖
A state or transition label is a sequence of symbols separated by spaces, tabs, and newline characters:
Label ::= {[Blank] Symbol} {Blank}
Blank ::= ' ' | '\t' | '\r' | '\n'
The principle of the longest match is used to resolve ambiguity: the character sequence Algol60+ 7
is recognized as the identifier Algol60
, operator +
, integer 7
, rather than identifier A
, identifier l
, etc., or identifier Algol
, integer 60
, operator +
, etc.
Procedure getSym()
scans the next symbol sets global variables sym
and val
. Variable sym
becomes
INT
when an integer is scanned; the value is inval
,FRAC
when a fraction is scanned; the value is inval
,EXP
when an exponent is scanned; the value is inval
,IDENT
when an identifier is scanned; the value is inval
,- a character of
operator
when an operator is scanned, None
when the end of the input is reached.
Procedure getSym()
calls getChar()
for assigning the next character of src
to ch
. At the end of the input, getChar()
assigns None
is assigned to ch
.
from fractions import Fraction
from unicodedata import digit
from pstate.core.core import *
INT = 1; FRAC = 2; EXP = 3; IDENT = 4
VULG = {'½': Fraction(1, 2), '⅓': Fraction(1, 3), '⅔': Fraction(2, 3),
'¼': Fraction(1, 4), '¾': Fraction(3, 4), '⅕': Fraction(1, 5),
'⅖': Fraction(2, 5), '⅗': Fraction(3, 5), '⅘': Fraction(4, 5),
'⅙': Fraction(1, 6), '⅚': Fraction(5, 6), '⅐': Fraction(1, 7),
'⅛': Fraction(1, 8), '⅜': Fraction(3, 8), '⅝': Fraction(5, 8),
'⅞': Fraction(7, 8), '⅑': Fraction(1, 9), '⅒': Fraction(1, 10)}
SUPDIGIT = {'⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹'}
SUBDIGIT = {'₀', '₁', '₂', '₃', '₄', '₅', '₆', '₇', '₈', '₉'}
OP = {'+', '-', '×', '/', '∑', '∏', '(', ')', '[', ']', '{', '}',
'#', '∈', '∉', '⊆', '⊇', '⊂', '⊃', '∩', '∪', '\\',
'¬', '∧', '∨', '≡', '≢', '⇒', '⇐',
'=', '≠', '<', '≤', '>', '≥',
'.', ';', ',', '?', ':', '×', '→', '$', '^', '|', '‥', '∆', '@', '≔', '‖'}
KW = {'in', 'bool', 'if', 'then', 'else', 'exp', 'unif', 'div', 'mod', 'true', 'false', 'set',
'bool', 'min', 'max', 'all', 'any', 'some'}
def getChar():
global pos, ch
if pos < len(src): ch, pos = src[pos], pos + 1
else: ch = chr(0)
# TODO identifiers temporarilly modified to allow '.' as a character: pretty-printing more readable
def getSym():
global sym, val
while ch in ' \t\r\n': getChar()
if 'a' <= ch <= 'z' or 'A' <= ch <= 'Z':
s = ch; getChar()
while 'a' <= ch <= 'z' or 'A' <= ch <= 'Z' or '0' <= ch <= '9' or ch in SUBDIGIT or ch == '.':
s += ch; getChar()
if s in KW: sym = s
else: sym, val = IDENT, s
elif '0' <= ch <= '9':
s = ch; getChar()
while '0' <= ch <= '9': s += ch; getChar()
if ch == '.':
s += ch; getChar()
while '0' <= ch <= '9': s += ch; getChar()
sym, val = FRAC, Fraction(s)
else: sym, val = INT, int(s)
elif ch == '.':
s = ch; getChar()
if '0' <= ch <= '9':
s += ch; getChar()
while '0' <= ch <= '9':
s += ch; getChar()
sym, val = FRAC, Fraction(s)
else: sym = '.'
elif ch in VULG: sym, val = FRAC, VULG[ch]; getChar()
elif ch in OP: sym = ch; getChar()
elif ch in SUPDIGIT:
sym, val = EXP, digit(ch); getChar()
while ch in SUPDIGIT:
val = 10 * val + digit(ch); getChar()
elif ch == '⁻':
getChar()
if ch in SUPDIGIT:
sym, val = EXP, digit(ch); getChar()
while ch in SUPDIGIT:
val = 10 * val + digit(ch); getChar()
val = - val
else: raise Error('Exponent missing')
elif ch == chr(0): sym = None
else: raise Error("Unexpected character '" + ch + "'")
TODO attempt to treat keywords as standard identifiers caused a problem with parsing Designator: with div
, mod
as identifiers, a div 3
is parsed as a(div)
; LL(1) condition no longer holds. Other solution?
def testGetSym(s: str):
global src, pos
src, pos, res = s, 0, []; getChar(); getSym()
while sym is not None:
res.append((sym, val if sym in (INT, FRAC, EXP, IDENT) else '')); getSym()
return res
assert testGetSym('<⅜882 S₃ qq33 +/ ⁻⁴⁷ if a.b .99') == [('<', ''), (FRAC, Fraction(3, 8)), (INT, 882),
(IDENT, 'S₃'), (IDENT, 'qq33'), ('+', ''), ('/', ''), (EXP, -47), ('if', ''), (4, 'a.b'),
(2, Fraction(99, 100))]
Parsing Types
Abstractly, the syntax of chart types is:
Type ::= bool | Subrange | Product | Function | Set
Subrange ::= Expression ‥ Expression
Product ::= Type {× Type}
Function ::= Type → Type
Set ::= set Type
The restrictions are:
- The lower and upper bound of subranges must be constant expressions and the subrange must not be empty.
- The domain of a function can be a subrange,
bool
, or product, but not a function or a product containing a function. - The elements of sets can be boolean and enumerations but not products, functions, or sets
- The operator
×
binds tighter than→
, sot × u → v × w
is parsed as[t × u] → [v × w]
. - The operator
→
associates to the right, sot → u → v
is parsed ast → [u → v]
. - The operator
set
binds tighter than→
and×
, soset 0‥1 × 0‥1
is parsed as[set 0‥1] × 0‥1
- A product must have at least two components.
Brackets can be used in types, e.g. t × [u → v]
and [3 ‥ 9]
. (Parenthesis can also appear in subrange bound expressions, e.g. (N + 1) ‥ 10
, which means that parenthesis cannot not be used for types with one symbol lookahead parsing, e.g. (3 ‥ 9)
.)
For parsing, the following concrete syntax is used:
Type ::= Basic {'×' Basic} ['→' Type]
Basic ::= ['set'] (bool | Expression '‥' Expression | '[' Type ']')
TODO allow sets of products, functions, and sets, still assuming a bitset implementation:
set [0..9 × 0..1]
: 20 members, bitset in one word of 32 bitsset [0..15 × 0..15]
: 256 members, bitset in 8 word of 32 bitsset [bool → bool]
: 4 members, bitset in 1 byteset set 0..7
: has 256 = 2^8 members, bitset in 8 word of 32 bits
TODO allow functions as function arguments, to complete the analogy of sets as boolean-valued functions.
def parseBasicType(s: State) -> Type:
if sym == 'set': getSym(); st = True
else: st = False
if sym == 'bool': getSym(); t = Type('bool')
elif sym in FirstExpression:
lb = parseConditional(s)
if type(lb) == Fraction and lb == int(lb): lb = int(lb)
elif type(lb) != int: raise Error('Integer constant expected', s)
if sym == '‥': getSym()
else: raise Error('‥ expected', s)
ub = parseConditional(s)
if type(ub) == Fraction and ub == int(ub): ub = int(ub)
elif type(ub) != int: raise Error('Integer constant expected', s)
if lb > ub: raise Error('Lower bound must be less than upper bound', s)
t = Type('‥', lb, ub)
elif sym == '[':
getSym(); t = parseType(s)
if sym == ']': getSym()
else: print(sym); raise Error('] expected')
else: raise Error('Type expected', s)
return Type('set', t) if st else t
def parseType(s: State) -> Type:
t = [parseBasicType(s)]
while sym == '×': getSym(); t.append(parseBasicType(s))
t = t[0] if len(t) == 1 else Type('×', *t)
if sym == '→': getSym(); u = parseType(s); return Type('→', t, u)
else: return t
def testParseType(label: str) -> Type:
global src, pos
src, pos = label, 0; getChar(); getSym()
return parseType('some state')
FirstExpression = set() # for the test cases below only
assert str(testParseType('bool×bool×bool → bool × bool')) == 'bool × bool × bool → bool × bool'
assert str(testParseType('bool → [bool → [bool × bool]]')) == 'bool → bool → bool × bool'
assert str(testParseType('set bool → bool → set [bool × bool]')) == 'set bool → bool → set [bool × bool]'
assert str(testParseType('bool × [bool → bool] × bool')) == 'bool × [bool → bool] × bool'
del FirstExpression
Parsing Expressions
Abstractly, the syntax of expressions is:
Expression ::= Integer | Fraction | true | false | Designator |
PrefixOp Expression | Expression PostfixOp |
Expression InfixOp Expression |
Expression ? Expression : Expression |
Comprehension | '{' [Expression] '}' | '(' Expression ')'
Comprehension ::= Expression '|' Identifier ∈ Type [, Expression]
Designator ::= Identifier {. Identifier} | Identifier Expression
PrefixOp ::= - | ¬ | sum | prod | min | max | all | any | some | # | in
PostfixOp ::= Exponent
InfixOp ::= , | + | - | × | / | div | mod | ∧ | ∨ | ⇒ | ⇐ | '≡' | = | ≠ |
< | ≤ | > | ≥ | ∈ | ∉ | ⊆ | ⊇ | ⊂ | ⊃ | ∩ | ∪ | \ | ‥
The precedence of operators is:
- Enumeration
,
: lowest precedence - Conditional
?
…:
- Equivalence operators
⇒
,⇐
,≡
, - Disjunction
∨
- Conjunction
∧
- Relational operators
=
,≠
,<
,≤
,>
,≥
,⊂
,⊆
,⊃
,⊇
,∈
,∉
- Additive operators
+
,-
,∪
- Multiplicative operators
×
,/
,div
,mod
,∩
,\
- Prefix operators
-
,¬
,∑
,∏
,min
,max
,all
,any
,some
,#
,in
- Exponent
- Function application
- Qualification
.
: highest precedence
For example, A ∪ B = C ∧ x ∈ A ⇒ b
is parsed as (((A ∪ B) = C) ∧ (x ∈ A)) ⇒ b
. Furthermore:
- Conditionals associate to the right, i.e.
a ? b : c ? d : e
is parsed asa ? b : (c ? d : e)
- Equivalence operators do not associate, i.e.
a ⇒ b ⇒ c
is syntactically incorrect - Relational operators can be chained, i.e.
a ≤ b < c
meansa ≤ b ∧ b < c
- Prefix operators associate to the right, i.e.
- max a
is parsed as- (max a)
The following syntax is used for parsing:
Expression ::= Comprehension {',' Comprehension}
Comprehension ::= Conditional ['|' Identifier '∈' Type [',' Conditional]]
Conditional ::= Equivalence [? Equivalence : Conditional]
Equivalence ::= Disjunction [(≡ | ≢ | ⇒ | ⇐ ) Disjunction]
Disjunction ::= Conjunction {∨ Conjunction}
Conjunction ::= Relational {∧ Relational}
Relational ::= Arithmetic {(= | ≠ | < | ≤ | > | ≥ | ⊂ | ⊆ | ⊃ | ⊇ | ∈ | ∉) Arithmetic}
Arithmetic ::= Term {(+ | - | ∪) Term}
Term ::= Factor {(× | / | div | mod | ∩ | \) Factor}
Factor ::= (-, ¬, ∑, ∏, min, max, all, any, some, #, in) Factor | Base [Exponent]
Base ::= true | false | Integer | Fraction | Designator | '{' [Expression] '}' |
'(' Expression ')'
Designator ::= Identifier {. Identifier} [Expression]
FirstExpression = PrefixOp | {INT, FRAC, IDENT, '{', '('}
PrefixOp = {'⊖', '¬', '#', 'in', 'min', 'max', 'all', 'any', 'some', '∑', '∏', 'set'}
NonAssocCommOp = {',', '-', '/', 'div', 'mod', '⇒', '⇐', '≡', '≢', '=', '≠', '≤', '≥', '<', '>',
'∈', '∉', '⊆', '⊇', '⊂', '⊃', '\\', '‥'}
def parseExpression(s: State) -> Expression:
e = parseComprehension(s)
if sym == ',':
e = [e]
while sym == ',':
getSym(); e.append(parseExpression(s))
e = Application(',', e)
return e
def parseComprehension(s: State) -> Expression:
e = parseConditional(s)
if sym == '|':
getSym()
if sym == IDENT: i = val; getSym()
else: raise Error('identifier expected', s)
if sym == '∈': getSym()
else: raise Error('∈ expected', s)
r = parseType(s)
if r.kind != Kind.SUBRANGE: raise Error('subrange type expected')
if sym == ',': getSym(); b = parseConditional(s)
else: b = True
e = abstraction(i, r, b, e)
return e
def parseConditional(s: State) -> Expression:
e = parseEquivalence(s)
if sym == '?':
getSym(); f = parseEquivalence(s)
if sym == ':': getSym()
else: raise Error(': expected', s)
e = conditional(e, f, parseConditional(s))
return e
def parseEquivalence(s: State) -> Expression:
e = parseDisjunction(s)
if sym in {'≡', '≢', '⇒', '⇐'}:
op = sym; getSym(); f = parseDisjunction(s)
if op == '≡': e = equality(e, f)
elif op == '≢': e = disequality(e, f)
elif op == '⇒': e = implication(e, f)
else: e = consequence(e, f)
return e
def parseDisjunction(s: State) -> Expression:
e = parseConjunction(s)
while sym == '∨':
getSym(); e = disjunction(e, parseConjunction(s))
return e
def parseConjunction(s: State) -> Expression:
e = parseRelational(s)
while sym == '∧':
getSym(); e = conjunction(e, parseRelational(s))
return e
def parseRelational(s: State) -> Expression:
e = parseArithmetic(s)
if sym in {'=', '≠', '<', '≤', '>', '≥', '⊂', '⊆', '⊃', '⊇', '∈', '∉'}:
p, e = e, True
while sym in {'=', '≠', '<', '≤', '>', '≥', '⊂', '⊆', '⊃', '⊇', '∈', '∉'}:
op = sym; getSym(); f = parseArithmetic(s)
p = equality(p, f) if op == '=' else \
disequality(p, f) if op == '≠' else \
less(p, f) if op == '<' else \
lessequal(p, f) if op == '≤' else \
greater(p, f) if op == '>' else \
greaterequal(p, f) if op == '≥' else \
subset(p, f) if op == '⊂' else \
subsetequal(p, f) if op == '⊆' else \
superset(p, f) if op == '⊃' else \
supersetequal(p, f) if op == '⊇' else \
member(p, f) if op == '∈' else \
notmember(p, f)
e, p = conjunction(e, p), f
return e
def parseArithmetic(s: State) -> Expression:
e = parseTerm(s)
while sym in {'+', '-', '∪'}:
op = sym; getSym(); f = parseTerm(s)
e = addition(e, f) if op == '+' else \
subtraction(e, f) if op == '-' else \
union(e, f)
return e
def parseTerm(s: State) -> Expression:
e = parseFactor(s)
while sym in {'×', '/', '∩', 'div', 'mod', '\\'}:
op = sym; getSym(); f = parseFactor(s)
e = multiplication(e, f) if op == '×' else \
fractionaldivision(e, f) if op == '/' else \
integerdivision(e, f) if op == 'div' else \
integermodulo(e, f) if op == 'mod' else \
intersection(e, f) if op == '∩' else \
difference(e, f)
return e
def parseFactor(s: State) -> Expression:
if sym in {'-', '¬', '∑', '∏', '#', 'min', 'max', 'all', 'any', 'some', 'in'}:
op = sym; getSym(); e = parseFactor(s)
e = negative(e) if op == '-' else \
negation(e) if op == '¬' else \
summation(e) if op == '∑' else \
product(e) if op == '∏' else \
minimum(e) if op == 'min' else \
maximum(e) if op == 'max' else \
universal(e) if op == 'all' else \
existential(e) if op == 'any' else \
some(e) if op == 'some' else \
cardinality(e) if op == '#' else \
statetest(e)
else:
e = parseBase(s)
if sym == EXP: e = exponentiation(e, val); getSym()
return e
def parseBase(s: State) -> Expression:
if sym == 'true': getSym(); e = True
elif sym == 'false': getSym(); e = False
elif sym == INT: e = val; getSym()
elif sym == FRAC: e = val; getSym()
elif sym == IDENT: e = parseDesignator(s)
elif sym == '{':
getSym()
if sym in FirstExpression:
e = parseExpression(s)
if sym == '}': getSym()
else: raise Error('} expected')
elif sym == '(':
getSym(); e = parseExpression(s)
if sym == ')': getSym()
else: raise Error(') expected', s)
else: raise Error('expression expected', 's')
return e
def parseDesignator(s: State) -> Expression:
"""assumes sym == IDENT"""
e = val; getSym()
# e = [val]; getSym()
# while sym == '.':
# getSym()
# if sym != IDENT: error('identifier expected', s)
# e.append(val); getSym()
if sym in FirstExpression:
e = Application(e, parseExpression(s))
return e
Following tests are for operator precedence and associativity.
def testParseExpression(label: str) -> Expression:
global src, pos
src, pos = label, 0; getChar(); getSym();
return parseExpression('some state')
assert str(testParseExpression('a + b | c ∈ 3 + 2 ‥ 5, d < e, f')) == 'c ∈ 5 ‥ 5, d < e • a + b, f'
assert str(testParseExpression('a ≢ b ? c ≡ d : e ⇒ f ∨ (g ⇐ h)')) == '(¬ (a = b) ? c = d : e ⇒ (f ∨ (h ⇒ g)))'
assert str(testParseExpression('a = b ∨ c ≠ d + 2 ∧ e < f ≤ g > h ≥ i ⇒ j ⊂ k ⊆ l ⊃ m ⊇ n ∧ o + p ∈ q ∉ r')) == \
'(((a = b) ∨ (¬ (c = (2 + d)) ∧ (e < f) ∧ (f ≤ g) ∧ (g > h) ∧ (h ≥ i))) ⇒ \
((j ⊂ k) ∧ (k ⊆ l) ∧ (m ⊂ l) ∧ (n ⊆ m) ∧ ((o + p) ∈ q) ∧ ¬ (q ∈ r)))'
assert str(testParseExpression('a - b + c + d - e = f × g × h div 4 mod 4 × i × j')) == \
'((((a - b) + c + d) - e) = ((((f × g × h) div 4) mod 4) × i × j))'
assert str(testParseExpression('a ∪ b ∩ c ∪ d ⊆ e \\ f \\ g ∩ i ∩ j')) == \
'((a ∪ (b ∩ c) ∪ d) ⊆ (((e \\ f) \\ g) ∩ i ∩ j))'
assert str(testParseExpression('- a + ∑(b | b ∈ 4 ‥ 5) - ∏(c × d | c ∈ 6 ‥ 7) - some e')) == \
'(((- a + ∑ (b ∈ 4 ‥ 5 • b)) - ∏ (c ∈ 6 ‥ 7 • c × d)) - some e)'
Parsing Statements
The grammar of chart statements defines ‖
to bind looser than if-then-else
and allows statements to be grouped in parenthesis:
Statement ::= Primary {‖ Primary}
Primary ::= Assignment | Broadcast | CondStat | '(' Statement ')'
Assignment ::= Designator {',' Designator} ≔ Expression
CondStat ::= if Expression then Primary [else Primary]
Broadcast ::= # Identifier
The grammar requires that nested if-then-else
must be parenthesized, as in if b then (if c then s) else t
.
TODO remove # for BroadCast
# TODO proper compatibility check with nested types, type and subrange checks
def checkCompatibility(d: Expression, e: Expression): pass
#if isinstance(d, Application) and d.fn == ',':
# for di in d.arg
#if len(d) < len(e): raise Error('too few expressions on right hand side', s)
#elif len(d) > len(e): raise Error('too many expression on right hand side', s)
def parseStatement(s) -> Statement:
sl = [parsePrimary(s)]
while sym == '‖' or sym == '∥': getSym(); sl.append(parsePrimary(s))
return parallel(*sl)
def parsePrimary(s):
if sym == IDENT: # assignment
d = parseDesignator(s)
if sym == ',':
d = [d]
while sym == ',':
getSym(); d.append(parseDesignator(s))
d = Application(',', d)
if sym == '≔': getSym()
else: raise Error('≔ expected', s)
e = parseExpression(s)
checkCompatibility(d, e)
stat = Assignment(d, e)
elif sym == '#':
getSym()
if sym == IDENT: stat = val; getSym()
else: raise Error('event name expected', s)
elif sym == 'if':
getSym(); cond = parseExpression(s)
if sym != 'then': raise Error('then expected', s)
getSym(); thn = parsePrimary(s)
if sym == 'else': getSym(); stat = CondStat(cond, thn, parsePrimary(s))
else: stat = CondStat(cond, thn)
elif sym == '(':
getSym(); stat = parseStatement(s)
if sym == ')': getSym()
else: raise Error(') expected', s)
else: raise Error('not a statement', s)
return stat
def testParseStatement(label: str) -> Expression:
global src, pos
src, pos = label, 0; getChar(); getSym();
return str(parseStatement('some state'))
assert testParseStatement('if 1/10 < a then a, b ≔ b + 2, min c') == 'if 1/10 < a then a, b ≔ 2 + b, min c'
assert testParseStatement('if - # a ≥ 2 then (#b ‖ a ≔ true) else #c ‖ d, e ≔ f') == \
'(if - # a ≥ 2 then (b ∥ a ≔ True) else c ∥ d, e ≔ f)'
assert testParseStatement('a ≔ b ‖ (c ≔ d ‖ e ≔ f)') == '(a ≔ b ∥ c ≔ d ∥ e ≔ f)'
x := a filter (x • x mod 2 = 0) map (x • x + 2) reduce (x, y • x × y)
x := a filter (x mod 2 = 0 | x) map (x • x + 2) reduce (x, y • x × y)
c := (a zip b) map (_0 = _1) reduce (_0 & _1)
x := a filter (# mod 2 = 0) // filter all with even index
x := a filter (_ mod 2 = 0) map (_ + 2) reduce (_0 * _1)
c := (a zip b) map (_0 = _1) reduce (_0 & _1)
x := a filter (# mod 2 = 0) // filter all with even index
- All children of a state must have unique names. Capitalization is significant and any two children of a state must differ more than in just the capitalization.
Parsing State Labels
- A child state may have the same name as its parent or any ancestor, but the names of all siblings must differ more than just by capitalization. For example,
X1
andx1
cannot be children of the same parent, butX1
andX2
can. Children of distinct parents can have the same name. - Variables and constants declared in a state must have names unique among all variables and constants of that state. Variables declared in different states may have the same name, also if one state is the descendant of another.
-
The variable names declared in a state must differ from the names of all children of that state by more than just capitalization. For example,
X1
andx1
cannot be the name of a variable and a child of state, butX1
andX2
can.StateLabel ::= [Identifier {; (Var | Const)}] [’|’ Expression] {$ Cost / TimeUnit | ^ Event} Event ::= Identifier Const ::= IdentifierList = Expression Var ::= IdentifierList : Type IdentifierList ::= Identifier {‘,’ Identifier} Cost ::= Identifier = Expression [Unit] Unit ::= [UnitPrefix] Identifier TimeUnit ::= [TimeUnitPrefix] s | min | h | d UnitPrefix ::= E | P | T | G | M | k | h | da | d | c | m | µ | n | p | f | a TimeUnitPrefix ::= m | µ | n | p | f | a
TODO add units to constants, variables, make units more fancy, with proper checking and conversion:
Unit ::= [prefix] identifier [exponent] {("." | "/" [prefix] identifier [exponent]}
Cost expressions may refer to variables and states. Examples: TODO
The prefixes are according to https://en.wikipedia.org/wiki/Metric_prefix. The time units are non-SI units accepted for use with SI units, http://www.bipm.org/en/publications/si-brochure/table6.html. Prefixes and units are not keywords, meaning them above identifiers can also be used for event and state names. The prefix does not need to be separated from the time unit.
UnitPrefix = {'E': Fraction(100000000, 1), 'P': Fraction(10000000, 1),
'T': Fraction(1000000, 1), 'G': Fraction(100000, 1), 'M': Fraction(10000, 1),
'k': Fraction(1000, 1), 'h': Fraction(100, 1), 'da': Fraction(10, 1),
'd': Fraction(1, 10), 'c': Fraction(1, 100), 'm': Fraction(1, 1000),
'µ': Fraction(1, 10000), 'n': Fraction(1, 100000), 'p': Fraction(1, 1000000),
'f': Fraction(1, 10000000), 'a': Fraction(1, 100000000)}
TimeUnit = {'s': 1, 'min': 60, 'h': 3600, 'd': 86400}
TimeUnitPrefix = {'m': Fraction(1, 1000), 'µ': Fraction(1, 10000), 'n': Fraction(1, 100000),
'p': Fraction(1, 1000000), 'f': Fraction(1, 10000000), 'a': Fraction(1, 100000000)}
FirstTimeUnit = set(TimeUnitPrefix.keys()) | set(TimeUnit.keys())
def parseStateLabel(label: str, s: State):
# assumes that s.name, s.const, s.var, s.ev, s.inv, s.cost are freshly initialized
global src, pos
src, pos = label, 0; getChar(); getSym()
if sym == IDENT: # check if names of all siblings differ by more than capitalization
if s.parent is None or all(c.name.lower() != val.lower() for c in s.parent.children - {s}):
s.name = val; getSym()
else:
print('parseStateLabel', s.parent.children, val)
raise Error('Sibling state has same or similar name', s)
while sym == ';':
getSym(); ids = parseIdentifierList(s)
if sym == '=': # constant declaration
getSym(); e = parseExpression(s)
if isinstance(e, (int, Fraction)):
for i in ids:
if i not in s.const and i not in s.var: s.const[i] = e
else: raise Error('Duplicate name', s)
else: raise Error('Constant expected', s)
elif sym == ':':
getSym(); tp = parseType(s)
for i in ids:
if i not in s.const and i not in s.var: s.var[i] = tp
else: raise Error('Duplicate name', s)
else: raise Error('Constant or variable declaration expected', s)
if any(v.lower() == c.name.lower() for v in s.var for c in s.children):
raise Error('Child state has same or similar name as variable', s)
if sym == '|': getSym(); s.inv = parseExpression(s)
while sym == '$' or sym == '^': # {Cost / Timeunit | Event}
if sym == '$':
getSym(); ci, ce, cp, cu = parseCost(s) # cost identifier, expression, prefix, unit
if sym == '/': getSym()
else: raise Error('/ expected', s)
ctu = parseTimeUnit(s) # cost time unit relative to sec
if ci not in s.cost: s.cost[ci] = (cp * ce / ctu, cu) # normalize cost
else: raise Error('Duplicate cost name', s)
else:
getSym(); ids = parseIdentifierList(s)
if set(ids) & s.ev == set(): s.ev.update(set(ids))
else: raise Error('Duplicate event name', s)
# returns identifier, expression, prefix, unit
def parseCost(s: State) -> Tuple[Text, Expression, ConstNumberExpression, Optional[Text]]:
if sym != IDENT: raise Error('Identifier expected', s)
i = val; getSym()
if sym != '=': raise Error('= expected', s)
getSym(); e = parseExpression(s)
if sym == IDENT:
if val in UnitPrefix:
p = UnitPrefix[val]; getSym(); u = val
if sym != IDENT: raise Error('Unit expected', s)
elif val[0:1] in UnitPrefix: p, u = UnitPrefix[val[0:1]], val[1:]
elif val[0:2] in UnitPrefix: p, u = UnitPrefix[val[0:2]], val[2:]
else: p, u = 1, val
getSym()
else: p, u = 1, None
return i, e, p, u
def parseIdentifierList(s: State) -> Sequence[str]:
if sym != IDENT: raise Error('Identifier expected', s)
ids = [val]; getSym()
while sym == ',':
getSym()
if sym != IDENT: raise Error('Identifier expected', s)
elif val in ids: raise Error("Duplicate '" + val + "'", s)
else: getSym(); ids.append(val)
return ids
def parseTimeUnit(s) -> ConstNumberExpression:
if sym == IDENT:
if val in TimeUnitPrefix:
p = TimeUnitPrefix[val]; getSym(); v = val
if sym != IDENT: raise Error('Time unit expected', s)
elif val[0:1] in TimeUnitPrefix: p, v = TimeUnitPrefix[val[0:1]], val[1:]
elif val[0:2] in TimeUnitPrefix: p, v = TimeUnitPrefix[val[0:2]], val[2:]
else: p, v = 1, val
getSym()
else: raise Error('Time unit expected', s)
if v in TimeUnit: u = TimeUnit[v]
else: raise Error('Improper time unit', s)
return p * u
def testParseStateLabel(label: str) -> State:
global src, pos
src, pos = label, 0; getChar(); getSym()
s = State('dummy', None); s.name = None
parseStateLabel(label, s)
return s
s = testParseStateLabel('Working; S = 9; jobs: 0 ‥ 9 | jobs mod 2 = 0 ^Off $power = .5mW/ms')
assert s.name == 'Working'; assert s.const == {'S': 9}; assert s.var == {'jobs': Type(Kind.SUBRANGE, 0, 9)}
assert s.inv == equality(integermodulo('jobs', 2), 0)
assert s.ev == {'Off'}; assert s.cost == {'power': (Fraction(1, 2), 'W')}
s = testParseStateLabel('Reading | readers>0 ∧ writers=0 $utility = 5units/s')
assert s.name == 'Reading'; assert s.inv == conjunction(greater('readers', 0), equality('writers', 0))
assert s.cost == {'utility': (5, 'units')}
# TODO '$power = .5W/min' not parsed as 'min' is recognized as a keyword
# TODO '$utility = 5/s' assigns expression 5/s to utility and does not recognize time units
# TODO '$cost = 1cent/s' assigns 1/100 ent to cost
Parsing Connection Labels
ConnectionLabel ::= [Identifier | Timing] {$ Cost} [∆ Time] ['[' Expression ']']
[@ Expression] [/ Statement]
Timing ::= Time [‥ [Time]] | ‥ Time | exp '(' Time ')' | unif '(' Time , Time ')'
Time ::= Expression [Timeunit]
Note that exp
and unif
are keywords, implying that they cannot be used as names of states or events.
FirstTime = FirstExpression
FirstTiming = FirstTime | {'‥', 'exp', 'unif'}
class ConnectionLabel(NamedTuple):
event: Optional[Union[Event, Timing]]
costs: MutableMapping[Text, Cost]
wcet: Optional[Expression]
guard: Optional[Expression]
prob: Optional[Expression]
stat: Optional[Statement]
def parseConnectionLabel(label, cid) -> ConnectionLabel:
global src, pos
src, pos = label, 0; getChar(); getSym()
if sym == IDENT: getSym(); kind = val
elif sym in FirstTiming: kind = parseTiming(cid)
else: kind = None
cost = {}
while sym == '$':
getSym(); ci, ce, cp, cu = parseCost(cid) # cost identifier, expression, prefix, unit
if ci not in cost: cost[ci] = (cp * ce, cu) # normalize cost
else: raise Error('Duplicate cost name', s)
if sym == '∆': getSym(); wcet = parseTime(cid)
else: wcet = None
if sym == '[':
getSym(); guard = parseExpression(cid)
if sym == ']': getSym()
else: raise Error('] expected', cid)
else: guard = None
if sym == '@': getSym(); prob = parseExpression(cid)
else: prob = None
if sym == '/': getSym(); stat = parseStatement(cid)
else: stat = skip
return ConnectionLabel(kind, cost, wcet, guard, prob, stat)
def parseTiming(cid) -> Timing:
if sym in FirstTime:
t0 = parseTime(cid)
if sym == '‥':
getSym()
if sym in FirstTime: t = Timing('between', t0, parseTime(cid))
else: t = Timing('between', t0, None)
else: t = Timing('between', t0, t0)
elif sym == '‥':
getSym(); t = Timing('between', None, parseTime(cid))
elif sym == 'exp':
getSym()
if sym != '(': raise Error('( expected', cid)
getSym(); t = Timing('exp', parseTime(cid))
if sym == ')': getSym()
else: raise Error(') expected', cid)
elif sym == 'unif':
getSym()
if sym != '(': raise Error('( expected', cid)
getSym(); t0 = parseTime(cid)
if sym != ',': raise Error(', expected', cid)
getSym(); t = Timing('unif', t0, parseTime(cid))
if sym == ')': getSym()
else: raise Error(') expected', cid)
return t
#TODO ask many places where parseExpression is used but a literal is required like here with *
def parseTime(cid):
t = parseExpression(cid)
if sym == IDENT: t = t * parseTimeUnit(cid)
return t
def parseFirstConnectionLabel(label, cid):
kind, cost, wcet, guard, prob, stat = parseConnectionLabel(label, cid)
if prob is not None: raise Error('Probability not allowed', cid)
if guard is None: guard = True
return kind, guard, cost, stat
def parseProbConnectionLabel(label, cid):
kind, cost, wcet, guard, prob, stat = parseConnectionLabel(label, cid)
if kind is not None: raise Error('Event/time not allowed', cid)
if guard is not None: raise Error('Guard not allowed', cid)
if prob is None: prob = 1
if cost != {}: raise Error('Cost not allowed', cid)
return prob, stat
def parseCondConnectionLabel(label, cid):
kind, cost, wcet, guard, prob, stat = parseConnectionLabel(label, cid)
if kind is not None: raise Error('Event/time not allowed', cid)
if guard is None: guard = True
if prob is not None: raise Error('Probability not allowed', cid)
if cost != {}: raise Error('Cost not allowed', cid)
return guard, stat
# TODO process wcet
assert parseConnectionLabel('Request $req = 7 unit [in Machine.Idle] / jobs ≔ 1 ‖ #WarmUp', 'some connection') == \
ConnectionLabel(event='Request', costs={'req': (7, 'unit')}, wcet=None,
guard=Application(fn='in', arg='Machine.Idle'), prob=None,
stat=ParallelComp(elems=[Assignment(var='jobs', expr=1), 'WarmUp']))
assert parseConnectionLabel('StartReading ∆.1ms [writers=0] / readers ≔ 1', 'some connection') == \
ConnectionLabel(event='StartReading', costs={}, wcet=Fraction(1, 10000),
guard=Application(fn='=', arg=['writers', 0]), prob=None, stat=Assignment(var='readers', expr=1))
assert parseConnectionLabel('exp(10 ms) $cust = 1 @0.5', 'some connection') == \
ConnectionLabel(event=Timing('exp', Fraction(1, 100)), costs={'cust': (1, None)}, wcet=None, guard=None,
prob=Fraction(1, 2), stat=Skip())
Alternative Implementations
Scanner as Iterator
- Avoids the use of global variables to preserve the state of the scanner; the scanner below can be made even more elegant by turning
getChar
into a coroutine of its own - Reporting of error positions would require that the scanner “yields” both the symbol and its position, necessitating a use of the form
scan = scanner(src); ... sym, pos = next(scan)
(wherepos
is the position ofsym
, not the current position) - Allows an elegant solution to superscripted exponents being viewed by the scanner as two symbols, an exponentiation symbol followed by the integer exponent
- Necessitates that the scanner is passed around as a parameter to all parsing procedures
INT = 1; FRAC = 2; EXP = 3; IDENT = 4
VULG = {'½': Fraction(1, 2), '⅓': Fraction(1, 3), '⅔': Fraction(2, 3),
'¼': Fraction(1, 4), '¾': Fraction(3, 4), '⅕': Fraction(1, 5),
'⅖': Fraction(2, 5), '⅗': Fraction(3, 5), '⅘': Fraction(4, 5),
'⅙': Fraction(1, 6), '⅚': Fraction(5, 6), '⅐': Fraction(1, 7),
'⅛': Fraction(1, 8), '⅜': Fraction(3, 8), '⅝': Fraction(5, 8),
'⅞': Fraction(7, 8), '⅑': Fraction(1, 9), '⅒': Fraction(1, 10)}
SUPDIGIT = {'⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹'}
SUBDIGIT = {'₀', '₁', '₂', '₃', '₄', '₅', '₆', '₇', '₈', '₉'}
OP = {'+', '-', '×', '/', '∑', '∏', '(', ')', '[', ']', '{', '}',
'#', '∈', '∉', '⊆', '⊇', '⊂', '⊃', '∩', '∪', '\\',
'¬', '∧', '∨', '≡', '≢', '⇒', '⇐',
'=', '≠', '<', '≤', '>', '≥',
'.', ';', ',', '?', ':', '×', '→', '$', '|', '‥', '∆', '@', '≔', '‖'}
KW = {'in', 'bool', 'set', 'if', 'then', 'else', 'exp', 'unif', 'div', 'mod',
'true', 'false', 'min', 'max', 'all', 'any', 'some'}
# TODO identifiers temporarilly modified to allow '.' as a character: pretty-printing more readable
def symbols(src: str) -> Iterator[Union[int, Fraction, str, None]]:
pos: int; ch: str
def getChar():
nonlocal src, ch, pos
if pos < len(src): ch, pos = src[pos], pos + 1
else: ch = chr(0)
pos = 0; getChar()
while ch != chr(0):
while ch in ' \t\r\n': getChar()
if 'a' <= ch <= 'z' or 'A' <= ch <= 'Z':
s = ch; getChar()
while 'a' <= ch <= 'z' or 'A' <= ch <= 'Z' or '0' <= ch <= '9' or ch in SUBDIGIT or ch == '.':
s += ch; getChar()
yield s
elif '0' <= ch <= '9':
s = ch; getChar()
while '0' <= ch <= '9': s += ch; getChar()
if ch == '.':
s += ch; getChar()
while '0' <= ch <= '9': s += ch; getChar()
yield Fraction(s)
else: yield int(s)
elif ch == '.':
s = ch; getChar()
if '0' <= ch <= '9':
s += ch; getChar()
while '0' <= ch <= '9':
s += ch; getChar()
yield Fraction(s)
else: yield '.'
elif ch in VULG: yield VULG[ch]; getChar()
elif ch in OP: yield ch; getChar()
elif ch in SUPDIGIT:
sym, val = EXP, digit(ch); getChar()
while ch in SUPDIGIT:
val = 10 * val + digit(ch); getChar()
yield '^'; yield val
elif ch == '⁻':
getChar()
if ch in SUPDIGIT:
sym, val = EXP, digit(ch); getChar()
while ch in SUPDIGIT:
val = 10 * val + digit(ch); getChar()
yield '^'; yield - val
else: raise Error('Exponent missing')
elif ch == chr(0): yield None
else: raise Error("Unexpected character '" + ch + "'")
yield None
assert list(symbols('<⅜882 S₃ qq33 +/ ⁻⁴⁷ if a.b .99')) == \
['<', Fraction(3, 8), 882, 'S₃', 'qq33', '+', '/', '^', -47, 'if', 'a.b', Fraction(99, 100), None]