@node Grammars
@chapter Grammars
@menu
* Languages::
* Formal Grammars::
* Chomsky Hierarchy::
* Some Grammars::
* Regular Expressions::
* Context-free Grammars::
* Ambiguous Grammars::
@end menu
@node Languages
@section Languages
A @dfn{language} is a set of @dfn{words}, or
@dfn{sentences}.
Operations on languages: concatenation, alternation, power, closure.
Other operations: intersection, left and right divisions.
A parser, a scanner, more generally, an accepter are finite
representation of a language.
@node Formal Grammars
@section Formal Grammars
@cindex Grammar
Interesting languages are infinite. Problem: representing an infinite
language as a finite entity.
An @dfn{alphabet}, a @dfn{string}, @dfn{concatenation} of strings,
@dfn{power} of a string, @dfn{length} of a string, a @dfn{formal
language}.
A @dfn{phrase structure grammar}, or @dfn{grammar}, @var{G} is a
quadruple (@var{T}, @var{N}, @var{S}, @var{R}), @var{T} a set of
terminal symbols, @var{N} , @var{N} a set of non terminal symbols,
@var{S}, the @dfn{axiom} is a non terminal, and @var{R} is a finite set
of production rules. The vocabulary, @var{V} is the union of @var{N}
and @var{T}.
A @dfn{rule} has the form @samp{x -> y} where @var{x} is a non empty
string over @var{V} and @var{y} is a string over @var{V}.
@cindex proto-word
A @dfn{proto-word} is a word made of terminals and non terminals.
Derivations.
Parse trees.
@node Chomsky Hierarchy
@section Chomsky Hierarchy
@table @asis
@item Type 0
Unrestricted phrase structure grammars, Turing Machine.
@item Type 1
@itemx Context sensitive
Linear bounded Turing Machine. Lhs of the production rules must be
shorter than rhs. Each production is of the form
@example
alpha1 A alpha2 -> alpha1 beta alpha2
@end example
@noindent
where @code{alpha1} and @code{alpha2} are proto-words, @code{A} is a non
terminal, and @code{beta} is a non empty proto-word, with one exception
possible @samp{S -> epsilon} where @code{S} must not appear in a rhs.
@item Type 2
@itemx Context free
Lhs must be a single nonterminal. Algebraic language. Any context free
grammar can be rewritten without using the empty string in the rhs.
@item Type 3
Linear production rule: at most one non terminal in @sc{rhs}.
Right linear production rules: @samp{X -> aY}, @samp{X -> a}. Right
linear grammars: right linear rules plus @samp{X -> epsilon}. Left
linear rules, left linear grammars. Adding @samp{X -> Y} is OK.
What of grammars with right @emph{and} left linear rules? What of
@samp{X -> aYb}?
Regular/rational languages, regular/rational expressions, finite state
machines.
@end table
The inclusions are proper.
@node Some Grammars
@section Some Grammars
Producing @{a^n b^n c^n | n >= 1@} (From Salooma, ex. 2.1, p 11):
@example
N = @{S, A, B@}
T = @{a, b, c@}
S = S
P = S -> a b c
S -> a A b c
A b -> b A
A c -> B b c c
b B -> B b
a B -> a a A
a B -> a a
@end example
Producing @{a^n b^n c^n | n >= 1@} (From @emph{Machines, Languages, and
Computation}, ex. 3.1, p. 63):
@example
S -> A
A -> a A B C
A -> a b C
C B -> B C
b B -> b b
b C -> b c
c C -> c c
@end example
Producing @{a^n b^n | n >= 1@} (From @emph{Machines, Languages, and
Computation}, ex. 3.2, p. 63):
@example
S -> A
A -> a A B C
A -> a b C
C B -> B C
b B -> b b
b C -> b
@end example
@noindent
can you find a simple grammar defining the same language?
See Salooma, ex. 2.5, p 14 for a grammar of @{a^p | p prime@}.
Declaration and uses in programming languages:
@example
L = @{ w c w | w in @{0, 1@}+ @}
@end example
@node Regular Expressions
@section Regular Expressions
@table @asis
@item Empty string
@item Symbol
@item Alternation
@item Concatenation
@item Kleene closure
@itemx Repetition
@end table
@menu
* Finite Automata::
@end menu
@node Finite Automata
@subsection Finite Automata
DFA, NFA.
NFA: empty transition, multiple edges with the same label, multiple
entry points.
Building a NFA corresponding to a regexp.
Determinization of NFAs.
@node Context-free Grammars
@section Context-free Grammars
@var{G} grammar is @dfn{context free} if all the rules look like:
@samp{@var{A} -> @var{alpha}}, with @var{A} as not terminal and
@var{alpha} is a proto-word.
Writing a C-like instruction sequence seems easy having @samp{ins} and
@samp{;} as terminals..
@example
S -> S; S
S -> ins
S ->
@end example
Find a grammar producing the following words:
@example
a := 1;
b := a + (c := a + 3, c)
@end example
@noindent
First thing to do is obviously introducing terminals. In a computer
science meaning terminals are not characters or data units, but of
course lexical units. For the current grammar we choose @samp{id},
@samp{num}, @samp{+}, @samp{(}, @samp{)}, @samp{:=}and @samp{;}:
@example
id := num;
id := id + (id := id + num, id)
@end example
@noindent
The only thing to do is to make a simple grammar.
@example
S -> S ; S
S -> id := E
E -> id
E -> num
E -> E + E
E -> (S , E)
@end example
Many different notations exist for grammars. First of them is the BNF,
Backus-Naur Form.
From @samp{http://cui.unige.ch/db-research/Enseignement/analyseinfo/AboutBNF.html}
@quotation
BNF is an acronym for "Backus Naur Form". John Backus and Peter Naur
introduced for the first time a formal notation to describe the syntax
of a given language (This was for the description of the ALGOL 60
programming language, see [Naur 60]).
To be precise, most of BNF was introduced by Backus in a report
presented at an earlier UNESCO conference on ALGOL 58. Few read the
report, but when Peter Naur read it he was surprised at some of the
differences he found between his and Backus's interpretation of ALGOL
58. He decided that for the successor to ALGOL, all participants of the
first design had come to recognize some weaknesses, should be given in a
similar form so that all participants should be aware of what they were
agreeing to. He made a few modificiations that are almost universally
used and drew up on his own the BNF for ALGOL 60 at the meeting where it
was designed. Depending on how you attribute presenting it to the
world, it was either by Backus in 59 or Naur in 60. (For more details on
this period of programming languages history, see the introduction to
Backus's Turing award article in Communications of the ACM, Vol. 21,
No. 8, august 1978. This note was suggested by William B. Clodius from
Los Alamos Natl. Lab).
Since then, almost every author of books on new programming languages
used it to specify the syntax rules of the language. See [Jensen 74] and
[Wirth 82] for examples.
@end quotation
Uses of @samp{:=} or @samp{::=} instead of our @samp{->} is the first
difference. We uses the @samp{|} of regular expresions, terminals are
shown as @samp{this}, or bold characters..
@example
S := S @samp{;} S
| @samp{id} @samp{:=} E
E := @samp{id}
| @samp{num}
| E @samp{+} E
| @samp{(} S @samp{,} E @samp{)}
@end example
Show out a BNF for flotting point numbers
@example
F := @samp{-} FP
| FP
FP := Cs
| Cs @samp{.} Cs
Cs := C
| C Cs
C := @samp{0} | @samp{1} | @samp{2} | @samp{3} | @samp{4} | @samp{5} | @samp{6} | @samp{7} | @samp{8} | @samp{9}
@end example
@itemize @minus
@item
Parenthesis to group (compel priorities)
@item
@samp{?} the character is optional(zero or one time).
@item
@samp{*} the character can be repeated as many time as you want(zero or more).
@item
@samp{+} the character is not optional and can be repeated (1 or more)
@end itemize
@noindent
Having such features help us to have more readable grammars, and eliminating
recursion most of the time:
@example
F := @samp{-}? FP
FP := Cs (@samp{.} Cs)?
Cs := C+
C := @samp{0} | @samp{1} | @samp{2} | @samp{3} | @samp{4} | @samp{5} | @samp{6} | @samp{7} | @samp{8} | @samp{9}
@end example
We can simplify even more:
@example
F := @samp{-}? C+ (@samp{.} C+)?
C := @samp{0} | @samp{1} | @samp{2} | @samp{3} | @samp{4} | @samp{5} | @samp{6} | @samp{7} | @samp{8} | @samp{9}
@end example
Some people use @samp{[A]} for @samp{(A)?} and @samp{@{A@}} for
@samp{(A)*}.
@node Ambiguous Grammars
@section Ambiguous Grammars
@example
S := S @samp{,} S | @samp{I}
@end example
Is this an ambiguous grammar ? Yes it is: @samp{I; I; I}can be
understood two different ways...It's an associativity problem; the same
for instruction sequences. How can we deal with this problem?
@example
S ::= S ; I | I
@end example
@noindent
left recursive, or
@example
S ::= I ; S | I
@end example
@noindent
right recursive.
@sp 1
Ambiguity in context-free grammar can arise in two different ways:
@itemize @minus
@item
Some sentence has two different parse trees.
@example
S -> S a S
S -> b
@end example
@item
Some sentence has two structurally equivalent parse trees, but the inner
nodes are different.
@example
A -> A a
A -> B a
A -> b
B -> B a
B -> b
@end example
@end itemize
@sp 1
For our little toy grammar :
@example
E ::= E' + E or E ::= E + E'
E' ::= id
| num
| (S, E)
@end example
The two are OK for addition but subtraction? @samp{1 - 2 - 3} is
@samp{(1 - 2) - 3} or @samp{1 - (2 - 3)}? The first one, of course,
subtraction is left associative. Then use left recursion.
The simplest grammar for integer arithmetics:
@example
E ::= E * E
| E + E
| E - E
| E / E
| ( E )
| num
@end example
it's ambigous
Think about @samp{1 + 2 * 3} : sum of factors.To make it unambiguous we
must give associativity and priority.First @samp{()}, then @samp{*} and
@samp{/}, then @samp{+} and @samp{-}.
@example
E ::= E + T
| E - T
| T
T ::= T * F
| T / F
| F
F ::= num
| ( E )
@end example
This time it is not ambiguous, but it's the same language! Understanding
that grammars are structurs discribing languages, but grammars are far more
meaningfull: by inference many grammars exist describing the same language
(in fact an infinity, but it's not important because human can't imagine
exactly this concept...unless someone can count till infinity i'll continue
thinking that;it is not a shame, we are able to think about many concept
we can't catch up), good grammars, less good one (ambiguous is not good,
i think you already understood that). The only thing to do is finding the
good grammar matching what you need. That's why we will have much more
interest in grammar properties than languages properties.
@c Local Variables:
@c mode: texinfo
@c ispell-local-dictionary: "american"
@c TeX-master: "compil"
@c End: