Language Theory Foundations - Grammars
Formal Grammars
A Formal Grammar is a set of rules for establishing strings in a given Language. Essentially, it's a set of rules that describe how to form valid strings, using the given language's alphabet. A grammar doesn't define the string's meaning, just the form a string can take.
Generally speaking, a Grammar will begin with a Start Symbol, have a separate sets of Terminal Symbols and Non-Terminal Symbols, and finally have a set of Production Rules which indicate what can be produced for any given symbol.
To put it more formally, we can define a grammar G (for given language L) as:
G = (V, T, S, P)
- Non-Terminal Symbols (V) - A non-empty, finite set of symbols for which production rules cannot create more symbols from.
- Terminal Symbols (T) - A non-empty, finite set of symbols for which production rules creates more symbols from.
- When all symbols in a string are "Terminal Symbols", then we have a fully valid string for the given Language and Grammar.
- Start Symbol (S) - A symbol indicating a start point for the "Production Rules" to begin doing work.
- Production Rules (P) - A finite set of rules to manipulate a given string into a final, valid string for the given Language and Grammar.
- In other words, a finite set of rules to substitute one valid symbol for one or more other valid symbols.
- Derivation - The steps taken in a particular instance of forming a string from a Grammar.
Formal Grammar Derivation
Ex: Let's take a basic Language L(G) = {a^2^n | n ≥ 0 }.
L(G) = {a^2^n | n ≥ 0 } Which results in L(G) = {a, aa, aaaa, aaaaaaaa, ...}
Now let's look at the associated grammar G = (V, T, S, P) such that
V = {S, [, ], A, D} T = {a}
Note that the start symbol, "S", is part of the non-terminal set. So are the characters "[" and "]".
Finally, we can define the rules for this Grammar as:
P = { S → [A] (Rule #1) [ → [D | λ (Rule #2) D] → ] (Rule #3) DA → AAD (Rule #4) ] → λ (Rule #5) A → a (Rule #6) }
Given all that, we can finally derive an example string for our language. In this instance, let's derive a^4.
We always start with the start symbol so:
Start with S S ⇒ [A] (Using Rule #1) ⇒ [DA] (Using Rule #2) ⇒ [AAD] (Using Rule #4) ⇒ [AA] (Using Rule #3) ⇒ [DAA] (Using Rule #2) ⇒ [AADA] (Using Rule #4) ⇒ [AAAAD] (Using Rule #4) ⇒ [AAAA] (Using Rule #3) ⇒ λAAAA] (Using Rule #2) ⇒ λAAAAλ (Using Rule #5) ⇒ AAAA (Definition of λ) ⇒ aaaa (Using Rule #6) ⇒ a^4 (Definition of Concatenation)
While we're at it, we can also define:
- Derivation Tree - A tree formed by showing all steps (from start to finish) of forming a string from a Grammar.
- Ambiguous Grammar - A Grammar one or more strings, which have two or more associated Derivation Trees.
- Ambiguous Language - A language in which all Grammars for it are Ambiguous Grammars.
- Left-Most Derivation - A Derivation in which only the left-most possible symbol changes for each step.
- Right-Most Derivation - A Derivation in which only the right-most possible symbol changes for each step.