Language Theory Foundations - Grammars

From Dev Wiki
Revision as of 20:25, 28 September 2019 by Brodriguez (talk | contribs) (Create page)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Formal Grammars

A Formal Grammar is a set of rules for establishing strings in a given Language. Essentially, it's a set of rules that describe how to form valid strings, using the given language's alphabet. A grammar doesn't define the string's meaning, just the form a string can take.

Generally speaking, a Grammar will begin with a Start Symbol, have a separate sets of Terminal Symbols and Non-Terminal Symbols, and finally have a set of Production Rules which indicate what can be produced for any given symbol.

To put it more formally, we can define a grammar G (for given language L) as:

G = (V, T, S, P)
  • Non-Terminal Symbols (V) - A non-empty, finite set of symbols for which production rules cannot create more symbols from.
  • Terminal Symbols (T) - A non-empty, finite set of symbols for which production rules creates more symbols from.
    • When all symbols in a string are "Terminal Symbols", then we have a fully valid string for the given Language and Grammar.
  • Start Symbol (S) - A symbol indicating a start point for the "Production Rules" to begin doing work.
  • Production Rules (P) - A finite set of rules to manipulate a given string into a final, valid string for the given Language and Grammar.
    • In other words, a finite set of rules to substitute one valid symbol for one or more other valid symbols.
  • Derivation - The steps taken in a particular instance of forming a string from a Grammar.

Formal Grammar Derivation

Ex: Let's take a basic Language L(G) = {a^2^n | n ≥ 0 }.

L(G) = {a^2^n | n ≥ 0 }
Which results in L(G) = {a, aa, aaaa, aaaaaaaa, ...}

Now let's look at the associated grammar G = (V, T, S, P) such that

V = {S, [, ], A, D}
T = {a}

Note that the start symbol, "S", is part of the non-terminal set. So are the characters "[" and "]".
Finally, we can define the rules for this Grammar as:

P = {
   S  → [A]      (Rule #1)
   [  → [D | λ   (Rule #2)
   D] → ]        (Rule #3)
   DA → AAD      (Rule #4)
   ]  → λ        (Rule #5)
   A  → a        (Rule #6)
}

Given all that, we can finally derive an example string for our language. In this instance, let's derive a^4.
We always start with the start symbol so:

Start with S
S ⇒ [A]      (Using Rule #1)
  ⇒ [DA]     (Using Rule #2)
  ⇒ [AAD]    (Using Rule #4)
  ⇒ [AA]     (Using Rule #3)
  ⇒ [DAA]    (Using Rule #2)
  ⇒ [AADA]   (Using Rule #4)
  ⇒ [AAAAD]  (Using Rule #4)
  ⇒ [AAAA]   (Using Rule #3)
  ⇒ λAAAA]   (Using Rule #2)
  ⇒ λAAAAλ   (Using Rule #5)
  ⇒ AAAA     (Definition of λ)
  ⇒ aaaa     (Using Rule #6)
  ⇒ a^4      (Definition of Concatenation)

While we're at it, we can also define:

  • Derivation Tree - A tree formed by showing all steps (from start to finish) of forming a string from a Grammar.
  • Ambiguous Grammar - A Grammar one or more strings, which have two or more associated Derivation Trees.
  • Ambiguous Language - A language in which all Grammars for it are Ambiguous Grammars.
  • Left-Most Derivation - A Derivation in which only the left-most possible symbol changes for each step.
  • Right-Most Derivation - A Derivation in which only the right-most possible symbol changes for each step.