Language Theory Foundations - Grammars

From Dev Wiki
Jump to navigation Jump to search

Formal Grammars

A Formal Grammar is a set of rules for establishing strings in a given Language. Essentially, it's a set of rules that describe how to form valid strings, using the given language's alphabet. A grammar doesn't define the string's meaning, just the form a string can take.

Generally speaking, a Grammar will begin with a Start Symbol, have a separate sets of Terminal Symbols and Non-Terminal Symbols, and finally have a set of Production Rules which indicate what can be produced for any given symbol.

To put it more formally, we can define a grammar G (for given language L) as:

G = (V, T, S, P)
  • Non-Terminal Symbols (V) - A non-empty, finite set of symbols for which production rules cannot create more symbols from.
  • Terminal Symbols (T) - A non-empty, finite set of symbols for which production rules creates more symbols from.
    • When all symbols in a string are "Terminal Symbols", then we have a fully valid string for the given Language and Grammar.
  • Start Symbol (S) - A symbol indicating a start point for the "Production Rules" to begin doing work.
  • Production Rules (P) - A finite set of rules to manipulate a given string into a final, valid string for the given Language and Grammar.
    • In other words, a finite set of rules to substitute one valid symbol for one or more other valid symbols.

Note that Terminal Symbols and Non-Terminal Symbols are disjoint/mutually exclusive. That is, V ∪ T = ∅

  • Derivation - The steps taken in a particular instance of forming a string from a Grammar.

Formal Grammar Derivation

Example 1

Let's take a basic Language L(G) = {a^2^n | n ≥ 0 }.

L(G) = {a^2^n | n ≥ 0 }
Which results in L(G) = {a, aa, aaaa, aaaaaaaa, ...}

Now let's look at the associated grammar G = (V, T, S, P) such that

V = {S, [, ], A, D}
T = {a}

Note that the start symbol, "S", is part of the non-terminal set. So are the characters "[" and "]".
Finally, we can define the rules for this Grammar as:

P = {
   S  → [A]      (Rule #1)
   [  → [D | λ   (Rule #2)
   D] → ]        (Rule #3)
   DA → AAD      (Rule #4)
   ]  → λ        (Rule #5)
   A  → a        (Rule #6)
}

Given all that, we can finally derive an example string for our language. In this instance, let's derive a^4.
We always start with the start symbol so:

Start with S
S ⇒ [A]      (Using Rule #1)
  ⇒ [DA]     (Using Rule #2)
  ⇒ [AAD]    (Using Rule #4)
  ⇒ [AA]     (Using Rule #3)
  ⇒ [DAA]    (Using Rule #2)
  ⇒ [AADA]   (Using Rule #4)
  ⇒ [AAAAD]  (Using Rule #4)
  ⇒ [AAAA]   (Using Rule #3)
  ⇒ λAAAA]   (Using Rule #2)
  ⇒ λAAAAλ   (Using Rule #5)
  ⇒ AAAA     (Definition of λ)
  ⇒ aaaa     (Using Rule #6)
  ⇒ a^4      (Definition of Concatenation)

Example 2

Let's take

L(G) = { w ∈ {a, b, c}* | n_a(w) = n_b(w) = n_c(w) }
V = {A, B, C, S}
T = {a, b, c}
P = {
   S  → λ | ABCS  (Rule #1)
   AB → BA        (Rule #2)
   AC → CA        (Rule #3)
   BA → AB        (Rule #4)
   BC → CB        (Rule #5)
   CA → AC        (Rule #6)
   CB → BC        (Rule #7)
   A  → a         (Rule #8)
   B  → b         (Rule #9)
   C  → c         (Rule #10)
}

Given this Language and Grammar, we can derive the example string "ccbaba":

S ⇒ ABCS     (Using Rule #1)
  ⇒ ABCABCS  (Using Rule #1)
  ⇒ ABCABCλ  (Using Rule #1)
  ⇒ ABCABC   (Definition of λ)
  ⇒ ACBACB   (Using Rule #5)
  ⇒ CABCAB   (Using Rule #3)
  ⇒ CBACBA   (Using Rule #2)
  ⇒ CBCABA   (Using Rule #3)
  ⇒ CCBABA   (Using Rule #5)
  ⇒ ccBABA   (Using Rule #10)
  ⇒ ccbAbA   (Using Rule #9)
  ⇒ ccbaba   (Using Rule #8)


More on Derivations

  • Sentential Form - Any string of symbols that can be derived from a finite steps from the start symbol.
  • Sentence - Any string who's symbols are 100% comprised of Terminal Symbols.
  • Derivation Tree - A tree formed by showing all steps (from start to finish) of forming a string from a Grammar.
  • Ambiguous Grammar - A Grammar one or more strings, which have two or more associated Derivation Trees.
  • Ambiguous Language - A language in which all Grammars for it are Ambiguous Grammars.
  • Left-Most Derivation - A Derivation in which only the left-most possible symbol changes for each step. Excludes terminal symbols, as those cannot be changed.
  • Right-Most Derivation - A Derivation in which only the right-most possible symbol changes for each step. Excludes terminal symbols, as those cannot be changed.