Language Theory Foundations - Grammars: Difference between revisions

From Dev Wiki
Jump to navigation Jump to search
(Create page)
 
(Add another example)
Line 13: Line 13:
* '''Production Rules''' ('''P''') - A finite set of rules to manipulate a given string into a final, valid string for the given Language and Grammar.
* '''Production Rules''' ('''P''') - A finite set of rules to manipulate a given string into a final, valid string for the given Language and Grammar.
** In other words, a finite set of rules to substitute one valid symbol for one or more other valid symbols.
** In other words, a finite set of rules to substitute one valid symbol for one or more other valid symbols.
Note that Terminal Symbols and Non-Terminal Symbols are disjoint/mutually exclusive. That is, <code>V ∪ T = ∅</code>


* '''Derivation''' - The steps taken in a particular instance of forming a string from a Grammar.
* '''Derivation''' - The steps taken in a particular instance of forming a string from a Grammar.


== Formal Grammar Derivation ==
== Formal Grammar Derivation ==
Ex: Let's take a basic Language L(G) = {a^2^n | n ≥ 0 }.
=== Example 1 ===
Let's take a basic Language L(G) = {a^2^n | n ≥ 0 }.
  L(G) = {a^2^n | n ≥ 0 }
  L(G) = {a^2^n | n ≥ 0 }
  Which results in L(G) = {a, aa, aaaa, aaaaaaaa, ...}
  Which results in L(G) = {a, aa, aaaa, aaaaaaaa, ...}
Line 53: Line 56:
   ⇒ a^4      (Definition of Concatenation)
   ⇒ a^4      (Definition of Concatenation)


While we're at it, we can also define:
=== Example 2 ===
Let's take
L(G) = { w ∈ {a, b, c}* | n_a(w) = n_b(w) = n_c(w) }
V = {A, B, C, S}
T = {a, b, c}
P = {
    S  → λ | ABCS  (Rule #1)
    AB → BA        (Rule #2)
    AC → CA        (Rule #3)
    BA → AB        (Rule #4)
    BC → CB        (Rule #5)
    CA → AC        (Rule #6)
    CB → BC        (Rule #7)
    A  → a        (Rule #8)
    B  → b        (Rule #9)
    C  → c        (Rule #10)
}
 
Given this Language and Grammar, we can derive the example string "ccbaba":
S ⇒ ABCS    (Using Rule #1)
  ⇒ ABCABCS  (Using Rule #1)
  ⇒ ABCABCλ  (Using Rule #1)
  ⇒ ABCABC  (Definition of λ)
  ⇒ ACBACB  (Using Rule #5)
  ⇒ CABCAB  (Using Rule #3)
  ⇒ CBACBA  (Using Rule #2)
  ⇒ CBCABA  (Using Rule #3)
  ⇒ CCBABA  (Using Rule #5)
  ⇒ ccBABA  (Using Rule #10)
  ⇒ ccbAbA  (Using Rule #9)
  ⇒ ccbaba  (Using Rule #8)
 
 
== More on Derivations ==


* '''Derivation Tree''' - A tree formed by showing all steps (from start to finish) of forming a string from a Grammar.
* '''Derivation Tree''' - A tree formed by showing all steps (from start to finish) of forming a string from a Grammar.
* '''Ambiguous Grammar''' - A Grammar one or more strings, which have two or more associated Derivation Trees.
* '''Ambiguous Grammar''' - A Grammar one or more strings, which have two or more associated Derivation Trees.
* '''Ambiguous Language''' - A language in which all Grammars for it are Ambiguous Grammars.
* '''Ambiguous Language''' - A language in which all Grammars for it are Ambiguous Grammars.
* '''Left-Most Derivation''' - A Derivation in which only the left-most possible symbol changes for each step.
* '''Left-Most Derivation''' - A Derivation in which only the left-most possible symbol changes for each step. Excludes terminal symbols, as those cannot be changed.
* '''Right-Most Derivation''' - A Derivation in which only the right-most possible symbol changes for each step.
* '''Right-Most Derivation''' - A Derivation in which only the right-most possible symbol changes for each step. Excludes terminal symbols, as those cannot be changed.

Revision as of 20:44, 28 September 2019

Formal Grammars

A Formal Grammar is a set of rules for establishing strings in a given Language. Essentially, it's a set of rules that describe how to form valid strings, using the given language's alphabet. A grammar doesn't define the string's meaning, just the form a string can take.

Generally speaking, a Grammar will begin with a Start Symbol, have a separate sets of Terminal Symbols and Non-Terminal Symbols, and finally have a set of Production Rules which indicate what can be produced for any given symbol.

To put it more formally, we can define a grammar G (for given language L) as:

G = (V, T, S, P)
  • Non-Terminal Symbols (V) - A non-empty, finite set of symbols for which production rules cannot create more symbols from.
  • Terminal Symbols (T) - A non-empty, finite set of symbols for which production rules creates more symbols from.
    • When all symbols in a string are "Terminal Symbols", then we have a fully valid string for the given Language and Grammar.
  • Start Symbol (S) - A symbol indicating a start point for the "Production Rules" to begin doing work.
  • Production Rules (P) - A finite set of rules to manipulate a given string into a final, valid string for the given Language and Grammar.
    • In other words, a finite set of rules to substitute one valid symbol for one or more other valid symbols.

Note that Terminal Symbols and Non-Terminal Symbols are disjoint/mutually exclusive. That is, V ∪ T = ∅

  • Derivation - The steps taken in a particular instance of forming a string from a Grammar.

Formal Grammar Derivation

Example 1

Let's take a basic Language L(G) = {a^2^n | n ≥ 0 }.

L(G) = {a^2^n | n ≥ 0 }
Which results in L(G) = {a, aa, aaaa, aaaaaaaa, ...}

Now let's look at the associated grammar G = (V, T, S, P) such that

V = {S, [, ], A, D}
T = {a}

Note that the start symbol, "S", is part of the non-terminal set. So are the characters "[" and "]".
Finally, we can define the rules for this Grammar as:

P = {
   S  → [A]      (Rule #1)
   [  → [D | λ   (Rule #2)
   D] → ]        (Rule #3)
   DA → AAD      (Rule #4)
   ]  → λ        (Rule #5)
   A  → a        (Rule #6)
}

Given all that, we can finally derive an example string for our language. In this instance, let's derive a^4.
We always start with the start symbol so:

Start with S
S ⇒ [A]      (Using Rule #1)
  ⇒ [DA]     (Using Rule #2)
  ⇒ [AAD]    (Using Rule #4)
  ⇒ [AA]     (Using Rule #3)
  ⇒ [DAA]    (Using Rule #2)
  ⇒ [AADA]   (Using Rule #4)
  ⇒ [AAAAD]  (Using Rule #4)
  ⇒ [AAAA]   (Using Rule #3)
  ⇒ λAAAA]   (Using Rule #2)
  ⇒ λAAAAλ   (Using Rule #5)
  ⇒ AAAA     (Definition of λ)
  ⇒ aaaa     (Using Rule #6)
  ⇒ a^4      (Definition of Concatenation)

Example 2

Let's take

L(G) = { w ∈ {a, b, c}* | n_a(w) = n_b(w) = n_c(w) }
V = {A, B, C, S}
T = {a, b, c}
P = {
   S  → λ | ABCS  (Rule #1)
   AB → BA        (Rule #2)
   AC → CA        (Rule #3)
   BA → AB        (Rule #4)
   BC → CB        (Rule #5)
   CA → AC        (Rule #6)
   CB → BC        (Rule #7)
   A  → a         (Rule #8)
   B  → b         (Rule #9)
   C  → c         (Rule #10)
}

Given this Language and Grammar, we can derive the example string "ccbaba":

S ⇒ ABCS     (Using Rule #1)
  ⇒ ABCABCS  (Using Rule #1)
  ⇒ ABCABCλ  (Using Rule #1)
  ⇒ ABCABC   (Definition of λ)
  ⇒ ACBACB   (Using Rule #5)
  ⇒ CABCAB   (Using Rule #3)
  ⇒ CBACBA   (Using Rule #2)
  ⇒ CBCABA   (Using Rule #3)
  ⇒ CCBABA   (Using Rule #5)
  ⇒ ccBABA   (Using Rule #10)
  ⇒ ccbAbA   (Using Rule #9)
  ⇒ ccbaba   (Using Rule #8)


More on Derivations

  • Derivation Tree - A tree formed by showing all steps (from start to finish) of forming a string from a Grammar.
  • Ambiguous Grammar - A Grammar one or more strings, which have two or more associated Derivation Trees.
  • Ambiguous Language - A language in which all Grammars for it are Ambiguous Grammars.
  • Left-Most Derivation - A Derivation in which only the left-most possible symbol changes for each step. Excludes terminal symbols, as those cannot be changed.
  • Right-Most Derivation - A Derivation in which only the right-most possible symbol changes for each step. Excludes terminal symbols, as those cannot be changed.