Language Theory Foundations - Regular Expressions

From Dev Wiki
Jump to navigation Jump to search

Regular Expressions (known as regex) are a means to specify possible strings within a language.

Regex Operations

Operation Description Symbol Example Regular Expression
Concatenation Combine Strings ab ab
[a-c][de] ad ∪ ae ∪ bd ∪ be ∪ cd ∪ ce
Kleene Star Closure Get Between Zero and Multiple * [ab]* (a ∪ b)*
Disjunction A (a ∪ b)* ∪ A
One or More Get at Least One + [ab]+ (a ∪ b)^+
Zero or One Get One Instance or Empty ? a? (a ∪ λ)
One Character Get One Character from Alphabet . a.a a(a ∪ b)a for Σ={a,b}
n Times Repeat n Times {n} aaaa = a^4
n or More Times Repeat n or More Times {n,} a{4,} aaaa^+
n to m Times Repeat between n and m Times {n, m} a{4, 6} aaaa ∪ aaaaa ∪ aaaaaa


Regex Identities

Recall that ∅ is the empty set. Which is different than an empty string.
Empty string implies that the set has a string, just the string has no symbols in it.
Empty set implies that the set has no elements at all, not even an empty string.

Given arbitrary strings u, v, and w, we can define the following identities:

1) ∅u = u∅ = ∅
2) λu = uλ = u
3) ∅ = λ
4) λ* = λ
5) (u ∪ v) = (v ∪ u)
6) (u ∪ ) = u
7) (u ∪ u) = u
8) u* = (u*)*
9) u(v ∪ w) = (uv ∪ uw)
10) (u ∪ v)w = (uw ∪ vw)
11) (uv)*u = u(vu)*
12.1) (u ∪ v)* = (u* ∪ v)*
12.2) = u*(u ∪ v)*
12.3) = (u ∪ vu*)*
12.4) = (u*v*)*
12.5) = u*(vu*)*
12.6) = (u*v)*u*


Regex Examples

Ex: Let's define the set of strings over {a, b, c} with length of 3 or more.

[abc]{3,}

Ex: Let's define the set of strings over {a, b}, with at least one instance of "aa" and length 2 or more.

[ab]*aa[ab]*

Ex: Let's define the set of strings over {a, b}, with at least two instances of "aa".

[ab]*aa[ab]*aa[ab*]