Language Theory Foundations - Regular Expressions
Jump to navigation
Jump to search
Regular Expressions (known as regex) are a means to specify possible strings within a language.
Regex Operations
Operation | Description | Symbol | Example | Regular Expression |
---|---|---|---|---|
Concatenation | Combine Strings | ab | ab | |
[a-c][de] | ad ∪ ae ∪ bd ∪ be ∪ cd ∪ ce | |||
Kleene Star Closure | Get Between Zero and Multiple | * | [ab]* | (a ∪ b)* |
Disjunction | A | (a ∪ b)* ∪ A | ||
One or More | Get at Least One | + | [ab]+ | (a ∪ b)^+ |
Zero or One | Get One Instance or Empty | ? | a? | (a ∪ λ) |
One Character | Get One Character from Alphabet | . | a.a | a(a ∪ b)a for Σ={a,b} |
n Times | Repeat n Times | {n} | aaaa = a^4 | |
n or More Times | Repeat n or More Times | {n,} | a{4,} | aaaa^+ |
n to m Times | Repeat between n and m Times | {n, m} | a{4, 6} | aaaa ∪ aaaaa ∪ aaaaaa |
Regex Identities
Recall that ∅ is the empty set. Which is different than an empty string. Empty string implies that the set has a string, just the string has no symbols in it. Empty set implies that the set has no elements at all, not even an empty string.
Given arbitrary strings u, v, and w, we can define the following identities:
1) ∅u = u∅ = ∅ 2) λu = uλ = u 3) ∅ = λ 4) λ* = λ 5) (u ∪ v) = (v ∪ u) 6) (u ∪ ) = u 7) (u ∪ u) = u 8) u* = (u*)* 9) u(v ∪ w) = (uv ∪ uw) 10) (u ∪ v)w = (uw ∪ vw) 11) (uv)*u = u(vu)* 12.1) (u ∪ v)* = (u* ∪ v)* 12.2) = u*(u ∪ v)* 12.3) = (u ∪ vu*)* 12.4) = (u*v*)* 12.5) = u*(vu*)* 12.6) = (u*v)*u*
Regex Examples
Ex: Let's define the set of strings over {a, b, c} with length of 3 or more.
[abc]{3,}
Ex: Let's define the set of strings over {a, b}, with at least one instance of "aa" and length 2 or more.
[ab]*aa[ab]*
Ex: Let's define the set of strings over {a, b}, with at least two instances of "aa".
[ab]*aa[ab]*aa[ab*]