Regular expression decision solver
In Spring of 2024, I took the course "CS 6861: Introduction to Kleene Algebra". This course was very theory-heavy, but I wanted to apply some of the things learned into a programming project.
What Is Kleene Algebra?
In short, Kleene algebra is the algebra of regular expressions. It has addition and multiplication, which may be familiar to most people, but it has some other funky properties, some them including:
- $x + x = x$
- if $a <= b$, then $a + b = b$
- The Kleene star operator!
What My Project Does
In the second problem set for this course, one problem involved proving the following equivalences:
- $x* = xx$
- $x* = x**$
- $(xy)* = x*(yx*)*$
- $(x + y)* = x*(yx*)*$
- $x* = (1 + x)(1 + x)(xxx)*$
One of the nice results of Kleene algebra is that the algebra is decidable. This means if you can write an equation, there always exists a proof of whether this equation is true or false. For Kleene algebra specifically, there exists decision procedures, which are a set of rules that you can follow to always compute such a proof.
My project implements such decision procedures and is able to verify those equations are true! It can also reject incorrect equations.
How My Regex Decider Works
At a high level, my program works as follows:
- Parse 2 regular expressions into Abstract Syntax Trees (ASTs)
- Convert ASTs into Deterministic Finite Automatons (DFAs)
- Run bisimulation on the two DFAs to determine equivalence
Inputting Regular Expressions
I use Menhir to generate a parser for regular expressions.
The BNF grammar looks loosely like:
<program> ::= <expression>
<expression> ::=
| ZERO
| ONE
| <character>
| <expression> PLUS <expression>
| <expressin> TIMES <expression>
| LEFT_PARENTHESIS <expression> RIGHT_PARENTHESIS
| <expression> STAR
Using this grammar, we can generate a parser that will take the input $(ab)* + 1$ and turn it into the following AST:
PLUS
/ \
STAR ONE
|
TIMES
/ \
a b
Converting ASTs into Automatons
My program supports two different ways of converting Abstract Syntax Trees (ASTs) into automatons. Ultimately, we want Deterministic Finite Automatons (DFAs) to run bisimulation on.
One way my program can construct DFAs is by first using Thompson's Construction to construct Non-deterministic Finite Automatons (NFAs), and then convert them into DFAs using subset construction.
Another way is by constructing the DFA directly through Brzozowski derivatives. Isn't it cool that you can take derivatives of regular expressions?
Running Bisimulation
Related Resources
- Wikipedia page on Kleene algebra
- Course website from when I took it
- J. E. Hopcroft & R. M. Karp (1971): A linear algorithm for testing equivalence of finite automata. Technical Report 71–114, University of California