An In-Depth Look at Regular Expressions Part 3: Finite State Machines

Sep 22, 2016

So in part 1, we talked about different ways to represent languages in general, and in part 2, we talked about what a regular language is and how to use regular expressions to describe a regular language. Now I want to introduce another way to visualize a regular language.

A finite state machine, or FSM for short, is a type of theoretical machine or automaton, which can be represented by a basic flowchart. The diagrams I’ll be using as examples (which were made with the help of this nifty tool), will look something like this:

cr-regex-fsm-7

So S0, S1, S2, S3, S4, and S5 are all states, and S3 and S5 are accept states. The idea is, starting at S0, each possible path that takes you to an accept state represents an element in the language.

Let’s try tracing through this FSM. Starting a S0, we can move to S1, giving us “y” . Then to S2, giving us “ye” . Finally we move to the accept state S3, giving us “yes” . If we repeat these steps for the other path, we now have a set of strings {“yes”, “no”} . Since FSMs and regular expressions are both representations of regular languages, we can also write a regular expression to represent this language, which would look like this yes|no .

Let’s take a look at the three basic operations we discussed in part 2 in the form of FSM diagrams:

1) Concatination: (ab)

cr-regex-fsm-3

2) Alteration: (a|b)

cr-regex-fsm-2

3) Kleene Star: a*

cr-regex-fsm-1

And just like regular expressions you can combine these basic operations to represent more complex regular languages.

Take a+  for example. Which we know from part 2 is shorthand for a(a*) . If we look at the example FSM for (ab)  and replace the b  with (a*) , we can see the move from state S to state A will be the same. And if we replace states A and F with our Kleene star FSM, we’ll get something like this:

cr-regex-fsm-4

There’s also a useful FSM concept which I’ve heard referred to as “lambda moves”. If you recall from part 1, λ (lambda) is sometimes used to represent an empty character (you might refer to the concept as “epsilon moves” if you use ε (epsilon) to represent empty characters). If we look at another regex shortcut we discussed in part 2, a? , we can use the FSM for alteration to create an FSM for this shortcut, which we know is the same as (a|)  or (a|λ) . Simply replace b  in the alteration example with a lambda move.

cr-regex-fsm-5

Of course you can also shorten this if you use multiple acceptance states.

cr-regex-fsm-6

RECENT POSTS

The Agile Manifesto in Practice: Part 1

The Agile Manifesto in Practice: Part 1

  How Sourcetoad Values People Over Process The software development process can involve a lot of uncertainty for both development teams and clients alike, especially in the early phases of a project. How can the long-term vision of an application be balanced...

What to Consider When Building HIPAA-Compliant Software

What to Consider When Building HIPAA-Compliant Software

In 1999, the Department of Health and Human Services (HHS) passed the Health Insurance Portability and Accountability Act (HIPAA) as a measure to protect personal health information (PHI) and allow people control of their healthcare records. The HITECH Act was enacted...

The Evolution of Buy Now, Pay Later in eCommerce: Part 2

The Evolution of Buy Now, Pay Later in eCommerce: Part 2

In Part 1, we talked about the rapid growth of Buy Now, Pay Later (BNPL) and discussed its expansion across industries. In Part 2, we will consider how impending regulation may shake up the short-term lending space.   Impending Regulation of BNPL While consumers...