Regular expressions, or regex, are something every developer will encounter at some point.

For many, they remain a form of gibberish. Some developers avoid them simply because they feel too hard to get their heads around.

Whatever your case, here is a quick and simple guide to help you decode the mysteries of regular expressions.

What’s the big deal with regular expressions?

A regular expression is a pattern that describes what a piece of text should look like. You use it to search through a string for any part that matches that pattern.

Why do you need to do this?

The most common reasons for using regular expressions are validation and advanced find-and-replace.

They are also useful for verifying the structure of strings and for extracting, replacing, rearranging and splitting strings into tokens.

When you deal with a lot of data, regular expressions give you the ability to manipulate and transform that data into something meaningful.

If you think about it, a regular expression is like a set of if/else conditions, but for strings. Its main task is to check whether a piece of text fits a particular pattern.

So without further ado, let’s dive right into the world of regular expressions.

Starting with the basics

There are three major concepts when it comes to regular expressions — alternatives, grouping and quantification.

An alternative (also called alternation) tells the regex engine that the match can be either one option or the other. It is written with a vertical bar: |

For example:

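color|colour

This regex matches both spellings. Note that there are no spaces around the |: spaces inside a pattern are matched literally, so color | colour would look for "color " (with a trailing space) or " colour" (with a leading space).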
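A quick way to see this in JavaScript is String.prototype.match with the g flag (covered later in this piece), which returns every match as an array. The sentence used here is made up for illustration.

```javascript
// A made-up sentence containing both spellings.
const sentence = "Is it color or colour?";

// The | lets the engine accept either alternative at each position.
const spellings = sentence.match(/color|colour/g);

console.log(spellings); // [ 'color', 'colour' ]
```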

Grouping treats part of a pattern as a single unit, so that an operator such as | applies to that unit instead of the whole surrounding regular expression. A group is written with a pair of parentheses: ( )

Remember your old algebra rules for parentheses? It’s the same concept in regular expressions.

For example — 

(3+5)(1+1) = 16

(3+5) is one group and (1+1) is another. Both are evaluated first, before their results are multiplied, because the parentheses act as a scope for the numbers inside them.

In regular expressions, the idea is the same.

For example — 

col(o|ou)r

(o|ou) is handled as one unit before the rest of the pattern. It says that either o or ou can appear between col and r, so this pattern matches both color and colour.
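The sketch below contrasts the grouped pattern with an ungrouped one, using JavaScript's RegExp.prototype.test, which returns true if the pattern matches anywhere in the string. The test words are made up. Without parentheses, the | splits the entire pattern in two, so /colo|our/ means "colo" or "our".

```javascript
// With the group, | only chooses between "o" and "ou".
const grouped = /col(o|ou)r/;
// Without the group, | splits the whole pattern: "colo" OR "our".
const ungrouped = /colo|our/;

console.log(grouped.test("color"));     // true
console.log(grouped.test("colour"));    // true
console.log(grouped.test("colonel"));   // false
console.log(ungrouped.test("colonel")); // true, because "colo" matches
console.log(ungrouped.test("flour"));   // true, because "our" matches

// In JavaScript, the group also captures the text it matched.
console.log("colour".match(grouped)[1]); // ou
```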

Quantification sets how many times an element may occur. In algebra, operators such as + and * tell you what to do with the numbers directly before and after them. A regex quantifier works on the element directly before it (a single character, or a whole group in parentheses) and says how many times that element may repeat.

Here are the main symbols:

? : zero or one occurrence of the preceding element.

* : zero or more occurrences of the preceding element.

+ : one or more occurrences of the preceding element.

{n} : the preceding element is matched exactly n times.

{min,} : the preceding element is matched at least min times, with no upper limit.

{min,max} : the preceding element is matched at least min times but no more than max times.

. : called the wildcard because it matches any single character (except a line break, by default). It is not a quantifier itself, but it is often paired with one, as in .* for "any run of characters".
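To try each symbol in isolation, the sketch below tests a few made-up words against small patterns. It uses two characters not covered in this piece: ^ and $, which anchor the pattern to the start and end of the string so the whole word has to match rather than just part of it.

```javascript
// Each entry pairs a pattern with some made-up test words.
// ^ and $ force the whole word to match, not just a substring.
const cases = [
  [/^colou?r$/, ["color", "colour", "colouur"]], // ?          -> true, true, false
  [/^ab*c$/, ["ac", "abc", "abbbc"]],            // *          -> true, true, true
  [/^ab+c$/, ["ac", "abc", "abbbc"]],            // +          -> false, true, true
  [/^a{3}$/, ["aa", "aaa", "aaaa"]],             // {n}        -> false, true, false
  [/^a{2,}$/, ["a", "aa", "aaaaa"]],             // {min,}     -> false, true, true
  [/^a{2,3}$/, ["a", "aa", "aaaa"]],             // {min,max}  -> false, true, false
  [/^c.t$/, ["cat", "cot", "ct"]],               // .          -> true, true, false
];

// Run every word through its pattern.
const results = cases.map(([pattern, words]) => words.map((word) => pattern.test(word)));

for (let i = 0; i < cases.length; i++) {
  console.log(String(cases[i][0]), results[i]);
}
```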

Let’s put it into practice

Here is some sample text:

The big fat cat jumped over the lazy dog wearing the big blue bag.

In JavaScript, a regular expression literal starts and ends with a forward slash: / /

After the closing slash, you can add flags that change how the pattern is applied: g for global matching (find all matches rather than stopping at the first one), i for case-insensitive matching, and m for multiline matching.

For example, /yourPatternHere/g finds every match in the string, not just the first one.
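Here is a small sketch of how the flags change what String.prototype.match returns, using the sample sentence. Without g, match() returns only the first hit, with details such as its index; with g, it returns an array of every matched substring. The m example relies on ^, an anchor not covered in this piece that matches at the start of the input (or, with m, at the start of each line); the two-line string is made up.

```javascript
const text = "The big fat cat jumped over the lazy dog wearing the big blue bag.";

// No flag: only the first match, with its position.
const first = text.match(/the/);
console.log(first[0], first.index); // the 28

// g: every match. Matching is case-sensitive, so "The" is skipped.
const allLower = text.match(/the/g);
console.log(allLower); // [ 'the', 'the' ]

// g + i: every match, ignoring case.
const allCases = text.match(/the/gi);
console.log(allCases); // [ 'The', 'the', 'the' ]

// m: ^ matches at the start of every line, not just the start of the string.
const lines = "one\ntwo";
console.log(/^two/.test(lines));  // false
console.log(/^two/m.test(lines)); // true
```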

Using the sample text above, /big/g catches both occurrences of big (matches are shown in square brackets):

The [big] fat cat jumped over the lazy dog wearing the [big] blue bag.
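To confirm where the two matches sit, String.prototype.matchAll (which requires the g flag) yields every match along with its zero-based index in the string. This is a sketch for checking the result, not part of the original walkthrough.

```javascript
const text = "The big fat cat jumped over the lazy dog wearing the big blue bag.";

// matchAll() needs the g flag and yields one result per match.
const positions = [...text.matchAll(/big/g)].map((m) => [m[0], m.index]);

console.log(positions); // [ [ 'big', 4 ], [ 'big', 53 ] ]
```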

Here are some more examples based on the quantifiers discussed above:

? example: /f?at/g

The big [fat] c[at] jumped over the lazy dog wearing the big blue bag.

In cat, only at is matched: the ? makes the f optional, while at is required.
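The same result in code, using match() with the g flag on the sample sentence. The two extra strings are made up to show that the f really is optional while at is not.

```javascript
const text = "The big fat cat jumped over the lazy dog wearing the big blue bag.";

// f? means "one f or none", and "at" must follow.
const hits = text.match(/f?at/g);
console.log(hits); // [ 'fat', 'at' ]

const noF = /f?at/.test("at");  // zero f's is fine
const noAt = /f?at/.test("ft"); // "at" is required
console.log(noF, noAt); // true false
```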

* example: /f*t/g

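The big fa[t] ca[t] jumped over [t]he lazy dog wearing [t]he big blue bag.

Only single t characters are selected. The * allows zero f's before the t, so any lowercase t qualifies; the f in fat is not included because an a sits between it and the t. The capital T in The is skipped because matching is case-sensitive.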
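A sketch that checks this with match(). The made-up string "ffft" shows the other side of *: when f's do sit directly before the t, the quantifier is greedy and takes all of them.

```javascript
const text = "The big fat cat jumped over the lazy dog wearing the big blue bag.";

// f* allows any number of f's (including none) directly before t.
const hits = text.match(/f*t/g);
console.log(hits); // [ 't', 't', 't', 't' ]

// When f's are right before the t, * takes every one of them.
const greedy = "ffft".match(/f*t/);
console.log(greedy[0]); // ffft
```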

+ example: /b+ag/g

The big fat cat jumped over the lazy dog wearing the big blue [bag].

Only bag is matched. The + requires at least one b, and that b must be followed directly by ag, which rules out big and blue.
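Checked in code, with two made-up inputs added to show both limits of +: it takes every b in a row, but it refuses zero b's (match() returns null when nothing matches).

```javascript
const text = "The big fat cat jumped over the lazy dog wearing the big blue bag.";

// b+ needs at least one b, and "ag" must follow immediately.
const hits = text.match(/b+ag/g);
console.log(hits); // [ 'bag' ]

const many = "bbbag".match(/b+ag/);
console.log(many[0]); // bbbag
const none = "ag".match(/b+ag/);
console.log(none); // null
```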

combination of +, * and | example: /b+(i|a*)g/g

The [big] fat cat jumped over the lazy dog wearing the [big] blue [bag].

The ( ) group limits the | to a choice between i and a*. So the pattern reads: one or more b, then either a single i or zero or more a, then g.

So the process of elimination looks like this: big passes through the i branch, bag passes through the a* branch, and blue is rejected because the character after its b is l, which is not i, a or g.
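A sketch of the combined pattern in JavaScript. With the g flag, match() returns only whole matches, not group contents. The made-up strings at the end use a separate copy of the pattern without g, because test() on a g regex remembers its last position between calls. Note that bg also qualifies, since a* can match nothing.

```javascript
const text = "The big fat cat jumped over the lazy dog wearing the big blue bag.";

// With g, match() returns the whole matches only.
const hits = text.match(/b+(i|a*)g/g);
console.log(hits); // [ 'big', 'big', 'bag' ]

// Same pattern without g, so test() has no leftover state.
const bare = /b+(i|a*)g/;
console.log(bare.test("bg"));     // true: a* matched zero a's
console.log(bare.test("bbaaag")); // true
console.log(bare.test("blue"));   // false
```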

{n} example: /b{2}ig/g

The original text won’t bring up any results, because big has only one b. In the modified string below, bbig matches in full, and in bbbig the match starts at the second b, so each match contains exactly two b’s.

The [bbig] fat cat jumped over the lazy dog wearing the b[bbig] blue bag.
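Both strings from this example run through match() with the g flag, as a quick check. A null result means there was no match at all.

```javascript
const original = "The big fat cat jumped over the lazy dog wearing the big blue bag.";
const modified = "The bbig fat cat jumped over the lazy dog wearing the bbbig blue bag.";

// b{2} needs exactly two b's in a row, immediately followed by "ig".
const inOriginal = original.match(/b{2}ig/g);
console.log(inOriginal); // null

const inModified = modified.match(/b{2}ig/g);
console.log(inModified); // [ 'bbig', 'bbig' ]
```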

{min,} example: /b{1,}ig/g

The [bbig] fat cat jumped over the lazy dog wearing the [bbbig] blue bag.

Both versions match in full because only the minimum is specified, with no cap on the maximum.
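A sketch confirming this, which also shows that {1,} means "one or more" and so behaves the same as +.

```javascript
const modified = "The bbig fat cat jumped over the lazy dog wearing the bbbig blue bag.";

// b{1,} means "one or more b's", with no upper limit.
const atLeastOne = modified.match(/b{1,}ig/g);
console.log(atLeastOne); // [ 'bbig', 'bbbig' ]

// {1,} is equivalent to +.
const plus = modified.match(/b+ig/g);
console.log(plus); // [ 'bbig', 'bbbig' ]
```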

{min,max} example: /b{1,2}ig/g

The [bbig] fat cat jumped over the lazy dog wearing the b[bbig] [big] blue bag.

The range is one to two b’s, so bbig and big match in full. In bbbig, three b’s is too many, so the match skips the first b and covers only the last two.
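A sketch that checks the matches and their zero-based start positions with matchAll(). The middle match starting at index 55, one character into bbbig (which begins at 54), shows that the first b was skipped.

```javascript
const modified = "The bbig fat cat jumped over the lazy dog wearing the bbbig big blue bag.";

// b{1,2} accepts one or two b's, never three.
const hits = modified.match(/b{1,2}ig/g);
console.log(hits); // [ 'bbig', 'bbig', 'big' ]

// Where each match starts in the string.
const starts = [...modified.matchAll(/b{1,2}ig/g)].map((m) => m.index);
console.log(starts); // [ 4, 55, 60 ]
```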

Final words

And that’s basically it for the bare-bones basics of regular expressions. There are more rules beyond these, but to keep this piece short, I’m going to end it here.

I hope you find it useful.

Thank you for reading.
