Regex, short for regular expression, is used to match patterns of characters in a string. The problem with using it is knowing how to use it. As with many things, it takes practice and repetition to fully understand it and commit it to memory. Regex is a powerful tool and can shorten many tasks, such as validating email addresses, redirecting users with mod_rewrite, and searching for illegal characters in a submitted field.
PHP has two very powerful functions for working with regex. The first is preg_match, which takes a string and tries to match the pattern you provide. The second is preg_replace, which does the same as preg_match except that it replaces whatever the pattern finds with another string you provide. I say only two here and some of you may be thinking “liar!”, so I will mention a few others. There are ereg and eregi, which do basically the same things as the preg functions, just not as fast or as powerful; they were deprecated in PHP 5.3 and removed in PHP 7, so stick with preg. preg_match and preg_replace are the only ones I will be covering; there are several other preg functions that you can research at your own pace.
A Very Basic Expression
One of the most basic examples of regex is a call along the lines of $status = preg_match("/beer/", $searchstring). The preg_match function returns 1 if it finds a match and 0 if it does not (or false if something goes wrong), which you can treat as true or false. $searchstring is the string we want to evaluate, and we are looking for the word “beer”. But why the “/’s”? With regular expressions we must provide a beginning and ending delimiter, and we do this with the /. So in that statement our $status will be 1 if we find the word “beer” in our $searchstring, which is “Peanuts and beer”.
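The call described above can be sketched like this. The variable names and the string come from the text; the if/else around it is just my assumption about how you might use the result.

```php
<?php
// The string we want to evaluate (taken from the text).
$searchstring = 'Peanuts and beer';

// '/beer/' is the pattern: the two slashes are the delimiters,
// and "beer" is the text we are looking for.
// preg_match returns 1 on a match, 0 on no match, false on error.
$status = preg_match('/beer/', $searchstring);

if ($status === 1) {
    echo "Found beer!\n";
} else {
    echo "No beer here.\n";
}
```

Comparing against 1 with === keeps a pattern error (false) from being mistaken for a plain "no match".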
Now that we have a basic understanding of how the function works, let’s take it to an entirely new level. This is the time you should brew a pot of coffee or pour another shot. Either way, it does not hurt to have a crutch.
Regex Syntax
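- ^ This marks the start of the string being matched
- $ This marks the end of the string being matched
- () is what we use to group, [] lists a set of allowed characters, and {} sets how many times something may repeat
- + or * or ? Is used for counting (one or more, zero or more, zero or one)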
I know, I know…wtf right? Let’s do some basic work with some of the above so you can see how it works. Let’s say we wanted to check for the presence of only letters, no numbers. We would have to create a group for our expression which only contains letters, [a-z], wrap it in the start and end markers, and store the result of preg_match in $status as before.
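A minimal sketch of that letters-only check might look like the following. The test string is hypothetical (the text only tells us it is all letters with some capitals), and the + after the group is my assumption so that the group can cover more than one character.

```php
<?php
// Hypothetical test string: nothing but letters, in mixed case.
$searchstring = 'PeanutsAndBeer';

// ^ and $ pin the match to the start and end of the string.
// [a-z] allows lowercase letters only; + lets it repeat one or more times.
$status = preg_match('/^[a-z]+$/', $searchstring);

var_dump($status);
```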
What do you think the result of $status will be in the above statement? The answer is false (preg_match returns 0), and here is why. In our expression you will notice we have our delimiters “/”, which mark the beginning and end of our statement, then we have the ^ and $, which give us the start and end of our actual expression. The key to the answer here is what is in the middle of all of that: the “[a-z]” will match any letter between a and z (it needs a + after it to cover more than one character, otherwise only a single-letter string could ever match). So why did it return false when the $searchstring is all letters from a to z? The reason is that we have capital letters in our $searchstring and we only searched for lowercase letters. To fix this we would need to add A-Z into our expression, and the beauty is we can join a-z and A-Z together in one group: [a-zA-Z].
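Here is the fix side by side with the lowercase-only version, as a small sketch. The test string is hypothetical, and the i modifier on the last line is an extra I am adding for comparison, not something from the walkthrough above.

```php
<?php
// Hypothetical test string: all letters, mixed case.
$searchstring = 'PeanutsAndBeer';

// Lowercase only: fails because of the capital letters.
$lowerOnly = preg_match('/^[a-z]+$/', $searchstring);

// Lowercase and uppercase joined in one group: passes.
$anyCase = preg_match('/^[a-zA-Z]+$/', $searchstring);

// A space is not a letter, so this one still fails.
$withSpaces = preg_match('/^[a-zA-Z]+$/', 'Peanuts and beer');

// Alternative: the i modifier after the closing delimiter ignores case.
$withModifier = preg_match('/^[a-z]+$/i', $searchstring);

var_dump($lowerOnly, $anyCase, $withSpaces, $withModifier);
```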
Not too bad, right? OK fine, let’s get more complicated. Let’s see if we can validate whether an email address is somewhat legitimate. First we need to think about what an email address looks like and how we can break it into sections in order to validate it. We know the first part of an email address is a string of characters with no spaces or special characters, with the exception of ., - and _. So let’s build that part first. We know we can use [a-zA-Z] for the letters, but what about numbers? [a-zA-Z0-9] Haha, this is not bad! What about the special characters? [a-zA-Z0-9._-] There we have it: any letter from a-z upper or lower case, any number, and our characters “._-”. Note that the - goes last; written as .-_ it would be read as a range of characters running from . to _, the same way a-z is.
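Where the hyphen sits inside the brackets matters, and it is an easy thing to trip over. The sketch below compares two versions of the group; the sample strings are made up for illustration.

```php
<?php
// Hyphen in the middle: ".-_" is read as a range from "." to "_",
// which includes characters such as "@" but not "-" itself.
$rangeClass = '/^[a-zA-Z0-9.-_]+$/';

// Hyphen at the end: it is taken literally.
$literalClass = '/^[a-zA-Z0-9._-]+$/';

// Hypothetical sample strings.
$samples = ['john-doe', 'john@doe', 'john.doe_99'];

$results = [];
foreach ($samples as $sample) {
    $results[$sample] = [
        'range'   => preg_match($rangeClass, $sample),
        'literal' => preg_match($literalClass, $sample),
    ];
    printf("%-12s range:%d literal:%d\n", $sample, $results[$sample]['range'], $results[$sample]['literal']);
}
```

The range version rejects john-doe and accepts john@doe, which is the opposite of what we want for the first part of an email address.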
Every email also has the @ symbol, so we need to add that to our expression in the right order. After our first group we add the following: +@. The + means “one or more of the group just before it”, and the @ is matched literally.
Next we need to validate the last portion of the address, so we should have a few more characters followed by a period and a few more characters. See if you can figure it out. The complete expression is below:
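To get a feel for what the + is doing, it helps to put it next to the other two counters from the syntax list. This is only a sketch; the patterns and test strings are made up for illustration.

```php
<?php
// Each counter applies to whatever comes directly before it (here, the letter a).
$patterns = [
    'a+' => '/^a+$/', // one or more
    'a*' => '/^a*$/', // zero or more
    'a?' => '/^a?$/', // zero or one
];
$subjects = ['', 'a', 'aaa'];

$results = [];
foreach ($patterns as $label => $pattern) {
    foreach ($subjects as $subject) {
        $results[$label][$subject] = preg_match($pattern, $subject);
    }
}
print_r($results);

// The same idea applied to the start of an email address:
// one or more allowed characters, then a literal @.
$hasLocalPart   = preg_match('/^[a-zA-Z0-9._-]+@/', 'peanuts@example.com');
$emptyLocalPart = preg_match('/^[a-zA-Z0-9._-]+@/', '@example.com');
```

Because + demands at least one character, an address that starts straight away with @ is turned down.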
$status = preg_match("/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z]{2,5}$/", $email);
A couple of notes on the above. The \ is used to escape a character: an unescaped . matches any character at all, so to detect a literal period outside of our [] groups we need to escape it. The {2,5} at the end is a counter that signals the previous group [a-zA-Z] must be at least 2 characters and no longer than 5.
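If you want to poke at the complete expression, one way is to wrap it in a small function and throw a few addresses at it. The function name and the sample addresses below are my own assumptions, and this is a sketch of the "somewhat legitimate" check from the text, not a full email validator.

```php
<?php
// Hypothetical helper wrapping the expression from the text.
function looksLikeEmail(string $email): bool
{
    $pattern = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z]{2,5}$/';
    return preg_match($pattern, $email) === 1;
}

// Made-up sample addresses.
$samples = [
    'peanuts@example.com',          // fits the pattern
    'peanuts.and_beer@bar-tab.org', // dots, underscores and hyphens are allowed
    'no-at-sign.example.com',       // no @ at all
    'beer@example.c',               // ending is shorter than 2 letters
    'beer@mail.example.com',        // subdomains are not covered by this pattern
    'beer@example.museum',          // ending is longer than 5 letters
];

foreach ($samples as $sample) {
    printf("%-30s %s\n", $sample, looksLikeEmail($sample) ? 'ok' : 'rejected');
}
```

The last two are real-world shapes that this pattern turns away, which is worth knowing before you rely on it. PHP also ships filter_var() with FILTER_VALIDATE_EMAIL if you would rather not maintain the expression yourself.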
Thank you Sir! May I have another?! (Yes you can)
Toss back another espresso or scotch, the feces has hit the fan. We are going to do some replacements now and throw in a pinch of confusion. The preg_replace function can be used to replace a string within another string, and we can also keep the piece we took out and reuse it. Take a look at the expression below:
$newstring = preg_replace ("/^<strong>(.*)<\/strong>$/", "<strong>regex! not $1</strong>", $searchstring);
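So above we are looking for anything between <strong> and </strong> and replacing it with “regex! not $1”. The $1 stands for whatever the first pair of parentheses, (.*), captured. Because of the ^ and $, the whole of $searchstring has to be the <strong>…</strong> element for the pattern to match. Assuming $searchstring holds “<strong>beer</strong>”, $newstring becomes “<strong>regex! not beer</strong>”, so printed after the words “I love” our end result would look like this: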
“I love regex! not beer”
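Here is a runnable sketch of that replacement. The contents of $searchstring are my assumption, since the text does not show them, and the second call is there only to show what the ^ and $ do.

```php
<?php
// Assumed input: the whole string is one <strong> element.
$searchstring = '<strong>beer</strong>';

// (.*) captures everything between the tags; $1 recalls it in the replacement.
$newstring = preg_replace(
    '/^<strong>(.*)<\/strong>$/',
    '<strong>regex! not $1</strong>',
    $searchstring
);

echo 'I love ' . $newstring . "\n";

// With extra text in front, ^ no longer lines up with <strong>,
// so nothing matches and the string comes back unchanged.
$untouched = preg_replace(
    '/^<strong>(.*)<\/strong>$/',
    '<strong>regex! not $1</strong>',
    'I love <strong>beer</strong>'
);
```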
You can replace and recall as much as you would like using regex. Each parenthesized group you capture is numbered in order, starting with $1 and continuing with $2, $3, etc…
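A quick sketch with two captured pieces shows the numbering at work. The strings are made up, and the ${1} form at the end is an extra detail from the PHP manual for when a digit has to follow the recalled piece.

```php
<?php
// Two groups: $1 is the first word, $2 is the second.
$order = 'peanuts and beer';
$swapped = preg_replace('/^(\w+) and (\w+)$/', '$2 and $1', $order);

echo $swapped . "\n";

// If a digit comes right after the reference, wrap the number in braces
// so it is not read as part of the group number.
$moreBeer = preg_replace('/(\d+) beers/', '${1}0 beers', '2 beers');

echo $moreBeer . "\n";
```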
This is a great tool for mod_rewrite rules, where you want to pull the variables out of one URL and drop them into another.
A rewrite statement for URLs works the same way: each piece of the URL is captured in parentheses and recalled with $1 and $2. So if we had “index.php?category=100peanuts&id=2beers”, our rewritten URL would look like this: “100peanuts/2beers”.
Well kiddos, I hope this helps you understand how regex works. With some practice you too can become an alcoholic. I’m off to go pass out.
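mod_rewrite itself is configured in Apache rather than PHP, but the capture-and-recall idea behind it can be sketched with preg_replace. Here the tidy path 100peanuts/2beers is mapped back to the index.php query string, which is the direction a rewrite rule usually works in. The function name and the pattern are assumptions for illustration, not an actual rewrite rule.

```php
<?php
// Hypothetical helper that mimics, in PHP, the kind of substitution
// a rewrite rule performs on an incoming path.
function rewritePrettyUrl(string $path): string
{
    // "#" is used as the delimiter so the "/" in the path needs no escaping.
    $rewritten = preg_replace(
        '#^([a-zA-Z0-9]+)/([a-zA-Z0-9]+)$#',
        'index.php?category=$1&id=$2',
        $path
    );

    // preg_replace returns null on error; fall back to the original path.
    return $rewritten ?? $path;
}

echo rewritePrettyUrl('100peanuts/2beers') . "\n";
```

A path that does not fit the pattern, such as a single word with no slash, is handed back unchanged.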