Script Perfect

         Random snips of code and bugs

Detecting Bad Words PHP (Filter bad words)

Posted by Tim on September 17, 2009

Simple and easy: stop most of the nasty words from appearing on your website or blog with one simple function. This stemmed from multiple users posting profanity on another one of our sites. We searched far and wide for an easy solution and, alas, could not find one. A few hours of trial and error behind the keyboard later, the end result is a lightweight function that catches most common profanity.

The hardest part of developing a function to detect profanity was accounting for special characters being used as letters, such as “#” for an “H” or “$” for an “S”. Detecting them turned out to be rather simple once we settled on a good method. In its entirety, the function will stop most profanity, and if anything gets through it is very easy to extend with additional characters and words.


This complete function (with a nice list of bad words) as well as a demo can be downloaded here:
Bad Words Filter – zip
- or -
Try the demo, Click Here

The Setup

We will begin by creating a function called BadWords($str), where $str is the input string to be evaluated. Once we receive the string, we convert it to lower case and store a copy in a second variable. This lets us check the original string as well as a version in which all of the special characters have been replaced.

/**
* @author Timothy Sturrock
* @website www.scriptperfect.com
* @license Creative Commons: Attribution-Share Alike 3.0 Unported
* @use You are free to use the following for any purpose as long as the author and website are not removed.
* @copyright 2009
*/

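function BadWords($str){
    // Lower-case the input so that matching is case-insensitive.
    $str = strtolower($str);
    // Working copy that will have its special characters replaced.
    $cleanstr = $str;
    // ... the snippets from the following sections go here ...
}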

Special Characters

Detecting special characters is accomplished through the use of an array. We assign each character, or series of characters, to the letter it represents, ordered from the most complicated to the least, so that multi-character sequences are replaced before the single characters they contain. Sometimes people use two special characters to make one letter, such as “|3” for a “b”. We are sure there are many more that could be thrown into the mix, but this shows the basics we have come up with:

$charlist = array("|3"=>"b", "13"=>"b", "l3"=>"b", "|)"=>"d", "1)"=>"d", "[)"=>"d", "|("=>"k", "1("=>"k", "$"=>"s", "("=>"c", "1"=>"i", "+"=>"t", "|"=>"i", "!"=>"i", "#"=>"h", "<"=>"c", "@"=>"a", "0"=>"o", "{"=>"c", "["=>"c");

Replacing the Characters

A simple loop walks through each entry in the array above and replaces every occurrence of the special character (or characters) in the string with the corresponding letter. The result is stored in $cleanstr, which will be checked later on along with the original string. Here is how we replace all of the characters:

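foreach($charlist as $char=>$value){
    // PHP stores keys such as "13" and "0" as integers, so cast them back to strings.
    // $cleanstr is already lower case, so no further strtolower() is needed.
    $cleanstr = str_replace((string)$char, $value, $cleanstr);
}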
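
To illustrate why the order of the array matters, here is a minimal sketch that wraps the loop in a helper and compares two orderings of the same pairs. The function name normalizeLeet and the shortened arrays are assumptions made for this example only; they are not part of the downloadable script.

```php
<?php
// Hypothetical helper: applies the substitutions in array order, like the loop above.
function normalizeLeet($str, array $charlist) {
    $cleanstr = strtolower($str);
    foreach ($charlist as $char => $value) {
        // Cast the key in case PHP stored a numeric-looking key as an integer.
        $cleanstr = str_replace((string) $char, $value, $cleanstr);
    }
    return $cleanstr;
}

// Multi-character sequence first, single characters afterwards.
$ordered = array("|3" => "b", "|" => "i", "@" => "a");
// Same pairs, but the single "|" is replaced before "|3" gets a chance to match.
$reordered = array("|" => "i", "|3" => "b", "@" => "a");

echo normalizeLeet("|3@d", $ordered), "\n";   // bad
echo normalizeLeet("|3@d", $reordered), "\n"; // i3ad
```

With the second ordering the "|" has already become an "i" by the time "|3" is tried, so the "b" is never recovered.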

Looking for the Filth

All that remains is to make a comma-separated list of the words we are looking for (a nice list is included in the demo files).

$words = "all,our,bad,words,go,here";

Next we take that list and explode it using “,” as the delimiter. Then we loop through each of the words and try to detect it in both the original input string and the string in which the special characters were replaced:

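$badarray = explode(",", $words);
foreach ($badarray as $naughty) {
    // Escape the word so that characters like "/" or "." are matched literally.
    $pattern = "/" . preg_quote(trim($naughty), "/") . "/";
    if (preg_match($pattern, $str) || preg_match($pattern, $cleanstr)){
        return true;
    }
}
return false;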
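
For reference, the snippets above can be assembled into one function roughly as follows. This is a sketch rather than the exact file from the download: the two-word list is a harmless placeholder, and the preg_quote() call and the (string) cast are small defensive additions.

```php
<?php
// Assembled sketch of the snippets above; the word list is a placeholder.
function BadWords($str) {
    $str = strtolower($str);
    $cleanstr = $str;

    $charlist = array("|3"=>"b", "13"=>"b", "l3"=>"b", "|)"=>"d", "1)"=>"d", "[)"=>"d",
        "|("=>"k", "1("=>"k", "$"=>"s", "("=>"c", "1"=>"i", "+"=>"t", "|"=>"i", "!"=>"i",
        "#"=>"h", "<"=>"c", "@"=>"a", "0"=>"o", "{"=>"c", "["=>"c");

    foreach ($charlist as $char => $value) {
        // Keys such as "13" become integers in PHP arrays, so cast them back.
        $cleanstr = str_replace((string) $char, $value, $cleanstr);
    }

    $words = "darn,heck"; // placeholder, not the list shipped with the demo
    foreach (explode(",", $words) as $naughty) {
        // Escape the word so it is matched literally inside the pattern.
        $pattern = "/" . preg_quote(trim($naughty), "/") . "/";
        if (preg_match($pattern, $str) || preg_match($pattern, $cleanstr)) {
            return true;
        }
    }
    return false;
}

var_dump(BadWords("What the #eck"));   // bool(true)
var_dump(BadWords("Have a nice day")); // bool(false)
```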

That’s it, simple and effective. On the downside, if your input string is large or your word list is long, it may take a bit more time to loop through all the words and characters.

Please post suggestions on how to improve this script. We would also like to hear ideas for a feature that replaces the bad word with a character such as “*”. Keep in mind that, to find the position of the word in the string, the special characters have to be taken into consideration.
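
One possible way to approach the replacement idea, offered only as a rough sketch: instead of cleaning the string first, build a regular expression for each bad word in which every letter may appear as itself or as any of its stand-ins, then mask the matches in the original string. The function name MaskBadWords, its parameters, and the shortened $charlist below are assumptions made for this example.

```php
<?php
// Hypothetical helper (not part of the original script): masks bad words in place.
function MaskBadWords($str, array $badwords, array $charlist) {
    // Invert the substitution table: letter => list of sequences that stand in for it.
    $variants = array();
    foreach ($charlist as $char => $letter) {
        $variants[$letter][] = preg_quote((string) $char, "/");
    }

    foreach ($badwords as $word) {
        if ($word === "") {
            continue; // an empty word would produce an empty pattern
        }
        $pattern = "";
        foreach (str_split(strtolower($word)) as $letter) {
            $options = array(preg_quote($letter, "/"));
            if (isset($variants[$letter])) {
                $options = array_merge($options, $variants[$letter]);
            }
            // Each letter may appear as itself or as any of its stand-ins.
            $pattern .= "(?:" . implode("|", $options) . ")";
        }
        // Replace every match with asterisks of the same length as the matched text.
        $str = preg_replace_callback("/" . $pattern . "/i", function ($m) {
            return str_repeat("*", strlen($m[0]));
        }, $str);
    }
    return $str;
}

// Shortened substitution table, for illustration only.
$charlist = array("|3" => "b", "$" => "s", "@" => "a", "#" => "h", "0" => "o");

echo MaskBadWords("What the #eck, d@rn it", array("heck", "darn"), $charlist), "\n";
// What the ****, **** it
```

Because the match happens on the original string, the positions and lengths of the special characters are preserved, and the rest of the text keeps its original case.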

4 Responses to “Detecting Bad Words PHP (Filter bad words)”

  1. Bastien says:

    This always makes me laugh. People want to block the offensive words and still manage to end up mangling names like Gay, or screwing things up when the recipe calls for ‘shittake’ mushrooms or the wood comes from the shittah tree (http://en.m.wikipedia.org/wiki/Shittah-tree).

    A human will make a better filter / moderator than any computer because otherwise you end up with merde! Oh, yep, it’s the world wide web, and people swear in lots of languages.

  2. Amber says:

    While I agree that a human makes a better filter, this could have applications as a quick and easy way to flag posts for review by moderators.

  3. Tim says:

    @Bastien – This is a very basic way to detect some of the common swear words and is not intended to be a foolproof way of filtering. There is plenty of room for improvement.

    @Amber – That would be a great implementation.

  4. Jon says:

    Thanks! Just what I was looking for!

About Me

I am an independent web developer and webmaster of many sites. The main goal of Script Perfect is to provide answers to some of the hard to find questions when it comes to website design and coding.
