There are a few things to get right when setting up Apache mod_rewrite. Overall the difficulty isn't too great, but setting it up right at the beginning is the key. You don't really want to have to catch every little exception in mod_rewrite regular expressions. Using your database to store safe strings to use in your URLs makes the whole process much more efficient. This little fact is usually not mentioned in tutorials for mod_rewrite.
You really do want to keep the mod_rewrite rules simple. Don't try to write a complex regexp in mod_rewrite that handles all kinds of apostrophes, special characters, etc. (like I did). You don't need question marks, quotation marks, or colons in the rewritten URL for it to be useful to search engines. You can turn a title like “O’mally’s dog’s bone” into http://domain.com/Omallys_dogs_bone and there is definitely enough textual sense in that rewritten URL for a search engine to deal with it.
Take your table with all your content data in it. Add a field to hold a safe title for each piece of content. Then you can process your old titles into the new field. In your looping construct, use a bit of PHP to clean your old titles of spaces, quotes, slashes, and other silly things.
$punctuations = array('.', '\'', '?','!','*','=','Ó','%','@','&',',','/');
$safeTitle = str_replace($punctuations, "", $title);// get rid of the junk
$safeTitle = str_replace(" ", "_", $safeTitle);// replace spaces with underscores
Now you have a safe title field which you can add to your output queries and use to fill in the URL of each link on your page, ready for mod_rewrite goodness.
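The blacklist above only removes the characters you remembered to list, so a curly apostrophe or a colon can still slip through and fail to match the rewrite rule. One alternative, sketched here under my own assumptions, is to whitelist instead: keep only what the rule's character class accepts. The function name makeSafeTitle() is hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper (not from the original script): keep only characters
// that a [a-zA-Z0-9_] character class will accept.
function makeSafeTitle(string $title): string
{
    $safe = trim($title);
    // Turn each run of spaces, tabs, or newlines into a single underscore.
    $safe = preg_replace('/[ \t\r\n]+/', '_', $safe);
    // Drop every byte that is not an ASCII letter, digit, or underscore.
    $safe = preg_replace('/[^a-zA-Z0-9_]/', '', $safe);
    // Collapse runs of underscores left behind by removed punctuation.
    $safe = preg_replace('/_+/', '_', $safe);
    return $safe;
}

echo makeSafeTitle("O’mally’s dog’s bone"), "\n"; // prints Omallys_dogs_bone
```

Two different titles can still collapse to the same safe title, so it is probably worth checking for duplicates when you fill in the new field.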
Make your mod_rewrite rule in your .htaccess file. Note here that the rule captures two variables, each matching one or more upper- or lower-case letters, the digits 0-9, and the underscore character, with an optional trailing slash. And of course, it turns it all back into a query string to submit to your content page.
RewriteRule ^/?([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)(/)?$ item.php?safeTopicName=$1&safeTitle=$2
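If you want to see what the rule will and won't accept before touching .htaccess, you can try the same pattern in PHP, since both Apache and preg_match() use PCRE. This is only a rough sketch: simulateRewrite() and RULE_PATTERN are names I made up, and it imitates the substitution rather than calling Apache.

```php
<?php
// Same pattern as the RewriteRule, wrapped in # delimiters for preg_match().
// In .htaccess context Apache strips the leading slash, so test paths omit it.
const RULE_PATTERN = '#^/?([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)(/)?$#';

// Hypothetical helper: returns the target the rule would build, or null
// when the path does not match and the rule would be skipped.
function simulateRewrite(string $path): ?string
{
    if (!preg_match(RULE_PATTERN, $path, $m)) {
        return null;
    }
    return "item.php?safeTopicName={$m[1]}&safeTitle={$m[2]}";
}

// Two paths that should match, and two that should not.
foreach (['Planets/earth', 'Planets/earth/', "Planets/O'mally", 'Planets'] as $path) {
    var_dump(simulateRewrite($path));
}
```

The apostrophe and the single-segment path fall through untouched, which is exactly why the safe titles need to be cleaned before they ever reach a link.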
Almost done, right? Eh, not quite. Almost though. Don't screw over your existing users, who may have linked to something of yours in the past. You can still account for your old reference style to your web content, and you most definitely should. You can write checks for query string data validation to allow for transparent access to content through either the old query string method or the new one.
if (!empty($_GET["safeTopicName"]) && !empty($_GET["safeTitle"])) {
    // New-style URL: look up the numeric ids from the safe strings.
    // "=" instead of LIKE, because "_" is a single-character wildcard in LIKE.
    $sql = sprintf("SELECT topicId
        FROM contentTopics
        WHERE safeTopicName = '%s'",
        mysql_real_escape_string($_GET["safeTopicName"]));
    diode($topicId = $db->getOne($sql), $sql); // my db connection wrapper
    $sql = sprintf("SELECT articleid
        FROM content
        WHERE safeTitle = '%s'",
        mysql_real_escape_string($_GET["safeTitle"]));
    diode($articleid = $db->getOne($sql), $sql);
} else {
    // Old-style URL: item.php?topicId=2&articleid=249
    if (!empty($_GET["topicId"])) {
        $topicId = (int)$_GET["topicId"];
    }
    if (!empty($_GET["articleid"])) {
        $articleid = (int)$_GET["articleid"];
    }
}
// Neither style produced both ids: report it and bail out.
if (!isset($topicId) || !isset($articleid)) {
    addMessage("no item found", "MsgErr");
    redirect();
    exit();
}
A couple of notes: I'm using PEAR, and a couple of custom functions for efficiency's sake. Note the use of (int) and mysql_real_escape_string() for typing and sanitizing. And yes, there are probably better ways to write this up, but you get the idea. Look for your $_GET vars: if neither the safe strings nor the numeric ids are there, there's no result. Otherwise, resolve them to $topicId and $articleid right away, so the rest of the code doesn't care which style came in. That way a user can get to your site with /Planets/earth as well as with item.php?topicId=2&articleid=249.
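If you'd rather keep that branching in one place, here is one way it could be sketched as a function. This is my own arrangement, not the code above: resolveIds() and the two lookup callbacks are hypothetical stand-ins for the database queries, and the arrays at the bottom just pretend to be the two tables.

```php
<?php
// Hypothetical sketch: $findTopicId and $findArticleId stand in for the
// database lookups; neither exists in the original code.
function resolveIds(array $get, callable $findTopicId, callable $findArticleId): ?array
{
    if (!empty($get['safeTopicName']) && !empty($get['safeTitle'])) {
        // New-style URL: translate the safe strings into numeric ids.
        $topicId = $findTopicId($get['safeTopicName']);
        $articleId = $findArticleId($get['safeTitle']);
    } else {
        // Old-style URL: cast to int so only a number gets through.
        $topicId = isset($get['topicId']) ? (int)$get['topicId'] : null;
        $articleId = isset($get['articleid']) ? (int)$get['articleid'] : null;
    }
    if (!$topicId || !$articleId) {
        return null; // caller reports "no item found" and redirects
    }
    return ['topicId' => (int)$topicId, 'articleid' => (int)$articleId];
}

// In-memory stand-ins for the two tables, for illustration only.
$topics = ['Planets' => 2];
$articles = ['earth' => 249];
$findTopic = function ($name) use ($topics) { return $topics[$name] ?? null; };
$findArticle = function ($title) use ($articles) { return $articles[$title] ?? null; };

// Both URL styles should resolve to the same pair of ids.
var_dump(resolveIds(['safeTopicName' => 'Planets', 'safeTitle' => 'earth'], $findTopic, $findArticle));
var_dump(resolveIds(['topicId' => '2', 'articleid' => '249'], $findTopic, $findArticle));
```

Passing the lookups in as callbacks is only there so the sketch runs without a database; in a real page they would wrap whatever query layer you already use.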
To recap:
- Set up safe versions of your content titles
- Process the old titles with a script
- Make a simpler rewrite rule as a result
- Set up your validation to process both kinds of queries
- Marvel at how much simpler it was to do it that way than to try and do it all with mod_rewrite alone.
