Clean URIs with PHP, part two
Submitted by Arancaytar on Wed, 06/28/2006 - 23:59 – No comments
This post is not going to go into much detail. I have my last lectures tomorrow, before the exams next week, so I really have other things on the horizon. Still, I can give a brief run-down of what else is involved in using PHP to make clean URIs.
The Content Handler
In the last post, I explained how to set up your .htaccess file so that all content under a certain directory level (say "/wiki") is handled by a certain PHP script. The next issue is how to set up this PHP script so it can actually serve content.
The PHP script needs to take the requested URI and return the resource. The URI is translated into a file through a lookup table in a MySQL database.
Lookup table
The lookup table can be surprisingly minimal. At the basic level, only two fields are required: the alias and the corresponding filename.
CREATE TABLE alias_lookup (
    alias VARCHAR(64) PRIMARY KEY,
    filename VARCHAR(64) NOT NULL
);
The PHP script
Note: This is really just an example. Several things are unrealistic in practice; for example, the MySQL query inserts the request URI directly into the SQL string and is therefore vulnerable to injection attacks.
<?php
// the directory of this content handler
$handler_dir = '/main/';
$request = $_SERVER['REQUEST_URI'];

// lookup (mysqli replaces the old mysql_* functions, which were removed in PHP 7)
$link = mysqli_connect( {your database access info here} );
$sql = "SELECT filename FROM alias_lookup WHERE CONCAT('$handler_dir',alias) = '$request';";
$res = mysqli_query($link, $sql);
$row = mysqli_fetch_array($res);
// no matching row means the alias is unknown
$filename = $row ? $row[0] : null;

// include
if ($filename) {
    if (file_exists($filename))
        include($filename);
    // resource removed
    else
        header("Status: 410 Gone");
}
// alias doesn't exist.
else
    header("Status: 404 Not Found");
exit;
?>
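The injection problem mentioned in the note can be avoided with a bound parameter. The following is only a sketch, not the handler from this post: it assumes PDO is available, and alias_from_request() and lookup_filename() are hypothetical helper names. It also strips the query string before the lookup, which the example above does not do.

```php
<?php
// Sketch only: both functions are hypothetical helpers for illustration.

// Strip the query string and the handler directory from the request URI.
// Returns null if the request is not under the handler directory.
function alias_from_request(string $requestUri, string $handlerDir): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path) || strpos($path, $handlerDir) !== 0) {
        return null;
    }
    return substr($path, strlen($handlerDir));
}

// Look the alias up with a bound parameter instead of string interpolation,
// so the request text is never parsed as SQL.
function lookup_filename(PDO $db, string $alias): ?string
{
    $stmt = $db->prepare('SELECT filename FROM alias_lookup WHERE alias = ?');
    $stmt->execute([$alias]);
    $filename = $stmt->fetchColumn();
    // fetchColumn() returns false when no row matched
    return $filename === false ? null : $filename;
}
```

In the handler, $filename would then come from lookup_filename($db, $alias), with $alias taken from alias_from_request($_SERVER['REQUEST_URI'], $handler_dir), and the 404 branch would cover a null result.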
Of course, you can do a lot more with this: setting different content types and handling graphics as well as text, even including PHP programs that are themselves content handlers for resources lying further down the directory tree. In one website I just made, I use a regular expression match to replace certain links and add a visitor counter to HTML files before displaying them on the page - without having to change the original files!
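The link replacement and visitor counter mentioned above could look roughly like this. It is a minimal sketch, not the code from that website: rewrite_links() and add_counter() are hypothetical names, the pattern only covers double-quoted href attributes, and where the visit count comes from is left open.

```php
<?php
// Sketch only: both functions are hypothetical helpers for illustration.

// Rewrite links that start with $oldPrefix so they point at $newPrefix.
// Only the output changes; the file on disk stays untouched.
function rewrite_links(string $html, string $oldPrefix, string $newPrefix): string
{
    $pattern = '/(href=")' . preg_quote($oldPrefix, '/') . '/i';
    // A callback avoids surprises if $newPrefix contains "$" or "\".
    return preg_replace_callback($pattern, function ($m) use ($newPrefix) {
        return $m[1] . $newPrefix;
    }, $html);
}

// Insert a visitor counter just before </body>, or append it if the
// page has no closing body tag.
function add_counter(string $html, int $visits): string
{
    $note = '<p>Visits: ' . $visits . '</p>';
    $pos = stripos($html, '</body>');
    if ($pos === false) {
        return $html . $note;
    }
    return substr($html, 0, $pos) . $note . substr($html, $pos);
}
```

The handler would read the file with file_get_contents() instead of include(), pass the text through these functions, and echo the result.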
Clean URI schemes using mod_rewrite and PHP (Part one)
Submitted by Arancaytar on Wed, 06/28/2006 - 01:08 – 6 comments
I recently read an article over at w3.org, titled Choose URIs Wisely. The gist of it is that web addresses should be concise, easy to spell and remember, and be persistent even when the site is reorganized.
Now, unlike the XHTML standards and rules for valid markup, the advice given by the W3C is more a set of guidelines than actual rules. It shows how the internet should ideally work. In that sense, I would compare web development more to an art than a craft. These standards are a way of life, which takes a lot of effort to implement, but which will reap great rewards in time - if you stick to it persistently.
As an example of how a URL should look, it might be best to examine that of the article itself:
http://www.w3.org/QA/Tips/uri-choose
The directory path doesn't look very spectacular, but it makes sense on a logical level. The article is part of the Tips series, which is part of Questions and Answers, which is a resource W3C offers.
What is unusual for a URL is the absence of a file extension. Extensions are ubiquitous in addresses. Almost all web pages end in .html, .htm or .php.
However, this violates the W3 ideal that a resource address should contain only information about the identity of the resource, not any information on how it is stored or displayed. It has several great disadvantages:
- Take the .htm mess. .htm is an extension used many years back when there were systems (DOS) that couldn't handle file extensions with more than three letters. There is no sane reason to keep using it, but I know someone who insists on doing so.
- Now picture this: You have remembered a long URL, but not written it down. Does it end in .htm? .html? .php?
The idea is that file extensions should stay out of the URI. Your ideal web address looks like http://web.site.net/directory/directory/page.
That was the rant, now comes the useful stuff. How do you actually implement this ideal? How do you handle resource locators that have nothing to do with where the data is stored on your web server?
The easiest way would be Drupal (which powers this site). Drupal is the first CMS I have seen that supports this ideal for a URI scheme. Its combination of clean URLs (URLs that contain no parameters, i.e. "?q=...") and URL aliases (URIs that are stored in a lookup table in MySQL, giving a resource a human-readable name) allows you to run any blog or forum according to the URI ideals given by W3C.
It gets harder when for whatever reason you can't have your website powered by a Drupal engine (I haven't yet come up with something that couldn't be done with it, but it is admittedly hard to migrate certain sites). You have to come up with a URI scheme yourself, and also implement it.
I do this with a combination of mod_rewrite and PHP. Both are things that require you to have reasonably high access to your web server, so doing this on Freewebs or Geocities is pretty much out of the question.
Top level directory
Purpose
You should use some top level directory that all resources are stored beneath. This is actually slightly counter to the ideal way, since it needlessly makes the URI longer, but it makes the resource-handling much easier to implement. Also, should you ever add new resources that are "outside" this resource handler, the top-level directory is an easy way to distinguish these resources.
The top level directory should of course be as short as possible while remaining uniquely identifying. Wikipedia uses a top directory named /wiki/. You will notice that there are resources outside this directory. The 503 error page, for example. /wiki/ identifies a certain resource handler - which is a MediaWiki. Anything that comes after /wiki/ is the name of the resource.
Implementation
For this, you will need the ability to write an .htaccess file. This is a configuration file for your Apache webserver, which allows you to set certain preferences for a specific directory rather than the entire server.
The .htaccess file must be uploaded to the root directory of your web site - i.e., the same directory you would put index.html in.
This is what your .htaccess file should contain (if you already have one, this is what you should add to it):
"name" is the top-level directory (e.g. "wiki").
"handler.php" is the PHP program that handles all page requests.
Note: This may cause conflicts. Specifically, if you have another resource that does not use this resource handler (it lies outside this top directory) but still contains "name" somewhere within its URL, you have problems. There are ways to avoid this, and the situation is uncommon enough that it is easy to steer clear of. Note that resources in this top directory are not affected - a URL in the "name" directory that contains "name" more than once will still work.
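One way to avoid the conflict described in the note is to anchor the pattern at the start of the path. This is a minimal sketch, assuming mod_rewrite is enabled and the .htaccess file sits in the site root; "name" and "handler.php" are the placeholders used above, and the exact directives on your server may differ.

```apache
# Turn on the rewrite engine for this directory.
RewriteEngine On

# Send every request under /name/ to the handler script.
# The ^ anchor means only paths that start with "name/" match,
# so a URL elsewhere that merely contains "name" is left alone.
RewriteRule ^name/ handler.php [L]
```

Because this is an internal rewrite, the originally requested address should still be available to the PHP script in $_SERVER['REQUEST_URI'].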
Handler
The handler program is a PHP file that looks up the resource identifier in the MySQL database and displays the appropriate content. It is important that this file does not do or display anything else. This way, the same handler can handle many different resources and content types.
I will explain how to implement the handler (and the lookup table) in the next post, tomorrow.



