I recently read an article over at w3.org titled "Choose URIs Wisely". The gist of it is that web addresses should be concise and easy to spell and remember, and that they should persist even when the site is reorganized.
Unlike the XHTML standards and the rules for valid markup, this advice from the W3C is more a set of guidelines than actual rules. It shows how the web should ideally work, and in that sense I would compare web development more to an art than a craft. These standards are a way of life. They take a lot of effort to implement, but they will reap great rewards in time if you stick to them persistently.
As an example of how a URL should look, it might be best to examine the address of the article itself:
The directory path doesn't look very spectacular, but it makes sense on a logical level. The article is part of the Tips series, which is part of Questions and Answers, a resource the W3C offers.
What is unusual for a URL is the absence of a file extension. Extensions are ubiquitous in web addresses: almost all web pages end in .html, .htm or .php.
However, this violates the W3C ideal that a resource address should contain only information about the identity of the resource, not about how it is stored or displayed. Extensions have several disadvantages:
Take the .htm mess. The .htm extension dates from many years back, when there were systems (DOS) that couldn't handle file extensions longer than three letters. There is no sane reason to keep using it, but I know someone who insists on doing so.
Now picture this: You have remembered a long URL, but not written it down. Does it end in .htm? .html? .php?
The idea is that file extensions should stay out of the URI. Your ideal web address looks like http://web.site.net/directory/directory/page.
That was the rant; now comes the useful stuff. How do you actually implement this ideal? How do you handle resource locators that have nothing to do with where the data is stored on your web server?
The easiest way is Drupal (which powers this site). Drupal is the first CMS I have seen that supports this ideal URI scheme. It combines clean URLs (URLs that contain no query parameters, i.e. no "?q=...") with URL aliases (URIs stored in a MySQL lookup table that give a resource a human-readable name). Together, these let you run any blog or forum according to the W3C's URI ideals.
It gets harder when, for whatever reason, your website can't be powered by Drupal. (I haven't yet come up with anything that couldn't be done with it, but admittedly some sites are hard to migrate.) In that case you have to come up with a URI scheme yourself, and implement it as well.
I do this with a combination of mod_rewrite and PHP. Both require reasonably high access to your web server, so doing this on Freewebs or Geocities is pretty much out of the question.
Top level directory
Purpose
You should use a top-level directory beneath which all resources are stored. This actually runs slightly counter to the ideal, since it needlessly makes the URI longer, but it makes resource handling much easier to implement. Also, should you ever add new resources that are "outside" this resource handler, the top-level directory makes them easy to tell apart.
The top-level directory should of course be as short as possible while still being unique. Wikipedia uses a top directory named /wiki/. You will notice that there are resources outside this directory, such as the 503 error page. /wiki/ identifies a particular resource handler, in this case MediaWiki. Anything that comes after /wiki/ is the name of the resource.
Implementation
For this, you will need the ability to write an .htaccess file. This is a configuration file for your Apache webserver, which allows you to set certain preferences for a specific directory rather than the entire server.
The .htaccess file must be uploaded to the root directory of your web site, i.e. the same directory you would put index.html in.
Your .htaccess file needs a rewrite rule that passes requests for the top-level directory to a PHP handler. If you already have an .htaccess file, add the rule to it. The rule refers to two names:
"name" is the top-level directory (e.g. "wiki").
"handler.php" is the PHP program that handles all page requests.
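Here is a minimal sketch of such a rule. It assumes Apache with mod_rewrite enabled and the .htaccess file placed in the site root. The query parameter name "q" is an assumption chosen for illustration, so use whatever parameter your handler.php actually reads.

```apache
# Turn on mod_rewrite for this directory
RewriteEngine On

# Hand every request below /name/ to the handler.
# The ^ anchors the pattern to the start of the requested path, so only
# URLs that begin with "name/" are rewritten.
# $1 captures everything after "name/" and is passed in the (hypothetical)
# query parameter "q"; QSA keeps any query string the visitor sent,
# and L stops processing further rules for this request.
RewriteRule ^name/(.*)$ handler.php?q=$1 [L,QSA]
```

With this rule, a request for /name/some/page is served internally by handler.php?q=some/page. The address in the visitor's browser stays unchanged.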
Note: This may cause conflicts. Suppose you have another resource that does not use this resource handler (because it lies outside this top directory) but still contains "name" somewhere in its URL. In that case, you have problems. There are ways to avoid this, but the case is too uncommon to be worth the trouble. Resources inside this top directory are not affected: a URL in the "name" directory that contains "name" more than once will still work.
Handler
The handler program is a PHP file that looks up the resource identifier in the MySQL database and displays the appropriate content. It is important that this file does not do or display anything else. This way, the same handler can handle many different resources and content types.
I will explain how to implement the handler (and the lookup table) in the next post, tomorrow.
Upon re-reading this article half a year later, I'm afraid I have to admit that the information is woefully inaccurate. This is mostly because I was a beginner at using .htaccess when I wrote it. I know a bit more now, which means I know how messed-up some of the methods described here are.

Take the case of a URL containing "name" in its address: complete BS. This can be avoided by adjusting the regular expression so that it only matches the beginning of the URL. Another option is a condition that checks whether the requested file exists before handing the request over to the handler. This even allows for a joint directory tree of "real" and "virtual" files, where .htaccess first checks for a real file before letting the virtual content handler do its work. This is how Drupal works, incidentally.

When I have time, I'll rewrite this article to be somewhat more useful. Meanwhile, I hope you don't take me for a complete technical fool after reading this mess; I've learnt since then.
-- Arancaytar
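The file-existence check described above can be sketched with two RewriteCond lines, similar to the clean-URL rules in Drupal's own .htaccess. As before, "handler.php" and the parameter name "q" are placeholders. Under these conditions, a request reaches the handler only when no real file or directory exists at that path, so real and virtual resources can share one directory tree.

```apache
RewriteEngine On

# Serve the request directly if it maps to an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...or to an existing directory.
RewriteCond %{REQUEST_FILENAME} !-d
# Anything else is treated as a virtual resource and passed to the handler.
# handler.php itself exists on disk, so the rewritten request fails the
# !-f condition and is not rewritten again.
RewriteRule ^(.*)$ handler.php?q=$1 [L,QSA]
```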
Comments
Do we need REGEX?
Thank you for your information. I would like to create permalinks like in Drupal and WordPress, where you can define the pattern of the permalink. My problem is: how can we create a custom pattern? Do we need regex skills, or anything else?