Perennially Sane

HTTP

HTTP Code Lookup

Did it again: An hour or so of desperate procrastination yielded a small search engine with a built-in auto-suggest feature.

This one does a look-up of HTTP status codes. Seasoned web developers know how often you need to know the meaning of a certain status - both for programming a server response and for understanding it when building a site that fetches or aggregates remote content.

Simply enter a number into the form and see the listed codes. A more detailed description is available from RFC 2616 Section 10, which defines these codes.

The page also includes an OpenSearch module, which can be installed in Firefox 2.0+, MSIE 7+, and (I think) Opera 9+, adding a new search engine to the search box.

The auto-suggest function is especially useful: Sometimes it is enough merely to know the name corresponding to a certain number - this will be shown in the list of suggested terms for the number you enter.

Of course, the script also does reverse lookups, allowing you to search for the number corresponding to a certain name.
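The lookup logic is simple enough to sketch in a few lines. This is not the script behind the page, only a minimal illustration that assumes Python's standard `http.HTTPStatus` table as the data source; the function name `lookup` is made up for the example. A numeric query is treated as a prefix (which is what makes auto-suggest work), anything else as a fragment of a status name.

```python
from http import HTTPStatus


def lookup(query: str) -> list[tuple[int, str]]:
    """Return (code, name) pairs matching a numeric prefix or a name fragment."""
    query = query.strip().lower()
    if not query:
        return []
    matches = []
    for status in HTTPStatus:
        if query.isdigit():
            # Forward lookup with auto-suggest: "40" suggests 400, 401, ...
            hit = str(status.value).startswith(query)
        else:
            # Reverse lookup: a name fragment finds the matching number(s).
            hit = query in status.phrase.lower()
        if hit:
            matches.append((status.value, status.phrase))
    return matches


print(lookup("303"))  # [(303, 'See Other')]
```

Typing "30" would list every 3xx code whose number starts with those digits, which is usually all you need to jog your memory.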

External Resources and Privacy

Something that used to be a perfectly normal part of the Web and HTML specifications has been turned into quite a privacy problem: External resources.

These are all the HTML elements that contain a "src" (source) attribute - images, external scripts, frames and inline frames - and even stylesheets, which are linked with "href" instead. Any time a web page is called, the browser first opens a connection to the web server and loads the requested document; then - without further user input - it begins to request all the external resources this document includes. Of course, no distinctions are made between different servers - I can include the Google logo here without a problem:

IMAGE(http://www.google.com/intl/en_AL...)

For good reason, too: Imagine receiving a pop-up warning every time a web page tried to load a resource not on its own server.
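To get a feel for how many such requests a page triggers, one could list them. The following is a rough sketch using only Python's standard library; the class and function names are invented for this example, and it only looks at the element types mentioned above (it ignores things like CSS `url(...)` references).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceLister(HTMLParser):
    """Collect URLs a browser would fetch automatically for a page."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.resources = []  # list of (tag, absolute_url)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = None
        if tag in ("img", "script", "frame", "iframe"):
            url = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            url = attrs.get("href")
        if url:
            # Resolve relative addresses against the page's own URL.
            self.resources.append((tag, urljoin(self.page_url, url)))


def third_party_resources(page_url: str, html: str) -> list:
    """Return only the resources hosted somewhere other than the page's host."""
    lister = ResourceLister(page_url)
    lister.feed(html)
    page_host = urlparse(page_url).hostname
    return [(tag, url) for tag, url in lister.resources
            if urlparse(url).hostname != page_host]
```

Ordinary links (`<a href>`) are deliberately left out: they are only requested when you click them.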

But here's the privacy issue. If an image is included in a page, this causes a separate request to the server hosting it - which is logged. Using dynamic images (e.g. with PHP), the server can even send you cookies in response to your image request. The image itself need not even be visible: A 1x1 white pixel in the bottom right corner of every page can allow the image host to log all visits, set and read cookies, and see the path the visitor took. Here's an example in between the brackets:

[IMAGE(http://ermarian.net/html/php/i_c...)]

Apart from further HTML or JavaScript exploits, this script can do pretty much anything it could have done if you'd clicked this (non-existent) link. And you weren't even warned.
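What such a "dynamic image" does on the server side can be sketched roughly as follows. This is not the script linked above (which is PHP and whose source isn't shown here), just a Python illustration; the cookie name `visitor`, the in-memory `visit_log` and the function name are all assumptions made for the example.

```python
import base64
import uuid

# A commonly used 1x1 GIF, embedded so the sketch needs no image file.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

visit_log = []  # in-memory stand-in for the server's access log


def serve_pixel(request_headers: dict) -> tuple:
    """Answer an image request while recording who asked and from which page."""
    visitor = None
    for part in request_headers.get("Cookie", "").split(";"):
        name, _, value = part.strip().partition("=")
        if name == "visitor":
            visitor = value
    headers = [("Content-Type", "image/gif"),
               ("Content-Length", str(len(PIXEL)))]
    if visitor is None:
        # First contact: hand out an identifier the browser will send back.
        visitor = uuid.uuid4().hex
        headers.append(("Set-Cookie", f"visitor={visitor}"))
    # The Referer header reveals which page embedded the image.
    visit_log.append((visitor, request_headers.get("Referer", "")))
    return 200, headers, PIXEL
```

Every page that embeds the image adds one line to the log, tied to the same visitor ID - which is exactly "the path the visitor took".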

--

Now, Firefox offers a few settings that eliminate part of this problem. Firstly, there is the option named "Automatically load images". When switched off, the browser won't load any images included on the site. It even has a built-in "white-list" that allows exceptions for certain sites. But that's as far as it goes: A website that is white-listed will have all of its images loaded. Also, this doesn't stop iframes, scripts or stylesheets, which can just as easily be used for this purpose.

The second feature is a bit more powerful. Adblock is an extension everybody should install. It allows black-listing all resources whose addresses match a particular URL.

--

In my opinion, however, there is still a gap there. You can either allow certain resources to be loaded everywhere or nowhere. And without additional settings, you have to either white-list every single site you visit, or live with the risk of downloading content before you even know to block it.

Most of the privacy risks come from a third party having resources on the web page you visit - after all, the original web server already sees your first request; it doesn't need to use images to trick you into sending further requests.

The solution I propose is therefore to block websites from loading resources not on their own server.

Like the other features, this would have a white-list and a black-list. Some websites could be allowed to load any resources, some resources could be allowed from anywhere, and other websites would be permitted to load content only from their own URL and, say, "google.com".

Further fine-tuning to block images, scripts, CSS or frames individually would be nice too.
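The decision such an extension would have to make for each request could look roughly like this. It is only a sketch of the proposal, not an existing extension's API: the function and parameter names are invented, "own server" is taken to mean an identical host name, and blocked resource types are assumed to apply to third-party requests only.

```python
from urllib.parse import urlparse


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def may_load(page_url, resource_url, kind,
             trusted_pages=(), allowed_anywhere=(), per_page=None,
             blocked_kinds=()):
    """Decide whether a page may fetch a resource of the given kind."""
    page = urlparse(page_url).hostname or ""
    res = urlparse(resource_url).hostname or ""
    if res == page:
        return True   # a site's own resources are always fine
    if kind in blocked_kinds:
        return False  # e.g. never load third-party frames
    if any(host_matches(page, d) for d in trusted_pages):
        return True   # this site may load anything
    if any(host_matches(res, d) for d in allowed_anywhere):
        return True   # this resource host is allowed everywhere
    extra = (per_page or {}).get(page, ())
    return any(host_matches(res, d) for d in extra)


policy = dict(per_page={"ermarian.net": {"google.com"}})
print(may_load("http://ermarian.net/blog",
               "http://www.google.com/logo.gif", "image", **policy))  # True
```

Anything not covered by one of the lists falls through to "blocked", so nothing gets downloaded before you have had a chance to decide.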

Finally, this extension would be a great addition for Thunderbird as well. Mail tracers like "readnotify" rely on invisible resources included in "spiked" mails, which call home whenever the mail is read in HTML format. Simply disallowing such resources from being loaded would save a good deal of grief while still maintaining convenience.

The Perfect Web Form?

There are several aspects by which you could judge an online form. Among the most important are how easy it is to understand, how much time it takes to submit content, and how frustrating it is to make a mistake.

Take, for example, the worst case.

The form contains a multitude of text fields, but they are not clearly labeled to show what should be filled in (nor in what form it should be filled in). Making a mistake results in one of two things: The form is rejected - preferably without any explanation of what mistake precisely you made - and you get the empty form to fill out from scratch. Or, arguably worse, your erroneous content is submitted and published for all to see (or quietly accepted, until some days later you realize that you never actually subscribed to the newsletter).

Now, the best case would be the exact opposite. Apart from designing the form itself in a clear and intuitive way, the process of submission should be easy and fool-proof, and require as few clicks as possible.

AJAX is good with that, but I manage to suck at JavaScript even more than JavaScript itself sucks. And those who use it regularly know that's not easy. So I try different ways.

The last I experimented with is HTTP redirection. The submission page for, say, a Guestbook, doesn't actually do anything besides say that your entry was saved, and would you please wait while you are redirected back to the book. My idea was to cut that part out and go *straight* back to the book - with your newly added entry. For this, I use the HTTP status code 303 - See Other.

There are several redirection codes, for different purposes. 301 permanently redirects a page. 302 does it temporarily. But neither of them is supposed to be respected if you are sending a POST form - which in this case, you are. (Note I say "supposed" - Firefox 2.0 follows 302 without complaints, converting the POST to a GET request counter to the specification.) 303 is meant for exactly those cases where you are submitting a form, and the server wants to redirect you from the form destination to a new page.

In this case, the "new page" is the book itself, which also contains the form.

So I have a guestbook page that lists the entries. The form is out of the way, at the bottom of the page - an anchor link is at the top that allows you to jump right down to the form.

The form is sent to add_entry.php, which verifies it. Now, one of two things happens: Either the post is deemed okay, and is saved in the database.

Then the script redirects thus:

    HTTP/1.1 303 See Other
    Location: http://ermarian.net/guestbook/

Or the post is NOT okay, in which case it redirects back to the same page, but scrolls right down to the form:

    HTTP/1.1 303 See Other
    Location: http://ermarian.net/guestbook/#sign

Of course, we are no better than the worst web form ever if we empty the form without even telling the user what was wrong. But here's one problem: How do you transfer information between these redirects? They're specifically designed not to forward POST requests, so those form contents get dropped in the redirect.

My solution is to transmit them back to the user as a cookie before redirecting.

The full code for the failed form check is therefore:

    HTTP/1.1 303 See Other
    Location: http://ermarian.net/guestbook/#sign
    Set-Cookie: gb_error=You+entered+an+invalid+email+address.
    Set-Cookie: gb_name=Signer; [...]
    Set-Cookie: gb_email=mailATemail.com; [...]

And so on. Note that the redirect doesn't mean cookies are ignored. The browser will save the cookies and *then* follow the redirect. Needless to say, the guestbook.php page must be equipped to fill the cookie contents back into the form.

For good measure, the successful redirect also sets a cookie, saying your message was saved.
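The logic of the writing script can be sketched like this. The real add_entry.php is PHP and isn't reproduced here; this is a Python illustration under a few assumptions: the validation rules, the `message` field and the `gb_notice` cookie name are invented, while the URLs, the 303 status and the `gb_error`/`gb_name`/`gb_email` cookies follow the headers shown above. The function returns the status line and headers rather than sending them.

```python
from typing import Optional
from urllib.parse import quote_plus

GUESTBOOK_URL = "http://ermarian.net/guestbook/"


def validate(form: dict) -> Optional[str]:
    """Return an error message, or None if the entry looks acceptable."""
    if not form.get("name", "").strip():
        return "Please enter your name."
    if "@" not in form.get("email", ""):
        return "You entered an invalid email address."
    return None


def add_entry(form: dict, entries: list) -> tuple:
    """Save a valid entry, then answer with 303 either way."""
    error = validate(form)
    if error is None:
        entries.append(dict(form))  # stand-in for the database insert
        return "303 See Other", [
            ("Location", GUESTBOOK_URL),
            ("Set-Cookie", "gb_notice=" + quote_plus("Your message was saved.")),
        ]
    # Failure: jump back to the form and carry its contents along as cookies.
    headers = [("Location", GUESTBOOK_URL + "#sign"),
               ("Set-Cookie", "gb_error=" + quote_plus(error))]
    for field in ("name", "email", "message"):
        value = quote_plus(form.get(field, ""))
        headers.append(("Set-Cookie", f"gb_{field}={value}"))
    return "303 See Other", headers
```

Note that both branches end in a redirect; the script never renders a page of its own.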

Now, in the background, the browser is loading three pages. Firstly, the guestbook page, secondly, the add_entry.php script, and thirdly, the guestbook page again.

But the user has seen only a single page - the main guestbook page. The user fills out the form on this main page, clicks "Submit", and (to his eyes) the same page is loaded again with the post. Or, if there was an error, the page is loaded again with a filled form and an error message.
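On the reading side, the guestbook page has to turn those cookies back into form contents. Again only a sketch in Python rather than the actual guestbook.php; the helper names are hypothetical, and expiring the cookies after use is my assumption (otherwise the error message would reappear on every later visit).

```python
import html
from urllib.parse import unquote_plus


def read_form_state(cookie_header: str) -> tuple:
    """Collect the gb_* cookies and prepare headers that expire them."""
    state = {}
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name.startswith("gb_"):
            state[name[len("gb_"):]] = unquote_plus(value)
    # Each cookie is needed exactly once, so tell the browser to drop it.
    expire = [("Set-Cookie", f"gb_{key}=; Max-Age=0") for key in state]
    return state, expire


def text_input(field: str, state: dict) -> str:
    """Render a text field, pre-filled with whatever the user typed before."""
    # Escape the value: it is user input being echoed back into HTML.
    value = html.escape(state.get(field, ""), quote=True)
    return f'<input type="text" name="{field}" value="{value}">'


state, _ = read_form_state("gb_error=You+entered+an+invalid+email+address.; gb_name=Signer")
print(state["error"])  # You entered an invalid email address.
print(text_input("name", state))
```

Keep in mind that cookies are limited in size (a few kilobytes each), so this works for short guestbook entries but not for arbitrarily long posts.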

As content submission forms go, it's convenient. And it gets by with a minimum of server and client resource usage - as opposed to JavaScript, which tends to slow down browsers and can potentially crash them.

I named it "TinyBook", in part because it also consisted of only two PHP files. Of course, it will grow once any noteworthy features are added, but its core really consists of only two files: The one that reads/displays, and the one that writes. It's primitive. But compared to the third-party scripts I tried before I gave up and started from scratch, it is quite smooth.
