External Resources and Privacy

Something that used to be a perfectly normal part of the Web and HTML specifications has been turned into quite a privacy problem: External resources.

These are the elements of HTML that point to another URL, usually through a "src" (source) attribute - images, external scripts, frames and inline frames - and even stylesheets, which are linked through "href" instead. Any time a web page is requested, the browser first opens a connection to the web server and loads the requested document; then - without further user input - it begins to request all the external resources this document includes. Of course, no distinction is made between different servers - I can include the Google logo here without a problem:
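To make the loading step concrete, here is a rough Python sketch of what a browser effectively does after fetching a document: walk the markup, collect everything that is loaded automatically, and note which of those live on another host. The tag table, the class name and the helper `third_party_resources` are my own illustrative assumptions, not part of any browser or specification, and the table is far from complete.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Tags that make a browser fetch something without user input.
# Illustrative assumption only; real browsers know many more cases.
AUTO_LOADED = {
    "img": "src",
    "script": "src",
    "frame": "src",
    "iframe": "src",
    "link": "href",  # stylesheets are referenced through href, not src
}


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of resources a page loads automatically."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr_name = AUTO_LOADED.get(tag)
        if attr_name is None:
            return
        attrs = dict(attrs)
        # Only <link rel="stylesheet"> triggers a stylesheet download here.
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" not in rel:
            return
        value = attrs.get(attr_name)
        if value:
            # Resolve relative references against the page address.
            self.resources.append(urljoin(self.page_url, value))


def third_party_resources(page_url, html):
    """Return the resources whose host differs from the page's host."""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    page_host = urlsplit(page_url).hostname
    return [u for u in collector.resources if urlsplit(u).hostname != page_host]
```

Ordinary links (the "a" element) are deliberately left out: they are only followed when the user clicks them, which is exactly the difference this article is about.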

IMAGE(http://www.google.com/intl/en_ALL/images/logo.gif)

For good reason, too: Imagine receiving a pop-up warning every time a web page tries to load a resource that is not on its own server.

But here's the privacy issue. If an image is included in a page, this causes a separate request to the server hosting the image - which is logged. Using dynamic images (e.g. generated with PHP), the server can even send you cookies in response to your image request. The image itself need not even be visible: A 1x1 white pixel in the bottom right corner of every page can allow the image host to log all visits, set and read cookies, and see the path the visitor took, since each request normally names the embedding page in its Referer header. Here's an example in between the brackets:
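What such a dynamic image might look like on the server side can be sketched in a few lines. This is a hypothetical Python WSGI stand-in, not the PHP script used in the example here; the cookie name "visitor" and the in-memory `visit_log` are assumptions made purely for illustration.

```python
import uuid
from http.cookies import SimpleCookie

# A 1x1 GIF, small enough to go unnoticed on any page.
PIXEL = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
)

visit_log = []  # stand-in for the server's access log


def tracking_pixel(environ, start_response):
    """WSGI app: log the request, tag the visitor with a cookie, send a pixel."""
    cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    known = "visitor" in cookies
    visitor_id = cookies["visitor"].value if known else uuid.uuid4().hex

    visit_log.append({
        "visitor": visitor_id,
        "returning": known,
        "ip": environ.get("REMOTE_ADDR"),
        # The Referer header names the page that embedded the image.
        "embedding_page": environ.get("HTTP_REFERER"),
    })

    headers = [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(PIXEL))),
    ]
    if not known:
        # First contact: hand out an identifier to recognise later visits.
        headers.append(("Set-Cookie", f"visitor={visitor_id}; Path=/"))
    start_response("200 OK", headers)
    return [PIXEL]
```

Every page that embeds this image adds one entry to the log, and the sequence of "embedding_page" values for one visitor identifier is the path that visitor took.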

[IMAGE(http://ermarian.net/html/php/i_can_see_you.php)]

Leaving aside further HTML or JavaScript exploits, this script can do pretty much anything it could have done if you had clicked a (non-existent) link to it. And you weren't even warned.

--

Now, Firefox offers a few settings that eliminate part of this problem. Firstly, there is the option named "Automatically load images". When switched off, the browser won't load any images included on the site. It even has a built-in "white-list" that allows exceptions for certain sites. But that's as far as it goes: A website that is white-listed will have all of its images loaded. Also, this doesn't stop iframes, scripts or stylesheets, which can just as easily be used for this purpose.

The second feature is a bit more powerful. Adblock is an extension everybody should install. It allows black-listing all resources whose addresses match a particular pattern.
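The idea of a pattern-based black-list can be sketched as follows. Note that this uses shell-style wildcards from Python's fnmatch module as a stand-in; it is not Adblock's actual filter syntax, and the patterns and the function name `is_blocked` are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical black-list; "*" matches any run of characters, slashes included.
BLACKLIST = [
    "http://ermarian.net/html/php/i_can_see_you.php",
    "*://ads.example.com/*",
    "*/banners/*",
]


def is_blocked(url, patterns=BLACKLIST):
    """True if the resource address matches any black-listed pattern."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)
```

The check looks only at the address of the resource, never at the page that asked for it, so a pattern applies on every site alike.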

--

In my opinion, however, there is still a gap there. You can allow a given resource to be loaded either everywhere or nowhere. And without additional settings, you have to either white-list every single site you visit, or live with the risk of downloading content before you even know to block it.

Most of the privacy risks come from a third party having resources on the web page you visit - after all, the original web server already sees your first request; it doesn't need to use images to trick you into sending further requests.

The solution I propose is therefore to block websites from loading resources not on their own server.

Like the other features, this would have a white-list and a black-list. Some websites could be allowed to load any resources, some resources could be allowed from anywhere, and other websites would be permitted to load content only from their own server and, say, "google.com".

Further fine-tuning to block images, scripts, CSS or frames would be nice too.
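As a rough illustration of how these rules could fit together, here is a minimal Python sketch of the decision such an extension would make for each request. Everything in it is an assumption for the sake of the example: the class and field names, the order in which the lists are consulted, and the choice to treat "own server" as an exact host match.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


def host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


@dataclass
class Policy:
    # Sites allowed to load resources from anywhere.
    trusted_sites: set = field(default_factory=set)
    # Resource hosts allowed on every site (white-list).
    global_allow: set = field(default_factory=set)
    # Resource hosts blocked on every site (black-list).
    global_block: set = field(default_factory=set)
    # Extra hosts allowed per site, e.g. {"news.example": {"cdn.example"}}.
    per_site_allow: dict = field(default_factory=dict)
    # Resource types never loaded from a third party on untrusted sites.
    blocked_types: set = field(default_factory=set)

    def allows(self, page_url, resource_url, resource_type="image"):
        page = urlsplit(page_url).hostname or ""
        res = urlsplit(resource_url).hostname or ""
        # The black-list wins over everything else.
        if any(host_matches(res, d) for d in self.global_block):
            return False
        # Resources on the page's own server are always fine.
        if res == page:
            return True
        if any(host_matches(page, d) for d in self.trusted_sites):
            return True
        if resource_type in self.blocked_types:
            return False
        if any(host_matches(res, d) for d in self.global_allow):
            return True
        site_rules = self.per_site_allow.get(page, ())
        # Anything not explicitly allowed is blocked by default.
        return any(host_matches(res, d) for d in site_rules)
```

A policy that trusts one site, allows "google.com" everywhere and refuses third-party scripts would then be used like this:

```python
policy = Policy(
    trusted_sites={"webmail.example"},
    global_allow={"google.com"},
    blocked_types={"script"},
)
policy.allows("http://news.example/a",
              "http://www.google.com/intl/en_ALL/images/logo.gif")
```

Blocking by default is the point of the proposal: an unknown third party never receives a request until the user has decided to allow it.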

Finally, this extension would be a great addition for Thunderbird as well. Mail tracers like "readnotify" rely on invisible resources included in "spiked" mails, which call home whenever the mail is read in HTML format. Simply disallowing such resources from being loaded would save a good deal of grief while still maintaining convenience.

© 2006-2012: All content, unless otherwise noted, is the property of Arancaytar. It may be copied and modified with attribution for non-commercial purposes. By publishing comments on this site, you grant Arancaytar a non-exclusive, perpetual license to reproduce and publish these comments along with any identifying information provided. (You may request your comments to be deleted or edited voluntarily.)