Perennially Sane

newsfeed

How to generate an RSS 2.0 feed

An RSS feed has one primary purpose: to format information in such a way that just about any standards-compliant feed reader can read it.

Therefore, an RSS feed is really not "content" by itself, but rather a mode of output for content that already exists. If you have content that is published periodically in a way similar to news items or announcements, you can publish that content in an RSS feed.

Since the original form of the content varies widely (you might want to export the posts of a forum or a blog, or even just the status of a network node), I'm not going to cover how to get the content from that form into a publishable format.

Rather, here is a simple PHP script that assumes the content to be published is already in a MySQL table (with the appropriate columns: title, author, date, etc.), and simply publishes it according to the RSS 2.0 specification.

Firstly, here is a brief tree outline of how an RSS feed is built up:
What an RSS feed consists of:
- xml encoding and version
- xml stylesheet (optional, for clean appearance in browser)
- rss version
- channel:
  - title
  - url
  - description
  - language (en-us)
  - date of last entry
  - number of minutes between updates (common: 5 up to 60 (1 hour))
  - image (optional)
    - title
    - source address
    - link address
    - width
    - height
  - repeat this block for each entry:
    - entry tag
      - guid
      - title
      - author name
      - content
      - date

This is the color-coded PHP code. The complete script is attached as a text file, so you can skip straight to the end and just download it.

<?php
/* php-program for outputting an RSS 2.0 feed
 * from entries stored in a MySQL table (uses the mysqli extension)
 */
/*********************************
 * data structure
 * the table (entry object) has the following columns (properties).
 * - guid (primary key identifier)
 * - author (a simple, alphanumeric name)
 * - author_mail (valid email address - rss 2.0 requires this, so use a default)
 * - title
 * - content
 * - posted (datetime field)
 * - link_url
 *********************************/
/*************************
 * fetch items from table (adapt for column identifiers)
 *************************/
function get_items($link) {
  $sql = "select
      guid,
      author,
      author_mail,
      title,
      content,
      unix_timestamp(posted) as posted,
      link_url
    from items
    order by posted desc
    limit 0,10";
  $rows = array();   // stays empty if the table has no entries
  $res = mysqli_query($link, $sql);
  while ($row = mysqli_fetch_assoc($res)) $rows[] = $row;
  return $rows;
}
/*************************
 * connect to database (if you have a function for this already, remove this)
 *************************/
function dbconnect() {
  $server = '';
  $user   = '';
  $pass   = '';
  $db     = '';
  // connect and select the database in one step
  return mysqli_connect($server, $user, $pass, $db);
}
$link = dbconnect();        // connect
$items = get_items($link);  // fetch entries
$updated = $items ? (int)$items[0]['posted'] : time();  // determine newest date
mysqli_close($link);        // close connection
header("Content-type: application/xml");  // send xml MIME type header
// begin output
?><?='<?xml version="1.0" encoding="ISO-8859-1" ?>'?>
<?='<?xml-stylesheet href="/style/feed.css" type="text/css" media="screen"?>'?>
<rss version="2.0">
  <channel>
    <title>Feed-Title</title>
    <link>http://feed.url</link>
    <description>This is the Feed Description</description>
    <language>en-us</language>
    <lastBuildDate><?=date("r",$updated)?></lastBuildDate>
    <!-- ttl is given in minutes -->
    <ttl>15</ttl>
    <image>
      <title>Image Title</title>
      <url>http://image.net/image.jpg</url>
      <link>http://feed.url</link>
      <width>88</width>
      <height>31</height>
    </image>
<?php
foreach ($items as $item) { // for each entry
  /* write associative array to vars, escaped for xml. saves space. */
  foreach ($item as $name=>$value) $$name=htmlspecialchars((string)$value, ENT_QUOTES, 'ISO-8859-1');
?>
    <item>
      <guid isPermaLink="false"><?=$guid?></guid>
      <title><?=$title?></title>
      <author><?=$author_mail?> (<?=$author?>)</author>
      <link><?=$link_url?></link>
      <description><?=$content?></description>
      <pubDate><?=date("r",(int)$posted)?></pubDate>
    </item>
<?php
  foreach ($item as $name=>$value) $$name=''; // a precaution.
}
?>
  </channel>
</rss>
This is the end of the PHP code.
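
Since the whole point is a feed that standards-compliant readers accept, it can be worth running a quick sanity check on the output before pointing a reader at it. The following is only a rough sketch, not a replacement for a real feed validator: the function name check_feed is made up for this example, it assumes PHP's SimpleXML extension is available, and it only checks that the XML is well-formed and that the three channel elements RSS 2.0 requires (title, link, description) are present.

```php
<?php
// Hypothetical helper: returns a list of problems found in a feed string.
// An empty list means these basic checks passed, not that the feed is fully valid.
function check_feed(string $xml): array {
    libxml_use_internal_errors(true);   // collect parse errors instead of printing warnings
    $rss = simplexml_load_string($xml);
    if ($rss === false) {
        return ['not well-formed XML'];
    }
    $problems = [];
    if ($rss->getName() !== 'rss' || (string) $rss['version'] !== '2.0') {
        $problems[] = 'root element is not <rss version="2.0">';
    }
    // RSS 2.0 requires these three channel elements.
    foreach (['title', 'link', 'description'] as $name) {
        if (!isset($rss->channel->$name)) {
            $problems[] = "channel is missing <$name>";
        }
    }
    return $problems;
}
```

One way to use it would be to capture the script's output with output buffering and pass the resulting string to check_feed() while testing.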

Baghdad Burning newsfeed

3 comments

Long-time readers of the excellent blog Baghdad Burning by Riverbend will have noticed ages ago that its feed stopped updating in their feed readers.

Why? The last item displayed in the Atom feed is dated June 2005, more than a year ago.

Unfortunately, this also means that readers are not alerted to new posts, which are usually weeks apart. In practice, this means either reading posts many days late or futilely checking a page all the time.


Along comes PHP, and a way to reverse-engineer a (reasonably well-structured) HTML page back into a database of posts, and from there to an RSS feed - the kind you can put into Bloglines, Google Reader or your favorite aggregator. The colloquial term for this is "scraping" as far as I know.
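
To give an idea of what such reverse-engineering can look like, here is a minimal sketch. It is not the code behind the actual feed: the markup it expects (a div with class "post" containing an h3 title, a p with class "date" and a div with class "body") is a made-up example, and the function name scrape_posts is hypothetical. A real page needs its own pattern.

```php
<?php
// Assumed (hypothetical) markup for one post:
//   <div class="post"><h3>Title</h3><p class="date">2006-07-05</p><div class="body">...</div></div>
// Returns one array per post with title, posted (unix timestamp) and content.
function scrape_posts(string $html): array {
    $pattern = '~<div class="post">\s*'
             . '<h3>(?P<title>.*?)</h3>\s*'
             . '<p class="date">(?P<date>.*?)</p>\s*'
             . '<div class="body">(?P<body>.*?)</div>\s*'
             . '</div>~s';                       // s: let "." match newlines too
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $posts = [];
    foreach ($matches as $m) {
        $posts[] = [
            'title'   => trim($m['title']),
            'posted'  => strtotime($m['date'] . ' UTC'), // date string to unix timestamp
            'content' => trim($m['body']),
        ];
    }
    return $posts;
}
```

The non-greedy body match stops at the first pair of closing div tags, so a post body that itself contains nested divs would be cut short. That is the usual price of using regular expressions instead of a real HTML parser, and it only works on pages that are as regular as the pattern assumes.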

I've spent the last weeks experimenting with regular expressions on the Pied Piper archives, so it wasn't hard to parse the site. It was actually far harder to find a way to escape or remove the special Unicode characters, HTML tags and other stuff that aggregators don't like. It validates now, however. I present:

The Baghdad Burning newsfeed,
brought to you by feeds.ermarian.net!
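
As for the escaping problem mentioned above, the following sketch shows one possible way to clean a scraped snippet before it goes into a feed element. It is an illustration under assumptions, not the exact routine used for this feed: the function name clean_for_feed is made up, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: turn a scraped HTML fragment into plain text that is safe inside XML.
function clean_for_feed(string $html): string {
    // Drop markup, then turn entities such as &amp; back into plain characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Remove control characters that XML 1.0 forbids (tab, LF and CR are allowed).
    $text = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text);
    // Collapse the runs of whitespace left behind by the removed tags.
    $text = trim(preg_replace('/\s+/', ' ', $text));
    // Escape &, <, > and quotes so the result cannot break the surrounding XML.
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}
```

Decoding first and escaping last means every ampersand ends up escaped exactly once, which avoids the double-escaped entities that tend to trip up validators. If the posts should keep their formatting, the strip_tags() step could be dropped so that the markup is escaped rather than removed.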
