Introduction
During the course of this article, I’ll use the phrases ‘RSS document’ and ‘RSS feed’ interchangeably. They’re really the same thing, since a feed is simply an RSS XML document located at a particular URL. How often the RSS feed is refreshed is up to the provider.
The RSS feed could easily be generated by an application on an hourly, daily, or even weekly basis. The application that generates the feed could be written in Perl, PHP, ASP.NET, or Java. It really doesn’t matter, since your web browser, web portal, or RSS reader is just picking up whatever document is located at the URL. Also, in this article, I focus on news stories and news sites, since that is a very typical application of RSS feeds. However, RSS feeds are not limited to just displaying news content: they can be used with any other content that is suitable for syndication on a network.
Without further introduction, I’ll get into the parts that make up the RSS document.
The rss (root) Element
The RSS document has at its root the rss element. The rss element has only one required attribute, called version. For the RSS 2.0 format described here, the version attribute should be set to 2.0.
The channel Element
A channel is the distribution channel through which items are syndicated. The channel element can have many elements nested inside of it, which are listed in the next two sections.
Required channel Elements
The XML tags listed here are required in the channel element:
- description: this is some meaningful text that gives readers an indication of what the channel is all about. For example, “Example.com’s newest and exciting news, crop prices, and weather report.”
- item: at least one of these is required. Each item should correspond to a news story or another item that is being syndicated.
- link: the URL to the web site, such as http://www.example.com/news
- title: the title of the channel, which usually matches the title of the web site that is being syndicated. An example is ‘Example.com Latest News’.
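To make the required structure concrete, here is a minimal sketch that writes the smallest document described so far: an rss root with its version attribute, one channel with a title, link and description, and a single item. It uses PHP’s bundled XMLWriter extension purely for illustration, which is not the approach the classes later in this article take. The function name minimal_rss_document() and its parameters are my own assumptions.

```php
<?php
// Hypothetical helper (not part of the article's classes): writes an rss root,
// one channel with its required elements, and one item holding only a title.
function minimal_rss_document($title, $link, $description, $itemTitle)
{
    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->startDocument('1.0', 'UTF-8');

    $writer->startElement('rss');
    $writer->writeAttribute('version', '2.0');

    $writer->startElement('channel');
    $writer->writeElement('title', $title);
    $writer->writeElement('link', $link);
    $writer->writeElement('description', $description);

    $writer->startElement('item');
    $writer->writeElement('title', $itemTitle);
    $writer->endElement(); // item

    $writer->endElement(); // channel
    $writer->endElement(); // rss
    $writer->endDocument();

    return $writer->outputMemory();
}

echo minimal_rss_document(
    'Example.com Latest News',
    'http://www.example.com/news',
    'Newest and exciting news, crop prices, and weather report.',
    'First story'
), "\n";
```

XMLWriter takes care of escaping the text it is given, which is one reason to consider it when the values come from user-supplied content.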
Optional channel Elements
There are many optional elements that can be nested inside a channel. Each one of them provides detail about the channel.
Following is a description of the optional channel elements:
- category: you can give the channel one or more categories. A category is a way for you to define the type of content being syndicated by this channel. An example of a category is ‘News’.
- cloud: allows processes to register with a cloud to be notified when the channel is updated. The cloud element names the procedure to call and the protocol to use, such as XML-RPC.
- copyright: copyright information.
- docs: the URL for documentation that covers the RSS format used in the current feed.
- generator: the name of the program used to generate the RSS feed.
- image: an image that can be associated with the RSS feed.
- language: the language of the text in the channel. The value is a code as defined by the W3C (see RFC1766). Examples are ‘en’, ‘fr’, ‘en-US’, or ‘fr-FR’.
- lastBuildDate: the last time the RSS document was generated.
- managingEditor: the e-mail address of someone responsible for the content linked from the RSS feed.
- pubDate: the date the content in the feed was published.
- rating: the rating of the content, which corresponds to the Platform for Internet Content Selection (PICS) rating that was originally used for rating the content of web sites.
- skipDays: a list of days that can be skipped by an application that checks for newer versions of the feed. For instance, if the feed is never written on weekends, put Saturday and Sunday in day elements inside this element.
- skipHours: the hours, on a 24-hour clock, that the RSS feed is not updated. Applications that read this part of the feed can check only on the hours not listed in the hour elements inside this one.
- textInput: text input boxes can be added into the feed for the purposes of searching or limiting your selections. However, this means that if you are writing an RSS feed generator, you have to add support for the textInput element. According to the RSS Advisory Board, most RSS aggregators ignore this element.
- ttl: the number of minutes that a feed should stay in a cache before it is refreshed. If the feed is only updated on a daily basis, this cache can be for an entire day (the value 1440).
- webMaster: the e-mail address of the site’s web master.
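As a small illustration of a few of these optional elements, the sketch below renders lastBuildDate, ttl and skipDays. RSS 2.0 dates follow the RFC 822 style, which PHP exposes as the DATE_RSS format constant. The function name optional_channel_elements() and its parameters are assumptions made for this example only.

```php
<?php
// Hypothetical helper: renders a few optional channel elements as an XML string.
function optional_channel_elements($buildTimestamp, $ttlMinutes, array $skipDays)
{
    // DATE_RSS formats a timestamp in the RFC 822 style used by RSS 2.0.
    $xml  = '<lastBuildDate>' . gmdate(DATE_RSS, $buildTimestamp) . '</lastBuildDate>';

    // ttl is a number of minutes, so force it to an integer.
    $xml .= '<ttl>' . (int) $ttlMinutes . '</ttl>';

    // skipDays holds one day element per day that readers may skip.
    if ($skipDays) {
        $xml .= '<skipDays>';
        foreach ($skipDays as $day) {
            $xml .= '<day>' . htmlspecialchars($day, ENT_XML1, 'UTF-8') . '</day>';
        }
        $xml .= '</skipDays>';
    }

    return $xml;
}

// A feed rebuilt daily and never written on weekends.
echo optional_channel_elements(time(), 1440, array('Saturday', 'Sunday')), "\n";
```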
The item Element
At least one item element must be contained in the channel, since there is no point in having a channel if nothing is going to be syndicated. An item contains everything worthy of being broadcast with an RSS feed, such as the title and description of a news article, a blog entry, or a newly added file in a repository. At least one of the title or description elements is required, and the rest of the elements listed here are optional:
- author: the e-mail address of the item’s author. If this magazine article were exposed as an RSS item, the value would be ‘mail@nathang.com’.
- category: includes one or more categories that can be used to sort the item.
- comments: a URL of a web page that includes comments on the item’s content.
- description: a description of the item’s content.
- enclosure: an object attached to the item, like the URL for a podcast containing audio related to the item or the URL of a ZIP file that contains source code for a tutorial.
- guid: a Globally Unique Identifier (GUID). It doesn’t need to be in the GUID/UUID format. Instead, it can be a full URL, that will never change, which identifies the item.
- link: the URL of the item. If the item is a news story, it will be the URL of the page that contains the story.
- pubDate: the date the item was published.
- source: a URL link to the original source of the item, if it didn’t actually originate from your site. This allows you to publish links to other content, like the original news article that is discussed in a blog entry.
- title: the title of the item.
The RSS Writer
Up to this point, we’ve just taken a peek at most of the elements included in an RSS document. To me, lists of elements and their descriptions are interesting, but they become more fun when something can be done with them. So I’ll show you how to build a set of classes that will write these elements and create a valid RSS document for you, and you won’t have to worry about building the XML by hand. I’m going to build three classes, one for each of the “main” elements, rss, channel and item, named RSS_Feed, RSS_Channel and RSS_Item, respectively. When building the classes, I’ll start from the bottom up, or from the XML perspective at the innermost element and work my way out.
The RSS_Item class
The RSS_Item class is responsible for serializing data for the item element. Since I’ll be writing classes to do the serialization, I can enforce rules from the RSS specification, like the rule that each item must have at least one of a title or a description. I’ll accomplish this by setting up these values as parameters in the constructor, and throwing an Exception if neither of them is supplied. See Listing 1.
I’ve left out some of the PhpDocumentor documentation for brevity; ideally, the documentation should be updated as the class is being built.
In the constructor of the RSS_Item, the code checks to see if the title and description are both blank. If they are, the throw keyword is used to throw an Exception with a useful message to whatever is calling this constructor. It is up to the code that calls this constructor to handle the exception appropriately. At first glance, this might flirt with the unpopular notion of using an Exception to handle a condition that isn’t really an exception, but more of a violation of a business rule. In this case though, the constructor needs to do its best to enforce the RSS specification’s rules to ensure that when the object is serialized to XML it will be a valid RSS document.
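To show what “handling the exception appropriately” might look like on the calling side, here is a minimal sketch. Demo_RSS_Item is a cut-down stand-in for the RSS_Item class so the example runs on its own, and try_make_item() is a hypothetical helper; neither name comes from the article’s source code.

```php
<?php
// Cut-down stand-in for RSS_Item: just enough to demonstrate the rule.
class Demo_RSS_Item
{
    private $title;
    private $description;

    public function __construct($title = '', $description = '')
    {
        // Enforce the rule: a title or a description must be present.
        if ($title == '' && $description == '') {
            throw new Exception('An item needs a title or a description.');
        }
        $this->title = $title;
        $this->description = $description;
    }
}

// Hypothetical helper: returns the item, or null when the rule is violated,
// so one bad record does not stop the rest of a feed from being built.
function try_make_item($title, $description)
{
    try {
        return new Demo_RSS_Item($title, $description);
    } catch (Exception $e) {
        error_log('Skipping item: ' . $e->getMessage());
        return null;
    }
}

var_dump(try_make_item('', '') === null);
var_dump(try_make_item('Headline', '') instanceof Demo_RSS_Item);
```

Whether to skip the item, stop the whole feed, or report the problem to the user is a decision for the calling code; the constructor only guarantees that an invalid item never exists.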
There is no need to limit setting the title and the description to just the constructor. So I’ve added some access methods that allow the caller to set the title and the description later on. The accessors for the title and the description are included in Listing 2; the source code folder accompanying the magazine lists the accessors for the other RSS item sub-elements.
Listing 1
class RSS_Item {
    var $_title = '';
    var $_description = '';

    /**
     * The constructor for the RSS_Item class.
     *
     * At least one of the title or the description must be supplied.
     *
     * @param string $title
     * @param string $description
     */
    function __construct($title = '', $description = '') {
        $this->_title = $title;
        $this->_description = $description;

        if ($title == '' && $description == '')
        {
            // Both of these cannot be blank!
            // Note: You could use better error handling technique
            throw new Exception("RSS_Item requires at least one of \"title\" or \"description\"!\n");
        }
    }
    ...
}
There’s more: a get and set method for each of the ten different elements that can appear as sub-elements of an item. Only three of the remaining eight elements are not as straightforward as the title and description: category, enclosure and source. These elements have attributes along with their content, so their accessors need to be handled a little differently. The function that sets the source takes two parameters: the URL of the source and the title of the original source’s channel. The accessor stores these values in an array used internally to hold the values for the source.
The getSource() method will return the array. Alternatively, I could avoid returning the array of values and, instead, create two new methods, getSourceURL() and getSourceTitle(). The benefit of doing the latter is that it hides the implementation better, so the caller doesn’t have to worry about what the names of the elements are in the array returned by the getSource() method. The setSource() and getSource() accessors are shown in Listing 3.
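For comparison, the alternative with two separate getters could be sketched as follows. Demo_Item_Source is a stand-in class that holds only the source-related members, so the example is self-contained; the array keys match the ones used by setSource(), while the class name is an assumption.

```php
<?php
// Stand-in class holding only the source-related part of an item.
class Demo_Item_Source
{
    private $_source = array('url' => '', 'sourceTitle' => '');

    public function setSource($url, $sourceTitle)
    {
        $this->_source['url'] = $url;
        $this->_source['sourceTitle'] = $sourceTitle;
    }

    // The caller never needs to know the array key names.
    public function getSourceURL()
    {
        return $this->_source['url'];
    }

    public function getSourceTitle()
    {
        return $this->_source['sourceTitle'];
    }
}

$item = new Demo_Item_Source();
$item->setSource('http://www.example.com/source/', 'Example.com Sources');
echo $item->getSourceURL(), ' / ', $item->getSourceTitle(), "\n";
```

With this shape, the internal array could later be replaced by two plain properties without touching any calling code.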
The accessors for the category and enclosure elements are handled in the same fashion. Once there are accessors for each of the ten sub-elements of an item, it is time to write a method that handles writing the values in the variables out to an XML string. Part of the method is shown in Listing 4.
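A rough sketch of what the category and enclosure accessors might look like is shown below. The array keys (‘domain’ and ‘value’, and ‘url’, ‘length’ and ‘type’) are the ones the serialization code reads, and the setCategory() argument order follows the test script. The setEnclosure() signature and the Demo_Item_Attributes class name are my assumptions, since the real accessors live in the accompanying source code.

```php
<?php
// Stand-in class holding only the category and enclosure members of an item.
class Demo_Item_Attributes
{
    private $_category = array();
    private $_enclosure = array();

    // The domain becomes the attribute, the value becomes the element content.
    public function setCategory($domain, $value)
    {
        $this->_category['domain'] = $domain;
        $this->_category['value'] = $value;
    }

    public function getCategory()
    {
        return $this->_category;
    }

    // Assumed signature: all three values end up as attributes of enclosure.
    public function setEnclosure($url, $length, $type)
    {
        $this->_enclosure['url'] = $url;
        $this->_enclosure['length'] = $length;
        $this->_enclosure['type'] = $type;
    }

    public function getEnclosure()
    {
        return $this->_enclosure;
    }
}

$item = new Demo_Item_Attributes();
$item->setCategory('http://category.example.com/', 'News');
$item->setEnclosure('http://www.example.com/audio/item1.mp3', 123456, 'audio/mpeg');
print_r($item->getCategory());
print_r($item->getEnclosure());
```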
Testing the RSS_Item Class
Now that the xmlSerialize() method of the RSS_Item class is written, it’s time to test it. I’ve written a simple test script in Listing 5 that sets the values of an instance of the RSS_Item class (here $item) and calls the xmlSerialize() method at the end to produce some output.
Listing 2
/**
 * Returns the title of the RSS feed item.
 * @return string Title of the RSS item
 * @access public
 */
function getTitle() {
    return $this->_title;
}

/**
 * Sets the title of the RSS feed item.
 * @param string $title The title to give the item.
 * @return void
 * @access public
 */
function setTitle($title) {
    $this->_title = $title;
}

/**
 * Returns the description of the RSS feed item.
 * @return string Description of the RSS item
 * @access public
 */
function getDescription() {
    return $this->_description;
}

/**
 * Sets the description of the RSS feed item.
 * @param string $description The description of the item.
 * @return void
 * @access public
 */
function setDescription($description) {
    $this->_description = $description;
}
Note that I had to ‘require’ the RSS_Item.php file, which contains the RSS_Item class. The script produces the item element XML as expected. However, it isn’t very readable when it’s printed to the console because the serialization method doesn’t add any newline characters. That’s okay: a valid RSS XML document doesn’t need the nice formatting, and the file will be a little smaller without the extra indenting. There are many ways to format the XML so it is more readable; one way is to use the XML_Beautifier PEAR module to format the string returned by the xmlSerialize() method. The formatted XML returned by the test script looks like this:
<item>
  <title>Example.com Exciting Headlines</title>
  <source url="http://www.example.com/source/">Example.com Sources</source>
  <link>http://www.example.com/stories/item1</link>
  <comments>These are my comments about the item.</comments>
  <author>author1@example.com</author>
  <guid>http://www.example.com/stories/item1</guid>
  <category domain="http://category.example.com/">News</category>
</item>
Listing 3
/**
 * Sets the source of the item.
 * @param string $url The URL of the original item.
 * @param string $sourceTitle The title of the original item’s channel.
 * @access public
 */
function setSource($url, $sourceTitle)
{
    $this->_source['url'] = $url;
    $this->_source['sourceTitle'] = $sourceTitle;
}

/**
 * Returns the values of the source element as an array.
 * The URL of the source is the 'url' element in the array,
 * and the title of the original source is stored in the
 * 'sourceTitle' element.
 * @return array Array containing source information
 * @access public
 */
function getSource()
{
    return $this->_source;
}
Listing 4
/**
 * Returns a string that is the XML representation of the
 * RSS feed item.
 * @return string String XML of the item, conforming to RSS specs.
 * @access public
 */
function xmlSerialize() {
    $str = "<item>";
    if ($this->_title)
    {
        $str .= sprintf("<title>%s</title>", $this->_title);
    }
    if ($this->_description)
    {
        $str .= sprintf("<description>%s</description>", $this->_description);
    }
    if ($this->_source)
    {
        $str .= sprintf("<source url=\"%s\">%s</source>",
            $this->_source['url'],
            $this->_source['sourceTitle']);
    }
    ...
    if ($this->_category)
    {
        $str .= sprintf("<category domain=\"%s\">%s</category>",
            $this->_category['domain'],
            $this->_category['value']);
    }
    // Only write the enclosure when a URL has been set.
    if (!empty($this->_enclosure['url']))
    {
        $str .= sprintf("<enclosure url=\"%s\" length=\"%s\" type=\"%s\" />",
            $this->_enclosure['url'],
            $this->_enclosure['length'],
            $this->_enclosure['type']);
    }

    $str .= "</item>";
    return $str;
}
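One thing the serialization above does not do is escape its values, so a title such as ‘News & Views’ would produce XML that is not well-formed. A possible refinement, sketched here under the assumption that all values are UTF-8 strings, is to pass each value through htmlspecialchars() with the ENT_XML1 flag before it is placed in the sprintf() template. The helper names xml_escape() and serialize_source() are hypothetical.

```php
<?php
// Hypothetical helper: escapes a value for use in XML text or attribute values.
function xml_escape($value)
{
    return htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Same template as the source element in the serializer, with escaping added.
function serialize_source($url, $sourceTitle)
{
    return sprintf(
        '<source url="%s">%s</source>',
        xml_escape($url),
        xml_escape($sourceTitle)
    );
}

echo serialize_source('http://www.example.com/source/?a=1&b=2', 'News & Views'), "\n";
// prints: <source url="http://www.example.com/source/?a=1&amp;b=2">News &amp; Views</source>
```

The same helper could wrap every value written by xmlSerialize(), at the cost of one extra function call per element.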
The RSS_Channel and RSS_Feed classes
The next two classes, RSS_Channel and RSS_Feed, are similar to the RSS_Item class. They have accessors that set up their sub-elements, with the exception that the RSS_Channel class uses the RSS_Item for its item sub-element. Since a channel requires at least one item, the RSS_Channel constructor requires an RSS_Item object passed to it. Once the RSS_Channel object is created, you can add additional items to the channel with the addItem() method. A basic skeleton, without all of the relatively mundane accessor code, is shown in Listing 6.
The RSS_Channel class contains just enough code to write valid RSS documents, because aside from the item element, the only required elements are the title, description, and link elements. The RSS_Feed class is similar to the RSS_Channel class, containing only an array for the channels. The RSS_Feed class’ constructor accepts an RSS_Channel object; more can be added using the addChannel() method. The RSS_Feed class is included in the sample code. The sample code shown in Listing 7 can be used to write a valid RSS 2.0 document.
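Since the RSS_Feed class itself is not printed here, the following is only a guess at its shape, based on the description above: a constructor that takes a channel, an addChannel() method, and an xmlSerialize() method that wraps the channels in the rss root element. Demo_Channel is a stand-in so the sketch runs by itself, and both class names are assumptions. Keep in mind that the RSS 2.0 specification describes a single channel under the rss element, so in practice a feed would normally hold just one.

```php
<?php
// Stand-in for RSS_Channel so the sketch is self-contained.
class Demo_Channel
{
    private $title;

    public function __construct($title)
    {
        $this->title = $title;
    }

    public function xmlSerialize()
    {
        return '<channel><title>'
            . htmlspecialchars($this->title, ENT_XML1, 'UTF-8')
            . '</title></channel>';
    }
}

// Assumed shape of the feed class; the real one ships with the sample code.
class Demo_RSS_Feed
{
    private $_channels = array();

    public function __construct($channel)
    {
        $this->addChannel($channel);
    }

    public function addChannel($channel)
    {
        $this->_channels[] = $channel;
    }

    // Wraps every channel's XML in the rss root element.
    public function xmlSerialize()
    {
        $str = '<rss version="2.0">';
        foreach ($this->_channels as $channel) {
            $str .= $channel->xmlSerialize();
        }
        return $str . '</rss>';
    }
}

$feed = new Demo_RSS_Feed(new Demo_Channel('My Channel'));
echo $feed->xmlSerialize(), "\n";
// prints: <rss version="2.0"><channel><title>My Channel</title></channel></rss>
```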
Did It Work?
When the RSS_Feed object’s xmlSerialize() method was called, valid RSS was written to the console. The RSS document, created as a result of the test, looks like this:
<rss version="2.0">
  <channel>
    <title>My Channel</title>
    <description>This is my channel</description>
    <link>http://www.example.com/mychannel</link>
    <item>
      <title>Example.com Exciting Headlines</title>
      <source url="http://www.example.com/source/">Example.com Sources</source>
      <link>http://www.example.com/stories/item1</link>
      <comments>http://www.example.com/comments</comments>
      <author>author1@example.com</author>
      <guid>http://www.example.com/stories/item1#1</guid>
      <category domain="http://category.example.com/">News</category>
    </item>
    <item>
      <title>Example.com Boring News</title>
      <source url="http://www.example.com/source/">Example.com Sources</source>
      <link>http://www.example.com/stories/item2</link>
      <comments>http://www.example.com/comments</comments>
      <author>author2@example.com</author>
      <guid>http://www.example.com/stories/item1#2</guid>
      <category domain="http://category.example.com/">Technology</category>
    </item>
  </channel>
</rss>
The RSS document output can now be written to a file, either by using the PEAR File package or by using redirection in the command shell to send the output to a file. Once the RSS document is in a file, it can be validated with several different tools. I used a utility called xmllint on the iBook G4 where I did these examples. (On a Microsoft Windows operating system I use Altova’s XMLSpy.) If the RSS document is available from the Internet, which mine was after I tucked it under a folder on my web site, use the Feed Validator tool to validate the RSS feed.
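If none of those tools is at hand, a quick sanity check can also be done from PHP itself. The sketch below is not a substitute for xmllint or the Feed Validator: it only confirms that the string is well-formed XML and that the elements this article treats as required are present. It uses the bundled SimpleXML extension, and the function name check_rss() is hypothetical.

```php
<?php
// Hypothetical helper: returns a list of problems found in an RSS string.
// An empty array means this (very limited) check found nothing wrong.
function check_rss($xml)
{
    // Collect libxml parse errors quietly instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $rss = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($rss === false) {
        return array('not well-formed XML');
    }

    $problems = array();
    if ($rss->getName() !== 'rss' || (string) $rss['version'] !== '2.0') {
        $problems[] = 'root element should be <rss version="2.0">';
    }
    foreach (array('title', 'link', 'description') as $name) {
        if (!isset($rss->channel->$name)) {
            $problems[] = "channel is missing <$name>";
        }
    }
    return $problems;
}

$good = '<rss version="2.0"><channel><title>My Channel</title>'
      . '<link>http://www.example.com/mychannel</link>'
      . '<description>This is my channel</description></channel></rss>';
print_r(check_rss($good));
print_r(check_rss('<rss version="2.0"><channel><title>Oops</channel></rss>'));
```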
Related Tools
The PEAR::XML_RSS package can be used to parse RSS documents. It is easy to use and provides the ability to quickly add RSS capabilities to a web site. I used the example given in the user documentation on the PEAR web site, but modified the server name. The HTML that was generated from this example, once it was pointed to the RSS document on my computer, looks like this:
<h1>Headlines from <a href="http://localhost">Localhost</a></h1>
<ul>
<li><a href="http://www.example.com/stories/item1">Example.com Exciting Headlines</a></li>
<li><a href="http://www.example.com/stories/item2">Example.com Boring News</a></li>
</ul>
Although it’s a rudimentary example, because it isn’t complete HTML, it works to show that the RSS document is being parsed correctly.
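For readers without PEAR installed, a similar headline list can be produced with PHP’s bundled SimpleXML extension. To be clear, this is a swapped-in technique and not the XML_RSS example from the PEAR documentation; the function name headlines_html() is hypothetical, and the sketch assumes an RSS 2.0 string like the one generated earlier.

```php
<?php
// Hypothetical helper: turns an RSS 2.0 string into an HTML list of headlines.
function headlines_html($xml)
{
    $rss = simplexml_load_string($xml);
    if ($rss === false) {
        return ''; // not well-formed, nothing to show
    }

    // Escape values before placing them into HTML text and attributes.
    $esc = function ($text) {
        return htmlspecialchars((string) $text, ENT_QUOTES, 'UTF-8');
    };

    $html = "<ul>\n";
    foreach ($rss->channel->item as $item) {
        $html .= sprintf(
            "<li><a href=\"%s\">%s</a></li>\n",
            $esc($item->link),
            $esc($item->title)
        );
    }
    return $html . "</ul>\n";
}

$feed = '<rss version="2.0"><channel><title>My Channel</title>'
      . '<link>http://www.example.com/mychannel</link>'
      . '<description>This is my channel</description>'
      . '<item><title>Example.com Exciting Headlines</title>'
      . '<link>http://www.example.com/stories/item1</link></item>'
      . '</channel></rss>';
echo headlines_html($feed);
```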
So Where Can I Use RSS?
Although I mostly limited my discussion and examples to news-type items, RSS is not limited to just broadcasting news stories. It can be used for many other purposes, including broadcasting a user’s latest e-mail messages, notifying users of updates in
Listing 5
<?php
require_once 'RSS_Item.php';
try
{
    $item = new RSS_Item('Example.com Exciting Headlines', '');
    $item->setSource("http://www.example.com/source/", "Example.com Sources");
    $item->setAuthor("author1@example.com");
    $item->setComments("These are my comments about the item.");
    $item->setLink("http://www.example.com/stories/item1");
    $item->setGUID($item->getLink());
    $item->setCategory('http://category.example.com/', 'News');
    print $item->xmlSerialize() . "\n";
}
catch (Exception $e)
{
    printf("An error occurred while creating the RSS feed:\n%s", $e->getMessage());
}
?>
Listing 6
<?php
/**
 * A class that is used to write valid RSS
 * channel elements.
 */
class RSS_Channel
{

    /**
     * @var string The title of the RSS feed channel.
     * @access private
     */
    var $_title = '';

    /**
     * @var string Description of the RSS feed channel.
     * @access private
     */
    var $_description = '';

    /**
     * @var string The URL of the channel.
     * @access private
     */
    var $_link = '';

    /**
     * @var array RSS items
     * @access private
     */
    var $_items = array();

    /**
     * Constructor for the RSS_Channel class
     * @param string $title The title of the channel
     * @param string $link The URL of the site associated with the channel.
     * @param string $description A description of the channel.
     * @param RSS_Item $item An RSS feed item to be added to this channel.
     * @access public
     */
    function __construct($title, $link, $description, $item)
    {
        $this->_title = $title;
        $this->_link = $link;
        $this->_description = $description;
        $this->addItem($item);
    }

    /**
     * Adds the supplied RSS item to the collection of
     * RSS items already contained in this RSS channel
     * @param RSS_Item $item An RSS_Item object.
     * @access public
     */
    function addItem($item)
    {
        array_push($this->_items, $item);
    }

    /**
     * Returns an XML representation of the RSS_Channel
     * object.
     * @return string RSS channel as XML.
     * @access public
     */
    function xmlSerialize()
    {
        $str = "<channel>";
        $str .= sprintf("<title>%s</title>", $this->_title);
        $str .= sprintf("<description>%s</description>", $this->_description);
        $str .= sprintf("<link>%s</link>", $this->_link);

        foreach ($this->_items as $item) {
            $str .= $item->xmlSerialize();
        }

        $str .= "</channel>";
        return $str;
    }
}
?>
Listing 7
<?php

require_once 'RSS_Feed.php';
require_once 'RSS_Channel.php';
require_once 'RSS_Item.php';
$item = new RSS_Item('Example.com Exciting Headlines', '');
$item->setSource('http://www.example.com/source/', 'Example.com Sources');
$item->setAuthor('author1@example.com');
$item->setComments('http://www.example.com/comments');
$item->setLink('http://www.example.com/stories/item1');
$item->setGUID($item->getLink() . "#1");
$item->setCategory('http://category.example.com/', 'News');

$channel = new RSS_Channel('My Channel',
    'http://www.example.com/mychannel',
    'This is my channel',
    $item);

// Add a second item
$item2 = new RSS_Item('Example.com Boring News', '');
$item2->setSource('http://www.example.com/source/', 'Example.com Sources');
$item2->setAuthor('author2@example.com');
$item2->setComments('http://www.example.com/comments');
$item2->setLink('http://www.example.com/stories/item2');
$item2->setGUID($item->getLink() . "#2");
$item2->setCategory('http://category.example.com/', 'Technology');

$channel->addItem($item2);
$rss = new RSS_Feed($channel);
echo $rss->xmlSerialize() . "\n";
?>