Processing RSS

Filed as General on April 20, 2003 1:18 am

by Duncan


From xml.com: The goal of this article is to demonstrate the use of XQuery to accomplish a routine yet interesting task: rendering an HTML page that merges RSS news feeds from two different weblogs. RSS has earned its popularity by making it easy to share news between web sites, and almost every programming language used on the Web has a good selection of libraries for consuming RSS.

Readers will benefit from a basic knowledge of the XQuery language. Per Bothner has written an informal introduction to XQuery.

Even though XQuery started as an XML-based version of SQL, the language has a very broad application on the Web. In what follows, I will show that XQuery allows RSS feeds to be consumed and processed easily. In fact, we will see that it isn’t necessary to use a specialized library. We will utilize only functions of the core language.

Jump Right In
If we were using another language, we would probably have started with a breakdown of the components of the script and their individual responsibilities. But the XQuery script is so brief that there is not much to break apart.

I will let the code speak for itself; if you still think you need further analysis, stick around and read the text further below.

Listing 1: XQuery Script — RSS Feed Merge

define function row ($link, $title)
{
  <tr>
    <td>RSS item {$title} is located at</td>
    <td>{$link}</td>
  </tr>
}

define function filter-rss ($url)
{
  for $i in document($url)/rss/channel/item
  return row($i/link/text(), $i/title/text())
}

<html>
  <body>
    <p>Remote RSS Feed Demo, written in XQuery.</p>
    <p>Compiled and Run by The Open Source QEXO.org engine.</p>
    <table>
      {filter-rss("http://www.javablogs.com/ViewDaysBlogs.jspa?view=rss")}
      {filter-rss("http://radio.weblogs.com/0109827/rss.xml")}
    </table>
  </body>
</html>

If you want to see the result of this script immediately, visit http://www.cocoonhive.org/xquery/xqueryform.html. It will look similar to the output shown in Listing 2.

Listing 2: XQuery Script Output — RSS Feed Merge


Remote RSS Feed Demo, written in XQuery.
Compiled and Run by The Open Source QEXO.org engine.


RSS item EJB Design Patterns is located at
http://www.javablogs.com/Jump.jspa?id=20692
RSS item There is a first for everything is located at
http://www.javablogs.com/Jump.jspa?id=20667
RSS item is located at
http://radio.weblogs.com/0109827/2002/12/11.html#a1219
RSS item Programmers are Speshal is located at
http://radio.weblogs.com/0109827/2002/12/11.html#a1218



Let’s examine how the script works. It begins with the definition of two functions. The main body starts after the function definitions.



<html>
  <body>
    <p>Remote RSS Feed Demo, written in XQuery.</p>
    <p>Compiled and Run by The Open Source QEXO.org engine.</p>
    <table>
      {filter-rss("http://www.javablogs.com/ViewDaysBlogs.jspa?view=rss")}
      {filter-rss("http://radio.weblogs.com/0109827/rss.xml")}
    </table>
  </body>
</html>


As you can see, it is plain HTML, except for the two lines that enclose calls to the function filter-rss() in curly braces. The curly braces indicate that an XQuery expression needs to be evaluated.

The function filter-rss() is defined by:

define function filter-rss ($url)
{
  for $i in document($url)/rss/channel/item
  return row($i/link/text(), $i/title/text())
}

It loops over all XML nodes matched by the XPath expression “/rss/channel/item”, which is applied to the XML document returned by the built-in function document(). This function itself is invoked with the $url argument passed to filter-rss(). The value of this argument is either http://www.javablogs.com/ViewDaysBlogs.jspa?view=rss or http://radio.weblogs.com/0109827/rss.xml.
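For readers more at home in another language, the node selection described above can be sketched in Python with the standard library. This is only a rough analogue, not the XQuery engine's behavior: the sample feed, the example.org URLs, and the helper name select_items are all invented for illustration, and the feed is parsed from a string rather than fetched from a URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for illustration only.
SAMPLE_FEED = """<rss version="0.91">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First post</title>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>"""


def select_items(rss_text):
    """Return the elements matched by /rss/channel/item."""
    root = ET.fromstring(rss_text)  # root is the <rss> element itself
    # The path is relative to <rss>, so "channel/item" mirrors /rss/channel/item.
    return root.findall("channel/item")


items = select_items(SAMPLE_FEED)
links = [item.findtext("link") for item in items]
print(links)
```

Note that the Python version needs an explicit parse step and a path relative to the root element, whereas the XQuery expression applies the path directly to the document.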

The content of the XML documents located at these two URLs looks similar to:



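<rss>
  <channel>
    <link>http://www.javablogs.com/</link>
    <description>Blog entries on 14/2/2003</description>
    <language>en-us</language>
    <item>
      <link>http://www.javablogs.com/Jump.jspa?id=20740</link>
      <description>Just a helpful hint:…</description>
    </item>
    <item>
      <link>http://www.javablogs.com/Jump.jspa?id=20739</link>
      <description>Links back to pages that link to it. List of referrers and trackbacks. …</description>
    </item>
  </channel>
</rss>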



As you might expect, the for loop assigns each item element of the target document to the variable $i in turn. For each value of $i, the function returns the result of invoking the other custom function in this script, row(), passing the textual values of the link and title sub-elements of item. The latter function is very transparent: it simply returns an HTML <tr> element that contains the textual values of its arguments.
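The interplay of the two functions, and the merge performed by the main body, can be sketched in Python as well. Treat this as an illustration under stated assumptions: the table-row markup, the two tiny feeds, and the example.org URLs are made up for the example, and the feeds are passed in as strings instead of being downloaded.

```python
import xml.etree.ElementTree as ET
from html import escape


def row(link, title):
    """Render one feed item as an HTML table row (the markup is an assumption)."""
    return (
        "<tr><td>RSS item {} is located at</td><td>{}</td></tr>"
        .format(escape(title), escape(link))
    )


def filter_rss(rss_text):
    """Return one rendered row per /rss/channel/item element."""
    root = ET.fromstring(rss_text)
    return [
        # default="" tolerates items that lack a <link> or <title> child
        row(item.findtext("link", default=""), item.findtext("title", default=""))
        for item in root.findall("channel/item")
    ]


# Two tiny hypothetical feeds standing in for the two weblogs.
FEED_A = (
    "<rss><channel><item><title>A &amp; B</title>"
    "<link>http://example.org/a</link></item></channel></rss>"
)
FEED_B = (
    "<rss><channel><item>"
    "<link>http://example.org/b</link></item></channel></rss>"
)

# Merging is plain concatenation, as in the main body of the script.
rows = filter_rss(FEED_A) + filter_rss(FEED_B)
page = "<table>\n" + "\n".join(rows) + "\n</table>"
print(page)
```

The second feed deliberately omits the title, which reproduces the kind of empty "RSS item is located at" line seen in Listing 2. Unlike the XQuery version, the Python sketch has to escape the text by hand before embedding it in markup.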

Functional Benefits
I am not aware of another language endorsed by a standards body that can do the same thing more briefly and intuitively. Because XQuery treats XML nodes as first-class language constructs and pairs them with a familiar C-like syntax, it is an attractive tool for the problems it was built to solve. Note that although it has a for loop construct, XQuery is a purely functional language. In short, this means that XQuery functions always return the same values given the same arguments. This property matters because it permits compiler optimizations, such as caching or reordering calls, that are hard to apply safely in C or Java.
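Why same-arguments-same-result matters for optimization can be shown with a small Python analogue. Nothing here is claimed about what any XQuery engine actually does: the function title_length and the call counter are hypothetical, and the counter exists only to make the effect of caching visible.

```python
from functools import lru_cache

call_count = 0


@lru_cache(maxsize=None)
def title_length(title):
    """The result depends only on the argument, so it is safe to cache."""
    global call_count
    call_count += 1  # instrumentation only, to show how often the body runs
    return len(title.strip())


results = [title_length(t) for t in ["EJB Design Patterns", "EJB Design Patterns"]]
print(results, call_count)
```

The second call is answered from the cache, so the function body runs only once. A runtime can make that substitution automatically only when it knows the function has no hidden inputs or side effects, which is what a purely functional language guarantees.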

In the past decade, functional language compilers have shown significant advantages over imperative language compilers. However, the unconventional syntax of functional languages and the inertia of imperative ones keep them under the radar of mainstream development. The XQuery team seems to recognize these weaknesses and is making an attempt to overcome them.
