 Published on O'Reilly Network (http://www.oreillynet.com/)


The New Bloglines Web Services

by Marc Hedlund
09/28/2004

Bloglines today announced a set of new web services APIs, allowing developers to write applications for reading RSS and Atom feeds by drawing data directly from the Bloglines databases. This is a very significant change in the landscape of RSS/Atom aggregators, the newsreading applications that have become more popular over the past few years. Along with the release of its web services, Bloglines announced that several desktop RSS/Atom aggregators, including FeedDemon, NetNewsWire, and Blogbot, will begin using these APIs to provide additional capabilities in their applications. The Bloglines Web Services make it very easy for developers to use RSS and Atom content for many purposes, and the services will also ease the traffic pileup that aggregators are beginning to cause for many large RSS/Atom publishers.

This article will take a look at the new Bloglines Web Services and their effect on the RSS/Atom landscape. We'll look at the bandwidth issues surrounding RSS/Atom aggregators and how the Bloglines Web Services help conserve bandwidth; then examine the APIs and what they offer; and finally, present a complete, three-pane desktop RSS/Atom reader written in just 150 lines of code, using the Groovy programming language.

The RSS Overload

eWeek recently reported on the bandwidth problems RSS/Atom aggregators have been causing for Web publishers. Spurred in part by Microsoft's announcement that even it was having trouble keeping up with requests for its blogs.msdn.com feeds, publishers have been talking about how much traffic a popular RSS/Atom feed can bring to bear. As one publisher in the eWeek article put it, "Any site that becomes popular is going to be killed by their RSS."

So what's the problem? Haven't web sites been able to keep up with traffic from all over the world for years now? It's true that web servers and protocols are very scalable, but RSS/Atom readers present a new kind of challenge. With a web browser, users visit a web site only while they are in front of their computer and reading that site--in other words, when they are actively browsing. An individual will visit some very large sites (such as My Yahoo or Google News) repeatedly throughout the day, but such sites are usually commercially run and able to support larger streams of traffic. The difference with an RSS/Atom aggregator is that it automatically pulls information from a publisher's site on a regular basis--sometimes as often as once every 5 minutes. Regardless of whether the site has changed or the user is out to lunch or home for the evening, the aggregator will update itself continuously as long as it is running, to ensure that it is able to present the latest information when called on by the user.

Some people joke that a popular RSS site is indistinguishable from a security attack. In security circles, a large number of clients repeatedly making requests to the point of overload is known as a distributed denial-of-service attack, and attacks of this sort have taken down the largest sites on the Web, including Yahoo, eBay, and Amazon. For a small Web publisher, even a moderately popular RSS/Atom feed can cause serious bandwidth consumption, running up ISP bills and preventing users from reaching any part of the site. For larger publishers, RSS/Atom feeds can bring in many more users but can also consume extensive resources.

While many in the RSS/Atom developer community have long recognized the bandwidth overload problem, the possible solutions require that nearly all aggregators adhere to a variety of "polite" practices to ensure that servers are not overwhelmed. As yet, not all aggregators have done so. Even where developers have made determined efforts, users want very fresh news and therefore often configure their aggregators to poll very frequently.

Bloglines As a Feed Cache

Bloglines is different from most other RSS/Atom aggregators. Like NewsGator, Bloglines is a server-side aggregator. This means that Bloglines maintains a database of RSS/Atom feeds in the same way Google maintains a database of web pages. Bloglines users query that database instead of polling individual RSS/Atom publishers from their desktop machines. In other words, Bloglines appears to publishers--and consumes bandwidth--like one single RSS/Atom aggregator but is able to serve tens of thousands of users.
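To make the difference concrete, here is a back-of-the-envelope sketch in Java. The 5-minute polling interval comes from the discussion above. The readership of 10,000 is purely an illustrative assumption, as is the idea that a server-side aggregator polls the publisher at the same interval as a single desktop client would.

```java
/** Rough comparison of feed requests per day: direct polling vs. one shared cache. */
final class PollingEstimate {
    static final int MINUTES_PER_DAY = 24 * 60;

    /** Requests a single client makes per day when polling every intervalMinutes. */
    static long requestsPerClientPerDay(int intervalMinutes) {
        return MINUTES_PER_DAY / intervalMinutes;
    }

    /** Requests a publisher sees per day when every reader polls the feed directly. */
    static long directPolling(long readers, int intervalMinutes) {
        return readers * requestsPerClientPerDay(intervalMinutes);
    }

    public static void main(String[] args) {
        long readers = 10_000;  // hypothetical readership, for illustration only
        int interval = 5;       // minutes between polls
        System.out.println("Direct polling: " + directPolling(readers, interval));
        // A single server-side aggregator looks like one client to the publisher.
        System.out.println("Via one cache:  " + requestsPerClientPerDay(interval));
    }
}
```

Under these assumptions the publisher would see 2,880,000 requests a day from direct polling, against 288 from a single server-side aggregator.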

By offering web services APIs, Bloglines is opening up its database of feeds for anyone to use. Any developer making an RSS/Atom-based application can draw from the Bloglines database, avoiding bandwidth overload for RSS/Atom publishers.

Bandwidth savings, though, are not the only reason to use Bloglines as a feed cache. RSS and Atom are emerging formats on the Internet, and there are many variations on feed formats to deal with. By drawing feeds from the Bloglines database, developers are presented with a single format--Bloglines normalizes all of the feeds it collects before distributing feed content. Another benefit is one that Bloglines users have long enjoyed: synchronization across computers. If you read news on one computer at work and on another at home, using a server-based aggregator lets you have the same set of feeds on both machines, and allows you to update those feeds as you read them from any machine. Using the Bloglines Web Services, client-side (desktop) aggregators can provide this same functionality. You could even use, say, FeedDemon on Windows and NetNewsWire on Macintosh, and share the state of your feeds between them through Bloglines.

While not all of the Bloglines features are available through its web services, many of the key benefits for publishers and users are, and developers have less work to do when building aggregators.

The Bloglines API Calls

The Bloglines Web Services APIs are made available through two simple REST-based URLs: listsubs and getitems. Both of these calls, as well as other APIs that Bloglines provides, are documented at www.bloglines.com/services/api. We'll first walk through the setup of a Bloglines API application, then each of the calls in turn. Finally, we'll look at a sample Bloglines API application.

Setup

Before getting started with your Bloglines API application, collect the following:

  1. All users of Bloglines API applications must have their own Bloglines account. For development, if you do not already have a Bloglines account, register for one now. If you plan to distribute your application to other users, make sure they know they need to get an account, and prompt them for their account email address and password. Once you have a Bloglines account, subscribe to one or more feeds so your account will have data in it.
  2. All Bloglines API calls are authenticated using Basic HTTP Authentication. Whatever programming language you use to develop your application, make sure you have a client HTTP library that provides authentication capabilities, or read up on how to implement authentication yourself (which isn't hard). Java and Groovy users will probably want to use HTTPClient; Perl users will want to use LWP. Other languages have similar libraries available. To authenticate, use the email address and password for your Bloglines account.
  3. The returned information from the API calls is an XML document containing the information you requested in the call. You will need to have an XML parser available, or you can parse the returned document yourself with regular expressions or otherwise.

When you have a Bloglines account, an authenticating HTTP library, and a way to parse XML results, you're ready to start making API calls.
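If you do end up implementing Basic HTTP Authentication yourself, the core of it is a single header. The following is a minimal Java sketch, not code from the Bloglines documentation: the class name BasicAuth is made up for illustration, the credentials are placeholders, and encoding the pair as UTF-8 is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Builds the value of an HTTP Basic "Authorization" request header. */
final class BasicAuth {
    static String headerValue(String email, String password) {
        // Basic authentication sends "user:password", Base64-encoded.
        String pair = email + ":" + password;
        // UTF-8 is an assumption here; the article does not name a charset.
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials, not a real account.
        System.out.println("Authorization: "
                + headerValue("user@example.com", "secret"));
    }
}
```

Base64 is an encoding, not encryption, so the password is only as private as the connection that carries it.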

listsubs

The listsubs call is used to list all of the subscriptions for a given user account. The APIs do not provide a way to add feed subscriptions to your account, nor do they provide methods for editing or updating feeds. In order to create or modify subscriptions, you must go to the Bloglines site and use the Bloglines web interface. After your subscription list is registered with Bloglines, you may access that list using listsubs.

The listsubs call is simple and takes no parameters. Every call to listsubs looks the same:


  GET http://rpc.bloglines.com/listsubs

listsubs reports the status of the request through the HTTP response code: 200 OK indicates success, and 401 Unauthorized indicates that the given email address or password is not valid. Be sure to check the response code and prompt the user to correct the address or password as needed. If the listsubs call succeeds, the HTTP response will contain an XML document listing the user's subscriptions. This response document is in OPML format but also contains some Bloglines-specific extensions to OPML: BloglinesSubId, BloglinesUnread, and BloglinesIgnore attributes on <outline> tags, indicating the state of that subscription in the Bloglines user account. BloglinesSubId is an identifier for the subscription within Bloglines' database--you'll need this later to request feed content. BloglinesUnread shows the number of items in the feed that the user has not yet read. BloglinesIgnore (where 1 means ignore and 0 means don't ignore) indicates whether the user wants to be notified of new items on that feed.
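One way to keep response-code handling in a single place is a small classifier like the Java sketch below. The class and enum names are hypothetical. The 200 and 401 cases follow the description above, the 304 case applies to getitems (covered later in this article), and treating every other code as unexpected is an assumption.

```java
/** Maps the HTTP status codes described in this article to a next step. */
final class ResponseStatus {
    enum Outcome { PARSE_BODY, ASK_FOR_CREDENTIALS, NOTHING_NEW, UNEXPECTED }

    static Outcome classify(int httpStatus) {
        switch (httpStatus) {
            case 200: return Outcome.PARSE_BODY;           // OK: the body holds XML
            case 401: return Outcome.ASK_FOR_CREDENTIALS;  // bad email address or password
            case 304: return Outcome.NOTHING_NEW;          // getitems: no items to return
            default:  return Outcome.UNEXPECTED;           // anything else: report it
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(401));
    }
}
```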

One item to note: Bloglines, like many RSS/Atom aggregators, lets users organize subscriptions within subfolders. As a result, the OPML file that listsubs returns may contain several levels of nested <outline> tags, some representing folders and some representing feeds. One good way to check an outline tag to see whether it represents a folder or a feed is to look for the presence of an xmlUrl attribute in the <outline> tag. If an xmlUrl is present, it's a feed; if not, it's a folder. listsubs will return a BloglinesSubId for both folders and feeds, so you can't use that as a distinguishing factor.
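The folder-or-feed test can be sketched in Java with the JDK's built-in DOM parser. The OPML sample below is invented for illustration and is not a real listsubs response. It uses only the attribute names mentioned in this article, and the class name OutlineWalker is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Walks a listsubs-style OPML document, telling folders apart from feeds. */
final class OutlineWalker {
    // Hypothetical sample; a real response may carry more attributes.
    static final String SAMPLE =
        "<opml version=\"1.0\"><body>"
      + "<outline title=\"Subscriptions\">"
      + "<outline title=\"News\" BloglinesSubId=\"10\">"
      + "<outline title=\"Example Feed\" xmlUrl=\"http://example.com/rss\""
      + " BloglinesSubId=\"270\" BloglinesUnread=\"3\" BloglinesIgnore=\"0\"/>"
      + "</outline>"
      + "</outline></body></opml>";

    /** Returns one line per outline, indented by nesting depth. */
    static List<String> describe(String opml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(opml)));
            Element body = (Element) doc.getElementsByTagName("body").item(0);
            List<String> lines = new ArrayList<>();
            walk(body, 0, lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse OPML", e);
        }
    }

    private static void walk(Element parent, int depth, List<String> out) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) indent.append("  ");
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !"outline".equals(n.getNodeName())) {
                continue;  // skip text nodes and any non-outline elements
            }
            Element outline = (Element) n;
            String title = outline.getAttribute("title");
            if (outline.hasAttribute("xmlUrl")) {
                // An xmlUrl attribute marks a feed.
                out.add(indent + "feed: " + title + " ("
                        + outline.getAttribute("BloglinesUnread") + " unread)");
            } else {
                // No xmlUrl: treat the outline as a folder and descend into it.
                out.add(indent + "folder: " + title);
                walk(outline, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        for (String line : describe(SAMPLE)) System.out.println(line);
    }
}
```

The recursion mirrors the nesting of the <outline> tags, so folders inside folders need no special handling.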

An example response to the listsubs call, along with more documentation of the call responses, is provided on the Bloglines site.

getitems

The getitems call is used to retrieve either all unread items on a feed to which the user is subscribed, or all items since a given date and time. In order to make a getitems call, you are required to know the BloglinesSubId for the feed you want to retrieve, so you will need to make a listsubs call first and get the BloglinesSubId from the listsubs result.

Especially during development, you may not want to have your Bloglines API application update your feed states on the Bloglines server--you may want to test-read your feed in your application, and then later read the feeds on Bloglines itself. You can control whether getitems updates the read status of the feed you request by adding n=1 (update) or n=0 (do not update) to the end of the getitems URL. (A call to listsubs does not affect your read status at any time.)

If you do not specify a date parameter to the getitems call, the call will return an RSS 2.0 document containing all of the unread items in that feed. The number of items should be the same as was listed in the BloglinesUnread attribute returned by listsubs for that feed, but it may contain more--for instance, if another item arrives between the time of the listsubs call and the time of the getitems call. It could also contain fewer items, if the same user has read items through another application (the Bloglines web interface or another Bloglines API application). If there are no unread items on the feed, getitems will respond to the HTTP request with a 304 Not Modified response code.
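Handling a getitems response might look like the following Java sketch. It treats 304 Not Modified as "no items" and otherwise pulls the item titles out of the RSS 2.0 body. The sample document and the class name ItemReader are invented for illustration, and the "(untitled)" fallback is an assumption about how a client might treat an item without a title.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Extracts item titles from a getitems-style RSS 2.0 response. */
final class ItemReader {
    // Hypothetical sample body, not a real Bloglines response.
    static final String SAMPLE =
        "<rss version=\"2.0\"><channel><title>Example Feed</title>"
      + "<item><title>First post</title><description>Hello</description></item>"
      + "<item><title>Second post</title><description>World</description></item>"
      + "</channel></rss>";

    /** Returns the item titles, or an empty list for 304 Not Modified. */
    static List<String> titles(int httpStatus, String body) {
        List<String> titles = new ArrayList<>();
        if (httpStatus == 304) {
            return titles;  // nothing unread: there is no body to parse
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(body)));
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                NodeList title = item.getElementsByTagName("title");
                titles.add(title.getLength() > 0
                        ? title.item(0).getTextContent()
                        : "(untitled)");
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse RSS", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(titles(200, SAMPLE));
    }
}
```

Because the item count can differ from BloglinesUnread, the sketch counts whatever the response contains instead of trusting the earlier number.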

You can retrieve items that you've previously read by specifying a d=DATE parameter to getitems. The date should be given in Unix time--that is, the number of seconds since midnight UTC on January 1, 1970. As before, if there are no items on that feed after the date you specified, you will receive a 304 Not Modified response and an empty response body.
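Computing a Unix time value is a one-liner in most languages. As a small Java sketch (the class and method names are hypothetical), this converts an instant to seconds since the epoch and derives a "some hours ago" cutoff that could be used as the d value:

```java
import java.time.Duration;
import java.time.Instant;

/** Helpers for producing Unix time values (seconds since 1970-01-01T00:00:00Z). */
final class UnixTime {
    /** Unix time for the given instant. */
    static long of(Instant when) {
        return when.getEpochSecond();
    }

    /** Unix time for the moment the given number of hours before now. */
    static long hoursBefore(Instant now, long hours) {
        return now.minus(Duration.ofHours(hours)).getEpochSecond();
    }

    public static void main(String[] args) {
        // Cutoff for "everything from the last 24 hours".
        System.out.println(hoursBefore(Instant.now(), 24));
    }
}
```

Note that the value is in seconds; Java's System.currentTimeMillis() returns milliseconds and would need dividing by 1000.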

Here's what a typical getitems call might look like:


  GET http://rpc.bloglines.com/getitems?s=270&n=0

This call says that you want to retrieve all unread items for BloglinesSubId 270 and that you do not want your read status updated by this call.
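Assembling the URL by hand is easy to get wrong once the optional date is involved, so a small helper can be worth having. This is a minimal Java sketch: the class name GetItemsUrl is hypothetical, the s, n, and d parameters are the ones described above, and treating the subscription ID as a number is an assumption based on the example value 270.

```java
/** Builds getitems URLs from the s, n, and d parameters described in the article. */
final class GetItemsUrl {
    static final String BASE = "http://rpc.bloglines.com/getitems";

    /**
     * @param subId            BloglinesSubId from listsubs (assumed numeric here)
     * @param markRead         true adds n=1 (update read status), false adds n=0
     * @param sinceUnixSeconds optional d value in Unix time, or null to omit it
     */
    static String build(long subId, boolean markRead, Long sinceUnixSeconds) {
        StringBuilder url = new StringBuilder(BASE)
                .append("?s=").append(subId)
                .append("&n=").append(markRead ? 1 : 0);
        if (sinceUnixSeconds != null) {
            url.append("&d=").append(sinceUnixSeconds);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Reproduces the example call shown above.
        System.out.println(build(270, false, null));
    }
}
```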

Other examples of the getitems call format, and an example return document, are available on the documentation page for getitems on the Bloglines site.

A Complete Bloglines API Application

Because Bloglines provides so much infrastructure behind the scenes, it's easy to make an application that offers full RSS/Atom functionality with very little work. A Bloglines API application needs only to get the subscription list, wait for the user to select a subscription, get a list of items for that subscription, and display the item contents for items the user selects. Using the Java-based scripting language Groovy, we can build a standard three-pane desktop RSS/Atom aggregator in just about 150 lines of code--not much at all!

If you're interested in learning more about Groovy, check out Get ${Stuff} Done with Groovy. To run this application with Groovy, follow the Groovy installation instructions, and then run the Bloglines client with the command:

groovy BloglinesClient.groovy

The application will ask you for your Bloglines email address and password, then will show the full interface.

Below is the Groovy code for our RSS/Atom aggregator. As you'll see, the tasks we have to accomplish are mostly to get the XML information from Bloglines and to organize that information into an interface the user can comfortably use. All of the heavy lifting of normalizing data and keeping state for the user is already done for us by the Bloglines APIs. (Download the code here.)


/*
 * BloglinesClient.groovy - an example of the Bloglines Web Services
 *
 * Written by Marc Hedlund <marc@precipice.org>, September 2004.
 * Requires Groovy 1.0-beta-6; see <http://groovy.codehaus.org/>.
 *
 * This work is licensed under the Creative Commons Attribution
 * License. To view a copy of this license, visit
 * <http://creativecommons.org/licenses/by/2.0/> or send a letter to
 * Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
 */

import groovy.swing.SwingBuilder;
import java.awt.BorderLayout;
import java.net.URL;
import javax.swing.BorderFactory;
import javax.swing.JOptionPane;
import javax.swing.JSplitPane;
import javax.swing.JTree;
import javax.swing.ListSelectionModel;
import javax.swing.SwingUtilities;
import javax.swing.WindowConstants;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.TreeSelectionModel;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.methods.GetMethod;

// Set up global variables and data types
server   = 'rpc.bloglines.com';
apiUrl   = { method | "http://${server}/${method}" };
class Feed { name; id; unread; String toString() { 
    return (unread == "0" ? name : "${name} (${unread})");
  } 
}
class Item { title; contents; String toString() { return title; } }

// Ask the user for account information (using simple dialogs)
email =
  JOptionPane.showInputDialog(null, "Email address:", "Log in to Bloglines",
                              JOptionPane.QUESTION_MESSAGE);
password =
  JOptionPane.showInputDialog(null, "Password:", "Log in to Bloglines",
                              JOptionPane.QUESTION_MESSAGE);

// Use HTTPClient for web requests since the server requires authentication
client = new HttpClient();
credentials = new UsernamePasswordCredentials(email, password);
client.getState().setCredentials("Bloglines RPC", server, credentials);

// Get the list of subscriptions and parse it into a GPath structure
opml = new XmlParser().parseText(callBloglines(apiUrl('listsubs')));

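def callBloglines(url) {
  try {
    get = new GetMethod(url);
    get.setDoAuthentication(true);
    status = client.executeMethod(get);
    if (status != 200) {  // e.g. 401 Unauthorized or 304 Not Modified
      println "Bloglines returned HTTP ${status} for <${url}>";
      return "";
    }
    return get.getResponseBodyAsString();
  } catch (Exception e) {
    println "Error retrieving <${url}>: ${e}";
    return "";
  }
}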

// Descend into the subscription outline, adding to the feed tree as we go
treeTop = new DefaultMutableTreeNode("My Feeds");
parseOutline(opml.body.outline.outline, treeTop);

def parseOutline(parsedXml, treeLevel) {
  parsedXml.each() { outline |
    if (outline['@xmlUrl'] != null) {  // this is an individual feed
      feed = new Feed(name:outline['@title'], id:outline['@BloglinesSubId'],
                      unread:outline['@BloglinesUnread']);
      treeLevel.add(new DefaultMutableTreeNode(feed));

    } else {  // this is a folder of feeds
      folder = new DefaultMutableTreeNode(outline['@title']);
      parseOutline(outline.outline, folder);
      treeLevel.add(folder);
    }
  }
}

// Build the base user interface objects and configure them
swing = new SwingBuilder();
feedTree = new JTree(treeTop);
itemList = swing.list();
itemText = swing.textPane(contentType:'text/html', editable:false);
model = feedTree.getSelectionModel();
model.setSelectionMode(TreeSelectionModel.SINGLE_TREE_SELECTION);
itemList.setSelectionMode(ListSelectionModel.SINGLE_SELECTION);

// Set up the action closures that will react to user selections
listItems = { feed |
  rssText = callBloglines(apiUrl('getitems') + "?s=${feed.id}&n=0");
  // callBloglines returns "" on error, and a 304 Not Modified has no body
  if (rssText != null && rssText.length() > 0) {
    try {
      rss = new XmlParser().parseText(rssText);
      items = new Vector();
      rss.channel.item.each() {
        item = new Item(title:it.title[0].text(),
                        contents:it.description[0].text());
        items.add(item);
      }
      itemList.setListData(items);
      feed.unread = "0";  // update the unread item count in the feed list
    } catch (Exception e) {
      println "Error during <${feed.name}> RSS parse: ${e}";
    }
  }
}

feedTree.valueChanged = { event |
  itemText.setText("");  // clear any old item text
  node = (DefaultMutableTreeNode) feedTree.getLastSelectedPathComponent();
  if (node != null) {
    feed = node.getUserObject();
    if (feed instanceof Feed && feed.unread != "0") {
      listItems(feed);
    }
  }
}

itemList.valueChanged = { event |
  item = event.getSource().getSelectedValue();
  if (item != null && item instanceof Item) {
    itemText.setText("<html><body>${item.contents}</body></html>");
  }
}

// Put the user interface together and display it
gui =
  swing.frame(title:'Bloglines Client', location:[100,100], size:[800,600],
              defaultCloseOperation:WindowConstants.EXIT_ON_CLOSE) {

    panel(layout:new BorderLayout()) {
      splitPane(orientation:JSplitPane.HORIZONTAL_SPLIT, dividerLocation:200) {
        scrollPane() {
          widget(feedTree);
        }

        splitPane(orientation:JSplitPane.VERTICAL_SPLIT, dividerLocation:150) {
          scrollPane(constraints:BorderLayout.CENTER) {
            widget(itemList);
          }

          scrollPane(constraints:BorderLayout.CENTER) {
            widget(itemText);
          }
        }
      }
    }
  }

gui.show();

That's all there is to it. With these few lines of code, we already have a desktop aggregator comparable to many of the full applications that otherwise would take much longer to develop.

[Screenshot: the finished aggregator in action]

Ideas for Other Bloglines API Applications

Obviously there are some benefits to implementing desktop aggregators with the Bloglines Web Services APIs (foremost among them, as discussed above, are the bandwidth savings and the server-maintained state), but the most exciting part of the Bloglines Web Services is the opportunity for new RSS/Atom applications to emerge:

These are just a few ideas to get you started. Hopefully, the Bloglines Web Services APIs will be a great platform for developers, and a boon to RSS/Atom publishers and readers around the Internet.

Marc Hedlund is an entrepreneur working on a personal finance startup, Wesabe.



Copyright © 2009 O'Reilly Media, Inc.