Posted on 13 February 2010
Explaining the principles of resource sharing between websites using the publish/subscribe paradigm.
Over the last year you might have heard, either through me, Ralph Meijer or through the grapevine, about the stuff we do at Mediamatic related to the resource sharing technologies we're developing. This resource sharing is a part of our ongoing efforts to create a federation of social networks using a number of open web standards, the collective of which we've dubbed Open-CI. At Mediamatic, we implement Open-CI in Anymeta, our community management system. Despite Anymeta being mainly closed-source, the technologies and standards we use are open, and we're in a continuous effort to document the different aspects that Open-CI is made of. This article will explain resource sharing: one of the most fundamental parts of Open-CI.
The primary reason for writing this article is that I am currently building an open-source implementation of resource sharing in Zotonic, a web framework written in Erlang that I work on in my spare time. With this article I hope to explain to Zotonic users and developers what the new modules and dependencies are that have been appearing over the last few weeks, and why resource sharing is an essential feature of any modern CMS.
Resource sharing as we define it is a framework in which websites share the resources they have in a publish/subscribe model. By resources, I refer to any form of web content that a website publishes: blog articles, reviews, pictures, status updates, et cetera.
In the publish/subscribe model, websites act as publishers on the one hand (that's what websites are very good at): they publish their content on their webpages. On the other hand, and this is where the new stuff starts, websites act as subscribers as well: they can subscribe to resources that exist at a different site, for instance to show the content of the resource (e.g. the text of an article) in their own context, independent of the layout of the site it was originally published on, or just to show a permalink to the resource with the proper title of the article.
Why all this complexity of publishers and subscribers, you might think, when you can just copy-paste the information from one website to another? There's nothing wrong with copy-pasting, except that when doing it that way, the information can easily get outdated. It might be that the title of the article on the original site changes, or even the link to the article might change. If you just copy-paste, you'll end up with outdated information and maybe even broken links. Resource sharing solves this by using the publish/subscribe model: when the resource at the publisher's site changes, the subscriber is notified of this change and can update his information.
There are several reasons why you would want to share resources between sites. Off the top of my head, I'm listing a few:
So far I managed not to talk about any underlying technology at all, so you could grasp the concept. The remaining sections of this article dive into the tech behind this all.
The publish/subscribe paradigm explained in the previous sections has a perfect match on the technical level: our take on resource sharing is based mainly on XEP-0060, the Publish-Subscribe subprotocol of XMPP (the Extensible Messaging and Presence Protocol, formerly known as Jabber). It goes beyond the scope of this article to explain how XMPP works, but as a quick refresher on XEP-0060: publish/subscribe works with the concept of pubsub nodes on an XMPP server, to which publishers can push information. Subscribers subscribe to pubsub nodes and receive a notification in near-realtime whenever a publisher publishes information to a node.
In our resource sharing implementation, each resource on a website is linked to a single pubsub node on an XMPP server. This is a one-to-one link: when you have the node, you know which resource it represents, and vice versa. Furthermore, for future reference, a resource is called authoritative when it resides on the website that it was created on. In other cases, when the resource has been imported using pubsub, we call the resource non-authoritative.
The publisher side of resource sharing is pretty straightforward. Whenever the resource changes, the publishing website publishes an XML representation of the new contents of the resource to the resource's node over XMPP. The XML representation of this "payload" is Atom. We chose Atom because an Atom entry is in many cases a fitting representation for a web resource; it includes for example information on when the original resource was published and by whom.
To tell the world that a resource can be subscribed to on XMPP to receive updates, the publisher puts a <link> element in the head section of the resource's HTML page. This link element contains the XMPP URI telling subscribers where they should direct their subscriptions to receive updates regarding the resource. This XMPP URI is nothing more than the name of the XMPP server plus the name of the node:
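As an illustration of the publish step, here is a minimal sketch in Python that builds an XEP-0060 publish request with an Atom entry as payload. The function name, the stanza id, the author name and the example.org permalink are assumptions made for this example; only the element names and namespaces come from XEP-0060 and Atom. It builds the XML only and does not send anything over an XMPP connection.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"
ATOM_NS = "http://www.w3.org/2005/Atom"


def build_publish_iq(service, node, title, author, updated, permalink):
    """Build a pubsub publish request carrying one Atom entry (hypothetical helper)."""
    iq = ET.Element("iq", {"type": "set", "to": service, "id": "publish1"})
    pubsub = ET.SubElement(iq, f"{{{PUBSUB_NS}}}pubsub")
    publish = ET.SubElement(pubsub, f"{{{PUBSUB_NS}}}publish", {"node": node})
    item = ET.SubElement(publish, f"{{{PUBSUB_NS}}}item")

    # The payload: an Atom entry describing the current state of the resource.
    entry = ET.SubElement(item, f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = permalink
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", {"rel": "alternate", "href": permalink})
    return iq


# Example values; the author name and permalink are made up for illustration.
iq = build_publish_iq(
    service="pubsub.mediamatic.nl",
    node="id/9411",
    title="Resource sharing over XMPP",
    author="Example Author",
    updated="2010-02-13T12:00:00Z",
    permalink="http://example.org/article/9411",
)
print(ET.tostring(iq, encoding="unicode"))
```

On a real XMPP stream the iq element lives in the stream's default namespace and the stanza id would be unique per request; both are left out here to keep the sketch short.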
<link rel="xmpp.feed" href="xmpp:pubsub.mediamatic.nl?;node=id/9411" title="XMPP updates for this item" />
If you view the source of this page, you'll see what ejabberd pubsub nodes look like by default.
The subscriber side is somewhat more complex. First of all, the subscribing website must have a way of telling that the resource is non-authoritative: a boolean flag or similar. Also, it must store the original location of the resource: the resource URI. The combination of the two can be used to display a notice on the non-authoritative site to show that the resource has been imported from somewhere else (for an example of this, see this article).
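To make the structure of the XMPP URI concrete, the following Python sketch splits it into the pubsub service and the node name. The function name is an assumption for this example, and the parsing covers only the simple form shown in the link element above (a service, an empty query type, and key=value pairs separated by semicolons), not every URI that RFC 5122 allows.

```python
from urllib.parse import unquote


def parse_xmpp_uri(uri):
    """Split an XMPP pubsub URI into (service, node); node is None if absent."""
    if not uri.startswith("xmpp:"):
        raise ValueError(f"not an XMPP URI: {uri!r}")
    rest = uri[len("xmpp:"):]
    service, _, query = rest.partition("?")

    params = {}
    # The first segment after "?" is the query type (empty in this form);
    # the remaining segments are key=value pairs.
    for part in query.split(";")[1:]:
        key, sep, value = part.partition("=")
        if sep:
            params[unquote(key)] = unquote(value)
    return unquote(service), params.get("node")


print(parse_xmpp_uri("xmpp:pubsub.mediamatic.nl?;node=id/9411"))
# prints ('pubsub.mediamatic.nl', 'id/9411')
```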
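What a subscribing site needs to store per resource can be sketched as a small Python record. The class, its field names and the wording of the notice are assumptions made for illustration; they are not taken from Anymeta or Zotonic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    title: str
    resource_uri: str            # original location (permalink) of the resource
    xmpp_uri: Optional[str]      # pubsub node the resource is linked to, if any
    authoritative: bool = True   # False when the resource was imported over pubsub

    def import_notice(self) -> Optional[str]:
        """Notice to show on a non-authoritative copy; None for originals."""
        if self.authoritative:
            return None
        return f"This item was imported from {self.resource_uri}"


# A non-authoritative copy; the example.org permalink is made up.
copy = Resource(
    title="Resource sharing over XMPP",
    resource_uri="http://example.org/article/9411",
    xmpp_uri="xmpp:pubsub.mediamatic.nl?;node=id/9411",
    authoritative=False,
)
print(copy.import_notice())
```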
When subscribing to a piece of content on another website, you need this resource URI: the location where the HTML representation of the resource can be found (often the "permalink" of the resource). The subscriber retrieves the HTML representation by dereferencing the URI, and looks for the link element with rel="xmpp.feed", extracting the XMPP URI from the href attribute. When this XMPP URI of the resource has been discovered, the subscriber proceeds to create a non-authoritative "stub" article, marking it with the discovered XMPP URI. It then sends out the subscribe request to the service referenced in the XMPP URI, for the discovered node. When the subscription succeeds, most XMPP servers immediately send a notification to the subscriber. This notification contains the most recent item that was published to the node. On receiving this notification, the subscriber can extract the payload, which represents the contents of the resource in Atom XML format, and then proceed to update the "stub" article that was created before the subscription started, using the Atom contents.
Now, the subscriber has made a successful subscription to a resource on another site. Every time someone changes the original resource, the subscriber gets notified, and can update his own non-authoritative copy of the resource accordingly.
When a subscriber deletes his non-authoritative resource, he also sends out an unsubscribe request to the service referenced in the XMPP URI of the non-authoritative resource, saying, hey, I deleted this so I don't need any future notifications.
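The discovery step can be sketched with Python's standard HTML parser. The class and function names are assumptions for this example, and the page is passed in as a string so the sketch does not depend on how the HTML is fetched. It matches on the rel attribute, as in the example link element shown earlier.

```python
from html.parser import HTMLParser


class XmppFeedFinder(HTMLParser):
    """Remember the href of the first <link rel="xmpp.feed"> element."""

    def __init__(self):
        super().__init__()
        self.xmpp_uri = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.xmpp_uri is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        if "xmpp.feed" in (attributes.get("rel") or "").split():
            self.xmpp_uri = attributes.get("href")


def discover_xmpp_uri(html):
    """Return the XMPP URI advertised in the page, or None if there is none."""
    finder = XmppFeedFinder()
    finder.feed(html)
    return finder.xmpp_uri


PAGE = """<html><head>
<title>Example article</title>
<link rel="xmpp.feed" href="xmpp:pubsub.mediamatic.nl?;node=id/9411" title="XMPP updates for this item" />
</head><body></body></html>"""

print(discover_xmpp_uri(PAGE))
# prints xmpp:pubsub.mediamatic.nl?;node=id/9411
```

A subscriber that finds no such link element gets None back and has nothing to subscribe to, so it would skip creating the stub.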
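How a subscriber might apply such a notification to its local copy is sketched below in Python. The function name, the dictionary used as stub storage (keyed by XMPP URI) and the subscriber address are assumptions for this example; the event namespace and the items/item structure follow XEP-0060.

```python
import xml.etree.ElementTree as ET

EVENT_NS = "http://jabber.org/protocol/pubsub#event"
ATOM_NS = "http://www.w3.org/2005/Atom"

# A sample notification; the stream's default namespace is omitted for brevity.
NOTIFICATION = """<message from="pubsub.mediamatic.nl" to="subscriber@example.org">
  <event xmlns="http://jabber.org/protocol/pubsub#event">
    <items node="id/9411">
      <item id="current">
        <entry xmlns="http://www.w3.org/2005/Atom">
          <title>Updated title</title>
          <updated>2010-02-13T12:00:00Z</updated>
        </entry>
      </item>
    </items>
  </event>
</message>"""


def apply_notification(message_xml, stubs):
    """Update the stub linked to the notifying node from the Atom payload.

    `stubs` maps an XMPP URI to a dict holding the local copy. Returns the
    updated stub, or None when the message does not match a known stub.
    """
    message = ET.fromstring(message_xml)
    items = message.find(f"{{{EVENT_NS}}}event/{{{EVENT_NS}}}items")
    if items is None:
        return None
    xmpp_uri = f"xmpp:{message.get('from')}?;node={items.get('node')}"
    stub = stubs.get(xmpp_uri)
    entry = items.find(f"{{{EVENT_NS}}}item/{{{ATOM_NS}}}entry")
    if stub is None or entry is None:
        return None
    # Copy the fields we care about; keep the old value if one is missing.
    stub["title"] = entry.findtext(f"{{{ATOM_NS}}}title", default=stub.get("title"))
    stub["updated"] = entry.findtext(f"{{{ATOM_NS}}}updated", default=stub.get("updated"))
    return stub


stubs = {"xmpp:pubsub.mediamatic.nl?;node=id/9411": {"title": "(stub)", "updated": None}}
print(apply_notification(NOTIFICATION, stubs))
```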
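The subscribe and unsubscribe requests differ only in the name of one element, as this Python sketch shows. The function name, the stanza ids and the subscriber address are assumptions for this example; the element and attribute names follow XEP-0060.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"


def build_subscription_iq(action, service, node, subscriber_jid):
    """Build a subscribe or unsubscribe request for one pubsub node."""
    if action not in ("subscribe", "unsubscribe"):
        raise ValueError(f"unknown action: {action!r}")
    iq = ET.Element("iq", {"type": "set", "to": service, "id": f"{action}1"})
    pubsub = ET.SubElement(iq, f"{{{PUBSUB_NS}}}pubsub")
    # <subscribe/> and <unsubscribe/> take the same node and jid attributes.
    ET.SubElement(
        pubsub,
        f"{{{PUBSUB_NS}}}{action}",
        {"node": node, "jid": subscriber_jid},
    )
    return iq


# The subscriber address below is made up for illustration.
for action in ("subscribe", "unsubscribe"):
    iq = build_subscription_iq(action, "pubsub.mediamatic.nl", "id/9411", "site@example.org")
    print(ET.tostring(iq, encoding="unicode"))
```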
The deletion of authoritative resources needs to be propagated over publish/subscribe as well: otherwise, when the original resource gets deleted on the publisher's site, the subscriber's copy is still there and now links to a 404 page on the publisher's site. Luckily, XEP-0060 is extensive enough to cover this situation: when the publisher deletes the resource, he also sends out over XMPP a request to delete the pubsub node that was linked to the content. The XMPP server then notifies every subscriber of this fact, and thus the subscribers can delete their non-authoritative copies of the resource as well.
Resources can also relocate: for instance, you can decide that your blog post should move from your personal blog to your company's blog. Or, you change the domain name of your blog, which also affects the domain name of your XMPP server. In both these cases, the resource is first at one site, and the next moment it's published at another site. Luckily you know the new location of the node, the location of the XMPP server where you're moving your content to. This case can be handled much like a delete: you tell your subscribers that the node is no longer there, but that they should resubscribe to another location to continue to receive updates for the item.
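Both the delete and the relocate case can be handled by one routine on the subscriber side. The Python sketch below assumes the relocation is signalled with the redirect child element that XEP-0060 describes for node deletion notifications; whether our implementation uses exactly this form is not covered here. The function name, the return values and the stub storage keyed by XMPP URI are likewise assumptions for this example.

```python
import xml.etree.ElementTree as ET

EVENT_NS = "http://jabber.org/protocol/pubsub#event"


def handle_delete(message_xml, stubs):
    """Process a node deletion notification.

    Returns ("deleted", None), ("moved", new_xmpp_uri) or ("ignored", None).
    """
    message = ET.fromstring(message_xml)
    delete = message.find(f"{{{EVENT_NS}}}event/{{{EVENT_NS}}}delete")
    if delete is None:
        return ("ignored", None)
    old_uri = f"xmpp:{message.get('from')}?;node={delete.get('node')}"
    if old_uri not in stubs:
        return ("ignored", None)

    redirect = delete.find(f"{{{EVENT_NS}}}redirect")
    if redirect is not None and redirect.get("uri"):
        # Relocation: keep the copy and file it under the new location.
        # The caller still has to send a subscribe request to the new node.
        new_uri = redirect.get("uri")
        stubs[new_uri] = stubs.pop(old_uri)
        return ("moved", new_uri)

    # Plain deletion: drop the non-authoritative copy.
    del stubs[old_uri]
    return ("deleted", None)


DELETED = (
    '<message from="pubsub.mediamatic.nl">'
    '<event xmlns="http://jabber.org/protocol/pubsub#event">'
    '<delete node="id/9411"/></event></message>'
)
stubs = {"xmpp:pubsub.mediamatic.nl?;node=id/9411": {"title": "Some article"}}
print(handle_delete(DELETED, stubs), stubs)
# prints ('deleted', None) {}
```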
I hope that this article gives insight into how resource sharing over XMPP works, both on a conceptual and a technical level. Technically, a lot more can be said about this subject. For one, this article did not go into implementation detail on the XMPP side, where one can choose from several possible implementation options.
How to use resource sharing in Zotonic will be covered in a follow-up article. That article will not only cover the setup procedure but will also highlight the implementation details of the responsible mod_pubsub module.