During a customer conversation today I mentioned that RSS was dead, which prompted an interesting discussion about why it died, and specifically the technical reasons behind its death.
I was actually one of the inventors of RSS and one of the co-authors of the RSS 1.0 spec. I started two companies around RSS aggregation. Saying it is dead doesn’t really give me much comfort but at least we can learn from our mistakes.
Frankly, with Facebook and Twitter being so popular, why bother with RSS?
Google Reader is long dead, and even systems like Feedly (which have taken its place) usually add external indexing on top of the main HTML content (not necessarily working with the RSS itself).
The problem here is that these social networks are essentially walled gardens. Content posted on one platform isn’t available on other platforms.
Basic, simple metadata formats like Twitter Cards and Facebook Open Graph step in and fix the holes left by RSS. In fact, they are used by about 90% of the mainstream news and weblog sources we index.
So while RSS is dead, at least there are alternatives.
These standards provide the critical metadata around articles including date published, author information, etc.
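To make that concrete, here is a minimal sketch of pulling Open Graph and Twitter Card metadata out of a page using only Python's standard library. The sample page, the author name, and the helper names (`MetaTagParser`, `extract_metadata`) are made up for illustration; a real indexer would need to handle far messier markup.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects Open Graph, article, and Twitter Card <meta> tags from a page."""

    PREFIXES = ("og:", "article:", "twitter:")

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Open Graph tags use property="...", Twitter Cards usually use name="...".
        key = attr.get("property") or attr.get("name")
        value = attr.get("content")
        if key and value and key.startswith(self.PREFIXES):
            # Keep the first value seen for each key.
            self.metadata.setdefault(key, value)


def extract_metadata(html):
    """Return a dict of the social metadata tags found in an HTML string."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.metadata


# Hypothetical page, used only for illustration.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Why RSS Died">
<meta property="og:type" content="article">
<meta property="article:published_time" content="2017-03-01T12:00:00Z">
<meta property="article:author" content="Jane Doe">
<meta name="twitter:card" content="summary">
</head><body><p>Post body</p></body></html>
"""

if __name__ == "__main__":
    for key, value in sorted(extract_metadata(SAMPLE_HTML).items()):
        print(key, "=", value)
```

The publish date and author come straight from the `article:` tags, which is the same information an RSS item would otherwise have carried.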
The key issue here is that the content in HTML is actually visible.
This means that as users (and the original author) visit the site, they can quickly see any formatting errors or typos and notify the publisher.
This isn’t necessarily true with RSS. RSS users tend to represent a small fraction of the site visitors and so have less of a chance to report issues.
Additionally, these issues might be present only in certain RSS aggregators, so there’s even less of a chance that they get reported.
Now combine this with a number of other problems, including lower quality content, multiple RSS formats (0.90, 0.91, 1.0, 2.0, and a dozen additional extensions), charset encoding issues, language incompatibility, etc., and we have a system set up to fail simply because no one could ever reasonably resolve these problems.
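As a small illustration of the format problem, here is a sketch of what an aggregator has to do before it can even read an item: work out which flavor of RSS it has been handed. It assumes the conventional root elements and namespaces (an `<rss version="...">` root for 0.91 and 2.0, an `rdf:RDF` root for 0.90 and 1.0); the function name and labels are my own, and real feeds break these rules often enough that this is nowhere near complete.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_090_NS = "http://my.netscape.com/rdf/simple/0.9/"
RSS_10_NS = "http://purl.org/rss/1.0/"


def detect_rss_version(feed_bytes):
    """Return a best-guess label such as 'rss-2.0', 'unknown', or 'unparseable'."""
    try:
        # Passing bytes lets the parser honor the charset in the XML declaration.
        root = ET.fromstring(feed_bytes)
    except ET.ParseError:
        return "unparseable"

    # RSS 0.91 and 2.0 use a plain <rss> root with a version attribute.
    if root.tag == "rss":
        return "rss-" + root.get("version", "unversioned")

    # RSS 0.90 and 1.0 are RDF documents; the channel namespace tells them apart.
    if root.tag == f"{{{RDF_NS}}}RDF":
        child_tags = {child.tag for child in root}
        if f"{{{RSS_10_NS}}}channel" in child_tags:
            return "rss-1.0"
        if f"{{{RSS_090_NS}}}channel" in child_tags:
            return "rss-0.90"

    return "unknown"


# Tiny hand-written feeds, used only for illustration.
RSS_20_SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<rss version="2.0"><channel><title>t</title></channel></rss>'
)
RSS_10_SAMPLE = (
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    b'xmlns="http://purl.org/rss/1.0/">'
    b'<channel rdf:about="http://example.com/"><title>t</title></channel>'
    b'</rdf:RDF>'
)

if __name__ == "__main__":
    print(detect_rss_version(RSS_20_SAMPLE))
    print(detect_rss_version(RSS_10_SAMPLE))
    print(detect_rss_version(b"<rss version="))
```

And this is only the detection step. Each branch then needs its own rules for dates, authors, and content, plus whatever extensions the publisher decided to use.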
Part of the problem with RSS, at least for some of our customers, is that it’s really only a container format. There aren’t any third-party augmentations on the data that can provide valuable additional functionality.
Such features include gender and language detection, sentiment analysis, content classification, summary computation, etc. We’ve built these into Datastreamer and also ship some of them as standard APIs.
For the most part, all of the above issues are resolved with the open web and HTML5.
All the metadata we need is available to index. This includes post date, author information, etc. It also includes videos, images, etc.
Really the only thing that’s ever a problem is isolating the main content on the page. There are formats for this but they’re not reliable since this metadata isn’t presented to the user.
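One such hint is the HTML5 `<article>` element. Here is a minimal sketch that collects the text inside it, assuming the publisher actually wraps the main content that way (many don't, or do so incorrectly, which is exactly the reliability problem). The class and function names are hypothetical, and this is not a description of how any particular product isolates content.

```python
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Collects text found inside <article> elements, skipping scripts and styles."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.article_depth = 0
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth > 0:
            self.article_depth -= 1
        elif tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only keep non-empty text that sits inside an <article> element.
        if self.article_depth > 0 and self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html):
    """Return the text inside <article>, or None if the page has no usable one."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks) or None


# Hypothetical page, used only for illustration.
SAMPLE_PAGE = (
    "<html><body><nav>Home | About</nav>"
    "<article><h1>Why RSS Died</h1><p>Feeds break silently.</p></article>"
    "<footer>Copyright</footer></body></html>"
)

if __name__ == "__main__":
    print(extract_main_text(SAMPLE_PAGE))
```

When the function returns `None`, the page gave no hint at all, and an indexer has to fall back on heuristics over the visible HTML.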
RSS is dead, but at least with the open web we have a chance to move forward.