Language Classifying 5TB of Web Content per Day

At Datastreamer we index a lot of HTML. On an average day we index about 5TB of HTML content and write about 600GB of that to our Elasticsearch index. As part of indexing, we perform data augmentation, including language detection.

But how do we scale that and provide high-quality language classification without any central point of failure?

Read More →

The Death of RSS - Long Live the Open Web

During a customer conversation today I mentioned that RSS was dead, which prompted an interesting discussion about why it died, and specifically the technical reasons behind its death.

I was actually one of the inventors of RSS and one of the co-authors of the RSS 1.0 spec. I started two companies around RSS aggregation. Saying it is dead doesn’t give me much comfort, but at least we can learn from our mistakes.

Read More →

Datastreamer 6.5 - Engagement, Social Media Exports, Classifier, and Parser API

Today we’re announcing the release of Datastreamer 6.5, which includes a number of new APIs and features that let our clients listen to, engage with, classify, and analyze social media content.

Read More →

Deploying Elasticsearch At Scale for Social Media Analytics

Earlier today we launched a major new release of Datastreamer. It has been in development for about a year, so it’s great to finally get it released and in front of customers.

Read More →

Datastreamer 6.0 Released

Datastreamer 6.0 is here. In a nutshell, we have dramatically improved our pricing for customers needing smaller amounts of data, brought a new partner program online, doubled our hardware capacity, and implemented more features. Let’s take a look…

Read More →

Social Media Analytics Powered by Elasticsearch

Datastreamer is a big data and social media analytics company that provides access to massive datasets of social media, blogs, forums, and other real-time, live content.

Read More →

Datastreamer 5.0 Released

Datastreamer 5.0 has been in development for the last year and a half, and today we’re making it available to the public for the first time. This release incorporates new technology that we’ve developed based on customer feedback received over the last eight years. The latest version of Datastreamer enables a number of compelling new features.

Read More →
