Datastreamer 6.0 Released

Datastreamer 6.0 is here. In a nutshell, we have dramatically improved our pricing for customers needing smaller amounts of data, brought online a new partner program, doubled our hardware capacity, and implemented more features. Let’s take a look…

More efficient infrastructure and improved pricing!

We’ve dramatically improved our infrastructure efficiency over the past year. This means it’s more affordable for us to host our massive dataset, and we want to pass these savings on to our customers.

Datastreamer can now offer amazing cost savings on data feeds for both search and firehose APIs. Some customers can expect to pay 5-10x less, depending on how much data they are buying. If you’re paying too much to your existing data provider, now is a great time to talk to us!


Partner Program

The social media analysis industry is dominated by a few big players, and this absence of competition is not healthy for our industry.

These large players have invested heavily in building out massive big data platforms. They can spend in excess of $150k per month to purchase the data, and then possibly another $150k or more on the cloud infrastructure required to store it.

Datastreamer changes this by giving social media marketing firms raw access to our Elasticsearch cluster. Clients get direct access to aggregations, filters, and other advanced Elasticsearch features, which together provide a very advanced analytics platform.

This direct access provides our clients with features like custom reach and engagement computation by gender, location, sentiment, etc. The key point, however, is that you can now access 10x more data than before at the same cost.

Doubled Online Capacity and Six Month Archives

We’ve more than doubled our online capacity in terms of memory and storage. Datastreamer now hosts sixty (60) days of content online on ultra-fast SSDs.

We’ve also brought extended archives online. Content is available going back to January 2016, and we will keep archiving going forward.

Gender and sentiment

We’ve expanded our augmentations to support gender and sentiment analysis.

These are available to search over and are also supported in aggregations, which enables monitoring and analytics over the entire corpus of content.
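As a rough illustration, an aggregation over these augmentations could be expressed with the standard Elasticsearch query DSL. The sketch below only builds the request body. The field names `main`, `author_gender`, and `sentiment` are assumptions made for the example, not Datastreamer’s documented schema.

```python
import json


def sentiment_by_gender_request(query_text: str) -> dict:
    """Build a search body that counts matching documents per gender and sentiment."""
    return {
        # size 0 asks Elasticsearch for aggregation buckets only, no hits.
        "size": 0,
        # Hypothetical full-text field name.
        "query": {"match": {"main": query_text}},
        "aggs": {
            "by_gender": {
                # Hypothetical keyword field holding the gender augmentation.
                "terms": {"field": "author_gender"},
                "aggs": {
                    # Nested terms aggregation: sentiment counts inside each gender bucket.
                    "by_sentiment": {"terms": {"field": "sentiment"}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(sentiment_by_gender_request("electric cars"), indent=2))
```

The body can then be sent to the search endpoint with whichever Elasticsearch client you already use.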


We’ve also brought online “categories” for mainstream news and weblogs. The categories are based on Reddit “subreddits”, with our machine learning classifiers built from the text in these articles.

Datastreamer currently has categories for business, entertainment, health, politics, science, sports, and technology. We can add new / custom categories if needed.

All Queries Executed in Less Than 500ms

We’ve spent a massive amount of time optimizing every aspect of our search stack to make it blazingly fast. Now all queries execute in less than 500ms.

Latest Elasticsearch and Kibana

One of the great things about Datastreamer is that we provide our customers with the raw Elasticsearch API as well as Kibana access for querying Elasticsearch visually and interactively.

Firehose via Search

We find that some of our customers want more of a hybrid search/firehose API. They want a LOT of data, but their queries are more along the lines of search requests. Datastreamer now supports this in 6.0: you can page through time ranges, re-executing queries as new documents arrive.
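One way to picture this pattern is to split a time span into consecutive windows and issue the same query once per window. This is a minimal sketch under assumptions: the helper names and the fields `main` and `published` are hypothetical, and the exact paging mechanism Datastreamer exposes is not described here.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def time_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive half-open [lower, upper) windows covering start..end."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        yield cursor, upper
        cursor = upper


def windowed_query(query_text: str, lower: datetime, upper: datetime) -> dict:
    """Build a search body restricted to one time window."""
    return {
        "query": {
            "bool": {
                # Hypothetical full-text field.
                "must": [{"match": {"main": query_text}}],
                # Hypothetical timestamp field; gte/lt keeps windows non-overlapping.
                "filter": [
                    {
                        "range": {
                            "published": {
                                "gte": lower.isoformat(),
                                "lt": upper.isoformat(),
                            }
                        }
                    }
                ],
            }
        },
        "sort": [{"published": "asc"}],
    }


if __name__ == "__main__":
    start = datetime(2016, 1, 1, tzinfo=timezone.utc)
    for lower, upper in time_windows(start, start + timedelta(hours=1), timedelta(minutes=15)):
        print(windowed_query("electric cars", lower, upper)["query"]["bool"]["filter"])
```

To keep following new documents, a client would remember the last upper bound and call `time_windows` again from there once more time has passed.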

More International Content

We’ve brought online more international content, including European languages, and we’re now adding additional Asian languages, including more Japanese and Chinese content.

Custom Reach/Engagement Calculation

Since Datastreamer supports aggregations and Elasticsearch supports ranking via functions, we can support computing reach/engagement aggregated by location, gender, or sentiment.
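A reach figure of this kind might be computed with a `terms` aggregation that groups documents and a `sum` sub-aggregation over a follower count. This sketch makes two assumptions: that "reach" means the summed follower count of matching authors, and that the fields are called `main` and `author_followers`. Neither is confirmed by Datastreamer.

```python
def reach_by_field_request(query_text: str, group_field: str) -> dict:
    """Build a search body that sums follower counts per bucket of group_field."""
    return {
        "size": 0,  # aggregation results only
        "query": {"match": {"main": query_text}},  # hypothetical text field
        "aggs": {
            "groups": {
                "terms": {"field": group_field},
                "aggs": {
                    # Assumed definition of reach: total followers in the bucket.
                    "reach": {"sum": {"field": "author_followers"}}
                },
            }
        },
    }


def reach_from_response(response: dict) -> dict:
    """Turn the aggregation part of a search response into {bucket key: reach}."""
    buckets = response["aggregations"]["groups"]["buckets"]
    return {bucket["key"]: bucket["reach"]["value"] for bucket in buckets}
```

`reach_from_response` expects the usual Elasticsearch aggregation response shape, where each bucket carries its `key`, a `doc_count`, and the named sub-aggregation value.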

Summary Text

Some posts are too long to display and what you really want is summary text to include with the article. Datastreamer now supports this and tries to compute 200-400 characters of summary text along with the news article.

Real-Time Updates (follower counts, shares, comments, likes, etc)

We’ve extended our support for metadata fields, including adding follower counts, shares, comments, likes, etc.

These are all updated on the fly and clients can also sort over these fields and compute custom ranking based on content values.
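Sorting and custom ranking over such fields map naturally onto Elasticsearch’s `sort` clause and `function_score` query. The following is a sketch only: the field names `main`, `shares`, and `likes` are assumed for illustration, and the weighting is an arbitrary example rather than a recommended formula.

```python
def sorted_by_field_request(query_text: str, field: str) -> dict:
    """Build a search body that orders hits by a numeric metadata field, highest first."""
    return {
        "query": {"match": {"main": query_text}},  # hypothetical text field
        "sort": [{field: {"order": "desc"}}],
    }


def engagement_ranked_request(query_text: str) -> dict:
    """Build a search body whose score blends text relevance with engagement counts."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"main": query_text}},
                "functions": [
                    # log1p dampens very large counts; missing=0 covers absent fields.
                    {"field_value_factor": {"field": "shares", "modifier": "log1p", "missing": 0}},
                    {"field_value_factor": {"field": "likes", "modifier": "log1p", "missing": 0}},
                ],
                # Add the two function values together...
                "score_mode": "sum",
                # ...then add that total to the text relevance score.
                "boost_mode": "sum",
            }
        }
    }
```

Using `"boost_mode": "sum"` keeps documents with no engagement rankable by relevance alone, whereas `"multiply"` would give them a score of zero.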

Near duplicate detection

Documents on the web aren’t just published at one URL. For example, Reuters, the Associated Press, and mainstream media sites exacerbate the problem by publishing the same content under different URLs, often on completely unrelated websites.

Other data providers will leave you to handle this problem on your own. Datastreamer provides integrated “near duplicate” detection. We give you the first instance of the document Datastreamer found in the cluster as well as all documents which are duplicates.
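To make the idea of "first instance plus duplicates" concrete, here is a small sketch that uses word shingles and Jaccard similarity, a common textbook approach to near-duplicate detection. It is not a description of Datastreamer’s own method, which is not specified here, and the threshold of 0.5 is an arbitrary assumption.

```python
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word shingles in text (case-insensitive)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs: List[str], threshold: float = 0.5) -> Dict[int, List[int]]:
    """Map the index of each first-seen document to the indexes of its near duplicates."""
    canonical: Dict[int, Set[Tuple[str, ...]]] = {}
    groups: Dict[int, List[int]] = {}
    for index, text in enumerate(docs):
        current = shingles(text)
        for first_index, first_shingles in canonical.items():
            if jaccard(current, first_shingles) >= threshold:
                groups[first_index].append(index)
                break
        else:
            # No earlier document is similar enough, so this one starts a new group.
            canonical[index] = current
            groups[index] = []
    return groups
```

Documents are processed in arrival order, so the key of each group is the earliest copy seen and the list holds the later copies.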

Better images and video

Datastreamer also extracts video and images from posts, including all the metadata needed to properly embed them in your application.