Datastreamer 6.0 is here. In a nutshell, we have dramatically improved our pricing for customers needing smaller amounts of data, brought online a new partner program, doubled our hardware capacity, and implemented more features. Let’s take a look…
We’ve dramatically improved our infrastructure efficiency over the past year. This means it’s more affordable for us to host our massive dataset, and we want to pass these savings on to our customers.
Datastreamer can now offer amazing cost savings on data feeds for both search and firehose APIs. Some customers can expect their costs to drop by 5-10x, depending on how much data they are buying. If you’re paying too much to your existing data provider, now is a great time to talk to us!
The social media analysis industry is dominated by a few big players, and this absence of competition is not healthy for our industry.
These large players have invested heavily in building out massive big data platforms. They can spend in excess of $150k per month to purchase the data, and then possibly another $150k or more on the cloud infrastructure required to store it.
Datastreamer changes this by giving social media marketing firms raw access to our Elasticsearch cluster. Clients get direct access to aggregations, filters, and other advanced Elasticsearch features, which together make for a very advanced analytics platform.
This direct access provides our clients with features like custom reach and engagement computation by gender, location, sentiment, etc. The key point is that you can now access 10x more data than before at the same cost.
We’ve more than doubled our online capacity in terms of memory and storage. Datastreamer now hosts sixty (60) days of content online on ultra-fast SSDs.
We’ve also brought online extended archives. Content is available going back to January 2016, and we will keep archives moving forward.
We’ve expanded our augmentations to support gender and sentiment analysis.
These fields are searchable and are also supported in aggregations, which enable monitoring and analytics over the entire corpus of content.
We’ve also brought online “categories” for mainstream news and weblogs. The categories are based on Reddit “subreddits”, with our machine learning classifiers built from the text in these articles.
Datastreamer currently has categories for business, entertainment, health, politics, science, sports, and technology. We can add new / custom categories if needed.
We’ve spent a massive amount of time optimizing every aspect of our search stack to make it blazingly fast. Now all queries execute in less than 500ms.
One of the great things about Datastreamer is that we provide our customers with the raw Elasticsearch API as well as Kibana access for querying Elasticsearch visually and interactively.
We find that some of our customers want more of a hybrid search/firehose API. They want a LOT of data, but their queries are more along the lines of search requests. Datastreamer 6.0 now supports this: you can page through time ranges, re-executing queries as new documents arrive.
We’ve brought online more international content, including European languages, and we’re now adding additional Asian languages, including more Japanese and Chinese content.
Since Datastreamer supports aggregations and Elasticsearch supports ranking via functions, we can support computing reach/engagement by aggregation of location, gender, or sentiment.
Some posts are too long to display, and what you really want is summary text to include with the article. Datastreamer now supports this and tries to compute 200-400 characters of summary text along with the news article.
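The time-range paging described above can be sketched as follows. This is only an illustration: the field names `published` and `text` are assumptions, not documented Datastreamer fields, and the request bodies use the standard Elasticsearch `bool` and `range` query syntax. Each body would be sent to the search API in turn, and the most recent window re-executed as new documents arrive.

```python
from datetime import datetime, timedelta, timezone


def time_windows(start, end, step):
    """Yield consecutive half-open [lo, hi) windows covering start..end."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def windowed_query(query, lo, hi, time_field="published"):
    """Restrict a query to documents whose timestamp falls in [lo, hi).

    `time_field` is a hypothetical field name used for illustration.
    """
    return {
        "query": {
            "bool": {
                "must": [query],
                "filter": [
                    {
                        "range": {
                            time_field: {
                                "gte": lo.isoformat(),
                                "lt": hi.isoformat(),
                            }
                        }
                    }
                ],
            }
        }
    }


start = datetime(2017, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(minutes=25)

# One request body per 10-minute window; the last window is shorter.
bodies = [
    windowed_query({"match": {"text": "datastreamer"}}, lo, hi)
    for lo, hi in time_windows(start, end, timedelta(minutes=10))
]
```

Half-open windows mean a document on a boundary is returned by exactly one window, so paging never yields the same document twice.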
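As a rough sketch of what such an aggregation could look like, the snippet below builds a standard Elasticsearch `terms` aggregation with a `sum` sub-aggregation and reads the result back out. The field names `gender` and `likes` are assumptions made for the example, and the sample response is hand-written in the shape Elasticsearch returns for this kind of request, not real Datastreamer data.

```python
def engagement_by(field, metric_field="likes", size=10):
    """Build a request body that sums `metric_field` per value of `field`.

    Both field names are hypothetical; substitute the fields in your index.
    """
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "aggs": {
            "by_" + field: {
                "terms": {"field": field, "size": size},
                "aggs": {"engagement": {"sum": {"field": metric_field}}},
            }
        },
    }


def read_engagement(response, field):
    """Map each bucket key to its summed engagement value."""
    buckets = response["aggregations"]["by_" + field]["buckets"]
    return {b["key"]: b["engagement"]["value"] for b in buckets}


body = engagement_by("gender")

# Hand-written sample in the shape of a terms + sum aggregation response.
sample_response = {
    "aggregations": {
        "by_gender": {
            "buckets": [
                {"key": "female", "doc_count": 3, "engagement": {"value": 120.0}},
                {"key": "male", "doc_count": 2, "engagement": {"value": 45.0}},
            ]
        }
    }
}
totals = read_engagement(sample_response, "gender")
```

Swapping `"gender"` for a location or sentiment field gives the other breakdowns mentioned above.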
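To give a feel for the idea, here is a deliberately simple extractive sketch that trims text to the 200-400 character range, preferring to stop at a sentence boundary. It is an assumption-laden illustration, not the method Datastreamer uses, and the function name and parameters are made up for the example.

```python
import re


def summarize_text(text, min_len=200, max_len=400):
    """Return a summary of at most `max_len` characters.

    Prefers the last sentence end between `min_len` and `max_len`;
    otherwise cuts at a word boundary and appends "...".
    """
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= max_len:
        return text

    window = text[:max_len]
    # Sentence ends: punctuation followed by whitespace inside the window.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s)", window)]
    ends = [e for e in ends if e >= min_len]
    if ends:
        return window[: ends[-1]]

    # No usable sentence boundary: cut at the last space, leave room for "...".
    head = text[: max_len - 3]
    cut = head.rfind(" ")
    if cut > 0:
        head = head[:cut]
    return head.rstrip() + "..."


article = "Sentence number one is here. " * 30
summary = summarize_text(article)
```

Text that is already shorter than `max_len` is returned unchanged.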
We’ve extended our support for metadata fields, adding follower counts, shares, comments, likes, etc.
These are all updated on the fly and clients can also sort over these fields and compute custom ranking based on content values.
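A minimal sketch of both options follows, using the standard Elasticsearch `sort` clause and a `function_score` query with `field_value_factor`. The field names `likes` and `text` are assumptions for illustration; the actual metadata field names may differ.

```python
def sorted_by(query, field, order="desc"):
    """Request body that sorts matching documents on a metadata field."""
    return {"query": query, "sort": [{field: {"order": order}}]}


def popularity_ranked(query, field="likes", factor=1.0):
    """Request body that adds a popularity signal to the relevance score.

    `field` is a hypothetical metadata field. `log1p` dampens very large
    counts, and `missing` covers documents that have no value for the field.
    """
    return {
        "query": {
            "function_score": {
                "query": query,
                "field_value_factor": {
                    "field": field,
                    "factor": factor,
                    "modifier": "log1p",
                    "missing": 0,
                },
                # Add the popularity term to the text relevance score.
                "boost_mode": "sum",
            }
        }
    }


match = {"match": {"text": "election"}}
by_likes = sorted_by(match, "likes")
ranked = popularity_ranked(match, field="likes", factor=0.5)
```

Using `"sum"` rather than `"multiply"` keeps documents with zero likes from being scored at zero.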
Documents on the web aren’t just published at one URL. Syndicated content from Reuters, the Associated Press, and mainstream media sites exacerbates the problem: the same content is published under different URLs, often on completely unrelated websites.
Other data providers will leave you to handle this problem on your own. Datastreamer provides integrated “near duplicate” detection. We give you the first instance of the document Datastreamer found in the cluster as well as all documents which are duplicates.
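For readers curious how near-duplicate grouping can work in principle, here is a small sketch based on word shingles and Jaccard similarity. It is not Datastreamer's implementation; the function names, the 0.8 threshold, and the example URLs are all assumptions. It compares every document against every earlier canonical document, so it is only suitable for small batches (large systems typically use techniques such as MinHash or SimHash).

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles from lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=0.8):
    """Group (url, text) pairs, given in discovery order.

    Returns {first_seen_url: [urls of later near-duplicates]}.
    """
    canonical = []  # (url, shingle set) of each first-seen document
    groups = {}
    for url, text in docs:
        sig = shingles(text)
        for c_url, c_sig in canonical:
            if jaccard(sig, c_sig) >= threshold:
                groups[c_url].append(url)
                break
        else:
            canonical.append((url, sig))
            groups[url] = []
    return groups


story = ("The central bank raised interest rates by a quarter point on "
         "Wednesday citing strong job growth and rising inflation")
docs = [
    ("https://wire.example/a", story),
    ("https://news.example/b", story + " today"),
    ("https://sports.example/c",
     "Local team wins the championship after a dramatic overtime finish"),
]
groups = group_near_duplicates(docs)
```

Here the second document differs from the first by one trailing word, so it is grouped under the first URL, while the unrelated sports story forms its own group.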
Datastreamer also extracts video and images from posts, including all the metadata needed to properly embed them into your application.