Imagine you could read news feeds in real time, analyze the information within them, and extract the important bits to plug into your data analysis platform or to monitor events as they happen. This is a typical scenario that RAW makes possible.
It could also be about reading information stored in any of the many billions of publicly available web pages or services (apparently, there are 50 billion of them!). So, instead of the news, it could be about reading web data from a customer, competitor, or partner website. That makes for plenty of choice in data sources!
How can we do it? Many websites use the RSS format to publish their updates in a computer-readable form. RSS is an XML-based format and, despite the reports of the death of XML, there’s still plenty of XML around. RSS often doesn’t carry the actual content of the website; it mostly contains metadata about what is inside each page. So we can use RSS as a nice index, but then need to follow the links it lists to process more data.
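To make the "RSS as an index" idea concrete, here is a minimal Python sketch that reads the item metadata out of a feed and orders it by publication date. It is only an illustration: the sample feed, its URLs, and the `parse_rss_items` helper are made up for this example, and it uses Python's standard library rather than RAW.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# A tiny, made-up RSS 2.0 document standing in for a real news feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>https://example.com/first</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/second</link>
      <pubDate>Tue, 02 Jan 2024 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def parse_rss_items(xml_text):
    """Return the feed's items as dicts, newest first.

    Assumes every <item> carries a <pubDate> in the usual RFC 822 style.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    # The feed only indexes the pages; each link still has to be
    # followed to reach the full content.
    items.sort(key=lambda entry: entry["published"], reverse=True)
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["published"].isoformat(), entry["title"], entry["link"])
```

Each resulting entry holds only a title, a link, and a date, which is exactly why the feed works as an index and not as the data itself.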
Analyzing the news, live!
What if we use the RSS feed of a news organization, say, from CNN? Well, this means we can actually “query the news, live”! So, here’s the plan: we will build an API to:
- Extract metadata from the RSS feed of CNN, by querying and ordering the underlying XML data;
- Pass the results to a text analysis API that returns structured, semantic data (entity extraction);
- Aggregate the results for presentation.
This is a fairly standard pattern. Here we will use the OpenGraph.io API for extracting page metadata and Google’s Language Entity Analysis API for the entity extraction, but there are plenty of choices out there depending on what you want to do.
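The shape of that plan can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the implementation used here: the article records are invented, and `lookup_entities` is a hypothetical local stand-in for a real entity extraction service, so no OpenGraph.io or Google call is made.

```python
from collections import Counter

# Hypothetical lookup table used only to keep the example offline.
# A real pipeline would call an entity analysis service instead.
KNOWN_ENTITIES = {
    "NASA": "ORGANIZATION",
    "Paris": "LOCATION",
}


def lookup_entities(text):
    """Stand-in extractor: return (name, type) pairs found in the text."""
    return [(name, kind) for name, kind in KNOWN_ENTITIES.items() if name in text]


def aggregate_entities(articles, extract):
    """Run `extract` over each article description and count the entities.

    Returns ((name, type), count) pairs, most frequent first.
    """
    counts = Counter()
    for article in articles:
        for entity in extract(article["description"]):
            counts[entity] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Invented records in the shape an RSS step might produce.
    articles = [
        {"title": "Launch day", "description": "NASA announces a new mission"},
        {"title": "Summit", "description": "Paris hosts a NASA delegation"},
    ]
    for (name, kind), count in aggregate_entities(articles, lookup_entities):
        print(name, kind, count)
```

Passing the extractor in as a parameter keeps the aggregation step independent of whichever text analysis service is chosen.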