How a machine can tell data-driven stories by putting structure to unstructured data
Quill is an automated narrative generation system that tells stories. But to do so, Quill needs data. Without the data, there is no way for it to figure out what’s happening, and thus no way for it to tell us anything.
So, how can Quill work in the world of social media, where “data” is unstructured text? What can a system that lives and breathes structured data do with language that can be sophisticated, ambiguous and messy?
We decided to start experimenting with Twitter. Our goal? Use Quill to provide tweeters, from casual tweeters to “VITs” (Very Important Tweeters), with personal, actionable stories about their sphere of influence.
The problem was, we had masses of text we could read (billions upon billions of tweets) but nothing a machine could understand. We somehow had to bring order to the chaos of this unstructured data so Quill could write stories.
We quickly realized that a small amount of work would allow us to extract enough data from the texts we were processing to generate a story.
In other words, we could do for Quill what Quill does for others: give it information in the form most natural for it (structured data) so that it could give us information in the form most natural for us (plain English).
Once we determined the kind of narratives for Quill to write, we started gathering the data. We decided to look at a mass of tweets and pull out two main data points for each one: the general topic area (e.g., politics, education, or business & technology) and a sentiment score associated with the tweet (positive or negative).
What this “tagging” technique allows Quill to do is powerful. Given a Twitter user – let’s call him John – our classifiers gather up John’s tweets and tag each one with regard to topic and sentiment. This gives Quill the data it needs to build a profile of John based upon what he talks about on Twitter (“John, you seem to be most interested in Business & Technology.”).
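To make the tagging step concrete, here is a minimal sketch in Python. It is not how our classifiers are built: the keyword lists, the `TaggedTweet` record and the function names are all hypothetical stand-ins, and a simple keyword lookup takes the place of a real topic and sentiment classifier. It only illustrates the shape of the data: each tweet becomes a (topic, sentiment) record, and the records roll up into a profile.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical keyword lists, for illustration only. A production
# classifier would be trained on data rather than hand-written.
TOPIC_KEYWORDS = {
    "Business & Technology": {"startup", "software", "market", "app"},
    "Politics": {"election", "senate", "vote"},
    "Science": {"research", "physics", "genome"},
}
POSITIVE_WORDS = {"great", "love", "excited"}
NEGATIVE_WORDS = {"bad", "hate", "worried"}


@dataclass
class TaggedTweet:
    text: str
    topic: str
    sentiment: str  # "positive" or "negative"


def tag_tweet(text: str) -> TaggedTweet:
    """Attach a topic and a sentiment label to one tweet."""
    words = set(re.findall(r"[a-z']+", text.lower()))

    # Pick the topic with the most keyword hits; "Other" if nothing matches.
    best_topic, best_hits = "Other", 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_topic, best_hits = topic, hits

    # Positive hits minus negative hits. Ties are treated as positive,
    # which is an arbitrary choice made for this sketch.
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    sentiment = "positive" if score >= 0 else "negative"
    return TaggedTweet(text, best_topic, sentiment)


def dominant_topic(tagged: list[TaggedTweet]) -> str:
    """Return the topic that appears most often in a list of tagged tweets."""
    return Counter(t.topic for t in tagged).most_common(1)[0][0]
```

Calling `tag_tweet` on each of John’s tweets and passing the results to `dominant_topic` yields the single fact behind a sentence like “John, you seem to be most interested in Business & Technology.”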
These classifiers also look at John’s followers and do the same type of tagging, giving Quill what it needs to notice and comment on similarities and differences between John and his followers (“While you talk a lot about Business & Technology, John, most of your followers are more focused on Science.”).
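The comparison between a user and their followers can be sketched the same way. The example below assumes the tweets have already been tagged and that we are simply handed two lists of topic labels; the function names and the sentence templates are hypothetical, and the fixed templates are only a placeholder for the narrative generation that Quill actually performs.

```python
from collections import Counter


def top_topic(topics: list[str]) -> str:
    """Most frequent topic label in a non-empty list."""
    return Counter(topics).most_common(1)[0][0]


def compare_with_followers(
    name: str, user_topics: list[str], follower_topics: list[str]
) -> str:
    """Turn two lists of topic labels into one observation about overlap."""
    mine = top_topic(user_topics)
    theirs = top_topic(follower_topics)
    if mine == theirs:
        return f"{name}, you and your followers both talk most about {mine}."
    return (
        f"While you talk a lot about {mine}, {name}, "
        f"most of your followers are more focused on {theirs}."
    )
```

The point of the sketch is that once topics are structured data, spotting a similarity or a difference is a simple comparison; the hard part is the tagging that comes before it and the writing that comes after.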
Finally, the system gathers data on tweeting frequency, retweeting behavior, the sources of shared documents and hashtag use. All of this is then fed into Quill as new, structured data that it can use to generate John’s story.
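As a rough illustration of what that behavioral data might look like, the sketch below reduces a list of tweets to a small structured record. The input format is an assumption: each tweet is a dict with hypothetical keys `text`, `created_at` (a `datetime`) and `is_retweet`, which is not the shape of any real Twitter API response. The output field names are likewise invented for this example.

```python
import re
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse


def summarize_activity(tweets: list[dict]) -> dict:
    """Summarize tweeting frequency, retweets, hashtags and link sources.

    Each tweet is assumed to be a dict with the hypothetical keys
    'text' (str), 'created_at' (datetime) and 'is_retweet' (bool).
    """
    if not tweets:
        return {
            "tweets_per_day": 0.0,
            "retweet_share": 0.0,
            "top_hashtags": [],
            "top_sources": [],
        }

    # Number of calendar days covered, counting both endpoints.
    days = {t["created_at"].date() for t in tweets}
    span_days = (max(days) - min(days)).days + 1

    # Hashtags are lowercased so that #AI and #ai count as the same tag.
    hashtags = Counter(
        tag.lower() for t in tweets for tag in re.findall(r"#(\w+)", t["text"])
    )
    # The "source" of a shared document is taken to be the link's domain.
    domains = Counter(
        urlparse(url).netloc
        for t in tweets
        for url in re.findall(r"https?://\S+", t["text"])
    )

    return {
        "tweets_per_day": len(tweets) / span_days,
        "retweet_share": sum(bool(t["is_retweet"]) for t in tweets) / len(tweets),
        "top_hashtags": [h for h, _ in hashtags.most_common(3)],
        "top_sources": [d for d, _ in domains.most_common(3)],
    }
```

A record like this, combined with the topic and sentiment tags, is the kind of structured input a narrative system can reason over.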
Our Twitter experiment is now in beta and ready to be tested by you! We are opening up the application, called Quill Connect, to any Twitter user so we can gather feedback and decide where to go next. The stories generated by Quill Connect focus on how you and your followers are similar, how you are different and how to better engage with your followers.
Excerpt from a sample Quill Connect report.
This is an exciting Narrative Science milestone for two reasons:
- Quill can generate narratives using social media data.
- Anyone using Twitter can access a personalized and useful story.
If you’re wondering how well you are interacting with your followers and what they are talking about, ask Quill Connect to write you a story. Who knows what you might learn!