As originally seen on Data Informed, March 6, 2015.
I believe in data and I love the world of big data and the analysis that it affords. However, we are at an inflection point with regard to our ability to utilize data, and it’s reflected in a growing dissatisfaction in the C-Suite.
In a recent Capgemini Consulting survey, less than a quarter of respondents with big data initiatives said they considered them a success, and under 10 percent were “fully satisfied” with the results.
But even in the face of this, spending is increasing and is projected to cross the $100 billion mark within the next three years. The driver seems to be executive fear that any company without a big data strategy will be left behind as industry after industry is disrupted. These investments are not aimed at solving specific problems. Instead, they’re aimed at keeping up with the Joneses. That’s no way to run a company, and a frustrating way to run data initiatives.
To glean value from the data initiatives we have, we need an approach that focuses on what we really want: the narratives and insights needed to support decision making, which was the original goal of these initiatives. We need to get to the last mile of data: information and insight about the world. And to do that, we need a new kind of machine that can transform the data we have into the information we need.
Lots of Data, but Little Insight
We gather data from our factories, equipment, and sales teams. Data related to product sales, locations, weather, and demographics. Data linked to individual preferences, choices, and behaviors. Data coming from any and all metrics and processes we can think of.
However, it turns out that data is not information. And, more importantly, data certainly is not insight.
For example, a cache of point-of-sale data associated with every retail outlet I have doesn’t tell me what products are underselling in Lubbock, Texas. Detailed test results don’t show me the difference between a student’s studying problems and my ability to teach. Terabytes of client data scattered about in Hadoop clusters don’t tell me what actions I need to take to reduce churn.
I can figure these things out from the data, but the data alone does not give up this information for free. Fortunately, powerful Big Data Business Intelligence (BI) tools can pull information out of data. Or, if we’re being honest with ourselves, extract even more data. The reality of BI tools is that we use them to generate more data, which often does little to help our decision makers, who are waiting on the information and insight they asked for 10 years ago, when their big data initiatives started.
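To make that gap concrete, here is a toy sketch of the kind of query an analyst has to write to pull the "underselling in Lubbock" answer out of raw point-of-sale records. All of the record fields, product names, and figures are hypothetical; the point is that the data holds the answer, but someone with the right skills still has to go ask for it:

```python
from collections import defaultdict

# Hypothetical point-of-sale records: (store_city, product, units_sold, units_forecast)
pos_records = [
    ("Lubbock", "widget-A", 40, 100),
    ("Lubbock", "widget-B", 95, 90),
    ("Austin",  "widget-A", 120, 100),
    ("Austin",  "widget-B", 80, 90),
]

def underselling(records, city, threshold=0.75):
    """Return products in `city` selling below `threshold` of forecast."""
    totals = defaultdict(lambda: [0, 0])  # product -> [units sold, units forecast]
    for c, product, sold, forecast in records:
        if c == city:
            totals[product][0] += sold
            totals[product][1] += forecast
    return [p for p, (sold, fc) in totals.items() if fc and sold / fc < threshold]

print(underselling(pos_records, "Lubbock"))  # -> ['widget-A']
```

Nothing in the raw records says "widget-A is underselling in Lubbock"; that fact only exists once someone frames the question, picks a threshold, and runs the aggregation.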
Decision makers don’t want data. They want to understand what’s happening in the world. Data for the sake of data is a waste of time and money. Spreadsheets, visualizations, and dashboards fail because, while they may express the data, they don’t communicate the facts and events in the world that gave rise to it.
The Solution Doesn’t Scale
People have a hard time understanding data. That’s why data scientists and analysts are often asked to convert their findings into narrative reports. People with interpretive skills are turning that data into the thing that most of us easily understand: narratives explaining what is going on in the world based on evidence provided by the data.
Unfortunately, this highly manual mechanism is unsustainable. To get the reporting we need, we use people with strong analytic training, excellent business awareness, and exceptional communications skills to look at the data only they can understand and transform it into reports that everyone else can read.
Tens of billions of dollars have been invested in big data, and the only way we can get value out of it is to have a really smart person sit at a screen, figure out what is going on, and explain it to us. Aside from the fact that this person is expensive and the task requires only a small portion of his or her skill set, this approach creates a bottleneck that chokes our ability to utilize the insights contained in the data we have been gathering for years.
So not only does this approach fail to scale, it is incredibly costly, and it forces people with exceptional skills to perform tasks that they simply don’t like to do.
None of these approaches gives us data-driven reporting that scales. They don’t give us human insight at machine scale. But that is exactly what we need.
We need systems that can analyze data to derive an initial set of facts about the world, characterize clusters of those facts into patterns that make sense, evaluate those facts and clusters to determine what is more or less important, and generate the language to communicate what has been discovered. These steps are the basis for technology that can take raw numerical and symbolic data and transform them into narratives and reports that can be read and understood by nearly everyone.
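Those four steps (derive facts, characterize patterns, evaluate importance, generate language) can be sketched in miniature. Everything below is hypothetical: toy monthly sales figures rather than real data, and template sentences rather than a production language generator.

```python
# A minimal sketch of the four steps: facts -> patterns -> importance -> language.
sales = {"Jan": 100, "Feb": 110, "Mar": 95, "Apr": 140}  # hypothetical figures

# 1. Derive facts from the raw data: month-over-month changes.
months = list(sales)
facts = [
    {"month": m2, "change": sales[m2] - sales[m1]}
    for m1, m2 in zip(months, months[1:])
]

# 2. Characterize the cluster of facts as a pattern.
rising = sum(1 for f in facts if f["change"] > 0)
trend = "mostly upward" if rising > len(facts) / 2 else "mixed"

# 3. Evaluate importance: biggest absolute swings first.
facts.sort(key=lambda f: abs(f["change"]), reverse=True)

# 4. Generate language for the pattern and the most important facts.
report = [f"Overall, sales trended {trend}."]
for f in facts[:2]:
    direction = "rose" if f["change"] > 0 else "fell"
    report.append(f"Sales {direction} by {abs(f['change'])} units in {f['month']}.")

print(" ".join(report))
```

Real systems of this kind need far richer fact models, domain knowledge, and linguistic machinery, but the shape of the pipeline is the same: the output is a readable narrative, not another table.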
Ironically, this machine-based approach, which is squarely in the realm of Artificial Intelligence or Cognitive Computing, is really the only way we can produce content that is genuinely personal. If I have data describing 1 million client portfolios, the only way I will be able to generate reports for each of these clients is to have a machine do it. And, if I want those reports to be readable by normal people – no offense to data analysts – I will need that machine to be able to process and interpret data and map it onto language.
Once configured, this communication layer will allow people who do not have data analysis skills to see what is happening in the world without the mediation of data experts.
For companies, this capability will allow decision makers to have access to information about their world that used to require teams of analysts and writers to generate. The speed and scale of such systems supports a wealth of new capabilities, including faster personalized communication with clients, new products based on data that’s already at hand, the ability to generate regulatory documents at scale and low cost, and new lines of communication around performance with divisions, retail outlets, and franchises.
The impact of this technology extends well beyond business applications. For government, data on traffic, crime, social services, educational services, and health care can all become specific, neighborhood-level narratives that explain and explore what is happening to anyone who can read, rather than only to those who can calculate.
Likewise, the data associated with us as individuals, including the wealth of data from the emerging Internet of Things, will be transformed into reports that real people can read and understand. Rather than raw data, people will see the stories of their own lives, mapped out by artificial intelligence language systems that look at their data and explain it to them. Data associated with their homes, cars, health, exercise, and fitness will become the clear, clean narratives that tell the stories of their lives.
The days of thinking of data as the end game are over. We now are entering the era of the narrative – narratives generated by systems that understand data, narratives that give us information to support the decisions we need to make about tomorrow.
Data will always be important, but the story of that data is the last mile.