Which Industries Use More Unstructured Data?

We found that some industries are much heavier users than others of unstructured data and of externally sourced data, the kind that has more recently flooded into the digital veins of corporate IT networks.

When it came to the use of structured and unstructured data, media and utility companies reported using the highest percentages of unstructured or semi-structured data (and thus the lowest percentage of structured data). In fact, both industries indicated that about two-thirds of their data was unstructured or semi-structured. In comparison, retailers, travel/hospitality/airlines and energy & resources companies had the highest percentage of structured data – all at least 60%. (See Exhibit III-3)

Exhibit III-3: Industry Comparisons on Structure of Data

Q8: Mean Estimated Percentage of Structured, Unstructured and Semi-Structured Data, Across Big Data Initiatives of All Companies


When it came to the percentage of data sourced internally versus externally, the companies with the highest share of external data were in high tech (36%), telecom (35%), heavy manufacturing (32%), and insurance (32%). At the other end were media & entertainment (only 17% of data sourced externally) and consumer goods (20%). (See Exhibit III-4)

Exhibit III-4: Industry Comparisons on Sources of Data (Internal vs. External)

Q9: Mean Estimated Percentage of Data from Internal or External Sources Across Big Data Initiatives of All Companies


What Kinds of Digital Data are Companies Using?


One way that Big Data experts such as Tom Davenport distinguish between the eras of ‘big’ and ‘little’ data is by the type of data companies are using. Big Data is more closely associated with unstructured and external data. But what does this mean? While there are many ways to classify such data, the two most common are:

  • The degree to which the data is ‘structured’. Numerical data (financial, order, and similar data) is regarded as structured: it fits neatly into the columns and rows of modern database management software. ‘Unstructured’ data cannot be compiled so easily into those conventional database formats. It could be digital video, text (increasingly coming from comments on social media sites such as Twitter, Facebook and LinkedIn), digitized audio and other types. To analyze such data, software must first process it into a form that can be analyzed. (‘Sentiment analysis’ is a hot trend in how to treat social media data, e.g., determining people’s sentiments about a company, its products and its practices.)

  • Whether the data is ‘internal’ or ‘external’: is it generated by the company or brought in from outside? For example, an increasing number of companies (particularly retailers and restaurant chains) are seeking external data from telecommunications firms that can track customers’ locations through their mobile devices. The value of this data to retailers is the ability to intercept potential customers who are in the vicinity of their stores with targeted marketing offers that may convince them to walk in.
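
As a rough illustration of what "processing" unstructured text for sentiment can mean, here is a minimal lexicon-based sketch in Python. The word lists (POSITIVE, NEGATIVE, NEGATORS) and the sample comments are hypothetical and deliberately tiny; real sentiment tools rely on much larger curated lexicons or trained models, and nothing here reflects the methods of any company in the study.

```python
import re

# Hypothetical toy lexicons; production tools use far larger curated lists or trained models.
POSITIVE = {"love", "great", "fast", "helpful", "excellent"}
NEGATIVE = {"hate", "slow", "broken", "rude", "terrible"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text: str) -> int:
    """Count positive minus negative word hits, flipping a hit that directly follows a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        polarity = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1 or 0
        if polarity and i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity  # "not terrible" counts as mildly positive
        score += polarity
    return score


def label(text: str) -> str:
    """Map the numeric score to a coarse positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    comments = [
        "Love the new app, checkout is fast!",
        "Support was rude and the site is slow.",
        "Delivery was not terrible.",
    ]
    for comment in comments:
        print(label(comment), "|", comment)
```

Word counting like this is cheap but brittle (sarcasm, context and domain vocabulary defeat it), which is one reason analyzing unstructured text is harder than querying structured fields.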

Defining Types and Sources of Digital Data

In our research, we defined data along two dimensions: structured versus unstructured and internal versus external. The definitions we used are given below.

On the dimension of data structure:

  • Structured – Data that resides in fixed fields (for example, data in relational databases or in spreadsheets)
  • Unstructured – Data that does not reside in fixed fields (for example, free-form text from articles, email messages, untagged audio and video data, etc.)
  • Semi-structured – Data that does not reside in fixed fields but uses tags or other markers to capture elements of the data (for example, XML, HTML-tagged text)
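
To make the three categories concrete, the following minimal Python sketch pulls the same value out of each form: by column name from structured data, by tag from semi-structured data, and by a pattern guess from unstructured text. The order record, field names and helper functions are hypothetical examples, not data from the study.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical example: the same order amount stored three ways.
STRUCTURED = "order_id,customer,amount\n1001,Acme Corp,250.00\n"
SEMI_STRUCTURED = "<order id='1001'><customer>Acme Corp</customer><amount>250.00</amount></order>"
UNSTRUCTURED = "Acme Corp wrote that their order came to $250.00 and arrived late."


def amount_from_structured(text: str) -> float:
    # Fixed fields: read the value by its column name.
    row = next(csv.DictReader(io.StringIO(text)))
    return float(row["amount"])


def amount_from_semi_structured(text: str) -> float:
    # No fixed columns, but a tag marks where the value is.
    return float(ET.fromstring(text).findtext("amount"))


def amount_from_unstructured(text: str) -> Optional[float]:
    # No fields or tags: guess with a pattern, which may miss or misfire.
    match = re.search(r"\$(\d+(?:\.\d{2})?)", text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(amount_from_structured(STRUCTURED))
    print(amount_from_semi_structured(SEMI_STRUCTURED))
    print(amount_from_unstructured(UNSTRUCTURED))
```

The last function is the fragile one: the regular expression finds a dollar figure only when the writer happened to use that format, which is why unstructured data typically needs heavier processing.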

On the dimension of data source:

  • Internal – from a company’s sales, customer service, manufacturing, and employee records; from visits to the company’s website, etc.
  • External – from sources outside a company such as third-party data providers, public social media sites such as Facebook, Twitter and Google+, etc.

Having classified Big Data along these two dimensions, we then wanted to know how much of companies’ data was structured versus unstructured, and how much was generated internally versus externally. We were surprised by the combined results across all four regions of the world that we surveyed:

  • 51% of data is structured
  • 27% of data is unstructured
  • 21% of data is semi-structured

A much higher percentage of data than anticipated was not structured: unstructured and ‘semi-structured’ data combined accounted for 48%, or about half. (See Exhibit II-7)

Exhibit II-7: Percentage of Data that is Structured versus Unstructured
Q8: Mean Estimated Percentage of Structured, Unstructured and Semi-Structured data, across all of the Company’s Big Data Initiatives


And a little less than a quarter of the data was external. (See Exhibit II-8)

Exhibit II-8: Percentage of Data That is Internal versus External
Q9: Mean Estimated Percentage of Data that comes from Internal or External sources, across all of the Company’s Big Data Initiatives


North American companies had the highest percentage of structured data; Asia-Pacific companies had the most unstructured data. North American companies also had the highest percentage of internal data; Asia-Pacific companies had the lowest.

To discover new patterns in Big Data, companies need highly efficient ways to aggregate data across data warehouses and other data stores. Because most data in these stores is structured, analysts can explore it relatively easily. Nor is it difficult to create structured data out of semi-structured data such as web activity.
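
As one illustration of turning semi-structured web activity into structured rows, here is a minimal Python sketch. It assumes the activity is recorded as web-server access log lines in Common Log Format; that format, the sample lines and the PageView field names are assumptions for illustration, since the study does not say how respondents captured web data.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumed input format: Common Log Format access-log lines.
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


@dataclass
class PageView:
    host: str
    time: str
    method: str
    path: str
    status: int
    size: int  # bytes sent; a "-" in the log becomes 0


def parse(lines):
    """Turn log lines into fixed-field PageView rows, skipping lines that do not match."""
    rows = []
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue
        d = m.groupdict()
        size = 0 if d["size"] == "-" else int(d["size"])
        rows.append(PageView(d["host"], d["time"], d["method"], d["path"], int(d["status"]), size))
    return rows


# Hypothetical sample log lines.
LOG = [
    '203.0.113.7 - - [05/Jul/2012:10:01:02 +0000] "GET /products/42 HTTP/1.1" 200 5120',
    '198.51.100.3 - - [05/Jul/2012:10:01:09 +0000] "GET /cart HTTP/1.1" 200 830',
    '203.0.113.7 - - [05/Jul/2012:10:02:15 +0000] "GET /products/42 HTTP/1.1" 304 -',
    'malformed line',
]

if __name__ == "__main__":
    views = parse(LOG)
    # Once the rows have fixed fields, ordinary aggregation applies.
    print(Counter(v.path for v in views).most_common())
```

The regular expression does the work of the "tags or other markers" in the semi-structured definition: the brackets, quotes and field order tell the parser where each value sits.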

However, unstructured data (for example, free-form text, video, audio, and image data, where context needs to be derived from the data itself) is much harder to interpret. Text is the most sought-after form of unstructured data right now, and natural language processing (NLP) can derive context from it that goes beyond typical sentiment analysis. That said, some text data (particularly tweets on Twitter) is fairly semi-structured: hashtags give some sense of context, while @-mentions and retweets provide references to people. Facebook posts, blog posts, and other free-form text are more difficult to analyze, as noted above, although tags and other metadata can help narrow down the context of a comment.
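
The markers that make tweets partly semi-structured can be pulled out with simple patterns. The following Python sketch is a simplification, assuming hashtags are `#` followed by word characters, mentions are `@` followed by up to 15 word characters, and a retweet begins with "RT @"; Twitter's own entity rules are more involved, and the sample tweet is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Simplified patterns (assumptions); the lookbehind keeps e-mail addresses
# like jo@example.com from being read as mentions.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")
MENTION = re.compile(r"(?<!\w)@(\w{1,15})")


@dataclass
class TweetMarkers:
    is_retweet: bool
    hashtags: List[str]  # context cues, lower-cased for grouping
    mentions: List[str]  # references to people or accounts


def extract_markers(text: str) -> TweetMarkers:
    return TweetMarkers(
        is_retweet=text.startswith("RT @"),
        hashtags=[tag.lower() for tag in HASHTAG.findall(text)],
        mentions=MENTION.findall(text),
    )


if __name__ == "__main__":
    tweet = "RT @acme_support: Loving the new #BigData dashboard! cc @jane #analytics"
    print(extract_markers(tweet))
```

Everything outside these markers is still free-form text and would need NLP to interpret.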

In the interviews that our research team conducted, many executives said their companies’ use of unstructured data is not only increasing but also becoming essential. “Studies have been done on electronic records that show, on average, 80%-90% or more of data in records is unstructured data,” one health care executive said. “That requires natural language processing to extract information.” He said much of the health care industry is trying to improve how it captures and analyzes unstructured data such as images, emails, and physicians’ and nurses’ notes.

Companies are increasingly looking to external data to get a fuller picture of activities that might affect them – particularly customer behavior. The soaring use of mobile devices now provides companies with data that, at least in theory, can help them track customer movements. This kind of external data is fully on the radar of global companies.

The head of Ford Motor Company’s analytics group, John Ginder, put it this way to one trade magazine: “We recognize that the volumes of data we generate internally … as well as the universe of data that our customers live in and that exists on the Internet … are huge opportunities for us that will likely require some new specialized techniques or platforms to manage.” Internet data that consumers provide appears to be of particular interest. “The fundamental assumption of Big Data is the amount of that data is only going to grow and there’s an opportunity for us to combine that external data with our own internal data in new ways. For better forecasting or better insights into product design, there are many, many opportunities.”[1]

Who is Selling Their Big (Digital) Data?

With companies capturing so much more digital data today to understand their operational performance moment by moment, the behavior of customers and suppliers, and other vital signs of the business, this data has begun to raise both opportunities and concerns. Executives are looking for data their organization holds that might be valuable to another organization and from which the firm could profit. That’s the opportunity side.

In 2012, about one-quarter of the companies we surveyed (27%) were capitalizing on this opportunity by selling their digital data. U.S. companies were the least likely to do so, with only 22% selling such data. In contrast, half of the Asia-Pacific companies we polled said they sell their digital data. About one-quarter of European and Latin American companies sold their digital data in 2012. (See Exhibit II-9)

Exhibit II-9: Who’s Selling Their Digital Data?
Q10: Percentage of Companies that Sell their Digital Data?


For the approximately one-quarter of companies that sell their digital data, how lucrative is it? Our survey found that the annual revenue from selling such data was not trivial: in 2012, selling digital data contributed an average of $21.6 million per company to revenue. (See Exhibit II-10)

Exhibit II-10: How Much Money are Companies Generating From the Data They Sell?
Q13-a: Mean Annual Revenue Per Company in 2012 from Selling Digital Data


So clearly, some companies are profiting from their data, albeit a distinct minority today. However, of the 73% of companies that did not sell such data, 22% said they plan to sell it by 2015, 55% said they do not, and 23% did not know. If those plans hold, about 43% of companies will be selling their digital data by 2015 (the 27% that already do today, plus 22% of the 73% that don’t).
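
Written out, the 43% projection is a weighted sum, assuming every company that said it plans to sell by 2015 actually does so and no current seller stops:

```latex
\[
\underbrace{27\%}_{\text{sell today}}
+ \underbrace{0.22 \times 73\%}_{\text{plan to start by 2015}}
= 27\% + 16.06\% = 43.06\% \approx 43\%
\]
```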



  1. Jason Hiner, ZDNet, “Ford’s Big Data chief sees massive possibilities, but the tools need work,” July 5, 2012.

The Spotlight Continues to Shine on Big Data

Big Data has become big news almost overnight, and there are no signs that interest is waning. In fact, several indicators suggest executive attention will climb even higher. Over the last three years, few business topics have been covered in the media and researched as extensively as Big Data. Hundreds of articles have appeared in the general business press (for example, Forbes, Fortune, Bloomberg BusinessWeek, The Wall Street Journal, The Economist), technology publications and industry journals, and more seem to be written by the day. A March 2013 search on Amazon.com surfaced more than 250 books, articles and e-books on the topic, most of them published in the last three years.

Dozens of studies have been conducted on Big Data as well, and every week another one appears. Most of the big consulting firms and IT services companies have weighed in, as well as (of course) the technology research community: Gartner, Forrester, IDC and many of the rest. And the number of times the seven letters “Big Data” have been typed into a Google search has exploded in the last three years. (See Exhibit I-1)

Exhibit I-1: A Billowing Number of “Big Data” Online Searches

The term saw relatively little usage in online searches in the first half of 2010. This Google Trends data also shows that ‘Big Data’ was of particularly keen interest in certain countries and regions (Exhibit I-2): India, South Korea, the U.S., Australia, Canada, Western Europe, and Brazil.

Exhibit I-2: Regional interest for “Big Data” Online Searches

The Emerging Big Returns on Big Data:

In 2012, we launched our own study on Big Data. We designed it to shed light on six core issues, ones on which we felt the marketplace was looking for greater clarity:

  1. How much are companies investing in Big Data, and what kinds of returns are they achieving on their spending?
  2. What are companies in 12 industries doing with Big Data? That is, in which business functions and specific activities are they focusing their investments?
  3. What kinds of digitized data are they finding to be the most important?
  4. How are they organizing the professionals who process and analyze Big Data (e.g., embedded in business functions, in a central analytics group, etc.), and what are the pros and cons of those reporting relationships?
  5. What are the biggest challenges of turning Big Data into insights that enable the company to make far better and faster decisions?
  6. What is the current state of the technology, and where is it going?

