Would you choose where to go on vacation if you could only access 10 to 20 percent of the reviews and information on a travel website? If you did, you would probably have an unforgettable trip, but for reasons you might not like. Yet government organizations and businesses, from manufacturing to insurance and from healthcare to banking, are making decisions in this very way. And they have been doing so for years. They look at the information they can easily get from structured data while ignoring their unstructured data, which Deloitte believes may account for 80 to 90 percent of content generated globally, making unstructured data a tremendous source of untapped value.
In an HTML file, on the other hand, the structure does not always reveal the deeper meaning. I can probably figure out that a particular piece of data is a title when it is found within a <title> / </title> tag set. I may know that another piece of data should be underlined or emphasized because of how it is tagged, but I would not know with any confidence why. Presumably this information is important, but at this level of structural understanding we run out of clues as to what that importance can be attributed to. Of course, this was by design.
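To make the point concrete, here is a minimal sketch using Python's standard-library html.parser module. The class name TagCollector, the helper collect_tagged_text, and the sample page are all hypothetical, chosen only for illustration. The parser can report which text sits inside title, emphasis, or underline tags, but nothing in the markup says why that text was singled out.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the text found inside a few tags of interest (illustrative only)."""

    def __init__(self, tags_of_interest):
        super().__init__()
        self.tags_of_interest = set(tags_of_interest)
        self.open_tags = []  # stack of currently open tags of interest
        self.found = []      # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.tags_of_interest:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when it sits inside one of the tags we track.
        if self.open_tags and text:
            self.found.append((self.open_tags[-1], text))


def collect_tagged_text(html, tags=("title", "em", "u")):
    """Return (tag, text) pairs for the given tags; hypothetical helper."""
    parser = TagCollector(tags)
    parser.feed(html)
    parser.close()
    return parser.found


# Hypothetical sample page, not taken from any real site.
sample = (
    "<html><head><title>Quarterly Report</title></head>"
    "<body><p>Revenue was <em>below forecast</em> in <u>Q3</u>.</p></body></html>"
)

print(collect_tagged_text(sample))
```

The output tells us that "below forecast" was emphasized and "Q3" was underlined. Whether that signals a warning, a highlight, or mere styling is exactly the meaning the structure does not carry.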
Quoting Wikipedia: Unstructured data (or unstructured information) refers to information that either does not have a pre-defined data model or does not fit well into relational tables. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional computer programs as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents. Most big data sources, including Facebook and Twitter, hold unstructured data, and almost no analytics tools can work directly on data in that form.
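The irregularities mentioned above are easy to see in a small sketch. The example below assumes a hypothetical free-text customer note and two hand-written regular expressions (DATE_PATTERN and AMOUNT_PATTERN are illustrative names, not part of any library). It pulls out the dates and amounts that happen to follow a rigid format and misses everything else.

```python
import re

# Illustrative patterns: ISO-style dates and dollar amounts only.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")


def extract_fields(text):
    """Return the dates and amounts the patterns can recognize in free text."""
    return {
        "dates": DATE_PATTERN.findall(text),
        "amounts": AMOUNT_PATTERN.findall(text),
    }


# Hypothetical note, invented for this example.
note = (
    "Customer called on 2023-04-17 about a $1,250.00 charge; "
    "refund promised by next Friday."
)

print(extract_fields(note))
```

The rigidly formatted date and amount are found, but "next Friday" is also a date and the patterns never see it. Fielded data in a database would not have this ambiguity, which is why traditional programs struggle with text like this.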
The amount of data generated daily is mind-boggling, and as much as 90 percent of that data is defined as unstructured. But what does that mean, and what do you need to know about unstructured data? Data that is defined as unstructured is growing at 55 to 65 percent each year. Unstructured data can't be easily stored in a traditional row-and-column database or in a spreadsheet such as a Microsoft Excel table.