Published on the 19/09/2023 | Written by Heather Wright
Unstructured, untapped and undervalued…
Ninety percent of data is unstructured, and less than half of it is shared between employees or systems, or used after initial collection, leaving dark data that is full of untapped value and security risks.
That’s according to an IDC report, commissioned by content management, collaboration and file sharing vendor Box, which says 41 percent of those surveyed admitted that the majority of data in their company is used just once, then left unaccessed.
Some 57,280 exabytes of unstructured data (an exabyte being one billion gigabytes) was created globally last year, with the figure expected to soar 28 percent to more than 73,000 exabytes this year. It comes in many forms, from documents and PDFs to videos, images and audio clips. It’s in purchase orders, product inventories, import and export records, sales agreements, marketing content, contracts, patents, patient treatment notes, financial earnings reports, employee performance records, design and engineering documents, product specifications, product roadmaps, emails, meeting transcripts, notes, presentations and instant messages.
And it is a veritable gold mine, particularly in the age of analytics and generative artificial intelligence, which relies on large language models trained on massive volumes of data.
Yet despite frequent incantations of ‘data is the new gold/oil’, little is apparently being done to mine that gold or drill for that oil. And just like gold and oil, data in its raw form isn’t all that useful.
The report, Untapped Value: What Every Executive Needs to Know About Unstructured Data, says the siloed nature of the data is key to the issue: it is created, replicated, stored and managed in myriad applications, tools and systems, with little cataloguing of data sources across organisations.
Companies also cited as key issues the rapid growth of unstructured data, which is outpacing their ability to use, process or manage it, and the increasing variety of the data.
For companies and their staff, the sprawl of unstructured data is leading to time-consuming searches for information and to the replication of content such as slide decks, project plans and operating procedures. IDC notes that 22 percent of unstructured data is unnecessarily replicated because people can’t find it or don’t know it exists.
And the productivity loss is just one side of the coin, with IDC flagging that siloed and sprawled data leaves businesses susceptible to security and compliance risk as it is ‘nearly impossible’ to protect decentralised data. Fifty-one percent of the businesses surveyed reported non-compliance with data regulations in the past 12 months.
But getting funding for projects that leverage unstructured data is a big challenge. Respondents cited a lack of understanding of the value of unstructured data among IT and line-of-business management as among the top issues, alongside the difficulty of quantifying return on investment and a lack of experience or expertise with unstructured data technology.
To seize the opportunity and value of unstructured data, the report says companies must address four factors that affect, and are affected by, unstructured data: complexity, business risks, compliance challenges and productivity.
For companies that had managed to use their unstructured data in the past 12 months, benefits included improved customer satisfaction, engagement and retention, innovation and employee productivity, alongside improved data governance.
While much of the report is focused on harnessing data for generative AI, noting that only three percent of respondents weren’t considering deploying the technology, other more traditional IT offerings can also benefit from access to the data.
“To not only remain competitive, but to thrive in the era of AI, organisations must treat their data as an asset,” the report says.
While that’s already the case for structured data, which attracts 60 percent of tech spend, there’s much work to be done on the unstructured side.
“Organisations are interested in investing more in unstructured data initiatives as they realise the need for such data in training GenAI models. They recognise the risks and costs involved with not investing more in technology, skills and processes related to deriving value from unstructured data.”
The report urges companies to take stock of their unstructured data and the processes that rely on it and, unsurprisingly given that the report is funded by Box, suggests evaluating the latest tech platforms for unifying unstructured data, assessing each platform’s scalability, performance, manageability, interoperability and security.
Deploying a data classification scheme to support unstructured data access and utilisation is also urged, along with initiating or expanding data literacy programs for employees ‘to facilitate better interaction with AI-infused data solutions’.
“Centralising unstructured data should be a top priority for IT leaders everywhere,” Box says. “IDC’s research backs this up, as more than half of respondents noted that implementing a unified, governed, secure, accessible unstructured data platform would have a positive impact on key metrics like cost and innovation (92 percent), as well as security (80 percent).
“Failing to manage your unstructured data results in fragmentation of your content, application sprawl, lost productivity and most of all, real business risk.
“Those who prioritise, manage and secure unstructured data will gain a distinct advantage and those who don’t will be left behind.”