As Southeast Asia continues its rapid digital transformation, businesses are generating vast amounts of data daily, much of which remains underused or even forgotten. This phenomenon, known as "dark data," presents a growing challenge for companies across the region. Despite the increasing focus on data as a valuable asset, a significant portion of it, around 52% of an average organization's unstructured data, remains unused, posing both risks and missed opportunities.
Dark data is often a byproduct of everyday business activities, accumulating through various interactions across digital platforms, devices, and systems. This data, which can include everything from server log files to unstructured information from social media and chat logs, is frequently stored without a clear strategy for its utilization. As organizations transition from legacy systems to modern infrastructures, the problem is exacerbated, with data becoming increasingly fragmented and difficult to manage.
The implications of neglecting dark data are substantial. Not only does it represent a significant waste of resources—consuming storage space and energy—but it also poses serious sustainability and privacy risks. The energy required to store vast amounts of unused data contributes to the digital carbon footprint, challenging global efforts to meet net-zero targets. Additionally, dark data often contains sensitive information that, if left unmanaged, could lead to compliance issues and security vulnerabilities.
In this interview, we delve into the complexities of dark data management with Geoffrey Coley, Regional CTO, Asia Pacific, Veritas Technologies. Geoffrey shares his insights on the challenges businesses face in the digital age, the sustainability and privacy concerns associated with dark data, and the best practices companies can adopt to minimize its generation. As Southeast Asia's digital landscape evolves, understanding and addressing the dark data challenge will be crucial for businesses looking to harness the full potential of their data while mitigating associated risks.
Can you explain what dark data is and why it has become a growing challenge for businesses in the age of digitalisation?
Dark data refers to the unknown, unused or untapped data within an organisation, generated through daily interactions across various devices and systems, including machine data, server log files, and unstructured data from sources like social media and chat logs. It makes up roughly 52% of the average organisation's unstructured data estate.
In this age of digitalisation, dark data has become a growing challenge because, despite the increasing emphasis on data's value, large volumes of it remain underutilised or completely undiscovered. For instance, data fragmentation continues to plague many organisations: data is stored in silos and sprawls across different systems, leaving it isolated and hard to see. Furthermore, employees add to the growing dark data deluge when they store duplicate, redundant or obsolete information without deleting it regularly.
As enterprises continue to generate and accumulate vast amounts of data and move from legacy systems to modern infrastructure, the problem of effectively managing and leveraging dark data intensifies. When data stored in these legacy systems cannot be integrated with the new analytics tools, it often goes dark, making it difficult to harness potential insights and derive value from these overlooked resources. The lack of a robust data governance framework also often contributes to data becoming disorganised, lost or unusable.
What are the sustainability and privacy implications of not addressing dark data?
Digitalisation can contribute to climate change solutions, but storing unused digital data consumes substantial energy. Currently, companies generate 1.3 trillion gigabytes of dark data a day, and storing this data for a year using non-renewable energy generates as much CO2 as 3 million flights from London to New York. With data volumes doubling every two years, the issue of dark data is growing increasingly severe. If left unaddressed, it will be a mounting challenge to meet the global 2050 net-zero targets.
Moreover, as the volume of data grows each year, dark data becomes more likely to go undetected. This undetected data could either be valuable or pose risks, such as non-compliance fines or other liabilities. Dark data often contains sensitive information that must be handled with care to safeguard privacy and ensure compliance with regulations. To address both sustainability and privacy concerns, businesses need to actively monitor and manage data throughout its lifecycle, ensuring data privacy and compliance while contributing to digital decarbonisation efforts.
How has the problem of dark data evolved with the increasing reliance on AI and digital technologies, especially in data-intensive industries?
With the increasing reliance on AI and digital technologies, particularly in data-intensive industries, the problem of dark data has evolved significantly. Rather than viewing dark data as a threat, these industries now see it as a valuable business opportunity. By unlocking this untapped resource, businesses are looking to gain a competitive edge, drive innovation, and enhance data-driven decision-making. To fully leverage the potential of dark data while managing associated risks, businesses should prioritise implementing robust data management strategies. As a first step, organisations will benefit from deploying proper classification tools and policies to understand what data they have, where it is located, who is using it, how many copies exist, whether it is valuable, and more.
AI and other new digital technologies are also useful for organisations that are revisiting their business processes and looking to integrate AI into their data management strategies, with the prospect of enhancing efficiency and minimising human intervention. This results in operational efficiencies, increased uptime, higher service levels and AI-driven insights for effective data archiving and intelligent decision-making.
What types of dark data are most commonly generated in Southeast Asia?
Like their global counterparts, organisations in Southeast Asia face the challenge of managing their data deluge. Gartner defines dark data as the information assets organisations collect, process and store during regular activities but generally fail to use for other purposes such as analytics, business relationships and direct monetising. Depending on its structure and discoverability, dark data is commonly classified into three categories:
- Structured data includes data that is clearly defined and stored in spreadsheets or databases. Examples include server log files, Internet of Things (IoT) sensor data, customer relationship management (CRM) databases, and enterprise resource planning (ERP) systems.
- Unstructured data lacks a predefined format and requires conversion for analysis. Common examples include email correspondences, PDFs, text documents, and social media posts.
- Semi-structured data combines elements of both structured and unstructured data. Examples include HTML code, invoices, graphs, tables, and XML documents.
In your experience, what best practices should companies adopt to minimise the generation of dark data and improve their data management strategies?
Today, a significant proportion of businesses know they are storing irrelevant data but have made little or no effort to delete it. IT leaders must get ahead of the challenge, as data volumes are increasing every year due to the rapid adoption of data-intensive technologies such as AI. Organisations must not only delete data waste but also use that effort to reduce costs and strengthen compliance.
- Data mapping and data discovery are the first steps in understanding how information flows through an organisation. Gaining visibility and insights into where data and sensitive information are being stored, who has access to them and how long they are being retained is critical to eliminating dark data.
- Data minimisation and purpose limitation ensure organisations reduce the amount of data being stored and that what is retained is directly related to the purpose for which it was collected. Proper classification, flexible retention and compliant policy engines also allow confident deletion of non-relevant information.
- Organisations must also report certain types of data breaches to the relevant supervisory authority to ensure continual adherence to compliance standards.
- Harnessing efficient backup and storage methods and advanced deduplication techniques can eliminate redundant data and contribute to a more sustainable data management approach.
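As a rough illustration of the discovery and deduplication practices listed above, the following Python sketch walks a directory tree, groups files with identical content by SHA-256 hash, and flags files that have not been modified within a retention window. It is a minimal sketch and not a description of any vendor's product: the function names `sha256_of` and `scan`, the 365-day default, and the use of modification time as a proxy for "unused" are all assumptions made for the example.

```python
import hashlib
import os
import sys
import time
from collections import defaultdict
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root: Path, stale_after_days: int = 365, now: Optional[float] = None):
    """Return (duplicate_groups, stale_files) for regular files under root.

    Hypothetical helper: "stale" here means only that the file's modification
    time is older than the retention window, which is an assumption of this sketch.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400
    by_hash = defaultdict(list)
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks and anything that is not a regular file.
            if path.is_symlink() or not path.is_file():
                continue
            by_hash[sha256_of(path)].append(path)
            if path.stat().st_mtime < cutoff:
                stale.append(path)
    # Only hashes shared by two or more files represent redundant copies.
    duplicates = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return duplicates, sorted(stale)


if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    duplicate_groups, stale_files = scan(target)
    print(f"{len(duplicate_groups)} duplicate group(s), {len(stale_files)} stale file(s)")
```

A report like this only identifies candidates. Whether a duplicate or stale file can actually be deleted still depends on the classification and retention policies described in the list above.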
What do you see as the future of dark data management, and how can businesses prepare for the increasing data demands of the digital age?
The future of dark data depends largely on future technological advancements and on how effectively we use them. The challenge is that these technologies are ever-evolving, so businesses need to continually adapt their systems and processes. Hence, investment in employee training and development on how to make the best use of these technologies is imperative. An understanding of complex technologies is equally important for resilience, so that businesses can adopt a mindful and balanced approach to how data collection methods are designed and conceptualised, which in turn will significantly reduce the accumulation of dark data from the start.
Additionally, traditional data management methods can no longer keep pace with the increasing volume, velocity and variety of data. Automating data analytics with AI tools, together with tracking and reporting that deliver organisational accountability for dark data, file use and security, becomes crucial for organisations to prevent data loss and ensure policy-based data retention.