AI will soon oversee its own data management

By Arthur Cole

AI thrives on data. The more data it can access, and the more accurate and contextual that data is, the better the results will be.

The problem is that the global digital footprint now generates so much data that it would take millions, if not billions, of data scientists to crunch it all, and even then it would not happen fast enough to make a meaningful impact on AI-driven processes.

AI helping AI

This is why many organizations are turning to AI to help scrub the data that AI itself needs to function properly.

According to Dell’s 2021 Global Data Protection Index, the average enterprise is now managing ten times more data than it did five years ago, with the average load skyrocketing from “just” 1.45 petabytes in 2016 to 14.6 petabytes today. With data being generated in the datacenter, the cloud, the edge, and on connected devices around the world, we can expect this upward trend to continue well into the future.
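To put those two figures in perspective, the arithmetic can be sketched in a few lines of Python. Only the 1.45 and 14.6 petabyte values come from the Dell index cited above. The five-year span (2016 to 2021), the function names, and the assumption of steady compound growth are illustrative choices made here, not claims from the report.

```python
# Figures quoted from the Dell index; everything else is an illustrative assumption.
START_PB = 1.45   # petabytes managed in 2016
END_PB = 14.6     # petabytes managed in 2021
YEARS = 5         # assumed span between the two measurements


def growth_multiple(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Yearly rate that would produce the same growth if it compounded steadily."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    print(f"Growth multiple: {growth_multiple(START_PB, END_PB):.2f}x")
    print(f"Implied annual growth: {annual_growth_rate(START_PB, END_PB, YEARS):.1%}")
```

Under that steady-compounding assumption, a tenfold rise over five years works out to data volumes growing by a little under 60 percent each year.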

In this environment, any organization that isn’t leveraging data to its full potential is throwing money out the window. So going forward, the question is not whether to integrate AI into data management solutions, but how.

AI brings unique capabilities to each step of the data management process, not just through its ability to sift through massive volumes looking for salient bits and bytes, but through the way it can adapt to changing environments and shifting data flows. For instance, according to David Mariani, founder and chief technology officer of AtScale, in the area of data preparation alone, AI can automate key functions like matching, tagging, joining, and annotating. From there, it is adept at checking data quality and improving integrity before scanning volumes to identify trends and patterns that otherwise would go unnoticed. All of this is particularly useful when the data is unstructured.
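For readers who want a concrete picture of the preparation steps named above (matching, tagging, joining, and a basic quality check), here is a minimal Python sketch. It is not AtScale's method or any vendor's product. It stands in simple fuzzy string matching from the standard library for a trained model, and the record layout, field names, function names, and 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and drop punctuation so near-duplicate names compare cleanly."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity score between 0.0 and 1.0 for two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_and_join(customers, orders, threshold=0.85):
    """Join each order to its best-matching customer and tag the outcome.

    Orders with no customer scoring at or above the (assumed) threshold are
    tagged 'needs_review' rather than discarded.
    """
    joined = []
    for order in orders:
        best = max(
            customers,
            key=lambda c: similarity(c["name"], order["customer"]),
            default=None,
        )
        score = similarity(best["name"], order["customer"]) if best else 0.0
        if best is not None and score >= threshold:
            joined.append({**order, "customer_id": best["id"], "tag": "matched"})
        else:
            joined.append({**order, "customer_id": None, "tag": "needs_review"})
    return joined


def quality_issues(rows, required):
    """List (row_index, field) pairs where a required field is missing or empty."""
    return [
        (i, field)
        for i, row in enumerate(rows)
        for field in required
        if row.get(field) in (None, "")
    ]


if __name__ == "__main__":
    # Hypothetical sample records, invented for illustration only.
    customers = [{"id": 1, "name": "Acme Corp."}, {"id": 2, "name": "Globex Inc"}]
    orders = [
        {"order": "A-100", "customer": "ACME Corp", "amount": 250.0},
        {"order": "A-101", "customer": "Initech", "amount": None},
    ]
    joined = match_and_join(customers, orders)
    for row in joined:
        print(row)
    print(quality_issues(joined, ["customer_id", "amount"]))
```

The point of the sketch is the shape of the workflow: records that cannot be matched or that fail a quality check are flagged for attention instead of being silently dropped.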

One of the most data-intensive industries is health care, with medical research generating a good share of the load. Small wonder, then, that clinical research organizations (CROs) are at the forefront of AI-driven data management, according to Anju Life Sciences Software. For one thing, it’s important that data sets are not overlooked or simply discarded, since doing so can throw off the results of extremely important research.

Machine learning is already proving its worth in optimizing data collection and management, often preserving the validity of data sets that would normally be rejected due to collection errors or faulty documentation. This, in turn, produces greater insight into the results of trial efforts and drives greater ROI for the entire process.

Mastering the data

Still, many organizations are just getting their new master data management (MDM) suites up and running, making it unlikely they will replace them with new intelligent versions any time soon. Fortunately, they don’t have to. According to Open Logic Systems, new classes of intelligent MDM boosters are hitting the channel, giving organizations the ability to integrate AI into existing platforms to support everything from data creation and analysis to process automation, rules enforcement, and workflow integration. Many of these tasks are trivial and repetitive, so automating them frees up data managers’ time for higher-level analysis and interpretation.

This trend toward deploying AI to manage the data it needs to perform other duties in the digital enterprise will change the nature of work for data scientists and other knowledge workers. Rather than handling routine data tasks themselves, people will focus on monitoring the results of AI-driven processes and making changes should they veer from defined objectives.

More than anything, however, AI-driven data management will speed up the pace of business dramatically. Data is king in the digital universe, and kings don’t like to wait.

Via VentureBeat.com