DBTA: New Data Horizons: Data Prep, Data Visualization, and Data Catalogs Are Ready for Prime Time
February 9, 2022
Originally published on DBTA, February 10, 2022
Data management has never been so unfettered—and yet so complicated at the same time. An emerging generation of tools and platforms is helping enterprises to get more value from their data than ever. These solutions now support and automate a large swath of structural activities, from data ingestion to storage, and also enhance business-focused operations such as advanced analytics, AI, machine learning, and continuous real-time intelligence. However, persistent data and systems integration challenges, along with legacy systems and processes, still stand in the way of efforts to build end-to-end data-driven enterprises.
Next-Gen Tools Evolving
Data management and enablement solutions have come a long way, and industry leaders agree that the current generation of tools and platforms is far more advanced than what was available 10, or even 5, years ago. “Today’s technologies are easier to use, more focused on the business user, and designed to reduce the complexity of data silos,” said Dan DeMers, co-founder and CEO of Cinchy. “Some leading-edge tools go even further by enabling data to be originated in a way that prevents new silos altogether. These tools are taking data enablement away from uber-techies and data nerds and truly democratizing the field.”
This shift is spurred by the growth of a “broader audience that is consuming data, from deeply technical users to less technical domain experts,” said Michel Tricot, co-founder and CEO of Airbyte. Now, he added, even “mom-and-pop shops are dealing with large datasets, heterogeneous data, and complex analytics.”
Companies of all sizes and business categories are adopting digital channels and approaches to compete in today’s economy. “Ten years ago, the business world wasn’t truly digitized,” observed Adi Paz, CEO of GigaSpaces. “Most consumers still went to bank branches, placed calls to agents, drove to the supermarket for groceries, and paid cash for their taxi rides.” As digital has become the dominant channel for many businesses, consumers have also gotten used to real-time, frictionless digital touchpoints.
Accordingly, tools and platforms seek to abstract these digital touchpoints from what are often complex and confounding underlying systems. “These tools are based on innovative architecture concepts designed to decouple apps from their systems of records, bypassing the spaghetti structure that companies already have in place and took as a given reality for years,” said Paz.
Consider how far things have come along in just a few years. Five years ago, “ETL for modern data architectures wasn’t solidified enough to guarantee interoperability and enable full data optimization,” said James Beecham, co-founder and CTO with ALTR. “Organizations have been moving toward a vision of one-click, dynamic data integration and enablement that works for the great cloud migration, and that path accelerated during the last 2 years.”
Traditional databases only make up a fraction of a company’s digital knowledge. “Much of it lies out of sight, in the mountains of messages, documents, and files a company shares every single day,” said Kon Leong, CEO and co-founder of ZL Technologies. “The new generation of data enablement is solving the challenge of harvesting this data to answer critical questions that traditional structured data analytics is not capable of answering.”
An important emphasis for today’s solutions is that “data is managed based on rules and policies, thus allowing for real-time validation and curation,” said Radhakrishnan Rajagopalan, global head of technology services at Mindtree, who noted that this improves the ability to refine data cataloging and data discovery. “In contrast, old tools and technologies processed data in silos without knowing the context and the needs of the intended audience.”
In addition, the associated visualization and BI tools landscape is evolving quickly, from “tools that mainly accessed highly structured and formatted pre-modeled data to tools that can leverage the variety of data types or formats as well as the massive volumes,” said Balaji Ganesan, CEO of Privacera. “BI tools evolved and soon included built-in performance optimizers and aggregators to pre-calculate analytical insights. Auto data indexing and search style access also became popular because they made tools easier to use and required less expert skills to perform dashboards and analysis.”
The ability to decouple apps from their original systems of record “liberates the immense data previously stored in siloed systems of records,” said Paz. “This enables businesses to deliver a steady flow of new digital services and use cases at a pace they couldn’t even dream of just a decade ago.”
Democratizing and Elevating
Today’s tools “are designed to be used by anyone capable of asking meaningful questions,” Leong observed. “For example: Who are the most influential people in my organization? Who knows what? How does my company feel? These are just a few examples of today’s business-critical questions that can be addressed with minimal to no data science expertise. A fundamental difference in this approach is that insights come not from manipulating database tables but rather from scanning information created by humans for humans.”
With the plethora of tools available to address various aspects of the data lifecycle, companies are finding it more expedient to move to more all-encompassing platform approaches, be they SaaS or PaaS. This is accelerating a shift away from IT-driven data management and enablement to user-driven approaches, said Peter Jackson, chief data officer of Exasol. “These solutions are aimed at data management and business users rather than IT teams, and users are claiming ownership of the platforms and applications as they are no longer the exclusive domain of IT.” This means that business users without deep technical expertise can more easily use these tools, which has been pivotal in accelerating their productivity, he added.
This next generation of tools “is bringing instant visibility to large datasets that we never had before,” said Aubrie Cunningham, senior vice president of business intelligence and pricing at MedRisk. “This type of transparency allows business leaders to make informed decisions based on real time, and even predictive, data. Businesses are greatly benefiting by catching problems before they happen. They are becoming much more efficient in presenting data, eliminating the need to translate what the data is saying. Dynamic visualizations and alerting features allow for a new level of oversight for leadership.”
Data enablement platforms “have evolved significantly over the last decade,” pointed out Kunal Shah, senior product manager at SAS. “As the volume, velocity, and variety of data increased, so did the need for new versions of data platforms that were capable of storing large amounts of both structured and unstructured data in a centralized repository. These platforms—referred to as data lakes—focus on facilitating prescriptive and predictive analytics. And as cloud adoption increased, a new generation of data platforms was created primarily around cloud data storage and management, specifically around a cloud data warehouse.”
New approaches to data management and enablement are also changing the nature of data managers’ jobs. A common feature seen across many of today’s data solutions is greater automation, freeing data managers from repetitive, and often overwhelming, rote tasks. “Data professionals spend most of their time on manual processes to ingest, clean, and transform data in support of data operations,” said Chris Bergh, CEO of DataKitchen. “Automating these processes slashes maintenance costs and enables data scientists and engineers to focus on analytic insights that address business challenges.”
Add to the mix the increasing volume of low-code and no-code tools in the market. Until recently, “the people employed to run and support data management tools had to be highly skilled developers and data scientists,” said Paz. “With the recent rise of DevOps, data management is shifting to the hands of people with standard software skills. We’re now seeing a trend of low-code and no-code data management tools, utilizing a simple drag-and-drop canvas. So, businesses are becoming less reliant on highly skilled data scientists, and these experts can focus on developing innovative new data management concepts and processes.”
These types of tools appeal not only to citizen developers but to data users as well, said Tricot, noting that the tools are being created to address specific audiences. “These tools are more specialized and enable companies to grow their hiring pool and focus on domain experts instead of focusing on domain experts who have deep technical skills. They enable teams to be up and running faster.” The tools appeal not only to citizen developers and analysts but also to professional developers and data engineers, he added.
Bergh sees an emerging leadership role for DataOps engineers, essentially DevOps engineers who oversee the data pipeline from ingestion to analytics. “If we think of data operations as a factory, then the DataOps engineer is the one who owns the factory assembly line that builds a data and analytic product,” said Bergh. “DataOps engineers introduce automation into a data organization that can improve the productivity of data scientists and analysts by seven to 10 times.”
As with any business or life decision, it’s important to look before you leap. The challenge with data democratization is the sheer volume of data created that needs to be managed. The new generation of tools and platforms may solve the big data problem, but they also “lower the barrier of entry for advanced analytics for more organizations, and empower more consumers to use data,” said Sumit Sarkar, senior director of product marketing at Immuta. “This results in a new bottleneck from increasing users and rules for data use. While our tools are now better, smarter, and faster, the challenge is that there is so much data. And the complexity of storage and retrieval only increases as more users are added.”
Even if there is the ability to handle large volumes of data, “some functions haven’t changed much in 4 decades,” said Tricot. “Ultimately, we need more advances in data integration, which takes up a big portion of operating costs.” Collaboration on data projects is still an area “which is clearly vital but still primitive. We make copies and more copies of numerous documents and datasets, leading to versioning nightmares and potential non-compliance or security vulnerabilities with each new copy. It’s even stranger that it’s almost impossible to separate data from the applications used to generate, collate, or analyze it. Meanwhile, most applications have their own language and data model—a constant obstacle to developers of new applications that need to use that data.”
In addition, many of today’s data integration and enablement tools “still lack essential enterprise capabilities such as a data platform built for AI and, more importantly, data governance,” said Shah. “Effective data governance is a complex problem, and most large enterprises choose a multi-vendor approach for their cloud strategy. This creates an ever-growing list of data, reports, models, and other analytic assets. The topmost priorities of enterprise data stewards are the discovery of these data assets, providing context to the exploding data footprint, cataloging, data lineage, and data protection.”
Data enablement tools and solutions “have yet to exploit the full power of AI and data science disciplines, tools, and technologies,” said Rajagopalan. “They need to be augmented with automation frameworks and data governance standards. The successful implementation of data enablement tools requires a cultural shift and widespread data literacy across the organization, which goes well beyond the scope of such tools and platforms. We foresee a convergence in the data enablement space toward simplifying tools and technologies that are tightly integrated with data governance and standards, cloud platforms, enterprise automation tools, and open source AI, machine learning, and deep learning frameworks.”
In the months and years ahead, there will be “a growing need for data ethics and privacy in the absence of a universal framework,” Shah observed. “As enterprises collect more and more data, new data platforms must provide a way to identify personal and confidential data as an augmented offering.” Look for platforms “that can bring order to the growing data chaos and support evaluating the fitness of data in an efficient, governed, and secure manner,” he advised.
There is an increasing need for global cataloging and governance across varied cloud and on-premise data environments. “The ability to identify, locate, and trace datasets is still a key capability requirement, and this is exacerbated in a hybrid multi-cloud world,” said Anu Mohan, director of product, data integration and management for Vantage Cloud. “Access seems to have improved, in that most tools expect to just read a file or use very basic SQL to import data directly into the data consumption tools. But the non-standard elements of access make it harder. If you have data on-prem and in AWS, you have difficulties. If you switch from AWS to Azure, everything changes. The portability of data, the portability of applications, and the ability to manage across these environments are all challenges to be solved in the next 5 years.”
What’s also missing from most data enablement today “is the capability for organizations to secure the use of data in real time,” said Beecham. “Data enablement is insufficient when its focus is only on the movement of data, and not on how it’s accessed, protected, and transformed within use. Even with a data enablement plan, users can find themselves trapped in separate silos for security, governance, DataOps, analytics, and more—without an approach to data enablement that prioritizes integrated control and security. What’s needed is functionality that secures access to data, especially PII [personally identifiable information], not only when it’s at rest but also when it’s in transit, updated, or cleaned.”
Beecham predicts, however, that making data enablement work as part of a holistic approach to data security and governance is on the horizon and will be available within the next couple of years for the data industry. “Data visibility, control, and security are not just nice-to-haves, or requirements for compliance. They’re the foundation for understanding how data is used across the organization and for making it possible to get the full benefit of business data. When the ability to control and secure data is available across a complicated IT environment, it makes understanding and maximizing the use of all data easier and more valuable for businesses.”
Emerging Tools and Platforms
Industry experts agree that emerging generations of tools and platforms will only grow in their ability to bring enterprise data into a common, unified place. “Over time, these tools will offer improved ease of use as well as built-in data governance processes, data quality controls, and checks,” said Jackson. “They will also provide full interoperability of data between tools and platforms so that no organization will ever have one single platform. Instead, they will have legacy tools that will be deeply integrated or have best-of-breed solutions that allow data to flow freely between the new tools.”
These tools can have a “profound top-line as well as bottom-line impact,” said Rajagopalan. “They help comply with regulations, improve customer experience, and drive businesses with powerful and actionable insights. Businesses do not have to grapple with a maze of data repositories. Instead, they have timely access to transformed, curated, and contextualized data, powered by intuitive technologies and appropriate resources.”
Ultimately, this new generation of data enablement tools and platforms is helping to expand what’s possible with data. “We used to be data-driven; now we’re driven by true intelligence, with business decisions that are better informed than ever before,” said DeMers. “This trend will continue—we’ll retain decision-making accountability but will be increasingly supported by intelligent systems that get ‘smarter’ as organizations improve data connectivity.”