What is Zero-Copy, and what can you learn from the Canadian standard for data integration?
March 1, 2023
Originally published in Dutch on IT Daily; the following is a translated copy.
Many copies, little control: today most organizations do not handle data very efficiently. In Canada, CIOs are coming together around a new standard: Zero-Copy Integration. This should put an end to data copies and help organizations to remain in compliance with rules such as the GDPR.
Is Canada solving the proliferation of data? The country is at least making an attempt with Zero-Copy Integration. Zero-Copy is a standard developed by the Data Collaboration Alliance, together with Canadian government agencies and CIOs. The standard is meant to ensure that organizations can roll out modern digital solutions without the management of the associated data becoming a mess. Although Zero-Copy is a Canadian initiative, the framework provides an excellent basis for digital transformation worldwide.
More need for the same data
The Zero-Copy Integration standard was born in a context where companies and customers increasingly demand digital experiences, ideally connected ones. This causes a proliferation of applications that require access to the same data. To provide it, data from sometimes hundreds of systems is copied into databases tailored to a few applications. Developers follow this approach, often out of necessity, not only for external applications: databases are also copied en masse internally to support different applications.
“That wastes almost half of the IT budget for companies worldwide,” says Dan DeMers, President of the Data Collaboration Alliance. “Moreover, it is the reason that citizens and companies no longer have control over their own data.” In 2020 we already wrote that 82 percent of companies use more than ten copies of databases. In general, 65 percent of all data in databases is not unique.
Yet it doesn't have to be this way. Several digital solutions available today avoid the need to copy data en masse to provide certain functionality. Large and progressive organizations worldwide are already using them. Snowflake, for example, earns its keep with a cloud-based solution where applications, organizations and people, both external and internal, get access to data from the same central dataset.
The Canadians are moving forward with a standard that encourages such practices and puts them in a framework. “The notion of having to copy data to share it must go,” explains Keith Jansa, Executive Director of CIO Strategy Council in Canada. Zero-Copy Integration takes the form of a set of six principles that organizations, developers, IT administrators and data architects must follow to prevent unnecessary copies.
- Data centricity and metadata are prioritized over complex code.
- Modularity is preferred over monolithic design.
- Data is managed through a shared data architecture, not across app-specific databases.
- Universal access control is enforced via the data layer.
- Data governance runs through data products and federated access, instead of centralized teams.
- Data is shared through collaboration based on access rights, not through copies.
Get rid of the app database
That deserves a little explanation. With data centricity, Zero-Copy Integration wants organizations to see data as their most important asset. The data is permanent, while applications come and go. The data architecture thus takes priority over application development and is actually separate from it. Data sits in a virtual library, where applications can consult it.
Modularity builds on that. By developing a modular environment, it is easier to work with those central data sets. This contrasts with rigid monolithic app development, where a private (copied) database is more difficult to avoid.
The third point extends that principle. When applications are modular and data is a central primary asset, app-specific databases should no longer exist: all data deserves its place in the central database.
This implies that security and policy must take place at the level of the data. You give access to accounts and applications via the data layer. This prevents a mixed bag of applications with multiple accounts from providing different access rights to what is often the same (copied) data. By centrally assigning rights to all data, you as an organization maintain overview and control.
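To make the idea of access control at the data layer more concrete, here is a minimal sketch in Python. The `DataLayer` class, the application names and the field names are all hypothetical and are not taken from the Zero-Copy standard or from any product. The point it illustrates is that every application reads the same central record, and the data layer decides which fields each one may see.

```python
class DataLayer:
    """Hypothetical central store that enforces access rights itself."""

    def __init__(self):
        self._records = {}  # record id -> dict of fields
        self._grants = {}   # application name -> set of readable fields

    def put(self, record_id, record):
        self._records[record_id] = dict(record)

    def grant(self, app, fields):
        # Rights are assigned once, centrally, for every application.
        self._grants.setdefault(app, set()).update(fields)

    def read(self, app, record_id):
        allowed = self._grants.get(app, set())
        if not allowed:
            raise PermissionError(f"{app} has no access rights")
        record = self._records[record_id]
        # Only the fields this application is entitled to are returned.
        return {k: v for k, v in record.items() if k in allowed}


layer = DataLayer()
layer.put("c1", {"name": "Ada", "email": "ada@example.com", "iban": "BE00"})
layer.grant("newsletter", {"name", "email"})
layer.grant("billing", {"name", "iban"})

newsletter_view = layer.read("newsletter", "c1")
billing_view = layer.read("billing", "c1")
```

Because the rights live in one place, answering the question of who can see the IBAN means inspecting a single table of grants instead of the settings of every application.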
In that structure it is possible to place control over data policy with the experts who have knowledge about it. This can be done by developing a policy in which individual teams can work within the margins of their own access rights, for example with the rollout of an application. The approach contrasts with centralized access management performed by one data team, which has ultimate and final control over all details. Of course, someone has final responsibility, but a federated system ensures that teams lower on the rights ladder can still color within certain lines as they wish.
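Federated governance of this kind can be sketched as delegation within limits. The following Python example is only an illustration under assumptions: the `FederatedRights` class and the team names are invented here, and the standard does not prescribe this mechanism. It shows a team passing rights on to an application, but never more than the team itself holds.

```python
class FederatedRights:
    """Hypothetical rights ladder: rights can be passed down, never widened."""

    def __init__(self, root, root_fields):
        # The party with final responsibility starts with all rights.
        self._rights = {root: frozenset(root_fields)}

    def delegate(self, giver, receiver, fields):
        fields = frozenset(fields)
        held = self._rights.get(giver, frozenset())
        if not fields <= held:
            raise PermissionError(f"{giver} cannot delegate rights it does not hold")
        self._rights[receiver] = self._rights.get(receiver, frozenset()) | fields

    def can_read(self, who, field):
        return field in self._rights.get(who, frozenset())


rights = FederatedRights("data-office", {"name", "email", "iban"})
rights.delegate("data-office", "marketing", {"name", "email"})
# The marketing team decides on its own what its application may see.
rights.delegate("marketing", "newsletter-app", {"email"})

try:
    rights.delegate("marketing", "newsletter-app", {"iban"})
    blocked = False
except PermissionError:
    blocked = True
```

In this sketch the central data office never has to approve the newsletter application itself; it only sets the margins within which the marketing team is free to act.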
Finally, the Zero-Copy standard stipulates that all access to data must be done via policies and access rights. There is no longer a valid reason to provide internal or external rights holders with a data copy, which can then take on a life of its own. The owner of the data thus retains final control at all times.
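The difference between sharing through access rights and sharing through copies can be shown in a few lines of Python. This is a simplified sketch: `SharedDataset`, the party name and the address values are hypothetical, and a real deployment would involve a database or data platform rather than an in-memory dictionary.

```python
import copy


class SharedDataset:
    """Hypothetical dataset that is shared by granting access, not by copying."""

    def __init__(self, data):
        self._data = data
        self._allowed = set()

    def allow(self, party):
        self._allowed.add(party)

    def revoke(self, party):
        self._allowed.discard(party)

    def update(self, key, value):
        self._data[key] = value

    def get(self, party, key):
        if party not in self._allowed:
            raise PermissionError(f"{party} has no access")
        return self._data[key]


dataset = SharedDataset({"address": "Old Street 1"})
dataset.allow("partner")

# Copy-based sharing, shown only for contrast: the copy leaves the owner's control.
handed_over_copy = copy.deepcopy(dataset._data)

dataset.update("address", "New Street 2")
live_value = dataset.get("partner", "address")  # always the current value
stale_value = handed_over_copy["address"]       # still the old value

dataset.revoke("partner")
try:
    dataset.get("partner", "address")
    revoked = False
except PermissionError:
    revoked = True
```

After the update, the partner reading through access rights sees the new address, while the copy has quietly gone stale. After the revocation, access through the dataset ends immediately, whereas the copy cannot be recalled.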
The standard also implies a cloud-first approach when it comes to external access, since that is the only way to give external parties efficient access to data. Applications that rely on the central database expect responsiveness. If there is a traffic jam when viewing or modifying data in the central repository, the temptation returns to quickly make a copy for an app-specific database, and the entire plan for Zero-Copy Integration falls apart.
Zero-Copy Integration has numerous advantages: extra copies no longer take up unnecessary HDD or SSD space, there are no more conflicting versions of data, access control is simple, the overview is retained and audits for compliance become possible.
The Canadians see the standard as relevant to companies, the government and citizens. For companies, a modern data architecture offers a flexible way to deal with digital transformation. By separating data and applications in a well-thought-out manner, it becomes easier to build new applications and dispose of old ones. When Zero-Copy Integration is widely embraced, an architecture is created that allows data to be shared securely and in a controlled manner with external parties. Once again we think of the ambitions of Snowflake, which with its Data Cloud allows companies to combine external data sets with their own data, or vice versa: they can make their data available to third parties.
If we look at individual users, Zero-Copy is a standard that fits in with Tim Berners-Lee's Solid story. Solid is a data architecture that relies on data vaults. These vaults form a central repository of data that remains under the control of the data owner. In line with the Zero-Copy principle, (external) applications can use the data, but the vault remains the central storage location.
The Flemish government is not lagging behind the Canadian government in that regard. In Canada, Zero-Copy Integration is becoming a standard with broad support from the local IT world, while in Flanders plenty of experiments with Solid are under way. Both initiatives try to provide a solution for different facets of the same problem.