
Apparity Perspective

Hidden Data Puts GDPR Compliance at Risk

March 15th 2018

The need for data lineage (the ability to trace how data moves through your organization, and how it gets changed, and by whom, during that process) has been well established in IT departments for years. It is standard fare when it comes to data integration tools, data governance and so on. However, it is far less commonly applied when it comes to end-user computing (EUC) software such as spreadsheets and Access databases.

That’s an issue, particularly when it comes to regulatory compliance and, especially, with respect to GDPR (the General Data Protection Regulation). Essentially, the problem is that data in one spreadsheet, or other EUC resource, is often copied or merged into other spreadsheets or resources. In addition, that data may have been sourced from elsewhere – either externally (a marketing list, say) or internally (from a database or application) – and data may also be exported from your EUC environment into a database or application, or perhaps a data lake. From there, the data may be processed or analysed and passed on yet again, even back to another spreadsheet.

The problem for regulatory compliance is that you need to track these movements of data. If a customer exercises their right to erasure under GDPR, then their personal data needs to be removed from all of the places where it resides. Simply erasing the data from the original source will not be enough, so you need to know about every place that has been touched by this data.

The traditional approach to this problem has usually relied on using discovery tools to find out where private data resides and then using data matching technology to bring information together. This allows you to understand that this data element in this place reflects the same customer as the data in that place. In other words, you aim to find all the private data and then join it together. The problem with this approach is that it is both time-consuming and expensive.
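
To make the discover-then-match approach concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the resource names, the cell values, and the use of an email address as the matching key. Real discovery and matching tools use far richer rules than a single regular expression.

```python
import re
from collections import defaultdict

# Deliberately simple pattern for email-like values (illustrative only).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical contents: resource name -> cell values found in it.
RESOURCES = {
    "leads.xlsx": ["Jane Doe", "Jane.Doe@example.com", "London"],
    "crm_export.csv": ["jane.doe@example.com ", "J. Doe"],
    "budget.xlsx": ["Q1", "12000"],
}

def discover(resources):
    """Discovery step: yield (resource, email) for every email-like value."""
    for name, cells in resources.items():
        for cell in cells:
            for email in EMAIL_RE.findall(str(cell)):
                yield name, email

def match_by_email(hits):
    """Matching step: group resources by a normalised (lower-case) email."""
    grouped = defaultdict(set)
    for name, email in hits:
        grouped[email.strip().lower()].add(name)
    return {email: sorted(names) for email, names in grouped.items()}

if __name__ == "__main__":
    for email, places in match_by_email(discover(RESOURCES)).items():
        print(email, "->", places)
```

Note that every resource has to be scanned, including those such as `budget.xlsx` that turn out to hold no private data at all, which is where the time and cost of this approach come from.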

Even supposing that relevant tools have the ability to work with spreadsheets (most were designed to work with relational databases), this is not an efficient approach. If you do not know which spreadsheets contain private data, you have to look at them all. And there could be tens of thousands of spreadsheets to look at, so the scale of the problem tends to be much greater than it is for databases. A far better approach is to identify private data as it first enters any particular spreadsheet and then follow it – using data lineage – as it moves across your organization. In the event of a request for erasure, you know all the places to go to.
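
The lineage-based alternative can be sketched as a graph walk. The following Python example assumes a hypothetical log of copy events, each recording that data moved from one resource to another; the resource names and the log format are invented for illustration and do not describe any particular product. Given the resource where private data first entered, it lists every place an erasure request would need to reach.

```python
from collections import deque

# Hypothetical lineage log: (source, target) means data was copied
# or exported from source into target.
LINEAGE_EDGES = [
    ("marketing_list.csv", "leads.xlsx"),
    ("leads.xlsx", "q1_forecast.xlsx"),
    ("leads.xlsx", "crm_database"),
    ("crm_database", "data_lake"),
    ("data_lake", "churn_analysis.xlsx"),
    ("budget.xlsx", "board_pack.xlsx"),
]

def build_graph(edges):
    """Map each source to the set of targets it feeds directly."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
    return graph

def downstream(graph, start):
    """Breadth-first walk: every resource reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def erasure_targets(edges, entry_point):
    """All places to visit for an erasure request, entry point included."""
    return sorted({entry_point} | downstream(build_graph(edges), entry_point))

if __name__ == "__main__":
    for place in erasure_targets(LINEAGE_EDGES, "leads.xlsx"):
        print(place)
```

Only resources connected to the entry point are visited, so unrelated spreadsheets (here `budget.xlsx` and `board_pack.xlsx`) never need to be opened. Reversing the direction of each edge would give the same walk upstream, from a known location back to its sources.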

Of course, you are likely to be already storing a lot of sensitive data, so you also need to be able to track historic data lineage: we recognise that this is private data, but where did it come from? So, you actually need two things from products providing data lineage capability: the ability to uncover existing lineage and the ability to monitor it going forward. This is fundamental for good governance regardless of any particular regulatory regime.

GDPR, Spreadsheets and Private Data

February 1st 2018

Historically, spreadsheet management and governance have tended to be thought of as something distinct and different from conventional IT-based concerns about governance. In a sense, this is not surprising: spreadsheets are deployed by end users while the sort of data and processes to which data governance applies have traditionally been, at least in part, within the domain of the IT department.

GDPR changes this paradigm. The regulation makes no distinction between private data that is stored in Oracle, SQL Server or Db2, or data that is stored in a spreadsheet or, for that matter, an Access database. Organisations must address the requirements of GDPR regardless of where private data resides. Complying with GDPR means having a single project – albeit with many moving parts – that spans both end-user software such as spreadsheets and more IT-based platforms and systems. I think this is pretty much a first.

What are the implications of this? Well, one of the implications of GDPR itself is that private data must be treated as a business asset. Of course, many companies have already bought into the idea of the data-driven enterprise but, in that context, some data is more important than other data. If it’s useful for analytics, then it’s potentially important; if not, it isn’t. But GDPR extends this to all private data and, for the benefit of American readers, I should say that the definition of “private” in this context is much broader than PII (personally identifiable information).

So, it’s any private data that is important. But that is not all. To comply with GDPR you need to discover private data, anonymise it where necessary, and ensure that you have consent to use that data before doing so for any specified purpose. However, this is not a one-time fix: it is an ongoing process. In practice, this means that the underlying tasks of discovery, masking and so forth will need to be operationalised. In effect, the data is treated as a business asset.

How does this impact on spreadsheets? Well, they must be a part of this same process. If you have spreadsheets with private data in them, then every time that data is altered, or you add a new person to your spreadsheet, then those changes must be propagated to whatever governance mechanisms are in place to provide compliance monitoring. Or vice versa: the changed data may be sent to the spreadsheet. This will typically mean that these spreadsheets will need to be linked to either the CRM (customer relationship management) or MDM (master data management) systems that will be required to provide the single view of the customer (or employee) that GDPR compliance almost certainly needs. Further, you will no doubt implement corporate policies about how and where private data can be stored and used. Spreadsheets will need to be monitored for compliance with these policies in the same way that more IT-oriented data is monitored.

In other words, spreadsheets will come in from the cold. IT will have a much more significant role in providing oversight with respect to the use of spreadsheets. And that may start with private data, but I don’t think it will end there. My guess is that IT will want to ensure that governance best practices are applied to all aspects of spreadsheet use. Given the absence of spreadsheet-specific features in current data governance suites, that’s going to be good news for vendors specialising in spreadsheet governance and management.


Copyright © 2018 Apparity LLC