Databases have been the vanguard of enterprise IT for decades. But now they've become more important than ever before, playing a central role in today's data-driven landscape.
At the same time, they're being increasingly targeted by hackers, who are out to steal your data assets and expose the personal information you hold.
New data protection regulations have set out to address the issue of privacy as organizations collect and store growing amounts of personal data. As such, the need for database compliance with these data privacy regulations has had significant implications for database management practices.
In this post, we discuss the challenges database administrators and compliance teams must overcome to ensure database compliance with data privacy regulations.
Visibility into your data is fundamental to data privacy compliance. If you don't know what personal data you're actually storing in your databases, you cannot be certain you're giving it the protection it needs.
Likewise, without this insight, you cannot easily meet other requirements of data privacy legislation. For example, your response to any right-of-access request must give a full account of all the personal information you store about the data subject.
Moreover, you may be unaware of other high-risk practices, such as excessive data collection or sharing, which could go unnoticed by legal and compliance teams.
Data mapping is a means of identifying the information your organization keeps and how it moves from one location to another.
Start by drawing up an inventory of your data across all database deployments. This typically documents:
- The different types of sensitive personal data you store.
- The location of the data itself.
- Categories of data subject, such as age group or nationality, as determined by the specific treatment they require under data privacy law.
- Retention policies for that data.
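As a rough illustration, an inventory like the one described above could be kept as a list of simple records. This is only a sketch: the class name, field names, and sample locations are assumptions made up for the example, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class InventoryEntry:
    """One row of a personal-data inventory (all field names are illustrative)."""
    data_type: str          # type of sensitive personal data
    location: str           # where the data lives (database.schema.table.column)
    subject_category: str   # category of data subject
    retention: timedelta    # how long the data may be kept


# Hypothetical sample entries
inventory = [
    InventoryEntry("email address", "crm_db.public.customers.email",
                   "EU resident", timedelta(days=730)),
    InventoryEntry("health record", "claims_db.public.claims.notes",
                   "minor", timedelta(days=3650)),
]


def locations_for(data_type: str) -> List[str]:
    """Return every documented location that holds the given type of data."""
    return [e.location for e in inventory if e.data_type == data_type]
```

A call such as `locations_for("email address")` then answers the basic question of where a given type of personal data is stored.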
SQL is the query method of choice for locating most forms of data that reside in traditional relational databases. But, as a tool for keeping tabs on your personal data, it can provide only very limited value.
First, you cannot always rely on the column names and other metadata assigned to your tables. So you may need to query a range of columns, relying on pattern matching to identify different types of personal data.
However, this is only suited to text strings that follow specific patterns, such as credit card and social security numbers. Other types of personal information are far less structured and much more difficult to identify without a contextual understanding of the data.
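To make the pattern-matching idea concrete, here is a minimal sketch that scans free text for US-style social security numbers and payment card numbers. The regular expressions and function names are assumptions for illustration; real data would need broader, locale-specific patterns, and the Luhn checksum only reduces false positives for card numbers.

```python
import re

# Illustrative patterns only: US-style SSN and 13 to 19 digit card numbers.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Check the digits of a string against the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def find_pii(text: str) -> dict:
    """Return candidate SSNs and Luhn-valid card numbers found in the text."""
    cards = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
    return {"ssn": SSN_RE.findall(text), "card": cards}


sample = "SSN 123-45-6789, card 4111 1111 1111 1111, ref 1234 5678 9012 3456"
hits = find_pii(sample)
```

Note that the reference number at the end of the sample looks like a card number but fails the checksum, which is exactly the kind of ambiguity that makes pattern matching alone unreliable for less structured data.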
Secondly, enterprises are increasingly adopting NoSQL databases designed for larger data sets. These are generally geared towards access via API and often provide limited or no support for SQL. As NoSQL databases are still a relatively new concept, enterprises may find it difficult to perform the complex database queries that would give them the visibility they need.
The scope of the data involved in meeting database compliance goals has led to the evolution of new AI-based tools that can understand the contextual meaning of database content and therefore provide accurate detection of certain personal information.
These AI-based tools can discover, map, and classify specific personal data that is stored in structured as well as in unstructured forms. They can recognize particular personally identifiable information (PII) such as bank account numbers, personal names, and email addresses. And they can also identify certain special category data, such as details about health, religious beliefs and ethnic origin.
But, above all, they're designed specifically for privacy and compliance professionals, who can carry out their duties without needing technical knowledge or relying on IT to construct queries on their behalf.
To comply with privacy regulations, as well as industry-specific standards such as the Payment Card Industry Data Security Standard (PCI DSS), appropriate technical and organizational measures are required to protect personal data.
It's easy to overlook that these obligations apply not only to personal data stored in transactional databases but also to personal data stored for analytical purposes.
Encryption is one form of protection that can help you meet your data protection obligations. But a word of caution: the data you encrypt is only as secure as the keys you use to encrypt it. Make sure you have a secure key management system in place to protect your keys.
Your data will also be continually on the move as it flows across cluster nodes, between cloud-based and on-premises infrastructure, and between your master databases and backups, replication databases, applications, and end users. So remember to use transport layer security (TLS) to protect data in transit.
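As a sketch of what this can look like on the client side, the snippet below builds a TLS context with Python's standard `ssl` module that verifies the server's certificate and refuses protocol versions older than TLS 1.2. The helper name `make_tls_context` is an assumption, and how the context is handed to a database driver varies from driver to driver, so check your driver's documentation.

```python
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate."""
    # create_default_context() turns on certificate and hostname verification
    ctx = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_tls_context()
```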
Furthermore, other database security and resiliency best practices should also be considered, such as:
- Physical security of your on-premises data centers including protection of server hardware from physical damage.
- Network security measures, including intrusion detection systems and firewalls.
- Strong access control measures to limit user access to resources based on job role or business function.
- Database replication and backups to enable continuity or recovery from system failures or malicious attacks.
- Regular patching and software updates to address vulnerabilities in your database management system.
- Database activity monitoring (DAM) to help detect malicious database activity and vulnerability scanning to identify known weaknesses.
If you deploy databases to the public cloud, you may need to consider new and different security tools that are adapted to a dynamic shared computing environment rather than traditional static on-premises infrastructure. At the same time, however, you'll also transfer some of the responsibility for security to your cloud service provider (CSP).
In addition, coders play a role in database security. For example, they should take steps to prevent SQL injection—a database-specific threat in which an attacker inserts malicious code into your SQL queries, usually via input fields in online forms.
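The usual defense is to pass user input as bound parameters rather than building SQL text from it. The following sketch uses Python's built-in `sqlite3` module with a made-up `users` table to contrast the two approaches; the table and function names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user_unsafe(name: str):
    # Vulnerable: user input is concatenated straight into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name: str):
    # Safer: the value is bound as a parameter and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


# A classic injection string, as it might arrive from a form field
payload = "x' OR '1'='1"
```

With this payload, the unsafe version returns every row in the table, while the parameterized version treats the whole string as a name and finds nothing.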
And, finally, bear in mind that the data in your backups is just as valuable to hackers as your live database content. So make sure you give it the level of protection it needs.
Data Subject Access Requests (DSARs)
Recent data protection regulations have strengthened the rights of citizens to request a copy of the information you hold about them. You may be required to fulfil these requests within strict time limits.
However, this could end up being a protracted process, as data about any single individual may be stored in a variety of different databases. For example, a local authority may maintain information about a member of the public on several different systems, depending on the services they use. So you'd want a quick and efficient way to get at all this data—ideally from a single point of control.
You will also want to make sure that all of the information you provide is correct. For example, if a data subject has the same name as someone else in your records then you risk revealing the information about the wrong person. So look for an intelligent DSAR response solution that comes with some form of identity verification.
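The sketch below illustrates both points: gathering records about one person from several systems, and matching on more than a name so that a namesake's records are not returned. The in-memory databases, the `people` table, and the use of date of birth as a second attribute are all assumptions for the example; matching on two fields is not a substitute for proper identity verification.

```python
import sqlite3


def make_system(rows):
    """Create an in-memory database standing in for one departmental system."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE people (name TEXT, dob TEXT, detail TEXT)")
    db.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    return db


# Two hypothetical systems; "housing" holds two different people named Sam Lee
systems = {
    "housing": make_system([("Sam Lee", "1980-02-01", "tenancy #12"),
                            ("Sam Lee", "1995-07-30", "tenancy #48")]),
    "parking": make_system([("Sam Lee", "1980-02-01", "permit A7")]),
}


def collect_dsar(name, dob):
    """Gather records from every system, matching on name AND date of birth."""
    report = {}
    for system, db in systems.items():
        rows = db.execute(
            "SELECT detail FROM people WHERE name = ? AND dob = ?", (name, dob)
        ).fetchall()
        report[system] = [row[0] for row in rows]
    return report


report = collect_dsar("Sam Lee", "1980-02-01")
```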
Some data protection regulations may prohibit transfers of personal data outside the territory they cover, except to approved countries that provide a strong regulatory level of data protection or under other circumstances designed to protect the privacy rights of their citizens. This could have data residency implications for both relational and non-relational databases.
Standard legacy relational databases are single-node deployments. This means that, without complex adaptation, you're restricted to hosting your data in a single place. This could prove problematic if you are required to meet data sovereignty rules, as it may no longer be possible to serve an international customer base from just one location.
In the case of NoSQL and some cloud-based SQL databases, you have more scope to geographically distribute your data. However, you may need a multicloud strategy to ensure you have the right mix of data centers to meet database compliance goals.
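In a geographically distributed deployment, one simple way to reason about residency is a lookup from a data subject's jurisdiction to the regions where their data may be stored. The sketch below is purely illustrative: the jurisdiction codes, region names, and the mapping itself are invented for the example and do not reflect any particular law or cloud provider.

```python
# Hypothetical mapping from jurisdiction to permitted storage regions
ALLOWED_REGIONS = {
    "EU": ["eu-west", "eu-central"],
    "US": ["us-east", "eu-west"],
}


def pick_region(jurisdiction, available):
    """Return the first available region permitted for the jurisdiction."""
    for region in ALLOWED_REGIONS.get(jurisdiction, []):
        if region in available:
            return region
    raise LookupError(f"no compliant region available for {jurisdiction}")


# Regions offered by the (hypothetical) providers currently in use
available = {"us-east", "eu-central"}
chosen = pick_region("EU", available)
```

If no permitted region is available, the function raises an error rather than silently falling back, which mirrors the point that you may need additional data centers or providers to close the gap.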
One of the core principles of GDPR and other data privacy laws is data minimization. In essence, it means that when you collect personal data for a specified purpose you should only collect what you actually need to meet that purpose.
This can be at odds with your big data goals, as one of the main reasons for adopting a new NoSQL database is to capture and store as much data as possible. This leaves you a choice between:
- updating data subjects on how you use this data, including getting their consent to do so when necessary
- filtering out personal data in the data ingestion process
Similarly, in your transactional databases, if you're storing more personal information than you actually need, then you might need to stop collecting it and purge all unnecessary data.
This will not only reduce your compliance burden, but will also streamline your data, thereby helping to reduce storage costs and potentially improve query performance.
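Filtering at ingestion, as mentioned above, can be as simple as an allow-list of the fields your stated purpose actually needs. This is a minimal sketch; the field names and the `minimize` helper are assumptions for the example.

```python
# Fields the stated purpose actually needs (an assumption for this example)
ALLOWED_FIELDS = {"order_id", "product", "country"}


def minimize(record: dict) -> dict:
    """Keep only allow-listed fields; everything else is dropped at ingestion."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}


event = {"order_id": 17, "product": "lamp", "country": "DE",
         "email": "a@example.com", "birth_date": "1990-01-01"}
clean = minimize(event)
```

An allow-list is generally a safer default than a block-list here, because any new field added upstream is excluded until someone decides it is needed.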
With any database, compliance with stringent data privacy laws is an ongoing challenge. There are a lot of moving parts to any database: GDPR and other data privacy requirements are just one more thing storage and database admins need to consider.
To help meet your database compliance goals for data privacy, try NetApp Cloud Data Sense. This AI-driven data mapping technology intelligently parses data stored in Oracle, MySQL, MongoDB, MSSQL, SAP HANA, and PostgreSQL databases and automatically identifies and reports on where the data is stored so you can fulfill your data compliance requirements easily.