Data Classification: The Basics and a 6-Step Checklist

April 14, 2021

Topics: Cloud Data Sense | Elementary | 6 minute read

What is Data Classification?

Data classification is the process of organizing and labeling data into categories so that each category can be protected appropriately and searched, retrieved, and used efficiently. It is an important part of data management at large organizations, particularly for risk management, compliance, and data security. It can also reduce an organization’s storage and backup costs.

Data classification tasks include classifying information according to its sensitivity, labeling data for easy retrieval, and eliminating redundant data. The classification process may sound technical, but it is a topic that any organization’s leaders need to understand and participate in.

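In this article, you will learn:

  • Data classification levels
  • Types of data classification
  • Compliance requirements for classifying data
  • A 6-step checklist to effective data classification
  • Data classification with NetApp Cloud Data Sense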

Data Classification Levels

A primary goal of classification is to identify properties of organizational data, including:

  • How confidential the data is
  • How important the data is for business operations
  • What level of integrity the data needs
  • How important it is for the data to be available at all times
  • Which compliance requirements, if any, apply to the data

By evaluating these and other properties, a data classification process can divide organizational data into several classification levels. Here is a commonly used four-level classification system:

  1. Public: information that is in the public domain.
  2. Internal: data that should be accessible only to individuals authorized by the organization.
  3. Confidential: data that is especially sensitive and should require special clearance. This type of data is typically covered by standards like HIPAA or PCI DSS, and its exposure could lead to legal liability, fines, or damage to the business.
  4. Restricted: trade secrets, intellectual property, or any strategic data that, if exposed, can cause major damage to the business.
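As a rough illustration, the four levels above can be modeled as an ordered enumeration, with a rule that assigns each asset the most restrictive level it qualifies for. This is a minimal sketch: the `DataAsset` fields and the `assign_level` rules are hypothetical simplifications, and a real policy would define its own criteria.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """The four levels listed above, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


@dataclass
class DataAsset:
    # Hypothetical attributes chosen for illustration only.
    name: str
    published: bool = False     # already in the public domain
    regulated: bool = False     # covered by a standard such as HIPAA or PCI DSS
    trade_secret: bool = False  # trade secrets, IP, or strategic data


def assign_level(asset: DataAsset) -> Level:
    """Return the most restrictive level whose criterion the asset meets."""
    if asset.trade_secret:
        return Level.RESTRICTED
    if asset.regulated:
        return Level.CONFIDENTIAL
    if asset.published:
        return Level.PUBLIC
    return Level.INTERNAL  # assumed default: not public, not specially sensitive


assets = [
    DataAsset("press-release.pdf", published=True),
    DataAsset("team-wiki-export.html"),
    DataAsset("patient-records.csv", regulated=True),
    DataAsset("product-roadmap.xlsx", trade_secret=True),
]
levels = {asset.name: assign_level(asset) for asset in assets}
```

Using an ordered enumeration means levels can be compared directly (for example, `Level.RESTRICTED > Level.CONFIDENTIAL`), which is convenient when a policy says "apply this control to Confidential and above."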

Types of Data Classification

Data classification involves applying tags and labels to data. These specify the data type, the classification level (how confidential the data is, as described in the previous section), its integrity, and its usefulness.

The following are three ways to perform data classification:

  • Content-based classification: finds sensitive information by inspecting and interpreting the content of files.
  • Context-based classification: looks at properties such as the application used to author the data, its location, its author, or other metadata as an indirect indication of sensitive information.
  • User-based classification: relies on manual selection of documents by end users or data stewards, drawing on their knowledge and judgment while creating, editing, viewing, or distributing sensitive documents.
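The three approaches can be combined in a single check. The sketch below is illustrative only: the regular expressions are simplistic stand-ins for real detectors, the folder names in `SENSITIVE_LOCATIONS` are hypothetical, and letting a user-supplied label take precedence is an assumption rather than a rule from any standard.

```python
import re
from typing import List, Optional

# Content-based: illustrative patterns only, not production-grade detectors.
CONTENT_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

# Context-based: hypothetical folder names assumed to hold sensitive files.
SENSITIVE_LOCATIONS = ("/finance/", "/hr/")


def content_hits(text: str) -> List[str]:
    """Names of the content patterns found in the text."""
    return sorted(name for name, pattern in CONTENT_PATTERNS.items()
                  if pattern.search(text))


def context_sensitive(path: str) -> bool:
    """True if the file's location suggests sensitive content."""
    return any(location in path.lower() for location in SENSITIVE_LOCATIONS)


def classify(path: str, text: str, user_label: Optional[str] = None) -> str:
    """User-based label wins; otherwise fall back to content and context."""
    if user_label is not None:
        return user_label
    if content_hits(text) or context_sensitive(path):
        return "sensitive"
    return "not flagged"


by_content = classify("/shared/notes.txt", "Employee SSN 123-45-6789")
by_context = classify("/HR/reviews/2021.docx", "Quarterly review notes")
by_user = classify("/shared/plan.txt", "Launch plan", user_label="restricted")
unflagged = classify("/shared/lunch.txt", "Menu for Friday")
```

In practice the automated checks and the manual label complement each other: pattern matching scales to large volumes, while a data steward can catch sensitive documents that contain no recognizable pattern.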

Related content: read our guide to data discovery 

Compliance Requirements for Classifying Data

Many compliance standards and regulations have requirements for data classification. Below we list some of the common standards that touch on classification:

| Compliance Standard | Applies To | Data Classification Requirements |
|---|---|---|
| SOC 2 | Service organizations | Requires service organizations to include confidentiality data categories in their audits and to demonstrate that sensitive information is identified and maintained to meet the objectives of related entities (most commonly, the service provider’s customers). |
| HIPAA | US healthcare providers and their business associates | Considers protected health information (PHI) high-risk data. Requires covered entities (health organizations) and business associates to establish mandatory procedures for classifying PHI and for controlling its collection, use, storage, and transmission. |
| PCI DSS | Organizations storing or processing credit cardholder data | Requirement 9.6.1 (PCI DSS v3.2.1) states, with respect to media containing cardholder data, that organizations must "classify media so the sensitivity of the data can be determined". |
| GDPR | Organizations storing or processing personal data of individuals in the EU | Requires organizations processing personal data of individuals in the European Union to know what personal data they hold, which in practice means classifying it (for example, as public, proprietary, or confidential). The GDPR treats certain data, including racial or ethnic origin, sexual orientation, political opinions, and health data, as "special categories" that require additional protection. |
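One way to make these requirements actionable is to tag data and look up which standards the tags bring into scope. The sketch below is a simplified illustration: the tag names are hypothetical, the mapping covers only the data-specific standards above (SOC 2 applies at the organization level and is left out), and it is not legal guidance.

```python
from typing import Iterable, List

# Hypothetical tag names mapped to the standards described above.
STANDARD_BY_TAG = {
    "phi": "HIPAA",
    "cardholder_data": "PCI DSS",
    "eu_personal_data": "GDPR",
}

# Categories the GDPR singles out for additional protection, per the text above.
GDPR_SPECIAL_TAGS = {"race", "sexual_orientation", "political_views", "health"}


def applicable_standards(tags: Iterable[str]) -> List[str]:
    """Standards brought into scope by the given tags, in sorted order."""
    return sorted({STANDARD_BY_TAG[tag] for tag in tags if tag in STANDARD_BY_TAG})


def needs_additional_protection(tags: Iterable[str]) -> bool:
    """True for EU personal data that also falls into a special category."""
    tag_set = set(tags)
    return "eu_personal_data" in tag_set and bool(tag_set & GDPR_SPECIAL_TAGS)


billing_record = {"cardholder_data", "phi"}
eu_health_record = {"eu_personal_data", "health"}

billing_standards = applicable_standards(billing_record)
eu_standards = applicable_standards(eu_health_record)
eu_special = needs_additional_protection(eu_health_record)
```

A lookup like this is only as good as the tags feeding it, which is why the classification steps that follow start with a risk assessment and a formal policy.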

A 6-Step Checklist to Effective Data Classification

When creating your own data classification standards and process, consider the following six steps:

  1. Perform a risk assessment for sensitive data: assess regulatory and contractual requirements for privacy and confidentiality, and collect information from all relevant stakeholders. Once you have this information, you can define relevant data classification objectives.
  2. Develop a formal classification policy: create data classification categories that are enforced across all departments. Make sure the policy is clear and well understood by all employees, and keep it updated as requirements change.
  3. Categorize the types of data: review all types of data the organization collects and generates. Define sensitivity levels across the various departments and determine privilege levels.
  4. Identify data locations: find all data storage locations, such as cloud-based storage services, mobile devices, and local storage. You can use data discovery tools to identify these locations, and then create an inventory.
  5. Identify and classify data: review all data types and locations, and classify them according to compliance requirements. This helps ensure that each data type receives the protection those requirements call for.
  6. Implement monitoring and management: data is dynamic and requires constant monitoring for changes and policy violations. Monitoring tools can help you maintain continuous visibility across your data stores.
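Steps 4 through 6 above can be sketched in a few lines: build an inventory of locations, classify each item, and later detect items that are new or have changed and therefore need reclassification. This is a minimal sketch under stated assumptions: the in-memory `store` dictionary stands in for real storage, the location strings are made up, and the keyword rule in `classify` is a placeholder for a real classifier.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InventoryEntry:
    location: str
    level: str
    digest: str  # SHA-256 of the content at classification time


def fingerprint(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def classify(content: bytes) -> str:
    # Placeholder rule standing in for a real classifier (assumption).
    return "confidential" if b"cardholder" in content.lower() else "internal"


def build_inventory(store: Dict[str, bytes]) -> Dict[str, InventoryEntry]:
    """Steps 4 and 5: record every location with its level and fingerprint."""
    return {location: InventoryEntry(location, classify(data), fingerprint(data))
            for location, data in store.items()}


def find_changes(store: Dict[str, bytes],
                 inventory: Dict[str, InventoryEntry]) -> List[str]:
    """Step 6: locations that are new or changed since the inventory was built."""
    changed = []
    for location, data in store.items():
        entry = inventory.get(location)
        if entry is None or entry.digest != fingerprint(data):
            changed.append(location)
    return sorted(changed)


# Hypothetical storage locations and contents.
store = {
    "s3://bucket/notes.txt": b"meeting notes",
    "laptop:/exports/cards.csv": b"Cardholder name,PAN",
}
inventory = build_inventory(store)

# Later: one file is edited and a new one appears.
store["s3://bucket/notes.txt"] = b"meeting notes with cardholder data"
store["nas:/new/file.txt"] = b"hello"
to_reclassify = find_changes(store, inventory)
```

Comparing fingerprints rather than timestamps is one design choice among several; it catches content changes regardless of how the file was modified, at the cost of rereading the data.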

Data Classification with NetApp Cloud Data Sense

NetApp® Cloud Data Sense is the data privacy and governance service for data stored in the cloud and on premises. Cloud Data Sense leverages cognitive computing to deliver always-on privacy controls across your hybrid data sources.

By discovering, mapping and identifying personal and sensitive information, Cloud Data Sense automates the most challenging data privacy and governance tasks introduced by GDPR, CCPA, and other data privacy regulations.

Learn more about NetApp Cloud Data Sense

Senior Marketing and Strategy Manager