
Data Classification: Definition, Types and Steps to Get Started

by Rod Bernabe | Jul 22, 2020

What is data classification?

Data classification is the process of placing data into categories that help with both protection and general usage. The purpose of classification is to make your data easy to locate, store, retrieve, and keep compliant. The process also includes finding and deleting duplicate data to save both storage costs and backup time. An organization’s leaders must properly understand the entire data classification process to make correct data-related decisions. Several areas of the business rely heavily on data classification:

Data security
Risk management
Legal discovery
Compliance

Data classification types

Data classification is all about categorizing a piece of information based on its type, integrity, access permissions, and content. Different security measures should then be applied depending on the result, that is, on the data’s sensitivity and/or confidentiality level.

In a typical organization there are four classification levels:

Public data: Data that is freely accessible to everyone, including the general public as well as all employees and company personnel. It can be freely used, reused, and redistributed without repercussions. For example, press releases and other public-facing company announcements.
Internal-only data: Data that needs to be restricted to internal company personnel or employees who have been granted access. For example, internal-only communications, business plans, etc.
Confidential data: This data requires specific authorization and/or clearance to access it due to its confidential nature. For example, M&A documents or data regulated by privacy laws such as GDPR and HIPAA.
Restricted data: This is highly sensitive data that, if compromised or accessed without authorization, could lead to legal fines or criminal action, or damage the company’s bottom line or competitive advantage. For example, intellectual property (IP) and data protected by government or industry regulations.
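To make the four tiers concrete, here is a minimal Python sketch that encodes them as an ordered enum and derives a simple access rule from each definition. The names `Level` and `can_access`, and the assumption that restricted data needs both authorization and clearance, are illustrative choices, not part of any particular product or standard.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four tiers described above; a higher value means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def can_access(level: Level, *, is_employee: bool,
               has_authorization: bool, has_clearance: bool) -> bool:
    """Hypothetical access rule derived from the tier definitions above."""
    if level == Level.PUBLIC:
        return True                      # anyone, inside or outside the company
    if level == Level.INTERNAL:
        return is_employee               # internal personnel only
    if level == Level.CONFIDENTIAL:
        # specific authorization and/or clearance required
        return has_authorization or has_clearance
    # RESTRICTED: assumed here to need both authorization and clearance
    return has_authorization and has_clearance


# Because Level is an IntEnum, tiers can be compared and ranked directly.
print(Level.RESTRICTED > Level.CONFIDENTIAL)                   # True
print(can_access(Level.CONFIDENTIAL, is_employee=True,
                 has_authorization=True, has_clearance=False))  # True
```

Ordering the tiers numerically is a convenience: it lets a pipeline pick the most sensitive of several candidate levels with a plain `max()`.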

This four-tier classification model is often used as a foundation on which companies build their own classification framework. Depending on the nature of the data your organization handles and any regulatory considerations, you may need additional, custom classification levels. Furthermore, performing the classification is just one part of the process: you also have to implement the appropriate security measures and/or solutions for the process to succeed as a whole and to protect your most important data.

Data classification policy

One of the primary purposes of a data classification policy is to define who is responsible for the process. This can be the person responsible for data accuracy, the information’s creators, or subject matter experts.

Your classification policy is essentially your data classification standard: it specifies how classification is done in the first place. A policy should also define other specifics of the data classification process, including the time period between subsequent classifications, what types of data are classified, how data is classified (for example, the tool or appliance that performs the classification), and so on. It’s also important to remember that a classification policy is part of the broader information security policy, which specifies the means of protecting sensitive data.

A few questions to consider when forming a data classification policy:

Who is responsible for the data being accurate and complete?
Who is the creator/owner of this information?
Is this information subject to any compliance regulations? If so, what are the consequences of non-compliance?
Which part of the organization has the most information about the context and/or content of this specific data?
Where is the data stored?
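One way to make these questions actionable is to record the answers per data type and flag any left blank. The sketch below assumes a hypothetical `PolicyEntry` record (not a standard format) whose fields mirror the questions, plus a re-classification interval standing in for the policy’s time period between classifications.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class PolicyEntry:
    """One hypothetical policy record; each field answers a question above."""
    data_type: str                      # what data this entry covers
    accuracy_owner: str                 # responsible for accuracy and completeness
    creator: str                        # creator/owner of the information
    regulations: list[str] = field(default_factory=list)  # e.g. ["GDPR"]
    non_compliance_impact: str = ""     # consequences if a regulation is breached
    context_expert: str = ""            # team that knows the data's context best
    storage_locations: list[str] = field(default_factory=list)
    review_interval_days: int = 365     # assumed default re-classification period

    def missing_answers(self) -> list[str]:
        """Names of required questions that are still unanswered."""
        missing = [name for name in ("accuracy_owner", "creator", "context_expert")
                   if not getattr(self, name)]
        if not self.storage_locations:
            missing.append("storage_locations")
        if self.regulations and not self.non_compliance_impact:
            missing.append("non_compliance_impact")
        return missing

    def next_review(self, last_classified: date) -> date:
        """Date by which this data should be classified again."""
        return last_classified + timedelta(days=self.review_interval_days)


hr = PolicyEntry("employee records", accuracy_owner="HR Operations", creator="HR",
                 regulations=["GDPR"], storage_locations=["SharePoint: /sites/hr"],
                 review_interval_days=90)
print(hr.missing_answers())             # ['context_expert', 'non_compliance_impact']
print(hr.next_review(date(2024, 1, 1)))  # 2024-03-31
```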

Data classification process

There are generally three data classification methods that are considered industry standard:

Content-based Classification inspects a file’s contents and sensitivity level to determine its importance.
Context-based Classification considers indirect indicators of the information’s sensitivity, including location, creator, application, etc.
User-defined Classification relies entirely on manual user selection for each document. It depends heavily on the end user’s discretion and knowledge to flag documents appropriately with the right sensitivity level.
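The three methods can be combined in a single classifier. This minimal sketch assumes hypothetical regex and path rules and a “most sensitive result wins” policy; real content inspection would use far richer detectors than two patterns.

```python
import re
from dataclasses import dataclass
from typing import Optional

LEVELS = ["public", "internal", "confidential", "restricted"]  # least to most sensitive


@dataclass
class Document:
    path: str
    creator: str
    text: str
    user_label: Optional[str] = None     # set manually by the user, if at all


# Content-based rules: patterns found in the text itself (illustrative only).
CONTENT_RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "restricted"),          # SSN-like number
    (re.compile(r"\bconfidential\b", re.IGNORECASE), "confidential"),
]

# Context-based rules: indirect indicators such as location or creator (hypothetical).
CONTEXT_RULES = [
    (lambda d: d.path.startswith("/legal/"), "confidential"),
    (lambda d: d.creator == "hr-system", "confidential"),
    (lambda d: d.path.startswith("/press/"), "public"),
]


def most_sensitive(labels):
    """Return the most sensitive label in labels, or None if it is empty."""
    return max(labels, key=LEVELS.index) if labels else None


def content_based(doc):
    return most_sensitive([lvl for pat, lvl in CONTENT_RULES if pat.search(doc.text)])


def context_based(doc):
    return most_sensitive([lvl for rule, lvl in CONTEXT_RULES if rule(doc)])


def user_defined(doc):
    return doc.user_label


def classify(doc):
    """Combine the three methods; the most sensitive result wins (assumed policy)."""
    results = [r for r in (content_based(doc), context_based(doc), user_defined(doc)) if r]
    return most_sensitive(results) or "internal"   # assumed default when nothing matches


draft = Document("/press/draft.txt", "marketing", "CONFIDENTIAL: launch date TBD")
print(classify(draft))  # confidential (content overrides the public /press/ context)
```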

The data classification process can become complicated and cumbersome very quickly. Defining the classification process helps a lot, but the company must also perform a variety of additional operations for the entire process to work properly, including:

Determine the correct criteria and/or categories to use in the data classification process.
Implement security measures based on the results of the classification process.
Maintain proper data classification protocols by outlining the responsibilities of the company’s employees.

These additional operations should give the company an operational data classification framework to work with. Each category should also include information on security considerations, data types, and rules for the operations that can be performed on that data (storage, retrieval, transmission, and so on).
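Per-category handling rules can be expressed as data, so that storage, retrieval, and transmission checks all read from one place. The `HandlingRules` fields and the rule values below are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingRules:
    encrypt_at_rest: bool
    allowed_storage: frozenset       # storage locations permitted for this level
    external_transmission: bool      # may the data leave the organization?
    encrypt_in_transit: bool
    audit_retrieval: bool            # log every read of the data?


# Hypothetical values for illustration; real values come from your own policy.
RULES = {
    "public":       HandlingRules(False, frozenset({"cloud", "on_prem", "website"}), True,  False, False),
    "internal":     HandlingRules(False, frozenset({"cloud", "on_prem"}),            False, True,  False),
    "confidential": HandlingRules(True,  frozenset({"cloud", "on_prem"}),            True,  True,  True),
    "restricted":   HandlingRules(True,  frozenset({"on_prem"}),                     False, True,  True),
}


def may_store(level: str, location: str, encrypted: bool) -> bool:
    """Storage: location must be allowed, and encryption present if required."""
    rules = RULES[level]
    return location in rules.allowed_storage and (encrypted or not rules.encrypt_at_rest)


def may_transmit(level: str, external: bool, encrypted: bool) -> bool:
    """Transmission: block external sends where disallowed, enforce encryption."""
    rules = RULES[level]
    if external and not rules.external_transmission:
        return False
    return encrypted or not rules.encrypt_in_transit


def must_audit_retrieval(level: str) -> bool:
    """Retrieval: whether each read of this data should be logged."""
    return RULES[level].audit_retrieval
```

Keeping the rules in one table means that adding a custom level is a one-line change rather than an edit to every check.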

Data classification and compliance

Compliance regulations also put a lot of pressure on organizations to implement data classification. A good example is GDPR: if your company processes personal data of people in the EU in any way, you have to know what that data is and where it is stored, and protect it with appropriate security measures.

Additionally, regulations like GDPR often demand much stronger protection for specific data categories. For example, GDPR Article 9 prohibits processing “special categories” of personal data, such as data revealing racial or ethnic origin, political opinions, or religious or philosophical beliefs, unless a specific exception applies (for example, the individual’s explicit consent). A proper classification procedure can remove a lot of the risk that comes with compliance, lessening the chances of a company having compliance issues and being fined.
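Classification results can feed directly into compliance checks. This sketch assumes a scanner has already attached tags to a record; the tag names are hypothetical, but the set mirrors the special categories listed in GDPR Article 9(1). The mapping to classification levels is an assumption, and the check does not evaluate whether any Article 9 exception applies.

```python
# Special categories from GDPR Article 9(1), expressed as hypothetical tag names.
GDPR_SPECIAL_CATEGORIES = frozenset({
    "racial_or_ethnic_origin",
    "political_opinions",
    "religious_or_philosophical_beliefs",
    "trade_union_membership",
    "genetic_data",
    "biometric_data",                # when used to uniquely identify a person
    "health_data",
    "sex_life_or_sexual_orientation",
})


def special_categories_found(tags):
    """Return the special-category tags present on a record, sorted."""
    return sorted(GDPR_SPECIAL_CATEGORIES & set(tags))


def minimum_level(tags):
    """Assumed floor: special-category data is restricted, other personal data
    is confidential; None means GDPR imposes no floor in this sketch."""
    if special_categories_found(tags):
        return "restricted"
    if "personal_data" in tags:
        return "confidential"
    return None


record_tags = ["personal_data", "email", "health_data"]
print(special_categories_found(record_tags))  # ['health_data']
print(minimum_level(record_tags))             # restricted
```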

5 Steps to data classification

It’s incredibly hard to have a proper sensitive data handling system without a good data classification framework in place. However, there are also many examples of companies that can’t find the right approach to data classification, making their system either too complicated or ineffective. Here are five general steps to follow for a successful classification system:

Perform a Risk Assessment. Getting a clear understanding of all requirements from a confidentiality and privacy standpoint is the first requirement.
Develop a Classification Policy. Building a comprehensive classification policy without overcomplicating it is another big step towards putting a data classification system in place.
Categorize Your Data. Understanding your data types and how important they might be is heavily recommended before you start classifying.
Discover, Identify and Tag Your Data. Once a policy is established, you must perform data discovery, followed by identification and classification/tagging of the data based on the classification policy.
Implement Security Measures and Maintenance. Applying appropriate security measures and updating them as necessary ensures that policies you’ve developed regarding data access are enforced according to the data’s classification.
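Steps 4 and 5 can be sketched as a small pipeline: walk a directory, tag each file with a level, then flag files stored somewhere their level does not allow. The keyword classifier and the `max_level_allowed` setting are placeholders assumed for illustration.

```python
import re
from pathlib import Path

LEVELS = ["public", "internal", "confidential", "restricted"]  # least to most sensitive


def classify_text(text: str) -> str:
    """Placeholder classifier using assumed keywords; swap in your real rules."""
    if re.search(r"\brestricted\b", text, re.IGNORECASE):
        return "restricted"
    if re.search(r"\bconfidential\b", text, re.IGNORECASE):
        return "confidential"
    return "internal"                   # assumed default level


def discover_and_tag(root) -> dict:
    """Step 4: find every file under root and tag it with a level."""
    inventory = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            inventory[path.relative_to(root).as_posix()] = classify_text(text)
    return inventory


def enforce(inventory: dict, max_level_allowed: str) -> list:
    """Step 5: list files whose level is above what this location permits."""
    limit = LEVELS.index(max_level_allowed)
    return [p for p, lvl in inventory.items() if LEVELS.index(lvl) > limit]
```

For example, running `enforce(discover_and_tag(share), "internal")` over a general-purpose file share would list the confidential and restricted files that should be moved or protected.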

Eventually you won’t be working with just a basic set of rules anymore, because compliance regulations can require complex policies and access needs. The tips below will help make your classification program more scalable:

Leverage existing file metadata. If you can filter out files you are not interested in based on their metadata, you save precious time by never sending them for content classification.
Scan incrementally instead of scanning all of your data in one go. This gives faster, more agile feedback, so you can confirm that your rules and logic are accurate.
More advanced scaling techniques involve modern technologies such as machine learning, comprehensive auditing, permission logging, and so on.
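The first two tips can be combined: use file-system metadata to skip files before reading them, and only pick up files modified since the last scan, in bounded batches. The extension list and batch size are assumptions; adjust them to your environment.

```python
from pathlib import Path

# Hypothetical metadata filter: file types assumed never to need content inspection.
SKIP_EXTENSIONS = {".exe", ".dll", ".iso", ".mp4"}


def needs_content_scan(path: Path, last_scan_time: float) -> bool:
    """Decide from metadata alone whether a file goes to content classification."""
    if path.suffix.lower() in SKIP_EXTENSIONS:
        return False                                 # metadata filter: skip by type
    return path.stat().st_mtime > last_scan_time     # incremental: changed since last scan


def select_batch(root, last_scan_time: float, batch_size: int = 100) -> list:
    """Return up to batch_size files that still need content classification."""
    batch = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and needs_content_scan(path, last_scan_time):
            batch.append(path)
            if len(batch) >= batch_size:
                break
    return batch
```

Small batches keep each run short, so a rule change can be checked against a handful of results before it is applied to the whole estate.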

Myths surrounding data classification

Surprisingly, quite a lot of people think there’s no need for data classification, or that it’s more trouble than it’s worth. Here are the top three myths about data classification and why they’re wrong:

It’s extremely complicated. While data classification projects can be overcomplicated, the blame often lies with the scheme’s creators. You don’t want too many categories, because that just makes classification harder for everyone. This is why the general recommendation is to start with only three categories and add more only after careful consideration.
It’s just more bureaucracy. Quite the contrary, data classification is one way to make data protection much simpler. It also allows for better resource allocation and helps prioritize protection measures.
It takes a long time for classification efforts to become valuable. Classification usually begins paying off right away by bringing order to your data, whether it’s context- or content-based. It also makes many security improvements easier to implement.

Automate Data Discovery and Classification

Get simple, fast and dynamic advanced data classification and protection capabilities with NC Protect to prevent data breaches, unauthorized file access and accidental sharing. NC Protect finds your organization’s sensitive data, classifies and secures it with dynamic conditional security across your collaboration tools in on-premises, cloud or hybrid environments.

Discover and tag files, or use existing classifications and Microsoft Purview sensitivity labels, and add dynamic, granular access control and data protection, including encryption and user rights management, to mitigate risk and ensure secure collaboration on all business-critical data. NC Protect works across all of your data collaboration tools, including Microsoft 365, SharePoint on-premises, Dropbox, Nutanix Files and Windows File Shares.

Data Classification Guide

Outline the general framework of all the operations related to the data classification in your organisation.


Rod Bernabe

Technical Product Manager, archTIS

With 20+ years managing software product and development teams, archTIS Technical Product Manager Rod Bernabe is responsible for delivering against the company’s product roadmap and ensuring customer requirements are met.
