A Detailed Look at Enterprise Data Classification: Challenges, Roles, Steps and Practices
As digitalization spreads, the volume of data that enterprises create, store, and manage in daily operations is growing rapidly. This includes sensitive production and sales data as well as the identity information of users and employees. Ensuring the confidentiality, security, and compliance of this data requires stronger security controls than before, along with a set of data protection best practices. Data classification is an indispensable part of them.
1. What is data classification?
Data classification is the process of locating, labeling, separating, and organizing data into levels based on shared characteristics such as sensitivity, risk, and compliance requirements. On this basis, the enterprise must ensure that only authorized personnel, whether internal or external, can access or process the data, in an appropriate manner and in accordance with relevant regulations. Done correctly, data classification makes the use and flow of data within and between enterprises more appropriate and more effective. In practice, however, this step is often overlooked, leaving enterprises without a full understanding of the capabilities, uses, and scope of the data they hold.
2. Challenges of Data Classification
Almost every company stores sensitive data of various types, often more than it realizes. Many do not know exactly where that data is stored across their systems, or how it might be accessed or even leaked. Let's take a closer look at the typical reasons companies fail to perform data classification, and the hazards of that failure:
- Leadership assumes that "this will never happen in our company."
- Dealing with data and privacy issues is put behind "urgent matters" such as marketing, expansion, and pricing.
- Businesses don’t know how to locate or identify existing data.
- Businesses are unable to keep up with the constant updating and promulgation of laws and regulations.
- Businesses believe data classification is too complex and will not produce real results.
- Some companies keep their data classification policy on paper only and never implement it after formulating the strategy.
The hazards of skipping data classification include:
- Sensitive data sitting in data silos may be lost or exposed because it is undiscovered and unprotected.
- Improper handling of sensitive information can result in lost customers and reduced future revenue.
- Companies may be fined and penalized by regulators for improper data handling.
- Leaking customer information could lead to lawsuits and damage a company’s reputation.
3. Why do we need data classification?
If an enterprise does not understand its data, where it is stored, and how to protect it, then the security and privacy of the data are out of the question. According to Forrester, a world-renowned technology and market research company, data privacy professionals (such as data privacy officers) will not be able to effectively protect their customers, employees, and corporate data if they do not understand the following:
- What data exists in the enterprise
- Its exact location
- Its value and risk to the business
- The compliance regulations that govern it
- Which roles are allowed to access and use it
Data classification identifies and marks an enterprise's sensitive information and files across networks, shared platforms, user endpoints, and cloud services by providing a consistent process. The underlying principle is to create data attributes that define how data at each level is processed and protected, according to enterprise and regulatory requirements. Enterprises can then apply different protection measures to the sorted data to reduce the risk of exposure, limit how far data spreads, avoid both under-protection and over-protection, and concentrate security resources where each level of data needs them.
4. Benefits of Data Classification
According to statistics, only 54% of companies know where their sensitive data is stored. Data hidden in this "dark forest" is a major obstacle to corporate data security and privacy compliance. Planning and fully implementing data classification brings the following benefits to enterprises:
Improve data security
Data classification allows enterprises to guide the implementation of sensitive data protection by answering key questions such as:
- What sensitive data exists (e.g. intellectual property, PHI, PII, credit card data)?
- Where does this sensitive data exist?
- Who can access, modify, and delete it?
- What would be the impact on the business if the data was compromised, destroyed, or inappropriately altered?
Knowing the answers to these questions helps companies:
- Understand the criticality of different types of data.
- Reduce the storage scope of sensitive data to make security management more effective.
- Ensure only authorized users can access sensitive data.
- Implement appropriate data protection technologies such as encryption, data loss prevention (DLP), and information leak prevention (ILP).
- Optimize costs and avoid wasting resources on non-critical data.
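To make the "only authorized users" point concrete, the following is a minimal sketch of a clearance check. The four level names, the role table, and the `can_access` helper are all assumptions made up for illustration; they are not taken from any standard or product.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical sensitivity scale; a higher value means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Assumed mapping from a role to the highest level that role may read.
ROLE_CLEARANCE = {
    "contractor": Level.PUBLIC,
    "employee": Level.INTERNAL,
    "finance_analyst": Level.CONFIDENTIAL,
    "privacy_officer": Level.RESTRICTED,
}


def can_access(role: str, data_level: Level) -> bool:
    """Return True if the role's clearance covers the data's level.

    Roles missing from the table are denied by default.
    """
    clearance = ROLE_CLEARANCE.get(role)
    return clearance is not None and clearance >= data_level
```

Denying unknown roles by default is a deliberate choice in this sketch: a gap in the role table then fails closed rather than open.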
Ensure regulatory compliance
Data classification helps locate where regulated data (see below) is stored, ensures that security controls are in place and that data is retrievable and traceable, and supports compliance with legal and regulatory requirements. Specifically, it helps to:
- Ensure that sensitive data such as medical records, credit card data, and personally identifiable information (PII) is handled in compliance with applicable laws and regulations.
- Keep the business's daily operations in compliance with relevant regulations and privacy requirements.
- Support rapid retrieval and location of specific information within a limited time frame.
- Demonstrate the company's professional capabilities and meet the requirements and standards of internal and external audits.
Improve business operation efficiency and reduce business risks
From the creation to the destruction of information, data classification can bring the following benefits to the daily operations of an enterprise:
- Gives greater insight into and control over the data the business holds and shares.
- Allows the enterprise to access and use data more efficiently without compromising security.
- Facilitates risk management by helping the organization assess the value of the data it holds and the impact of its loss, theft, misuse, or compromise.
- Strengthens and integrates data record retention and e-discovery capabilities.
5. Data classification and life cycle
The data lifecycle provides an ideal process for controlling the flow of data within and outside the entire enterprise. Data classification can provide security and compliance guidance for every step of data from creation to deletion. The typical data lifecycle includes the following six stages:
- Creation – Sensitive data is generated in a variety of formats, including emails, Excel documents, Word documents, WeChat documents, social media, and websites.
- Use – Users in the appropriate roles tag sensitive data and files according to existing security policies and compliance rules.
- Storage – After use, data is stored with access control and encryption.
- Sharing – Data is continuously shared among employees, customers, and partners across devices and platforms.
- Archiving – Data that is no longer active is eventually archived in the enterprise’s storage systems.
- Destruction – Data is destroyed on demand to reduce the storage burden on the enterprise and improve the overall data security posture.
Data should be classified immediately after creation, and the classification should be continuously evaluated and updated as data moves through the stages of its life cycle.
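The idea of re-evaluating a label at every stage can be sketched as follows. This is only an illustration under simplifying assumptions: the stages are treated as strictly linear, and the `advance` helper, the label strings, and `example_rule` are hypothetical names invented here.

```python
from enum import Enum
from typing import Callable


class Stage(Enum):
    CREATION = 1
    USE = 2
    STORAGE = 3
    SHARING = 4
    ARCHIVING = 5
    DESTRUCTION = 6


def advance(
    stage: Stage, label: str, reclassify: Callable[[Stage, str], str]
) -> tuple[Stage, str]:
    """Move to the next lifecycle stage and re-evaluate the label there."""
    if stage is Stage.DESTRUCTION:
        raise ValueError("destroyed data has no further stage")
    next_stage = Stage(stage.value + 1)
    return next_stage, reclassify(next_stage, label)


def example_rule(stage: Stage, label: str) -> str:
    """Hypothetical rule: data about to be shared is at least 'internal'."""
    if stage is Stage.SHARING and label == "unclassified":
        return "internal"
    return label
```

Passing the rule in as a function keeps the lifecycle mechanics separate from the classification policy, so the policy can change without touching the stage logic.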
6. Data classification and data discovery
Data discovery runs in parallel with the data lifecycle. It is the process of collecting data from databases and data silos and integrating it into a single source that can be accessed on demand and in a timely manner. Data classification and data discovery complement each other. In practice, data discovery has three aspects:
- Defining data
- Profiling and analyzing data
- Tagging data
Both data classification and data discovery can be automated to improve efficiency. Automation also addresses the low efficiency, poor accuracy, subjectivity, and inconsistency of traditional manual approaches.
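As a rough illustration of how the three aspects of discovery might be automated, the sketch below defines two patterns, profiles a piece of text against them, and derives tags from the result. The regular expressions and the `profile` and `tag` helpers are assumptions for illustration only; a production detector would need validation and context checks to keep false positives down.

```python
import re

# Defining data: illustrative patterns only, not production-grade detectors.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def profile(text: str) -> dict[str, int]:
    """Profiling: count how many matches of each pattern appear in the text."""
    return {name: len(rx.findall(text)) for name, rx in PATTERNS.items()}


def tag(text: str) -> set[str]:
    """Tagging: return a tag for every pattern that matched at least once."""
    return {name for name, count in profile(text).items() if count > 0}
```

For example, `tag("Contact jane.doe@example.com")` yields a set containing only `"email"`.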
7. How to implement data classification
Next, let’s delve into the nuts and bolts of how this works by looking at the types of data classification, compliance requirements, and the roles involved in the classification process.
8. Types of data to be classified
Almost every business holds more sensitive data than it realizes. In general, data in an enterprise can be divided into two categories: regulated and non-regulated data.
Regulated Information
Data regulated by compliance agencies must always be treated as sensitive. It includes:
- Personally Identifiable Information (PII) – Data that can be used to identify, contact, or locate a specific individual, or to distinguish one person from others, such as Social Security numbers, driver’s license numbers, physical addresses, and phone numbers.
- Personal Health Information (PHI) – A person’s health and medical information, such as insurance, test results, and health conditions.
- Financial Information – A person’s financial information, such as credit card numbers, bank account information and passwords.
Non-regulated information
Non-regulated data can also be highly sensitive and needs protection. It includes:
- Authentication Information – Data used to verify the identity of a person, system, or service, such as passwords, shared secrets, encryption keys, and hashes.
- Company Intellectual Property – Information unique to a business, such as business plans, trade secrets, and financial records.
- Government Information – Any information classified as Confidential, Top Secret, or Restricted, and any information whose disclosure could compromise confidentiality.
Three approaches to data classification
Generally, data can be classified in three ways:
- Manual – Traditional data classification relies on human intervention and execution, which is often time-consuming and error-prone.
- Automated – Technology-driven automation reduces the risks of manual execution and extends the coverage and continuity of classification.
- Hybrid – Human intervention provides context for data classification, while automated tools ensure efficiency and quality of execution.
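The hybrid approach can be sketched as an automated suggestion that is accepted only when it is confident, with everything else routed to a person. The keyword rule, the confidence values, the 0.8 threshold, and all names below are assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    label: str
    source: str  # "auto" or "human"


def auto_suggest(text: str) -> tuple[str, float]:
    """Hypothetical automated step: a keyword rule with a made-up confidence."""
    if "salary" in text.lower():
        return "confidential", 0.9
    return "internal", 0.5


def classify_hybrid(
    text: str, reviewer_label: Optional[str] = None, threshold: float = 0.8
) -> Decision:
    """Accept confident automated labels; otherwise defer to a human reviewer."""
    label, confidence = auto_suggest(text)
    if confidence >= threshold:
        return Decision(label, "auto")
    if reviewer_label is None:
        raise ValueError("low-confidence item needs a human review")
    return Decision(reviewer_label, "human")
```

Recording the source of each decision makes it possible to audit later how much of the classification was automated and how much relied on human judgment.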
Evaluating data classification standards
Before formulating its own data classification model, an enterprise should review existing classification standards. For example, US government agencies usually define three types of data: "public, secret, and top secret." Private enterprises often classify data into three categories: "restricted, private, and public."
When companies use traditional classification processes that are too complex and arbitrary, they often fall into the trap of over-segmentation. Data classification does not have to be cumbersome. The best practice is to start with an initial model of three or four classification levels based on the sensitivity of the data within the company: the higher the potential impact of a compromise, the higher the sensitivity. More refined levels can be added later to meet specific compliance requirements and other business needs. The National Institute of Standards and Technology (NIST) provides a framework for this process in Federal Information Processing Standard (FIPS) 199, which guides organizations in determining the sensitivity of information based on the following three key criteria:
- Confidentiality – Unauthorized disclosure of information could have a limited (low), serious (moderate), or severe or catastrophic (high) adverse effect on business operations or personal assets. Authorized restrictions on access to and disclosure of information should therefore be enforced, including controls to protect personal privacy and proprietary information.
- Integrity – Unauthorized modification or destruction of information could have a limited (low), serious (moderate), or severe or catastrophic (high) adverse effect on business operations or personal assets. Improper modification or destruction of information must therefore be prevented, including by ensuring the non-repudiation and authenticity of the information.
- Availability – Disruption of access to or use of information or an information system could have a limited (low), serious (moderate), or severe or catastrophic (high) adverse effect on business operations or personal assets. Information must therefore be accessible and usable in a timely and reliable manner.
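The three criteria above can be recorded per data type and then rolled up into a single sensitivity level. The sketch below assumes a "high-water mark" roll-up, that is, taking the highest of the three impact ratings; that roll-up rule and the function names are assumptions of this example rather than something the text above prescribes.

```python
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


def security_category(
    confidentiality: Impact, integrity: Impact, availability: Impact
) -> dict[str, Impact]:
    """Record the potential impact for each of the three security objectives."""
    return {
        "confidentiality": confidentiality,
        "integrity": integrity,
        "availability": availability,
    }


def overall_level(category: dict[str, Impact]) -> Impact:
    """Assumed roll-up: the highest impact among the three objectives."""
    return max(category.values())
```

With this rule, a data type rated moderate for confidentiality, low for integrity, and high for availability would be handled as high overall.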
Another way to assess the value, sensitivity, and risk of enterprise data is to focus on key questions such as:
- Materiality – Is the data important to day-to-day operations and business continuity?
- Availability - Does the business require that data be accessible in a timely and reliable manner?
- Sensitivity – What is the potential impact on the business if the data is compromised?
- Integrity – How important is it to ensure that data has not been tampered with in storage or in transit?
- Retention – How long must the data be retained based on regulatory requirements or industry standards?
9. Regulatory Compliance Overview
Currently, most sensitive data in enterprises is regulated by compliance agencies in different countries and regions. In the field of data privacy, there are five main regulations and standards that enterprises may need to comply with, depending on their actual situation.
Health Insurance Portability and Accountability Act (HIPAA)
The regulation is designed to protect individuals' protected health information (PHI). HIPAA lists 18 identifiers that must be protected, including medical record numbers, health plan beneficiary numbers, biometric identifiers such as fingerprints and voiceprints, and full-face photographs. HIPAA's Security Rule requires organizations to ensure the confidentiality, integrity, and availability of electronic protected health information (ePHI).
At the same time, HIPAA's classification guidelines require companies to group data according to its sensitivity as follows:
- Restricted/Confidential Data – Data whose unauthorized disclosure, alteration, or destruction could cause significant harm. This data requires the highest levels of security and controlled access based on the principle of least privilege.
- Internal data – Data whose unauthorized disclosure, alteration, or destruction could cause low to moderate damage. This data is not released to the public and requires reasonable security controls.
- Public data – Data that need not be protected from unauthorized access but must still be protected from unauthorized modification or destruction.
Payment Card Industry Data Security Standard (PCI-DSS)
The sensitive data that PCI-DSS requires to be protected is cardholder data. The standard is designed to protect payment card information, including the card number, expiration date, CVV code, and PIN. Enterprises need to classify data based on regular risk assessments and a security classification process.
Cardholder data elements should be ranked according to their type, storage permissions, and required protection level to ensure that security controls are applied to all sensitive data. At the same time, it should be confirmed that all instances of cardholder data are documented and that no cardholder data exists outside of the defined cardholder data environment.
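One way to look for cardholder data that has strayed outside the defined environment is to scan text for digit runs that pass the Luhn checksum, which payment card numbers are designed to satisfy. The sketch below is an assumption-laden illustration: the 13 to 19 digit length range, the separator handling, and the function names are choices made here, and a Luhn match is only a candidate, not proof of a real card number.

```python
import re

# Candidate numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in the text that look like valid card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum filters out most random digit runs such as order numbers, which keeps the list of locations a reviewer has to inspect manageable.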
General Data Protection Regulation (GDPR)
The GDPR is designed to protect the personal data of individuals in the EU. The law defines personal data as any information that can directly or indirectly identify a natural person, such as:
- Name
- Identification Number
- Location data
- Online identifier
To comply with the GDPR, companies must structure their data inventories to classify data relating to one or more factors specific to an individual’s physical, physiological, genetic, mental, economic, cultural, or social identity. The inventory should record:
- Type of data (financial information, health data, etc.)
- Basis of data protection (for personal or sensitive information)
- The category of individuals involved (clients, patients, etc.)
- Categories of recipients (in particular third-party suppliers outside the EU)
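The four inventory attributes listed above map naturally onto a small record type. The following sketch assumes hypothetical field names and example values, and the `needs_transfer_review` rule is invented for illustration; it is not a statement of what the GDPR requires.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryEntry:
    dataset: str
    data_type: str          # e.g. "financial", "health"
    protection_basis: str   # e.g. "personal", "sensitive"
    individuals: str        # e.g. "clients", "patients"
    recipients: list[str] = field(default_factory=list)
    recipients_outside_eu: bool = False


def needs_transfer_review(entry: InventoryEntry) -> bool:
    """Hypothetical rule: flag sensitive data sent to recipients outside the EU."""
    return entry.protection_basis == "sensitive" and entry.recipients_outside_eu
```

Keeping these attributes in a structured record lets an enterprise query its inventory, for example to list every dataset shared with third-party suppliers outside the EU.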
California Consumer Privacy Act (CCPA)
The act, which took effect on January 1, 2020, brings key data privacy concepts from the European GDPR to California residents. It requires businesses that interact with California residents to honor a set of consumer rights and obligations covering personal data that the company collects, processes, or sells. The act:
- Grants consumers various information rights, including the right to ask companies what types of data are collected, the purpose of collection, and to whom the data is sold.
- Provides the right to opt out of data collection or sales.
- Gives the right to request the deletion of personal data.
Companies also need to understand three components of the California Privacy Rights Act (CPRA), which amends and expands the CCPA:
- Special categories of personal information that require protection include: name, social security number, email address, and birthday.
- Requirements to proactively implement security measures to protect personal information.
- Increased oversight of service providers and contractors that have access to personal information held by businesses.
Gramm-Leach-Bliley Act (GLBA)
The Gramm-Leach-Bliley Act, enacted in 1999, requires financial institutions to explain to their customers how the information they collect is shared. To protect sensitive data, the GLBA safeguards customers in the following three ways:
- Financial institutions need to protect confidential customer information, guard against threats to security and integrity, and prevent unauthorized access to customer information.
- Financial institutions must be able to explain how the business uses and shares personal information, while giving customers the ability to opt out of sharing certain information.
- Financial institutions must be able to explain to customers how their information will be protected and kept confidential.
GLBA applies to many types of institutions. The law covers not only financial institutions such as banks, credit unions, and savings and loan companies, but also securities firms, car dealers, and retailers that collect and share personal information and provide credit lines to customers.
10. Roles in data classification
Data classification is not a one-person job. To improve the data classification process, companies should designate different roles to fulfill specific responsibilities. Here, we can refer to the roles and responsibilities related to data classification defined by Forrester.
Data Champions
The data champion should ensure that data is appropriately protected based on the business purpose for which it is used. The aim is to ensure that business stakeholders (see below) support and drive data classification as part of the company's overall data strategy. This role can take different forms; for example, the Chief Privacy Officer (CPO) may be responsible for strategies such as data quality, governance, and monetization.
Data Owner
Data owners are often the people who are ultimately responsible for collecting and maintaining data and information for their department. They can be members of senior management, business unit managers, department heads, or equivalent. Their role is to provide additional context for data classification, such as third-party agreements, which are currently beyond the reach of automated tools.
Data creator
Unless the company has an automated data classification system in place, the responsibility for identifying the sensitivity of newly created and discovered data falls to this role. The criteria the data creator applies include whether the data could enter the public domain and what the impact on the company would be if it became available to competitors.
Data users
As the name implies, data users are anyone who can access the data. They must use the data in a manner consistent with the intended purpose and in compliance with relevant policies. Because they have the right to process and use the data, they can provide concrete feedback on the data classification label and answers to the following questions:
- Based on how the data is used, is the current classification appropriate?
- In what cases might data be processed differently than what is currently permitted under the classification?
Data Auditor
Data auditors may be compliance managers, privacy officers, data security officers, or equivalent roles. They are responsible for reviewing the data owner's assessment of the data classification and determining whether it meets the requirements of business partners, regulators, and other companies. Data auditors also review feedback from data users to evaluate whether the actual or expected data usage is consistent with current data processing policies and procedures.
Data Custodian
As data custodians, IT and information security personnel are responsible for maintaining and backing up data stored in enterprise systems, databases, and servers. This role is also responsible for implementing technology deployments in accordance with the rules established by the data owner and for ensuring that those rules remain effective within the system.
11. Steps of Data Classification
While there is no one-size-fits-all approach to creating a comprehensive and appropriate data classification process, we can summarize the process into seven key steps. Of course, these steps can be tailored to meet the unique needs of a specific organization.
Conduct a sensitive data risk assessment
Gain a thorough understanding of the organizational, regulatory, and contractual privacy and confidentiality requirements that apply to your organization. Define data classification goals with the following stakeholders:
- Privacy leadership
- Security leadership
- Compliance leadership
- Legal leadership
Develop a classification policy
In order for everyone in the organization to understand the existing data classification, the policy should cover the following points:
- Objectives – Outline the intent of the data classification and what the company hopes to achieve.
- Workflows – Explain to employees who work with different categories of sensitive data how to implement the classification process step by step.
- Schema – Describe the data categories the enterprise will use to classify its data.
- Data Owners – Outline the roles and responsibilities of those involved in data classification and how they classify and grant access rights to sensitive data.
Distinguish data categories
Different industries and companies often define sensitive data differently. During data classification, pay attention to the following questions:
- What customer and partner data does the company collect?
- How is the data used?
- What proprietary data has been created?
- What is the security posture and risk level of existing data across the enterprise?
- Which existing privacy regulations apply to the company’s data?
Discover the location of your data
Catalog where data is stored across the enterprise, including:
- Internal and external networks
- Endpoints
- Servers
- Cloud services
Identify and classify data
After discovering where the data is located, identify and classify it by assigning a tag to each sensitive data asset so that it can be properly protected. Tags can be assigned manually by the data owner or by an automated data classification solution, which has the following advantages:
- It automatically classifies all types of data across the enterprise according to the method the enterprise has approved.
- It labels the data with appropriate classification labels.
- It continuously ensures that all data is classified and updated as needed throughout its lifecycle.
Enable effective data security controls
By understanding where your data is stored and its value to the business, you can implement appropriate security controls based on the associated risks. That is, establish baseline network security measures, define policy-based controls for each data classification label, and then use DLP, ILP, encryption, and other security solutions to protect the classified data comprehensively.
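Policy-based controls per label can be expressed as a simple lookup plus a gap check. In the sketch below, the three labels, the control names, and both helper functions are assumptions chosen for illustration; each organization would substitute its own labels and control catalogue.

```python
# Assumed labels and control names; adapt both to the organization's own policy.
CONTROLS_BY_LABEL = {
    "public": {"integrity_monitoring"},
    "internal": {"integrity_monitoring", "access_control"},
    "restricted": {"integrity_monitoring", "access_control", "encryption", "dlp"},
}


def required_controls(label: str) -> set[str]:
    """Look up controls for a label, treating unknown labels as most sensitive."""
    return set(CONTROLS_BY_LABEL.get(label, CONTROLS_BY_LABEL["restricted"]))


def missing_controls(label: str, applied: set[str]) -> set[str]:
    """Return the controls the policy requires but the asset does not yet have."""
    return required_controls(label) - applied
```

Treating an unrecognized label as the most sensitive one is a conservative default in this sketch, so that a mislabeled asset is over-protected rather than left exposed.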
Monitor and update the classification system
To keep pace with changing data and privacy compliance requirements and with ever-growing volumes of files and data, the classification policy must be dynamic. That means establishing a consistent management process to ensure that the data classification system works well and continues to meet the security needs of the enterprise.
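One small piece of such a management process is deciding when an asset's classification is due for another look. The sketch below assumes per-label review intervals; the interval values, the labels, and the `due_for_review` helper are hypothetical and would be set by the organization's own policy.

```python
from datetime import date, timedelta

# Assumed review intervals per label, in days.
REVIEW_INTERVAL_DAYS = {"public": 365, "internal": 180, "restricted": 90}


def due_for_review(label: str, last_reviewed: date, today: date) -> bool:
    """Return True when the asset's last review is older than its interval.

    Labels missing from the table fall back to the shortest interval.
    """
    interval = timedelta(days=REVIEW_INTERVAL_DAYS.get(label, 90))
    return today - last_reviewed > interval
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the check easy to test and to run against historical snapshots.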
12. Data classification practice
Following the data classification process described above, enterprises can start applying classification labels to the data they use and store in daily operations. Next, we will discuss five best practices for implementing data classification.
Implement automated, real-time, and continuous data classification
Well-configured automated scanning helps simplify the data classification process. The system automatically analyzes and classifies data according to predefined parameters.
Create a data classification culture
Promoting a data governance culture across the entire enterprise, from the top down and involving everyone, helps set the tone for prioritizing data. It demonstrates that management takes data security seriously, and it makes the implementation of data and privacy protection measures a natural process.
Raise awareness through training
Many companies hold cybersecurity awareness training every year. Adding content about data classification and privacy protection helps data creators, users, and owners understand their roles and responsibilities in protecting sensitive data. This is crucial to reducing both the spread of data and the risk of leakage. The training is most effective when it uses scenarios that match the data and privacy risks employees face in their daily business activities.
Collaborate with IT and the business from the beginning
By implementing standardized and repeatable processes with IT, companies can make the data classification policies they develop more realistic and more feasible.
Reduce the scope of sensitive data distribution
As data is used and stored in more and more places, protecting sensitive data becomes increasingly difficult. Enterprises should make good use of data discovery and deduplication tools to delete unnecessary content and reduce the number of storage locations. Data classification itself also helps find redundant, irrelevant, outdated, or even forgotten data, so that the enterprise can weigh whether it needs to be retained or protected. The smaller the footprint of an enterprise's sensitive data, the easier the data as a whole is to control and protect.
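Deduplication usually starts by grouping content with identical hashes. The sketch below works on an in-memory mapping from a location to its bytes so that it stays self-contained; the `find_duplicates` name and the example locations are assumptions, and reading the content from real storage is left out.

```python
import hashlib
from collections import defaultdict


def find_duplicates(contents: dict[str, bytes]) -> list[list[str]]:
    """Group locations whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for location in sorted(contents):
        digest = hashlib.sha256(contents[location]).hexdigest()
        by_digest[digest].append(location)
    return [locations for locations in by_digest.values() if len(locations) > 1]
```

Each returned group is a set of candidate copies; which one to keep, and whether the others can be deleted, remains a decision for the data owner.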