In this interview with Metomic's VP of Engineering, Artem Tabalin, we dig deep into how data classification can transform your business's data security
Effective data classification plays a pivotal role in protecting sensitive information by categorising data based on its sensitivity. In this interview with Metomic's VP of Engineering, Artem Tabalin, we explore the basics of data classification, its critical role in Data Loss Prevention (DLP) programs, and the unique challenges of implementing it within cloud and SaaS environments.
From understanding the technical aspects to real-world examples of data breaches averted through classification, you'll discover valuable insights into safeguarding data within complex digital ecosystems.
Hi Artem! Let's get started with the basics of data classification:
Data classification is the process of organising data into categories based on its sensitivity and the impact its exposure can have on an organisation.
This categorisation is crucial for DLP (Data Loss Prevention) programmes because it allows security teams to apply the appropriate level of protection to different types of data. Knowing which data needs the highest level of security ensures that the team is focused on safeguarding the most sensitive information, such as intellectual property or PII (Personally Identifiable Information), while allowing lower-risk data to flow more freely.
The access and protection requirements increase with the sensitivity of the data, with highly confidential data requiring more rigorous encryption, multi-factor authentication, and auditing.
There are four typical classification levels based on the sensitivity of the information:
1) Public: Data intended for broad sharing, requiring minimal security.
2) Internal: Data used within the organisation, requiring moderate controls.
3) Confidential: Sensitive data that could harm the organisation if exposed, requiring strong encryption and restricted access.
4) Highly Confidential or Restricted: Data that could cause severe damage if exposed, requiring the strictest security measures, with access restricted to a minimal number of trusted individuals.
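As a rough sketch, the four levels above can be modelled as an ordered scale, each level implying a set of required controls. The control names here are illustrative assumptions, not Metomic's actual configuration:

```python
from enum import IntEnum

class Classification(IntEnum):
    """Ordered sensitivity levels: a higher value means stricter controls."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    HIGHLY_CONFIDENTIAL = 4

# Illustrative mapping from classification level to required controls.
REQUIRED_CONTROLS = {
    Classification.PUBLIC: set(),
    Classification.INTERNAL: {"authentication"},
    Classification.CONFIDENTIAL: {
        "authentication", "encryption", "restricted_access",
    },
    Classification.HIGHLY_CONFIDENTIAL: {
        "authentication", "encryption", "restricted_access",
        "mfa", "audit_logging",
    },
}

def controls_for(level: Classification) -> set[str]:
    """Return the set of controls a document at this level requires."""
    return REQUIRED_CONTROLS[level]
```

Using `IntEnum` keeps the levels comparable, so a policy engine can test whether a document's level meets or exceeds a threshold.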
Data classification is central to security architecture in SaaS and cloud platforms because it dictates the level of protection needed for different assets. It allows teams to define access controls, auditing mechanisms, and policies for encryption at rest and in transit.
By classifying data, we ensure that security controls are applied appropriately across different data repositories: SaaS services like Google Workspace or Slack, databases, and cloud storage.
One of the primary challenges is the decentralised and collaborative nature of services like Google Drive, where data can be shared easily across organisational boundaries. It's a challenge to ensure that classified data remains appropriately protected across shared documents, which requires robust access control, real-time monitoring, and automated classification. Another challenge is handling different document types and formats, which calls for adaptable classification engines that can scan both structured and unstructured data.
DSPM solutions like Metomic address these challenges by automating the classification process, applying labels as data is uploaded, shared, or modified. Real-time labelling ensures that sensitive data is classified correctly, while historical bulk labelling allows teams to classify previously unclassified or misclassified data.
Ensuring compatibility starts with implementing a universal data classification framework that can be applied across platforms. This often involves using APIs and integration tools to unify data policies across various environments. Leveraging tools that support these standards can streamline the enforcement of consistent policies across platforms.
For example, Metomic provides a flexible and universal classification framework that integrates across multiple SaaS services, including Google Drive and Slack. By leveraging APIs and adaptable data classification models, Metomic ensures that data handling policies remain consistent across services, despite each having different data protection capabilities.
Scaling data classification in large SaaS and cloud environments involves managing the volume, velocity, and variety of data generated. Large environments can have massive datasets spread across multiple locations, which makes manual classification and enforcement nearly impossible. The solution is to rely on automated classification, applying real-time labelling to both new and existing data, enabling consistent enforcement of security policies with little to no manual intervention.
Automated classification leverages AI/ML models like natural language processing (NLP) and pattern recognition to scan documents and categorise them based on the sensitive information they contain. This allows quick and accurate classification of sensitive information such as PII, ensuring compliance with regulatory requirements. These models can learn from labelled data and improve over time, becoming increasingly accurate in their classification.
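To make the pattern-recognition side concrete, here is a minimal sketch of rule-based PII detection. The two regex detectors and the escalation rule are illustrative assumptions; a production engine would combine many such rules with NLP models and context scoring:

```python
import re

# Illustrative detectors for two common PII patterns (hypothetical rules,
# not Metomic's actual detection logic).
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def detect_pii(text: str) -> set[str]:
    """Return the names of the PII patterns found in the text."""
    return {name for name, rx in PATTERNS.items() if rx.search(text)}

def classify(text: str) -> str:
    """Naive rule: any PII hit escalates the document to Confidential."""
    return "Confidential" if detect_pii(text) else "Internal"
```

The learning component mentioned above would then refine these raw matches, for example by down-weighting patterns that reviewers repeatedly mark as false positives.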
Any automated classification will have some level of false positives or false negatives. It may overlook context or misclassify documents due to its reliance on predefined criteria. Manual classification, on the other hand, while more accurate in certain cases, is labour-intensive, error-prone, and hardly applicable to large datasets. It's also critical for a classification system to support some level of customisation, so that teams can tweak classification rules.
The key is to fine-tune the model based on real-world data and edge cases. Precision is increased by training the model with more diverse datasets, improving its ability to detect true positives while minimising false positives. Recall, which focuses on identifying as much relevant data as possible, can be enhanced by increasing the model's sensitivity to certain patterns. Regular feedback loops, where incorrect classifications are reviewed and corrected, help achieve a balance between the two.
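As a minimal illustration of the precision/recall trade-off described above, the two metrics can be computed from a human-reviewed batch of documents (the document IDs here are hypothetical):

```python
def precision_recall(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Compute precision and recall for documents flagged as sensitive
    (predicted) against human-reviewed ground truth (actual)."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall

# Hypothetical review batch: IDs flagged by the model vs. reviewer labels.
flagged = {"doc1", "doc2", "doc3", "doc4"}
sensitive = {"doc2", "doc3", "doc5"}
p, r = precision_recall(flagged, sensitive)  # p = 0.5, r = 2/3
```

Raising the model's sensitivity tends to push recall up (fewer misses like `doc5`) at the cost of precision (more spurious flags like `doc1`), which is exactly the balance the feedback loops aim to tune.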
Metomic's classification engine supports a necessary level of customisability, allowing teams to tune classification rules to their specific requirements and optimise precision and recall.
Data classification is essential for DLP solutions because it enables security policies to be enforced based on the sensitivity of the data. Once the data is labelled correctly, the DLP system can apply the appropriate restrictions, such as blocking unauthorised sharing or external downloads. Such integration between classification and DLP helps to prevent accidental or intentional data breaches. That's why it's much better when a powerful classification engine and a DLP solution come together in one product.
The main consideration is ensuring that DLP systems recognise the classification labels applied, so that data marked as "Confidential" or "Highly Confidential" triggers the appropriate DLP actions. This alignment allows encryption, access control, and monitoring policies to be applied in real time based on data classification.
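Conceptually, that alignment is a lookup from classification label to the DLP actions it triggers. A minimal sketch, with hypothetical action names rather than a real Metomic API:

```python
# Illustrative mapping from classification label to DLP actions;
# the action names are assumptions for demonstration only.
DLP_POLICIES = {
    "Public": [],
    "Internal": ["log_access"],
    "Confidential": ["log_access", "block_external_sharing"],
    "Highly Confidential": [
        "log_access", "block_external_sharing",
        "block_download", "revoke_public_links",
    ],
}

def actions_for(label: str) -> list[str]:
    """Look up which DLP actions a label triggers (empty if unknown)."""
    return DLP_POLICIES.get(label, [])
```

Keeping the mapping in one place means a relabelled document immediately picks up the stricter (or looser) actions, with no per-document policy edits.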
One notable example was when a financial organisation using Google Drive prevented a potential data breach: a document containing PCI information was classified as "Highly Confidential" and made private by an automated policy. There have also been cases where a health organisation made a document containing health and PII information about their patients public. In all such cases, Metomic automatically classifies the documents as "Highly Confidential" and revokes public access.
Data classification solutions help organisations meet compliance requirements by identifying and labelling sensitive information such as personal data (GDPR) and health records (HIPAA). Once classified, specific compliance rules can be applied, ensuring that only authorised users have access, and that data is encrypted and stored appropriately to meet regulatory requirements. This ensures that data in platforms like Google Drive is handled according to the relevant legal frameworks, reducing the risk of non-compliance and potential fines.
Once data is classified, the encryption and access controls appropriate to the classification level need to be applied. For highly confidential documents, this includes encryption at rest and in transit, role-based access control (RBAC), and multi-factor authentication (MFA). It's important to keep audit logs to monitor access and changes, to make sure the sensitive data is constantly protected.
Misclassification can lead to either insufficient protection of sensitive data or overly restrictive policies on less critical information. To mitigate that risk, Metomic provides real-time classification and the ability to bulk-label historical data, ensuring that all data is properly categorised. Regular audits and customisable classification rules also help correct misclassifications, maintaining high accuracy and data security.
AI-powered data classification is one of our specialities - learn more about how we can help your business here.