DLP-01: Enabling Sensitive Data Classification

Overview

Sensitive data classification in Microsoft 365 uses trainable classifiers and sensitive information types (SITs) to automatically identify and label confidential information. This is the foundation for effective data loss prevention (DLP), enabling organizations to:

Automatically detect sensitive data like credit card numbers, SSNs, and health records
Apply sensitivity labels to protect documents and emails
Enable DLP policies that prevent unauthorized sharing
Meet compliance requirements for data protection (GDPR, HIPAA, PCI-DSS)
Provide visibility into where sensitive data exists in your organization

This guide covers enabling and configuring sensitive data classification in Microsoft 365.

Prerequisites

Required Roles

Global Administrator - Full configuration access
Compliance Administrator - Data classification and DLP configuration
Information Protection Administrator - Sensitivity label management
Security Administrator - Policy configuration

Required Licenses

Feature	License Required
Basic sensitivity labels	Microsoft 365 Business Premium, E3, or E5
Auto-labeling	Microsoft 365 E5 or E5 Compliance
Content explorer	Microsoft 365 E5 or E5 Compliance
Trainable classifiers	Microsoft 365 E5 or E5 Compliance
DLP policies	Microsoft 365 E3 or Business Premium (Exchange, SharePoint, OneDrive); E5 or E5 Compliance for Teams chat
Endpoint DLP	Microsoft 365 E5 or E5 Compliance

Required Permissions

Access to Microsoft Purview compliance portal
Content explorer permissions (Content Explorer Content Viewer role)
Access to Microsoft 365 admin center

Prerequisites Checklist

Microsoft 365 E5 or E5 Compliance licenses assigned
Sensitivity labels enabled in tenant
Users are actively using Microsoft 365 services
Audit logging is enabled

Time Estimate

Task	Duration
Planning classification requirements	2-4 hours
Configuring sensitive information types	1-2 hours
Creating sensitivity labels	2-3 hours
Configuring auto-labeling policies	2-3 hours
Testing and validation	1-2 days
User training	2-4 hours
Total elapsed time	2-4 weeks (including rollout)

Step-by-Step Instructions

Step 1: Review Current Data Classification Status

Before configuring, understand your current state:

Navigate to Microsoft Purview: https://compliance.microsoft.com
Go to Data classification in the left navigation
Review the Overview dashboard:
- Top sensitive information types detected
- Top sensitivity labels applied
- Data distribution by location
Click Content explorer to see classified content:
- Review detected sensitive data types
- Identify where sensitive data is stored
- Note any gaps in classification

Note: Content explorer requires an E5 or E5 Compliance license, and content may take 24-48 hours to appear after classification is first enabled.

Step 2: Understand Sensitive Information Types (SITs)

Microsoft provides 300+ built-in sensitive information types:

Category	Examples
Financial	Credit card numbers, bank accounts, SWIFT codes
Healthcare	HIPAA identifiers, drug names, medical terms
Personal	SSN, driver's license, passport numbers
National IDs	By country (US, UK, EU, etc.)
Regulatory	GDPR data, PCI-DSS data

To view available SITs:

Go to Data classification > Sensitive info types
Browse or search for relevant types
Click on a SIT to see:
- Pattern details
- Confidence levels
- Example matches

Step 3: Create Custom Sensitive Information Types

If built-in types don't cover your needs:

Navigate to Data classification > Sensitive info types
Click + Create sensitive info type

Basic Settings

Configure:
- Name: "Company Project Codes"
- Description: "Detects internal project codes in format PRJ-XXXXX"

Pattern Configuration

Click + Add pattern
Configure pattern elements:
- Primary element: Regular expression
- Pattern: PRJ-[A-Z0-9]{5}
Add supporting elements (optional):
- Keywords near the pattern
- Additional patterns
Configure confidence levels:
- High confidence: Primary + 2 supporting
- Medium confidence: Primary + 1 supporting
- Low confidence: Primary only
Set character proximity (how many characters from the primary match a supporting element may appear)
Click Create
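The confidence rule above (primary match plus zero, one, or two supporting elements within a proximity window) can be prototyped offline before you build the SIT. The following is a minimal sketch, not Purview's matching engine: the keyword list and the 300-character window are assumptions chosen for illustration, and only the regular expression comes from this guide.

```python
import re

# Primary element, copied from the pattern configuration above.
PRIMARY = re.compile(r"PRJ-[A-Z0-9]{5}")
# Hypothetical supporting keywords; choose terms that really occur near your codes.
KEYWORDS = ("project", "workstream")
# Illustrative proximity window, in characters on each side of the match.
PROXIMITY = 300


def classify(text: str) -> list[tuple[str, str]]:
    """Return (match, confidence) pairs using the 'primary + N supporting' rule."""
    results = []
    lowered = text.lower()
    for m in PRIMARY.finditer(text):
        start = max(0, m.start() - PROXIMITY)
        window = lowered[start : m.end() + PROXIMITY]
        # Count how many distinct supporting keywords appear near the match.
        hits = sum(1 for kw in KEYWORDS if kw in window)
        if hits >= 2:
            level = "high"
        elif hits == 1:
            level = "medium"
        else:
            level = "low"
        results.append((m.group(), level))
    return results


print(classify("Project kickoff for workstream A: PRJ-AB123"))
print(classify("See PRJ-AB123"))
```

Running sample documents through a sketch like this helps decide which keywords are worth adding as supporting elements before committing them in the portal.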

Example Custom SITs:

Name	Pattern	Use Case
Employee ID	EMP-[0-9]{6}	Internal employee numbers
Project Code	PRJ-[A-Z]{2}[0-9]{4}	Project identifiers
Customer Account	CUST[0-9]{8}	Customer numbers
Internal Document ID	DOC-[A-Z0-9]{10}	Document tracking
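Before entering patterns like these in the portal, it can help to check them against a handful of known-good and known-bad values. This sketch uses Python's `re` module, whose syntax may differ in small ways from the engine Purview uses; the sample values are made up for illustration.

```python
import re

# Patterns copied from the example table above.
PATTERNS = {
    "Employee ID": r"EMP-[0-9]{6}",
    "Project Code": r"PRJ-[A-Z]{2}[0-9]{4}",
    "Customer Account": r"CUST[0-9]{8}",
    "Internal Document ID": r"DOC-[A-Z0-9]{10}",
}

# Made-up samples: (SIT name, candidate value, should the whole value match?)
SAMPLES = [
    ("Employee ID", "EMP-123456", True),
    ("Employee ID", "EMP-12345", False),          # one digit short
    ("Project Code", "PRJ-AB1234", True),
    ("Project Code", "PRJ-1234AB", False),        # letters and digits swapped
    ("Customer Account", "CUST12345678", True),
    ("Customer Account", "CUST-12345678", False), # unexpected hyphen
    ("Internal Document ID", "DOC-A1B2C3D4E5", True),
    ("Internal Document ID", "DOC-a1b2c3d4e5", False),  # lowercase not allowed
]


def check_patterns(patterns: dict[str, str], samples: list[tuple[str, str, bool]]) -> list[str]:
    """Return a description of every sample whose result differs from expectation."""
    failures = []
    for name, value, expected in samples:
        matched = re.fullmatch(patterns[name], value) is not None
        if matched != expected:
            failures.append(f"{name}: {value!r} matched={matched}, expected={expected}")
    return failures


print(check_patterns(PATTERNS, SAMPLES) or "all samples behave as expected")
```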

Step 4: Create Sensitivity Labels

Sensitivity labels classify and protect content:

Navigate to Microsoft Purview > Information protection
Click Labels tab
Click + Create a label

Create Label Hierarchy

Parent Label: Confidential

Configure:
- Name: Confidential
- Display name: Confidential
- Description for users: "Business data that could harm the company if disclosed"
- Description for admins: "Parent label for confidential content"
Click Next
Scope: Select:
- Items (files, emails)
- Groups & sites (optional)
Click Next
Items: Configure protection:
- Apply content marking: Header, footer, watermark
- Encryption: None on the parent label (users apply the sublabels, not the parent)
Complete wizard and create label

Sub-Label: Confidential - All Employees

Select the Confidential label
Click + Create sublabel
Configure:
- Name: Confidential - All Employees
- Display name: Confidential - All Employees
- Color: Yellow or Orange
Under Items settings:
- Content marking: Enable header "CONFIDENTIAL"
- Encryption: Enable
- Assign permissions now: All employees > Co-Author
Complete wizard

Recommended Label Structure:

Label	Sub-label	Encryption	Marking
Public	-	None	None
Internal	-	None	Footer
Confidential	All Employees	Yes - All employees	Header + Footer
Confidential	Specific People	Yes - Specific users	Header + Footer
Highly Confidential	All Employees	Yes - View only	Watermark + Header
Highly Confidential	Specific People	Yes - Specific users	Watermark + Header
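Keeping the taxonomy in a small, version-controlled data structure makes it easier to review before building it in the portal. The sketch below mirrors the table above; the `Label` class and the two checks (unique names, no unencrypted label ranked above an encrypted one) are illustrative assumptions, not Purview requirements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    parent: str
    sublabel: Optional[str]
    encryption: Optional[str]  # None means no encryption
    marking: Optional[str]     # None means no content marking

    @property
    def full_name(self) -> str:
        return self.parent if self.sublabel is None else f"{self.parent} - {self.sublabel}"


# Ordered from least to most sensitive, as in the table above.
TAXONOMY = [
    Label("Public", None, None, None),
    Label("Internal", None, None, "Footer"),
    Label("Confidential", "All Employees", "All employees", "Header + Footer"),
    Label("Confidential", "Specific People", "Specific users", "Header + Footer"),
    Label("Highly Confidential", "All Employees", "View only", "Watermark + Header"),
    Label("Highly Confidential", "Specific People", "Specific users", "Watermark + Header"),
]


def validate(taxonomy: list[Label]) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    names = [label.full_name for label in taxonomy]
    for name in sorted({n for n in names if names.count(n) > 1}):
        problems.append(f"duplicate label name: {name}")
    # Once encryption starts, every more sensitive label should also be encrypted.
    seen_encrypted = False
    for label in taxonomy:
        if label.encryption:
            seen_encrypted = True
        elif seen_encrypted:
            problems.append(f"{label.full_name} is unencrypted but ranks above an encrypted label")
    return problems


print(validate(TAXONOMY))  # prints []
```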

Step 5: Publish Labels to Users

Make labels available to users:

Navigate to Information protection > Label policies
Click + Publish labels

Configure Policy

Choose labels: Select labels to publish
Admin units: Select if using (optional)
Users and groups:
- All users (recommended)
- Or specific groups for pilot
Policy settings:
- Require users to apply a label: Optional initially
- Default label for documents: "Internal" recommended
- Default label for emails: None or "Internal"
- Require justification to remove a label: Yes
Name: "Organization Sensitivity Labels"
Click Create

Note: Labels can take up to 24 hours to appear for users.

Step 6: Configure Auto-Labeling Policies

Automatically apply labels based on content:

Navigate to Information protection > Auto-labeling
Click + Create auto-labeling policy

Policy Configuration

Choose info to label: Select sensitive information types:
- Credit Card Number
- Social Security Number
- Bank Account Number
- (Add relevant SITs)
Name: "Auto-label Financial Data"
Description: "Automatically labels documents containing financial data"
Choose locations:
- Exchange (email)
- SharePoint sites
- OneDrive accounts
Define policy settings:
- Sensitivity label: Confidential - All Employees
Decide when to run:
- Simulation mode: Yes (start here)
- Turn on automatically: No initially
Click Create

Run Simulation

Wait for simulation to complete (can take hours/days)
Review results in Content to review
Validate matches are appropriate
Tune policy if needed (false positives)
Enable policy when satisfied
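When reviewing simulation results, a per-SIT breakdown of false positives shows which information type needs tuning. The sketch below assumes you have manually reviewed a sample of matches and recorded, for each one, the SIT that fired and whether the hit was genuine; the input format and the threshold are assumptions, not a Purview export format.

```python
from collections import defaultdict


def false_positive_rates(reviewed: list[tuple[str, bool]]) -> dict[str, float]:
    """Map each SIT name to the share of its reviewed matches that were not genuine."""
    totals: dict[str, int] = defaultdict(int)
    false_hits: dict[str, int] = defaultdict(int)
    for sit, genuine in reviewed:
        totals[sit] += 1
        if not genuine:
            false_hits[sit] += 1
    return {sit: false_hits[sit] / totals[sit] for sit in totals}


def sits_to_tune(reviewed: list[tuple[str, bool]], max_rate: float = 0.05) -> list[str]:
    """Return the SITs whose false positive rate exceeds max_rate (an illustrative threshold)."""
    rates = false_positive_rates(reviewed)
    return sorted(sit for sit, rate in rates.items() if rate > max_rate)


# Made-up review results: (SIT that matched, was it a genuine hit?)
review = (
    [("Credit Card Number", True)] * 9
    + [("Credit Card Number", False)]
    + [("Bank Account Number", True)] * 2
    + [("Bank Account Number", False)] * 2
)
print(sits_to_tune(review, max_rate=0.2))  # prints ['Bank Account Number']
```

What counts as an acceptable rate is a judgment call for your organization; the point is to tune the noisiest SIT first rather than the whole policy.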

Step 7: Enable Trainable Classifiers

Use AI-based classifiers for complex content:

Navigate to Data classification > Trainable classifiers
Review built-in classifiers:
- Resumes
- Source code
- Agreements/Contracts
- Financial statements
- Healthcare forms
- HR documents
- Intellectual property
- Threat/Harassment
- Profanity
Enable classifiers for use in policies:
- These are pre-trained and ready to use
- Can be used in auto-labeling and DLP policies

Create Custom Trainable Classifier (Optional)

If built-in classifiers don't meet needs:

Click + Create trainable classifier
Provide sample content:
- Minimum 50 positive samples (up to 500)
- Minimum 150 negative samples (up to 1,500)
Train and validate
Publish when accuracy is acceptable

Note: Custom classifiers require significant sample data and training time.
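"Acceptable accuracy" is easier to judge when you track precision and recall separately on a held-out set of documents you have labeled yourself. This is a generic sketch of those two metrics, not a description of how Purview scores a classifier; the input lists are hypothetical.

```python
def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Compute precision and recall for a classifier's yes/no predictions.

    predicted[i] is the classifier's verdict for document i,
    actual[i] is your own ground-truth label for the same document.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # flagged but should not be
    fn = sum(1 for p, a in pairs if not p and a)    # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Four test documents: the classifier flags three, two of them correctly, and misses one.
p, r = precision_recall([True, True, False, True], [True, False, True, True])
print(f"precision={p:.2f} recall={r:.2f}")  # prints precision=0.67 recall=0.67
```

Low precision means users will see wrong labels; low recall means sensitive documents slip through. Which one matters more depends on how the classifier will be used in policies.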

Step 8: Configure Content Explorer Permissions

Grant access to view classified content:

Navigate to Microsoft Purview > Roles & scopes > Role groups
Find or create a role group with:
- Content Explorer Content Viewer
- Content Explorer List Viewer
Add security team members
Click Save

Privacy Note: Content Explorer shows actual file content. Limit access appropriately.

Step 9: Configure Classification Dashboards

Set up monitoring for classification:

Navigate to Data classification > Overview
Review available cards:
- Top sensitive info types
- Top sensitivity labels
- Locations with labeled content
Create custom reports:
- Go to Reports > Data classification
- Review available reports
- Export for analysis
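Once a report is exported, a short script can summarize it. The sketch below totals item counts per location from CSV text. The column names `Location` and `Items` are assumptions made for illustration; check the header row of your actual export and adjust them.

```python
import csv
import io
from collections import Counter


def count_by_location(
    csv_text: str,
    location_column: str = "Location",  # hypothetical column name
    count_column: str = "Items",        # hypothetical column name
) -> Counter:
    """Sum the item counts per location from exported CSV text."""
    totals: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[location_column]] += int(row[count_column])
    return totals


# Made-up export content for demonstration.
sample = "Location,Items\nSharePoint,120\nOneDrive,45\nSharePoint,30\n"
for location, items in count_by_location(sample).most_common():
    print(location, items)
```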

Step 10: Test and Validate Classification

Before full rollout:

Create test content:
- Documents with known sensitive data
- Various file types (Word, Excel, PDF)
Upload to test location:
- Dedicated SharePoint site
- Test user OneDrive
Verify classification:
- Check Content Explorer for detection
- Verify auto-labeling applied
- Test protection is working
Document results:
- False positives
- Missed detections
- Performance issues
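Test documents need realistic synthetic data. The built-in Credit Card Number type is documented as validating a Luhn checksum, so an arbitrary 16-digit string will generally not be detected. The sketch below checks whether a candidate test number passes Luhn; it is a standard implementation of the checksum, not Purview code, and the numbers shown are well-known test values rather than real cards.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Spaces and hyphens are ignored, so "4111 1111 1111 1111" is accepted as input.
    """
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # prints True
print(luhn_valid("4111 1111 1111 1112"))  # prints False
```

Use only published test numbers or generated values in test content, never real customer data.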

Verification Checklist

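After configuring data classification, verify:

Data classification overview shows detected sensitive information types
Custom sensitive information types match the test content
Sensitivity labels are published and visible to users in Office apps
Auto-labeling policy has completed simulation and the results have been reviewed
Content explorer access is limited to the intended role group members
Test content in SharePoint and OneDrive is detected, labeled, and protected as expected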

Troubleshooting

Issue: Labels Not Appearing in Office Apps

Cause: Policy publishing delay or client configuration.

Solution:

Wait 24-48 hours for policy propagation
Ensure user has appropriate license
Check that Office is up to date (devices on the Semi-Annual Enterprise Channel receive new labeling features later than other channels)
Sign out and back in to Office
Check for group policy conflicts

Issue: Auto-Labeling Not Working

Cause: Policy configuration or location scope.

Solution:

Verify the policy is turned on and not still running in simulation mode
Check content contains expected sensitive data
Verify location (SharePoint, OneDrive) is in scope
Check confidence level thresholds
Wait for background processing (can take hours)

Issue: Content Not Appearing in Content Explorer

Cause: Indexing delay or permissions.

Solution:

Wait 24-72 hours for initial indexing
Verify audit logging is enabled
Check user has Content Explorer permissions
Verify content is in indexed locations
Check for very large files (may be delayed)

Issue: False Positives in Classification

Cause: Overly broad patterns or low confidence matches.

Solution:

Review the specific matches in Content Explorer
Increase confidence level requirements
Add exclusion patterns to SITs
Use supporting elements (keywords)
Create custom SITs with more specific patterns
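A common source of false positives is a pattern that matches inside a longer string. The sketch below compares the Customer Account pattern from Step 3 with a stricter variant that adds word boundaries (`\b`). It runs on Python's `re` module; confirm that the same anchors behave identically in the SIT editor before relying on them.

```python
import re

LOOSE = re.compile(r"CUST[0-9]{8}")
# Same pattern, but the match must start and end at a word boundary.
STRICT = re.compile(r"\bCUST[0-9]{8}\b")


def find_codes(pattern: re.Pattern, text: str) -> list[str]:
    """Return every non-overlapping match of `pattern` in `text`."""
    return pattern.findall(text)


# Made-up text: the first token only contains a customer number by accident.
text = "Ref XCUST1234567890 and account CUST12345678."
print(find_codes(LOOSE, text))   # prints ['CUST12345678', 'CUST12345678']
print(find_codes(STRICT, text))  # prints ['CUST12345678']
```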

Issue: Encryption Preventing Access

Cause: Rights management configuration.

Solution:

Verify user has appropriate permissions
Check the RMS connector if you use on-premises servers such as Exchange or SharePoint
Verify Azure RMS is enabled
Clear RMS cache on client
Check for policy conflicts

Cost Considerations

License Requirements

Feature	License	Cost (approx.)
Manual sensitivity labels	M365 Business Premium	$22/user/month
Auto-labeling	M365 E5	$57/user/month
Content explorer	M365 E5	Included in E5
Trainable classifiers	M365 E5	Included in E5
E5 Compliance add-on	E5 Compliance	$12/user/month

Cost Optimization Strategies

Phase licensing:
- Start with Business Premium for manual labeling
- Upgrade to E5 when auto-labeling needed
Target high-risk users:
- E5 for users handling sensitive data
- Basic licenses for general users
Use built-in features first:
- Built-in SITs before custom
- Pre-trained classifiers before custom
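The effect of targeting E5 at high-risk users can be estimated with simple arithmetic. This sketch uses the approximate per-user prices from the license table above; treat them as placeholders and substitute your actual contract pricing. The user counts are made up.

```python
# Approximate USD per user per month, taken from the license table above.
PRICES = {"Business Premium": 22.0, "E5": 57.0}


def monthly_cost(total_users: int, high_risk_users: int, prices: dict[str, float] = PRICES) -> float:
    """Monthly cost when high-risk users get E5 and everyone else gets Business Premium."""
    if not 0 <= high_risk_users <= total_users:
        raise ValueError("high_risk_users must be between 0 and total_users")
    general_users = total_users - high_risk_users
    return high_risk_users * prices["E5"] + general_users * prices["Business Premium"]


# Hypothetical organization: 200 users, 40 of whom handle sensitive data.
targeted = monthly_cost(200, 40)
everyone_e5 = monthly_cost(200, 200)
print(targeted, everyone_e5, everyone_e5 - targeted)  # prints 5800.0 11400.0 5600.0
```

Check the licensing terms before mixing plans, since user caps and feature entitlements differ between them.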

Storage Considerations

Encryption may slightly increase storage usage:

Encrypted files carry additional metadata overhead
As a rough planning figure, allow for a 5-10% increase in storage and confirm it against your own content

Best Practices

Start simple:
- Begin with 3-5 sensitivity labels
- Use built-in SITs before custom
- Enable auto-labeling for obvious cases first
Test thoroughly:
- Use simulation mode before enabling policies
- Test with representative content
- Validate in multiple Office versions
Train users:
- Explain why classification matters
- Show how to apply labels
- Provide quick reference guides
Monitor continuously:
- Review Content Explorer weekly
- Monitor auto-labeling accuracy
- Adjust policies based on findings
Document everything:
- Classification taxonomy
- Policy decisions
- Exception handling
Plan for scale:
- Start with pilot groups
- Expand gradually
- Monitor performance

Related Controls

DLP-02: Blocking bulk data exfiltration
LOG-01: Audit log retention
GOV-04: Incident response procedures

Revision History

Date	Version	Author	Changes
2025-01-07	1.0	TrueConfig	Initial release