DLP-01: Enabling Sensitive Data Classification
Overview
Sensitive data classification in Microsoft 365 uses trainable classifiers and sensitive information types (SITs) to automatically identify and label confidential information. This is the foundation for effective data loss prevention (DLP), enabling organizations to:
- Automatically detect sensitive data like credit card numbers, SSNs, and health records
- Apply sensitivity labels to protect documents and emails
- Enable DLP policies that prevent unauthorized sharing
- Meet compliance requirements for data protection (GDPR, HIPAA, PCI-DSS)
- Provide visibility into where sensitive data exists in your organization
This guide covers enabling and configuring sensitive data classification in Microsoft 365.
Prerequisites
Required Roles
- Global Administrator - Full configuration access
- Compliance Administrator - Data classification and DLP configuration
- Information Protection Administrator - Sensitivity label management
- Security Administrator - Policy configuration
Required Licenses
| Feature | License Required |
|---|---|
| Basic sensitivity labels | Microsoft 365 Business Premium or E3 |
| Auto-labeling | Microsoft 365 E5 or E5 Compliance |
| Content explorer | Microsoft 365 E5 or E5 Compliance |
| Trainable classifiers | Microsoft 365 E5 or E5 Compliance |
| DLP policies | Microsoft 365 E3 or Business Premium for Exchange, SharePoint, and OneDrive; E5 or E5 Compliance for Teams chat |
| Endpoint DLP | Microsoft 365 E5 or E5 Compliance |
Required Permissions
- Access to the Microsoft Purview compliance portal
- Content explorer permissions (Content Explorer Content Viewer role)
- Access to the Microsoft 365 admin center
Prerequisites Checklist
- Microsoft 365 E5 or E5 Compliance licenses assigned
- Sensitivity labels enabled in tenant
- Users are actively using Microsoft 365 services
- Audit logging is enabled
Time Estimate
| Task | Duration |
|---|---|
| Planning classification requirements | 2-4 hours |
| Configuring sensitive information types | 1-2 hours |
| Creating sensitivity labels | 2-3 hours |
| Configuring auto-labeling policies | 2-3 hours |
| Testing and validation | 1-2 days |
| User training | 2-4 hours |
| Total | 2-4 weeks (including rollout) |
Step-by-Step Instructions
Step 1: Review Current Data Classification Status
Before configuring, understand your current state:
- Navigate to Microsoft Purview: https://compliance.microsoft.com
- Go to Data classification in the left navigation
- Review the Overview dashboard:
  - Top sensitive information types detected
  - Top sensitivity labels applied
  - Data distribution by location
- Click Content explorer to see classified content:
  - Review detected sensitive data types
  - Identify where sensitive data is stored
  - Note any gaps in classification
Note: Content explorer requires an E5 license, and the initial data can take 24-48 hours to populate.
Step 2: Understand Sensitive Information Types (SITs)
Microsoft provides 300+ built-in sensitive information types:
| Category | Examples |
|---|---|
| Financial | Credit card numbers, bank accounts, SWIFT codes |
| Healthcare | HIPAA identifiers, drug names, medical terms |
| Personal | SSN, driver's license, passport numbers |
| National IDs | By country (US, UK, EU, etc.) |
| Regulatory | GDPR data, PCI-DSS data |
To view available SITs:
- Go to Data classification > Sensitive info types
- Browse or search for relevant types
- Click on a SIT to see:
  - Pattern details
  - Confidence levels
  - Example matches
Step 3: Create Custom Sensitive Information Types
If built-in types don't cover your needs:
- Navigate to Data classification > Sensitive info types
- Click + Create sensitive info type
Basic Settings
- Configure:
  - Name: "Company Project Codes"
  - Description: "Detects internal project codes in format PRJ-XXXXX"
Pattern Configuration
- Click + Add pattern
- Configure pattern elements:
  - Primary element: Regular expression
  - Pattern: PRJ-[A-Z0-9]{5}
- Add supporting elements (optional):
  - Keywords near the pattern
  - Additional patterns
- Configure confidence levels:
  - High confidence: Primary + 2 supporting
  - Medium confidence: Primary + 1 supporting
  - Low confidence: Primary only
- Set character proximity (the maximum number of characters allowed between the primary match and a supporting element)
- Click Create
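The interaction between the primary pattern, supporting elements, and character proximity can be explored offline before creating the SIT. The Python sketch below is only a simplified approximation and does not reproduce how Purview evaluates content. The keyword list, the 300-character window, and the function name are assumptions chosen for illustration.

```python
import re

# Primary pattern from this step. Keywords and window size are
# illustrative assumptions, not values taken from Purview.
PRIMARY = re.compile(r"\bPRJ-[A-Z0-9]{5}\b")
SUPPORTING_KEYWORDS = ["project", "code"]  # hypothetical supporting elements
PROXIMITY = 300  # characters on either side of the primary match


def classify(text: str) -> list[tuple[str, str]]:
    """Return (match, confidence) pairs using the Primary + N supporting rule."""
    results = []
    for m in PRIMARY.finditer(text):
        # Inspect only the text around the match, excluding the match itself,
        # so a code such as PRJ-CODE1 cannot count as its own keyword.
        before = text[max(0, m.start() - PROXIMITY): m.start()]
        after = text[m.end(): m.end() + PROXIMITY]
        window = (before + " " + after).lower()
        hits = sum(1 for kw in SUPPORTING_KEYWORDS if kw in window)
        if hits >= 2:
            confidence = "high"
        elif hits == 1:
            confidence = "medium"
        else:
            confidence = "low"
        results.append((m.group(), confidence))
    return results


if __name__ == "__main__":
    print(classify("The project code is PRJ-A1B2C, please review."))
    print(classify("Ticket PRJ-00042 was closed."))
```

Running candidate text through a sketch like this helps show whether the chosen keywords and proximity separate real project codes from incidental matches.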
Example Custom SITs:
| Name | Pattern | Use Case |
|---|---|---|
| Employee ID | EMP-[0-9]{6} | Internal employee numbers |
| Project Code | PRJ-[A-Z]{2}[0-9]{4} | Project identifiers |
| Customer Account | CUST[0-9]{8} | Customer numbers |
| Internal Document ID | DOC-[A-Z0-9]{10} | Document tracking |
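Before entering these patterns in the portal, it can help to check them against strings that should and should not match. The following Python sketch uses the patterns from the table with made-up sample values. It relies on Python's `re` module, whose matching rules (for example case sensitivity) may differ from the regex engine Purview uses, so treat it as a sanity check only.

```python
import re

# Patterns copied from the table above.
PATTERNS = {
    "Employee ID": r"EMP-[0-9]{6}",
    "Project Code": r"PRJ-[A-Z]{2}[0-9]{4}",
    "Customer Account": r"CUST[0-9]{8}",
    "Internal Document ID": r"DOC-[A-Z0-9]{10}",
}

# Made-up samples: (values that should match, values that should not).
SAMPLES = {
    "Employee ID": (["EMP-004217"], ["EMP-4217", "EMP-00421A"]),
    "Project Code": (["PRJ-AB1234"], ["PRJ-A12345", "prj-AB1234"]),
    "Customer Account": (["CUST00012345"], ["CUST-00012345", "CUST1234567"]),
    "Internal Document ID": (["DOC-2025FIN001"], ["DOC-2025FIN01", "DOC-2025fin001"]),
}


def check(name: str, should_match: list[str], should_not_match: list[str]) -> list[str]:
    """Return a description of every sample that behaves unexpectedly."""
    pattern = re.compile(PATTERNS[name])
    failures = []
    for value in should_match:
        if not pattern.fullmatch(value):
            failures.append(f"{name}: expected match for {value!r}")
    for value in should_not_match:
        if pattern.fullmatch(value):
            failures.append(f"{name}: unexpected match for {value!r}")
    return failures


def run_all() -> dict[str, list[str]]:
    """Check every pattern against its samples."""
    return {name: check(name, *SAMPLES[name]) for name in PATTERNS}


if __name__ == "__main__":
    for name, failures in run_all().items():
        print(name, "OK" if not failures else failures)
```

The sketch uses `fullmatch` so that each sample is tested as a whole string. Detection inside longer documents also depends on the surrounding text.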
Step 4: Create Sensitivity Labels
Sensitivity labels classify and protect content:
- Navigate to Microsoft Purview > Information protection
- Click Labels tab
- Click + Create a label
Create Label Hierarchy
Parent Label: Confidential
- Configure:
  - Name: Confidential
  - Display name: Confidential
  - Description for users: "Business data that could harm the company if disclosed"
  - Description for admins: "Parent label for confidential content"
- Click Next
- Scope: Select:
  - Items (files, emails)
  - Groups & sites (optional)
- Click Next
- Items: Configure protection:
  - Apply content marking: Header, footer, watermark
  - Encryption: None on the parent label (configure it on the sublabels instead)
- Complete the wizard and create the label
Sub-Label: Confidential - All Employees
- Select the Confidential label
- Click + Create sublabel
- Configure:
  - Name: Confidential - All Employees
  - Display name: Confidential - All Employees
  - Color: Yellow or Orange
- Under Items settings:
  - Content marking: Enable header "CONFIDENTIAL"
  - Encryption: Enable
  - Assign permissions now: All employees > Co-Author
- Complete wizard
Recommended Label Structure:
| Label | Sub-label | Encryption | Marking |
|---|---|---|---|
| Public | - | None | None |
| Internal | - | None | Footer |
| Confidential | All Employees | Yes - All employees | Header + Footer |
| Confidential | Specific People | Yes - Specific users | Header + Footer |
| Highly Confidential | All Employees | Yes - View only | Watermark + Header |
| Highly Confidential | Specific People | Yes - Specific users | Watermark + Header |
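Keeping the taxonomy as data makes it easier to review and to spot mistakes before building labels in the portal. The Python sketch below encodes the table above and runs two simple checks. The `Label` class, the `validate` function, and the rule that only sublabel rows carry encryption are assumptions that mirror this guide's structure. They are not Purview requirements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    parent: str
    sublabel: Optional[str]
    encryption: Optional[str]  # None means no encryption
    marking: Optional[str]     # None means no content marking

    @property
    def display_name(self) -> str:
        # Follows the "Parent - Sublabel" naming used in Step 4.
        return f"{self.parent} - {self.sublabel}" if self.sublabel else self.parent


# Rows from the recommended label structure table.
TAXONOMY = [
    Label("Public", None, None, None),
    Label("Internal", None, None, "Footer"),
    Label("Confidential", "All Employees", "All employees", "Header + Footer"),
    Label("Confidential", "Specific People", "Specific users", "Header + Footer"),
    Label("Highly Confidential", "All Employees", "View only", "Watermark + Header"),
    Label("Highly Confidential", "Specific People", "Specific users", "Watermark + Header"),
]


def validate(labels: list[Label]) -> list[str]:
    """Return a list of problems found in the taxonomy (empty if none)."""
    problems = []
    names = [label.display_name for label in labels]
    for name in sorted(set(names)):
        if names.count(name) > 1:
            problems.append(f"duplicate label: {name}")
    for label in labels:
        # Assumption from this guide: encryption is set on sublabels only.
        if label.encryption and not label.sublabel:
            problems.append(f"encryption on top-level label: {label.display_name}")
    return problems


if __name__ == "__main__":
    for label in TAXONOMY:
        print(label.display_name, "|", label.encryption or "None", "|", label.marking or "None")
    print("Problems:", validate(TAXONOMY))
```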
Step 5: Publish Labels to Users
Make labels available to users:
- Navigate to Information protection > Label policies
- Click + Publish labels
Configure Policy
- Choose labels: Select labels to publish
- Admin units: Select if using (optional)
- Users and groups:
  - All users (recommended)
  - Or specific groups for pilot
- Policy settings:
  - Require users to apply a label: Optional initially
  - Default label for documents: "Internal" recommended
  - Default label for emails: None or "Internal"
  - Require justification to remove a label: Yes
- Name: "Organization Sensitivity Labels"
- Click Create
Note: Labels can take up to 24 hours to appear for users.
Step 6: Configure Auto-Labeling Policies
Automatically apply labels based on content:
- Navigate to Information protection > Auto-labeling
- Click + Create auto-labeling policy
Policy Configuration
- Choose info to label: Select sensitive information types:
  - Credit Card Number
  - Social Security Number
  - Bank Account Number
  - (Add relevant SITs)
- Name: "Auto-label Financial Data"
- Description: "Automatically labels documents containing financial data"
- Choose locations:
  - Exchange (email)
  - SharePoint sites
  - OneDrive accounts
- Define policy settings:
  - Sensitivity label: Confidential - All Employees
- Decide when to run:
  - Simulation mode: Yes (start here)
  - Turn on automatically: No initially
- Click Create
Run Simulation
- Wait for the simulation to complete (this can take several hours to a few days, depending on content volume)
- Review results in Content to review
- Validate matches are appropriate
- Tune policy if needed (false positives)
- Enable policy when satisfied
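Deciding when a simulation is "good enough" is easier with an explicit acceptance rule. The sketch below computes precision (the share of reviewed matches that were correct) from a manually reviewed sample and compares it with a threshold. The 95% threshold, the minimum sample of 50 items, and the function names are assumptions. Choose values that fit your own risk tolerance.

```python
def precision(reviewed: list[bool]) -> float:
    """Share of reviewed matches judged correct (True = true positive)."""
    if not reviewed:
        raise ValueError("no reviewed items")
    return sum(reviewed) / len(reviewed)


def ready_to_enable(reviewed: list[bool],
                    threshold: float = 0.95,
                    min_sample: int = 50) -> bool:
    """Apply a simple acceptance rule to simulation review results.

    Both defaults are illustrative assumptions, not Microsoft guidance.
    """
    return len(reviewed) >= min_sample and precision(reviewed) >= threshold


if __name__ == "__main__":
    # Hypothetical review: 48 correct matches and 2 false positives.
    review = [True] * 48 + [False] * 2
    print(f"precision: {precision(review):.2f}")
    print("ready to enable:", ready_to_enable(review))
```

If the rule fails, tune the policy (for example by raising the confidence level) and rerun the simulation before reviewing again.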
Step 7: Enable Trainable Classifiers
Use AI-based classifiers for complex content:
- Navigate to Data classification > Trainable classifiers
- Review built-in classifiers:
  - Resumes
  - Source code
  - Agreements/Contracts
  - Financial statements
  - Healthcare forms
  - HR documents
  - Intellectual property
  - Threat/Harassment
  - Profanity
- Enable classifiers for use in policies:
  - These are pre-trained and ready to use
  - Can be used in auto-labeling and DLP policies
Create Custom Trainable Classifier (Optional)
If built-in classifiers don't meet needs:
- Click + Create trainable classifier
- Provide sample content:
  - Minimum 50 positive samples
  - Minimum 50 negative samples
- Train and validate
- Publish when accuracy is acceptable
Note: Custom classifiers require significant sample data and training time.
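A quick count of the sample sets before upload avoids starting a training run that cannot succeed. The following Python sketch checks two local folders against the minimums listed above. The folder paths and function names are hypothetical, and the sketch says nothing about sample quality, which matters as much as quantity.

```python
from pathlib import Path

MIN_SAMPLES = 50  # minimum per set, as stated in this guide


def count_samples(folder: Path) -> int:
    """Count files in a folder, including subfolders."""
    return sum(1 for p in folder.rglob("*") if p.is_file())


def shortfall(found: int, minimum: int = MIN_SAMPLES) -> int:
    """Number of additional samples needed to reach the minimum."""
    return max(0, minimum - found)


def check_sample_sets(positive: Path, negative: Path) -> dict[str, int]:
    """Return how many more samples each set needs (0 means enough)."""
    return {
        "positive": shortfall(count_samples(positive)),
        "negative": shortfall(count_samples(negative)),
    }


if __name__ == "__main__":
    # Hypothetical local folders holding the collected samples.
    print(check_sample_sets(Path("samples/positive"), Path("samples/negative")))
```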
Step 8: Configure Content Explorer Permissions
Grant access to view classified content:
- Navigate to Microsoft Purview > Roles & scopes > Role groups
- Find or create a role group with:
  - Content Explorer Content Viewer
  - Content Explorer List Viewer
- Add security team members
- Click Save
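Privacy Note: The Content Explorer Content Viewer role exposes the actual contents of files, while Content Explorer List Viewer shows only the items and their locations. Grant Content Viewer only to people who need to read the content.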
Step 9: Configure Classification Dashboards
Set up monitoring for classification:
- Navigate to Data classification > Overview
- Review available cards:
  - Top sensitive info types
  - Top sensitivity labels
  - Locations with labeled content
- Create custom reports:
  - Go to Reports > Data classification
  - Review available reports
  - Export for analysis
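Once a report has been exported, a short script can summarize it for regular review. The Python sketch below totals item counts per sensitive information type from a CSV export. The column names `SensitiveInfoType` and `ItemCount` and the sample rows are assumptions made for illustration. Adjust them to match the headers in your actual export.

```python
import csv
import io
from collections import Counter


def top_sits(csv_file, n: int = 3,
             type_col: str = "SensitiveInfoType",
             count_col: str = "ItemCount") -> list[tuple[str, int]]:
    """Total the item counts per sensitive info type and return the top n.

    The default column names are hypothetical; pass the real ones from
    your export if they differ.
    """
    totals = Counter()
    for row in csv.DictReader(csv_file):
        totals[row[type_col]] += int(row[count_col])
    return totals.most_common(n)


# Made-up sample data standing in for an exported report.
SAMPLE = """SensitiveInfoType,Location,ItemCount
Credit Card Number,SharePoint,120
U.S. Social Security Number (SSN),OneDrive,45
Credit Card Number,Exchange,30
Company Project Codes,SharePoint,310
"""

if __name__ == "__main__":
    for name, count in top_sits(io.StringIO(SAMPLE)):
        print(f"{name}: {count}")
```

To process a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of the `io.StringIO` wrapper.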
Step 10: Test and Validate Classification
Before full rollout:
- Create test content:
  - Documents with known sensitive data
  - Various file types (Word, Excel, PDF)
- Upload to test location:
  - Dedicated SharePoint site
  - Test user OneDrive
- Verify classification:
  - Check Content Explorer for detection
  - Verify auto-labeling applied
  - Test protection is working
- Document results:
  - False positives
  - Missed detections
  - Performance issues
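Test content should use synthetic values and never real customer data. Built-in types such as Credit Card Number validate a checksum, so randomly typed digits may not be detected. The Python sketch below implements the Luhn check and builds a small test document around a widely used test card number. The document wording and function names are assumptions for illustration.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def build_test_document(card_number: str) -> str:
    """Create text with a keyword near a synthetic card number."""
    if not luhn_valid(card_number):
        raise ValueError("card number fails the Luhn check")
    return (
        "TEST DATA - NOT A REAL ACCOUNT\n"
        f"Credit card number: {card_number}\n"
        "Expiration date: 12/30\n"
    )


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a commonly used test number, not a live card.
    print(build_test_document("4111 1111 1111 1111"))
```

Save the generated text into the file types under test and upload the files to the dedicated test location.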
Verification Checklist
After configuring data classification, verify:
- Sensitivity labels are visible to users in Office apps
- Auto-labeling policies are running in simulation mode
- Content Explorer shows classified content
- Trainable classifiers are enabled
- Custom SITs are detecting expected content
- Label policies are published to all users
- Content marking (headers/footers) is appearing
- Encryption is working for protected labels
- Reports are populating with data
- Security team has Content Explorer access
Troubleshooting
Issue: Labels Not Appearing in Office Apps
Cause: Policy publishing delay or client configuration.
Solution:
- Wait 24-48 hours for policy propagation
- Ensure user has appropriate license
- Check that Office is up to date, particularly on the Semi-Annual Enterprise Channel, which receives new features later than other channels
- Sign out and back in to Office
- Check for group policy conflicts
Issue: Auto-Labeling Not Working
Cause: Policy configuration or location scope.
Solution:
- Verify policy is enabled (not simulation mode)
- Check content contains expected sensitive data
- Verify location (SharePoint, OneDrive) is in scope
- Check confidence level thresholds
- Wait for background processing (can take hours)
Issue: Content Not Appearing in Content Explorer
Cause: Indexing delay or permissions.
Solution:
- Wait 24-72 hours for initial indexing
- Verify audit logging is enabled
- Check user has Content Explorer permissions
- Verify content is in indexed locations
- Check for very large files (may be delayed)
Issue: False Positives in Classification
Cause: Overly broad patterns or low confidence matches.
Solution:
- Review the specific matches in Content Explorer
- Increase confidence level requirements
- Add exclusion patterns to SITs
- Use supporting elements (keywords)
- Create custom SITs with more specific patterns
Issue: Encryption Preventing Access
Cause: Rights management configuration.
Solution:
- Verify user has appropriate permissions
- Check RMS connector if using on-premises
- Verify Azure RMS is enabled
- Clear RMS cache on client
- Check for policy conflicts
Cost Considerations
License Requirements
| Feature | License | Cost (approx.) |
|---|---|---|
| Manual sensitivity labels | M365 Business Premium | $22/user/month |
| Auto-labeling | M365 E5 | $57/user/month |
| Content explorer | M365 E5 | Included in E5 |
| Trainable classifiers | M365 E5 | Included in E5 |
| E5 Compliance add-on | E5 Compliance | $12/user/month |
Cost Optimization Strategies
- Phase licensing:
  - Start with Business Premium for manual labeling
  - Upgrade to E5 when auto-labeling needed
- Target high-risk users:
  - E5 for users handling sensitive data
  - Basic licenses for general users
- Use built-in features first:
  - Built-in SITs before custom
  - Pre-trained classifiers before custom
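The effect of targeting E5 at high-risk users can be estimated with simple arithmetic. The Python sketch below uses the approximate list prices from the table above and a made-up organization of 200 users, 40 of whom handle sensitive data. Actual prices, discounts, and license eligibility vary, so treat the result as a rough planning figure.

```python
# Approximate per-user monthly prices from the table above (USD).
PRICE_BUSINESS_PREMIUM = 22
PRICE_E5 = 57


def monthly_cost(total_users: int, e5_users: int) -> int:
    """Monthly cost when `e5_users` get E5 and the rest get Business Premium."""
    if not 0 <= e5_users <= total_users:
        raise ValueError("e5_users must be between 0 and total_users")
    other_users = total_users - e5_users
    return e5_users * PRICE_E5 + other_users * PRICE_BUSINESS_PREMIUM


if __name__ == "__main__":
    total, high_risk = 200, 40  # hypothetical headcounts
    all_e5 = monthly_cost(total, total)
    targeted = monthly_cost(total, high_risk)
    print(f"All users on E5:     ${all_e5:,}/month")
    print(f"Targeted E5 rollout: ${targeted:,}/month")
    print(f"Estimated saving:    ${all_e5 - targeted:,}/month")
```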
Storage Considerations
Encryption may slightly increase storage usage:
- Encrypted files have metadata overhead
- Plan for 5-10% increase in storage
Best Practices
- Start simple:
  - Begin with 3-5 sensitivity labels
  - Use built-in SITs before custom
  - Enable auto-labeling for obvious cases first
- Test thoroughly:
  - Use simulation mode before enabling policies
  - Test with representative content
  - Validate in multiple Office versions
- Train users:
  - Explain why classification matters
  - Show how to apply labels
  - Provide quick reference guides
- Monitor continuously:
  - Review Content Explorer weekly
  - Monitor auto-labeling accuracy
  - Adjust policies based on findings
- Document everything:
  - Classification taxonomy
  - Policy decisions
  - Exception handling
- Plan for scale:
  - Start with pilot groups
  - Expand gradually
  - Monitor performance
Related Controls
- DLP-02: Blocking bulk data exfiltration
- LOG-01: Audit log retention
- GOV-04: Incident response procedures
Revision History
| Date | Version | Author | Changes |
|---|---|---|---|
| 2025-01-07 | 1.0 | TrueConfig | Initial release |