DLP-01: Enabling Sensitive Data Classification

Overview

Sensitive data classification in Microsoft 365 uses trainable classifiers and sensitive information types (SITs) to automatically identify and label confidential information. This is the foundation for effective data loss prevention (DLP), enabling organizations to:

  • Automatically detect sensitive data like credit card numbers, SSNs, and health records
  • Apply sensitivity labels to protect documents and emails
  • Enable DLP policies that prevent unauthorized sharing
  • Meet compliance requirements for data protection (GDPR, HIPAA, PCI-DSS)
  • Provide visibility into where sensitive data exists in your organization

This guide covers enabling and configuring sensitive data classification in Microsoft 365.

Prerequisites

Required Roles

  • Global Administrator - Full configuration access
  • Compliance Administrator - Data classification and DLP configuration
  • Information Protection Administrator - Sensitivity label management
  • Security Administrator - Policy configuration

Required Licenses

Feature | License Required
Basic sensitivity labels | Microsoft 365 Business Basic/Premium
Auto-labeling | Microsoft 365 E5 or E5 Compliance
Content explorer | Microsoft 365 E5 or E5 Compliance
Trainable classifiers | Microsoft 365 E5 or E5 Compliance
DLP policies | Microsoft 365 E5 or E5 Compliance
Endpoint DLP | Microsoft 365 E5 or E5 Compliance

Required Permissions

  • Access to Microsoft Purview compliance portal
  • Content explorer permissions (Content Explorer Content Viewer role)
  • Access to Microsoft 365 admin center

Prerequisites Checklist

  • Microsoft 365 E5 or E5 Compliance licenses assigned
  • Sensitivity labels enabled in tenant
  • Users are actively using Microsoft 365 services
  • Audit logging is enabled

Time Estimate

Task | Duration
Planning classification requirements | 2-4 hours
Configuring sensitive information types | 1-2 hours
Creating sensitivity labels | 2-3 hours
Configuring auto-labeling policies | 2-3 hours
Testing and validation | 1-2 days
User training | 2-4 hours
Total | 2-4 weeks (including rollout)

Step-by-Step Instructions

Step 1: Review Current Data Classification Status

Before configuring, understand your current state:

  1. Navigate to Microsoft Purview: https://compliance.microsoft.com

  2. Go to Data classification in the left navigation

  3. Review the Overview dashboard:

    • Top sensitive information types detected
    • Top sensitivity labels applied
    • Data distribution by location
  4. Click Content explorer to see classified content:

    • Review detected sensitive data types
    • Identify where sensitive data is stored
    • Note any gaps in classification

Note: Content explorer requires an E5 or E5 Compliance license and may take 24-48 hours to populate initially.

Step 2: Understand Sensitive Information Types (SITs)

Microsoft provides 300+ built-in sensitive information types:

Category | Examples
Financial | Credit card numbers, bank accounts, SWIFT codes
Healthcare | HIPAA identifiers, drug names, medical terms
Personal | SSN, driver's license, passport numbers
National IDs | By country (US, UK, EU, etc.)
Regulatory | GDPR data, PCI-DSS data

To view available SITs:

  1. Go to Data classification > Sensitive info types
  2. Browse or search for relevant types
  3. Click on a SIT to see:
    • Pattern details
    • Confidence levels
    • Example matches

Step 3: Create Custom Sensitive Information Types

If built-in types don't cover your needs:

  1. Navigate to Data classification > Sensitive info types
  2. Click + Create sensitive info type

Basic Settings

  1. Configure:
    • Name: "Company Project Codes"
    • Description: "Detects internal project codes in format PRJ-XXXXX"

Pattern Configuration

  1. Click + Add pattern

  2. Configure pattern elements:

    • Primary element: Regular expression
    • Pattern: PRJ-[A-Z0-9]{5}
  3. Add supporting elements (optional):

    • Keywords near the pattern
    • Additional patterns
  4. Configure confidence levels:

    • High confidence: Primary + 2 supporting
    • Medium confidence: Primary + 1 supporting
    • Low confidence: Primary only
  5. Set character proximity (how many characters away from the primary element a supporting element may appear)

  6. Click Create
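
The confidence tiers above can be prototyped offline before the SIT is built in the portal. The following is a minimal Python sketch, not Purview's detection engine: the keyword list, the 300-character window, and the function name classify_confidence are all assumptions chosen for illustration.

```python
import re

# Hypothetical values for illustration; none of these come from Purview itself.
PRIMARY = re.compile(r"\bPRJ-[A-Z0-9]{5}\b")
SUPPORTING_KEYWORDS = ["project", "milestone", "budget"]
PROXIMITY = 300  # characters on either side of the primary match


def classify_confidence(text):
    """Return (match, level) pairs using the tiers described above."""
    results = []
    lowered = text.lower()
    for m in PRIMARY.finditer(text):
        start = max(0, m.start() - PROXIMITY)
        end = min(len(text), m.end() + PROXIMITY)
        window = lowered[start:end]
        # Count distinct supporting keywords found near the primary match.
        hits = sum(1 for kw in SUPPORTING_KEYWORDS if kw in window)
        if hits >= 2:
            level = "high"    # primary + 2 supporting
        elif hits == 1:
            level = "medium"  # primary + 1 supporting
        else:
            level = "low"     # primary only
        results.append((m.group(), level))
    return results


print(classify_confidence("Project PRJ-AB123 budget review"))
```

Running a few representative documents through a sketch like this can show whether the chosen keywords actually appear near real project codes before you commit to them as supporting elements.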

Example Custom SITs:

Name | Pattern | Use Case
Employee ID | EMP-[0-9]{6} | Internal employee numbers
Project Code | PRJ-[A-Z]{2}[0-9]{4} | Project identifiers
Customer Account | CUST[0-9]{8} | Customer numbers
Internal Document ID | DOC-[A-Z0-9]{10} | Document tracking
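
Before entering patterns like these in the portal, it can help to check them against sample strings. The sketch below uses Python's re module, which may differ in small ways from the regular expression engine Purview uses. The sample text and the added word boundaries are assumptions for illustration.

```python
import re

# Patterns copied from the table above; the sample strings are made up for testing.
CUSTOM_SITS = {
    "Employee ID": r"EMP-[0-9]{6}",
    "Project Code": r"PRJ-[A-Z]{2}[0-9]{4}",
    "Customer Account": r"CUST[0-9]{8}",
    "Internal Document ID": r"DOC-[A-Z0-9]{10}",
}


def find_sits(text):
    """Map each SIT name to the list of matches found in text."""
    found = {}
    for name, pattern in CUSTOM_SITS.items():
        # Word boundaries (an addition, not in the table) avoid matching
        # inside longer tokens such as EMP-0042179.
        matches = re.findall(r"\b" + pattern + r"\b", text)
        if matches:
            found[name] = matches
    return found


sample = "EMP-004217 opened PRJ-AB1234 for CUST00012345; see DOC-A1B2C3D4E5."
print(find_sits(sample))
```

Testing near-miss strings (one digit too many, lowercase letters) alongside valid ones gives an early view of likely false positives and missed detections.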

Step 4: Create Sensitivity Labels

Sensitivity labels classify and protect content:

  1. Navigate to Microsoft Purview > Information protection
  2. Click Labels tab
  3. Click + Create a label

Create Label Hierarchy

Parent Label: Confidential

  1. Configure:

    • Name: Confidential
    • Display name: Confidential
    • Description for users: "Business data that could harm the company if disclosed"
    • Description for admins: "Parent label for confidential content"
  2. Click Next

  3. Scope: Select:

    • Items (files, emails)
    • Groups & sites (optional)
  4. Click Next

  5. Items: Configure protection:

    • Apply content marking: Header, footer, watermark
    • Encryption: Not on parent label
  6. Complete wizard and create label

Sub-Label: Confidential - All Employees

  1. Select the Confidential label

  2. Click + Create sublabel

  3. Configure:

    • Name: Confidential - All Employees
    • Display name: Confidential - All Employees
    • Color: Yellow or Orange
  4. Under Items settings:

    • Content marking: Enable header "CONFIDENTIAL"
    • Encryption: Enable
    • Assign permissions now: All employees > Co-Author
  5. Complete wizard

Recommended Label Structure:

Label | Sub-label | Encryption | Marking
Public | - | None | None
Internal | - | None | Footer
Confidential | All Employees | Yes - All employees | Header + Footer
Confidential | Specific People | Yes - Specific users | Header + Footer
Highly Confidential | All Employees | Yes - View only | Watermark + Header
Highly Confidential | Specific People | Yes - Specific users | Watermark + Header
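
For planning and review, the recommended structure can be written down as data and checked against the rule used earlier in this step (encryption is set on sublabels, not on parent labels). This is a documentation aid only, sketched in Python. The Label class and the parents_with_encryption helper are hypothetical names and do not correspond to any Purview API.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    name: str
    encryption: str = "None"
    marking: str = "None"
    sublabels: list = field(default_factory=list)


# Mirrors the recommended label structure table above.
TAXONOMY = [
    Label("Public"),
    Label("Internal", marking="Footer"),
    Label("Confidential", sublabels=[
        Label("All Employees", "Yes - All employees", "Header + Footer"),
        Label("Specific People", "Yes - Specific users", "Header + Footer"),
    ]),
    Label("Highly Confidential", sublabels=[
        Label("All Employees", "Yes - View only", "Watermark + Header"),
        Label("Specific People", "Yes - Specific users", "Watermark + Header"),
    ]),
]


def parents_with_encryption(labels):
    """Return names of parent labels that set encryption themselves."""
    return [l.name for l in labels if l.sublabels and l.encryption != "None"]


print(parents_with_encryption(TAXONOMY))  # []
```

Keeping the taxonomy in a reviewable form like this makes it easier to spot unintended changes when labels are added later.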

Step 5: Publish Labels to Users

Make labels available to users:

  1. Navigate to Information protection > Label policies
  2. Click + Publish labels

Configure Policy

  1. Choose labels: Select labels to publish

  2. Admin units: Select if using (optional)

  3. Users and groups:

    • All users (recommended)
    • Or specific groups for pilot
  4. Policy settings:

    • Require users to apply a label: Optional initially
    • Default label for documents: "Internal" recommended
    • Default label for emails: None or "Internal"
    • Require justification to remove a label: Yes
  5. Name: "Organization Sensitivity Labels"

  6. Click Create

Note: Labels can take up to 24 hours to appear for users.

Step 6: Configure Auto-Labeling Policies

Automatically apply labels based on content:

  1. Navigate to Information protection > Auto-labeling
  2. Click + Create auto-labeling policy

Policy Configuration

  1. Choose info to label: Select sensitive information types:

    • Credit Card Number
    • Social Security Number
    • Bank Account Number
    • (Add relevant SITs)
  2. Name: "Auto-label Financial Data"

  3. Description: "Automatically labels documents containing financial data"

  4. Choose locations:

    • Exchange (email)
    • SharePoint sites
    • OneDrive accounts
  5. Define policy settings:

    • Sensitivity label: Confidential - All Employees
  6. Decide when to run:

    • Simulation mode: Yes (start here)
    • Turn on automatically: No initially
  7. Click Create

Run Simulation

  1. Wait for simulation to complete (can take hours/days)
  2. Review results in Content to review
  3. Validate matches are appropriate
  4. Tune policy if needed (false positives)
  5. Enable policy when satisfied
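
One way to make "when satisfied" concrete is to review a sample of simulated matches by hand and compute the share that were correct. The Python sketch below assumes you have recorded each reviewed match as correct or not. The 0.95 threshold and the function names are hypothetical and should be replaced with an acceptance bar that fits your risk tolerance.

```python
def simulation_precision(reviewed):
    """reviewed: list of booleans, True when a sampled match was a correct detection."""
    if not reviewed:
        raise ValueError("review at least one simulated match")
    return sum(reviewed) / len(reviewed)


def ready_to_enable(reviewed, threshold=0.95):
    # threshold is a hypothetical acceptance bar, not a Microsoft recommendation.
    return simulation_precision(reviewed) >= threshold


# Example: 47 correct matches and 3 false positives in a 50-item review sample.
sample_review = [True] * 47 + [False] * 3
print(simulation_precision(sample_review))  # 0.94
print(ready_to_enable(sample_review))       # False
```

In this example the policy would go back for tuning rather than being enabled, since 3 false positives in 50 falls just short of the assumed bar.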

Step 7: Enable Trainable Classifiers

Use AI-based classifiers for complex content:

  1. Navigate to Data classification > Trainable classifiers

  2. Review built-in classifiers:

    • Resumes
    • Source code
    • Agreements/Contracts
    • Financial statements
    • Healthcare forms
    • HR documents
    • Intellectual property
    • Threat/Harassment
    • Profanity
  3. Enable classifiers for use in policies:

    • These are pre-trained and ready to use
    • Can be used in auto-labeling and DLP policies

Create Custom Trainable Classifier (Optional)

If built-in classifiers don't meet your needs:

  1. Click + Create trainable classifier
  2. Provide sample content:
    • Minimum 50 positive samples
    • Minimum 50 negative samples
  3. Train and validate
  4. Publish when accuracy is acceptable

Note: Custom classifiers require significant sample data and training time.

Step 8: Configure Content Explorer Permissions

Grant access to view classified content:

  1. Navigate to Microsoft Purview > Roles & scopes > Role groups

  2. Find or create a role group with:

    • Content Explorer Content Viewer
    • Content Explorer List Viewer
  3. Add security team members

  4. Click Save

Privacy Note: Content Explorer shows actual file content. Limit access appropriately.

Step 9: Configure Classification Dashboards

Set up monitoring for classification:

  1. Navigate to Data classification > Overview

  2. Review available cards:

    • Top sensitive info types
    • Top sensitivity labels
    • Locations with labeled content
  3. Create custom reports:

    • Go to Reports > Data classification
    • Review available reports
    • Export for analysis

Step 10: Test and Validate Classification

Before full rollout:

  1. Create test content:

    • Documents with known sensitive data
    • Various file types (Word, Excel, PDF)
  2. Upload to test location:

    • Dedicated SharePoint site
    • Test user OneDrive
  3. Verify classification:

    • Check Content Explorer for detection
    • Verify auto-labeling applied
    • Test protection is working
  4. Document results:

    • False positives
    • Missed detections
    • Performance issues
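
When creating test documents with "known sensitive data", note that the built-in Credit Card Number type is documented as validating a checksum, so arbitrary 16-digit strings may not be detected. The Python sketch below implements the standard Luhn check so test values can be verified before upload. The function name luhn_valid is an illustrative choice.

```python
def luhn_valid(number):
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# 4111 1111 1111 1111 is a widely used test number, not a real account.
print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```

Use only published test numbers like the one above in test content, never real cardholder data.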

Verification Checklist

After configuring data classification, verify:

  • Sensitivity labels are visible to users in Office apps
  • Auto-labeling policies are running in simulation mode
  • Content Explorer shows classified content
  • Trainable classifiers are enabled
  • Custom SITs are detecting expected content
  • Label policies are published to all users
  • Content marking (headers/footers) is appearing
  • Encryption is working for protected labels
  • Reports are populating with data
  • Security team has Content Explorer access

Troubleshooting

Issue: Labels Not Appearing in Office Apps

Cause: Policy publishing delay or client configuration.

Solution:

  1. Wait 24-48 hours for policy propagation
  2. Ensure user has appropriate license
  3. Check that Office is up to date (semi-annual channel)
  4. Sign out and back in to Office
  5. Check for group policy conflicts

Issue: Auto-Labeling Not Working

Cause: Policy configuration or location scope.

Solution:

  1. Verify policy is enabled (not simulation mode)
  2. Check content contains expected sensitive data
  3. Verify location (SharePoint, OneDrive) is in scope
  4. Check confidence level thresholds
  5. Wait for background processing (can take hours)

Issue: Content Not Appearing in Content Explorer

Cause: Indexing delay or permissions.

Solution:

  1. Wait 24-72 hours for initial indexing
  2. Verify audit logging is enabled
  3. Check user has Content Explorer permissions
  4. Verify content is in indexed locations
  5. Check for very large files (may be delayed)

Issue: False Positives in Classification

Cause: Overly broad patterns or low confidence matches.

Solution:

  1. Review the specific matches in Content Explorer
  2. Increase confidence level requirements
  3. Add exclusion patterns to SITs
  4. Use supporting elements (keywords)
  5. Create custom SITs with more specific patterns

Issue: Encryption Preventing Access

Cause: Rights management configuration.

Solution:

  1. Verify user has appropriate permissions
  2. Check RMS connector if using on-premises
  3. Verify Azure RMS is enabled
  4. Clear RMS cache on client
  5. Check for policy conflicts

Cost Considerations

License Requirements

Feature | License | Cost (approx.)
Manual sensitivity labels | M365 Business Premium | $22/user/month
Auto-labeling | M365 E5 | $57/user/month
Content explorer | M365 E5 | Included in E5
Trainable classifiers | M365 E5 | Included in E5
E5 Compliance add-on | E5 Compliance | $12/user/month

Cost Optimization Strategies

  1. Phase licensing:

    • Start with Business Premium for manual labeling
    • Upgrade to E5 when auto-labeling needed
  2. Target high-risk users:

    • E5 for users handling sensitive data
    • Basic licenses for general users
  3. Use built-in features first:

    • Built-in SITs before custom
    • Pre-trained classifiers before custom
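
The effect of targeting high-risk users can be estimated with simple arithmetic. The Python sketch below uses the approximate prices from the table above. The 200-user tenant and the 40/160 split are hypothetical figures for illustration, and actual pricing should be confirmed with your licensing provider.

```python
# Approximate per-user monthly prices taken from the table above.
PRICE = {"business_premium": 22, "e5": 57}


def monthly_cost(seats):
    """seats: mapping of license name to number of users."""
    return sum(PRICE[name] * count for name, count in seats.items())


# Hypothetical 200-user tenant where 40 users handle sensitive data.
targeted = monthly_cost({"e5": 40, "business_premium": 160})
all_e5 = monthly_cost({"e5": 200})
print(targeted, all_e5, all_e5 - targeted)  # 5800 11400 5600
```

Keep in mind that E5-only features such as auto-labeling would then apply only to the users who hold the E5 license.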

Storage Considerations

Encryption may slightly increase storage usage:

  • Encrypted files have metadata overhead
  • Plan for 5-10% increase in storage
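
As a rough worked example of the 5-10% planning figure, the Python sketch below converts a volume of files expected to be encrypted into an extra-storage range. The 2000 GB input is a hypothetical value, and the function name is illustrative.

```python
def storage_overhead_gb(encrypted_gb, low_pct=5, high_pct=10):
    """Return the (low, high) extra storage in GB for a given volume of encrypted files."""
    return (encrypted_gb * low_pct / 100, encrypted_gb * high_pct / 100)


print(storage_overhead_gb(2000))  # (100.0, 200.0)
```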

Best Practices

  1. Start simple:

    • Begin with 3-5 sensitivity labels
    • Use built-in SITs before custom
    • Enable auto-labeling for obvious cases first
  2. Test thoroughly:

    • Use simulation mode before enabling policies
    • Test with representative content
    • Validate in multiple Office versions
  3. Train users:

    • Explain why classification matters
    • Show how to apply labels
    • Provide quick reference guides
  4. Monitor continuously:

    • Review Content Explorer weekly
    • Monitor auto-labeling accuracy
    • Adjust policies based on findings
  5. Document everything:

    • Classification taxonomy
    • Policy decisions
    • Exception handling
  6. Plan for scale:

    • Start with pilot groups
    • Expand gradually
    • Monitor performance

Related Controls

  • DLP-02: Blocking bulk data exfiltration
  • LOG-01: Audit log retention
  • GOV-04: Incident response procedures

Revision History

Date | Version | Author | Changes
2025-01-07 | 1.0 | TrueConfig | Initial release