
Auto-Remediation

Automatically fix security deviations while maintaining safety and control. Configure remediation modes, safety gates, and approval workflows to match your risk tolerance.

Opt-In Feature
Auto-remediation is disabled by default and requires explicit enablement. Even when enabled, safety gates ensure that high-risk changes still require manual approval. You maintain full control over what TrueConfig can change automatically.

What is Auto-Remediation?

Auto-remediation is TrueConfig's capability to automatically fix security deviations detected during scans. Instead of just alerting you to problems, TrueConfig can take corrective action to restore your environment to its baseline state.

Think of it as "infrastructure as code" for identity security: your baseline defines the desired state, and TrueConfig continuously enforces it through automated drift correction.

How It Works

1. Drift Detected

During a scheduled scan, TrueConfig detects that a control has failed (e.g., 5 Global Admins when your baseline allows a maximum of 3).

2. Safety Gates Check

Before taking action, TrueConfig runs its safety gate checks to confirm the change is safe (for example: reversible, no dependencies broken, emergency access verified).

3. Approval Decision

Based on the control's remediation mode and the safety gate results, TrueConfig either auto-remediates, requests manual approval, or provides advisory guidance.

4. Remediation Executed

TrueConfig calls the Microsoft Graph API to apply the fix, records audit events, and opens a rollback window for quick reversal if needed.

5. Verification Scan

The next scan verifies that the fix was successful. If the control still fails, TrueConfig alerts your team for manual investigation.
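
Steps 1 to 3 above reduce to a small decision function. The sketch below is a minimal illustration and not TrueConfig's implementation. The `Mode` and `Action` enums and `decide_action()` are hypothetical names. The sketch also assumes the five safety gate results have already been collapsed into one boolean.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    ADVISORY = "advisory"
    MANUAL = "manual"
    AUTO = "auto"


class Action(Enum):
    ADVISORY_GUIDANCE = "advisory_guidance"
    REQUEST_APPROVAL = "request_approval"
    AUTO_REMEDIATE = "auto_remediate"


def decide_action(control_passed: bool, mode: Mode, all_gates_passed: bool) -> Optional[Action]:
    """Hypothetical step 3: choose what happens after a scan and the gate checks."""
    if control_passed:
        return None  # No drift detected, nothing to remediate.
    if mode is Mode.ADVISORY:
        return Action.ADVISORY_GUIDANCE  # Advisory mode never changes anything.
    if mode is Mode.AUTO and all_gates_passed:
        return Action.AUTO_REMEDIATE
    # Manual mode, or auto mode with at least one failing gate.
    return Action.REQUEST_APPROVAL


# Example: PA-01 fails (5 Global Admins vs. a maximum of 3) while in manual mode.
print(decide_action(control_passed=False, mode=Mode.MANUAL, all_gates_passed=True))
# prints Action.REQUEST_APPROVAL
```

Steps 4 and 5, executing the change and running the verification scan, happen only after this decision. They are left out of the sketch.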

Remediation Modes

Each control in your baseline can operate in one of three remediation modes. You can set modes globally or override them per control based on your risk tolerance.

Advisory Mode

Read-only monitoring with remediation guidance

How It Works

TrueConfig detects deviations and provides step-by-step remediation instructions, but never makes changes automatically. You manually implement fixes using the provided guidance.

When to Use

  • First-time TrueConfig deployment (build trust)
  • High-risk controls (Global Admin changes, CA policies)
  • Controls affecting critical business workflows
  • Learning mode: understanding which changes TrueConfig recommends

Example Advisory Guidance

Control PA-01: Excessive Privileged Accounts

Status: FAIL | Severity: High

Issue:

You have 5 permanent Global Administrators. Your baseline allows a maximum of 3.

Why It Matters:

Excessive permanent privileged accounts increase attack surface and insider threat risk. Each Global Admin account is a potential path for attackers to gain full tenant control.

Recommended Actions:
  1. Review the 5 Global Admin accounts listed in the evidence
  2. Identify which 2 accounts can be removed or downgraded
  3. Consider using PIM (Privileged Identity Management) for just-in-time access
  4. Keep only break-glass emergency accounts as permanent Global Admins

How to Fix (Manual):
  1. Navigate to Entra ID → Roles and administrators
  2. Select "Global Administrator" role
  3. Remove unnecessary assignments
  4. Document why remaining accounts require permanent access

Best for Getting Started
Advisory mode is well suited to your first 30-60 days with TrueConfig. It shows you what TrueConfig would change without any risk to your production environment.
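
The detection half of the PA-01 example above can be expressed as a simple count. This is a hypothetical model. It assumes role assignments have already been fetched and flattened into `RoleAssignment` records; in practice that data would come from Microsoft Graph. The class and function names are illustrative only. PIM-eligible assignments are excluded because the baseline limits *permanent* Global Administrators.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoleAssignment:
    principal: str
    role: str
    permanent: bool  # False for PIM-eligible (just-in-time) assignments


@dataclass
class Finding:
    control_id: str
    status: str  # "PASS" or "FAIL"
    detail: str


def evaluate_pa01(assignments: List[RoleAssignment], max_permanent: int = 3) -> Finding:
    """Hypothetical PA-01 check: count permanent Global Administrator assignments."""
    permanent_admins = [
        a.principal
        for a in assignments
        if a.role == "Global Administrator" and a.permanent
    ]
    count = len(permanent_admins)
    if count <= max_permanent:
        return Finding("PA-01", "PASS", f"{count} permanent Global Administrators")
    return Finding(
        "PA-01",
        "FAIL",
        f"You have {count} permanent Global Administrators. "
        f"Your baseline allows a maximum of {max_permanent}; "
        f"review {count - max_permanent} for removal or conversion to PIM.",
    )
```

In advisory mode, a finding like this is the end of the flow. TrueConfig reports it with guidance, and a person makes the change.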

Manual Mode

One-click remediation after approval

How It Works

TrueConfig prepares the remediation, shows you exactly what will change, and waits for your approval. You review the planned changes, click "Approve & Execute," and TrueConfig applies the fix.

When to Use

  • Medium-risk controls (app ownership, secret expiration)
  • Controls with business impact (may affect workflows)
  • Compliance-required change management (audit trail needed)
  • Testing auto-remediation before full enablement

What an Approval Request Looks Like

APP-02: Secret Expiration Enforcement

Status: Pending Approval

Planned Change:

Update application secret expiration

Affected Resources:
  • App: Legacy API
  • Current expiration: December 31, 2026 (about 2 years from now)
  • New expiration: December 21, 2025 (12 months from now)

Safety Checks:
  • ✓ Change is reversible
  • ✓ No dependencies will break
  • ✓ Emergency access verified

Rollback available for 24 hours after execution

You review this request in the TrueConfig dashboard, verify the changes are acceptable, add approval notes if needed, and click "Approve & Execute."
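
Preparing the APP-02 request above comes down to comparing the current expiration with the baseline limit and recording both values. Here is a minimal sketch, assuming a 12-month limit approximated as 365 days. `PlannedChange` and `plan_secret_expiration()` are hypothetical names. The sketch only plans the change; it does not call Microsoft Graph.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical baseline value: "12 months" approximated as 365 days.
MAX_SECRET_LIFETIME = timedelta(days=365)


@dataclass(frozen=True)
class PlannedChange:
    control_id: str
    resource: str
    before: datetime  # current expiration, kept so the change can be rolled back
    after: datetime   # proposed expiration


def plan_secret_expiration(
    app_name: str,
    current_expiry: datetime,
    now: datetime,
    max_lifetime: timedelta = MAX_SECRET_LIFETIME,
) -> Optional[PlannedChange]:
    """Propose capping a secret's expiration at now + max_lifetime (hypothetical APP-02 logic)."""
    limit = now + max_lifetime
    if current_expiry <= limit:
        return None  # Already compliant, so no approval request is needed.
    return PlannedChange("APP-02", app_name, before=current_expiry, after=limit)


now = datetime(2024, 12, 21, tzinfo=timezone.utc)
change = plan_secret_expiration("Legacy API", datetime(2026, 12, 31, tzinfo=timezone.utc), now)
print(change.after.date())  # 2025-12-21
```

Keeping `before` on the planned change is what makes the rollback window possible later.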

Approval Controls

  • Role-Based Approval: Only users with security_admin or owner roles can approve
  • Audit Trail: All approvals logged with approver identity and timestamp
  • Timeout: Requests expire after 7 days if not approved
  • Rollback Window: 24-hour window to reverse the change
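
The approval controls listed above can be enforced with a few checks. Below is a hypothetical sketch: `ApprovalRequest` and `approve()` are illustrative names. It assumes execution happens right after approval, so the 24-hour rollback deadline is computed from the approval time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

APPROVER_ROLES = {"security_admin", "owner"}
REQUEST_TTL = timedelta(days=7)
ROLLBACK_WINDOW = timedelta(hours=24)


@dataclass
class ApprovalRequest:
    request_id: str
    created_at: datetime
    # (approver, approved_at, notes) entries kept for the audit trail
    approvals: List[Tuple[str, datetime, str]] = field(default_factory=list)


def approve(request: ApprovalRequest, approver: str, role: str, now: datetime, notes: str = "") -> datetime:
    """Record an approval and return when the rollback window would close."""
    if role not in APPROVER_ROLES:
        raise PermissionError(f"role {role!r} cannot approve remediations")
    if now - request.created_at > REQUEST_TTL:
        raise TimeoutError(f"request {request.request_id} expired after 7 days")
    request.approvals.append((approver, now, notes))  # approver identity + timestamp
    return now + ROLLBACK_WINDOW
```

Rejecting unauthorized or expired approvals before anything is recorded keeps the audit trail limited to approvals that actually took effect.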

Recommended Mode
Manual mode is the recommended balance between safety and efficiency. You get the convenience of automated remediation with the security of human oversight.

Auto Mode

Automatic remediation with safety gates

How It Works

TrueConfig detects drift, verifies that all safety gates pass, and applies the fix automatically without human intervention. All actions are logged, and a rollback window is available.

When to Use

  • Low-risk controls (app ownership assignment, audit log settings)
  • Controls with low blast radius (≤2)
  • Well-tested controls (after 30+ days in manual mode)
  • High-frequency drift (e.g., developers creating apps without owners)

Safety Gates (Required)

Before auto-remediation executes, ALL safety gates must pass:

1. Reversibility Gate

Change must be reversible. Controls with blast_radius ≥ 4 require manual approval. Irreversible actions (delete user, revoke all permissions) are blocked.

2. Dependency Gate

All prerequisites must be satisfied. TrueConfig checks that the controls this control depends on have passed and that the required licenses are available.

3. Emergency Access Gate

Break-glass accounts must be verified within the last 30 days. This prevents auto-remediation from locking you out of your tenant.

4. Observability Gate

Success signals must be observable. The next scan must be able to verify the fix was applied successfully.

5. Enablement Gate

Auto-remediation must be enabled globally AND for this specific control. Per-control overrides can disable auto mode for high-risk controls.

How Safety Gates Protect You

Before any automatic change is made, TrueConfig runs all five safety gates. If even one gate fails, the change requires manual approval instead. This ensures that risky changes never happen automatically.
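
The five gates can be evaluated as independent checks whose failures are collected. This is a minimal sketch under stated assumptions. The `GateContext` fields are hypothetical inputs that TrueConfig would derive from its own data. The 30-day and `blast_radius >= 4` thresholds come from the gate descriptions above. Irreversible actions are simply routed away from auto mode here; how TrueConfig blocks them beyond that is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class GateContext:
    reversible: bool
    blast_radius: int
    prerequisites_passed: bool
    licenses_available: bool
    break_glass_verified_at: Optional[datetime]
    success_signal_observable: bool
    auto_enabled_globally: bool
    auto_enabled_for_control: bool


def failed_gates(ctx: GateContext, now: datetime) -> List[str]:
    """Return the name of every failing gate; an empty list means all five pass."""
    failures = []
    if not ctx.reversible or ctx.blast_radius >= 4:
        failures.append("reversibility")
    if not (ctx.prerequisites_passed and ctx.licenses_available):
        failures.append("dependency")
    verified = ctx.break_glass_verified_at
    if verified is None or now - verified > timedelta(days=30):
        failures.append("emergency_access")
    if not ctx.success_signal_observable:
        failures.append("observability")
    if not (ctx.auto_enabled_globally and ctx.auto_enabled_for_control):
        failures.append("enablement")
    return failures


def route(ctx: GateContext, now: datetime) -> str:
    # A single failing gate sends the change to manual approval instead of auto.
    return "auto" if not failed_gates(ctx, now) else "manual_approval"
```

Collecting every failure, rather than stopping at the first one, lets an approval request show the approver exactly which gates did not pass.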

Gradual Rollout
Start with 1-2 low-risk controls in auto mode. Monitor for 2-4 weeks, then expand to additional controls. Never enable auto mode for all controls at once.

Configuring Remediation Settings

Remediation settings are configured at the tenant level and can be customized per-control.

Global Configuration

You can configure remediation settings through the TrueConfig dashboard in your tenant settings. This includes:

  • Global kill switch: Instantly disable all auto-remediation if needed
  • Emergency access verification: Track when you last tested your break-glass accounts
  • Control-specific overrides: Enable auto mode for low-risk controls (like APP-01) while keeping high-risk controls (like PA-01) in manual mode
  • Disable controls: Turn off controls that don't apply to your environment (with documented reason)
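
One way to picture these settings is as a small data model plus a resolver for each control's effective mode. The sketch is hypothetical. `auto_remediation_enabled`, `control_overrides` and `reason` echo names used elsewhere on this page, but the classes, the default mode and the fallback to manual mode when the kill switch is off are assumptions. The fallback mirrors the rule that a failed enablement gate sends a change to manual approval.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class ControlOverride:
    mode: Optional[str] = None  # "advisory", "manual" or "auto"; None means use the default
    disabled: bool = False
    reason: str = ""

    def __post_init__(self) -> None:
        if self.disabled and not self.reason.strip():
            raise ValueError("disabling a control requires a documented reason")


@dataclass
class TenantSettings:
    auto_remediation_enabled: bool = False  # global kill switch
    default_mode: str = "advisory"
    emergency_access_verified_at: Optional[datetime] = None
    control_overrides: Dict[str, ControlOverride] = field(default_factory=dict)


def effective_mode(settings: TenantSettings, control_id: str) -> Optional[str]:
    """Resolve a control's mode; None means the control is disabled."""
    override = settings.control_overrides.get(control_id)
    if override is not None and override.disabled:
        return None
    mode = override.mode if override is not None and override.mode else settings.default_mode
    if mode == "auto" and not settings.auto_remediation_enabled:
        return "manual"  # Kill switch is on: fall back to approval-based remediation.
    return mode
```

For example, an `APP-01` override set to `"auto"` resolves to `"auto"` only while the global kill switch allows it. `PA-01` stays in whatever non-auto mode you assign.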

Enabling Auto-Remediation

Follow these steps to safely enable auto-remediation:

1. Verify Break-Glass Accounts

Test your emergency access accounts to ensure you can regain access if auto-remediation causes issues. Document the test in TrueConfig.

2. Grant Write Permissions

Add write permissions to TrueConfig's app registration (RoleManagement.ReadWrite.Directory, Policy.ReadWrite.ConditionalAccess, etc.).

3. Start with Manual Mode

Enable manual mode for 1-2 low-risk controls (APP-01, APP-02). Test the approval workflow and verify rollback works.

4. Enable Auto for Low-Risk Controls

After 2-4 weeks of successful manual remediation, promote controls to auto mode. Monitor closely for the first week.

5. Gradual Expansion

Incrementally add controls to auto mode. Never enable more than 2-3 controls per month until you've verified stability.
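
If you want to enforce this pacing in your own change process, a promotion check might look like the following. It is a hypothetical helper, not a TrueConfig feature. This section suggests 2-4 weeks in manual mode while other parts of this page suggest 30+ days, so the sketch uses the stricter 30 days. It caps promotions at 3 per month.

```python
from datetime import date, timedelta
from typing import List

# Hypothetical thresholds drawn from the guidance on this page.
MIN_MANUAL_PERIOD = timedelta(days=30)  # stricter of "2-4 weeks" and "30+ days"
MAX_PROMOTIONS_PER_MONTH = 3


def can_promote_to_auto(
    manual_since: date,
    failed_manual_runs: int,
    promoted_this_month: List[str],
    today: date,
) -> bool:
    """Return True if a control currently in manual mode may be promoted to auto mode."""
    if today - manual_since < MIN_MANUAL_PERIOD:
        return False  # Not enough manual-mode history yet.
    if failed_manual_runs > 0:
        return False  # Only promote controls with a clean manual-mode record.
    return len(promoted_this_month) < MAX_PROMOTIONS_PER_MONTH
```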

Never Auto-Remediate
Some controls should NEVER be auto-remediated: PA-01 (Global Admin limits), PA-03 (break-glass accounts), CA-01/CA-02 (MFA policies). These always require manual approval to prevent lockouts.
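
A configuration layer can reject these combinations before they are saved. The sketch below is hypothetical: `NEVER_AUTO` and `validate_mode_change()` are illustrative names, and the control IDs are the ones listed above.

```python
NEVER_AUTO = frozenset({"PA-01", "PA-03", "CA-01", "CA-02"})
ALLOWED_MODES = {"advisory", "manual", "auto"}


def validate_mode_change(control_id: str, requested_mode: str) -> str:
    """Reject mode changes that would put a lockout-prone control in auto mode."""
    if requested_mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode {requested_mode!r}")
    if requested_mode == "auto" and control_id in NEVER_AUTO:
        raise ValueError(f"{control_id} must stay in manual or advisory mode")
    return requested_mode
```

Validating when the setting is saved means a misconfiguration fails loudly right away. Otherwise it might surface only during a later scan.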

Rollback & Recovery

All remediation actions include rollback capabilities to quickly undo changes if issues arise.

Rollback Window

After any remediation executes, you have 24 hours to undo the change if needed. TrueConfig saves the previous configuration so you can restore it with one click.

For example, if TrueConfig updates an app secret's expiration from 3 years to 12 months, you can restore the 3-year expiration anytime within 24 hours.
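
The rollback window can be modeled as a saved before_snapshot plus a 24-hour deadline. The sketch below is illustrative only. `RemediationRecord`, `rollback()` and the `secret_lifetime_days` field are hypothetical. Writing the restored configuration back to Entra ID is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

ROLLBACK_WINDOW = timedelta(hours=24)


@dataclass
class RemediationRecord:
    remediation_id: str
    executed_at: datetime
    before_snapshot: Dict[str, Any]  # configuration saved before the change
    after_snapshot: Dict[str, Any]
    rolled_back: bool = False


def rollback(record: RemediationRecord, current: Dict[str, Any], now: datetime) -> Dict[str, Any]:
    """Return the configuration with the saved 'before' values restored."""
    if record.rolled_back:
        raise RuntimeError(f"{record.remediation_id} was already rolled back")
    if now - record.executed_at > ROLLBACK_WINDOW:
        raise RuntimeError(f"rollback window for {record.remediation_id} has closed")
    restored = dict(current)                 # leave the caller's dict untouched
    restored.update(record.before_snapshot)  # put back every field the remediation changed
    record.rolled_back = True
    return restored


executed = datetime(2025, 1, 6, 12, 0, tzinfo=timezone.utc)
record = RemediationRecord("rem-001", executed, {"secret_lifetime_days": 1095}, {"secret_lifetime_days": 365})
current = {"app": "Legacy API", "secret_lifetime_days": 365}
print(rollback(record, current, executed + timedelta(hours=3)))
```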

How to Roll Back

From TrueConfig Dashboard

Navigate to Remediation History → Select the remediation → Click "Rollback Changes". TrueConfig restores the previous configuration and records the rollback in audit logs.

Manual Rollback

If TrueConfig is unavailable, use the recorded before_snapshot from the audit log to restore the previous configuration manually via the Entra ID portal or PowerShell.

Recovery Scenarios

Automatic Rollback

If the next scan detects that the control still fails after remediation, TrueConfig can automatically roll back the change and alert your team.

Manual Intervention Required

For high-impact remediations (CA policies, role assignments), automatic rollback is disabled. Security admins must manually review the change and decide whether to roll back.

Emergency Disable

If auto-remediation causes widespread issues, set auto_remediation_enabled=false in tenant settings. All pending remediations are cancelled, and no new ones will be created.
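
The recovery rules above reduce to two small functions. This is a hypothetical sketch. The category names and function names are assumptions, and the pending queue is a plain in-memory list rather than TrueConfig's storage.

```python
from typing import Dict, List

# Hypothetical categories for which automatic rollback is disabled.
HIGH_IMPACT_CATEGORIES = {"conditional_access", "role_assignment"}


def after_verification_scan(control_still_fails: bool, category: str) -> str:
    """Decide what happens when the verification scan runs after a remediation."""
    if not control_still_fails:
        return "verified"
    if category in HIGH_IMPACT_CATEGORIES:
        return "alert_for_manual_review"  # A security admin decides whether to roll back.
    return "auto_rollback_and_alert"


def emergency_disable(settings: Dict[str, bool], pending: List[str]) -> int:
    """Flip the kill switch and cancel every pending remediation; return how many were cancelled."""
    settings["auto_remediation_enabled"] = False
    cancelled = len(pending)
    pending.clear()  # No pending remediation survives the emergency disable.
    return cancelled
```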

Audit Trail & Compliance

All remediation actions are recorded in immutable audit logs for compliance and forensic analysis.

What Audit Events Capture

Every remediation action creates a detailed audit record that includes:

  • What changed: Exactly what was modified (e.g., "Removed 2 permanent Global Administrator assignments")
  • Before/after state: The configuration before the change (5 Global Admins) and after (3 Global Admins)
  • Who/what made the change: Whether it was automatic (system) or manually approved (with the approver's identity)
  • When: Precise timestamp of when the change occurred
  • Microsoft Graph correlation: Request IDs that can be cross-referenced with Entra ID audit logs
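
A record carrying the fields listed above might look like the following hypothetical in-memory model. The class and field names are assumptions. `graph_request_ids` would hold values such as the `request-id` response header returned by Microsoft Graph. Real immutability comes from the database policies described under Compliance Features; the frozen dataclass only prevents fields from being reassigned.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, List, Mapping, Optional, Tuple


@dataclass(frozen=True)  # fields cannot be reassigned after creation
class AuditEvent:
    what_changed: str
    before: Mapping[str, Any]
    after: Mapping[str, Any]
    actor: str               # "system" for auto mode, "user" for approved changes
    approver: Optional[str]  # approver identity when actor == "user"
    occurred_at: datetime
    graph_request_ids: Tuple[str, ...]


class AuditLog:
    """In-memory, append-only log: events can be added and read but not edited or removed."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def append(self, event: AuditEvent) -> None:
        if event.actor == "user" and not event.approver:
            raise ValueError("user-initiated events must record the approver's identity")
        self._events.append(event)

    def events(self) -> Tuple[AuditEvent, ...]:
        return tuple(self._events)  # a tuple copy, so callers cannot mutate the log
```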

Compliance Features

Immutable Logs

Audit events are append-only and cannot be modified or deleted; this is enforced by database row-level security (RLS) policies.

Graph Request IDs

Every remediation includes Microsoft Graph request IDs for correlation with Entra ID audit logs.

Before/After Snapshots

Every change records the previous state and new state for forensic analysis.

Actor Attribution

System-initiated vs. user-initiated actions are clearly distinguished with user identity captured.

SOC 2 & ISO 27001 Ready
TrueConfig's audit trail meets requirements for SOC 2 Type II, ISO 27001, and similar compliance frameworks. Audit logs are retained for 7 years and can be exported for external audit.

Best Practices

Start Conservative

Begin with advisory mode for all controls. Progress to manual mode for 1-2 controls. Only enable auto mode after 30+ days of successful manual remediation.

Test Break-Glass Monthly

Verify emergency access accounts monthly. Document the test in TrueConfig to keep the emergency_access_verified gate current.

Monitor Remediation Metrics

Track remediation success rate, rollback frequency, and time-to-remediation. High rollback rates indicate you should downgrade controls to manual mode.
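
These three metrics are simple ratios over a control's remediation history. Here is a minimal sketch, assuming each remediation is recorded as an `Outcome`. The names are hypothetical, and the 10% rollback threshold is also hypothetical because this page does not define a "high" rollback rate.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold: this page does not define a "high" rollback rate.
HIGH_ROLLBACK_RATE = 0.10


@dataclass
class Outcome:
    control_id: str
    succeeded: bool
    rolled_back: bool
    minutes_to_remediate: float


@dataclass
class Summary:
    success_rate: float
    rollback_rate: float
    mean_minutes_to_remediate: float
    downgrade_to_manual: bool


def summarize(outcomes: List[Outcome]) -> Optional[Summary]:
    """Compute the metrics for one control's remediation history."""
    n = len(outcomes)
    if n == 0:
        return None  # No history yet, so no recommendation.
    rollback_rate = sum(o.rolled_back for o in outcomes) / n
    return Summary(
        success_rate=sum(o.succeeded for o in outcomes) / n,
        rollback_rate=rollback_rate,
        mean_minutes_to_remediate=sum(o.minutes_to_remediate for o in outcomes) / n,
        downgrade_to_manual=rollback_rate > HIGH_ROLLBACK_RATE,
    )
```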

Document Overrides

When disabling auto-remediation for a control, document why in the control_overrides.reason field. Review quarterly.

Never Auto-Remediate in Production First

If you have a test/dev tenant, enable auto-remediation there first. Verify it works as expected before enabling in production.

Next Steps