Documentation/Architecture/Scanning Process

Scanning Process

How TrueConfig scans your Microsoft 365 tenant: 8 phases from data collection to drift detection.

Process Overview

A scan is executed by the sync-tenant Edge Function (Deno runtime), which orchestrates 8 distinct phases. The entire process typically completes in 30-90 seconds for a tenant with 50-500 users.

text
Phase 1: Initialize Scan            (2-5 seconds)
Phase 2: Fetch Microsoft Data       (10-30 seconds)
Phase 3: Normalize & Store Data     (5-15 seconds)
Phase 4: Process Relationships      (3-8 seconds)
Phase 5: Create Security Snapshot   (2-5 seconds)
Phase 6: Detect Configuration Drift (5-15 seconds)
Phase 7: Run DSC Evaluations        (10-30 seconds)
Phase 8: Finalize & Notify          (2-5 seconds)

Total: ~30-90 seconds (varies by tenant size)
Parallel Processing
Phases 2 and 7 use parallel execution where possible. Microsoft Graph API calls are issued concurrently, and Phase 7 loads the scan data its evaluators need in parallel (the evaluations themselves run sequentially; see Phase 7).

Phase 1: Initialize Scan

Creates scan metadata record and validates prerequisites (tenant connection, OAuth token validity).

Key Operations

  • Generate unique scan_id UUID
  • Insert record into tenant_scans table with status = "running"
  • Retrieve Microsoft Graph API refresh token from tenant_connections
  • Exchange refresh token for a new access token (the refresh token itself is valid for up to 90 days)
  • Validate required Graph API permissions are granted
typescript
// Simplified initialization logic
const scanId = crypto.randomUUID();

await supabase.from('tenant_scans').insert({
  scan_id: scanId,
  tenant_id: tenantId,
  organization_id: organizationId,
  status: 'running',
  started_at: new Date().toISOString()
});

// Decrypt and refresh access token
const accessToken = await refreshMicrosoftToken(tenantId);
const client = Client.init({
  authProvider: (done) => done(null, accessToken)
});
Failure Handling
If token refresh fails (e.g., consent revoked), scan immediately fails with status = "error" and error_details JSONB set to {token_expired: true, requires_reconnect: true}.
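The token exchange can be sketched as follows. This is a hypothetical implementation, not TrueConfig's actual code: the real refreshMicrosoftToken looks up credentials from tenant_connections, whereas here they are passed in explicitly. The endpoint and body parameters follow the standard Microsoft identity platform refresh_token grant.

```typescript
// Hypothetical sketch of the refresh_token grant; credential handling
// and function signatures are assumptions for illustration.
function buildRefreshBody(
  clientId: string,
  clientSecret: string,
  refreshToken: string
): URLSearchParams {
  return new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    grant_type: 'refresh_token',
    refresh_token: refreshToken,
    scope: 'https://graph.microsoft.com/.default'
  });
}

async function refreshMicrosoftToken(
  msTenantId: string,
  clientId: string,
  clientSecret: string,
  refreshToken: string
): Promise<string> {
  const res = await fetch(
    `https://login.microsoftonline.com/${msTenantId}/oauth2/v2.0/token`,
    { method: 'POST', body: buildRefreshBody(clientId, clientSecret, refreshToken) }
  );
  if (!res.ok) throw new Error(`token refresh failed: ${res.status}`);
  const json = await res.json();
  // The access token is short-lived; the refresh token lasts up to 90 days
  return json.access_token;
}
```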

Phase 2: Fetch Microsoft Data

Parallel API calls to Microsoft Graph endpoints to fetch users, roles, groups, applications, and Conditional Access policies. This is the most time-consuming phase.

Graph API Endpoints

GET /v1.0/users?$select=id,userPrincipalName,displayName,...

Fetches all users (paginated, 999 per request)

GET /v1.0/directoryRoles

Built-in directory roles (Global Admin, etc.)

GET /v1.0/roleManagement/directory/roleAssignments

Who has which admin roles

GET /v1.0/groups

All security and Microsoft 365 groups

GET /v1.0/servicePrincipals

Enterprise applications and service principals

GET /v1.0/applications

App registrations with API permissions

GET /v1.0/identity/conditionalAccess/policies

Conditional Access policies (JSONB stored)

GET /beta/reports/authenticationMethods/userRegistrationDetails

MFA registration status for all users (requires Reports.Read.All)

typescript
// Parallel data fetching with error handling
const [usersData, rolesData, groupsData, caPoliciesData] = await Promise.all([
  client.api('/users')
    .select('id,userPrincipalName,displayName,accountEnabled,userType')
    .top(999)
    .get(),

  client.api('/directoryRoles').get(),
  client.api('/groups').get(),
  client.api('/identity/conditionalAccess/policies').get()
]);

// Pagination handling for large result sets
let allUsers = usersData.value;
let page = usersData;
while (page['@odata.nextLink']) {
  page = await client.api(page['@odata.nextLink']).get();
  allUsers = allUsers.concat(page.value);
}
Rate Limiting
Microsoft Graph API enforces rate limits (10,000 requests per 10 minutes per app). TrueConfig implements exponential backoff with retry (max 3 attempts) if 429 errors occur.

Phase 3: Normalize & Store Data

Raw Graph API responses are normalized to consistent schema and bulk-inserted into tenant_* tables.

Data Normalization

  • Convert Microsoft date formats to ISO 8601 timestamps
  • Extract nested JSONB fields (e.g., CA policy conditions)
  • Map Microsoft enum values to TrueConfig schema (e.g., userType: "Member" → user_type: "member")
  • Calculate derived fields (is_privileged, is_mfa_enabled from related data)
  • Tag all rows with current scan_id and tenant_id
typescript
// Bulk insert with conflict resolution
const userRows = allUsers.map(u => ({
  tenant_id: tenantDbId,
  scan_id: scanId,
  user_id: u.id,
  user_principal_name: u.userPrincipalName,
  display_name: u.displayName,
  account_enabled: u.accountEnabled ?? true,
  user_type: u.userType?.toLowerCase() || 'member',
  created_date_time: u.createdDateTime,
  is_deleted: false
}));

// Upsert with composite key (tenant_id, scan_id, user_id)
await supabase
  .from('tenant_users')
  .upsert(userRows, { onConflict: 'tenant_id,scan_id,user_id' });
Performance
Bulk inserts use PostgreSQL COPY protocol under the hood for maximum throughput. Typical insert rate: 5,000 rows/second for user data, 10,000 rows/second for simpler tables.

Phase 4: Process Relationships

Enriches scan data by joining related entities and calculating derived fields (e.g., marking users as privileged if they have any admin role).

sql
-- Mark users as privileged if they have any role assignment
UPDATE tenant_users u
SET is_privileged = true
WHERE EXISTS (
  SELECT 1
  FROM tenant_role_assignments ra
  WHERE ra.tenant_id = u.tenant_id
    AND ra.scan_id = u.scan_id
    AND ra.principal_id = u.user_id
    AND ra.principal_type = 'User'
);

-- Calculate MFA status from authentication methods
UPDATE tenant_users
SET
  is_mfa_enabled = (
    SELECT COUNT(*) > 0
    FROM tenant_mfa_registrations m
    WHERE m.tenant_id = tenant_users.tenant_id
      AND m.scan_id = tenant_users.scan_id
      AND m.user_id = tenant_users.user_id
      AND m.method_type IN ('microsoftAuthenticator', 'phoneAppNotification')
  )
WHERE tenant_id = $1 AND scan_id = $2;

Phase 5: Create Security Snapshot

Complete security state serialized to JSONB for point-in-time comparison. Used for drift detection.

typescript
// Create snapshot JSONB
const snapshot = {
  conditionalAccessPolicies: caPolicies.map(p => ({
    id: p.policy_id,
    displayName: p.display_name,
    state: p.state,
    conditions: p.conditions,
    grantControls: p.grant_controls
  })),
  roleAssignments: roleAssignments.map(ra => ({
    id: ra.assignment_id,
    roleId: ra.role_definition_id,
    principalId: ra.principal_id,
    assignedVia: ra.assigned_via
  })),
  // ... other security-critical configuration
};

await supabase.from('snapshots').insert({
  tenant_id: tenantDbId,
  scan_id: scanId,
  snapshot_type: 'security_state',
  snapshot_data: snapshot
});

What's Included in Snapshots

  • All Conditional Access policies (conditions, grant controls, state)
  • All directory role assignments (who has admin privileges)
  • Security defaults status (enabled/disabled)
  • Enterprise application permissions (OAuth grants)
  • High-privilege service principals
Snapshot Retention
Snapshots kept for last 90 days regardless of plan tier. Older snapshots auto-deleted to manage storage costs (snapshots are large: ~50-500 KB per tenant).
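The retention policy could be enforced with a pg_cron job like the scheduled-scan job shown later on this page. This is a sketch only: the job name, schedule, and the created_at column are assumptions, not the actual cleanup job.

```sql
-- Hypothetical retention job; names and schedule are illustrative
SELECT cron.schedule(
  'snapshot-retention',
  '30 3 * * *',
  $$ DELETE FROM snapshots WHERE created_at < now() - interval '90 days' $$
);
```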

Phase 6: Detect Configuration Drift

Compares the current snapshot against the previous scan's snapshot (both stored as JSONB) to detect configuration changes.

typescript
// Simplified drift detection logic
const { data: previousSnapshot } = await supabase
  .from('snapshots')
  .select('snapshot_data')
  .eq('tenant_id', tenantDbId)
  .lt('created_at', currentScanStartTime)
  .order('created_at', { ascending: false })
  .limit(1)
  .maybeSingle();

if (!previousSnapshot) {
  console.log('[drift] No previous snapshot, skipping drift detection');
  return;
}

const currentPolicies = currentSnapshot.conditionalAccessPolicies;
const previousPolicies = previousSnapshot.snapshot_data.conditionalAccessPolicies;

// Detect new policies
const newPolicies = currentPolicies.filter(c =>
  !previousPolicies.find(p => p.id === c.id)
);

// Detect modified policies
const modifiedPolicies = currentPolicies.filter(c => {
  const prev = previousPolicies.find(p => p.id === c.id);
  return prev && JSON.stringify(c) !== JSON.stringify(prev);
});

// Create drift events
for (const policy of newPolicies) {
  await createDriftEvent({
    drift_type: 'ca_policy_created',
    resource_id: policy.id,
    resource_name: policy.displayName,
    severity: policy.state === 'enabled' ? 'high' : 'medium',
    change_description: `Conditional Access policy "${policy.displayName}" was created`
  });
}

Drift Types Detected

  • Privilege Escalation: New admin role assignments
  • Policy Changes: CA policies created, modified, or disabled
  • Security Weakening: Security defaults disabled, MFA requirements removed
  • Application Changes: New app permissions granted, risky permissions added
  • User Changes: Guest users added, privileged accounts created
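The privilege-escalation case follows the same diffing pattern as the policy comparison above. A minimal sketch over the roleAssignments arrays from the snapshot (the interface mirrors the snapshot example; the helper name is illustrative):

```typescript
// Detect role assignments present in the current snapshot but not the
// previous one; shape follows the snapshot example earlier on this page.
interface SnapshotRoleAssignment {
  id: string;
  roleId: string;
  principalId: string;
}

function findNewRoleAssignments(
  current: SnapshotRoleAssignment[],
  previous: SnapshotRoleAssignment[]
): SnapshotRoleAssignment[] {
  const seen = new Set(previous.map(p => p.id));
  return current.filter(c => !seen.has(c.id));
}
```

Each returned assignment would then be turned into a drift event, analogous to the ca_policy_created events above.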
Actor Attribution
When possible, drift events include actor information (who made the change) from Microsoft Entra audit logs. If audit logs don't have the info, drift is recorded with actor_type = "unknown".
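Actor lookup can be sketched against the Graph directoryAudits endpoint. The filter and field names follow the documented audit-log schema, but the exact query is not shown on this page, so treat lookupActor as illustrative.

```typescript
// Extract the initiating actor from a directoryAudits record; falls back
// to 'unknown' when audit logs carry no usable initiator.
function extractActor(audit: any): string {
  return audit?.initiatedBy?.user?.userPrincipalName
    ?? audit?.initiatedBy?.app?.displayName
    ?? 'unknown';
}

// Illustrative query: find the audit entry that touched the drifted resource
async function lookupActor(client: any, targetResourceId: string): Promise<string> {
  const res = await client
    .api('/auditLogs/directoryAudits')
    .filter(`targetResources/any(t: t/id eq '${targetResourceId}')`)
    .top(1)
    .get();
  return extractActor(res.value[0]);
}
```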

Phase 7: Run DSC Control Evaluations

Each enabled control in the tenant's baseline is evaluated against scan data. This is the core compliance checking phase.

Evaluation Steps (per control)

  1. Load baseline level and plan tier to determine which controls to run
  2. Filter controls: must be enabled, within baseline level, and within plan tier
  3. Fetch evaluator function for control (e.g., PA-01 → evaluatePA01())
  4. Load required scan data (uses data dependencies declared by evaluator)
  5. Execute evaluator logic (compare expected vs actual state)
  6. Generate result object: status, severity, summary, evidence, resource counts
  7. Upsert to dsc_evaluations table
  8. Record audit event via record_dsc_event() RPC
typescript
// Example: Evaluate PA-01 (Limit Global Administrators)
async function evaluatePA01(context: EvalContext, data: ScanData): Promise<EvalResult> {
  // Find the Global Administrator role by its well-known template ID
  const globalAdminRole = data.roles.find(r =>
    r.role_template_id === '62e90394-69f5-4237-9190-012177145e10'
  );

  // Count direct active assignments (exclude PIM eligible); the role may
  // be absent if it has never been activated in the tenant
  const assignments = data.roleAssignments.filter(ra =>
    ra.role_definition_id === globalAdminRole?.role_id &&
    ra.assignment_state === 'active' &&
    ra.assigned_via === 'direct'
  );

  const count = assignments.length;
  const status = (count >= 2 && count <= 4) ? 'pass' : 'fail';

  return {
    status,
    severity: status === 'pass' ? 'info' : 'critical',
    title: 'Limit Global Administrators to 2-4',
    summary: count < 2
      ? `Only ${count} Global Admins (minimum 2 required)`
      : count > 4
      ? `${count} Global Admins exceed recommended maximum of 4`
      : `${count} Global Admins (within recommended range)`,
    evidence: { assignments: assignments.map(a => a.principal_id) },
    resourcesEvaluated: assignments.length,
    resourcesCompliant: status === 'pass' ? count : 0,
    resourcesNonCompliant: status === 'fail' ? count : 0
  };
}
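Steps 6-8 above (persisting the result and recording the audit event) might look like the sketch below. The dsc_evaluations column names beyond those shown on this page, and the record_dsc_event() parameter names, are assumptions.

```typescript
// Build the row persisted to dsc_evaluations (column names assumed)
function buildEvaluationRow(
  tenantDbId: string,
  scanId: string,
  controlId: string,
  result: { status: string; severity: string; summary: string }
) {
  return {
    tenant_id: tenantDbId,
    scan_id: scanId,
    control_id: controlId,
    ...result
  };
}

// Illustrative persistence step for one control's result
async function storeResult(supabase: any, row: ReturnType<typeof buildEvaluationRow>) {
  await supabase
    .from('dsc_evaluations')
    .upsert(row, { onConflict: 'tenant_id,scan_id,control_id' });

  // Audit trail via the record_dsc_event() RPC (parameter names assumed)
  await supabase.rpc('record_dsc_event', {
    p_control_id: row.control_id,
    p_scan_id: row.scan_id,
    p_status: row.status
  });
}
```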

Data Dependencies (Optimization)

Each control declares which scan data it needs. Only required data is loaded from database:

typescript
// Control PA-01 data dependencies
const PA01_EVALUATOR = {
  controlId: 'PA-01',
  dataDependencies: ['roles', 'roleAssignments'], // Only load these
  evaluate: evaluatePA01
};

// Phase 7 loads union of all dependencies
const requiredDeps = new Set<DataDependency>();
for (const control of controlsToRun) {
  control.dataDependencies.forEach(d => requiredDeps.add(d));
}

// Fetch only required tables in parallel
const scanData = await loadScanData(supabase, tenantDbId, scanId, requiredDeps);
Performance
Phase 7 typically evaluates 15-33 controls in 10-30 seconds. Evaluations run sequentially to avoid database connection pool exhaustion, but each evaluation is optimized with indexes.
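The sequential loop with per-control error isolation (so one failing evaluator cannot abort the whole scan, as noted under Error Handling & Retries) can be sketched as follows; the 'error' status value and type names are illustrative.

```typescript
// One failed evaluator yields an 'error' result instead of failing the scan.
interface ControlEvaluator<T> {
  controlId: string;
  evaluate: (data: T) => Promise<{ status: string }>;
}

async function runEvaluations<T>(controls: ControlEvaluator<T>[], scanData: T) {
  const results: Array<{ controlId: string; status: string; summary?: string }> = [];
  for (const control of controls) {
    try {
      const result = await control.evaluate(scanData);
      results.push({ controlId: control.controlId, ...result });
    } catch (err) {
      // Record the failure and keep going with the remaining controls
      results.push({ controlId: control.controlId, status: 'error', summary: String(err) });
    }
  }
  return results;
}
```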

Phase 8: Finalize & Notify

Scan completion, statistics aggregation, and notification delivery.

Finalization Steps

  1. Aggregate evaluation results (count pass/fail by severity)
  2. Update tenant_scans record: status = "completed", completed_at = now()
  3. Update tenants record: last_scan_at, last_scan_status
  4. Generate notifications for critical findings (if any)
  5. Trigger webhook callbacks (if configured)
  6. Record final audit event (scan_completed)
typescript
// Aggregate stats from Phase 7
const stats = {
  controlsRun: evaluations.length,
  passed: evaluations.filter(e => e.status === 'pass').length,
  critical: evaluations.filter(e => e.status === 'fail' && e.severity === 'critical').length,
  high: evaluations.filter(e => e.status === 'fail' && e.severity === 'high').length,
  medium: evaluations.filter(e => e.status === 'fail' && e.severity === 'medium').length,
  low: evaluations.filter(e => e.status === 'fail' && e.severity === 'low').length
};

// Update scan record
await supabase
  .from('tenant_scans')
  .update({
    status: 'completed',
    completed_at: new Date().toISOString(),
    stats: stats
  })
  .eq('scan_id', scanId);

// Update tenant last_scan metadata
await supabase
  .from('tenants')
  .update({
    last_scan_at: new Date().toISOString(),
    last_scan_status: 'completed'
  })
  .eq('id', tenantDbId);

// Send notifications if critical findings
if (stats.critical > 0) {
  await sendNotification({
    type: 'critical_findings',
    tenant: tenantName,
    count: stats.critical,
    scanId: scanId
  });
}
Scan Complete
After Phase 8, the dashboard immediately reflects the new scan results. Users see updated control statuses, drift events, and severity counts without page refresh (via React Query invalidation).

Scan Triggers & Scheduling

How Scans Are Triggered

  • Manual: User clicks "Scan Now" button → calls Edge Function via RPC
  • Scheduled: Supabase cron job runs pg_cron trigger every 24 hours
  • API: External systems call POST /api/v1/scans with API key
  • Webhooks: Microsoft Graph change notifications trigger incremental scan
sql
-- Daily scan cron job (runs at 2:00 AM UTC)
SELECT cron.schedule(
  'daily-tenant-scans',
  '0 2 * * *',
  $$
  SELECT net.http_post(
    url := 'https://[project-ref].supabase.co/functions/v1/sync-tenant',
    headers := '{"Authorization": "Bearer [service-role-key]"}'::jsonb,
    body := jsonb_build_object('tenant_id', id)
  )
  FROM tenants
  WHERE status = 'active'
    AND (last_scan_at IS NULL OR last_scan_at < now() - interval '24 hours');
  $$
);
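For the API trigger, a caller might build the request like this. The /api/v1/scans path comes from the bullet list above; the bearer-style header and body shape are assumptions.

```typescript
// Build the POST request for the external scan-trigger API (shape assumed)
function buildScanRequest(baseUrl: string, apiKey: string, tenantId: string) {
  return {
    url: `${baseUrl}/api/v1/scans`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ tenant_id: tenantId })
    }
  };
}

// Usage: const { url, init } = buildScanRequest('https://app.example.com', key, id);
// const res = await fetch(url, init);
```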

Error Handling & Retries

Scans are designed to be resilient to transient failures (network issues, API throttling).

Retry Strategy

  • Phase 2 (Graph API calls): Exponential backoff, max 3 retries per endpoint
  • Phase 3 (Database writes): Transaction rollback on error, retry once
  • Phase 7 (Control evaluation): Individual control errors don't fail entire scan
  • Entire scan: If scan fails, retry after 1 hour (max 3 total attempts)
typescript
// Exponential backoff for Graph API calls
async function fetchWithRetry(url: string, maxRetries = 3): Promise<any> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await client.api(url).get();
    } catch (error) {
      // Give up once the retry budget is exhausted, whatever the error
      if (attempt === maxRetries) throw error;
      if (error.statusCode === 429) {
        // Rate limited - wait with exponential backoff before retrying
        const delay = Math.pow(2, attempt) * 1000; // 2s, 4s
        await new Promise(resolve => setTimeout(resolve, delay));
      }
      // Other transient errors are retried immediately
    }
  }
}
Permanent Failures
If all retry attempts fail, scan status set to "failed" and error_details JSONB contains error message, stack trace, and failed phase. User receives email notification. Dashboard shows last successful scan data until next scan succeeds.
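The failure record might be assembled like this; buildErrorDetails and its field names are illustrative, following the error_details examples on this page.

```typescript
// Assemble the error_details JSONB payload for a permanently failed scan
function buildErrorDetails(failedPhase: number, err: Error) {
  return {
    failed_phase: failedPhase,
    message: err.message,
    stack: err.stack ?? null
  };
}

// Illustrative finalization on permanent failure:
// await supabase.from('tenant_scans')
//   .update({ status: 'failed', error_details: buildErrorDetails(2, err) })
//   .eq('scan_id', scanId);
```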