Documentation/Architecture/Scanning Process

Scanning Process

How TrueConfig scans your Microsoft 365 tenant: 8 phases from data collection to drift detection.

Process Overview

A scan is executed by the sync-tenant Edge Function (Deno runtime), which orchestrates 8 distinct phases. The entire process typically completes in 30-90 seconds for a tenant with 50-500 users.

text
Phase 1: Initialize Scan            (2-5 seconds)
Phase 2: Fetch Microsoft Data       (10-30 seconds)
Phase 3: Normalize & Store Data     (5-15 seconds)
Phase 4: Process Relationships      (3-8 seconds)
Phase 5: Create Security Snapshot   (2-5 seconds)
Phase 6: Detect Configuration Drift (5-15 seconds)
Phase 7: Run DSC Evaluations        (10-30 seconds)
Phase 8: Finalize & Notify          (2-5 seconds)

Total: ~30-90 seconds (varies by tenant size)
Parallel Processing
Phases 2 and 7 use parallel execution where possible. Microsoft Graph API calls are issued concurrently, and Phase 7 loads the scan data its evaluators need in parallel (the evaluations themselves run sequentially; see Phase 7).

Phase 1: Initialize Scan

Creates scan metadata record and validates prerequisites (tenant connection, OAuth token validity).

Key Operations

  • Generate unique scan_id UUID
  • Insert record into tenant_scans table with status = "running"
  • Retrieve Microsoft Graph API refresh token from tenant_connections
  • Exchange refresh token for a new access token (the refresh token itself is valid for up to 90 days)
  • Validate required Graph API permissions are granted
typescript
// Simplified initialization logic
const scanId = crypto.randomUUID();

await supabase.from('tenant_scans').insert({
  scan_id: scanId,
  tenant_id: tenantId,
  organization_id: organizationId,
  status: 'running',
  started_at: new Date().toISOString()
});

// Decrypt and refresh access token
const accessToken = await refreshMicrosoftToken(tenantId);
const client = Client.init({
  authProvider: (done) => done(null, accessToken)
});
Failure Handling
If token refresh fails (e.g., consent revoked), scan immediately fails with status = "error" and error_details JSONB set to {token_expired: true, requires_reconnect: true}.
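The token exchange can be sketched as follows. This is a hypothetical implementation, not TrueConfig's actual code: the real refreshMicrosoftToken looks up credentials from tenant_connections, whereas here they are passed in explicitly. The endpoint and body parameters follow the standard Microsoft identity platform refresh_token grant.

```typescript
// Hypothetical sketch of the refresh_token grant; credential handling
// and function signatures are assumptions for illustration.
function buildRefreshBody(
  clientId: string,
  clientSecret: string,
  refreshToken: string
): URLSearchParams {
  return new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    grant_type: 'refresh_token',
    refresh_token: refreshToken,
    scope: 'https://graph.microsoft.com/.default'
  });
}

async function refreshMicrosoftToken(
  msTenantId: string,
  clientId: string,
  clientSecret: string,
  refreshToken: string
): Promise<string> {
  const res = await fetch(
    `https://login.microsoftonline.com/${msTenantId}/oauth2/v2.0/token`,
    { method: 'POST', body: buildRefreshBody(clientId, clientSecret, refreshToken) }
  );
  if (!res.ok) throw new Error(`token refresh failed: ${res.status}`);
  const json = await res.json();
  // The access token is short-lived; the refresh token lasts up to 90 days
  return json.access_token;
}
```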

Phase 2: Fetch Microsoft Data

Parallel API calls to Microsoft Graph endpoints to fetch users, roles, groups, applications, and Conditional Access policies. This is the most time-consuming phase.

Graph API Endpoints

GET /v1.0/users?$select=id,userPrincipalName,displayName,...

Fetches all users (paginated, 999 per request)

GET /v1.0/directoryRoles

Built-in directory roles (Global Admin, etc.)

GET /v1.0/roleManagement/directory/roleAssignments

Who has which admin roles

GET /v1.0/groups

All security and Microsoft 365 groups

GET /v1.0/servicePrincipals

Enterprise applications and service principals

GET /v1.0/applications

App registrations with API permissions

GET /v1.0/identity/conditionalAccess/policies

Conditional Access policies (JSONB stored)

GET /beta/reports/authenticationMethods/userRegistrationDetails

MFA registration status for all users (requires Reports.Read.All)

typescript
// Parallel data fetching with error handling
const [usersData, rolesData, groupsData, caPoliciesData] = await Promise.all([
  client.api('/users')
    .select('id,userPrincipalName,displayName,accountEnabled,userType')
    .top(999)
    .get(),

  client.api('/directoryRoles').get(),
  client.api('/groups').get(),
  client.api('/identity/conditionalAccess/policies').get()
]);

// Pagination handling for large result sets
let allUsers = usersData.value;
let page = usersData;
while (page['@odata.nextLink']) {
  page = await client.api(page['@odata.nextLink']).get();
  allUsers = allUsers.concat(page.value);
}
Rate Limiting
Microsoft Graph API enforces rate limits (10,000 requests per 10 minutes per app). TrueConfig implements exponential backoff with retry (max 3 attempts) if 429 errors occur.

Phase 3: Normalize & Store Data

Raw Graph API responses are normalized to consistent schema and bulk-inserted into tenant_* tables.

Data Normalization

  • Convert Microsoft date formats to ISO 8601 timestamps
  • Extract nested JSONB fields (e.g., CA policy conditions)
  • Map Microsoft enum values to TrueConfig schema (e.g., userType: "Member" → user_type: "member")
  • Calculate derived fields (is_privileged, is_mfa_enabled from related data)
  • Tag all rows with current scan_id and tenant_id
typescript
// Bulk insert with conflict resolution
const userRows = allUsers.map(u => ({
  tenant_id: tenantDbId,
  scan_id: scanId,
  user_id: u.id,
  user_principal_name: u.userPrincipalName,
  display_name: u.displayName,
  account_enabled: u.accountEnabled ?? true,
  user_type: u.userType?.toLowerCase() || 'member',
  created_date_time: u.createdDateTime,
  is_deleted: false
}));

// Upsert with composite key (tenant_id, scan_id, user_id)
await supabase
  .from('tenant_users')
  .upsert(userRows, { onConflict: 'tenant_id,scan_id,user_id' });
Performance
Bulk inserts use PostgreSQL COPY protocol under the hood for maximum throughput. Typical insert rate: 5,000 rows/second for user data, 10,000 rows/second for simpler tables.

Phase 4: Process Relationships

Enriches scan data by joining related entities and calculating derived fields (e.g., marking users as privileged if they have any admin role).

sql
-- Mark users as privileged if they have any role assignment
UPDATE tenant_users u
SET is_privileged = true
WHERE EXISTS (
  SELECT 1
  FROM tenant_role_assignments ra
  WHERE ra.tenant_id = u.tenant_id
    AND ra.scan_id = u.scan_id
    AND ra.principal_id = u.user_id
    AND ra.principal_type = 'User'
);

-- Calculate MFA status from authentication methods
UPDATE tenant_users
SET
  is_mfa_enabled = (
    SELECT COUNT(*) > 0
    FROM tenant_mfa_registrations m
    WHERE m.tenant_id = tenant_users.tenant_id
      AND m.scan_id = tenant_users.scan_id
      AND m.user_id = tenant_users.user_id
      AND m.method_type IN ('microsoftAuthenticator', 'phoneAppNotification')
  )
WHERE tenant_id = $1 AND scan_id = $2;

Phase 5: Create Security Snapshot

Complete security state serialized to JSONB for point-in-time comparison. Used for drift detection.

typescript
// Create snapshot JSONB
const snapshot = {
  conditionalAccessPolicies: caPolicies.map(p => ({
    id: p.policy_id,
    displayName: p.display_name,
    state: p.state,
    conditions: p.conditions,
    grantControls: p.grant_controls
  })),
  roleAssignments: roleAssignments.map(ra => ({
    id: ra.assignment_id,
    roleId: ra.role_definition_id,
    principalId: ra.principal_id,
    assignedVia: ra.assigned_via
  })),
  // ... other security-critical configuration
};

await supabase.from('snapshots').insert({
  tenant_id: tenantDbId,
  scan_id: scanId,
  snapshot_type: 'security_state',
  snapshot_data: snapshot
});

What's Included in Snapshots

  • All Conditional Access policies (conditions, grant controls, state)
  • All directory role assignments (who has admin privileges)
  • Security defaults status (enabled/disabled)
  • Enterprise application permissions (OAuth grants)
  • High-privilege service principals
Snapshot Retention
Snapshots kept for last 90 days regardless of plan tier. Older snapshots auto-deleted to manage storage costs (snapshots are large: ~50-500 KB per tenant).
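The retention policy could be enforced with a pg_cron job like the scheduled-scan job shown later on this page. This is a sketch only: the job name, schedule, and the created_at column are assumptions, not the actual cleanup job.

```sql
-- Hypothetical retention job; names and schedule are illustrative
SELECT cron.schedule(
  'snapshot-retention',
  '30 3 * * *',
  $$ DELETE FROM snapshots WHERE created_at < now() - interval '90 days' $$
);
```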

Phase 6: Detect Configuration Drift

Compares the current snapshot against the previous scan's snapshot (both stored as JSONB) to detect configuration changes.

typescript
// Simplified drift detection logic
const { data: previousSnapshot } = await supabase
  .from('snapshots')
  .select('snapshot_data')
  .eq('tenant_id', tenantDbId)
  .lt('created_at', currentScanStartTime)
  .order('created_at', { ascending: false })
  .limit(1)
  .maybeSingle();

if (!previousSnapshot) {
  console.log('[drift] No previous snapshot, skipping drift detection');
  return;
}

const currentPolicies = currentSnapshot.conditionalAccessPolicies;
const previousPolicies = previousSnapshot.snapshot_data.conditionalAccessPolicies;

// Detect new policies
const newPolicies = currentPolicies.filter(c =>
  !previousPolicies.find(p => p.id === c.id)
);

// Detect modified policies
const modifiedPolicies = currentPolicies.filter(c => {
  const prev = previousPolicies.find(p => p.id === c.id);
  return prev && JSON.stringify(c) !== JSON.stringify(prev);
});

// Create drift events
for (const policy of newPolicies) {
  await createDriftEvent({
    drift_type: 'ca_policy_created',
    resource_id: policy.id,
    resource_name: policy.displayName,
    severity: policy.state === 'enabled' ? 'high' : 'medium',
    change_description: `Conditional Access policy "${policy.displayName}" was created`
  });
}

Drift Types Detected

  • Privilege Escalation: New admin role assignments
  • Policy Changes: CA policies created, modified, or disabled
  • Security Weakening: Security defaults disabled, MFA requirements removed
  • Application Changes: New app permissions granted, risky permissions added
  • User Changes: Guest users added, privileged accounts created
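The privilege-escalation case follows the same diffing pattern as the policy comparison above. A minimal sketch over the roleAssignments arrays from the snapshot (the interface mirrors the snapshot example; the helper name is illustrative):

```typescript
// Detect role assignments present in the current snapshot but not the
// previous one; shape follows the snapshot example earlier on this page.
interface SnapshotRoleAssignment {
  id: string;
  roleId: string;
  principalId: string;
}

function findNewRoleAssignments(
  current: SnapshotRoleAssignment[],
  previous: SnapshotRoleAssignment[]
): SnapshotRoleAssignment[] {
  const seen = new Set(previous.map(p => p.id));
  return current.filter(c => !seen.has(c.id));
}
```

Each returned assignment would then be turned into a drift event, analogous to the ca_policy_created events above.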
Actor Attribution
When possible, drift events include actor information (who made the change) from Microsoft Entra audit logs. If audit logs don't have the info, drift is recorded with actor_type = "unknown".
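Actor lookup can be sketched against the Graph directoryAudits endpoint. The filter and field names follow the documented audit-log schema, but the exact query is not shown on this page, so treat lookupActor as illustrative.

```typescript
// Extract the initiating actor from a directoryAudits record; falls back
// to 'unknown' when audit logs carry no usable initiator.
function extractActor(audit: any): string {
  return audit?.initiatedBy?.user?.userPrincipalName
    ?? audit?.initiatedBy?.app?.displayName
    ?? 'unknown';
}

// Illustrative query: find the audit entry that touched the drifted resource
async function lookupActor(client: any, targetResourceId: string): Promise<string> {
  const res = await client
    .api('/auditLogs/directoryAudits')
    .filter(`targetResources/any(t: t/id eq '${targetResourceId}')`)
    .top(1)
    .get();
  return extractActor(res.value[0]);
}
```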

Phase 7: Run DSC Control Evaluations

Each enabled control in the tenant's baseline is evaluated against scan data. This is the core compliance checking phase.

Evaluation Steps (per control)

  1. Load baseline level and plan tier to determine which controls to run
  2. Filter controls: must be enabled, within baseline level, and within plan tier
  3. Fetch evaluator function for control (e.g., PA-01 → evaluatePA01())
  4. Load required scan data (uses data dependencies declared by evaluator)
  5. Execute evaluator logic (compare expected vs actual state)
  6. Generate result object: status, severity, summary, evidence, resource counts
  7. Upsert to dsc_evaluations table
  8. Record audit event via record_dsc_event() RPC
typescript
// Example: Evaluate PA-01 (Limit Global Administrators)
async function evaluatePA01(context: EvalContext, data: ScanData): Promise<EvalResult> {
  // Find the Global Administrator role by its well-known template ID
  const globalAdminRole = data.roles.find(r =>
    r.role_template_id === '62e90394-69f5-4237-9190-012177145e10'
  );

  // Count direct active assignments (exclude PIM eligible); the role may
  // be absent if it has never been activated in the tenant
  const assignments = data.roleAssignments.filter(ra =>
    ra.role_definition_id === globalAdminRole?.role_id &&
    ra.assignment_state === 'active' &&
    ra.assigned_via === 'direct'
  );

  const count = assignments.length;
  const status = (count >= 2 && count <= 4) ? 'pass' : 'fail';

  return {
    status,
    severity: status === 'pass' ? 'info' : 'critical',
    title: 'Limit Global Administrators to 2-4',
    summary: count < 2
      ? `Only ${count} Global Admins (minimum 2 required)`
      : count > 4
      ? `${count} Global Admins exceed recommended maximum of 4`
      : `${count} Global Admins (within recommended range)`,
    evidence: { assignments: assignments.map(a => a.principal_id) },
    resourcesEvaluated: assignments.length,
    resourcesCompliant: status === 'pass' ? count : 0,
    resourcesNonCompliant: status === 'fail' ? count : 0
  };
}
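Steps 6-8 above (persisting the result and recording the audit event) might look like the sketch below. The dsc_evaluations column names beyond those shown on this page, and the record_dsc_event() parameter names, are assumptions.

```typescript
// Build the row persisted to dsc_evaluations (column names assumed)
function buildEvaluationRow(
  tenantDbId: string,
  scanId: string,
  controlId: string,
  result: { status: string; severity: string; summary: string }
) {
  return {
    tenant_id: tenantDbId,
    scan_id: scanId,
    control_id: controlId,
    ...result
  };
}

// Illustrative persistence step for one control's result
async function storeResult(supabase: any, row: ReturnType<typeof buildEvaluationRow>) {
  await supabase
    .from('dsc_evaluations')
    .upsert(row, { onConflict: 'tenant_id,scan_id,control_id' });

  // Audit trail via the record_dsc_event() RPC (parameter names assumed)
  await supabase.rpc('record_dsc_event', {
    p_control_id: row.control_id,
    p_scan_id: row.scan_id,
    p_status: row.status
  });
}
```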

Data Dependencies (Optimization)

Each control declares which scan data it needs. Only required data is loaded from database:

typescript
// Control PA-01 data dependencies
const PA01_EVALUATOR = {
  controlId: 'PA-01',
  dataDependencies: ['roles', 'roleAssignments'], // Only load these
  evaluate: evaluatePA01
};

// Phase 7 loads union of all dependencies
const requiredDeps = new Set<DataDependency>();
for (const control of controlsToRun) {
  control.dataDependencies.forEach(d => requiredDeps.add(d));
}

// Fetch only required tables in parallel
const scanData = await loadScanData(supabase, tenantDbId, scanId, requiredDeps);
Performance
Phase 7 typically evaluates 15-33 controls in 10-30 seconds. Evaluations run sequentially to avoid database connection pool exhaustion, but each evaluation is optimized with indexes.
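The sequential loop with per-control error isolation (so one failing evaluator cannot abort the whole scan, as noted under Error Handling & Retries) can be sketched as follows; the 'error' status value and type names are illustrative.

```typescript
// One failed evaluator yields an 'error' result instead of failing the scan.
interface ControlEvaluator<T> {
  controlId: string;
  evaluate: (data: T) => Promise<{ status: string }>;
}

async function runEvaluations<T>(controls: ControlEvaluator<T>[], scanData: T) {
  const results: Array<{ controlId: string; status: string; summary?: string }> = [];
  for (const control of controls) {
    try {
      const result = await control.evaluate(scanData);
      results.push({ controlId: control.controlId, ...result });
    } catch (err) {
      // Record the failure and keep going with the remaining controls
      results.push({ controlId: control.controlId, status: 'error', summary: String(err) });
    }
  }
  return results;
}
```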

Phase 8: Finalize & Notify

Scan completion, statistics aggregation, and notification delivery.

Finalization Steps

  1. Aggregate evaluation results (count pass/fail by severity)
  2. Update tenant_scans record: status = "completed", completed_at = now()
  3. Update tenants record: last_scan_at, last_scan_status
  4. Generate notifications for critical findings (if any)
  5. Trigger webhook callbacks (if configured)
  6. Record final audit event (scan_completed)
typescript
// Aggregate stats from Phase 7
const stats = {
  controlsRun: evaluations.length,
  passed: evaluations.filter(e => e.status === 'pass').length,
  critical: evaluations.filter(e => e.status === 'fail' && e.severity === 'critical').length,
  high: evaluations.filter(e => e.status === 'fail' && e.severity === 'high').length,
  medium: evaluations.filter(e => e.status === 'fail' && e.severity === 'medium').length,
  low: evaluations.filter(e => e.status === 'fail' && e.severity === 'low').length
};

// Update scan record
await supabase
  .from('tenant_scans')
  .update({
    status: 'completed',
    completed_at: new Date().toISOString(),
    stats: stats
  })
  .eq('scan_id', scanId);

// Update tenant last_scan metadata
await supabase
  .from('tenants')
  .update({
    last_scan_at: new Date().toISOString(),
    last_scan_status: 'completed'
  })
  .eq('id', tenantDbId);

// Send notifications if critical findings
if (stats.critical > 0) {
  await sendNotification({
    type: 'critical_findings',
    tenant: tenantName,
    count: stats.critical,
    scanId: scanId
  });
}
Scan Complete
After Phase 8, the dashboard immediately reflects the new scan results. Users see updated control statuses, drift events, and severity counts without page refresh (via React Query invalidation).

Scan Triggers & Scheduling

How Scans Are Triggered

  • Manual: User clicks "Scan Now" button → calls Edge Function via RPC
  • Scheduled: Supabase cron job runs pg_cron trigger every 24 hours
  • API: External systems call POST /api/v1/scans with API key
  • Webhooks: Microsoft Graph change notifications trigger incremental scan
sql
-- Daily scan cron job (runs at 2:00 AM UTC)
SELECT cron.schedule(
  'daily-tenant-scans',
  '0 2 * * *',
  $$
  SELECT net.http_post(
    url := 'https://[project-ref].supabase.co/functions/v1/sync-tenant',
    headers := '{"Authorization": "Bearer [service-role-key]"}'::jsonb,
    body := jsonb_build_object('tenant_id', id)
  )
  FROM tenants
  WHERE status = 'active'
    AND (last_scan_at IS NULL OR last_scan_at < now() - interval '24 hours');
  $$
);
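For the API trigger, a caller might build the request like this. The /api/v1/scans path comes from the bullet list above; the bearer-style header and body shape are assumptions.

```typescript
// Build the POST request for the external scan-trigger API (shape assumed)
function buildScanRequest(baseUrl: string, apiKey: string, tenantId: string) {
  return {
    url: `${baseUrl}/api/v1/scans`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ tenant_id: tenantId })
    }
  };
}

// Usage: const { url, init } = buildScanRequest('https://app.example.com', key, id);
// const res = await fetch(url, init);
```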

Error Handling & Retries

Scans are designed to be resilient to transient failures (network issues, API throttling).

Retry Strategy

  • Phase 2 (Graph API calls): Exponential backoff, max 3 retries per endpoint
  • Phase 3 (Database writes): Transaction rollback on error, retry once
  • Phase 7 (Control evaluation): Individual control errors don't fail entire scan
  • Entire scan: If scan fails, retry after 1 hour (max 3 total attempts)
typescript
// Exponential backoff for Graph API calls
async function fetchWithRetry(url: string, maxRetries = 3): Promise<any> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await client.api(url).get();
    } catch (error) {
      // Give up once the retry budget is exhausted, whatever the error
      if (attempt === maxRetries) throw error;
      if (error.statusCode === 429) {
        // Rate limited - wait with exponential backoff before retrying
        const delay = Math.pow(2, attempt) * 1000; // 2s, 4s
        await new Promise(resolve => setTimeout(resolve, delay));
      }
      // Other transient errors are retried immediately
    }
  }
}
Permanent Failures
If all retry attempts fail, scan status set to "failed" and error_details JSONB contains error message, stack trace, and failed phase. User receives email notification. Dashboard shows last successful scan data until next scan succeeds.
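The failure record might be assembled like this; buildErrorDetails and its field names are illustrative, following the error_details examples on this page.

```typescript
// Assemble the error_details JSONB payload for a permanently failed scan
function buildErrorDetails(failedPhase: number, err: Error) {
  return {
    failed_phase: failedPhase,
    message: err.message,
    stack: err.stack ?? null
  };
}

// Illustrative finalization on permanent failure:
// await supabase.from('tenant_scans')
//   .update({ status: 'failed', error_details: buildErrorDetails(2, err) })
//   .eq('scan_id', scanId);
```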