Fix Recovery Lock password desync on MDM re-enrollment #43827

Draft
mostlikelee wants to merge 3 commits into main from 43786-mdm-reenroll

Related issue: Resolves #43786

Summary

Apple wipes a macOS host's Recovery Lock password whenever the MDM profile is removed, but Fleet's host_recovery_key_passwords state machine had no hook into re-enrollment. Two concrete desyncs resulted:

  1. Verified-but-gone — a host has a verified Recovery Lock; the MDM profile is later removed (manually on the device, by an admin, or via Fleet); the device re-enrolls. Fleet still shows status='verified' with the old password, but the device no longer has a Recovery Lock. The recovery-lock cron won't re-SET because GetHostsForRecoveryLockAction requires status IS NULL.

  2. Stuck-pending — the cron enqueued a SetRecoveryLock, the device re-sent Authenticate before acking (common during re-enrollment / SCEP renewal), and nanomdm's ClearQueue marked the queue entry active=0. The command was never delivered. Fleet's row sits at status='pending' indefinitely; no cron query matches that state.

Both states share the same exit point: mdmLifecycle.Do(HostActionReset) → MDMResetEnrollment, which fires on every device Authenticate. That function already deletes other state invalidated by re-enrollment (MDM profiles, disk encryption keys, bootstrap-package rows) but did not touch host_recovery_key_passwords. This PR adds the missing cleanup.
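
A minimal sketch of why both states were unreachable for the cron and how a soft-delete on re-enrollment unblocks them. The `row` type and both helpers are hypothetical simplifications, not Fleet's code. The eligibility rule assumes that a soft-deleted row is treated like a missing one, as the description of the live readers suggests.

```go
package main

import "fmt"

// row is a hypothetical, simplified stand-in for one
// host_recovery_key_passwords record.
type row struct {
	Status  *string // nil models status IS NULL
	Deleted bool
}

func strPtr(s string) *string { return &s }

// eligibleForSet models the selection rule described above (assumption):
// the cron enqueues a SetRecoveryLock only when the host has no live row
// or the row's status is NULL.
func eligibleForSet(r *row) bool {
	return r == nil || r.Deleted || r.Status == nil
}

// resetOnReenroll models the cleanup added to MDMResetEnrollment:
// the row is soft-deleted rather than removed.
func resetOnReenroll(r *row) {
	if r != nil {
		r.Deleted = true
	}
}

func main() {
	// "verified" is the verified-but-gone case, "pending" the stuck-pending case.
	for _, status := range []string{"verified", "pending"} {
		r := &row{Status: strPtr(status)}
		before := eligibleForSet(r)
		resetOnReenroll(r)
		fmt.Printf("%s: eligible before=%v after=%v\n", status, before, eligibleForSet(r))
	}
}
```

In this model neither a verified nor a pending row is ever selected until re-enrollment marks it deleted, which mirrors the two desyncs listed above.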

What the change does

  • server/datastore/mysql/apple_mdm.go — Inside MDMResetEnrollment's darwin-specific cleanup block (after the existing scepRenewalInProgress short-circuit), soft-delete the row and NULL pending_encrypted_password, pending_error_message, and auto_rotate_at. Soft-delete (rather than hard DELETE) preserves the row for support diagnostics; all live readers already filter deleted=0, and SetHostsRecoveryLockPasswords' INSERT … ON DUPLICATE KEY UPDATE re-animates it with deleted=0 on the next cron tick. The extra NULLs are essential: the upsert only resets encrypted_password, status, operation_type, error_message, and deleted, so without this a stale auto_rotate_at (from a previously viewed password) would fire auto-rotation against a freshly re-SET password.
  • server/datastore/mysql/hosts.go — Added a comment to the DeleteHost documentation block explaining that host_recovery_key_passwords is intentionally preserved across host deletion. The device may still be enrolled in Apple MDM with the password intact; Orbit re-enrollment recreates the host row and the existing password record remains usable for view/rotate. Only a real MDM re-enrollment invalidates the device-side state, and MDMResetEnrollment handles that path.
  • Self-healing for already-stuck production rows: the fix triggers on the next Authenticate from the affected device. Hosts that never re-authenticate need manual SQL remediation (UPDATE host_recovery_key_passwords SET deleted=1 WHERE host_uuid=?); next cron tick re-enqueues a fresh SetRecoveryLock.
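
The interaction between the soft-delete and the upsert can be illustrated with a small in-memory model. This is a sketch under stated assumptions: `lockRow`, `softDelete`, `upsert`, and `newViewedRow` are hypothetical names, and the concrete values the upsert writes (`pending`, `install`) are assumed for illustration. Only the set of columns it resets comes from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// lockRow is a hypothetical in-memory model of a
// host_recovery_key_passwords row.
type lockRow struct {
	EncryptedPassword        string
	Status                   *string
	OperationType            string
	ErrorMessage             *string
	Deleted                  bool
	PendingEncryptedPassword *string
	PendingErrorMessage      *string
	AutoRotateAt             *time.Time
}

// newViewedRow builds a verified row whose password was viewed,
// so auto_rotate_at is set.
func newViewedRow() *lockRow {
	verified := "verified"
	rotateAt := time.Now().Add(time.Hour)
	return &lockRow{
		EncryptedPassword: "old-ciphertext",
		Status:            &verified,
		OperationType:     "install",
		AutoRotateAt:      &rotateAt,
	}
}

// softDelete models the re-enrollment cleanup. With clearExtras=false it
// only flips the deleted flag, the variant the description warns against.
func softDelete(r *lockRow, clearExtras bool) {
	r.Deleted = true
	if clearExtras {
		r.PendingEncryptedPassword = nil
		r.PendingErrorMessage = nil
		r.AutoRotateAt = nil
	}
}

// upsert models the ON DUPLICATE KEY UPDATE branch: it resets only
// encrypted_password, status, operation_type, error_message, and deleted.
// The values written to status and operation_type are assumptions.
func upsert(r *lockRow, encrypted string) {
	pending := "pending"
	r.EncryptedPassword = encrypted
	r.Status = &pending
	r.OperationType = "install"
	r.ErrorMessage = nil
	r.Deleted = false
}

func main() {
	leaky := newViewedRow()
	softDelete(leaky, false)
	upsert(leaky, "new-ciphertext")
	fmt.Println("flag-only soft-delete keeps stale auto_rotate_at:", leaky.AutoRotateAt != nil)

	clean := newViewedRow()
	softDelete(clean, true)
	upsert(clean, "new-ciphertext")
	fmt.Println("soft-delete with NULLs keeps stale auto_rotate_at:", clean.AutoRotateAt != nil)
}
```

Because the upsert never touches `auto_rotate_at`, the only place the stale timestamp can be cleared in this model is the soft-delete itself.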

Why not VerifyRecoveryLock or a refetch-time reset

Considered and deferred: periodic VerifyRecoveryLock sweeps, refetch-time detection of host_mdm.enrolled = 0, and reset-on-MDMTurnOff (CheckOut). MDMResetEnrollment is the right semantic boundary (re-enrollment implies the device-side lock is gone) and covers both desync scenarios with a single, minimal hook. The others would shrink stale-display windows but are not required for correctness and can land separately.

Status matrix

| DB state | UI status | password_available | After re-enroll |
|---|---|---|---|
| no row / soft-deleted | absent | | n/a |
| pending, install, pw only | pending | true | soft-deleted → cron re-SETs |
| verified, install | verified | true | soft-deleted → cron re-SETs |
| failed, install | failed | true | soft-deleted → cron re-SETs |
| pending/failed, install, pending rotation | pending / failed | true | soft-deleted, pending pw nulled |
| pending/NULL, remove | removing_enforcement | true | soft-deleted |
| failed, remove | failed | true | soft-deleted |
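
The UI status column of the matrix can be read as a small mapping function. The sketch below is one way to express it. `lockRow` and `uiStatus` are hypothetical names and do not reflect how GetHostRecoveryLockPasswordStatus or PopulateStatus are actually implemented.

```go
package main

import "fmt"

// lockRow is a hypothetical, reduced view of a stored row:
// only the fields the matrix keys on.
type lockRow struct {
	Status        *string // nil models NULL
	OperationType string  // "install" or "remove"
	Deleted       bool
}

func strPtr(s string) *string { return &s }

// uiStatus maps a stored row to the UI status column of the matrix.
func uiStatus(r *lockRow) string {
	switch {
	case r == nil || r.Deleted:
		return "absent"
	case r.OperationType == "remove" && (r.Status == nil || *r.Status == "pending"):
		return "removing_enforcement"
	case r.Status != nil:
		// install rows, and failed remove rows, surface the stored status.
		return *r.Status
	default:
		// A NULL status on an install row is not covered by the matrix.
		return ""
	}
}

func main() {
	rows := []*lockRow{
		nil,
		{Status: strPtr("verified"), OperationType: "install"},
		{Status: nil, OperationType: "remove"},
		{Status: strPtr("failed"), OperationType: "remove"},
		{Status: strPtr("pending"), OperationType: "install", Deleted: true},
	}
	for _, r := range rows {
		fmt.Println(uiStatus(r))
	}
}
```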

Checklist for submitter

  • Changes file added: changes/43786-recovery-lock-reenroll-desync.
  • Input data validated, parameterized SQL, no SELECT *.
  • No new timeouts/retries added.
  • No endpoint paths modified.

Testing

  • Added/updated automated tests — see below.
  • Host-isolation covered: the table-driven matrix test and the integration test exercise per-host rows; assertions are scoped by host_uuid.
  • QA'd all new/changed functionality manually — deferred to QA team (covered by unit + integration tests; manual recovery via SQL documented in changes file context).

Automated tests added

server/datastore/mysql/apple_mdm_test.go

  • RecoveryLockResetOnMDMReEnrollment (6 subtests): verified/stuck-pending/rotation soft-delete, pending_encrypted_password/pending_error_message/auto_rotate_at null-propagation guards, clean re-animation end-to-end, SCEP-renewal preserves the row.
  • DeleteHostPreservesRecoveryLockPassword: byte-for-byte preservation across DeleteHost, including the deleted flag.
  • HostRecoveryLockStatusMatrix: 10 table-driven cases locking GetHostRecoveryLockPasswordStatus + PopulateStatus output for every observable (status, operation_type, has_password, has_pending_pw, deleted) combination.
  • RecoveryLockReadersReturnNotFoundForSoftDeleted: GetHostRecoveryLockPassword, GetRecoveryLockRotationStatus, HasPendingRecoveryLockRotation, GetHostRecoveryLockPasswordStatus all behave correctly for soft-deleted rows — critical because the EE rotate endpoint depends on the notFound branch to return "Host does not have a recovery lock password to rotate."

server/service/integration_mdm_test.go

  • New subtest "re-enrollment soft-deletes stored password and cron re-SETs" in TestRecoveryLockPasswordIntegration: full enroll → verified → mdmClient.Reenroll() (canonical SCEP + Authenticate + TokenUpdate) → host detail API confirms absent → cron re-SETs → verified with a different password and different command UUID.

Database migrations

  • No schema migrations in this PR.

New Fleet configuration settings

  • No new configuration settings.

fleetd/orbit/Fleet Desktop

  • No changes to fleetd.

Adds the datastore-layer coverage for the MDMResetEnrollment
soft-delete and the DeleteHost non-cascade: reset behavior on every
row state, null-propagation guards against re-animation leaks, a
table-driven host-detail status matrix, and notFound contracts for
view/rotate readers on soft-deleted rows.

codecov bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 81.81818% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.93%. Comparing base (fa796cd) to head (1c23c5d).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| server/datastore/mysql/apple_mdm.go | 81.81% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #43827      +/-   ##
==========================================
+ Coverage   66.91%   66.93%   +0.01%     
==========================================
  Files        2600     2600              
  Lines      208985   209001      +16     
  Branches     9305     9305              
==========================================
+ Hits       139846   139895      +49     
+ Misses      56397    56372      -25     
+ Partials    12742    12734       -8     

| Flag | Coverage Δ | |
|---|---|---|
| backend | 68.72% <81.81%> (+0.02%) | ⬆️ |

Successfully merging this pull request may close these issues.

Recovery lock password remains displayed in the OS Settings after disabling MDM on the host
