Get your DC out of a boot loop

Over the weekend, one of our clients’ domain controllers entered a boot loop after I restarted it.

The next morning, I attempted to log in to it, but the server was offline. Investigation revealed that the server was in a boot loop.

The VM was running in Azure, so my hands-on options were limited. The last time the server was seen, it was requesting a restart after installing updates, and boot loops are often caused by a failed Windows update installation, so I assumed that was the cause here. /start squandering an entire Sunday restoring from backups.

After booting from backups also ended in a boot loop, further investigation revealed an error code (0xc00002e2) that turns out to be specific to domain controllers: if NTDS.DIT, the Active Directory database, is unavailable, the server crashes (“blue screens”) and then restarts, which produces a boot loop.

Per Microsoft’s best practices for running a Domain Controller in Azure, when creating the VM in July, I put the Active Directory data on its own volume separate from the C: volume:

“Create a separate virtual data disk for storing the database, logs, and SYSVOL for Active Directory. Do not store these items on the same disk as the operating system.”

After trying three restore points, I created another VM on Sunday and mounted the DC’s disks in that VM for inspection. Doing so revealed that the volume containing the Active Directory data was encrypted. That explained why the Active Directory data was unavailable, which in turn caused the server to crash: without a TPM, the volume would remain encrypted and the Active Directory data unavailable until manually decrypted.

With that mystery solved, I wanted to understand why the AD volume was encrypted. In my client’s environment, disk encryption is managed by policies in Bitdefender GravityZone. There, I found that the DC’s object had inadvertently been moved to the container that applies security policies, including encryption policies, to PCs rather than servers. The PC-specific security policy enforces encryption on all local disks. To avoid this very problem, the server policy does not. Microsoft Azure already encrypts all Managed Disk volumes created after June 10, 2017 at rest, so additional in-VM encryption is not necessary and can be problematic.
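The misplacement described above is the kind of thing a periodic audit could catch. The following is a minimal sketch of such a check, assuming an inventory has already been exported into simple Python objects. The `Policy` and `Endpoint` classes, the role names, and the hostnames are all hypothetical illustrations; none of them come from the GravityZone console or its API.

```python
from dataclasses import dataclass

# Hypothetical inventory model; these names are illustrative and are not
# part of any Bitdefender GravityZone API.
@dataclass(frozen=True)
class Policy:
    name: str
    enforces_disk_encryption: bool

@dataclass(frozen=True)
class Endpoint:
    hostname: str
    role: str  # assumed values: "server", "domain-controller", or "pc"
    policy: Policy

SERVER_ROLES = {"server", "domain-controller"}

def misplaced_servers(endpoints):
    """Return server-class endpoints whose policy enforces disk encryption."""
    return [
        e for e in endpoints
        if e.role in SERVER_ROLES and e.policy.enforces_disk_encryption
    ]

pc_policy = Policy("PC security", enforces_disk_encryption=True)
server_policy = Policy("Server security", enforces_disk_encryption=False)

# Made-up inventory: DC01 sits under the PC policy, as in the incident above.
inventory = [
    Endpoint("DC01", "domain-controller", pc_policy),
    Endpoint("FS01", "server", server_policy),
    Endpoint("LAPTOP-17", "pc", pc_policy),
]

flagged = [e.hostname for e in misplaced_servers(inventory)]
print(flagged)
```

With this made-up inventory, only the domain controller is flagged. PCs under the encrypting policy are left alone because that is where the policy is meant to apply.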

Resolution: I moved the DC’s object back to the Bitdefender container that applies server-specific security policies so that disk encryption would not be enforced. Then I disabled BitLocker on the encrypted volume while it was mounted to the recovery VM, shut down the recovery VM, returned the now-decrypted volume to the DC, and booted it up without error.

In the end, the only mystery is on what date I inadvertently moved the DC object to the PC container inside the Bitdefender console. It stands to reason that the chronology went like this:

  1. I inadvertently moved the DC object to the PC container inside the Bitdefender console.
  2. The newly applied security policy encrypted the AD volume. The AD data continued to be available because the volume stayed unlocked while the server was running.
  3. When I restarted the server on Saturday night, the AD volume came back up in a locked, encrypted state. That made the AD data unavailable, which caused the 0xc00002e2 error and the boot loop.
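The chronology above can be sketched as a tiny state model. This is a deliberately simplified illustration of the reasoning, not a description of how BitLocker or Active Directory work internally. The class, its methods, and the `has_auto_unlock` flag are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class DataVolume:
    encrypted: bool = False
    unlocked: bool = True  # an unencrypted volume is always readable

    def apply_encryption_policy(self):
        # Assumption: encrypting a mounted volume leaves it readable
        # for the operating system that is already running.
        self.encrypted = True

    def reboot(self, has_auto_unlock: bool):
        # After a restart, an encrypted volume is readable only if
        # something unlocks it during boot.
        self.unlocked = (not self.encrypted) or has_auto_unlock

    def decrypt(self):
        # Manual decryption, e.g. from a recovery VM.
        self.encrypted = False
        self.unlocked = True

def dc_status(volume: DataVolume) -> str:
    """Return the DC's state given whether the AD database is readable."""
    if volume.unlocked:
        return "running"
    return "0xc00002e2 boot loop"

ad_volume = DataVolume()
timeline = []

ad_volume.apply_encryption_policy()       # step 2: policy encrypts the volume
timeline.append(dc_status(ad_volume))

ad_volume.reboot(has_auto_unlock=False)   # step 3: restart, nothing unlocks it
timeline.append(dc_status(ad_volume))

ad_volume.decrypt()                       # the resolution: decrypt the volume
ad_volume.reboot(has_auto_unlock=False)
timeline.append(dc_status(ad_volume))

print(timeline)
```

The model shows why the problem stayed hidden until the restart: the failure depends on the volume being locked at boot, not on the moment the encryption policy was applied.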