OVH has this guide on what to do after a disk failure on a dedicated server with soft-RAID (e.g. two mirrored disks).

I recently had a disk fail on a server with NVMe disks, which require an EFI-enabled BIOS. Not being very familiar with EFI and NVMe, here are a few random notes:

  • “Serial over LAN” (SoL) IPMI at OVH does not work with their “netboot” kernels; the “KVM with Java applet” is required instead (which in turn requires firefox-esr and Java… ugh).
  • When booting with a netboot kernel, use the RAID device as the boot device, i.e. /dev/md2 in my case (a quick sanity check of the array is sketched after this list).
  • Double-check the /boot/efi mountpoint in /etc/fstab, and make sure the EFI files are present on both devices; copy them over if necessary (I ended up copying the files from another host). A sketch for comparing the two EFI partitions follows this list.
  • If you have swap, make sure you run mkswap on the new disk, since swap is not part of the RAID (not a big deal, but it will cause boot warnings otherwise); see the swap sketch below.
  • Update grub on both devices: grub-install /dev/nvme0n1 and grub-install /dev/nvme1n1.
  • Copy the EFI partition to the new disk: dd if=/dev/nvme1n1p1 of=/dev/nvme0n1p1 bs=4M. A final sanity check of the boot entries is sketched below.
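
To check that the array is healthy after booting, something like this helps (a sketch; /dev/md2 as above, your device may differ):

    # Show all soft-RAID arrays and their sync/rebuild status
    cat /proc/mdstat
    # More detail on the array used as the boot device
    mdadm --detail /dev/md2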
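
For the /boot/efi check, roughly this (a sketch, assuming the EFI partitions are nvme0n1p1 and nvme1n1p1 as in the dd command above; /mnt/efi-new is a made-up mount point):

    # Mount the ESP of each disk and compare their contents
    mount /dev/nvme1n1p1 /boot/efi
    mkdir -p /mnt/efi-new
    mount /dev/nvme0n1p1 /mnt/efi-new
    diff -r /boot/efi /mnt/efi-new
    # If the new disk's ESP is empty, copy the files over
    cp -a /boot/efi/. /mnt/efi-new/
    umount /mnt/efi-new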
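
Recreating the swap looks roughly like this (a sketch; /dev/nvme0n1p3 is a hypothetical swap partition, check your actual partition table first):

    # Write a new swap signature; note this generates a new UUID
    mkswap /dev/nvme0n1p3
    # Look up the new UUID and update the swap line in /etc/fstab
    blkid /dev/nvme0n1p3
    # Enable it right away, without a reboot
    swapon /dev/nvme0n1p3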
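
Finally, after the grub-install runs and the dd copy, the firmware boot entries can be sanity-checked (a sketch; update-grub is the Debian/Ubuntu wrapper, and entry names vary by distro):

    # Regenerate the GRUB config if anything changed
    update-grub
    # List the EFI boot entries and the partitions they point to
    efibootmgr -v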