#1034812 installation-reports: Unbootable after install: UEFI installed to wrong ESP

#1034812#5
Date:
2023-04-24 23:29:21 UTC
From:
To:
(Please provide enough information to help the Debian
maintainers evaluate the report efficiently - e.g., by filling
in the sections below.)

Boot method: USB
Image version: Bullseye
Date: 04/23/2023

Machine: Home built PC
Partitions:
Filesystem                      Type     1K-blocks     Used Available Use% Mounted on
udev                            devtmpfs   3968820        0   3968820   0% /dev
tmpfs                           tmpfs       799096     1360    797736   1% /run
/dev/mapper/thumbdrive--vg-root ext4     116178928 28347284  81883844  26% /
tmpfs                           tmpfs      3995472        0   3995472   0% /dev/shm
tmpfs                           tmpfs         5120        4      5116   1% /run/lock
/dev/sda2                       ext2        481642   118611    338046  26% /boot
/dev/sda1                       vfat        523244    36784    486460   8% /boot/efi
tmpfs                           tmpfs       799092       76    799016   1% /run/user/1000


Base System Installation Checklist:
[O] = OK, [E] = Error (please elaborate below), [ ] = didn't try it

Initial boot:           [O]
Detect network card:    [O]
Configure network:      [O]
Detect media:           [O]
Load installer modules: [O]
Clock/timezone setup:   [O]
User/password setup:    [O]
Detect hard drives:     [O]
Partition hard drives:  [O]
Install base system:    [O]
Install tasks:          [O]
Install boot loader:    [E]
Overall install:        [E]

Comments/Problems:

I installed from a thumb drive, to another thumb drive, on a computer
that had an nvme drive that should not have been touched.  The installer
overwrote data on the nvme drive despite the target being /dev/sda. I
manually mounted the installed system (the target thumb drive) on another
computer, figured out what happened (ESP was empty) and fixed it so I could
submit a bug report from the thumb drive that failed to install properly.

This is similar to the other UEFI installation problems, but it did
not install to the MBR, and it did not install any files on the correct
ESP, thus it is a separate issue.

The smoking gun for understanding what went wrong was in /etc/fstab, where
there were two comments:

# /boot was on /dev/sda2 during installation
# /boot/efi was on /dev/nvme0n1p1 during installation

This matches the fact that it left my nvme drive unbootable unless I manually
go into the boot menu and select the nvme drive every time. Failing to do so
results in a grub failure.
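
For anyone checking an installed system for the same symptom, the following is a minimal sketch (not part of the original report) that extracts the installer's "was on ... during installation" comments from an fstab file. The function name fstab_install_devices is made up for illustration, and the sketch assumes the comments have exactly the form quoted above.

```shell
# Print "<mountpoint> <device>" for every installer comment of the form
#   # /boot/efi was on /dev/nvme0n1p1 during installation
# found in the given fstab file (default: /etc/fstab).
# fstab_install_devices is a hypothetical helper name, not a Debian tool.
fstab_install_devices() {
    sed -n 's|^# \(/[^ ]*\) was on \(/dev/[^ ]*\) during installation$|\1 \2|p' \
        "${1:-/etc/fstab}"
}
```

If the device printed for /boot/efi sits on a different disk than the one printed for /boot, the installer probably picked an ESP on another disk. On a running system, `findmnt -n -o SOURCE /boot/efi` shows which partition is currently mounted there.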

I have also repeated this on another computer (mac mini 2019) and was able to
reproduce these results, so I can confirm that it is not an issue unique to my
BIOS. That left a second computer unbootable short of manually going into the
boot menu and selecting the desired device on every boot.

I marked this as "normal" because installing onto a thumb drive and having
another disk in the computer which has an ESP is not a common use case. It
may be reasonable to increase the severity considering it overwrites a disk
that should not have been touched, leaving two systems unbootable (in my
case: the nvme drive and the USB drive).

#1034812#10
Date:
2023-04-25 17:55:27 UTC
From:
To:
Hello,
LVM (without encryption). Guided partitioning is supposed to not use any
partitions outside the selected disk by calling clean_method() defined
in partman-auto/lib/recipes.sh. This is what I observe with non-LVM
schemes, but the two LVM schemes have issues. Here is a summary of my
observations:

Guided - use the largest continuous free space
  calls clean_method() in partman-auto/autopartition
  does not run partman-efi/init.d/efi
  does not use existing EFI or swap partitions on other disks (good)

Guided - use entire disk
  calls clean_method() in partman-auto/autopartition
  does not run partman-efi/init.d/efi
  does not use existing EFI or swap partitions on other disks (good)

Guided - use entire disk and set up LVM
  does not call clean_method()
  runs partman-efi/init.d/efi
  uses existing EFI and swap partitions on other disks (bad)

Guided - use entire disk and set up encrypted LVM
  calls clean_method() in partman-auto-crypto/autopartition-crypto
  runs partman-efi/init.d/efi
  uses existing EFI partitions on other disks (bad)
  does not use existing swap partitions on other disks (good)

partman-efi/init.d/efi detects possible EFI partitions and sets method
"efi" on them.

As you can see, the issue also affects swap partitions (and they will be
reformatted with new UUIDs, which can be harmful if they are used by
another system).

Note: partman-auto-lvm used to call clean_method() in lib/auto-lvm.sh
but it was removed by commit cfc6797f6f561b87069160ba7c375c5b487b7c1e
with code factoring.

Suggested fix is two-fold:

1) Call clean_method() at the beginning of
partman-auto-lvm/autopartition-lvm, as is done in
partman-auto/autopartition and partman-auto-crypto/autopartition-crypto.
This should solve the issue for swap partitions but is not enough for ESPs.

2) In partman-efi/init.d/efi, set method "efi" only once, as is done
with swap partitions in partman-basicfilesystems/init.d/autouse_swap.
I already submitted two patch versions for #1034208 "Partman may reset
user's choice for ESP partitions use" as a follow-up to Steve's latest
fixes for #834373 and #1033913.

Caveat: I don't know if these changes could have any negative impact on
preseeded automatic partitioning.
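
To illustrate the idea behind the two-fold fix, here is a toy model only. It is not partman code. It assumes a state directory laid out as <state>/<disk>/<partition>/method, models "calling clean_method()" as simply forgetting every recorded method, and models "set method efi only once" with a marker file. The names clear_methods, mark_efi_once and efi_seen are all hypothetical.

```shell
# Toy model, NOT partman code. Assumed layout: <state>/<disk>/<partition>/method

# Step 1 (stand-in for calling clean_method() first): forget every method
# that was preselected on any disk before guided partitioning assigns its own.
clear_methods() {
    find "$1" -type f -name method -exec rm -f {} \;
}

# Step 2: assign method "efi" to a partition only the first time it is seen,
# so that a later change (for example removing the method) is not overwritten.
mark_efi_once() {
    part=$1
    [ -e "$part/efi_seen" ] && return 0   # hypothetical marker file
    : > "$part/efi_seen"
    echo efi > "$part/method"
}
```

In this model, running mark_efi_once a second time on the same partition leaves the method file untouched, which is the behaviour the second part of the fix aims for.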

#1034812#15
Date:
2023-04-27 08:05:25 UTC
From:
To:
Control: tags -1 patch

Patch attached for the partman-auto-lvm part.

#1034812#22
Date:
2023-04-28 16:35:03 UTC
From:
To:
I used "Guided - use entire disk and set up encrypted LVM".

#1034812#27
Date:
2023-07-17 15:45:18 UTC
From:
To:
Am 15.07.23 um 22:57 schrieb Pascal Hambourg:
 > Replacing /lib/partman/init.d/50efi with either attached 50efi.1 or
 > 50efi.2 as 50efi should fix the issue in guided partitioning with
 > encrypted LVM.
 >
 >    cp <source path>/50efi.1 /lib/partman/init.d/50efi
 > or
 >    cp <source path>/50efi.2 /lib/partman/init.d/50efi
 >

I tried each of them and they both solved my problem. /boot and /boot/efi
were on the same (and correct) disk. The installed system was bootable :-)

 > Fixing the issue in guided partitioning with unencrypted LVM also
 > requires replacing /bin/autopartition-lvm with the attached
 > autopartition-lvm.
 >
 >    cp <source path>/autopartition-lvm /bin/autopartition-lvm
 >
 > The files can be replaced after "Load installer components" and before
 > "Partition disks".

I could reproduce the same issue with unencrypted LVM (#1034812) on my
machine with my stock installer image.

Like above, replacing 50efi by 50efi.1 plus the other file in /bin made
the installation work alright. The system was bootable afterwards.

Thanks a lot for the quick fixes!

Cheers
Christof

#1034812#32
Date:
2023-07-17 18:13:23 UTC
From:
To:
Thanks for testing. Should #1034812 and #1041168 be merged?

#1034812#37
Date:
2023-07-17 20:01:06 UTC
From:
To:
Am 17.07.23 um 20:13 schrieb Pascal Hambourg:

Yes, I think so.

#1034812#44
Date:
2024-05-09 10:53:20 UTC
From:
To:
By the way at "https://www.reddit.com/r/debian/comments/wfk4xt/debian_keeps_installing_bootloader_in_wrong_disk/" other people mention similar problems that are/could be related to this bug.



Jmkr

#1034812#49
Date:
2024-05-09 10:42:41 UTC
From:
To:
I had a similar problem with my customized Debian 10 installer. I have not
customized PARTMAN related UDEB packages yet, so these are at Debian 10
versions. What I did was I installed my Debian in one of my test laptops
that also had another SSD with some Windows 11 installation that I plan to
nuke later:). All was working and booting, but when I added ESP capacity
monitoring to my CONKY configurations, I noticed that this laptop had
different ESP space occupied than my other laptops with the same hardware.
After checking what is going on I found that:

- The ESP created by me during manual disk partitioning on the SSD that is
used for my Debian system was empty and not used at all (not mounted at
"/boot/efi" and not mentioned in "/etc/fstab").

- Instead the ESP from the other SSD with the Windows 11 installation was
used. It was mounted at "/boot/efi", its UUID was in "/etc/fstab" and
DI/GRUB installed the "EFI/debian/grubx64.efi" file there right next to the
"Microsoft" and "Boot" directories that contained the Windows garbage.

Thankfully I did not boot that Windows 11 installation, so Windows was not
able to mess around with the Debian files that rudely appeared on "their"
disk. I was able to (hopefully) fix the situation when I noticed it:

- I archived "/boot/efi/EFI/debian/grubx64.efi" from "/dev/sda1" to
"/Archive.tar".

- Then, I unmounted "/dev/sda1" from "/boot/efi" and mounted "/dev/sdb1" to
"/boot/efi" and extracted "/Archive.tar" in "/boot/efi" again.

- Later I changed UUID of ESP in "/etc/fstab" file to that of ESP on
"/dev/sdb1".

- Finally using EFIBOOTMGR I deleted the EFI boot entry my system used and
created a new entry for ESP on "/dev/sdb1".

- Then, I tested that the system boots as before using the new EFI and FSTAB
entries and correct ESP.
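
The fstab step in the list above can be sketched as follows. This is only an illustration under assumptions: fstab_set_esp_uuid is a made-up helper name, the ESP entry is assumed to be a UUID= line mounted at /boot/efi, and the function prints the edited file to stdout instead of changing it in place.

```shell
# Print a copy of an fstab file in which the /boot/efi entry points at a new
# filesystem UUID. Comment lines (first field starting with "#") are left alone.
# $1 = fstab path, $2 = new UUID. Hypothetical helper, not a Debian tool.
# Note: awk rebuilds the edited line with single spaces between fields.
fstab_set_esp_uuid() {
    awk -v uuid="$2" '
        $1 !~ /^#/ && $2 == "/boot/efi" { $1 = "UUID=" uuid }
        { print }
    ' "$1"
}
```

The UUID of the intended ESP could be read with `blkid -s UUID -o value /dev/sdb1`, and the output of the function reviewed before it replaces the real /etc/fstab.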

Did I forget something (that will cause problems in the future) by not
reinstalling GRUB or some other stuff?

Do the fixes mentioned above also address the manual partitioning case? If
not, perhaps you could check that case also. I will keep the Windows 11
installation (as this laptop is for testing anyway), so if you want I could
test the fixed PARTMAN files, if these are provided so that my script for
customizing the Debian Netinst ISO can include them (after I modify it to
extract a new PARTMAN TAR or copy new PARTMAN files directly to the
extracted Debian Netinst ISO). (In the future I plan to learn how to create
a DI ISO from scratch to be able to include modified UDEB packages etc.; if
that is needed for testing and you can provide some starting instructions
for me, I could try that also.)

Regards,
Jmkr

#1034812#54
Date:
2024-05-09 11:31:22 UTC
From:
To:
Sorry about the wrong order/formatting of last two messages. The webmail interface of Seznam.cz has totally idiotic and unconfigurable defaults and I keep forgetting to click their "Plain Text" button and keep forgetting to remove unnecessary "Quoted Reply Texts" that their idiotic interface forces on me. I will fix my Thunderbird profile to save myself from Seznam's idiotic webmail interface as soon as I have time for it.

Jmkr

#1034812#59
Date:
2024-05-09 12:49:51 UTC
From:
To:
You could just run grub-install to reinstall GRUB into the new ESP and
register it in EFI boot variables.

No, because in manual partitioning it is up to the user to decide which
ESP(s) is/are suitable for the installation, and set the others as "do
not use".

#1034812#64
Date:
2024-05-09 22:52:56 UTC
From:
To:
Pascal Hambourg wrote:

I wanted to try the manual way to learn + to create my EFI boot entry with a customized name. I think GRUB installation can only create EFI boot entry with the name "debian", or is it possible to change that?

So I must have forgotten to set the other ESP as "do not use", stupid me (I switched from MBR + BIOS to GPT + UEFI setup not so long ago and I guess it shows:). But is there a reason why DI partitioning does not set all (or previously existing) ESPs by default to "do not use" and let the user change that manually (perhaps with a reminder message if the user forgets to set the ESP in UEFI mode)? Maybe it would be more intuitive + it could avoid/minimize user errors? Or is a shared ESP so common that DI partitioning needs its current defaults?

Anyway, thanks for such a quick reply.

Jmkr

#1034812#69
Date:
2024-05-10 06:04:35 UTC
From:
To:
You can set an alternative name and location by running grub-install
with --bootloader-id=<name> or with GRUB_DISTRIBUTOR=<name> in
/etc/default/grub. It also affects the directory name in the ESP. But
depending on the grub package version, monolithic GRUB images (signed
for secure boot) do not support being installed in another location than
/EFI/debian (I have been advocating to fix this but no luck so far).
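
As a rough sketch of the grub-install variant, the helper below only prints the command line so it can be reviewed before being run as root. The function name grub_install_cmd is hypothetical, and the x86_64-efi target is an assumption about the machine. As noted above, a signed monolithic image may not work from a directory other than /EFI/debian.

```shell
# Print (do not run) a grub-install command for a given ESP mount point and
# bootloader id. $1 = ESP mount point, $2 = bootloader id.
# grub_install_cmd is a hypothetical helper; x86_64-efi is an assumed target.
grub_install_cmd() {
    printf 'grub-install --target=x86_64-efi --efi-directory=%s --bootloader-id=%s\n' \
        "$1" "$2"
}
```

For example, `grub_install_cmd /boot/efi mydebian` prints a command that would install GRUB under EFI/mydebian on the ESP mounted at /boot/efi.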

I don't know why it was designed this way.

The installer already warns if the partitioning does not set an ESP.

Yes, the ESP is designed to be shared by all boot loaders.

#1034812#74
Date:
2024-05-17 11:22:29 UTC
From:
To:
Pascal Hambourg wrote:

Thanks for this as well as the other info you provided - it is nice to know even if I may end up not using it (as I probably want to use "/EFI/debian" directory for GRUB with just custom EFI boot entry names).

Jmkr

#1034812#79
Date:
2024-05-17 12:53:37 UTC
From:
To:
You can run grub-install with --no-nvram to install GRUB without writing
EFI boot variables. The Debian installer also has an option for this in
expert mode. Then you can create a custom boot entry with efibootmgr.
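
A minimal sketch of the efibootmgr step, printing the command instead of running it. The helper names split_partition and efi_entry_cmd are made up for illustration, only /dev/sdXN and /dev/nvmeXnYpN device names are handled, and the loader path \EFI\debian\grubx64.efi is an assumption (with secure boot the loader would more likely be shimx64.efi).

```shell
# Split a partition device into "<disk> <partition number>".
# Handles only /dev/sdXN and /dev/nvmeXnYpN names; prints nothing otherwise.
split_partition() {
    printf '%s\n' "$1" | sed -n \
        -e 's|^\(/dev/nvme[0-9][0-9]*n[0-9][0-9]*\)p\([0-9][0-9]*\)$|\1 \2|p' \
        -e 's|^\(/dev/sd[a-z][a-z]*\)\([0-9][0-9]*\)$|\1 \2|p'
}

# Print (do not run) an efibootmgr command that creates a boot entry with a
# custom label. $1 = ESP partition device, $2 = label.
# The loader path is an assumption; adjust it to the file actually installed.
efi_entry_cmd() {
    parts=$(split_partition "$1")
    [ -n "$parts" ] || return 1
    disk=${parts% *}
    num=${parts#* }
    printf "efibootmgr --create --disk %s --part %s --label '%s' --loader '%s'\n" \
        "$disk" "$num" "$2" '\EFI\debian\grubx64.efi'
}
```

For example, `efi_entry_cmd /dev/sdb1 'My Debian'` prints a command for the first partition of /dev/sdb. Running `efibootmgr -v` first lists the existing entries and their loaders.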