
Florian,

Thank you for your answer. I still cannot get the system to properly boot after removing a drive. I had gone through the docs and created the RAID1 drives as you suggested, but wanted to collect more information before responding.

My objective is to make the machine able to continue operating when there is a total disk failure. When I built the disks, I used manual partitioning in Anaconda to configure RAID1 for swap, /, /boot, and /home. The computer (ASROCK N3150-ITX) uses EFI to boot. Anaconda lets me create only a single /boot/efi partition (btw, Anaconda says it will create one on both drives, but it does not). To get around this, I create the EFI partition prior to the Anaconda server install and then manually set the file type and copy the files.
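
Roughly, the same workaround could be scripted like this (a sketch only: sgdisk stands in for an interactive gdisk session, and /dev/sdb, the 200M size, and the /mnt/esp2 mount point are assumptions to adapt to your layout):

    # create a ~200 MiB EFI system partition (type EF00) on the second disk
    sgdisk --new=1:2048:+200M --typecode=1:EF00 --change-name=1:"EFI System Partition" /dev/sdb
    # give it the FAT filesystem the firmware expects
    mkfs.vfat -F 32 /dev/sdb1
    # copy the boot files over from the populated ESP on the first disk
    mkdir -p /mnt/esp2
    mount /dev/sdb1 /mnt/esp2
    cp -a /boot/efi/. /mnt/esp2/
    umount /mnt/esp2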

After all this is done, df shows:

[root@rocky ~]# df
Filesystem      1K-blocks    Used  Available Use% Mounted on
devtmpfs           3813248       0    3813248   0% /dev
tmpfs              3824480     220    3824260   1% /dev/shm
tmpfs              3824480    1528    3822952   1% /run
tmpfs              3824480       0    3824480   0% /sys/fs/cgroup
/dev/md127       31441920 4569432   26872488  15% /
tmpfs             3824480      36    3824444   1% /tmp
/dev/md126         488293   93611     364951  21% /boot
/dev/sda1          204580    8620     195960   5% /boot/efi
/dev/md125     3864414624   37024 3864377600   1% /home
tmpfs              764900      28     764872   1% /run/user/1000
tmpfs              764900       0     764900   0% /run/user/0

gdisk shows EFI boot partitions on both sda and sdb, although sdb says "EFI System" rather than "EFI System Partition":

Disk /dev/sda: 7814037168 sectors, 3.6 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 3E2D7C4C-8AC4-428F-B08A-418C41EB0C80
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 7814037134
Partitions will be aligned on 2048-sector boundaries
Total free space is 3693 sectors (1.8 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048          411647   200.0 MiB   EF00  EFI System Partition
   2          411648        63358975   30.0 GiB    FD00  
   3        63358976        80142335   8.0 GiB     8E00  
   4        80142336        81168383   501.0 MiB   FD00  
   5        81168384      7814035455   3.6 TiB     FD00 


Disk /dev/sdb: 7814037168 sectors, 3.6 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): DB1BA04E-8DD8-44BD-8B99-2CA9D5061FE0
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 7814037134
Partitions will be aligned on 2048-sector boundaries
Total free space is 5740 sectors (2.8 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048          409600   199.0 MiB   EF00  EFI System
   2          411648        63358975   30.0 GiB    FD00  
   3        63358976        80142335   8.0 GiB     8E00  
   4        80142336        81168383   501.0 MiB   FD00  
   5        81168384      7814035455   3.6 TiB     FD00

I then tried disconnecting first one drive and then the other, and both drives did boot, but both had the same failure: the system said it was in emergency mode, the /home array (/dev/md125) was not mounted, and journalctl showed lots of errors. I also tried to mount md125 using 'mount /dev/md125 /home' but got an error 'unable to read superblock'.
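
For context, the state of the broken array can be inspected with commands along these lines (a sketch; the md and partition names just follow the layout above and may differ on your system):

    cat /proc/mdstat                # which arrays are active/inactive and their members
    mdadm --detail /dev/md125       # array-level state
    mdadm --examine /dev/sda5       # per-member superblock info (active vs. spare)
    journalctl -b -p err            # errors logged during the current boot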

Please forgive this long description, and thank you again for your help. I have screen captures of the failures if more info would help.

updated and SOLVED

Florian,

Thank you for your answer. The key to fixing this issue is that the arrays need to be rebuilt after a failure. I tried the following using MBR and GRUB as well as UEFI, and it seems to work (albeit a bit cumbersome). I hope it will keep working after a disk failure and will just need some help after a reboot.

To make the system redundant:

  • boot first with the Workstation disk and go to Live mode, then Utilities->Terminal and use gdisk to create 2 identical /boot/efi partitions of size 200M on /dev/sda and /dev/sdb. Simply start gdisk and use 'p' to see what's there and 'd' to get rid of anything. Then use 'n' with the default start and size 200M to create the partition. Be sure to set the type to EF00 (EFI boot).

  • Restart the machine with the server disk (assuming you want server code). Hit 'del' in the boot process and ensure that the DVD is used as the primary boot device.

  • select both disks and use Anaconda manual partitioning and first 'reformat' one of the 'unknown' EFI boot partitions to be the /boot/efi partition. You have to set the mount point. Then create all the other partitions (including swap) as RAID1.

  • Anaconda will warn that /boot/efi points to a RAID array, that there will be boot problems upon failure, and that you should hit 'Done' to go ahead anyway. Go ahead.

  • after the machine is up, all looks good (look at 'cat /proc/mdstat' and df to see).

  • copy the /boot/efi partition sda1 to sdb1 using "dd if=/dev/sda1 of=/dev/sdb1". Many references said to use efibootmgr to create a different label for sdb1, but my hardware saw the second drive. Perhaps this will cause problems later, I don't know.

  • shut down the machine and unplug one drive and the USB drive.

  • reboot and watch the errors. It will come up in "emergency" mode.

  • enter the root password and do 'cat /proc/mdstat'. In my case, all the arrays except /dev/md124 (which was the big /home directory) were fine. /dev/md124 was 'inactive' and the device which was there, /dev/sda5, was showing as a spare [S].

  • to fix this, the array must be stopped and then the device added. Note that you are adding the device which was listed as the [S] spare: 'mdadm --stop /dev/md124' and then 'mdadm -A --force /dev/md124 /dev/sda5'.

  • reboot and it will work fine.

  • shutdown and reconnect the drive, then re-add all the other drives using 'mdadm --manage /dev/md124 --add /dev/sdb5' etc. Note that you can see what drive numbers to add from the output of 'cat /proc/mdstat' (the whole command sequence is collected in the sketch after this list).
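
For convenience, here are the commands from the steps above collected in one place (a sketch rather than a verified script; the md and partition names follow my example layout and will differ on other machines):

    # duplicate the EFI system partition so either disk can boot
    dd if=/dev/sda1 of=/dev/sdb1

    # after pulling a disk the system drops to emergency mode; see what is inactive
    cat /proc/mdstat

    # stop the inactive array and force-assemble it from the remaining member
    # (the device mdstat showed as a spare [S])
    mdadm --stop /dev/md124
    mdadm -A --force /dev/md124 /dev/sda5
    reboot

    # once the other disk is back, re-add its partitions so the mirrors resync
    mdadm --manage /dev/md124 --add /dev/sdb5
    # ...repeat for the other md arrays, then watch the resync
    cat /proc/mdstat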

good luck, thanks for the patience.