
Fedora 25, was working, now fails to boot

asked 2017-06-21 09:59:10 -0600

iliffe

updated 2017-06-21 14:46:01 -0600

I had a working Fedora 25 system that now boots to the emergency system every time. Reading the journal I notice the following errors:

I/O error on /dev/sde1, sector 2056, logical block 1 async read failed command READ FPDMA QUEUED

Disk uuid F779ABD0-EF2E-4223-A766-78502BD48A96 does not exist

sde1 is one part of a RAID 6 cluster, sde2 is a swap partition.

I have been unable to boot the machine and I need to ensure that the data in the RAID cluster is not lost. What is the best way to get the machine running so I can replace the disk?

FYI, fdisk CAN read the partition table OK and the disk appears to actually be running when checked with e2fsck, although it obviously cannot verify anything.

MORE INFO: I disconnected disk sde, which is part of the software RAID 6 cluster. The system still won't boot but it does get further; now I get the errors: md/raid not clean, starting background reconstruction; device sdc1 operational as raid disk 1, sde1 ... disk 4, sdb1 ... disk 0, sdd1 ... disk 2; cannot start dirty degraded array; failed to start array /dev/md/root: input/output error; md/raid md127, failed to run raid set. Then it fails to boot and dracut starts the emergency shell.

But isn't the point of using RAID that if a disk fails things continue to run until a new disk is installed? As things stand, everything that was stored on the RAID cluster is currently lost.


Comments

Yes, RAID will keep running as long as the computer is on, but that's not what's happening here. You stopped the system and it doesn't want to automatically start the RAID because it's damaged. You will need to manually start it.

ssieb ( 2017-06-21 19:13:00 -0600 )

Thanks Samuel. To put things in perspective, we got hit by lightning and it didn't do much for the power conditioning equipment. Everything just came to an abrupt halt.

iliffe ( 2017-06-21 22:02:06 -0600 )

2 Answers


answered 2017-06-21 14:38:01 -0600

ssieb

Check whether the RAID arrays were assembled by looking at cat /proc/mdstat.

The best option for checking and recovery would be to use a live boot image.
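
For reference, a minimal inspection sequence from a live or rescue environment might look like the following (device names are taken from this thread and are assumptions; adjust them to your system):

cat /proc/mdstat                 # is the array listed, and is it active or inactive?
lsblk -o NAME,SIZE,TYPE,FSTYPE   # confirm which disks and partitions the kernel sees
mdadm --examine /dev/sdb1        # read the RAID superblock on one member: level, state, event count
mdadm --detail /dev/md127        # array-level view, if the array device exists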


Comments

/proc/mdstat shows:

Personalities : [raid6] [raid5] [raid4]
md127 : inactive sdc1[1] sde1[4] sdb1[0] sdd1[2]
      2514866176 blocks super 1.2
unused devices: <none>

I can boot from the install media, but how would I recover from there? I'm afraid of reformatting the RAID drive and losing the data. If I could get the network interface up I could scp the data to another server, but I haven't been able to make that happen yet.

iliffe ( 2017-06-21 14:56:50 -0600 )

The install environment should bring up a network IP, and you should be able to get to a terminal to copy the data.

SteveEbey73701 ( 2017-06-21 17:11:39 -0600 )

You can run the live boot without running the installer. The live image gives you a full Fedora environment to inspect and fix the installed system.

ssieb ( 2017-06-21 19:14:39 -0600 )

OK, I think I am failing to understand the term "live boot". You mean the "rescue" function on the install options screen, on the installer USB key, right? When I run that it says "no Linux filesystems found" and gives me a shell prompt.

The problem is that it can't seem to find the RAID cluster. mdadm isn't there so I can't force it to assemble the cluster. I have just bought a new disk and I'll install that in the morning, as I suspect the most recent errors (see "more info" above) may have been caused by the hard drives being remapped when I pulled the plug on sde.

iliffe ( 2017-06-21 21:56:56 -0600 )

I have installed a new disk, replacing sde. Using fdisk I added the partition table and marked it type 29 - Linux RAID. Using mdadm from the install rescue system, I ran mdadm --detail /dev/md127 and got the 4 working disks listed as expected. Now whenever I try to assemble the array, mdadm reports that it cannot get the array info for that array. Same result when I try to fail and re-add sde.

I note that --detail reports the array as RAID0, not RAID6 as it should be.

Where do I go from here?

iliffe ( 2017-06-22 10:23:29 -0600 )
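
(One thing worth checking here: the RAID superblock on each member partition records the level the array was created with, so mdadm --examine can confirm the members are still recognised as RAID 6 even when the not-yet-started array reports something else. A quick check from the rescue shell, using a member name from this thread, might be:)

mdadm --examine /dev/sdb1                                   # "Raid Level" and "Array State" come from the on-disk superblock
mdadm --examine /dev/sdb1 | grep -iE 'level|state|events'   # shorter view; repeat for the other members and compare event counts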

answered 2017-06-22 11:54:39 -0600

iliffe

OK, I got this working finally. Here are the nauseatingly complete details, just in case you get into the same bind.

First, I replaced the defective sde disk and after installing the new one, I booted from the install/rescue function on the installation USB key.

Next, use fdisk to put a partition table on the new disk: run fdisk /dev/sde, then g to create a new partition table, n to add a new partition, and t to set the partition type to Linux RAID (type 29 on my system, but check using l to be sure).
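
A sketch of that fdisk session, with the keystrokes as comments (sde is assumed to be the replacement disk; double-check the device name before writing anything):

fdisk /dev/sde
#   g    - create a new empty GPT partition table
#   n    - add a new partition (accept the defaults to use the whole disk)
#   t    - change the partition type; 29 was "Linux RAID" on this fdisk build, list the codes with l to confirm
#   w    - write the table to disk and exit
partprobe /dev/sde   # ask the kernel to re-read the partition table (if partprobe is available in the rescue shell)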

Now STOP the existing array using "mdadm --stop /dev/md127". That allowed the other mdadm functions to work properly. I had to stop twice as sdb was showing as busy the first time.
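
In command form, roughly (md127 is the array name from this thread):

mdadm --stop /dev/md127    # may need a second attempt if a member still shows as busy
cat /proc/mdstat           # md127 should no longer be listed once it is stopped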

Then I did an "mdadm --assemble --force /dev/md127 /dev/sd[bcdef]1" and that forced the degraded array to start; it is now rebuilding /dev/sde1. Odds are that if you don't have a spare disk there won't be enough disks for RAID 6, so you will need to use --force, but try it without first.
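
The assemble-and-watch sequence, as a sketch (the glob covers the member partitions on this system; the explicit --add of the replacement disk is an assumption, only needed if the rebuild does not start on its own):

mdadm --assemble /dev/md127 /dev/sd[bcdef]1           # try a normal assemble first
mdadm --assemble --force /dev/md127 /dev/sd[bcdef]1   # force it if the array is marked dirty/degraded
# mdadm --add /dev/md127 /dev/sde1                    # hypothetical follow-up: add the replacement member if it was not picked up
cat /proc/mdstat                                      # shows the rebuild/recovery progress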

There is a LOT of information on RAID problems and configuration available at:

https://raid.wiki.kernel.org/index.ph...

