In the attached picture you’ll be able to see what’s happening to one of my most important servers, which contains some irreplaceable climate data. I’m at a loss and don’t understand what’s going on, because Linux is not my specialty and my Linux expert has since moved on to other ventures. It’s very important that I get this server working again, so I’m asking the WUWT community for help.
The server had been offline for a few weeks, and was properly shut down. Upon powering it back up, I got a “no bootable disk found” message. I determined the RAID (Hardware RAID1 – 2 mirrored drives) had been degraded, and it seemed one disk had failed. So I purchased two new identical HDs, cloned the good one, and rebuilt the RAID1. The RAID is administered by an on-board Adaptec RAID controller, and it reports the RAID as healthy.
What happens now is that it attempts to boot, but gets stuck in a loop on the last messages “Init ID c1, c2…etc” and repeats those error messages. I get the same partial boot and error sequence if I take out the RAID in BIOS, and try booting a single drive in straight SATA mode.
This machine was built circa 2007 and has Slackware Linux of that era installed. I don’t see a version number coming up on boot, so I can’t provide it.
Any and all help appreciated. – Anthony

I recommend data recovery using a bootable CD: a live Linux disk that handles disk arrays and LVM. We use Fedora, CentOS, and FreeNAS. After data recovery, replace the computer with a NAS box.
You could either put FreeNAS on generic hardware or get a commercial NAS box. We recommend Synology. They run well and keep their Operating Systems up to date.
FreeNAS loads the operating system from USB or SSD and runs it read-only from RAM, and it has very good self-maintenance tools for hard drives.
Best regards and I hope you are up and running soon.
Can you boot the computer into single user mode? Interrupt the boot process and probably type something like boot s at the prompt. It looks like the disks are good now (at least at this point in the boot process) based on the lack of failure messages on your boot screen. Sorry I don’t have direct expertise on this version, but in general before you go off into data recovery land you try to boot single user.
I had something similar happen to a friend’s old computer. After he had it unplugged for a few hours during a house move, he started it back up and got a message about not finding the RAID drive. I went into his BIOS setup and found that the BIOS settings had been reset to factory defaults. The BIOS date was set back to 2004! This was caused by the death of the internal BIOS battery. I replaced the battery, reset ONLY the BIOS date to the present date, and it rebooted fine. Some adjustments were needed later to get everything updated, but the hard drives did work (and it only cost $1.00 for the battery, LOL).
The above is a great story and great advice, but I find it hard to believe Anthony got as far as this without noticing the clock was reset in the BIOS!
Another one for you Anthony, according to this http://kerberix.com/cgi-bin/howto.pl?artikel=artikelen/howto/solve_the_init_io-error.html
your problem does not necessarily have something to do with the physical console settings but instead with a disk-related hardware problem. /dev/console resides on a disk and if you are getting corrupt data from that disk you may end up with I/O error for /dev/console.
I would:
1. boot from CD/USB.
2. Rescue the valuable data to some external drive.
3. Install a spanking brand new Linux system.
Meh. The only information behind the file /dev/console is a “dev_t” with a major and minor number:
sl:vantage$ ls -l /dev/console
crw------- 1 root root 5, 1 Mar 22 14:10 /dev/console
The “5” is the major number, which identifies which device driver to use; on Linux, major 5 belongs to the TTY driver family. The minor number is passed to the device driver to tell it which thing to work on, in this case, thing 1, the console. Those values probably match yours.
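For anyone who wants to check those numbers without squinting at ls output, here is a minimal sketch. It assumes GNU stat, whose %t and %T format codes print the major and minor numbers in hex; the function name dev_numbers is made up for illustration.

```shell
# Print the major and minor numbers of a device node in decimal.
# Assumes GNU stat: %t is the major number and %T the minor, both in hex.
dev_numbers() {
    node="${1:-/dev/console}"
    hex="$(stat -c '%t %T' "$node")" || return 1
    # Split "major minor" and convert each hex field to decimal.
    printf '%d %d\n' "0x${hex% *}" "0x${hex#* }"
}
```

Running dev_numbers with no argument looks at /dev/console; pass another node such as /dev/null to compare. If the numbers come back sane, the device node itself is unlikely to be the problem.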
As for “vantage”: that’s because I had to change the battery on my old Vantage Pro weather station sensor unit and apparently manipulated the connector for the wind vane, so it’s been displaying “north” since Saturday.
That may have been the day my SSD main disk disappeared electrically; even a reset didn’t bring it back, but cycling power did. I guess it’s a bad week for electronics. I might install a new Linux Mint on the previous computer, but its motherboard beeps out “No RAM.” May be time for two SSDs, a new motherboard, bigger RAM, and probably a bigger display. I set that system up in 2006….
Well, that was pretty painless. Mostly out of curiosity (and an urge to have a working backup system), I found my old system had a couple disk partitions on a second (160 MB) disk sized for a swap and root partition. I managed to install LinuxMint on that root-sized partition without wiping out others, and seem to have LinuxMint’s Mate running now.
This will be its first post to WUWT. I’ll get an SSD and do it all again. In the meantime, I can use a current Linux distro for Firefox, YouTube, etc. It would also help if the DVD drive opened reliably. 🙂
I might even try a new keyboard instead of this DEC LK471-AA keyboard with PS-2 connector.
Oh, the RAM problem went away when I removed the second RAM card. 2 GB ought to be enough for anyone.
Anthony Watts March 22, 2017 at 12:30 pm
We have the data backed up, but the program to administer it runs on Linux….
I wanted to call this comment out because it completely changes the problem. Most people in the thread are making recommendations about data recovery, which, based on this, isn’t exactly the problem, since the data is backed up. The problem appears to be that the application itself is custom and NOT backed up. Worse, my experience with systems of this type and age is that a lot of apps are riddled with O/S calls and won’t run even if you have the application code, because you need the exact O/S variant that contains those exact calls (and other things) to run them.
This is over my head at this point, but for those techies still following along, it sounds to me like restoring the O/S is probably required, or else isolating the application code, running it on a known-good machine, and then dealing with any errors reported to reverse engineer something that works. Short version: Nightmare.
No, we have a backup of the program. I’m just trying to get an old machine to boot.
Ah. That changes everything… again 😉
If you have a copy of the program, I would suggest trying to get it to run on a recent Linux installation. If it runs, then you can install a newer Linux version on your server, rather than fixing the old one.
Maybe you could tell us a bit more about this program? I suspect it is some sort of relational database application?
If you have a copy of the program, I would suggest trying to get it to run on a recent Linux installation.
What Michael said.
I, like many others here, could probably do something if we were sitting in front of the computer. The difficulty here is that I don’t know what it is supposed to do, and what your end goal is. Those errors might be perfectly normal given the hardware, but I don’t know what it did before this. Did you ever use it with a screen before or did it run entirely over a network?
Given that data is generally more valuable than configuration, I would boot into a rescue cd, mount the disks and recover the data, then set up the machine from scratch with a brand new os.
For something like this I would spend the time setting up something like puppet so that you can have a new machine running with almost zero intervention in under an hour.
I would also put the data on a separate partition so that the data and the os itself are not tied together.
my2p
As I understand it, you have the OS (I don’t think Slackware is supported anymore) and data on 5 disks in a RAID; I don’t know much about how RAID works. You have a backup of the data, but the software that handles the data is on the RAID?
There is a possibility that something on the motherboard has failed. This happened to me a couple of weeks ago on a firewall, mate replaced motherboard and it works again.
OSs are a doddle to install these days.
I think you can have the data on a RAID machine and the OS on another.
As others have said, make a bootable live CD or USB drive on another PC (there’s a Fedora-based one) with something like SystemRescueCd on it.
Make a copy of the disks, however you do that (to one disk?).
Remove the disks, put in a new blank disk, boot with the USB stick, and do some diagnostics before doing anything else.
I’m not sure of the pros and cons of arrays vs. daily backups, but I don’t do anything critical.
Anthony,
I do this kind of Linux work professionally, and have for a few decades.
I have a fairly full set of kit. I recently built a Slackware system (on a Raspberry Pi, for fun).
https://chiefio.wordpress.com/2016/05/18/slackware-on-pi-notes/
I’m familiar with back level OSs and hardware.
I’m happy to drive up and recover things, if you would like.
pub 4 all atsign aol dot com minus the spaces…
AO”Hell”?? Wow. You are old school my friend. And I thought I was old 🙂
I actually ran my own email servers for many years, including one at home, but keeping up with the SPAM filters was a pain. About 15? maybe only 10… years ago I moved to an AOL address for public things. They have a decent enough SPAM filter and the address itself keeps a lot of folks from sending me yet more email 😉
I’ve used many others over the years. At least a half dozen. Nothing is ideal.
More time to write.
There are a bunch of things that can cause Linux boot failures. The key question is whether the RAID+LVM+filesystem structures are intact. The best way to check that is to boot off a recovery CD and see whether it detects and activates your volumes. If it does, then the problem is most likely in the /boot filesystem.
I still have questions about hardware RAID in the adapter vs. software RAID (“md”, the kernel’s multiple-device driver). The boot screen sure looks to me like there is “md” software RAID. The RAID controllers I use don’t allow the OS to even see the physical volumes, unless they use a special driver specific to that RAID controller. Instead the OS-level SCSI driver sees the logical volumes the hardware RAID controller presents to it, emulating a common SCSI adapter.
It looks to me like your two disks each have two partitions, a common Linux configuration. sda1 and sdb1 are usually small (500 MB or less), joined in a mirror and formatted as an ext2 filesystem mounted as “/boot”. This is where the second-stage boot loader, the kernel image, and the initial RAM disk live. The other two partitions, sda2 and sdb2, occupy the rest of the physical disks and are also usually combined into a mirror and handed over to LVM as a single physical volume, which is then carved up into “chunks” (physical extents), usually 32 MB or so, that are allocated as needed to create additional filesystems.
I don’t see in the boot messages that the LVM layer activated any volumes, so it looks like the failure is happening before that point, or the LVM structures are not intact, which would explain everything.
You need a working system booted from a recovery CD to poke around and find out exactly what is wrong.
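Once the recovery CD is up, the poking around could look roughly like the sketch below. It assumes the rescue environment ships the standard mdadm and LVM command-line tools, which is worth verifying, as is whether its LVM tools can still read LVM1 metadata. The run helper and the function name inspect_storage are made up for illustration, and everything is a dry run unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print each command instead of running it.
# Set DRY_RUN=0 (and be root) to execute for real.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Walk up the storage stack: md software RAID first, then LVM on top of it.
inspect_storage() {
    run cat /proc/mdstat          # arrays the running kernel has assembled
    run mdadm --examine --scan    # md superblocks found on the disks
    run vgscan                    # look for LVM volume groups
    run vgchange -ay              # activate every volume group found
    run lvscan                    # list logical volumes and their state
}
```

If the arrays show up in mdstat and lvscan lists active volumes, the on-disk structures are probably intact and attention can shift to the boot files. If nothing is found at the LVM step, that points at either damaged metadata or a tool version mismatch.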
It looks like you are getting some good help, not the least of which is the point that this is easier when sitting at the machine. I would like to mention a few things that might be useful. It looks like your disks are good enough to start running init and get to runlevel 3, at which point you have trouble running getty. You can probably get a shell by interrupting the boot loader (probably GRUB, but it might be LILO on this system) and adding “init=/bin/bash” to the kernel command line. This runs a shell instead of init; you will get / mounted read-only, but you will be root. Depending on your partition scheme, you might need to remount / and mount /usr to get the executables in /usr/bin and /usr/sbin. You can get some info about the state of your MD software RAID with “cat /proc/mdstat”. You should look for the getty executable, which may be corrupted; you might need to replace it manually. Getty is in /sbin and is linked to
# ldd /sbin/getty
linux-vdso.so.1 (0x00007ffdb19db000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f99c586b000)
/lib64/ld-linux-x86-64.so.2 (0x000055d1a3d39000)
Any of those could be corrupted too. Your system will show slightly different versions and paths.
LVM version 1 differs from LVM2, and if you want to boot from a rescue image you will need one with LVM1 support.
If you do this, be sure to flush your file systems (sync, then remount read-only) before powering off, as your shutdown will not be clean without the full system init.
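To make the init=/bin/bash route a bit more concrete, here is a rough sketch. The function degraded_arrays is a made-up helper that reads a file in /proc/mdstat format and names any array with a missing member; it assumes awk is reachable, which on this box may mean mounting /usr first. The state-changing steps are left as comments on purpose, and the note that /proc needs mounting by hand is my assumption about a shell started in place of init.

```shell
# List md arrays whose member map shows a missing disk.
# A healthy two-disk mirror reports [UU]; a degraded one reports [U_] or [_U].
degraded_arrays() {
    awk '
        /^md/        { name = $1 }                  # e.g. "md0 : active raid1 ..."
        /\[[U_]+\]/  { if ($NF ~ /_/) print name }  # status line ends in the member map
    ' "${1:-/proc/mdstat}"
}

# Steps at the init=/bin/bash prompt, shown as comments because they change system state:
#   mount -n -t proc proc /proc   # /proc is not mounted when init is bypassed
#   mount -o remount,rw /         # only if you need to write, e.g. to replace getty
#   mount /usr                    # if /usr is a separate partition listed in /etc/fstab
#   cat /proc/mdstat
#   sync                          # flush buffers before leaving
#   mount -o remount,ro /         # then power off or reboot
```

No output from degraded_arrays means every array it found has all members present; it says nothing about whether the filesystems on top are healthy.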
You probably don’t want to read this now (so save this for later), but for the future:
TL;DR: Disk space is cheap. Make lots of copies. RAID is not a backup scheme.
RAID is not a backup scheme. Too many common points of failure and too much fragility since it works at the block level. IMHO RAID is only useful for HA database systems and even then I’ve seen services taken down that depend on it. (e.g., Livejournal ~10 years ago).
My favorite backup scheme is to rsync the disk to a USB attached drive, and then unplug the USB drive and put the drive in a fireproof safe or alternative location. I note OSX users get this for free if they use Time Machine, but Time Machine is basically rsync. USB drives are cheap ($100 per 2TB or so now). Repeat as many times as you want a backup (3 if paranoid, with USB drives from different manufacturers).
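A minimal sketch of that rsync-to-USB routine, assuming the drive is already mounted. The paths in the usage note and the names run and backup_to_usb are placeholders for illustration, and the script only prints what it would do unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print each command instead of running it.
# Set DRY_RUN=0 to execute for real.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Copy SRC into a "backup" directory on the mounted USB drive, then flush writes.
# rsync -a preserves permissions, ownership, timestamps, and symlinks.
backup_to_usb() {
    src="${1%/}"     # strip any trailing slash so the path is built consistently
    dest="${2%/}"
    run rsync -a "$src/" "$dest/backup/" || return 1
    run sync         # make sure everything is on the drive before unplugging
}
```

For example, backup_to_usb /srv/data /mnt/usb would print the rsync and sync commands for review. Deletions on the source are not mirrored here, which seems the safer default for a backup copy.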
If it is truly important and privacy is not a concern, you can make an additional backup to a cloud service (Amazon S3, Dropbox, etc.).
“In the attached picture you’ll be able to see what’s happening to one of my most important servers containing some irreplaceable climate data. ”
Perhaps some sympathy for Phil Jones would make the lost data recoverable.
No, they destroyed “lost” their data on purpose.
Yes, after they made sh*t up and refused to supply requested FOI material! Also, I believe Anthony has stated he has a backup, and I don’t think that if he ‘lost’ the data it would cost the world’s economy several trillion £/$ in ‘climate mitigation’.
I believe you are confusing the word ‘lost’ with the word ‘deleted’.
Try DOS CMD administrator chkdsk /r /f
Ha-ha! I have a routine around here someplace on punched tape that should help. . .
Anthony,
Let’s break this down a bit. Is the Adaptec controller built onto the motherboard, or is it a card plugged into a PCI slot? From the limited screenshot, it seems to be built in, which is practically useless, IMHO, and often not supported by the o/s. Plus, if it were a real RAID controller card, mdadm and/or LVM would be unnecessary. On the web I found a boot sequence from a similar o/s; yours is getting along to about where the drivers would kick in.
mdadm did have some issues with losing its config file, which is repairable, but this could be a hockey stick.
First clue is that the partition check is good on each drive. ext2 is ext3 w/o journaling. IIRC, booting from a CD or USB should allow you to mount one of the disks and check it for errors. The header on the disk will remember the RAID config, but mounting one disk at a time shouldn’t be an issue.
The situation now is the boot process, which appears to be failing.
The suggestion of a BIOS failure might be sound: if the “raid” got turned on, it could be incompatible with the o/s drivers.
Is it possible to get a screen shot(s) of the bios screens? If the battery went low, and the bios reset, it is possible to make the system unbootable.
If you have your boot cd, and a spare drive, build a new system with the computer. This will make mounting the disks easier and might point to the cause of the problem.
Why is the word “backup” missing in a sentence containing the words “irreplaceable data”? Because they would not be irreplaceable if it were there… Jesus Christ.
All I can say is that the comments here show just how intelligent this community is. I have been following WUWT almost since the beginning and it never ceases to amaze me the level of reader here. The idea that people who question global warming are actually less intelligent has definitely been laid to rest.
Anthony – You should be very proud of what you have accomplished. I always wanted to meet you after all these years, maybe one day it will happen : )
Seems to me that these days, in these United States, you have to be a Unix expert to count as a real man.
And there seem to be plenty of real men reading WUWT, as is right and proper.
I’m just a Unix amateur, but I know enough to respect the real men in the trade.
Re: Music to Recover Your Data By:
“Brahms’ German Requiem is good music to debug an OS crash to.”
Nahh. Mendelssohn’s Reformation Symphony. Otherwise something quiet and peaceful: Andres Segovia or Julian Bream playing Vivaldi or Bach…
As to the problem at hand: I would suspect *hardware* issues at this point. It was shut down properly but did not come back up. The CMOS battery has been mentioned, but there are other things that could be a problem.
I would *remove* the cloned hard drives from that machine and put them into a new, known-good box. Store the original drives (both of them) somewhere else so they are not mistaken for the clones. There may actually be nothing wrong with them, but it is not the time to explore that. Since the cloned drives are RAID1 drives (clones of each other), you only need to install one of them.
The clone drive will not be listed in the new box’s fstab, so it will not be mounted.
Presuming a graphical (inittab runlevel 5) setup, use sfdisk -l to list all of the partitions (on all of the disks), and note which partitions sdx has. Or use gparted.
It will not be mounted as part of a raid setup, but it will be mountable.
You should have enough info at this point to mount the partition(s) in sdx.. which probably amount to one partition: sdx1. Mount in ro mode.
And now you can copy the data. I have a 1 TB external USB drive which I picked up on a special for the grand sum of $69 (Cdn!!!). It’s a little slow (USB2) but it works. Since the stuff I *really* want to back up is only about 80 GB, I recently picked up a 128 GB Kingston DataTraveler USB stick (about $70 with tax) and copied to that.
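The list, mount read-only, and copy sequence above can be sketched like this. The device names are placeholders to be replaced with whatever sfdisk actually reports, the function and helper names are made up, and it assumes the partition holds a plain filesystem (not an LVM physical volume) that mounts directly. Nothing runs unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print each command instead of running it.
# Set DRY_RUN=0 (and be root) to execute for real.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments: whole disk, partition on it, mount point, destination directory.
recover_from_clone() {
    disk="$1"; part="$2"; mnt="$3"; dest="$4"
    run sfdisk -l "$disk"            # list the partitions on the clone
    run mkdir -p "$mnt"
    run mount -o ro "$part" "$mnt"   # read-only, so the copy source is not modified
    run cp -a "$mnt/." "$dest/"      # copy everything, preserving attributes
    run umount "$mnt"
}
```

A call such as recover_from_clone /dev/sdx /dev/sdx1 /mnt/clone /mnt/usb prints the five commands for review first. On a journaled filesystem a read-only mount may still replay the journal, so working on the clone rather than the original, as suggested above, is the safer habit.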
At this point, I would not trust the hardware: too old. And I would not trust the software: that OS is far too old. Time to start again with something new.
I’ve been buying those 128 GB SanDisk USB3 drives from Costco for $28… Canadian!
They’re awesome insurance for those few gigs of data that you REALLY need to protect, and they actually read and write faster than most of my HDs.
If your data is accessible, is it time to consider getting out of the infrastructure business and finding a home in the cloud for this? Tremendous storage and analytics capabilities available out ‘there’ pretty cheap.
In the cloud…um, no. Not no way, not no how.
Are you able to boot a live Linux CD or USB distribution like Knoppix on the computer?
Anthony,
Contact Tony Heller, he might be able to help you?
What’s with the Tony Heller fan club here? If I had a problem, I would not trust it to someone with whom I had previous issues concerning an “alleged” lack of integrity, failing to admit errors, and failing to make necessary corrections. Oops, your RAID content is lost; it’s not my fault, it must have been stuffed when you gave it to me. Tough.
1) An old disk failed on startup after a period of being unpowered.
2) A reported ‘no boot disk found’ was displayed.
3) It depends upon what issued that message: Linux or the Adaptec hw/sw.
4) The cloning of the disk could be faulty.
5) Something failed, to create this issue: battery, nvram, ic getting old, dry joint etc.
6) I would cut and run, copy everything in sight multiple times, bin the old hardware, get a new system and copy your data onto the new system.
7) Otherwise, if you successfully repair your old system, you will have a working *OLD* system, waiting for its next failure. All kit reaches its ‘end of economic life’ eventually.
3) It’s not Linux, since it can’t find a boot disk! Sounds like BIOS. The RAID firmware was detecting faulty h/w and did not present a device to BIOS. BIOS could not boot since it was expecting that device.
4) I suspect that may be the case otherwise it would have fully booted once the failed HD was replaced.
Personally, I always have a separate partition for data and the root fs and would not use a raid for the boot.
This is why several people have suggested a Linux live boot DVD which will enable examination of the file system being presented by the RAID device. There are tools to check the integrity of the fs.
I would also start by running MEMTEST86+ which is probably on the existing boot menu somewhere and will be on any live DVD boot options.
BINGO!
All you really know at this point is “something is wrong”.
So the best course of action is to replace as much as possible with known good parts.
IF possible, a different system (hardware). Then boot a new Linux. Now you know HW and SW are good. Proceed to disk duplication and data recovery.
IF NOT possible (disks depend on a particular controller for example) boot a recovery system then test and prove up the hardware enough to say you have a known good HW and SW rig. Proceed to disks…
I, too, put root on a distinct disk (with backup copy on other disks). You do not want a RAID failure to kill your boot / recovery system…
Anthony. Could you state the exact problem which you still have. As I understand it you have a clone of the original disk which is what you are now working with ( the original remaining good disk now being in a safe place ) and viable, working RAID hardware with two new disk drives. You say you have a backup of the data but presumably in some kind of specific, bespoke, compressed format which needs specific software to recover. Is that back up also on the RAID or physically stored elsewhere?
Reading between the lines, is the problem that you need to boot the existing Slackware system to use (or reinstall) the backup software to recover the backed-up copy? Is needing to stick with the same mobo to retain the same RAID controller part of the equation?
There are a number of paths but a lot of perfectly good and wise comments do not seem to relate to the actual situation you currently have. To find the sticking point I think it is necessary to clearly lay out the problem as you currently perceive it.
Did I read you right, you cloned the remaining disk in the array and rebuilt from that?
You didn’t try rebuilding from the remaining good disk in the array?
Oh and backups..
isn’t this a bash error?
what os and kernel version is it?
a kernel update could do it and would not be seen until next reboot
derp missed the last line showing os.
sorry disregard
OS version? To judge by the screeny : LVM 1.0.8 ( 23/11/2003) : this system has not been updated since it was installed in 2007 and even then was probably not brought up to date after the installation was completed. 😉
If it was not externally connected, that probably does not matter too much, but it may make installing / fixing things a little more complicated.
Slackware is on LVM2.0.2 and init version 2.88 (cf. 2.0.4); kernel is probably circa 2.6.21
yeah I noticed that, i just have zero experience in slackware so can’t help