Is there a Linux specialist in the house?

In the attached picture you’ll be able to see what’s happening to one of my most important servers, which contains some irreplaceable climate data. I’m at a loss to understand what’s going on because Linux is not my specialty and my Linux expert has since moved on to other ventures. It’s very important that I get this server working again, so I’m asking the WUWT community for help.

The server had been properly shut down and was offline for a few weeks. Upon powering it back up, I got a “no bootable disk found” message. I determined the RAID (Hardware RAID1 – 2 mirrored drives) had been degraded, and it seemed one disk had failed. So I purchased two new identical HDs, cloned the good one, and rebuilt the RAID1. The RAID is administered by an on-board Adaptec RAID controller, and it reports the RAID as healthy.

What happens now is that it attempts to boot, but gets stuck in a loop on the last messages “Init ID c1, c2…etc” and repeats those error messages. I get the same partial boot and error sequence if I take out the RAID in BIOS, and try booting a single drive in straight SATA mode.

This machine was built circa 2007, and has Slackware Linux of that era installed. I don’t see a version number coming up on boot, so I can’t provide it.

Any and all help appreciated. – Anthony

 

223 thoughts on “Is there a Linux specialist in the house?”

  1. Virtual console issue. I don’t have time to look at it right this second (dentist appt), but there’s a bunch of stuff on Google about this error message. The system might be having trouble finding the virtual console device or the console fonts, or it might have lost one or more files somewhere. You might have to boot off a live CD, mount the RAID device and go hunting.

  2. Is the highest priority to recover the data or to get the server running again? If data recovery is most important, I would suggest offline recovery activities.

    • “Is the highest priority to recover the data or to get the server running again? If data recovery is most important, I would suggest offline recovery activities.”

      +1

      First rule of disaster recovery: sit on your hands and don’t make a bad situation worse.

      Before you do anything else, image those drives somewhere safe.

      After that, I would personally start a disaster recovery using copies of those images and a known good OS in VMware, but that’s me.

      • And remember, it’s the restore that counts, not the backup. Make sure you don’t restore from a bad backup.

      • My reading of the story is that a new RAID was created using the one good original onto two new replacements. If so, the original is now the Golden Master backup of the data.

        IMHO, the best path to recovery is to build a new Linux on a dedicated single drive, then mount the RAID to it and commence to validate the data. Slackware is a decent build, so I see no reason to change (though generally I lean to Debian, or since the systemd infestation, Devuan).

        I’m available to do it if needed. 3 hours drive away.

  3. I think what you have is a RAID puncture. It usually begins with a faulty disk. You replace the disk, the RAID is re-striped, but the array still refuses to boot. I went through this a few years ago. A punctured RAID array is pretty much lights out. I don’t know if you use a PERC card or not. This seemed to plague those cards about a decade or so ago. I don’t work in the Linux world that much, so no Linux tools come to mind. Here’s a link that may be of help. When I went through this very same problem, I did an emergency P2V (physical to virtual machine) conversion. I was lucky.

    http://www.theprojectbot.com/what-is-a-punctured-raid-array/

    • JP, if you look at the original post, it’s a RAID 1, not a RAID 5. The failure mode you described doesn’t seem possible for RAID 1.

    • RAID 1 mirrors can be re-cloned from the last good disk. Anthony reports buying TWO new disks and cloning the old disk. With luck the old disk still exists. Using a live bootable image, you can clone the disk, partition table and everything, with “dd”. It is terribly easy to get wrong and erase the source disk, but it is simple, reliable, and built into every Linux.

      I’d install a new Linux and then copy the valued data from the old disk.
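
A minimal sketch of the imaging step described above, assuming a live Linux with dd and cmp available. The function name image_and_verify and both of its arguments are illustrative; the real source device depends on the machine and must be checked (for example with fdisk -l) before running anything.

```shell
#!/bin/sh
# Image a source disk (or file) into a NEW image file, then verify the copy.
# Hypothetical helper: the name and arguments are illustrative only.
image_and_verify() {
    src=$1   # what to read, e.g. the surviving original disk
    img=$2   # where to write the image; must not exist yet

    # Swapping if= and of= destroys the source, so refuse to write over
    # anything that already exists (including a device node).
    if [ -e "$img" ]; then
        echo "refusing to overwrite existing $img" >&2
        return 1
    fi

    # Plain block-for-block copy; dd reports its statistics on stderr.
    dd if="$src" of="$img" bs=4096 || return 1

    # Byte-for-byte comparison; cmp exits non-zero on any difference.
    cmp "$src" "$img"
}
```

It would be called along the lines of image_and_verify /dev/sdX /mnt/usb/server.img, where /dev/sdX and /mnt/usb are placeholders. For a drive that is throwing read errors, a tool such as GNU ddrescue is generally a better fit than plain dd.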

  4. If the RAID is in fact healthy, another thing to do is to create a bootable Linux flash drive or CD, boot from that drive, see if you can mount the volume in question. If you can mount it, copy the data to another drive. I don’t know how much you can afford, but you may want to begin thinking about taking your site to the cloud. Let the Cloud providers worry about the hardware.

    • Yes, this would be the best course. Boot from a CD or other media and fix the configuration. You can also try bringing up the system in single user mode. The messages indicate that processes forked by init are crashing, likely because the ttys are misconfigured. Why this would occur after recovering a disk is unknown, but it could be because the hard drive failed during an update, which left the configuration in a bad state.

    • “…begin thinking about taking your site to the cloud. Let the Cloud providers worry about the hardware.”

      Yeah, right. They worry about the hardware, and then all you have to worry about is who, what and where the “cloud” is, and whether it may evaporate at some unknown point in the future. You do realise clouds can evaporate, right?

      Firstly, this is not about the WUWT site; it is about some climate data that he had/has backed up offline. I would imagine that the point of that is to be in possession of a physical copy in case the original provider decides to remove it from public view. If this is a “valuable” backup then that has probably already happened.

      While making more copies “on the cloud” would not do any harm, it does not replace being in physical possession of your own data.

      • Whenever someone writes “in the cloud” you should read “somebody else’s computer.” And there are cloud failures all the time. Bottom line: as long as your bill is paid, Amazon/Azure doesn’t give a damn if you lose your data. Read any cloud TOS & ask yourself if they have any liability whatsoever. (Hint: they don’t.)

        Anthony…the advice to get a backup right now is sound. P2v if you can. You should focus on saving your data at this point; the hardware isn’t going anywhere.

        In future… if a RAID1 goes down, just break the mirror, mount the good disk, then boot to it. Back up your data & THEN rebuild the mirror. Trying to mirror a faulty drive can easily result in 2 bad HDs.

      • Cloud failures happen, but local failures happen a lot more often.

        Amazon cares a very great deal whether you lose your data; it’s critical to their whole business model.

        Eventually it may make no more sense to own your own data storage than your own water storage. Now, in some situations you need your own water, but for the vast majority…

      • “You do realise clouds can evaporate, right?”

        Which is why all systems, local or cloud, must have remote offsite backup and restore capabilities for ultimate safety. The only substitute for this capability is supreme confidence that nothing can seriously go wrong. Ever.

        Someone once said experience is the hardest teacher; it first administers the punishment, then the lesson.

        The hard lesson learned by most admins is that, eventually, *something* will go badly wrong.

        Lessons from experience are often both simple and obvious, as in this case. Purely static data can be reliably transferred to a hard copy non-volatile medium which is easily stored in your bank safe-deposit box (good). Or with a trusted friend in a location geographically distant (better). Or in a salt mine vault (best).

        This simple step to provide ultimate salvage capability for critical data usually becomes obvious only after a major calamity, rather than before. Speaking from bone-headed experience, of course.

  5. Whatever is supposed to run on the virtual console (probably /sbin/login or something like it) is crashing at startup, possibly due to bit errors that the RAID controller didn’t detect. The “respawning too fast” message means that the supervisor process for them (called init, because it’s the program that the OS automatically runs at startup) is waiting in hopes that somebody will fix the problem — it isn’t smart enough to figure out that nobody can fix it in the current state.

    I’d recommend making a “LiveCD” disc or thumb drive, and boot off that. You should be able to check the remaining data, and copy it elsewhere if need be. Installing a clean OS without wiping the existing data can be done, but should be done by someone with a moderate amount of Linux experience.

    If you can’t feasibly make a live CD, you should be able to boot into single user mode by editing the kernel’s command line. Details of how to do that will depend on the bootloader that is set up.
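
For the single-user route, a rough sketch follows. It assumes the bootloader is LILO (the usual default on Slackware of that era, but not confirmed for this machine) and that the boot label is “Linux”, which is only a placeholder; pressing Tab at the LILO prompt lists the labels that actually exist. Because the respawning entries may also be active in single user mode, the second form, which bypasses init entirely, may be the one that works.

```shell
# At the LILO "boot:" prompt, append a kernel argument after the label:
#
#   boot: Linux single          # ask init for single user mode
#   boot: Linux init=/bin/sh    # skip init and start a bare shell instead
#
# With init=/bin/sh the root filesystem starts out read-only, so it has to
# be remounted before any file can be edited:
mount -o remount,rw /
vi /etc/inittab
sync
mount -o remount,ro /
# No init is running in this mode, so power-cycle the machine afterwards.
```

None of this touches the data partitions, but imaging the disks first is still the safer order.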

    • I carry two bootable “thumb drives” just for this purpose. Emergency booting of a computer and then copy the disk that won’t boot onto a portable USB disk. Do it twice on different destinations; verify success of the copy. Then you can start rebuilding your system.

      • Gee… similar to my kit. A few bootable USBs of different releases, an empty TB USB disk, a stack of “live CDs” and “installation CDs”, a portable laptop pre-built dual boot to work from… The more your experience, the bigger the kit grows…

  6. If you look at that picture, it looks like the partitions are ok. So the data is most likely safe. I’d boot that computer with a usb-based live-Linux, mount the drives and copy the data to a safe place. Your best bet is to find some knowledgeable Linux-user nearby to do that. When your data is safe, you can then concentrate on fixing the problem.

  7. Before you do ANYTHING on this machine:

    From a different machine, hunt down instructions to create a bootable Slackware USB flash drive.
    Create the boot flash drive on that machine.
    Use your new USB boot drive on this server, mount the RAID, copy the data somewhere else.

    It actually ends up only being a few commands in total for this process (including possibly changing BIOS config to boot from USB), but commands will be specific to your situation.

    Only once these steps are done should you make any attempt at rectifying the issue with your server.
    For future reference, RAID is for availability, not backup.
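
The “mount the RAID, copy the data somewhere else” step might look roughly like the sketch below. The device and mount point names in the comments are placeholders, and copy_and_verify is a hypothetical helper, not a standard tool; it assumes GNU or BusyBox cp, find and sha256sum on the live system.

```shell
#!/bin/sh
# On the live system, mount the old filesystem read-only first, e.g.
#   mount -o ro /dev/sdXN /mnt/old      (sdXN is a placeholder)
# and mount the destination disk read-write somewhere else.

# Copy a directory tree, then check every file in the copy against
# checksums taken from the source.
copy_and_verify() {
    src=$1
    dst=$2
    manifest=$(mktemp) || return 1

    mkdir -p "$dst" || return 1
    # -a preserves ownership, permissions, timestamps and symlinks.
    cp -a "$src/." "$dst/" || return 1

    # Checksums are recorded with paths relative to the source root...
    (cd "$src" && find . -type f -exec sha256sum {} + > "$manifest") || return 1
    # ...so the same list can be checked from inside the copy.
    (cd "$dst" && sha256sum -c "$manifest" > /dev/null)
    status=$?
    rm -f "$manifest"
    return $status
}
```

A call such as copy_and_verify /mnt/old/data /mnt/usb/data (both paths hypothetical) returns non-zero if any copied file fails its checksum, which is the cue to copy again to a different destination.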

    • That should be in bold and all-caps: RAID is for availability not backup.
      Server failure, theft, or destruction should never endanger your “irreplaceable climate data”. Although I am not a fan of “cloud storage” it makes a reasonable off-site backup of last resort.

  8. What you have here is the problem of mixing hardware “fake RAID” and software RAID.
    Those Adaptec RAID controllers are primarily set up to be used on M$ based systems.
    Linux software RAID is the way to go.
    That said, your data is not lost; it will just take many, many hours to recover, especially if there are multiple TB of data.
    The mdadm tools are there to recover the data. I have to do this for clients once in a while; thank goodness it is not a common problem.
    There are many forums on the interwebs to help recover this data, but they are very cryptic.
    I’m sorry that I’m a little far from Chico, or I’d be over there right away.
    The most important thing to do is get a brand new Hitachi or WD HDD and dd your best drive to it; that way any mistakes you make are recoverable.
    For a storage server, use software RAID 1 with RAID-certified drives (not all drives are friendly for RAID).
    The best server environment I have used is the Koozali SME server found at contribs.org.
    For just data storage, use an ultra low wattage system and a name brand power supply.
    I use a ZOTAC H87-ITX board with the lowest wattage laptop processor available and an Antec power supply.
    Anything I can do from here in BC I will.

    • If it’s a mirror raid then there should be two identical copies of the data.
      I have run many varieties of raid configurations… and lost a lot of data, but never with mirroring.
      My worst data loss was with a 3 drive configuration that should have been able to rebuild except I did it with a virtual machine with soft raid and somehow lost 2 drives in quick succession.
      I spanned 2 drives once, and lost 500GB of TV shows I recorded. I have yet to find out how Battlestar Galactica turned out (not the original series but the remake).

      • Don’t know anything about RAID drives or Linux, but I think you should feel happy you never saw how they ended Battlestar Galactica. Just keep your imagination about how they might have ended it.

  9. Don’t know if this helps, but I saw this online:

    >This happens when you are using the serial console. To get rid of those
    >messages, comment out all lines under “# TERMINALS” in /etc/inittab and
    >execute “telinit q”. There should be six lines from c1 to c6.
    This is partially right. Here are a few sequential lines on the
    /etc/inittab:


    # SERIAL CONSOLE
    c0:12345:respawn:/sbin/agetty 9600 ttyS0 vt100

    # TERMINALS
    c1:12345:respawn:/sbin/agetty 38400 tty1 linux
    c2:12345:respawn:/sbin/agetty 38400 tty2 linux
    c3:12345:respawn:/sbin/agetty 38400 tty3 linux
    c4:12345:respawn:/sbin/agetty 38400 tty4 linux
    c5:12345:respawn:/sbin/agetty 38400 tty5 linux
    c6:12345:respawn:/sbin/agetty 38400 tty6 linux


    So, if I change the above to this,

    # SERIAL CONSOLE
    #c0:12345:respawn:/sbin/agetty 9600 ttyS0 vt100

    # TERMINALS
    c1:12345:respawn:/sbin/agetty 38400 tty1 linux
    c2:12345:respawn:/sbin/agetty 38400 tty2 linux
    c3:12345:respawn:/sbin/agetty 38400 tty3 linux
    c4:12345:respawn:/sbin/agetty 38400 tty4 linux
    c5:12345:respawn:/sbin/agetty 38400 tty5 linux
    c6:12345:respawn:/sbin/agetty 38400 tty6 linux

    it works. But if I change it to this,

    # SERIAL CONSOLE
    c0:12345:respawn:/sbin/agetty 9600 ttyS0 vt100

    # TERMINALS
    #c1:12345:respawn:/sbin/agetty 38400 tty1 linux
    #c2:12345:respawn:/sbin/agetty 38400 tty2 linux
    #c3:12345:respawn:/sbin/agetty 38400 tty3 linux
    #c4:12345:respawn:/sbin/agetty 38400 tty4 linux
    #c5:12345:respawn:/sbin/agetty 38400 tty5 linux
    #c6:12345:respawn:/sbin/agetty 38400 tty6 linux

    as you suggested, it does not work. I still get the same message and I get
    more error messages. So, the answer, at least in my sparc64, is to comment
    the serial console that is giving you problems. Ie.

    # SERIAL CONSOLE
    #c0:12345:respawn:/sbin/agetty 9600 ttyS0 vt100
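
A small sketch of the edit described in the quoted post, done on a file passed in as an argument so it can be tried on a copy first. disable_inittab_entry is a hypothetical helper; which entry to disable (c0, or c1 through c6) is the open question in this thread, so the id is a parameter.

```shell
#!/bin/sh
# Comment out one inittab entry, selected by its id field, keeping a backup.
# Hypothetical helper: the name and arguments are illustrative only.
disable_inittab_entry() {
    id=$1     # first field of the entry, e.g. c0
    file=$2   # path to the inittab being edited

    # Keep the untouched original next to the edited file.
    cp -p "$file" "$file.bak" || return 1

    # Prefix '#' to the line that starts with "<id>:"; other lines pass through.
    sed "s/^${id}:/#&/" "$file.bak" > "$file"
}

# Example from a live CD with the old root mounted at /mnt/old (placeholder):
#   disable_inittab_entry c0 /mnt/old/etc/inittab
# On a running system, "telinit q" makes init re-read the file.
```

Restoring the previous state is a matter of copying the .bak file back over the edited one.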

  10. At the end of the day, get CentOS installed. It’s a production grade OS that, while behind the curve of the latest releases, is bulletproof for production, and they do timely security updates.
    FYI: I once had a mirrored RAID setup on a production server (Novell). When the controller got flaky, both drives were hosed. After spending three days rebuilding, I discovered the issue and had to restore from the day-before backup, losing 8 hours of work by 26 editors. I moved to RAID 5 with 6 hot-plug drives in an external box.
    Now I use USB thumb drives for backup and long term storage; all my tape units and DVD optical crap are history. Hint: tape backup is just that, so don’t try to restore it two years down the road! DVDs are close behind due to mechanical and media issues.

      • Why not? 7 years ago I would agree with you, but the latest USB 3.x drives are bulletproof. I run read/write diagnostics for a few hours to test new drives before using them for long term archival storage. The MTBF for mechanical drives is several times lower than for NAND drives.

      • The electric charge on SSD, SD cards and Thumb Drives very slowly leaks away. You don’t notice in normal use since simply plugging them in refreshes the charge. BUT, put one in a safe deposit box for a few years and you can come back to find the data simply leaked away, or that in the early stages some bits have changed from 1 to 0.

        IFF using those NAND storage systems (or similar) for backups, plug them in to a live system once a year to refresh the charge.

    • Oracle also gives their RedHat variant away, along with their VM server. All commercial grade. You just have to pay for support if you need it. But the software is all free.

  11. Like others have said, use a live CD to boot and then copy the data to a different device. After that, why not just reinstall a modern Linux version (or other OS) from scratch – you don’t want to tell the world that you are running an ancient Linux version that likely doesn’t get any security fixes any more.

  12. If you don’t get enough of an answer here, I might suggest contacting frequent contributor E. M. Smith, “Chiefio”. He has a clue or two about Linux.
    It looks like you’ve already been steered in the right direction.

    • Thanks for the props.

      Yes, I do Linux for a living and have since day one (I.e. started *nix on BSD pre Linux).

      Anthony, I’m available for free if desired and can bring a bag of kit including spare USB disk, thumb drives of several flavors, laptop with Linux and more.

      • Hehe, I popped my cherry on Sys III, but spent a big chunk of the last 20 years using Windows. I now have a VM server that’s all on Linux, and a bunch of Linux VMs I run as needed.

        Did you ever have to drill coax to put an ethernet tap in? I hated that, it seemed so cheesy lol. I hope the guy who put it all in a box got very wealthy; he did a great service for the world.

      • “Did you ever have to drill coax to put an ethernet tap in?”

        I came into thicknet right at the tail end. We had one wall covered in figure-8’s of thicknet in order to attach the server farm. You can only attach at equidistant marked spots on the coax to avoid standing waves. These taps were called “AUI” if I remember right.

      • They were at qtr wavelength for 100MHz, so about 3 m. I never had to wire a server room; what a mess that had to be, but I see how you got to figure 8’s lol
        Most of mine were just on coils, or around a few cubicles. They were on first-generation electronics design workstations, 68020 running BSD. The story went around that the founders knew the Sun guys (from Berkeley, I presume) and gave them the design, which they turned into the Sun3, which Valid then switched to instead of continuing to build their own workstations. Fun times.

      • Yes. Don’t remind me. (Raised ceiling over engineering desktops… DEC gear… At Apple in the ’80s…)

        Nothing like panel dust in your face and shirt, balanced on a ladder, drilling an unsupported round thing (using a jig) while folks ask why you aren’t done yet to make you wonder why folks think computer work is all desk work and comfortable…

  13. You need to comment out entries 1-6 in /etc/inittab as Bear said above. Generally this error is caused by keymappings in rc.conf passing unwanted keystrokes to a virtual terminal. (That’s the “console” error you see in the middle.) Editing the init file should remove those possible console connections. Was this machine originally set up without a monitor and only logged in to remotely? I’ve seen this when older machines are decommissioned and login is attempted locally instead of remotely. You may need to boot to a live instance to edit the files to get it to boot, or… just save your data and set up something more modern ;)

    • By the way, you should be able to boot with a recent version of Linux and mount the drives to copy/save the data. Do you need the old version running for any particular reason?
      What is the status now?

  14. Quick question: How much data and is it compressed? Just looked at a 1TB SSD drive for online read storage, long term bullet proof stuff, not so good for daily production but for READ-ONLY great stuff.

  15. I’m not the Linux expert you’re looking for, but let me pretty much agree with everyone else. Yes, the symptom (not necessarily the underlying problem) is with your virtual consoles. It’s quite possible that most, maybe all, of your data is intact, so your top priority probably should be not to inadvertently destroy it. What I’d probably do is buy yet more drives, clone the ones you have, verify that the clones do the same thing as the current disks, then try to recover your data from them using a single “disk” Linux system: USB stick, CD-ROM, or Linux installed in a spare partition if you have one.

    Don’t overlook the possibility that your RAID controller, CPU, or memory might be failing in some weird fashion. That’s probably not the case, but if it is, it’s possible that nothing you try will quite work or make sense.

    I also agree with PSU-EMS-Alum that, all things being equal, Slackware might be a good choice for the single disk system. It tends to be the most “plain vanilla” Unix system around, which means various cures/analytic procedures you find on the internet may well work. And my experience in the past is that there is intelligent life at http://www.linuxquestions.org/questions/forumdisplay.php?forumid=14 (something that is not true of all Linux distribution support forums).

    • I’ve used most versions of Linux. Above someone suggested Centos. It is good, but stodgy, and is now systemd for init so LOTS of conversion work for any customizations on the Slackware (that is BSD like rc.d init).

      For ease of update, I’d likely keep it Slackware and update to current as a fresh install, then copy over any customizations and last reinstall the data disks.

      On a second machine I’d be doing the disk cleaning and proving up. Any Linux ought to do for that, as the file system is likely ext3. (Says ext2 for root but the RAID might vary)

      Last on my list would be a full on conversion to a different OS with a new init system and all… and only if there was some good reason to do it. Arguing over your favorite flavor of Linux is not a good reason…

  16. Another thing to do is run the dmesg command as root and post the output here. As it is, we can’t tell much about what’s going on; with dmesg output, we could possibly trace down the errors.

  17. I’m a Linux sysadmin. If I were you, I wouldn’t trust half of the answers you’re getting.

    And for that matter, there’s no reason for you to trust me either: ANY jackass can claim to be a Linux expert.

    Call Tony Heller. He’s someone you know, and I’m betting he knows his Linux.

      • There are at least 4 main types of Linux, and countless variations. The details of how to do things vary widely, few people have experience outside their favored flavor, and a given error message can have several causes so folks will recommend the one fix they did in their case while YMMV.

        Then there are huge personal style issues…

      • I learned Unix on my own, so I learned what I learned, but I always liked working with other admins, because many did the same but learned different things than I did, and I tried to snatch up all I could.

        And yes, this is a “measure 64 times, then measure another couple, then cut” situation.

        I would set those drives to the side, build a new system, get it all working. Then mount a copy of your drive on that, and figure out what’s wrong.

  18. I use Linux but my Linux expert keeps it healthy for me. His suggestion follows:

    “Looks like he lost and/or corrupted data when his RAID failed. His trouble is with setting up the virtual terminals. He needs to look at /etc/inittab (I’m not too familiar with it; his system is running a very obsolete version of Linux). He will probably need to use a LiveCD session to fix it. He may have to comment out the offending init lines to get it to boot, and then reinstall the associated virtual terminal package that is broken (some getty package needed for login).”

    Good luck, I hope you can save your data!

    PMK

  19. Try what Bear and Dougmanxx suggest. Of all the suggestions, that is simplest, with no possible adverse effects on the data. However, it assumes you can boot into a level allowing you to edit that /etc/inittab file. Perhaps someone can coach that aspect?

  20. You are going to need a specialist. I was a Linux user for about 6 years, got sick of the unfriendly command line and lack of hardware support, dumped Linux and went to Windows 10 late last year.

  21. This incident points up the importance of keeping an independent offline or nearline backup of “irreplaceable” data. LTO tape is one popular and inexpensive approach.

    • It’s been a decade, maybe two since I had to worry about it, and the world was simpler back then. But if the issue is just inittab (why would inittab and only inittab be bad?) isn’t there such a thing as single user mode? I vaguely think it might be usable to edit inittab.

      I’d still make copies of the disk(s) before I did anything else.

      • Your server shut down normally, was offline for several weeks, and crapped on reboot.
        My first thought is that your onboard CMOS battery died and the BIOS lost the boot settings.
        If that is the case, your drives and data are fine. Replace the battery on the motherboard, reboot, and press the “delete” key repeatedly until the BIOS menu comes up. Depending on the motherboard, it might be F8 or another key that enters BIOS setup mode.

        Read the Adaptec pages here to find out the bios settings

        http://www.supermicro.com/manuals/other/RAID_SATA_Adaptec_ESB_ICH7R_ICH9R.pdf

        Yes, it could be a drive failure. But the circumstances you describe point more towards a bios setting issue.

        When time permits, please consider a ZFS file system on a FreeBSD or Linux base system, on a Supermicro motherboard with ECC memory. It is much more fault tolerant and more reliable. Future stuff, for sure.
        See: https://www.howtogeek.com/175159/an-introduction-to-the-z-file-system-zfs-for-linux/

  22. I’m a data recovery expert. (See me on LinkedIn.) Tons of UNIX/Linux usage over the last 30+ years.

    Step one, take a few deep breaths.

    The odds are very high that 100% of your data is recoverable. That is until you destroy it in the process of trying to recover it.

    Just go slow and work with copies of the drive(s), not the originals.

    If you want more advice, connect up to me on LinkedIn and send me a message.

    • +1

      I’m not anywhere near as technical, but I’ve been selling backup and disaster recovery systems to corporations for many years. Greg’s advice (and that of others) is spot on. Go after the data first. You don’t really care if you EVER fix the boot issue if you have the data and can load it on a new O/S image. Go after the data from the cloned drive, because destroying data by trying to save it happens way more often than one would think (see Greg’s remark above and read it over three or four times while breathing deeply). This leaves the original intact if you accidentally blow it away on the clone.

      Now for Step 2, which Greg didn’t address. The moment you have the data recovered, do not, I repeat, DO NOT go merrily on your way getting the old server running or even a new one. Repeat, DO NOT. Instead, make a G** D*** copy of the F***ing data on something that can go off site. Cloud, disk drive, tape cartridge, pick one. Then do the same for ALL YOUR OTHER COMPUTERS. Take the copy home in the trunk of your car and store it in the closet if need be. If your business location has a fire or a flood or theft, no amount of technical expertise will help you. Copy the data, get it off site. Backup, backup often, off site. First rule of preparing for disaster recovery.
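
As one possible shape for that off-site copy, here is a minimal sketch using tar and sha256sum. make_offsite_archive is a hypothetical helper and the paths are placeholders; it assumes the archive is written outside the directory being archived.

```shell
#!/bin/sh
# Pack a directory into one compressed archive, record its checksum,
# and confirm the archive can be read back.
make_offsite_archive() {
    src=$1       # directory to back up
    archive=$2   # archive file to create, outside of $src

    tar -czf "$archive" -C "$src" . || return 1

    # Store the checksum beside the archive using a relative name, so it
    # can still be checked after both files are moved to other media.
    (cd "$(dirname "$archive")" &&
        sha256sum "$(basename "$archive")" > "$(basename "$archive").sha256") || return 1

    # It is the restore that counts: make sure the archive lists cleanly.
    tar -tzf "$archive" > /dev/null
}
```

After carrying the two files off site, running sha256sum -c on the .sha256 file from the directory that holds them confirms the archive arrived intact.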

  23. For what it could be worth.

    I do not know much about servers, but I have my own computers… and a few weeks ago I had to give up on an old one and go for a new one because of a lot of virus attacks.

    I could very easily have recovered my old hardware, but at that stage I thought it was not worth it, and as things stand I do not regret it….

    My old hardware, regardless of my efforts, in the end suffered from a simple but very effective virus that always managed to cripple the CD drive, the USB drives, and the external control from mouse and keyboard…
    Regardless of the antivirus software installed, even the specific one against that virus, I still had to do a lot to keep the hardware responding to the mouse and the keyboard and under “exceptional control”, while settling for partial use of the CD drive and the USB ports…
    The virus infecting that old hard drive came from an infected USB stick, in a moment when I got lazy and careless.

    Strangely enough, a lot of virus attacks are happening with my new hardware, including the one that crippled my old computer,
    even though I am not using the new hardware in the same way and have not yet inserted a USB stick into it…

    My new computer is warning me to install and use the same antivirus software that I had on my old hardware… strange indeed… but the point I am trying to make is that sometimes viruses or malware can be very effective at crippling basic access to a computer or server without actually damaging the data or the hardware itself…
    At the same time, a rush to fix it may result in greater damage to the hardware and the data on it.

    Not rushing and taking the necessary time needed is essential in these cases I think…

    cheers

  24. To stay on topic – I hope – for a longer term fix… yes Centos is a good production system, but what you might want to consider is getting a hardware controller appliance. It will cost between $350 to $700 (there are more expensive but those should work, maybe better prices if you can get to Fry’s or go online) and you can have the appliance do a lot of the error checking and preventive management.

    This allows other systems to access the drives through a network, some allow hot swaps for the individual drives, and you get error reporting and an operating system that is entirely separate from the drives themselves. This is a fixed cost model that has higher upfront but lower ongoing costs than a cloud storage option that will keep sending a bill for as long as you want to keep the data.

  25. I would suggest downloading a copy of Knoppix on another computer, burning a CD or DVD and booting the affected computer from that. There is a good chance you might be able to read and recover the data that way. Knoppix mounts partitions read-only by default, which is the best approach when dealing with what might be a corrupted and fragile partition.

    If it proves necessary to run data recovery tools, then always do that on a copy of the original. That way, if it makes matters worse, you still have the original.

    http://knoppix.net

    Incidentally at the boot prompt I usually type ‘knoppix no3d’ to stop the compiz effects from loading. Might be a matter of opinion but I just find them a nuisance.

  26. There are several pieces of good advice in earlier comments.

    * I’d be talking with gregfreemyer right about now. His advice is good, and if he will work with you, great.

    * Michael 2’s advice to copy off disk images using dd is excellent. Do that first. Make damned sure you get the if= and of= parameters correct. :-)

    A few other thoughts…

    Take your time.

    You may be able to boot to a modern data recovery CD. I use finnix (https://www.finnix.org/), but gregfreemyer may have a better recommendation.

    It looks like your system uses lvm version 1.0.8 (2003). I have 2.02. After 14 years, it is possible that modern lvm no longer supports your disks. Building a boot CD from your slackware distribution might be the way to go.

    You might also look at rdd to see if you can recover the failed drive. That will depend on what exactly failed in the drive.

    There are user communities for all the software you need; feel free to ask for help.

    Once the crisis is over….

    A more recent distribution of Linux may be in order, as smalliot suggested. CentOS is good. I use Debian stable. Update things a bit more often than every 14 years.

    Start doing backups. I use amanda, others can recommend other products. amanda is an industrial grade backup system which can be a bear to configure. Once you get it set up it runs forever and degrades gracefully. I back up to virtual tapes (files on disk) and copy to external USB drives for off-site backup.

    Good luck!

  27. BTW, be very careful with the dd command: if you get the parameters wrong, you can nuke the data.
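
    A minimal sketch of that caution, for imaging a disk to a file on separate storage. The function name `image_disk` and the paths in the usage note are hypothetical, not from this thread; the only rule it enforces is that the of= target must not already exist, so a swapped if=/of= pair is refused rather than executed.

    ```shell
    #!/usr/bin/env bash
    # image_disk SOURCE DEST_IMAGE
    # Copy SOURCE (a disk or partition) into a NEW image file with dd.
    # Hypothetical helper: it refuses to run if the of= target already exists,
    # so a swapped if=/of= pair cannot overwrite the source.
    image_disk() {
        local src="$1" dst="$2"
        if [ -z "$src" ] || [ -z "$dst" ]; then
            echo "usage: image_disk SOURCE DEST_IMAGE" >&2
            return 2
        fi
        if [ ! -r "$src" ]; then
            echo "refusing: cannot read source '$src'" >&2
            return 1
        fi
        if [ -e "$dst" ]; then
            echo "refusing: '$dst' already exists, will not overwrite it" >&2
            return 1
        fi
        # Plain dd stops at the first read error, which is the cautious default.
        dd if="$src" of="$dst" bs=1M 2>/dev/null
    }
    ```

    Usage would be along the lines of `image_disk /dev/sda /mnt/usb/sda.img`, run as root from a rescue CD. Both paths are placeholders for your own source disk and destination.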

  28. I haven’t had to delve into a problem like this on Linux, but my take on things is that the RAID array is fine, but the kernel is confused about where it can have you login.

    Do you normally log in from a “login: ” prompt on a mostly bare screen like the boot display, or does the system start a graphics/window server?

    I think the best comments are from Bear at https://wattsupwiththat.com/2017/03/22/is-there-a-linux-specialist-in-the-house/comment-page-1/#comment-2457854 and CEH at https://wattsupwiththat.com/2017/03/22/is-there-a-linux-specialist-in-the-house/comment-page-1/#comment-2457919

    Beyond that, I’m not sure where I’d start. Is there a chance that you really do normally log in to the beast from some terminal line (e.g. for remote access) and that the cable is loose or fell out while you were working with the replacement disk drives?

      • Sigh. Then the console stuff could be a red herring, and you may have been getting those messages ever since you first set things up.

        I haven’t maintained RAID anything, but I don’t see anything alarming about the boot time messages about it.

        So it could be from some boot time program that hasn’t printed anything yet. I might have used Slackware, briefly, 20 years ago, so I won’t hazard another guess now.

        Windows and Linux are both a pain in the butt to administer. It’s just that they are very different pains….

        The simplest thing at this point, I guess, is the rescue CD approach. I used Knoppix a decade or so ago and thought highly of it then, especially for looking at virus infected windows systems. I’d expect new distribution (umm, DVD) installation disks would be able to make sense of the file system. If you can boot from a USB stick, that’s probably the way to go, though I haven’t done that myself.

      • The agetty entries in question are not for physical terminals connected with physical serial cables. They’re for virtual terminals that you can access with alt-f1, alt-f2, etc. If your agetty binary is missing or damaged you won’t get a login: prompt at the console because that’s just virtual terminal number 1.

        https://luv.asn.au/overheads/virtualconsoles.html

  29. A quick google says that those messages are coming from the alternate consoles, so the good news is that it doesn’t look like there is anything wrong with the disks. Are you sure everything is /exactly/ the same as when it was powered off (minus the drive)? Since this is a ~2007 machine, is a PS/2 keyboard attached? I see some talk that this might happen if your network cable isn’t attached (just the messenger here). As has been suggested, if you just want to recover the data, it’s probably easiest to boot from a live image DVD and copy the data to another HD. From the looks of things, you’ll probably have to do this anyway and edit /etc/inittab to get it to stop spawning those processes so it will get further in the boot sequence. Be happy to help if you don’t mind remote help…

  30. I can write in more detail later. Interesting you say the mirroring is done on HW raid because the boot messages clearly indicate software raid with the “md” (metadevice) subsystem. The boot messages identify two SCSI disks, which in my experience it doesn’t do if hardware RAID were happening at the adapter level.

    You will need to boot a recovery system from CD/DVD/USB depending on your hardware (which looks kind of old, so I suspect we’re talking CD here). Try here: http://www.system-rescue-cd.org/

    The good news is the recovery system does not need to be slackware, as long as it supports the software raid and LVM. This will get you a working system and hopefully able to access your disk and its filesystems. You then at least have access to the data and can copy it off somewhere safe while you work on fixing the problem.

    I can write more later.

    • The boot messages identify two SCSI disks, which in my experience it doesn’t do if hardware RAID were happening at the adapter level.

      I in general agree. But I have run into (twice in my life) systems where h/w raid was enabled, the volume was logically partitioned, and then s/w raid run across the partitions. Long time ago, don’t remember all the details other than tech guys running around with hair on fire screaming “WTF why would anyone do that?” when the system went south.

      • I’ve administered Linux since the 1.2.3 kernel. Started an ISP in 1996 on Slackware, then moved to Red Hat. This fake RAID card is probably the problem. The hardware works… but…
        To get anywhere, boot in single-user mode to do your checks;
        cat /proc/mdstat will tell you what is happening with the array.
        Fake RAID is the worst of confounding situations, the worst.
        3ware cards are full hardware RAID, and with one of those this issue would never have happened or been noticed by the Linux system.
        Software RAID repairs well. Fake RAID will take a while; connect the drive to an onboard SATA port to see if it will boot from there. If it does, all the better, fake RAID was not the problem. If it does not boot from a bog-standard port, then it is a problem an order of magnitude bigger.
        Your problem is happening when the system is trying to go to runlevel 3, multiuser without the display manager. This is so the init occurs with multiple terminals and uses multiple threads efficiently. This is probably just a red herring as to the real problem.
        Keep to runlevel 1, read up on RAID recovery, make a new system, use only software RAID, transfer the data and software.
        You can repair this OS; Linux is 100% repairable, if you have buckets of time. $600 gets you a very good basic server. How much is your time worth?

  31. I think you banned the one guy who could really help you. Could be wrong, but Tony Heller never comments here?

    • Tony Heller is not banned as a commenter, we just don’t carry articles by him anymore when he refused to admit and correct a simple mistake about CO2 vapor pressure and freezing out of the atmosphere in Antarctica.

    • I believe Tony was/is primarily a microprocessor designer. That doesn’t necessarily mean he’d be good at fighting with recalcitrant Linux systems. Heck, my file system and other experience isn’t being much help….

      Nor handle a CO2 frost brouhaha well. I still have scars from it.

  32. This is a virtual console issue that happens when it can not find font files or any number of other issues. I would be interested to know what the /etc/inittab file looks like. Way to find that would be to boot from a bootable CD or flash drive then mount the system hard drive on /mnt and then have a look at what /mnt/etc/inittab looks like. In particular, I would be interested in entries that look something like:

    # TERMINALS
    c0:12345:respawn:/sbin/agetty 38400 vc/0 linux
    c1:12345:respawn:/sbin/agetty 38400 vc/1 linux
    c2:12345:respawn:/sbin/agetty 38400 vc/2 linux
    c3:12345:respawn:/sbin/agetty 38400 vc/3 linux
    c4:12345:respawn:/sbin/agetty 38400 vc/4 linux
    c5:12345:respawn:/sbin/agetty 38400 vc/5 linux
    c6:12345:respawn:/sbin/agetty 38400 vc/7 linux

    I would see if I can edit that file and comment out the entries for c1 through c6 by inserting a “#” char at the start of the line, save the file, remove your boot media and attempt to reboot from the drive.

    It does look like it loads the c0 device, though which might be tty0 rather than vc/0

    What is happening is that it is attempting to initialize the console devices c1 through c6 and that is failing. c0 does appear to be initializing, though, which might be a serial console if c0 says its device is a tty device in your inittab. So connecting the serial port (if it has one) to another machine’s serial port with a crossover cable might allow you to terminal in (if you know the baud rate, etc.) and get a login prompt.

    Your /etc/inittab might also have something like this:

    # SERIAL CONSOLES
    #s0:12345:respawn:/sbin/agetty 9600 ttyS0 vt100
    #s1:12345:respawn:/sbin/agetty 9600 ttyS1 vt100

    If they are not commented out (as this example IS commented out) you might have a 9600 baud login on a serial port available to you.
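
    The edit suggested above (a “#” in front of the c1 through c6 entries) could be scripted roughly as follows once the system drive is mounted from a rescue disk. This is only a sketch: the function name `comment_out_gettys` is made up for illustration, it assumes GNU sed as found on most Linux live CDs, and it keeps an untouched copy of the file next to the edited one.

    ```shell
    #!/usr/bin/env bash
    # comment_out_gettys FILE
    # Put "#" in front of the c1..c6 entries of an inittab file, leaving c0
    # and every other line untouched. Hypothetical helper, assumes GNU sed.
    comment_out_gettys() {
        local file="$1"
        if [ ! -f "$file" ]; then
            echo "no such file: $file" >&2
            return 1
        fi
        # Save the original once; a second run must not overwrite the backup.
        if [ ! -e "$file.orig" ]; then
            cp -p "$file" "$file.orig" || return 1
        fi
        # Only lines that START with c1: .. c6: are changed.
        sed -i -E 's/^(c[1-6]:)/#\1/' "$file"
    }
    ```

    With the drive mounted on /mnt as described, that would be `comment_out_gettys /mnt/etc/inittab`. To undo it, copy /mnt/etc/inittab.orig back over the edited file.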

    • That was my first thought after looking at the logs.
      It looks like it’s happening during the start of going to multi-user mode.

      Since you have console access, can you boot single-user?
      And carefully look around and see what’s mounted, how it’s mounted?

      The other thing that stuck out is that it said it’s mounted your root fs read-only,
      and it’s an ext2 filesystem. Hoping that your root filesystem really is not ext2,
      and really is ext4 (or at least ext3), and it maybe mounted it as ext2 because
      there’s a problem with the fs journal.

      But that’s just for diagnosing.

      As others have written, protect that original good drive and carefully clone it.

  33. I have never used Linux but I have had many years in computers. The question is always what changed and that would be the new disk drives. What I suspect is something about the new hardware is causing issues in the 2007 software. A possible solution would be to update Linux on the cloned drives to get the latest I/O drivers that may recognize the newer drives.

    A secondary possibility is both original drives were corrupted as the result of the drive failure. If so, you have a massive rebuild a head of you unless you have backups you can recover from.

    • I purchased IDENTICAL drives. Same make, model, LBA and sectors… that’s not the problem. I was well aware of how unrealistically sensitive Linux is to drive hardware.

      • Unfortunately they may not be identical. Much of the modern hardware uses firmware to control the function of the device. The manufacturer could have had an issue with the product and corrected/changed it in the firmware, causing the device to function differently than the older product. Unless you purchased the drives with the same lot number, it’s a possibility.

        I have spent years programming software at the hardware level and every so often manufacturing would start having problems with something that was working fine. A little digging around would turn up a part with different functionality than the original part. Fortunately we produced our own drivers and controlled the hardware design so once we figured out what was going on, we could come up with a fix. It’s a good deal more complicated when you don’t know what’s going on in the hardware or software.

      • The RAID is administered by an on-board Adaptec RAID controller, and it reports the RAID as healthy.

        Linux (the OS) will not need to know about LBA and sector details. If anything is sensitive to that, it will be the RAID firmware, which is below the level of the OS. The drivers in the Linux kernel will have to interact with the Adaptec controller.

        It is a wise choice to buy identical drives if replacing half a RAID mirror, though not essential within certain limitations.

        If you make hardware changes which lead to bios reassigning disk order the OS will need to know about those changes. That is not “unrealistically sensitive”.

        It is quite feasible to clone an entire partition scheme from one HD over to another, larger one, plug it into the same hole (e.g. SCSI connector) and reboot without the OS having to know anything about it. The new disk can be a different make, capacity, LBA, SCSI version.

        If that Slackware is really 2007 vintage and not maintained, moving to a newer Linux of any flavour will change any references to IDE drives from /dev/hda to /dev/sda. If there are only the SCSI RAID devices, as it appears, this will not be an issue.

  34. I’m not going to offer a detailed solution, but here is my suggestion from over 45 years of working with, designing and building computers, and as a former computer shop owner with a service department. Most of my family also use Linux.

    First rule: “Do no harm”. Turn it off and keep it turned off until you find the right technician you can get it to. They are scarce. Your problem may not even be software. With older machines, component failure occurs in so many ways, and ICs not infrequently die slowly in stages. Since you do not yet know exactly what the fault or possible combination of faults is, allowing the machine to be powered on can cause an even worse situation. Expertise in software is not necessarily sufficient when age and particular hardware are considered. A machine with a component in its death throes can go from recoverable to impossible over a short time.
    It may be software, but you need someone skilled overall with this. One blessing: Linux is robust, and provided the read/write subsystem of the hard drives has not physically screwed up critical data, your basic data is most likely to be intact. If so, putting the disk in another machine for cloning and then attempting to copy the appropriate partitions is likely to be successful.

  35. Not sure which Kernel version you are running, but there have been bugs introduced before that caused this in the older 2.6.31 kernel. When you boot up, if you have a grub menu with a previous kernel version, try booting from that instead and see if it works.

      • Time. Simply time, Anthony. Sleep on it. Listen to the advice here and sleep on it. It’s the data that’s important, nothing else. I’ve spent ages trying to recover data, and went back a couple of weeks later and couldn’t understand what the problem was – and the data flowed out. There’s no rush. Working fast isn’t the answer. Time is the answer. Keep the original drives. Recopy them with dd in a few days with a new RAID 1 setup. Oh. Good luck. It’s all down to luck ;)

      • Whenever I’m facing some new disaster, I’ve found one of the best first steps is to sit down, think black thoughts about computers in general, turn my back on the problem, and go for a walk, maybe to a local pizza place and think more black thoughts there. Then figure out what to do.

        Brahms’ German Requiem is good music to debug an OS crash to. I’m not sure what’s good for broken hardware. Mahler’s Resurrection Symphony might be good, but you need to sync the fix to the problem with the climax of the last movement. :-)

  36. I’ll keep my fingers crossed that this mess ends well. I lost data on a HDD a while ago and it was a bloody pain in the a*ss to get it recovered.

  37. Copy the drive contents and then take the old one and throw it into a dumpster. Before you throw it in, though, write “Tr*mp Ru*ssian Sekrets” (use the letter ‘k’ so it looks authentic, maybe write it backwards, too) on the external case of the drive and then call the New York Times with an anonymous tip. Bingo! The drive will be recovered and copied in no time all over the internet for easy access by you later. But I can’t guarantee that the content won’t be ummm, adjusted if some climate “scientists” are involved somewhere along the way.

  38. I have worked with, built, sold, serviced, designed etc. computers for 45 years. Also I am an intermediate Linux user.
    Much good advice here, but much of it ignores the first rule of service: “Do no harm”. What matters is not what you think might work but the certainty that the data will not be further compromised. So turn the crook machine off and, for something so important, do not attempt any recovery in the meantime.

    I am also concerned with age of software and possibly hardware. Computers at all levels are never static as subtle upgrades are constant and so are internal conflicts arising, timing states etc.. These conflicts are not always significant or evident until catastrophic failure. A generation of hardware for a system builder is only 3 months–It’s been like that for 30 years (Look at the version numbers on parts..)

    In my opinion you need an expert tech in older machines as well as a Linux one. A rebuild is probably called for.

    Good news: unless the file system has been damaged at the physical read/write level your data is probably intact. Keeping the machine unpowered is your best protection at this time while you find the right tech.

    Another hazard is a failing component. They can do the oddest things while dying, and recovery may be possible at an early stage and impossible even only minutes later. The more restart attempts you make, the greater the risk.

    Best regards for your fine work.

  39. Do you still have the original drives? It may be that one went out of sync before the other.

  40. Anthony, It looks like you have lots of help. I’ll just bow out of this discussion, but let me caution you against a few common ways to screw up the recovery.

    1. If at all possible, work on a copy of the disk, not the original

    2. If you copy with dd (bit for bit copy) check the “of=” parameter several times. Take a short break then check it again. It **MUST NOT** be the source device. You can probably survive botching the “if” parameter, but not “of=/dev/whatever”

    3. If someone tells you that you need to run cfdisk, sfdisk, fdisk, gparted, or parted, think about it long and hard. If you screw up your partitioning, your data is gone.

    4. If you copy using file system tools cp, cpio, tar, etc. (pretty much anything but dd) you will need to mount your source and destination devices. Remember to unmount them when you are finished. If you don’t, buffers may not be flushed and you may lose data.

    … And the unmount command is umount, not unmount

    … And if you can’t umount, it is probably because you have moved yourself into the mounted file system with cd for some reason. It’s OK to do that, but you can’t umount until you move out of it.

    5. Oh yes, if the amount of data you need to save isn’t enormous, you can back up occasionally to a USB flash memory stick. Devices up to 64 GB are pretty cheap. It’s slow, but you can wander off and do goodly works elsewhere. Use a stick with an LED indicator and don’t pull it out ’til it quits flashing and the umount command exits. BTW, once you have written a backup, you can write subsequent backups of the same filesystem to the same device much more quickly with rsync. In your case, I’d do two or three backup sticks per server, rotate them, and back up every few days or weeks depending on how many days of data you’re willing to risk losing.
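
    The umount advice in point 4 can be sketched as a small guard. The function names `cwd_blocks_umount` and `safe_umount` are invented for illustration; the check simply compares the shell’s current directory with the mount point before flushing buffers and unmounting.

    ```shell
    #!/usr/bin/env bash
    # cwd_blocks_umount MOUNTPOINT
    # Succeeds (status 0) when the current directory is MOUNTPOINT or below it,
    # which is the usual reason umount reports that the target is busy.
    cwd_blocks_umount() {
        local mnt cwd
        mnt=$(cd -- "$1" 2>/dev/null && pwd -P) || return 2
        cwd=$(pwd -P)
        case "$cwd/" in
            "${mnt%/}"/*) return 0 ;;
            *) return 1 ;;
        esac
    }

    # safe_umount MOUNTPOINT
    # Refuse while we are still inside the mount; otherwise flush and unmount.
    safe_umount() {
        if cwd_blocks_umount "$1"; then
            echo "cd out of $1 before unmounting it" >&2
            return 1
        fi
        sync          # flush buffered writes to the device
        umount "$1"   # note the spelling: umount, not unmount
    }
    ```

    For example, `safe_umount /mnt/backup` (a placeholder mount point) would complain if you were still sitting in /mnt/backup/data, and only call umount once you had moved elsewhere.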

  41. I recommend data recovery using a bootable CD, a live Linux disk that handles disk arrays and LVM. We use Fedora, CentOS, FreeNAS. After data recovery, replace the computer with a NAS box.
    You could either put FreeNAS on generic hardware or get a commercial NAS box. We recommend Synology. They run well and keep their Operating Systems up to date.
    FreeNAS loads the operating system from either USB or SSD read only in RAM and it has very good self maintenance hard drive tools.
    Best regards and I hope you are up and running soon.

  42. Can you boot the computer into single user mode? Interrupt the boot process and probably type something like boot s at the prompt. It looks like the disks are good now (at least at this point in the boot process) based on the lack of failure messages on your boot screen. Sorry I don’t have direct expertise on this version, but in general before you go off into data recovery land you try to boot single user.

  43. I had something similar happen to a friend’s old computer. After he had it unplugged for a few hours during a house move, when he started it back up again, he got a message about not finding the RAID drive… I went into his BIOS setup and found that the BIOS settings had been reset to factory default… The BIOS date was set back to 2004!!! This was caused by the death of the internal BIOS battery. I replaced the battery and then reset ONLY the BIOS date to the present date and it rebooted fine… Some adjustments later to get everything updated, but the hard drives did work… (and it only cost $1.00 for the battery… LOL)

    • The above is a great story and great advice, but I find it hard to believe Anthony got as far as this without noticing the clock was reset in the BIOS!

  44. Another one for you Anthony, according to this http://kerberix.com/cgi-bin/howto.pl?artikel=artikelen/howto/solve_the_init_io-error.html
    your problem does not necessarily have something to do with the physical console settings but instead with a disk-related hardware problem. /dev/console resides on a disk, and if you are getting corrupt data from that disk you may end up with an I/O error for /dev/console.
    I would:
    1. boot from CD/USB.
    2. Rescue the valuable data to some external drive.
    3. Install a spanking brand new Linux system.

    • Meh. The only information behind the file /dev/console is a “dev_t” with a major and minor number:

      sl:vantage$ ls -l /dev/console
      crw------- 1 root root 5, 1 Mar 22 14:10 /dev/console

      The “5” identifies which device driver to use, in this case, the 6th loaded device driver. Your mileage will vary. The minor number is passed to the device driver to tell it which thing to work on, in this case, thing 1. That value probably matches yours.

      As for “vantage” – that’s because I had to change the battery on my old Vantage Pro weather station sensor unit and apparently manipulated the connector for the wind vane, so it’s been displaying “north” since Saturday.

      That may have been the day my SSD main disk disappeared electrically; even reset didn’t bring it back, but cycling power did. I guess it’s a bad week for electronics. I might install a new Linux Mint on the previous computer, but its motherboard beeps out “No RAM.” May be time for two SSDs, a new motherboard, bigger RAM, and probably a bigger display. I set that system up in 2006….

      • Well, that was pretty painless. Mostly out of curiosity (and an urge to have a working backup system), I found my old system had a couple disk partitions on a second (160 MB) disk sized for a swap and root partition. I managed to install LinuxMint on that root-sized partition without wiping out others, and seem to have LinuxMint’s Mate running now.

        This will be its first post to WUWT. I’ll get an SSD drive and do it all again. In the meantime, I can use a current Linux distro for Firefox, YouTube, etc. It would also help if the DVD drive opened reliably. :-)

        I might even try a new keyboard instead of this DEC LK471-AA keyboard with PS-2 connector.

        Oh, the RAM problem went away when I removed the second RAM card. 2 GB ought to be enough for anyone.

  45. Anthony Watts March 22, 2017 at 12:30 pm
    We have the data backed up, but the program to administer it runs on Linux….

    I wanted to call this comment out because it completely changes the problem. Most people in the thread are making recommendations about data recovery, which based on this isn’t exactly the problem, since the data is backed up. The problem appears to be that the application itself is custom and NOT backed up. Worse, my experience with systems of this type and age is that a lot of apps are riddled with O/S calls and won’t run even if you have the application code, because you need the exact O/S variant that contains those exact calls (and other things) to run it on.

    Over my head at this point, but those techies still following along, sounds to me like restoring the O/S is probably required, or isolating the application code, running it on a good known machine, and then dealing with any errors reported to reverse engineer something that works. Short version: Nightmare.

  46. I, like many others here, could probably do something if we were sitting in front of the computer. The difficulty here is that I don’t know what it is supposed to do, and what your end goal is. Those errors might be perfectly normal given the hardware, but I don’t know what it did before this. Did you ever use it with a screen before or did it run entirely over a network?
    Given that data is generally more valuable than configuration, I would boot into a rescue cd, mount the disks and recover the data, then set up the machine from scratch with a brand new os.
    For something like this I would spend the time setting up something like puppet so that you can have a new machine running with almost zero intervention in under an hour.
    I would also put the data on a separate partition so that the data and the os itself are not tied together.

  47. my2p
    I understand you have the OS (don’t think Slackware is supported anymore) and data on 5 disks in a RAID; dunno much about how RAID works. You have a backup of the data but the software that handles the data is on the RAID?
    There is a possibility that something on the motherboard has failed. This happened to me a couple of weeks ago on a firewall; a mate replaced the motherboard and it works again.
    OSs are a doddle to install these days.
    I think you can have the data on a RAID machine and the OS on another.
    As others have said, on another PC make a live CD or bootable USB drive (there’s a Fedora-based one) with something on it like SystemRescueCd.
    Make a copy of the disks, however you do that (to one disk?).
    Remove the disks, put in a new blank disk, boot with the USB stick and do some diagnostics before doing anything else.
    Not sure of the pros and cons of arrays vs. daily backups, but I don’t do anything critical.

      • I actually ran my own email servers for many years, including one at home, but keeping up with the SPAM filters was a pain. About 15? maybe only 10… years ago I moved to an AOL address for public things. They have a decent enough SPAM filter and the address itself keeps a lot of folks from sending me yet more email ;-)

        I’ve used many others over the years. At least a half dozen. Nothing is ideal.

  48. More time to write.

    There are a bunch of things that can cause Linux boot failures. The key question is whether the RAID+LVM+filesystem structures are intact. The best way to check that is to boot off a recovery CD and see whether it detects and activates your volumes. If it does, then the problem is in the /boot filesystem.

    I still have questions about hardware RAID in the adapter vs. software raid (“md” or metadevice). The boot screen sure looks to me like there is “md” software raid. The RAID controllers I use don’t allow the OS to even see the physical volumes, unless they use a special driver specific to that RAID controller. Instead the OS-level SCSI driver sees the logical volumes the hardware RAID controller presents to it, emulating a common SCSI adapter.

    It looks to me like your two disks each have two partitions, a common Linux configuration. sda1 and sdb1 are usually small (500 MB or less), joined in a mirror and formatted as an ext2 filesystem mounted as “/boot”. This is where the second stage boot loader lives, along with the kernel image and the initial RAM disk. The other two partitions, sda2 and sdb2, occupy the rest of the physical disks and are also usually combined into a mirror and handed over to LVM as a single physical volume, which is then carved up into “chunks”, usually 32 MB or so, and allocated as needed to create additional filesystems.

    I don’t see in the boot messages that the LVM layer activated any volumes, so it looks like the failure is happening before that point, or the LVM structures are not intact, which would explain everything.

    You need a working system booted from a recovery CD to poke around and find out exactly what is wrong.

  49. It looks like you are getting some good help, not the least of which is that it is easier if sitting at the machine. I would like to mention a few things that might be useful. It looks like your disks are good enough to start running init and get to runlevel 3, at which point you have trouble running getty. You can probably get a login by intercepting the boot loader, probably grub but it might be lilo in this system, and using the line “init=/bin/bash”. This will run a shell instead of init and you will get your / mounted read-only but you will be root. Depending on your partition scheme you might need to remount / and mount /usr to get the executables in /usr/bin and /usr/sbin. You can get some info about the state of your MD software raid with “cat /proc/mdstat”. You should look for the getty executable which may be corrupted. You might need to replace it manually. Getty is in /sbin and is linked to
    #ldd /sbin/getty
    linux-vdso.so.1 (0x00007ffdb19db000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f99c586b000)
    /lib64/ld-linux-x86-64.so.2 (0x000055d1a3d39000)
    any of those could be corrupted also. You will see slightly different versions.

    LVM version 1 is different than lvm2 and if you want to boot from a rescue image you will need lvm1 support.
    If you do this be sure to flush your file systems as your shutdown will not be clean without the full system init.
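
    As a rough illustration of the “cat /proc/mdstat” check: in the md status line, a string such as [UU] means every mirror member is up, and an underscore such as [U_] marks a missing member. The sketch below, with the made-up function name `degraded_arrays`, prints the name of each array whose status string contains an underscore. It assumes the usual layout where the status string is the last field on the “blocks” line, which may differ on a kernel as old as this one.

    ```shell
    #!/usr/bin/env bash
    # degraded_arrays [FILE]
    # Print the name of every md array whose member status (e.g. [U_])
    # shows a missing disk. Reads /proc/mdstat unless a file is given.
    degraded_arrays() {
        awk '
            # "md0 : active raid1 sdb1[1] sda1[0]" starts a new array entry
            /^md[^ ]* :/ { name = $1 }
            # "104320 blocks [2/2] [UU]" carries the member status at the end
            /\[[U_]+\]/ {
                if (index($NF, "_") > 0) print name
            }
        ' "${1:-/proc/mdstat}"
    }
    ```

    Running `degraded_arrays` with no argument on the booted system prints nothing when all mirrors are complete.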

  50. You probably don’t want to read this now (so save this for later), but for the future:

    TL;DR: Disk space is cheap. Make lots of copies. RAID is not a backup scheme.

    RAID is not a backup scheme. Too many common points of failure and too much fragility since it works at the block level. IMHO RAID is only useful for HA database systems and even then I’ve seen services taken down that depend on it. (e.g., Livejournal ~10 years ago).

    My favorite backup scheme is to rsync the disk to a USB attached drive, and then unplug the USB drive and put the drive in a fireproof safe or alternative location. I note OSX users get this for free if they use Time Machine, but Time Machine is basically rsync. USB drives are cheap ($100 per 2TB or so now). Repeat as many times as you want a backup (3 if paranoid, with USB drives from different manufacturers).

    If it is truly important and privacy is not a concern, you can make an additional back up to a cloud service (Amazon S3, dropbox, etc).

  51. “In the attached picture you’ll be able to see what’s happening to one of my most important servers containing some irreplaceable climate data. ”

    Perhaps some sympathy for Phil Jones would make the lost data recoverable.

      • Yes, after they made sh*t up and refused to supply requested FOI material! Also I believe Anthony has stated he has a backup, and I don’t think that if he ‘lost’ the data it would cost the world’s economy several trillion £/$ in ‘climate mitigation’.

  52. Anthony,
    Let’s break this down a bit. Is the Adaptec controller built on the motherboard or is it a card plugged into a PCI slot? From the limited screen shot, it seems to be built in, which is practically useless, IMHO, and often not supported by the o/s. Plus, if it was a real RAID controller card, mdadm and/or LVM would be unnecessary. On the web I found a boot sequence from a similar o/s; it’s getting along to about where the drivers would kick in.
    mdadm did have some issues with losing its config file, but it is repairable, but this could be a hockey stick.

    First clue is the partition check is good on each drive. ext2 is ext3 w/o journaling. IIRC, booting from a CD or USB should allow you to mount one of the disks and can check it for errors. The header on the disk will remember the raid config, but one at a time shouldn’t be an issue.
    The situation now is the boot process, which appears to be failing.
    The suggestion of the bios failure might be sound, if the “raid” got turned on, it could be incompatible with the o/s drivers.
    Is it possible to get a screen shot(s) of the bios screens? If the battery went low, and the bios reset, it is possible to make the system unbootable.
    If you have your boot cd, and a spare drive, build a new system with the computer. This will make mounting the disks easier and might point to the cause of the problem.

  53. Why is the word “backup” missing in a sentence containing the words “irreplaceable data”? Because they would not be irreplaceable if it were there… Jesus Christ.

  54. All I can say is that the comments here show just how intelligent this community is. I have been following WUWT almost since the beginning and it never ceases to amaze me the level of reader here. The idea that people who question global warming are actually less intelligent has definitely been laid to rest.

    • Anthony – You should be very proud of what you have accomplished. I always wanted to meet you after all these years, maybe one day it will happen : )

  55. Seems to me that these days, in these United States, you have to be a Unix expert to count as a real man.

    And there seem to be plenty of real men reading WUWT, as is right and proper.

    I’m just a Unix amateur, but I know enough to respect the real men in the trade.

  56. Re: Music to Recover Your Data By:
    “Brahms’ German Requiem is good music to debug an OS crash to.”

    Nahh. Mendelssohn’s Reformation Symphony.. Otherwise something quiet and peaceful: Andres Segovia or Julian Bream playing Vivaldi or Bach…

    As to the problem at hand: I would suspect *hardware* issues at this point. It was shut down properly but did not come back up. The cmos battery has been mentioned but there are other things that could be a problem.
    I would *remove* the cloned hard drives from that machine and put them into a new, known to be good box. Store the original drives (both of them) somewhere else so they are not mistaken for the clones. There may actually be nothing wrong with them, but it is not the time to explore that. Since the cloned drives are RAID1 drives (clones of each other) you only need to install one of them.
    The clone drive will not be listed in the new box’s fstab, so it will not be mounted.
    Presuming a graphical (inittab 5) setup, use sfdisk -l to search for all of the partitions ( on all of the disks), and note what partitions sdx has. Or use gparted.
    It will not be mounted as part of a raid setup, but it will be mountable.
    You should have enough info at this point to mount the partition(s) in sdx.. which probably amount to one partition: sdx1. Mount in ro mode.

    and now you can copy the data: I have a 1TB external USB drive which I picked up on a special for the grand sum of $69 (Cdn!!!). It’s a little slow (USB2) but it works. Since the stuff I *really* want to back up is only about 80G, I recently picked up a 128G Kingston Data Traveller USB stick (about $70 w tax) and copied to that.
    At this point, I would not trust the hardware: too old. And I would not trust the software: that OS is far too old. Time to start again with something new.
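
    The list-then-mount-read-only steps above could be sketched roughly as follows. /dev/sdx1 and /mnt/clone are placeholders (the real device name depends on what sfdisk reports), and the script only prints the commands unless DRY_RUN is set to 0 and it is run as root.

```shell
#!/bin/sh
# Sketch only: /dev/sdx1 and /mnt/clone are placeholders, not real names.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
DEV="${DEV:-/dev/sdx1}"
MNT="${MNT:-/mnt/clone}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

inspect_clone() {
    run sfdisk -l                   # list the partitions on every disk
    run mkdir -p "$MNT"
    run mount -o ro "$DEV" "$MNT"   # read-only: nothing on the clone is changed
    run ls -la "$MNT"               # quick look at what is there
}

inspect_clone
```

    Once the listing looks sane, the data can be copied off to the external drive while the clone stays mounted read-only.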

    • I’ve been buying those 128 SanDisk USB3 drives from Costco for $28… Canadian!
      They’re an awesome insurance for those few gigs of data that you REALLY need to protect, and they actually read and write faster than most of my HDs.

  57. If your data is accessible, is it time to consider getting out of the infrastructure business and finding a home in the cloud for this? Tremendous storage and analytics capabilities available out ‘there’ pretty cheap.

    • What’s with the Tony Heller fan club here? If I had a problem I would not trust it to someone with whom I previously had issues concerning “alleged” lack of integrity, failing to admit errors and make necessary corrections. Oops, your RAID content is lost, it’s not my fault it must have been stuffed when you gave it to me. Tough.

  58. 1) An old disk failed on startup after a period of being unpowered.
    2) A reported ‘no boot disk found’ was displayed.
    3) It depends upon what issued that message, linux or the adaptec hw/sw.
    4) The cloning of the disk could be faulty.

    5) Something failed, to create this issue: battery, nvram, ic getting old, dry joint etc.

    6) I would cut and run, copy everything in sight multiple times, bin the old hardware, get a new system and copy your data onto the new system.

    7) Otherwise, if you successfully repair your old system, you will have a working *OLD* system, waiting for its next failure. All kit reaches its ‘end of economic life’ eventually.

    • 3) It depends upon what issued that message, linux or the adaptec hw/sw.
      4) The cloning of the disk could be faulty.

      3) It’s not Linux since it can’t find a boot disk! Sounds like BIOS. RAID firmware was detecting faulty h/w and did not present a device to BIOS. BIOS could not boot since it was expecting that device.

      4) I suspect that may be the case otherwise it would have fully booted once the failed HD was replaced.

      Personally, I always have a separate partition for data and the root fs and would not use a raid for the boot.

      This is why several people have suggested a Linux live boot DVD which will enable examination of the file system being presented by the RAID device. There are tools to check the integrity of the fs.

      I would also start by running MEMTEST86+ which is probably on the existing boot menu somewhere and will be on any live DVD boot options.

      • BINGO!

        All you really know at this point is “something is wrong”.

        So the best course of action is to replace as much as possible with known good parts.

        IF possible, a different system (hardware). Then boot a new Linux. Now you know HW and SW are good. Proceed to disk duplication and data recovery.

        IF NOT possible (disks depend on a particular controller for example) boot a recovery system then test and prove up the hardware enough to say you have a known good HW and SW rig. Proceed to disks…

        I, too, put root on a distinct disk (with backup copy on other disks). You do not want a RAID failure to kill your boot / recovery system…

  59. Anthony. Could you state the exact problem which you still have? As I understand it you have a clone of the original disk which is what you are now working with ( the original remaining good disk now being in a safe place ) and viable, working RAID hardware with two new disk drives. You say you have a backup of the data but presumably in some kind of specific, bespoke, compressed format which needs specific software to recover. Is that backup also on the RAID or physically stored elsewhere?

    Reading between the lines, is the problem that you need to boot the existing Slackware system to use ( or reinstall ) the backup software to recover the backed up copy? Is needing to stick with the same mobo to retain the same RAID controller part of the equation?

    There are a number of paths but a lot of perfectly good and wise comments do not seem to relate to the actual situation you currently have. To find the sticking point I think it is necessary to clearly lay out the problem as you currently perceive it.

  60. Did I read you right, you cloned the remaining disk in the array and rebuilt from that?

    You didn’t try rebuilding from the remaining good disk in the array?

    Oh and backups..

  61. isn’t this a bash error?
    what os and kernel version is it?
    a kernel update could do it and would not be seen until next reboot

    • OS version? To judge by the screeny : LVM 1.0.8 ( 23/11/2003) : this system has not been updated since it was installed in 2007 and even then was probably not brought up to date after the installation was completed. ;)

      If it was not externally connected that probably does not matter too much but may make installing / fixing things a little more complicated.

  62. Okay. Here it is the next day. How’s it going?? Did Anthony accept Linux expert E.M. Smith’s excellent, generous, offer to drive up (only ~3 hours)? As far as we readers know, Anthony just blew off Mr. Smith!

    Well, just hope it is all okay, now.

    • I’m on standby. A local guy was going to take a look so no need for me to drive just yet. System was set aside for a few days while Anthony did other things anyway. I’m supposed to “check in” today, so reading here first to ‘catch up’ then will hit email for any update there.

      I’ve got my kit ready to load into the car and the spouse expects me to be out of town tomorrow if need be.

      So no, Anthony did not “blow me off”, but contacted me via private channel.

  63. De-lurk. Your system, which looks like CentOS at a glance, is trying to start X (graphics mode) after runlevel 3. GUI should be RL 5. Something’s either misconfigured or missing. One fix is to set level 3 as init-default in /etc/inittab.

    id:3:initdefault: – change to this
    #id:5:initdefault: – from this

    This will disable boot-to-GUI.

    If you want a GUI, install xorg, gdm and gnome. Use “yum install” for xorg, gdm and gnome.
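
    The inittab change above could be tried on a copy first. A minimal sketch, assuming the usual id:N:initdefault: line; the sample file is invented for the demonstration and the real /etc/inittab is never touched. (If the box really is Slackware rather than CentOS, the graphical runlevel is conventionally 4 rather than 5, so the pattern accepts any digit.)

```shell
#!/bin/sh
# Sketch: print a copy of an inittab with the default runlevel set to 3.
# The input path is an argument; the original file is never modified.
set_runlevel_3() {
    sed 's/^id:[0-9]:initdefault:/id:3:initdefault:/' "$1"
}

# Demonstration on a throwaway sample rather than the real /etc/inittab.
sample=$(mktemp)
printf '%s\n' '# Default runlevel' \
              'id:5:initdefault:' \
              'si:S:sysinit:/etc/rc.d/rc.S' > "$sample"
set_runlevel_3 "$sample"
rm -f "$sample"
```

    Review the printed result, and only then replace the real file (from a rescue boot, with a backup copy of the original kept alongside).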

    • Thanks, that would be great if I could understand any of it. One of the downsides of Linux is that experts in it often speak in tongues…at least that’s how it seems to mere mortals. ;-)

      • Hopefully you are taking EM Smith’s advice. Yes, Lunix can be arcane, but modern distros like Cinnamon Mint are sufficiently Windows-like for mere mortals.
        Definitely build a new boxen, though if you are on a budget, manufacturer refurbs are the way to go. I recommend Dell.

      • Anthony, it seems you have plenty of competent people willing and able to help but as I suggested above , I think you need to define exactly where you are with this and what the perceived problems and constraints are.

        That will ensure that you get specific help, not the whole kitchen sink of everything anyone knows about linux command line.

        The first step, suggested by many is to get a bootable “live” DVD, start the PC off that and see what state your new copy of the file system is on the RAID. Just about any linux distro’s live DVD will do for that.

        I suggested a number of things you may wish to clarify to enable people to give specific, pertinent advice, I won’t repeat them here. Depending upon what is required you may not need to get involved in arcane and error prone command line hacking.

      • This seems to be Slackware 10 or 11 from the two package versions we are able to see on the screenshot. That means it is running a 2.4.x kernel. That may be relevant if you need to reinstall something similar to get the backup software running. There are major differences from 2.4 to 2.6 ( which is way out of date itself now ) so anything you have intended to run on that machine is very unlikely to run on anything remotely recent.

        see PACKAGES.TXT for the various versions here:
        http://ftp5.gwdg.de/pub/linux/slackware/

        Again, define what you need to do and why you need to do it, and someone will come up with a solution.

        Maybe start with :

        is the only copy of the “backup” data on the RAID devices or do you have a physical backup elsewhere?
        what is the backup software that was used?

      • “anything you have intended to run on that machine is very unlikely to run on anything remotely recent”

        Not a showstopper. Install recent Linux distro on new machine. Install relevant Slackware distro in a virtual machine. Proceed from there. Similar to what I did to get Corel Word Perfect for Linux running recently on Cinnamon Mint 17.3. ISOs for old versions of Slackware are readily available for DL.

      • Thanks, that would be great if I could understand any of it. One of the downsides of Linux is that experts in it often speak in tongues…at least that’s how it seems to mere mortals. ;-)

        Well if you knew nothing about mechanics and put out a call for an expert to advise on how to get your engine to start, you would probably say the same thing about the advice you got.

    • Something’s either misconfigured or missing.

      That is the way it appears and is why the fs integrity needs to be checked as a first step. If it can’t spawn a tty you are not going to get runlevel 3 or an emergency console either !! So far that root fs looks stuffed though it may be repairable or at least partly recoverable for the data.

      I am guessing that the “backup” of the data is also on that drive and may or may not be recoverable.

  64. “Michael Palmer March 22, 2017 at 4:16 pm
    If you have a copy of the program, I would suggest trying to get it to run on a recent Linux installation. If it runs, then you can install a newer Linux version on your server, rather than fixing the old one.”

    Actually… Get a new server. Install Linux, or boot from a temporary Linux disk. See if the old software runs on it. See if you can access the old disk drives (actually, the copies). If so, then install Linux on the new box and try using that.

    Do not throw away the old server until you know that everything is working. Someone else suggested something similar, but their sequence was to toss the old server first.

    By the way… on the old server, consider replacing the keyboard. Maybe someone spilled something on it, and the keyboard might be flooding the console process and making it die.

    • As others here have observed, old hardware dies. Electrolytic capacitors dry out, magnetic domains on disks fade… One machine I had in my early days had what looked like a failure of the video adapter, but it wasn’t. The serial card was faulty. So it goes…

  65. I am going on the assumption that the disk is OK and something changed. So I would check the BIOS and make sure the serial port is enabled. I no longer have any bare metal Linux servers, so I can’t test it. But that is the first thing I would do.

    This is very old software. The LVM version included was released in November 2003.

    Good Luck

  66. Hi,

    It looks to be having problems with serial ports that aren’t live. Open /etc/inittab and look for the corresponding lines that match the message. Should be something like:

    c3:12345:respawn:/sbin/agetty 38400 tty3 linux

    From your display, you should have 6 of them.

    You need to place a # before each of those lines, save the file, then reboot.

    This should solve the issue.
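
    A rough sketch of that edit, assuming the lines look like the c3 example above. It prints a modified copy rather than editing /etc/inittab in place, and the sample content is made up for the demonstration.

```shell
#!/bin/sh
# Sketch: print a copy of an inittab with the c1..c6 agetty lines commented out.
# Already-commented lines and all other entries pass through unchanged.
comment_out_gettys() {
    sed 's/^\(c[1-6]:[^:]*:respawn:.*agetty.*\)$/#\1/' "$1"
}

# Demonstration on a throwaway sample rather than the real /etc/inittab.
sample=$(mktemp)
printf '%s\n' 'id:3:initdefault:' \
              'c1:1235:respawn:/sbin/agetty 38400 tty1 linux' \
              'c3:12345:respawn:/sbin/agetty 38400 tty3 linux' > "$sample"
comment_out_gettys "$sample"
rm -f "$sample"
```

    Bear in mind that with every getty line commented out there will be no login prompt on the local console, so this is more useful as a diagnostic step than as a permanent fix.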

    Best of luck.

  67. Anthony: If I understand your problem correctly: you have a backup of your data so we may leave that for the moment. You have an old PC, at least about 10 years old, with sufficient resources to run Slackware as a server OS. A new version of that distro may be downloaded from their homepage according to distrowatch.com if need be.

    Most Linux distros may be able to set up software RAID out of the box. It may be comparable in speed with your hw controller. Mdadm may be installed from the package administration. Possibly you may find a linux driver for your hw controller too, if you keep it but install another distro as your server.

    I have never used Slackware, didn’t find it interesting, though I have used various flavours of Linux since I started with SUSE (now OpenSUSE) when it was delivered on some 3.5" floppies. If I were you, I would have downloaded Ubuntu (Debian based) with the preferred desktop and used it as a basis for your server. Burn the downloaded ISO to a DVD or a flash drive, and boot up the PC from DVD or a USB port. Then you can check out if all your hardware works from the live DVD before you install it as a desktop or server on your PC. After that you may start up your selected raid 1 for mirroring the disks.

    You may keep the desktop running after finishing the install, but it should be easy to shut it down and start the OS in server mode. I installed Ubuntu for a friend as a server with wordpress. He preferred to have the desktop running to be able to handle it, but when he has finished installing a large photo gallery on wordpress, he may go for a clean server.

    To your raid solution: Throw it into the dustbin and install an ssd disk if the size and price is in your comfort zone. It may these days last as long as a HD solution, with a tremendous speed increase. No moving parts. Less power used.

    Those were my thoughts about your problem. This is written on an ASUS ZenBook UX 305 with an ssd disk, running Linux Mint 18, with use of LibreOffice Writer from where I copied it to your wordpress message window.

    Good luck with your server.

    Greetings from The Land of The Midnight Sun (North Norway)

    • For SSD fans:

      Do note that despite “no moving parts” there is “cell wear” from write operations. The SSD in the Mac from which I am posting this died after about 5 years of use (data recoverable both from cloud and local TB disk). I’m presently posting using an external SD card for file system and OS (as we bought a new Mac for the spouse and I inherited this “broken” system). On my “someday” list is to install a new SSD (much faster than the SD card, that is Bog Slow…)

      So expect your SSD to be good for “a few years” but also expect that heavy read / write can cause it to fail catastrophically and much sooner than a rarely used hard disk…

      • Thanks for this. I’ve got SSD on a 5-year old Mac (backed up six ways from Sunday) but I was unaware of the relative weakness of SSD vs HD. Good to know!! (Just when I’ve got the proper grooves worn in the keys, the damn thing is gonna fail on me, I know it…)

  68. I still don’t understand the exact problem because Anthony has been a bit vague. It looks like getty can’t start because it can’t connect to the virtual consoles. In other words, no login prompt. Commenting out lines in inittab will still mean no login prompt.
    But is that desired anyway? What hardware is this? Does the program you need run off the network or do you login with a gui?
    Tell us what it used to do and we can probably help get it back to that state.
    Although as already said, things that are ten years old and broken are very hard to fix. It would probably be easier to reinstall from scratch.

    • Before worrying about why tty’s aren’t there, the first task is to see whether the root fs is corrupted or not.

    • The “exact problem” can’t be known until it is fixed. Until then, it is just speculation based on diagnostics.

      From what is known, there are several potential causes, from hardware to bios to configuration and on up to RAID failures on the /boot /root areas.

      I’d approach it by bypassing as many of those as possible with new gear / OS and then recover the data; after that, work backward to find the “exact problem”… if terminally curious…

  69. So, Anthony were you able to get the problem fixed? As I said near the top of the post. I suffered through something similar last year. I had an older IBM xSeries server running as a simple Linux file share box. I didn’t know there was any thing critical on it until someone in marketing said they couldn’t get to their data. It was old data, and I told the users that the box wasn’t backed up and was slated to be retired. I was lucky in that I could boot from USB and pull the data off. The punctured RAID array came about because I didn’t update the firmware for over 5 years.

    Lesson: keep the patches and firmware microcode up to date. The value of WhatsUp lies not in the hardware, software, etc… but in the content. You may wish to reevaluate things and consider moving your critical data to the Cloud. If you still prefer Linux because of the low cost, there are plenty of Cloud providers that allow you to spin-up Linux distros, and they charge per subscription. When you have data sitting on 10 year old unpatched hardware you’re just asking for trouble.

    Good luck.

  70. can you post a followup with the fixes, root issue, etc?
    would be good to learn from it as i am interested.

  71. Anthony, EM Smith offered upthread to drive over and help you fix it.

    BTW we should take care not to share any details that could help a potential hacker.

  72. For what it’s worth Anthony, given you have lots of advice already but I take it you haven’t resolved this yet, the first option is to get someone who knows what they’re doing in front of you and the computer – so take up EM Smith on his offer.

    Failing that, I take it that this is mostly about recovering that machine to an operable state, because otherwise you have to rebuild that machine to use the software. And as we all know, that can be a lot of time/work.

    So, I’d work through the likely causes:
    Firstly, is it a corruption on the drive? You say you made a copy, but that may have been a copy of something that was corrupt in some way. So boot from a recovery system (CD or USB drive – Knoppix, RescueCD will work, but I see there’s a liveslak which probably is your best bet: https://docs.slackware.com/slackware:liveslak), and then see if it will mount the disk. If it does, you can rummage around to see whether it all looks right, and you can run fsck (the Linux counterpart of chkdsk) to see if linux thinks it is correct. If it doesn’t, you’re into recovering that disk/filesystem, which is a more error prone process.

    IF the hard drive seems sound (you can mount it and the files all look to be there) you can do a CHROOT to try to run from it without running a full boot process. http://docs.slackware.com/howtos:slackware_admin:how_to_chroot_from_media
    No doubt having someone like EM Smith there to do this for you would be much better. Depending on what you find, you can then decide whether you’re better to fix that old disk to make it bootable again (maybe a couple of key boot files are corrupt, replace them and you’re off), or just make a new install and bring the data and config across.
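
    For reference, the chroot step might look something like the sketch below when run as root from the live system. The device name and mount point are placeholders, the root partition is assumed to be a single plain partition, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: ROOT_DEV and MNT are placeholders. Run from a live CD/USB as root.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
ROOT_DEV="${ROOT_DEV:-/dev/sdx1}"
MNT="${MNT:-/mnt/oldroot}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enter_chroot() {
    run mkdir -p "$MNT"
    run mount "$ROOT_DEV" "$MNT"          # read-write: do this on the clone only
    run mount --bind /proc "$MNT/proc"    # give the old system a view of /proc
    run mount --bind /dev "$MNT/dev"      # and of the live system's device nodes
    run chroot "$MNT" /bin/bash           # shell running inside the old install
}

enter_chroot
```

    Leaving the chroot is just a matter of typing exit and then unmounting the bind mounts and the root partition in reverse order.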

    IF the data is not good, then recover from backup and reinstall the computer.

    IF the data is good, but it can’t be made to boot, then install a new OS on a clean disk – slackware if you wish – and then copy the data into it and reconfigure the software you need as appropriate. Or bite the bullet and put this onto another machine you already have – so you have one fewer machines to maintain.

    • As noted in a comment a bit above:

      Anthony has taken me up on my offer, but was “otherwise occupied” until today. I’m now checking in to find out current status and prep.

  73. Linux starts with a loader utility called GRUB which is prior to the screen-shot at the top. This loads a kernel image and minimal root file system compressed as single file with a name starting initrd…. , that much seems to work. It is that initrd which is showing LVM etc.

    It then loads ( mount ) the real root fs in read only mode, that mount process works but does not mean it is not corrupted. The kernel will then attempt to switch from the initrd minimal fs in memory to the real fs on the disk. This is where it falls over. The root fs is corrupted.

    So I purchased two new identical HD’s cloned the good one, and rebuilt the RAID1.

    Anthony, you have expressed above that you have little to no knowledge of linux commands so how was this “clone” done? Using windoze ?

    • Is that the correct way to replace a failed RAID1 disk? I think you should have just added the new virgin disk and used the Adaptec firmware to rebuild the array.

      If you have one byte for byte accurate clone of the working drive ( which you wisely are not using ) then put that in, disable RAID and it may boot. To restore RAID you should let the firmware handle it.

  74. Hi,
    Just to add my voice to the confusion. I held down a job supporting a one billion dollar embedded software project running linux for about 15 years. I did the kernel drivers and os system software.

    I would say that the fundamental problem is new hardware with old software. You have upgraded the controllers So I purchased two new identical HD’s cloned the good one, and rebuilt the RAID1 but used the old software This machine was built circa 2007, and has Slackware Linux of that era installed.

    So I would suggest getting a late version distribution and trying to mount the disks with that.
    good luck,

    • I thought quotes would work. The middle paragraph should be:

      I would say that the fundamental problem is new hardware with old software. You have upgraded the controllers “So I purchased two new identical HD’s cloned the good one, and rebuilt the RAID1” but used the old software “This machine was built circa 2007, and has Slackware Linux of that era installed”.

  75. Anthony. I’ve been thinking on this, and I have a theory that seems to cover all the facts you’ve posted. It’s not all that likely, but I’ll present it here anyway.

    Let me preface it by saying that clearly a lot of stuff works. Your PC passes its BIOS quality checks. Otherwise, it would just sit there and beep at you. It can find a boot disk, read the first sector, find the boot partition specified in the first sector, locate and load several files — a kernel image (probably called bzImage), an initialization ramdisk (initrd), and an initialization control file (inittab). It’s failing when inittab processing tries to set up “virtual consoles” that allow the keyboard and monitor to pretend to be a terminal. You need one virtual console to log in and run stuff, but for whatever reason Linux traditionally sets up six of them.

    It’s certainly possible that something needed to set up the virtual consoles is broken — in which case, you’ll probably need to reinstall. However, I’m thinking along different lines. What I think could have happened is as follows.

    1. You have elderly hardware there. It comes from an era before programmable non-volatile RAM (NVRAM) was ubiquitous.
    2. The PC boot process requires a little bit of configuration information. That information is stored in NVRAM. From 1985 for two decades or more, NVRAM was implemented as a small static RAM that is powered from a battery when the PC is off. The battery can be either non-rechargeable — typically a CR2032 Lithium coin cell — or rechargeable — typically a small barrel shaped object on the motherboard. Neither kind of battery lasts forever.
    3. PC hardware manufacturers try to set things up so that users don’t have to change the configuration information. By 2003 or so, they had gotten pretty good at it. That’s a good thing because tinkering with BIOS setup parameters is not for the faint of heart. But nonetheless it is sometimes necessary. For example, the 2003 era Walmart $200 PC I’m typing this on ran fine with default settings when I got it. But when I upgraded the memory a few years later it would crash every now and then until I slowed down one of the memory timing parameters in BIOS setup.
    4. So what I think may have happened is that the guy who set up your PC found that it didn’t run quite right until he tweaked some BIOS setting or other. Problem solved. … Until the “CMOS battery” died. And being a server running 24/7 it continued to run fine. Until the PC was powered off. It only failed when you tried to start it after a period off line.

    If my theory is right, it’s entirely possible that nothing you try will quite work. What do you do then? Well you COULD try going into BIOS setup (You push some magic keys while booting — which keys vary with the BIOS. You may see a message telling you what key(s) flash by when the boot is just starting) then resetting everything that looks like it affects timing to the most conservative possible setting. (Needless to say, you should write down the original settings) then rebooting without powering off. If the PC then boots into Linux, replace the battery and either live with a slower machine or spend more time than you really want to tinkering with CMOS settings.

    If you are more comfortable with Windows, there is probably a program somewhere that will allow you to read the ext2 file system on your disks from Windows using a $30 or less USB to IDE(?) adaptor. (I think both disks in a RAID1 array are identical and that RAID is only relevant if you are trying to use the disk in a RAID array , but that’s not something I know much about)

    Good luck.

    • The CMOS battery is a credible scenario, one of the first symptoms is the real time clock losing count, so one could look at the BIOS settings and check the clock. However, if it loses important settings you will normally get a CMOS checksum error on boot and will need human interaction before it will go any further. This was not reported, so I would guess this did not happen.

      Maybe it is possible that this affected the RAID config , giving the impression of a defective disk.

      The RAID is administered by an on-board Adaptec RAID controller, and it reports the RAID as healthy.

      One thing I am curious about is the fact that linux kernel seems to be seeing two identical disks at start up. If this was running configured with a hardware RAID controller I would expect the kernel to see a single disc device, not two. Maybe not all firmware works that way. Since we don’t have anything more than “Adaptec” we can not check it out.

      Secondly, from the screen-shot, it seems that linux is trying to provide software RAID in the kernel. Since the 3rd party who configured all this is apparently not available for comment, we could guess that the kernel was configured with RAID support but that was not used if it was indeed done using the firmware RAID. Alternatively, the firmware RAID was not used and it relied on software in the kernel.

      Apparently the kernel image file and initrd are found and read correctly by GRUB , I’m guessing a likely default being the first partition of the first scsi disc. It should be noted that this is NOT Linux reading the fs at this point but GRUB, the boot loader.

      The kernel is then mounting an ext2 fs without error which means there is one on that partition. The GRUB menu needs to be examined to see which partition it is taking as the ‘root’. It will then try to chroot to that fs replacing the minimalistic one it has in memory. This does not mean that this is the original fs intended if the config has changed causing device names to move and change.

      INIT will then start trying to use the root fs. It looks like that partition is corrupted or may not be the intended root fs at all. The content of that fs needs to be examined and checked, which takes us back to the earliest comments that were offered about booting from a rescue disk of some kind.

      This is all good fun but a bit of a guessing game unless Anthony wishes to post some more information from the system.
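
      Checking which partition the boot loader hands to the kernel as root can be done from a rescue boot with a simple grep. The sketch below assumes a LILO-style config (normally /etc/lilo.conf on the old root; a GRUB legacy setup would keep the equivalent in /boot/grub/menu.lst with a different syntax). The sample content is invented for the demonstration.

```shell
#!/bin/sh
# Sketch: show which devices a LILO config names for boot= and root=.
# The path is passed as an argument; commented-out lines are ignored.
show_boot_settings() {
    grep -E '^[[:space:]]*(boot|root)[[:space:]]*=' "$1"
}

# Demonstration on an invented sample, not a real configuration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# hypothetical lilo.conf excerpt
boot = /dev/sda
image = /boot/vmlinuz
  root = /dev/sda1
  label = Linux
  read-only
EOF
show_boot_settings "$sample"
rm -f "$sample"
```

      If the root= device named there no longer matches where the filesystem actually lives after the disk swap, that alone could explain INIT running against the wrong partition.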

      • Greg

        Mostly I agree. I don’t think you have to have a boot manager to boot Linux. Even if one is needed, a circa 2007 Slackware would give you the option of installing LILO in its install menus with GRUB — if it’s available at all — relegated to some sort of optional extra list. Since there doesn’t seem to be a Windows partition, I’d bet on LILO or no boot manager at all.

        In any case, I agree that if one can ever get to a bash prompt via normal boot or single user mode or a stand alone cdrom or usb linux system or whatever, the first thing I’d do is run fsck.ext2 (fsck is the Unix equivalent of the MSDOS chkdsk Anthony) on the disk partitions.

      • No, you do need a bootloader image installed to bootstrap the system, even if it only has one OS to boot, that’s the way PCs work. You are correct Slackware uses LILO, not GRUB.

        The fsck check should be done initially with the partition mounted read-only to ensure nothing gets “corrected”. Others have said that but just recapping in case it gets missed.
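
        One way to get a look-but-don't-touch check is the -n option of fsck.ext2 (e2fsck), which opens the filesystem read-only and answers "no" to every repair question. A minimal sketch, with /dev/sdx1 as a placeholder and the partition assumed to be unmounted; it only prints the command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: /dev/sdx1 is a placeholder for the partition to check.
# With DRY_RUN=1 (the default) the command is printed, not executed.
DEV="${DEV:-/dev/sdx1}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

check_fs() {
    # -n: open the filesystem read-only and answer "no" to all questions.
    # -f: force a full check even if the filesystem is marked clean.
    run fsck.ext2 -n -f "$1"
}

check_fs "$DEV"
```

        If the report is clean, the root fs is probably not the culprit; if it lists errors, a real repair run belongs on the clone, never on the original disk.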

      • You’re right that you need to continue the boot process beyond executing the code in the Master Boot Record. I’d assumed that Linux distributions, like the DOSes and Windows 9x, defaulted to a simple next-stage bootstrapper like syslinux in the boot partition’s Volume Boot Record. But apparently not.

        I always tried to preserve the Windows OS that came with the machine so I always installed GRUB or LILO or booted unix from Windows (loadlin) back before Microsoft decided that mere users (even admin) were not to be trusted with that awesome power. It looks like Anthony’s machine probably has nothing to dual boot to so the bootloader will likely be whatever the guy who set it up favored. Maybe LILO. Maybe something he carried around on a CD or floppy.

        I also agree that running fsck read-only initially is the proper way to do things. However, that requires tinkering with mount or (worse) fstab. Running on a copy of the disk I personally would probably decide that I was going to try booting from the “repaired” disk no matter how gruesome the fsck output. What’s to lose? (Hmmm. What happens when you run fsck against a RAID array??? That question might be worth pondering for at least a few seconds)

  76. And here we are, three days later… It would be nice to hear what happened. Unless there is some report, it leaves a not-so-nice impression that, apparently, all that good advice, including E. M. Smith’s generous offer to drive up, was just *piff* ignored. Bummer.

    Hope you are okay, Anthony. Hope the tech issue is resolved.

    • I agree, because I fix very old computers for poor people for free…(hardware and software), but I have never worked on a system using Linux, so I would be very interested in knowing what exactly failed …
      P.S. Hi Janice, you young librarian you !! LOL

      • I don’t see any evidence of anything Linux related having failed here. There was a hardware failure and our host has lost contact with the guy who put this together for him. He lacks the expertise on Linux himself and apparently did not do any system maintenance for a very long time. Both the h/w and s/w seem to have performed sterling service over the last 10y.

        The Linux kernel booted, but at the moment of switching to the root fs on the disk it got stuck. That fs does not hold a valid Linux root fs. My guess is that this is not the same partition as was originally being used (or it has been seriously corrupted).

    • Janice,

      Anthony contacted me same day as my first post. He had some priority things to do ( data that has been idle for a few years is important, but not time urgent). We agreed to me being available today and forward, so I’m catching up now and prepping to drive, IFF still needed.

      So no worries on Anthony and “piff”, OK?

      BTW, further up thread Don K & Greg were talking about recovery boot processes and Windows et al.

      One thing I’m going to ask Anthony, when on site, is “Why Linux?”

      I love it and live on it, but others prefer Windows. It would be very easy to toss a copy of the data onto an NTFS disk if desired. Part of my kit is a Dogbone Stack of 4 Linux boards ( 3 x Raspberry Pi and one Odroid) with support for most filesystems installed. I usually have ext3, ext4, ntfs, FAT32, and Macintosh file systems mounted at any one time. Useful for moving data cross OS types…

      In any case, a modern Linux with NTFS support would allow archiving to a Windows-readable disk (with some minor metadata issues, so choose the archive format wisely).
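
      As one illustration of the archive format point: a tar file carries Unix owners, permissions and timestamps inside the archive, so they do not depend on what the NTFS side stores. This is only a sketch under that assumption; the paths and the function name are hypothetical, and the Windows machine would need an archive tool that reads tar (7-Zip does).

```shell
#!/bin/sh
# Sketch: pack a directory tree into one tar file so that Unix owners,
# permissions and timestamps travel inside the archive rather than
# depending on what the destination filesystem can store.
# $1 = directory to archive, $2 = archive file to create.
archive_tree() {
    src=$1
    out=$2
    tar -C "$src" -cf "$out" . || return 1
    # Listing the archive reads it back end to end as a basic sanity check.
    tar -tvf "$out" >/dev/null
}

# Example (hypothetical paths; /mnt/ntfs is the Windows-readable disk):
#   archive_tree /mnt/old/data /mnt/ntfs/climate-data.tar
```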

      • Thanks for bringing us up to date. The absence of any response to your offer here led me to think there was some reason our host did not want to take you up on it but did not feel it was something he wanted to make public.

        You sound eminently qualified so no need for further suggestions. It would be interesting to know what it was when you find out. I strongly suspect that partitions have moved due to disk swapping and the partition being mounted is not the one where the root fs is.

        It sounds like AW would be more comfortable with Windows, though in response to the “why Linux” I would say more secure, more reliable and less likely to get blown out by a forced update being pushed onto the system in the future.

        /my2c.

        Good luck.

  77. ..The failed CMOS battery was already suggested and laughed at !! ..Sigh !!

    Butch
    March 22, 2017 at 2:01 pm

    I had something similar happen to a friend’s old computer. After he had it unplugged for a few hours during a house move, when he started it back up again, he got a message about not finding the RAID drive… I went into his BIOS setup and found that the BIOS settings had been reset to Factory Default… The BIOS date was set back to 2004 !!! This was caused by the death of the internal BIOS battery. I replaced the battery and then reset ONLY the BIOS date to the present date and it rebooted fine… Some adjustments later to get everything updated, but the hard drives did work… (and it only cost $1.00 for the battery…LOL)

    ckb
    March 22, 2017 at 3:21 pm

    The above is a great story and great advice, but I find it hard to believe Anthony got as far as this without noticing the clock was reset in the BIOS!

    • Never assume what others ( or yourself ) are capable of overlooking. Especially when panicking about losing “irrecoverable” data.

  78. If you installed that system in 2007, then it’s way out of date by now, so forget the detail and do a fresh install on a new pair of mirrored drives. There have been dozens of updates and patches to Linux since then, including security, so it’s far better to be using a current distro if you can. If you can build the RAID to include a hot swap spare, do that as well, as it will automatically rebuild the mirror if one of the drives fails. Put both of the old drives to one side and wire them back into the system to recover the data once the new system is built and fully configured. Favourite distros here are Debian and Suse, both of which are very robust and just work out of the box.

    I do electronics and embedded computing here and use a variety of os’s; linux, freebsd and solaris, for example. Freebsd includes the zfs file system as standard and is well worth considering for critical systems for that reason alone. Just built a remote network backup server recently using freebsd and it just works. The servers and machines here are in the lab, while the backup server is in the house, so if either burns down, there’s no data loss.

    I’m in the uk, but would be glad to help if needed, if only remotely…

    Regards,

    Chris

    • Chris: “Freebsd includes the zfs file system as standard and is well worth considering for critical systems for that reason alone. […] I’m in the uk, but would be glad to help if needed, if only remotely”

      You may want to seriously consider Chris’ offer to help, Anthony. ZFS will leave you grinning.

      I entirely agree anyone farming large collections of critical data will adore ZFS. The administration is dirt simple once you get the hang of it. And you can forget about holes in your data on RAID-like arrays. It’s self-monitoring (can send you an email if there’s a data discrepancy) and self-heals on command. Unless the disc is just busted, in which case you just replace it with another in the array and it immediately resilvers. ZFS mirrors are hot-swappable by definition.

      Now also available on Ubuntu 16.04 LTS though not (yet) its native filesystem. But easy enough to download and put to work immediately.

    • Chris, AFAICT, the reason for wanting to get this hopelessly unmaintained system back up is that the “backup” seems to rely on some specific software (maybe compression or incremental backups) which is on the drive, and that software will almost certainly not work on an up to date installation.

      Some did suggest then installing the old Slackware in a VM but that would be a bare system and our host is apparently very unfamiliar with linux, so that would be an extra burden. If the disk is not corrupted it would be easier if it can be made to boot.

      • Greg – It may be possible to get one of the 2 drives booted, but the problem with that is that if the drive contents are even slightly corrupted, the boot process may make that worse. If the system crashed, the drives may not be in a sync’ed state and the boot process file system check may not recover from that.

        All in all, better to build a clean system, then mount each drive in turn, read only, to recover the data. You can use dd or similar to create an image of the drive to a file, or even another drive…
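
        A minimal sketch of the dd approach, assuming GNU dd and sha256sum on the rescue or new system. The device and file names in the example are hypothetical, as is the function name, and the image must be written to a different disk from the one being copied.

```shell
#!/bin/sh
# Sketch: copy a whole drive (or partition) to an image file and record
# a checksum of the result. $1 = source device or file, $2 = image file.
image_drive() {
    src=$1
    img=$2
    # conv=noerror,sync carries on past unreadable blocks and pads short
    # reads with zeros, so data after a bad spot keeps its offset.
    dd if="$src" of="$img" bs=64k conv=noerror,sync || return 1
    # Store the checksum next to the image for later verification.
    sha256sum "$img" > "$img.sha256"
}

# Example (hypothetical names; the target must be a different disk):
#   image_drive /dev/sdb /mnt/backup/old-disk.img
#   sha256sum -c /mnt/backup/old-disk.img.sha256
```

        One caveat with conv=sync: if the source size is not a multiple of the block size, the last block is zero-padded, so the image can come out slightly larger than the source. All further poking around can then be done on the image or a copy of it, leaving the original drive untouched.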

        Chris

      • Booting from a live DVD would be a “clean system”. That allows the initial ro check of what is on one drive copy. With the state of play shown above, it did not get as far as fsck.

        In any case I think this has all become academic since Anthony has barely responded to the many comments and offers of help, so I would conclude he has dropped his ideas about recovery.

  79. What’s the problem? Oh, the AGE of the system? Yeah, I am still using my 2003 Compaq Presario with Windows XP, and so far, the only thing that has ever failed me was the HP backup drive that crashed with 3 months worth of photographs of wildflowers and birds that I may be able to recover by having some tech geek go in through the back door to find them. Like an idiot, I did NOT copy any of those pictures to a more reliable backup as a duplicate.

    Lesson learned the hard way. Always have a backup that is completely reliable. I’m no computer expert by even a pig’s whisker, but this IS the real disadvantage in relying too much on electronic storage instead of keeping hard copies or discs not attached to the computer. It’s the reason I have print copies of any finished manuscript and the notes that go with it, as well as a copy on disc and/or jump drive from a reliable company.

    Anthony, I wish you well. The data is still there. It’s just you who are blocked from accessing it.

    • “Lesson learned the hard way. Always have a backup that is completely reliable. ”

      … and don’t keep it on the same physical medium as the original ;)

      • Or in the same physical location…

        You can get nice 4 TB USB disks now that fit in a sandwich box… Copy to one and put it in a safe deposit box at the bank. Refresh quarterly (or as needed).

        Or put it at a friend’s house if you don’t need the security / privacy of a bank…

  80. Been running slackware since march 95 , , ,
    To start with a clean system on a plain primary disk is good advice .
    Entire old system can be restored to a chroot tree if need be . ..
    you can run 57 flavors of userspace at the same time on a single kernel . .

    ######################################################
    ### Linux Mint 17.3 /home/data2/chroot2/LxMint17.3 ###########
    /proc /home/data2/chroot2/LxMint17.3/proc auto bind,noauto,ro 0 0
    /dev /home/data2/chroot2/LxMint17.3/dev auto bind,noauto 0 0
    /sys /home/data2/chroot2/LxMint17.3/sys auto bind,noauto,ro 0 0
    /dev/pts /home/data2/chroot2/LxMint17.3/dev/pts auto bind,noauto 0 0
    /tmp /home/data2/chroot2/LxMint17.3/tmp auto bind,noauto 0 0
    /home /home/data2/chroot2/LxMint17.3/home/homeh auto bind,noauto 0 0
    /home/data2 /home/data2/chroot2/LxMint17.3/home/homeh/data2 auto bind,noauto 0 0
    /home/data3 /home/data2/chroot2/LxMint17.3/home/homeh/data3 auto bind,noauto 0 0
    ##########################################
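
    A rough sketch of how entries like these could be put to use, assuming they sit in /etc/fstab: because they are marked noauto, each one is mounted by naming its mount point, and then chroot enters the tree. The helper below is made up for illustration and only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch: print (not run) the commands to enter a chroot whose bind
# mounts are listed as "noauto" entries in an fstab-style file.
# $1 = fstab-style file, $2 = chroot directory.
chroot_commands() {
    fstab=$1
    root=$2
    # Field 2 is the mount point; keep uncommented entries that live
    # under the chroot directory, in file order (/dev before /dev/pts).
    awk -v root="$root" '
        $1 !~ /^#/ && index($2, root "/") == 1 { print "mount " $2 }
    ' "$fstab"
    echo "chroot $root /bin/bash"
}

# Example, using the chroot path from the listing above:
#   chroot_commands /etc/fstab /home/data2/chroot2/LxMint17.3
```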

    . . . Errors seen make it look like /dev is missing / lib missing etc
    Recovery is 1E^42 times easier using another linux system configured
    enough to be a comfortable place to be, on the target hardware or not.

    You don’t want to be anywhere near stuff you don’t want to lose while you’re
    getting something running . . .

    Linux can have multiple root disks, multiple kernels for each,
    you can pull the broken system back on board,, fix it,, add it to lilo and boot it,,
    slackware is one of the easiest on the planet to do this with because of its simplicity.
    . . . no dumbass systemd to get in your way etc . .

Comments are closed.