Climate Audit is down

While we are on the subject of hardware failure (such as has hit the DMSP satellite that NSIDC and Cryosphere Today use), Climate Audit is down due to a file system or HD error. It happens. I’m on my way to the Colo (90 miles away) to effect a repair. Comments may be delayed for a few hours if other moderators aren’t online.

UPDATE: 5:30PM

The Climate Audit server is in fact RAIDed; I built it that way for just such an emergency. But some corrupted data was written before one disk of the array failed. Since I could not stay at the CoLo all day, I’ve brought the CA server to my office for repairs. Hopefully the RAID rebuild goes smoothly (it takes several hours) and I’ll be able to repair the problem areas. The hard drives were both new, RAID-quality units with a 3 year warranty. One failed 1.5 years into the warranty – that’s Murphy for ya.

Wish me luck, otherwise I have to rebuild from scratch and restore from backups which is also a chore.

Just for those who like to know about hardware, here is what Climate Audit runs on:

3.4 GHz Intel Pentium D CPU

2 GB ECC DDR2 400 RAM

RAID1 Dual Western Digital 250GB SATAII drives with 16MB cache RAM

Running Linux with WordPress in LAMP config

1U Intel server enclosure (image: Intel 1U SR1325TP1)

Thanks to those who hit the tip jar.

UPDATE: 8:30PM

One hard drive of the RAID failed. Now before you panic let me say I anticipated this (but like 2 years from now) and this was a RAIDed system with two drives set up to mirror. Normally when one drive fails, I can unplug the other and reboot the system and it will come up and run on the one, then I can install a new second drive and rebuild the RAID, and off we go.
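For readers who want to watch for this kind of failure on their own mirrors, here is a minimal sketch of a degraded-array check. It assumes Linux software RAID (md), which reports status in `/proc/mdstat`; the post does not say whether the CA server uses software or hardware RAID, so treat that as an assumption. The function name `degraded_arrays` and the sample text are made up for illustration.

```python
import re

# Illustrative sample of /proc/mdstat for a two-disk mirror with one failed member.
# (Assumption: Linux md software RAID; the CA server's actual RAID type is not stated.)
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0](F)
      244195904 blocks [2/1] [_U]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: (active_members, expected_members)} for degraded arrays."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md0 : active raid1 ...".
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The following status line carries "[expected/active]", e.g. "[2/1]".
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (active, expected)
            current = None
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))
```

A check like this only reports that a mirror member has dropped out. It cannot tell whether bad data was written to the surviving drive, which is the situation described in this update.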

I’ve done that dozens of times in my own systems. It is why I built the CA server the way I did. It is an identical server to 15 others I’m running here.

But for some reason known only to Murphy, this time when the system failed sometime last night, it appears it wrote corrupted data to the “good” drive before the full hardware failure. So at the moment the system is unbootable.

The good news is that most everything should be recoverable, but it takes time. If I can’t repair the boot sector on the good drive, then we have to rebuild two new drives from scratch, mount the one good drive, and pull files over. Though I don’t know just yet how much corruption there is and how much of it can be fixed.

The annoying thing is that these mirrored Western Digital 250GB drives had only 1.5 years on them, and were less than 10% full. They were brand new when I purchased and installed them specifically for CA. They have a 3 year warranty. They’ve been in a temperature controlled and dust controlled environment at the CoLo. For one to totally fail now is quite the surprise. I wasn’t all that worried about regular backups due to the RAID mirroring, and now the RAID fails along with the drive.

I was able to rebuild the RAID, but it appears that the boot sector is corrupted. This will require booting from a CD-ROM, mounting the drive, fixing the file system, and making copies that way.

Best laid plans….

I anticipate it will be Monday evening before CA is back up and running.

150 Comments
Ceolfrith
February 21, 2009 10:06 am

Off Topic!
Did you realise this site is now the repository of all wisdom and knowledge on the truth of global warming?
The UK Telegraph says so 😉
http://www.telegraph.co.uk/comment/columnists/christopherbooker/4742293/Climate-change-rhetoric-spirals-out-of-control.html

Hugo M
February 21, 2009 10:24 am

Anthony, telling from the diagnostics appearing on my screen when accessing CA, the problem is a read-only mounted filesystem for /tmp and /var.

John G. Bell
February 21, 2009 10:41 am

Hugo,
Actually the OS has made those filesystems read only likely due to some sort of filesystem error. That could easily be due to hardware error, disk, memory, power or the like. The OS is trying to save the filesystem from further corruption by preventing writing.
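As a rough illustration of the read-only remount described above, here is a small sketch that lists which mount points are currently read-only. It assumes the Linux `/proc/mounts` format (device, mount point, type, options); the function name `read_only_mounts`, the device names, and the sample text are hypothetical and are not the actual CA server's layout.

```python
# Illustrative sample in /proc/mounts format: device, mount point, type, options, dump, pass.
# (Assumption: device names and layout are invented for this example.)
SAMPLE_MOUNTS = """\
/dev/md0 / ext3 rw,errors=remount-ro 0 0
/dev/md1 /var ext3 ro 0 0
/dev/md2 /tmp ext3 ro,noatime 0 0
"""


def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains the 'ro' flag."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Match the exact 'ro' flag, not options such as 'errors=remount-ro'.
        if "ro" in options:
            result.append(mount_point)
    return result


print(read_only_mounts(SAMPLE_MOUNTS))
```

On a live Linux system the same function could be fed the contents of `/proc/mounts`. A filesystem that is normally writable showing up in this list is a hint to go and read the kernel log for the underlying error.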

February 21, 2009 11:37 am

Quite important the article in the UK Telegraph, now WUWT is in the front line against wrong data. GWrs must be trembling!

Fred Gams
February 21, 2009 11:40 am

RAID is your friend, use it!

MartinGAtkins
February 21, 2009 11:44 am

Trust WUWT to be first with the news. I was about to post a message asking if anyone else was having trouble.
Is it worth blogging about?
You bet it is.

Paul Penrose
February 21, 2009 12:09 pm

Thanks for dropping everything and running out to repair it Anthony. Your efforts are very much appreciated.

jpt
February 21, 2009 12:12 pm

Have a safe trip!

TerryS
February 21, 2009 12:13 pm

Remote administration is a pain especially when you have to physically go to the site for what will probably turn out to be a 10 minute job.
One of my computers is in an inaccessible place so I used a motherboard with Intel Active Management features. This allows me to remotely cycle the power and select a device to boot from (via a web browser). I’ve put in a DVD that boots a full Linux system so if it gets into a state whereby it can’t boot from the hard drive I can remotely boot it from the DVD. So far I’ve had to use it twice, both due to power failures, and I managed to get on it both times to repair the filesystem.
It’s something you might want to consider, after all, think of all those CO2 emissions from the 180 mile round trip.

MartinGAtkins
February 21, 2009 12:23 pm

Hugo M (10:24:09) :

Anthony, telling from the diagnostics appearing on my screen when accessing CA, the problem is a read-only mounted filesystem for /tmp and /var.

Stupidly I didn’t log the error output. He whose name we must not mention would be very unhappy with me.

February 21, 2009 12:48 pm

Anthony,
Why not do what other climate sites do; go ahead post corrupted data and only bother to correct it if someone notices? 😉

Editor
February 21, 2009 1:13 pm

Hey, this is almost a relevant thread for:
Blog Stats
* 9,000,454 hits
and counting…
Congrats, many happy returns and many happy readers, but remember you
can’t please everyone. 🙂
BTW, above that text, the headings are:
RECENT COMMENTS
A
[select category]
SHAMELESS PLUG
When you get a chance, please restore “A” to whatever it was months ago.
I hope the CA repairs go well.

Bernie
February 21, 2009 1:15 pm

Seems like the real “team” hangs around WUWT and CA! Donation is to help defray some of the expenses.

Pierre Gosselin
February 21, 2009 1:48 pm

Concerning the excellent Telegraph report, it’s obvious that the Beeb and like media are politically motivated and could give a rat’s butt about verifying information. We’ve entered a new media disinformation era. They really think they can establish a new reality by repeating lies.
I’m glad to say that I was involved in calling the NSIDC for what it is: an institute that is in shambles.
But I’ll respect Anthony’s request that we be respectful of Dr. Meier. Personally, the NSIDC doesn’t deserve the respect as they have made their bias clear months ago.
Thank you UK Telegraph!!
Your website is a daily must visit for me.

Mark
February 21, 2009 1:57 pm

The only thing that makes me nervous about this article is that the guy who is writing it has shown some interesting views on other scientific ideas:
http://en.wikipedia.org/wiki/Christopher_Booker
Via his long-running column in the UK’s Sunday Telegraph, Booker has claimed that man-made global warming was “disproved” in 2008[1], that white asbestos is “chemically identical to talcum powder” and poses a “non-existent risk” to human health[2], that “scientific evidence to support [the] belief that inhaling other people’s smoke causes cancer simply does not exist”[3] and that there is “no proof that BSE causes CJD in humans”[4]. He has also defended the theory of Intelligent Design, maintaining that Darwinians “rest their case on nothing more than blind faith and unexamined a priori assumptions”.[5]
His scientific views have resulted in the initiation of the “Christopher Booker Prize 2009” [6] offered for producing clap-trap about climate change.

idlex
Reply to  Mark
February 23, 2009 5:19 am

that “scientific evidence to support [the] belief that inhaling other people’s smoke causes cancer simply does not exist”
It doesn’t exist. Or rather, about 6 out of 7 studies indicate no risk, and in a few cases show a positive benefit.
He has also defended the theory of Intelligent Design, maintaining that Darwinians “rest their case on nothing more than blind faith and unexamined a priori assumptions”.
Booker recently wrote about the adulation heaped upon Darwin, and some of the difficulties with the theory of evolution. It did not amount to a defence of Intelligent Design.
Personally I don’t have any problem with the theory of evolution. But I’m pretty thoroughly sick of Darwin sycophancy in general, and Richard Dawkins in particular. To that extent, Booker’s debunking of Darwinists came as a welcome change. Particularly since he joined it with an attack on global warming simplemindedness.
Finally, both the neutrality and factual accuracy of the Wikipedia article about Christopher Booker are disputed.

Hugo M
February 21, 2009 2:16 pm

Well, MartinGAtkins, the name of the ancient god you are refusing to articulate is certainly Lucifer — stemming from lat. “lux” and “ferre”: he who brought the light to mankind …

tallbloke
February 21, 2009 2:17 pm

think of all those CO2 emissions from the 180 mile round trip.
Especially with all those big heavy batteries Anthony lugs around in his car. :o)
I think we should have a whip round for the tip jar. Who’s in?

Leon Brozyna
February 21, 2009 2:29 pm

Hi Yo Silver, Away!

Editor
February 21, 2009 2:29 pm

The number of times I’ve warned clients to BACK-UP the DATA! Of course, all my backups are up to 3 months old…. but no one is hanging on my every syllable or meme. I may have more to say about “memes” later. Talk about pseudo-science!

Mark T
February 21, 2009 2:46 pm

Did I read that correctly, the server is in Colorado? If so, there are at least a few of us that live here (MrPete and I are in Colorado Springs).
Mark
Reply: Colo as in Colocation Center. ~ charles the moderator

Mark T
February 21, 2009 3:15 pm

Gotcha. I did misread that then.
Mark

tallbloke
February 21, 2009 3:20 pm

rephelan (14:29:29) :
The number of times I’ve warned clients to BACK-UP the DATA! Of course, all my backups are up 3 months old…

There are two types of computer user. Those who have lost data, and those who are going to…
My faithful old tosh lappy died at Christmas time. My backup was three months old too…
When CA comes back up I’ll drop something in the tip jar there too, towards the RAID system someone mentioned ^upthere.

Larry Sheldon
February 21, 2009 3:20 pm

“colo” — “co-location” — datacenter with for rent or hire rack-space/rackspace-eit-computer, and so on.
If I remember my California geography, I’m thinking the colo must be in Sacramento or Suisun or Fairfield.

schnoerkelman
February 21, 2009 3:23 pm

Perhaps we could get someone like Sun Microsystems or IBM to sponsor some hardware with remote console access. I never leave home without it, or rather I never leave home because of it.

February 21, 2009 3:29 pm

A whip around the tip jar?
That remark alone is worth $20 !
