Post mortem on the Mauna Loa CO2 data eruption

As most readers of this blog already know, CO2 data on the Mauna Loa Observatory (MLO) website were posted and then revised over the past day, generating quite a lot of controversy. After exchanging several emails with Dr. Pieter Tans at MLO, I hope to shed some light on what happened.

It all started on Sunday, August 3rd, when revised data were posted that showed a clear drop in CO2 between January and July of this year. I did a story on that January-to-July trend reversal at Mauna Loa, highlighting what the data published by MLO said at the time: the July CO2 value in PPM was lower than the value measured in January, something unusual and never before seen in the history of the dataset.

Then yesterday, Monday, August 4th, the MLO data published on their website changed abruptly, very nearly erasing the trend highlighted in the previous story. There was no mention of the change on the NOAA web page for Mauna Loa Observatory, and there still isn’t.

So I did another story, using a blink comparator to highlight the change in the data, and noted the mystery, hoping to get more information from the curator of the MLO CO2 dataset, Dr. Pieter Tans.

Meanwhile, quite a lot of speculation occurred, much of it critical of the entire process MLO used to publish and revise this data. Some commenters on this blog also looked at the change in the data to reverse engineer what happened and figure out plausible reasons for it.

Early today (08/05 at 8:55 AM PST), I received my first communication from Dr. Tans on the subject:

Anthony,

We appreciate your interest in the CO2 data. The reason was simply that we had a problem with the equipment for the first half of July, with the result that the earlier monthly average consisted of only the last 10 days. Since CO2 always goes down fast during July the monthly average came out low. I have now changed the program to take this effect into account, and adjusting back to the middle of the month using the multi-year average seasonal cycle. This change also affected the entire record because there are missing days here and there. The other adjustments were minor, typically less than 0.1 ppm.

Best regards,

Pieter Tans
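The fix Dr. Tans describes can be illustrated with a small sketch. This is not MLO’s actual program: the single-cosine seasonal cycle (3 ppm amplitude, peak near day 135), the flat 385 ppm baseline, the function names, and the decision to ignore the long-term trend within the month are all assumptions made for illustration. The idea is to shift each available day along the average seasonal cycle to mid-month before averaging, so a month observed only in its last 10 days is not dragged low by the July drawdown.

```python
import math

DAYS_PER_YEAR = 365.25
JULY_DOYS = range(182, 213)   # day-of-year for July 1-31 in a non-leap year
MID_JULY_DOY = 197            # July 16


def seasonal_cycle(doy, amplitude=3.0, peak_doy=135.0):
    """Hypothetical stand-in for the multi-year average seasonal cycle, in ppm.

    A single cosine peaking in mid-May; the real cycle is measured, not assumed.
    """
    return amplitude * math.cos(2 * math.pi * (doy - peak_doy) / DAYS_PER_YEAR)


def naive_monthly_mean(daily):
    """Plain average of whatever days are available (dict: day-of-year -> ppm)."""
    return sum(daily.values()) / len(daily)


def adjusted_monthly_mean(daily, mid_doy, cycle=seasonal_cycle):
    """Shift each day's value to mid-month along the seasonal cycle, then average.

    The long-term trend within the month is ignored here for simplicity.
    """
    shifted = [ppm - cycle(doy) + cycle(mid_doy) for doy, ppm in daily.items()]
    return sum(shifted) / len(shifted)


# Synthetic month: a flat 385 ppm baseline plus the hypothetical seasonal cycle.
full_july = {doy: 385.0 + seasonal_cycle(doy) for doy in JULY_DOYS}
# Equipment trouble: only the last 10 days (July 22-31) survive.
last_10_days = {doy: ppm for doy, ppm in full_july.items() if doy >= 203}

print(f"full month, plain mean:  {naive_monthly_mean(full_july):.2f}")
print(f"last 10 days, plain:     {naive_monthly_mean(last_10_days):.2f}")
print(f"last 10 days, adjusted:  {adjusted_monthly_mean(last_10_days, MID_JULY_DOY):.2f}")
```

With only the last 10 days, the plain average comes out below the full-month average, while the shifted average lands close to it. A real implementation would also need a trend term and the measured seasonal cycle rather than one cosine.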

That left more questions, most notably as to what happened to the rest of the monthly data, so I followed up with a request for more information:

> Hello Pieter,
>
> Thank you very much for your prompt response. I appreciate you taking time from your busy schedule to answer.
>
> Can you elaborate on the problem with the equipment?
>
> And do you keep a public changelog or publish notices of such changes as occurred yesterday?
>
> Thank you for your consideration.
>
> Anthony Watts

To which he responded with a blunt one-liner:

From: “Pieter Tans” <Pieter.Tans@xxxx.xxx>

To: “Anthony Watts – TVWeather” <awatts@xxxxxxx.xxx>

Sent: Tuesday, August 05, 2008 9:30 AM

Subject: Re: question on ML CO2 monthly mean data change

The computer disc crashed…

When I read that, I was simply floored. Here we have what is considered the crown jewel of all surface-based CO2 measurement stations suddenly missing 20 days of data, and it was all due to a hard disk crash. In this day and age of cheap storage and RAID systems, it seemed unfathomable that such a thing could happen, especially to something as important as this data.

So I asked again:

> Thank you Dr. Tan for your forthright communications and willingness to explain.
>
> I am puzzled though, as to how a hard disk crash could permanently lose 20 days of data. Surely with something so important that the whole world is watching, you have a backup of the data? Or even written lab notes? Collation forms? Data entry forms?
>
> Thank you for your consideration.
>
> Best Regards,
>
> Anthony Watts

To which he replied:

From: “Pieter Tans” <Pieter.Tans@xxxx.xxx>

To: “Anthony Watts- TVWeather” <awatts@xxxxxx.xxx>

Sent: Tuesday, August 05, 2008 12:47 PM

Subject: Re: THANK YOU Re: question on ML CO2 monthly mean data change

Anthony,

There are three independent backups of the MLO record: We also take flask samples at MLO analyzed here in Boulder, and at Scripps Ralph Keeling is continuing the record his father started with both continuous analyzer measurements as well as flask samples taken at MLO and analyzed at Scripps. Beyond that, we are monitoring CO2 at ~60 other places over the globe, and many other scientists are as well, in yet other places. Don’t ever make easy assumptions that we are cavalier about this, keeping no records etc. We do not have infinite amounts of time and money, however. Instruments tend to stay at a given station for a long time. We are not allowed to lobby for additional support, as federal scientists.

Pieter Tans

It started to become clear to me how this might occur, especially given his statement that “Instruments tend to stay at a given station for a long time.” If the data recorder was older, something such as a mirrored RAID wouldn’t be part of it, and a single disk failure could indeed render the entire measurement process void.

Siemens Ultramat 3 nondispersive infrared gas analyzer used at MLO (image: http://cdiac.ornl.gov/trends/co2/graphics/machine.jpg)

Compare this with more modern equipment from the same company.

So I wrote back:

> Hello again Pieter,
>
> Thanks again for the reply. I do apologize if my response suggested that you are “cavalier”, that was not the intent. You must have had a trying couple of weeks. I’m sure that my probing didn’t help.
>
> It’s just that I was just a bit incredulous that 20 days of data could get lost in this day and age. But reading your response about “length of time equipment stays at a station”, I think I understand now how it could happen.
>
> Having worked for a state agency once, and knowing the procurement processes and pitfalls I’m guessing that you are operating an older data recorder, probably with a hard drive that is difficult to find these days, like an old ST225.
>
> Knowing lab technology from the 70’s 80’s and 90’s, I could see how you could be going along with such devices thinking it was correctly recording the data, only to find later your dataset came up empty.
>
> Things like that have happened to me.
>
> My interest is making sure the data is right, whether it is up or down isn’t much consequence to me at this point, trust in the dataset is the most important issue.
>
> Initially it appeared that there was a drop in PPM from January to July, which was truly unique. Then the data changed suddenly. It was that abrupt change that raised concerns. While I’m sure you are removed from it, measurement quality control, data quality issues, and arcane unexplained adjustments have plagued the surface temperature record. Attempts to get answers have been stonewalled and met with hostility. Replication in science should never be met with hostility, in my view.
>
> Your dialog though has been a refreshing change from the way people like Jim Hansen treat people that ask honest questions like “why did the data change abruptly” and “what were the adjustment methods used”?
>
> I hope you won’t mind a suggestion that could prevent such problems.
>
> Based on my experience, one of the biggest favors you can do yourself would be to put up a section near the FTP links that is a running change log about the data. That way, if a change is needed for truly valid reasons (such as this) you have a way to notify the public.
>
> For example, if I had known that the data posted Sunday August 3rd, was missing 20 days for July, I never would have considered looking at it until that issue was resolved. The boilerplate caveats saying the data could be revised up to a year really don’t convey anything beyond a generic caution. In this case a specific caution would have helped, a lot.
>
> To not do so invites a lot of speculation, as you’ve probably noted. But as it was presented Sunday August 3rd, it appeared to be ready for primetime, and was absent such caveats.
>
> I’m satisfied with the answers you’ve provided, and if you’d like to write up a statement explaining the whole issue, beyond what has already been said, I’ll be happy to post it. That should quiet things down a bit. Absent that, perhaps due to restrictions you may be under, I’m prepared to put the issue to rest and write-up what I know based on our correspondence. I’ll even offer a preview if you like.
>
> Perhaps I can even help your mission. If it becomes clear public knowledge that you are operating with substandard equipment with data recording reliability issues, some procurement action could be taken in the future. You’d be surprised who reads my blog. Again my whole issue has been and always will be “the data should be right”.
>
> Again thank you for your willingness to discuss and communicate.
>
> Best Regards,
>
> Anthony Watts

To which he replied with a final note:

Anthony,

You asked for my comments on your previous email. I have a few. With respect to the “drop” in (seasonally corrected) CO2 you mentioned, it happens frequently. The annual increase has averaged about 2 ppm per year recently, which equals 0.17 per month. The “noise” in monthly data is larger than that. In 1994 there were three months in a row that CO2 went down, for example. I grant that the drop in July did look suspicious, though. I knew that the direct monthly mean based on 10 days had to be corrected, and I had written a program to make that correction, including also small corrections to other months, for the same reason. It was unfortunate that the uncorrected July data did appear on the web, which was not intended, and we corrected it the next day.

We have thought about a change log, and may still do that, but thought that it would be too much detail for almost everyone. Our methods have been published, and our data are freely available on the web site, so that anyone interested can do his/her own analysis. We are committed to complete and prompt availability of our data because it is essential to credibility and it improves the science. The promptness implies that we are more likely to make a mistake in public now and then, but we take that in stride. Please check out our CarbonTracker web site which embodies the same philosophy. CarbonTracker “translates” observed CO2 patterns into an assessment of emissions/uptake of CO2 that is optimally consistent with the observations. We are very much aware that in a time when carbon dioxide emissions will cost a lot of money, there has to be an objective and fully credible way to quantify emissions. Without that, carbon markets cannot function efficiently, and policies cannot be measured relative to their objectives. We think that the atmosphere itself can provide objective quantification.

With respect to reliability, it is a fact that the equipment we use is not good enough “off-the-shelf” to produce the measurement accuracy that is needed. We have to build an entire control and gas handling system around every analyzer to keep it in check. We control temperature, pressure and flow rate, dry the air stream, and inject calibrated reference gas mixtures at regular intervals, etc. Since very recently, there are what appear to be much better instruments on the market, fortunately. The last steps in quality control are the comparisons with independent measurements I mentioned earlier, and scientific analysis of the data.

I am not much of a blogger, but would appreciate a preview if you write something about our correspondence.

Best regards,

Pieter Tans
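Dr. Tans’s point about noise can be checked with back-of-the-envelope arithmetic: 2 ppm per year divided by 12 is about 0.17 ppm per month. If each de-seasonalized monthly value is modeled as a linear trend plus independent Gaussian noise (an assumption, since real monthly anomalies are autocorrelated), the chance that a month comes in below the previous one follows from the normal distribution. The function names are hypothetical, and the noise levels tried are illustrative, since neither he nor the post gives a figure for them.

```python
import math


def monthly_increment(annual_ppm):
    """Average trend per month, given an annual increase in ppm."""
    return annual_ppm / 12


def prob_monthly_drop(annual_ppm, noise_sd):
    """Chance that a de-seasonalized month comes in below the previous one.

    Assumes each month = linear trend + independent Gaussian noise with
    standard deviation noise_sd (ppm). The difference of two such months
    then has mean annual_ppm / 12 and standard deviation noise_sd * sqrt(2).
    """
    mean_diff = monthly_increment(annual_ppm)
    sd_diff = noise_sd * math.sqrt(2)
    z = -mean_diff / sd_diff
    # Standard normal CDF evaluated at z, written with math.erf.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


print(f"trend per month: {monthly_increment(2.0):.2f} ppm")
for sd in (0.1, 0.3, 0.5):  # hypothetical noise levels, in ppm
    print(f"noise sd {sd} ppm -> P(drop) = {prob_monthly_drop(2.0, sd):.2f}")
```

Under these assumptions, a hypothetical 0.3 ppm of monthly noise already means roughly one month in three shows a drop, which is in line with his remark that such drops happen frequently.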

In the meantime, some of the commenters on the blog called for significant scrutiny:

Basil (06:47:08) :

If any think that this is grasping at straws, just remember that this is a bellwether site for the AGW hypothesis. So it deserves all the scrutiny it gets, and has to live up to the strictest standards because of it.

Some commenters did their own analysis; one notable example was Dee Norris, who examined the changes in the data.

=============

Dee Norris (11:42:47) :

The adjustments go both ways as seen on this plot: http://tinyurl.com/6qb3sg

The net gain is 0.19 ppmv over the entire 34 year record – this includes the July 2008 adjustment. If we back out the July 2008 adjustment of 0.67 ppmv, the gain becomes a decrease of 0.48 ppmv.

I really don’t see anyone here diddling with the data-set in order to amplify the AGW aspect.
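Comparing two versions of a dataset, as Dee Norris did, amounts to differencing them month by month. The sketch below assumes each version is a dict mapping a month label to a ppm value, and that her “net gain” is the plain sum of per-month differences; that reading and the function names are assumptions. On that reading her arithmetic checks out: backing a 0.67 ppmv July revision out of a 0.19 ppmv net leaves 0.19 - 0.67 = -0.48 ppmv.

```python
def revision_diffs(old, new):
    """Per-month change (new - old) for months present in both versions."""
    return {month: new[month] - old[month] for month in old if month in new}


def net_change(diffs, exclude=()):
    """Sum of per-month changes, optionally leaving some months out."""
    return sum(d for month, d in diffs.items() if month not in exclude)


# Dee Norris's figures, taken from her comment: net +0.19 ppmv including a
# +0.67 ppmv July 2008 revision. Backing July out leaves the rest of the record.
net_with_july = 0.19
july_revision = 0.67
print(f"net without July 2008: {net_with_july - july_revision:+.2f} ppmv")
```

With the full old and new monthly files loaded, net_change(revision_diffs(old, new), exclude={"2008-07"}) would give the same figure directly, assuming months are labeled that way.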

and later she wrote:

Dee Norris (11:49:42) :

I have been having an ongoing email exchange with Dr Tans. In the last go round, I asked him to confirm my understanding of the nature of the adjustment.

I wrote:

“Am I correct that when you changed the program to account for the missing 20 days in July, there was a backward propagation of adjustments filling in for other missing days?”

Dr Tans replied:

“You are good.

When I was at it, I made another adjustment to the program. I used to fit 4 harmonics (sine, cosine with frequencies 1/year through 4/year) to describe the average seasonal cycle. I changed that to 6 harmonics.

Therefore, there will be small systematic differences as a function of time-of-year in the de-seasonalized trend. That will be on top of adjustments caused by months in the past during which there were a number of missing days not symmetrically distributed during that month.”

I think we are too conditioned to data getting Hansenized and may be jumping to conclusions. So far, unlike Hansen, Dr Tans has been forthright with communicating his approach.

=============
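The harmonic fit Dr. Tans mentions is a least-squares fit of sine and cosine terms at 1 through K cycles per year. The sketch below fits a linear trend plus K harmonics jointly, in pure Python via the normal equations; fitting the trend this way is an assumption, not a description of MLO’s code, and all names and the synthetic record are hypothetical. Because it is a least-squares fit, missing days simply contribute no rows. Going from 4 to 6 harmonics lets the model capture a component at 5 cycles per year that a 4-harmonic model leaves in the residuals, the kind of small, time-of-year-dependent difference he describes.

```python
import math


def design_row(t, n_harmonics):
    """Regressors at time t (years since start): 1, t, then cos/sin pairs for 1..K cycles/yr."""
    row = [1.0, t]
    for k in range(1, n_harmonics + 1):
        angle = 2 * math.pi * k * t
        row += [math.cos(angle), math.sin(angle)]
    return row


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_harmonics(times, values, n_harmonics):
    """Least-squares fit of trend + K harmonics via the normal equations."""
    rows = [design_row(t, n_harmonics) for t in times]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(p)]
    return solve(xtx, xty)


def predict(t, coeffs):
    """Evaluate a fitted model at time t."""
    n_harmonics = (len(coeffs) - 2) // 2
    return sum(c * x for c, x in zip(coeffs, design_row(t, n_harmonics)))


# Synthetic, noise-free record sampled every 5 days for 4 years. It includes a
# small 5-cycles-per-year component that a 4-harmonic model cannot represent.
times = [day / 365.25 for day in range(0, 1461, 5)]
values = [380.0 + 2.0 * t + 3.0 * math.cos(2 * math.pi * t) + math.sin(2 * math.pi * t)
          + 0.3 * math.cos(10 * math.pi * t) for t in times]

for k in (4, 6):
    coeffs = fit_harmonics(times, values, k)
    worst = max(abs(predict(t, coeffs) - y) for t, y in zip(times, values))
    print(f"{k} harmonics: largest residual {worst:.3f} ppm")
```

Solving the normal equations directly is fine at this size; a production fit would more likely use a QR-based solver for numerical robustness.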

Summary:

Unlike the seemingly random and cloaked adjustments we’ve seen from Hansen and GISS, the MLO adjustments used in this episode appear to have a purpose, and the adjusted data doesn’t change much at all, except where there is a period of missing data. The results and explanation seem reasonable to me, and to others I’ve corresponded with about it.

There are, however, some remaining significant issues which I think need to be addressed, some of which have been raised by commenters on this blog.

1. From my perspective, a change log is needed for any public dataset like this, and especially one this important. As I mentioned in correspondence, had I known 20+ days were missing from the July 2008 dataset, I would not have even bothered to write about it. I think MLO erred in not making the state of the data known, both in the initial posting on August 3rd and in the “adjusted data” on August 4th. Both releases suffered from a lack of explanation, which invited speculation.

2. There could be a bit of confirmation bias going on. From Dr. Tans’s own writing to me:

I grant that the drop in July did look suspicious, though. I knew that the direct monthly mean based on 10 days had to be corrected, and I had written a program to make that correction, including also small corrections to other months, for the same reason.

Thus it appears that the algorithm had not been used before. Many commenters have also pointed out that there may have been peaks that weren’t caught in the past because, well, the conditioned expectation we’ve been exposed to is that “CO2 is going up”. So errors on the positive side may not have been caught, owing to this very human tendency when inspecting data. One commenter wrote:

Even if Dr. Tans’ adjustment is reasonable and defensible on its face, it is still a bit troubling. Would a similar adjustment have been made if the CO2 numbers were higher than expected? Somehow I doubt it. And if not, it has the potential to introduce a bias into the numbers.

One of the cardinal rules of statistical analysis is that you choose your criteria BEFORE you see the data. Otherwise it’s very easy to fool yourself (and others) into thinking your results are significant.

By analogy, Dr. Tans should have carefully chosen an averaging method IN ADVANCE and then stuck with it.

While any biases that exist may be small, catching them on both sides of the zero-anomaly line builds confidence in the dataset.

3. There’s a hole in the public process. This is public data, and thus under the auspices of the Data Quality Act. Altering data within a 24-hour window with no notice, and more specifically with no review, public comment, comparative dataset logs, or immediate posted public identification of the cause (except after prodding), is surely a fast track to a DQA violation. While I’m appreciative of Dr. Tans’s willingness to share information and converse, unlike some other publicly funded scientists, I’m also critical of the way it has been handled from the data publishing side. This needs correction, as the public trust is at issue.

It is my opinion that the truly raw CO2 data should be published along with the adjusted data so that there is complete transparency.

4. Finally there are a few questions that remain that perhaps we’ll get some answers to:

  • How many times in the past have these adjustment algorithms been run?
  • How long has Dr. Pieter Tans been responsible for adjustments?
  • Did his predecessor instruct in the necessity of these wholesale adjustments?
  • Are there any records of previous adjustments?
  • Did Dr. Keeling initiate a protocol that required regular adjustments?
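To make the change-log suggestion concrete, here is one possible shape for a machine-readable entry that records the old and new value of every revised month alongside a plain-language reason. This is a hypothetical format sketched for illustration; the class names, field names, and JSON layout are assumptions, not anything MLO or NOAA publishes.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MonthRevision:
    """One revised monthly value (hypothetical format)."""
    month: str        # "YYYY-MM"
    old_ppm: float
    new_ppm: float


@dataclass
class ChangeLogEntry:
    """One entry in a hypothetical public change log for a monthly CO2 file."""
    posted: str       # date the revised file went up, "YYYY-MM-DD"
    reason: str       # plain-language cause, e.g. missing days in a month
    revisions: list = field(default_factory=list)  # list of MonthRevision

    def net_change(self):
        """Sum of (new - old) over all revised months, in ppm."""
        return sum(r.new_ppm - r.old_ppm for r in self.revisions)

    def to_json(self):
        """Serialize the entry (nested revisions included) as JSON text."""
        return json.dumps(asdict(self), indent=2)
```

The reason field is where a note such as “July average based on the last 10 days only” would go, which is exactly the specific caution the August 3rd posting lacked.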

All in all, while this episode produced what Lucia described as a “Kerfluffle”, it has had the positive effect of putting some needed scrutiny on a dataset that, until now, almost no one has significantly questioned. The errors at MLO that allowed an incomplete data set to be posted with no visible caveats for the public user have highlighted weaknesses in the system that need correction. I’m hopeful that this episode will bring about positive changes, especially in the reporting by MLO on the current state of the data set.
