More Ocean-sized Errors in Levitus et al.

Guest Post by Willis Eschenbach

Previously, we discussed the errors in Levitus et al. here, in An Ocean of Overconfidence.

Unfortunately, the supplemental information for the new Levitus et al. paper has not been published. Fortunately, WUWT regular P. Solar has located a version of the preprint containing their error estimate, available here. This is how they describe the start of the procedure that produces their estimates:

From every observed one-degree mean temperature value at every standard depth level we subtract off a climatological value. For this purpose we use the monthly climatological fields of temperature from Locarnini et al. [2010].

Now, the “climatology” means the long-term average (mean) of the variable; in this case, it is the long-term average for each 1° X 1° gridcell, at each depth. Being a skeptical type of fellow, I thought, “how much data do they actually have?” It matters because if they don’t have much data, the long-term mean will have a large error component: sparse data increases the expected error in the mean, which is called the “standard error of the mean” (SEM).
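For concreteness, here is roughly what that calculation looks like for a single gridcell at a single depth (a toy sketch with made-up numbers, not the actual WOA09 procedure):

# toy example: five January observations (°C) for one gridcell at one depth
obs = c(3.61, 3.48, 3.75, 3.52, 3.69)
clim = mean(obs)                  # the "climatology" for this cell/month/depth
sem = sd(obs)/sqrt(length(obs))   # standard error of that mean
clim
sem

With only two or three observations instead of five, that standard error generally gets a good deal larger, which is the whole point of what follows.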

Regarding the climatology, they say that it is from the World Ocean Atlas 2009 (WOA09), viz: “… statistics at all standard levels and various climatological averaging periods are available at http://www.nodc.noaa.gov/OC5/WOA09F/pr_woa09f.html”

So I went there to see what kind of numbers they have for the monthly climatology at 2000 metres depth … and I got this answer:

The temperature monthly climatologies deeper than 1500 meters have not been calculated.

Well, that sux. How do the authors deal with that? I don’t have a clue. Frustrated at 2000 metres, I figured I’d get the data for the standard error of the mean (SEM) for some month, say January, at 1500 metres. Figure 1 shows their map of the January SEM at 1500 metres depth:

Figure 1. Standard error of the mean (SEM) for the month of January at 1500 metres depth. White areas have no data. Click on image for larger version. SOURCE

YIKES! In 55 years, only 5% of the 1° X 1° gridcells have three observations or more for January at 1500 metres … and they are calculating averages?

Now, statistically cautious folks like myself would look at that and say “Well … with only 5% coverage, there’s not much hope of getting an accurate average”. But that’s why we’re not AGW supporters. The authors, on the other hand, forge on.

Not having climatological data for 95% of the ocean at 1500 metres, what they do is take an average of the surrounding region and use that value. However, with only 5% of the gridcells having three observations or more, that procedure seems … well, wildly optimistic. It might be useful for infilling if we were missing, say, 5% of the observations … but when we are missing 95% of the ocean, that just seems goofy.
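Just to make the mechanics concrete, here is a purely illustrative sketch of filling a missing gridcell from its neighbours (this is NOT the authors’ actual objective-analysis scheme, only the general idea of borrowing a regional average):

# purely illustrative: fill one empty cell with the mean of its neighbours
clim = matrix(c(3.2,  NA, 3.4,
                3.1,  NA, 3.3,
                 NA, 3.0, 3.2), nrow = 3, byrow = TRUE)
i = 2; j = 2                                  # the missing cell to fill
neighbours = clim[max(1, i-1):min(3, i+1), max(1, j-1):min(3, j+1)]
filled = mean(neighbours, na.rm = TRUE)       # average of whatever neighbours exist
filled

The trouble, of course, is that when 95% of the cells are empty, the “neighbours” being averaged can be a very long way away.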

So how about at the other end of the depth scale? Things are better at the surface, but not great. Here’s that map:

Figure 2. Standard error of the mean (SEM) for the month of January at the surface. White areas have no data. Click on image for larger version. Source as in Fig. 1

As you can see, there are still lots and lots of areas without enough January observations to calculate a standard error of the mean … and in addition, for those that do have enough data, the SEM is often greater than half a degree. When you take a very accurate temperature measurement, and you subtract from it a climatology with a ± half a degree error, you are greatly reducing the precision of the results.
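If the measurement error and the climatology error are independent, they add in quadrature, so the half-degree climatology error swamps the accuracy of the instrument. A back-of-the-envelope sketch (the 0.002°C figure is a nominal Argo-style instrument accuracy, not a number from the paper):

# back-of-the-envelope error propagation, assuming independent errors
meas_err = 0.002                        # nominal instrument accuracy, °C (assumed)
clim_err = 0.5                          # SEM of the climatology, °C
anom_err = sqrt(meas_err^2 + clim_err^2)
anom_err                                # ~0.5 °C: the climatology error dominates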

w.

APPENDIX 1: the data for this analysis was downloaded as a NetCDF file from here (WARNING: 570 MB FILE!). It is divided into 1° gridcells and has 24 depth levels, with a maximum depth of 1500 metres. It shows that some 42% of the gridcell/depth/month combinations have no data. Another 17% have only one observation for the given gridcell and depth, and 9% have two observations. In other words, the median number of observations for a given month, depth, and gridcell is 1 …

APPENDIX 2: the code used to analyze the data (in the computer language “R”) is:

require(ncdf)

# open the WOA09 monthly one-degree file and pull out the gridded fields
mync = open.ncdf("temperature_monthly_1deg.nc")
mytemps = get.var.ncdf(mync, "t_gp")      # gridpoint mean temperatures
tempcount = get.var.ncdf(mync, "t_dd")    # number of observations per cell
myse = get.var.ncdf(mync, "t_se")         # standard error of the mean

# -2147483647 is the fill value marking cells with no entry
allcells = length(which(tempcount != -2147483647))
zerocells = length(which(tempcount == 2)) # cells with exactly two observations
zerocells/allcells

hist(tempcount[which(tempcount != -2147483647)], breaks = seq(0, 6000, 1), xlim = c(0, 40))

tempcount[which(tempcount == -2147483647)] = NA

# depth level 24 is the deepest in the file (1500 metres); the final index (1) is January
whichdepth = 24
zerodata = length(which(tempcount[, , whichdepth, 1] == 0))
totaldata = length(which(!is.na(tempcount[, , whichdepth, 1])))
under3data = length(which(tempcount[, , whichdepth, 1] < 3))
length(tempcount[, , whichdepth, 1])

1 - under3data/totaldata                  # fraction of cells with three or more observations

APPENDIX 3: A statistical oddity. In the course of doing this, I got to wondering how accurate the calculation of the standard error of the mean (SEM) might be when the sample size is small. It matters because so many of the gridcell/depth/month combinations have only a few observations. The normal calculation of the SEM is the standard deviation divided by the square root of the sample size N.

I did an analysis of the question, and I found that as the sample size N decreases, the normal calculation progressively underestimates the SEM more and more. At the extreme, with only three data points in the sample, which is the case for much of the WOA09 monthly climatology, the SEM calculation underestimates the actual standard error of the mean by about 12%. This doesn't sound like a lot, but it means that instead of the nominal 95% confidence interval of 1.96 * SEM covering the true value 95% of the time, it covers it only about 80% of the time.
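That coverage claim is easy to check by simulation (a quick sketch, separate from the code below):

# how often does mean +/- 1.96*SEM cover the true mean when N = 3?
set.seed(42)
n = 3
trials = 30000
covered = replicate(trials, {
    x = rnorm(n)                              # true mean is zero
    ci = mean(x) + c(-1.96, 1.96)*sd(x)/sqrt(n)
    ci[1] < 0 & ci[2] > 0
})
mean(covered)                                 # comes out around 0.8, not 0.95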

Further analysis shows that the standard calculation of the SEM needs to be multiplied by

1 + 0.43 N^-1.2

to be approximately correct, where N is the sample size.

I also tried using [standard deviation divided by sqrt(N-1)] to calculate the SEM, but that consistently overestimated the SEM at small sample sizes.

The code for this investigation was:

# SEM as normally calculated: standard deviation over the square root of N
sem = function(x) sd(x, na.rm = T)/sqrt(length(x))

# or, alternate sem function using N-1
# sem = function(x) sd(x, na.rm = T)/sqrt(length(x) - 1)

nobs = 30000     # number of trials per sample size
sample = 5       # sample size (overwritten by the loop below)
ansbox = rep(NA, 20)

for (sample in 3:20) {
    mybox = matrix(rnorm(nobs*sample), sample)  # each column is one sample of size "sample"
    themeans = apply(mybox, 2, mean)            # mean of each sample
    thesems = apply(mybox, 2, sem)              # calculated SEM of each sample
    # how much the calculated SEMs underestimate the actual spread of the means
    ansbox[sample] = round(sd(themeans)/mean(thesems) - 1, 3)
}
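As a footnote, the power-law correction quoted above can be recovered from ansbox with a simple log-log fit (a sketch; the fitted numbers wander a little from run to run):

# fit ansbox[N] ~ a * N^b on a log-log scale
N = 3:20
fit = lm(log(ansbox[N]) ~ log(N))
exp(coef(fit)[1])        # the coefficient a, roughly 0.4
coef(fit)[2]             # the exponent b, roughly -1.2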
jono1066
April 24, 2012 3:59 pm

I wish to complain !
Who gave the approval, and the funding, for the Giants Causeway to be completed between Scotland and Ireland (clearly visible in Figure 1 and 2) ? At least there would be no `landfill tax` to pay for all the fill required.
:0)

Editor
April 24, 2012 4:23 pm

When you take a very accurate temperature measurement, and you subtract from it a climatology with a ± half a degree error, you are greatly reducing the precision of the results.

To be exact, the error of the resulting anomalies should begin with that half degree error from the climatology and then add the error of the measurement, should it not?

Randy
April 24, 2012 4:34 pm

Successful scientists are just like successful business people. Err actually, vice-versa. Never trust the conclusion. Personally go to the raw data and roll around in it and work it up with your own eyes to understand the real picture. Good work as always. Where do you find the time?
You’ve earned another nice big oil check on this one! </sarc>

Wayne2
April 24, 2012 4:35 pm

Very nice! My favorite posting of yours so far.
I believe the important thing on the SEM is that when you calculate the 95% CI, you use a t distribution with probability 0.975 and n-1 degrees of freedom. If you have a lot of points, that comes out to around +/- 1.96 times your SEM. But with only three points, that comes out to +/- 4.3 times your SEM. If your SEM is a half degree, that comes out to +/- 2 degrees.
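In R, those multipliers are easy to check:

# 95% CI multiplier from the t distribution
qt(0.975, df = 1000)    # ~1.96 with lots of points
qt(0.975, df = 2)       # ~4.30 with only three points
qt(0.975, df = 2)*0.5   # a half-degree SEM gives a CI of roughly +/- 2 degrees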

Lance Wallace
April 24, 2012 4:50 pm

Fascinating graphs.
1. Why are the worst errors at 1500 m depth in the North Atlantic?
2. The standard error at the surface is much larger than at the 1500 m depth (note the color scales). I guess this is because the temperature is much colder down below? How do the relative standard errors (SE/mean) compare? Presumably they should be much smaller at the surface because of greater N, although perhaps larger because of greater fluctuations of the temperature?

kforestcat
April 24, 2012 6:20 pm

Well ahead of us, as usual, Willis.
After reading the paper, I have additional issues with what looks like a serious lack of management oversight and quality control by NOAA. Examples in this paper are too numerous to fully document, but here is a sampling.
Consider where Syd Levitus states:

“Argo profiling float data that have been corrected for systematic errors provide data through a nominal depth of 1750-2000 of the post-2004 period on a near-global basis. We have used data that were available as of January 2011. Many of these data have been corrected by the Argo delayed-mode quality control teams. If Argo data have not been corrected at the time we downloaded these data we still used them”. (Page 3, Lines 54-58, bolding added for emphasis)

Observation: Are we to understand that NOAA’s management really allows its scientific staff to bypass the agency’s own quality control process and use whatever data suits them?
Where Levitus states:

Our temperature anomaly fields could be considered to be more representative of the 2000-1750 m layer of the World Ocean however we have compared the OHC1750 and OHC2000 and find no difference between them. We hope to acquire additional deep ocean data from research cruises so we have opted to present results for the 0-2000 layer. ” (Page 3, Lines 65-69, bolding added for emphasis)

Observation: Lacking adequate data in the 1750-2000 meter range, Levitus uses the data anyway because he “hopes” to acquire adequate data in future research cruises? Since when are an agency’s scientific personnel supposed to draw conclusions based on data a senior staff member “hopes” to acquire at some future time?
Where Levitus states:

Unlike salinity data from profiling floats, temperature data do not appear to have significant drift problems associated with them. (Page 3, Lines 58-60, bolding added for emphasis)

Observation: This suggests there are significant drift problems associated with NOAA’s salinity data. Since the authors’ specific gravity (density) and specific heat values are calculated from the pressure, salinity, and temperature data gathered, one would think an error here would be cause for concern. After all, Heat content = Temperature x Density x Specific Heat x Conversion factors.
Given that Levitus brought up the issue, one would expect that he would discuss the significant “salinity” drift problem to the extent necessary to draw a conclusion or to satisfy a reader’s potential concerns. In view of the lead author’s own doubts, can one really conclude, with confidence, that NOAA’s calculated “heat content anomalies” are meaningful?
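For a sense of the scale involved (approximate seawater values, not numbers from the paper):

# rough scale of the heat-content calculation, per cubic metre of seawater
rho = 1025       # density, kg/m^3 (approximate)
cp = 3990        # specific heat, J/(kg K) (approximate)
dT = 0.1         # temperature anomaly, K
rho*cp*dT        # roughly 4e5 J per cubic metre for a 0.1 K anomaly

Errors in any of the three factors propagate straight into the product.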
Where Levitus states:

It is our understanding that problems with profiling floats identified by Barker et al. [2011] have for the most part been corrected”. (Page 3, Lines 60-61; bolding added for emphasis)

Observation: Where the heck is NOAA’s management and quality control oversight? Either NOAA’s quality control officer can certify the “problems” have been corrected, citing internal documentation to that effect, or he cannot. Also, just when is a correction “for the most part” a satisfactory response?
Where Levitus states:

Also we believe that our quality control procedures (Boyer et al. [2009]; Levitus and Boyer [1994]) have eliminated most remaining egregious problems” (Page 3, Lines 61-64; bolding added for emphasis)

Observation: “We believe” “most” of the “egregious problems” have been eliminated? Any project manager worth his salt would require the author to state exactly which “egregious problems” have been eliminated and cite the documentation showing correction. Moreover, exactly what “egregious problems” remain, and to what extent could they impact the paper’s conclusions?
I understand this is a draft, but this is the kind of sloppy work I would expect from an intern, not an established professional.
Regards,
Kforestcat

Louis Hooffstetter
April 24, 2012 6:30 pm

The conclusions section of this paper should have ended with: “Ta Da! And now, for my next trick…”

D. J. Hawkins
April 24, 2012 6:42 pm

jono1066 says:
April 24, 2012 at 3:59 pm
I wish to complain !
Who gave the approval, and the funding, for the Giants Causeway to be completed between Scotland and Ireland (clearly visible in Figure 1 and 2) ? At least there would be no `landfill tax` to pay for all the fill required.
:0)

Don’t you mean “infill tax”? 😀

ferd berple
April 24, 2012 7:13 pm

YIKES! In 55 years, only 5% of the 1° X 1° gridcells have three observations or more for January at 1500 metres … and they are calculating averages?
Didn’t the paper claim better than 50% coverage at that depth???? Guess it depends on your definition of “coverage”.

ferd berple
April 24, 2012 7:18 pm

What effect does changing the gridcell size have on the results?
If the result is sensitive to a change in cell size, then this would argue strongly that the trend is not a true trend, but rather an artifact of the methodology.

Steve Keohane
April 24, 2012 8:31 pm

Thanks Willis, this is amazingly little data. Is this all we have to compute Ocean Heat Content (OHC) from? How can one do that, let alone calculate a trend?

James
April 24, 2012 9:11 pm

Hmm, perhaps you could use the more complete data to estimate the error associated with the sparser data?
As I understand it, there is one relatively sparse data set (older, deeper) and a data set with better coverage (newer and shallow). You should be able to do some sort of bootstrap to get an estimate of the standard errors from the sparse data. For example you could:
-Assume the more complete data set is the full population and calculate the climatologies
-Next randomly select a subsample from the complete data set that is consistent with the sparse data (e.g. select enough points to get 5% coverage or whatever is most consistent with the sparse data)
-Calculate the climatologies given the sampled data.
-rinse and repeat getting a number of climatologies based on subsamples of similar coverage to the sparse data.
-I would think the standard deviation of these climatologies gives you something of a lower bound on the true uncertainty (since the Monte Carlo assumes the sparse data coverage is independently distributed across the complete data coverage)
This would give you an idea of how big a deal it is to have sparse data.
You could even get fancy and try to match up data in the sparse set with the full set, calculate the climatologies and compare that to the value using the full data. That is in some sense a point estimate of the error induced by using the sparse data, but I am not sure how good an estimate that would be.
Probably not going to tell you anything you don’t know (the errors are big…), but might be interesting.
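A rough sketch of the mechanics in R (toy, spatially uncorrelated data, so it will understate the real uncertainty):

# toy version of the subsampling idea: treat a dense field as the "population",
# then see how much the climatology moves when only 5% of the cells are kept
set.seed(1)
full = matrix(rnorm(180*360, mean = 3, sd = 0.5), 180, 360)   # pretend complete field
true_clim = mean(full)
sub_clims = replicate(1000, {
    keep = sample(length(full), size = round(0.05*length(full)))  # ~5% coverage
    mean(full[keep])
})
sd(sub_clims)    # spread of the subsampled climatologies around true_clim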
James

Steve Oregon
April 24, 2012 9:39 pm

How’s this for the naive question of the year?
Why can’t our governments rely upon honest and highly skilled people like Willis for the kind of quality input sound public policy making demands?
Instead we have people who are horribly hobbled by interests, ideology, ineptness and worse injecting what is certain to produce the most defective governance possible.

blogagog
April 24, 2012 9:41 pm

“Previously, we discussed the errors in Levitus.”
Haha! I read that sentence too fast and thought you were planning to discuss the errors in the Book of Leviticus :).

Evan Thomas
April 24, 2012 10:03 pm

Can one of your more scientifically knowledgeable and literate posters tell us what all these figures and graphs purport to mean? Does the original paper contend that the oceans are warming faster than nature intended? If so, dangerously? Or is the warming (if accurately measured) merely consistent with the oceans recovering from the Little Ice Age? Cheers from now chilly Sydney – max. 20°C.

Steve Oregon
April 24, 2012 10:09 pm

error correction:
Instead we have people who are horribly hobbled by interests, ideology, ineptness and worse injecting what is certain to produce the most defective governance possible.
[Fixed. -w.]

Chris G
April 24, 2012 10:21 pm

Willis, did they at least correct for the change in specific heat at different densities/depth levels? Just curious.

Len
April 24, 2012 10:30 pm

Thanks again Willis. It takes one with your smarts and sweat to look into such papers. I am so glad you do.
