Decimals of Precision

Guest Post by Willis Eschenbach

Over at Judith Curry’s excellent blog there’s a discussion of Trenberth’s missing heat. A new paper about oceanic temperatures says the heat’s not really missing; we just don’t have accurate enough information to tell where it is. The paper’s called “Observed changes in top-of-the-atmosphere radiation and upper-ocean heating consistent within uncertainty”.

It’s paywalled, and I was interested in one rough number, so I haven’t read it. The number that I wanted was the error estimate for their oceanic heating rates. This error can be seen in Figures 1a and 3a on the abstract page, and it is on the order of about plus or minus one watt/m2. This is consistent with other estimates of upper ocean heat content measurement errors.

I think I can conclusively demonstrate that their claimed error is way too small. To understand why, let me take a detour through the art, science, and business of blackjack.

In a fit of misguided passion, some years back I decided to learn how to count cards at blackjack. I had money and time at the same moment, an unusual combination in my life, so I took a class from a guy I’ll call Jimmy Chan. Paid good money for the class, and I got good value. I’ve always been good with figures, and I came out good at counting cards. Not as good as Jimmy, though, he was a mad keen player who had made a lot of money counting cards.

At the time they were still playing single deck in Reno. And I was young, single, and stupid. So I took twenty thousand dollars from my savings for my grubstake and went to Reno. It was an education about a curious business.

Here are the economics of the business of counting cards.

First, if you count using one of the usual systems as I did, and you are playing single deck, it gives you about a 1% edge on the house. Not much, to be sure, but it is a solid edge. And you can add to that by using a better counting system or a concurrent betting system, where better means more complex.

Second, if you play head-to-head (just you and the dealer) you can typically play about a hundred hands an hour.

Doesn’t take a math whiz to see that if you don’t blow the count, you will win about one extra hand an hour.

And therein is the catch. It means that in the card counting business, your average hourly wage is the amount of your average bet.

It’s a catch because of the other inexorable rule of counting blackjack. This regards surviving the swings and arrows of outrageous luck. If you don’t want to go home empty-handed, you need to have a grubstake that is a thousand times your average bet. Otherwise, you could go bust just from the natural ups and downs.

Now, twenty thousand dollars was all I could scrape together then. So that meant my average bet couldn’t be more than twenty dollars. I started out at the five dollar level.
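For anyone who wants to check the arithmetic of the business, here is a minimal sketch in Python. It uses only the round figures quoted above (a 1% edge, a hundred hands an hour, a grubstake of a thousand times the average bet); the function names are mine, made up for illustration, and it ignores bet variation with the count.

```python
# Back-of-envelope economics of counting cards, using the post's round figures.
EDGE = 0.01                # player's edge over the house (about 1%)
HANDS_PER_HOUR = 100       # head-to-head play against the dealer
BANKROLL_MULTIPLE = 1000   # rule of thumb: grubstake = 1000 x average bet


def expected_hourly_win(average_bet: float) -> float:
    """Expected winnings per hour in dollars for a given average bet."""
    return EDGE * HANDS_PER_HOUR * average_bet


def max_average_bet(grubstake: float) -> float:
    """Largest average bet the grubstake supports under the 1000x rule."""
    return grubstake / BANKROLL_MULTIPLE


if __name__ == "__main__":
    bet = max_average_bet(20_000)
    print("max average bet:", bet)
    print("expected hourly win at that bet:", expected_hourly_win(bet))
```

With a 1% edge and a hundred hands an hour, the expected hourly win comes out equal to the average bet, which is the catch described above.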

I’d never spent any time in a casino up until then. I felt like the rube in every movie I ever saw. I played a while at the five dollar level. You never win or lose much there, so nobody paid any attention to me.

After a day or so making the princely sum of $5 per hour, I started betting larger. First at the ten-dollar level. Then at the twenty-dollar level. That was good money back in those days.

But when you start to make a bit of money, like say you hit a few blackjacks in a row and you’re doubling down, they start paying attention to you, and the trouble begins. First they use the casino holodeck to transport a somewhat malignant looking dwarf armed with a pad and a pencil to your table. He materializes at the shoulder of the dealer, and she starts to sweat. I say she because most dealers were women then and now. She starts to sweat because the casino doesn’t really care about card counters. I was making $20 an hour on average? Big deal, everyone in the casino management made that and more.

What scares casino owners is collusion between dealers and players. With the connivance of the dealer a guy can have a “string of luck” that can clean out a table in fifteen minutes and be out the door, meeting the dealer later to split the money. That’s what casino owners worry about, and that’s why the dealer started sweating, she knew she was being watched too. The dwarf peered through coke-bottle thick glasses, and wrote down the number of chips on each stack in the dealer’s rack, how much money I had, how much other players had. He gave the dealer a new deck. He wore a suit that cost as much as my grubstake. His wingtip shoes were shined to a rich luster. He looked at me as though I were a rich man with a loathsome disease. He watched my eyes, my hands. I started sweating like the dealer.

If I continued to win, the holodeck went into action again. This time what materialized were two large, vaguely anthropoid looking gentlemen, whose suits were specially tailored to conceal a bulge under the off-hand shoulder. They appeared one at each shoulder of the vertically challenged gentleman, who looked even dwarfier next to them, but clearly at ease in his natural element. They all three stared at me, and when that bored them, at the dealer. And then at me again.

And if the dealer was sweating, I was melting. I’m not made for that kind of game, I’m not good at that kind of pretence. I found out you can take the cowboy out of the country, but you can’t make him go mano-a-mano with the casinos for twenty bucks an hour.

I lasted a week. I logged my hours and my winnings. During that time, I worked well over forty hours. I only made enough money to pay for the flight and the hotel, and that’s about it. I was glad to put my twenty grand back in the bank.

I couldn’t take the constant strain and pressure of counting and not looking like I was counting and trying to stay invisible and feeling like a million eyes in the sky were watching my every eyeblink and having an inescapable feeling of being that guy in the movies who’s about to be squashed like a bug. But for those who can make it a game and keep it up, what an adventure! I’m glad I did it, wouldn’t do it again.

The part I liked the least, curiously, was something else entirely. It was that my every move was fixed. For every conceivable combination of my cards, the dealer’s card, and the count, there is one and only one right move. Not two. Not “player’s choice”. One move. I definitely didn’t like the feeling that I could be replaced by a vaguely humanoid 100% Turing-tested robot with a poor sense of dress and a really, really simple set of blackjack instructions.

But I was still interested in the math of it all. And I had my trusty Macintosh 512. And Jimmy Chan had an idea about how to improve the odds by changing his counting method. And so did some of Jimmy’s friends. And he had a guy who tested their new counting method for them, at some university, for five hundred bucks a run.

So I told Jimmy I’d do the analysis for a hundred bucks a run. He and his friends were interested. I wrote a program for my Mac to play blackjack against itself. I wrote it in Basic, because that was what was easy. But it was sloooow. So I taught myself to program in C, and I rewrote the entire program in C. It was still too slow, so I translated the critical sections into assembly language. Finally, it was fast enough. I would set up a run during the day, programming in the details of however the person wanted to do the count. Then I’d start it when I went to bed, and in the morning the run would be done and I’d have made a hundred bucks. I figured that was really what my computer was for, to make me money while I slept.

The computer had to be fast because of the issue that is at the heart of this post. This is, how many hands of blackjack did the computer have to play against itself to find out if the new system beat the old system?

The answer turns out to be a hundred times more hands per decimal. In practice, this means at least a million hands, and many more is better.

What we are looking at is the error of the average. If I measure something many times, I can average my answers. Is the resulting mean value the true underlying mean of what I am measuring? No, of course not. If we flip a hundred coins, usually it won’t be exactly fifty/fifty.

But it will be close to the true average of the data. How close? Well, the measure of how close it is expected to be to the true underlying average is what is called the “standard error of the mean”. It is calculated as the standard deviation of the data divided by the square root of the number of observations.

It is the last fact that concerns us. It means that if we double the number of observations, we don’t cut the error in half, but only to 0.7 of the original value. One consequence of this is that if we need one more decimal of precision, we need a hundred times the number of observations. That is what I meant by a hundred times per decimal. If our precision is plus or minus a tenth (± 0.1) and we want to know the answer to one more decimal, plus or minus one hundredth (± 0.01), we need one hundred times the data to get that precision.
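Here is a small sketch of that square-root rule in Python, using simulated coin flips rather than blackjack hands. The helper names and the random seed are my own choices for illustration, and the whole thing assumes the observations are independent.

```python
import math
import random
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def coin_flip_sem(n_flips, rng):
    """Standard error of the mean of n_flips fair coin flips (0 or 1)."""
    flips = [rng.randint(0, 1) for _ in range(n_flips)]
    return standard_error(flips)


def observations_needed(current_n, extra_decimals):
    """Each extra decimal of precision costs a hundred times the data."""
    return current_n * 100 ** extra_decimals


rng = random.Random(42)        # arbitrary seed, just for repeatability
sem_100 = coin_flip_sem(100, rng)
sem_10000 = coin_flip_sem(10_000, rng)

if __name__ == "__main__":
    print("SEM with 100 flips:   ", round(sem_100, 4))
    print("SEM with 10,000 flips:", round(sem_10000, 4))
    print("ratio:", round(sem_100 / sem_10000, 2))
    print("doubling the data multiplies the error by", round(1 / math.sqrt(2), 3))
```

A hundred times the flips should give an error roughly ten times smaller, which is one more decimal.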

That is the end of the detour, now let me return to my investigation of their error estimate for the ocean heating rate for the top 1800 metres of the ocean. If you recall, or even if you don’t, that was 1 watt per square metre (W/m2).

Now, that is calculated from temperature readings from Argo floats, about 3,000 of them during the study period.

Let me run through the numbers to convert their error (in W/m2) into a temperature change (in °C/year). I’ve comma-separated them for easy import into a spreadsheet if you wish.

We start with the forcing error and the depth heated as our inputs, and one constant, the energy to heat seawater one degree:

Energy to heat seawater:, 4.00E+06, joules/tonne/°C

Forcing error: plus or minus, 1, watts/m2

Depth heated:, 1800, metres

Then we calculate

Seawater weight:, 1860, tonnes

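for a density of about 1.0333 tonnes per cubic metre. That is the weight of the 1800-metre column of water under each square metre of surface.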

We multiply watts by seconds per year to give

Joules from forcing:, 3.16E+07, joules/yr

Finally, Joules available / (Tonnes of water times energy to heat a tonne by 1°C) gives us

Temperature error: plus or minus, 0.004, degrees/yr
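If you would rather check the conversion in code than in a spreadsheet, here is a minimal Python sketch of the same steps. The inputs are the ones listed above; the only thing I have added is the assumption of a 365.25-day year for the seconds-per-year figure, and the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed year length, about 3.16e7 s
HEAT_CAPACITY = 4.00e6                   # joules per tonne per degree C
FORCING_ERROR = 1.0                      # watts per square metre
DEPTH = 1800.0                           # metres
DENSITY = 1.0333                         # tonnes per cubic metre


def temperature_error_per_year(forcing_error=FORCING_ERROR, depth=DEPTH,
                               density=DENSITY, heat_capacity=HEAT_CAPACITY):
    """Convert a heating-rate error (W/m2) into degrees C per year
    for a water column of the given depth under one square metre."""
    column_mass = depth * density                        # tonnes per m2
    joules_per_year = forcing_error * SECONDS_PER_YEAR   # joules per m2 per year
    return joules_per_year / (column_mass * heat_capacity)


if __name__ == "__main__":
    print("seawater weight (tonnes):", round(DEPTH * DENSITY))
    print("temperature error (deg C/yr):", round(temperature_error_per_year(), 4))
```

The result rounds to the 0.004 degrees per year shown in the table.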

So, assuming there are no problems with my math, they are claiming that they can measure the temperature rise of the top mile of the global ocean to within 0.004°C per year. That seems way too small an error to me. But is it too small? If we have lots and lots of observations, surely we can get the error down to that small?

Here’s the problem with their claim that the error is that small. I’ve raised this question at Judith’s and elsewhere, and gotten no answer. So I am posing the question again, in the hope that someone can unravel the puzzle.

We know that to shrink the error by one decimal, we need a hundred times more observations. But the same is true in reverse. If we need less precision, we don’t need as many observations. If we need one less decimal point, we can do it with one-hundredth of the observations.

Currently, they claim an error of ± 0.004°C (four thousandths of a degree) for the annual average upper ocean temperature from the observations of the three thousand or so Argo buoys.

But that means that if we are satisfied with an error of ± 0.04°C (four hundredths of a degree), we could do it with a hundredth of the number of observations, or about 30 Argo buoys. And it also indicates that 3 Argo buoys could measure that same huge volume, the entire global ocean from pole to pole, to within a tenth of a degree.
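The scaling can be sketched in a few lines of Python. This assumes, as the argument does, that the error of the average goes as one over the square root of the number of floats; the function name is mine, for illustration only.

```python
import math


def error_with_fewer_floats(claimed_error, n_floats, n_reduced):
    """Scale a claimed error from n_floats down to n_reduced floats,
    assuming the error goes as 1/sqrt(N)."""
    return claimed_error * math.sqrt(n_floats / n_reduced)


if __name__ == "__main__":
    for n in (3000, 30, 3):
        err = error_with_fewer_floats(0.004, 3000, n)
        print(n, "floats: plus or minus", round(err, 3), "deg C")
```

Under that assumption the three cases come out at roughly 0.004, 0.04 and 0.13 degrees, so three floats would be good to about a tenth of a degree.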

And that is the problem I see. There’s no possible way that thirty buoys could measure the top mile of the whole ocean to that kind of accuracy, four hundredths of a degree C. The ocean is far too large and varied for thirty Argo floats to do that.

What am I missing here? Have I made some major math mistake? Their claimed error seems to be way out of line for the number of observations. I’ve not been able to find a good explanation of how they come up with these claims of extreme precision, but however they’re doing it, my math doesn’t support it.

And that’s the puzzle. Comments welcome.

Regards to everyone,




Great post…


Each buoy itself makes thousands of observations? Just a thought…


Willis, there is no mystery here. Their precision is between 10 to 100 times less than what they think it is. 3000 Argo buoys is a trifling number considering the vastness of the world’s oceans. The problem is if they acknowledge this lack of precision, they will then claim that the “missing heat” is in the error range of their ocean temperature measurements and therefore that they are right!

Matthew Bergin

The three thousand buoys are only measuring a volume of the ocean equal to the amount they displace at any one time. So only a very small fraction of the total ocean is being measured.

Well, there’s quantity, and then there’s quality. Quality embraces distribution and location. Are the samples random? Are they clustered? Etc.

Chris Nelli

Given enough time, I think your assumption is right. You have to consider the time for the oceans to be mixed, etc. In your blackjack computer model, did you have to consider the time it took to shuffle the deck after so many hands are played? Likely not, if you used the random number generator command. In summary, you have to balance the number of buoys against the speed of mixing that occurs in the ocean. The more buoys, the less concern one has about mixing.

Richard Keen

Willis, nice job and impeccable logic.
A couple of years ago Vincent Gray wrote a short piece on ICECAP about the missing heat, and I chimed in with an estimation of the error based on the differences between Trenberth’s 1997 and 2009 values for the energy budget. The working assumption was that the differences between the two estimates for each component was roughly the error for each value. The conclusion:
Trenberth’s net “missing heat” of 0.9 W/m2 in 2009 had an uncertainty of +/- 1.3 W/m2. In other words, it could be zero, or even negative.
A different approach than Willis used, but the same conclusion.
Details at:

Bob Shapiro

Always enjoy your posts, Willis.
One possible difficulty that I see with your analysis is the proverbial apples to oranges problem. In your fun description of your gambling escapade, you’re looking at percent error. Your betting advantage is 1%, and you’re looking at how to raise that advantage by comparing two systems to see which returns a greater percent advantage.
While I don’t disagree with your conclusion relating to their not being able to measure the oceans with so few “stations,” I think you would need to show percent error rather than a decimal fraction of a degree…. Or maybe my thinking is all wet.


And it also indicates that 3 Argo buoys could measure that same huge volume, the entire global ocean from pole to pole, to within a tenth of a degree.

There’s no possible way that thirty buoys could measure the top mile of the whole ocean to that kind of accuracy, four hundredths of a degree C

Well, let’s say the floats report temperature to +/- 0.1 degree. With three floats, we can be fairly certain that we can measure the parts of the ocean where the floats have measured to 0.1 degree. Now comes the second part, extrapolation. It is then absurd to project the average of those three floats across the entire globe because there are more than three different temperature regimes across the globe. Now let’s say we go to 300 floats. Now we can get the error down to 0.01 degrees, but again only that certainty in the areas where we have sampled. But we can probably begin to project the values received from those floats as being representative of larger areas of the world’s ocean. So there are two uncertainties here, not one. The first uncertainty is the uncertainty of the measurements themselves. The second is the uncertainty of how much this actually projects across the entire globe.
Using a gambling analogy, let’s say you are studying the average payout of slot machines across all of Reno. If you decide to use only three slot machines, one in each of three casinos, you will first need to take a number of samples just to find out what the average of those three are. But are those three representative of all slot machines in all establishments in Reno? So now you decide to have 300 testers at one slot machine in 300 different locations (Reno has slot machines everywhere, even in the gas station). So now you must make a lot of measurements at each one, but are you still sure that the result can be projected across the entire population of slot machines in Reno? Might you have obtained a different result at one of the establishments if you had picked a slot machine at the end of a row close to the entrance?
Ocean temperatures aren’t perfectly stratified across the entire ocean or even the entire region. Maybe a float dove through a cold upwelling someplace that is actually fairly localized. Is its temperature reading typical for a huge region of ocean? So there are two uncertainties involved. The error of the average from taking so many samples in places in so many different locations, and then there is a projection or extrapolation uncertainty. To what extent is it really valid to have a temperature reading in one spot represent some entire grid square? Does one slot machine’s payout represent the payout of all other slot machines within that grid square? Does the average obtained from all of the measurements really have any significant meaning at all? Maybe not. Maybe each place is different until you get so many readings from so many locations that the various local conditions can be accounted for. Which, by the way, is one reason I don’t like the wholesale removal of stations from the global surface temperature databases. The weather in Sacramento may not be representative of the weather in Ashburn and the temperature in Willits may not be representative of the temperature at Ft. Bragg, California (not to be confused with Ft. Bragg, NC).


“Each buoy itself makes thousands of observations? Just a thought…”
But it is moving so each observation is at a different point. If you go from slot machine to slot machine playing one game at each, do you really learn the average payout of machines at the establishment?

I’ve made this argument before about land stations not having enough individual accuracy, as demonstrated by Anthony’s Surface station project. Then there’s the laughable idea that you can get a global temperature from using anything short of millions of locations, most of them needing to be in the sea, and many more needed in the polar regions.

If there is any missing thermal energy I would expect to find it under the ocean floor (and in the crust under land surfaces) where it really must go when temperatures rise relatively quickly. Please see my post here for reasons . . . January 26, 2012 at 7:26 pm


Claiming a single figure for widely varied oceans seems a bit of a stretch of credibility.
Then… cutting the number of devices doing the sampling would tend to make the expected extremes more extreme.
Effectively, it’s layer upon layer of BS.

Your statistical preamble is based on all numbers being part of the same chance-based set.
The Argo buoys measure different points that are not part of the same chance-based set but are physical place-times in a variable ocean. The rules for how many observations you need to arrive at an ocean average would surely be completely different. So you have the answer already: “The ocean is far too large and varied”.
Not very well expressed, and I’m no expert in this statistical stuff, so hopefully someone will explain it better.

It’s worse than you thought, Willis! 😉
The observation that you need 100 times the observations to gain an extra significant digit is well stated. But that is all you can hope for provided the observations are independent.
In the case of Argo, there is likely a bit of covariant linkage between the readings. My bias is that neighbors are positively correlated, and if so, it will take many more than 100 x to gain that decimal point.
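To put a rough number on that, here is a minimal Python sketch using the textbook result for equally correlated observations, where the variance of the mean is sigma squared over n, times (1 + (n - 1) rho). The equal-correlation assumption and the value of rho are purely illustrative, not anything measured from Argo, and the function names are made up.

```python
import math


def effective_sample_size(n, rho):
    """Number of independent observations that n equally correlated
    observations (pairwise correlation rho) are worth."""
    return n / (1.0 + (n - 1) * rho)


def sem_equicorrelated(sigma, n, rho):
    """Standard error of the mean of n observations with standard
    deviation sigma and common pairwise correlation rho."""
    return sigma * math.sqrt((1.0 + (n - 1) * rho) / n)


if __name__ == "__main__":
    for rho in (0.0, 0.01, 0.1):   # illustrative correlations only
        print("rho =", rho,
              "effective N =", round(effective_sample_size(3000, rho), 1))
```

Even a pairwise correlation of 0.01 would make 3,000 floats worth fewer than a hundred independent ones under this simple model.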


Did precision get confused with accuracy? Calibration? Who’s checking? How?


I’d still like to do a variogram on some of the data to see if there’s any spatial correlation, Willis. I’m willing to admit right off we’d find correlation down each temperature profile, but that only counts for one dimension. Whether there’s any geostatistical correlation in the other two dimensions is the big question, and if we assume there isn’t, then there’s no reason to forge ahead in that effort–the data can be treated simply as random samples and your assertions are likely correct–they can’t be as confident as their numbers indicate.
But in the fanciful world of ocean geostatistics, the overall average and distribution of “blocks” representing the entire ocean would be far different from that of the sample points from which it was derived. I’ve seen it hundreds of times applying geostatistics to in situ mineral models on a variety of scales–using drill holes spaced 1,000 feet, 500 ft, 100 ft, and even 20 feet apart (the latter are numerous blast holes drilled on mining benches). And generally the more samples that are taken, the more the range on the variogram shrinks and the natural heterogeneity is accentuated. I wouldn’t be surprised if the same were true of the vast oceans.
Which means we probably won’t ever get there–to achieve their stated level of error with any acceptable confidence would require spatial statistics and likely require far more time and money than anybody is willing to invest. (Ah, to be the recipient of some wealthy aunty’s vast fortune someday.)

Keith W.

Sorry Jeef, but Willis dug into that as well in another post, and they don’t make that many observations.

Jim D

I don’t think anyone would be satisfied with an error of 0.04 degrees because this is much higher than the annual average warming rate which is nearer 0.01 degrees per year. So 0.004 degrees per year, which they have, means they can measure the temperature rise of the ocean on an annual to decadal time scale, and can at least tell with high confidence whether an annual warming rate of 0.01 degrees exists, which I think they can.

Willis Eschenbach

jeef says:
January 26, 2012 at 9:08 pm

Each buoy itself makes thousands of observations? Just a thought…

That’s one of the beauties of this particular way of showing that their claimed error is too small. It doesn’t matter how many observations each Argo float makes. Each float actually makes three vertical temperature profiles per month.
But it doesn’t matter, because however many observations were taken per float, one less decimal of precision needs a hundred times fewer observations, which means a hundredth of the floats.

Willis, silly question: Have you asked The Statistician to the Stars?

Thanks Willis, my own experience leads me to disbelieve most any temperature data reported at better than ±0.5°C. Call me crazy, but I just don’t trust temperature measurement that well, especially when there is any room for subjective human judgement.

David Falkner

@ jeef:
There would still be issues with the spatial distribution. That may be where 3000 buoys will work to get a thousandths place error, but 3 will not work for tenths. Although, I am wondering if one of those buoys can even measure a single measurement that precisely.


Math like this only works with observations that are 100% accurate (no error). In the real world, instruments have errors and limited accuracy and instruments will drift with time.
For example, if there is random background noise affecting an instrument then the accuracy of the measurement reading of the instrument can be improved by the square root of the number of observations. However, you cannot improve non-random error, so ultimately measurement accuracy soon falls victim to inherent limitations of the instrument (non-linearity, offset, drift and resolution). You cannot solve this by adding more instruments or more observations because of the non-random nature of many of these errors (similarly built instruments will suffer from similar degrees of non-random error).
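A quick simulation makes the distinction concrete. This is only a sketch in Python: the true temperature, the fixed offset and the noise level are made-up numbers for illustration, not properties of any real instrument, and the function name is hypothetical.

```python
import random
import statistics


def mean_reading(true_value, bias, noise_sd, n, rng):
    """Average of n readings, each with the same fixed offset (bias)
    plus independent Gaussian noise of standard deviation noise_sd."""
    readings = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(readings)


TRUE_TEMP = 10.0   # illustrative true value, deg C
BIAS = 0.05        # illustrative non-random offset, deg C
NOISE_SD = 0.5     # illustrative random noise, deg C

rng = random.Random(0)   # arbitrary seed, just for repeatability
error_large_n = mean_reading(TRUE_TEMP, BIAS, NOISE_SD, 100_000, rng) - TRUE_TEMP

if __name__ == "__main__":
    print("error of the average after 100,000 readings:", round(error_large_n, 4))
```

The random part averages away with more readings, but the error of the average settles near the offset rather than near zero.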

Willis Eschenbach

David Falkner says:
January 26, 2012 at 10:14 pm

@ jeef:
There would still be issues with the spatial distribution. That may be where 3000 buoys will work to get a thousandths place error, but 3 will not work for tenths. Although, I am wondering if one of those buoys can even measure a single measurement that precisely.

Thanks, David. I wrote about the Argo floats here. Basically, their thermometers are good to ± 0.005°C, and seem to maintain that over time.

Willis Eschenbach

thepompousgit says:
January 26, 2012 at 10:07 pm

Willis, silly question: Have you asked The Statistician to the Stars?

Briggs sometimes reads WUWT … you out there, William?


I think the counters at Blackjack win by raising their bet significantly when the count is favourable, and sitting around with low bets the other 98% of the time. But this of course raises suspicions at the casino, it’s easy to see when someone varies their bet greatly. And you have to be willing to lose the big bet, which most people aren’t willing to do.
As for the oceans, I assume they integrate a time factor, that is; whatever degree of larger error is in the measurement at one time (say within a cold current which moves around), will cancel out the next time (when the cold current has weakened), meaning the larger errors cancel out over time. Not sure if this is what you are after, but a trend over time might reduce such errors.
Off topic a bit, but I agree with many that averaging data further back in time by proxy measurements shouldn’t be allowed (such as in Mann’s various papers), as in this case what you are averaging is not data, but proxy data, meaning you are 1) mixing different uncertainties 2) biasing data towards whatever errors are in the proxy itself over time, that is; many proxy methods get less responsive the further back in time you go, meaning if you are averaging proxy data further back in time, you will likely simply flatten any deviations the further one goes back in time, with the older proxy data being by nature, less responsive. (and you get a hockeystick towards the recent end, as data becomes more responsive). Mann uses this false method to claim the MWP was lower in T than today.

One of my favorite statistical modeling stories I heard in graduate school in the late ’70s. It dealt with the design of the Trans Alaskan Pipeline Oil Tanker Terminal at Valdez, Alaska. The pipeline was designed for a maximum 2,000,000 bbls of oil per day. So, there had to be capacity at Valdez to temporarily store that oil if tankers were late. So the crucial question was what is the tankage needed for a 9x.x% confidence to store the oil rather than slow down the pumps?
Tankers are mechanical and therefore have some schedule delays. But the crucial problem was Gulf of Alaska weather. No problem. Thanks to Russian fishing villages, they had 200 years of weather reports. All the members of the consortium had the same data. Their OR teams crunched the numbers and came back with:
1.25 days
1.50 days
7.00 days. !?!?
“How do you justify SEVEN Days when we come up with a little more than one?”
“We modeled tanker delay according to the frequency of major storms.”
“So did we.”
“And when a storm delays one tanker, it delays ALL of them.”
“….. hmmm. Right. The delays are not independent. ”
The story is they quickly settled on six days.
It is quite a big terminal.


Jeef is probably right. Many observations per buoy – making it a cluster sample.
Similar calculations can be made – just a little more tedious.

Regarding the error measurements discussed, what Watt says seems right to me.
A very similar point can be made about those models. They estimate SW and LW radiation and each is acknowledged to have 1% to 2% uncertainty. Then they take a difference to get net radiative flux at TOA. But such flux in practice varies between about +0.5% and -0.5% of total incident solar flux. So how on Earth can the “accuracy” of the difference be even sufficient to determine whether the end result is positive or negative – ie warming or cooling?
The whole model business would be a joke if the consequences were not so grave – like spending $10 trillion over the next 100 years for developing countries who could well do with the money, yes, but spend it on more useful projects – projects that could easily save lives.

Well, if we believe the numbers, that is what the implication is. The ocean is just one big homogenous lump of water. You don’t need too many thermometers to measure its temperature – it is the same everywhere (within climatic zones, that is).
The missus and I were doing calculations of statistical power recently. I had several revelatory moments even then.
Nicely written.

Duke C.

The SV for blackjack with standard rules is 1.15. With a 1% edge you should have been betting $150 (.75% of your bankroll) per hand. Adjust your bet size according to bankroll fluctuations and you’ll never go broke. Eventually, you would have been a rich Casino owner instead of a poor climate change heretic. 😉

Jim D

Adding to what I wrote above, you have to remember this is a decadal trend that is being evaluated, so this 0.004 degrees per year is really an accuracy of 0.04 degrees per decade. Since the actual trend is probably just over 0.1 degrees per decade, they are saying they can resolve this reasonably well, and at least be certain of the warming. 0.04 degrees per year translates to 0.4 degrees per decade which is quite useless for determining even the sign of such a trend, and I am certain they can do and are doing better than that.

Eyal Porat

First, great post – especially the story :-).
This 0.004 C is soooo accurate and unmeasurable it seems meaningless up front. It is way too accurate to take seriously.
As you have taught me: when you smell a rat – in most cases there is one.

David Falkner

Basically, their thermometers are good to ± 0.005°C, and seem to maintain that over time.
In the body of the story, you calculated the error as equivalent to 0.004°C. Shouldn’t the error at least equal the error in the instrument? How could applying math to the output of the instrument make the instrument itself more efficient? I feel like I am missing something in your calculation, perhaps. Or maybe I am not understanding your point?

Two issues (why is it always two? – I will try to make it three or four just to be difficult.)
So, we must separate serial observations from observations in parallel, and precision from accuracy, and both from spatial distribution or coverage, or more pertinently, the density of measurements.
1. For estimating accuracy, what matters is not the number of buoys but the number of observations per buoy over time. Since each buoy only records locally, whatever it measures is only local. So if the buoys are moving around, you also need to know something about that.
2. The precision with which we can measure is a complex function of the instrumentation’s inherent or “lab” precision, the environment and the calibration procedures (if any.) This must be accounted for when measuring accuracy.
3. Clearly, even 3,000 buoys do not provide enough density to provide meaningful coverage. The best way to deal with accuracy under these circumstances might be to “stratify” measurements into temperature or energy bands, so that all buoys making measurements in the same band can be aggregated to assess accuracy. I make this point since it seems meaningless to be assessing accuracy to however many decimals when the temperatures observed by different buoys might vary by factors of 2 or more.
For example, if say, 20% of the buoys operate in a “tropical” band, and the number of observations per buoy is say, 1,000 per week (I have no idea what the actual number might be) then we would have .2(3000)(1000) = 600,000 observations on which to assess accuracy, and then only if the swings in measurement were not too wild and the buoy did not move a thousand miles or more over the period.
Hope I am not writing total nonsense?
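The stratification idea in the comment above can be sketched roughly as follows. This is only an illustration: the band names, the readings and both function names are made up, and the standard error shown assumes the readings within a band are independent, which may well not hold for real buoys.

```python
import math
from collections import defaultdict


def observations_in_band(total_buoys, band_fraction, obs_per_buoy):
    """Count the observations available in one band (all inputs are guesses)."""
    return round(total_buoys * band_fraction * obs_per_buoy)


def standard_error_by_band(readings):
    """Group (band, temperature) pairs by band and return
    {band: (count, mean, standard error of the mean)}.

    Assumes independent readings within each band.
    """
    groups = defaultdict(list)
    for band, temp in readings:
        groups[band].append(temp)

    result = {}
    for band, temps in groups.items():
        n = len(temps)
        mean = sum(temps) / n
        if n > 1:
            variance = sum((t - mean) ** 2 for t in temps) / (n - 1)
            sem = math.sqrt(variance / n)
        else:
            sem = float("nan")  # no spread can be estimated from one reading
        result[band] = (n, mean, sem)
    return result


# The comment's own example: 20% of 3,000 buoys, 1,000 observations each.
n_tropical = observations_in_band(3000, 0.2, 1000)

# Hypothetical readings, purely for illustration.
demo = standard_error_by_band([
    ("tropical", 28.0), ("tropical", 30.0),
    ("polar", 2.0), ("polar", 4.0),
])
```

Pooling the tropical and polar readings into one average would mix very different populations, which is the point of assessing each band separately.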

David Falkner

And we’re assuming top operational efficiency, at that!

write me. ill send u the paper

Alan Wilkinson

There is no statistical validity whatever in these error estimates since the assumptions that all the measurements are independent and drawn randomly from the same population are certainly false to unknown degrees.
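To give a feel for how much the independence assumption matters, here is a minimal sketch. It assumes the simplest possible departure from independence, where every pair of measurements shares the same correlation rho; real ocean measurements would not be that tidy, and the function name is hypothetical.

```python
import math


def sem_equicorrelated(sigma, n, rho):
    """Standard error of the mean of n measurements, each with standard
    deviation sigma, when every pair has the same correlation rho.

    Var(mean) = sigma^2 / n * (1 + (n - 1) * rho)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    factor = 1 + (n - 1) * rho
    if factor < 0:
        raise ValueError("rho is too negative to be valid for this n")
    return sigma * math.sqrt(factor / n)


independent = sem_equicorrelated(1.0, 100, 0.0)   # the familiar sigma / sqrt(n)
mild = sem_equicorrelated(1.0, 100, 0.1)          # modest shared correlation
identical = sem_equicorrelated(1.0, 100, 1.0)     # 100 copies of one reading
```

With rho = 0 the error shrinks as 1/sqrt(n). With even a modest rho it stops shrinking much as n grows, so adding more measurements buys far less than the simple formula suggests.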
Has anyone done any quantitative examination of why the simple sea-level change measurement should not directly reflect heat content over relatively short time-spans, since the rate of change has been modest and fairly consistent?

Alan S. Blue

Willis, I keep meaning to write a couple of articles along the lines of “How do we measure Temperature – from a single thermometer to a Global Mean Average.”
One issue with the surface stations that may well be a factor in the oceanic measurements is using the NIST-calibrated “Instrumental Error” when calculating the error of measuring the temperature of a ‘gridcell’. Thermometers with a stated error of 0.1C are common. But the norm in climatology is to take that single measurement and start spreading it towards the next-nearest-thermometer under the assumption that they’re representative of the entire gridcell. And the assumption that the temperature is smooth relative to the size of the gridcells.
There’s nothing too wrong with that when you’re just aiming for a sense of what the contours of the temperature map look like … But when you turn around and attempt to propagate the errors to determine your -actual- precision and accuracy at measuring the temperature, there seems to be a recurring blind spot.
A perfect, out-of-the-box thermometer, NIST-calibrated with stickers and all as “±0.1C” just doesn’t in general provide a measurement of the temperature for the gridcell to an accuracy of 0.1C. Nor anywhere remotely nearby. Yes, the -reading- might well be “3.2C”, but that is a point-source reading.
Watching a single weather forecast off your local news should drive this home: There’s only a couple gridcells (max) represented. And yet there’s obvious temperature variations. Some variations are endemic – because they represent height variations or natural features. (Or barbeques, whatever) But there is generally plenty of shifting in this analogous plot of temperatures as well. Some town normally warmer than yours might show up cooler, or way hotter, or tied on any given day.
It’s possible that someone has done a ‘calibration study’ with the Argos, to determine not just that it measures temperature in the spot it’s actually in, but to determine how well the individual sensor measures its gridcell. -Not- a “correlation study”, that just shows “Hey, we agree with other-instrument-X with an R of XXX” – that doesn’t tell you the instrumental error. I just don’t know enough details for the oceanic studies.
A ‘calibration study’ would look more like 3000 instruments dumped into a -single- gridcell. And it has exactly the same issue you bring up with regards to numbers: To make a perfectly accurate “map”, you need a map that’s the same size as the surface being mapped to cover all the irregularities.
In chemical engineering, (my field), temperature can be very important to a given process. Having five or so thermometers on the inlets and dispersed around the tank of a -well understood- process might well be dramatically insufficient to the job of “measuring the temperature” to a measly 0.1C. That’s a tank in a reasonably controlled environment with a “known” process.
So, I wouldn’t pay much attention to the error bars unless the instruments being used are more ‘areal’ as opposed to ‘point source’ or there’s an exhaustive study of some gridcells.
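The gap between instrumental error and gridcell error can be put in rough numbers. The sketch below assumes the instrument error and the spatial variation inside the gridcell are independent and so add in quadrature, and that sensors are placed at random within the cell. The function name and the 1.0C spatial spread are assumptions for illustration, not measured values.

```python
import math


def gridcell_mean_uncertainty(sigma_instrument, sigma_spatial, n_sensors=1):
    """Rough one-sigma error in the gridcell mean estimated from point readings.

    sigma_instrument: stated instrument error (e.g. 0.1 for "±0.1C")
    sigma_spatial:    spread of the true temperature across the gridcell
    n_sensors:        number of independent, randomly placed sensors
    """
    if n_sensors < 1:
        raise ValueError("need at least one sensor")
    combined_variance = sigma_instrument ** 2 + sigma_spatial ** 2
    return math.sqrt(combined_variance / n_sensors)


# One ±0.1C thermometer in a cell whose temperature varies by about 1.0C
# (the 1.0C figure is hypothetical).
one_sensor = gridcell_mean_uncertainty(0.1, 1.0)

# The 'calibration study' case: 3000 sensors in that same single cell.
many_sensors = gridcell_mean_uncertainty(0.1, 1.0, n_sensors=3000)
```

Under these assumptions the single thermometer's error for the gridcell is dominated by the spatial spread, not by the 0.1C on the sticker.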

I have to ask, what’s the real point of all this measuring of OHC? Is it just a ploy for more funding to try to prove AGW a reality? The overall trend is not far different from sea surface temperatures. Wouldn’t it be better to get a satellite working again on sea surface temperatures to ensure some continuity of that very important data which, in my mind, is the most indicative of what’s happening and where we’re heading?

F. Ross

Willis. Interesting post.
Just curious; do you know what the power source of the buoys is? And has the possible presence of heat from a power source been taken into account – as far as it might possibly affect temperature measurements?

Alex the skeptic

“……..says the heat’s not really missing, we just don’t have accurate enough information to tell where it is”
It’s like Santa. It’s not that Santa does not really exist (don’t tell the kids), it’s just that we don’t know exactly where he lives, although he could have drowned together with the polar bears.


You can estimate the probability of tossing a head by flipping a coin 100 times, counting the number of heads flipped and dividing by 100. When you do this, the standard deviation of this estimate will be sqrt(.5*.5) / sqrt(100) = .05 or 5%. People then like to say a 95% confidence region is plus or minus 2 standard deviations, or +/- 10% in this case.
However, this last bit is relying on the central limit theorem to assert that your averaged estimate is normally distributed. If you were to only flip your coin once, you would get an estimate that has a 10x standard deviation, or 50%, but you would have to be very careful what you did with that. You could not assert that your estimate is normal and thus the 95% confidence interval is +/- 100%. You would either get a range of -100% to 100% or 0% to 200%…nonsense.
The problem is when you have a small sample, you have to work with the small sample statistical properties of your estimators and abandon all your large sample approximations. Bottom line is your statistics go haywire usually with small samples 🙂
Hope that helps!
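For anyone who wants to check the coin-flip arithmetic, here is a minimal sketch. It assumes a fair coin (p = 0.5) and uses the usual formula sqrt(p(1-p)/n) for the standard deviation of the estimated proportion; the function names are just for illustration.

```python
import math


def standard_error(p, n):
    """Standard deviation of the estimated proportion after n flips."""
    return math.sqrt(p * (1 - p) / n)


def normal_interval(estimate, p, n, z=2.0):
    """Naive 'plus or minus z standard deviations' interval.

    Only meaningful when n is large enough for the normal
    approximation to hold.
    """
    half_width = z * standard_error(p, n)
    return (estimate - half_width, estimate + half_width)


se_100 = standard_error(0.5, 100)   # about 0.05, i.e. 5%
se_1 = standard_error(0.5, 1)       # 0.5, ten times larger

# One flip: the estimate is either 1.0 (heads) or 0.0 (tails).
after_heads = normal_interval(1.0, 0.5, 1)   # (0.0, 2.0)
after_tails = normal_interval(0.0, 0.5, 1)   # (-1.0, 1.0)
```

Both single-flip intervals run outside the 0 to 1 range a probability can take, which is the small-sample breakdown described above.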


“the heat’s not really missing, we just don’t have accurate enough information to tell where it is”
Like Little Bo-Peep’s sheep.


If I’m reading this correctly, the error you mention is the error in the mean based on the data from the ARGO buoys….what about the measurement error for the buoys? What is the error associated with the actual measurement? The reason I ask is that errors are cumulative, so any measurement error would increase the error of the mean.
Coming from a geostatistical background, how is the data treated in terms of data clustering? If you have lots of measurements very close (i.e. some of the buoys clustered together) then the weighting applied to these measurements should be lower, compared to the buoys from areas of sparse coverage. Unless the coverage is very even, then clustering may be an issue (e.g. more coverage in the tropics compared to the colder Arctic/Antarctic water). The result of this can bias the estimate of the mean.
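One simple way to down-weight clustered measurements is cell declustering: lay a grid over the area and give each measurement a weight of one over the number of measurements sharing its cell. The sketch below is only an illustration of that idea, with made-up coordinates and temperatures and a hypothetical function name. Nothing here says how the ARGO data are actually treated.

```python
from collections import Counter


def cell_declustered_mean(samples, cell_size):
    """Weighted mean of (x, y, value) samples, where each sample is
    weighted by 1 / (number of samples in its grid cell).

    Equivalent to averaging the per-cell means over occupied cells.
    """
    if not samples:
        raise ValueError("no samples")
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y, _ in samples]
    counts = Counter(cells)
    weights = [1.0 / counts[cell] for cell in cells]
    weighted_sum = sum(w * value for w, (_, _, value) in zip(weights, samples))
    return weighted_sum / sum(weights)


# Hypothetical data: three buoys bunched in warm water, one alone in cold water.
samples = [
    (1.0, 1.0, 30.0),
    (2.0, 1.5, 30.0),
    (1.5, 2.0, 30.0),
    (55.0, 55.0, 10.0),
]

naive_mean = sum(value for _, _, value in samples) / len(samples)
declustered_mean = cell_declustered_mean(samples, cell_size=10.0)
```

In this toy case the naive mean is pulled toward the cluster (25.0), while the declustered mean treats the two occupied cells equally (20.0). The result depends on the chosen cell size, which is itself a judgment call.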

F. Ross

Please ignore my previous post. Dumb question on my part when a simple Google search shows what I wanted to know.


What if sharks eat some of the buoys?


I guess we can state with certainty that Willis discovered Trenberth’s missing heat, it is found in Casinos at Blackjack tables populated by math whizzes 😉
Just a suggestion for the young rubes out there that get a taste of the favorable mathematics of counting. Besides money management and an adequate stake, there are two other variables that need to be controlled perfectly: the high-low counting must be correct, and the strategy (hitting/staying) must also be flawless. After locking up those two variables, the math is favorable, well, depending on number of decks. It is a hell of a lotta work to earn a small profit as Willis so artfully described. Truly it is the math challenge that draws so many Scientists to these games.
Willis, really wild guess, is Jimmy Chan == S.W.?


Um, I think this was touched on above somewhere, somewhat, but I do not think you could use this type of statistical reasoning at all.
The problem is that the oceans are, in effect, a collection of fluid-based heat-transferring machines. The system is deterministic, not random (not to mention that the systems change a lot over the course of the seasons, depending on the effect of previous seasons). In other words, using interpolation would not be an allowable procedure. (Unless you measure for decades? Even then, climate changes naturally over that timescale, so your results would end up being meaningless anyway.)
If you were to use randomly (or evenly, same thing if the machine is sufficiently unknown) distributed-in-space measurement buoys, then you will surely have very large error – to the point where your measurements would be worth nothing, at least in the context of trying to determine global heat-flow by measuring temps using the buoys.
This is a truism, is it not? I do not know what the buoys were purposed for, but it makes no sense to use them for this.