Decimals of Precision

Guest Post by Willis Eschenbach

Over at Judith Curry’s excellent blog there’s a discussion of Trenberth’s missing heat. A new paper about oceanic temperatures says the heat’s not really missing, we just don’t have accurate enough information to tell where it is. The paper’s called Observed changes in top-of-the-atmosphere radiation and upper-ocean heating consistent within uncertainty.

It’s paywalled, and I was interested in one rough number, so I haven’t read it. The number that I wanted was the error estimate for their oceanic heating rates. This error can be seen in Figures 1a and 3a on the abstract page, and it is on the order of about plus or minus one watt/m2. This is consistent with other estimates of upper ocean heat content measurement errors.

I think I can conclusively demonstrate that their claimed error is way too small. To understand why, let me take a detour through the art, science, and business of blackjack.

In a fit of misguided passion, some years back I decided to learn how to count cards at blackjack. I had money and time at the same moment, an unusual combination in my life, so I took a class from a guy I’ll call Jimmy Chan. Paid good money for the class, and I got good value. I’ve always been good with figures, and I came out good at counting cards. Not as good as Jimmy, though, he was a mad keen player who had made a lot of money counting cards.

At the time they were still playing single deck in Reno. And I was young, single, and stupid. So I took twenty thousand dollars from my savings for my grubstake and went to Reno. It was an education about a curious business.

Here are the economics of the business of counting cards.

First, if you count using one of the usual systems as I did, and you are playing single deck, it gives you about a 1% edge on the house. Not much, to be sure, but it is a solid edge. And you can add to that by using a better counting system or a concurrent betting system, where better means more complex.

Second, if you play head-to-head (just you and the dealer) you can typically play about a hundred hands an hour.

Doesn’t take a math whiz to see that if you don’t blow the count, you will win about one extra hand an hour.

And therein is the catch. It means that in the card counting business, your average hourly wage is the amount of your average bet.

It’s a catch because of the other inexorable rule of counting blackjack. This regards surviving the swings and arrows of outrageous luck. If you don’t want to go home empty-handed, you need to have a grubstake that is a thousand times your average bet. Otherwise, you could go bust just from the natural ups and downs.

Now, twenty thousand dollars was all I could scrape together then. So that meant my average bet couldn’t be more than twenty dollars. I started out at the five dollar level.
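For anyone who wants to check the arithmetic of the last few paragraphs, here is a minimal sketch. It uses only the round numbers quoted in this post (a 1% edge, a hundred hands an hour, a grubstake of a thousand average bets); the function names are mine, made up for illustration.

```python
# Back-of-envelope card-counting economics, using the round numbers from the post.
EDGE = 0.01               # roughly a 1% edge over the house
HANDS_PER_HOUR = 100      # head-to-head pace against the dealer
BANKROLL_MULTIPLE = 1000  # rule of thumb: grubstake = 1000 x average bet


def hourly_wage(average_bet):
    """Expected winnings per hour: edge x hands per hour x average bet."""
    return EDGE * HANDS_PER_HOUR * average_bet


def max_average_bet(bankroll):
    """Largest average bet the rule-of-thumb grubstake allows."""
    return bankroll / BANKROLL_MULTIPLE


for bet in (5, 10, 20):
    print(f"average bet ${bet}: about ${hourly_wage(bet):.2f} per hour")

print(f"$20,000 grubstake supports an average bet of ${max_average_bet(20_000):.2f}")
```

With these inputs the expected hourly wage comes out equal to the average bet, which is the catch described above.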

I’d never spent any time in a casino up until then. I felt like the rube in every movie I ever saw. I played a while at the five dollar level. You never win or lose much there, so nobody paid any attention to me.

After a day or so making the princely sum of $5 per hour, I started betting larger. First at the ten-dollar level. Then at the twenty-dollar level. That was good money back in those days.

But when you start to make a bit of money, like say you hit a few blackjacks in a row and you’re doubling down, they start paying attention to you, and the trouble begins. First they use the casino holodeck to transport a somewhat malignant looking dwarf armed with a pad and a pencil to your table. He materializes at the shoulder of the dealer, and she starts to sweat. I say she because most dealers were women then and now. She starts to sweat because the casino doesn’t really care about card counters. I was making $20 an hour on average? Big deal, everyone in the casino management made that and more.

What scares casino owners is collusion between dealers and players. With the connivance of the dealer a guy can have a “string of luck” that can clean out a table in fifteen minutes and be out the door, meeting the dealer later to split the money. That’s what casino owners worry about, and that’s why the dealer started sweating, she knew she was being watched too. The dwarf peered through coke-bottle thick glasses, and wrote down the number of chips on each stack in the dealer’s rack, how much money I had, how much other players had. He gave the dealer a new deck. He wore a suit that cost as much as my grubstake. His wingtip shoes were shined to a rich luster. He looked at me as though I were a rich man with a loathsome disease. He watched my eyes, my hands. I started sweating like the dealer.

If I continued to win, the holodeck went into action again. This time what materialized were two large, vaguely anthropoid looking gentlemen, whose suits were specially tailored to conceal a bulge under the off-hand shoulder. They simply appeared, one at each shoulder of the aforementioned vertically challenged gentleman, who looked even dwarfier next to them, but clearly at ease in his natural element. They all three stared at me, and when that bored them, at the dealer. And then at me again.

And if the dealer was sweating, I was melting. I’m not made for that kind of game, I’m not good at that kind of pretence. I found out you can take the cowboy out of the country, but you can’t make him go mano-a-mano with the casinos for twenty bucks an hour.

I lasted a week. I logged my hours and my winnings. During that time, I worked well over forty hours. I only made enough money to pay for the flight and the hotel, and that’s about it. I was glad to put my twenty grand back in the bank.

I couldn’t take the constant strain and pressure of counting and not looking like I was counting and trying to stay invisible and feeling like a million eyes in the sky were watching my every eyeblink and having an inescapable feeling of being that guy in the movies who’s about to be squashed like a bug. But for those who can make it a game and keep it up, what an adventure! I’m glad I did it, wouldn’t do it again.

The part I liked the least, curiously, was something else entirely. It was that my every move was fixed. For every conceivable combination of my cards, the dealer’s card, and the count, there is one and only one right move. Not two. Not “player’s choice”. One move. I definitely didn’t like the feeling that I could be replaced by a vaguely humanoid 100% Turing-tested robot with a poor sense of dress and a really, really simple set of blackjack instructions.

But I was still interested in the math of it all. And I had my trusty Macintosh 512. And Jimmy Chan had an idea about how to improve the odds by changing his counting method. And so did some of Jimmy’s friends. And he had a guy who tested their new counting method for them, at some university, for five hundred bucks a run.

So I told Jimmy I’d do the analysis for a hundred bucks a run. He and his friends were interested. I wrote a program for my Mac to play blackjack against itself. I wrote it in Basic, because that was what was easy. But it was sloooow. So I taught myself to program in C, and I rewrote the entire program in C. It was still too slow, so I translated the critical sections into assembly language. Finally, it was fast enough. I would set up a run during the day, programming in the details of however the person wanted to do the count. Then I’d start it when I went to bed, and in the morning the run would be done and I’d have made a hundred bucks. I figured that I’d finally achieved what my computer was really for, which was to make me money while I slept.

The computer had to be fast because of the issue that is at the heart of this post. This is, how many hands of blackjack did the computer have to play against itself to find out if the new system beat the old system?

The answer turns out to be a hundred times more hands per decimal. In practice, this means at least a million hands, and many more is better.

What we are looking at is the error of the average. If I measure something many times, I can average my answers. Is the resulting mean value the true underlying mean of what I am measuring? No, of course not. If we flip a hundred coins, usually it won’t be exactly fifty/fifty.

But it will be close to the true average of the data. How close? Well, the measure of how close it is expected to be to the true underlying average is what is called the “standard error of the mean”. It is calculated as the standard deviation of the data divided by the square root of the number of observations.

It is the last fact that concerns us. It means that if we double the number of observations, we don’t cut the error in half, but only to 0.7 of the original value. One consequence of this is that if we need one more decimal of precision, we need a hundred times the number of observations. That is what I meant by a hundred times per decimal. If our precision is plus or minus a tenth (± 0.1) and we want to know the answer to one more decimal, plus or minus one hundredth (± 0.01), we need one hundred times the data to get that precision.
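The square-root rule is easy to check numerically. The sketch below is only an illustration: it assumes independent observations, uses a fair coin scored as 0 or 1 (standard deviation 0.5) as the thing being measured, and the function names are my own.

```python
import math
import random
import statistics


def standard_error(sd, n):
    """Standard error of the mean for n independent observations."""
    return sd / math.sqrt(n)


def observations_needed(n_now, error_now, error_wanted):
    """Observations required to shrink the error, given the 1/sqrt(n) rule."""
    return n_now * (error_now / error_wanted) ** 2


def simulated_error(n, trials=2000, seed=1):
    """Spread of the mean of n coin flips, estimated by brute force."""
    rng = random.Random(seed)
    means = [sum(rng.random() < 0.5 for _ in range(n)) / n for _ in range(trials)]
    return statistics.pstdev(means)


COIN_SD = 0.5  # standard deviation of one fair coin flip scored 0 or 1

# Doubling the observations only cuts the error to about 0.7 of its old value.
print(standard_error(COIN_SD, 200) / standard_error(COIN_SD, 100))

# One more decimal of precision costs a hundred times the observations.
print(observations_needed(100, 0.1, 0.01))

# The simulated spread should land close to the formula's value.
print(simulated_error(100), standard_error(COIN_SD, 100))
```

The simulation is there as a sanity check on the formula. If the observations are not independent, the formula understates the error.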

That is the end of the detour, now let me return to my investigation of their error estimate for the ocean heating rate for the top 1800 metres of the ocean. If you recall, or even if you don’t, that was 1 watt per square metre (W/m2).

Now, that is calculated from temperature readings from Argo floats, about 3,000 of them during the study period.

Let me run through the numbers to convert their error (in W/m2) into a temperature change (in °C/year). I’ve comma-separated them for easy import into a spreadsheet if you wish.

We start with the forcing error and the depth heated as our inputs, and one constant, the energy to heat seawater one degree:

Energy to heat seawater:, 4.00E+06, joules/tonne/°C

Forcing error: plus or minus, 1, watts/m2

Depth heated:, 1800, metres

Then we calculate

Seawater weight:, 1860, tonnes

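for a density of about 1.0333 tonnes per cubic metre, so this is the weight of the 1800-metre column of water under each square metre of ocean surface.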

We multiply watts by seconds per year to give

Joules from forcing:, 3.16E+07, joules/yr

Finally, Joules available / (Tonnes of water times energy to heat a tonne by 1°C) gives us

Temperature error: plus or minus, 0.004, degrees/yr
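If you would rather check the conversion in code than in a spreadsheet, here is a minimal sketch of the same steps. The inputs are the figures listed above; the year length of 365.25 days and the function name are my assumptions.

```python
# Convert a heating-rate error in W/m2 into a temperature error in degrees C per year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds (assumed year length)
HEAT_CAPACITY = 4.00e6                 # joules to warm one tonne of seawater by 1 degree C
DENSITY = 1.0333                       # tonnes of seawater per cubic metre


def temperature_error(forcing_error_w_m2, depth_m):
    """Temperature error (deg C/yr) for a water column under 1 m2 of surface."""
    tonnes = depth_m * DENSITY                       # water under one square metre
    joules = forcing_error_w_m2 * SECONDS_PER_YEAR   # energy delivered in a year
    return joules / (tonnes * HEAT_CAPACITY)


error = temperature_error(forcing_error_w_m2=1.0, depth_m=1800)
print(f"plus or minus {error:.4f} degrees C per year")
```

This gives about 0.0042°C per year, which rounds to the 0.004 figure above.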

So, assuming there are no problems with my math, they are claiming that they can measure the temperature rise of the top mile of the global ocean to within 0.004°C per year. That seems way too small an error to me. But is it too small? If we have lots and lots of observations, surely we can get the error down to that small?

Here’s the problem with their claim that the error is that small. I’ve raised this question at Judith’s and elsewhere, and gotten no answer. So I am posing the question again, in the hope that someone can unravel the puzzle.

We know that to shrink the error by one decimal place, we need a hundred times more observations. But the same is true in reverse. If we need less precision, we don’t need as many observations. If we need one less decimal place, we can do it with one-hundredth of the observations.

Currently, they claim an error of ± 0.004°C (four thousandths of a degree) for the annual average upper ocean temperature from the observations of the three thousand or so Argo buoys.

But that means that if we are satisfied with an error of ± 0.04°C (four hundredths of a degree), we could do it with a hundredth of the number of observations, or about 30 Argo buoys. And it also indicates that 3 Argo buoys could measure that same huge volume, the entire global ocean from pole to pole, to within a tenth of a degree.
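The scaling in the paragraph above can be sketched as follows. This is the naive calculation, not a claim about how the paper's authors got their error: it assumes every float contributes the same number of independent observations, so the error scales as one over the square root of the number of floats. The function name is made up for illustration.

```python
import math

CLAIMED_ERROR = 0.004  # deg C per year, from the conversion earlier in the post
CURRENT_FLOATS = 3000  # approximate number of Argo floats in the study period


def error_with_floats(n_floats):
    """Implied error if the claimed error scales as 1/sqrt(number of floats)."""
    return CLAIMED_ERROR * math.sqrt(CURRENT_FLOATS / n_floats)


for n in (3000, 300, 30, 3):
    print(f"{n:>5} floats: plus or minus {error_with_floats(n):.3f} deg C")
```

Thirty floats come out at four hundredths of a degree and three floats at roughly a tenth, which is the implication I find hard to believe.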

And that is the problem I see. There’s no possible way that thirty buoys could measure the top mile of the whole ocean to that kind of accuracy, four hundredths of a degree C. The ocean is far too large and varied for thirty Argo floats to do that.

What am I missing here? Have I made some major math mistake? Their claimed error seems to be way out of line for the number of observations. I’ve not been able to find a good explanation of how they come up with these claims of extreme precision, but however they’re doing it, my math doesn’t support it.

And that’s the puzzle. Comments welcome.

Regards to everyone,


180 Comments

Woodshedder

January 26, 2012 9:05 pm

Great post…

jeef

January 26, 2012 9:08 pm

Each buoy itself makes thousands of observations? Just a thought…

Truthseeker

January 26, 2012 9:18 pm

Willis, there is no mystery here. Their precision is between 10 and 100 times less than what they think it is. 3000 Argo buoys is a trifling number considering the vastness of the world’s oceans. The problem is if they acknowledge this lack of precision, they will then claim that the “missing heat” is in the error range of their ocean temperature measurements and therefore that they are right!

Matthew Bergin

January 26, 2012 9:24 pm

The three thousand buoys are only measuring a volume of the ocean equal to the amount they displace at any one time. So only a very small fraction of the total ocean is being measured.

Brian H

January 26, 2012 9:25 pm

Well, there’s quantity, and then there’s quality. Quality embraces distribution and location. Are the samples random? Are they clustered? Etc.

Chris Nelli

January 26, 2012 9:25 pm

Given enough time, I think your assumption is right. You have to consider the time for the oceans to be mixed, etc. In your blackjack computer model, did you have to consider the time it took to shuffle the deck after so many hands are played? Likely not, if you used the random number generator command. In summary, you have to balance the number of buoys against the speed of mixing that occurs in the ocean. The more buoys, the less concern one has about mixing.

Richard Keen

January 26, 2012 9:32 pm

Willis, nice job and impeccable logic.
A couple of years ago Vincent Gray wrote a short piece on ICECAP about the missing heat, and I chimed in with an estimation of the error based on the differences between Trenberth’s 1997 and 2009 values for the energy budget. The working assumption was that the differences between the two estimates for each component was roughly the error for each value. The conclusion:
Trenberth’s net “missing heat” of 0.9 W/m2 in 2009 had an uncertainty of +/- 1.3 W/m2. In other words, it could be zero, or even negative.
A different approach than Willis used, but the same conclusion.
Details at: http://icecap.us/index.php/go/icing-the-hype/the_flat_earth/

Bob Shapiro

January 26, 2012 9:35 pm

Always enjoy your posts, Willis.
One possible difficulty that I see with your analysis is the proverbial apples to oranges problem. In your fun description of your gambling escapade, you’re looking at percent error. Your betting advantage is 1%, and you’re looking at how to raise that advantage by comparing two systems to see which returns a greater percent advantage.
While I don’t disagree with your conclusion relating to their not being able to measure the oceans with so few “stations,” I think you would need to show percent error rather than a decimal fraction of a degree…. Or maybe my thinking is all wet.

crosspatch

January 26, 2012 9:35 pm

And it also indicates that 3 Argo buoys could measure that same huge volume, the entire global ocean from pole to pole, to within a tenth of a degree.
…
There’s no possible way that thirty buoys could measure the top mile of the whole ocean to that kind of accuracy, four hundredths of a degree C

Well, let’s say the floats report temperature to +/- 0.1 degree. With three floats, we can be fairly certain that we can measure the parts of the ocean where the floats have measured to 0.1 degree. Now comes the second part, extrapolation. It is then absurd to project the average of those three floats across the entire globe because there are more than three different temperature regimes across the globe. Now let’s say we go to 300 floats. Now we can get the error down to 0.01 degrees, but again only that certainty in the areas where we have sampled. But we can probably begin to project the values received from those floats as being representative of larger areas of the world’s ocean. So there are two uncertainties here, not one. The first uncertainty is the uncertainty of the measurements themselves. The second is the uncertainty of how much this actually projects across the entire globe.
Using a gambling analogy, let’s say you are studying the average payout of slot machines across all of Reno. If you decide to use only three slot machines, one in each of three casinos, you will first need to take a number of samples just to find out what the average of those three are. But are those three representative of all slot machines in all establishments in Reno? So now you decide to have 300 testers at one slot machine in 300 different locations (Reno has slot machines everywhere, even in the gas station). So now you must make a lot of measurements at each one, but are you still sure that the result can be projected across the entire population of slot machines in Reno? Might you have obtained a different result at one of the establishments if you had picked a slot machine at the end of a row close to the entrance?
Ocean temperatures aren’t perfectly stratified across the entire ocean or even the entire region. Maybe a float dove through a cold upwelling someplace that is actually fairly localized. Is its temperature reading typical for a huge region of ocean? So there are two uncertainties involved. The error of the average from taking so many samples in places in so many different locations, and then there is a projection or extrapolation uncertainty. To what extent is it really valid to have a temperature reading in one spot represent some entire grid square? Does one slot machine’s payout represent the payout of all other slot machines within that grid square? Does the average obtained from all of the measurements really have any significant meaning at all? Maybe not. Maybe each place is different until you get so many readings from so many locations that the various local conditions can be accounted for. Which, by the way, is one reason I don’t like the wholesale removal of stations from the global surface temperature databases. The weather in Sacramento may not be representative of the weather in Ashburn and the temperature in Willits may not be representative of the temperature at Ft. Bragg, California (not to be confused with Ft. Bragg, NC).

crosspatch

January 26, 2012 9:36 pm

“Each buoy itself makes thousands of observations? Just a thought…”
But it is moving so each observation is at a different point. If you go from slot machine to slot machine playing one game at each, do you really learn the average payout of machines at the establishment?

Michael Bergeron (@zerg539)

January 26, 2012 9:41 pm

I’ve made this argument before about land stations not having enough individual accuracy, as demonstrated by Anthony’s Surface station project. Then there is the laughable idea that you can get a global temperature from using anything short of millions of locations, most of them needing to be in the sea, and many more needed in the polar regions.

Doug Cotton

January 26, 2012 9:43 pm

If there is any missing thermal energy I would expect to find it under the ocean floor (and in the crust under land surfaces) where it really must go when temperatures rise relatively quickly. Please see my post here for reasons . . .
http://wattsupwiththat.com/2012/01/26/october-to-december-2011-nodc-ocean-heat-content-anomalies-0-700meters-update-and-comments/#more-55499 January 26, 2012 at 7:26 pm

Walter

January 26, 2012 9:45 pm

Claiming a single figure for widely varied oceans seems a bit of a stretch of credibility.
Then… cutting the number of devices doing the sampling would tend to make the expected extremes more extreme.
Effectively, it’s layer upon layer of BS.

Mike Jonas

Editor

January 26, 2012 9:54 pm

Your statistical preamble is based on all numbers being part of the same chance-based set.
The Argo buoys measure different points that are not part of the same chance-based set but are physical place-times in a variable ocean. The rules for how many observations you need to arrive at an ocean average would surely be completely different. So you have the answer already: “The ocean is far too large and varied”.
Not very well expressed, and I’m no expert in this statistical stuff, so hopefully someone will explain it better.

Stephen Rasey

January 26, 2012 9:55 pm

It’s worse than you thought, Willis! 😉
The observation that you need 100 times the observations to gain an extra significant digit is well stated. But that is all you can hope for provided the observations are independent.
In the case of Argo, there is likely a bit of covariant linkage between the readings. My bias is that neighbors are positively correlated, and if so, it will take many more than 100 x to gain that decimal point.

Hoser

January 26, 2012 9:56 pm

Did precision get confused with accuracy? Calibration? Who’s checking? How?

RockyRoad

January 26, 2012 9:58 pm

I’d still like to do a variogram on some of the data to see if there’s any spatial correlation, Willis. I’m willing to admit right off we’d find correlation down each temperature profile, but that only counts for one dimension. Whether there’s any geostatistical correlation in the other two dimensions is the big question, and if we assume there isn’t, then there’s no reason to forge ahead in that effort–the data can be treated simply as random samples and your assertions are likely correct–they can’t be as confident as their numbers indicate.
But in the fanciful world of ocean geostatistics, the overall average and distribution of “blocks” representing the entire ocean would be far different from that of the sample points from which it was derived. I’ve seen it hundreds of times applying geostatistics to in situ mineral models on a variety of scales–using drill holes spaced 1,000 feet, 500 ft, 100 ft, and even 20 feet apart (the latter are numerous blast holes drilled on mining benches). And generally the more samples that are taken, the more the range on the variogram shrinks and the natural heterogeneity is accentuated. I wouldn’t be surprised if the same were true of the vast oceans.
Which means we probably won’t ever get there–to achieve their stated level of error with any acceptable confidence would require spatial statistics and likely require far more time and money than anybody is willing to invest. (Ah, to be the recipient of some wealthy aunty’s vast fortune someday.)

Keith W.

January 26, 2012 9:59 pm

Sorry Jeef, but Willis dug into that as well in another post, and they don’t make that many observations.
http://wattsupwiththat.com/2011/12/31/krige-the-argo-probe-data-mr-spock/

Jim D

January 26, 2012 10:00 pm

I don’t think anyone would be satisfied with an error of 0.04 degrees because this is much higher than the annual average warming rate which is nearer 0.01 degrees per year. So 0.004 degrees per year, which they have, means they can measure the temperature rise of the ocean on an annual to decadal time scale, and can at least tell with high confidence whether an annual warming rate of 0.01 degrees exists, which I think they can.

Willis Eschenbach

Author

January 26, 2012 10:04 pm

jeef says:
January 26, 2012 at 9:08 pm

Each buoy itself makes thousands of observations? Just a thought…

That’s one of the beauties of this particular way of showing that their claimed error is too small. It doesn’t matter how many observations each Argo float makes. Each float actually makes three vertical temperature profiles per month.
But it doesn’t matter, because however many observations were taken per float, we need a hundred times fewer observations, which means a hundredth of the floats.
w.

thepompousgit

January 26, 2012 10:07 pm

Willis, silly question: Have you asked The Statistician to the Stars?
http://wmbriggs.com/blog/

Lonnie E. Schubert

January 26, 2012 10:09 pm

Thanks Willis, my own experience leads me to disbelieve most any temperature data reported at better than ±0.5°C. Call me crazy, but I just don’t trust temperature measurement that well, especially when there is any room for subjective human judgement.

David Falkner

January 26, 2012 10:14 pm

@jeef:
There would still be issues with the spatial distribution. That may be where 3000 buoys will work to get a thousandths place error, but 3 will not work for tenths. Although, I am wondering if one of those buoys can even measure a single measurement that precisely.

Jeremy

January 26, 2012 10:15 pm

Math like this only works with observations that are 100% accurate (no error). In the real world, instruments have errors and limited accuracy and instruments will drift with time.
For example, if there is random background noise affecting an instrument then the accuracy of the measurement reading of the instrument can be improved by the square root of the number of observations. However, you cannot improve non-random error – so ultimately measurement accuracy soon falls victim to inherent limitations of the instrument (non-linearity, offset, drift and resolution). You cannot solve this by adding more instruments or more observations because of the non-random nature of many of these errors (similarly built instruments will suffer from similar degrees of non-random error).

Willis Eschenbach

Author

January 26, 2012 10:19 pm

David Falkner says:
January 26, 2012 at 10:14 pm

@jeef:
There would still be issues with the spatial distribution. That may be where 3000 buoys will work to get a thousandths place error, but 3 will not work for tenths. Although, I am wondering if one of those buoys can even measure a single measurement that precisely.

Thanks, David. I wrote about the Argo floats here. Basically, their thermometers are good to ± 0.005°C, and seem to maintain that over time.
w.
