Decimals of Precision

Guest Post by Willis Eschenbach

Over at Judith Curry’s excellent blog there’s a discussion of Trenberth’s missing heat. A new paper about oceanic temperatures says the heat’s not really missing, we just don’t have accurate enough information to tell where it is. The paper’s called “Observed changes in top-of-the-atmosphere radiation and upper-ocean heating consistent within uncertainty.”

It’s paywalled, and I was interested in one rough number, so I haven’t read it. The number that I wanted was the error estimate for their oceanic heating rates. This error can be seen in Figures 1a and 3a on the abstract page, and it is on the order of plus or minus one watt per square metre (W/m2). This is consistent with other estimates of the measurement error in upper ocean heat content.

I think I can conclusively demonstrate that their claimed error is way too small. To understand why, let me take a detour through the art, science, and business of blackjack.

In a fit of misguided passion, some years back I decided to learn how to count cards at blackjack. I had money and time at the same moment, an unusual combination in my life, so I took a class from a guy I’ll call Jimmy Chan. Paid good money for the class, and I got good value. I’ve always been good with figures, and I came out good at counting cards. Not as good as Jimmy, though, he was a mad keen player who had made a lot of money counting cards.

At the time they were still playing single deck in Reno. And I was young, single, and stupid. So I took twenty thousand dollars from my savings for my grubstake and went to Reno. It was an education about a curious business.

Here are the economics of the business of counting cards.

First, if you count using one of the usual systems as I did, and you are playing single deck, it gives you about a 1% edge on the house. Not much, to be sure, but it is a solid edge. And you can add to that by using a better counting system or a concurrent betting system, where better means more complex.

Second, if you play head-to-head (just you and the dealer) you can typically play about a hundred hands an hour.

Doesn’t take a math whiz to see that if you don’t blow the count, you will win about one extra hand an hour.

And therein is the catch. It means that in the card counting business, your average hourly wage is the amount of your average bet.

It’s a catch because of the other inexorable rule of counting blackjack. This regards surviving the swings and arrows of outrageous luck. If you don’t want to go home empty-handed, you need to have a grubstake that is a thousand times your average bet. Otherwise, you could go bust just from the natural ups and downs.
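To get a feel for how deep those natural ups and downs can run, here’s a minimal sketch that treats every hand as a simple one-bet win or loss with an assumed 1% edge; it ignores blackjacks, doubles, and splits, so it illustrates only the swings, not real play:

```python
import random

random.seed(1)

EDGE = 0.01        # assumed 1% player edge
HANDS = 100_000    # roughly a thousand hours at a hundred hands per hour
TRIALS = 100       # number of simulated playing careers

drawdowns = []
for _ in range(TRIALS):
    bankroll = peak = 0.0
    worst = 0.0
    for _ in range(HANDS):
        # win one bet with probability 0.505, lose one bet otherwise
        bankroll += 1.0 if random.random() < 0.5 + EDGE / 2 else -1.0
        peak = max(peak, bankroll)
        worst = min(worst, bankroll - peak)
    drawdowns.append(-worst)

drawdowns.sort()
print("median worst losing stretch:", drawdowns[TRIALS // 2], "bets")
print("unluckiest of", TRIALS, "careers:", drawdowns[-1], "bets")
```

Even with a steady edge, the losing stretches typically run well over a hundred bets, and the unlucky tails go far deeper, which is where the rule of thumb of a grubstake around a thousand times the average bet comes from.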

Now, twenty thousand dollars was all I could scrape together then. So that meant my average bet couldn’t be more than twenty dollars. I started out at the five dollar level.

I’d never spent any time in a casino up until then. I felt like the rube in every movie I ever saw. I played a while at the five dollar level. You never win or lose much there, so nobody paid any attention to me.

After a day or so making the princely sum of $5 per hour, I started betting larger. First at the ten-dollar level. Then at the twenty-dollar level. That was good money back in those days.

But when you start to make a bit of money, like say you hit a few blackjacks in a row and you’re doubling down, they start paying attention to you, and the trouble begins. First they use the casino holodeck to transport a somewhat malignant looking dwarf armed with a pad and a pencil to your table. He materializes at the shoulder of the dealer, and she starts to sweat. I say she because most dealers were women then and now. She starts to sweat because the casino doesn’t really care about card counters. I was making $20 an hour on average? Big deal, everyone in the casino management made that and more.

What scares casino owners is collusion between dealers and players. With the connivance of the dealer a guy can have a “string of luck” that can clean out a table in fifteen minutes and be out the door, meeting the dealer later to split the money. That’s what casino owners worry about, and that’s why the dealer started sweating, she knew she was being watched too. The dwarf peered through coke-bottle thick glasses, and wrote down the number of chips on each stack in the dealer’s rack, how much money I had, how much other players had. He gave the dealer a new deck. He wore a suit that cost as much as my grubstake. His wingtip shoes were shined to a rich luster. He looked at me as though I were a rich man with a loathsome disease. He watched my eyes, my hands. I started sweating like the dealer.

If I continued to win, the holodeck went into action again. This time what materialized were two large, vaguely anthropoid looking gentlemen, whose suits were specially tailored to conceal a bulge under the off-hand shoulder. They simply appeared, one at each shoulder of the aforementioned vertically challenged gentleman, who looked even dwarfier next to them, but clearly at ease in his natural element. They all three stared at me, and when that bored them, at the dealer. And then at me again.

And if the dealer was sweating, I was melting. I’m not made for that kind of game, I’m not good at that kind of pretence. I found out you can take the cowboy out of the country, but you can’t make him go mano-a-mano with the casinos for twenty bucks an hour.

I lasted a week. I logged my hours and my winnings. During that time, I worked well over forty hours. I only made enough money to pay for the flight and the hotel, and that’s about it. I was glad to put my twenty grand back in the bank.

I couldn’t take the constant strain and pressure of counting and not looking like I was counting and trying to stay invisible and feeling like a million eyes in the sky were watching my every eyeblink and having an inescapable feeling of being that guy in the movies who’s about to be squashed like a bug. But for those who can make it a game and keep it up, what an adventure! I’m glad I did it, wouldn’t do it again.

The part I liked the least, curiously, was something else entirely. It was that my every move was fixed. For every conceivable combination of my cards, the dealer’s card, and the count, there is one and only one right move. Not two. Not “player’s choice”. One move. I definitely didn’t like the feeling that I could be replaced by a vaguely humanoid 100% Turing-tested robot with a poor sense of dress and a really, really simple set of blackjack instructions.

But I was still interested in the math of it all. And I had my trusty Macintosh 512. And Jimmy Chan had an idea about how to improve the odds by changing his counting method. And so did some of Jimmy’s friends. And he had a guy who tested their new counting method for them, at some university, for five hundred bucks a run.

So I told Jimmy I’d do the analysis for a hundred bucks a run. He and his friends were interested. I wrote a program for my Mac to play blackjack against itself. I wrote it in Basic, because that was what was easy. But it was sloooow. So I taught myself to program in C, and I rewrote the entire program in C. It was still too slow, so I translated the critical sections into assembly language. Finally, it was fast enough. I would set up a run during the day, programming in the details of however the person wanted to do the count. Then I’d start it when I went to bed, and in the morning the run would be done and I’d have made a hundred bucks. I figured that I’d finally achieved what my computer was really for, which was to make me money while I slept.

The computer had to be fast because of the issue that is at the heart of this post: how many hands of blackjack did the computer have to play against itself to find out whether the new system beat the old system?

The answer turns out to be about a hundred times more hands for each additional decimal of precision. In practice, this means at least a million hands, and many more is better.
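As a rough check on that figure, here is a short sketch using the square-root rule explained in the next few paragraphs, and assuming the commonly quoted per-hand swing of about 1.15 bets:

```python
# The standard error of an estimated per-hand edge shrinks as sd / sqrt(hands),
# so the hands needed for a given precision is (sd / precision) squared.
SD_PER_HAND = 1.15   # assumed standard deviation of one hand's result, in bets

for precision in (0.01, 0.001, 0.0001):    # 1%, 0.1%, 0.01% of a bet
    hands = (SD_PER_HAND / precision) ** 2
    print(f"to pin the edge down to ±{precision:g} of a bet: about {hands:,.0f} hands")
```

Each extra decimal multiplies the work by a hundred, and resolving a counting system’s edge to a tenth of a percent already takes over a million hands.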

What we are looking at is the error of the average. If I measure something many times, I can average my answers. Is the resulting mean value the true underlying mean of what I am measuring? No, of course not. If we flip a hundred coins, usually it won’t be exactly fifty/fifty.

But it will be close to the true average of the data. How close? Well, the measure of how close it is expected to be to the true underlying average is what is called the “standard error of the mean”. It is calculated as the standard deviation of the data divided by the square root of the number of observations.

It is the last fact that concerns us. It means that if we double the number of observations, we don’t cut the error in half, but only to about 0.7 (one over the square root of two) of the original value. One consequence of this is that if we need one more decimal of precision, we need a hundred times the number of observations. That is what I meant by a hundred times per decimal. If our precision is plus or minus a tenth (± 0.1) and we want to know the answer to one more decimal, plus or minus one hundredth (± 0.01), we need one hundred times the data to get that precision.
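That square-root behaviour is easy to verify numerically. A minimal sketch, using simulated measurements with a spread of 1, so the numbers are illustrative only:

```python
import random
import statistics

random.seed(0)

def scatter_of_mean(n, trials=1000):
    """How far the mean of n measurements typically lands from the true value."""
    means = [statistics.fmean(random.gauss(0.0, 1.0) for _ in range(n))
             for _ in range(trials)]
    return statistics.pstdev(means)

for n in (100, 200, 10_000):
    print(f"{n:>6} observations: scatter of the mean ≈ {scatter_of_mean(n):.4f}")
# Roughly 0.10, 0.07 and 0.01: doubling the data trims the error to ~0.7 of
# what it was, and a hundredfold increase buys exactly one more decimal.
```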

That is the end of the detour, now let me return to my investigation of their error estimate for the ocean heating rate for the top 1800 metres of the ocean. If you recall, or even if you don’t, that was 1 watt per square metre (W/m2).

Now, that is calculated from temperature readings from Argo floats, about 3,000 of them during the study period.

Let me run through the numbers to convert their error (in W/m2) into a temperature change (in °C/year). I’ve comma-separated them for easy import into a spreadsheet if you wish.

We start with the forcing error and the depth heated as our inputs, and one constant, the energy to heat seawater one degree:

Energy to heat seawater:, 4.00E+06, joules/tonne/°C

Forcing error: plus or minus, 1, watts/m2

Depth heated:, 1800, metres

Then we calculate

Seawater weight:, 1860, tonnes

for an assumed seawater density of about 1.0333 tonnes per cubic metre.

We multiply watts by seconds per year to give

Joules from forcing:, 3.16E+07, joules/yr

Finally, Joules available / (Tonnes of water times energy to heat a tonne by 1°C) gives us

Temperature error: plus or minus, 0.004, degrees/yr
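For anyone who would rather check the arithmetic in code than in a spreadsheet, here is the same calculation in a few lines (same inputs as above; nothing new assumed):

```python
SECONDS_PER_YEAR = 3.16e7    # seconds in a year
HEAT_CAPACITY    = 4.00e6    # joules to warm one tonne of seawater by 1 °C
FORCING_ERROR    = 1.0       # claimed error, watts per square metre
DEPTH            = 1800      # metres of ocean heated
DENSITY          = 1.0333    # tonnes of seawater per cubic metre

tonnes_per_m2 = DEPTH * DENSITY                   # ~1860 tonnes under each square metre
joules_per_m2 = FORCING_ERROR * SECONDS_PER_YEAR  # ~3.16e7 joules per year per square metre
temp_error = joules_per_m2 / (tonnes_per_m2 * HEAT_CAPACITY)

print(f"Temperature error: ±{temp_error:.4f} °C per year")   # prints ≈ 0.004
```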

So, assuming there are no problems with my math, they are claiming that they can measure the temperature rise of the top mile of the global ocean to within 0.004°C per year. That seems way too small an error to me. But is it too small? If we have lots and lots of observations, surely we can get the error down to that small?

Here’s the problem with their claim that the error is that small. I’ve raised this question at Judith’s and elsewhere, and gotten no answer. So I am posing the question again, in the hope that someone can unravel the puzzle.

We know that to get a smaller error by one decimal, we need a hundred times more observations per decimal place. But the same is true in reverse. If we need less precision, we don’t need as many observations. If we can live with one less decimal place, we can do it with one-hundredth of the observations.

Currently, they claim an error of ± 0.004°C (four thousandths of a degree) for the annual average upper ocean temperature from the observations of the three thousand or so Argo buoys.

But that means that if we are satisfied with an error of ± 0.04°C (four hundredths of a degree), we could do it with a hundredth of the number of observations, or about 30 Argo buoys. And it also indicates that a mere 3 Argo buoys could measure that same huge volume, the entire global ocean from pole to pole, to within roughly a tenth of a degree.
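Here is that reverse scaling spelled out, taking the claimed ±0.004°C per year for roughly 3,000 floats at face value and applying the one-over-square-root-of-N rule:

```python
import math

FLOATS_NOW = 3000     # approximate Argo float count in the study period
ERROR_NOW = 0.004     # claimed error, °C per year

for floats in (3000, 300, 30, 3):
    # fewer floats means fewer observations; the error grows as 1/sqrt(N)
    implied = ERROR_NOW * math.sqrt(FLOATS_NOW / floats)
    print(f"{floats:>5} floats -> implied error of about ±{implied:.3f} °C per year")
```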

And that is the problem I see. There’s no possible way that thirty buoys could measure the top mile of the whole ocean to that kind of accuracy, four hundredths of a degree C. The ocean is far too large and varied for thirty Argo floats to do that.

What am I missing here? Have I made some major math mistake? Their claimed error seems to be way out of line for the number of observations. I’ve not been able to find a good explanation of how they come up with these claims of extreme precision, but however they’re doing it, my math doesn’t support it.

And that’s the puzzle. Comments welcome.

Regards to everyone,

w.


180 Comments
thingadonta
January 26, 2012 10:20 pm

I think the counters at Blackjack win by raising their bet significantly when the count is favourable, and sitting around with low bets the other 98% of the time. But this of course raises suspicions at the casino; it’s easy to see when someone varies their bet greatly. And you have to be willing to lose the big bet, which most people aren’t willing to do.
As for the oceans, I assume they integrate a time factor, that is, whatever degree of larger error is in the measurement at one time (say within a cold current which moves around) will cancel out the next time (when the cold current has weakened), meaning the larger errors cancel out over time. Not sure if this is what you are after, but a trend over time might reduce such errors.
Off topic a bit, but I agree with many that averaging data further back in time by proxy measurements shouldn’t be allowed (such as in Mann’s various papers), as in this case what you are averaging is not data but proxy data, meaning you are 1) mixing different uncertainties and 2) biasing data towards whatever errors are in the proxy itself over time. That is, many proxy methods get less responsive the further back in time you go, so if you are averaging proxy data further back in time, you will likely simply flatten any deviations the further back you go, with the older proxy data being, by nature, less responsive (and you get a hockey stick towards the recent end, as the data becomes more responsive). Mann uses this false method to claim the MWP was lower in T than today.

January 26, 2012 10:21 pm

One of my favorite statistical modeling stories I heard in graduate school in the late ’70s. It dealt with the design of the Trans Alaskan Pipeline Oil Tanker Terminal at Valdez, Alaska. The pipeline was designed for a maximum 2,000,000 bbls of oil per day. So, there had to be capacity at Valdez to temporarily store that oil if tankers were late. So the crucial question was what is the tankage needed for a 9x.x% confidence to store the oil rather than slow down the pumps?
Tankers are mechanical and therefore have some schedule delays. But the crucial problem was Gulf of Alaska weather. No problem. Thanks to Russian fishing villages, they had 200 years of weather reports. All the members of the consortium had the same data. Their OR teams crunched the numbers and came back with:
1.25 days
1.50 days
7.00 days. !?!?
“How do you justify SEVEN Days when we come up with a little more than one?”
“We modeled tanker delay according to the frequency of major storms.”
“So did we.”
“And when a storm delays one tanker, it delays ALL of them.”
“….. hmmm. Right. The delays are not independent. ”
The story is they quickly settled on six days.
It is quite a big terminal.
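The effect in that story is easy to reproduce with a toy simulation; the numbers below are invented purely to show the shape of the problem, not to model Valdez:

```python
import random

random.seed(42)
N_TANKERS, P_STORM, DELAY_DAYS = 20, 0.05, 5.0   # made-up figures for illustration

def backlog_99th_percentile(correlated, trials=10_000):
    """99th percentile of total tanker-days of delay in a scheduling period."""
    totals = []
    for _ in range(trials):
        if correlated:
            # one storm delays every tanker at once
            total = N_TANKERS * DELAY_DAYS if random.random() < P_STORM else 0.0
        else:
            # each tanker is delayed independently
            total = sum(DELAY_DAYS for _ in range(N_TANKERS) if random.random() < P_STORM)
        totals.append(total)
    return sorted(totals)[int(0.99 * trials)]

print("independent delays, 99th-percentile backlog:", backlog_99th_percentile(False), "tanker-days")
print("correlated delays,  99th-percentile backlog:", backlog_99th_percentile(True), "tanker-days")
```

Both cases have the same average delay; it is the bad-day tail that explodes when every tanker is late at once, which is the difference between a one-and-a-quarter-day answer and a seven-day answer.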

StatGuyInDallas
January 26, 2012 10:22 pm

Jeef is probably right. Many observations per buoy – making it a cluster sample.
Similar calculations can be made – just a little more tedious.

January 26, 2012 10:22 pm

Regarding the error measurements discussed, what Watt says seems right to me.
A very similar point can be made about those models. They estimate SW and LW radiation, and each is acknowledged to have 1% to 2% uncertainty. Then they take a difference to get net radiative flux at TOA. But such flux in practice varies between about +0.5% and -0.5% of total incident solar flux. So how on Earth can the “accuracy” of the difference be even sufficient to determine whether the end result is positive or negative – i.e. warming or cooling?
The whole model business would be a joke if the consequences were not so grave – like spending $10 trillion over the next 100 years for developing countries who could well do with the money, yes, but spend it on more useful projects – projects that could easily save lives.

Shub Niggurath
January 26, 2012 10:22 pm

Well, if we believe the numbers, that is what the implication is. The ocean is just one big homogenous lump of water. You don’t need too many thermometers to measure its temperature – it is the same everywhere (within climatic zones, that is).
The missus and I were doing calculations of statistical power recently. I had several revelatory moments even then.
Nicely written.

Duke C.
January 26, 2012 10:26 pm

Willis-
The SV for blackjack with standard rules is 1.15. With a 1% edge you should have been betting $150 (.75% of your bankroll) per hand. Adjust your bet size according to bankroll fluctuations and you’ll never go broke. Eventually, you would have been a rich Casino owner instead of a poor climate change heretic. 😉
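For what it’s worth, that 0.75% figure looks like a Kelly-style bet size, the edge divided by the per-hand variance; a quick sketch under those assumptions:

```python
EDGE = 0.01        # assumed 1% advantage per hand
SD = 1.15          # assumed standard deviation of one hand's result, in bets
BANKROLL = 20_000  # dollars

kelly_fraction = EDGE / SD**2    # edge divided by per-hand variance
print(f"Kelly fraction ≈ {kelly_fraction:.4%} of bankroll")
print(f"suggested bet ≈ ${BANKROLL * kelly_fraction:,.0f} per hand")
```

That lands within a few dollars of the $150 figure in the comment.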

Jim D
January 26, 2012 10:42 pm

Adding to what I wrote above, you have to remember this is a decadal trend that is being evaluated, so this 0.004 degrees per year is really an accuracy of 0.04 degrees per decade. Since the actual trend is probably just over 0.1 degrees per decade, they are saying they can resolve this reasonably well, and at least be certain of the warming. 0.04 degrees per year translates to 0.4 degrees per decade which is quite useless for determining even the sign of such a trend, and I am certain they can do and are doing better than that.

Eyal Porat
January 26, 2012 10:44 pm

Willis,
First, great post – especially the story :-).
This 0.004 C is soooo accurate and unmeasurable it seems meaningless up front. It is way too accurate to take seriously.
As you have taught me: when you smell a rat – in most cases there is one.

David Falkner
January 26, 2012 10:48 pm

Willis:
Basically, their thermometers are good to ± 0.005°C, and seem to maintain that over time.
——————
In the body of the story, you calculated the error as equivalent to 0.004°C. Shouldn’t the error at least equal the error in the instrument? How could applying math to the output of the instrument make the instrument itself more efficient? I feel like I am missing something in your calculation, perhaps. Or maybe I am not understanding your point?

January 26, 2012 10:49 pm

Two issues (why is it always two? – I will try to make it three or four just to be difficult.)
So, we must separate serial observations from observations in parallel, and precision from accuracy, and both from spatial distribution or coverage, or more pertinently, the density of measurements.
1. For estimating accuracy, what matters is not the number of buoys but the number of observations per buoy over time. Since each buoy only records locally, whatever it measures is only local. So if the buoys are moving around, you also need to know something about that.
2. The precision with which we can measure is a complex function of the instrumentation’s inherent or “lab” precision, the environment, and the calibration procedures (if any). This must be accounted for when measuring accuracy.
3. Clearly, even 3,000 buoys do not provide enough density to provide meaningful coverage. The best way to deal with accuracy under these circumstances might be to “stratify” measurements into temperature or energy bands, so that all buoys making measurements in the same band can be aggregated to assess accuracy. I make this point since it seems meaningless to be assessing accuracy to however many decimals when the temperatures observed by different buoys might vary by factors of 2 or more.
For example, if say 20% of the buoys operate in a “tropical” band, and the number of observations per buoy is say 1,000 per week (I have no idea what the actual number might be), then we would have .2(3000)(1000) = 600,000 observations on which to assess accuracy, and then only if the swings in measurement were not too wild and the buoy did not move a thousand miles or more over the period.
Hope I am not writing total nonsense?

David Falkner
January 26, 2012 10:49 pm

And we’re assuming top operational efficiency, at that!

January 26, 2012 10:56 pm

Write me. I’ll send you the paper.

Alan Wilkinson
January 26, 2012 11:01 pm

There is no statistical validity whatever in these error estimates since the assumptions that all the measurements are independent and drawn randomly from the same population are certainly false to unknown degrees.
Has anyone done any quantitative examination of why the simple sea-level change measurement should not directly reflect heat content over relatively short time-spans, since the rate of change has been modest and fairly consistent?

Alan S. Blue
January 26, 2012 11:21 pm

Willis, I keep meaning to write a couple of articles along the lines of “How do we measure Temperature – from a single thermometer to a Global Mean Average.”
One issue with the surface stations that may well be a factor in the oceanic measurements is using the NIST-calibrated “Instrumental Error” when calculating the error of measuring the temperature of a ‘gridcell’. Thermometers with a stated error of 0.1C are common. But the norm in climatology is to take that single measurement and start spreading it towards the next-nearest-thermometer under the assumption that they’re representative of the entire gridcell. And the assumption that the temperature is smooth relative to the size of the gridcells.
There’s nothing too wrong with that when you’re just aiming for a sense of what the contours of the temperature map look like … But when you turn around and attempt to propagate the errors to determine your -actual- precision and accuracy at measuring the temperature, there seems to be a recurring blind spot.
A perfect, out-of-the-box thermometer, NIST-calibrated with stickers and all as “±0.1C”, just doesn’t in general provide a measurement of the temperature for the gridcell to an accuracy of 0.1C. Nor anywhere remotely nearby. Yes, the -reading- might well be “3.2C”, but that is a point-source reading.
Watching a single weather forecast off your local news should drive this home: There’s only a couple gridcells (max) represented. And yet there’s obvious temperature variations. Some variations are endemic – because they represent height variations or natural features. (Or barbeques, whatever) But there is generally plenty of shifting in this analogous plot of temperatures as well. Some town normally warmer than yours might show up cooler, or way hotter, or tied on any given day.
It’s possible that someone has done a ‘calibration study’ with the Argos, to determine not just that it measures temperature in the spot it’s actually in, but to determine how well the individual sensor measures its gridcell. -Not- a “correlation study”, that just shows “Hey, we agree with other-instrument-X with an R of XXX” – that doesn’t tell you the instrumental error. I just don’t know enough details for the oceanic studies.
A ‘calibration study’ would look more like 3000 instruments dumped into a -single- gridcell. And it has exactly the same issue you bring up with regards to numbers: To make a perfectly accurate “map”, you need a map that’s the same size as the surface being mapped to cover all the irregularities.
In chemical engineering, (my field), temperature can be very important to a given process. Having five or so thermometers on the inlets and dispersed around the tank of a -well understood- process might well be dramatically insufficient to the job of “measuring the temperature” to a measly 0.1C. That’s a tank in a reasonably controlled environment with a “known” process.
So, I wouldn’t pay much attention to the error bars unless the instruments being used are more ‘areal’ as opposed to ‘point source’ or there’s an exhaustive study of some gridcells.

January 26, 2012 11:21 pm

I have to ask, what’s the real point of all this measuring of OHC? Is it just a ploy for more funding to try to prove AGW a reality? The overall trend is not far different from sea surface temperatures. Wouldn’t it be better to get a satellite working again on sea surface temperatures to ensure some continuity of that very important data, which, in my mind, is the most indicative of what’s happening and where we’re heading?

F. Ross
January 26, 2012 11:21 pm

Willis. Interesting post.
Just curious; do you know what the power source of the buoys is? And has the possible presence of heat from a power source been taken into account, as far as it might possibly affect temperature measurements?

Alex the skeptic
January 26, 2012 11:23 pm

“……..says the heat’s not really missing, we just don’t have accurate enough information to tell where it is”
It’s like Santa. It’s not that Santa does not really exist (don’t tell the kids), it’s just that we don’t know exactly where he lives, although he could have drowned together with the polar bears.

James
January 26, 2012 11:27 pm

Willis
You can estimate the probability of tossing a head by flipping a coin 100 times, counting the number of heads, and dividing by 100. When you do this, the standard deviation of this estimate will be sqrt(.5*.5/100) = .05, or 5%. People then like to say a 95% confidence region is plus or minus two standard deviations, or +/- 10% in this case.
However, this last bit is relying on the central limit theorem to assert that your average estimate is normally distributed. If you were to flip your coin only once, you would get an estimate that has a 10x standard deviation, or 50%, but you would have to be very careful what you did with that. You could not assert that your estimate is normal and thus that the 95% confidence interval is +/- 100%. You would get a range of either -100% to 100% or 0% to 200%…nonsense.
The problem is that when you have a small sample, you have to work with the small-sample statistical properties of your estimators and abandon all your large-sample approximations. Bottom line is your statistics usually go haywire with small samples 🙂
Hope that helps!
James
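James’s warning is easy to see numerically: the usual plus-or-minus-two-standard-deviations interval is fine for a hundred flips and nonsense for one. A sketch, using the fair-coin spread as in the comment above:

```python
from math import sqrt

def naive_interval(heads, flips):
    """Normal-approximation 95% interval for the estimated probability of heads."""
    estimate = heads / flips
    spread = sqrt(0.5 * 0.5 / flips)   # fair-coin standard error, as in the comment
    return estimate - 2 * spread, estimate + 2 * spread

print("100 flips, 50 heads:", naive_interval(50, 100))   # about (0.40, 0.60) -- sensible
print("  1 flip,   1 head :", naive_interval(1, 1))      # (0.0, 2.0)  -- outside the possible range
print("  1 flip,   0 heads:", naive_interval(0, 1))      # (-1.0, 1.0) -- likewise nonsense
```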

old44
January 26, 2012 11:31 pm

“the heat’s not really missing, we just don’t have accurate enough information to tell where it is”
Like Little Bo-Peep’s sheep.

John
January 26, 2012 11:39 pm

If I’m reading this correctly, the error you mention is the error in the mean based on the data from the ARGO buoys… what about the measurement error for the buoys themselves? What is the error associated with the actual measurement? The reason I ask is that errors are cumulative, so any measurement error would increase the error of the mean.
Coming from a geostatistical background, how is the data treated in terms of data clustering? If you have lots of measurements very close together (i.e. some of the buoys clustered together), then the weighting applied to these measurements should be lower compared to the buoys from areas of sparse coverage. Unless the coverage is very even, clustering may be an issue (e.g. more coverage in the tropics compared to the colder Arctic/Antarctic water). The result of this can bias the estimate of the mean.

F. Ross
January 26, 2012 11:40 pm

Please ignore my previous post. Dumb question on my part when a simple Google search shows what I wanted to know.

PiperPaul
January 26, 2012 11:46 pm

What if sharks eat some of the buoys?

Blade
January 26, 2012 11:48 pm

I guess we can state with certainty that Willis discovered Trenberth’s missing heat, it is found in Casinos at Blackjack tables populated by math whizzes 😉
Just a suggestion for the young rubes out there that get a taste of the favorable mathematics of counting. Besides money management and an adequate stake, there are two other variables that need to be controlled perfectly: the high-low counting must be correct, and the strategy (hitting/staying) must also be flawless. After locking up those two variables, the math is favorable, well, depending on number of decks. It is a hell of a lotta work to earn a small profit as Willis so artfully described. Truly it is the math challenge that draws so many Scientists to these games.
Willis, really wild guess, is Jimmy Chan == S.W.?

cb
January 26, 2012 11:53 pm

Um, I think this was touched on above somewhere, somewhat, but I do not think you could use this type of statistical reasoning at all.
The problem is that the oceans are, in effect, a collection of fluid-based heat-transferring machines. The system is deterministic, not random (not to mention that the systems change a lot over the course of the seasons, depending on the effect of previous seasons). In other words, using interpolation would not be an allowable procedure. (Unless you measure for decades? Even then, climate changes naturally over that timescale, so your results would end up being meaningless anyway.)
If you were to use randomly (or evenly, same thing if the machine is sufficiently unknown) distributed-in-space measurement buoys, then you will surely have very large error – to the point where your measurements would be worth nothing in the context of trying to determine global heat flow by measuring temperatures with the buoys.
This is a truism, is it not? I do not know what the buoys were purposed for, but it makes no sense to use them for this.
