Decimals of Precision

Guest Post by Willis Eschenbach

Over at Judith Curry’s excellent blog there’s a discussion of Trenberth’s missing heat. A new paper about oceanic temperatures says the heat’s not really missing; we just don’t have accurate enough information to tell where it is. The paper’s called “Observed changes in top-of-the-atmosphere radiation and upper-ocean heating consistent within uncertainty.”

It’s paywalled, and I was only interested in one rough number, so I haven’t read it. The number I wanted was the error estimate for their oceanic heating rates. This error can be seen in Figures 1a and 3a on the abstract page, and it is on the order of plus or minus one watt per square metre (W/m2). This is consistent with other estimates of the measurement error in upper ocean heat content.

I think I can conclusively demonstrate that their claimed error is way too small. To understand why, let me take a detour through the art, science, and business of blackjack.

In a fit of misguided passion, some years back I decided to learn how to count cards at blackjack. I had money and time at the same moment, an unusual combination in my life, so I took a class from a guy I’ll call Jimmy Chan. Paid good money for the class, and I got good value. I’ve always been good with figures, and I came out good at counting cards. Not as good as Jimmy, though, he was a mad keen player who had made a lot of money counting cards.

At the time they were still playing single deck in Reno. And I was young, single, and stupid. So I took twenty thousand dollars from my savings for my grubstake and went to Reno. It was an education about a curious business.

Here are the economics of the business of counting cards.

First, if you count using one of the usual systems as I did, and you are playing single deck, it gives you about a 1% edge on the house. Not much, to be sure, but it is a solid edge. And you can add to that by using a better counting system or a concurrent betting system, where better means more complex.

Second, if you play head-to-head (just you and the dealer) you can typically play about a hundred hands an hour.

Doesn’t take a math whiz to see that if you don’t blow the count, you will win about one extra hand an hour.

And therein is the catch. It means that in the card counting business, your average hourly wage is the amount of your average bet.

It’s a catch because of the other inexorable rule of counting blackjack. This regards surviving the swings and arrows of outrageous luck. If you don’t want to go home empty-handed, you need to have a grubstake that is a thousand times your average bet. Otherwise, you could go bust just from the natural ups and downs.
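
To put those three rules together in one place, here’s a quick back-of-the-envelope sketch (a Python version of the arithmetic above; the dollar figures are just the round numbers from this story):

```python
# The economics of card counting, using the round numbers above:
# a ~1% edge, ~100 hands per hour head-to-head, and a bankroll
# that needs to be about 1000 times your average bet.
edge = 0.01              # advantage over the house from counting
hands_per_hour = 100     # head-to-head against the dealer
bankroll = 20_000.0      # dollars available as a grubstake

max_avg_bet = bankroll / 1000                    # largest bet the bankroll supports
extra_wins_per_hour = edge * hands_per_hour      # about one extra winning hand per hour
hourly_wage = extra_wins_per_hour * max_avg_bet  # so the wage is about one average bet

print(f"Maximum average bet:  ${max_avg_bet:,.0f}")
print(f"Expected hourly wage: ${hourly_wage:,.0f}")
```

Which is exactly the catch: a twenty-thousand-dollar grubstake buys you a twenty-dollar-an-hour job.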

Now, twenty thousand dollars was all I could scrape together then. So that meant my average bet couldn’t be more than twenty dollars. I started out at the five dollar level.

I’d never spent any time in a casino up until then. I felt like the rube in every movie I ever saw. I played a while at the five dollar level. You never win or lose much there, so nobody paid any attention to me.

After a day or so making the princely sum of $5 per hour, I started betting larger. First at the ten-dollar level. Then at the twenty-dollar level. That was good money back in those days.

But when you start to make a bit of money, like say you hit a few blackjacks in a row and you’re doubling down, they start paying attention to you, and the trouble begins. First they use the casino holodeck to transport a somewhat malignant looking dwarf armed with a pad and a pencil to your table. He materializes at the shoulder of the dealer, and she starts to sweat. I say she because most dealers were women then and now. She starts to sweat because the casino doesn’t really care about card counters. I was making $20 an hour on average? Big deal, everyone in the casino management made that and more.

What scares casino owners is collusion between dealers and players. With the connivance of the dealer a guy can have a “string of luck” that can clean out a table in fifteen minutes and be out the door, meeting the dealer later to split the money. That’s what casino owners worry about, and that’s why the dealer started sweating, she knew she was being watched too. The dwarf peered through coke-bottle thick glasses, and wrote down the number of chips on each stack in the dealer’s rack, how much money I had, how much other players had. He gave the dealer a new deck. He wore a suit that cost as much as my grubstake. His wingtip shoes were shined to a rich luster. He looked at me as though I were a rich man with a loathsome disease. He watched my eyes, my hands. I started sweating like the dealer.

If I continued to win, the holodeck went into action again. This time what materialized were two large, vaguely anthropoid looking gentlemen, whose suits were specially tailored to conceal a bulge under the off-hand shoulder. They simply appeared, one at each shoulder of the aforementioned vertically challenged gentleman, who looked even dwarfier next to them, but clearly at ease in his natural element. They all three stared at me, and when that bored them, at the dealer. And then at me again.

And if the dealer was sweating, I was melting. I’m not made for that kind of game, I’m not good at that kind of pretence. I found out you can take the cowboy out of the country, but you can’t make him go mano-a-mano with the casinos for twenty bucks an hour.

I lasted a week. I logged my hours and my winnings. During that time, I worked well over forty hours. I only made enough money to pay for the flight and the hotel, and that’s about it. I was glad to put my twenty grand back in the bank.

I couldn’t take the constant strain and pressure of counting and not looking like I was counting and trying to stay invisible and feeling like a million eyes in the sky were watching my every eyeblink and having an inescapable feeling of being that guy in the movies who’s about to be squashed like a bug. But for those who can make it a game and keep it up, what an adventure! I’m glad I did it, wouldn’t do it again.

The part I liked the least, curiously, was something else entirely. It was that my every move was fixed. For every conceivable combination of my cards, the dealer’s card, and the count, there is one and only one right move. Not two. Not “player’s choice”. One move. I definitely didn’t like the feeling that I could be replaced by a vaguely humanoid 100% Turing-tested robot with a poor sense of dress and a really, really simple set of blackjack instructions.

But I was still interested in the math of it all. And I had my trusty Macintosh 512. And Jimmy Chan had an idea about how to improve the odds by changing his counting method. And so did some of Jimmy’s friends. And he had a guy who tested their new counting method for them, at some university, for five hundred bucks a run.

So I told Jimmy I’d do the analysis for a hundred bucks a run. He and his friends were interested. I wrote a program for my Mac to play blackjack against itself. I wrote it in Basic, because that was what was easy. But it was sloooow. So I taught myself to program in C, and I rewrote the entire program in C. It was still too slow, so I translated the critical sections into assembly language. Finally, it was fast enough. I would set up a run during the day, programming in the details of however the person wanted to do the count. Then I’d start it when I went to bed, and in the morning the run would be done and I’d have made a hundred bucks. I figured that I’d finally achieved what my computer was really for, which was to make me money while I slept.

The computer had to be fast because of the issue that is at the heart of this post: how many hands of blackjack did the computer have to play against itself to find out whether the new system beat the old system?

The answer turns out to be a hundred times more hands for each additional decimal of precision. In practice, this means at least a million hands, and many more is better.
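
As a rough check on that million-hand figure, here is a sketch (Python) using the standard error of the mean, which is explained just below. The per-hand standard deviation of about 1.1 betting units is an assumed round figure; the exact value depends on the rules and the bet spread:

```python
import math

# How many hands are needed to pin down a counting system's edge
# (expected win per hand, in betting units) to a given precision,
# assuming roughly independent hands with a standard deviation of
# about 1.1 units per hand (an illustrative round number)?
sd_per_hand = 1.1

def hands_needed(precision):
    # standard error of the mean = sd / sqrt(n)  =>  n = (sd / precision)^2
    return math.ceil((sd_per_hand / precision) ** 2)

for precision in (0.01, 0.001, 0.0001):  # one, two, three decimals of the edge
    print(f"edge to within +/- {precision}: about {hands_needed(precision):,} hands")
```

Resolving the edge to a tenth of a percent already takes over a million hands, and each further decimal costs another factor of a hundred.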

What we are looking at is the error of the average. If I measure something many times, I can average my answers. Is the resulting mean value the true underlying mean of what I am measuring? No, of course not. If we flip a hundred coins, usually it won’t be exactly fifty/fifty.

But it will be close to the true average of the data. How close? Well, the measure of how close it is expected to be to the true underlying average is what is called the “standard error of the mean”. It is calculated as the standard deviation of the data divided by the square root of the number of observations.

It is the last fact that concerns us. It means that if we double the number of observations, we don’t cut the error in half, but only to 0.7 of the original value. One consequence of this is that if we need one more decimal of precision, we need a hundred times the number of observations. That is what I meant by a hundred times per decimal. If our precision is plus or minus a tenth (± 0.1) and we want to know the answer to one more decimal, plus or minus one hundredth (± 0.01), we need one hundred times the data to get that precision.
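
Here is a minimal simulation sketch (Python; the coin-flip setup and the sample sizes are purely illustrative) showing both effects: doubling the observations only shrinks the error to about 0.7 of its former value, and it takes a hundred times the observations to gain one decimal:

```python
import numpy as np

rng = np.random.default_rng(42)

def empirical_sem(n_obs, n_trials=5000):
    """Repeat the 'average n_obs coin flips' experiment many times and
    see how much the resulting means scatter around the true mean of 0.5."""
    flips = rng.integers(0, 2, size=(n_trials, n_obs))
    return flips.mean(axis=1).std()

for n in (100, 200, 10_000):
    print(f"N = {n:6d}   simulated error ~ {empirical_sem(n):.4f}"
          f"   theory (0.5 / sqrt(N)) = {0.5 / np.sqrt(n):.4f}")
```

Going from 100 to 200 flips takes the error from about ±0.050 to about ±0.035, while getting from ±0.05 down to ±0.005 takes the full hundred-fold jump to 10,000 flips.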

That is the end of the detour; now let me return to my investigation of their error estimate for the ocean heating rate for the top 1800 metres of the ocean. If you recall, or even if you don’t, that was plus or minus 1 watt per square metre (W/m2).

Now, that is calculated from temperature readings from Argo floats, about 3,000 of them during the study period.

Let me run through the numbers to convert their error (in W/m2) into a temperature change (in °C/year). I’ve comma-separated them for easy import into a spreadsheet if you wish.

We start with the forcing error and the depth heated as our inputs, and one constant, the energy to heat seawater one degree:

Energy to heat seawater:, 4.00E+06, joules/tonne/°C

Forcing error: plus or minus, 1, watts/m2

Depth heated:, 1800, metres

Then we calculate

Seawater weight:, 1860, tonnes/m2

for a seawater density of about 1.03333 tonnes per cubic metre.

We multiply watts by seconds per year to give

Joules from forcing:, 3.16E+07, joules/m2/yr

Finally, Joules available / (Tonnes of water times energy to heat a tonne by 1°C) gives us

Temperature error: plus or minus, 0.004, degrees/yr
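
For anyone who wants to check my arithmetic, here is the same calculation as a few lines of Python (the 4.00E+06 J/tonne/°C heat capacity, the 1.03333 density, and the 3.16E+07 seconds per year are the same round numbers used above):

```python
# Convert a +/- 1 W/m2 error in forcing into a temperature error (°C/yr)
# for the top 1800 metres of the ocean, using the round numbers above.
heat_capacity = 4.00e6    # joules to warm one tonne of seawater by 1°C
forcing_error = 1.0       # W/m2
depth = 1800.0            # metres
density = 1.03333         # tonnes of seawater per cubic metre
seconds_per_year = 3.16e7

column_mass = depth * density                        # ~1860 tonnes under each m2
joules_per_year = forcing_error * seconds_per_year   # ~3.16e7 joules/m2/yr
temp_error = joules_per_year / (column_mass * heat_capacity)

print(f"Temperature error: +/- {temp_error:.4f} °C per year")  # about 0.004
```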

So, assuming there are no problems with my math, they are claiming that they can measure the temperature rise of the top mile of the global ocean to within 0.004°C per year. That seems way too small an error to me. But is it too small? If we have lots and lots of observations, surely we can get the error down to something that small?

Here’s the problem with their claim that the error is that small. I’ve raised this question at Judith’s and elsewhere, and gotten no answer. So I am posing the question again, in the hope that someone can unravel the puzzle.

We know that to reduce the error by one decimal place, we need a hundred times more observations. But the same is true in reverse. If we need less precision, we don’t need as many observations. If we can live with one less decimal place of precision, we can get by with one-hundredth of the observations.

Currently, they claim an error of ± 0.004°C per year (four thousandths of a degree) for the warming of the upper ocean, based on the observations of the three thousand or so Argo buoys.

But that means that if we are satisfied with an error of ± 0.04°C (four hundredths of a degree), we could do it with one-hundredth of the observations, or about 30 Argo buoys. And it also indicates that 3 Argo buoys could measure that same huge volume, the entire global ocean from pole to pole, to within roughly a tenth of a degree (about ± 0.13°C).
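
Here is that reverse scaling as a sketch (Python). It simply takes the claimed ±0.004°C per year at face value and assumes the error grows as the square root of the reduction in the number of floats, which is exactly the square-root-of-N rule discussed above:

```python
import math

claimed_error = 0.004   # °C per year, the error implied by their ±1 W/m2
n_reference = 3000      # roughly the number of Argo floats in the study period

# If the error really scales as 1/sqrt(N), then far fewer floats should
# still do remarkably well at measuring the whole upper ocean.
for n_floats in (3000, 300, 30, 3):
    scaled_error = claimed_error * math.sqrt(n_reference / n_floats)
    print(f"{n_floats:5d} floats -> +/- {scaled_error:.3f} °C per year")
```

Those are the numbers in the paragraph above: thirty floats at about ±0.04°C, and three floats at about ±0.13°C, for the entire global ocean.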

And that is the problem I see. There’s no possible way that thirty buoys could measure the top mile of the whole ocean to that kind of accuracy, four hundredths of a degree C. The ocean is far too large and varied for thirty Argo floats to do that.

What am I missing here? Have I made some major math mistake? Their claimed error seems to be way out of line for the number of observations. I’ve not been able to find a good explanation of how they come up with these claims of extreme precision, but however they’re doing it, my math doesn’t support it.

And that’s the puzzle. Comments welcome.

Regards to everyone,

w.


180 Comments
January 27, 2012 8:54 am

Willis,
Excellent story and question about the reality of what is being touted as the measurement accuracy. This goes, however, to the basic reliability of the various GISTemp or even (on a statistical basis) the Mann-schtick temperature graphs: we see the huge error bars, but take the central line as reliable, accurate and precise. Why?
What, statistically, do the general error bars tell us about the reality of the central, “statistical” trend? With what certainty do we know that some other temperature path was not followed over the years? If the error bars are so large, what does that tell us about the “global” trend of anything: are the error bars saying that all temperature is regional, and that the global profile is purely a statistical result with possibly no relevance to real-world patterns?
All statistical work on certainty is based on either randomness of error/variation or a mathematical understanding of bias. The data in Russia was colder when oil/funding was based on perceived need by the centralized planning board. The speed of the “hurricane” in NYC last year (? time flies) was largest when the TV needed to promote the excitement of a major storm (and connection to CAGW). I don’t know if either of these two biases was considered in the Hansen-style analysis, but I do know that all the work done on data adjustments shows a pattern of cooling the past and warming the present.
To summarize:
1) within error bars, can the actual profile be different and yet just as “certain”?
2) do local records have small error bars, and the large error bars reflect the fact there is no global trend except as a mathematical construct?
3) what does a statistical analysis of data adjustments show as “certainty” to the net adjustment profile?

P. Solar
January 27, 2012 8:58 am

The previous swimming pool analogy does not account for Steve’s very valid point about systematic error. Even 10,000 readings with the same buoy would not reduce the uncertainty due to the manufacturer’s stated accuracy of the sensor. Using 3000 sensors would. The instrumental error becomes negligible since they are (or claim to be) incredibly accurate to start with.
That does not alter the fact that if your buoy does a dive and takes 1000 measurements, it is not measuring the same thing each time. It is not measuring the mean 1000 times.

Ian L. McQueen
January 27, 2012 9:36 am

Willis-
I have skipped 60 replies, so someone else may have made the same observation as follows…..
Is your problem related to the calculation of the “standard error of the mean”? I suspect that this is valid for items of the SAME KIND. Thus you could calculate for some dimension of a large number of widgets made by the same machine. But the same calculation for the same number of widgets, each produced by a different machine, would produce nothing useful. I would regard the ocean as a vast number of widget makers…..
IanM

climate creeper
January 27, 2012 9:55 am

Modeling with enough tunable parameters to make an elephant fly, curve fitting with inverted cherries, and correlation == causation is sooo passé. Now, error bars == invisible monsters! Wherefore art thou Steve McIntyre?

MR Bliss
January 27, 2012 10:31 am

What Trenberth needs is a new component in his heat equations. I would like it to be named after me, but I suppose it would have to be called the Trenberth Constant.
The value of this constant would be equal to the value of the missing heat. The beauty of this constant, in keeping with climate scientists’ habit of disregarding normal scientific protocols and conventions, is that it would be variable.
A variable constant – one that can be adjusted to whatever the current value of the missing heat happens to be. The peer review process would rubber stamp this, and another pesky problem would go away.

January 27, 2012 10:44 am

Seems to me these are ALL unrelated measurements in every dimension. I imagine the only repeatability will be year by year, and then you start having to deal with things such as Milankovitch cycles too.
Float measurements surely cannot be statistically analysed to provide a tight figure of statistical error of average worldwide Upper Ocean Heat Content. (Sure, we can calculate a UOHC figure against time, but how meaningful is it? By any ‘dimension of measurement’ – latitude, time, season, depth, proximity to land, depth of ocean at that point, types of weather – the error bars will be very large.)
Surely statistical error analysis can only be carried out with repeated measurements of the same parameter. In every case and every dimension these measurements are not repeatable, nor are they expected to be.
1. Individual floats will measure temperatures for one particular point in the ocean. But this is not expected to be the same for other points in the ocean. (latitude, currents, storms, seasons, time of day, proximity to land …etc)
2. Individual depth measurements are not expected to be the same, vertically, positional, by day or by hour or by season.
3. Day by day as the seasons change, each measurement is an individual measurement. How many tens (hundreds, thousands?) of years of data would be required to provide meaningful replicates of measurement?
The only meaningful statistical measurements which could have calculated error bars (and they would be huge) would be for example to average 10 years of temperature records for a certain probe at a certain time of year… then compare all similar probes at the same certain latitude…. then perhaps seasonal variation can be statistically removed and all the required adjustments can be made for season, ocean currents, ocean heat anomalies …… still, error bars would be huge. And, if every year there WAS in fact some warming, the “rate of change” could then be calculated for each particular probe, and statistically compared with rate of change for other probes …… (again with all the required adjustments for latitude, season, ocean currents, ocean heat anomalies … I can only imagine the margin of error bars in that case to be huge too).

January 27, 2012 11:03 am

Willis…I wish I had 1/2 your native intellect. But I have ENOUGH background to tell you that you are about to be CONDEMNED by the Pharisees and the Doctors of the Law! But in this case it’s not because you are considered to be “violating” the LAW…but because you are insisting that the Doctors of the Law ABIDE by the LAW themselves. In this case it is the LAW of sampling and distributions…and the “central limit” theorem. Actually it gets to some very important concepts in “information theory”. AND therefore it insists that the data be examined with a caveat that “at this point these changes cannot be distinguished from noise”.
A hard concession to make for the Doctors of the Law.
Keep up the good work, but don’t do any meditating in any gardens.
Max

zac
January 27, 2012 11:45 am

Do you know how many thermocouples these floats have on board?
It would also be interesting to know what time they are programmed to be in their near surface phase.

Tim Clark
January 27, 2012 11:50 am

[Josh Willis also had ocean-based data sets, including temperature profiles from the Argo robot fleet as well as from expendable bathythermographs, called “XBTs” for short.
But when he factored the too-warm XBT measurements into his ocean warming time series, the last of the ocean cooling went away. Later, Willis teamed up with Susan Wijffels of Australia’s Commonwealth Scientific and Industrial Research Organisation (CSIRO) and other ocean scientists to diagnose the XBT problems in detail and come up with a way to correct them.]
Willis, I don’t know if this has already been brought up, but it’s my understanding they stitch ARGO and XBT together. Does that confound the statistics? What is their accuracy?

Tom Davidson
January 27, 2012 11:53 am

In the animé series Devil May Cry, in the episode “Death Poker”, the protagonist (“Danté”) observes “the only sure way to win at gambling is to have a partner.”

Martin Å
January 27, 2012 11:55 am

Maybe I have misunderstood something, but by your reasoning ONE satellite could never measure anything with any precision. You assume that the ocean temperature is equal to the true average everywhere, and that there are 3000 measurements of this one temperature with an added random error. As you say, the sensors themselves have high enough accuracy. The question is whether 3000 sensors are enough to sample the ocean densely enough. To determine that you need to know the spatial autocorrelation of the ocean temperature.

GaryW
January 27, 2012 12:09 pm

Willis,
Accuracy can never be better than instrument error:
“Not true. As long as the error is randomly distributed over the measurements, the average of a number of measurements can definitely have an error smaller than instrument error.”
There are a couple of ways this is incorrect. First, let us assume we are making 1000 measurements of a stable process with one instrument with an accuracy of +/- 1%. We then average the results. If the instrument is reading just shy of 1% high at that process value, your very precise averaged number will still be just shy of 1% high. No amount of averaging of multiple values will improve upon that.
Next, let’s consider that same reading taken with 1000 different instruments. There is nothing in a +/- 1% specification that says the instrument error will be random throughout that error range. With real world instruments it will not be random due to common technology, manufacture, and calibration technique.
Your error is assuming that instrument calibration errors may be considered random. Instrumentation folks can tell you that is a bad assumption.

January 27, 2012 12:18 pm

I think it’s in the trees.
More specifically, in molecular bonds in plant cells.

January 27, 2012 12:22 pm

Chemical and molecular bond.

HAS
January 27, 2012 12:50 pm

Perhaps not an answer to your question directly, but here is a line of inquiry that may help understand the difference (and without having read the paper either).
In Fig 3a the heating rates are referenced to a PMEL/JPL/JIMAR product presumably as described in “Estimating Annual Global Upper-Ocean Heat Content Anomalies despite Irregular In Situ Ocean Sampling” Lyman et al. This describes the development of a weighted average integral (WI) “that assumes that the spatial mean of the anomalies in the unsampled regions is the same as the mean for the sampled regions” in comparison with the simple integral (SI) that assumes “zero anomalies in regions that are not sampled’. The paper shows using synthetic data from models that WI “appears to produce more accurate estimates of the global integral of synthetic OHCA than the SI”. Then follows a large number of rather obvious caveats.
If you are to reduce the errors in a measurement then one needs to add additional information (and/or ignore sources of error – something discussed already in this thread and something Lyman et al catalog themselves). The question in this case is where is this additional information coming from? I’d say from this reference that one of those may well be the assumption on the value of unsampled region means using WI.
If this is what Loeb et al have done then they have ended up ignoring the errors inherent in that assumption, and that is where the difference comes from. Using that assumption “3 Argo buoys could measure …. the entire global ocean from pole to pole, to within a tenth of a degree”.
Anyway fun to speculate.

Rosco
January 27, 2012 2:19 pm

I failed several exams giving answers with “one additional decimal of error over the precision of the instrument.”
I would probably have made an excellent climate scientist, as it took me a few “repeats” to learn to keep to scientific principles with regard to error – I simply hated not being precise with an answer.

January 27, 2012 3:09 pm

Willis Eschenbach says:
January 27, 2012 at 10:44 am
Dave, thanks. I understand that there are other sources of variability. My point is that regardless of any and all difficulties in the measurements, we still have an impossibility—a claim that they can measure the temperature of the top mile of the ocean to 0.04°C with thirty Argo floats. That’s the part I can’t figure out.

One possibility, Willis: your logic assumes that the standard deviation of the measurements stays constant no matter how small the sample size is; however, for a small sample that may not be true.

January 27, 2012 3:58 pm

Willis said:
> As long as the error is randomly distributed over the measurements,
> the average of a number of measurements can definitely have an
> error smaller than instrument error.
Do you have a justification for assuming that the error is randomly distributed? I would have thought there could easily be systematic errors that introduce a bias.
Disclaimer: I’m not a statistician so may be completely wrong.
