# Decimals of Precision

Guest Post by Willis Eschenbach

Over at Judith Curry’s excellent blog there’s a discussion of Trenberth’s missing heat. A new paper about oceanic temperatures says the heat’s not really missing, we just don’t have accurate enough information to tell where it is. The paper’s called Observed changes in top-of-the-atmosphere radiation and upper-ocean heating consistent within uncertainty.

It’s paywalled, and I was interested in one rough number, so I haven’t read it. The number that I wanted was the error estimate for their oceanic heating rates. This error can be seen in Figures 1a and 3a on the abstract page, and it is roughly plus or minus one watt per square metre (W/m2). This is consistent with other estimates of upper ocean heat content measurement errors.

I think I can conclusively demonstrate that their claimed error is way too small. To understand why, let me take a detour through the art, science, and business of blackjack.

In a fit of misguided passion, some years back I decided to learn how to count cards at blackjack. I had money and time at the same moment, an unusual combination in my life, so I took a class from a guy I’ll call Jimmy Chan. Paid good money for the class, and I got good value. I’ve always been good with figures, and I came out good at counting cards. Not as good as Jimmy, though; he was a mad keen player who had made a lot of money counting cards.

At the time they were still playing single deck in Reno. And I was young, single, and stupid. So I took twenty thousand dollars from my savings for my grubstake and went to Reno. It was an education about a curious business.

Here are the economics of the business of counting cards.

First, if you count using one of the usual systems as I did, and you are playing single deck, it gives you about a 1% edge on the house. Not much, to be sure, but it is a solid edge. And you can add to that by using a better counting system or a concurrent betting system, where better means more complex.

Second, if you play head-to-head (just you and the dealer) you can typically play about a hundred hands an hour.

Doesn’t take a math whiz to see that if you don’t blow the count, you will win about one extra hand an hour.

And therein is the catch. It means that in the card counting business, your average hourly wage is the amount of your average bet.

It’s a catch because of the other inexorable rule of counting blackjack. This regards surviving the swings and arrows of outrageous luck. If you don’t want to go home empty-handed, you need to have a grubstake that is a thousand times your average bet. Otherwise, you could go bust just from the natural ups and downs.

Now, twenty thousand dollars was all I could scrape together then. So that meant my average bet couldn’t be more than twenty dollars. I started out at the five dollar level.

I’d never spent any time in a casino up until then. I felt like the rube in every movie I ever saw. I played a while at the five dollar level. You never win or lose much there, so nobody paid any attention to me.

After a day or so making the princely sum of \$5 per hour, I started betting larger. First at the ten-dollar level. Then at the twenty-dollar level. That was good money back in those days.

But when you start to make a bit of money, like say you hit a few blackjacks in a row and you’re doubling down, they start paying attention to you, and the trouble begins. First they use the casino holodeck to transport a somewhat malignant looking dwarf armed with a pad and a pencil to your table. He materializes at the shoulder of the dealer, and she starts to sweat. I say she because most dealers were women then and now. She starts to sweat because the casino doesn’t really care about card counters. I was making \$20 an hour on average? Big deal, everyone in the casino management made that and more.

What scares casino owners is collusion between dealers and players. With the connivance of the dealer a guy can have a “string of luck” that can clean out a table in fifteen minutes and be out the door, meeting the dealer later to split the money. That’s what casino owners worry about, and that’s why the dealer started sweating, she knew she was being watched too. The dwarf peered through coke-bottle thick glasses, and wrote down the number of chips on each stack in the dealer’s rack, how much money I had, how much other players had. He gave the dealer a new deck. He wore a suit that cost as much as my grubstake. His wingtip shoes were shined to a rich luster. He looked at me as though I were a rich man with a loathsome disease. He watched my eyes, my hands. I started sweating like the dealer.

If I continued to win, the holodeck went into action again. This time what materialized were two large, vaguely anthropoid looking gentlemen, whose suits were specially tailored to conceal a bulge under the off-hand shoulder. They simply appeared, one at each shoulder of the aforementioned vertically challenged gentleman, who looked even dwarfier next to them, but clearly at ease in his natural element. They all three stared at me, and when that bored them, at the dealer. And then at me again.

And if the dealer was sweating, I was melting. I’m not made for that kind of game, I’m not good at that kind of pretence. I found out you can take the cowboy out of the country, but you can’t make him go mano-a-mano with the casinos for twenty bucks an hour.

I lasted a week. I logged my hours and my winnings. During that time, I worked well over forty hours. I only made enough money to pay for the flight and the hotel, and that’s about it. I was glad to put my twenty grand back in the bank.

I couldn’t take the constant strain and pressure of counting and not looking like I was counting and trying to stay invisible and feeling like a million eyes in the sky were watching my every eyeblink and having an inescapable feeling of being that guy in the movies who’s about to be squashed like a bug. But for those who can make it a game and keep it up, what an adventure! I’m glad I did it, wouldn’t do it again.

The part I liked the least, curiously, was something else entirely. It was that my every move was fixed. For every conceivable combination of my cards, the dealer’s card, and the count, there is one and only one right move. Not two. Not “player’s choice”. One move. I definitely didn’t like the feeling that I could be replaced by a vaguely humanoid 100% Turing-tested robot with a poor sense of dress and a really, really simple set of blackjack instructions.

But I was still interested in the math of it all. And I had my trusty Macintosh 512. And Jimmy Chan had an idea about how to improve the odds by changing his counting method. And so did some of Jimmy’s friends. And he had a guy who tested their new counting method for them, at some university, for five hundred bucks a run.

So I told Jimmy I’d do the analysis for a hundred bucks a run. He and his friends were interested. I wrote a program for my Mac to play blackjack against itself. I wrote it in Basic, because that was what was easy. But it was sloooow. So I taught myself to program in C, and I rewrote the entire program in C. It was still too slow, so I translated the critical sections into assembly language. Finally, it was fast enough. I would set up a run during the day, programming in the details of however the person wanted to do the count. Then I’d start it when I went to bed, and in the morning the run would be done and I’d have made a hundred bucks. I figured that I’d finally achieved what my computer was really for, which was to make me money while I slept.

The computer had to be fast because of the issue that is at the heart of this post: how many hands of blackjack did the computer have to play against itself to find out whether the new system beat the old one?

The answer turns out to be a hundred times more hands per decimal. In practice, this means at least a million hands, and many more is better.

What we are looking at is the error of the average. If I measure something many times, I can average my answers. Is the resulting mean value the true underlying mean of what I am measuring? No, of course not. If we flip a hundred coins, usually it won’t be exactly fifty/fifty.

But it will be close to the true average of the data. How close? Well, the measure of how close it is expected to be to the true underlying average is what is called the “standard error of the mean”. It is calculated as the standard deviation of the data divided by the square root of the number of observations.

It is the last fact that concerns us. It means that if we double the number of observations, we don’t cut the error in half, but only to 0.7 of the original value. One consequence of this is that if we need one more decimal of precision, we need a hundred times the number of observations. That is what I meant by a hundred times per decimal. If our precision is plus or minus a tenth (± 0.1) and we want to know the answer to one more decimal, plus or minus one hundredth (± 0.01), we need one hundred times the data to get that precision.
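The square-root rule is easy to check numerically. Below is a minimal Python sketch, assuming independent, identically distributed observations; the function names are illustrative, and the per-hand blackjack standard deviation of about 1.15 bets is an assumed figure (the same one a commenter quotes further down), not something measured here.

```python
import math
import random
import statistics


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)


def n_for_target_se(sd: float, target_se: float) -> float:
    """Observations needed so that sd / sqrt(n) falls to target_se."""
    return (sd / target_se) ** 2


def simulated_se_coin(flips: int, trials: int = 2000, seed: int = 1) -> float:
    """Spread of the heads fraction across many runs of `flips` fair coin tosses."""
    rng = random.Random(seed)
    fractions = [
        sum(rng.random() < 0.5 for _ in range(flips)) / flips for _ in range(trials)
    ]
    return statistics.pstdev(fractions)


if __name__ == "__main__":
    # A fair coin scored as 0 or 1 has a standard deviation of 0.5.
    print(standard_error(0.5, 100))  # 0.05
    print(simulated_se_coin(100))  # should land close to 0.05
    # Doubling n only cuts the error to about 0.707 of its old value...
    print(standard_error(0.5, 200) / standard_error(0.5, 100))
    # ...and one more decimal (error divided by 10) costs 100 times the observations.
    print(n_for_target_se(0.5, 0.005) / n_for_target_se(0.5, 0.05))
    # Assumed per-hand SD of about 1.15 bets: pinning an edge down to
    # about +/-0.1% (one standard error) takes over a million hands.
    print(round(n_for_target_se(1.15, 0.001)))
```

The last line is consistent with the “at least a million hands” figure above, though the exact count depends on the assumed standard deviation and on how small a difference between systems you need to detect.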

That is the end of the detour, now let me return to my investigation of their error estimate for the ocean heating rate for the top 1800 metres of the ocean. If you recall, or even if you don’t, that was 1 watt per square metre (W/m2).

Now, that is calculated from temperature readings from Argo floats, about 3,000 of them during the study period.

Let me run through the numbers to convert their error (in W/m2) into a temperature change (in °C/year). I’ve comma-separated them for easy import into a spreadsheet if you wish.

We start with the forcing error and the depth heated as our inputs, and one constant, the energy to heat seawater one degree:

Energy to heat seawater:, 4.00E+06, joules/tonne/°C

Forcing error: plus or minus, 1, watts/m2

Depth heated:, 1800, metres

Then we calculate

Seawater weight:, 1860, tonnes

for a one-square-metre column 1800 metres deep, at a density of about 1.0333 tonnes per cubic metre.

We multiply watts (joules per second) by the number of seconds in a year to give

Joules from forcing:, 3.16E+07, joules/m2/yr

Finally, Joules available / (Tonnes of water times energy to heat a tonne by 1°C) gives us

Temperature error: plus or minus, 0.004, degrees/yr
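Here is the same arithmetic as a short Python sketch, so the chain of units can be checked. It assumes a 365.25-day year and the post’s round numbers (1800 m, density about 1.0333 tonnes/m³, 4.0E+06 joules per tonne per °C); the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed Julian year, about 3.16E+07 s


def temperature_error_per_year(forcing_w_m2: float = 1.0,
                               depth_m: float = 1800.0,
                               density_t_per_m3: float = 1.0333,
                               joules_per_tonne_per_c: float = 4.0e6) -> float:
    """Warming rate (deg C per year) of a 1 m^2 water column absorbing a steady flux."""
    tonnes = depth_m * density_t_per_m3  # seawater under one square metre
    joules = forcing_w_m2 * SECONDS_PER_YEAR  # energy per square metre per year
    return joules / (tonnes * joules_per_tonne_per_c)


if __name__ == "__main__":
    print(f"{1800.0 * 1.0333:.0f} tonnes")  # 1860 tonnes
    print(f"{temperature_error_per_year():.4f} degC/yr")  # 0.0042 degC/yr
```

Because the result is linear in the forcing, a ±2 W/m2 error would simply double the temperature figure.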

So, assuming there are no problems with my math, they are claiming that they can measure the temperature rise of the top mile of the global ocean to within 0.004°C per year. That seems way too small an error to me. But is it too small? If we have lots and lots of observations, surely we can get the error down to that small?

Here’s the problem with their claim that the error is that small. I’ve raised this question at Judith’s and elsewhere, and gotten no answer. So I am posing the question again, in the hope that someone can unravel the puzzle.

We know that to get a smaller error by one decimal, we need a hundred times more observations. But the same is true in reverse. If we need less precision, we don’t need as many observations. If we can accept one fewer decimal of precision, we can do it with one-hundredth of the observations.

Currently, they claim an error of ± 0.004°C (four thousandths of a degree) for the annual average upper ocean temperature from the observations of the three thousand or so Argo buoys.

But that means that if we are satisfied with an error of ± 0.04°C (four hundredths of a degree), we could do it with a hundredth of the number of observations, or about 30 Argo buoys. And it also indicates that 3 Argo buoys could measure that same huge volume, the entire global ocean from pole to pole, to within a tenth of a degree.
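The scaling argument can be run backwards in a few lines. This sketch simply applies the square-root rule to the claimed ±0.004°C and roughly 3,000 floats; it assumes every float contributes an independent, equally informative observation, which is exactly the assumption in question.

```python
import math

CLAIMED_ERROR_C = 0.004  # claimed annual error, deg C
CLAIMED_FLOATS = 3000  # approximate number of Argo floats in the study period


def error_with_floats(n_floats: float) -> float:
    """Error implied by the square-root rule if only n_floats were used."""
    return CLAIMED_ERROR_C * math.sqrt(CLAIMED_FLOATS / n_floats)


def floats_for_error(target_c: float) -> float:
    """Number of floats the square-root rule implies for a target error."""
    return CLAIMED_FLOATS * (CLAIMED_ERROR_C / target_c) ** 2


if __name__ == "__main__":
    for n in (3000, 30, 3):
        print(n, round(error_with_floats(n), 3))
    print(round(floats_for_error(0.1), 1))
```

It gives ±0.04°C for 30 floats and about ±0.13°C for 3, so “within a tenth of a degree” is the right order of magnitude; by the same rule, about five floats would reach ±0.1°C.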

And that is the problem I see. There’s no possible way that thirty buoys could measure the top mile of the whole ocean to that kind of accuracy, four hundredths of a degree C. The ocean is far too large and varied for thirty Argo floats to do that.

What am I missing here? Have I made some major math mistake? Their claimed error seems to be way out of line for the number of observations. I’ve not been able to find a good explanation of how they come up with these claims of extreme precision, but however they’re doing it, my math doesn’t support it.

And that’s the puzzle. Comments welcome.

Regards to everyone,

w.

Woodshedder
January 26, 2012 9:05 pm

Great post…

jeef
January 26, 2012 9:08 pm

Each buoy itself makes thousands of observations? Just a thought…

Truthseeker
January 26, 2012 9:18 pm

Willis, there is no mystery here. Their precision is between 10 and 100 times less than what they think it is. 3000 Argo buoys is a trifling number considering the vastness of the world’s oceans. The problem is if they acknowledge this lack of precision, they will then claim that the “missing heat” is in the error range of their ocean temperature measurements and therefore that they are right!

Matthew Bergin
January 26, 2012 9:24 pm

The three thousand buoys are only measuring a volume of the ocean equal to the amount they displace at any one time. So only a very small fraction of the total ocean is being measured.

Brian H
January 26, 2012 9:25 pm

Well, there’s quantity, and then there’s quality. Quality embraces distribution and location. Are the samples random? Are they clustered? Etc.

Chris Nelli
January 26, 2012 9:25 pm

Given enough time, I think your assumption is right. You have to consider the time for the oceans to be mixed, etc. In your blackjack computer model, did you have to consider the time it took to shuffle the deck after so many hands are played? Likely not, if you used the random number generator command. In summary, you have to balance the number of buoys against the speed of mixing that occurs in the ocean. The more buoys, the less concern one has about mixing.

Richard Keen
January 26, 2012 9:32 pm

Willis, nice job and impeccable logic.
A couple of years ago Vincent Gray wrote a short piece on ICECAP about the missing heat, and I chimed in with an estimation of the error based on the differences between Trenberth’s 1997 and 2009 values for the energy budget. The working assumption was that the differences between the two estimates for each component were roughly the error for each value. The conclusion:
Trenberth’s net “missing heat” of 0.9 W/m2 in 2009 had an uncertainty of +/- 1.3 W/m2. In other words, it could be zero, or even negative.
A different approach than Willis used, but the same conclusion.
Details at: http://icecap.us/index.php/go/icing-the-hype/the_flat_earth/

Bob Shapiro
January 26, 2012 9:35 pm

Always enjoy your posts, Willis.
One possible difficulty that I see with your analysis is the proverbial apples to oranges problem. In your fun description of your gambling escapade, you’re looking at percent error. Your betting advantage is 1%, and you’re looking at how to raise that advantage by comparing two systems to see which returns a greater percent advantage.
While I don’t disagree with your conclusion relating to their not being able to measure the oceans with so few “stations,” I think you would need to show percent error rather than a decimal fraction of a degree…. Or maybe my thinking is all wet.

crosspatch
January 26, 2012 9:35 pm

> And it also indicates that 3 Argo buoys could measure that same huge volume, the entire global ocean from pole to pole, to within a tenth of a degree.

> There’s no possible way that thirty buoys could measure the top mile of the whole ocean to that kind of accuracy, four hundredths of a degree C.

Well, let’s say the floats report temperature to +/- 0.1 degree. With three floats, we can be fairly certain that we can measure the parts of the ocean where the floats have measured to 0.1 degree. Now comes the second part, extrapolation. It is then absurd to project the average of those three floats across the entire globe because there are more than three different temperature regimes across the globe. Now let’s say we go to 300 floats. Now we can get the error down to 0.01 degrees, but again only that certainty in the areas where we have sampled. But we can probably begin to project the values received from those floats as being representative of larger areas of the world’s ocean. So there are two uncertainties here, not one. The first uncertainty is the uncertainty of the measurements themselves. The second is the uncertainty of how much this actually projects across the entire globe.
Using a gambling analogy, let’s say you are studying the average payout of slot machines across all of Reno. If you decide to use only three slot machines, one in each of three casinos, you will first need to take a number of samples just to find out what the average of those three are. But are those three representative of all slot machines in all establishments in Reno? So now you decide to have 300 testers at one slot machine in 300 different locations (Reno has slot machines everywhere, even in the gas station). So now you must make a lot of measurements at each one, but are you still sure that the result can be projected across the entire population of slot machines in Reno? Might you have obtained a different result at one of the establishments if you had picked a slot machine at the end of a row close to the entrance?
Ocean temperatures aren’t perfectly stratified across the entire ocean or even the entire region. Maybe a float dove through a cold upwelling someplace that is actually fairly localized. Is its temperature reading typical for a huge region of ocean? So there are two uncertainties involved: the error of the average from taking samples in so many different locations, and then a projection or extrapolation uncertainty. To what extent is it really valid to have a temperature reading in one spot represent some entire grid square? Does one slot machine’s payout represent the payout of all other slot machines within that grid square? Does the average obtained from all of the measurements really have any significant meaning at all? Maybe not. Maybe each place is different until you get so many readings from so many locations that the various local conditions can be accounted for. Which, by the way, is one reason I don’t like the wholesale removal of stations from the global surface temperature databases. The weather in Sacramento may not be representative of the weather in Ashburn and the temperature in Willits may not be representative of the temperature at Ft. Bragg, California (not to be confused with Ft. Bragg, NC).

crosspatch
January 26, 2012 9:36 pm

“Each buoy itself makes thousands of observations? Just a thought…”
But it is moving so each observation is at a different point. If you go from slot machine to slot machine playing one game at each, do you really learn the average payout of machines at the establishment?

January 26, 2012 9:41 pm

I’ve made this argument before about land stations not having enough individual accuracy, as demonstrated by Anthony’s Surface Stations project. Then there’s the laughable idea that you can get a global temperature from anything short of millions of locations, most of them needing to be in the sea, and many more needed in the polar regions.

January 26, 2012 9:43 pm

If there is any missing thermal energy I would expect to find it under the ocean floor (and in the crust under land surfaces) where it really must go when temperatures rise relatively quickly. Please see my post here for reasons . . .
http://wattsupwiththat.com/2012/01/26/october-to-december-2011-nodc-ocean-heat-content-anomalies-0-700meters-update-and-comments/#more-55499 January 26, 2012 at 7:26 pm

Walter
January 26, 2012 9:45 pm

Claiming a single figure for widely varied oceans seems a bit of a stretch of credibility.
Then… cutting the number of devices doing the sampling would tend to make the expected extremes more extreme.
Effectively, it’s layer upon layer of BS.

Editor
January 26, 2012 9:54 pm

Your statistical preamble is based on all numbers being part of the same chance-based set.
The Argo buoys measure different points that are not part of the same chance-based set but are physical place-times in a variable ocean. The rules for how many observations you need to arrive at an ocean average would surely be completely different. So you have the answer already: “The ocean is far too large and varied“.
Not very well expressed, and I’m no expert in this statistical stuff, so hopefully someone will explain it better.

January 26, 2012 9:55 pm

It’s worse than you thought, Willis! 😉
The observation that you need 100 times the observations to gain an extra significant digit is well stated. But that is all you can hope for provided the observations are independent.
In the case of Argo, there is likely a bit of covariant linkage between the readings. My bias is that neighbors are positively correlated, and if so, it will take many more than 100x to gain that decimal point.
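The point about positively correlated neighbours can be made concrete with a simple model. This is a minimal sketch assuming the readings follow an AR(1) process, where each reading is correlated with the previous one at lag-1 correlation rho; the model and rho = 0.8 are illustrative assumptions, not measured properties of the Argo data.

```python
import math
import random
import statistics


def effective_n_ar1(n: int, rho: float) -> float:
    """Large-n approximation of the effective sample size for an AR(1) series."""
    return n * (1 - rho) / (1 + rho)


def simulated_se_ar1(n: int, rho: float, trials: int = 500, seed: int = 0) -> float:
    """Spread of the sample mean of n AR(1) readings with unit variance."""
    rng = random.Random(seed)
    innovation_sd = math.sqrt(1 - rho ** 2)  # keeps each reading's variance at 1
    means = []
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)  # start from the stationary distribution
        total = x
        for _ in range(n - 1):
            x = rho * x + rng.gauss(0.0, innovation_sd)
            total += x
        means.append(total / n)
    return statistics.pstdev(means)


if __name__ == "__main__":
    n, rho = 400, 0.8
    print(1 / math.sqrt(n))  # 0.05 if the readings were independent
    print(1 / math.sqrt(effective_n_ar1(n, rho)))  # about 0.15 under the AR(1) model
    print(simulated_se_ar1(n, rho))  # should land near 0.15
```

Under this model 400 correlated readings carry roughly the information of 44 independent ones, so reaching any given error takes about nine times as many readings as the independent-sample arithmetic suggests.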

Hoser
January 26, 2012 9:56 pm

Did precision get confused with accuracy? Calibration? Who’s checking? How?

RockyRoad
January 26, 2012 9:58 pm

I’d still like to do a variogram on some of the data to see if there’s any spatial correlation, Willis. I’m willing to admit right off we’d find correlation down each temperature profile, but that only counts for one dimension. Whether there’s any geostatistical correlation in the other two dimensions is the big question, and if we assume there isn’t, then there’s no reason to forge ahead in that effort–the data can be treated simply as random samples and your assertions are likely correct–they can’t be as confident as their numbers indicate.
But in the fanciful world of ocean geostatistics, the overall average and distribution of “blocks” representing the entire ocean would be far different from that of the sample points from which it was derived. I’ve seen it hundreds of times applying geostatistics to in situ mineral models on a variety of scales–using drill holes spaced 1,000 feet, 500 ft, 100 ft, and even 20 feet apart (the latter are numerous blast holes drilled on mining benches). And generally the more samples that are taken, the more the range on the variogram shrinks and the natural heterogeneity is accentuated. I wouldn’t be surprised if the same were true of the vast oceans.
Which means we probably won’t ever get there–to achieve their stated level of error with any acceptable confidence would require spatial statistics and likely require far more time and money than anybody is willing to invest. (Ah, to be the recipient of some wealthy aunty’s vast fortune someday.)

Keith W.
January 26, 2012 9:59 pm

Sorry Jeef, but Willis dug into that as well in another post, and they don’t make that many observations.
http://wattsupwiththat.com/2011/12/31/krige-the-argo-probe-data-mr-spock/

Jim D
January 26, 2012 10:00 pm

I don’t think anyone would be satisfied with an error of 0.04 degrees because this is much higher than the annual average warming rate which is nearer 0.01 degrees per year. So 0.004 degrees per year, which they have, means they can measure the temperature rise of the ocean on an annual to decadal time scale, and can at least tell with high confidence whether an annual warming rate of 0.01 degrees exists, which I think they can.

January 26, 2012 10:07 pm

Willis, silly question: Have you asked The Statistician to the Stars?
http://wmbriggs.com/blog/

January 26, 2012 10:09 pm

Thanks Willis, my own experience leads me to disbelieve most any temperature data reported at better than ±0.5°C. Call me crazy, but I just don’t trust temperature measurement that well, especially when there is any room for subjective human judgement.

David Falkner
January 26, 2012 10:14 pm

@ jeef:
There would still be issues with the spatial distribution. That may be where 3000 buoys will work to get a thousandths place error, but 3 will not work for tenths. Although, I am wondering if one of those buoys can even measure a single measurement that precisely.

Jeremy
January 26, 2012 10:15 pm

Math like this only works with observations that are 100% accurate (no error). In the real world, instruments have errors and limited accuracy and instruments will drift with time.
For example, if there is random background noise affecting an instrument then the accuracy of the measurement reading of the instrument can be improved by square root of the number of observations. However, you cannot improve non random error – so ultimately measurement accuracy soon falls victim to inherent limitations of the instrument (non-linearity, offset, drift and resolution). You cannot solve this by adding more instruments or more observations because of the non-random nature of many of these errors (similar built instruments will suffer from similar degrees of non-random error)
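The distinction between random and non-random error shows up clearly in a small simulation. In this Python sketch the numbers are hypothetical: a true value of 10.0°C, a fixed bias of 0.05°C shared by every reading, and random noise with a standard deviation of 0.5°C. Averaging more readings shrinks the random part, but the error of the mean settles on the bias rather than on zero.

```python
import random
import statistics


def averaged_error(n: int, bias: float = 0.05, noise_sd: float = 0.5,
                   true_value: float = 10.0, seed: int = 0) -> float:
    """Error of the mean of n readings that share a fixed bias plus random noise."""
    rng = random.Random(seed)
    readings = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(readings) - true_value


if __name__ == "__main__":
    # The random scatter shrinks as 1/sqrt(n); the 0.05 bias does not.
    for n in (10, 1_000, 100_000):
        print(n, round(averaged_error(n), 4))
```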

thingadonta
January 26, 2012 10:20 pm

I think the counters at blackjack win by raising their bet significantly when the count is favourable, and sitting around with low bets the other 98% of the time. But this of course raises suspicions at the casino; it’s easy to see when someone varies their bet greatly. And you have to be willing to lose the big bet, which most people aren’t willing to do.
As for the oceans, I assume they integrate a time factor; that is, whatever larger error is in the measurement at one time (say within a cold current which moves around) will cancel out the next time (when the cold current has weakened), meaning the larger errors cancel out over time. Not sure if this is what you are after, but a trend over time might reduce such errors.
Off topic a bit, but I agree with many that averaging data further back in time by proxy measurements shouldn’t be allowed (such as in Mann’s various papers), as in this case what you are averaging is not data, but proxy data, meaning you are 1) mixing different uncertainties and 2) biasing data towards whatever errors are in the proxy itself over time. That is, many proxy methods get less responsive the further back in time you go, meaning if you are averaging proxy data further back in time, you will likely simply flatten any deviations the further one goes back in time, with the older proxy data being, by nature, less responsive (and you get a hockeystick towards the recent end, as data becomes more responsive). Mann uses this false method to claim the MWP was lower in T than today.

January 26, 2012 10:21 pm

One of my favorite statistical modeling stories I heard in graduate school in the late ’70s. It dealt with the design of the Trans Alaskan Pipeline Oil Tanker Terminal at Valdez, Alaska. The pipeline was designed for a maximum 2,000,000 bbls of oil per day. So, there had to be capacity at Valdez to temporarily store that oil if tankers were late. So the crucial question was what is the tankage needed for a 9x.x% confidence to store the oil rather than slow down the pumps?
Tankers are mechanical and therefore have some schedule delays. But the crucial problem was Gulf of Alaska weather. No problem. Thanks to Russian fishing villages, they had 200 years of weather reports. All the members of the consortium had the same data. Their OR teams crunched the numbers and came back with:
1.25 days
1.50 days
7.00 days. !?!?
“How do you justify SEVEN Days when we come up with a little more than one?”
“We modeled tanker delay according to the frequency of major storms.”
“So did we.”
“And when a storm delays one tanker, it delays ALL of them.”
“….. hmmm. Right. The delays are not independent. ”
The story is they quickly settled on six days.
It is quite a big terminal.

StatGuyInDallas
January 26, 2012 10:22 pm

Jeef is probably right. Many observations per buoy – making it a cluster sample.
Similar calculations can be made – just a little more tedious.

January 26, 2012 10:22 pm

Regarding the error measurements discussed, what Watt says seems right to me.
A very similar point can be made about those models. They estimate SW and LW radiation and each is acknowledged to have 1% to 2% uncertainty. Then they take a difference to get net radiative flux at TOA. But such flux in practice varies between about +0.5% and -0.5% of total incident solar flux. So how on Earth can the “accuracy” of the difference be even sufficient to determine whether the end result is positive or negative, i.e. warming or cooling?
The whole model business would be a joke if the consequences were not so grave – like spending \$10 trillion over the next 100 years for developing countries who could well do with the money, yes, but spend it on more useful projects – projects that could easily save lives.

Shub Niggurath
January 26, 2012 10:22 pm

Well, if we believe the numbers, that is what the implication is. The ocean is just one big homogenous lump of water. You don’t need too many thermometers to measure its temperature – it is the same everywhere (within climatic zones, that is).
The missus and I were doing calculations of statistical power recently. I had several revelatory moments even then.
Nicely written.

Duke C.
January 26, 2012 10:26 pm

Willis-
The SV for blackjack with standard rules is 1.15. With a 1% edge you should have been betting \$150 (.75% of your bankroll) per hand. Adjust your bet size according to bankroll fluctuations and you’ll never go broke. Eventually, you would have been a rich Casino owner instead of a poor climate change heretic. 😉
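The figures in this comment can be reproduced with the usual small-edge approximation to the Kelly criterion, where the bet is the bankroll times the edge divided by the variance per hand. This sketch assumes “SV” means the per-hand standard deviation in units of the bet, and it treats the 1% edge as constant, whereas a counter’s real edge changes with the count; the function name is hypothetical.

```python
def kelly_bet(bankroll: float, edge: float, sd_per_hand: float) -> float:
    """Approximate Kelly bet: bankroll * edge / variance (small-edge approximation)."""
    return bankroll * edge / sd_per_hand ** 2


if __name__ == "__main__":
    bet = kelly_bet(20_000, 0.01, 1.15)
    print(f"{bet:.0f}")  # 151
    print(f"{bet / 20_000:.2%}")  # 0.76% of the bankroll
```

That is about 0.76% of a \$20,000 bankroll, in line with the \$150 quoted.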

Jim D
January 26, 2012 10:42 pm

Adding to what I wrote above, you have to remember this is a decadal trend that is being evaluated, so this 0.004 degrees per year is really an accuracy of 0.04 degrees per decade. Since the actual trend is probably just over 0.1 degrees per decade, they are saying they can resolve this reasonably well, and at least be certain of the warming. 0.04 degrees per year translates to 0.4 degrees per decade which is quite useless for determining even the sign of such a trend, and I am certain they can do and are doing better than that.

Eyal Porat
January 26, 2012 10:44 pm

Willis,
First, great post – especially the story :-).
This 0.004 C is soooo accurate and unmeasurable it seems meaningless up front. It is way too accurate to take seriously.
As you have taught me: when you smell a rat – in most cases there is one.

David Falkner
January 26, 2012 10:48 pm

Willis:
Basically, their thermometers are good to ± 0.005°C, and seem to maintain that over time.
——————
In the body of the story, you calculated the error as equivalent to 0.004°C. Shouldn’t the error at least equal the error in the instrument? How could applying math to the output of the instrument make the instrument itself more efficient? I feel like I am missing something in your calculation, perhaps. Or maybe I am not understanding your point?

January 26, 2012 10:49 pm

Two issues (why is it always two? – I will try to make it three or four just to be difficult.)
So, we must separate serial observations from observations in parallel, and precision from accuracy, and both from spatial distribution or coverage, or more pertinently, the density of measurements.
1. For estimating accuracy, not the number of buoys but the number of observations per buoy over time. Since each buoy only records locally, whatever it measures is only local. So if the buoys are moving around, you also need to know something about that.
2. The precision with which we can measure is a complex function of the instrumentation’s inherent or “lab” precision, the environment and the calibration procedures (if any.) This must be accounted for when measuring accuracy.
2. Clearly, even 3,000 buoys do not provide enough density to provide meaningful coverage. The best way to deal with accuracy under these circumstances might be to “stratify” measurements into temperature or energy bands, so that all buoys making measurements in the same band can be aggregated to assess accuracy. I make this point since it seems meaningless to be assessing
accuracy to however many decimals when the temperatures observed by different buoys might vary by factors of 2 or more.
For example, if, say, 20% of the buoys operate in a “tropical” band, and the number of observations per buoy is, say, 1,000 per week (I have no idea what the actual number might be), then we would have .2(3000)(1000) = 600,000 observations on which to assess accuracy, and then only if the swings in measurement were not too wild and the buoy did not move a thousand miles or more over the period.
Hope I am not writing total nonsense?

David Falkner
January 26, 2012 10:49 pm

And we’re assuming top operational efficiency, at that!

January 26, 2012 10:56 pm

Write me. I’ll send you the paper.

Alan Wilkinson
January 26, 2012 11:01 pm

There is no statistical validity whatever in these error estimates since the assumptions that all the measurements are independent and drawn randomly from the same population are certainly false to unknown degrees.
Has anyone done any quantitative examination of why the simple sea-level change measurement should not directly reflect heat content over relatively short time spans, given that the rate of change has been modest and fairly consistent?

Alan S. Blue
January 26, 2012 11:21 pm

Willis, I keep meaning to write a couple of articles along the lines of “How do we measure Temperature – from a single thermometer to a Global Mean Average.”
One issue with the surface stations that may well be a factor in the oceanic measurements is using the NIST-calibrated “Instrumental Error” when calculating the error of measuring the temperature of a ‘gridcell’. Thermometers with a stated error of 0.1C are common. But the norm in climatology is to take that single measurement and start spreading it towards the next-nearest-thermometer under the assumption that they’re representative of the entire gridcell. And the assumption that the temperature is smooth relative to the size of the gridcells.
There’s nothing too wrong with that when you’re just aiming for a sense of what the contours of the temperature map look like … But when you turn around and attempt to propagate the errors to determine your -actual- precision and accuracy at measuring the temperature, there seems to be a recurring blind spot.
A perfect, out-of-the-box thermometer, NIST-calibrated with stickers and all as “±0.1C” just doesn’t in general provide a measurement of the temperature for the gridcell to an accuracy of 0.1C. Nor anywhere remotely nearby. Yes, the -reading- might well be “3.2C”, but that is a point-source reading.
Watching a single weather forecast off your local news should drive this home: there are only a couple of gridcells (max) represented. And yet there are obvious temperature variations. Some variations are endemic, because they represent height variations or natural features (or barbecues, whatever). But there is generally plenty of shifting in this analogous plot of temperatures as well. Some town normally warmer than yours might show up cooler, or way hotter, or tied on any given day.
It’s possible that someone has done a ‘calibration study’ with the Argos, to determine not just that it measures temperature in the spot it’s actually in, but to determine how well the individual sensor measures its gridcell. -Not- a “correlation study”, that just shows “Hey, we agree with other-instrument-X with an R of XXX” – that doesn’t tell you the instrumental error. I just don’t know enough details for the oceanic studies.
A ‘calibration study’ would look more like 3000 instruments dumped into a -single- gridcell. And it has exactly the same issue you bring up with regards to numbers: To make a perfectly accurate “map”, you need a map that’s the same size as the surface being mapped to cover all the irregularities.
In chemical engineering, (my field), temperature can be very important to a given process. Having five or so thermometers on the inlets and dispersed around the tank of a -well understood- process might well be dramatically insufficient to the job of “measuring the temperature” to a measly 0.1C. That’s a tank in a reasonably controlled environment with a “known” process.
So, I wouldn’t pay much attention to the error bars unless the instruments being used are more ‘areal’ as opposed to ‘point source’ or there’s an exhaustive study of some gridcells.

January 26, 2012 11:21 pm

I have to ask, what’s the real point of all this measuring of OHC? Is it just a ploy for more funding to try to prove AGW a reality? The overall trend is not far different from sea surface temperatures. Wouldn’t it be better to get a satellite working again on sea surface temperatures to ensure some continuity of that very important data which, in my mind, is the most indicative of what’s happening and where we’re heading.

F. Ross
January 26, 2012 11:21 pm

Willis. Interesting post.
Just curious: do you know what the power source of the buoys is? And has the possible presence of heat from a power source been taken into account, as far as it might possibly affect temperature measurements?

Alex the skeptic
January 26, 2012 11:23 pm

“……..says the heat’s not really missing, we just don’t have accurate enough information to tell where it is”
It’s like Santa. It’s not that Santa does not really exist (don’t tell the kids), it’s just that we don’t know exactly where he lives, although he could have drowned together with the polar bears.

James
January 26, 2012 11:27 pm

Willis
You can estimate the probability of tossing a head by flipping a coin 100 times, counting the number of heads and dividing by 100. When you do this, the standard deviation of this estimate will be sqrt(.5*.5/100) = .05, or 5%. People then like to say a 95% confidence region is plus or minus 2 standard deviations, or +/- 10% in this case.
However, this last bit relies on the central limit theorem to assert that your average estimate is normally distributed. If you were to flip your coin only once, you would get an estimate with 10x the standard deviation, or 50%, but you would have to be very careful what you did with that. You could not assert that your estimate is normal and thus that the 95% confidence interval is +/- 100%. You would get either a range of -100% to 100% or 0% to 200%…nonsense.
The problem is when you have a small sample, you have to work with the small sample statistical properties of your estimators and abandon all your large sample approximations. Bottom line is your statistics go haywire usually with small samples 🙂
Hope that helps!
James
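A minimal Python sketch of the coin-flip arithmetic, assuming a fair coin and independent flips. The standard error of an estimated proportion is sqrt(p(1-p)/n); the interval is the usual large-sample "plus or minus 2 standard errors" rule, and the function names are made up for illustration.

```python
import math
import random
import statistics


def proportion_se(p: float, n: int) -> float:
    """Standard error of the fraction of heads in n independent flips."""
    return math.sqrt(p * (1.0 - p) / n)


def two_se_interval(p_hat: float, p_true: float, n: int) -> tuple[float, float]:
    """Large-sample '95%' interval: estimate plus or minus 2 standard errors."""
    half_width = 2.0 * proportion_se(p_true, n)
    return (p_hat - half_width, p_hat + half_width)


def simulated_se(n: int, trials: int = 10_000, seed: int = 1) -> float:
    """Check the formula by brute force: spread of many n-flip estimates."""
    rng = random.Random(seed)
    estimates = [sum(rng.random() < 0.5 for _ in range(n)) / n for _ in range(trials)]
    return statistics.pstdev(estimates)


print(proportion_se(0.5, 100), simulated_se(100))  # both close to 0.05
print(proportion_se(0.5, 1))                        # 0.5, ten times larger
# With one flip the only possible estimates are 0.0 (tails) or 1.0 (heads),
# and both "95%" intervals spill outside the meaningful range [0, 1].
print(two_se_interval(0.0, 0.5, 1), two_se_interval(1.0, 0.5, 1))
```

With 100 flips the formula gives 0.05 (5%); with a single flip it gives 0.5 (50%), and the resulting intervals run from -1 to 1 or from 0 to 2, which is the small-sample breakdown James describes.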

old44
January 26, 2012 11:31 pm

“the heat’s not really missing, we just don’t have accurate enough information to tell where it is”
Like Little Bo-Peep’s sheep.

John
January 26, 2012 11:39 pm

If I’m reading this correctly, the error you mention is the error in the mean based on the data from the ARGO buoys… what about the measurement error for the buoys? What is the error associated with the actual measurement? The reason I ask is that errors are cumulative, so any measurement error would increase the error of the mean.
Coming from a geostatistical background, I wonder how the data are treated in terms of data clustering. If you have lots of measurements very close together (e.g. some of the buoys clustered together), then the weighting applied to these measurements should be lower compared to the buoys from areas of sparse coverage. Unless the coverage is very even, clustering may be an issue (e.g. more coverage in the tropics compared to the colder Arctic/Antarctic water). The result can be a biased estimate of the mean.
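One common geostatistical answer to the clustering question is cell declustering: overlay a grid and give each sample a weight of one over the number of samples in its cell. The Python sketch below is a minimal version of that idea; the cell size, coordinates and temperatures are hypothetical, and nothing here should be read as how the Argo analyses are actually weighted.

```python
from collections import Counter


def cell_declustering_weights(points, cell_size):
    """Weight each (x, y) sample by 1 / (samples in its grid cell), normalised to sum to 1."""
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in points]
    counts = Counter(cells)
    raw = [1.0 / counts[cell] for cell in cells]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical layout: four buoys bunched in one warm cell, one alone in a cold cell.
points = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.1), (0.2, 0.8), (5.5, 5.5)]
temps_c = [28.0, 28.5, 27.5, 28.0, 2.0]

naive = sum(temps_c) / len(temps_c)
declustered = weighted_mean(temps_c, cell_declustering_weights(points, cell_size=1.0))
print(naive, declustered)  # 22.8 vs 15.0: the clustered warm buoys dominate the naive mean
```

Each occupied cell ends up contributing equally, so the lone cold-water buoy carries as much weight as the whole warm cluster. The answer can shift with the cell size chosen, which is one reason the weighting scheme deserves scrutiny.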

F. Ross
January 26, 2012 11:40 pm

Please ignore my previous post. Dumb question on my part when a simple Google search shows what I wanted to know.

PiperPaul
January 26, 2012 11:46 pm

What if sharks eat some of the buoys?

Blade
January 26, 2012 11:48 pm

I guess we can state with certainty that Willis discovered Trenberth’s missing heat: it is found in casinos at blackjack tables populated by math whizzes 😉
Just a suggestion for the young rubes out there that get a taste of the favorable mathematics of counting. Besides money management and an adequate stake, there are two other variables that need to be controlled perfectly: the high-low counting must be correct, and the strategy (hitting/staying) must also be flawless. After locking up those two variables, the math is favorable, well, depending on the number of decks. It is a hell of a lotta work to earn a small profit, as Willis so artfully described. Truly it is the math challenge that draws so many scientists to these games.
Willis, really wild guess, is Jimmy Chan == S.W.?

cb
January 26, 2012 11:53 pm

Um, I think this was touched on above somewhere, somewhat, but I do not think you could use this type of statistical reasoning at all.
The problem is that the oceans are, in effect, a collection of fluid-based heat-transferring machines. The system is deterministic, not random (not to mention that the systems change a lot over the course of the seasons, depending on the effect of previous seasons). In other words, using interpolation would not be an allowable procedure. (Unless you measure for decades? Even then, climate changes naturally over that timescale, so your results would end up being meaningless anyway.)
If you were to use randomly (or evenly, same thing if the machine is sufficiently unknown) distributed measurement buoys to determine global heat flow by measuring temperatures, then you will surely have very large error, to the point where your measurements would be worth nothing.
This is a truism, is it not? I do not know what the buoys were purposed for, but it makes no sense to use them for this.

January 26, 2012 11:58 pm

Three Argo buoys? In principle, that’s not a problem, assuming that the AVERAGE temperature of the ocean down to 1800m is CONSTANT over time. Given that assumption, the LOCAL water temperatures can vary somewhat over time, with some of them going up a bit, while others are going down a bit. Given an arbitrary length of time for making measurements, even 3 buoys could pin down the AVERAGE temperature of the ocean down to 1800m to whatever degree of accuracy you like.
Here’s the fly in the ointment: The handy dandy Standard Error of the Mean (SEM) formula is only valid in the case where you take repeated measurements OF THE SAME THING. In this case, “the same thing” is the AVERAGE temperature of the ocean down to 1800m.
That magic formula does not apply here, because the AVERAGE temperature of the ocean down to 1800m is always changing slightly. Sometimes it’s going up slightly, and at other times it’s going down slightly. Here are a couple of other ways to frame the issue.
The SEM formula only applies to RANDOM ERRORS. It does NOT apply when there are SYSTEMATIC ERRORS, aka METHOD ERRORS. And that’s what we’re talking about here.
The buoys’ thermometers are shooting at a moving target, and that’s beyond the scope of the SEM formula.
To use a Peter Principle expression, we’re promoting the SEM formula to its level of incompetence.
My gut feeling is that the researchers have vastly overstated their case. However I don’t know enough about stats to tell them how to do their analysis correctly. Unfortunately, they don’t know enough either. But that’s par for the course in Climate Change ‘science’.
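The systematic-error point can be illustrated with a small Python simulation. It assumes, purely for illustration, a steady true value of 10.0, random reading noise with a standard deviation of 0.1, and a fixed bias of +0.05 shared by every reading. The textbook SEM (sample standard deviation divided by the square root of n) keeps shrinking, while the actual error of the mean does not.

```python
import random
import statistics


def simulate(n, true_value=10.0, noise_sd=0.1, bias=0.05, seed=0):
    """Average n readings of one steady quantity; every reading shares the same bias.

    Returns (actual error of the mean, textbook standard error of the mean)."""
    rng = random.Random(seed)
    readings = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    mean = statistics.fmean(readings)
    sem = statistics.stdev(readings) / n ** 0.5  # only sees the random scatter
    return mean - true_value, sem


for n in (10, 1_000, 100_000):
    error, sem = simulate(n)
    print(f"n={n:>7}  reported SEM={sem:.5f}  actual error={error:.5f}")
```

The reported SEM falls roughly as 1/sqrt(n), but the actual error settles at the 0.05 bias, because averaging cannot see an error that every reading shares.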

January 27, 2012 12:19 am

To me the missing 0.9 W/m² is a joke. I won’t bore the moderators with another KT 1997 atmospheric window post on the subject.
The KT 1997 calculation for the atmospheric window (using correct math) gives 80 W/m² or 87 W/m² depending on what assumptions you make. If the 80 W/m² value is correct, then that’s a 100% error of 40 W/m². If the 87 W/m² value is correct, then we’re talking about a greater than 100% error of 47 W/m². Either value dwarfs that 0.9 W/m² value.
Even if we calculate the atmospheric window the way KT 1997 does, we get 37.62 W/m² which they round up to 40 W/m². That’s a 2.38 W/m² slop in rounding alone.
As I said, that 0.9 W/m² value is laughable.
Jim

jl
January 27, 2012 12:34 am

The Argo project has given us a blurry snapshot of ocean temperatures, but that is all it has done.
Based on many small points of accurate data a picture has been assembled.
How could they possibly know the accuracy of their assumptions about such a dynamic system as an ocean until they repeat the test, in say, ten years?

January 27, 2012 12:38 am

John De Beer nails the issue – density of measurements against the scope and stability of the system being measured. 3,000 buoys is literally a drop in the ocean. You cannot extrapolate these with any confidence. It is the same mistake GISS makes by smearing one temperature measurement across 500-1400 km. GISS claims a reading in Washington State can indicate the temperature as far away as Southern CA.
Utter nonsense. And the reason it is utter nonsense is that the atmosphere is not homogeneous or stable over those distances. Temperatures are not homogeneous over tens of kilometers on any given day in most places. They can often have a standard deviation of 3-4 °F.
The required density of measurements for something as large and dynamic as the temperature of the atmosphere or oceans is enormous if you want sub degree accuracy. We don’t come close with land or ocean sampling.
Spacecraft come the closest because they use a uniform measurement (one instrument, one calibration configuration) over the entire globe and take many thousands of measurements. The problem here is that they only see a given area of the globe every 100 minutes or so. Therefore not all measurements are at the same time of day everywhere on the Earth (though every point on the Earth is measured at the same times each day).
That is why the benchmark should be satellites; we can use them to measure the error in widely distributed surface samples from an unknown number of sensors of unknown calibration quality. I have proposed this in the past. A satellite can give an average over a region every day at the same time. You then compare the thermometer results every day to the satellite average. It would be an interesting result to see.
Anyway, the sample density has to reflect the dynamics of the system. For example, I need a lot of samples to determine the orbit of a satellite to a high accuracy. But the orbit only slowly decays under various non-linear forces, so I don’t need to re-sample for 7-10 days to regain my precision in orbit knowledge for most applications.
However, if I want to know sea surface heights to centimeters, I need to sample my orbit continuously (through GPS) and then post-process the data again with ground references (differential GPS) to theoretically get cm resolution.
Orbits are simple, slow-changing systems. Atmosphere and ocean mixing, currents, dynamics etc. are not. Once you realize how much dynamism is truly out there, you realize how crazy it is for GISS or CRU or NCDC to claim sub-degree accuracy on a global scale about anything. A temperature sample is only good to sub-degree accuracy over a few feet (if that). It is impossible with current sample densities to improve on that unless you have millions of samples.
In other words, what we don’t measure regarding temps dwarfs what we do sample by many orders of magnitude. Add in the lack of samples from before the modern era, and you realize we are clueless not to just what is happening now, but what happened 50 years ago (let alone 1,000).

Sean Houlihane
January 27, 2012 12:41 am

Read James’ comment again. Your error here is in comparing two different distributions (and not seeming to have read up on any statistics).
When you’re counting improbable events (p << 1-p), the standard deviation of the number of wins, sqrt(np(1-p)), tends towards sqrt(np), the square root of the expected number of wins.
I'm currently running randomly generated tests to find problems in my design. If I see 5 failures out of 10, I can be reasonably sure that with any other random group of ten tests, I'll still see between 8 and 2 failures. If I only see 1 failure in 100, I need to run 10,000 tests in order to be confident in the underlying failure rate. Only when I've seen at least 10 failures can I have much confidence that my failure wasn't a fluke.
With continuous samples, combining fractions of a degree (particularly if the month to month noise is significant with regard to the sampling resolution) it should be less necessary to take many many measurements – but the statistics is fairly well known, provided you can identify the nature of the variables you are dealing with. You can also make a model fairly simply to determine the outcome of a sample of random runs.
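The test-failure arithmetic can be checked with the binomial model, assuming independent tests that each fail with the same fixed probability p. The standard deviation of the failure count is sqrt(n p (1 - p)), which for small p is close to sqrt(n p), the square root of the expected count; the function names are illustrative.

```python
import math


def count_sd(n, p):
    """Standard deviation of the number of failures in n independent tests."""
    return math.sqrt(n * p * (1.0 - p))


def relative_error(n, p):
    """Count standard deviation as a fraction of the expected count n*p."""
    return count_sd(n, p) / (n * p)


# 5 failures in 10 tests: roughly 5 +/- 2 sd spans about 1.8 to 8.2 failures.
sd = count_sd(10, 0.5)
print(5 - 2 * sd, 5 + 2 * sd)

# A 1% failure rate: the relative uncertainty shrinks like 1 / sqrt(expected failures).
for n in (100, 1_000, 10_000):
    print(n, n * 0.01, relative_error(n, 0.01))
```

At 1% the relative error is close to 100% after 100 tests (about one failure), about 31% after 1,000 (about ten failures) and about 10% after 10,000, which is why a single observed failure says very little about the underlying rate.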

John
January 27, 2012 12:49 am

Just a couple of other points to ponder…
Again, coming from a geology background in mining and geostatistics: what sort of quality control is there on these instruments? For example, do they need to be regularly calibrated? Are there duplicate measurements collected to assess precision (sampling precision)?
One thing you touched on is the representativeness of the measurements… for me this basically comes down to what’s often referred to as sampling theory… look up Pierre Gy and Francis Pitard if you’d like to investigate the issues of sampling errors and how representative a sample is. Sampling theory is primarily about sampling particulate materials, but I think the basic principles would also apply to sampling water temperature (and atmospheric temperature and CO2 content).

January 27, 2012 12:51 am

Can the standard error of the mean be smaller than the instrument error? Sure. You just need lots of measurements.

If each measurement was made by a different device (or a random device from the pool of devices) at a random location (spatially & perhaps temporally) this would be true. But if my thermometer (I only have one, you know) is graduated in 1°F, it really doesn’t matter how many times I look at that danged thing, I can’t accurately measure to ±0.01°F.
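A minimal Python sketch of the single-thermometer point, assuming an idealized instrument graduated in whole degrees F whose reading is simply the true temperature rounded to the nearest mark (no observer error). When the true temperature holds steady, every reading is identical and averaging gains nothing; only when the temperature itself wanders across several graduations (assumed here to be uniform within 2°F either side) does the average of many readings creep toward the true mean.

```python
import random
import statistics


def read_thermometer(true_temp_f):
    """Idealized 1°F-graduated thermometer: the reading is the nearest whole degree."""
    return round(true_temp_f)


def average_reading(true_temps_f):
    return statistics.fmean(read_thermometer(t) for t in true_temps_f)


TRUE_MEAN_F = 70.3

# Steady temperature: 10,000 looks at the same mark always read 70.
steady = average_reading([TRUE_MEAN_F] * 10_000)

# Wandering temperature: the spread across graduations lets the average resolve tenths.
rng = random.Random(42)
wandering = average_reading(TRUE_MEAN_F + rng.uniform(-2.0, 2.0) for _ in range(100_000))

print(steady, wandering)  # steady stays at 70.0; wandering lands near 70.3
```

Even in the wandering case the gain applies only to the average of a varying quantity, and it assumes every graduation mark is placed correctly.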

January 27, 2012 1:17 am

CLIMATE SCIENCE gone HOMEOPATHIC !
There is a hilarious cartoon from Josh on Tallbloke’s blog.
http://tallbloke.wordpress.com/2012/01/27/gavin-schmidt-climate-homeopathy/#more-4579
I was always sceptical about the ppm science.

Fredrick Lightfoot
January 27, 2012 1:17 am

Willis,
a simpler way of “dealing” with this problem is:
we have 335,358,000 sq km of oceans on our planet,
which means that if the buoys were placed at regular intervals there would be
1 for each 111,786 sq km
the state of Virginia 110,785 sq km
the state of Tennessee 110,910 sq km
or the country of
Bulgaria 110,910 sq km
Now I am sure that any resident of the above would give you an answer to the problem if there was only ONE temperature recorder in their state or country.

January 27, 2012 1:21 am

This reminds me of what the late John L Daly wrote:
http://www.john-daly.com/altimetry/topex.htm
TOPEX-Poseidon Radar Altimetry:
Averaging the Averages
“How many stages of statistical averaging can take place from a body of raw data before the statistical output becomes hopelessly decoupled from the raw data which creates it?
Imagine for example getting ten people to take turns at measuring the distance from London to New York using a simple ruler on a large map from an atlas. Each person would give a slightly different reading, perhaps accurate to +/- 10 miles, but if all these readings were averaged, would that make the final resolution of the distance accurate to one mile? Perhaps. But if a thousand people were to do it, would that narrow the resolution to mere yards or metres? Intuitively, we know that would not happen, that an average of the measurements of a thousand people would be little better than an average from ten people.
Clearly, there are limits as to how far statistical averaging can be used in this way to obtain greater resolution. An average of even a million such measurements would be scarcely more accurate than the average of ten, diminishing returns from an increasing number of measurements placing a clear limit on the resolution achievable. The problem lay not in the statistics but in the inherent limitations of the measuring devices themselves (in this case, a simple ruler and map)….”
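Daly’s diminishing returns can be written as a one-line error model, under the assumption (ours, not his) that each person’s reading error is independent with standard deviation σ, while everyone shares one common error s from using the same map and ruler. The error of the average is then sqrt(σ²/n + s²), which can never drop below s. The 10-mile σ comes from the quote; the 3-mile shared error is a made-up illustration.

```python
import math


def error_of_average(n, random_sd, shared_sd):
    """Standard error of an average of n readings: independent errors shrink, shared ones do not."""
    return math.sqrt(random_sd ** 2 / n + shared_sd ** 2)


for people in (10, 1_000, 1_000_000):
    # random_sd=10 miles per person (from the quote); shared_sd=3 miles is hypothetical
    print(people, round(error_of_average(people, 10.0, 3.0), 3))
```

Going from ten people to a thousand helps (about 4.4 down to 3.0 miles), but a million people buy almost nothing more, because the floor is set by the shared map and ruler.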

Alan the Brit
January 27, 2012 1:56 am

Great post.
Have they looked at the error tolerances of these 3000 Argo buoys? Each one cannot possibly perform exactly the same under all conditions; it is impossible. Where were they made, under what factory conditions? Déjà vu: I recall doing surveying for a week at Brighton Uni when at college in my yoof. We used the Kingston College Wild T2 (pronounced Vilt) Total Station theodolite, built in Switzerland; there were only two in the UK at the time, one held by the college, the other by what was then the Greater London Council. We also had a Nikon ?? Total Station theodolite (I can’t remember the model number), which was almost comparable in quality. However, that is irrelevant.
We were told that we had to be very careful in calculating station data by “face-left” & “face-right” observations, because of the following: the T2 was claimed to be able to read angles to an accuracy of 1 second of arc, it said so in the manual, whereas the Nikon only claimed to be able to read angles to 3 seconds of arc. But the Japanese lenses were ground to an accuracy of 1 second of arc, while the T2 lenses were ground to an accuracy of 3 seconds of arc! Therefore one was more reliable than the other, although the other claimed a greater accuracy. You pays your money & takes your choice!
And frankly, if anybody tells me that they can measure the temperature of the Earth, the atmosphere, or the oceans to an accuracy greater than around a tenth of a degree Celsius, I’d laugh in their face! Prof Phil Jones’ claims that the three warming periods of the last 150 years had warming rates measured to a thousandth of a degree were just ridiculous & could only be derived from arithmetical construction!

BenAW
January 27, 2012 2:27 am

You take the depth heated as 1800 m.
The thermocline in the tropics, the layer that is DIRECTLY heated by incoming solar, seems to be more like 500 m or so deep.
Thermocline depth reduces to 0 m towards the polar circle.
Taking an average depth of e.g. 200 m gives a much greater temperature error.
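The effect of the assumed mixing depth can be sketched in Python. This assumes the heat from a steady flux is spread evenly through a water column of the given depth for one year, with typical seawater values of about 1025 kg/m³ for density and 3990 J/(kg·K) for specific heat; those constants are our assumptions, not numbers from the post. With 1 W/m² over 1800 m, the result comes out near the 0.004°C figure discussed earlier in the thread.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 s
RHO_SEAWATER = 1025.0                  # kg/m^3, assumed typical value
CP_SEAWATER = 3990.0                   # J/(kg K), assumed typical value


def warming_per_year(flux_w_m2, depth_m):
    """Temperature change (K per year) if a steady flux is spread evenly over depth_m of water."""
    heat_capacity_per_m2 = RHO_SEAWATER * CP_SEAWATER * depth_m  # J/(K m^2)
    return flux_w_m2 * SECONDS_PER_YEAR / heat_capacity_per_m2


for depth in (1800, 500, 200):
    print(depth, round(warming_per_year(1.0, depth), 4))
```

The same 1 W/m² that is worth about 0.004°C per year over 1800 m is worth nine times as much, roughly 0.04°C, if it is confined to 200 m, which is the commenter’s point about the depth assumption.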

John Marshall
January 27, 2012 2:38 am

Whilst the ARGO system is good, it is not that accurate. Just look at the figures: there are 335,258,000 sq km of ocean and 3200 buoys, so each buoy is monitoring 104,768 sq km of ocean; times depth, that gives the not inconsiderable volume of 188,582 cu km to look after and get the temperature of. Quite a feat, but not possible to any real sort of accuracy.
Many buoys are free floating, so they will remain in the same volume of water in a current, which will not be representative of the ocean as a whole.
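The area and volume figures are straightforward division and multiplication. A small Python check, assuming buoys spread perfectly evenly over the whole ocean area and the 1.8 km profiling depth used in the post (real floats drift and clump, and shallow seas are not sampled to that depth):

```python
def area_per_buoy(ocean_area_km2, n_buoys):
    """Ocean area each buoy would represent if the floats were spread perfectly evenly."""
    return ocean_area_km2 / n_buoys


def volume_per_buoy(ocean_area_km2, n_buoys, depth_km=1.8):
    """Water volume per buoy down to the profiling depth (assumed 1.8 km)."""
    return area_per_buoy(ocean_area_km2, n_buoys) * depth_km


print(area_per_buoy(335_258_000, 3_200))    # about 104,768 sq km per buoy
print(volume_per_buoy(335_258_000, 3_200))  # about 188,582.6 cu km per buoy
```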

January 27, 2012 2:49 am

Assume that the instruments are precise enough to measure a thousandth of a degree. If the temperature of the oceans were the same everywhere, you would need one buoy to take the oceans’ temperature.
Otherwise you would need one buoy in every blob that was a thousandth of a degree different from its neighbour. If there is a twenty degree range, that means twenty thousand buoys, one for each blob. But the blobs are separated into thousands or millions of separate same-temperature blobs. They are separated vertically as well.
So I reckon 20,000 * 1,000 * 1,000 buoys might be getting close.
That’s twenty giga-buoys.

E.M.Smith
Editor
January 27, 2012 2:59 am

There are problems with the notion of taking a bunch of measurements, and averaging them to get ever greater precision. Unfortunately, I’ve fought that battle for months (years?) and have tired of it. Why? Because there are subtleties to it that folks just do not believe.
The first issue is that SOMETIMES you can do it, and it works fine. Other times, not so much. What goes into which bucket is hard to list / explain… so just ends up with endless bickering of the “does so / does not” kind. (For that reason, I’m going to say what I have to say then simply ignore any followup. I know where it ends and it simply is not worth the time.)
The “just do it” folks all had the statistics class that showed how the deviation of the mean could be lower than the deviation of the values. (I had it too). What they didn’t have (or don’t remember?) was the caveats about it not always being applicable.
So what works?
Well, measure a thing with an instrument. The more times you do it, the closer the average comes to the real deal. Take a piece of paper 11 inches wide. Measure it with a ruler and you may get 10.9 one time, 11.1 the next, and 11.0 two times. Repeated enough, that first 0.1 error will tend to be averaged out by more 11.0 and offset by the same number of 11.1 measurements. This removes the random error in the act of measuring.
Measure it with a different instrument each time and you can remove the random instrument errors.
All well and good.
HOWEVER: The error has to be random, and not systematic.
If I always measure from one side (being right handed and right eyed, for example) and have a systematic parallax error in reading the ruler, I will have a systematic error that can not be removed via averaging. If my ruler is simply wrong, all measurements will be similarly biased.
The requirement of ‘random error’ is often forgotten and the assertion is typically made that the error band on the instrument is known, so there is no systematic error. But if you have a requirement for, say, +/- 0.1 you could easily have, for example, an electronic part that always ages toward the + side, introducing a ++ bias in all measurements, but still being inside the ‘acceptance band’.
And what if you are measuring DIFFERENT things? With DIFFERENT instruments? It is not 1,000 ‘trials’ of the same thing, but 1000 measurements of 1000 things with 1000 mutually variable instruments. Then it isn’t quite so clear….
Each of the 1000 things measured is measured only once. You have 11.x +/- 0.1 on it (say) and that is ALL the precision you have for that THING. Taking 1000 different things and finding the average of their measurements WILL tell you an ever more precise report of that “mathematical average”, but that number will NOT be closer to the actual correct average measurement. Each object had only ONE trial. Has only ONE error band. Was done with ONE instrument with an unknown bias. Again, the problem of systematic error comes into it. You don’t know if 1/2 the people were measuring low by 0.1 and the other half measuring low by 0.8 (so in any case you will report low). You can NOT get to a 0.01 accuracy from averaging a bunch of things that are all more than that much in error to the downside.
Yes, the probability of it is low, but it still exists as a possibility (until it is proven that no systematic error or bias exists, which typically can not be done as there are ‘unknown unknowns’…). Again, using the same instrument to make 1000 measurements removes the error in the process of measuring (unless there is a systematic bias of the instrument or the observer). Also, using 1000 different instruments removes the instrument error, provided the instrument errors are randomly distributed.
But doing 1000 different measurements with 1000 different instruments: Each measurement has ONE error. Each instrument has ONE bias. You are assuming that these will all just magically ‘average out’ by being randomly distributed and that is not known.
A good example of this is calorimetry. Oddly, that is exactly what folks are trying to do with heat gain of the planet. A very lousy kind of calorimetry. What is THE sin in calorimetry? Screwing around with the apparatus and thermometers once it is running. We were FORBIDDEN to change thermometers mid-run or to move the apparatus around the room. What do we do in climate / temperature measuring? Randomly change thermometers, types, and locations.
ALL of them introducing “splice errors” and other systematic instrument errors that are often unrecognized and uncorrectable. Just the kinds of errors that averaging will NOT remove.
The basic problem is simple, really: If you do multiple trials you can reduce the error as long as the errors are random. If all you have is ONE trial, while you can find an amusing number (the average of the SAMPLE DATA) to a higher precision, that is NOT indicative of a higher accuracy in the actual value. (Due to those potentials for systematic errors).
Now, for temperatures, there is another even worse problem: Intrinsic vs extrinsic properties.
That’s a fancy way to say the air is never the same air twice (or you can never step in the same river twice). So you can only EVER have a sample size of ONE.
This is often talked about as an ‘entropy’ problem, or a problem with UHI, or with siting, or… but it all comes down to the same thing: Other stuff changes, so the two temperatures are not comparable. Thus not averageable.
One example is pretty clear. If you have 2 pots of water, one at 0 C and the other at 20 C and mix them, what is the resulting temperature?
You can not know.
IFF the two contain water of the same salinity, have the same MASS, and the 0 C water is not ice, then you could say the resulting temperature was 10 C. But without the added data, you simply get a nonsense number if you average those two temperatures. And no amount of averaging that in with other results can remove that error.
Basically, you can average the HEAT (mass x temperature x specific heat), adjusted for any heat of vaporization or fusion (melting). But just averaging the temperatures is meaningless BY DEFINITION.
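The two-pot example can be made concrete with a short Python sketch. It assumes liquid water only, no heat lost to the surroundings, and no evaporation, freezing or melting; the specific heat of 4186 J/(kg·K) is an assumed textbook value for fresh water. The mixed temperature is the heat-weighted average sum(m·c·T) / sum(m·c), which matches the plain average of the two temperatures only when the pots hold equal masses of the same liquid.

```python
WATER_CP = 4186.0  # J/(kg K), assumed specific heat of fresh liquid water


def mixed_temperature(parcels):
    """Equilibrium temperature of liquid parcels given as (mass_kg, cp, temp_c).

    Assumes no heat loss and no phase change: T = sum(m*cp*T) / sum(m*cp)."""
    heat = sum(m * cp * t for m, cp, t in parcels)
    capacity = sum(m * cp for m, cp, _ in parcels)
    return heat / capacity


equal = mixed_temperature([(1.0, WATER_CP, 0.0), (1.0, WATER_CP, 20.0)])
unequal = mixed_temperature([(3.0, WATER_CP, 0.0), (1.0, WATER_CP, 20.0)])
print(equal, unequal)  # 10.0 and 5.0, while the naive average of 0 and 20 is 10 both times
```

If the 0 C pot were ice, the latent heat of fusion would also have to be accounted for, which this sketch deliberately leaves out.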
We assume implicitly that the air temperature is some kind of “standard air” or some kind of “average air”; but it isn’t. Sometimes it has snow in it. The humidity is different. The barometric pressure is different. (So the mass / volume changes).
For Argo buoys, we have ocean water. That’s a little bit better. But we still have surface evaporation (so that temperature does not serve as a good proxy for heat at the surface as some left via evaporation), we have ice forming in polar regions, and we have different salinities to deal with. Gases dissolve, or leave solution. A whole lot of things happen chemically in the oceans too.
So take two measurements of ocean temperature. One at the surface near Hawaii, the other toward the pole at Greenland. Can you just average them and say anything about heat, really? Even as large ocean overturning currents move masses of cold water to the top? As ice forms releasing heat? (Or melts, absorbing it)? How about a buoy that dives through the various saline layers near the Antarctic. Is there NO heat impact from more / less salt?
Basically, you can not do calorimetry with temperature alone, and all of “Global Warming Climate Science” is based on doing calorimetry with temperatures alone. A foundational flaw.
It is an assumption that the phase changes and mass balances and everything else just “average out”, but we know they do not. Volcanic heat additions to the ocean floor CHANGE over time. We know volcanoes have long cycle variation. Salinity changes from place to place all over the ocean. The Gulf Stream changes location, depth, and velocity and we assume we have random enough samples to not be biased by these things.
Yet temperatures can not be simply averaged and say anything about heat unless you know the mass, phase, and other properties have not changed. Yet we know they change.
And no amount of calculating an average removes those errors. Ever.

January 27, 2012 3:00 am

Some useful info here on why they went for a 3 degree x 3 degree 3,300 float array in water depths greater than 2,000 meters. They say the floats will not clump, not so sure about that myself.
http://www.argo.ucsd.edu/argo-design.pdf

DB-UK
January 27, 2012 3:29 am

Willis,
This problem is impossible to solve, and any accuracy claimed makes no sense. If you run a chemical reaction, the first thing you do is stir the solution to achieve maximum mixing and avoid local ‘overheating’. If you then want to achieve a certain temperature inside the flask, you need external heating, but it is almost impossible to control the external heat input well enough to maintain an exact temperature inside the flask. To solve that problem, you find a solvent that boils at the temperature you need, and by keeping the solvent refluxing you hold the temperature at that point.
When you are working in ‘open systems’, like the oceans or the air, you have all sorts of problems, and the only way to treat the data is to treat each individual buoy as a ‘local’, individual temperature profile. If all individual profiles point in the same direction, then you have a ‘global’ trend; if not, you don’t. That is the same reason why there is no such thing as a ‘global temperature’, only a network of a huge number of local temperature patterns (we are talking here about air temperature as detected by thermometers). Any trend analysis that tries to average all those individual patterns into a single number is a useless exercise, since it has nothing to do with physical reality, if physical reality is supposed to be the object of the exercise.
Please remember that the buoys are only the ‘messengers’ for which we know accuracy from the manufacturer, and they record the temperature as they suppose to do. It is so called scientists who are trying to interpret what the instrument is detecting that are wrong.

Rick Bradford
January 27, 2012 3:44 am

Gives you an edge of 1%, eh? So in one hour of 100 hands, you’d be winning 50.5 – 49.5.
Now, if you’d combined that with an optimal betting strategy (the 1-2-3-4-5-6, for example), you might have made enough to be taken into the casino parking lot and given a lecture on the expense of major dental treatment….
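Rick’s 50.5-to-49.5 framing can be put into numbers. The sketch below assumes a deliberately simplified model: every hand is a one-unit, even-money bet won with probability 0.505, ignoring pushes, doubles, splits, and 3:2 blackjacks. It compares the expected win with the hand-to-hand noise.

```python
import math


def edge_vs_noise(p_win: float, hands: int) -> tuple[float, float]:
    """Expected net units and standard deviation after `hands` one-unit,
    even-money bets, each won with probability p_win (simplified model)."""
    expected = hands * (2 * p_win - 1)
    # Each hand pays +1 or -1, so its variance is 4 * p * (1 - p).
    sd = math.sqrt(hands * 4 * p_win * (1 - p_win))
    return expected, sd


for n in (100, 10_000, 1_000_000):
    mean_units, sd_units = edge_vs_noise(0.505, n)
    print(f"{n:>9,} hands: expected {mean_units:+,.1f} units, sd {sd_units:,.1f} units")
```

Under this model the expected gain after 100 hands is about 1 unit against a standard deviation of about 10 units. Because the edge grows with the number of hands and the noise only with its square root, the two become equal at roughly 10,000 hands.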

Dodgy Geezer
January 27, 2012 4:36 am

I have looked through these postings, and the arguments associated with them.
I am not, nor have I ever been, a statistician.
As far as I can see, there are two threads to this issue. One is the esoteric field of statistical estimation of data from individual readings. This looks fun for mathematicians. The other is an attempt to apply this to the real requirement to measure ocean temperatures. I think this is simply not possible.
My concern is that we just do not know how variable the actual water temperatures are in the real ocean. We know that the main currents differ from their surroundings, but we do not know precisely where the edges of the currents are at any one time, probably not even to within a mile or so. I wonder if the amount of heat stored in a cylinder of water a mile wide on the circumference of each major current is of the same order as the missing heat? Or the columns of cold water which I assume sit under each iceberg?
I suspect that the oceans have many microcells of variable temperatures, and even several thousand ARGOS buoys are unlikely to hit one of these cells during their operational lifetime. And if these cells contain enough heat to satisfy the ‘missing heat’ hypothesis, we will never have a hope of finding it.
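Dodgy Geezer’s worry about small cells can be framed as a sampling probability. As a rough sketch, assume each of 3,000 floats (the figure used elsewhere in this thread) lands independently and uniformly at random over an ocean surface of roughly 361 million km², and ask the chance that all of them miss a single feature of a given area in one snapshot. Real floats drift with the currents and features move, so this is only an order-of-magnitude illustration.

```python
import math

OCEAN_AREA_KM2 = 3.61e8  # approximate global ocean surface area (assumption)


def prob_all_miss(feature_area_km2: float, n_floats: int) -> float:
    """Chance that n_floats independent, uniformly placed profiles all miss
    a feature covering feature_area_km2 in a single snapshot."""
    fraction = feature_area_km2 / OCEAN_AREA_KM2
    return (1.0 - fraction) ** n_floats


for area in (1.0, 1e3, 1e4, 1e5, 1e6):
    p_miss = prob_all_miss(area, 3000)
    # For small fractions this is close to exp(-n * fraction).
    approx = math.exp(-3000 * area / OCEAN_AREA_KM2)
    print(f"{area:>11,.0f} km^2: P(all miss) = {p_miss:.4f} (approx {approx:.4f})")
```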

Sean
January 27, 2012 4:42 am

I have not done the math, but I always thought that the problem with imprecision was not in the ocean heat content (or really, the change in ocean heat content) but in the radiative numbers at the top of the atmosphere. The Argo numbers for ocean heat (derived from temperature measurements) are much more precise than the radiative flux numbers at the top of the atmosphere. The latter are only good to 1%, and you are trying to measure an incoming vs. outgoing flux difference of 0.1%, which implies you need measurements good to 0.01%. My sense of the paper was that it was like a golf ball hit into the rough that may be out of bounds. The ball is the ocean heat content data; the grass height is the precision of the radiative flux data. If the grass is short, you should be able to find the ball and determine whether it’s out of bounds. However, if the grass is very tall, the ball is very difficult to find, and if you don’t find it you can’t tell whether it is out of bounds. They are using the ball lost in the tall grass to argue that they cannot yet determine whether it is out of bounds.

Ian W
January 27, 2012 5:19 am

We are dealing with climate ‘scientists’ therefore the first thing to do is ensure that they are using the correct terms in the correct way.
I believe this is a simple undergraduate-level error: thinking that instrument precision is the same as measurement accuracy. So I get the wrong result (low accuracy) but to 10 places of decimals (high precision). This is then compounded by ‘clever’ statistical massaging of the high-precision but low-accuracy results.
What is needed is some idea of the accuracy of the measurement of the world ocean average temperature – but then validation is required against a baseline to check accuracy and there is no baseline. All that can really be done is identify the change since the last measurement, but as the floats are not static this metric is meaningless too.
This is then added to the unsupported claim that there is such a thing as an ‘average’ world ocean temperature and that mathematically averaging output from 3000 randomly placed and moving floats actually provides anything meaningful.
This is truly climate ‘science’ at its best.

kim
January 27, 2012 5:21 am

That we can not find the missing heat is no longer a travesty.
===============================

Dolphinhead
January 27, 2012 5:33 am

E M Smith
spot on. Average temperature is scientifically meaningless yet we spend billions on computer models trying to predict this meaningless metric. Has anything ever been as broken as climate ‘science’!

MikeN
January 27, 2012 5:39 am

The error is the measurement error of the individual buoy.

January 27, 2012 5:47 am

David Falkner asks:

Shouldn’t the error at least equal the error in the instrument?

Larry Fields observes:

The SEM formula only applies to RANDOM ERRORS. It does NOT apply when there are SYSTEMATIC ERRORS, aka METHOD ERRORS.

The international quantitative standard for calculating the full uncertainty is the root-sum-of-squares combination of all the error components.
See NIST TN 1297, Guidelines for Evaluating and Expressing Uncertainty of NIST Measurement Results.
For Willis and others who are quantitatively inclined, see the Law of Propagation of Uncertainty, etc.
In How well can we derive Global Ocean Indicators from Argo data? K. von Schuckmann and P.-Y. Le Traon observe:

Long-term trends (15 yr) of GOIs based on the complete Argo sampling for the upper 1500 m depth can be estimated with an accuracy of ±0.04 mm yr⁻¹ for GSSL, ±0.02 W m⁻² for GOHC and ±20 km³ yr⁻¹ for GOFC – under the assumption that no systematic errors remain in the observing system. . . .
This total error includes the uncertainties due to the data processing and the choice of the reference climatology, but it does not take into account possible unknown systematic measurement errors not precisely corrected for in the delayed mode Argo quality control (e.g. pressure errors, salinity sensor drift).

That assumption of ignoring systematic errors needs to be tested! Sensor drift could overwhelm the rest. Bias error is often as large as the statistical error. If it is, their total error could well be understated by ~41% (a factor of the square root of two).
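The root-sum-of-squares rule cited here can be checked directly. The sketch below assumes independent components expressed as standard uncertainties with unit sensitivity coefficients; the bias values are hypothetical, and only the ±0.02 W/m² figure comes from the quoted paper. When the bias equals the statistical term, the total is √2 ≈ 1.41 times the stated value.

```python
import math


def combined_uncertainty(*components: float) -> float:
    """Root-sum-of-squares of independent standard uncertainties."""
    return math.sqrt(sum(u * u for u in components))


stated = 0.02  # W/m^2, quoted GOHC trend accuracy (statistical only)
for bias in (0.0, 0.01, 0.02, 0.04):  # hypothetical systematic terms
    total = combined_uncertainty(stated, bias)
    print(f"bias {bias:.2f} -> total {total:.4f} W/m^2 ({total / stated:.2f}x stated)")
```

If the systematic term were fully correlated with the statistical one rather than independent, the two would add linearly instead, giving a larger total.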
Furthermore, on earlier measurements they note:

There are substantial differences in these global statistical analyses, which have been related to instrumental biases, quality control and processing issues, role of salinity and influence of the reference depth for SSL calculation.

The time-series methods also need to be examined.
Statistician William Briggs opines:

“it is possible to shift the start or stop point in a time-series regression to get any result you want.”

Statistics Of Loeb’s “Observed Changes In Top-Of-The-Atmosphere Radiation And Upper-Ocean Heating Consistent Within Uncertainty”

This is the whole trick! Nobody ever asks why you chose a particular starting point. You can tell any story you like and people will never think to ask what would happen if you were to use a slightly different data set.

How To Cheat, Or Fool Yourself, With Time Series: Climate Example
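Briggs’s start-point point is easy to reproduce with made-up data. The sketch below uses a hypothetical, noise-free series with no trend at all, only a 60-month cycle, and fits an ordinary least-squares line from several start months to the same end month. The fitted “trend” changes sign depending on where you start.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical trendless series: a pure 60-month cycle, 135 months long.
months = list(range(135))
series = [math.sin(2 * math.pi * m / 60) for m in months]

for start in (0, 15, 30, 45):
    slope = ols_slope(months[start:], series[start:])
    print(f"start month {start:>2}: fitted trend {slope * 120:+.3f} units/decade")
```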

January 27, 2012 6:29 am

I have yet to read the paper (it takes time to get without paying), but as I understand the abstract, they are saying that the ‘missing heat’ from the additional forcing they expect from the CO2 model COULD BE down there in the ocean depths, because the accuracy of the measurements is not good enough to say that it ISN’T there.
I looked at it from the other direction, from the Top of the Atmosphere downwards. It is at the TOA that they ‘measure’ the missing heat. I am not yet sure whether this additional heating is a real measurement or modelled. I guess it is the same problem, a mix, because a global average is produced, but the modelling must be dealing with a much simpler situation. So, if the flux at the TOA they have detected is a near-decadal signal of 0.5 watts per square metre, they can then assume that this amount has gone into the deeper ocean rather than the surface layers of the atmosphere, where it would be more readily detected. The TOA average flux is 340 watts per square metre, so the TOA signal that they detect is at the level of about 0.15%. Given that the solar output varies by 0.1% over the eleven-year cycle and has other irregular phases, I don’t immediately see how they justify that signal as statistically significant, but I hope to look more closely.
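The percentage in this comment can be recomputed from its own figures (a 0.5 W/m² signal, a 340 W/m² mean TOA flux, and 0.1% solar-cycle variation). Treating both percentages as fractions of the same 340 W/m² mean is a simplification assumed here, not something taken from the paper.

```python
signal_w_m2 = 0.5       # near-decadal TOA imbalance cited in the comment
mean_flux_w_m2 = 340.0  # average TOA flux cited in the comment
solar_cycle_pct = 0.1   # solar-cycle variation cited in the comment, in percent

signal_pct = 100.0 * signal_w_m2 / mean_flux_w_m2
print(f"signal as a share of mean flux: {signal_pct:.3f}%")
print(f"signal relative to solar-cycle variation: {signal_pct / solar_cycle_pct:.2f}x")
```

On these numbers the signal is roughly 0.15% of the mean flux, about one and a half times the cited solar-cycle variation.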
So they then go looking for the heat. And because the error margins of the ocean heat content calculations to 1800 m are large enough to hide their missing heat, they state that their model is consistent with the TOA data (i.e. the observed flatlining of surface temperature as well as heat content in the upper 700 m), which is consistent with the CO2 model. What amazes me is that oceanographers are not more sceptical: there is little heat exchange below 200 m, let alone 700 m, with about 80% of the late-20th-century rise in upper ocean heat content held within the first 200 m.

January 27, 2012 6:47 am

With all of this mathematics and statistical uncertainty, we keep getting told that the root of the entire problem is…………………………..
CO2 !!!

wsbriggs
January 27, 2012 6:50 am

This is one more excellent example of Willis’ ability to get to the heart of the matter, in this case, the impossibility of determining the ocean’s heat content with severely aliased measurements. I mean spatially aliased, before someone attacks that statement.
I love the way he illuminates the dank corners of Post-Normal Science. The mold in those corners is starting to cringe away from the piercing light.
Keep it up Willis!

cant be named
January 27, 2012 6:54 am

Nice post about the blackjack; as a banned BJ player myself I can appreciate your story. Most card counters go broke because they don’t have a big enough bankroll. We played with a risk of ruin (ROR) of 0.5%, calculated on 50 million simulated hands!
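For readers unfamiliar with risk of ruin: the team’s figure came from simulation, but the textbook gambler’s-ruin formula gives a feel for it. The sketch assumes a simplified model of flat one-unit, even-money bets won with probability 0.505 and played indefinitely, under which the ruin probability for a bankroll of B units is (q/p)^B. Real blackjack has higher per-hand variance and variable bet sizes, so this is not how a 50-million-hand simulation works.

```python
import math


def risk_of_ruin(p_win: float, bankroll_units: int) -> float:
    """Gambler's-ruin probability of ever losing bankroll_units when making
    one-unit even-money bets forever, with p_win > 0.5."""
    q = 1.0 - p_win
    return (q / p_win) ** bankroll_units


def units_for_ror(p_win: float, target_ror: float) -> int:
    """Smallest bankroll (in betting units) whose ruin probability is at most target_ror."""
    q = 1.0 - p_win
    return math.ceil(math.log(target_ror) / math.log(q / p_win))


units = units_for_ror(0.505, 0.005)
print(units, risk_of_ruin(0.505, units))
```

Under these assumptions a 0.5% ROR needs about 265 betting units, and halving the edge roughly doubles that, which is one way to see why thin bankrolls go broke.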
God I hate being banned!! Best money I ever made, lasted 3 years!
BTW I trained blackjack players and had a team of 10 players 🙂

higley7
January 27, 2012 7:03 am

Until I see discussions that include convective heat transfer as a major factor, I see arguments over the radiation budget as lacking. It has been estimated that 85% of the energy lost to space is first transferred to altitude by convection of warm, humid air. As the oceans appear to be cooling, it might be conjectured that convection is doing a good job.

Scott Scarborough
January 27, 2012 7:08 am

You cannot achieve more accuracy than the base accuracy of the temperature device you are using. If the ARGO buoys are accurate to +/- 0.1 C, then the maximum accuracy of your sea temperature measurements can only be +/- 0.1 C. If I have a thermometer that is accurate to +/- 0.1 C and I want to measure my temperature, I cannot achieve a result of greater accuracy than +/- 0.1 C NO MATTER WHERE, OR HOW MANY TIMES, I TAKE MY TEMPERATURE.

BenAW
January 27, 2012 7:11 am

Peter Taylor says:
January 27, 2012 at 6:29 am
“What amazes me is that oceanographers are not more sceptical….there is little heat exchange below 200m let alone 700m…..with about 80% of the late 20th centennial rise in upper ocean heat content held within the first 200m.”
IMO, here is the explanation for the so-called GHE, the assumed 33 K deficit in temperature.
If you let the solar radiation that hits the oceans heat the very thin upper slice of water plus the atmosphere above it, IMO there won’t be much of a deficit, if any.
The thin layer of ocean above the thermocline is the buffer that carries the accumulated daytime heat to the night.
The deep oceans just sit there doing their ocean things without much heat exchange with the upper layer OR the hot core below, BUT they have a temperature of ~275K, not the 0K a blackbody approach assumes.

James
January 27, 2012 7:24 am

Willis
I am not trying to lecture you or to say anything about 1800 being a small sample.
Looking back, I guess I was commenting in response to this.
“And it also indicates that 3 Argo buoys could measure that same huge volume, the entire global ocean from pole to pole, to within a tenth of a degree.”
Moving from the stats of 3000 to the stats of 3 is risky business, that is all.
James

Dave in Delaware
January 27, 2012 7:28 am

The key here is that we need an Analysis of Variance (ANOVA) of the total ocean ‘system’. There are other sources of variability besides just the instrument precision.
Let’s say we make several plastic parts that go into our product, and these parts all need to be exactly the same shade of our trademark Ocean Blue color. And we have an analytical technique that measures Ocean Blue color very precisely to determine if we are meeting our company targets. We make batches of plastic, and each batch can create 10 or 100 or 1000 parts (depending on the size of the part). Right there we need to sample for Batch-to-Batch variability, and by sampling multiple parts from each batch we assess Part-to-Part variability. In addition, let’s say we have 10 production lines which can vary a little, perhaps because the batch mixing is a bit different Line-to-Line; also, production lines make parts of different size and shape and may use different types of plastic in the batch formulation. We might even find a seasonal difference if the parts run through the machines faster or cool differently at different times of the year. The ANOVA would determine variances for each of those and statistically combine them to determine the overall Ocean Blue color performance.
What we would probably find is .. the instrument precision was tiny compared to the variability in all of the other parameters!
How does that apply to our Ocean temperature problem? Frequent WUWT readers will be familiar with the detailed work of Bob Tisdale on ocean data, which shows that all oceans are not the same. My quick list of parameters might include:
*Time window (month or season)
*Ocean subdivided into Ocean region (e.g. N S E W or Gulf Stream – subsections don’t necessarily need to be the same size and shape)
*Latitude (to account for such things as Gulf Stream cooling as it moves from Bahama to Iceland)
*Ocean ‘phase’ (such as AMO, PDO, El Nino, etc)
*Thermocline (percentage of an Argo profile above vs below the thermocline)
Having established fixed sampling sectors from the above, one might then determine statistics from all Argo floats that happen to supply data within that sector for the given time window. It then becomes possible to look at season-to-season variability or phase variability which are likely larger sources of variance than whether there are 10 or 100 floats in a sector at a given time. One might find that something like the percentage of readings above vs below the thermocline might be the largest source of variability. Or not. But unless you do the ANOVA you really don’t know.
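The ANOVA suggested here can be illustrated in its simplest one-way form. The readings below are hypothetical temperatures (°C) from three sampling sectors, not Argo data, and the design is assumed balanced (equal readings per sector). The between-sector and within-sector variance estimates show which source dominates; a full treatment of the list above (season, region, latitude, phase, thermocline) would need a multi-factor model.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (mean square between, mean square within, F) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = fmean(x for g in groups for x in g)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within


def variance_components(groups):
    """Between- and within-group variance estimates (balanced design assumed)."""
    n_per_group = len(groups[0])
    ms_between, ms_within, _ = one_way_anova(groups)
    return max((ms_between - ms_within) / n_per_group, 0.0), ms_within


# Hypothetical readings (deg C) from three sectors, four floats each.
sectors = [
    [18.1, 18.3, 18.2, 18.4],
    [12.0, 12.2, 11.9, 12.1],
    [4.5, 4.4, 4.6, 4.7],
]
between, within = variance_components(sectors)
print(f"between-sector variance {between:.2f}, within-sector variance {within:.4f}")
```

In this made-up example the sector-to-sector variance dwarfs the within-sector scatter, which is the kind of comparison the ANOVA is meant to expose.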

Ken Harvey
January 27, 2012 7:31 am

O/t maybe, but what we are talking of here are the shortcomings of metrology and statistical error assessment. Give a thought to that figure that economists haunt us with – GDP. Britain has recently announced that in the last quarter of 2011, GDP fell by 0.2%. National statistics offices and their masters love to believe that GDP (which is not a terribly useful measure in the first place) can be stated with 0.1% precision, that is, accuracy within 1 in a thousand. In reality, accuracy of one in one hundred is far beyond attainment, and five times that figure would be impressive.
GDP, even in relatively small countries, has many, many times 30,000 recording points – much, much more data. Unfortunately not all of that data makes it to the final count, and there are large areas of the economic ocean for which there are no measurements. People forget to send the numbers in, or put them in twice, and others put them in the pending basket. Numbers for the ‘no recorded data’ have to be guessed (estimated is the conventional word). There are many guessers, some more able and some less so. Each one of them is subject to some degree of political pressure. No member of any sitting government wants to see a negative answer. (“Young man, I think that your estimate for the street value of illegal drug dealing in Chicago is many millions too low. And as for your estimate as to how much activity goes unrecorded in back-scratching transactions, I can tell you that my dentist hasn’t paid money for a car repair since he left medical school!”)
From all of this we are asked to believe that accuracy of one in a thousand is a purely objective measure. We, the public, wherever we live, are all too gullible.

Paul Linsay
January 27, 2012 7:37 am

From my experience, the law of large numbers breaks down at about three sigma because typical error distributions for experimental data have long tails that are definitely not Gaussian. The tails are usually much heavier, e.g., Lorentzian, which has an infinite variance.
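The heavy-tail point can be seen in a small simulation. The mean of n standard Cauchy (Lorentzian) draws has the same standard Cauchy distribution as a single draw, so averaging does not narrow it, whereas a Gaussian mean narrows as 1/√n. The sketch uses Python’s `random` module with an arbitrary seed and threshold; it illustrates the statistics only and says nothing about whether Argo errors are actually Lorentzian.

```python
import math
import random


def gaussian_draw(rng: random.Random) -> float:
    return rng.gauss(0.0, 1.0)


def cauchy_draw(rng: random.Random) -> float:
    """Standard Cauchy (Lorentzian) draw via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def fraction_of_means_beyond(draw, n, trials, threshold, rng):
    """Average n draws, repeat `trials` times, and return the share of
    sample means whose magnitude exceeds `threshold`."""
    beyond = 0
    for _ in range(trials):
        m = sum(draw(rng) for _ in range(n)) / n
        if abs(m) > threshold:
            beyond += 1
    return beyond / trials


rng = random.Random(1)
for n in (1, 100, 1_000):
    g = fraction_of_means_beyond(gaussian_draw, n, 200, 1.0, rng)
    c = fraction_of_means_beyond(cauchy_draw, n, 200, 1.0, rng)
    print(f"n={n:>5}: |mean| > 1 in {g:.0%} of Gaussian trials, {c:.0%} of Cauchy trials")
```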
Why do they expect to see any “missing” heat at depths of 1800 m? Even UV, which has an absorption length of about 100 m, won’t get anywhere that deep (http://www.lsbu.ac.uk/water/vibrat.html).

Jeremy
January 27, 2012 7:57 am

There’s no math error, but there’s a huge rules-of-measurement error here.
That calculation shown in your article, Willis, demonstrates how you would try to calculate your theoretical limit of error, not your actual measurement error. The calculation starts from a poorly understood term, that is the presumed error of your forcing. How is the forcing known to such a precision in the first place? What is the relationship between your presumed forcing and your thermocouples taking your readings? I’d wager there is no easy answer to that.
I look at that calculation and I see one possible theoretical limit of error, NOT an actual measurement error. An actual measurement error is always calculated from the measurements themselves or it is meaningless.

GaryW
January 27, 2012 8:17 am

Willis,
I see several people have mentioned this but I feel it must be made very explicit:
Accuracy and precision are different from each other. Accuracy is how close a value is to the true physical value. Precision is the resolution, in units of measure, to which a value can be read.
Instruments with high accuracy but low resolution are possible but those with low accuracy but high precision are much more likely. (Walmart digital thermometers for example)
Averaging temperature readings of a stable process value over multiple instruments and time can reduce the effects of noise to improve the precision of the measurement. That average, though, does not improve the accuracy of the measurement. Accuracy can never be better than instrument error.
A claim that averaging readings from multiple instruments of the same process value reduces instrument error is invalid. It rests on the assumption that instrument errors are random and not systematic. For that assumption to be valid, all the instruments must be of entirely different design, manufacture, and operating principle.
Also, instrument accuracy must include how the instrument is coupled to the process being measured. Is the instrument sensor in a protective well or sleeve? Is the sensor located in a clean area of the process or in a stagnant pocket? Does the sensor itself modify the flow or temperature of the process?
So, in industry, and hopefully science, value accuracy may never be assumed to be better than instrument calibration accuracy. Any claim of precision better than instrument accuracy must include a description of how it was achieved. As an example here we might say +/- .004 degrees plus +/- 0.005 instrument accuracy.
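The distinction between precision and accuracy can be shown with one simulated instrument. The sketch assumes a hypothetical sensor with a fixed +0.005 °C calibration offset (echoing the ±0.005 figure above) plus Gaussian reading noise of 0.05 °C, repeatedly measuring a stable 20 °C process. Averaging shrinks the standard error, but the mean converges on the offset, not on zero error.

```python
import random
import statistics

TRUE_VALUE = 20.0   # deg C, stable process value (hypothetical)
BIAS = 0.005        # deg C, fixed calibration offset of this instrument (hypothetical)
NOISE_SD = 0.05     # deg C, random reading-to-reading noise (hypothetical)


def readings(n: int, rng: random.Random) -> list[float]:
    """n readings from one biased, noisy instrument."""
    return [TRUE_VALUE + BIAS + rng.gauss(0.0, NOISE_SD) for _ in range(n)]


rng = random.Random(0)
for n in (10, 1_000, 100_000):
    data = readings(n, rng)
    error = statistics.fmean(data) - TRUE_VALUE
    sem = statistics.stdev(data) / n ** 0.5  # shrinks as 1/sqrt(n)
    print(f"n={n:>7,}: mean error {error:+.5f} C, standard error {sem:.5f} C")
```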

dp
January 27, 2012 8:19 am

If you look close enough it is incredible the amount of information you can get from a single tree or a single thermometer in a very complex system. Truly incredible.
http://dictionary.reference.com/browse/incredible

Steve
January 27, 2012 8:20 am

All measuring instruments are specified with a given measurement uncertainty. This uncertainty must be included every time a measurement is taken and is a systematic error. It cannot be removed by averaging. Otherwise a cheap meter could replace an expensive one if enough measurements were taken and this is nonsense. Furthermore each measurement taken will also contain some random error. This random error can be reduced by averaging as the random errors will tend to cancel each other out. So, how can I reduce this systematic error? Well, if I used multiple instruments to measure the temperature, each with its own uncertainty then I believe the average of their readings would be closer to the true temperature. However, the ocean is not isothermal and the buoys are not measuring at the same point. Hence averaging their results is meaningless IMO.

P. Solar
January 27, 2012 8:47 am

Their claimed accuracy is a total fallacy.
The square root of N principle applies to repeated measurements of the same thing and assumes the errors in the measurement are normally distributed (i.e., a nice Gaussian bell curve of random errors).
There is absolutely no way that measuring a time-varying quantity at different times, in less-than-random 3D positions, at different depths and different geographical locations, in a medium that has very significant variation in both depth and latitude/longitude plus seasonal changes, can be construed as “measuring the same thing”.
If the temperature sensor in a buoy were placed in a swimming pool and took 10,000 measurements within a short period of time, one might be justified in dividing the uncertainty of one reading by 100 due to repeated measurements.
If you take one reading in every swimming pool in town, you will have 10,000 readings, each with the same uncertainty as a single reading. You may then calculate the mean temperature of all the pools and state that it has the same accuracy as one single measurement.
You have not measured the mean 10,000 times. Only once. You have no reason to divide your single-measurement uncertainty by anything except one!
This whole idea of dividing by the root of the number of individual data points is one gigantic fraud.
Any qualified scientist making such a claim is either totally incompetent or a fraudster.
[Fixed the bolding – use angle brackets… -ModE ]

Ged
January 27, 2012 8:48 am

I guess it just goes to show that precision is not the same as accuracy.

Doug Proctor
January 27, 2012 8:54 am

Willis,
Excellent story and question about the reality of what is being touted as the measurement accuracy. This goes, however, to the basic reliability of the various GISTemp or even (on a statistical basis) the Mann-schtick temperature graphs: we see the huge error bars, but take the central line as reliable, accurate and precise. Why?
What, statistically, do the general error bars tell us about the reality of the central, “statistical” trend? With what certainty do we know that some other temperature path was not followed over the years? If the error bars are so large, what does that tell us about the “global” trend of anything: are the error bars saying that all temperature is regional, and that the global profile is purely a statistical result with possibly no relevance to real-world patterns?
All statistical work on certainty is based either on randomness of error/variation or on a mathematical understanding of bias. The data in Russia was colder when oil/funding was based on perceived need by the centralized planning board. The speed of the “hurricane” in NYC last year (? time flies) was largest when the TV needed to promote the excitement of a major storm (and connection to CAGW). I don’t know if either of these two biases was considered in the Hansen-style analysis, but I do know that all the work done on data adjustments shows a pattern of cooling the past and warming the present.
To summarize:
1) within error bars, can the actual profile be different and yet just as “certain”?
2) do local records have small error bars, and the large error bars reflect the fact there is no global trend except as a mathematical construct?
3) what does a statistical analysis of data adjustments show as “certainty” to the net adjustment profile?

P. Solar
January 27, 2012 8:58 am

The previous swimming pool analogy does not account for Steve’s very valid point about systematic error. Even 10,000 readings with the same buoy would not reduce the uncertainty due to the manufacturer’s stated accuracy of the sensor. Using 3000 sensors would. The instrumental error becomes negligible since they are (or claim to be) incredibly accurate to start with.
That does not alter the fact that if your buoy does a dive and takes 1000 measurements, it is not measuring the same thing each time. It is not measuring the mean 1000 times.

Ian L. McQueen
January 27, 2012 9:36 am

Willis-
I have skipped 60 replies, so someone else may have made the same observation as follows…..
Is your problem related to the calculation of the “standard error of the mean”? I suspect that this is valid for items of the SAME KIND. Thus you could calculate for some dimension of a large number of widgets made by the same machine. But the same calculation for the same number of widgets, each produced by a different machine, would produce nothing useful. I would regard the ocean as a vast number of widget makers…..
IanM

climate creeper
January 27, 2012 9:55 am

Modeling with enough tunable parameters to make an elephant fly, curve fitting with inverted cherries, and correlation == causation is sooo passé. Now, error bars == invisible monsters! Wherefore art thou Steve McIntyre?

MR Bliss
January 27, 2012 10:31 am

What Trenberth needs is a new component in his heat equations. I would like it to be named after me, but I suppose it would have to be called the Trenberth Constant.
The value of this constant would be equal to the value of the missing heat. The beauty of this constant, in keeping with climate scientists’ habit of disregarding normal scientific protocols and conventions, is that it would be variable.
A variable constant – one that can be adjusted to whatever the current value of the missing heat happens to be. The peer review process would rubber stamp this, and another pesky problem would go away.

markx
January 27, 2012 10:44 am

Seems to me these are ALL unrelated measurements in every dimension. I imagine the only repeatability will be year by year, and then you start having to deal with things such as Milankovitch cycles too.
Float measurements surely cannot be statistically analysed to provide a tight figure of statistical error of average worldwide Upper Ocean Heat Content. (Sure, we can calculate a UOHC figure against time, but how meaningful is it? By any ‘dimension of measurement’, whether latitude, time, season, depth, proximity to land, depth of ocean at that point, or types of weather, the error bars will be very large.)
Surely statistical error analysis can only be carried out with repeated measurements of the same parameter. In every case and every dimension, these measurements are not repeatable, nor are they expected to be.
1. Individual floats will measure temperatures for one particular point in the ocean. But this is not expected to be the same for other points in the ocean. (latitude, currents, storms, seasons, time of day, proximity to land …etc)
2. Individual depth measurements are not expected to be the same, vertically, positional, by day or by hour or by season.
3. Day by day as the seasons change, each measurement is an individual measurement. How many tens (hundreds, thousands?) of years of data would be required to provide meaningful replicates of measurement?
The only meaningful statistical measurements which could have calculated error bars (and they would be huge) would be for example to average 10 years of temperature records for a certain probe at a certain time of year… then compare all similar probes at the same certain latitude…. then perhaps seasonal variation can be statistically removed and all the required adjustments can be made for season, ocean currents, ocean heat anomalies …… still, error bars would be huge. And, if every year there WAS in fact some warming, the “rate of change” could then be calculated for each particular probe, and statistically compared with rate of change for other probes …… (again with all the required adjustments for latitude, season, ocean currents, ocean heat anomalies … I can only imagine the margin of error bars in that case to be huge too).

January 27, 2012 11:03 am

Willis…I wish I had 1/2 your native intellect. But I have ENOUGH background to tell you that you are about to be CONDEMNED by the Pharisees and the Doctors of the Law! But in this case it’s not because you are considered to be “violating” the LAW…but because you are insisting that the Doctors of the Law ABIDE by the LAW themselves. In this case it is the LAW of sampling, and distributions…and the central limit theorem. Actually it gets to some very important concepts in “information theory”. AND therefore it insists that the data be examined with a caveat that “at this point these changes cannot be distinguished from noise”.
A hard concession to make for the Doctors of the Law.
Keep up the good work, but don’t do any meditating in any gardens.
Max

zac
January 27, 2012 11:45 am

Do you know how many thermocouples these floats have on board?
It would also be interesting to know what time they are programmed to be in their near surface phase.

Tim Clark
January 27, 2012 11:50 am

[Josh Willis also had ocean-based data sets, including temperature profiles from the Argo robot fleet as well as from expendable bathythermographs, called “XBTs” for short.
But when he factored the too-warm XBT measurements into his ocean warming time series, the last of the ocean cooling went away. Later, Willis teamed up with Susan Wijffels of Australia’s Commonwealth Scientific and Industrial Organization (CSIRO) and other ocean scientists to diagnose the XBT problems in detail and come up with a way to correct them.]
Willis, I don’t know if this has already been brought up, but it’s my understanding they stitch ARGO and XBT together. Does that confound the statistics? What is their accuracy?

January 27, 2012 11:53 am

In the animé series Devil May Cry, in the episode “Death Poker”, the protagonist (“Danté”) observes “the only sure way to win at gambling is to have a partner.”

Martin Å
January 27, 2012 11:55 am

Maybe I have misunderstood something, but by your reasoning ONE satellite could never measure anything with any precision. You assume that the ocean temperature is equal to the true average everywhere, and that there are 3000 measurements of this one temperature with an added random error. As you say, the sensors themselves have high enough accuracy. The question is whether 3000 sensors are enough to sample the ocean densely enough. To determine that, you need to know the spatial autocorrelation of the ocean temperature.
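How autocorrelation changes the “effective” number of floats can be illustrated with the standard large-sample formula for the mean of an AR(1) sequence, n_eff = n(1 − ρ)/(1 + ρ). Using a one-dimensional AR(1) series as a stand-in for spatially correlated ocean temperatures is an assumption made here for illustration, and the values of ρ are hypothetical.

```python
import math


def effective_n(n: int, rho: float) -> float:
    """Approximate number of independent values for the mean of an AR(1)
    sequence with lag-1 autocorrelation rho (0 <= rho < 1), large n."""
    return n * (1.0 - rho) / (1.0 + rho)


def standard_error(sigma: float, n: int, rho: float) -> float:
    """Standard error of the mean using the effective sample size."""
    return sigma / math.sqrt(effective_n(n, rho))


for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"rho={rho:.2f}: n_eff={effective_n(3000, rho):7.1f}, "
          f"SE={standard_error(1.0, 3000, rho):.4f} (sigma = 1)")
```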

GaryW
January 27, 2012 12:09 pm

Willis,
Accuracy can never be better than instrument error.
“Not true. As long as the error is randomly distributed over the measurements, the average of a number of measurements can definitely have an error smaller than instrument error.”
There are a couple of ways this is incorrect. First, let us assume we are making 1000 measurements of a stable process with one instrument with an accuracy of +/- 1%. We then average the results. If the instrument is reading just shy of 1% high at that process value, your very precise averaged number will still be just shy of 1% high in value. No amount of averaging of multiple values will improve upon that.
Next, let’s consider that same reading taken with 1000 different instruments. There is nothing in a +/- 1% specification that says the instrument error will be random throughout that error range. With real world instruments it will not be random due to common technology, manufacture, and calibration technique.
Your error is assuming that instrument calibration errors may be considered random. Instrumentation folks can tell you that is a bad assumption.
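The multi-instrument argument can be made concrete by splitting each instrument’s calibration error into a part shared by every instrument (common technology, manufacture, calibration) and a part unique to each one. Both parts are modelled here as Gaussian with hypothetical sizes. Only the unique part averages down as 1/√n; the shared part passes straight through to the mean.

```python
import math
import random
import statistics


def expected_rms_error(n: int, shared_sd: float, independent_sd: float) -> float:
    """RMS error of the average of n instruments: the shared part never
    averages out, the independent part shrinks as 1/sqrt(n)."""
    return math.sqrt(shared_sd ** 2 + independent_sd ** 2 / n)


def simulated_rms_error(n, shared_sd, independent_sd, trials, rng):
    """Monte Carlo check of expected_rms_error."""
    squared = []
    for _ in range(trials):
        shared = rng.gauss(0.0, shared_sd)  # one draw common to all n instruments
        avg = statistics.fmean(shared + rng.gauss(0.0, independent_sd) for _ in range(n))
        squared.append(avg ** 2)
    return math.sqrt(statistics.fmean(squared))


rng = random.Random(2)
for shared_sd in (0.0, 0.005):
    sim = simulated_rms_error(400, shared_sd, 0.01, 300, rng)
    theory = expected_rms_error(400, shared_sd, 0.01)
    print(f"shared sd {shared_sd:.3f}: simulated {sim:.5f}, expected {theory:.5f}")
```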

aaron
January 27, 2012 12:18 pm

I think it’s in the trees.
More specifically, in molecular bonds in plant cells.

aaron
January 27, 2012 12:22 pm

Chemical and molecular bond.

HAS
January 27, 2012 12:50 pm

Perhaps not to answer your question directly, but perhaps a line of inquiry to help understand the difference (and without having read the paper either).
In Fig 3a the heating rates are referenced to a PMEL/JPL/JIMAR product, presumably as described in “Estimating Annual Global Upper-Ocean Heat Content Anomalies despite Irregular In Situ Ocean Sampling” (Lyman et al.). This describes the development of a weighted average integral (WI) “that assumes that the spatial mean of the anomalies in the unsampled regions is the same as the mean for the sampled regions”, in comparison with the simple integral (SI) that assumes “zero anomalies in regions that are not sampled”. The paper shows, using synthetic data from models, that WI “appears to produce more accurate estimates of the global integral of synthetic OHCA than the SI”. Then follows a large number of rather obvious caveats.
If you are to reduce the errors in a measurement, you need to add additional information (and/or ignore sources of error, something already discussed in this thread and something Lyman et al. catalog themselves). The question in this case is where this additional information is coming from. From this reference, I’d say one source may well be the WI assumption about the mean value of the unsampled regions.
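A toy version of the two estimators, as I read the quoted definitions. The region areas and anomalies are invented, and this is not Lyman et al.'s actual procedure; it only shows where each estimator's assumption enters.

```python
def simple_integral(areas, anomalies, sampled):
    """SI: unsampled regions are assumed to have zero anomaly."""
    return sum(a * x for a, x, s in zip(areas, anomalies, sampled) if s)


def weighted_integral(areas, anomalies, sampled):
    """WI: unsampled regions are assumed to share the area-weighted
    mean anomaly of the sampled regions."""
    sampled_area = sum(a for a, s in zip(areas, sampled) if s)
    if sampled_area == 0:
        raise ValueError("no sampled regions")
    sampled_mean = simple_integral(areas, anomalies, sampled) / sampled_area
    return sampled_mean * sum(areas)


if __name__ == "__main__":
    # Four equal-area regions; only the first two have observations.
    areas = [1.0, 1.0, 1.0, 1.0]
    anomalies = [2.0, 4.0, 6.0, 8.0]   # hypothetical "true" anomalies
    sampled = [True, True, False, False]
    true_integral = sum(a * x for a, x in zip(areas, anomalies))
    print("true:", true_integral)
    print("SI:  ", simple_integral(areas, anomalies, sampled))
    print("WI:  ", weighted_integral(areas, anomalies, sampled))
```

Here SI gives 6 and WI gives 12 against a true 20. Neither error shows up in the sampling statistics of the observed regions, because it comes entirely from the assumption about the regions nobody measured.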
If this is what Loeb et al have done then they have ended up ignoring the errors inherent in that assumption, and that is where the difference comes from. Using that assumption “3 Argo buoys could measure …. the entire global ocean from pole to pole, to within a tenth of a degree”.
Anyway fun to speculate.

Rosco
January 27, 2012 2:19 pm

I failed several exams giving answers with “one additional decimal of error over the precision of the instrument.”
I would probably have made an excellent climate scientist, as it took me a few “repeats” to learn to keep to scientific principles with regard to error; I simply hated not being precise with an answer.

Phil.
January 27, 2012 3:09 pm

Willis Eschenbach says:
January 27, 2012 at 10:44 am
Dave, thanks. I understand that there are other sources of variability. My point is that regardless of any and all difficulties in the measurements, we still have an impossibility—a claim that they can measure the temperature of the top mile of the ocean to 0.04°C with thirty Argo floats. That’s the part I can’t figure out.

One possibility, Willis: your logic assumes that the standard deviation of the measurements stays constant no matter how small the sample size is. Could it be that for a small sample that is not true?

January 27, 2012 3:58 pm

Willis said:
> As long as the error is randomly distributed over the measurements,
> the average of a number of measurements can definitely have an
> error smaller than instrument error.
Do you have a justification for assuming that the error is randomly distributed? I would have thought there could easily be systematic errors that introduce a bias.
Disclaimer: I’m not a statistician so may be completely wrong.

markx
January 27, 2012 4:05 pm

“I’m saying if they are right that 3,000 floats can measure to 0.004°C error, then 30 floats should be able to measure it to 0.04°C error … and that’s not possible….w.”
Nice article, but are you complicating things unnecessarily?
Why not simply question how they can possibly calculate a 0.004°C error with the 3,000 floats?
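The scaling in the quoted argument is the textbook standard error of the mean, σ/√N, which holds only if the observations are independent and share one spread σ (the assumption several commenters question). A minimal sketch of the arithmetic: back out the per-float spread implied by 0.004°C with 3,000 floats, then apply it to 30 floats.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean for n independent samples with spread sd."""
    return sd / math.sqrt(n)


# Per-float spread implied by a 0.004 °C standard error with 3,000 floats.
implied_sd = 0.004 * math.sqrt(3000)

# The same spread with a hundredfold fewer floats.
se_30 = standard_error(implied_sd, 30)

print(f"implied per-float sd: {implied_sd:.3f} °C")
print(f"standard error with 30 floats: {se_30:.3f} °C")
```

The implied spread is about 0.22°C per float, and cutting the floats a hundredfold multiplies the standard error by √100 = 10, giving 0.04°C.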

cant be named
January 27, 2012 4:09 pm

“The part I liked the least, curiously, was something else entirely. It was that my every move was fixed. For every conceivable combination of my cards, the dealer’s card, and the count, there is one and only one right move. Not two. Not “player’s choice”. One move. I definitely didn’t like the feeling that I could be replaced by a vaguely humanoid 100% Turing-tested robot with a poor sense of dress and a really, really simple set of blackjack instructions … all of which I probably should add to the head post.”
This is what I liked most of all!! You could train someone to do it right and take all the decision making away from them. Then every so often, polygraph test them to make sure they are sticking to the rules etc!! Also you could simulate deviations from the perfect play, so you could calculate the cost of making plays that make you look like you are making mistakes.
In fact you could make yourself look like a totally hopeless card counter to casino staff and other counters, yet have the smug knowledge that you are better than the lot, while still being allowed to play, because you are soooo bad. Finally, it is the weight of the money that brings you down. You just can’t keep hiding what you win.

Martin Å
January 27, 2012 4:10 pm

Willis: after thinking some more I understand your point. I guess you have to assume that the measurements are far enough in time and space to be independent in order to land at 0.04 C accuracy with 30 floats. But if that assumption doesn’t hold the situation gets even worse, i.e. 30 floats would give an even better accuracy than 0.04 C, which is of course even less probable. I think I got it the other way around in my first comment.

Robert Austin
January 27, 2012 4:39 pm

The last ice age was about 90k years long, and we have only been in the Holocene about 10k years. Considering the vast heat capacity of the oceans, there is no reason to think that heat flow to and from the oceans is in equilibrium. Under Holocene temperatures, there is still missing heat that is being stored in the oceans. Trenberth and company might imply that the “missing” heat is a threat to climate in the near future, but common sense says the heat will accumulate until the next ice age.

Legatus
January 27, 2012 7:51 pm

“The weaker the data available upon which to base one’s conclusion, the greater the precision which should be quoted in order to give the data authenticity.”–Norman Ralph Augustine

Erinome
January 27, 2012 8:12 pm

Willis wrote:
So, assuming there are no problems with my math, they are claiming that they can measure the temperature rise of the top mile of the global ocean to within 0.004°C per year.
Of course, they are not.
You are all tangled up, Willis. Their published error is statistical, not instrumental. It is the uncertainty of the trend, not of a temperature measurement.
Before accusing professional scientists of being completely incompetent, shouldn’t you at least give them the courtesy of responding to your notions — which in this case are completely off-base, yet put here only, it seems, to try to embarrass them regardless of the facts.
It’s like Drudge — doesn’t matter if he’s right — throw enough junk against the wall, and something is bound to stick somewhere.
I presume you will be writing a letter to Nature Geoscience with your finding?

George E. Smith;
January 27, 2012 8:16 pm

Well I think I read every single post here so far (on this thread).
And there were two things I don’t believe I saw in ANY post.
#1 The sea is not like a piece of rock (large rock); it is full of rivers with water flowing along every which way, meandering if you will, and this meandering is aided and abetted by the twice-daily tidal bulge.
So the likelihood of a buoy, no matter how tethered or GPS-located, being in the same water for very long is pretty minuscule, so you might as well assume that every single observation is actually a single observation of a different piece of water.
Second, and far more important, this, like all climate recording regimens, is a sampled data system.
So before you can even begin to do your statistication on the observations, you have to have VALID data, as determined by the Nyquist theorem.
You have to take samples that are spaced no further apart than half the wavelength of the highest frequency component in the “signal”. The “signal”, of course, is the time- and space-varying temperature, or whatever other variable you want to observe. That of course means the signal must be band limited, both in space and time. If the Temperature, shall we say, undergoes cyclic variations over a 24-hour period that look like a smooth sinusoid if you take a time-continuous record, then you must take a sample sooner than 12 hours after the previous one. If the time variation is not a pure sinusoid, then it at least has a second harmonic overtone component, so you would need one sample every six hours.
And if the water is turbulent and has eddies with spatial cycles of say 100 km, then you would need to sample every 50 km. OOoops !! I believe that all of your spatial samples need to be taken at the same time, otherwise you are simply sampling noise.
Now if you do it correctly (fat chance), then in theory it is possible to perfectly reconstruct the original continuous function (of two variables in this case)
Well you don’t really want the original signal do you (or need it). What you want to do is statisticate the numbers and get the “average”. Also known as the zero frequency signal.
Well, the Nyquist theorem tells you that if you have a signal that you think is band limited to a frequency B, then you need to sample at a frequency of 2B minimum to be able to recover the signal.
If your signal actually has a frequency component at B + b, outside the band limit, then the reconstructed continuous function will now contain a spurious signal at a frequency of B − b, which constitutes a noise.
And note that B − b is less than B, which means it is within the signal passband, so it is inherently impossible, no matter what, to remove the spurious signal by any kind of filter without simultaneously removing actual real parts of the signal, which is just as bad a corruption as adding noise; this is known as “aliasing” noise. So you see why your statistics no longer work; not even the central limit theorem can save your hide, because your sampled data set is not a valid data set.
Now suppose that instead of sampling at a frequency 2B like you are supposed to, you only sample at a rate B, half of the required minimum rate. This case will arise if you sample twice a day, every 12 hours, but your 24-hour signal is not sinusoidal, so it contains at least a frequency of one cycle in 12 hours (or higher if there is third harmonic distortion).
So in that case you actually have a spurious signal at a frequency that is B + B, which after reconstruction becomes a noise signal at a frequency of B − B, or zero frequency. Now this, as we said, is the average of the signal.
So even if you don’t need to reconstruct the signal, but only want its average, it takes only a factor of 2 in undersampling, and you can no longer recover even the average.
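A minimal sketch of the undersampling point, using an invented daily cycle: a 24-hour fundamental plus a 12-hour harmonic (the mean, amplitudes and phase are all hypothetical). Sampling every 12 hours puts the harmonic exactly at the sampling rate, so it aliases to zero frequency and shifts the computed average; sampling every 6 hours recovers the true mean for this particular signal.

```python
import math


def daily_temperature(t_hours, mean=15.0, a1=1.0, a2=0.5, phase2=0.0):
    """Hypothetical daily cycle: 24-hour fundamental plus a 12-hour harmonic."""
    return (mean
            + a1 * math.sin(2 * math.pi * t_hours / 24)
            + a2 * math.cos(2 * math.pi * t_hours / 12 + phase2))


def sampled_mean(interval_hours, days=30, start_hour=0.0):
    """Average of samples taken every `interval_hours` over whole days."""
    n = int(days * 24 / interval_hours)
    total = sum(daily_temperature(start_hour + k * interval_hours) for k in range(n))
    return total / n


if __name__ == "__main__":
    print("true mean: 15.0")
    # At 12-hour spacing the 12-hour harmonic is sampled at the same phase every time.
    print(f"every 12 h from 00:00: {sampled_mean(12):.4f}")
    print(f"every 12 h from 06:00: {sampled_mean(12, start_hour=6):.4f}")
    print(f"every 6 h:             {sampled_mean(6):.4f}")
```

The 12-hour schedule gives 15.5 starting at midnight and 14.5 starting at 06:00, against a true mean of 15.0; the error depends on when the samples happen to fall, and nothing in the sampled numbers themselves reveals it.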
So forget all your fancy statistics; without a Nyquist valid set of samples you can’t do much.
This is the pestilence that afflicts Hansen’s GISStemp. All that his collection of data records is GISStemp, and nothing else, it has no validity as a Temperature for the earth.
Well, same thing for the Argo buoys. They don’t tell you a thing about the global ocean Temperature, but they might be giving you interesting information about the general locations where each of the buoys happens to be; just don’t forget what the ocean river meanders are doing to that location.
The very first lecture in climatism 101 should be the general theory of sampled data systems.

Jim D
January 27, 2012 8:35 pm

Willis, you say they are not discussing a decadal trend, yet in the abstract they say
“We combine satellite data with ocean measurements to depths of 1,800 m, and show that between January 2001 and December 2010, Earth has been steadily accumulating energy at a rate of 0.50±0.43 Wm−2 (uncertainties at the 90% confidence level).”
To me, it looks like their error bar applies to a decadal average rate. In other words, the error bar converted to temperature is about 0.04 degrees over the period from 2001-2010. The surface ocean temperature rises by typically 0.1 degrees per decade, so they can actually discern when a warming occurs with this dataset.
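Converting a flux in W/m² into a temperature change of the 0–1800 m layer is simple arithmetic, sketched below. Every constant is an assumed round value (Earth and ocean areas, seawater density and heat capacity), the flux is taken to be per square metre of the whole Earth's surface, and all of it is assumed to go into that layer. With these numbers the ±0.43 W/m² bar comes out at a few hundredths of a degree over the decade, the same order of magnitude as the figure quoted; the exact value moves with each assumption.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
EARTH_AREA_M2 = 5.1e14      # assumed: flux taken as per m^2 of Earth's whole surface
OCEAN_AREA_M2 = 3.6e14      # assumed ocean area
LAYER_DEPTH_M = 1800.0      # depth quoted in the abstract
SEAWATER_DENSITY = 1025.0   # kg/m^3, assumed
SEAWATER_CP = 3990.0        # J/(kg K), assumed


def warming_of_layer(flux_w_m2: float, years: float) -> float:
    """Temperature change (K) if all of the flux went into the 0-1800 m ocean layer."""
    energy_j = flux_w_m2 * EARTH_AREA_M2 * years * SECONDS_PER_YEAR
    heat_capacity_j_per_k = OCEAN_AREA_M2 * LAYER_DEPTH_M * SEAWATER_DENSITY * SEAWATER_CP
    return energy_j / heat_capacity_j_per_k


print(f"0.50 W/m^2 over 10 years: {warming_of_layer(0.50, 10):.3f} K")
print(f"0.43 W/m^2 over 10 years: {warming_of_layer(0.43, 10):.3f} K")
```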

GaryW
January 27, 2012 8:40 pm

Willis,
“Thanks, Gary, particularly for quoting my words. It allows me to explain why your claim is not true. I made a general statement with no assumptions at all. It was an IF … THEN statement. In fact the requirement is even weaker, the error only has to be symmetrical, not gaussian normal.
You are correct, that is not true for all distributions. But for the kinds of distributions we’re talking about here it is generally true, certainly close enough to gain some precision through averaging.”
Perhaps the difference between our positions on this is semantic, but that is actually the point I am trying to get across. Your use of statistics and error analysis is certainly valid for many real-world problems. Unfortunately, instrument accuracy is what we are discussing.
Your use of the graphics is interesting but has only a vague relationship to instrumentation concepts. The question is not how accurate or precise the gun was that shot the bullets, but how accurate and precise the target is at measuring the patterns the bullets produced. That is, I expect, a complete inversion of what you are thinking. Of course, it might be fun continuing with the target and gun example, and I could probably even fit in a joke about the Texas Sharpshooter, but actually it is a poor example for discussing instrument accuracy and precision. From an instrumentation perspective, the target is the measurement instrument, not the gun, and certainly not the bullet patterns.
If you are still not on board with this, consider your accuracy example. Averaging the locations of the bullet holes relative to the center of the target might give you a more precise indication of where the rifle is actually sighted, but it will not improve the accuracy of either the rifle or the target. We are discussing instrument accuracy and averaging values.
The accuracy specifications of instruments are not simply made up by the manufacturer for advertising purposes (at least they are not supposed to be!). Each measurement technology has its own fundamental accuracy limits in terms of principle of operation, repeatability during manufacture, repeatability over time, linearity, and accuracy of error compensation features. Even those expensive RTDs in the ARGO buoys require complex and careful linearity correction and range calibration. RTDs can be very good and are fairly linear in their response to temperature variations, but not 0.005 degree linear. Also, RTD resistance is measured by passing a current through it. That current causes a measurable heating of the sensor. The protective sheath it rides in has its own thermal resistance and heat capacity. Each technical issue and compensation has its own characteristic effect on a final temperature reading. The final result is that instrument errors, even when reduced to small values, are not random within the operating range of the instrument or from instrument to instrument. Assuming so is not valid.
I suspect the semantic problem exists because most folks do not look at an instrument from the perspective of someone trying to squeeze the best accuracy out of it, given the physical constraints. Users of that instrument do not see the tweaks necessary to correct for linearity and hysteresis quirks inherent in the design and still meet the required specs.

Alan S. Blue
January 27, 2012 9:13 pm

Willis, you write:
“Presumably the scientists have calculated, not the theoretical limit of error, but the actual measurement error from the measurements. So my calculation is also about the actual error.”
This is the core of 10+ separate comments above, including mine.
Determining the precision, as you noted, only requires testing the equipment many times to see how well it agrees with itself and with other co-located instruments. But accuracy requires more knowledge. To find “the actual measurement error” for a single float (or surface station, whatever), one needs to know the actual average temperature of the gridcell the float is attempting to measure. This is decidedly not easy.
And we haven’t seen this attempted in more than a rough fashion to my knowledge. And repeated for each individual gridcell, because they aren’t all identical – they are all unique. This is the core difference between a “weather measurement” and a “climate measurement” IMNSHO – one is a point-source, and is judged based on how well it mimics a similar instrument measuring the same value. The ‘instrumental error’ is reasonable here. The plane landing at the airport wants the temperature at the airport, which is why the instruments were placed there in the first place. But… not when you’re extrapolating that same instrument to an entire gridcell.
And as far as I can tell, finding “actual measurement error when used for measuring the gridcell temperature” is not happening. The ‘weather error’ appears to be propagated instead.
Rephrasing:
Measuring the precision and accuracy of a point-source thermometer is not tough.
Measuring the precision of a gridcell-thermometer is also not tough.
Measuring the accuracy of a gridcell-thermometer is daunting. And should result in entire rafts of papers on pure instrumental methods alone.

Theo Goodwin
January 27, 2012 10:00 pm

Brilliant work, Willis. We need to think of a memorable name for your argument so that Warmists can more readily fill with dread every time they hear it. It’s too late for me to think clearly. I will get the ball rolling with the “Thirty Buoy Argument.”

Phil
January 27, 2012 11:21 pm

Springing off of this comment, maybe one could think of it backwards. Pick a resolution. Take the equivalent of a digital picture of the entire temperature field of the oceans. Then, use a compression algorithm to reduce the size of the resulting file. Obviously, the fuzzier the reconstruction that you can tolerate, the more you can compress the file. The resulting compressed file is the sampling that you need to do to reconstruct a picture of the temperature field that you have never seen. Compression in this example would be affected by 3 things, roughly: the original chosen resolution, the amount of fuzziness or loss on compression that one could tolerate and the structure of the file. If the file is composed of many repeating elements (e.g. an even background), the compression is greater or the sampling needed is smaller.
Can this be estimated? I would think it could be. Simply take actual digital photographs of the world’s oceans from space (clouds and all, to simulate temperature variations), stitch them together into a mosaic, and then compress the resulting digital file to 3,000 pixels. After you do that, blow it back up and see if it resembles in any way the original photographic mosaic. I would think that the result would be more than a little fuzzy. The equivalent of 0.004°C resolution per reconstructed pixel? Not likely.
P.S. Your calculation appears to be correct, as usual.
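Phil's compression test can be tried in miniature. The sketch below is a one-dimensional stand-in with an invented field (a broad gradient plus two eddy-scale wiggles, all amplitudes and wavelengths hypothetical): sample it at N evenly spaced points, rebuild it by linear interpolation, and measure how far the rebuilt field strays from the original. One dimension flatters the sampling, since in the real ocean the same 3,000 points are spread over area, depth and time.

```python
import math


def field(x):
    """Hypothetical 1-D 'temperature field' on [0, 1]: broad gradient plus eddy-scale wiggles."""
    return (15.0
            + 10.0 * math.cos(math.pi * x)
            + 0.8 * math.sin(40 * math.pi * x)
            + 0.3 * math.sin(310 * math.pi * x))


def reconstruction_rms(n_samples, n_fine=20_000):
    """Sample the field at n_samples evenly spaced points, rebuild it by linear
    interpolation, and return the RMS error on a fine evaluation grid."""
    xs = [i / (n_samples - 1) for i in range(n_samples)]
    ys = [field(x) for x in xs]
    sq_err = 0.0
    for j in range(n_fine):
        x = j / (n_fine - 1)
        i = min(int(x * (n_samples - 1)), n_samples - 2)  # left neighbour index
        w = (x - xs[i]) / (xs[i + 1] - xs[i])             # interpolation weight
        estimate = ys[i] * (1 - w) + ys[i + 1] * w
        sq_err += (estimate - field(x)) ** 2
    return math.sqrt(sq_err / n_fine)


if __name__ == "__main__":
    for n in (30, 300, 3000):
        print(f"{n:>5} samples: RMS reconstruction error = {reconstruction_rms(n):.4f}")
```

With 30 samples the small-scale wiggles are lost or aliased; only when the spacing is well below their wavelength does the rebuilt field come close to the original.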

January 28, 2012 4:16 am

I’m reminded that, tucked away in Trenberth’s article on Missing Heat on SkS last year, was his own plot of sea surface temperatures showing a distinct curve which had just passed a maximum and was starting to decline. Yes, it appeared (without comment) on SkS of all places …

January 28, 2012 4:20 am

David L. Hagen says:
January 27, 2012 at 5:47 am
“The international quantitative standard for calculating the full uncertainty is the root mean square combination of all the errors.”
Catchy little slogan, David. Unfortunately, it does not apply to the real world, because of your use of the word, “all.” “All” includes METHOD ERRORS (MEs), IN ADDITION TO RANDOM ERRORS. In the real world, the MAGNITUDES of the MEs are usually unknown, and are often unknowable.
Example: Use Larch tree rings as proxies for temperature, as Keith Briffa did. His climate ‘science’ study extrapolated back to a time when there were no reasonably accurate thermometers, with which he could calibrate the ‘data’ from his (cherry-picked) tree rings. MEs are bloody inconvenient, wouldn’t you say?
Now I’ll try to explain the point that I attempted to make in my earlier post, but from a different angle.
Larry Fields says:
January 26, 2012 at 11:58 pm
Willis, you’re a truly extraordinary science writer, but in this case, you made a logic error that most of us could have made. An analogy will illustrate the point.
Case 1. When your son reaches his fourth birthday, start measuring his height every day. Use a state-of-the-art electronic height-measuring instrument, having sufficient readout resolution to give a slightly different value each time.
When his height stops growing at around 17 or 18 years of age, use this data to estimate the total growth after age 4. And for the sake of Science, throw in the usual uncertainty estimations.
Case 2. This is the fun part. Throw out the measurements from the 2nd, 4th, 6th, etc. days. Then do the calculations all over again. The second growth estimate will be the same as the first. Have you done an appreciably worse job (100 × (1 − 1/√2) ≈ 29%) of nailing down the uncertainty for the total height growth since age 4? No.
Why not? Because your son will have many barely measurable overnight growth spurts, and they do not happen every night. Measuring your son’s height every day does not contribute a whole lot to narrowing down the uncertainty estimate for the total height gain since age 4. In this case, the things that matter the most are the alpha and omega points.
The handy dandy Standard Error of the Mean formula (SEM) does not apply here. SEM only applies when you’re measuring THE SAME THING over and over again; and when that SAME THING does not monotonically increase over time, and it does not monotonically decrease over time. We’re definitely NOT doing that in my example.
And we’re not doing that with the Argo buoys either, unless they’re programmed for synchronous temperature measurements, and they never get knocked out of kilter. There are probably valid statistical methods for analyzing the Argo experiment, but I haven’t the foggiest idea what they are.
What I do know is that stats methods are based upon mathematical theorems. A typical theorem has an if-part (hypothesis), and a then-part (conclusion).
Even though I’m no Statistician to the Stars, I am fairly skilled at recognizing when someone is applying the then-part of a theorem, when the if-part is not satisfied. Willis, color yourself busted. 🙂

Unattorney
January 28, 2012 7:56 am

Willis,
As a fisherman, you obviously have seen how a drifting object in the ocean attracts organic life, both plant and animal. While the effect of seaweed and schools of small fish may be minor, it increases with time.

Jim D
January 28, 2012 8:05 am

Willis, yes, I see the annual error bars, and they are larger than the one in the abstract, as you would expect, because Argo data becomes much more useful over longer time spans. If you want to use Argo to explain an annual change, I would agree it is difficult, because even the 0.004 C error you calculate is at the annual noise level, and it becomes even more impossible if you raise the error to 0.04 degrees by taking fewer floats. The point I make is that the floats can resolve changes over longer time spans, and their 0.5 W/m2 decadal accuracy seems reasonable to be able to do that.

Theo Goodwin
January 28, 2012 8:44 am

Larry Fields says:
January 28, 2012 at 4:20 am
Your analogy fails because you appeal to well known physical hypotheses about growth. The climate scientists managing or using the ARGO buoys have no similar set of physical hypotheses to appeal to. They have nothing (nada, zip) to define their event space. For them, any two temperature readings are perfectly comparable.

GaryW
January 28, 2012 9:25 am

Willis,
“I say that if 3,000 Argo buoys give an error of 0.004°C, then other things being at least approximately equal, 30 buoys should give an error of 0.04°C.
1. Are there logical or math errors in that calculation?
2. Do you think 30 Argo buoys (1,080 observations per year) can measure the annual temperature rise of the global ocean to within 0.04°C?”
Actually, from my experience with instrumentation, what we have been discussing about the theoretical aspects of instrumentation accuracy and precisions, and your questions above are about are different issues.
First, for question 1, my experience tells me that 0.004°C is wishful thinking by a couple of orders of magnitude if the subject is absolute accuracy. Measuring temperatures to that accuracy, even in a laboratory environment, is tremendously difficult. I see you are using the IF…THEN statement again. I am saying your argument is invalid, but specifically with respect to instrument calibration errors.
That 0.004°C value may come from a standard statistics procedure but that does not necessarily make the value meaningful in a discussion of instrument accuracy. As you stated previously, it is an IF…THEN issue on whether you assume instrument errors and measurement errors may be considered random. Obviously, if true, the 0.004°C might have some significance. Unfortunately, the IF test result is FALSE. Instrument errors may not be considered random.
It also disturbs me that the 0.004°C value is claimed when the value measured varies over a range of some 30°C (tropic to arctic), location (floating free), and extended time period. We use averaging of multiple values to estimate a most probable value for a single process condition. We do not assume it does anything but minimize the contribution of random noise, if it exists. It is never assumed to provide an improvement in overall accuracy. When the value is used in subsequent calculations, accuracy values are carried through as in 25.2°C +/- 1.5°C.
For question 2, again, 0.004°C and 0.04°C speak to the precision, or most probable value, of the temperature readings. They provide an estimate of how well you believe your algorithm performs at calculating that, assuming all errors are random. To be complete, you must also describe the absolute accuracy of the original source data. This is where your systematic, irreducible errors must be described. If your measuring instrument is specified to have +/- 0.005°C accuracy, you cannot claim they will not all be reading 0.005°C high, or some other consistent non-zero error value.
That, I believe, is the crux of our difference on this subject: whether instrument errors may be assumed random or not. My experience is that like instruments typically have similar error profiles. All that is required of the instrument is for it to meet its accuracy specification. It is neither required nor expected that the error profile of a set of instruments be in any way random.
So, more succinctly, 1: NO, 2: NO, but for reasons based upon the applicability of averaging of values with respect to instrument errors, not about reducing the effect of random noise.
Let’s go back to your graphics above. I am assuming you were thinking that the rifles were the measurement instruments and the bullet patterns were examples of the rifle accuracy errors as displayed upon a calibrated plot shown as a target. This pair of graphics is actually applicable to one instrumentation scenario: calibration adjustment in a lab. The center of the target represents the desired instrument reading. From those graphics we can make statements about the absolute accuracy of the measuring instruments. Using your targets, a technician would then make the adjustments necessary to center the dots within the target.
When we take the instrument out into the field to make a measurement, we no longer have the absolute value target for comparison. In fact, all we have are the rifle and the bullet pattern. If we average the positions of the bullets to arrive at some single point value, no matter how many bullet holes were created, we have no way of knowing if we are seeing the upper or lower pattern. We are effectively using the Texas Sharpshooter algorithm. (Shoot at a blank wall and draw a circle around the biggest concentration of bullet holes!) All we can say about that average, as we increase the number of holes we count, is what the most probable value is tending toward. Our rifles could all be shooting high and to the left; after all, they were all sighted in at the same station on the production line.
Do you think we have beat this subject to death yet?

Unattorney
January 28, 2012 9:34 am

Would a floating buoy also attract inorganic matter, as well as tend to move toward other floating objects? I suspect a floating buoy warms over time from the micro and macro changes in its immediate environment. Has this been tested?

January 28, 2012 11:41 am

Brilliant contribution, George E. Smith; 1/27, 8:16 pm
Bringing Nyquist into the argument is going for the jugular! You have Nyquist issues simultaneously in time, lat, long and depth. “The emperor has no clothes.”
Going back to Willis’s illustration of Accuracy vs Precision, I just want to point out that when looking at the data, all you have are the shot groupings — you don’t have the bullseye or the scope picture to go on. So the “More Precise” grouping will naturally be seen as “More Accurate” without very careful, very often, recalibration. A broken clock is a very precise time measurement device, worthless for accuracy, but very precise.
Finally, I want to tie the issue of independence of the measurements and covariance I brought up concerning Valdez to Willis’s card counting at the top. I believe that the Argo buoys are probably positively correlated in nearby measurements and that far away they are likely independent. That, however, is a hypothesis on my part.
Willis’ card counting example is one of negative correlation: what has happened in the past within a deck changes the odds in the rest of the deck. That’s how it is possible to turn the odds in your favor. Sampling without replacement. Of course, it takes a lot of knowledge beforehand about the domain and behavior of the deck.
What if, mind you: If, If, If, If…, you had some 8-parameter Bessel function that had an uncanny ability to predict ocean temperatures given (Lat, Long, Z, t)? You use a random sample of half of your Argo measurements to calibrate the 8 parameters. This then specifies the 4-D temperature profile of the ocean in the domain of the data. You then take the other half of the data points, calculate the residuals (measured – prediction), and show that the model accounts for 99.99% of the measurement variance. If, If, If. If you had such a model, then your ability to evaluate the mean temp could be quite high.
Mind you, this is all theoretical. You must first show that magical predictive function, do uncertainty analysis on each of the parameters. But my point is that the measurement of the mean is not simply a function of the standard deviation of the measurements: It really should be the standard deviation of the error (measurement-prediction), which can be a small number.
Even with the theoretical model in mind, Willis’s 100 times more measurements for another significant digit still stands. The theoretical model depends critically on George E. Smith’s observation: either the sampling methodology passes the Nyquist test, or the theoretical model is a bunch of hooey from the start.

Frank
January 28, 2012 12:55 pm

Willis: You might look at this paper (not behind a paywall) using Monte-Carlo calculations to determine how many stations were needed to meet the needs of the US Climate Reference Network.
Vose, Russell S., Matthew J. Menne, 2004: A Method to Determine Station Density Requirements for Climate Observing Networks. J. Climate, 17, 2961–2971.
A procedure is described that provides guidance in determining the number of stations required in a climate observing system deployed to capture temporal variability in the spatial mean of a climate parameter. The method entails reducing the density of an existing station network in a step-by-step fashion and quantifying subnetwork performance at each iteration. Under the assumption that the full network for the study area provides a reasonable estimate of the true spatial mean, this degradation process can be used to quantify the relationship between station density and network performance. The result is a systematic “cost–benefit” relationship that can be used in conjunction with practical constraints to determine the number of stations to deploy. The approach is demonstrated using temperature and precipitation anomaly data from 4012 stations in the conterminous United States over the period 1971–2000. Results indicate that a U.S. climate observing system should consist of at least 25 quasi-uniformly distributed stations in order to reproduce interannual variability in temperature and precipitation because gains in the calculated performance measures begin to level off with higher station numbers. If trend detection is a high priority, then a higher density network of 135 evenly spaced stations is recommended. Through an analysis of long-term observations from the U.S. Historical Climatology Network, the 135-station solution is shown to exceed the climate monitoring goals of the U.S. Climate Reference Network.
http://journals.ametsoc.org/doi/abs/10.1175/1520-0442(2004)017%3C2961%3AAMTDSD%3E2.0.CO%3B2
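The degradation procedure in that abstract can be sketched in Python: thin out a network, compare each subnetwork’s mean to the full-network mean, and repeat. This is not Vose and Menne’s code. The synthetic network below (one shared signal plus independent station noise, with no spatial structure) is hypothetical, and so is every name and parameter in it. It only shows how skill levels off as stations are added.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def synthetic_network(n_stations, n_years, noise_sd, seed=0):
    """Hypothetical stations: a shared yearly anomaly plus independent local noise."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
    return [[s + rng.gauss(0.0, noise_sd) for s in signal] for _ in range(n_stations)]


def network_mean(stations):
    """Year-by-year average across the given stations."""
    return [statistics.fmean(year) for year in zip(*stations)]


def subnetwork_skill(stations, k, draws=50, seed=1):
    """Average correlation of random k-station subnetworks with the full-network mean."""
    rng = random.Random(seed)
    full = network_mean(stations)
    scores = [pearson(network_mean(rng.sample(stations, k)), full) for _ in range(draws)]
    return statistics.fmean(scores)


if __name__ == "__main__":
    stations = synthetic_network(400, 30, noise_sd=2.0)
    for k in (5, 25, 135):
        print(k, round(subnetwork_skill(stations, k), 3))  # rises, then levels off
```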
There appears to be a similar paper for the Argo network, also freely available:
Schiller, A., S. E. Wijffels, G. A. Meyers, 2004: Design Requirements for an Argo Float Array in the Indian Ocean Inferred from Observing System Simulation Experiments. J. Atmos. Oceanic Technol., 21, 1598–1620.
doi: http://dx.doi.org/10.1175/1520-0426(2004)0212.0.CO;2
Experiments using OGCM output have been performed to assess sampling strategies for the Argo array in the Indian Ocean. The results suggest that spatial sampling is critical for resolving intraseasonal oscillations in the upper ocean, that is, about 500 km in the zonal and about 100 km in the equatorial meridional direction. Frequent temporal sampling becomes particularly important in dynamically active areas such as the western boundary current regime and the equatorial waveguide. High-frequency sampling is required in these areas to maintain an acceptable signal-to-noise ratio, suggesting a minimum sampling interval of 5 days for capturing intraseasonal oscillations in the upper Indian Ocean. Sampling of seasonal and longer-term variability down to 2000-m depth is less critical within the range of sampling options of Argo floats, as signal-to-noise ratios for sampling intervals up to about 20 days are almost always larger than one. However, these results are based on a single OGCM and are subject to model characteristics and errors. Based on a coordinated effort, results from various models could provide more robust estimates by minimizing the impact of individual model errors on sampling strategies.

Mike H.
January 28, 2012 1:23 pm

Old44, thank you!

Theo Goodwin
January 28, 2012 1:31 pm

E. M. Smith writes: (5 paragraphs)
“We assume implicitly that the air temperature is some kind of “standard air” or some kind of “average air”; but it isn’t. Sometimes it has snow in it. The humidity is different. The barometric pressure is different. (So the mass / volume changes).
For Argo buoys, we have ocean water. That’s a little bit better. But we still have surface evaporation (so that temperature does not serve as a good proxy for heat at the surface as some left via evaporation), we have ice forming in polar regions, and we have different salinities to deal with. Gases dissolve, or leave solution. A whole lot of things happen chemically in the oceans too.
So take two measurements of ocean temperature, one at the surface near Hawaii, the other toward the pole at Greenland. Can you just average them and say anything about heat, really? Even as large ocean overturning currents move masses of cold water to the top? As ice forms, releasing heat (or melts, absorbing it)? How about a buoy that dives through the various saline layers near the Antarctic? Is there NO heat impact from more / less salt?
Basically, you can not do calorimetry with temperature alone, and all of “Global Warming Climate Science” is based on doing calorimetry with temperatures alone. A foundational flaw.
It is an assumption that the phase changes and mass balances and everything else just “average out”, but we know they do not. Volcanic heat additions to the ocean floor CHANGE over time. We know volcanoes have long cycle variation. Salinity changes from place to place all over the ocean. The Gulf Stream changes location, depth, and velocity and we assume we have random enough samples to not be biased by these things.”
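The point about averaging temperatures can be made concrete with two water parcels. In the Python sketch below, the masses, the temperatures and the specific heats are illustrative assumptions, roughly 3990 J/(kg·K) for seawater and 4184 J/(kg·K) for fresh water. Phase changes are ignored. Once masses or heat capacities differ, the plain mean of the two temperatures no longer matches the energy-conserving mixed temperature.

```python
from dataclasses import dataclass
from typing import Sequence

# Illustrative specific heats in J/(kg*K); assumed round values, not reference data.
CP_SEAWATER = 3990.0
CP_FRESHWATER = 4184.0


@dataclass
class Parcel:
    mass_kg: float
    cp: float      # specific heat capacity, J/(kg*K)
    temp_c: float  # temperature, deg C


def naive_mean_temp(parcels: Sequence[Parcel]) -> float:
    """Plain average of the temperatures, ignoring mass and heat capacity."""
    return sum(p.temp_c for p in parcels) / len(parcels)


def mixed_temp(parcels: Sequence[Parcel]) -> float:
    """Equilibrium temperature if the parcels mixed with no heat gained or lost."""
    total_capacity = sum(p.mass_kg * p.cp for p in parcels)
    return sum(p.mass_kg * p.cp * p.temp_c for p in parcels) / total_capacity


if __name__ == "__main__":
    # Hypothetical parcels: a small warm salty one and a larger cold, fresher one.
    warm = Parcel(mass_kg=1.0, cp=CP_SEAWATER, temp_c=25.0)
    cold = Parcel(mass_kg=3.0, cp=CP_FRESHWATER, temp_c=2.0)
    print(naive_mean_temp([warm, cold]))       # 13.5
    print(round(mixed_temp([warm, cold]), 2))  # 7.55
```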
George E. Smith writes later: (5 paragraphs)
“#1 The sea is not like a piece of rock (large rock), it is full of rivers with water flowing along every which way; meandering if you will, and this meandering is aided and abetted by the twice daily tidal bulge.
So the likelihood of a buoy, no matter how tethered or GPS located, being in the same water for very long is pretty minuscule, so you might as well assume that every single observation is actually a single observation of a different piece of water.
Second and far more important, this, like all climate recording regimens, is a sampled data system.
So before you can even begin to do your statistication on the observations, you have to have VALID data, as determined by the Nyquist theorem.
You have to take samples that are spaced no further apart than half the wavelength of the highest frequency component in the “signal”. The “signal” of course is the time and space varying temperature or whatever other variable you want to observe. That of course means the signal must be band limited, both in space and time. If the temperature, shall we say, undergoes cyclic variations in, say, a 24 hour period that look like a smooth sinusoid if you take a time continuous record, then you must take a sample sooner than 12 hours after the previous one. If the time variation is not a pure sinusoid, then it at least has a second harmonic overtone component, so you would need one sample every six hours.
And if the water is turbulent and has eddies with spatial cycles of say 100 km, then you would need to sample every 50 km. OOoops !! I believe that all of your spatial samples need to be taken at the same time, otherwise you are simply sampling noise.”
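A few lines of Python can show the aliasing that the Nyquist criterion guards against. The sketch below folds a signal frequency into the band that a given sampling rate can represent, then reports the apparent period. The function names and the sampling intervals are hypothetical choices. The same arithmetic works in time (hours) or in space (km).

```python
import math


def alias_frequency(f_signal: float, f_sample: float) -> float:
    """Apparent frequency of a sinusoid after sampling, folded into [0, f_sample / 2]."""
    f = f_signal % f_sample
    return min(f, f_sample - f)


def apparent_period(true_period: float, sample_interval: float) -> float:
    """Period the sampled record appears to show (inf means it looks constant)."""
    f = alias_frequency(1.0 / true_period, 1.0 / sample_interval)
    return math.inf if f == 0 else 1.0 / f


if __name__ == "__main__":
    print(apparent_period(24, 6))     # daily cycle sampled every 6 h: ~24 h, resolved
    print(apparent_period(24, 25))    # sampled every 25 h: a spurious ~600 h (25-day) cycle
    print(apparent_period(12, 12))    # 12 h harmonic sampled every 12 h: inf, looks constant
    print(apparent_period(100, 150))  # 100 km eddies sampled every 150 km: ~300 km
```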
Both men nail the problem. E. M. Smith offers a commonsense explanation that nails the point that all Warmists’ statistical work on the ocean assumes there are no differences between any two “sections” of ocean measured for temperature. George E. Smith then introduces the Nyquist Theorem to make the point that the Warmists’ sampling regime is far from adequate to its purpose.
Both points support my argument that if one is to do statistical work at all then one must have an “event space” that is well defined by well confirmed physical hypotheses which serve the role of identifying the relevant events and the kinds that they fall into. Once the event space is so defined, then temperature measurements can be sorted in accordance with the physical hypotheses governing the natural phenomena, such as oceanic “rivers,” in the oceans sampled. Without such knowledge of the oceans sampled, statisticians are assuming that their temperature measurements are wholly plastic; that is, they are assuming that any two temperature measurements are comparable to one another regardless of the differences in the areas of ocean sampled. Such assumptions go beyond the offense of “a priori” science and become a clear cut example of plain old cheating. (Maybe not cheating, but the alternative is idiocy.)
By the way, all the counterexamples to Willis’ card analogy fail because each of them introduces knowledge about the cards, about a child’s growth patterns, or whatever. Yet the Warmists’ use of ARGO data omits all reference to any differences among areas of the ocean sampled so that the event space is plastic. So the counterexamples fail because each introduces knowledge that makes the event space non-plastic.
Why is it that every Warmist fails to understand the requirements of empirical science or seeks to avoid those requirements?

scarletmacaw
January 28, 2012 2:33 pm

Willis, your blackjack story poses a question. If the casino is so concerned that a dealer might cheat in favor of (in collaboration with) a customer, doesn’t that imply that the casino admits that a dealer is capable of cheating? If so, shouldn’t the customers also be concerned that the dealer might cheat?
That 1% edge could evaporate very quickly.

Camburn
January 28, 2012 4:47 pm

Theo Goodwin says:
January 28, 2012 at 1:31 pm
“Why is it that every Warmist fails to understand the requirements of empirical science or seeks to avoid those requirements?”
This can’t be repeated often enough.

Theo Goodwin
January 28, 2012 5:07 pm

Willis Eschenbach says:
January 28, 2012 at 5:00 pm
“One thing I learned early in this game is to always avoid absolute statements.
You like that? An absolute statement saying avoid absolute statements.”
Scores with me. Enjoyed your post.

Theo Goodwin
January 28, 2012 6:02 pm

Camburn says:
January 28, 2012 at 4:47 pm
Theo Goodwin says:
January 28, 2012 at 1:31 pm
“Why is it that every Warmist fails to understand the requirements of empirical science or seeks to avoid those requirements?”
“This can’t be repeated often enough.”
Thanks, Camburn. I want to take just a moment to dredge up the heart of my claim and make it clear as a bell for everyone. The claim is very important as I will explain.
No “consensus” climate scientist working at this time will address the claim that there are no well confirmed physical hypotheses that can explain even one physical connection between increasing CO2 concentrations in the atmosphere and the behavior of clouds and related phenomena. They will not address the claim because they know that no such physical hypotheses exist. And that is the scandal of climate science today. Even Arrhenius knew that without the “forcings” and “feedbacks” there is no way to know what effects increasing concentrations of CO2 might have on Earth’s temperature. Thus, “consensus” climate scientists are mistaken to claim that there is scientific evidence that supports the CAGW or AGW thesis.
When I say that “consensus” climate scientists either do not understand the requirements of empirical science or seek to avoid those requirements, it is their failure to address the nonexistence of these necessary physical hypotheses that is uppermost in my mind. This is the scandal of climate science and every critic should be doing all that he can to hold “consensus” scientists’ feet to the fire. They are beaten on the science. All that is necessary is that we press the case.
What “consensus” climate scientists are willing to discuss are unvalidated and unvalidatable computer models and the laughable proxy studies. Those two topics are nothing but grand Red Herrings.
Finally, if you need proof that “consensus” climate scientists have no well confirmed physical hypotheses that can explain and predict the effects of CO2 on clouds, just ask them for the hypotheses. None have produced them and none can produce them. The necessary physical hypotheses do not exist.

markx
January 29, 2012 1:35 am

From Theo Goodwin, January 28, 2012 at 1:31 pm, quoting E. M. Smith and George E. Smith:
This is the simple crux of the matter:
1. George E. Smith: “So the likelihood of a buoy, no matter how tethered or GPS located, being in the same water for very long is pretty minuscule, so you might as well assume that every single observation is actually a single observation of a different piece of water.”
2. “…. that all of your spatial samples need to be taken at the same time, otherwise you are simply sampling noise.”
Any calculation of SEMs for those plotted curves must be a very strangely fabricated web of modifiers, smoothings and adjustments.

markx
January 29, 2012 12:07 pm

For the statisticians:
How well can we derive Global Ocean Indicators from Argo data? K. von Schuckmann and P.-Y. Le Traon
Ocean Sci. Discuss., 8, 999–1024, 2011
http://www.ocean-sci-discuss.net/8/999/2011/
doi:10.5194/osd-8-999-2011
© Author(s) 2011. CC Attribution 3.0 License.
http://www.ocean-sci-discuss.net/8/999/2011/osd-8-999-2011-print.pdf

January 30, 2012 1:07 pm

re: Argo bias.
Has there been anything written on the “flight” behavior of the Argo submersible, which propels itself by changing buoyancy and tilting vanes? If these were hot air balloons, they would center themselves in a column of warmer air and appear to be pushed out of a colder column when sitting astride a boundary. This may even act as an amplifier: a change smaller than the resolution of the device could have a significant positional effect (?).
And large bodies of water have much more energy (mass in motion, in very low frequency turbulence) than air. So how do we know the instrument isn’t “heat seeking” given the opportunity? And what delta in heat will it seek? Smaller than the instrument’s resolution, perhaps?

DeWitt Payne
January 31, 2012 1:41 pm

Willis,
I’ve also done high precision and accuracy temperature measurements in a laboratory. Measuring to 0.001 degree over a fairly wide range of temperature can be tedious, but not at all impossible. But the ARGO buoys do not need to measure a wide range of temperature. I would be very surprised if the individual buoys don’t have a measurement precision and accuracy of better than 0.004 degree.

January 31, 2012 2:56 pm

Instrumentation is what I did for ten years, and there is no way I could guarantee ±0.005 °C of accuracy on a temperature device that has to measure such a wide temperature range, and even then it would require calibration every 6 months. These floats only cost $15,000 (only?), and no way can a device bob around in the ocean for 5 years and remain that accurate.
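The disagreement above turns on the difference between random error and drift. Here is a minimal Python sketch of that difference. It assumes Gaussian noise plus a fixed calibration offset. The 0.005 °C offset and the other numbers are hypothetical. Averaging many readings shrinks the random part, but an offset shared by every reading passes straight through into the mean.

```python
import random
import statistics


def mean_of_readings(true_temp: float, n: int, noise_sd: float, bias: float,
                     seed: int = 0) -> float:
    """Average n simulated readings that share one calibration offset (bias)."""
    rng = random.Random(seed)
    readings = [true_temp + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(readings)


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        err = mean_of_readings(10.0, n, noise_sd=0.01, bias=0.005) - 10.0
        print(n, round(err, 4))  # settles near the 0.005 offset, not near zero
```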

February 3, 2012 1:21 pm

A follow up on the Nyquist and sampling issues brought up by the following commenters:
Ari Tai (Flight behavior; are floats heat seeking?)
Martin A (spatial Autocorrelation)
George E. Smith (Nyquist sampling)
Here is a link to Univ. of Miami: Surface currents of the Atlantic Ocean. Their Figure 6 and Figure 7 are two examples of how buoys move before, during, and after capture by the Gulf Stream. Though these are only two paths, I think they reinforce the importance of Ari’s question. Figure 3 is a spaghetti plot of the near-surface buoys archived by the NOAA AOML Drifting Buoy Data Assembly Center (DAC), from 1978 to June 2003. Bear in mind that the Gulf Stream appears black because fast-moving buoys trace long path segments there, not necessarily because of long residence times.
Since the buoys are trapped in the “upper half” of the water column in an ocean that has currents in three dimensions, it is, I think, provable that the buoys will oversample downward-vectored current flow and undersample upward-vectored flow. This is spatial autocorrelation, and an element of positive covariance.
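The cost of positive covariance can be put in numbers. For a series with lag-1 autocorrelation ρ (an AR(1) process), a standard large-sample approximation gives an effective sample size of n(1 − ρ)/(1 + ρ). The naive σ/√n therefore understates the standard error by a factor of √((1 + ρ)/(1 − ρ)). The Python sketch below uses a one-dimensional AR(1) series as a simplified stand-in for the spatial case, and its parameters are hypothetical.

```python
import math
import random
import statistics


def effective_sample_size(n: int, rho: float) -> float:
    """Large-sample approximation for an AR(1) series with lag-1 correlation rho."""
    return n * (1 - rho) / (1 + rho)


def sem_inflation(rho: float) -> float:
    """Factor by which the naive sigma / sqrt(n) understates the true standard error."""
    return math.sqrt((1 + rho) / (1 - rho))


def ar1_series(n: int, rho: float, rng: random.Random) -> list:
    """Stationary AR(1) series with unit marginal variance."""
    x = rng.gauss(0.0, 1.0)
    out = [x]
    scale = math.sqrt(1 - rho ** 2)
    for _ in range(n - 1):
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def simulated_inflation(n: int, rho: float, trials: int = 500, seed: int = 0) -> float:
    """Observed spread of sample means divided by the naive 1 / sqrt(n)."""
    rng = random.Random(seed)
    means = [statistics.fmean(ar1_series(n, rho, rng)) for _ in range(trials)]
    return statistics.stdev(means) * math.sqrt(n)


if __name__ == "__main__":
    print(effective_sample_size(1000, 0.5))  # about 333 independent readings' worth
    print(sem_inflation(0.8))                # about 3
    print(simulated_inflation(200, 0.8))     # close to 3
```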
This brings me back to the issue of Nyquist issues in sampling density in (time, lat, long, z). To further complicate matters, “time” is composed of time of day (important in near surface) (t1), season (time of year) (t2), temporal position within a hypothesized ocean oscillation (t3), and finally where in the time frame of a hypothesized CO2-forced climate change (the sought after warming trend) (t4).
By my count, we have five dimensions (t1, t2, Lat, Long, z) where Nyquist is an issue. Are we sampling enough to be able to predict temperatures at points of interpolation? Certainly, in some parts of the ocean we are not. The Gulf Stream might need spatial sampling every 20 km across its cross section, and the currents quickly move the buoys away from that highly spatially variable patch. Yes, we can calculate a sampled mean value, but the uncertainty in the population (ocean) mean value is made larger by the Nyquist problem.
In regard to t3, we have the opposite of the Nyquist problem. Are we sampling long enough not to conflate t3 (natural oscillations) and t4 (monocline) effects? We account for t1, t2, lat, long, and z with some predictive function of base-case temperature (with uncertainty). From that we hope to measure an anomaly based upon t4. But t3 and t4 are conflated. Can we separate the effects of t3 and t4 if we have not adequately measured temperature changes WRT natural oceanic oscillations? How can we, if the principal cycle of an ocean oscillation is longer than the record of measurements?
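The t3/t4 problem can be illustrated with a pure oscillation and no trend at all. In the Python sketch below, the 60-year period, the 10-year record and the monthly sampling are hypothetical choices. When the record is shorter than the cycle, an ordinary least-squares line fitted to it reports a “trend”. The sign of that trend depends only on where in the cycle the record happens to start.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def apparent_trend(period_years, start_year, record_years, samples_per_year=12):
    """Fitted slope (units per year) of a trend-free sine wave seen through a short window."""
    n = record_years * samples_per_year
    ts = [start_year + i / samples_per_year for i in range(n)]
    ys = [math.sin(2 * math.pi * t / period_years) for t in ts]
    return ols_slope(ts, ys)


if __name__ == "__main__":
    for start in (0, 10, 30):
        print(start, round(apparent_trend(60, start, 10), 3))
    # start 0 -> clearly positive, start 10 -> near zero, start 30 -> clearly negative
```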

February 3, 2012 4:09 pm

Here is a link to “Long-term Sensor Drift Found in Recovered Argo Profiling Floats”, Oka-2005, Journal of Oceanography, Vol. 61, pp. 775 to 781, 2005.
It describes the instrument drift of 3 Argo floats recovered in 2003 off the Japan coast. Temperature drift might be within 0.003 deg C over a span of 0 to 33 deg C. (Impressive!)
Figure 1 of the paper shows the drift of the surface locations, with each buoy reporting every 10 days over a span of 2 to 2.5 years (2001 to 2003). One of these buoys moved around in a box approximately the size of Japan.

February 3, 2012 4:49 pm

Other references for Argo Drift:
Interannual atmospheric variability forced by the deep equatorial Atlantic Ocean, Brandt-2011, Nature 473,497–500(26 May 2011).
Figure 4: Argo float east-west velocity at 1 S to 1 N latitude (y-axis: date; x-axis: longitude). It shows that Argo floats can move 20 deg W or 30 deg E in a year (2000 to 3000 km), off West Africa.
Figure 2: E-W velocities (color scale -20 to 20 cm/sec) as a function of depth, with repeated peak-to-peak reversals in as little as 300 m of depth (y-axis: depth to 3500 m; x-axis: time). Location: 0 N, 23 W. (Moored, non-Argo, data.)
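As a rough sanity check on the Figure 4 numbers, the Python sketch below converts degrees of longitude per year at the equator into kilometres and a mean speed. It assumes a spherical Earth of mean radius 6371 km and a straight east-west drift, and the function names are hypothetical. Twenty to thirty degrees per year works out to roughly 7 to 11 cm/s, comparable in magnitude to the velocities in Figure 2.

```python
import math

EARTH_RADIUS_KM = 6371.0               # mean radius, spherical-Earth assumption
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def km_per_degree_longitude(lat_deg: float = 0.0) -> float:
    """Length of one degree of longitude along the parallel at lat_deg."""
    return 2 * math.pi * EARTH_RADIUS_KM / 360.0 * math.cos(math.radians(lat_deg))


def mean_speed_cm_per_s(degrees_per_year: float, lat_deg: float = 0.0) -> float:
    """Average east-west speed implied by a net longitude change per year."""
    km = degrees_per_year * km_per_degree_longitude(lat_deg)
    return km * 1e5 / SECONDS_PER_YEAR  # 1 km = 1e5 cm


if __name__ == "__main__":
    for deg in (20, 30):
        km = deg * km_per_degree_longitude()
        print(deg, round(km), round(mean_speed_cm_per_s(deg), 1))
    # 20 deg -> about 2224 km, ~7.0 cm/s; 30 deg -> about 3336 km, ~10.6 cm/s
```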