Guest Post by Willis Eschenbach
Over at Judith Curry’s excellent blog there’s a discussion of Trenberth’s missing heat. A new paper about oceanic temperatures says the heat’s not really missing; we just don’t have accurate enough information to tell where it is. The paper’s called “Observed changes in top-of-the-atmosphere radiation and upper-ocean heating consistent within uncertainty”.
It’s paywalled, and I was interested in one rough number, so I haven’t read it. The number that I wanted was the error estimate for their oceanic heating rates. This error can be seen in Figures 1a and 3a on the abstract page, and it is on the order of about plus or minus one watt/m2. This is consistent with other estimates of upper ocean heat content measurement errors.
I think I can conclusively demonstrate that their claimed error is way too small. To understand why, let me take a detour through the art, science, and business of blackjack.
In a fit of misguided passion, some years back I decided to learn how to count cards at blackjack. I had money and time at the same moment, an unusual combination in my life, so I took a class from a guy I’ll call Jimmy Chan. Paid good money for the class, and I got good value. I’ve always been good with figures, and I came out good at counting cards. Not as good as Jimmy, though; he was a mad keen player who had made a lot of money counting cards.
At the time they were still playing single deck in Reno. And I was young, single, and stupid. So I took twenty thousand dollars from my savings for my grubstake and went to Reno. It was an education about a curious business.
Here are the economics of the business of counting cards.
First, if you count using one of the usual systems as I did, and you are playing single deck, it gives you about a 1% edge on the house. Not much, to be sure, but it is a solid edge. And you can add to that by using a better counting system or a concurrent betting system, where better means more complex.
Second, if you play head-to-head (just you and the dealer) you can typically play about a hundred hands an hour.
Doesn’t take a math whiz to see that if you don’t blow the count, you will win about one extra hand an hour.
And therein is the catch. It means that in the card counting business, your average hourly wage is the amount of your average bet.
It’s a catch because of the other inexorable rule of counting blackjack. This regards surviving the swings and arrows of outrageous luck. If you don’t want to go home empty-handed, you need to have a grubstake that is a thousand times your average bet. Otherwise, you could go bust just from the natural ups and downs.
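To see why that thousand-to-one rule isn’t just superstition, here’s a minimal sketch in modern terms (not the program I describe later, and purely illustrative): it treats each hand as a simple even-money bet with a 1% edge and asks how often a given bankroll, counted in average-bet units, goes bust along the way.

```python
import random

def ruin_probability(bankroll_units, edge=0.01, hands=20_000, trials=200):
    """Rough chance of going broke, treating each hand as an even-money
    bet won with probability (1 + edge) / 2. This is a crude stand-in for
    counted blackjack, which actually has higher variance per hand."""
    p_win = (1 + edge) / 2
    busts = 0
    for _ in range(trials):
        units = bankroll_units
        for _ in range(hands):
            units += 1 if random.random() < p_win else -1
            if units <= 0:
                busts += 1
                break
    return busts / trials

for units in (100, 300, 1000):
    print(f"{units:>5} bet units -> chance of going bust ~ {ruin_probability(units):.0%}")
```

Even this toy version shows a 100-unit bankroll going bust a worrying fraction of the time. Real counted blackjack swings harder than a simple even-money bet (doubles, splits, blackjacks, and a betting spread), which is exactly why the rule of thumb is a full thousand bet units.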
Now, twenty thousand dollars was all I could scrape together then. So that meant my average bet couldn’t be more than twenty dollars. I started out at the five dollar level.
I’d never spent any time in a casino up until then. I felt like the rube in every movie I ever saw. I played a while at the five dollar level. You never win or lose much there, so nobody paid any attention to me.
After a day or so making the princely sum of $5 per hour, I started betting larger. First at the ten-dollar level. Then at the twenty-dollar level. That was good money back in those days.
But when you start to make a bit of money, like say you hit a few blackjacks in a row and you’re doubling down, they start paying attention to you, and the trouble begins. First they use the casino holodeck to transport a somewhat malignant looking dwarf armed with a pad and a pencil to your table. He materializes at the shoulder of the dealer, and she starts to sweat. I say she because most dealers were women then and now. She starts to sweat because the casino doesn’t really care about card counters. I was making $20 an hour on average? Big deal, everyone in the casino management made that and more.
What scares casino owners is collusion between dealers and players. With the connivance of the dealer a guy can have a “string of luck” that can clean out a table in fifteen minutes and be out the door, meeting the dealer later to split the money. That’s what casino owners worry about, and that’s why the dealer started sweating, she knew she was being watched too. The dwarf peered through coke-bottle thick glasses, and wrote down the number of chips on each stack in the dealer’s rack, how much money I had, how much other players had. He gave the dealer a new deck. He wore a suit that cost as much as my grubstake. His wingtip shoes were shined to a rich luster. He looked at me as though I were a rich man with a loathsome disease. He watched my eyes, my hands. I started sweating like the dealer.
If I continued to win, the holodeck went into action again. This time what materialized were two large, vaguely anthropoid looking gentlemen, whose suits were specially tailored to conceal a bulge under the off-hand shoulder. They simply appeared, one at each shoulder of the aforementioned vertically challenged gentleman, who looked even dwarfier next to them, but clearly at ease in his natural element. They all three stared at me, and when that bored them, at the dealer. And then at me again.
And if the dealer was sweating, I was melting. I’m not made for that kind of game, I’m not good at that kind of pretence. I found out you can take the cowboy out of the country, but you can’t make him go mano-a-mano with the casinos for twenty bucks an hour.
I lasted a week. I logged my hours and my winnings. During that time, I worked well over forty hours. I only made enough money to pay for the flight and the hotel, and that’s about it. I was glad to put my twenty grand back in the bank.
I couldn’t take the constant strain and pressure of counting and not looking like I was counting and trying to stay invisible and feeling like a million eyes in the sky were watching my every eyeblink and having an inescapable feeling of being that guy in the movies who’s about to be squashed like a bug. But for those who can make it a game and keep it up, what an adventure! I’m glad I did it, wouldn’t do it again.
The part I liked the least, curiously, was something else entirely. It was that my every move was fixed. For every conceivable combination of my cards, the dealer’s card, and the count, there is one and only one right move. Not two. Not “player’s choice”. One move. I definitely didn’t like the feeling that I could be replaced by a vaguely humanoid 100% Turing-tested robot with a poor sense of dress and a really, really simple set of blackjack instructions.
But I was still interested in the math of it all. And I had my trusty Macintosh 512. And Jimmy Chan had an idea about how to improve the odds by changing his counting method. And so did some of Jimmy’s friends. And he had a guy who tested their new counting method for them, at some university, for five hundred bucks a run.
So I told Jimmy I’d do the analysis for a hundred bucks a run. He and his friends were interested. I wrote a program for my Mac to play blackjack against itself. I wrote it in Basic, because that was what was easy. But it was sloooow. So I taught myself to program in C, and I rewrote the entire program in C. It was still too slow, so I translated the critical sections into assembly language. Finally, it was fast enough. I would set up a run during the day, programming in the details of however the person wanted to do the count. Then I’d start it when I went to bed, and in the morning the run would be done and I’d have made a hundred bucks. I figured that I’d finally achieved what my computer was really for, which was to make me money while I slept.
The computer had to be fast because of the issue that is at the heart of this post. This is, how many hands of blackjack did the computer have to play against itself to find out if the new system beat the old system?
The answer turns out to be about a hundred times more hands for each additional decimal of precision. In practice, this means at least a million hands, and many more is better.
What we are looking at is the error of the average. If I measure something many times, I can average my answers. Is the resulting mean value the true underlying mean of what I am measuring? No, of course not. If we flip a hundred coins, usually it won’t be exactly fifty/fifty.
But it will be close to the true average of the data. How close? Well, the measure of how close it is expected to be to the true underlying average is what is called the “standard error of the mean”. It is calculated as the standard deviation of the data divided by the square root of the number of observations.
It is the last fact that concerns us. It means that if we double the number of observations, we don’t cut the error in half, but only to 0.7 of the original value. One consequence of this is that if we need one more decimal of precision, we need a hundred times the number of observations. That is what I meant by a hundred times per decimal. If our precision is plus or minus a tenth (± 0.1) and we want to know the answer to one more decimal, plus or minus one hundredth (± 0.01), we need one hundred times the data to get that precision.
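If you want to watch that square-root rule in action, here is a minimal coin-flip sketch (my illustration only, nothing from the paper): it repeats the experiment many times at each sample size and shows how much the sample means scatter around the true mean of 0.5.

```python
import random
import statistics

def sample_mean(n):
    """Mean of n fair coin flips (heads = 1, tails = 0)."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

# For each sample size, repeat the experiment and see how far the
# sample means scatter around the true underlying mean of 0.5.
for n in (100, 10_000, 1_000_000):
    means = [sample_mean(n) for _ in range(100)]
    print(f"n = {n:>9,}   scatter of the mean ~ {statistics.stdev(means):.4f}"
          f"   (theory 0.5/sqrt(n) = {0.5 / n ** 0.5:.4f})")
```

The scatter drops from roughly 0.05 to 0.005 to 0.0005, one more decimal for every hundred-fold increase in the number of flips, which is the whole point.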
That is the end of the detour; now let me return to my investigation of their error estimate for the ocean heating rate for the top 1800 metres of the ocean. If you recall, or even if you don’t, that was plus or minus 1 watt per square metre (W/m2).
Now, that is calculated from temperature readings from Argo floats, about 3,000 of them during the study period.
Let me run through the numbers to convert their error (in W/m2) into a temperature change (in °C/year). I’ve comma-separated them for easy import into a spreadsheet if you wish.
We start with the forcing error and the depth heated as our inputs, and one constant, the energy to heat seawater one degree:
Energy to heat seawater:, 4.00E+06, joules/tonne/°C
Forcing error: plus or minus, 1, watts/m2
Depth heated:, 1800, metres
Then we calculate
Seawater weight:, 1860, tonnes/m2
for a density of about 1.03333.
We multiply watts by seconds per year to give
Joules from forcing:, 3.16E+07, joules/m2/yr
Finally, Joules available / (Tonnes of water times energy to heat a tonne by 1°C) gives us
Temperature error: plus or minus, 0.004, degrees/yr
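For anyone who would rather check the arithmetic in code than in a spreadsheet, here is the same conversion as a short sketch (same rough constants as above, nothing more):

```python
SECONDS_PER_YEAR = 3.156e7   # seconds in one year
HEAT_PER_TONNE = 4.0e6       # joules to warm one tonne of seawater by 1 deg C
DENSITY = 1.0333             # tonnes of seawater per cubic metre

forcing_error = 1.0          # claimed error, watts per square metre
depth_heated = 1800.0        # metres of ocean absorbing that energy

joules_per_m2_per_year = forcing_error * SECONDS_PER_YEAR
tonnes_under_each_m2 = depth_heated * DENSITY
temperature_error = joules_per_m2_per_year / (tonnes_under_each_m2 * HEAT_PER_TONNE)

print(f"Temperature error: +/- {temperature_error:.4f} deg C per year")
# prints roughly +/- 0.0042, i.e. about four thousandths of a degree per year
```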
So, assuming there are no problems with my math, they are claiming that they can measure the temperature rise of the top mile of the global ocean to within 0.004°C per year. That seems way too small an error to me. But is it too small? If we have lots and lots of observations, surely we can get the error down to that small?
Here’s the problem with their claim that the error is that small. I’ve raised this question at Judith’s and elsewhere, and gotten no answer. So I am posing the question again, in the hope that someone can unravel the puzzle.
We know that to get a smaller error by one decimal, we need a hundred times more observations per decimal point. But the same is true in reverse. If we need less precision, we don’t need as many observations. If we need one less decimal point, we can do it with one-hundredth of the observations.
Currently, they claim an error of ± 0.004°C (four thousandths of a degree) for the annual average upper ocean temperature from the observations of the three thousand or so Argo buoys.
But that means that if we are satisfied with an error of ± 0.04°C (four hundredths of a degree), we could do it with a hundredth of the number of observations, or about 30 Argo buoys. And it also indicates that 3 Argo buoys could measure that same huge volume, the entire global ocean from pole to pole, to within a tenth of a degree.
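Here is that back-of-the-envelope scaling written out, so anyone can check I’m applying the square-root rule correctly (this is my arithmetic, not anything from the paper):

```python
import math

claimed_error = 0.004   # deg C per year, claimed from roughly 3,000 Argo floats

# The standard error of the mean scales as 1/sqrt(N), so cutting the
# number of floats inflates the implied error by sqrt(3000 / N).
for floats in (3000, 30, 3):
    implied = claimed_error * math.sqrt(3000 / floats)
    print(f"{floats:>5} floats -> implied error of +/- {implied:.3f} deg C")
```

Thirty floats for four hundredths of a degree, three floats for about a tenth, for the entire World Ocean.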
And that is the problem I see. There’s no possible way that thirty buoys could measure the top mile of the whole ocean to that kind of accuracy, four hundredths of a degree C. The ocean is far too large and varied for thirty Argo floats to do that.
What am I missing here? Have I made some major math mistake? Their claimed error seems to be way out of line for the number of observations. I’ve not been able to find a good explanation of how they come up with these claims of extreme precision, but however they’re doing it, my math doesn’t support it.
And that’s the puzzle. Comments welcome.
Regards to everyone,
w.
Three Argo buoys? In principle, that’s not a problem, assuming that the AVERAGE temperature of the ocean down to 1800m is CONSTANT over time. Given that assumption, the LOCAL water temperatures can vary somewhat over time, with some of them going up a bit, while others are going down a bit. Given an arbitrary length of time for making measurements, even 3 buoys could pin down the AVERAGE temperature of the ocean down to 1800m to whatever degree of accuracy you like.
Here’s the fly in the ointment: The handy dandy Standard Error of the Mean (SEM) formula is only valid in the case where you take repeated measurements OF THE SAME THING. In this case, “the same thing” is the AVERAGE temperature of the ocean down to 1800m.
That magic formula does not apply here, because the AVERAGE temperature of the ocean down to 1800m is always changing slightly. Sometimes it’s going up slightly, and at other times it’s going down slightly. Here are a couple of other ways to frame the issue.
The SEM formula only applies to RANDOM ERRORS. It does NOT apply when there are SYSTEMATIC ERRORS, aka METHOD ERRORS. And that’s what we’re talking about here.
The buoys’ thermometers are shooting at a moving target, and that’s beyond the scope of the SEM formula.
To use a Peter Principle expression, we’re promoting the SEM formula to its level of incompetence.
My gut feeling is that the researchers have vastly overstated their case. However I don’t know enough about stats to tell them how to do their analysis correctly. Unfortunately, they don’t know enough either. But that’s par for the course in Climate Change ‘science’.
David Falkner says:
January 26, 2012 at 10:48 pm
I’m just reporting what they say. Can the standard error of the mean be smaller than the instrument error? Sure. You just need lots of measurements.
w.
To me the missing 0.9 W/m² is a joke. I won’t bore the moderators with another KT 1997 atmospheric window post on the subject.
The KT 1997 calculation for the atmospheric window (using correct math) gives 80 W/m² or 87 W/m² depending on what assumptions you make. If the 80 W/m² value is correct, then KT’s figure is off by 40 W/m², a 100% error. If the 87 W/m² value is correct, then we’re talking about an error of 47 W/m², greater than 100%. Either value dwarfs that 0.9 W/m² figure.
Even if we calculate the atmospheric window the way KT 1997 does, we get 37.62 W/m² which they round up to 40 W/m². That’s a 2.38 W/m² slop in rounding alone.
As I said, that 0.9 W/m² value is laughable.
Jim
Jim D says:
January 26, 2012 at 10:42 pm
Thanks, Jim. Take another look at where I got the data. It shows the annual average values for the heating rate (in W/m2). Each and every one of these annual values has individual error bars. The value varies year to year, but averages around ± one watt per square metre.
So it is not a decadal trend that we are discussing. It is annual averages, with corresponding annual error bars.
w.
steven mosher says:
January 26, 2012 at 10:56 pm
Will do. I hope the paper will explain how they calculated the error bars. I’ve read several, and come away frustrated. But by all means, send me this one.
However, you’re a mathematician, Mosh. What’s wrong with my calculation that 30 Argo floats could measure the global ocean to 0.04°C? That doesn’t depend on how they calculated the error, which is why I didn’t bother reading the paper; it makes no difference to the problem I’m highlighting.
w.
Alan Wilkinson says:
January 26, 2012 at 11:01 pm
That’s called the “steric sea level”. Anne Cazenave has done work on that; google those two.
w.
The Argo project has given us a blurry snapshot of ocean temperatures, but that is all it has done.
Based on many small points of accurate data a picture has been assembled.
How could they possibly know the accuracy of their assumptions about such a dynamic system as an ocean until they repeat the test, in say, ten years?
Alan S. Blue says:
January 26, 2012 at 11:21 pm
Thanks for that, Alan. It’s always good to hear from folks who actually need to measure this stuff for industry. Five thermometers on a small tank might give us a tenth of a degree accuracy; that agrees with what I would have thought. Expand that to the top mile of the ocean, and I don’t see how 30 Argo floats could measure it to hundredths of a degree. Yet that is what they claim.
w.
F. Ross says:
January 26, 2012 at 11:21 pm
Batteries, AFAIK. And what they do is dive deep. Then they add a little bit of buoyancy, and over six hours float gradually to the surface taking a temperature profile at preselected pressure depths. As such I doubt the battery is doing much during the ascent.
w.
John De Beer nails the issue – density of measurements against the scope and stability of the system being measured. 3,000 buoys is literally a drop in the ocean. You cannot extrapolate these with any confidence. It is the same mistake GISS makes by smearing one temperature measurement across 500-1400 km. GISS claims a reading in Washington State can indicate the temperature as far away as Southern CA.
Utter nonsense. And the reason it is utter nonsense is that the atmosphere is not homogeneous or stable over those distances. Temperatures are not homogeneous over tens of kilometers on any given day in most places. They can often have a standard deviation of 3-4 °F.
The required density of measurements for something as large and dynamic as the temperature of the atmosphere or oceans is enormous if you want sub degree accuracy. We don’t come close with land or ocean sampling.
Spacecraft come the closest because they use a uniform measurement (one instrument, one calibration configuration) over the entire globe and take many thousands of measurements. The problem here is they only see a given area of the globe every 100 minutes or so. Therefore not all measurements are at the same time of day everywhere on the Earth (though each point on the Earth is measured at the same times each day).
That is why the benchmark should be satellites, and we can use them to measure the error in widely distributed surface samples from an unknown number of sensors of unknown calibration quality. I have proposed this in the past. A satellite can give an average over a region everyday at the same time. You then compare the thermometer results every day to the satellite average. It would be an interesting result to see.
Anyway, the sample density has to reflect the dynamics of the system. For example, I need a lot of samples to determine the orbit of a satellite to a high accuracy. But the orbit only slowly decays under various non-linear forces, so I don’t need to re-sample for 7-10 days to regain my precision in orbit knowledge for most applications.
However, if I want to know sea surface levels to centimetres, I need to sample my orbit continuously (through GPS) and then post-process the data again with ground references (differential GPS) to theoretically get cm-level resolution.
Orbits are simple, slow-changing systems. Atmosphere and ocean mixing, currents, dynamics etc. are not. Once you realize how much dynamic variability is truly out there, you realize how crazy it is for GISS or CRU or NCDC to claim sub-degree accuracy on a global scale about anything. Temperature samples are only good to about a degree (if that). It is impossible with current sample densities to improve on that unless you have millions of samples.
In other words, what we don’t measure regarding temps dwarfs what we do sample by many orders of magnitude. Add in the lack of samples from before the modern era, and you realize we are clueless not to just what is happening now, but what happened 50 years ago (let alone 1,000).
Read James’ comment again. Your error here is in comparing two different distributions (and not seeming to have read up on any statistics).
When you’re counting improbable events (p << 1-p), the standard deviation of the number of wins will tend towards sqrt(n) as the number of trials increases.
I'm currently running randomly generated tests to find problems in my design. If I see 5 failures out of 10, I can be reasonably sure that with any other random group of ten tests, I'll still see between 8 and 2 failures. If I only see 1 failure in 100, I need to run 10,000 tests in order to be confident in the underlying failure rate. Only when I've seen at least 10 failures can I have much confidence that my failure wasn't a fluke.
With continuous samples, combining fractions of a degree (particularly if the month to month noise is significant with regard to the sampling resolution) it should be less necessary to take many many measurements – but the statistics is fairly well known, provided you can identify the nature of the variables you are dealing with. You can also make a model fairly simply to determine the outcome of a sample of random runs.
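A quick sketch of that point about rare failures (illustrative numbers only, not from any real test suite): batches of 100 tests at a true 1% failure rate scatter all over the place, while batches of 10,000 pin the rate down to a useful precision.

```python
import random
import statistics

def failures_in_batch(n_tests, p_fail):
    """Number of failures in one batch of randomly generated tests."""
    return sum(random.random() < p_fail for _ in range(n_tests))

# Compare how tightly the observed failure rate is pinned down by
# small batches versus large batches, at a true rate of 1%.
for n in (100, 10_000):
    rates = [failures_in_batch(n, 0.01) / n for _ in range(1_000)]
    print(f"{n:>6} tests per batch: observed failure rate "
          f"{statistics.mean(rates):.4f} +/- {statistics.stdev(rates):.4f}")
```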
Just a couple of other points to ponder…
Again, coming from a geology background in mining and geostatistics: what sort of quality control is there on these instruments? For example, do they need to be regularly calibrated? Are there duplicate measurements collected to assess precision (sampling precision)?
One thing you touched on is the representativeness of the measurements…for me this basically comes down to what’s often referred to as sampling theory…look up Pierre Gy and Francis Pitard if you’d like to investigate the issues of sampling errors and how representative a sample is. Sampling theory is primarily about sampling particulate materials, but I think the basic principles would also apply to sampling water temperature (and atmospheric temperature and CO2 content).
If each measurement was made by a different device (or a random device from the pool of devices) at a random location (spatially & perhaps temporally) this would be true. But if my thermometer (I only have one, you know) is graduated in 1°F, it really doesn’t matter how many times I look at that danged thing, I can’t accurately measure to ±0.01°F.
Blade says:
January 26, 2012 at 11:48 pm
No, he was one of the guys associated with Thorpe.
w.
CLIMATE SCIENCE gone HOMEOPATHIC !
There is a hilarious cartoon from Josh on Tallbloke’s blog.
http://tallbloke.wordpress.com/2012/01/27/gavin-schmidt-climate-homeopathy/#more-4579
I was always sceptical about the ppm science.
Willis,
a simpler way of “dealing” with this problem is this:
we have 335,358,000 sq km of oceans on our planet,
which means that if the buoys were placed at regular intervals there would be:
1 for each 111,785 sq km
the state of Virginia 110,785 sq km
the state of Tennessee 110,910 sq km
or the country of;
Bulgaria 110,910 sq km
Now I am sure that any resident of the above would give you an answer to the problem if there was only ONE temp. recorder in their state or country.
This reminds me of what the late John L Daly wrote:
http://www.john-daly.com/altimetry/topex.htm
TOPEX-Poseidon Radar Altimetry:
Averaging the Averages
“How many stages of statistical averaging can take place from a body of raw data before the statistical output becomes hopelessly decoupled from the raw data which creates it?
Imagine for example getting ten people to take turns at measuring the distance from London to New York using a simple ruler on a large map from an atlas. Each person would give a slightly different reading, perhaps accurate to +/- 10 miles, but if all these readings were averaged, would that make the final resolution of the distance accurate to one mile? Perhaps. But if a thousand people were to do it, would that narrow the resolution to mere yards or metres? Intuitively, we know that would not happen, that an average of the measurements of a thousand people would be little better than an average from ten people.
Clearly, there are limits as to how far statistical averaging can be used in this way to obtain greater resolution. An average of even a million such measurements would be scarcely more accurate than the average of ten, diminishing returns from an increasing number of measurements placing a clear limit on the resolution achievable. The problem lay not in the statistics but in the inherent limitations of the measuring devices themselves (in this case, a simple ruler and map)….”
Great post.
Have they looked at the error tolerances of these 3000 Argo buoys? Each one cannot possibly perform exactly the same under all conditions, it is impossible. Where were they made, under what factory conditions?
Déjà vu: I recall when at college in my yoof doing surveying at Brighton Uni for a week. We used the Kingston College Wild T2 (pronounced Vilt) Total Station theodolite built in Switzerland, there were only two in the UK at the time, one held by the college, the other by what was then the Greater London Council, & a Nikon ?? Total Station theodolite, can’t remember the model number, but it was almost comparable in quality. However that is irrelevant. We were told that we had to be very careful in calculating station data by “face-left” & “face-right” observations, because of the following: the T2 was claimed to be able to read angles to an accuracy of 1 second of arc, it said so in the manual, whereas the Nikon only claimed to be able to read angles to an accuracy of 3 seconds of arc. The Japanese lenses were ground to an accuracy of 1 second of arc, while the T2 lenses were ground to an accuracy of 3 seconds of arc! Therefore one was more reliable than the other, although the other claimed a greater accuracy. You pays your money & takes your choice!
And frankly, if anybody tells me that they can measure the temperature of the Earth, the atmosphere, or the oceans, to an accuracy greater than around a tenth of a degree Celsius, I’d laugh in their face! Prof Paul Jones’ claims that the three warming periods of the last 150 years had warming rates measured to a thousandth of a degree were just ridiculous & could only be derived from arithmetical construction!
You take the depth heated as 1800 m.
Looking at the depth of the thermocline, in the tropics it seems to be more like 500 m or so
that is DIRECTLY heated by incoming solar.
Thermocline depth reduces to 0 m towards the polar circle.
Taking an average depth of, say, 200 m gives a much greater temperature error.
Whilst the ARGO system is good, it is not that accurate. Just look at the figures: there are 335,258,000 sq km of ocean and 3200 buoys, so each buoy is monitoring 104,768 sq km of ocean, which times the depth gives the not inconsiderable volume of 188,582 cu km to look after and get the temperature of. Quite a feat, but not possible to any real sort of accuracy.
Many buoys are free-floating, so they will remain in the same volume of water in a current, which will not be representative of the ocean as a whole.
Assume that the instruments are precise enough to measure a thousandth of a degree. If the temperature of the oceans was the same everywhere, you would need one buoy to take the ocean’s temperature.
Otherwise you would need one buoy in every blob that was a thousandth of a degree different from its neighbour. If there is a twenty degree range, that means twenty thousand buoys, one for each blob. But the blobs are separated into thousands or millions of separate same-temperature blobs. They are separated vertically as well.
So I reckon 20,000 * 1,000 * 1,000 buoys might be getting close.
That’s twenty giga-buoys.
There are problems with the notion of taking a bunch of measurements, and averaging them to get ever greater precision. Unfortunately, I’ve fought that battle for months (years?) and have tired of it. Why? Because there are subtleties to it that folks just do not believe.
The first issue is that SOMETIMES you can do it, and it works fine. Other times, not so much. What goes into which bucket is hard to list / explain… so just ends up with endless bickering of the “does so / does not” kind. (For that reason, I’m going to say what I have to say then simply ignore any followup. I know where it ends and it simply is not worth the time.)
The “just do it” folks all had the statistics class that showed how the deviation of the mean could be lower than the deviation of the values. (I had it too). What they didn’t have (or don’t remember?) was the caveats about it not always being applicable.
So what works?
Well, measure a thing with an instrument. The more times you do it, the closer the average comes to the real deal. Take a piece of paper 11 inches wide. Measure it with a ruler and you may get 10.9 one time, 11.1 the next, and 11.0 two times. Repeated enough, that first 0.1 error will tend to be averaged out by more 11.0 and offset by the same number of 11.1 measurements. This removes the random error in the act of measuring.
Measure it with a different instrument each time and you can remove the random instrument errors.
All well and good.
HOWEVER: The error has to be random, and not systematic.
If I always measure from one side (being right handed and right eyed, for example) and have a systematic parallax error in reading the ruler, I will have a systematic error that can not be removed via averaging. If my ruler is simply wrong, all measurements will be similarly biased.
The requirement of ‘random error’ is often forgotten and the assertion is typically made that the error band on the instrument is known, so there is no systematic error. But if you have a requirement for, say, +/- 0.1 you could easily have, for example, an electronic part that always ages toward the + side, introducing a ++ bias in all measurements, but still being inside the ‘acceptance band’.
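As a small illustration of that distinction, here is a toy sketch (made-up numbers): averaging hammers down the random reading error, but a shared bias rides straight through the average untouched.

```python
import random
import statistics

TRUE_WIDTH = 11.0   # the real width of the sheet of paper, in inches

def one_reading(bias=0.0, noise=0.1):
    """A single ruler reading: truth plus random error plus any fixed bias."""
    return TRUE_WIDTH + random.gauss(0.0, noise) + bias

# Random error only: the average homes in on the true 11.0 inches.
print(round(statistics.mean(one_reading() for _ in range(100_000)), 3))

# Add a fixed +0.05 inch parallax habit: no amount of averaging removes it,
# the whole distribution just sits 0.05 inches too high.
print(round(statistics.mean(one_reading(bias=0.05) for _ in range(100_000)), 3))
```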
And what if you are measuring DIFFERENT things? With DIFFERENT instruments? It is not 1,000 ‘trials’ of the same thing, but 1,000 measurements of 1,000 things with 1,000 mutually variable instruments. Then it isn’t quite so clear….
Each of the 1000 things measured is measured only once. You have 11.x +/- 0.1 on it (say) and that is ALL the precision you have for that THING. Taking 1000 different things and finding the average of their measurements WILL tell you an ever more precise report of that “mathematical average”, but that number will NOT be closer to the actual correct average measurement. Each object had only ONE trial. Has only ONE error band. Was done with ONE instrument with an unknown bias. Again, the problem of systematic error comes into it. You don’t know if 1/2 the people were measuring low by 0.1 and the other half measuring low by 0.8 (so in any case you will report low). You can NOT get to a 0.01 accuracy from averaging a bunch of things that are all more than that much in error to the downside.
Yes, the probability of it is low, but it still exists as a possibility (until it is proven that no systematic error or bias exists – which typically can not be done, as there are ‘unknown unknowns’…). Again, using the same instrument to make 1000 measurements removes the error in the process of measuring (unless there is a systematic bias of the instrument or the observer). Also, using 1000 different instruments averages out randomly distributed instrument errors.
But doing 1000 different measurements with 1000 different instruments: Each measurement has ONE error. Each instrument has ONE bias. You are assuming that these will all just magically ‘average out’ by being randomly distributed and that is not known.
A good example of this is calorimetry. Oddly, that is exactly what folks are trying to do with heat gain of the planet. A very lousy kind of calorimetry. What is THE sin in calorimetry? Screwing around with the apparatus and thermometers once it is running. We were FORBIDDEN to change thermometers mid-run or to move the apparatus around the room. What do we do in climate / temperature measuring? Randomly change thermometers, types, and locations.
ALL of them introducing “splice errors” and other systematic instrument errors that are often unrecognized and uncorrectable. Just the kinds of errors that averaging will NOT remove.
The basic problem is simple, really: If you do multiple trials you can reduce the error as long as the errors are random. If all you have is ONE trial, while you can find an amusing number (the average of the SAMPLE DATA) to a higher precision, that is NOT indicative of a higher accuracy in the actual value. (Due to those potentials for systematic errors).
Now, for temperatures, there is another even worse problem: Intrinsic vs extrinsic properties.
That’s a fancy way to say the air is never the same air twice ( or you can never cross the same river twice). So you can only EVER have a sample size of ONE.
This is often talked about as an ‘entropy’ problem, or a problem with UHI, or with siting, or… but it all comes down to the same thing: Other stuff changes, so the two temperatures are not comparable. Thus not averageable.
One example is pretty clear. If you have 2 pots of water, one at 0 C and the other at 20 C and mix them, what is the resulting temperature?
You can not know.
IFF the two contain water of the same salinity, have the same MASS, and the 0 C water is not ice; then you could say the resulting temperature was 10 C. But without the added data, you simply get a non-sense number if you average those two temperatures. And no amount of average that in with other results can remove that error.
Basically, you can average the HEAT (mass x temperature x specific heat ) adjusted for any heat of vaporization or fusion (melting). But just averaging the temperatures is meaningless BY DEFINITION.
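To put numbers on that two-pot example (a sketch with round figures, ignoring salinity, ice, and evaporation): the naive average of the two temperatures is only right when the masses happen to be equal.

```python
SPECIFIC_HEAT = 4186.0  # J per kg per deg C for plain water

def mixed_temperature(mass1, temp1, mass2, temp2):
    """Final temperature from a simple heat balance (no ice, no evaporation,
    same specific heat for both pots)."""
    total_heat = (mass1 * temp1 + mass2 * temp2) * SPECIFIC_HEAT
    return total_heat / ((mass1 + mass2) * SPECIFIC_HEAT)

print(mixed_temperature(1.0, 0.0, 1.0, 20.0))  # 10.0, same as (0 + 20) / 2
print(mixed_temperature(3.0, 0.0, 1.0, 20.0))  # 5.0, while the naive average is still 10
```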
We assume implicitly that the air temperature is some kind of “standard air” or some kind of “average air”; but it isn’t. Sometimes it has snow in it. The humidity is different. The barometric pressure is different. (So the mass / volume changes).
For Argo buoys, we have ocean water. That’s a little bit better. But we still have surface evaporation (so that temperature does not serve as a good proxy for heat at the surface as some left via evaporation), we have ice forming in polar regions, and we have different salinities to deal with. Gases dissolve, or leave solution. A whole lot of things happen chemically in the oceans too.
So take two measurements of ocean temperature. One at the surface near Hawaii, the other toward the pole at Greenland. Can you just average them and say anything about heat, really? Even as large ocean overturning currents move masses of cold water to the top? As ice forms releasing heat? (Or melts, absorbing it)? How about a buoy that dives through the various saline layers near the Antarctic. Is there NO heat impact from more / less salt?
Basically, you can not do calorimetry with temperature alone, and all of “Global Warming Climate Science” is based on doing calorimetry with temperatures alone. A foundational flaw.
It is an assumption that the phase changes and mass balances and everything else just “average out”, but we know they do not. Volcanic heat additions to the ocean floor CHANGE over time. We know volcanoes have long cycle variation. Salinity changes from place to place all over the ocean. The Gulf Stream changes location, depth, and velocity and we assume we have random enough samples to not be biased by these things.
Yet temperatures can not be simply averaged and say anything about heat unless you know the mass, phase, and other properties have not changed. Yet we know they change.
And no amount of calculating an average removes those errors. Ever.
Some useful info here on why they went for a 3 degree x 3 degree, 3,300-float array in water depths greater than 2,000 meters. They say the floats will not clump; I’m not so sure about that myself.
http://www.argo.ucsd.edu/argo-design.pdf
James says:
January 26, 2012 at 11:27 pm
James, I told the story above in part so people wouldn’t make the mistake you just made, and assume I was some statistics newbie. I know everything you have just said, and more. Do you think I could write a computer program to analyze a million hands and not know about the law of large numbers, and Student’s Distribution, and the Central Limit Theorem, and the like? Maybe somebody might do that for some theoretical problem, but these guys had real money riding on my calculations. I had to know how accurate they were.
There are about 3000 Argo floats out there. Each one makes 3 temperature profiles a month. That’s 108,000 observations per year. From that they get an average, and an error. They are claiming an error of ± 0.004°C, four thousandths of a degree.
Since a decrease by a factor of 100 in N gives us one less decimal, this means that from 1,080 observations we should be able to get an error of ± 0.04, four hundredths of a degree.
So before you lecture me on my supposed lack of knowledge and before claiming that 1,080 measurements is a “small sample”, perhaps you could let me know—is there a problem with my math or not?
All the best,
w.
Willis,
This problem is impossible to solve, and any accuracy claimed makes no sense. If you run a chemical reaction, the first thing you do is stir the solution and achieve maximum mixing to avoid local ‘overheating’. If you then want to achieve a certain temperature inside the flask, you need external heating, but it is almost impossible to control the external heating input to maintain an exact temperature inside the flask. To solve that problem, you find a solvent that boils at the temperature that you need, and by keeping the solvent refluxing inside you hold the temperature at the same point.
When you are working in ‘open systems’, like the oceans or the air temperature, you have all sorts of problems, and the only way to treat the data is to treat each individual buoy as a ‘local’, individual temperature profile. If all the individual profiles point in the same direction, then you have a ‘global’ trend; if not, you don’t. That is the same reason why there is no such thing as a ‘global temperature’, only a network of a huge number of local temperature patterns (we are talking here about air temperature as detected by thermometer devices). Any trend analysis that tries to average all those individual patterns into a single number is a useless exercise, since it has nothing to do with the physical reality – if the physical reality is supposed to be the object of the exercise.
Please remember that the buoys are only the ‘messengers’, whose accuracy we know from the manufacturer, and they record the temperature as they are supposed to do. It is the so-called scientists trying to interpret what the instruments are detecting who are wrong.