Can We Tell If The Oceans Are Warming?

Guest Post by Willis Eschenbach

Well, I was going to write about hourly albedo changes, honest I was, but as is often the case I got sidetracked. My great thanks to Joanne Nova for highlighting a mostly unknown paper on the error estimate for the Argo dataset entitled "On the accuracy of North Atlantic temperature and heat storage fields from Argo" by R. E. Hadfield et al., hereinafter Hadfield2007. As a bit of history, three years ago in a post entitled "Decimals of Precision" I pointed out inconsistencies in the prevailing Argo error estimates. My calculations in that post showed that their claims of accuracy were way overblown.

The claims of precision at the time, which are unchanged today, can be seen in Figure 1(a) below, taken from the paper "Observed changes in top-of-the-atmosphere radiation and upper-ocean heating consistent within uncertainty" by Norman G. Loeb et al., paywalled here, hereinafter Loeb2012.

 


Figure 1. This shows Fig. 1(a) from Loeb2012. ORIGINAL CAPTION: a, Annual global averaged upper-ocean warming rates computed from first differences of the Pacific Marine Environmental Laboratory/Jet Propulsion Laboratory/Joint Institute for Marine and Atmospheric Research (PMEL/JPL/JIMAR), NODC, and Hadley, 0–700m

I must apologize for the quality of the graphics, but sadly the document is paywalled. It’s OK, I just wanted to see their error estimates.

As you can see, Loeb2012 is showing the oceanic heating rates in watts per square metre applied over each year. All three groups report about the same size of error. The error in the earliest data is about 1 W/m2. However, the size of the error starts decreasing once the Argo buoys start coming on line in 2006. At the end of their record all three groups are showing errors well under half a watt per square metre.

 

Figure 2. This shows Fig. 3(a) from Loeb2012. Black shows the available heat for storage as shown by the CERES satellite data. Blue shows heating rates to 1800 metres, and red shows heating rates to 700 metres. ORIGINAL CAPTION: a, Global annual average (July to June) net TOA flux from CERES observations (based on the EBAF-TOA_Ed2.6 product) and 0–700 and 0–1,800m ocean heating rates from PMEL/JPL/JIMAR

Here we see that at the end of their dataset the error for the 1800 metre deep layer was also under half a watt per square metre.

But how much temperature change does that half-watt per square metre error represent? My rule of thumb is simple.

One watt per square metre for one year warms one cubic metre of the ocean by 8°C

(Yeah, it's actually 8.15°C, but I do lots of general calcs, so a couple of percent error is OK for ease of calculation and memory). That means a half watt per square metre for a year is 4°C per cubic metre.

So … for an 1800 metre deep layer of water, Loeb2012 is saying the standard error of their temperature measurements is 4°C / 1800 = about two thousandths of a degree C (0.002°C). For the shallower 700 metre layer, since the forcing error is the same but the mass is smaller, the same error in W/m2 gives a larger temperature error of 4°C / 700, which equals a whopping temperature error of six thousandths of a degree C (0.006°C).
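For those who want to follow the arithmetic, here is a minimal R sketch of that conversion (just the rounded rule of thumb applied to the claimed half-watt error; the variable names are my own):

rule_of_thumb = 8                    # degrees C per cubic metre, for 1 W/m2 applied for one year

flux_error = 0.5                     # claimed error at the end of the record, W/m2

flux_error * rule_of_thumb / 1800    # 0-1800 metre layer: about 0.0022 degrees C

flux_error * rule_of_thumb / 700     # 0-700 metre layer: about 0.0057 degrees C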

I said at that time that this claimed accuracy, somewhere around five thousandths of a degree (0.005°C), was … well … highly unlikely.

Jo Nova points out that, curiously, although the paper was written in 2007, it got little traction at the time and has gotten little since. I certainly hadn't read it when I wrote my post cited above. The following paragraphs from their study are of interest:

 

ABSTRACT:

Using OCCAM subsampled to typical Argo sampling density, it is found that outside of the western boundary, the mixed layer monthly heat storage in the subtropical North Atlantic has a sampling error of 10–20 Wm−2 when averaged over a 10° x 10° area. This error reduces to less than 10 Wm−2 when seasonal heat storage is considered. Errors of this magnitude suggest that the Argo dataset is of use for investigating variability in mixed layer heat storage on interannual timescales. However, the expected sampling error increases to more than 50 Wm−2 in the Gulf Stream region and north of 40°N, limiting the use of Argo in these areas.

and

Our analysis of subsampled temperature fields from the OCCAM model has shown that in the subtropical North Atlantic, the Argo project provides temperature data at a spatial and temporal resolution that results in a sampling uncertainty in mixed layer heat storage of order 10–20 Wm−2. The error gets smaller as the period considered increases and at seasonal [annual] timescales is reduced to 7 ± 1.5 Wm−2. Within the Gulf Stream and subpolar regions, the sampling errors are much larger and thus the Argo dataset will be less useful in these regions for investigating variability in the mixed layer heat storage.

Once again I wanted to convert their units of W/m2 to a temperature change. The problem I have with the units many of these papers use is that "7 ± 1.5 Wm−2" just doesn't mean much to me. In addition, the Argo buoys are not measuring W/m2; they're measuring temperatures, which are then converted to W/m2. So my question upon reading the paper was, how much will their cited error of "7 W/m2" for one year change the temperature of the "mixed layer" of the North Atlantic? And what is the mixed layer anyhow?

Well, they've picked a kind of curious thing to measure. The "mixed layer" is the top layer of the ocean that is mixed by both the wind and by the nightly overturning of the ocean. It is of interest in a climate sense because it's the part of the ocean that responds to the changing temperatures above. It can be defined numerically in a number of ways. Basically, it's the layer from the surface down to the "thermocline", the point where the ocean starts cooling rapidly with depth. Jayne Doucette of the Woods Hole Oceanographic Institution has made a lovely drawing of most of the things that go on in the mixed layer. [For unknown reasons she's omitted one of the most important circulations, the nightly overturning of the upper ocean.]

 

Figure 3. The mixed layer, showing various physical and biological processes occurring in the layer.

According to the paper, the definition that they have chosen is that the mixed layer is the depth at which the ocean is 0.2°C cooler than the temperature at ten metres depth. OK, no problem, that’s one of the standard definitions … but how deep is the mixed layer?
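As an aside, that definition is simple enough to apply to a single temperature profile. Here is a toy R sketch (illustration only; a real calculation would interpolate to the exact crossing depth and deal with gaps in the profile):

# depth of the mixed layer: shallowest depth where the water is 0.2°C cooler than at 10 metres

mld_from_profile = function(depth, temp, dT = 0.2, ref_depth = 10) {
  t_ref = approx(depth, temp, xout = ref_depth)$y      # temperature at the 10 metre reference depth
  below = which(depth > ref_depth & temp <= t_ref - dT)
  if (length(below) == 0) return(NA)                   # criterion never met in this profile
  min(depth[below])
}

# toy profile, returns 60 (metres)

mld_from_profile(depth = c(0, 10, 20, 40, 60, 80, 100),
                 temp = c(20, 20, 19.95, 19.9, 19.4, 17, 15))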

Well, the problem is that the mixed layer depth varies by both location and time of year. Figure 4 shows typical variations in the depth of the mixed layer at a single location by month.

 

Figure 4. Typical variations of the depth of the mixed layer by month. Sorry, no provenance for the graph other than Wiki. Given the temperatures I'm guessing North Atlantic. In any case, it is entirely representative of the species.

You can see how the temperature is almost the same all the way down to the thermocline, and then starts dropping rapidly.

However, I couldn’t find any number for the average mixed layer depth anywhere. So instead, I downloaded the 2°x2° mixed layer depth monthly climatology dataset entitled “mld_DT02_c1m_reg2.0_Global.nc” from here and took the area-weighted average of the mixed layer depth. It turns out that globally the mixed layer depth averages just under sixty metres. The whole process for doing the calculations including writing the code took about half an hour … I’ve appended the code for those interested.

Then I went on to resample their 2°x2° dataset to a 1°x1° grid, which of course gave me the same answer for the average, but it allowed me to use my usual graphics routines to display the depths.

 

Figure 5. Average mixed layer depth around the globe. Green and blue areas show deeper mixed layers.

I do love climate science because I never know what I'll have to learn in order to do my research. This time I've gotten to explore the depth of the mixed layer. As you might imagine, in the stormiest areas the largest waves mix the ocean to the greatest depths, which are shown in green and blue. You can also see the mark of the El Nino/La Nina along the Equator off the coast of Ecuador. There, the trade winds blow the warm surface waters to the west, and leave the thermocline closer to the surface. So much to learn … but I digress. I could see that there were a number of shallow areas in the North Atlantic, which was the area used for the Argo study. So I calculated the average mixed layer depth for the North Atlantic (5°N-65°N, 90°W-0°). This turns out to be 53 metres, about seven metres shallower than the global average.

Now, recalling the rule of thumb:

One watt per square metre for one year raises one cubic metre of seawater about eight degrees.

Using the rule of thumb with a depth of 53 metres, one W/m2 over one year warms a 53-metre-deep column of seawater (the mixed layer depth) by about 8/53 ≈ 0.15°C. However, they estimate the annual error at seven W/m2 (see their quote above). This means that Hadfield2007 are saying the Argo floats can only determine the average annual temperature of the North Atlantic mixed layer to within plus or minus 1°C …
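In R, that conversion looks like this (a quick sketch using the 53 metre average mixed layer depth calculated above):

flux_error = 7               # Hadfield2007 annual sampling error, W/m2

mld_depth = 53               # average North Atlantic mixed layer depth, metres

8 / mld_depth                # about 0.15 degrees C per W/m2 per year

flux_error * 8 / mld_depth   # about 1.06 degrees C, i.e. roughly plus or minus 1°C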

Now, to me that seems reasonable. It is very, very hard to accurately measure the average temperature of a wildly discontinuous body of water like oh, I don’t know, say the North Atlantic. Or any other ocean.

So far, so good. Now comes the tough part. We know that Argo can measure the temperature of the North Atlantic mixed layer with an error of ±1°C. Then the question becomes … if we could measure the whole ocean with the same density of measurements as the Argo North Atlantic, what would the error of the final average be?

The answer to this rests on a curious fact—assuming that the errors are symmetrical, the error of the average of a series of measurements, each of which has its own inherent error, is smaller than the average of the individual errors. If we are averaging N items, each of which has the same error E, the error of the average scales as

sqrt(N)/N

So for example if you are averaging one hundred items each with an error of E, your error is a tenth of E [ sqrt(100)/100 ].

If the N errors are not all equal, on the other hand, then what scales as sqrt(N)/N is not the error E but

sqrt(E^2 + SD^2)

where SD is the standard deviation of the errors.
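In R, that scaling can be written as a one-line function (a sketch, using the notation above):

# error of the average of N measurements with mean error E and error standard deviation SD

avg_error = function(N, E, SD = 0) sqrt(N) / N * sqrt(E^2 + SD^2)

avg_error(100, 1)    # equal errors: one tenth of E, as in the example above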

Now, let’s assume for the moment that the global ocean is measured at the same measurement density as the North Atlantic in the study. It’s not, but let’s ignore that for the moment. Regarding the 700 metre deep layer, we need to determine how much larger in volume it is than the volume of the NA mixed layer. It turns out that the answer is that the global ocean down to 700 metres is 118 times the volume of the NA mixed layer.

Unfortunately, while we know the mean error (7 W/m2 = 1°C), we don't know the standard deviation of those errors. However, they do say that there are many areas with larger errors. So if we assumed something like a standard deviation of say 3.5 W/m2 = 0.5°C, we'd likely be conservative; it may well be larger.

Putting it all together: IF we can measure the North Atlantic mixed layer with a mean error of 1° C and an error SD of 0.5°C, then with the same measurement density we should be able to measure the global ocean to

sqrt(118)/118 * sqrt( 1^2 + 0.5^2 ) ≈ 0.1°C
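Or, checking the arithmetic with the avg_error() sketch given above:

avg_error(118, 1, 0.5)   # about 0.103, call it 0.1 degrees C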

Now, recall from above that Loeb2012 claimed an error of something like 0.005°C … which appears to be optimistic by a factor of about twenty.

And my guess is that underestimating the actual error by a factor of 20 is the best case. I say this because they've already pointed out that "the expected sampling error increases to more than 50 Wm−2 in the Gulf Stream region and north of 40°N". So their estimate doesn't even hold for all of the North Atlantic.

I also say it is a best case because it assumes that a) the errors are symmetrical, and that b) all parts of the ocean are sampled with the same frequency as the upper 53 metres of the North Atlantic. I doubt that either of those is true, which would make the uncertainty even larger.

In any case, I am glad that once again, mainstream science verifies the interesting work that is being done here at WUWT. If you wonder what it all means, look at Figure 1, and consider that in reality the error bars are twenty times larger … clearly, with those kinds of errors we can say nothing about whether the ocean might be warming, cooling, or standing still.

Best to all,

w.

PS: I’ve been a bit slow writing this because a teenage single mother and her four delinquent children seem to have moved in downstairs … and we don’t have a downstairs. Here they are:

CUSTOMARY REQUEST: If you disagree with someone, please quote the exact words you find problems with, so that all of us can understand your objection.

CODE: These days I mostly use the computer language “R” for all my work. I learned it a few years ago at the urging of Steve McIntyre, and it’s far and away the best of the dozen or so computer languages I’ve written code in. The code for getting the weighted average mixed layer depth is pretty simple, and it gives you an idea of the power of the language.

# specify URL and file name -----------------------------------------------

mldurl="http://www.ifremer.fr/cerweb/deboyer/data/mld_DT02_c1m_reg2.0.nc"

mldfile="Mixed Layer Depth DT02_c1m_reg2.0.nc"

# download file -----------------------------------------------------------

download.file(mldurl,mldfile,mode="wb") # binary mode, so the netCDF file is not corrupted on download

# extract and clean up variable ( 90 rows latitude by 180 columns longitude by 12 months)

library(ncdf) # the "ncdf" package provides open.ncdf() and get.var.ncdf(); install it first if needed

nc=open.ncdf(mldfile)

mld=aperm(get.var.ncdf(nc,"mld"),c(2,1,3)) #the “aperm” changes from a 180 row 90 col to 90 x 180

mld[mld==1.000000e+09]=NA # replace missing values with NA

# create area weights ------------(they use a strange unequal 2° grid with the last point at 89.5°N)

latline=seq(-88,90,2)

latline[90]=89.5

latline=cos(latline*pi/180)

latmatrix2=matrix(rep(latline,180),90,180)

# take array gridcell averages over the 12 months 

mldmap=rowMeans(mld,dims = 2,na.rm = T)

dim(mldmap) #checking the dimensions of the result, 90 latitude x 180 longitude

[1]  90 180

# take weighted mean of gridcells 

weighted.mean(mldmap,latmatrix2,na.rm=T)

[1] 59.28661
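And for the North Atlantic figure quoted in the post, here is a possible continuation of the code above (a sketch; I'm assuming the coordinate variables in the file are named "lat" and "lon", and allowing for either a 0-360 or a -180 to 180 longitude convention):

# subset to the North Atlantic box used in the post (5N-65N, 90W-0)

lats = get.var.ncdf(nc,"lat") # assumed coordinate variable names

lons = get.var.ncdf(nc,"lon")

lons = ifelse(lons > 180, lons - 360, lons) # put longitudes on -180..180 if stored as 0..360

namask = outer(lats >= 5 & lats <= 65, lons >= -90 & lons <= 0, "&") # 90 x 180 logical mask

weighted.mean(mldmap[namask], latmatrix2[namask], na.rm=T) # should come out near 53 metres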


265 thoughts on "Can We Tell If The Oceans Are Warming?"

  1. Just so we're all on the same page: If their sampling errors are different at different places then we are no longer talking about random variances in the production of the floats. We are talking about variations in the water temperature that aggravate the float. That is, the floats' accuracy is vastly worse than stated under real-world conditions. And we're only getting under 50 W/m^2 based on 'unfavourable' local conditions.
    So we can’t take the float fleet as a set and reduce their errors based on the multiplicity of them existing in different places and measuring different depths at different times. That’s the no brainer. But because it is also the case that it is the temporally changing environmental conditions of a given float that are inducing these errors, then we cannot even average down the errors for a single float. Not based on the design and calibration of the float. To do so we would necessarily have to have the same float making repeat measurements at depth in the same conditions. And we simply do not have that occurring.
    So if we're talking about a design tolerance — appropriately converted however — of 7.5 W/m^2 then it says nothing about the error under real-world conditions. If I assume the various floats are being audited and verified for sanity checks, then we cannot state that any float has a better known accuracy of 50 W/m^2 until and unless it has been independently validated.
    But since this is all purely observational skew — and thus purely correlative — it does not establish that we will continue to have any given audited amount of error at a specific location. To know that we’d have to know the conditions that lead to these errors and then go through the model, experiment cycle until we knew what we had to know.
    Even if that was simply that we needed to build new buoys with additional instrumentation installed such that we could overcome the errors being induced.

    • I was quite disappointed when I saw this article. The whole thing is based on a fundamental error. You simply cannot use sqrt(N)/N to calculate the error, as described by numerous comments here. This should be crystal clear for everyone.
      I have been wondering why sceptics have swallowed the claimed error of 0.005 C in ARGO data so easily. And I must say I am not satisfied to find out that the reason is that even prominent sceptics have no basic knowledge of error calculations.
      The good thing is it is easy to make things better. Now you should start asking for proper error estimates for ARGO and any other data sets. If you don't know the error of your measurements, your data is pretty much useless. For Argo data you can do the calculations yourself ( or maybe better hire a professional ) since raw data is available.

      • In reply to Naalin Ana's "( or maybe better hire a professional ) since raw data is available":

        Hire a professional? Maybe a professional might offer to help Pro Bono, as I don’t think Willis is getting paid, I know I’m not getting paid.

  2. I tried to execute your R code but the statement nc=open.ncdf(mldfile)
    produces the following error:
    Error in R_nc_open: Invalid argument
    Error in open.ncdf(mldfile) :
    Error in open.ncdf trying to open file Mixed Layer Depth DT02_c1m_reg2.0.nc
    I thank you in advance for your suggestions.

    • You should verify that the file, mldfile, was properly downloaded from the url, mldurl.

    • I think the file needs to be downloaded in binary mode
      download.file(mldurl,mldfile,mode="wb")
      I think it also needs a
      library("ncdf")
      command (you must have figured that). And of course, you need to have the ncdf package installed.

      • Thanks for the clarification, Nick. Regarding the download, if it won’t download for some reason you can always just download it manually.
        And yes, you do need the
        library(ncdf)
        for it to work.
        w.

  3. Willis
    Always a joy to read your independent approach. After reading Jo Nova's post, I checked out the 9 papers that cited Hadfield. One of them had its own estimate of the variance, 0.05 (°C)^2, the square root of which works out to about 0.22 °C, not far from your 0.1 °C. Another paper referenced by one of the 9 was very close to the Hadfield estimate of about 0.5 °C, at 0.48 °C. So we have three papers and now your estimate, all in the range of 0.1-0.5 °C.
    If readers are interested, I made the two papers available on the public Dropbox link (see my comment at Jo Nova–#39).

    • Thanks, Lance. Let me note that the “Hadfield estimate of about 0.5C” actually is the following:

      Agreement between the hydrographic and Argo-based temperature fields to within 0.5°C was typically found in the eastern basin with higher differences in the western basin, particularly within the boundary current where errors exceed 2°C.

      In fact, although both you and Jo Nova seem to think that the Hadfield estimate is 0.5°C, that actually was not their estimate of the error in the Argo temperature field. Instead, it is a somewhat vague description of the difference between their cruise transect data and the Argo data for the same line of locations.
      To get the error for an actual volume you need to take their results (which only relate to a single line through the ocean) and expand them to the 3-D mixed layer. The results of that analysis were reported by the authors as follows:

      Using OCCAM subsampled to typical Argo sampling density, it is found that outside of the western boundary, the mixed layer monthly heat storage in the subtropical North Atlantic has a sampling error of 10–20 Wm−2 when averaged over a 10° x 10° area. This error reduces to less than 10 Wm−2 when seasonal heat storage is considered.

      They later specify more exactly that the annual sampling error is 7 W/m2.
      As far as I can see this is the only area-averaged error data in the study. It is well reported in that they’ve given the sampling error, the values for two different time frames (monthly and annual), and the area involved. Once I calculated the depth, it allowed me to convert their error (7 W-years/m2) into a temperature error of 1°C. Note that this 1°C is different from your reported “Hadfield estimate of about 0.5C” because the 1°C is measuring a different thing. It is measuring the annual error in a 10°x10°x53 metres block of the North Atlantic.
      Finally, let me say that the claim that there is an “Argo error” of a certain size is inadequately specified. As my head post shows, the same density of measurements leads to a 1°C error in the North Atlantic mixed layer, but would give a global 0-700 metre error of about a tenth of that. Which one of those is the “Argo error”, 1°C or 0.1°C?
      Without specifying the area (global or some defined sub-area) and the depth (0-700 metres, 0-53 metres, or …) and the time interval (month, year), the error is inadequately defined, and as such is without real meaning.
      As a result, I would say that your claim over at Jo Nova’s that

      So I think you can triple the number of papers reporting on errors in ARGO, and they actually are not far apart: 0.5, 0.48, 0.22.

      is comparing apples, oranges, and potatoes …
      Best regards,
      w.

      • “Now, recall from above that Loeb2012 claimed an error of something like 0.005°C … which appears to be optimistic by a factor of about twenty.”
        Because the matter has been raised before about the ability of the ARGO instrument itself, I can report the following:
        Yesterday I spent a few hours calibrating Resistance Temperature Detectors (RTDs) and observed their drift relative to each other in a bucket of room temperature water. These are high-end 4-wire RTDs and are being read using an instrument that reads 0-100.0000 millivolts. The RTD is the type of device that is in the ARGO. The claimed 1-year best range accuracy is 0.06 C. The repeatability is, I think, within 0.03 C.
        Findings:
        The two RTDs I tested were initially different by 0.105 degrees, for which I entered an offset correction to one to bring them in line. The variability, one relative to the other, was almost always within 0.004 Deg C. That supports the contention that they are accurate to 0.01 degrees and undermines the claim that anything floating around in the ocean can read to a precision of 0.002, which is the minimum requirement to secure a believable precision of 0.005 (assuming they are calibrated). No way. First, the devices are not precise enough, and second, they are not accurate enough. The ARGO instrumentation is not claimed to be more precise than 0.01 and the 1-year accuracy is probably not better than 0.06.
        A note about the claims that adding up a lot of measurements and calculating the average 'increases the accuracy'. That only applies to multiple measurements of the same thing. The ARGO Floats are not measuring the same thing, they measure different things each time. Each measurement stands alone with its little 0.06 error bar attached. If there were thousands of measurements of the same location, the centre of the error band is known with great precision, but the error range is inherent to the instrument. It remains 0.06 degrees for any and all measurements. Knowing where the middle of the error bar is does not tell us where within that range the data point really lies.
        And speaking of lies: “…an error of something like 0.005°C.”

      • “A note about the claims that adding up a lot of measurements and calculating the average ‘increases the accuracy’. That only applies to multiple measurement of the same thing.”
        Exactly right!
        There was a discussion a few months ago in which I seemed to be the only one who was willing to recognize that measurements taken at different times and places cannot give an accuracy, when averaged, greater than the accuracy of each reading.
        This can be easily proven by a simple thought experiment.
        And assuming that disparate measurements can be treated using the same statistical methods as repeat measurements of the same thing seems to be done over and over again in climate science.
        The entire subject of precision and accuracy is played fast and loose in climate studies.

      • Crispin: There are climatologists implying (without saying it openly) that if you measure a temperature with 1,000 thermometers with an error bound of 1 degree you get an overall accuracy of 0.001 degree.
        Climatology is definitely a post-normal science.

      • Curious George:
        Well, what they really mean is that they know, using statistical analysis, where the middle of the error range is. The innumerate public upon which CAGW relies has little idea about things like this. If the measuring device has an error of +-1 degree, each measurement is known to +- 1 degree. The reason we can’t know the ocean temperature to 0.005 C is that the ARGO measurements are not made in the same locale.
        I use computer-logged scales a lot. Let's say we know the number is going to be transmitted from the read head to the nearest gram and that it is varying – like the "Animal" function of a platform scale which can weigh a live, moving animal. We read it 100 times per second and average the readings. That is a completely different 'thing' from getting 100 measurements from 100 scales with 100 different objects on them. Remember these ocean temperatures are used to calculate the bulk heat content, not the 'average temperature'. Finding the average temperature would require making well-spaced measurements in 3D. Is anyone claiming to have done that? Do they average the temperatures first or calculate the heat content per reading? It matters. Volume +-1% x temp +-1% cannot give a heat content +-0.1%. Adding up 1000 such results does not give a total +-0.1%.
        Suppose the scale I am reading 100 times is actually jiggling slightly and the voltage from the sender is slightly variable (because I am reading the signal more precisely, ‘down in the mud’). Getting 40 readings of 1000 g and 60 readings of 1000.1 g from a 1 g resolution scale tells me that the mass is very likely to be 1000.6 g with a high level of confidence (a real confidence, not an IPCC opinion). The read head does not have to be set up to give one gram readings to do this. The numbers might only be ‘certifiable’ at a 5 g resolution, but if I have access to the raw voltage such little sums can be done. The ‘certified’ accuracy of 5 g is for every single measurement reported. That is a very different situation. I want many ‘opinions’ of a single mass, or a varying mass or single temperature and then will use normal methods to calculate the centre point of the range and a StD and CoV.
        In order to take ocean measurements and use them, it must be remembered that each reading ‘stands alone’. There is no way to get even a second opinion of each. Each must therefore be treated according to their certified accuracy (akin to the 5 g certification) and used to calculate the heat content of the local water.
        Yes they are measuring ‘the same ocean’ but the numbers are used to show local variation. Averaging them and claiming both to have considered local spatial variation and to have improved the accuracy is ‘not a valid step’.
        There are lots of analogies. Measure the temperature in 1000 homes within 1 degree. What is the average temperature in all homes and what is the accuracy of that result? If you measured 5000 homes to within 5 degrees, do you get the same results? Now measure 100 randomly selected homes to within 1 degree and estimate the temperature of the other 900. How good are those numbers? +-0.1 degrees? I think not.
        Lastly, use 4000 available numbers not randomly distributed (ARGO floats) and estimate the temperature at all other possible positions a float might be located. Calculate the bulk temperature of the ocean. What is the result and what is the accuracy? Ten times better than any one reading? Calculate the heat content of the ocean. Ditto.
        Lovely sources: Darrell Huff, How to Lie With Statistics (W.W. Norton, 1954), Chapter 4: “Much Ado about Practically Nothing”, pp. 58-59 which is quite apropos. I found it at the webpage
        http://www.fallacyfiles.org/fakeprec.html which cites T. Edward Damer, Attacking Faulty Reasoning: A Practical Guide to Fallacy-Free Arguments (Third Edition) (Wadsworth, 1995), pp. 120-122: “Fallacy of Fake Precision”
        and
        David Hackett Fischer, Historians’ Fallacies: Toward a Logic of Historical Thought (Harper & Row, 1970), pp. 61-62: “The fallacy of misplaced precision”.
        Alias: Fake Precision / False Precision / Misplaced Precision / Spurious Accuracy
        Taxonomy: Logical Fallacy > Informal Fallacy > Vagueness > Overprecision

  4. Your quandary with the standard deviation of the mean error (“Unfortunately, while we know the mean error (7 W/m2 = 1°C), we don’t know the standard deviation of those errors.”) prompted me to search in vain for any association between sea surface temperatures, errors, and the Poisson statistical distribution.
    http://ds.data.jma.go.jp/tcc/tcc/library/MRCS_SV12/figures/2_sst_norm_e.htm
    http://www.pmel.noaa.gov/pubs/outstand/haye1160/surfaceh.shtml
    "The mean latent heat flux is approximately 45 W m−2, which again agrees with Reed's result, and the standard deviation is about 12 W m−2."
    There was a method to my madness and perhaps a more exhaustive search might be fruitful. The Poisson distribution has an interesting property that its variance is equal to its mean. And standard deviation is the square root of variance. With the Poisson distribution, if you know the mean, you know the standard deviation.
    Also, a crawl space is a nice enough “downstairs” for your house guests, not much more annoying than bats upstairs in a cabin I rented.

    • Neil, you say:

      There was a method to my madness and perhaps a more exhaustive search might be fruitful. The Poisson distribution has an interesting property that its variance is equal to its mean. And standard deviation is the square root of variance. With the Poisson distribution, if you know the mean, you know the standard deviation.

      Dang, I'd forgotten that. The distribution of errors is at least pseudo-Poisson. So in temperature terms a variance of 1 would be the best guess, and of course it would also be the standard deviation (1°C rather than the 0.5°C I assumed). That would increase my estimate by sqrt(2)/sqrt(1.25), which is about 25% larger.
      w.

      • you could randomly sample Argo and very likely arrive at the normal distribution due to the CLT.
        for example, rather than trying to average every argo in a grid, then averaging all the grids, sample them randomly instead. Pick the grids randomly and pick a float randomly in each grid. Repeat for each time slice (perhaps hourly?). Then average these to compute your trends.
        what you will have afterwards should be normal, allowing you to do all sorts of argo analysis that climate science hasn’t yet dreamed of. And more importantly, your average should be much more reliable, as should your variance.
        averaging averages is statistical nonsense dreamed up by climate scientists. it smears the average and hides the variance.

      • @ferd
        I don’t think it’s as easy as you make it sound. As I recall, the floats run their measurement cycle only 3 times per month (actually every 10 days). You are unlikely to get much in the way of coordinated measurement, and certainly not hourly.

    • Crispin…
      I agree with you. Multiple measurements increasing the accuracy only applies when you are measuring the same thing! So the only thing that is varying is the measurement, not what is being measured.

  5. So if I am interpreting correctly, the error in measurement of temperatures from Argo is about 1°C and the estimated change in sea water temperature is on the order of about 0.1°C, so we can say nothing about the average temperature of the upper sea layer ~60 m deep worldwide. So we don't know whether the ocean in that mean depth of water is increasing, decreasing, or not changing.
    Thus, we don't know how the temperature of the upper ocean waters behaves, and we will not until the uncertainty of the temperature measuring device(s) has been reduced by an order of magnitude or more and we have measured a sufficient number of years at a spatial density sufficient to compute an average temperature change of sea water to ~60 m in all the seawater around the world.

  6. My understanding is that “sqrt(N)/N” only applies where the data is homogeneous. That is, it would apply if multiple measurements were being made of the temperature of one sample (say a cubic metre) of ocean. It does not apply where data is heterogeneous, such as in multiple measurements of entirely different bits of ocean. If it did apply we could have millions of boat owners stick a finger in the water and with enough measurements we’d get the error of finger-temperature-estimates down to thousandths of a degree.

    • I would expect that any single reported measurement was actually multiple samples taken over a short period of time, like a few hundred to thousands over a 1 sec period (actually 1 Hz). But you would need to look at the ARGO design spec to know exactly what that was.
      At the point where you want a measurement, it is easy enough to take a bunch of samples for your measurement, and the design team would have to know about the issues with a single sample, and account for it (well they might not, but boy would that be dumb).
      I found this spec for one of the buoy designs.

      1Hz sample rate, averaged over 1-2 dBar, 4.3Kbytes/profile (ARGO program requirements)

    • I think it is sufficient that the measurand is well defined and that it can be characterized by an essentially unique value. Ref. ISO Guide to the expression of uncertainty in measurement. Section 1.2 and 4.2.3.
      I further believe that the average temperature of a defined volume of water over a defined period is sufficiently well defined in this respect. The sampling should be random and representative of the volume and the period. The standard uncertainty of the average value can then be calculated as the standard deviation of all your measurements divided by the square root of the number of measurements.
      This however does not provide you any information about the variability of your measurand over time, or the average value for another volume. The only thing it provides you information about is the standard uncertainty of the average value for the defined volume over the defined period. If you randomly divide your measurements into two groups, and calculate the average value and the standard uncertainty for each of these two groups, the standard uncertainty of the average value provides you information about how large a difference you can expect between the two average values.

  7. Yes, the water is moving. The floats are moving as well. I do not see how an average latitude bias would not, over time, be introduced.
    Also I do not see how there would not be a tendency for the floats to gather in certain ocean currents and locations; also potentially producing a systemic bias.
    Are either of these factors adjusted for, and how in the hell could one do that adjustment?

    • Another good question, to me …
      Well, from my point of view an error bar under those conditions is an underestimated error bar.
      But "we don't know" seems to be unacceptable for some reason.

      • Indeed. Of course, fortunately, our land-based stations are not moving nearly as much.
        So, in our infinite wisdom (sarc), we change stations, eliminate long-running stations, and homogenize data up to 1200 km away.

      • Thanks Willis, yet your answer confuses me;
        ———-
        “Here’s an analysis I did a couple of years ago showing the number of samples per 100,000 sq. km. over the duration of the Argo program.”
        =========================
        Is this the total measurements over the duration of the program, and if so, how does that illustrate how they have changed relative to time, i.e. the first year versus the last year?

      • David A, that’s total Argo profiles up to the date of the analysis, from memory 2013. It says nothing about how distribution has changed over time.
        w.

      • “As you can see, the distribution is pretty even.”
        I suppose this depends on what one considers “pretty even”.
        Anywhere between 0 and >96 measurements per 10k sq.km.?
        And large patches and latitudes showing a lot of clumpiness.
        And says nothing about temporal distribution.
        The harder I look, the worse it seems.
        In fact, I can honestly say IT IS WORSE THAN I THOUGHT!
        Loving them satellites more every day.

      • Willis,
        Would it perhaps be possible to extract an independent estimate of the accuracy by looking at ARGO float pairs that happen to be in each other’s neighborhood? It’d be interesting to see one of your famous scatter plots of average (squared) temperature difference versus distance.
        Frank

      • Willis,

        Would it perhaps be possible to extract an independent estimate of the accuracy by looking at ARGO float pairs that happen to be in each other’s neighborhood? It’d be interesting to see one of your famous scatter plots of average (squared) temperature difference versus distance.
        Frank

        Interesting question, but I don’t think you’d get accuracy out of it. Instead, you’d just be measuring the “correlation distance”, the distance at which the average correlation of temperatures drops below some given level.
        w.

  8. Willis.
    Off topic sorry, but in your albedic meanderings, you mention albedo decreasing with temperature up to 26C, what would be the mechanism for this decrease?

    • One factor would be cloudiness. On average cold seas have more clouds than warm ones (the ITCZ is an exception).
      Another would be the amount of nutrients in the water. Warm water can hold less nutrients in solution than cold. This means much fewer pelagic organisms and consequently clearer, more translucent water that absorbs more light.
      Many people think that warm tropical seas are biologically rich. They are not, they are biological deserts. There are parts of the South Central Pacific where the particle count is lower than in lab quality distilled water.

      • Thank you tty.
        I thought it must be something like less clouds, but I hadn't thought about the clearer water, and I should've, because I came from the UK and now live in the Philippines.

      • "Murky" water would have higher albedo. Small particles scatter light, and part of it is scattered back towards the surface/sky. And if you have practical experience of the sea you will have noticed that tropical waters are deep blue while oceans at higher latitudes are greenish and distinctly lighter in color.
        Backscattering is lowest in the tropics as you can see here:
        http://oceancolor.gsfc.nasa.gov/cgi/l3
        However it is true that this effect is relatively small at lower latitudes. ERBE showed that the ocean surface albedo only varies between about 0.08-0.13 at low latitudes and only rises above 0.20 north of 50 N and south of 60 S (=south of the Antarctic convergence):
        http://www.eoearth.org/view/article/149954/

      • “Another would be the amount of nutrients in the water. Warm water can hold less nutrients in solution than cold.”
        I believe this is incorrect.
        For solids and liquids, solubility in water generally increases with temperature.
        For gasses, it decreases.
        I believe cold water is often more nutrient laden because it often is the result of upwelling of deep water which has been enriched in nutrients for reasons other than temperature.

      • “There are parts of the South Central Pacific where the particle count is lower than in lab quality distilled water.”
        I think if this is the case you need to find a new source of distilled water.

      • One thing I have noted is that, for the Northern Hemisphere, colder periods coincide with increased plankton production, which in turn increases the biomass of all fish species in the relevant areas. The gadoid outburst is one such episode. These cold periods obviously follow warm periods, so the question I have is: how much energy can massive increases in plankton production consume?

  9. I’m a lazy fellow at heart and one of the laziest pastimes I have is reading. I love reading science papers, posts and research, because of all the grunt work, measurements, searching for data and stuff that you scientists do. It’s something I could never be bothered to get off my arse to do myself. It’s also why I smile when a warmist asks me “are you a scientist?” Why would I go to that amount of trouble when there are so many people running around measuring and collecting data for me?
    Being a lazy person, I also have an incredible ability to cut through the superfluous and see the fastest and easiest ways to an outcome.
    So onto the topic of oceans. What percentage of water in the ocean is being measured? Answer, a tiny fraction. As a lazy person that’s all I need to know, but hey, if it makes a scientist feel good about themselves and gives purpose to their lives to run around like a headless chicken claiming this or that about its temperature, who am I to spoil their game?
    Is there any information that I can look at that can give me an idea of temperature? As far as I can tell, there is only one. It's quick and easy and suits lazy people to a tee! It's also the only thing that has relevance to the global warming scaremongering. That is global sea ice. A lazy man like me can take in the info at a glance of a couple of pictures and go "nope, the ocean isn't warming". End of story, no need to send me a cheque!

      • “Sea level is probably a better indication of ocean temperature, if you can remove all the complications associated with it.”
        Piffle.
        There are so many things that affect sea level besides temp that it would be like looking for a needle in an ocean of needles.

  10. Tony :June 6, 2015 at 11:21 pm

    My understanding is that “sqrt(N)/N” only applies where the data is homogeneous. That is, it would apply if multiple measurements were being made of the temperature of one sample (say a cubic metre) of ocean. It does not apply where data is heterogeneous, such as in multiple measurements of entirely different bits of ocean.

    This is an important point and has been discussed before in relation to the claimed 0.005 K accuracy of ARGO measurements.
    If you measure the same thing 100 times you divide the uncertainty by 10 ( with all assumptions about the nature of the errors etc. )
    The fundamental folly is that 3000 ARGO measurements are NOT 3000 measurements of the same thing: the global average temp. or whatever, because an average is a statistic itself, not a measurable quantity.
    BTW Willis, the Met Office have just released the NMAT2 data that Karl et al used to remove the hiatus.
    http://www.metoffice.gov.uk/hadobs/hadnmat2/data/download.html
    Unfortunately for the moment it is only in the less accessible NetCDF format.
    Since you are proficient in R maybe you could extract something more useful.
    It is to the credit of the M.O. that they have been responsive to demands for the data but it is not yet provided in the more readily accessed ASCII formats like their other datasets.
    Neither is there any global time series that can be compared to Karl’s manipulations.
    http://www.metoffice.gov.uk/hadobs/hadnmat2/data/download.html

    • Thanks Mike.
      I assume that based on: “We know that Argo can measure the temperature of the North Atlantic mixed layer with an error of ±1°C.” … then instead of “…then with the same measurement density we should be able to measure the global ocean to … 0.1°C ” … it should be ±1°C.
      In the same way, having lots of folk put their fingers in the water, doesn’t help with accuracy.

    • Mike I think you posted first but mine is higher up.
      "The fundamental folly is that 3000 ARGO measurements are NOT 3000 measurements of the same thing:"
      Precisely! But these guys have been getting away with an additional misrepresentation: that more precisely locating the centre of the error range with multiple measurements of the same thing (which ARGO’s do not) reduces the error range. It just uses statistics to better locate the mean. These guys are claiming that the error range is reduced, even though it is inherent in the instrument.
      Given that the ARGO measurements are of different things, no such claim can be made for any numbers – they all stand alone. Without even starting to look at the representativeness of the sampling, the instrument itself cannot produce a temperature value to within 0.005 degrees C.
      How does this get past peer review?

      • “Given that the ARGO measurements are of different things, no such claim can be made for any numbers”
        Measurements are taken at least 1 per second (and this spec doesn’t actually mean they aren’t taking multiple submeasurements in the 1 second period ), per ARGO requirements spec.
        Therefore it is not unreasonable to call those multiple measurements of the same thing at least for each reported data point.

      • “Given that the ARGO measurements are of different things, no such claim can be made for any numbers – they all stand alone.”
        So glad to see this being recognized. And it is not just for ARGO numbers … but for all temps measured at all the surface stations.
        This same statistical fallacy is used, or so it seems, to overstate the accuracy of the average global temperature taken from all surface readings.
        I have seen many instances of this.
        They are treated as if they are multiple measurements of the same thing.
        Even if it was just different days at the same location, it is still a different quantity being measured, and errors are compounded, not reduced, by adding a bunch of them up.

      • Menicholas
        Be careful with terminology: when things are multiplied the errors are literally compounded. When averaged they are treated differently. One can average readings with different accuracies, but that is not compounding either. A few inaccurate numbers are more like ‘contaminating’ the result. Please see my note above on false precision.

      • micro6500 multiple readings per second:
        I agree that is how they get a number to report. So suppose there were 10 readings taken per 1 second. The float is moving at the time. Well, hardware cannot get a 0.01 degree reading in 0.1 seconds. It literally takes time to get a reading that good – and it is done by taking multiple readings and ‘treating’ them. So we are up against physical limits. How good is the motherboard? How fast can it produce a certifiable 0.01 degree reading? It means getting 0.005 accuracy to report ‘correctly’.
        Ask: Can it do that, that fast? Check a really expensive instrument. It will give a reading to 0.01 precision but the number may not be that accurate: precision 0.01 and accuracy 0.06 (measured against a reference temperature). A $5000 instrument cannot produce 10 x 0.01 +-0.005 readings per second. How good is an ARGO? Someone surprise me! The RTD is not that accurate.
        Oceanographers are using the reported temperature, which is +-0.03 to 0.06. Whatever the numbers are, averaging two locations does not increase the accuracy of the result.

          • Your phone digitizes at least 48 kHz, could be over 200 kHz, and in the '80s there were A/D converters running in the MHz range.
            Sampling speed isn't an issue. The time constant could be, but there's no reason it could not sample multiple times per second, and it could have multiple sensors as well.
          They built to a spec, and had to test to a spec, all of these issues could be accounted for.
          I’m not saying they are, but they could be.

      • “micro6500 June 7, 2015 at 6:43 am

        Measurements are taken at least 1 per second (and this spec doesn’t actually mean they aren’t taking multiple submeasurements in the 1 second period ), per ARGO requirements spec.
        Therefore it is not unreasonable to call those multiple measurements of the same thing at least for each reported data point.”

        It is unreasonable. Time is not the determining factor for uniqueness.
        Each measurement is from different equipment, at a different position, at a different depth, at a different pressure level, under different conditions (clear, storm,…), through different sensors, with different electronics, different wires and possibly different coding for the hardware chipsets.
        This equipment is not managed for consistent quality. It is put to sea, literally, and left for the duration. Every piece of deployed equipment is liable to degrade over time.
        Multiple measurements taken by the same individual piece of equipment might provide a reasonable average for that specific measurement, but as equipment ages while journeying through the oceans surfacing and plummeting through the depths not every measurement offers any assurance of continued fantastic accuracy or precision.
          The equipment accuracy at time of deployment is likely to be the best of its lifespan. Without thousands of floating technicians maintaining quality, that quality level is temporary and possibly an illusion.
        Simple it is not.

        • In reply to ATheoK's comment above:

          First I wasn’t referring to measurements from a different buoy.
          Second, I never said it was simple.
          What I said was that based on mission requirements, there are technical solutions that provide stable, repeatable measurements for a long mission lifecycle. I followed that with a couple of methods by which a single buoy could provide highly reliable temperature measurements; they would include multiple sensors and multiple sub-samples per reported reading.
          Multiple sensors could provide higher accuracy, multiple samples higher precision.
          Lastly, the spec and design doc’s would be key to knowing what was done.

      • Crispin in Waterloo but really in Yogyakarta June 7, 2015 at 5:08 pm
        You bring up a good point about the time constant of the measuring apparatus. All that is being reported is the static accuracy (0.005 C), which is believable. They say nothing about the dynamic accuracy.

      • I encourage everyone to get a grip on the difficulty involved in getting an accurate, repeatable measurement to a precision of 0.01 degrees C. It is not done in a flash. Look at the very precise instruments available on the Internet. Look at the frequency at which readings can be obtained and the precision for different time intervals.
        Each ARGO reading made is in a different vertical and spatial position. Each one is presumed to be valid. Very accurate readings take about three seconds. A powerful specialised instrument is possible. Is that what the ARGO has in it? How accurate is the one second reading, if that is the frequency?
        All the specs on how accurate temperature instruments work are in the manuals. There is a lot of thought in it, but nothing floating around the oceans provides 'averages' to 0.005 degrees!

      • BTW, Phil Jones uses the same technique to claim his ridiculously high accuracy for weather station temperatures. His paper was on the CRU page (which now has bad links). 7 billion people with their fingers in the air estimating temperatures should score at least the equivalent accuracy.

  11. Willis:
    What are those critters?
    An old Yankee would guess fox of some sort or juvenile coyote with a real coloring problem.
    Eastern New England, here. Back in the day, we never saw fox at all, as they were few and shy. Now they can be seen in quiet places, but seldom.
    As for coyote, I have never seen one, but from 11:00PM to 2:00 AM they can be heard regularly.
    Raccoons can climb up to upper cabinets, get cereal from boxes, open refrigerators, trash your whole house, and look at you to say “who me?”.
    Ask me how I know.
    A raccoon can be tamed, sometimes a fully wild raccoon can act tamed, if it is in its interest to do so.
    Never think a raccoon is domesticated, ever.
    Ask me how I know.

    • micro6500…
      Measuring with the same instrument, that has the same biases, multiple times is not the same thing as measuring with multiple instruments, the same cubic meter of sea water. The same quantity has to be measured multiple times with multiple instruments to increase accuracy in accordance with the math used in this blog post.

      • Scott, multiple measurements by the same instrument are used to remove instrument errors, and can be used to improve the precision of that measurement. Unless ARGO has multiple sensors (which it could), it, as well as just about all of the other temperature measurements (land as well as satellite), is made with a single sensor.

      • Yes, Micro6500,
        I don’t think that the satellite measurements claim 0.005 C accuracy since it is only one sensor per satellite. I think they claim only 0.1 C accuracy. Multiple measurements with one sensor improves accuracy but does not remove instrument bias. It seems to me that the accuracy of the average of all ARGO measurements for making an average temperature result of the whole ocean would depend on the range of measurements you are talking about. For example, if the water temperature at the equator were 100C and 0C at the poles then the accuracy you could claim would not be as good as if the equator were 60C and the poles were 10C. I don’t see anything in the error calculations that takes this into account. The closer all the ARGO thermometers are to measuring the same thing, the more accurate they could be as an average. Correct me if I am wrong.

        • I don’t buy the accuracy they claim for ocean temp, I’m not sure I buy the accuracy of a single buoy, but it could be possible, I do think the measurements are most likely precise.

      • “The same quantity has to be measured multiple times with multiple instruments to increase accuracy in accordance with the math used in this blog post.”
        Yes, otherwise it is the same as using an instrument to calibrate itself.

      • Micro6500 – I do not believe that ISA (the Instrument Society of America) or ANSI (American National Standards Institute) agree with that. And I know of no standard that allows that, including the Nuclear Regulatory Commission requirements.
        Why is measuring something once a second better than measuring something continually with an analog device?

      • Menicholas
        June 7, 2015 at 1:58 pm
        “The same quantity has to be measured multiple times with multiple instruments to increase accuracy in accordance with the math used in this blog post.”
        Yes, otherwise it is the same as using an instrument to calibrate itself.

        It can reduce measurement noise. The biases remain.
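        A minimal R sketch of that point, with made-up numbers: averaging repeated readings from one biased sensor tightens the spread of the average (its precision) but leaves the fixed offset (its accuracy) untouched.
        [sourcecode]
        # Toy simulation (made-up numbers): one sensor, fixed bias, random read noise.
        set.seed(42)
        true_temp <- 15.000   # "true" water temperature, deg C
        bias      <- 0.020    # fixed instrument bias, deg C
        noise_sd  <- 0.010    # random read-to-read noise, deg C
        one_trial <- function(n) mean(true_temp + bias + rnorm(n, 0, noise_sd))
        # Spread (precision) of the average shrinks roughly as 1/sqrt(n) ...
        sd(replicate(1000, one_trial(1)))     # ~0.010
        sd(replicate(1000, one_trial(100)))   # ~0.001
        # ... but the average stays offset by the bias (accuracy unchanged).
        mean(replicate(1000, one_trial(100))) - true_temp   # ~0.020
        [/sourcecode]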

  12. When steam is coming from the oceans, and the surface is covered with boiled fish, we will know for sure.
    In the meantime, I will point out that sometimes when I am in the sea I detect a warm patch around the lower half of my body. It dissipates quickly, though. If this phenomenon is widespread (and I know other people have had the same experience) eventually the accumulated heat will raise the temperature of the oceans by noticeable amounts.

    • RoHa,
      If there is a little kid standing near you when this happens, I think I can tell you how the warm patch of water forms.

  13. Off topic, but that Win 10 upgrade button reappeared today. Here’s the easy and official way to squash it permanently –
    Can I turn off the notifications?
    Yes. Click “Customize” in the System Tray
      [that’s the area in the lower right where the upgrade button appears. Hit the UP arrow to expand it]  and turn off the Get Windows 10 app notifications in the menu that comes up.

    • Mike
      At 1905 Z I tried that.
      Little pesky icon has gone – as I write.
      Definitely – fingers crossed.
      My thanks!
      Auto

      • @2020 Z today – the delightful icon [well, I do not wish to be offensive] is back.
        Tried again.
        Maybe 3rd time lucky. Perhaps.
        Auto

  14. Willis
    This is a very interesting article. The answer to your question is we do not know and cannot tell.
    I would like you to consider the following, and I would like to see your views.
    As I repeatedly state, the key to understanding the climate on this water world of ours is to understand the oceans. If one is concerned with GLOBAL warming, then since the oceans are the heat pump of the planet and distribute surplus solar energy which is input in the equatorial and tropical oceans, in 3 directions (namely pole-wards, and via ocean overturning and the thermohaline circulation to depth), it is only the oceans that need to be measured, investigated and assessed. ARGO should have been rolled out when the satellite measurements were launched in 1979. This was a missed opportunity, but that is water under the bridge.
    All the data sets have their own issues, but with the exception of the CO2 Mauna Loa and perhaps the satellite temp sets, all have huge unacknowledged margins of error. Now, I do not know whether CO2 does anything of significance, but what we do know is that Climate Sensitivity (if any) to CO2 is less than a combination of natural variation plus the error bounds of our various measuring devices, and that is why we have been unable to detect the signal of CO2 in any temperature data set. Thus if natural variation is small and the error bounds are small, then Climate Sensitivity is small. If natural variation and error bounds are large, Climate Sensitivity could theoretically be large. So maybe there is a role for CO2, and your article begs the question: if the oceans are warming, why and how is this taking place?
    In praise of our host, it is interesting to consider whether all Watts are equal, or does it matter where within the system the Watts may exist or penetrate? Personally, I consider that not every watt is of equal significance.
    You state: “The “mixed layer” is the top layer of the ocean that is mixed by both the wind and by the nightly overturning of the ocean. It is of interest in a climate sense because it’s the part of the ocean that responds to the changing temperatures above.” This begs the question: precisely what energy is getting into the mixed layer? Is it simply solar (the amount of insolation may vary from time to time due to changes in patterns of cloudiness, or levels of atmospheric particulates) or is it solar plus DWLWIR (the latter increasing over time due to increasing levels of CO2 in the atmosphere, whether of anthropogenic origin or otherwise)?
    You state: “My rule of thumb is simple. One watt per square metre for one year warms one cubic metre of the ocean by 8°C “ Let us consider that in relation to the K&T energy budget cartoon and the absorption characteristics of LWIR in water.
    According to K&T, there is on average some 324 W/m2 of backradiation “absorbed by the surface”, and some 168 W/m2 of solar insolation “absorbed by the surface.” What is interesting about this is that the oceans are a selective surface; the effect of this is that there is about as much DWLWIR absorbed in a volume represented by just 3 MICRONS as there is solar energy absorbed in a volume represented by a depth of about 3 metres!
    The above is a rule of thumb assessment. In practice almost no solar is absorbed in the top few microns, but some 50% of all incoming solar is absorbed within the top 1 metre and 80% within the top 10 metres, with only 20% making it past 10 metres down to depth, and only a small part of that getting down to about 100 metres. By way of contrast, the absorption characteristics of LWIR are:
    https://scienceofdoom.files.wordpress.com/2010/10/dlr-absorption-ocean-matlab.png
    It can be seen that 50% of LWIR is fully absorbed within 3 microns and 60% within 4 microns. However, that is the vertical penetration of LWIR; since DWLWIR is omni-directional, with much of it arriving at a grazing angle of less than 30 degrees, it follows that at least 60% of all DWLWIR must be fully absorbed within just 3 MICRONS.
    The upshot of the above is that in the volume represented by the top 3 metres of the ocean there is about 109 W/m2 of solar (ie., 168 W/m2 x 65%), and in the volume represented by the top 3 MICRONS of the ocean there is about 194 W/m2 of DWLWIR (ie., 324 W/m2 x 60%). In ball park terms there is nearly twice as much energy from DWLWIR contained in a volume a million times smaller! Pause to consider the implications that that inevitably leads to.
    If DWLWIR is truly being absorbed and if it is sensible energy capable of performing sensible work in the environ in which it finds itself, it would cause the oceans to boil off, from the top down, unless it can be sequestered to depth, and thereby diluted by volume, at a rate faster than the rate that it would otherwise be driving evaporation.
    The first question to consider is how much evaporation would be driven, and at what rate, if 3 MICRONS of water are receiving and absorbing 194 W/m2 of DWLWIR. Using your rule of thumb (8°C per watt per year per cubic metre), the temperature (if my early morning maths after Saturday night partying is right) is raised by about 16.5°C per second. In the tropical seas, which are about 27°C to 31°C, the entire top few MICRONS would be boiling within 4 seconds. Whilst the oceans are vast, after some 4 billion years there would not be much ocean left slopping around the surface of planet Earth. The oceans would be in the atmosphere, having been boiled off from the top down.
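    As a quick check on the arithmetic above, here is a minimal R sketch of the 8°C-per-watt-year-per-cubic-metre rule of thumb applied to a 3 micron layer receiving the 194 W/m2 assumed above (illustrative numbers only):
    [sourcecode]
    # Check of the warming-rate arithmetic using the rule of thumb quoted above.
    flux      <- 194          # W/m2 assumed absorbed in the skin layer (figure from above)
    depth     <- 3e-6         # m, the assumed 3-micron absorption depth
    secs_year <- 31536000     # seconds in a year
    warm_per_m3_year <- 8 * flux                    # deg C per year if spread over 1 m3
    warm_layer_year  <- warm_per_m3_year / depth    # deg C per year in the 3-micron layer
    warm_layer_year / secs_year                     # ~16.4 deg C per second
    [/sourcecode]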
    So how is the DWLWIR absorbed in the top few MICRONS mixed into the mixed layer, and at what rate?
    First, there is conduction. This is a non-starter since the energy flux at the very top is upwards (the top microns, and indeed millimetres, are cooler than the bulk below) and, unless we are wrong about how energy can transfer, energy cannot swim against the direction of the energy flow/flux. See the ocean temperature profile, with plot (a) being the nighttime and plot (b) the daytime profile.
    http://disc.sci.gsfc.nasa.gov/oceans/additional/science-focus/modis/MODIS_and_AIRS_SST_comp_fig2.i.jpg
    Second, it cannot be ocean overturning. As you note, this is a diurnal (“nightly”) event, so for example during the day it is not mixing the DWLWIR energy being absorbed in the top 3 MICRONS, which is causing that volume to rise in temperature at a rate of 16.5°C per second. In any event, even when operative, it is (in relative terms) a slow mechanical process quite incapable of sequestering the energy absorbed in the top few microns to depth at a rate faster than that energy would otherwise drive copious evaporation.
    Third, there is the action of the wind and the waves. However, this again is, in relative terms, a slow mechanical process which is also inoperative or effectively inoperative for much of the time. Consider that according to Stanford University “The global average 10-m wind speed over the ocean from measurements is 6.64 m/s” (see: http://web.stanford.edu/group/efmh/winds/global_winds.html),
    which is the speed horizontal to the surface. 6.64 m/s is just under 24 km/h, which means that the average conditions over the oceans are BF4, covering the range 19.7 to 28.7 km/h. If the average conditions over the ocean are BF4, it follows that much of the oceans must, for much of the time, be experiencing conditions of BF2 and below (after all, we know that there are storms, cyclones and hurricanes, and these have to be offset elsewhere by benign conditions). BF2 is described as “Light breeze. Small wavelets. Crests of glassy appearance, not breaking” and BF1 as “Light air. Ripples without crests.” Note the reference to “glassy appearance”: even in BF2 conditions the surface is not being broken. The surface tension of water is such that in these very light conditions there is no effective mixing of the very top surface of the ocean by the operation of wind and waves (there are no waves, just ripples or at most wavelets). Indeed, there must be areas (eg., in harbours, inland lakes, particularly crater lakes) where for lengthy periods the conditions encountered at surface level are BF0, with effectively no wind and calm-as-a-mill-pond conditions, which means that the top MICRONS in these areas are not being mixed at all.
    I have never seen anyone put forward a physical process which could effectively mix the top MICRONS at a vertical penetrative rate faster than the rate at which copious evaporation would be driven by the DWLWIR energy absorbed in these few MICRONS. If this energy cannot be sequestered to depth (and thereby diluted and dissipated) at a rate faster than the rate of evaporation, there is a significant issue.
    I do not profess to have answers, but I do see significant issues such that my considered view is that if the oceans are warming this is only because of changes in the receipt of solar insolation or slight changes in the profile and distribution of currents (laterally and/or vertically). It would appear to be the result of natural phenomena, not anthropogenic activity (unless shipping and/or fishing is having an adverse impact on a micro biological level because as you note in your article, biology plays a role in the mixed layer).
    I look forward to hearing your comments.
    PS. You state: “The “mixed layer” …is of interest in a climate sense because it’s the part of the ocean that responds to the changing temperatures above.” However, is it not the oceans that warm the air above them and are responsible for air temperatures immediately above them, rather than the air that heats the oceans?
    If one is in the middle of the ocean, and if there is little wind speed, the air temperature is almost the same as the ocean surface temperature day and night.
    Further, the heat capacities of these media are such that air a degree or so warmer than the oceans could perform little heating of the ocean. It is the sun that heats the oceans, although the amount of solar insolation received is variable.

    • Richard: Isn’t there a big mismatch, by something like 50%, between the claimed absorbed energy at the surface and the total evaporation from the oceans? There is far too little evaporation to represent that much energy coming in. The water is not heating enough. Where is the error? How is the energy transferred to the lower atmosphere without evaporating a heck of a lot more rain, or does it condense immediately, transfer heat to the air, and then precipitate within the top few mm of the water surface?

    • I think everyone who is interested in this should get an IR thermometer and measure the sky.
      Those 324/168 W/m^2 numbers require a lot of condition averaging.
      Clear sky Tsky in the 8-14u range has a BB temp from say 0F down to -70 to -80F (@N41,W81); clouds increase the temp, with thin high clouds adding 5-10F, all the way up to thick clouds being only 5-10F colder than air temp.
      Clear sky with low humidity is in the range of the 168 W/m^2; the 324 would have to be humid with heavy cloud cover some place warm. And while this might be prevalent for a lot of places, I can’t imagine it could average that high.
      More measurements need to be taken.

      • Instead of wasting time taking more measurements, be aware that an IR thermometer is designed to operate in a wavelength range called the ‘atmospheric window’, i.e. it is not supposed to see radiation from the atmosphere (sky).

        • Which is why I included the wavelength. But you can turn that measurement into W/m^2 and then add the specified CO2 flux, and it is still far below the quoted numbers.
          It also means that most of the surface sees that temp, which as you say is a window to space.

      • And when you point that IR thermometer at the sky, WHAT are you really measuring? What material is the IR thermometer calibrated for (air, wood, steel, or water)? Makes a big difference. Different materials need different lenses. FACT. Then what is in the range of focus for the lens on the IRT? MORE GIGO

        • “And when you point that IR thermometer at the sky, WHAT are you really measuring? What material is the IR thermometer calibrated for (air, wood, steel, or water)? Makes a big difference. Different materials need different lenses. FACT. Then what is in the range of focus for the lens on the IRT? MORE GIGO”
          No, they don’t need different lenses; if anything, think of it as a pinhole camera with 1 pixel.
          What it’s detecting is a flux of IR. It’s calibrated against the 8-14u portion of a BB spectrum. BBs have an emissivity depending on material and surface texture; mine is adjustable for emissivity, but all that really accounts for is fewer photons entering the detector than the expected amount (which is why mine has a K-type thermocouple built in). I found a paper that placed the emissivity of the sky at ~0.69, but compared to e=0.95, all lowering it does is make the sky even colder.
          Pekka pointed out the cut at 14u, but as I mentioned you can convert the temp to a flux, add the CO2 flux in, and turn it back to a temp. It’s still frigid.
          From the ground looking up, any IR coming down enters the detector; NASA even has a page on measuring the temp of the sky with a handheld IR thermometer.
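          A rough R sketch of that temperature-to-flux conversion, assuming (purely for illustration) a clear-sky reading of -40F and treating the whole sky as a blackbody; since a handheld IR thermometer only sees the 8-14u window, this is an illustrative bound rather than a measurement of total downwelling IR.
          [sourcecode]
          # Convert an assumed sky brightness temperature to an equivalent broadband flux.
          sigma  <- 5.670e-8                 # Stefan-Boltzmann constant, W m-2 K-4
          f_to_k <- function(f) (f - 32) * 5/9 + 273.15
          t_sky_f <- -40                     # example clear-sky reading in deg F (made up)
          sigma * f_to_k(t_sky_f)^4          # ~168 W/m2 if the whole sky radiated as a blackbody at -40F
          [/sourcecode]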

    • richard verney June 7, 2015 at 4:23 am

      Willis
      This is a very interesting article. The answer to your question is we do not know and cannot tell.
      I would like you to consider the following, and I would like to see your views.

      My views? My view, I fear to say, is “TLDR”. Boil it down some and make it plain just what it is you want me to consider, and I’m more than happy to give it a go. But that’s just too diverse and vague to understand what you want my views on.
      Thanks,
      w.

      • “TLDR”
        I had to look that up.
        Glad I did.
        “Too long, didn’t read” is a very important thing to keep in mind when commenting.
        I had to skip it too, even though I generally read every comment and usually like reading what Mr. Verney has to say.
        Ya gotta break it into smaller bites.

    • Richard:
      The key point you are missing in looking at the surface layer of the ocean and its opacity to longwave infrared is the comparable longwave emission of this surface layer. If the surface layer absorbs virtually all LWIR in the top few microns, then virtually all of its thermal emissions will be from the top few microns as well, as it would absorb any LWIR from below before it could “escape” to the atmosphere.
      And since the surface is generally warmer than the atmosphere it is in radiative exchange with, the upward LWIR flux density will usually be larger than the downward LWIR. Picking typical (“average”) numbers from K&T, this very thin surface layer will be absorbing ~324 W/m2 of downward LWIR and emitting ~390 W/m2 of upward LWIR. So the radiative flux balance for this surface layer would be ~66 W/m2 upward. So in no way would the downward LWIR cause boiling off of this surface layer.
      I find it useful when thinking of the radiative behavior of water with regard to LWIR to consider it like more familiar opaque objects, like rocks. We don’t have any trouble considering the exchange of the surface of a rock to shortwave or longwave radiation in this way. (Of course, the rock can’t have convective mixing under the surface.)
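      For reference, a two-line R check of the numbers used above: the ~390 W/m2 upward figure is roughly what a near-blackbody surface at about 15°C emits, and the difference from the K&T downward figure is the ~66 W/m2 net.
      [sourcecode]
      # Sanity check of the surface LWIR budget quoted above (K&T-style averages).
      sigma  <- 5.670e-8            # W m-2 K-4
      up_lw  <- sigma * 288.15^4    # ~390 W/m2 emitted by a ~15 C near-blackbody surface
      dn_lw  <- 324                 # W/m2 downward LWIR absorbed (K&T figure quoted above)
      up_lw - dn_lw                 # ~66 W/m2 net LWIR loss from the skin layer
      [/sourcecode]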

      • Of course, the rock can’t have convective mixing under the surface.
        Well, actually it can, but it takes a few hundred million years.

    • Richard, the energy absorbed from the Sun (that is not captured long term at depth) is removed by radiation up, conduction/convection, and evaporation. The radiation up is reduced from the value it would have in the case of no back-radiation, due to the presence of back-radiation, and the other means of removal (conduction/convection and evaporation) make up the loss, so that the energy balance is maintained. There never is an average gain in net energy to the surface from absorbed back-radiation, only a reduction in net radiation up, which is compensated for by the other processes. Thus back-radiation never heats the surface ON AVERAGE (it can sometimes at night, when water temperature is cooler than air temperature, or where currents move cold water into high air temperature areas).

    • Downwelling long wave radiation is a fiction. In thermodynamics it is the temperature difference between the skin layer of the oceans and the effective temperature of the sky above that matters. The water is almost always warmer than the sky above, which means it emits a higher thermal radiation flux upward than it absorbs from above. And only their difference, the net heat transport, has any thermodynamical meaning.
      Therefore radiative transfer in the thermal IR range almost always cools the (several microns thick) skin layer, and sometimes does so mightily.
      The case of short wave radiation is an entirely different issue, because its color temperature is high (~6000 K), that is, way higher than anything on the surface of planet Earth. In contrast to the thermal IR case, the oceans never emit radiation in this frequency range, therefore the net heat transfer equals the radiation flux absorbed.

      • Sounds like the most concise explanation I have heard on this subject in a very long time.
        It is very disheartening to read, over and over again, days long and very contentious arguments on the subject of how ordinary materials, at ordinary temperatures, on and near the surface of the Earth, cool down or heat up in response to various inputs from well known sources.

  15. If I’m not mistaken, Argo claimed to discover a “calibration” error in their floats around 2007, after years of not noticing it and no ocean warming. Then they claimed to “fix” this problem and suddenly there was a very slight upward trend in ocean temps. They changed past data also. It was just another data “boo boo” that changed a non-trend or a cooling trend into a slight warming trend. There has never been a corrected mistake that went in the opposite direction, temperature-trend-wise, during this whole global warming campaign, ever. I suppose that is a coincidence.

    • Not Argo, that was XBT: one-off cable-connected sounding devices thrown off the back of ships.
      They showed cooling, so they must have been defective; they were removed.
      NMAT data from ships requires “corrections” for the changing height of ships’ decks (which is not recorded in metadata and has to be guessed on the basis of what ‘typical’ ships looked like), and then assumptions about how the air temperature varies with this height above sea level involve further guessing games.
      This is deemed by Karl et al. to be more reliable than a purpose-built floating sensor fleet. So the scientific instruments are “corrected” to fit the guesstimated ‘adjusted’ NMAT data.

      • Errors were reportedly found, and adjustments made to BOTH XBT and Argo data.
        Some Argo buoys were found to be making depth calculation errors, according to the researchers, and were eliminated from the records, thus removing the ‘cooling trend’ which was developing.
        I will find and post the links in the morning, when I’m on my computer.

      • “Why wouldn’t they just remove the data from those ship divers completely then”
        Because then they would have no old data at all.

    • You find what you are looking for, not what you are not looking for. This is called bias.

    • The important thing is that every single adjustment made, past and present, large and small, has the same effect…it makes the predetermined conclusion of the warmistas…the ones doing the adjusting…appear to be justified.
      Every single past datum was flawed in the precise way that would bollix up this conclusion, until it was belatedly discovered and “fixed”. And fixed again. And again…
      A seven year old could see through such nonsense.

  16. The climate-obsessed response will be that the inaccuracy means the models are correct and it is a failure of the measurements to accurately gather the data. The heat really could be hiding, so it must be hiding.
    So it is still worse than they thought.
    The lack of supporting reality (storms, pack ice, floods, droughts, sea level rise) only means it is a failure of *those* data sources as well.
    For the climate obsessed, everything works together to show the handiwork of CO2.

  17. OT, but –
    just wondering if the US can watch live the G7 gathering at Elmau:
    the world’s leaders, individually lifted by military helicopters to an alpine summit, the last hundred metres transported by CO2-neutral electric caddies through the pastures, to discuss ecological footprints.
    Never avoid laughability.
    Regards – Hans

  18. The entire mess we see in ‘climate science’ is the imposition of ‘absolute accuracy’ that is FAKE. These ‘precise’ measurements are FANTASY.
    This leads to fraud. When climatologists claim their various instruments can measure the temperature of the entire planet and all the oceans to a far finer degree than thermometers in laboratories, we are seeing people lying.
    They do this because the temperature changes they are claiming are so minimally tiny, they can only be detected by observing weather changes which are tricky to quantify. We do have clues as to whether or not the general climate is growing colder or hotter overall but this is seen mainly in hindsight with the annual frost line shifting north or southwards over time.
    Of course, we do have very alarming information that all Ice Ages begin and end very suddenly, which is both puzzling and quite scary, and we still don’t know for certain what mechanism, aside from the sun, is causing this.
    If it is the sun, as I suspect, then seemingly small changes in solar energy output have a dire ability to suddenly flip the climate of this planet from one extreme to the other. This information is highly important because the fantasy that we will roast to death due to a small increase in CO2 makes no sense at all since we are very obviously at the tail end of the present interglacial era.
    All these ‘measurements’ were set up by NOAA and NASA to prove ‘global warming’ not track the real situation we are facing. They are attempts at gaming the data to create a false picture to justify a gigantic energy use tax imposed on all humanity. There is zero interest in understanding what is really going on which is why if the data doesn’t give ‘global warming’ signals, they toy with the data to create this out of thin air.
    This is why they ignore any incoming data showing global cooling over the last 8,000 years. Ignoring this is getting harder and harder but they work hard at doing this. It is, of course, destroying the concept of how science works, alas.

  19. I’ve adapted Willis’ code to extract the NMAT2 data just released by Hadley ( thanks Willis ) :
    [sourcecode]
    # http://www.metoffice.gov.uk/hadobs/hadnmat2/data/download.html
    ### extract NMAT from netCDF format.
    library(ncdf)    # provides open.ncdf() and get.var.ncdf()
    nmat_url="http://www.metoffice.gov.uk/hadobs/hadnmat2/data/HadNMAT.2.0.1.0.nc"
    nmat_file="HadNMAT.2.0.1.0.nc"
    download.file(nmat_url,nmat_file,mode="wb")   # binary mode for the netCDF file
    nc=open.ncdf(nmat_file)
    # nmat=get.var.ncdf(nc,"air_temperature")     # superseded by the anomaly field below
    nmat=get.var.ncdf(nc,"night_marine_air_temperature_anomaly")
    close.ncdf(nc)
    nmat[nmat==-999.]=NA    # mark the -999 missing-value flags as NA
    [/sourcecode]
    Now to get it into a more accessible format.

    • I do admire a man who picks up on an idea and runs with it …
      What is the problem that you have with the format?
      w.

  20. Willis says: One watt per square metre for one year warms one cubic metre of the ocean by 8°C.
    ——-
    Believing you are using Q = m * Cp * ΔT, I think your statement can only be true if you think the sun shines 24 hours a day.
    Cp of sea water is 3.93 kJ/kg·K
    m is 1024 kg (the mass of 1 m^3 of sea water)
    ΔT is 8 K
    Q is therefore 1024 * 3,930 * 8 = 32,194,560 J
    Seconds in a year: 31,536,000
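    (The same check in SI units as a quick R sketch; the exact figure depends on the seawater density and heat capacity assumed, here the values quoted above.)
    [sourcecode]
    # Temperature rise of 1 m3 of sea water receiving 1 W/m2 for one year,
    # using the density and heat capacity quoted above.
    cp  <- 3930            # J per kg per K (3.93 kJ/kg-K)
    rho <- 1024            # kg per m3, so 1 m3 has a mass of 1024 kg
    q   <- 1 * 31536000    # J delivered by 1 W over one year
    q / (rho * cp)         # ~7.8 deg C, close to the ~8 deg C rule of thumb
    [/sourcecode]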

    • mkelly June 7, 2015 at 5:45 am

      Willis says:

      One watt per square metre for one year warms one cubic metre of the ocean by 8°C.

      ——-
      Believing you are using Q = m * Cp * T I think your statement can only be true if you think the sun shines 24 hours a day.

      Thanks, mkelly. In fact, in climate science almost all intermittent variables are simply divided by 4 (the ratio of the surface area of a sphere to the area of a disk of radius R) to convert them to a 24/7 average.
      This includes the forcings, which regardless of whether they are intermittent (solar) or semi-constant (DLR) are always given as global 24/7 averages so they can be compared to each other.
      Regards,
      w.

        • If the solar constant is 1,366 +/- 0.5 W/m^2, why is ToA 340 (+10.7/-11.2) W/m^2 as shown on the plethora of popular heat balances/budgets? Collect an assortment of these global energy budget/balance graphics. The variations between some of them are unsettling. Some use W/m^2, some use calories/m^2, some show simple percentages, some a combination. So much for consensus. What they all seem to have in common is some kind of perpetual motion heat loop, with back radiation ranging from 333 to 340.3 W/m^2 without a defined source. BTW, the additional RF due to CO2 from 1750 to 2011 is about 2 W/m^2 spherical, 0.6%.
        Consider the earth/atmosphere as a disc.
        Radius of earth is 6,371 km, effective height of atmosphere 15.8 km, total radius 6,387 km.
        Area of 6,387 km disc: PI()*r^2 = 1.28E14 m^2
        Solar Constant……………1,366 W/m^2
        Total power delivered: 1,366 W/m^2 * 1.28E14 m^2 = 1.74E17 W
        Consider the earth/atmosphere as a sphere.
        Surface area of 6,387 km sphere: 4*PI()*r^2 = 5.13E14 m^2
        Total power above spread over spherical surface: 1.74E17/5.13E14 = 339.8 W/m^2
        One fourth. How about that! What a coincidence! However, the total power remains the same.
        1,366 * 1.28E14 = 339.8 * 5.13E14 = 1.74E17 W
        Big power flow times small area = lesser power flow over bigger area. Same same.
        (Watt is a power unit, i.e. energy over time. I’m going English units now.)
        In 24 hours the entire globe rotates through the ToA W/m^2 flux. Disc, sphere, same total result. Total power flow over 24 hours at 3.41 Btu/h per W delivers heat load of:
        1.74E17 W * 3.41 Btu/h /W * 24 h = 1.43E19 Btu/day
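        A compact R version of the divide-by-four geometry above. Note that the exact disc-to-sphere ratio is one quarter whichever radius is used, so the flux comes out at 1,366/4 = 341.5 W/m2; the 339.8 figure above reflects intermediate rounding.
        [sourcecode]
        # Divide-by-four geometry: power intercepted by the disc, spread over the sphere.
        s0    <- 1366               # W/m2, solar constant used above
        r_toa <- 6371e3 + 15.8e3    # m, Earth radius plus the atmosphere height used above
        power_in <- s0 * pi * r_toa^2     # W intercepted by the disc
        power_in / (4 * pi * r_toa^2)     # spread over the sphere: s0/4 = 341.5 W/m2
        [/sourcecode]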

  21. Thank you Willis – you have done interesting work with the ARGO system/data and reached a credible conclusion.
    I have so little time these days that I have developed a highly successful empirical approach to these matters.
    Observations:
    Warmists have repeatedly demonstrated unethical conduct.
    Warmists have repeatedly misrepresented facts.
    Warmists have repeatedly made false alarmist predictions that have NOT materialized to date and have a STRONGLY NEGATIVE predictive track record. Their predictions have consistently been false.
    Conclusion:
    Accordingly, every claim of warmists should be viewed as false until it is CONCLUSIVELY PROVEN by them to be true.
    This is a logical approach to evaluate the work of scoundrels and imbeciles.
    To date, this empirical approach has worked with remarkable success – dare I say with precision and accuracy.
    Best to all, Allan 🙂

    • Allan
      QUOTE
      Conclusion:
      Accordingly, every claim of warmists should be viewed as false until it is CONCLUSIVELY PROVEN by them to be true.
      END QUOTE
      So –
      Conclusion:
      Accordingly, every claim of warmists should be viewed as false until it is CONCLUSIVELY PROVEN by folk, without a dog in the fight, to be true.
      Now, doesn’t that look a bit better?
      Auto

      • Thank you Auto – but are you assuming you can find an informed individual who does not have “a dog in the fight”? 🙂
        http://wattsupwiththat.com/2012/09/07/friday-funny-climate-change-is-not-a-joke/#comment-1074966
        [excerpt]
        The “climate skeptics” position is supported by these FACTS , and many others:
        – there has been no net global warming for 10-15 (now 15-20) years despite increasing atmospheric CO2;
        – the flawed computer climate models used to predict catastrophic global warming are inconsistent with observations; and
        – the Climategate emails prove that leading proponents of global warming alarmism are dishonest.
        The political left in Europe and North America have made global warming alarmism a matter of political correctness – a touchstone of their religious faith – and have vilified anyone who disagrees with their failed CAGW hypothesis. The global warming alarmists’ position is untenable nonsense.
        I dislike political labels such as “left“ and “right”. Categorizing oneself as “right wing” or “left wing” tends to preclude the use of rational thought to determine one’s actions. One simply chooses which club to belong to, and no longer has to read or think.
        To me, it is not about “right versus left”, it is about “right versus wrong”. Rational decision-making requires a solid grasp of science, engineering and economics, and the global warming alarmists have abjectly failed in ALL these fields. Their scientific hypothesis has failed – there is no global warming crisis. Their “green energy” schemes have also failed, producing no significant useful energy, squandering scarce global resources, driving up energy costs, harming the environment, and not even significantly reducing CO2 emissions! The corn ethanol motor fuel mandates could, in time, be viewed as crimes against humanity.
        It is difficult to imagine a more abject intellectual failure in modern times than global warming alarmism. The economic and humanitarian tragedies of the Former Soviet Union and North Korea provide recent comparisons, a suitable legacy for the cult of global warming alarmism.

      • Allan, and Auto,
        This really cuts to the chase.
        Agree 100%.
        If knowing that they are lying would settle anything, or prevent bad policies from being instituted, there would be no point in saying anything else about the whole CAGW meme.
        Unfortunately, this is not the case.
        People must be convinced who are not convinced yet, and for that to occur, false information must be refuted, convincing arguments must be fashioned and honed, and the whole Warmista movement denounced and discredited.

  22. There has been a pretty good jump in Ocean heat content numbers over the last six months after several periods of decline/flat estimates.
    We’ll have to see if the new higher levels continue into the next sets of data, but the accumulation rates will have to be recalculated higher if they do.
    http://data.nodc.noaa.gov/woa/DATA_ANALYSIS/3M_HEAT_CONTENT/DATA/basin/3month/ohc2000m_levitus_climdash_seasonal.csv
    http://data.nodc.noaa.gov/woa/DATA_ANALYSIS/3M_HEAT_CONTENT/DATA/basin/3month/ohc_levitus_climdash_seasonal.csv

  23. First off, a watt is a power unit, energy over time: 1 W = 3.412 Btu/h = 3.6 kJ/h.
    1.0 W/m^2
    = 3.412 Btu/h per m^2
    = 29,889 Btu/y per m^2
    1.0 m^3 of water
    = 35.34 ft^3
    = 2,205.0 lb
    Cp = 1.0 Btu/lb-°F
    ΔT = 29,889 / 2,205 = 13.56 °F
    = 7.53 °C
    However, water evaporates at about 950 to 1,000 Btu/lb so just a minor amount of evaporation can easily compensate for the sensible heating. A few more clouds, big deal. And IPCC credits clouds with a -20 W/m^2 of radiative forcing, ten times the positive forcing of CO2.
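    A rough R sketch of that evaporation point, assuming a latent heat of vaporisation of about 2.45 MJ/kg (roughly the 950 to 1,000 Btu/lb quoted above):
    [sourcecode]
    # How much evaporation carries away 1 W/m2 sustained for a year?
    q_year <- 1 * 31536000      # J per m2 per year from 1 W/m2
    L_vap  <- 2.45e6            # J per kg, assumed latent heat of vaporisation
    q_year / L_vap              # ~13 kg of water per m2, i.e. roughly 13 mm of evaporation per year
    [/sourcecode]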

  24. Willis,
    The answer to your question about nightly overturning. Oceanographers just consider that to be an element of the mixed layer. Which is simply the layer of water that mixes due to shear overcoming stability forces in the ocean. This layer is highly variable, 1000+ meters in the Labrador Sea, Weddell Sea and Antarctic waters. In most tropical areas and mid latitudes the mixed layer is 25-200 meters and it changes hourly, with that change being greatest in the winter time during storms. As you mentioned the drivers are solar heating, evaporation, long wave heating, long wave radiation, wind speed etc. So it is impossible to quantify globally just using back of the envelope math. There are scientists who spend their whole lives working with this issue, as the mixed layer is where most of the interesting things happen in the ocean.
    On a side note, the idea that ARGO floats get a different answer than someone taking a transect of the ocean at any given time is not due to the equipment doing the measuring but to the fact that the temperature in the mixed layer changes fast and often enough that one measurement a day is not good enough to capture the envelope of the data.
    v/r,
    David Riser

    • David Riser June 7, 2015 at 8:45 am

      Willis,
      The answer to your question about nightly overturning. Oceanographers just consider that to be an element of the mixed layer.

      Yes it is an element of the mixed layer … which is why it was curious that it wasn’t included in the drawing.

      Which is simply the layer of water that mixes due to shear overcoming stability forces in the ocean. This layer is highly variable, 1000+ meters in the Labrador Sea, Weddell Sea and Antarctic waters. In most tropical areas and mid latitudes the mixed layer is 25-200 meters and it changes hourly, with that change being greatest in the winter time during storms.

      I just gave you a map in the head post showing the mixed layer depth around the globe, so I’m not sure why you are making such claims. Nowhere is it “1000+ metres” as a monthly or annual average. Hang on …
      OK. For monthly variations, the range is from 10 metres down to 772 metres. For annual averages the range is from 12 to 220 metres. In the tropics the range of the monthly variations are 10 to 167 metres depth.

      As you mentioned the drivers are solar heating, evaporation, long wave heating, long wave radiation, wind speed etc. So it is impossible to quantify globally just using back of the envelope math. There are scientists who spend their whole lives working with this issue, as the mixed layer is where most of the interesting things happen in the ocean.

      It’s totally unclear who and what you are talking about when you say “it is impossible to quantify globally just using back of the envelope math.” If you have a problem with someone’s math, quote the math that they have put forward and show us exactly where it’s wrong. This is why I ask people to quote what you disagree with.
      Best regards,
      w.

      • Willis,
        Making rough estimates using math where calculus and precise measurement are required is back of the envelope math. All I am saying is your math is ball parking it and you’re missing some critical pieces of information. The mixed layer in the mentioned areas Labrador Sea, Weddell Sea and Antarctic can be over 1000 meters deep. This is where surface water descends to the deeps and creates the various bottom waters that make up most of the ocean. This is basic Oceanography. The paper’s definition of the mixed layer is not a standard Oceanography definition as mixed layer depth is not temperature dependent in all areas of the globe.
        A good book on this subject: https://books.google.com/books?id=X0PDBca_EqEC&pg=PA51&lpg=PA51&dq=weddell+sea+mixed+layer&source=bl&ots=o70gfSaZOY&sig=I4gT2EvVKZ-1G2hp3Gb5Oxeckek&hl=en&sa=X&ei=UaZ0Ve2LMoutyQSS-YIQ&ved=0CFwQ6AEwCw#v=onepage&q=weddell%20sea%20mixed%20layer&f=false
        So your average over the earth of 60 meters is not very useful when the variance is as large as it is. And an annual average in a cyclic system is also somewhat useless. The overturning is not in itself a driver. It is driven by those other things i.e. surface water cools at night (long wave radiation and evaporation) which causes it to become denser and seek equilibrium. This is not any different than water heating during the day and stratifying the water column and being mixed by a strong wind.
        Interesting write-up as always Willis. Normally your estimates are pretty good but in this case even the definition of mixed layer is being seriously abused by both the paper you’re referring to as well as your own math.
        v/r,
        David Riser

      • David Riser June 7, 2015 at 1:31 pm

        Willis,
        Making rough estimates using math where calculus and precise measurement are required is back of the envelope math. All I am saying is your math is ball parking it and you’re missing some critical pieces of information.

        I understand that is all you are saying. I’m saying that it’s far from enough. Let me repeat what I said, this time in bold print:
        If you have a problem with someone’s math, quote the math that they have put forward and show us exactly where it’s wrong. This is why I ask people to quote what you disagree with.
        It’s great for you to wave your hands and claim some unknown something else is wrong with my work. But if you want to diss my math, David, you’ll have to do far better than that.
        w.

      • David Riser June 7, 2015 at 1:31 pm Edit

        The mixed layer in the mentioned areas Labrador Sea, Weddell Sea and Antarctic can be over 1000 meters deep. This is where surface water descends to the deeps and creates the various bottom waters that make up most of the ocean. This is basic Oceanography. The paper’s definition of the mixed layer is not a standard Oceanography definition as mixed layer depth is not temperature dependent in all areas of the globe.

        The paper’s definition of the mixed layer is indeed a standard definition, one of three or so. Since both the paper and my dataset used that definition, it allowed me to calculate the values. Sorry, no errors there.

        So your average over the earth of 60 meters is not very useful when the variance is as large as it is.

        Dear heavens, you haven’t even calculated the variance and now you want to tell me about it?

        And an annual average in a cyclic system is also somewhat useless.

        That is simply not true. An annual average is unsuited for some purposes and perfectly appropriate for others.

        The overturning is not in itself a driver. It is driven by those other things i.e. surface water cools at night (long wave radiation and evaporation) which causes it to become denser and seek equilibrium. This is not any different than water heating during the day and stratifying the water column and being mixed by a strong wind.

        First, nocturnal overturning is most assuredly different from wind-driven overturning. The former occurs whether there is wind or not.
        Next, please provide a quotation where I ever said that nocturnal overturning is a “driver”. QUOTE WHAT YOU DISAGREE WITH!!

        Interesting write-up as always Willis. Normally your estimates are pretty good but in this case even the definition of mixed layer is being seriously abused by both the paper you’re referring to as well as your own math.

        And despite claiming my math is flawed, you’ve provided absolutely no math of your own, no code, no data. You haven’t identified a single flaw in my math. You make claims about the variance of the mixed layer depth without ever calculating it.
        My honest suggestion would be for you to start over, David. Go to the place where I got my data, it’s linked in the head post. They have data for THREE DEFINITIONS OF THE MIXED LAYER. Grab the definition you want and show us where the mixed layer depth is 1000 metres deep. Because there is nothing that deep in the dataset I have, or even three-quarters that deep.
        Unfortunately, until you actually do that kind of calculation, you’re just waving your hands and making claims.
        w.

      • David, a final note. You say “So your average over the earth of 60 meters is not very useful when the variance is as large as it is.”
        In fact, the standard deviation of the data shown in Figure 5 is a mere 27 metres. I certainly see no reason that makes the average less than useful.
        Regards,
        w.

      • Willis,
        One more shot at this. Simply put, your math is essentially ok. It is back of the envelope because you are using modeled output that was created based on data over a 70 or so year span. You are using one criterion (they aren’t calling it a definition) that they used, not the blending of the three. They openly admit that due to their criteria any MLD less than 10 meters will be 10 meters. The mixed layer depth is actually defined using several equations that are not solvable without making assumptions. Frequently what is found is not what would be calculated.
        So you took average data taken over a span of 70 years or so, made the assumption that your data set completely defines MLD, took an average, and then used the same assumption on MLD depth to calculate a theoretical error for the ARGO dataset.
        So on the back of an envelope you took 1/3 of a model built from historical data and compared it to actual data to create a theoretical error rate.
        Mixed layer depth is highly variable and changes hourly, based on many more factors than just temperature.
        Don’t take my word for it. This link is to an online oceanography textbook (chapter 8, sections 5 and 6) that describes the necessary math if you want to go for round two. You would still be using a model since there is no global current MLD dataset of actual conditions.
        http://oceanworld.tamu.edu/resources/ocng_textbook/chapter08/chapter08_05.htm
        v/r,
        David Riser

      • David Riser June 8, 2015 at 12:05 am Edit
        Willis,

        One more shot at this. Simply put, your math is essentially ok. It is back of the envelope because you are using modeled output that was created based on data over a 70 or so year span. You are using one criterion (they aren’t calling it a definition) that they used, not the blending of the three. They openly admit that due to their criteria any MLD less than 10 meters will be 10 meters. The mixed layer depth is actually defined using several equations that are not solvable without making assumptions. Frequently what is found is not what would be calculated.

        Yep.

        So you took average data taken over a span of 70 years or so, made the assumption that your data set completely defines MLD, took an average, and then used the same assumption on MLD depth to calculate a theoretical error for the ARGO dataset.

        Nope. I used their dataset to give me a reasonable estimate of the average mixed layer depth in the North Atlantic, which was 53 metres.

        So on the back of an envelope you took 1/3 of a model built from historical data and compared it to actual data to create a theoretical error rate.
        Mixed layer depth is highly variable and changes hourly, based on many more factors than just temperature.

        Yes, I know that David. But I have to analyze the information I’m given. I was given information regarding the heat content estimates for the North Atlantic mixed layer. To convert them to temperatures I used the mean of the monthly estimated depths of the mixed layer. And yes, that is an estimation, and contains errors … so what?

        Don’t take my word for it. This link is to an online oceanography textbook (chapter 8 section 5 and 6) that describes the necessary math if you want to go for a round two. You would still be using a model since there is no global current MLD dataset of actual conditions.
        http://oceanworld.tamu.edu/resources/ocng_textbook/chapter08/chapter08_05.htm
        v/r,
        David Riser

        Thanks, David. I know lots about the variability of the mixed layer—I’m both a recreational and sport diver. So I know how much and how fast it changes, been there, dived that.
        But I have to use the best information I can to make my estimates. My estimate of the MLD of the North Atlantic was 53 metres, and I haven’t noticed you offering a better number. It gave me the estimate that the full global 700 m data covered about 118 times the volume of the NA MLD … if you have better estimates bring them out.
        You’re letting the perfect get in the way of the good. I’m just looking for a go/no-go decision on whether I can trust the Argo claims. I see that I can’t because my best case scenario is that their errors are underestimated by about 20 times … do I care if it’s actually out by 22 times because my MXL depth wasn’t perfect? Not in the slightest. Makes no difference to my analysis.
        My best to you, and thanks for hanging in,
        w.

  25. Tony June 6, 2015 at 11:21 pm says:

    My understanding is that “sqrt(N)/N” only applies where the data is homogeneous. That is, it would apply if multiple measurements were being made of the temperature of one sample (say a cubic metre) of ocean. It does not apply where data is heterogeneous, such as in multiple measurements of entirely different bits of ocean. If it did apply we could have millions of boat owners stick a finger in the water and with enough measurements we’d get the error of finger-temperature-estimates down to thousandths of a degree.

    Thanks, Tony, and mmm … yes and no. To start with, you are conflating two arguments. One is that we can never average something like the ocean. The other is that there is no end to the increase in accuracy from repeated measurements.
    First, let me make a crucial distinction very clear. Accuracy is whether your answer is right. Precision is whether your answer is repeatable. These two are 100% different things, and we must be careful which one we are talking about. Suppose you shoot ten shots at a target. They make some kind of grouping. Accuracy is whether the grouping of shots is centered on the bullseye. Precision, on the other hand, is how tight the grouping of shots is, regardless of where they land. Different things.
    With that distinction in mind, let me say that averaging in general improves precision, but not necessarily accuracy.
    Bear in mind that we may need either accuracy, precision, or both for a particular task. For example, if I need to know my weight to see which weight-class I might be in, I need accuracy.
    But if I want to know if I’m gaining or losing weight, all I need is precision. Even if my scale might be off by five pounds, it can tell me if I gain or lose one pound.
    Returning to your two arguments, I see both arguments a lot. Basically, the first argument says that error reduction is only possible where you have repeated measurements of the exact same thing, and that the thing must be in your words homogeneous.
    So let’s imagine, not an ocean, but a swimming pool. Half is in the sun, half in shadow. The two ends are slightly different temperatures. The top surface is evaporating constantly. The bottom and sides are at the temperature of the surrounding earth. Every part of the pool is at a different temperature, the conditions are totally heterogeneous. Just like the ocean, on a smaller scale and with smaller variations, but equally heterogeneous. You ask me to give you my best estimate of the average pool temperature.
    So I place my only thermometer into what I figure is about the middle of the pool, and I take a temperature. “20°C”, I say. You tell me you want a more accurate estimate. So I say hang on, let me measure in a few more places and I’ll give you a better number.
    You tell me that won’t work … your theory is that because the pool is far from homogeneous, repeated measurements won’t improve the accuracy of my estimate of the average pool temperature.
    You see the problem with your first claim? Regardless of how heterogeneous whatever we are measuring might be, adding more measurements can only improve our estimate and reduce its error.
    Now, is repeated measurement improving the accuracy of the estimate of the pool temperature? Nope, not in the slightest. If my thermometer is off by 2°C, our average will be off by 2°C no matter how many measurements we take.
    But it does improve the precision of our estimate. I take my thermometer and I measure it at a bunch of different points. If those points vary a lot because the data is inhomogeneous, that is reflected as a larger standard error of our estimated temperature. But it will assuredly improve my estimate.
    So regarding your first claim, even if the data is heterogeneous, additional measurements can only help to reduce the error.
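    Here is a toy R version of the pool argument, with made-up numbers: sampling a thoroughly heterogeneous “pool” at more points steadily shrinks the spread (the precision) of the estimated average, even though no two points are at the same temperature.
    [sourcecode]
    # Toy version of the pool argument: a heterogeneous "pool" of point temperatures.
    set.seed(1)
    pool <- 20 + 2 * runif(1e6)                 # every point is different, spread over 20-22 C
    est  <- function(n) mean(sample(pool, n))   # estimate the pool mean from n readings
    # The spread of the estimate (its precision) shrinks as more points are sampled ...
    sd(replicate(2000, est(5)))        # ~0.26
    sd(replicate(2000, est(500)))      # ~0.026
    # ... even though the water itself is nowhere near homogeneous.
    mean(pool)                         # ~21, the quantity being estimated
    [/sourcecode]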
    Remember, however, that this is only the precision and not the accuracy. It is this confusion that has led to you saying:

    If it did apply we could have millions of boat owners stick a finger in the water and with enough measurements we’d get the error of finger-temperature-estimates down to thousandths of a degree.

    Curiously, that’s true … but only of the precision of the resulting estimate. Remember that precision only means repeatability. So all we are saying is that if we took another million [pool] owners and repeated the experiment, the averages would be within a thousandth of a degree of each other … which is true but says NOTHING AT ALL about the actual temperature of the water.
    That requires accuracy, and accuracy is a function of the measuring device. In the Argo case the nominal accuracy is ±0.005°C, and actual floats pulled from the ocean and checked have maintained their accuracy around that range. The problem with Argo is sampling error and heterogeneity, not instrument error. Even with all of those floats the ocean is still undersampled. Here’re some rough numbers:
    3.5E+8 square kilometers of ocean
    3.5E+2 Argo floats
    That’s one float per million square kilometres … it is that, and not the accuracy of the instrumentation, which is the source of the errors. That’s why it’s called a “sampling error”, because the ocean is undersampled.
    Finally, while we can increase precision by repeated averaging, in the real world I have a rule of thumb which is that I’m reluctant to claim more than one decimal place improvement over the precision of the instrument that took the reading. So if someone is averaging thermometers that are graduated in whole degrees, claiming errors that are in hundredths of a degree raises a flag. Might be possible, but it’s getting out there.
    However, in this case the instrumental accuracy is 0.005, so it’s not an issue and likely won’t ever be.
    I hope this assists you,
    w.

    • Willis,
      Apologies if my maths is too old school for this: –
      You write –
      QUOTE
      3.5E+8 square kilometers of ocean
      3.5E+2 Argo floats
      END QUOTE
      Now, I think – open to correction, as it’s decades since I was taught maths – that 3.5E+2 = 3.5 x 100 = 350
      Yet:
      QUOTE
      What is Argo?
      Argo is a global array of more than 3,000 free-drifting profiling floats that measures thetemperature (Auto – their error, cut and pasted with fidelity) and salinity of the upper 2000 m of the ocean. This allows, for the first time, continuous monitoring of the temperature, salinity, and velocity of the upper ocean, with all data being relayed and made publicly available within hours after collection.
      END QUOTE
      From
      this link – http://www.argo.ucsd.edu/
      “more than 3,000 free-drifting profiling floats ”
      Now, this still gives one Argo float for every 100 000 square kilometres of the ocean, (on average).
      An Area greater than – say – Hungary or Portugal; Indiana or Maine; New Brunswick; or the ten biggest ceremonial counties of England all aggregated (per Wikipedia, with its well-known accuracy, reliability and utter freedom from bias, which even I can edit) . . . .
      Roughly, a circle of water with a diameter of 220 miles.
      And all measured by one probe (on average) – so most of the area measured/sampled is (on average) more than 90 kilometres from the sampling point.
      QUOTE
      . . . it is that, and not the accuracy of the instrumentation, which is the source of the errors. That’s why it’s called a “sampling error”, because the ocean is undersampled.
      END QUOTE
      Absolutely.
      Fully agree.
      The ocean is – badly – under-sampled.
      The ocean is pretty big, as has been observed on WUWT before.
      It is still true!
      Auto

    • auto June 7, 2015 at 1:22 pm Edit

      Willis,
      Apologies if my maths is too old school for this: –
      You write –
      QUOTE
      3.5E+8 square kilometers of ocean
      3.5E+2 Argo floats
      END QUOTE

      Well spotted, auto. You are right, I was wrong. So the calculation should be
      3.5E+8 square kilometers of ocean
      3.5E+3 Argo floats
      Which gives one Argo float per hundred thousand square km, not a million.
      However, the point remains. The ocean is undersampled.
      Thanks for pointing out the error, that’s how understanding advances,
      w.
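      For the record, here are the corrected numbers in a couple of lines of R, which also reproduce auto’s “circle of about 220 miles” figure above.
      [sourcecode]
      # Corrected sampling density: ocean area per Argo float, and the equivalent circle.
      ocean_km2 <- 3.5e8
      floats    <- 3.5e3
      area_per_float <- ocean_km2 / floats          # 100,000 km2 per float
      radius_km      <- sqrt(area_per_float / pi)   # ~178 km
      2 * radius_km / 1.609                         # diameter ~222 miles, as noted above
      [/sourcecode]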

    • Good catch, Stefan, I hadn’t picked up on that. The Gulf Stream is basically a well-mixed river of warm water that is running on top of the more static underlying cold water. I’m surprised by the depth of it, though, that’s getting near 70-80 metres or so.
      You see somewhat the same thing in the Pacific. There, however, the transport of warm water to the poles is not constant like the Gulf Stream. Instead, the warm water is intermittently pumped polewards by the El Nino/La Nina pump. As a result, the track is less definite, but discernible nonetheless e.g. off the coast of Japan.
      https://wattsupwiththat.files.wordpress.com/2015/06/average-mixed-layer-depth-pacific.png
      Thanks,
      w.

  26. Your pool probably has a pump circulating the water through a heater. The pump’s suction line will have a thermometer/thermostat in it telling the heater to fire or not based on the thermostat’s set point. Depending on the location of suction and return the thermometer could give a fairly accurate average temperature of the water. Precision would depend on the quality/calibration of the thermostat, +/- 1%, +/- 0.25%. And then we could get into control theory on the swing in temperature depending on firing rate, overshoot, pool heat loss, ambient conditions, etc. (That’s the BSME and 35 years of power gen coming through.)
    Perhaps the ocean’s thermal circulation between geothermal heat flux on the floor and cold water sinking from the surface is analogous. Now if we could just find that suction line and thermostat in the oceans.
    So it has been 105 F for a week in Phoenix. The pool is shaded from direct sun light. I like it warm, 85 F. To keep the water that way requires heating. How come? If the heater fails pool water is going to cool off even though the air is 105 F. If air heats water per CAGW theories how come the pool doesn’t heat up to 105 F? Actually evaporation from the surface is going to drive the pool’s temperature towards the ambient wet bulb.
    Same with oceans. Evaporation and the water cycle in general are the climate thermostat, adding heat when needed, letting everything cool when needed, all maintaining a relatively comfortable stable millennia of climate with the swinging characteristics typical of any control loop.
    (The popular atmospheric CO2 heat trapping blanket theory ignores this water vapor thermostat.)

  27. Willis,

    then if we are averaging N items each of which has an error E, the error scales as
    sqrt(N)/N
    So for example if you are averaging one hundred items each with an error of E, your error is a tenth of E [ sqrt(100)/100 ]

    This method of getting at the standard error of the mean is true only if the measurements are independent and drawn from identical distributions. This is one issue I have with the error estimates of large data sets like the Argo buoys or surface temperatures and so forth–no one seems interested to verify that the measurements are independent and identically distributed, but continue to use this formula just the same. Another issue I have is that, to my knowledge, no one has bothered to recover any of the buoys to document drift over time. Finally, I have one additional issue that I have mentioned in various posts on the Argo buoys about three times, but no one responds.
    There is a potential secular bias in the instruments. In effect the buoys intend to measure the time integrated partial derivative of temperature with respect to time. But they cannot actually do so. What they actually measure is the integrated total derivative of temperature with respect to time. The difference between the two is an advective term proportional to the dot product of lateral buoy velocity with the lateral temperature gradient. This term will not average to zero when integrated over time because the buoys, being buoyant, will tend to drift toward higher ocean surface, and ocean temperature is correlated with ocean surface topography. It may be a small bias, but the reported temperature increases are small too, and it seems to me to be a bias baked into the measurement system. Nothing I have ever read about the Argo buoys appears to address this issue. Do you, or anyone else reading this thread, know if this issue has ever been addressed? If it is not addressed, why should anyone think the measurements document a credible temperature change with time?
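
    For readers who want the derivative distinction above written out, the standard decomposition of the temperature change seen along a drifting float's path is, in conventional notation (not taken from any Argo document):

      \frac{DT}{Dt} = \frac{\partial T}{\partial t} + \mathbf{u} \cdot \nabla T

    The first term on the right is what a fixed thermometer would record; the second, advective term is the one argued above to have a non-zero average if float drift is correlated with the temperature gradient.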

    • “the buoys, being buoyant, will tend to drift toward higher ocean surface”
      This effect, if it exists at all, will be completely negligible, since the buoys spend most of their time “parked” 1,000 or 2,000 meters below surface and their drift will be mostly due to currents at this depth.
      However, the drift is non-random. A check of sites where measurements have been taken shows that buoys strongly “avoid” upwelling areas and large river mouths, as well as deep basins enclosed by shallows, none of which is surprising. There are also no measurements from sea-ice areas, but this is probably more due to the buoys being unable to surface there.

    • K. Kilty June 7, 2015 at 10:54 am

      This method of getting at the standard error of the mean is true only if the measurements are independent and drawn from identical distributions. This is one issue I have with the error estimates of large data sets like the Argo buoys or surface temperatures and so forth–no one seems interested to verify that the measurements are independent and identically distributed, but continue to use this formula just the same.

      Indeed, that is correct. It’s part of why I said that what I was presenting was the best-case scenario. There are lots of ways for it to get worse.

      Another issue I have is that, to my knowledge, no one has bothered to recover any of the buoys to document drift over time.

      It has been done more than once, although I don’t have the citation to hand.

      Finally, I have one additional issue that I have mentioned in various posts on the Argo buoys about three times, but no one responds.
      There is a potential secular bias in the instruments. In effect the buoys intend to measure the time integrated partial derivative of temperature with respect to time. But they cannot actually do so. What they actually measure is the integrated total derivative of temperature with respect to time. The difference between the two is an advective term proportional to the dot product of lateral buoy velocity with the lateral temperature gradient. This term will not average to zero when integrated over time because the buoys, being buoyant, will tend to drift toward higher ocean surface, and ocean temperature is correlated with ocean surface topography. It may be a small bias, but the reported temperature increases are small too, and it seems to me to be a bias baked into the measurement system. Nothing I have ever read about the Argo buoys appears to address this issue. Do you, or anyone else reading this thread, know if this issue has ever been addressed? If it is not addressed, why should anyone think the measurements document a credible temperature change with time?

      Mmmm … I think you misunderstand the floats. They sleep a thousand metres down. As a result, they spend very little time on the surface, only enough time for ET to radio home. Then they drop back down a thousand metres and go back to sleep.
      As a result, the buoys don’t tend to “drift toward higher ocean surface”.
      Let me repeat the image from above:
      https://wattsupwiththat.files.wordpress.com/2012/02/argo-temperature-profiles-per-100000-sq-km.jpg
      If anything, the buoys are under-represented in the warmest part of the tropics, the ITCZ. So indeed there is a small bias there.
      However, provided that the distribution is relatively unchanging, the bias should not be a problem for what we really want to determine. Usually we’re interested in the trends, not the absolute values.
      Thanks for the interesting questions,
      w.

      • re Willis Eschenbach June 7, 2015 at 12:59 pm
        This map is interesting, but I think you’ll find the ITCZ is the zone, just above the equator, where we see *more* floats? Either side of it there is a bit less float density. This would suggest that there is some drift towards the ITCZ. There will be surface winds and wind-induced surface currents, as this is where there is most rising air. Air is drawn in from either side and deflected westwards by the Coriolis force, causing the warm westward currents either side of the ITCZ.
        KK says : “the buoys, being buoyant, will tend to drift toward higher ocean surface”
        No, a buoy is a massive object and will go to the lowest gravitational potential: a dip, like a ball on uneven ground.
        The sea level is higher along the ITCZ, but that is due to the same winds and wind-driven currents that seem to affect the ARGO distribution. You raise a valid and interesting point that had not occurred to me before; it’s just that the logic about the cause was wrong.
        I’m sure Karl et al can make suitable correction to ARGO to create some more global warming using this information 😉

      • Willis,
        Thanks so much for taking time to post this detailed response. However, it is precisely because the floats drift at depth (any depth really) that I see a potential for warm temperature bias. You can see my thinking in the response to “Mike” way down below at about 9:35 PDT. After reading your response, and Mike’s, I see that this is a really complex issue, and I plan to pirate some of your “R” code to use position of floats over time to investigate it. Let’s see if the distribution stays constant with time.
        Regards,
        Kevin Kilty

  28. Willis
    Just to be sure: in your article above you state that the trade winds blow the warm surface waters to the west, in reference to El Nino. A fact check: you probably mean from the west.

  29. One watt per square metre for one year warms one cubic metre of the ocean by 8°C
    Willis, your thumb is not quite right. 🙂
    Earth-surface = 510.1 x 10 ^ 12 m^2 (the oceans are only 70.8% of earth’s surface, but this drops out of equations)
    1 W/m^2 * 510.1 x 10^12 m^2 = 510.1 x 10^12 Watts
    510.1 x 10^12 Watts (J/sec) * 31,536,000 sec/year = 1.61 x 10^22 Joules/year
    Ocean Cp ~= 4185 J/kg/K
    1 m^3 of water = 1000 kg
    T = E/M/Cp = 1.61 x 10^22 J / 1000 kg / 4185 = 3.84 x 10^15 C. <== that's a lot more than 8.
    Your thumb probably meant to say 1 meter of ocean depth. In this case:
    T = E/M/Cp = 1.61 x 10^22 J / 510.1 x 10 ^ 12 m^3 / 1000 kg/m^3 / 4185 J/kg/K = 7.53 C
    Revised rule of Mr. Thumb:
    One watt per square meter for one year warms one meter deep of water by 7.5°C

    • Viking Explorer

      Revised rule of Mr. Thumb:
      One watt per square meter for one year warms one meter deep of water by 7.5°C

      Well, if you are starting from the 510 Mkm^2 total of a spherical earth surface, then the radiation at the top of the atmosphere varies over the year – it can be averaged out – but you cannot use 1000 watts/m^2 at the bottom of the atmosphere anywhere but at the “average” at noon between 23.5 north and -23.5 south.
      Everywhere else the approximation gets further and further from the approximate flat-earth world of the Trenberth (er, NASA-GISS) simplified models.

      • Mr. Cook, sounds reasonable, but where did I use 1000 W/m^2? I believe that this rule of thumb is just meant to provide an energy equivalent.

      • Ahh, yes, I see that now. My mind filters out anything with old-fashioned units like furlongs per fortnight, tons of TNT, dog-years, micro-fortnights and royal-albert-halls.
        My commute used to be 58 centi-MPH-Minutes. 🙂

      • 🙂
        Thanks a bunch!
        Do you know how much it hurts when you laugh so hard that Diet Pepsi comes out of one’s nose?

    • VikingExplorer June 7, 2015 at 11:55 am

      One watt per square metre for one year warms one cubic metre of the ocean by 8°C

      Willis, your thumb is not quite right. 🙂
      Earth-surface = 510.1 x 10 ^ 12 m^2 (the oceans are only 70.8% of earth’s surface, but this drops out of equations)
      1 W/m^2 * 510.1 x 10^12 m^2 = 510.1 x 10^12 Watts
      510.1 x 10^12 Watts (J/sec) * 31,536,000 sec/year = 1.61 x 10^22 Joules/year
      Ocean Cp ~= 4185 J/kg/K
      1 m^3 of water = 1000 kg
      T = E/M/Cp = 1.61 x 10^22 J / 1000 kg / 4185 = 3.84 x 10^15 C. <== that's a lot more than 8.
      Your thumb probably meant to say 1 meter of ocean depth.

      I did say one meter of ocean depth, that’s the “cubic metre” referred to in the rule of thumb.

      In this case:
      T = E/M/Cp = 1.61 x 10^22 J / 510.1 x 10 ^ 12 m^3 / 1000 kg/m^3 / 4185 J/kg/K = 7.53 C
      Revised rule of Mr. Thumb:
      One watt per square meter for one year warms one meter deep of water by 7.5°C

      Oh, so close. The only parts you are missing are that sea water weighs about 1.03 tonnes per cubic metre and that its specific heat is a bit lower than fresh water’s, about 4 MJ/°C/tonne. Together those take your number to about 7.7. Certainly, to the nearest degree we have no disagreement.
      You’ve taken the long way around, however. You don’t need to involve the whole planet. The shorter way is
      1 watt-year * 31536000 seconds/year = 31.5 megajoules
      4 megajoules/°C/tonne specific heat capacity of sea water.
      1 cubic metre seawater = 1.03 tonnes
      SO … temperature rise is 31.5 / (4 * 1.03) ≈ 7.7 °C
      The two seawater variables (density, specific heat capacity) are available here. Note that they both change with temperature.
      All the best,
      w.

      • Willis, you’re right to correct for sea water. However, the specific heat you use isn’t very accurate.
        T = E/M/Cp = 1.61 x 10^22 J / 510.1 x 10 ^ 12 m^3 / 1035 kg/m^3 / 3985 J/kg/K = 7.65 C
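
        As a side note for anyone following the arithmetic, here is a minimal R sketch of the same rule-of-thumb calculation; the density and specific heat figures are typical round values for fresh and surface sea water, so the answer lands around 7.5 to 7.7 °C per cubic metre per watt-year:

        # Warming of 1 m^3 of water from 1 W/m^2 applied for one 365-day year.
        watt_year_J <- 1 * 31536000                 # ~31.5 MJ delivered per square metre
        cp_fresh <- 4185; rho_fresh <- 1000         # fresh water: J/kg/K and kg/m^3
        cp_sea   <- 3985; rho_sea   <- 1030         # approximate surface seawater values
        c(fresh_water = watt_year_J / (cp_fresh * rho_fresh),   # ~7.5 C
          sea_water   = watt_year_J / (cp_sea   * rho_sea))     # ~7.7 C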

  30. Willis, I realize that you are not discussing or calculating instrument error, but I thought it might be interesting. Here is my back-of-napkin error analysis:
    I calculate the volume of water represented by an Argo float as 221,777 cubic km = 221,777 x 10^9 m^3. The mass of that water = 221,777 x 10^9 m^3 * 1000 kg/m^3.
    I’ll calculate the W/m^2 error for the official instrument error of .005 C, and the +/- .06 C measurement error mentioned on WUWT:
    E-joule-error-wuwt = 221,777 x 10^12 kg * 4185 J/kg/K * .06 K = 5.56882047 x 10^19 Joules
    E-joule-error-official = 221,777 x 10^12 kg * 4185 J/kg/K * .005 K = 4.640683725 x 10^18 Joules ref
    E-joule-baseline = 221,777 x 10^12 kg * 4185 J/kg/K * 273 K = 2.53381331385 x 10^23 Joules
    E-%-error-wuwt = E-joule-wuwt / E-joule-baseline = .022 %
    E-%-error-official = E-joule-official / E-joule-baseline = .0018 %
    These Energy uncertainties would be equivalent to the following Earth Power uncertainties for 1 year:
    P-error-wuwt = 5.56882047 x 10^19 Joules/year / 31,536,000 sec/year / 510.1 x 10^12 m^2 = .003 W/m^2
    P-error-official = 4.640683725 x 10^18 Joules/year / 31,536,000 sec/year / 510.1 x 10^12 m^2 = .000288 W/m^2
    (please help correct any arithmetic errors.)
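
    Since the comment above explicitly invites an arithmetic check, here is the same back-of-napkin calculation as a short R script, using exactly the inputs given there (221,777 cubic km of water per float, the fresh-water heat capacity, and the whole-Earth surface area); it reproduces the figures to rounding:

    # Energy implied by a given temperature error in one float's share of the ocean,
    # expressed as an equivalent whole-Earth forcing over one year.
    mass_kg  <- 221777e9 * 1000     # kg of water per float, as assumed above
    cp       <- 4185                # J/kg/K (fresh-water value used above)
    secs_yr  <- 31536000
    earth_m2 <- 510.1e12
    E_wuwt     <- mass_kg * cp * 0.06    # ~5.6e19 J for a 0.06 C error
    E_official <- mass_kg * cp * 0.005   # ~4.6e18 J for a 0.005 C error
    c(P_error_wuwt     = E_wuwt     / secs_yr / earth_m2,   # ~0.0035 W/m^2
      P_error_official = E_official / secs_yr / earth_m2)   # ~0.0003 W/m^2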

    • A small detail. Argo buoys don’t go below 2,000 meters. Average depth of the ocean = 4,300 meters, so 53% of the ocean is never sampled at all.
      Actually it is worse, since continental shelf areas, deep basins surrounded by shallows (like the Sea of Okhotsk), upwelling areas, areas near large river mouths and areas under sea-ice are also never or almost never sampled.
      Altogether Argo never samples something like 60% of the entire ocean volume.

      • tty, Good point about under sampling. However, the average ocean depth is actually 3,682.2 meters (ref). So, only 45% is never sampled.
        Even worse, we’re sampling none of the lithosphere. The atmosphere is only .07% of the ocean/air system and only .01% of the air/land/sea thermodynamic system.

      • Viking, that is an interesting average and is nowhere near the median. What is the mode?

  31. Willis, in case you have an interest in generally powerful languages for expressing applied mathematics: while I understand R has lots of useful statistical and matrix routines, APLs are more general and succinct. You might be interested in checking out http://www.dyalog.com , the leader among traditional APLs; http://jsoftware.com/ , Ken Iverson’s own final, very mathy J evolute; and http://kx.com/ , the more stripped-down-to-the-essentials K, which is the “template” for my own work.
    In these languages a quite competitive planetary model could be written as succinctly as, or more succinctly than, the notation in any physics text.

    • Thanks, Bob. R has some advantages. First, it’s free. Second, it’s free. Third, it’s totally cross-platform (Mac/Unix/Linux/PC).
      The two languages (R and APL) actually are quite similar in some ways. In both languages the fundamental unit of data is not a single cell, but instead it is a vector (a list of numbers like say 1, 3, 5, 9). All operations act on each element of the vector. So if we have the vector
      V = 1, 3, 5, 9
      and we want to add two to each of those, we just say
      V + 2
      and the answer is
      3, 5, 7, 11
      So that structure, shared by both languages, is essential. However, a couple of things set R apart for me.
      One is that I can highlight a line of code, ten lines of code, part of a line of code, or even a single word, hit Command-Enter, and it runs just that amount of code. This makes debugging very easy, because I can easily examine and run any size chunk of code.
      The next is the wide availability of packages to do special tasks. I have packages for matrix operations, for mapping, for graphic display, for astronomical calculations, the list is endless. And like R itself, the packages are all free.
      The next is RStudio, far and away the best user interface I’ve ever seen. If you are using R without RStudio, you’ve missed the experience … and RStudio is free as well.
      Anyhow, I’m not trying to convince you so much as I’m trying to encourage any lurkers out there to learn the language. I learned R myself maybe five years ago, when I was 63 or so … and I’m sure many here can do the same.
      Thanks,
      w.
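
      For any lurkers who want to try the vector idea Willis describes, the whole example is a couple of lines at the R console:

      # Vectorised arithmetic in R: an operation applies to every element at once.
      V <- c(1, 3, 5, 9)
      V + 2          # 3 5 7 11
      V * 10         # 10 30 50 90 -- same idea, no loop needed
      mean(V + 2)    # functions work on whole vectors too: 6.5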

      • Yes, I checked out the first APL link given, and found it extremely expensive. I’ve looked at R before in response to CA comments, and found it well done.
        Another strong contender is F#. The units of measure functionality is quite impressive. However, for performance reasons, I’d probably stick with C++.

      • Like you, Viking, I like C for the speed … so I was happy to see the package Rcpp. It lets you write C++ code directly inline, say in a function like this:

        #include <Rcpp.h>
        // haversine_cpp(), the great-circle distance between two lat/lon points,
        // is assumed to be defined earlier in the same source file.

        // [[Rcpp::export]]
        double all_cpp(Rcpp::NumericMatrix& mat){
          int nrow = mat.nrow();
          int numcomps = nrow*(nrow-1)/2;       // number of unique point pairs
          double running_sum = 0;
          for( int i = 0; i < nrow; i++ ){
            for( int j = i+1; j < nrow; j++){
              // accumulate the distance between points i and j
              running_sum += haversine_cpp(mat(i,0), mat(i,1),
                                           mat(j,0), mat(j,1));
            }
          }
          return running_sum / numcomps;        // mean pairwise distance
        }

        That gives me a simple way to get extra speed when I need it. Having said that, as long as I don’t do something foolish like iteration loops (for n = 1 to 35000 etc) R has been fast enough for my needs.
        Regards,
        w.
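
        If anyone wants to try the snippet above, here is roughly how it gets compiled and called from R using the Rcpp package; the file name and the little lat/lon test matrix are just placeholders for illustration:

        # Assumes the C++ above (plus a haversine_cpp definition) is saved as
        # "mean_pairwise.cpp"; sourceCpp() compiles it and exposes all_cpp() to R.
        library(Rcpp)
        sourceCpp("mean_pairwise.cpp")
        pts <- matrix(c(  0,   0,     # one point per row: latitude, longitude (degrees)
                         10,  10,
                        -20,  30), ncol = 2, byrow = TRUE)
        all_cpp(pts)                  # mean great-circle distance over all pairs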

  32. I have been commenting on these problems with ARGO for years. All of the accuracy figures I have been referred to (www seabird com) AND quoted, even on the ARGO web site (www argo ucsd edu), are GROSSLY misleading, and I would say on purpose. They only provide the accuracy of the electronics and do not provide the change in accuracy as affected by ambient temperature or operating voltage. And therefore all of the data that everyone is taking and using concerning ocean temperature is very suspect. At the minimum it is at least an order of magnitude LESS accurate than the oceanographers claim. Yes, the buoys they are using are laboratory-grade equipment with a professed accuracy of 0.001%, however that is for the electronics ONLY, and claiming that as the accuracy from the actual source (the ocean) to the resultant display – that is not how it works. You also need to take into account the accuracy of the probe – a very expensive one is only 0.005% accurate – and that, plus the electronics accuracy, would be the REAL accuracy of the temperature loop.
    Worse than that – and this is my concern – they also fail to recognize and take into account the change in accuracy caused by the change in temperature of the electronics. A laboratory-grade instrument is typically only accurate in the neighborhood of 20ºC (72ºF), plus or minus a few degrees. Outside those bounds, the accuracy suffers. Usually, quality equipment will provide a graph or data on that error. Has this error been included in the readings reported by the surface station equipment? Most laboratory-grade equipment loses more than twice its rated accuracy once you are more than ten degrees above or below the calibration temperature.
    Worse yet, as you get near the bounds of electrical operation, -20 to +50ºC, the numbers could be nothing more than garbage. I recall seeing a surface station temperature report from Alaska where the instrument quit altogether when it got to 40 or 50 below zero ºF. I just cannot believe that the numbers before then were valid. How many of these bogus numbers (temperatures) are used in the Global Warming Scam?
    You then need to consider the batteries, and the fact that they deliver reduced voltage at low temperatures, which radically affects the operation of the equipment (that is why your car will not start in very cold weather). Again, another specification provided on a REAL data sheet providing REAL accuracy.

  33. Judging by the red areas on the ARGO map, it seems that the areas where the floats measure the least are the areas that are most variable, near shore, shallow seas, high latitudes.
    Any chance this tosses in a bias?
    Also, are the spatial distributions constant over the course of a year, or do the individual months look any different?

      • “doesn’t the heat from the probe have effects on the water temp of that order?”
        It depends. I think RTDs can be very small (flea-sized or smaller) and carry very small currents. For a drop of water you might want to be careful; for a gallon of water, not so much.

      • From that link:
        Features of the SBE 41/41CP Design
        The SBE 41/41CP uses the proven MicroCAT Temperature, Conductivity, and Pressure sensors. The CTD is shipped fully calibrated, and has demonstrated excellent long-term stability, eliminating the need for post-deployment tampering of the calibration to force agreement with the local TS.

    • Argo measures at a 1 Hz rate when it does its rise. No information is given on the time constant (TC) of the temp sensor, which will affect the reading. I.e., if you step-change the temperature, how long does it take to get to 63% (one e-folding) of the final value?
      Measurement is hard. Real accuracy is harder. The fact that the instrument is accurate is only a start. Is the measurement as taken accurate? Different question. To what accuracy is the TC known?

      • While I realize this was good enough for the government, usually designs are appropriate for the application. Small thermal sensors (there are a couple of different types) can have a fast TC, plus water is a good sink.

      • But there is no way to estimate the error from TC unless you know it. I have seen small RTDs that have TCs in water of 1 to 5 seconds. 5 TCs gets you to within 1% of the change. i.e. from 10C to 11C you will see 10.99 in 5 TCs. Roughly. 7 TCs will get you to within 1 part in 1,000 of the change. Roughly. So to get into the accuracy range of the instrument for a 1 C change requires from 5 to 35 seconds dwell.
        Water is a good sink. Yes. But is there flow or a film? Hydrophobic or hydrophilic? And of course this is salt water. So it is doubtful the RTD is in direct contact with it. There are complications.
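
        M Simon’s time-constant figures are easy to reproduce; here is a small R sketch of the first-order (exponential) sensor response he is describing. The time constant tau is an assumption, since as he notes the real Argo figure does not seem to be published:

        # Fraction of a step change still unregistered after n time constants: exp(-n).
        tau  <- 2                        # assumed sensor time constant in seconds
        n_tc <- c(1, 5, 7)
        data.frame(time_constants = n_tc,
                   dwell_seconds  = n_tc * tau,
                   still_missing  = signif(exp(-n_tc), 2))
        # after 5 TCs about 0.7% of the step remains; after 7 TCs about 0.09%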

        • M Simon

          But is there flow or a film? Hydrophobic or hydrophilic? And of course this is salt water. So it is doubtful the RTD is in direct contact with it. There are complications.

          The link above in this page claims there is a “pump” and siphon (vertical loop) that processes water through a loop to the T/C, but stops while the float is at the surface to prevent contamination. But, how good is the float and how clean are its surfaces? There is supposedly a chemical released to kill biologics around the TC, but … any and every mechanical thing WILL fail if left to its own devices in the water for days/weeks/months.

  34. Menicholas June 7, 2015 at 1:38 pm

    “A note about the claims that adding up a lot of measurements and calculating the average ‘increases the accuracy’. That only applies to multiple measurement of the same thing.”

    Exactly right!
    There was a discussion a few months ago in which I seemed to be the only one who was willing to recognize that measurements taken at different times and places cannot give an accuracy, when averaged, greater than the accuracy of each reading.
    This can be easily proven by a simple thought experiment.

    Let me repeat my simple thought experiment from above.

    Let’s imagine, not an ocean, but a swimming pool. Half is in the sun, half in shadow. The two ends are slightly different temperatures. The top surface is evaporating constantly. The bottom and sides are at the temperature of the surrounding earth. Every part of the pool is at a different temperature, the conditions are totally heterogeneous. Just like the ocean, on a smaller scale and with smaller variations, but equally heterogeneous. You ask me to give you my best estimate of the average pool temperature.
    So I place my only thermometer into what I figure is about the middle of the pool, and I take a temperature. “20°C”, I say. You tell me you want a more precise estimate. So I say hang on, let me measure in a few more places and I’ll give you a better number.

    You tell me that won’t work … your theory is that because I’m not taking all of the measurements all at the same time in the same place, repeated measurements won’t improve the accuracy of my estimate of the average pool temperature. As you say, these “measurements taken at different times and places” can’t improve your estimate.
    Me, I say the opposite. I say that the more temperature measurements we take in separate places in the pool, the better our estimate of the average gets. And in some real sense, we can’t take two measurements at exactly the same time. So we commonly use measurements taken over some appropriate time frame.
    This of course is taken up in the calculations by specifying that the measurements are hourly or daily or monthly averages.
    Here’s the question. The pool I described is in its own way quite heterogeneous. If I asked you “Menicholas, I want your best estimate of the temperature of the pool over the next 24 hours.”
    I know of no way to do that other than to place as many thermometers as possible in as disparate a group of places and depths in the pool as possible. Then I’d take as many measurements over the day as I could reasonably take.
    And the more measurements I get, in both time and space, the better my estimate becomes.
    So I’m afraid that you can indeed take measurements at different times and places and get a very precise average … not only that, but the more different times and places you take them in, the more precise your estimate gets.

    And assuming that disparate measurements can be treated using the same statistical methods as repeat measurements of the same thing seems to be done over and over again in climate science.

    The results of recording the temperatures in the pool throughout the day are very disparate measurements, some in warm areas, some in cool areas, some in areas with diurnal cycling, some areas with little change. They are taken in different locations and at different times.
    And despite that, yes indeed, the same statistical methods apply to them. In fact there is no way to measure say the daily average temperature of the pool without taking measurements at different times and places.
    My best to you,
    w.
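
    One way to make the pool argument concrete is a small simulation; the pool temperatures below are invented for illustration (a heterogeneous field around 20°C), so only the shape of the result matters, not the particular numbers:

    # A heterogeneous "pool": a 50 x 50 grid of spot temperatures around 20 C.
    set.seed(1)
    pool <- 20 + outer(sin(seq(0, pi,     length.out = 50)),   # sunny-to-shady gradient
                       cos(seq(0, 2 * pi, length.out = 50))) +
                 rnorm(2500, sd = 0.3)                         # small local variation
    cat(sprintf("true mean of the pool = %.3f C\n", mean(pool)))
    # Estimate that mean from n randomly placed thermometers, for increasing n.
    for (n in c(5, 20, 100, 500)) {
      est <- replicate(1000, mean(sample(pool, n)))
      cat(sprintf("n = %3d   typical error of the estimate = %.3f C\n", n, sd(est)))
    }
    # The spread shrinks roughly as 1/sqrt(n), even though every thermometer
    # sits in a different place at a different temperature.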

    • Willis,
      For your position to work you have to prove (as best you can) that temperatures were stationary during the measurement, and that there were no (as small as possible) flows. The wind wasn’t causing cooling etc. When you take multiple measurements over time of an object that is changing, the sqrt may not be applicable. The improvement may not be (almost certainly isn’t) ^.5 – it could be ^.7 or maybe ^.98.
      The sqrt works for the length of an iron bar in a temp controlled environment handled as little as possible. You are measuring the “same thing”.
      So what proof do you have that you are measuring the same thing? The point? The sqrt is the most optimistic estimate of the improvement. Now if you go with that, no problem.

    • Willis is right. The folks arguing that such temperature readings are invalid are really saying that ALL temperature readings are invalid. This is akin to saying I don’t believe in any scientific analysis because all temperature readings are invalid. Is a temperature reading making an assertion about every atom? No, but who ever said it was? Would more buoys be better? Of course. Do buoys move? Yes, so we just need to keep adding new ones up stream.
      People should consider the implications of the positions they are taking. I wish this was like the Twilight Zone, so that the people who make these kinds of arguments would get this response when they check the weather: Sorry, we could tell you an estimate for today’s high temperature, but based on your own standards, it would be invalid for so many parcels of air in your general vicinity that we can’t take the risk of being wrong.
      In short, they need to change from:
      bool valid, correct;
      to
      float valid, correct;

  35. What does Briggs have to say about averaging the floats in a grid, then averaging the grids? Doesn’t he tell us this is WRONG?
    The Central Limit Theorem tells us to randomly sample the Argo floats. The result will very likely be normally distributed, from which a whole slew of statistical information can be derived.
    An average of averages hides the variance. Which hides the error.

  36. Willis writes

    If the errors are all equal to say E, then if we are averaging N items each of which has an error E, the error scales as
    sqrt(N)/N
    So for example if you are averaging one hundred items each with an error of E, your error is a tenth of E [ sqrt(100)/100 ].

    This can’t be the whole story. I recall a beautiful description, I think from John Daly, of error calculations; he pointed out that if you have a map and ruler, it doesn’t matter how many times you measure the distance from London to Paris, you’ll never get the answer to better than a ballpark estimate.

    • TimTheToolMan June 7, 2015 at 8:49 pm

      Willis writes

      If the errors are all equal to say E, then if we are averaging N items each of which has an error E, the error scales as
      sqrt(N)/N
      So for example if you are averaging one hundred items each with an error of E, your error is a tenth of E [ sqrt(100)/100 ].

      This can’t be the whole story. I recall a beautiful description, I think from John Daly, of error calculations; he pointed out that if you have a map and ruler, it doesn’t matter how many times you measure the distance from London to Paris, you’ll never get the answer to better than a ballpark estimate.

      Thanks, Tim, good to hear from you. Unfortunately, you are conflating accuracy and precision. Accuracy is how well your estimates match reality.
      Precision, on the other hand, is repeatability. It is how well your estimates match each other. It has nothing to do with accuracy. Averaging in general only increases precision.
      So John Daly is correct about the accuracy not increasing … but we shouldn’t expect it to. Instead, repeatability increases.
      Consider. If I measure London to Paris with a ruler, and you do the same, our answers will likely be quite different. Not repeatable, poor precision.
      But if I average 100 different people’s answers, and you average 100 different people’s answers, the averages are likely to be much closer to each other. This means more repeatability of the estimated value, which is to say, increased precision. It still says nothing about the accuracy of the measurements, for all we know the ruler is wrong. But repeating measurements does increase the precision, even if the ruler is wrong.
      I hope this assists you with the distinction. If not, just ask again.
      w.

      • Willis writes “you are conflating accuracy and precision. Accuracy is how well your estimates match reality.”
        But in this context the precision of water temperature that is being measured and supposedly increased with increased numbers of measurements says nothing about the accuracy of the measurement which is an entirely different error and not properly accounted for.
        So the analogy of 100 people all averaging say 13.42 cm from Paris to London doesn’t make the distance from Paris to London 13.42 x “the map scale” any more accurate than 100 measurements of an Argo buoy representing the average temperature of its 375k cubic kms of ocean.

      • Thanks, Tim. You are correct that increasing the precision doesn’t increase the accuracy. However, in the case of the ocean heat content, we don’t really care about the accuracy. The issue is not whether the ocean heat content is 2.147E+25 joules or 2.293E+25 joules.
        All we care about is whether the heat content is increasing or decreasing, and by how much. And for that, the accuracy error is immaterial. Your scale doesn’t have to be accurate to tell you if you are gaining or losing weight … it just has to be precise.
        w.

      • Willis writes “And for that, the accuracy error is immaterial. Your scale doesn’t have to be accurate to tell you if you are gaining or losing weight … it just has to be precise.”
        But it’s not precise. The error is not one of measurement precision, it’s one of what the measurement is actually of. It makes no difference how precise the thermometer is if the measurement is not representative of the object. In this case 1 measurement in 375,000 cubic kms of ocean isn’t representative of that volume of ocean (IMO).

      • TimTheToolMan June 9, 2015 at 2:30 am

        Willis writes

        “And for that, the accuracy error is immaterial. Your scale doesn’t have to be accurate to tell you if you are gaining or losing weight … it just has to be precise.”

        But it’s not precise. The error is not one of measurement precision, it’s one of what the measurement is actually of. It makes no difference how precise the thermometer is if the measurement is not representative of the object. In this case 1 measurement in 375,000 cubic kms of ocean isn’t representative of that volume of ocean (IMO).

        I agree, which is why I described it as a “sampling error” and I said several times that the system is undersampled. All I’ve done above is to show that the uncertainty is about twenty times what they have claimed, and that as a result we cannot tell if the ocean heat content is increasing as claimed.
        However, that doesn’t make the measurements useless. As usual, that just means that you need to aggregate them over either a larger time or a larger space. As an example, here’s the Argo data on the ocean heat maximum … it shows a rectangular patch of ocean north of the equator north of Australia:
        http://wattsupwiththat.files.wordpress.com/2012/02/argo-surface-temperatures-n-hemisphere-160-180e-0-45n.jpg
        This shows the annual cycle of temperatures, with two identical cycles shown for clarity. As you can see, there’s a lot to be learned from the Argo data … just not what they claim.
        w.

      • Willis writes “All I’ve done above is to show that the uncertainty is about twenty times what they have claimed, and that as a result we cannot tell if the ocean heat content is increasing as claimed.”
        From the point of view of overall position on this post, I agree with you Willis. I often do.

      • So it occurred to me to look to see whether TOBS is an issue.
        http://www.aoml.noaa.gov/phod/docs/ArgoDMpaper_reprint.pdf
        “The original profiling float sinks after launch to a prescribed pressure level, typically 1000 dbar. After a preprogrammed time (typically 10 days) at this pressure, the float returns to the surface”
        Typically 10 days means potentially variable. And there is no detail as to the specifics of what is currently used. Well not in that document anyway…

    • The example of using a map to measure the distance from London to Paris is a good illustration of systematic errors.
      The map is a model. Imagine if the one who made this map got the scale wrong by 50%.
      Then everyone who tries to use the map, the model, to estimate the distance from London to Paris will make an error of 50%. This error will be in addition to all other systematic and random errors which may have been made when making the map. And then you will have the systematic error of the ruler.
      This should also illustrate how stupid it is to try to estimate a value from a model.
      If the model is verified, tested and calibrated at multiple points, it may however be used for estimation within its tested, verified and calibrated range. Based on the testing you may also assign an uncertainty value to your model.

  37. If you simply picked people at random from the earth and measured their height, wouldn’t this give you the average height? And wouldn’t this be normally distributed with the standard error equal to the standard deviation? So how come we don’t need to grid people? They certainly aren’t evenly distributed around the globe and static in location.
    OK, just having a bit of fun. But in theory, if we can calculate the average height of people, why can’t we calculate the average temperature of Argo? And since Argo is essentially randomly distributed from one sample to the next, wouldn’t this eliminate the need to grid if all we wanted was to see the trend and the error?
    Yes, gridding will be needed to average the earth, but it is not needed to average Argo.

    • If you plot the height of everyone on Earth, you will get a bell curve distribution.
      If you plot the temperature of each cubic kilometer of water on the earth, what will be the shape of the resulting graph?
      And people do not constantly shrink and grow by a large percentage of the height they have at any given time.

  38. ferdperple writes

    So how come we don’t need to grid people?

    Because the distribution of heights is (thought to be) much the same everywhere in the world. Ocean temperatures aren’t distributed that way, though, so if you sample more in one latitude in one year and less in the next then that will fairly obviously bias the result.

  39. M Simon June 7, 2015 at 4:56 pm

    Willis,
    For your position to work you have to prove (as best you can) that temperatures were stationary during the measurement, and that there were no (as small as possible) flows. The wind wasn’t causing cooling etc.

    Not in the slightest. Consider my example of the pool. Yes, the temperature of the pool is changing. This means that the estimate of the average will automatically have a larger error, because the temperatures will have a greater standard deviation.
    But given that we are starting from a worse point, if we increase the number of measurements, the error generally decreases as sqrt(N)/N, subject to the restrictions I listed earlier.

    When you take multiple measurements over time of an object that is changing, the sqrt may not be applicable. The improvement may not be (almost certainly isn’t) ^.5 – it could be ^.7 or maybe ^.98.

    Again I have to disagree. The difference is taken up in the larger error estimate to start with. But given that, the reduction of error follows the usual rules. If you take one hundred times the measurements, you get an extra decimal in the error.
    The part about the reduction of error has nothing to do with the timing of the measurements. It’s straight math that only has to do with averaging and the effect it has on the errors.
    The error of the sum of N objects with individual errors E1, E2, …, EN is given by:
    sqrt(E1² + E2² + E3² + …)
    That is to say, the error is the square root of the sum of the squares of the individual errors.
    Note that this is true without regard to the physical situation. It is the mathematical nature of errors. They add “in quadrature”, that is to say as the square root of the sum of their squares.
    Now, if we divide that sum of N values by the number of values N, we get the average of the data. And the error of that average is simply the error divided by N.
    sqrt(E1² + E2² + E3² + …) / N
    There is a way to simplify this IF the errors are all the same, say E. At that point the sum of the squared errors is just N times E². So the formula above simplifies to
    sqrt(N * E²)
    This means the error of the average is that error value over N, or
    sqrt(N * E²) / N
    This simplifies to
    sqrt(N) * sqrt(E²) / N
    or sqrt(N)/N * E.
    That is to say, as you average more and more data, the error scales by sqrt(N)/N.
    Now, as you point out, that’s the best case. IF the errors are not equal, then IF the distribution of the errors is symmetrical, the error is increased to sqrt(E² + SD²), where E is now the mean of the individual errors and SD is their standard deviation. So the error is larger.
    But that larger error still scales the same way, this time as
    sqrt(N)/N * sqrt(E² + SD²)
    In summary, the issues that you raise of non-stationarity and heterogeneity and the like are real issues … but they are allowed for in the formula because of the variations in both the standard deviation of the data, and the standard deviation of the errors.
    Bottom line is, more measurements give you more precision, but not more accuracy, pretty much regardless of what you are measuring.
    w.
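
    For anyone who prefers to see the sqrt(N)/N behaviour numerically rather than algebraically, here is a short R simulation; the error size E is arbitrary, chosen only to show the scaling:

    # Each "measurement" is a true value of zero plus an error with standard deviation E.
    # Average N of them, many times over, and watch the error of the average shrink.
    set.seed(42)
    E <- 1
    for (N in c(1, 10, 100, 1000)) {
      err_of_avg <- replicate(5000, mean(rnorm(N, mean = 0, sd = E)))
      cat(sprintf("N = %4d   sd of the average = %.3f   E*sqrt(N)/N = %.3f\n",
                  N, sd(err_of_avg), E * sqrt(N) / N))
    }
    # The two columns track each other: the error of the average scales as sqrt(N)/N.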

    • I measure the length of 1,000 horses, 1,000 camels and 1,000 dogs. I measure them individually over a year, about 3 a day of each. Can I really use sqrt n (3,000 in this case) to reduce the error bar? After all I’m measuring lengths.
      Well, OK, let’s look at a problem I’m actually working on. I’m measuring the AC line frequency. If I go cycle by cycle my measurement is good to about 1 ppm (I’m clocking my period counter 59 million times a second). But because of line noise the cycle-to-cycle variation is on the order of 50 to 100 ppm. And that is in fact not the truth. If the grid is functioning tolerably well it can’t change that fast. And I have no way to take the noise out. I can reduce that considerably by averaging 6 cycles (0.1 second in North America). But now, because the cycles are continuously varying at a varying rate, I don’t really know much about the individual cycles. The old where-or-when problem you get in quantum mechanics. Statistics can’t help much because I want to know the length of every cycle. But there is no way I can find that out. The noise in the system prevents it. Measuring 1,000 cycles does not improve my knowledge of cycle 379. It can place a limit on it. To some extent. But measuring 10,000 cycles will still not improve what I know about cycle 379 by very much. Now if I took 1,000 measurements of cycle 379 at different places where the noise was different I could probably improve my estimate on the order of sqrt 1,000. But barring that, my knowledge of 10,000 other cycles does not help with cycle 379.
      And that is the problem you have with these buoys. They can bound the estimates. But 10,000 buoys more or less evenly but randomly distributed can’t do much to reduce the error of buoy 379. And that is as much true of the ensemble as it is of #379.
      Or take this case: you have buoys inside a current and outside a current (the Gulf Stream, say). Can you use one set of buoys to reduce the error of the other set? Probably not. They aren’t even close to measuring the same thing. To reduce the effect of the measurement noise you have to be measuring the same thing.
      And btw. If you’ve read the whole thread I’m not the only one to make this point.
      The point of sqrt n is to reduce the measurement noise. Can you use an Antarctic thermometer to reduce the measurement noise of a Sahara thermometer? Suppose their biases are different. Then what? Suppose their time constants are different. Then what? Suppose they are both moving. Then what? Suppose their clocks are not well synchronized – then what?
      And on top of that temperature is one of the hardest things to measure accurately. That is why calorimetry (the topic at hand) is so difficult.
      If the measurements given by these buoys were even somewhat honest they would be advertising the time constants (TC) of the measuring apparatus. I’ve looked around. (not extensively) I have yet to see a mention of that problem and how it affects accuracy or the correlation between different thermometers. All they tell you is that the static accuracy is quite good over time. What is the TC of the thermometers? How well do they hold that over time? If you have seen something on that leave a link. If there is even a discussion of that – leave a link.
      Another interesting topic of discussion is how noise affects the Bit Error Rate (BER) of a QAM-modulated signal. You can’t average (sqrt n) the noise to tell how it affected any given bit. The Signal To Noise Ratio (SNR) tells you the average BER for a given bandwidth channel. But it will not tell you which bit(s) is corrupted. Other methods are required. And there is a trade off between BER/SNR and error correction bits that tells you what the information capacity of a given channel is. There are limits that no amount of error correction can overcome.
      I’d like to see something on the information capacity of ARGO. Haven’t seen any mention of that either. All I have seen is “we can measure static temperature quite accurately” followed by “trust us”. As if. Something on the ADCs used would be good. In addition to the sensors. And the bandwidth of the analog circuitry. And the 1/f noise break point. And the noise value at the breakpoint. And the noise slope below the break point. And BTW how well is circuit noise from the microprocessors etc. kept out of the measuring circuits?
      It gets complicated. Much more complicated than sqrt (n).

  40. Crispin in Waterloo but really in Yogyakarta June 7, 2015 at 4:43 am

    Mike I think you posted first but mine is higher up.
    “The fundamental [folly] is that 3000 ARGO measurements are NOT 3000 measurements of the same thing: ”
    Precisely!

    Given that the ARGO measurements are of different things, no such claim can be made for any numbers – they all stand alone.

    Thanks, Crispin. Consider my earlier example of a swimming pool. I want to get the average temperature of the water in the pool. I put in three thermometers in separate locations.
    Are the thermometers measuring the pool water in different locations … or are they measuring three “different things”?
    I say they’re measuring the pool water in different locations … but that doesn’t matter. Here’s an example of why not.
    Suppose I want to know how much the average resident of San Francisco weighs. So I start weighing SF residents at random.
    Do you agree that the more people I weigh, the more precise my estimate will be? Because that is what will happen. Remember that precise means repeatable and is different from accuracy.
    Now, here’s the point. My estimate is getting more and more precise as I weigh more and more different people … but I’m not taking “measurements of the same thing” in your words. I’m taking the weights of totally different people … and despite that, my estimate of the actual average weight keeps improving.
    I suppose you could argue that the people are all part of one “thing”, the “residents of San Francisco” … but if that is the case, then the different parts of the pool are also all part of one thing, the “water in the pool”.
    The aspect of all of this that seems hard for people to grasp is that this error reduction is inherent in the mathematics of averaging any kind of values that have errors. It doesn’t matter what the values are that have the errors. It doesn’t matter the size or details of the grouping, whether the measurements are of all people or just women or distances to the stars or whatever. If there are values with errors and you average them, we know what happens. The errors add in quadrature, and the total error divided by N, the number of data points, is the error of the average. It is pure math; it has nothing to do with where the numbers come from or what they may be referring to.
    A final example. I want to know the average weight of me, my cow, my chair, and my car. I know the weights of each of them, and I know the error of the various scales used to weigh them.
    From that data I can calculate the average weight, as well as the error estimate for that average. That error will be
    sqrt(Eme² + Ecow² + Echair² + Ecar²) / 4
    where E is the error of the relevant estimate.
    Now, this calculation of the error estimate of the average weight has nothing to do with what is being measured. It doesn’t matter that a cow, a chair, a car and I are most definitely “different things”. It doesn’t matter that two are alive and two are not. It doesn’t matter if the numbers represent weights or ages.
    People claim things like this only works when we’re making repeated measurements of the same object. Not true. It works with an average of cows, chairs and cars. The reduction in error is due to the math, and not to the items being measured. And in general, the error of the average will be SMALLER than the individual errors. Counterintuitive, perhaps, but true.
    Finally, let me say again that accuracy is not increased by averaging, whether the same thing is being measured or not. Precision is increased, which is the same as saying that repeatability is increased.
    Best regards,
    w.

    • Willis writes ” I’m taking the weights of totally different people … and despite that, my estimate of the actual average weight keeps improving.”
      This is a useful argument to demonstrate the point. For arguments sake, lets say a single measurement can reasonably accurately represent 1 cubic km of ocean. So by analogy you can weigh about one person every ten days in my home city. You get to weigh about 36 random people per year in my city.
      Irrespective of what the error is, how useful do you think that degree of measurement is when determining whether the population is gaining or losing weight?

      • Tim, you raise a good point. Fortunately, that’s what statistics are for. Instead of saying “how useful do you think it is”, we can measure exactly how useful it is.
        You can calculate the expected error in the estimated average (mean) as you go. It’s called the “standard error of the mean” or SEM, which is simply the standard deviation of the data divided by the square root of the number of data points. So if I weigh 36 people, and the standard deviation of their weights is 12 pounds, the SEM is ± 2 pounds.
        That means that if the populace gained say 10 pounds each my study has enough resolving power to detect it, but if they all gained a single pound, 36 measurements is not enough to reveal that one-pound gain.
        Unfortunately, the inexorable math says that if I want to make my estimate more precise by one decimal point, I need to measure a hundred times as many weights. This is because as noted above, error scales by sqrt(N)/N, and for N=100, that’s a tenth of the error. So to have the same resolving power for 1 pound as I have for 10 pounds, I’d need to take 3600 measurements.
        These kinds of calculations are important for things like polls. How many people do you need to ask a question in order to determine the true underlying yes/no fraction to within say ±10%? And exactly the same math applies.
        w.
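
        Willis’s 36-person example is easy to play with; the weights below are simulated with an assumed 12-pound standard deviation, so the snippet illustrates the formula rather than any real survey:

        # Standard error of the mean = sd of the data / sqrt(number of data points).
        set.seed(7)
        weights <- rnorm(36, mean = 160, sd = 12)      # 36 hypothetical weighings
        sd(weights) / sqrt(length(weights))            # about 2 pounds
        # To cut that to ~0.2 pounds you need 100 times the data: 36 * 100 = 3600 weighings.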

      • Willis writes “That means that if the populace gained say 10 pounds each my study has enough resolving power to detect it, but if they all gained a single pound, 36 measurements is not enough to reveal that one-pound gain.”
        I think you’re forgetting that you’re not taking 36 measurements at once, they’re spread over the year. And over the year people can change their weights significantly. So for example how would you know whether people were generally losing weight over summer and gaining it over winter? You just don’t have the resolution or sampling frequency for any of that. Eventually after a very long time you might get enough data to see low frequency trends but you’re simply not going to see anything even vaguely high frequency.

      • TimTheToolMan June 9, 2015 at 2:37 am

        Willis writes

        “That means that if the populace gained say 10 pounds each my study has enough resolving power to detect it, but if they all gained a single pound, 36 measurements is not enough to reveal that one-pound gain.”

        I think you’re forgetting that you’re not taking 36 measurements at once, they’re spread over the year. And over the year people can change their weights significantly. So for example how would you know whether people were generally losing weight over summer and gaining it over winter? You just don’t have the resolution or sampling frequency for any of that. Eventually after a very long time you might get enough data to see low frequency trends but you’re simply not going to see anything even vaguely high frequency.

        Thanks, Tim. It doesn’t matter whether the measurements are taken over an hour, a day, a month, or a year. If people gain ten pounds over the year, that will be reflected as an increase in the standard deviation of the results. This will in turn increase the standard error of the mean, and appropriately reduce the resolving power of the study. It’s all taken care of by the math. If we take the measurements over a year, all that happens is that we call it a “yearly average”.
        My point is simple. All we are talking about is the averaging of numbers with associated uncertainties. There are clear mathematical rules for taking that average. The rules don’t care whether the data is people’s weights or tomato growth rates or ocean temperatures. The rules don’t care if the measurements were taken in one instant or one hour. Given that those are the measurements and uncertainties, we can say what the uncertainty of the average is.
        w.

      • Willis writes “all that happens is that we call it a “yearly average””
        Right. So after 10 years we have 10 data points. That’s not how it works for Argo, though, is it? There are many data points shown, but Argo simply doesn’t have the power to resolve that finely.

      • TimTheToolMan June 10, 2015 at 6:17 am

        Willis writes

        “all that happens is that we call it a “yearly average””

        Right. So after 10 years we have 10 data points. That’s not how it works for Argo, though is it. There are many data points shown but Argo simply doesn’t have the power to resolve that finely.

        Say what? Argo gives us thousands of data points per year. From this we construct yearly averages, with an associated uncertainty.
        And yes, after ten years, we have ten years worth of annual averages … what, you were expecting twenty annual averages in ten years?
        And when you say “Argo simply doesn’t have the power to resolve that finely”, that statement is meaningless for a couple of reasons. First, you haven’t specified how finely “that finely” might be. Second, you haven’t specified over what time and space you’re doing the averaging. Annual? Monthly? Global? Regional? 700 metres depth, or 2000 metres? Both of those are critical to knowing how much uncertainty there is in the answer.
        I’ve given above my estimate of how finely the Argo data can resolve the temperature of the top 700 metres of the ocean on a global annual basis. I say that the best case scenario is that we might be able to resolve it with an uncertainty of a tenth of a degree with those constraints (global, annual, 0-700m). However, that comes with some caveats regarding the fact that the Argo buoys don’t sample shallow waters and the like.
        My point is, statistics is what allows us to use averages of scattered, incomplete, fluctuating datasets. It does so by giving us an estimate of the uncertainty associated with the average. It means that we don’t just throw up our hands and say something like “Argo simply doesn’t have the power to resolve that finely”. Instead, we can see exactly how finely Argo can resolve, given a set of specific time and volume constraints.
        Argo doesn’t have any inherent “resolving power” in a general sense. By that I mean that the uncertainty is a function of both time and volume measured.
        The general direction of things is pretty obvious. The more measurements we take, the better our estimate of the average is likely to be. And the more homogeneous the volume of water being measured, the better our estimate of the average is likely to be.
        What statistics does is let us attach numbers to those statements. It lets us know exactly how much better our estimates will be. So instead of simply saying that the average will be better with more measurements, we can say that the error scales with one over the square root of the number of measurements. So if we want half of the uncertainty, we need four times the measurements. And if we want another decimal place on the answer, that’s a tenth of the uncertainty, so we need a hundred times the measurements.
        My best to you,
        w.

      • Willis writes ” Both of those are critical to knowing how much uncertainty there is in the answer.”
        Argo produces one figure that is overwhelmingly the most important for AGW and that is the Ocean Heat Content.
        Obviously Argo produces mountains of data, but the point is that, say, 3 sets of readings per month per buoy don’t really tell us anything about the ocean, because the variation (and error) due to the sparseness of the readings makes monthly “data” worthless. Misleading even.
        A yearly average is just that. An average over the whole year, and you can’t then subdivide it into months meaningfully – particularly if you’re doing analysis on it such as rates of change.

    • Willis I appreciate your expansive response. You have not wandered as far as others, but there are still a couple of course corrections necessary.
      I will respond in two sections. First the good people of San Francisco. Weigh them and look at the numbers. All the weights end in zero, and you realise they are being weighed to the nearest ten pounds. Averaging all the weights will produce a number that is correct to the nearest ten pounds. If the average answer is 155.613 pounds, the rider is that it is a value plus or minus 5 pounds, a 10 pound range. You can weigh a million residents and calculate a 99.99% confident number for the centre of the 10 pound range, but you cannot reduce the range because the original weights were read to the nearest ten pounds. The trick to dealing with this is to recall that 155.613 is no one’s weight, it is just a number. It was generated with a scale that read to the nearest ten pounds. The only valid report you can make is that the average weight of residents is 160 pounds. Full stop. There is no guarantee whatsoever that if you weighed everyone to the nearest pound the result will be 155.613 or 156 if you report it properly.
      The second issue is with the mixing of the non-overlapping terms ‘precision’ and ‘accuracy’. There are three numbers that rate a scale: repeatability, accuracy and precision. A scale might report the mass 155.613 and next time 155.612 and then 155.611. Very precise and repeatable within 0.001. But the value may be consistently wrong by 2.115 pounds. It is a precise but inaccurate instrument with good repeatability.
      The weight of people in San Francisco cannot be known more precisely than to the nearest ten pounds if the scale has a precision of 10 pounds. Whether the scale is accurate is a completely different matter.
      Let’s weigh someone 100 times. Their actual weight is 155 pounds. The scale says they weigh 150 pounds 50 times and 160 pounds 50 times. The average of all the readings of the same person is 155 pounds and a confidence can be calculated for this value. Is the precision of the number still 10 pounds? Yes.
      How accurate is the answer? We don’t know until we calibrate the scale against a standard weight. The accuracy of the answer is unrelated to the number of readings. If they are all off by 2.115 pounds then they remain off.
      If the person weighed 152 pounds and we got 150 as the final average, the precision is still 10 pounds and the accuracy is still unknown without calibration.
      The error people make is to say that many readings of the same thing, like a person’s mass, will increase the precision of the answer. No, it increases the precision of the reported value of the centre of the ten pound range. Multiple readings have no effect on the accuracy of any of them, nor on the average, if the calibration was wrong to start with. A scale can be like some people: disconnected from reality and consistently wrong.
      So let us consider the instrument operating at its design limits. In practice a scale or thermometer has a precision that is ‘worth reporting’. If the inherent variability of the equipment is such that it simply cannot report a weight repeatedly better than 10 grams, then the display will suppress all smaller values. Entering a correct calibration constant and linearisation formula can maintain that 10 gram precision over the full scale range. No problem. Then it will be accurate to within 10 g and report the mass to a precision of 10 g.
      Now weigh 1000 people, once each, with that scale. Calculate their average weight. The number might be 70.1245 kg. The scale, having been recently calibrated and not knocked around, will be accurate to 10 g. The error will be as you calculated it above. How many digits can we report truthfully? It is 70.12. The 0.0045 is an artefact of the calculation and has no value to us because we do not have 1000 opinions of the weight of one person.
      Next weigh 1000 people on 100 scales 10 times each. Ten scales were just calibrated. Ten were calibrated a year ago. The next ten were calibrated two years ago and so on. The accuracy of the final result will depend a lot on the quality of the instruments because some scales drift a lot and some drift less. Some scales are ‘assizable’ and some are not because they cannot maintain their accuracy within acceptable limits for a year. They are marked ‘not legal for trade’ for that reason.
      Now measure the temperature of the ocean using 3600 different instruments in 3600 different places with RTD’s that have a readout value of 0.01 degrees, a repeatability error of 0.01 and a one year accuracy of 0.06 degrees C. Can you support the claim that the temperature of the ocean is known to a precision of 0.005 and an accuracy of better than 0.005 degrees?
      Neither can I. The precision is 0.01 and the readings after one year are within 0.06 C, assuming it was correctly calibrated at the beginning of the year. Any calculated ‘trend’ within the error bars is no trend at all because we can have greater confidence of No Trend than we can have in Trend. The rest, as they say, is noise.

  41. Willis EschenbachJune 7, 2015 at 7:43 am

    I do admire a man who picks up on an idea and runs with it …
    What is the problem that you have with the format?

    Well, Hadley only provide it as NetCDF for now and do not provide a timeseries graph, just a “wow, it’s hotter now” map.
    This is not much use to normal mortals, especially in the context of Karl et al’s latest games on global time series.
    I want to extract a lat-weighted global TS for initial comparison.
    The weighted.mean fn only seems able to produce a single scalar result, so I can’t copy what you did for thermocline depth. I find working in R a monumental PITA, so maybe the best thing is to dump it out as ascii and process it in a programming language that gives direct control.
    Even without the latitude weighting it is clear that there are some serious issues with NMAT, and using it to “correct” purpose-built buoys is rather a perverse idea from a scientific point of view. Of course what Karl et al are doing is not motivated by science ….

    • OK, Mike, building on your code I’d go:

      dim(nmat)
      [1]   72   36 1572

      72 rows by 36 columns by 1572 monthly layers. First thing I’d do is put it into normal map format, which is wider than tall.

      nmat = aperm(nmat, c(2,1,3))
      dim(nmat)
      [1]   36   72 1572

      Next I’d grab the list of latitudes, and take the cosines.

      thelats=nc$dim$latitude$vals
      coslats=cos(thelats*pi/180)

      Make the cosines into a 36 x 72 rectangular array:

      cosmatrix=matrix(rep(coslats,72) ,nrow=36,ncol=72)

      Here’s where the beauty of R comes into play. We use the “apply” function to apply the weighted.mean function to every layer (month) of the nmat array:

      monthlytemps=apply(nmat,3,FUN=weighted.mean,w=cosmatrix,na.rm=T)

      That means apply “weighted.mean” to index 3 of nmat, with the weighted.mean variable “w” and “na.rm” set as shown.
      The “3” in the apply function means apply the function to index 3 of (rows, columns, layers), that is to say apply it to each layer rather than each row (1) or column (2).
      As to the time, we have to get the time units and the first couple of time values:

      nc$dim$time$units
      [1] "days since 1850-1-1 0:0:0"
      nc$dim$time$vals[1:2]
      [1] 10972.5 11002.0

      So we’re looking at monthly data, but the start date is 10,972.5 days after January 1, 1850, which turns out to be January 16, 1880. R handles time a couple of ways, most comprehensively as a “POSIXct” object. A POSIXct object keeps time internally in seconds, and it can calculate an offset from a starting time. So I use an offset of 10972.5 * 24 (hr/day) * 3600 (secs/hr) from the start, as follows

      as.POSIXct(10972.5 * 24 *3600, origin = "1850-01-01", tz = "GMT")
      [1] "1880-01-16 12:00:00 GMT"

      So if I were working with the data, I’d make it into a “time series” object starting in January 1880:

      nmatts = ts(monthlytemps, start=c(1880,1), frequency=12)

      That makes it easy to do things like take annual averages, and plot the time series.
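      For example, annual means and a quick look are then just a couple of lines (a sketch, assuming the steps above ran cleanly):

      annualtemps = aggregate(nmatts, nfrequency = 1, FUN = mean)  # calendar-year means of the monthly series
      plot(nmatts, col = "grey")                                   # monthly values
      lines(annualtemps, col = "red", lwd = 2)                     # annual means overlaid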
      w.

  42. Willis, may I suggest a book on Statistical Process Control (SPC). And Information Theory (Channel Bandwidth/Information Limits). The precision of a group of widgets is not improved by measuring 10,000 of them vs 100 of them. The precision of the average can improve. But if the error bar for 100 widgets is 10% and the process is in control, the error bar for 10,000 is still going to be 10%. Let us say for 100 you get a measurement average of 10.1; that would be +/- 10%, so 10.1 +/- 1, and for 10,000 you might be able to say that you have a measurement average of 10.11 +/- 1. You can’t reduce the error of the process by averaging. You just get a more accurate estimate of the average. But it does not improve your process.
    People not familiar with SPC make these kinds of rookie errors all the time.
    ==================
    And then there is this problem. Due to the Time Constant (TC) of the instrument a temperature rate of change of 2 degrees per minute is going to have a different error than a temperature rate of change of 1 degree per minute. If the instrument time constant is 1 second (unlikely) the error is small for that variation in rate of change. If the time constant is 30 seconds (likely) the error will be much different for the two rates of change.
    The static accuracy (.005C) does not tell you a lot about measurement accuracy in a variable rate of change situation.
    That seems to have been glossed over.
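    To put rough numbers on it: for a simple first-order sensor, the steady-state lag error on a constant ramp is roughly the time constant times the rate of change. A quick R sketch with illustrative values only:

    tau  = c(1, 30)        # candidate time constants, seconds
    rate = c(1, 2) / 60    # ramp rates of 1 and 2 degrees C per minute, converted to degrees per second
    outer(tau, rate)       # approximate lag error in degrees C for each combination

    A 30 second time constant on a 2 degree-per-minute ramp lags by about a degree; a 1 second time constant lags by a few hundredths.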

    • And it gets worse. If the time constants of the instruments vary significantly (not unlikely) then the errors of the various instruments are going to vary significantly. Instruments rising 700 m in the Tropics in the summer are going to have larger error bars than those rising 700 m in the Arctic in the winter. Where is this discussed? All I’ve seen mentioned is static errors. Clever boys.

    • M Simon
      Re widgets. From many readings you do not get a more accurate average, you get a more precise value for the centre of the range of error. The accuracy depends on the instrument and how well it was calibrated before use.

  43. I find this article excellent in many ways, however I think it could be improved by following a recognized international standard for expression of uncertainty. There is a freely available and excellent guideline called:
    Guide to expression of uncertainty in measurement.
    http://www.bipm.org/en/publications/guides/
    Section 4.2.3 covers the uncertainty of the average.
    You deviate from this standard in a few ways, most significantly:
    1. By using the term error in place of the term uncertainty
    2. By not making explicit the confidence level or coverage factor
    Section 7 (Reporting uncertainty) contains a few guidelines on expressing and reporting uncertainty.
    About the guide:
    “This Guide establishes general rules for evaluating and expressing uncertainty in measurement that are intended to be applicable to a broad spectrum of measurements. …. The CIPM Recommendation is the only recommendation concerning the expression of uncertainty in measurement adopted by an intergovernmental organization.
    ..
    The following seven organizations* supported the development of this Guide, which is published in their name:
    BIPM: Bureau International des Poids et Mesures
    IEC: International Electrotechnical Commission
    IFCC: International Federation of Clinical Chemistry **
    ISO: International Organization for Standardization
    IUPAC: International Union of Pure and Applied Chemistry
    IUPAP: International Union of Pure and Applied Physics
    OIML: International Organization of Legal Metrology ”

    • Also – you could respond more easily to many of the comments here by pointing to the standard and asking commenters to use terminology in accordance with this standard. That is one of the great benefits of standards.

  44. Richard, right on. An additional point is made in a description from Columbia U.
    “When air in contact with the ocean is at a different temperature than the sea surface, heat transfer by conduction takes place. On average the ocean is about 1 or 2 degrees warmer than the atmosphere so on average ocean heat is transferred from ocean to atmosphere by conduction.
    If the ocean were colder than the atmosphere (which of course happens) the air in contact with the ocean cools, becoming denser and hence more stable, more stratified. As such the conduction process does a poor job of carrying the atmosphere heat into the cool ocean.”
    They calculate:
    Solar heating of the ocean on a global average is 168 watts per square meter
    Net LW radiation cools the ocean, on a global average by 66 watts per square meter.
    On global average the oceanic heat loss by conduction is only 24 watts per square meter.
    On global average the heat loss by evaporation is 78 watts per square meter.
    https://rclutz.wordpress.com/2015/05/10/empirical-evidence-oceans-make-climate/

  45. Lots of good comments above. However, ALL are still ignoring major factors that contribute to inaccurate measurement of the temperature of the ocean.
    1. The stated accuracy is only good in laboratory conditions. Changes of ambient temperature will affect the reading.
    2. The stated accuracy is for the Electronics – where is the accuracy for the RTD AND the accuracy for the loop (electronics and RTD and connecting conductors)?
    3. All electronics operate differently at different temperatures and voltages. Where is the chart/table providing the degradation of accuracy in relation to the change in ambient temperature and operating voltage?
    4. The buoys sit at [1000] feet for a long period of time, become acclimated to that temperature and then rise, taking the temperature at various elevations as they rise. What is the initial temperature of the electronics and the degraded accuracy for that temperature? What is the temperature of the electronics and the degraded accuracy for that temperature at each of the elevations where another measurement is taken?
    5. The buoy has a pump that pumps water past the RTD where the temperature is taken. What is the TC for the flow of water from the suction to the RTD? What is the TC of the RTD? What is the TC of the medium protecting the RTD from the sea water? Where is all of the data for this?
    6. Battery voltage and current capacity decrease with temperature. Since the buoy sat for a period of time at low temperature the electronics will be affected. The accuracy is affected by operating voltage. How is this factored into the reading? Does the buoy rise fast enough that it stays at this low voltage/temperature, affecting all readings, or will it warm up as it rises, causing different inaccuracies as it rises?
    This device seems to be the most expensive piece of equipment to purposely ignore multiple important factors, to the point that the data it generates is, by design, garbage.

    • The electronics handle the voltage issues by operating the electronics at a voltage below the battery minimum.
      The temperature is corrected by using a 4 wire RTD which automatically compensates for the changing resistance of the wires. It’s clever and simple. Most of what you mention are non-issues. There is a hysteresis on the temperature but the rise rate is known and can be automatically corrected by the onboard computer with a signal timing algorithm.
      Your points about the electronics are spot on when it comes to interpreting the output from the RTD. It is a variable resistor, not a temperature reporter.
      A gas chromatograph can do clever and precise and delicate things, but only once per 30 seconds. Same with an FTIR. I’ll bet an ARGO doesn’t make more than one reading per second.

      • Get on ARGO and/or Seabird and read what they are doing. They calibrate the electronics to a standard RTD resistance source and rely upon the fact that the RTD follows the manufacturer’s standard resistance/temperature curve, NOT the actual RTD at a known temp. That is not how temperature instrumentation is calibrated for nuclear power plants, oil refining and other processes requiring accurate temperature. That works, more or less, under laboratory conditions with no problems, but how does it work with the electronics at 1000 meters at an ambient temp of 5-6 °C for several hours?

        • usurbrain

          That works, more or less, under laboratory conditions with no problems, but how does it work with the electronics at 1000 meters at an ambient temp of 5-6 °C for several hours?

          Worse – each different buoy over its lifetime is irregularly coated by different layers of different marine biologics and scum and contaminants – NONE of which can be “calibrated out”, because each is different on each different buoy, as are the different times each buoy spends in each different sea temperature and sunlight condition.
          The calibration (at the manufacturer) is in a single tank with clean (sterile!) water at absolutely known conditions. Thereafter? Every buoy will change differently over its lifetime, uncontrollably, from every other buoy!
          Now, to the point of the paper: Is a series of un-controlled ships randomly dropping buckets over the side in specific shipping lanes using ??? uncalibrated, unknown thermometers recorded under unknown conditions BETTER and MORE RELIABLE to a 1/2 of one degree such that you CHANGE the calibrated buoy temperatures back to what the ship buckets claim they had?

  46. I’ve never taken ARGO’s error estimates particularly seriously, any more than I take HadCRUT4’s estimates seriously. HadCRUT4, recall, has surface temperature anomaly error estimates in the mid-1800s only two times larger than they are today. I don’t think so. In fact, I think it is an absurd claim.
    ARGO has the same general problems that the surface temperature record has, only much worse. For one thing, it is trying to measure an entire spatiotemporal profile in a volume, so they lose precision to the third dimension compared to the two dimensional (surface!) estimates of HadCRUT4. HadCRUT4 presumably at this point incorporates ARGO for sea surface temperatures (or should). Yet it only asserts a contemporary anomaly precision on the order of 0.15 C. Surely this is on the close order of the error estimate for the ocean in depth, as it is a boundary condition where the other boundary is a more or less fixed 4 C on the vast body of the ocean.
    This absolutely matters because we are really looking at non-equilibrium solutions to the Navier-Stokes equation with variable driving on at least the upper surface and with a number of novel nonlinear terms — in particular the haline density component and the sea ice component and the wind evaporation component (which among other things couples it to a second planetary-scale Navier-Stokes problem — the atmosphere). The Thermohaline Circulation pattern of the ocean — the great conveyor belt — carries heat (really enthalpy) up and down the water column at the same time it moves it great transverse distances at the same time it drives turbulent mixing at the same time the atmosphere and rivers and ice melt are dropping in fresh water and evaporating off fresh water and binding it up in sea surface ice and heating it and cooling it so that the density and hence relative buoyancy varies as it flows around the irregular shapes of the continents, islands and ocean bottom on the rotating non-inertial reference frame surface of the spinning oblate spheroid that gravitationally binds it.
    Just computing the relaxation times of the bulk ocean is a daunting process. If we “suddenly” increased the average temperature of the surface layer of the ocean by 1 C and held it there, how long would it take for this change to equilibrate throughout the bulk ocean? Most estimates I’ve read (which seem reasonable) suggest centuries to over a thousand years. The only reason one can pretend to know ocean bulk temperatures to some high precision is because the bulk of the ocean is within a degree of 4 C and there is a lot of ocean in that bulk. This knowledge is true but useless in talking about the variation in the heat content of the ocean because the uncertainty in the knowledge of the upper boundary condition is as noted order of 0.1-0.2 C. — if you believe HadCRUT4 and the kriging and assumptions used to make the error estimate this small.
    The sad thing about ocean water is that it is a poor conductor, is stratified by density and depth, and is nearly stable in its stratification, so much so that the relaxation time below the thermocline is really really long and involves tiny, tiny changes in temperature as heat transport (pretty much all modes) decreases with the temperature difference. This also makes assumptions about how the temperature varies both laterally and vertically beg the question when it comes to evaluating probable bulk precision.
    rgb

  47. Mike June 7, 2015 at 10:50 pm
    re Willis Eschenbach June 7, 2015 at 12:59 pm
    This map is interesting, but I think you’ll find the ITCZ is the zone, just above the equator, where we see *more* floats. Either side there is a bit less float density. This would suggest that there is some drift towards the ITCZ. There will be surface winds and wind-induced surface currents, as this is where there is most rising air. Air is drawn in on either side and thus deflected westwards by Coriolis forces, causing the warm westward currents either side of the ITCZ.
    KK says : “the buoys, being buoyant, will tend to drift toward higher ocean surface”
    No, a buoy is a massive object and will go to the lowest gravitational potential: a dip, like a ball on uneven ground.
    The sea level is higher along ITCZ but that is due to the same winds and wind driven currents that seem to affect ARGO distribution. You raise a valid and interesting point that had not occurred to me before, just the logic was wrong about the cause.
    I’m sure Karl et al can make suitable correction to ARGO to create some more global warming using this information 😉

    Mike:
    This is more complex than you state. The buoys are massive objects, indeed, but so is a unit volume of water–both are buoyant in the sense that a pressure distribution on the mass maintains its vertical position. This is not so simple as stating that water and other massive things go downhill. We need to consider the lateral pressure gradients, or lateral forces, that are not likely to be in static equilibrium. Think of the warm-core rings that break off the gulf stream as an example.
    There are several forces a person must consider in deciding what the buoys will do. Maintaining an anomalous height (slope) of the ocean surface requires a lateral pressure gradient. What is the origin of this? Let’s look at the floats from a Lagrangian standpoint–a coordinate system that moves with the float. The lateral pressure gradient derives from gradients in density (temperature or salinity), and because the coordinate system is non-inertial there is also potentially a coriolis force and centrifugal force (no arguments, please, about whether or not these are real forces…we are in a non-inertial system and they behave like real forces). The lateral temperature and salinity gradients are not effective at the surface to maintain a slope and so water would slowly flow downhill if not for the dynamic influences. So to maintain a slope at the surface requires coriolis or centrifugal forces that sum to an inward directed force, or a constant compensating flow of water from depth. That is, the water mass must rotate or there has to be an inflow at depth or both. If the floats drift at 1000m depth they can easily drift toward the ocean surface high. If they spend enough time at the surface, they might drift away from the surface high following the likely divergent water flow. What one needs is information about the secondary flows of water involved in maintaining ocean height anomalies. I see the potential for temperature bias in all this, and I cannot get any information about how people analyzed this problem and decided if it is a problem or not.
    I think it is time to begin the arduous task of gathering drift data to see what it looks like statistically.
    Kilty

  48. Mod: I posted a long reply to Willis and Mike at about 9:35 am PDT and it appears to be nowhere; can you check for it, please? I hate to retype the whole thing.

  49. Mod: never mind. It finally refreshed once I sent my previous plea for your help. Thanks.

  50. Crispin in Waterloo but really in Yogyakarta June 8, 2015 at 7:27 am

    Willis I appreciate your expansive response. You have not wandered as far as others, but there are still a couple of course corrections necessary.
    I will respond in two sections. First the good people of San Francisco. Weigh them and look at the numbers. All the weights end in zero, and you realise they are being weighed to the nearest ten pounds. Averaging all the weights will produce a number that is correct to the nearest ten pounds. If the average answer is 155.613 pounds, the rider is that it is a value plus or minus 5 pounds, a 10 pound range. You can weigh a million residents and calculate a 99.99% confident number for the centre of the 10 pound range, but you cannot reduce the range because the original weights were read to the nearest ten pounds.

    Lots of folks believe that, Crispin. Like you, they think the average can’t be any more precise than the underlying measurements. I’ve explained why this is not so—perhaps I can demonstrate it better than explain it.
    I’ll first generate 100,000 random numbers with a mean of 150 pounds and a standard deviation of 20 pounds to represent the weights. I’ll round them to the nearest pound, and display the first ten

    > randoms=round(rnorm(100000,150,20),0)
    > randoms[1:10]
     [1] 114 152 163 130 148 128 129 165 126 158

    Then I’ll take those same random numbers, and round them all to the nearest ten:

    > rounded_randoms=round(randoms,-1)
    > rounded_randoms[1:10]
     [1] 110 150 160 130 150 130 130 160 130 160

    As you can see, these latest figures are what we’d get if our scale only read to the nearest ten pounds.
    Now, let’s compare the averages:

    > mean(randoms)
    [1] 149.9893
    > mean(rounded_randoms)
    [1] 149.9823

    This demonstrates that it is absolutely NOT true that the average can only be as precise as the underlying measurements. Despite one set of measurements being to the nearest pound and the other to the nearest ten pounds, the means are within the expected variation.
    And how much is that? Well, the expected variation is the “standard error of the mean”, or SEM. The SEM is a measure of the interval where the true mean is likely to fall. It is calculated as the standard deviation divided by the square root of the number of data points.
    Using that relationship, the SEM for these two datasets are

    > sd(randoms)/sqrt(length(randoms))
    [1] 0.06332595
    > sd(rounded_randoms)/sqrt(length(rounded_randoms))
    [1] 0.06395239

    As you can see, the difference between the two means is well within the SEMs.
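    (If you want the difference itself, it’s one more line; from the means printed above it comes to about 0.007, an order of magnitude smaller than either SEM.)

    > mean(randoms) - mean(rounded_randoms)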
    So no, Crispin, we’re not limited to the precision of the underlying measurements. The average can be much more precise than any of the measurements.
    w.

    • Agree. See this freely available international standard for formal support of your claim:
      “Guide to expression of uncertainty in measurement”
      http://www.bipm.org/en/publications/guides/
      Section 4.2.3 covers the uncertainty of the average – the “experimental standard deviation of the mean”.
      IPCC failed to notice this international guideline in their guidelines.
      “Guidance Note for Lead Authors of the IPCC Fifth Assessment Report on Consistent Treatment of Uncertainties”
      IPCC can easily be outperformed here.

    • Thank you Willis.
      This is your comparison:
      Average of True weights rounded to 1 pound
      We are 95% confident the average lies between 149.99 and 150.12.
      Average of the same True weights rounded to 10 pounds
      We are 95% confident the average lies between 149.85 and 150.11
      I am familiar with the concepts you outline and sometimes use them in my work. Your true weight values do not have any uncertainty attached. Rounding does not introduce uncertainty, it introduces a perfectly balanced rounding which, done on the same 100,000 random numbers, produces virtually no change in the average nor the standard deviation, which all should notice increased, not decreased, when rounded to 10 pounds. The magnitude of the increase is related to the number of samples, of course. Do it with 3600 numbers.
      Your example uses ‘real’ weights of people and rounds them ‘precisely’. You weighed them all with the same perfect scale, mathematically speaking. That is not a good analogy. What we are discussing is not an idealised calculation, it is a practical experiment with additional considerations.
      Generate a set of random weights as precisely as you like, 10 digits. Apply to each one a random error which is akin to the repeatability value of each scale (this applies as well to RTDs). This will truncate the number of valid digits. You can keep lots of significant digits, but beyond a certain number they carry no information. Store the value of the uncertainty applied because you will need it later.
      Then apply to each weight another random error which is akin to the accuracy of each scale as they drift away from reality over time. This further truncates the number of digits that carry valid information. Store the value of the uncertainty applied because you will need it later.
      Then add another random error which represents whether or not the individual just ate a donut. This is akin to the micro climate of the water body being measured. Getting a person’s True Weight is not really possible because their weight changes all day long, just as does the temperature of any volume of real water. (I just had to get the lab staff to use a much smaller volume of water in order to calibrate the thermocouples against the RTD’s.) Store the value of the uncertainty applied because you will need it later.
      Anyone’s ‘true weight’ has a +- attached. In other words, the numbers you generated as people’s true weights have to be recognised as having their own uncertainties. You compared their average with the 10 pound-rounded-average and indicated they are very similar. The two averages are not similar, they are statistically speaking, functionally identical because the system cannot confidently tell whether or not rounding to 1 pound or 10 pounds makes a difference. Your proof is self-referential. It is a proof, near as dang-it, that the numbers were randomly generated.
      Instead, have 100,000 San Franciscans put on a watch that weighs 0.01 pounds. Weigh them all using 100,000 different scales with and without the watch, first using a scale that is accurate to 10 pounds and then another accurate to 1 pound. (Using a 1 pound scale and rounding to 10 is not the same thing. That’s what you did.) Calculate their average weights with and without the watch. Will you detect the watch? Everyone weighs more, right? There is 1000 pounds of watches in there.
      No, it is an ‘undetectable change’. Enough people will have eaten or not eaten a donut to hide the mass of the watch. Enough scales will have deviated their second readings enough to entirely hide the watch. The 10 pound readings couldn’t detect a purse.
      Put a watch on half the group. Weigh the two groups using the same 100,000 scales and they all change scales this time. Can you tell which group has watches and which does not? No. The variability of each reading is larger than the mass of the watch. If you weighed each person 100,000 times each on their respective scales, you might be able to detect the watch but not with one weighing per person.
      What is the standard deviation of one reading?
      Next have 100,000 different people from the same total SF population put on a hat that weighs 0.1 pounds and repeat the readings using the 1 pound scales. You know the hats are there. Can you tell if the people participating are from the first lot or the second? If not, then the deviation of the first and second average weights from the true average weight are smaller than Limit of Quantification (LoQ). Good so far.
      Can you detect the hat’s mass, or even if it is there at all? No, because the Limit of Detection (LoD) is larger than the mass of the hat. The true measurement of LoD should also include all systematic errors into the calculation of standard deviation. You can prove the average is ‘within bounds’ but you cannot detect the hat no matter how may people you weigh once each.
      We are right back to square one. If the measuring device cannot confidently detect the change, averaging a large number of readings that cannot detect the change will not detect it confidently. Higher precision with a low precision instrument is only obtainable by taking a large number of readings of the same thing with that same instrument, and that still doesn’t make it accurate. Accuracy requires calibration and recalibration. A certified lab spends 20% of their time calibrating systems. How much time is spent calibrating ARGOs?
      Taking a large number of readings (OK, not all that large) of isolated portions of an ocean which are known not to be the same temperature throughout cannot allow one to claim that the average temperature of the whole ocean is known with greater precision and accuracy than the precision and accuracy of the individual measurements of each portion. If it were true, we would not need RTDs, we could just put hundreds of thousands of cheap, uncalibrated thermocouples into the water and average the readings.
      Unlike UAH satellites, these ARGO floats cannot be recalibrated after launch. All we know is that measurement uncertainty starts off above 0.01 C and increases with time. While one can estimate that their condition will drift ‘randomly’ and that ‘on average’ their conditions will be the average of the manufacturer’s curve, they are not measuring the same thing, ever, so it doesn’t help to know that. It is just another increase in uncertainty.
      Uncertainties must propagate through subsequent calculations. No one can confidently detect a 0.005 degree change in ocean temperature with measurements at 3600 sample locations that are individually +- 0.06 Degrees. Moby Dick can swim twice through the uncertainty hole.

    • The average may be very accurate, but the total energy content will not be accurate.
      Because temperature is the Kinetic Energy of the associated material, you will get different results based on how and when the math is done. Example (I use a unit temperature-to-kinetic-energy conversion such that Temp^2*1 = KE):
      Example
      Energy Calculation
      (#1 – 10C 1unit volume + #2 – 2C 1 unit volume)/2 = Avg KE
      KE_1 = 100 KE_2 = 4
      Total KE = 104
      Avg KE = 52
      Sqrt(Avg KE) = 7.21C
      Temperature Calculation
      (T_1 + T_2)/2 = 6C
      So, averages are nice numbers but they do not calculate out when roots, powers, or division are used because these functions are non linear.
      Which, unless I am wrong, you cannot add temperatures together and use the average for energy content because mixing equal amounts of water at 2 different temperatures does not produce the average of the two temperatures.
      You can test this yourself. You need an IR temperature sensor (for quick response). A 2 cup bowl, a 2/3 cup measuring cup, some really cold water, some hot tap water and a microwave.
      Put 2/3 cup of faucet hot water into the bowl and then heat up in the microwave for about a minute. You should have water in the 150F range. Stir the water to make sure you don’t have an upper layer of hot water!
      Measure the temperature of your cold water; if from the refrigerator then it should be on the order of 30F to 40F. Now pour the 2/3 cup of cold water into the hot water and stir it up good. Then measure the temperature of the mixture.
      This mixing and measuring should take less than a minute or you may be cooling off due to natural processes.
      I just did mine again and I got
      Hot water – 152.2F
      Cold water – 38F
      Mixed water – 98.3F
      Average of 152.2F and 38F is 95.1F.
      A difference of 3.2F (about 3.3%).
      If they used the average temperature from various readings to calculate the ocean’s heat content then they are way off!
      And even at that, to prevent bias due to measurement accuracy it seems they would have to calculate each and every point’s high and low energy values, then sum them to get anywhere near an accurate number for the energy content of the ocean at any one time, especially for comparison from year to year. Does anyone know if they did that?

      • Bruce, the average is not very accurate, not in the sense claimed. If I used a 100 pound resolution scale and put a 2 pound dog on it, and measured it 1 billion times, do you think the dog would eventually show up as having a non-zero mass?
        The reason why not is the same reason the cosmic background radiation was not found for so long: they needed far more precise instruments to detect it. Measuring billions of times with a less precise instrument did not detect something below the detection limit. You can generate hundreds of examples. This is different from leaving a photo plate exposed to a distant galaxy and waiting for a long time. That works because the film really can detect the incoming photons, which are very sparse but energetic. Placing a metal plate over the film, rendering the photons ‘undetectable’, will not generate a picture of the galaxy because the film can’t detect them anymore.
        An ARGO float cannot detect a temperature change of 0.005 degrees, at all, let alone reliably. Therefore we cannot know the average temperature of the oceans to that level of precision based on the measurements available.
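        In R terms, assuming an idealised scale that always reports the true weight rounded to the nearest 100 pounds with no reading-to-reading noise, the point is easy to see:

        dog = 2                               # the dog's true weight in pounds
        readings = round(rep(dog, 1e6), -2)   # a million readings on a noiseless 100-pound-resolution scale
        mean(readings)                        # 0: averaging recovers nothing below the resolution

        (The earlier rounding demonstration behaves differently only because the underlying weights there spread across many rounding steps.)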

        • Crispin,
          I’ve been working with surface stations, and to be honest, I’m not sure what is correct or not.
          So, if I may, I’d like to outline what I’m doing and get your thoughts, plus it will be more of a real example.
          Let’s start with a single surface station, NCDC says the temps are +/-0.1F
          I take today’s minimum temp and subtract tomorrow’s to get the difference. In my mind this is today-tomorrow +/-0.2F. Now I take tomorrow’s min, and subtract the day after tomorrow’s from that.
          I now have the difference for the first pair of days and the second pair of days. But the min temp tomorrow can’t be both +0.1F and -0.1F at the same time; it can only be one or the other.
          Are the difference values +/- 0.15F?
          Then if I string 365 days would the daily difference be +/- 0.1F +/- 0.00027F so slightly more than +/-0.1F?
          Should this be rounded to 1 decimal place?
          Okay, second scenario instead of a single station I have 1,000 stations I want to calculate the day to day change for. I take the difference I calculated for the single station, and average that difference for each of the 1,000 stations together as the average day to day change.
          What is the accuracy and precision for the average day to day change?
          How many decimal places should this be?
          Thanks!

      • Crispin
        I realize what you are saying about the accuracy of any measurement by the Argo instruments; I am saying you cannot take 2 temperature readings, average them, and get the average temperature of the (in this case) ocean. The temperatures are not linear functions and therefore cannot be averaged regardless of how accurate each reading is.
        The fundamental physics behind temperature (average Kinetic Energy [Ke = 0.5*m*v^2] ) cannot be determined by adding several temperatures and averaging them. So the whole exercise seems to be a waste of time. You have to average the Ke, not temperature.

    • Willis writes “This demostrates that is absolutely NOT true that the average can only be as precise as the underlying measurements.”
      Crispin said it in detail but for those who don’t read his post, Willis has made the mistake of applying a symmetrical “correction” across the data, and that won’t impact the average much. His example doesn’t demonstrate his argument. Sorry Willis.

      • TheToolMan
        The key ingredient in Willis’ example is that he used a random set of numbers with a known average, and used the same numbers rounded to the nearest 10. That is not the same as generating numbers to the nearest 10 and another set of numbers to the nearest 1. Using the ‘1’ numbers and rounding the last digit has the predictable effect of not moving the average much because the ’10’ numbers are in fact the same as the ‘1’ numbers.
        With measurements made once with unique instruments in an environment you know is different each time, there is no gain on the precision of the average if one takes additional, unique measurements of different things with additional instruments. The ‘rules’ of making lots of measurement are reserved for multiple measurements of the same thing made using the same instruments.
        That is why the 0.005 change in ocean temps is called ‘false precision’. No single instrument can detect such a change – it is literally lost in the noise. Only taking many readings of the same place, probably at the same time, can detect the signal in the noise.
        It relates to Willis’ swimming pool. We know for sure there are regions of the pool that are not homogeneous. The temperature varies all over. Therefore all readings are unique and taken once. To get a ‘more precise value’ from the instruments used, the swimming pool has to be stirred so it is the same temperature everywhere, in which case there is no need to spread the devices around – they can all be in one place because the temperature is the same everywhere.
        Obviously that is never the case, and it is the same in the oceans. Each measurement stands alone with its set of uncertainties. The standard deviation of a single measurement is “Does Not Apply”. There is no CoV for 1 measurement. A measurement taken 5 minutes later is in different conditions and a different answer is expected. Additional precision comes from measurements that are expected to be the same.
        The implications of this are huge for the outrageous precision claimed for land and sea temperatures. To say we are measuring ‘1 atmosphere’ is not gonna carry the day. The atmosphere is inhomogeneous. All readings of its temperature stand alone in terms of precision, unless you have a way to make multiple measurements at each site.
        Suppose you put 1000 thermometers in a Stevenson screen, each one giving 0.5 deg accuracy, i.e. readable by eye in 0.5 degree steps. Assume competence in the readers. You calibrate them and record temps. There you have 1000 readings of the same thing. This approach is taken at CERN and in XRF analysis, where fantastic precision can be obtained by taking a large number of readings and patiently recording them for hours – with the same instrument. Using multiple instruments introduces some uncertainty but the precision will be much better than any individual thermometer, and must be reported with an uncertainty. One can even say it is good to 0.01 with A confidence, 0.1 with B confidence and 0.5 with C confidence. 2014 was the hottest year evah with 3/8ths confidence, out of 1.0. I am surprised he had the guts to admit it after making such a silly claim. False precision, false confidence, false conclusion IMV.
        That example is completely different from taking one reading from each of 1000 locations each of which is expected to be different, with 1000 different thermocouples, even if they were calibrated once-upon-a-time.

      • Crispin writes “That is not the same as generating numbers to the nearest 10 and another set of numbers to the nearest 1.”
        Absolutely. Willis is concentrating on the error of the measurement itself and not on the error inherent in the method.

  51. Willis,
    I still think you need an education in elementary statistical process control.
    Climate science though is special. There are special rules for it.
    Sorry to see you catching the disease.

    • I disagree.
      And I also have an issue with your argument. It is not a decent argument. It is an Ad hominem argument.
      The uncertainty of the average will be reduced by 1/ (square root of the number of measurements)
      if the individual observations differ in value because of random variations in the influence quantities, or random effects. Averaging will not reduce the error caused by systematic effects in measurement or sampling.
      See:
      “Guide to expression of uncertainty in measurement”
      http://www.bipm.org/en/publications/guides/
      Section 4.2.3 covers the uncertainty of the average – the “experimental standard deviation of the mean”.
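      A minimal R sketch of that distinction, with purely illustrative numbers:

      set.seed(1)
      n = 100000
      true_value = 10
      bias = 0.5                                      # a systematic offset shared by every observation
      readings = true_value + bias + rnorm(n, 0, 2)   # random effects with standard deviation 2
      sd(readings) / sqrt(n)                          # the random part of the uncertainty, shrinking as 1/sqrt(n)
      mean(readings) - true_value                     # stays near the 0.5 bias no matter how large n gets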

      • “Averaging will not reduce the error caused by systematic effects in measurement or sampling.”
        Such as too slow a rate of sampling resulting in the average moving before enough data is acquired to calculate it, for example?

  52. Being new to this, does anyone know how many papers are based on temperature rather than energy content/absorption in this whole warming/climate debate? Such as, are the models based on linear temperature changes with energy or are they based on a squared/square root functions of some sort?

  53. “As you might imagine, in the stormiest areas the largest waves mix the ocean to the greatest depths, which are shown in green and blue. You can also see the mark of the El Nino/La Nina along the Equator off the coast of Ecuador.”
    Willis, I disagree. I believe that what you are seeing is upwelling and downwelling areas, or if you prefer, areas of deep water ventilation and areas of deep water formation.
    In areas of upwelling the mixed layer is zero, and in areas of downwelling, it is very deep. Upwelling occurs as a result of Ekman transport along the eastern continental edges and the trade wind ITCZ edges of the Hadley gyres. You can even see delineations of very shallow mixed layer along both tropics, the mean poleward edge of the Hadley/Ferrel analog of the ITCZ.
    For reasons not yet clear (but for an ice covered continent) the very same wind shear seems to produce downwelling along the poleward edges of the Ferrel cells. Downwelling is unimpeded on the Southern Hemisphere but the only vestige in the continent clogged Northern Hemisphere is near Greenland (notably the best NH approximation of an ice covered continent).

  54. Crispin in Waterloo but really in Yogyakarta June 9, 2015 at 9:30 am

    TheToolMan
    The key ingredient in Willis’ example is that he used a random set of numbers with a known average, and used the same numbers rounded to the nearest 10. That is not the same as generating numbers to the nearest 10 and another set of numbers to the nearest 1. Using the ‘1’ numbers and rounding the last digit has the predictable effect of not moving the average much because the ’10’ numbers are in fact the same as the ‘1’ numbers.

    Egads, didn’t I suggest that you try out your claims on the computer first? If not, let me do so. These questions can be solved using an Excel spreadsheet, or as I do in R. It makes little difference whether we use new random numbers. Here’s the new calcs …

    > # First I take 100,000 random values
    > randoms=round(rnorm(100000,150,20),0)
    > randoms[1:10]
     [1] 192 123 144 121 162 125 191 163 157 110
    >
    > # Then I’ll take a new set of random numbers, and round them all to the nearest ten;
    >
    > rounded_randoms=round(rnorm(100000,150,20),-1)
    > rounded_randoms[1:10]
     [1] 160 140 150 140 130 150 150 130 120 160
    >
    > # As you can see, these latest figures are what we’d get if our scale only read to the nearest ten pounds.
    >
    > # Now, let’s compare the averages:
    >
    > mean(randoms)
    [1] 150.059
    > mean(rounded_randoms)
    [1] 150.0425

    As you can see, despite using a new set of random numbers, the two averages agree to within about 0.02 … so I fear your explanation is simply wrong.
    Sorry,
    w.

    • Willis I appreciate your demonstration. You are not grasping the fundamental problem of trying to report something that cannot be detected by the instrument. It is not a matter of knowing how to run a program.
      If you measure to 5 significant digits once, how much confidence can you have in the numerical value of a 6th significant digit? None whatsoever. Why? Because you only have one reading to use to estimate (guess) it. The standard deviation of one reading is 0.000. CoV of one reading is 0.000. We have 100% confidence that the one reading is the one reading. We can have no confidence in another significant digit because it was not measured. Karl et al (and many others) claim the ARGOs did. WUWT??
      Did you follow the example of the extra mass on 50,000 people and another 50,000 people without it? If the scales they are using are only allowed to measure the mass once, the mass is undetectable. Full stop.
      Us being right or wrong about averages of large numbers of random numbers is not relevant to the measurement problem. If you used 10,000,000 random numbers the result would have been closer. Why?
      You did not add, as I suggested to make the demonstration relevant, variability (un-confidence) to the numbers, right? All your numbers have no error. You set the final target to be 150.000000000 unless you used double precision.
      randoms=round(rnorm(100000,150,20),0)
      Why then is there any surprise that the final answer is about 150? You guaranteed the answer would be close. We don’t know the actual average temperature of the ocean, that is why we are measuring it.
      Did you catch my point about your swimming pool? Taking multiple measurements of the pool is not going to give you ‘better precision’ of its average temperature because the water temperature is different in each position. It is not multiple readings ‘of the same thing’. Each reading has a precision. You can’t measure in each place once to two significant digits and get an average answer with three significant digits, or four. That’s high school lesson material. By implication, you are claiming it is possible, in concert with Messrs Karl et al.
      The measurement problem is that each reading has an inherent variability comprised of multiple factors and the total uncertainty of every reading is larger than the claimed trend in ocean temperature. That claim is not supportable by mathematical manipulation. The data needed to make, at a higher precision, a claim as to where the centre of the error bars are, is simply not there. We do not have 100,000 readings or even 30 from each position of each instrument. We have only one, and each has its little imprecision bars to go with it. Such uncertainty propagates.
      Having 100,000 readings from a temperature-inhomogeneous ocean is not the equivalent of 100,000 readings of a temperature-homogeneous ocean. The CAGW edifice rests on such fundamental conceptual errors (and models without skill, of course). Unique location measurements of air temperature to within half a degree carry their error bands with them through all subsequent calculations. This is standard ‘propagation of errors’ stuff. My life would be a lot easier if errors disappeared instead of propagating! They are fecund little buggers.
      There is no data set with a known standard deviation available for each ARGO data point. The entire business of ‘calculating things to a higher level of precision’ than is available from the raw data is not even smoke and mirrors. There is no mirror. There is no smoke. The claim to be able to confidently report the temperature of any ocean to within 0.005 degrees C is just wrong by slightly more than a numerical, figurative and logical order of magnitude.

  55. Crispin in Waterloo but really in Yogyakarta June 9, 2015 at 10:40 pm

    Having 100,000 readings from a temperature-inhomogeneous ocean is not the equivalent of 100,000 readings of a temperature-homogeneous ocean.

    Of course it’s not, and everyone working in the field knows it. The difference between them and you seems to be that they understand that if what is measured is inhomogeneous, the standard deviation is much greater, and that widens the uncertainty. They also understand that everything is somewhat inhomogeneous, and that the statistics reflect that.
    I don’t understand your problem with averaging inhomogeneous things. Suppose I have a lump of something inhomogeneous, where different parts have different densities. Does it have an average density despite being inhomogeneous? Of course it does.
    Now, which will give us a better estimate of the average density—taking one sample from one location, or taking many samples from different locations?
    That one is obvious as well … which makes it clear that despite the inhomogeneity more measurements give us a better estimate of the overall average density.
    Now, of course the less difference there is in different parts of the substance, the more accurate our estimate will be for a given number of measurements. So as you say, “Having 100,000 readings from a temperature-inhomogeneous ocean is not the equivalent of 100,000 readings of a temperature-homogeneous ocean.”
    But the only difference is that one has a very wide standard deviation, and one has a narrow standard deviation. And since the standard error of the mean is linearly proportional to the standard deviation, this means that the more inhomogeneous a substance is, the more uncertain our estimate of the average will be.
    As I said, however, this is all taken up by the relationship between the inhomogeneity, the resultant widening of the standard deviation of the measurements, and the uncertainty of the answer.
    Statistics is specifically designed to deal with just the type of inhomogeneities that you discuss, Crispin. They give actual numbers that bound the uncertainty.
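    Here’s a quick R illustration of that last point, with made-up numbers:

    set.seed(1)
    n = 10000
    narrow = rnorm(n, 4, 0.1)    # a nearly homogeneous "ocean", standard deviation 0.1
    wide   = rnorm(n, 4, 5.0)    # a wildly inhomogeneous "ocean", standard deviation 5
    sd(narrow) / sqrt(n)         # standard error of the mean, around 0.001
    sd(wide) / sqrt(n)           # around 0.05, fifty times larger for the same number of measurements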

    The entire business of ‘calculating things to a higher level of precision’ than is available from the raw data is not even smoke and mirrors. There is no mirror. There is no smoke.

    Let me ask you this. I have four measurements, each of which has an inherent uncertainty of ± 2 units. In your terms, 2 units is the “level of precision that is available from the raw data.”
    Let’s say that the average value of the four measurements is 11.6 units … my question is, what is the uncertainty of that average value? Please show your calculations.
    w.
    PS—Let me give you my answer for comparison, along with my calculations. The uncertainty of the average is ± 1 unit. This is because when we add things with associated uncertainty, the uncertainty adds “in quadrature”. This means that if we have four measurements with an uncertainty of 2, the total uncertainty of the sum is
    sqrt( 2^2 + 2^2 + 2^2 + 2^2 ) = 4
    And just as the average of four numbers is the total divided by four, the uncertainty of that average is the uncertainty of the sum divided by four.
    Note that the uncertainty of the average is less than the uncertainty of the raw data.
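    And for anyone who’d rather check that by simulation than by algebra, here’s a rough R sketch, treating the ± 2 as one standard deviation:

    set.seed(1)
    trial_means = replicate(100000, mean(11.6 + rnorm(4, 0, 2)))   # many sets of four measurements, each uncertain by 2
    sd(trial_means)                                                # comes out close to 1, as the quadrature calculation says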

  56. Willis writes ” I have four measurements, each of which has an inherent uncertainty of ± 2 units. ”
    In the case of Argo, the uncertainty is not known and almost certainly changes considerably by location and time of day and year.

    • Tim, you guys keep tossing out objections as though they should stop anyone from using the Argo data at any time for estimating the ocean temperature. But instead, all they do is to increase the uncertainty.
      And yes, we do know the uncertainty of the Argo measurements. As you point out, they change considerably by location and time … so what?
      Seriously, so what? Statistics deals with that situation all the time. All that increased fluctuation does is increase the uncertainty of the estimate by increasing the standard deviation of the measurements. It doesn’t mean we should all throw our hands up and go home as you guys seem to be recommending.
      Seriously?
      Just because the quantity is fluctuating you want to say it’s all too hard?
      Just because you have to measure the ocean temperatures over the span of a year to be able to give us an annual average temperature, that’s too difficult for you?
      This is precisely why statistics was invented, Tim … so we could know how accurate our measurements need to be when measuring something whose values fluctuate over the year.
      The other mystery to me is why y’all seem to think that I believe the Argo data is accurate enough for determining annual heat content changes … when I’ve shown it is not accurate enough, not by waving my hands, but by actually doing the math.
      w.

      • First Willis, I am sure that Tim and I agree that your demonstrations of math and stats are correct, but they are partial. You have demonstrated certain statistical techniques. But there remains an insurmountable challenge: you cannot use statistical techniques to correct a conceptual error.
        Karl et al is ultimately claiming that something below the level of detection can be detected by clever math. It is not a matter of working out from multiple measurements of different things how to find it. There are severe limits placed on that approach by the nature of instrumental readings.
        “That one is obvious as well … which makes it clear that despite the inhomogeneity more measurements give us a better estimate of the overall average density.”
        I have been searching around the Net for quotes that are relevant to this subject. Here is a suitable one from 2003:
        From: http://en.wikipedia.org/wiki/Experimental_uncertainty_analysis (well down the page)
        My bold, my italics. The bold indicates the point you make, the italics indicates the point I am about to make:
        ======
        Sample size
        What is missing here, and has been deliberately avoided in all the prior material, is the effect of the sample size on these calculations. The number of measurements n has not appeared in any equation so far. Implicitly, all the analysis has been for the Method 2 approach, taking one measurement (e.g., of T) at a time, and processing it through Eq(2) to obtain an estimate of g.
        To use the various equations developed above, values are needed for the mean and variance of the several parameters that appear in those equations. In practical experiments, these values will be estimated from observed data, i.e., measurements. These measurements are averaged to produce the estimated mean values to use in the equations, e.g., for evaluation of the partial derivatives. Thus, the variance of interest is the variance of the mean, not of the population, and so, for example,
        [gives examples]
        which reflects the fact that, as the number of measurements of T increases, the variance of the mean value of T would decrease. There is some inherent variability in the T measurements, and that is assumed to remain constant, but the variability of the average T will decrease as n increases. Assuming no covariance amongst the parameters (measurements), the expansion of Eq(13) or (15) can be re-stated as
        [Formula]
        where the subscript on n reflects the fact that different numbers of measurements might be done on the several variables (e.g., 3 for L, 10 for T, 5 for θ, etc.)
        This dependence of the overall variance on the number of measurements implies that a component of statistical experimental design would be to define these sample sizes to keep the overall relative error (precision) within some reasonable bounds.
        ===========
        The number of measurements is 1. Each unique ‘experiment’ consists of a single measurement made at a certain place in 3D space and time. There are no ‘multiple measurements’. Note the point the author makes that there is an inherent variability in the T measurements. That inherent variability is dealt with statistically by making multiple measurements of the same thing with the same instrument. We never have that for a land or sea temperature data set. Every measurement is unique and it represents an experiment performed once.
        Last paragraph: There is an absolute requirement that in order to constrain the increase in the uncertainty caused by the inherent variability of all instruments and the rising number of readings, multiple measurements must be made of each data point, with a statistical design method that keeps the overall error ‘within reasonable bounds’.
        ARGO floats do not, as a group, keep the average of all readings ‘within reasonable bounds’. Why? Because they are not measuring the same thing. There are no multiple measurements. The designers of the experiment know full well they are, as Monckton has said, measuring bodies of water on average as large as the volume of Lake Superior, and they have to be treated as independent bodies.
        A good analogy is cups of coffee. Put 1000 cups of coffee on 1000 tables in 1000 restaurants in San Francisco. Using 1000 thermocouples accurate to 0.1 degrees C, measure the temperature of the coffee. Average the results. Can the average temperature be known to within 0.01 degrees? No. It cannot. Measuring the temperature of 2000 cups in 2000 restaurants with 2000 instruments will not reduce the uncertainty by half. It is the same or worse than measuring one cup once.
        The math you propose is only valid for multiple measurements of one cup of coffee with one instrument. Even using 100 instruments to take 1 reading each of one cup of coffee is to invalidate the statistical claim to have increased the precision. Just as there is an inherent variability in the taking of each measurement with a single instrument, there is an inherent variability between instruments, and further, they may not be well calibrated against each other.
        This whole air temperature and ocean temperature measurement to 0.001 degrees is so much statistical BS. Correctly described, each claim to ‘remarkable precision’ has to be accompanied by a confidence level.
        There is a certain level of confidence in different levels of precision, which is to say,
        (these numbers are illustrative; a rough numerical sketch of the trade-off follows the list)
        Confidence that the average temperature is 30 degrees = 100%
        Confidence that the average temperature is 30.0 degrees = 95%
        Confidence that the average temperature is 30.00 degrees = 40%
        Confidence that the average temperature is 30.000 degrees = <<1%
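        As a rough numerical sketch of that trade-off: assume, purely for illustration, that the estimated average carries a normal error with a standard deviation of 0.25 degrees (an invented figure, so the percentages below will not match the illustrative ones above, but the shape of the fall-off is the point):

```python
# Sketch of the precision-vs-confidence trade-off: probability that the true
# average lies within a given half-width of the estimate, assuming (purely for
# illustration) a normal error with a standard deviation of 0.25 degrees.
from math import erf, sqrt

sigma = 0.25  # assumed standard error of the estimated average, in degrees

def confidence_within(half_width, sigma):
    """P(|estimate - truth| < half_width) for a zero-mean normal error of SD sigma."""
    return erf(half_width / (sigma * sqrt(2.0)))

# half-widths corresponding to claims like "30", "30.0", "30.00", "30.000"
for half_width in (0.5, 0.05, 0.005, 0.0005):
    print(f"claimed to within ±{half_width}: "
          f"confidence ≈ {100 * confidence_within(half_width, sigma):.1f}%")
```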
        We have just been lampooning the claim that 2014 was the hottest year evah, at 38% confidence, because there is 62% confidence that it was not. If Karl et al claims a change with 0.1% confidence, there is 99.9% confidence that it was not detected. In other words, that it was a different number, and further, there is no way to know whether it is higher or lower. The confidence we can have in the value relates to the quality of the inputs, which in this case rules out the use of the technique you have proposed.
        The shape of the 'curve' of the confidence numbers is dictated by the uncertainty of the original measurements and the other confounding factors: area and depth weighting, vertical and horizontal position and so on. Each uncertainty adds to the standard deviation, reducing the confidence with which one can claim to have detected a very small change in the average.
        You have raised the flag of "we can't tell anything from ARGO measurements". That is not what anyone is saying. We can tell lots, but we cannot tell if the oceans (plural) have warmed by 0.005 degrees with any meaningful confidence.
        So Karl et al should have included a number that reflects how confident we can be that a change of 0.005 degrees has been detected. My confidence in his number is very close to zero. I am not saying that 0.005 is the ‘value’ he detected, it is the claim that the change is ‘known’ with that level of precision, with, say, 95% or 68% or some other meaningful level of confidence. The instruments x distribution x inherent variability x an inhomogeneous ocean simply cannot support such a claim.

  57. Crispin in Waterloo but really in Seoul June 11, 2015 at 4:57 pm

    A good analogy is cups of coffee. Put 1000 cups of coffee on 1000 tables in 1000 restaurants in San Francisco. Using 1000 thermocouples accurate to 0.1 degrees C, measure the temperature of the coffee. Average the results. Can the average temperature be known to within 0.01 degrees? No. It cannot. Measuring the temperature of 2000 cups in 2000 restaurants with 2000 instruments will not reduce the uncertainty by half. It is the same or worse than measuring one cup once.

    Sorry, amigo, but that’s simply not true. If I know your weight with an uncertainty of 2 pounds, and I know my weight and two other people’s weight with an uncertainty of 2 pounds, then I know the average weight of the four of us to an uncertainty of 1 pound. You still don’t seem to understand that it doesn’t matter what is measured. The mathematical laws are inexorable, they make no adjustments as to what is being measured, where it is measured, or what instruments are used.
    Now, I’ve been quite clear about the math I’ve used. It is the standard math for uncertainties. They add in quadrature, and when you divide a sum with an uncertainty by a precise number (like for an average), you divide the uncertainty by the same number as well. That’s what I’ve done above with the four weights each with a 2 pound uncertainty. Added in quadrature the uncertainties add up to 4 pounds, and divided by four (people) that’s an uncertainty of 1 pound for the average … which is less than the uncertainty of any of the individual weights. So that’s my math, which you have found no fault with except to say (with no reason) that it’s the wrong math to use.
    And I’ve asked you in the past for your math, as to exactly how you would calculate the uncertainty of that same situation … and you said nothing, you replied nothing, you gave me no math of any kind. You just airily wave your hands and tell me the uncertainty is “the same or worse” … really? You can’t even tell us if the uncertainty will be the same or if it will be worse in your own damn example, and you want to pass yourself off as knowledgeable? Get real!

    The math you propose is only valid for multiple measurements of one cup of coffee with one instrument.

    Hogwash. I don’t “propose” any math. I’m giving you the standard mathematical formula for the calculation of the uncertainty of averages. Averages are valid for just about anything, including coffee. There is no mystery or debate about how to calculate the uncertainty of an average, so there is nothing to “propose”.

    You have raised the flag of “we can’t tell anything from ARGO measurements”. That is not what anyone is saying.

    You’re damn right about that because that is not what I am saying either, I’m quite sure I never said it. DO NOT TRY TO CRAM WORDS INTO MY MOUTH!! I asked people to quote me for a reason, and I’m damn tired of being ignored in that regard. But you putting quotes around something and pretending I actually said it? Sorry, I don’t have any truck with people who pull that kind of slimy trickery. Here’s what I really said:

    I said several times that the [Argo] system is undersampled. All I’ve done above is to show that the uncertainty is about twenty times what they have claimed, and that as a result we cannot tell if the ocean heat content is increasing as claimed.
    However, that doesn’t make the [Argo] measurements useless. As usual, that just means that you need to aggregate them over either a larger time or a larger space.

    So not only did I not say what you falsely claim, I SAID THE EXACT OPPOSITE!
    When you figure out how to quote someone’s actual words, and you decide not to provide false lying “quotes” about what I’m supposed to have said but in fact said the opposite, we can have another discussion. For now, your arrogance is getting in the way of your mathematical ignorance, which in turn is getting in the way of your understanding something as simple as how to average data with uncertainties, so I’m just going to get out of the way entirely.
    I’m sorry to be so harsh, Crispin, but pretending I said something I didn’t say is an action I simply will not put up with.
    w.

  58. Willis, found this in the Univ. of Colo. Tide Gauge Sea Level page.
    Do you know anything about their database? I am looking for new ways of looking at the diurnal cycles other than sea level pressure. I don’t want to waste my time if it has the same flaws as the pressure stations. I am sure it is already corrupted, but maybe it gives insight on their conclusions.
    “Major conclusions from tide gauge data have been that global sea level has risen approximately 10-25 cm during the past century.”

  59. Sorry Willis about anything to do with words. I am really tired and traveling thousands of miles and can’t remember to delete everything you will react to. What is important is the methods of determining precision and accuracy. I am just about impossible to offend so don’t worry about harshness. I had a management advisor who was much worse than you ever will be.
    I have consulted several more people on this and I am sorry to say that I have been unable to get you to view the problem as it is, instead of as you wish it to be. Here is a quote:
    “Sorry, amigo, but that’s simply not true. If I know your weight with an uncertainty of 2 pounds, and I know my weight and two other people’s weight with an uncertainty of 2 pounds, then I know the average weight of the four of us to an uncertainty of 1 pound. ”
    You have once again repeated the error of making the measurement with the same instrument. Further, you say, “I know your weight.” There is no uncertainty in that statement. But real measurements have uncertainty. The formula only applies when there is no uncertainty about “my weight”. Weigh everyone once using the same scale which has a resolution of 4 pounds. It yields numbers ± 2 pounds.
    You do not really know my weight, you know my weight within a 4 pound range centered on the indicated value, say 150 pounds (which is not my real weight). You cannot know with greater certainty what my true weight is nor can you make a better estimate of the true position of the centre of the 4 pound range because you only have one measurement of my weight. This uncertainty propagates.
    You also do not know your own weight, save that it is within a 4 pound range centered on an indicated value, say, also 150 pounds. Our total weight will be the sum of the indicated values plus or minus the sum of the uncertainties of each. The answer is 300 pounds with an uncertainty of ± (2+2 = 4) pounds. Our true combined weight could be as low as 296 or as high as 304. We do not know. Adding another two identically sized people weighed with the same precision would give a total weight of 600 pounds plus or minus 8 pounds. The average weight of the four of us is 150 pounds plus or minus 2 pounds.
    I will apply the quadrature formula to the one reading we have for each person:
    Quadrature applied to a single measurement: Sqrt(2^2) = 2 which is the same as before.
    No matter how many single measurements we make the uncertainty of their average is not reduced. To reduce uncertainty we have to have more than one measurement of each person’s weight.
    Suppose we weighed each person 4 times. This will more accurately place the centre of the range of uncertainty. The indicated (average) value will probably move up or down and the uncertainty range is reduced by half. The uncertainty about each person’s weight will be reduced to 1 pound because we have four measurements to rely on instead of only 1. The uncertainty of the average of all 4 of us will still be 1 pound even though we made 16 measurements total. In order to reduce the uncertainty of the average you would have to weigh each of us a larger number of times.
    Similarly the uncertainty of the average weight of 8 or 16 people is not reduced just because you have included more people. To achieve that you have to take more measurements of each person. The reduction in uncertainty of each person’s weight is limited by the number of measurements of each person, in quadrature as you indicated.
    Do you agree?
    Now consider the same measurements made with 4 different scales, one for each person. This introduces an additional uncertainty related to the uncertainty of the readings, i.e. is it biased? Was it calibrated correctly? Is the response linear with mass change?
    You have to consider the accuracy and drift of different instruments in the equation that calculates the uncertainty. The net is filled with examples, but I was unable to find the exact formula for all the readings being taken once each by a diversity of instruments. People keep writing about how many repeat measurements they must make of the same ‘thing’ (a specific point in the ocean) with the same apparatus (the surface temperature buoy). We do not have the luxury of multiple measurements, nor of using the same instrument everywhere.
    Try sending a device for testing to each of four different labs: four samples from a manufacturing run, four labs with four sets of people, and four sets of lab instruments. You will get four different results. Averaging the results is not more precise than any of the individual results. In many cases it will be less precise than the best individual result. In some cases you actually lose not just certainty but significant digits.
    “You still don’t seem to understand that it doesn’t matter what is measured.”
    You do not seem to understand that there is a fundamental difference between measuring the diameter of one penny 1000 times and measuring the diameter of 1000 pennies once each.
    Say the uncertainty is 0.1 mm per reading. The uncertainty of the first case is Sqrt(0.1^2*1000). The uncertainty of each of the 1000 measurements is Sqrt(0.1^2).
    Your statement implies that we can know the average diameter of all 1000 pennies measured once each just as precisely as we will know the exact diameter of one penny measured 1000 times. The 1000 pennies are not all the same diameter – there is a variability present there which one penny does not have. That variability has to be carried into the precision and the uncertainty of the average if the 1000 pennies are only measured once each.
    When 1000 pennies are measured with 1000 different instruments that have not been calibrated in three years, another type of uncertainty is introduced which considers whether or not the readings are accurate, and the fact that different instruments may have different levels of inherent variability.
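    Here is a small numerical sketch of that distinction, with invented numbers (0.1 mm measurement noise, a 0.05 mm penny-to-penny spread); it is an illustration of the two situations, not a claim about real pennies:

```python
# Sketch with invented numbers: one penny measured 1000 times vs 1000 different
# pennies measured once each. Measurement noise is 0.1 mm (1-sigma); the real
# penny-to-penny diameter spread is taken as 0.05 mm. Both figures are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
noise_sd, penny_sd, true_mean = 0.1, 0.05, 19.05   # mm

# Case A: one penny, 1000 repeated measurements with the same instrument
one_penny_readings = true_mean + rng.normal(0, noise_sd, n)
print(f"A: SD of readings = {one_penny_readings.std():.4f}  "
      f"SEM of the mean = {one_penny_readings.std() / np.sqrt(n):.4f}")

# Case B: 1000 different pennies, one measurement each
penny_diameters = rng.normal(true_mean, penny_sd, n)        # real diameters differ
readings = penny_diameters + rng.normal(0, noise_sd, n)
print(f"B: SD of readings = {readings.std():.4f}  "
      f"SEM of the mean = {readings.std() / np.sqrt(n):.4f}")

# In case B the readings scatter more (penny spread plus noise), so the
# uncertainty of the average is larger, and that average refers to the
# population of pennies rather than to the diameter of any one penny.
```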

  60. Crispin in Waterloo June 12, 2015 at 2:11 pm

    Sorry Willis about anything to do with words. I am really tired and traveling thousands of miles and can’t remember to delete everything you will react to.

    Crispin, it’s not about remembering not to do things I will assuredly object to.
    It’s about NOT LYING ABOUT WHAT ANOTHER MAN SAID. It has nothing to do with me. It’s about you lying. I’m not even in the picture frame.
    Until you can actually show some sign that you understand and are dealing with your lie and the damage that kind of lie causes, I’m not interested in the slightest in discussing anything with you. I have absolutely no desire to leave myself open for that kind of deceptive underhanded attack. Maybe your friends put up with you passing off false “quotations” as though your friends said them.
    I don’t.
    w.

  61. Let me take another shot at this. Here are some questions. Please include your calculations.
    FIRST QUESTION
    We have four numbers. Each has an associated uncertainty. Let’s say that they all have the same uncertainty of 2 units.
    What is the uncertainty of the average of the four numbers?
    SECOND QUESTION
    We want to know if our four cats are gaining weight. We weigh each of them on a scale with an uncertainty of 2 units.
    What is the uncertainty of the average of the four numbers?
    THIRD QUESTION
    Every day I go to a different coffee shop and weigh my coffee. My scale is good to ± 2 ounces. After I do this for four days, I take the average so I can find out how much coffee I’m drinking daily. What is the uncertainty of the average?
    The part that people seem to have trouble with is that the uncertainty of an average is LESS than the average of the uncertainties. Average uncertainty in each of the above cases is 2 units … but the actual uncertainty of the average is only 1 unit.
    The magic of averages is used every day by polling companies. They know that you don’t have to measure the opinion of everyone in the US to determine public sentiment within some specified uncertainty. And they know, depending on the level of uncertainty desired, how many people they have to ask.
    Similarly, we don’t have to weigh every cup of coffee in San Francisco in order to get an accurate view of the average weight of a cup of coffee in SF. All we need to do is to take a representative sample.
    So the question then becomes, how big a sample does it take to be representative? If you don’t ask enough people, or if you don’t ask the right people, you get sampling error.
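    For a rough sense of the numbers, here is a sketch of the textbook sample-size rule for a simple random sample, n ≈ (z·σ / E)², with an invented spread for the individual readings (this is only the idealized, independent-and-unbiased case):

```python
# Sketch: sample size needed so that a 95% margin of error on the mean is E,
# for a simple random sample from a population with standard deviation sigma.
# Both sigma and the margins are invented numbers for illustration.
import math

def required_n(sigma, margin, z=1.96):
    """Smallest n such that z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)

sigma = 1.0   # assumed spread of individual readings, in degrees
for margin in (0.1, 0.01, 0.005):
    print(f"to pin the average to ±{margin} at 95%: n ≈ {required_n(sigma, margin)}")
```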
    And this sampling error was the point of my whole post. It is the first time I’ve seen an estimate of the SAMPLING ERROR of the Argo data.
    Now, knowing the sampling error, I used that to figure out what the corresponding sampling error for the ocean would be. Assuming we have the same sampling error for the globe as for the NA (not true but close enough), I gave an estimate of the BEST CASE of the Argo uncertainty for the globe.
    It turns out to be about twenty times the claimed error, meaning that no, we do NOT know if the oceans are warming. My conclusion was immediately attacked … but strangely, not by warmists. No, by people who think that statistics were never invented, or by people who think that uncertainty makes data useless.
    Note, however, that the Argo data is far from useless. That’s the beauty of statistics. It lets us know what we have enough data to do, and what we don’t have enough data to do.
    Of course you could always wave your hands and say something like …

    … 3 set of readings per month per buoy dont really tell us anything about the ocean because the variation (and error) due to the sparseness of the readings makes monthly “data” worthless. Misleading even.

    Me, I use statistics instead so that I know whether the data is “worthless” or not. I pointed out a graph of mine above wherein the Argo data was extremely useful … so obviously, waving your hands and claiming it’s all too uncertain isn’t the most successful strategy. Because the odds are, statistics beats handwaving, with very little uncertainty.
    Here’s the chart again, and if you think it is “worthless”, please point out where and why:
    http://wattsupwiththat.files.wordpress.com/2012/02/argo-surface-temperatures-n-hemisphere-160-180e-0-45n.jpg
    As you can see, there’s plenty we can learn from the Argo data, as long as we don’t just throw up our hands and say that the Argo data doesn’t “really tell us anything about the ocean ..!” …
    Argo doesn’t tell us anything about the ocean? I don’t think so …
    w.
    [2. In the second question, do not the cats need to be weighed (at least) twice? .mod]

    • Willis: “Argo doesn’t tell us anything about the ocean? I don’t think so …”
      I am glad I never said anything like that. ARGO tells us a lot.
      I am saying that Karl et al didn’t find anything. They cannot find an unquantifiable quantity smaller than the limit of detection with the instruments available.

    • Willis: “Similarly, we don’t have to weigh every cup of coffee in San Francisco in order to get an accurate view of the average weight of a cup of coffee in SF. All we need to do is to take a representative sample.”
      Working with a sample increases uncertainty. That is why the number in the denominator is N-1.
      Taking a representative sample of all cups of coffee, one measurement each, allows one to calculate an average which represents the centre of the values recorded. None of the readings are necessarily correct and this applies equally to the average of them. All may be high. Distribution may not be normal.
      The accuracy of the average is not affected by the number of readings; it is inherent in the instrument, which you gave as +/- 2 units. The true average value may not lie close to the calculated average of all readings. The only assurance from the manufacturer is that it lies within +/- two units of the average of the measurements. This is fundamentally different from increasing the accuracy. If the scale was mis-calibrated, all of the readings will be off.
      So what’s the lesson here? All calculated averages are constructs, and the result is no better than the accuracy of the readings. Increasing the confidence of where the middle is doesn’t reduce the range, which remains at the accuracy of the instrument. To get a ‘better’ answer people have to use more accurate instruments.

  62. Willis Eschenbach June 12, 2015 at 3:49 pm
    Let me take another shot at this. Here are some questions. Please include your calculations
    FIRST QUESTION
    We have four numbers. Each has an associated uncertainty. Let’s say that they all have the same uncertainty of 2 units.
    What is the uncertainty of the average of the four numbers?
    SECOND QUESTION
    We want to know if our four cats are gaining weight. We weigh each of them on a scale with an uncertainty of 2 units.
    What is the uncertainty of the average of the four numbers?
    THIRD QUESTION
    Every day I go to a different coffee shop and weigh my coffee. My scale is good to ± 2 ounces. After I do this for four days, I take the average so I can find out how much coffee I’m drinking daily. What is the uncertainty of the average?
    The part that people seem to have trouble with is that the uncertainty of an average is LESS than the average of the uncertainties. Average uncertainty in each of the above cases is 2 units … but the actual uncertainty of the average is only 1 unit.

    Everyone has moved on in this thread, but here I am on a Saturday, killing time by reading papers on ARGO, and I come across this little challenge. Here goes.
    FIRST QUESTION
    The uncertainty of the average is sigma (common to all four numbers) divided by the square root of 4, which is to say 2/2 or one unit.
    SECOND QUESTION
    You say you want to know if the cats (as a group) are gaining weight, which implies two sets of measurements and then a comparison of the averages with an associated uncertainty of the difference. However, you ask only for the average of the four numbers (i.e. the average weight of four cats at a single point in time). I can assume that each weighing of a different cat has the same uncertainty. This may not be so; the scale may have an uncertainty that varies with weight (heteroskedasticity). If all is ideal, then the average of the four numbers is the total weight of the four cats divided by four, with an uncertainty of one unit.
    If you really want to know if the set of four cats are gaining weight, then the cats’ varying weights present an additional uncertainty at each set of measurements. If we assume that only the scale contributes uncertainty then the differencing would produce a number with an uncertainty of 1.414 units. But if the cats’ weights are varying throughout the period, so that there is an uncertainty with the cat’s weight according to time of day, then the uncertainty is larger.
    THIRD QUESTION
    Do you take the same coffee cup with you to each coffee shop? If so, then the cup is not a source of uncertainty, only your scale contributes, and your average is uncertain by 1 unit again. If you use the local shop’s cup, then the cups present additional uncertainty.
    If uncertainty is always statistical, and there is no bias, then the uncertainty of an average is less than the average of the uncertainty. But in the worst possible case of non-statistical uncertainty we may have to estimate the upper bound of uncertainty as the sum of the absolute values of individual uncertainties. In manufacturing this is known as the iron-clad rule of stack-up error. In this worst case the additional measurements do not improve uncertainty at all. In two of the stated cases above we had to assume the data were distributed identically and the measurements independent in order to calculate anything at all.
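    A compact sketch contrasting the two bookkeeping rules mentioned here, using the four ± 2 measurements from the questions above (treating ± 2 as either a one-sigma independent error or a hard bound is the assumption that separates the two):

```python
# Sketch: uncertainty of the average of four readings, each ±2 units, under two
# bookkeeping rules: quadrature (independent random errors) and the worst-case
# "stack-up" bound (errors all pushing the same way).
import math

u = [2.0, 2.0, 2.0, 2.0]            # per-reading uncertainties
n = len(u)

quadrature = math.sqrt(sum(x ** 2 for x in u)) / n    # independent random errors
worst_case = sum(abs(x) for x in u) / n               # fully correlated bound

print(f"quadrature (independent errors): ±{quadrature:.2f}")   # ±1.00
print(f"worst-case stack-up bound:       ±{worst_case:.2f}")   # ±2.00

# Difference of two such averages (e.g. "did the cats gain weight?"), quadrature:
print(f"difference of two averages: ±{math.sqrt(2 * quadrature ** 2):.3f}")  # ≈ ±1.414
```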

    • K Kilty
      Your last paragraph recognises that any expression of a calculated value has to be accompanied by a statement of confidence in that value. Adding ‘precision’ (meaning significant digits in base 10 or base 2) reduces the confidence. Why? Because the calculation cannot improve the accuracy of any or all measurements. If you weigh all 4 cats at the same time, you get the total to the reporting precision and inherent accuracy of the scale. Then divide by 4. OK, that’s better. But weighing them one by one, once each, is not the same at all. Stacking error.

  63. Willis, I have consulted a P.Eng in Materials who deals with the issues of sampling and asked him how I can better communicate the major points using the analogies so far. He taught engineers for a few years.
    He insists I communicate the following: we “have not agreed on the definitions of terms and clarified when they will be used.” Oops.
    ++++++
    “SECOND QUESTION
    “We want to know if our four cats are gaining weight. We weigh each of them on a scale with an uncertainty of 2 units.
    “What is the uncertainty of the average of the four numbers?”
    The moderator is catching on. If you only have one measurement of each, you have a reduced uncertainty in the average, but you have not increased the precision of the readings, nor the accuracy of each, nor the accuracy of the average. The calculated average is not a mass; it is the centre of a range that is the same as before, +/- 2 units. Averaging the numbers does not give a ‘truer’ value, it just gives a number which is a ‘better guess’, that is to say, you have more confidence as to where the centre of the range is probably located. Read on for an example below.
    The poor definition lies with the term ‘uncertainty of the average’ with the implication that an ‘average’ number is going to be closer to the true mass than the individual weights. Behind that thought is the expectation that 4 measurements will be normally distributed about the true mass. That is a logical error – you don’t have 4 measurements of each cat. You don’t get ‘normal distribution’ from 1 measurement.
    ++++++++
    “THIRD QUESTION
    “Every day I go to a different coffeshop and weigh my coffee. My scale is good to ± 2 ounces. After I do this for four days, I take the average so I can find out how much coffee I’m drinking daily. What is the uncertainty of the average?”
    This analogy (discrete cups) is one step closer to the real-world example of the ARGO floats and the surface buoys. One more step is needed to frame the problem correctly.
    The following is built upon the P.Eng’s recommendation for how to illustrate this type of problem.
    Background:
    You have a scale which reads 0-100 ounces with a readout precision of 1 oz and an accuracy of +/-2 oz. The readout precision comes from the markings on the scale and the accuracy is from a test against a very precise Standard Scale. Every scale made by the manufacturer gives, on average and with a high confidence (99%), a reading within 2 oz of the true weight. (The P.Eng pointed out right from the start that no matter what you do with that scale, the result of any calculated number will still be +/- 2 oz, which is why they make more accurate scales. Averaging readings does not change the accuracy of the scale any more than summing readings does.)
    You carry the scale with you, and that is not like the ARGO floats, which are separate instruments. But this is your experiment. Measuring different things once each is analogous to the float and buoy measurements, which are in a different volume of water each time.
    Each coffee shop you will visit has a “Whizzo Coffee Machine” that is calibrated to reliably produce 10 oz coffees within the limit of the legal definition of ’10 oz’, i.e. at least 9.75 oz and not more than 10.25 oz. All the coffee shops you are going to visit sell 9.8 oz coffees, though you are not aware of that. You are going to weigh them yourself.
    The first cup reading is 10 oz
    The second cup reading is 11 oz
    The third cup reading is 11 oz
    The fourth cup reading is 11 oz
    All the readings are correct within the attested accuracy of the scale. The total mass of coffee is 43 oz +/- 8. The ‘8’ is because you have only made a single measurement of each serving, the point picked up by the moderator: the summed uncertainty is +/-2 for each reading x 4 readings = +/-8 for the total.
    The average is 10.75 oz but you are not allowed to claim that because to do so would attribute to the number spurious precision. You have only 2 significant digits from your instruments so the answer is only valid to two, so the average is 11. It is not 10.75 or 10.8. It is 11. (Technically speaking, because your scale goes to 100, not 99, it is a 2-1/2 digit instrument but you are not allowed to use the ‘1/2’ in this argument.)
    The average of the readings is, logically, within the +/- 2 oz range of the true (unknown) value because all of the readings really are within the range. The scale is performing to spec.
    The average of 11 oz is attended by a level of confidence. You can be very confident, say 99%, that the true average, which is 9.8, lies within 2 oz of 11. And it does. You know no more than that. The average is not data (a measurement). It is a mathematical construct. If you claim added precision, you lose confidence. The claim is that you can guess more precisely, using the stats procedure, where the middle of the +/- 2 range is.
    If you wish to say, “But I have 4 readings, and surely I can state the average with higher precision?”, the reply is that yes, you can! But you have to state that you have less confidence in any particular number that is within spec. The true value might be 12 oz. You have no proof that it is not. All the readings are within 2 oz of 12. Any average has a confidence level. In fact all measurements have a confidence level; we just ignore them most of the time.
    You could say you have at least some confidence that the average is 10.8 and that the true mean is +/- 1 oz. In fact it is true that the true mean is just within +/-1 oz, but because you do not know what the true mean is, you have to make that statement with a reduction in confidence. You cannot be as confident that it is within 1 oz of 10.8 as you can that it is within 2 oz of 11. None of the readings are correct and their distribution is not Normal because each cup is unique (and may be a little bit different).
    You decide to drink more coffee. The readings are: 11, 10, 10 and 12. The ’12’ is there because the instrument cannot report the 11.8 it should have been, because it rounds to the nearest 1. In fact even the Weights and Measures Inspector might not know if that particular coffee was 10.2 or 10.0 or 9.8 oz. She makes sure it falls within +/- 0.25.
    Your scale is not precise or accurate enough to tell us. What we do know is that, given a lot of single weighings of multiple objects, each result reported will be within +/- 2 oz of the true value. [Actually there are different ways of reporting repeatability with real scales (round up, round even, etc.), which are beyond the scope of this post.]
    Because we can’t tell whether the last coffee was actually heavier, the best we can do is assume that the true weight was within +/-2 of 12, remembering that the value was rounded to the nearest oz because 1 oz is the reporting precision of the instrument.
    The calculated average is now 10.75, which we have to round to 11. The total mass of 8 coffees is 86 oz +/- 16 and the reported average is 11 +/- 2. You could also report that the average is 10.8 +/-1 with a reduced level of confidence, or 10.75 +/- 0.5 with very little confidence, or 10.750 +/- 0.25 (or whatever) with virtually no confidence at all. The ‘guess’ is based on the available data. If you want to ‘know’ the value with greater precision and greater accuracy, you have to get a scale that reports to more significant digits and has greater accuracy. And that is precisely (ha ha) why people make them.
    Increasing the number of readings of each serving would allow for an enhanced level of confidence but would not reduce the ‘full confidence’ range from +/-2. Any number emerging from a stats procedure will always have attached to it the accuracy which is +/-2 because it is inherent in the instrument and therefore the measurement. The true average is still 9.8. You might make 3000 readings all of which are above 10 or 11 because the scale is only accurate to +/-2. Maybe all the readings are high by 1 oz, but that is within spec. A formula can’t solve that. And then there is instrument drift…
    Lastly, consider that the ARGO and Buoy measurements are made once each using different instruments! That would be like each coffee shop having their own scale on which you weigh their single serving once. Readings might vary from 8 to 12 even if they were all 9.8 oz each and all would be within spec. If the servings actually varied from 10.25 to 9.75, as allowed by law, none of the 8 weighed values above would change, meaning that the true total served falls within a 1500 oz range in 3000 cups.
    Disclaimer: I have simplified this last example. The results would actually be worse than indicated using every influence bearing on 4 measurements of single servings, each made on a different scale.
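    In the same spirit, here is a small simulation of this coffee example with invented numbers (9.8 oz servings, a readout rounded to 1 oz, a fixed calibration offset inside the ± 2 oz spec, and a little random reading scatter); it separates the part of the error that averaging shrinks from the part it does not:

```python
# Simulation in the spirit of the coffee example: 9.8 oz servings, a scale that
# rounds to the nearest 1 oz and carries a fixed calibration offset within its
# ±2 oz spec, plus some random per-weighing scatter. All numbers are invented.
import numpy as np

rng = np.random.default_rng(7)

true_serving = 9.8        # oz, what every shop actually pours
scale_offset = 1.1        # oz, a fixed miscalibration, within the ±2 oz spec
reading_noise = 0.3       # oz, assumed random scatter per weighing

def average_reading(n):
    raw = true_serving + scale_offset + rng.normal(0, reading_noise, n)
    return np.round(raw).mean()          # readout rounds to the nearest 1 oz

for n in (4, 100, 10_000):
    print(f"n = {n:5d}: average reading = {average_reading(n):6.3f} oz "
          f"(true serving is {true_serving} oz)")

# The scatter of the average shrinks as n grows, but the average settles near
# the biased, rounded value rather than near 9.8 oz: more readings do not
# remove a calibration offset that is shared by every reading.
```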

    • Willis I am again replying to Question 2 (cat weights) and part of 3:
      “…uncertainty is always statistical, and there is no bias, then the uncertainty of an average is less than the average of the uncertainty.”
      This is quite true; however, you do not know if the ‘more certain’ average is accurate. This is the issue, not straightforward statistics of perfect and normally distributed numbers. Knowing with less uncertainty the average of a data set does not mean knowing more accurately what the average is. My reply above, inspired by the P.Eng, sets that out pretty clearly. If the accuracy of the instrument is not good, it does not address the following:
      “If I make more measurements of something with an inaccurate instrument, whatever the precision (number of significant digits reported) I know ‘more accurately’ the true average value”.
      That sentence is untrue. If the readings are all high by ‘2’ then the more-precisely-known-average is off by 2 every single time. One cannot assume that all readings from an instrument are distributed around the true value. They can be very nicely and normally distributed about any other value away from the true value.
      Land temps: Because every single reading made on a visual thermometer is +/-0.5 and the instrument itself might be calibrated +/- 0.5 degrees, then the total range is 1.0 degrees from its true value. It is an error to assume that the calibrations are normally distributed around the true value. It is an error to assume that readings are normally distributed around the displayed value each time.
      Claims that an average temperature on land has been calculated to 0.001 degrees are resting on the shifting mud of unlikely assumptions and spurious precision read from the mantissa of a calculator. The improvement of the precision of the average is different when doing the same calculation using bases other than 10 as the number of significant digits changes. Under some conditions the number of significant digits decreases.
      Karl et al shows why one should not mix data sets and it should be put into textbooks as an object lesson with an explanation as to why not: because it turns a pause into a trend where there is none.
