Systematic Error in Climate Measurements: The surface air temperature record

Guest essay by Pat Frank

Presented at World Federation of Scientists, Erice, Sicily, 20 August 2015

This is a version of the talk I gave about uncertainty in the global average air temperature record at the 48th Conference of the World Federation Scientists on “Planetary Emergences and Other Events,” at Erice, Sicily, in August of 2015.

It was a very interesting conference and, as an aside, for me the take home message was that the short-term emergency is Islamic violence while the long-term emergency is some large-scale bolide coming down. Please, however, do not distract conversation into these topics.

Abstract: I had a longer abstract, but here’s the short form. Those compiling the global averaged surface air temperature record have not only ignored systematic measurement error, but have even neglected the detection limits of the instruments themselves. Since at least 1860, thermometer accuracy has been magicked out of thin air. Also since then, and at the 95% confidence interval, the rate or magnitude of the global rise in surface air temperature is unknowable. Current arguments about air temperature and its unprecedentedness are speculative theology.

1. Introduction: systematic error

Systematic error enters into experimental or observational results through uncontrolled and often cryptic deterministic processes. [1] These can be as simple as a consistent operator error. More typically, error emerges from an uncontrolled experimental variable or instrumental inaccuracy. Instrumental inaccuracy arises from malfunction or lack of calibration. Uncontrolled variables can impact the magnitude of a measurement and/or change the course of an experiment. Figure 1 shows the impact of an uncontrolled variable, taken from my own published work. [2, 3]


Figure 1: Left, titration of dissolved ferrous iron under conditions that allowed an unplanned trace of air to enter the experiment. Inset: the incorrect data precisely followed equilibrium thermodynamics. Right, the same experiment but with the appropriately strict exclusion of air. The data are completely different. Inset: the correct data reflect distinctly different thermodynamics.

Figure 1 shows that the inadvertent entry of a trace of air was enough to completely change the course of the experiment. Nevertheless, the erroneous data display coherent behavior and follow a trajectory completely consistent with equilibrium thermodynamics. To all appearances, the experiment was completely valid. In isolation, the data are convincing. However, they are completely wrong because the intruded air chemically modified the iron.

Figure 1 exemplifies the danger of systematic error. Contaminated experimental or observational results can look and behave just like good data, and can rigorously follow valid physical theory. Without care, such data invite erroneous conclusions.

By its nature, systematic error is difficult to detect and remove. Methods of elimination include careful instrumental calibration under conditions identical to the observation or experiment. Methodologically independent experiments that access the same phenomena provide a check on the results. Careful attention to these practices is standard in the experimental physical sciences.

The recent development of a new and highly accurate atomic clock illustrates the extreme care physicists take to eliminate systematic error. Critical to achieving its 10⁻¹⁸ instability was removal of the systematic error produced by the black-body radiation of the instrument itself. [4]


Figure 2: Close-up picture of the new atomic clock. The timing element is a cluster of fluorescing strontium atoms trapped in an optical lattice. Thermal noise is removed using data provided by a sensor that measures the black-body temperature of the instrument.

As a final word, systematic error does not average away with repeated measurements. Repetition can even increase error. When systematic error cannot be eliminated and is known to be present, uncertainty statements must be reported along with the data. In graphical presentations of measured or calculated data, systematic error is represented using uncertainty bars. [1] Those uncertainty bars communicate the reliability of the result.
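A minimal numerical sketch (with invented values, not data from any experiment discussed here) makes the point concrete: averaging defeats zero-mean random noise but leaves a systematic bias untouched.

```python
# Sketch: random error shrinks as 1/sqrt(N) under averaging; a systematic
# (deterministic) bias survives intact. All values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_temp = 10.0                       # hypothetical true air temperature, C
N = 100_000                            # repeated measurements

random_err = rng.normal(0.0, 0.5, N)   # zero-mean instrument noise, 1-sigma = 0.5 C
systematic_bias = 0.4                  # assumed constant radiative warm bias, C

readings = true_temp + random_err + systematic_bias
print(f"mean of {N} readings: {readings.mean():.3f} C")
# Prints ~10.400: the random noise has averaged away; the 0.4 C bias has not.
```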

2. Systematic Error in Surface Temperature Measurements

2.1. Land Surface Air Temperature

During most of the 20th century, land surface air temperatures were measured using a liquid-in-glass (LiG) thermometer housed in a box-like louvered shield (Stevenson screen or Cotton Region Shelter (CRS)). [5, 6] After about 1985, thermistors or platinum resistance thermometers (PRT) housed in an unaspirated cylindrical plastic shield replaced the CRS/LiG sensors in Europe, the Anglo-Pacific countries, and the US. Beginning in 2000, the US Climate Reference Network deployed sensors consisting of a trio of PRTs in an aspirated shield. [5, 7-9] An aspirated shield includes a small fan or impeller that ventilates the interior of the shield with outside air.

Unaspirated sensors rely on the prevailing wind for ventilation. Solar irradiance can heat the sensor shield, warming the interior atmosphere around the sensor. In winter, upward radiance from a high-albedo snow-covered surface can also produce a warm bias. [10] Significant systematic measurement error occurs when air movement is less than 5 m/s. [9, 11]


Figure 3: Alpine Plaine Morte Glacier, Switzerland, showing the air temperature sensor calibration experiment carried out by Huwald, et al., in 2007 and 2008. [12] Insets: close-ups of the PRT and the sonic anemometer sensors. Photo credit: Bou-Zeid, Martinet, Huwald, Couach, 2.2006 EPFL-ENAC.

In 2007 and 2008, calibration experiments carried out on the Plaine Morte Glacier (Figure 3) tested the field accuracy of the RM Young PRT housed in an unaspirated louvered shield, situated over a snow-covered surface. In a laboratory setting, the RM Young sensor is capable of ±0.1 C accuracy. Field accuracy was determined by comparison with air temperatures measured using a sonic anemometer, which takes advantage of the impact of temperature on the speed of sound in air and is insensitive to irradiance and wind speed.
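As background, the sonic-thermometry principle is simple: in dry air the speed of sound depends only on absolute temperature. The sketch below uses textbook constants for dry air; the function is illustrative and is not drawn from Huwald et al.

```python
# Sketch of sonic thermometry: c = sqrt(gamma * R * T / M) in dry air, so a
# measured sound speed c can be inverted for temperature with no radiation error.
GAMMA = 1.4        # heat-capacity ratio of dry air
R = 8.314          # gas constant, J/(mol K)
M = 0.02897        # molar mass of dry air, kg/mol

def air_temp_from_sound_speed(c: float) -> float:
    """Absolute air temperature (K) inferred from the speed of sound c (m/s)."""
    return M * c**2 / (GAMMA * R)

# c = 331.5 m/s gives roughly 273 K (about 0 C):
print(air_temp_from_sound_speed(331.5))
```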


Figure 4: Temperature trends recorded simultaneously on Plaine Morte Glacier during February–April 2007 by the sonic anemometer and the RM Young PRT probe.

Figure 4 shows that under identical environmental conditions, the RM Young probe recorded significantly warmer winter air temperatures than the sonic anemometer. The slope of the RM Young temperature trend is also more than three times greater. Referenced against a common mean, the RM Young error would introduce a spurious warming trend into a global temperature average. The larger significance of this result is that the RM Young probe is very similar in design and response to the more advanced temperature probes in use world-wide since about 1985.

Figure 5 shows a histogram of the systematic temperature error exhibited by the RM Young probe.


Figure 5. RM Young probe systematic error on Plaine Morte Glacier. Daytime error averages 2.0±1.4 C; night-time error averages 0.03±0.32 C.

The RM Young systematic errors mean that, absent an independent calibration instrument, any given daily mean temperature has an associated 1σ uncertainty of 1±1.4 C. Figure 5 shows this uncertainty is neither randomly distributed nor constant. It cannot be removed by averaging individual measurements or by taking anomalies. Subtracting the average bias will not remove the non-normal 1σ uncertainty. Entry of the RM Young station temperature record into a global average will carry that error along with it.
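A small synthetic illustration (an invented skewed error distribution, qualitatively like Figure 5, not the actual Plaine Morte data) of why subtracting the average bias leaves the non-normal 1σ spread intact:

```python
# Sketch: removing the mean bias of a non-normal error distribution re-centers
# it at zero but leaves its spread and its skew untouched.
import numpy as np

rng = np.random.default_rng(1)
# An all-positive, warm-skewed error distribution (assumed, for illustration):
errors = rng.gamma(shape=2.0, scale=0.7, size=50_000)

debiased = errors - errors.mean()      # subtract the average bias
print(f"residual mean:    {debiased.mean():+.4f} C")   # ~0: bias removed
print(f"residual 1-sigma: {debiased.std():.2f} C")     # ~1.0 C: spread untouched
skew = np.mean(debiased**3) / debiased.std()**3
print(f"residual skew:    {skew:+.2f}")                # still non-normal
```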

Before inclusion in a global average, temperature series from individual meteorological stations are subjected to statistical tests for data quality. [13] Air temperatures are known to show correlation R = 0.5 over distances of about 1200 km. [14, 15] The first quality-control test for any given station record includes a statistical check for correlation with temperature series from nearby stations. Figure 6 shows that the RM Young error-contaminated temperature series would pass this most basic quality-control test. Further, the erroneous RM Young record would pass every statistical test used for the quality control of meteorological station temperature records worldwide. [16, 17]
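A synthetic sketch (invented series, not station records) of why the correlation check cannot catch this kind of error: the shared seasonal signal dominates the correlation, so even a series carrying a spurious warming bias correlates almost perfectly with a clean neighbor.

```python
# Sketch: a slowly varying systematic error barely dents the correlation
# between neighbouring station series dominated by a common seasonal cycle.
import numpy as np

rng = np.random.default_rng(2)
days = np.arange(365)
regional = 10.0 * np.sin(2 * np.pi * days / 365)        # shared seasonal signal, C

clean_station = regional + rng.normal(0, 1.0, days.size)
spurious_bias = 1.5 + 0.004 * days                      # assumed slow warm drift, C
biased_station = regional + spurious_bias + rng.normal(0, 1.0, days.size)

r = np.corrcoef(clean_station, biased_station)[0, 1]
print(f"r = {r:.2f}")   # ~0.98, far above the R = 0.5 screening level
```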


Figure 6: Correlation of the RM Young PRT temperature measurements with those of the sonic anemometer. Inset: Figure 1a from [14] showing correlation of temperature records from meteorological stations in the terrestrial 65–70° N, 0–5° E grid. The 0.5 correlation length is 1.4×10³ km.


Figure 7: Calibration experiment at the University of Nebraska, Lincoln (ref. [11], Figure 1); E, MMTS shield; F, CRS shield; G, the aspirated RM Young reference.

Figure 7 shows the screen-type calibration experiment at the University of Nebraska, Lincoln. Each screen contained an identical HMP45C PRT sensor. [11] The calibration reference temperatures were provided by an aspirated RM Young PRT probe, rated as accurate to <±0.2 C below 1100 W/m² solar irradiance.

These independent calibration experiments tested the impact of a variety of commonly used screens on the fidelity of air temperature measurements from PRT probes. [10, 11, 18] Screens included the traditional Cotton Region Shelter (CRS, Stevenson screen) and the MMTS screen now in common use in the US Historical Climatology Network, among others.


Figure 8: Average systematic measurement error of an HMP45C PRT probe within an MMTS shelter over a grass (top) or snow-covered (bottom) surface. [10, 11]

Figure 8, top, shows the average systematic measurement error an MMTS shield imposed on a PRT temperature probe, found during the calibration experiment displayed in Figure 7. [11] Figure 8, bottom, shows the results of an independent PRT/MMTS calibration over a snow-covered surface. [10] The average annual systematic uncertainty produced by the MMTS shield can be estimated from these data as 1σ = 0.32±0.23 C. The skewed warm-bias distribution of error over snow is similar in magnitude to that of the unaspirated RM Young shield in the Plaine Morte experiment (Figure 5).

Figure 9 shows the average systematic measurement error produced by a PRT probe inside a traditional CRS shield. [11]


Figure 9. Average day-night 1σ = 0.44±0.41 C systematic measurement error produced by a PRT temperature probe within a traditional CRS shelter.

The warm bias in the data is apparent, as is the non-normal distribution of error. The systematic uncertainty from the CRS shelter was 1σ = 0.44±0.41 C. The HMP45C PRT probe is at least as accurate as the traditional LiG thermometers housed within the CRS shield. [19, 20] The PRT/CRS experiment therefore provides an estimated lower limit for the systematic measurement uncertainty in the land-surface temperature record covering all of the 19th and most of the 20th century.

2.2 Sea-Surface Temperature

Although considerable effort has been expended to understand sea-surface temperatures (SSTs), [21-28] there have been very few field calibration experiments of sea-surface temperature sensors. Bucket- and steamship engine cooling-water intake thermometers provided the bulk of early and mid-20th century SST measurements. Sensors mounted on drifting and moored buoys have come into increasing use since about 1980, and now dominate SST measurements. [29] Attention is focused on calibration studies of these instruments.

The series of experiments reported by Charles Brooks in 1926 remains by far the most comprehensive field calibration of bucket and engine-intake thermometer SST measurements carried out by any individual scientist. [30] Figure 10 presents typical examples of the systematic error in bucket and engine-intake SSTs that Brooks found.


Figure 10: Systematic measurement error in one set of engine-intake (left) and bucket (right) sea-surface temperatures reported by Brooks. [30]

Brooks also recruited an officer to monitor the ship-board measurements after he concluded his experiments and disembarked. The errors after he had departed the ship were about twice as large as they were when he was aboard. The simplest explanation is that care deteriorated, perhaps back to normal, when no one was looking. This result violates the standard assumption in the field that temperature sensor errors are constant for each ship.

In 1963 Saur reported the largest field calibration experiment of engine-intake thermometers, carried out by volunteers aboard twelve US military transport ships operating off the US central Pacific coast. [31] The experiment included 6826 pairs of observations. Figure 11 shows the experimental results from one voyage of one ship.


Figure 11: Systematic error in recorded engine-intake temperatures aboard one military transport ship operating June-July 1959. The mean systematic bias and uncertainty represented by these data are 1σ = 0.9±0.6 C.

Saur described Figure 11 as “a typical distribution of the differences” among the various ships. The ±0.6 C uncertainty about the mean systematic error is comparable to the values reported by Brooks, shown in Figure 10.

Saur concluded his report by noting that, “The average bias of reported sea water temperatures as compared to sea surface temperatures, with 95 percent confidence limits, is estimated to be 1.2±0.6 F [0.67±0.33 C] on the basis of a sample of 12 ships. The standard deviation of differences [between ships] is estimated to be 1.6 F [0.9 C]. Thus, without improved quality control the sea temperature data reported currently and in the past are for the most part adequate only for general climatological studies. [bracketed conversions added]” Saur’s caution is instructive, but has apparently been mislaid by consensus scientists.

Measurements from bathythermograph (BT) and expendable bathythermograph (XBT) instruments have also made significant contributions to the SST record. [32] Extensive BT and XBT calibration experiments revealed multiple sources of systematic error, principally stemming from mechanical problems and calibration errors. [33-35] Relative to a reversing-thermometer standard, field BT measurements exhibited errors of 0.34±0.43 C (mean±1σ). [35] This standard deviation is more than twice as large as the manufacturer-stated accuracy of ±0.2 C and reflects the impact of uncontrolled field variables.

The SST sensors in deployed floating and moored buoys were never field-calibrated during the 20th century, allowing no general estimate of systematic measurement error.

However, Emery estimated a 1σ = ±0.3 C error by comparison of SSTs from floating buoys co-located to within 5 km of each other. [28] SST measurements separated by less than 10 km are considered coincident.

A similar ±0.26 C buoy error magnitude was found relative to SSTs retrieved from the Advanced Along-Track Scanning Radiometer (AATSR) satellite. [36] The error distributions were non-normal.

More recently, Argo buoys were field calibrated against very accurate CTD (conductivity-temperature-depth) measurements and exhibited average RMS errors of ±0.56 C. [37] This is similar in magnitude to the reported average ±0.58 C buoy-Advanced Microwave Scanning Radiometer (AMSR) satellite SST difference. [38]

3. Discussion

Until recently, [39, 40] systematic temperature sensor measurement errors were neither mentioned in reports communicating the origin, assessment, and calculation of the global averaged surface air temperature record, nor were they included in error analyses. [15, 16, 39-46] Even now that systematic errors have entered the published literature, the Central Limit Theorem is adduced to assert that they average to zero. [36] However, systematic temperature sensor errors are neither randomly distributed nor constant over time, space, or instrument. There is no theoretical reason to expect that these errors follow the Central Limit Theorem, [47, 48] or that they are reduced or removed by averaging multiple measurements, even when the measurements number in the millions. A complete inventory of contributions to uncertainty in the surface air temperature record must include, indeed must start with, the systematic measurement error of the temperature sensor itself. [39]
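A toy illustration of the distinction (synthetic numbers, not station data): averaging across many stations removes the independent part of the error, but any component shared across stations, such as a common shield design responding alike to sun and calm air, survives the average.

```python
# Sketch: cross-station averaging cancels independent errors but not a
# component common to every station. Values are assumed for illustration.
import numpy as np

rng = np.random.default_rng(3)
n_stations = 5_000
independent_err = rng.normal(0.0, 0.5, n_stations)  # station-to-station noise, C
shared_bias = 0.3                                   # assumed common shield bias, C

station_errors = independent_err + shared_bias
print(f"mean error over {n_stations} stations: {station_errors.mean():+.3f} C")
# ~ +0.300 C: the component common to all stations survives the average.
```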

The World Meteorological Organization (WMO) offers useful advice regarding systematic error. [20]

“Section 1.6.4.2.3 Estimating the true value – additional remarks.

In practice, observations contain both random and systematic errors. In every case, the observed mean value has to be corrected for the systematic error insofar as it is known. When doing this, the estimate of the true value remains inaccurate because of the random errors as indicated by the expressions and because of any unknown component of the systematic error. Limits should be set to the uncertainty of the systematic error and should be added to those for random errors to obtain the overall uncertainty. However, unless the uncertainty of the systematic error can be expressed in probability terms and combined suitably with the random error, the level of confidence is not known. It is desirable, therefore, that the systematic error be fully determined.”

Thus far, in production of the global averaged surface air temperature record, the WMO advice concerning systematic error has been followed primarily in the breach.

Systematic sensor error in air and sea-surface temperature measurements has been woefully under-explored and field calibrations are few. Nevertheless, the reported cases make it clear that the surface air temperature record is contaminated with a very significant level of systematic measurement error. The non-normality of systematic error means that subtracting an average bias will not discharge the measurement uncertainty about the global temperature mean.

Further, the magnitude of the systematic error bias in surface air temperature and SST measurements is apparently as variable in time and space as the magnitude of the standard deviation of systematic uncertainty about the mean error bias. For example, the mean systematic bias was 2.0 C over snow on the Plaine Morte Glacier, Switzerland, but 0.4 C over snow at Lincoln, Nebraska. Similar differences accrue to the engine-intake systematic error means reported by Brooks and Saur. Therefore, removing an estimated mean bias always leaves an ambiguous residual uncertainty in that bias. In any complete evaluation of error, this residual mean-bias uncertainty combines with the 1σ standard deviation of measurement uncertainty into the uncertainty total.

A complete evaluation of systematic error is beyond the analysis presented here. However, to the extent that the above errors are representative, a set of estimated uncertainty bars due to systematic error in the global averaged surface air temperature record can be calculated, Figure 12.

The uncertainty bars in Figure 12 (right) reflect a 0.7:0.3 SST:land-surface ratio of systematic errors. Combined in quadrature, bucket and engine-intake errors constitute the SST uncertainty prior to 1990. Over the same interval, the systematic error of the PRT/CRS sensor [39, 49] constituted the uncertainty in land-surface temperatures. Floating buoys made a partial contribution (0.25 fraction) to the uncertainty in SST between 1980 and 1990. After 1990 the uncertainty bars steadily shrink further, reflecting the increasing contribution and smaller errors of MMTS (land) and floating-buoy (sea surface) sensors.
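A minimal sketch of one plausible reading of this recipe follows; the essay does not spell out its exact weighting formula, and the component values below are illustrative assumptions, not the values used for Figure 12.

```python
# Sketch: fraction-weighted quadrature combination of land and SST 1-sigma
# systematic uncertainties. The weighting scheme and inputs are assumptions.
import math

def global_sigma(u_land: float, u_sst: float,
                 f_land: float = 0.3, f_sst: float = 0.7) -> float:
    """Fraction-weighted root-sum-square of land and SST 1-sigma uncertainties (C)."""
    return math.sqrt((f_land * u_land) ** 2 + (f_sst * u_sst) ** 2)

# Illustrative pre-1990 inputs: CRS/PRT land error ~0.44 C; SST error taken as
# bucket and engine-intake components (assumed 0.6 C and 0.9 C) in quadrature.
u_sst = math.sqrt(0.6 ** 2 + 0.9 ** 2)
print(f"estimated global 1-sigma: {global_sigma(0.44, u_sst):.2f} C")
```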


Figure 12: The 2010 global average surface air temperature record obtained from the website of the Climatic Research Unit (CRU), University of East Anglia, UK. http://www.cru.uea.ac.uk/cru/data/temperature/. Left, error bars following the description provided at the CRU website. Right, error bars reflecting the uncertainty width due to estimated systematic sensor measurement errors within the land and sea-surface records. See the text for further discussion.

Figure 12 (right) is very likely a more accurate representation of the state of knowledge than is Figure 12 (left), concerning the rate or magnitude of change in the global averaged surface air temperature since 1850. The revised uncertainty bars represent non-normal systematic error. Therefore the air temperature mean trend loses any status as the most probable trend.

Finally, Figure 13 pays attention to the instrumental resolution of the historical meteorological thermometers.

Figure 13 caused some angry shouts from the audience at Erice, followed by some very rude approaches after the talk, and a lovely debate by email. The argument presented here prevailed.

Instrumental resolution defines the measurement detection limit. For example, the best-case historical 19th to mid-20th century liquid-in-glass (LiG) meteorological thermometers included 1 C graduations. The best-case laboratory-conditions reportable temperature resolution is therefore ±0.25 C. There can be no dispute about that.

The standard SST bucket LiG thermometers from the Challenger voyage on through the 20th century also had 1 C graduations. The same resolution limit applies.

The very best American ship-board engine-intake thermometers included 2 F (~1 C) graduations; on British ships they were 2 C. The very best resolution is then about ±(0.25 – 0.5) C. These are known quantities. Resolution uncertainty, like systematic error, does not average away. Knowing the detection limits of the classes of instruments allows us to estimate the limit of resolution uncertainty in any compiled historical surface air temperature record.

Figure 13 shows this limit of resolution. It compares the historical instrumental ±2σ resolution with the ±2σ uncertainty in the published Berkeley Earth air temperature compilation. The analysis applies equally well to the published surface air temperature compilations of GISS or CRU/UKMet, which feature the same uncertainty limits.


Figure 13: The Berkeley Earth global averaged air temperature trend with the published ±2σ uncertainty limits in grey. The time-wise ±2σ instrumental resolution is in red. On the right in blue is a compilation of the best resolution limits of the historical temperature sensors, from which the global resolution limits were calculated.

The globally combined instrumental resolution was calculated using the same fractional contributions noted above for the lower-limit estimate of systematic measurement error, i.e., 0.30:0.70 land:sea-surface instruments, along with the published historical fractional use of each sort of instrument (land: CRS vs. MMTS; sea surface: buckets vs. engine intakes vs. buoys).

The record shows that during the years 1800-1860, the published global uncertainty limits of field meteorological temperatures equal the accuracy of the best possible laboratory-conditions measurements.

After about 1860 through 2000, the published resolution is smaller than the detection limits (the resolution limits) of the instruments themselves. From at least 1860, accuracy has been magicked out of thin air.

Does anyone find the published uncertainties credible?

All you engineers and experimental scientists out there may go into shock after reading this. I was certainly shocked by the realization. Espresso helps.

The people compiling the global instrumental record have neglected an experimental limit even more basic than systematic measurement error: the detection limits of their instruments.

Resolution limits and systematic measurement error produced by the instrument itself constitute lower limits of uncertainty. The scientists engaged in consensus climatology have neglected both of them.

It’s almost as though none of them have ever made a measurement or struggled with an instrument. There is no other rational explanation for that sort of negligence than a profound ignorance of experimental methods.

The uncertainty estimate developed here shows that the rate or magnitude of change in global air temperature since 1850 cannot be known within ±1 C prior to 1980 or within ±0.6 C after 1990, at the 95% confidence interval.

The rate and magnitude of temperature change since 1850 is literally unknowable. There is no support at all in the surface air temperature record for any claim of “unprecedented” change.

Claims of highest air temperature ever, based on even 0.5 C differences, are utterly insupportable and without any meaning.

All of the debates about highest air temperature are no better than theological arguments about the ineffable. They are, as William F. Buckley called them, “Tedious speculations about the inherently unknowable.”

There is no support in the temperature record for any emergency concerning climate. Except, perhaps, an emergency in the apparent competence of AGW-consensus climate scientists.

4. Acknowledgements: Prof. Hendrik Huwald and Dr. Marc Parlange, Ecole Polytechnique Federale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland, are thanked for generously providing the Plaine Morte sensor calibration data entering into Figure 4, Figure 5, and Figure 6. This work was carried out without any external funding.

5. References

[1] JCGM, Evaluation of measurement data — Guide to the expression of uncertainty in measurement 100:2008, Bureau International des Poids et Mesures: Sevres, France.

[2] Frank, P., et al., Determination of ligand binding constants for the iron-molybdenum cofactor of nitrogenase: monomers, multimers, and cooperative behavior. J. Biol. Inorg. Chem., 2001. 6(7): p. 683-697.

[3] Frank, P. and K.O. Hodgson, Cooperativity and intermediates in the equilibrium reactions of Fe(II,III) with ethanethiolate in N-methylformamide solution. J. Biol. Inorg. Chem., 2005. 10(4): p. 373-382.

[4] Hinkley, N., et al., An Atomic Clock with 10⁻¹⁸ Instability. Science, 2013. 341: p. 1215-1218.

[5] Parker, D.E., et al., Interdecadal changes of surface temperature since the late nineteenth century. J. Geophys. Res., 1994. 99(D7): p. 14373-14399.

[6] Quayle, R.G., et al., Effects of Recent Thermometer Changes in the Cooperative Station Network. Bull. Amer. Met. Soc., 1991. 72(11): p. 1718-1723; doi: 10.1175/1520-0477(1991)072<1718:EORTCI>2.0.CO;2.

[7] Hubbard, K.G., X. Lin, and C.B. Baker, On the USCRN Temperature system. J. Atmos. Ocean. Technol., 2005. 22: p. 1095-1101.

[8] van der Meulen, J.P. and T. Brandsma, Thermometer screen intercomparison in De Bilt (The Netherlands), Part I: Understanding the weather-dependent temperature differences). International Journal of Climatology, 2008. 28(3): p. 371-387.

[9] Barnett, A., D.B. Hatton, and D.W. Jones, Recent Changes in Thermometer Screen Design and Their Impact, in Instruments and Observing Methods, WMO Report No. 66, J. Kruus, Editor. 1998, World Meteorological Organization: Geneva.

[10] Lin, X., K.G. Hubbard, and C.B. Baker, Surface Air Temperature Records Biased by Snow-Covered Surface. Int. J. Climatol., 2005. 25: p. 1223-1236; doi: 10.1002/joc.1184.

[11] Hubbard, K.G. and X. Lin, Realtime data filtering models for air temperature measurements. Geophys. Res. Lett., 2002. 29(10): p. 1425 1-4; doi: 10.1029/2001GL013191.

[12] Huwald, H., et al., Albedo effect on radiative errors in air temperature measurements. Water Resources Res., 2009. 45: p. W08431, 1-13.

[13] Menne, M.J. and C.N. Williams, Homogenization of Temperature Series via Pairwise Comparisons. J. Climate, 2009. 22(7): p. 1700-1717.

[14] Briffa, K.R. and P.D. Jones, Global surface air temperature variations during the twentieth century: Part 2, implications for large-scale high-frequency palaeoclimatic studies. The Holocene, 1993. 3(1): p. 77-88.

[15] Hansen, J. and S. Lebedeff, Global Trends of Measured Surface Air Temperature. J. Geophys. Res., 1987. 92(D11): p. 13345-13372.

[16] Brohan, P., et al., Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850. J. Geophys. Res., 2006. 111: p. D12106, 1-21; doi: 10.1029/2005JD006548; see http://www.cru.uea.ac.uk/cru/info/warming/.

[17] Karl, T.R., et al., The Recent Climate Record: What it Can and Cannot Tell Us. Rev. Geophys., 1989. 27(3): p. 405-430.

[18] Hubbard, K.G., X. Lin, and E.A. Walter-Shea, The Effectiveness of the ASOS, MMTS, Gill, and CRS Air Temperature Radiation Shields. J. Atmos. Oceanic Technol., 2001. 18(6): p. 851-864.

[19] MacHattie, L.B., Radiation Screens for Air Temperature Measurement. Ecology, 1965. 46(4): p. 533-538.

[20] Rüedi, I., WMO Guide to Meteorological Instruments and Methods of Observation: WMO-8 Part I: Measurement of Meteorological Variables, 7th Ed., Chapter 1. 2006, World Meteorological Organization: Geneva.

[21] Berry, D.I. and E.C. Kent, Air–Sea fluxes from ICOADS: the construction of a new gridded dataset with uncertainty estimates. International Journal of Climatology, 2011: p. 987-1001.

[22] Challenor, P.G. and D.J.T. Carter, On the Accuracy of Monthly Means. J. Atmos. Oceanic Technol., 1994. 11(5): p. 1425-1430.

[23] Kent, E.C. and D.I. Berry, Quantifying random measurement errors in Voluntary Observing Ships’ meteorological observations. Int. J. Climatol., 2005. 25(7): p. 843-856; doi: 10.1002/joc.1167.

[24] Kent, E.C. and P.G. Challenor, Toward Estimating Climatic Trends in SST. Part II: Random Errors. Journal of Atmospheric and Oceanic Technology, 2006. 23(3): p. 476-486.

[25] Kent, E.C., et al., The Accuracy of Voluntary Observing Ships’ Meteorological Observations-Results of the VSOP-NA. J. Atmos. Oceanic Technol., 1993. 10(4): p. 591-608.

[26] Rayner, N.A., et al., Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century. Journal of Geophysical Research-Atmospheres, 2003. 108(D14).

[27] Emery, W.J. and D. Baldwin. In situ calibration of satellite sea surface temperature. in Geoscience and Remote Sensing Symposium, 1999. IGARSS ’99 Proceedings. IEEE 1999 International. 1999.

[28] Emery, W.J., et al., Accuracy of in situ sea surface temperatures used to calibrate infrared satellite measurements. J. Geophys. Res., 2001. 106(C2): p. 2387-2405.

[29] Woodruff, S.D., et al., The Evolving SST Record from ICOADS, in Climate Variability and Extremes during the Past 100 Years, S. Brönnimann, et al. eds, 2007, Springer: Netherlands, pp. 65-83.

[30] Brooks, C.F., Observing Water-Surface Temperatures at Sea. Monthly Weather Review, 1926. 54(6): p. 241-253.

[31] Saur, J.F.T., A Study of the Quality of Sea Water Temperatures Reported in Logs of Ships’ Weather Observations. J. Appl. Meteorol., 1963. 2(3): p. 417-425.

[32] Barnett, T.P., Long-Term Trends in Surface Temperature over the Oceans. Monthly Weather Review, 1984. 112(2): p. 303-312.

[33] Anderson, E.R., Expendable bathythermograph (XBT) accuracy studies; NOSC TR 550 1980, Naval Ocean Systems Center: San Diego, CA. p. 201.

[34] Bralove, A.L. and E.I. Williams Jr., A Study of the Errors of the Bathythermograph 1952, National Scientific Laboratories, Inc.: Washington, DC.

[35] Hazelworth, J.B., Quantitative Analysis of Some Bathythermograph Errors 1966, U.S. Naval Oceanographic Office Washington DC.

[36] Kennedy, J.J., R.O. Smith, and N.A. Rayner, Using AATSR data to assess the quality of in situ sea-surface temperature observations for climate studies. Remote Sensing of Environment, 2012. 116(0): p. 79-92.

[37] Hadfield, R.E., et al., On the accuracy of North Atlantic temperature and heat storage fields from Argo. J. Geophys. Res.: Oceans, 2007. 112(C1): p. C01009.

[38] Castro, S.L., G.A. Wick, and W.J. Emery, Evaluation of the relative performance of sea surface temperature measurements from different types of drifting and moored buoys using satellite-derived reference products. J. Geophys. Res.: Oceans, 2012. 117(C2): p. C02029.

[39] Frank, P., Uncertainty in the Global Average Surface Air Temperature Index: A Representative Lower Limit. Energy & Environment, 2010. 21(8): p. 969-989.

[40] Frank, P., Imposed and Neglected Uncertainty in the Global Average Surface Air Temperature Index. Energy & Environment, 2011. 22(4): p. 407-424.

[41] Hansen, J., et al., GISS analysis of surface temperature change. J. Geophys. Res., 1999. 104(D24): p. 30997–31022.

[42] Hansen, J., et al., Global Surface Temperature Change. Rev. Geophys., 2010. 48(4): p. RG4004 1-29.

[43] Jones, P.D., et al., Surface Air Temperature and its Changes Over the Past 150 Years. Rev. Geophys., 1999. 37(2): p. 173-199.

[44] Jones, P.D. and T.M.L. Wigley, Corrections to pre-1941 SST measurements for studies of long-term changes in SSTs, in Proc. Int. COADS Workshop, H.F. Diaz, K. Wolter, and S.D. Woodruff, Editors. 1992, NOAA Environmental Research Laboratories: Boulder, CO. p. 227–237.

[45] Jones, P.D. and T.M.L. Wigley, Estimation of global temperature trends: what’s important and what isn’t. Climatic Change, 2010. 100(1): p. 59-69.

[46] Jones, P.D., T.M.L. Wigley, and P.B. Wright, Global temperature variations between 1861 and 1984. Nature, 1986. 322(6078): p. 430-434.

[47] Emery, W.J. and R.E. Thomson, Data Analysis Methods in Physical Oceanography. 2nd ed. 2004, Amsterdam: Elsevier.

[48] Frank, P., Negligence, Non-Science, and Consensus Climatology. Energy & Environment, 2015. 26(3): p. 391-416.

[49] Folland, C.K., et al., Global Temperature Change and its Uncertainties Since 1861. Geophys. Res. Lett., 2001. 28(13): p. 2621-2624.

Bruce of Newcastle
April 19, 2016 1:03 pm

When doing work with hot aqueous solutions in our lab I’d grab half a dozen glass scientific thermometers and put them into a beaker of boiling water. Then choose the one reading 100 C. Often they would be out by a couple degrees high or low.
We’ve moved on from glass thermometers mostly, but instrument error has not been repealed.

john harmsworth
Reply to  Bruce of Newcastle
April 19, 2016 1:38 pm

Digital sensors can very easily carry errors at least as great as well-calibrated glass thermometers. Digital sensor readings are mostly derived from circuit resistance, which can be affected by the wire gauge, the length of the wiring, and the quality of any connections in the circuit. All of these can be affected by the ambient temperature as well. Precisely accurate data is very hard to get. These guys aren't really trying. It's lousy science.

TDBraun
April 19, 2016 1:19 pm

What answer have the AGW supporters given to this charge?
At the conference, what counter-arguments were given?

Reply to  TDBraun
April 20, 2016 10:19 pm

TDBraun, no one disputed the systematic error part of the talk. But several people said that resolution didn’t apply in the averages of large numbers of measurements. Hence the email debate I mentioned. At the end of that, instrumental resolution applied.

April 19, 2016 1:40 pm

With all of the above taking place they still report the hottest year/month/day etc to 0.01 resolution, they are so smart, these people……
https://www.ncdc.noaa.gov/sotc/global/2015/8/supplemental/page-1

1sky1
April 19, 2016 1:49 pm

Systematic measurement error is indeed a highly under-appreciated problem in climate studies. Unlike random errors, which have a reasonable body of proven theory to provide error estimates, systematic errors often require great practical experience to identify properly and estimate closely. The present post provides a sobering step toward that goal, but only in the case of INDIVIDUAL measurements. Inasmuch as basic climate data consist of various AVERAGES of measurements, the large uncertainty ranges indicated herein greatly overstate the problem, denying the strong ameliorative effects of averaging sizable samples.
Contrary to the presumption here, there is no requirement that the distributions of all variables be Gaussian or identical for the Central Limit Theorem to apply to the composite mean measurement. It suffices that they be statistically independent. With station temperatures, coherent variability is dominated by diurnal and seasonal cycles, leaving a chaotic residual that is effectively a normal random variable that is virtually independent month to month. Nor is a systematic bias in mean measurement an obstacle to resolving temperature CHANGES at any given station. Absolute accuracy is not necessary; only uniformity of measurement is required to gain strong improvement of resolution in large samples.
The upshot is that average annual data at a vetted station will typically allow a Celsius resolution of a tenth or two. The caveat, however, lies in the vetting, not only for all the usual instrumentation issues, but for the general location away from UHIs that bias the apparent long-term “trend” enormously. That is the systematic error that most afflicts an overwhelmingly urban global data base.

Reply to  1sky1
April 19, 2016 6:18 pm

“Contrary to the presumption here, there is no requirement that the distributions of all variables be Gaussian or identical for the Central Limit Theorem to apply to the composite mean measurement.”
The Central Limit Theorem itself is not needed. All that is required is cancellation; mean effect of errors tends to zero, but doesn’t have to be normally distributed.
I generally agree with much of this comment. The key matter usually is how bias affects changes in temperature. Steady bias subtracts out with anomaly. Contrary to what is said, unsteady bias is a huge concern to climate scientists. It’s what homogenisation and the whole adjustment thing is about. UHI is a big concern; GISS at least tries directly to quantify it. NOAA relies more on homogenisation – also a legitimate approach. Varying bias is why people adjust for TOBS, ship-buoy differences etc.

Reply to  Nick Stokes
April 19, 2016 7:34 pm

” . Contrary to what is said, unsteady bias is a huge concern to climate scientists. It’s what homogenisation and the whole adjustment thing is about. UHI is a big concern; GISS at least tries directly to quantify it. NOAA relies more on homogenisation – also a legitimate approach. Varying bias is why people adjust for TOBS, ship-buoy differences etc.”
Then why doesn’t anyone use a method that makes it irrelevant?
As opposed to using methods that are overly complex and subjected to expectation bias.

Reply to  Nick Stokes
April 19, 2016 9:38 pm

To the All-Knowing Stokes,
The Central Limit Theorem, also known as the Theory of Large Numbers, only applies to repeated measurements of One Variable!!! But then you knew that.
The temperature in Kalispell is not the same variable as the temperature in Belgrade, nor the temperature in Fairbanks, nor the temperature in Bangkok.
Your grandfather would be ashamed…

Reply to  Nick Stokes
April 19, 2016 11:59 pm

“Then why doesn’t anyone use a method that makes it irrelevant?”
It’s real and it’s relevant.
“The Central Limit Theorem, also knows as the Theory of Large Numbers, only applies to repeated measurements of One Variable!!! But then you knew that.”
No, if it were so restricted, it would not be of much use. But as I said, here the Central Limit Theorem, however restricted you may think it is, is not needed.

Reply to  Nick Stokes
April 20, 2016 4:24 am

” It’s real and it’s relevant.”
Then alter your methods.

Reply to  Nick Stokes
April 20, 2016 5:37 am

““Then why doesn’t anyone use a method that makes it irrelevant?”
It’s real and it’s relevant.”
Yes, but you can get the change in temp of the station with the minimal impact of it, but you have to take an anomaly off that station as compared to itself, instead you blend all of these errors into both your data and your baseline, so you end up with a near worthless mess.
And you have no attribute.
While the reality is that most of the attribution is ocean heat moving from one place to another, and the residual is so small it doesn’t even show up.
It’s almost like you guys don’t want to find out the temp changes over the last 100 years were almost all natural, other than land use changes (which have far more forcing than Co2 does many times over).

1sky1
Reply to  Nick Stokes
April 20, 2016 5:44 pm

While there’s much lip service about UHI being “a big concern,” none of the index makers tackles that time-variable-bias problem with well-founded signal analysis methods. On the contrary, simplistic adjustments are made to foster the impression that the degree of bias is not only easily recognizable, and if not negligible, then reliably correctible. Meanwhile, a plethora of ad hoc adjustments keep pushing the apparent century-long “trend” forever upward.

Reply to  Nick Stokes
April 20, 2016 9:43 pm

Once again, to the All-Knowing Stokes,
No technical professional, none, believe that an instrument giving a reading, can have that reading somehow improved, by Error Analysis. Create improved data, that did not exist before, by your amazingly excellent prescience, but do not claim that there is any proof whatsoever, except that you are far superior to the manufacturer of the instrument, and the people who read the instrument.
No respect for the data? Just exactly what job do you have now? Noisy, great, but really???
Political bias, Bueller, Bueller, anyone? Anyone??
If you continue down this path, you will richly deserve the Limbo to which you belong.
Data? Do you know what Data, and Datum”s” is/are?
If the instrument is wrong, deal with it as best you can, and then, wait for it, Get a Better Instrument!!!
Alter the “Data,” lose the respect of every single professional world-wide who has been trained in the fundamentals of Data.
But you are past that now, into the Media.
Congratulations, and once again, your grandfather is spinning in his grave…

Reply to  Nick Stokes
April 20, 2016 10:28 pm

Lots of hand-waving there, Nick, but no solution. The historical systematic error distributions cannot be known. There’s no way to avoid large uncertainty widths in the historical record.

Reply to  Nick Stokes
April 20, 2016 10:34 pm

Nick, you claim the CLT “is not needed,” and yet it is invariably invoked as the error-removal tool in the published literature.
I had an email conversation awhile back with William Emery about error in ARGO temperatures. He referred me to a graduate student who said the Law of Large Numbers and the CLT made errors irrelevant in temperature means.
They need you, Nick.

Reply to  1sky1
April 20, 2016 10:26 pm

1sky1, the central limit theorem only allows one to accurately determine the true mean of any distribution, given a large number of estimates.
Knowing the mean does not normalize a non-normal error distribution, however.
Even if the mean of a non-normal error distribution is found and subtracted out, the error distribution retains its non-normality. The measurement uncertainty does not average away.

1sky1
Reply to  Pat Frank
April 21, 2016 2:46 pm

The major point of CLT is that summing INDEPENDENT random variables leads to a normal distribution for the sum, IRRESPECTIVE of the distribution of the individual variables. That is an essential point when considering large aggregates of instruments, or lengthy time-averages, because normality of the joint distribution of individual measurements then implies independence of individual measurements when they are uncorrelated. Lack of correlation is usually obtained when temperature measurements are suitably “anomalized.”
While I much appreciate your endeavors to explicate the vagaries of INDIVIDUAL measurements, the issue at hand in climate studies is anomaly AVERAGES, either spatial or temporal. The r.m.s. values of the latter are invariably much smaller than those of INDIVIDUAL measurements, just as the error of estimating someone’s weight from a large sample of coarse measurements is much smaller than the least-count unit.

Reply to  Pat Frank
April 21, 2016 9:52 pm

1sky1, your analysis includes the standard assumption that measurement error is normally distributed around a systematic offset.
The evidence presented above shows that measurement errors have non-normal distributions. There is no reason whatever to assume that combining non-normal error distributions produces a normal error distribution.
Even if one could derive a valid offset for historical global averages (a wildly optimistic idea), the uncertainty due to a non-normal error distribution would remain. The historical error distributions are unknown. The CLT does not apply to unknown error distributions; especially when available evidence shows non-normality.
The entire standard approach to global air temperature errors is promiscuous in the extreme.

1sky1
Reply to  Pat Frank
April 22, 2016 4:08 pm

Pat Frank:
Once again you overlook the important proviso that in climate studies we do not deal with individual errors, but with aggregate errors. The distribution of individual errors is immaterial to the question of average errors in a large sample, to which the CLT indeed does apply.

Reply to  Pat Frank
April 23, 2016 2:09 pm

1sky1, once again you overlook the fact that there is no reason to assume that aggregating systematically non-normal error distributions produces a normal distribution.
The known error distributions violate the assumptions of the CLT. The compilers all just go ahead and apply its formalism anyway. It’s negligence, pure and simple.

1sky1
Reply to  Pat Frank
April 23, 2016 4:16 pm

Pat Frank:
On the contrary, CLT provides a rigorous reason to expect the aggregated distribution of non-Gaussian variables to be Gaussian. That is the essential point of the theorem. The unmistakable application of it to averages of measurements, the clear issue in climate studies, is discussed here: http://www.statisticalengineering.com/central_limit_theorem.htm

1sky1
Reply to  Pat Frank
April 23, 2016 4:30 pm

Additional insight is provided here: http://davidmlane.com/hyperstat/A14043.html

Reply to  Pat Frank
April 23, 2016 6:19 pm

1sky1, thanks for the link, which says the following:

Central Limit Theorem
The central limit theorem states that given a distribution with a mean μ and variance σ², the sampling distribution of the mean approaches a normal distribution with a mean (μ) and a variance σ²/N as N, the sample size, increases.

As you know, this is equivalent to saying that there is a standard deviation of the mean equal to
σ/N^0.5 (Equation 1)
However, this is NOT the case for datasets with high Hurst Exponents, which are common in the climate world. In those datasets the standard deviation of the mean varies as
σ/N^(1-H) (Equation 2)
where H is the Hurst Exponent. This was first observed by Koutsoyiannis, and I later derived it independently without knowing of his discovery, as discussed in my post A Way To Calculate Effective N.
So we already know that the “Central Limit Theorem” as defined above is a special case. Note that normal datasets have a Hurst Exponent of ~ 0.5. Plug that into Equation 2 above and you get equation 1 …
Note that this effect of the Hurst Exponent can make a very, very large difference in the estimation of statistical significance, I mean orders of magnitude …
Next, let me see if I can clarify the discussion between you and Pat Frank by means of an example.
Suppose we have 1000 measurements, each of which has an inherent error of say +1.60 / – 0.01. In other words, the measuring instrument often reads higher than the actual value, but almost never reads lower than the actual value.
What is the error if we average all of the measurements?
I believe (if I understand you) that you say that the error of the average will be symmetrical because of the CLT.
Pat (if I understand him) says that the CLT gives bounds on the error of the mean … but that doesn’t mean that the error is symmetrical.
IF that is the question under discussion, I agree with Pat. If we average all of those measurements, the chances that the calculated average is below the true average is much smaller than the chance that the calculated average is high. In fact, with that kind of measuring instrument, the resulting mean could be considered an estimate of the maximum possible value of the true mean.
And of course, that still doesn’t include the Hurst Exponent …
Best to both of you,
w.
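The σ/N^(1-H) scaling in the comment above is easy to explore numerically; a minimal sketch with assumed σ and N shows how quickly the benefit of averaging erodes as the Hurst exponent rises:

```python
# Sketch: standard error of the mean under the Hurst scaling sigma/N**(1 - H).
# H = 0.5 recovers the familiar sigma/sqrt(N). Inputs are assumed.
sigma, N = 1.0, 10_000   # assumed scatter and sample size

for H in (0.5, 0.7, 0.9):
    sd_of_mean = sigma / N ** (1 - H)
    print(f"H = {H}: sd of mean = {sd_of_mean:.4f}")
# H = 0.5 gives 0.0100; H = 0.9 gives 0.3981 -- averaging 10,000 points
# then buys a factor of ~2.5, not 100.
```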

Reply to  Pat Frank
April 23, 2016 8:56 pm

1Sky1
You’re continuing to overlook the critical point.
Here’s what your own link says: “The CLT is responsible for this remarkable result:
The distribution of an average tends to be Normal, even when the distribution from which the average is computed is decidedly non-Normal.”

The CLT says that the distribution of the average is normal. Not that the distribution itself is normal.
That misunderstanding is rife in climate science. The CLT does not say that non-normal distributions themselves become normal.
To illustrate, let’s suppose a measured magnitude, “X” with a non-normal error distribution “E.” The total error “E” includes some offset plus the distribution. Initially we don’t know the magnitude of the offset.
The initial result is written as X±e, where “e” is the empirical standard deviation of the non-normal error distribution “E.”
You sample a set of estimates of the average of “E” and plot the distribution of the estimates. The estimates of the average of “E” form a normal distribution that gives you a good estimate of the average “A_E” of the non-normal distribution, “E.”
You can now correct “X” by subtracting the error offset, “A_E,” so that corrected value of X is X’ = X-A_E.
The new error distribution around X’ is E’ = E-A_E, which is still a non-normal error distribution. The empirical standard deviation of the non-normal error distribution E’ is still “e.”
The corrected value is now X’±e, where “e” is still non-normal. That is, the uncertainty in the value of X’ is still the empirical standard deviation of a non-normal distribution of error, “e.”
That non-normal distribution “e” does not average away.
When X’ is averaged with other corrected X2’, X3’… all of which have non-normal error distributions, ±e2, ±e3 … the combined non-normal error distributions do necessarily produce a normal distribution of error. They cannot be assumed to combine into a normal distribution. They do not necessarily average to zero, and cannot be assumed to do so.
The uncertainty in the mean, X_mu, is the root-mean-square of the SDs of all the non-normal error distributions, e_mu = ±sqrt[sum over (e1^2 + e2^2 + e3^2 …)/(N-1)].
The CLT performs no magic. It does nothing to normalize non-normal distributions of error.
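The root-mean-square combination stated above can be transcribed directly; the per-record 1σ values below are assumed for illustration, loosely echoing magnitudes reported in the essay:

```python
# Sketch: a direct transcription of the e_mu formula as stated in the comment
# above, with assumed per-record 1-sigma values (C).
import math

sds = [1.4, 0.9, 0.6, 0.43, 0.3]   # assumed non-normal 1-sigma uncertainties

n = len(sds)
e_mu = math.sqrt(sum(e * e for e in sds) / (n - 1))
print(f"e_mu = +/-{e_mu:.2f} C")   # the non-normal uncertainties do not cancel
```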

Reply to  Pat Frank
April 23, 2016 11:32 pm

Bit of a missing but important negative here.
The clause, “±e2, ±e3 … the combined non-normal error distributions do necessarily produce a normal distribution of error.” …
should be, ‘±e2, ±e3 … the combined non-normal error distributions do not necessarily produce a normal distribution of error.
Apologies if I misled or confused anyone.

1sky1
Reply to  Pat Frank
April 26, 2016 5:14 pm

A fundamental point is continually being missed here. In climate, as opposed to real-time physical-weather statistics, we are always dealing with averages of averages. The monthly mean is the average of daily means at a particular station. The yearly temperature average is the average of the monthly means, whether they pertain to a single station or any aggregate spatial average thereof. Neither the Hurst exponent of the underlying individual variables, nor the shape of their ordinate-error distributions, is material to the determination of such climatic means, which is the only context in which I invoked the CLT here. Its applicability to the issues at hand should be all the more evident given that in many cases the sampling, say for May 1945, is EXHAUSTIVE, with zero sampling error. Nowhere do I claim that this changes the shape of any error-distribution.
I’m traveling and will not divert valuable time to respond to any further out-of-context castings of proven statistical ideas or my own words.

Reply to  1sky1
April 26, 2016 6:31 pm

Actually, I work directly with the NCDC-supplied daily min and max temperature records.
But you’re right that daily mean temp allows a lot of different min and max values to average to the same value.

Reply to  Pat Frank
April 29, 2016 10:04 am

1sky1, now that you have admitted, concerning taking averages, that, “ Nowhere do I claim that this changes the shape of any error-distribution. “, you have implicitly admitted that the CLT provides no grounds to assume systematic measurement error is reduced in an average.
Let me also remind you of what you wrote here, to wit, “On the contrary, CLT provides a rigorous reason to expect the aggregated distribution of non-Gaussian variables to be Gaussian. ” This earlier statement is a complete contradiction of your later claim noted immediately above.
So, you’ve diametrically changed your position while asserting it has been constant.
The error distributions of historical air temperature measurements are unknown. The best you can do with the CLT is find the true mean of the set of recorded temperatures.
However, those recorded temperatures are erroneous to an unknown degree, and additionally have an unknown uncertainty distribution. The CLT then gives you the mean of the set of erroneous temperatures. That mean is in error with respect to the unknown true mean temperature. And it has an uncertainty envelope of unknown SD and unknown shape.
The negligently few calibration experiments available provide a lower limit estimate of the unknown SD as ±0.5 C.
So, at the end, you have an incorrect mean temperature with an unknown uncertainty distribution of estimated SD so large that nothing can be said of the rate or magnitude of the 20th century air temperature change.
That’s your global average air temperature.

1sky1
Reply to  Pat Frank
April 30, 2016 3:49 pm

When words are read without understanding the underlying context and concepts, confusion prevails. There is nothing contradictory in what I have maintained throughout.
As is well-known to those who have seriously studied mathematical statistics, there is no indispensable requirement that the random variables in a sum or aggregate be identically distributed for the CLT to apply. Convergence in the mean to the normal distribution occurs for non-identical or even non-independent variables under certain conditions (Lyapunov). The essential requirement is a large number of member variables in the aggregate. Any SYSTEMATIC measurement error in any member irretrievably biases the mean of that member in a FIXED manner, without otherwise affecting its distribution.
But if a large number of such members with INDEPENDENT biases are aggregated, as in a regional or global average, then the variously fixed biases themselves will tend toward a normal distribution. Only if many members of the aggregate have a commonality of bias, such as UHI, will the aggregate bias fail to be sharply reduced by simple averaging. The effect of measurement error per se,–which is almost invariably independent, instrument to instrument and procedure to procedure–becomes miniscule in practice with large enough aggregates. And since the sample description space of monthly climatic averages is very much finite, exhaustive measurements of such leave the nearly uncorrelated year-to-year and longer variability of monthly means as intrinsic random variables of real significance in the climate signal.

Reply to  Pat Frank
May 1, 2016 12:01 pm

1sky1, you wrote, “There is nothing contradictory in what I have maintained throughout.
Let’s see: you wrote, “CLT provides a rigorous reason to expect the aggregated distribution of non-Gaussian variables to be Gaussian.
Followed by, “the shape their ordinate-error distributions is [not] material to the determination of such climatic means, which is the only context in which I invoked the CLT here. … Nowhere do I claim that this changes the shape of any error-distribution.
So you first claimed the CLT proves non-normal error distributions normalize in an aggregate, and later admit the CLT has nothing to say about error distributions.
Your own words contradict you.
You wrote, “Any SYSTEMATIC measurement error in any member irretrievably biases the mean of that member in a FIXED manner, without otherwise affecting its distribution.
That is not correct when the systematic error is due to uncontrolled variables. In such cases, the error goes as the changing variables. The error distribution then and necessarily also changes with the variables.
This is exactly the case for surface air temperature measurements. The land surface variables are wind speed and irradiance. These vary in both time and space. Therefore the error mean and the error distribution also vary in both time and space, both for single instruments over time, and for multiple instruments across space.
Real-time filtering has been introduced in order to remove this error. Real-time filtering would not be necessary at all if your claim was true, because all error would merely average away after removal of your supposedly constant error mean.
Your supposition of constant error offsets has already been demonstrated wrong in the published literature, and is just part of the tendentious assumptions made in the field that allow practitioners to discount the unknowable systematic error in the historical record.
You wrote, “But if a large number of such members with INDEPENDENT biases are aggregated, as in a regional or global average, then the variously fixed biases themselves will tend toward a normal distribution.”
Also not correct. We already know the error biases are not fixed. Only the sampled distribution of the mean of the biases will tend toward a normal distribution. The biases themselves need not have a normal distribution at all.
That is, when the biases themselves are the result of varying systematic effects, the distribution of the error biases need not be normal. The fact that the CLT allows one to find the mean bias does nothing to normalize the distribution of the biases.
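The distinction is easy to demonstrate numerically (a small sketch, assuming an exponential bias population purely for illustration): the sampling distribution of the mean becomes near-normal, while the bias population itself stays skewed.

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(2)

# a strongly skewed, non-normal population of error biases
biases = rng.exponential(0.5, 100_000)

# sampling distribution of the mean: average repeated samples of 1000
means = rng.choice(biases, size=(5_000, 1_000)).mean(axis=1)

print("skew of the biases:", skew(biases))   # ~2: still exponential
print("skew of sample means:", skew(means))  # ~0.06: near-normal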
It’s quite clear that by “INDEPENDENT” you mean tends toward iid, e.g., “tend toward a normal distribution.” You suppose that each instrumental error distribution is iid and the aggregate of error biases also converges to iid.
Everything is iid. Isn’t that just so convenient a supposition!
That’s your assumption in a nutshell, iid über alles, and it’s unjustifiable.

1sky1
Reply to  Pat Frank
May 2, 2016 3:56 pm

Pat Frank:
On the one hand, you seem to question the central point of CLT, i.e., the asymptotic convergence in the MEAN (or aggregate) of independent random variables to the normal distribution, irrespective of their underlying ordinate distributions. On the other, you view my statement that the SHAPE of ordinate-error distributions is immaterial to the [empirical] determination of climatic [data] means as a direct contradiction. Inasmuch as the mean is a location, not a shape, parameter, your view is wholly illogical. All the more so when one recognizes that monthly means are usually determined exhaustively at any location, rather than by random sampling, leaving no sampling error.
Your appeal to ever-changing uncontrolled variables is simply a red herring, because they no longer constitute systematic, but SPORADIC errors. While bad data may be often encountered, thoroughly vetted data do not manifest such errors in practice.
Your reference to filtering confuses the issue even further, because AGGREGATE averaging of time-series is a wholly separate issue from TIME-DOMAIN averaging of physical variables with their distinctive stochastic structures, which are almost never i.i.d.
Ultimately, the proof that you severely overestimate the effect of systematic measurement errors in practice is provided by the high repeatability when comparing totally disjoint aggregates of regionally representative time series, and by the high coherence found throughout all frequencies in cross-spectral analysis with satellite measurements.

Reply to  Pat Frank
May 2, 2016 9:23 pm

1sky1, you wrote, “On the one hand, you seem to question the central point of CLT, i.e., the asymptotic convergence in the MEAN (or aggregate) of independent random variables to the normal distribution, irrespective of their underlying ordinate distributions.”
Really? Let’s see:
April 20, 2016 at 10:26 pm (my first comment addressed to you): “the central limit theorem only allows one to accurately determine the true mean of any distribution, given a large number of estimates.”
April 23, 2016 at 8:56 pm: “The CLT says that the distribution of the average is normal. Not that the distribution itself is normal.”
May 1, 2016 at 12:01 pm: “Only the sampled distribution of the mean of the biases will tend toward a normal distribution. … the CLT allows one to find the mean bias …”
Evidence has it that your report of my position is 180 degrees away from my actual position. Are you that careless everywhere, or just here?
You wrote, “On the other, you view my statement that the SHAPE of ordinate-error distributions is immaterial to the [empirical] determination of climatic [data] means as a direct contradiction.”
Fortunately, the same set of quotes above fully refutes your second statement as well. The evidence is entirely clear that from the start I viewed the CLT as showing that sampling can produce a normal distribution about the mean of a distribution of any shape.
Re-iterating my April 23, 2016 at 8:56 pm: “The CLT says that the distribution of the average is normal. Not that the distribution itself is normal.”
That’s twice you’ve diametrically misstated my view.
More explicitly, you shifted your ground about the CLT, at first claiming it turned non-normal distributions into normal distributions, and only later correcting yourself (and then denying you did so).
And so, after shifting your own ground, you’ve attempted to shift mine. Does that seem forthright to you?
You went on to write, “your view is wholly illogical.” Given the easily verifiable evidence above of your tendentious inversions of position, one of us certainly is illogical but it’s not me.
You then wrote (following from your entirely erroneous judgements), “All the more so when one recognizes that monthly means are usually determined exhaustively at any location, rather than by random sampling, leaving no sampling error.”
As we have already established, accurately determining a mean, monthly or otherwise, does nothing to remove the systematic error in the mean. Thus your point here is irrelevant.
You wrote, “Your appeal to ever-changing uncontrolled variables is simply a red herring, because they no longer constitute systematic, but SPORADIC errors.”
On the contrary, calibration experiments reveal persistent, not sporadic, systematic measurement errors.
By the way, how does “SPORADIC” obviate “systematic?” Systematic errors can easily be sporadic, if the impositional variable is episodic.
You wrote, “While bad data may be often encountered, thoroughly vetted data do not manifest such errors in practice.”
How would you know? Data contaminated with systematic error can behave just like good data. That was the very point of the opening discussion in the head-post.
I.e., “Figure 1 exemplifies the danger of systematic error. Contaminated experimental or observational results can look and behave just like good data, and can rigorously follow valid physical theory. Without care, such data invite erroneous conclusions. By its nature, systematic error is difficult to detect and remove.”
Remember?
And none of the data you consider holy comes from field-calibrated instruments, so that no one has any idea of the magnitude of systematic error contamination in the record.
You wrote, “Your reference to filtering confuses the issue even further, because AGGREGATE averaging of time-series is a wholly separate issue from TIME-DOMAIN averaging of physical variables with their distinctive stochastic structures, which are almost never i.i.d.”
An irrelevance again. Averaging a time series consisting of individual measured magnitudes, e.g., temperatures across a month, must take notice of the structure of the error in each point. Non-normal error in the individual points propagates into the mean and conditions that mean with an uncertainty.
Averaging an aggregate data set, in other words, is not independent of the error in the elements of the set. Your “wholly separate” is obviously wrong. I know of no area of physical science where it would hold that individual non-normal systematic measurement errors do not enter into an aggregate average.
And you just had to throw in “stochastic structures” didn’t you. You just can’t break that assumption addiction, can you. And why would you? Your career depends upon it.
Let’s note as well, that one does not average “physical variables” as you have it, but rather physical measurements. Variables, obviously, are the experimental or observational conditionals that influence measured magnitudes.
You wrote, “Ultimately, the proof that you severely overestimate the effect of systematic measurement errors in practice is provided by the high repeatability when comparing totally disjoint aggregates of regionally representative time series, and by the high coherence found throughout all frequencies in cross-spectral analysis with satellite measurements.”
Systematic error can never be appraised by internal comparisons. How do you know, by the way, that regionally representative time series are disjoint? According to Hansen and Lebedeff (1987) JGR 92(D11), 13,345-13,372, regional time series are highly correlated.
Satellite temperature measurements are not accurate to better than about ±0.3 C. Comparisons of satellite measurements with the surface air temperature record are a worthless indication of physical fidelity, given the extensive manipulations entered into the latter.

1sky1
Reply to  Pat Frank
May 3, 2016 4:59 pm

Pat Frank:
Sadly, you continue to pretend that I once claimed that individual non-normal error-distributions are turned into normal. Meanwhile you notably fail to cite the money quote of April 21 that prompted my extensive comments in the first place: “There is no reason whatever to assume that combining non-normal error distributions produces a normal error distribution.” You then conclude: “The CLT does not apply to unknown error distributions.” So much for your putative comprehension of convergence in the aggregate mean. The only ground I ever shifted is the pedagogical one, pointing out that climate data typically consists of averages of averages, thereby hoping that it would help clarify the issue.
The various errors of in situ thermometry are far better known than you seem aware of. Starting decades ago, numerous technical reports have thoroughly explored those errors, based upon measurement schemes nearly an order of magnitude better than those found at typical stations. They invariably show fixed calibration biases and normal error distributions. Historically, Gauss’ very formulation of his distribution is empirically rooted in measurement errors. The major time-varying deterministic component is usually due to shelter deterioration over many years. The 60-day measurement comparison over a Swiss glacier that you show is pitifully short and made in a highly atypical setting; the apparent “trends” are extremely tenuous. It provides no scientific basis for any general conclusions.
Strangely, you refer to the physical effects of winds and insolation upon temperature as uncontrolled variables, ostensibly contributing to the measurement error. Inasmuch as these factors are entirely natural, common sense tells us they produce intrinsic features of the in situ temperature signal, not measurement error. What also totally escapes you is the fact that the signal variance necessarily has to rise well above the noise level (total variable error) for nearby stations to show highly correlated time-variations. Inasmuch as your claimed uncertainty levels for station records and satellite measurements roughly equal their respective r.m.s. values, the observed high correlations would be mathematically impossible.
Given the complexities of real-world systems, geophysical data acquisition, analysis and interpretation are not for novices or amateurs. The enormous strides made in our understanding of geophysical processes since WWII have relied extensively upon modern methods of signal and system analysis as an investigative tool. The very fact that you mockingly dismiss the whole concept of “stochastic structure”, which is by no means limited to the simple i.i.d. of classical statistics, speaks volumes. Your muddled notions of independent and/or disjoint sampling and measurement only multiply that volume. And your risibly cheap ad hominem about my “whole career depends upon it” convinces me that further discussion here is fruitless.

Reply to  Pat Frank
May 3, 2016 10:17 pm

1sky1, you wrote, “Sadly, you continue to pretend that I once claimed that individual non-normal error-distributions are turned into normal….”
There you go shifting my ground again. Here’s what I pointed out about your claim: “Let’s see, you wrote, “CLT provides a rigorous reason to expect the aggregated distribution of non-Gaussian variables to be Gaussian. (bold added)”
Very clever of you to shift your position yet again, from claims about aggregated distributions to “individual non-normal error-distributions”, and then falsely assign it to me.
That’s two falsehoods in one sentence.
You’re displaying evidence of pathological thinking; I’ll leave the causal diagnosis to others.
You wrote, “Meanwhile you notably fail to cite the money quote of April 21 that prompted my extensive comments in the first place: “There is no reason whatever to assume that combining non-normal error distributions produces a normal error distribution.” You then conclude: “The CLT does not apply to unknown error distributions.” So much for your putative comprehension of convergence in the aggregate mean.”
So, once again you claim that aggregation of non-normal error distributions converges to a normal error distribution. Wrong again.
Here’s a bit of authoritative literature for you, from V. R. Vasquez and W. R. Whiting (2005), Accounting for Both Random Errors and Systematic Errors in Uncertainty Propagation Analysis …, Risk Analysis 25(6), 1669-1681, doi: 10.1111/j.1539-6924.2005.00704.x:
Experimentalists have paid significant attention to the effect of random errors on uncertainty propagation in chemical and physical property estimation. However, even though the concept of systematic error is clear, there is a surprising paucity of methodologies to deal with the propagation analysis of systematic errors. The effect of the latter can be more significant than usually expected. … as pointed out by Shlyakhter (1994), the presence of this type of error violates the assumptions necessary for the use of the central limit theorem, making the use of normal distributions for characterizing errors inappropriate. (bold added)”
You wrote, “Starting decades ago, … blah, blah, blah, … The major time-varying deterministic component is usually due to shelter deterioration over many years.”
Refuted by post references 7, 8, 9, 10, 11, 12, 18, and 19. The main time-varying deterministic components of error are insolation and wind speed.
You wrote, “The 60-day measurement comparison over a Swiss glacier that you show is pitifully short …”
It was a two-year experiment, representing thousands of air temperature measurements. I showed a representative part of it.
“…and made in a highly atypical setting;” Right. Over snow. In the Alps. Very atypical. Corroborated in reference 10.
“It provides no scientific basis for any general conclusions.” Right. Extended calibration experiments of different instruments, carried out over years in multiple locales, all yielding cross-verified data, have no general meaning. Great thinking. You’d chuck the fundamental meaning of repeatability in science in order to save your assumptions about random error.
You wrote, “Inasmuch as [winds and insolation] are entirely natural, common sense tells us they produce intrinsic features of the in situ temperature signal, not measurement error.”
Refuted by post references 7, 8, 9, 10, 11, 12, 18, and 19. Just to let you know, the sensor shields are heated by insolation and rely upon wind to exchange the atmosphere inside the shield. When the wind is less than about 10 m s^-1, the inner atmosphere heats up, inducing a temperature error.
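The mechanism can be sketched with a toy error model (every coefficient here is a hypothetical illustration, not a number from the post references): because the error depends on the weather variables themselves, its mean and spread change as wind and sun change, which is why it is systematic without being a fixed offset.

import numpy as np

rng = np.random.default_rng(3)

def shield_error(wind, insolation, k=0.08):
    # toy radiative-heating error, C: grows with insolation and is
    # suppressed as wind ventilates the shield; k and the 10 m/s
    # ventilation scale are hypothetical
    ventilation = np.clip(wind / 10.0, 0.0, 1.0)
    return k * (insolation / 100.0) * (1.0 - ventilation)

wind = rng.uniform(0.0, 15.0, 1000)   # m/s
sun = rng.uniform(0.0, 1000.0, 1000)  # W/m^2

err = shield_error(wind, sun)
print("mean error:", err.mean())  # nonzero, and it tracks the weather
print("sd of error:", err.std())  # not a fixed offset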
Your comment, 1sky1, shows you know nothing of meteorological air temperature measurement.
You wrote, “What also totally escapes you is the fact that the signal variance necessarily has to rise well above the noise level (total variable error) for nearby stations to show highly correlated time-variations.”
That hasn’t escaped me at all. It also is not true by inspection because the same weather variables that play into air temperature also cause systematic sensor error.
“…the observed high correlations would be mathematically impossible.” But not physically impossible. I already have analyses demonstrating this, and I hope to publish them.
You wrote, “Given the complexities of real-world systems, geophysical data acquisition, analysis and interpretation are not for novices or amateurs.”
This from a guy who has shown no understanding of the instruments under question or of their sources of error. Pretty rich, 1sky1.
You wrote, “The very fact that you mockingly dismiss the whole concept of ‘stochastic structure’…”
Dismissed the whole concept, did I? Peculiar; I thought the dismissal was about your tendentious assignment of stochastic character to air temperature measurement errors. That was the context, wasn’t it?
You wrote, “And your risibly cheap ad hominem about my ‘whole career depends upon it’ convinces me that further discussion here is fruitless.”
Where’s the ad hominem in referring to career? I always thought “ad hominem” was ‘to the man.’ Let’s see . . . yup. Wrong again, 1sky1.
Tell me — without the assumption of random error throughout, what cosmic meaning is left in the global air temperature record? Whose careers would fall if that assumption is disproved?
That’s the difference between science and philosophy, by the way. In science, assumptions are provisional upon data. In philosophy, assumptions are invariantly retained. Your approach to the assumption of random error is in the realm of philosophy.

Reply to  1sky1
April 21, 2016 4:36 am

Pat,
“Nick, you claim the CLT “is not needed,” and yet it is invariably invoked”
OK, can you explain what it is used for?
People here are getting CLT and LOLN mixed up. The LOLN says in various ways that averaging larger samples gets you closer to a population mean. That is what we need. The CLT says that a sum or average of variables tends toward a normal distribution, even if the components are not normal. That may be nice to know, and probably holds here, but why do you need it?
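Nick’s distinction in a few lines of Python (illustrative only, using a skewed population with known mean 1.0):

import numpy as np

rng = np.random.default_rng(4)

for n in (10, 1_000, 100_000):
    sample = rng.exponential(1.0, n)
    print(n, abs(sample.mean() - 1.0))  # shrinks with n: that is the LLN

# The CLT adds that the *distribution* of such sample means is near-normal,
# which matters for quoting confidence intervals, not for convergence itself.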
As to averaging reducing uncertainty in means, I’ve worked out a real data case upthread.

Reply to  Nick Stokes
April 21, 2016 9:59 pm

Nick, The CLT is used to dispense with measurement error.
The universally applied assumption is that all historical measurement errors are normally distributed with a constant offset. All sorts of pendent theorizing is made to estimate that offset. The estimate is removed, and the CLT makes all the rest of the error disappear.
Except, of course, that the assumption is unwarranted and unwarrantable.
My reply to your dispensation of resolution is also upthread.

April 19, 2016 2:13 pm

Pat,
You don’t seem to mention recording accuracy. Temperatures are now measured electronically but were previously recorded to plus or minus 0.5 degrees
http://www.srh.noaa.gov/ohx/dad/coop/EQUIPMENT.pdf page 11

Reply to  dradb
April 20, 2016 10:39 pm

Thanks, dradb. You’re right, there are lots of other sources of error.
As mentioned to oz4caster above, who made the same point, I’m just trying to estimate a lower limit of uncertainty by looking at the errors from the instruments themselves.
Everything else including recording accuracy just adds on to that.

Christopher Hanley
April 19, 2016 2:29 pm

Anyone with even a passing knowledge of world history, European exploration of Africa, Australia, the Arctic and Antarctic for instance, the vast area of Siberia, not to mention the 70% area of oceans, knows that the notion of a global average temperature anomaly to fractions of one degree C back to 1840, 1860 or even 1880 is patently ridiculous.

Reply to  Christopher Hanley
April 19, 2016 3:03 pm

Many readers might have a thermometer on their porch. If it isn’t a digital one, it will have a liquid that moves up and down in a tube with markings on the glass.
That’s all they had back then.
How many of them were marked in fractions of a degree?
Yet those are the readings they are saying we are hotter than by point-somethingsomething of a degree…and another point-somethingsomething will be the doom of us all!
Lots of theory and speculation in the foundations of “CAGW” theory. Also lots of cracks.

David A
Reply to  Christopher Hanley
April 20, 2016 5:29 am

Christopher, you are correct, and it is not easy now. I drive up and down the 99 in California often. The temperature varies constantly, often within minutes, due to microclimates EVERYWHERE.

April 19, 2016 4:19 pm

“That’s not to say the satellite measurements don’t provide some value, but it is an indication why the surface temperature data analyzed and reported by NASA, NOAA and others is viewed as the gold standard.”
Gavin Schmidt.
http://www.climatecentral.org/news/what-to-know-februarys-satellite-temp-record-20091

Christopher Hanley
Reply to  Mark M
April 19, 2016 5:58 pm

Well he would say that wouldn’t he — but he didn’t.
That’s a quote from Brian Kahn author of the article.

April 19, 2016 7:42 pm

Simply by looking at the data, and especially at how the data has been altered over the years, it is easy to determine that GISTEMP’s asserted confidence intervals (95%) are fantasy.
http://www.elcore.net/ClimateSanity/GISTEMPsOverconfidenceIntervals.html
http://www.elcore.net/ClimateSanity/GISS%20LOTI%20Changes%202007%20to%202015%20ELCore%20small.jpg

April 19, 2016 10:45 pm

Nice work, Pat. I’ve been on that bandwagon myself for the past 10 years. I haven’t even approached your rigor, which I expect will be lost on non-scientists and particularly on people who don’t do experiment design.
My favorite way to communicate the problem is to ask if folks can imagine an elderly fellow wearing bi-focals and a bathrobe, reading a thermometer he got mail order from Washington D.C., in South Dakota, at 6pm, in February, during a blizzard, in 1895.
That seems to communicate the point pretty well.

Reply to  Bartleby
April 20, 2016 10:43 pm

Thanks, Bartleby. Mine is looking through distorting glasses and claiming a dark fuzzy blob is really a house with a cat in the window. 🙂

Editor
April 20, 2016 3:35 am

Pat Frank, John Kennedy has an excellent paper on the uncertainties of sea surface temperature data:
http://onlinelibrary.wiley.com/doi/10.1002/2013RG000434/full

Reply to  Bob Tisdale
April 20, 2016 10:48 pm

Thanks, Bob. 🙂 I have that paper, and it’s very useful.

garymount
April 20, 2016 4:30 am

Pat Frank, if I were to plot two separate trend lines using the data of figure 12 on the right, one of which creates the steepest positive slope possible, and another that also selected data within the error ranges to create the steepest negative slope, would I be correct to claim that either of these slopes/trends is just as likely as the other?

Reply to  garymount
April 20, 2016 10:53 pm

garymount, yes, but each of them would be very low probability because of the very large number of equally likely possibilities.
Drawing possible trend lines would not be done by selecting data points, though, because the points have the displayed uncertainties.
It would be done by drawing some sort of physically-real-seeming line through the uncertainty envelope. And then having to contend with the fact that the drawn line is no better than any other line.

Bindidion
April 20, 2016 7:34 am

Maybe Mr Frank is willing not only to show us a chart visible to anybody on Berkeley Earth’s web site, but also to spend some more time in a deep reading of the following document:
Berkeley Earth Temperature Averaging Process
http://www.scitechnol.com/berkeley-earth-temperature-averaging-process-IpUG.pdf
Maybe he then will understand that, while all his remarks look perfect, what he pinpoints nevertheless amounts in sum to no more than a small part of the problems encountered by the surface temperature measurement groups.
Wouldn’t the trends show far greater disparities among the different institutions if his claims about errors and measurement quality were so relevant? I really don’t know; maybe Mr Frank has the appropriate answer…
1. Trends since 1850 – all in °C / decade with 2σ, by Kevin Cowtan’s trend computer
– Berkeley Earth: 0.058 ± 0.006
– HadCRUT4: 0.049 ± 0.006
2. Trends since 1880
– Berkeley Earth: 0.074 ± 0.007
– HadCRUT4: 0.065 ± 0.008
– GISSTEMP: 0.070 ± 0.008
– NOAA: 0.068 ± 0.008
3. Trends since 1891
– Berkeley Earth: 0.079 ± 0.007
– HadCRUT4: 0.073 ± 0.008
– GISSTEMP: 0.078 ± 0.009
– JMA: 0.073 ± 0.001 (*)
– NOAA: 0.077 ± 0.008
(*) not available at York, Linest used instead, since its trends are the same as Cowtan’s ± 0.001 °C for the 4 others when starting in 1891; only the 2σ differ due to Linest not accurately considering matters like white noise etc.
We have trend differences below 0.01 °C / decade, even though these institutions don’t share all raw data and perform highly different computations on that data…
A look at a highly scalable plot of 5 surface temperature datasets (common baseline: 1981-2010, anomalies in °C) shows, evidently, pretty much the same:
http://fs5.directupload.net/images/160420/2sxqhvwh.pdf
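For readers who want to check such numbers, here is a bare ordinary-least-squares sketch of a per-decade trend. Note that Cowtan’s trend computer additionally widens the 2σ for autocorrelation, which plain OLS does not, so the function and the made-up series below are illustrative assumptions, not his method:

import numpy as np

def decadal_trend(years, anomalies):
    # OLS slope in C/decade with a naive white-noise 2-sigma;
    # real trend tools widen this for autocorrelation
    years = np.asarray(years, float)
    anomalies = np.asarray(anomalies, float)
    A = np.vstack([years, np.ones_like(years)]).T
    coef, *_ = np.linalg.lstsq(A, anomalies, rcond=None)
    resid = anomalies - A @ coef
    s2 = resid @ resid / (len(years) - 2)
    var_slope = s2 / ((years - years.mean()) ** 2).sum()
    return 10 * coef[0], 10 * 2 * np.sqrt(var_slope)

# toy usage: made-up anomalies rising 0.007 C/yr plus noise
yrs = np.arange(1880, 2016)
anoms = 0.007 * (yrs - 1880) + np.random.default_rng(5).normal(0, 0.1, yrs.size)
print(decadal_trend(yrs, anoms))  # roughly (0.07 C/decade, small 2-sigma)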

Reply to  Bindidion
April 20, 2016 7:48 am

“Maybe he then will understand that, while all his remarks look perfect, what he pinpoints nevertheless amounts in sum to no more than a small part of the problems encountered by the surface temperature measurement groups.
Wouldn’t the trends show far greater disparities among the different institutions if his claims about errors and measurement quality were so relevant? I really don’t know; maybe Mr Frank has the appropriate answer…”
Or maybe it shows they all use the same basic methodology and biases.
But even at this, a temperature increase does not mean it is attributable to CO2. And they are regional increases, not global.
How do you explain a well mixed gas only warming part of the planet? They hide this by making the narrative global temperature.

Bindidon
Reply to  micro6500
April 20, 2016 9:02 am

1. “Or maybe it shows they all use the same basic methodology and biases.”
Is that the ‘average skeptic answer’ ? Why don’t you inspect the documents these people produce, instead of supposing a priori they do wrong?
Bob Tisdale has published a link to Kennedy’s methodology paper, I did for Berkeley. You are kindly invited to read the stuff, but my little finger tells me you probably won’t. It’s so easy to write critique, isn’t it?
2. “But even at this, a temperature increase does not mean it is attributable to CO2. And they are regional increases, not global.”
Within less than a week, you are the 3rd person who replies to one of my comments by mentioning this poor CO2 guy, although I did NOT… Slowly but surely it gets really hilarious.
Just a little hint for you: the average surface temperature increase since 1850 (about 0.55 °C/century, i.e. 0.91 °C) is still below the logarithm of CO2’s atmospheric concentration. Understood?
3. “And they are regional increases, not global. ”
Aha. That’s quite interesting: the 5 surface records I mentioned are all global records…

Reply to  Bindidon
April 20, 2016 10:59 am

Is that the ‘average skeptic answer’? Why don’t you inspect the documents these people produce, instead of supposing a priori they do wrong?
I decided to go look at the data myself, and then came up with my own method.
https://micro6500blog.wordpress.com/2015/11/18/evidence-against-warming-from-carbon-dioxide/

Within less than a week, you are the 3rd person who replies to one of my comments by mentioning this poor CO2 guy, although I did NOT… Slowly but surely it gets really hilarious.

Okay, you didn’t mention CO2, but those graphs are the common example of proof.

“And they are regional increases, not global. ”
Aha. That’s quite interesting: the 5 surface records I mentioned are all global records…

You didn’t mention a regional series, but I was trying to point out that global series are a good way to hide the real source of the change in climate.

Bindidon
Reply to  micro6500
April 20, 2016 9:11 am

I mean of course (again for short) the logarithm of the delta in its atmospheric concentration since that year.

Bindidon
Reply to  micro6500
April 20, 2016 1:14 pm

micro6500 April 20, 2016 at 10:59 am
“I decided to go look at the data myself…”
“Okay, you didn’t mention Co2, but…”
“You didn’t mention a regional series, but…”
Many thanks in advance for avoiding such meaningless replies in the future.
Sorry: you are incredibly inexperienced.

Reply to  Bindidon
April 20, 2016 1:34 pm

Many thanks in advance for avoiding such meaningless replies in the future.

You’re welcome.

Sorry: you are incredibly inexperienced.

Some things sure, others not so much.

Reply to  Bindidion
April 20, 2016 10:56 pm

Bindidion, they’re all working with the same set of numbers. Why shouldn’t they get the same result?
The fact that the numbers have large ± uncertainties associated with them does not change their recorded values.

April 20, 2016 8:41 am

Pat. Great post. It would be a great contribution if you could produce a similar analysis of the satellite data sets.

Bindidon
Reply to  Dr Norman Page
April 20, 2016 9:23 am

YES! That indeed would be welcome.
Especially when considering
– the huge differences between UAH5.6 and UAH6.0beta5 (especially in the North Pole region);
– the differences to be expected between RSS3.3 TLT and RSS 4.0 TLT;
– Kevin Cowtan’s comparison of the accuracy of surface vs. satellite temperature measurements (reviewed by Carl Mears): http://www.skepticalscience.com/surface_temperature_or_satellite_brightness.html
(I know: Kevin’s paper is at SkS, and I’m by far not Nuccitelli’s greatest fan. But it’s worth reading.)

Reply to  Dr Norman Page
April 20, 2016 10:59 pm

Thanks, Norman. One thing at a time. After 3 years of steady effort, I’ve still not managed to publish my critical analysis of climate model projections.
As here in temperature-record-land, no one in climate-modeling-land is capable of evaluating physical error. Whatever they don’t understand is ipso facto wrong.

Michael Carter
April 20, 2016 1:03 pm

One of the most important posts on this site IMO
What I see so often lacking in this debate is simple common sense. Aside from events of extreme heat or cold, why would anyone care about a few degrees here or there while reading and recording a mercury thermometer during the era 1850-1950? Take a look at the scales in F on the typical thermometer of the era. Add to this the influence of shrouds and wind, and I doubt that readings were accurate within 3 C. Back then it did not matter!
I am not a statistician, but my gut feeling is that if all readings right up to the present day were read and recorded to the nearest 1 C, we would end up with something just as accurate and useful as what is being reported: “A new global record by over 0.3 C, and over 1 C above the 19th century average!”. Yea right!

Bindidon
Reply to  Michael Carter
April 20, 2016 2:26 pm

“Add to this the influence of shrouds and wind and I doubt that readings were accurate within 3 C”
Maybe, but… how did so many datasets, constructed out of sometimes quite different raw data, processed by totally different algorithms, manage to be so similar?
Global Conspiracy?

Reply to  Bindidon
April 20, 2016 7:11 pm

” Maybe, but… how did so many datasets, constructed out of sometimes quite different raw data, processed by totally different algorithms, manage to be so similar?”
Because they are mostly made up of infilled and homogenized data, which overwhelm the measurements. It is even possible that it actually is warming, but little if any is from CO2.
So, that’s my answer to the same question as earlier; pretty much the same answer.

Reply to  Bindidon
April 20, 2016 7:28 pm

Actually, I’ve changed my mind. I think there’s been little if any warming, and when the surface stations are aggregated without infilling and homogenization, what they have recorded is indistinguishable from 0.0 °F ± 0.1 °F.

Michael Carter
Reply to  Bindidon
April 20, 2016 9:36 pm

I cannot see that there would be many data sets during the period 1850-1950. Each country had only one official met service where continuous records were kept. Modern analyses come out similar because they consider the historical data accurate enough to establish the +1 C/century. How foolish. How could a thesis supervisor approve a work based on such fragile evidence?
As has been pointed out many times before here, the other great flaw relates to the limited distribution of data locations.
Based on this very important topic (accuracy) we may even have had cooling over the last 150 years and would not know! Ya can’t measure mean global temperature change throughout the last century to within 1 C! Over the next century, maybe.

Bindidon
Reply to  Bindidon
April 21, 2016 4:25 am

Michael Carter, I have the feeling that any attempt to convince you will fail. My reaction in such cases is to do exactly the inverse 🙂
http://www.cao-rhms.ru/krut/OAO2007_12E_03.pdf
You will enjoy.

Michael Carter
Reply to  Bindidon
April 21, 2016 1:00 pm

Bindidon – convince me of what? I don’t follow. I was talking about accuracy within 1 C throughout the period 1850-1950. I thought we were on the same page 🙂
Actually, to get to the bottom of this subject, what is required is a historian.
Re historical weather stations: what, where, when, why and how? Show me historical records recording 0.1 of a degree. 95+% of the time, reading the temperature was a mundane daily affair where a degree here or there did not matter to the objective – to record everyday weather. I know that in my country readings were often made by volunteers from the general community, e.g. farmers. Did they take readings at exactly the same time each day?
Add to this the physical variables relating to shrouding, and there is a huge question mark over accuracy within 3 C. To calculate this one would have to rebuild the historical shrouds at their original locations and conduct experiments. Take, for example, the latent heat effect in a windy location where brief showers are common.

Reply to  Bindidon
April 28, 2016 6:12 am

Bindidon writes:

Michael Carter, I have the feeling that any attempt to convince you will fail. My reaction in such cases is to do exactly the inverse :-)”

and cites “Global temperature: Potential measurement accuracy, stochastic disturbances, and long-term variations”, which on the surface would appear to agree in large part with Mr. Carter’s criticism and with Pat Frank’s analysis, leaving me with no understanding of what “convincing” needs to occur. It seems the three parties in the discussion largely agree: the purported accuracy of the historical written records is highly questionable, and conclusions concerning the effects of rising carbon dioxide in Earth’s atmosphere are without foundation.
Are we all arguing with each other for the sake of argument?

Bindidon
April 20, 2016 2:55 pm

In his message posted April 20, 2016 at 8:41 am, Dr Norman Page wrote:
It would be a great contribution if you could produce a similar analysis of the satellite data sets.
Here is a plot comparing, within the satellite era, the currently ‘coolest’ surface temperature measurement record (JMA, Tokyo Climate Center) with the currently ‘coolest’ lower troposphere brightness measurement record (UAH6.0beta5, University of Alabama in Huntsville):
http://fs5.directupload.net/images/160420/ujlplpla.jpg
I’m anything but a statistician. So I can only intuitively imagine it might be some hard work to isolate here what’s good, what’s bad and what’s ugly in such a comparison.
Anyway: it’s amazing to see that, though
– distant in absolute temperature by about 24 °C
– recorded by completely different hardware
– processed by completely different software
the surface and the lower troposphere columns show, within a 37-year-long time series of their deltas (so unluckily named “anomalies”), such a degree of similarity (deltas shown here in °C).

Bindidon
Reply to  Bindidon
April 20, 2016 3:01 pm

The two thick lines in red (UAH) and blue (JMA) are the 37 month running means of the corresponding plotted monthly data.

Reply to  Bindidon
April 20, 2016 3:32 pm

See Fig 5 at
http://climatesense-norpag.blogspot.com/2016/03/the-imminent-collapse-of-cagw-delusion.html
to see what is going on, based on the RSS data.
Just add the two trends in Fig 5 to your UAH data, showing the millennial peak at 2004, and cut off at 2014. The current El Nino is a temporary aberration which obscures the trend.

Reply to  Bindidon
April 28, 2016 6:28 am

I would be tempted to say your chart gives good evidence of a high degree of agreement between the JMA record and the UAH satellite record, such a degree that it makes very good sense to consider the satellite record sound and systematically superior to the ground-based measurement method, simply because:
a) It is regularly calibrated and so, internally consistent.
b) it has true global coverage with no necessity to infill or otherwise approximate.
c) the exact same instrument is used to collect all data, removing error and uncertainty.
For all of these reasons (and certainly more), it would seem the argument against standardizing on these data is very weak.

Bindidon
Reply to  Bindidon
April 20, 2016 4:15 pm

Oops?! Where is my reply to Dr Norman Page? Evaporated?

Bindidon
Reply to  Bindidon
April 21, 2016 2:38 am

My experience shows rather the other way round: many people persist in explaining nearly everything with millennial, centennial and other cycles, or even combinations of them (Loehle & Scafetta are a pretty good example of that).
And these observations, based on nothing more accurate than other observations, are imho exactly what “obscures the trend”.
Moreover, it has long been known that trends over time periods as short as the one you presented (2004-2014) have little significance: the standard error associated with such trends is too high.

Bindidon
April 20, 2016 4:12 pm

Pat Frank
I can understand your dissatisfaction with the potential inaccuracy of surface temperature measurements.
But as we know, everything has its counterpart. What about you having a look at
http://www.moyhu.blogspot.de/2016/04/march-giss-down-006-hottest-march-in.html
where you can see within the comment thread what really happens with unusually hot temperatures actually recorded in the Arctic…
Look at this amazing discussion between Nick Stokes and commenters Olof R and Kevin Cowtan!

Reply to  Bindidon
April 23, 2016 2:17 pm

There’s the temperature record, and then there’s the cause of that record, Bindidon. It’s the cause that powers the debate about AGW. Absent any evidence that air temperatures are influenced by human GHG emissions, what’s the cosmic importance of a couple tenths of a degree?
Apart from that, look again at the plot at your site. Where are the physical error bars? How do you or anyone else know any of the differences are physically significant?

Reply to  Bindidon
May 1, 2016 5:12 pm

bindidon,
Cowtan & Way have been so thoroughly deconstructed that you’re only being amusing by mentioning them. Put their names in the search box, and get educated.
And I noticed that you claimed Antarctic temperatures have been rising. That’s about as accurate as your other comments: [image]

basicstats
April 21, 2016 6:18 am

“Central Limit Theorem is adduced to assert that they average to zero.”
There seems to be confusion over the CLT. It provides assumptions under which a sample mean has an approximately normal (gaussian) probability distribution. This is more than is required to justify using averages of large samples to eliminate random error. A Law of Large Numbers will suffice for that. Basically, LLNs say the sample mean converges to the population mean with increasing sample size. Nothing about probability distributions.
The key features required for LLN are independent (or comparable) observations having the same expected value. As with other kinds of error in temperature measurement, these assumptions obviously do not apply to systematic error of the kind described in the post. This is not to say that large sample averages (plus anomalies) will not knock out a lot of measurement error. Just not to the claimed uncertainty limits.

April 21, 2016 7:42 pm

It is a very common technique to use oversampling and decimation to improve the resolution of analog to digital converters, that is to use a converter with maybe 10 bit resolution to achieve 16 bit precision. That is quite similar to what is being discussed here.
http://www.atmel.com/images/doc8003.pdf
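A sketch of the oversampling idea (hypothetical voltages; 4^6 samples buy about 6 extra bits by the usual rule of thumb in such app notes). It also shows the caveat raised in the replies below: dithered averaging beats the quantization step, but a fixed offset passes straight through.

import numpy as np

rng = np.random.default_rng(6)

true_v = 0.123456    # volts, hypothetical
lsb = 1.0 / 2 ** 10  # 10-bit converter step

def adc(v):
    # ideal quantizer: round to the nearest code
    return np.round(v / lsb) * lsb

# ~1 LSB of noise dithers the input across code boundaries
samples = adc(true_v + rng.normal(0.0, lsb, 4 ** 6))
print("one-shot error bound ~", lsb)                       # ~0.001 V
print("oversampled error:", abs(samples.mean() - true_v))  # well below 1 LSB

# a fixed 3-LSB offset is untouched by the averaging
biased = adc(true_v + 3 * lsb + rng.normal(0.0, lsb, 4 ** 6))
print("residual offset:", biased.mean() - true_v)          # still ~3 LSB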

Reply to  Eli Rabett
April 22, 2016 9:04 am

Not correct, Eli. The point at issue is the accuracy of the waveform itself as a representation of physical reality, not whether oversampling the waveform can reproduce the waveform.

Reply to  Eli Rabett
April 24, 2016 5:56 pm

Eli Rabett April 21, 2016 at 7:42 pm

It is a very common technique to use oversampling and decimation to improve the resolution of analog to digital converters, that is to use a converter with maybe 10 bit resolution to achieve 16 bit precision. That is quite similar to what is being discussed here.

Let me see if I can clarify this with a simple example of measurement error. Note that we have a variety of possible situations, and I list some of them below in order of what is generally increasing uncertainty.
1. Unvarying data, same instrument, repeated measurements. This is the best case we can hope for. An example would be repeated measurements of the length of a board using the same tape measure.
2. Unvarying data, different instruments, repeated measurements. An example would be repeated measurements of the length of a board using different tape measures.
3. Varying data, same instrument. An example would be measuring the lengths of a string of different boards as they come off some cutting machine using the same tape measure.
4. Varying data, different instruments. An example would be measuring the lengths of a string of different boards as they come off the cutting machine using different tape measures.
5. Different data, different instruments. An example would be measuring the lengths of a string of different boards as they come off identical cutting machines in a number of factories using different tape measures.
Let me discuss the first situation, as it is the simplest. Let’s say we have an extremely accurately marked tape measure that reads to the nearest millimetre, and we’re reading it by eye alone, no magnification. Now, by squinting, I could possibly estimate another decimal point.
Let’s say for the discussion that the board is 314.159265 millimetres long.
So I measure the board, and I estimate it at 314 and a quarter mm. I hand the tape measure to the next person, they say 314.1 mm. The next person says 314 and a half mm. I repeat the experiment until 10,000 people have read the length. Many of them round it off to one half simply because it is between two marks on the tape. Others estimate it to the nearest quarter, others give a variety of different decimal answers.
Then I take a look at the mean (average) of the estimates of the length. Since lots of folks said 314 and a half, the estimate is high. I get, say, 314.246152, with a standard error which can be calculated as the standard deviation of the estimates (which will be on the order of 0.25 mm) divided by the square root of the number of trials. This gives us an answer of 314.2462 mm, with an uncertainty of ± 0.0025 mm.
Here is the important point. This tiny uncertainty is NOT the uncertainty in the actual length of the board. What we have measured is the uncertainty of our ESTIMATE of the length of the board. And in fact, in this example the true length of the board is far, far outside the indicated and correctly calculated uncertainty in the estimate.
I have simplified my example to highlight the difference between the uncertainty in our ESTIMATES, and the uncertainty in the ACTUAL LENGTH.
The uncertainty in our ESTIMATES can be reduced by making more measurements … but even with 10,000 repeated measurements we can’t measure to two thousandths of a millimetre by eye.
Finally, let me repeat that I’m discussing the simplest situation, repeated measurement of an unvarying length with the same instrument. As we add in complications, it can only increase the uncertainty, never decrease it. So moving to say measuring an unvarying length with different instruments, we have to add in instrumental uncertainty. Then when we go to measuring a varying length with different instruments, yet more uncertainty comes in.
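This thought experiment is easy to simulate (a crude reading model, assuming 70% of readers snap to the nearest half-mm mark and the rest eyeball an extra digit with ~0.1 mm scatter; all numbers hypothetical):

import numpy as np

rng = np.random.default_rng(7)
true_length = 314.159265  # mm, hypothetical
n = 10_000

snap = rng.random(n) < 0.7                       # 70% round to a half-mm mark
eyeball = true_length + rng.normal(0.0, 0.1, n)  # the rest squint
readings = np.where(snap, np.round(true_length * 2) / 2, eyeball)

mean = readings.mean()
sem = readings.std(ddof=1) / np.sqrt(n)
print(f"estimate: {mean:.4f} +/- {sem:.4f} mm")  # tiny standard error
print(f"truth:    {true_length} mm")             # many SEMs away

The standard error measures the scatter of the estimates, exactly as described above; the true length sits far outside the mean ± SEM.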
This leads to my own personal rule of thumb, which is that I’m extremely suspicious of anything more than a one-decimal-point increase in accuracy from repeated measurements. If the tape measure is graduated in 1 mm increments, on my planet the best we can hope for is that we can get within a tenth of a millimetre by repeated measurements … and we may not even be able to get that. Here’s why:
You may recall the concept of “significant digits”. From Wolfram Mathworld:
Significant Digits

When a number is expressed in scientific notation, the number of significant digits (or significant figures) is the number of digits needed to express the number to within the uncertainty of calculation. For example, if a quantity is known to be 1.234+/-0.002, four figures would be significant
The number of significant figures of a multiplication or division of two or more quantities is equal to the smallest number of significant figures for the quantities involved. For addition or subtraction, the number of significant figures is determined with the smallest significant figure of all the quantities involved. For example, the sum 10.234+5.2+100.3234 is 115.7574, but should be written 115.8 (with rounding), since the quantity 5.2 is significant only to +/-0.1.

Note that this means that when we consider significant digits, the average of say a week’s worth of whole-degree maximum temperatures of say (14,17,18,16,18,14,14) is not 15.857 ± 0.704 …
It is 16.
Now, like I said, my rule of thumb might allow a value of 15.9 … but I’d have to check the Hurst Exponent first.
Best to all,
w.

Reply to  Willis Eschenbach
April 24, 2016 8:00 pm

“Note that this means that when we consider significant digits, the average of say a week’s worth of whole-degree maximum temperatures of say (14,17,18,16,18,14,14) is not 15.857 ± 0.704 …
It is 16.”
What would you define the average difference as?
0 ± 1
or
0 ± 0.37796… (i.e., 1/√7)
or something else?
And then to make it more specific (to my method), imagine you are measuring the growth rate of, say, a bamboo stalk, where you are not actually making a cut, but you measure the entire length every day to calculate, say, the monthly growth rate. Are the subsequent measurements more accurate because they are correlated?

Reply to  Willis Eschenbach
April 24, 2016 8:11 pm

” (14,17,18,16,18,14,14) ”
To make sure I explained the above in a cogent manner: based on your numbers, each day you measure
14
31
49
65
83
97
111
So while 49 is ±1 on the day you measure it, the 31 from the prior day can’t still carry an independent ±1 error at the same time that 49 is ±1.
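This can be checked numerically (assuming, for illustration, that each day's whole-stalk reading carries its own uniform ±0.5 rounding error): a daily increment is the difference of two readings, so its error is about √2 larger than a single reading's, and adjacent increments are anti-correlated because they share a reading.

import numpy as np

rng = np.random.default_rng(8)
true_totals = np.array([14, 31, 49, 65, 83, 97, 111], float)

# each day's full-length reading gets its own uniform(-0.5, 0.5) error
trials = true_totals + rng.uniform(-0.5, 0.5, (100_000, 7))
increments = np.diff(trials, axis=1)  # measured daily growth

print("sd of one total reading:", trials[:, 2].std())  # ~0.29
print("sd of one increment:", increments[:, 2].std())  # ~0.41, sqrt(2) larger
print("adjacent-increment corr:",
      np.corrcoef(increments[:, 2], increments[:, 3])[0, 1])  # ~ -0.5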

Editor
April 26, 2016 6:57 pm

1sky1 April 26, 2016 at 5:14 pm

A fundamental point is continually being missed here. In climate, as opposed to real-time physical-weather statistics we are always dealing with averages of averages. The monthly mean is the average of daily means at a particular station. The yearly temperature average is the average of the monthly means, whether they pertain to a single station or any aggregate spatial average thereof.

While that is generally true, I don’t see people missing it … this is why I ask people to quote what they object to.

Neither the Hurst exponent of the underlying individual variables, nor the shape [of] their ordinate-error distributions, is material to the determination of such climatic means, which is the only context in which I invoked the CLT here.

As the person who brought up the Hurst Exponent, I agree it is not “material to the determination of such climatic means”. However, it is applicable to the CLT, as I demonstrated and linked to above.

It[s] applicability to the issues at hand should be all the more evident, when [one considers] the fact that in many cases the sampling, say for May 1945, is EXHAUSTIVE, with zero sampling error. Nowhere do I claim that this changes the shape of any error-distribution.

I have no clue what you are talking about. The sampling of WHAT was exhaustive in May 1945? Again, a quote or link might make your claim understandable … or not.

I’m traveling and will not divert valuable time to respond to any further out-of-context castings of proven statistical ideas or my own words.

You say that as though we care about your oh-so “valuable time” … in my opinion, the most valuable use of your time would be to never post here again. Your vague claims are tedious and boring and you seem unable to respond directly to anyone’s direct points. Instead you wave your hands, tell us once again how smart and important you are, make meaningless uncited unreferenced statements, whine that we’re wasting your valuable time, and contribute nothing in the process.
w.
