Guest essay by Pat Frank
Presented at World Federation of Scientists, Erice, Sicily, 20 August 2015
This is a version of the talk I gave about uncertainty in the global average air temperature record at the 48th Conference of the World Federation of Scientists on “Planetary Emergencies and Other Events,” at Erice, Sicily, in August of 2015.
It was a very interesting conference and, as an aside, for me the take home message was that the short-term emergency is Islamic violence while the long-term emergency is some large-scale bolide coming down. Please, however, do not distract conversation into these topics.
Abstract: I had a longer abstract, but here’s the short form. Those compiling the global averaged surface air temperature record have not only ignored systematic measurement error, but have even neglected the detection limits of the instruments themselves. Since at least 1860, thermometer accuracy has been magicked out of thin air. Also since then, and at the 95% confidence interval, the rate or magnitude of the global rise in surface air temperature is unknowable. Current arguments about air temperature and its unprecedentedness are speculative theology.
1. Introduction: systematic error
Systematic error enters into experimental or observational results through uncontrolled and often cryptic deterministic processes. [1] These can be as simple as a consistent operator error. More typically, error emerges from an uncontrolled experimental variable or instrumental inaccuracy. Instrumental inaccuracy arises from malfunction or lack of calibration. Uncontrolled variables can impact the magnitude of a measurement and/or change the course of an experiment. Figure 1 shows the impact of an uncontrolled variable, taken from my own published work. [2, 3]
Figure 1: Left, titration of dissolved ferrous iron under conditions that allowed an unplanned trace of air to enter the experiment. Inset: the incorrect data precisely followed equilibrium thermodynamics. Right, the same experiment but with the appropriately strict exclusion of air. The data are completely different. Inset: the correct data reflect distinctly different thermodynamics.
Figure 1 shows that the inadvertent entry of a trace of air was enough to completely change the course of the experiment. Nevertheless, the erroneous data display coherent behavior and follow a trajectory completely consistent with equilibrium thermodynamics. To all appearances, the experiment was completely valid. In isolation, the data are convincing. However, they are completely wrong because the intruded air chemically modified the iron.
Figure 1 exemplifies the danger of systematic error. Contaminated experimental or observational results can look and behave just like good data, and can rigorously follow valid physical theory. Without care, such data invite erroneous conclusions.
By its nature, systematic error is difficult to detect and remove. Methods of elimination include careful instrumental calibration under conditions identical to the observation or experiment. Methodologically independent experiments that access the same phenomena provide a check on the results. Careful attention to these practices is standard in the experimental physical sciences.
The recent development of a new and highly accurate atomic clock illustrates the extreme care physicists take to eliminate systematic error. Critical to achieving its 10⁻¹⁸ accuracy was removal of the systematic error produced by the black-body radiation of the instrument itself. [4]
Figure 2: Close-up picture of the new atomic clock. The timing element is a cluster of fluorescing strontium atoms trapped in an optical lattice. Thermal noise is removed using data provided by a sensor that measures the black-body temperature of the instrument.
As a final word, systematic error does not average away with repeated measurements. Repetition can even increase error. When systematic error cannot be eliminated and is known to be present, uncertainty statements must be reported along with the data. In graphical presentations of measured or calculated results, systematic error is represented using uncertainty bars. [1] Those uncertainty bars communicate the reliability of the result.
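The distinction can be made concrete with a minimal simulation sketch (not part of the original talk; the 0.3 C offset and 0.5 C spread are arbitrary illustrative values): purely random error shrinks in the mean of many measurements, while a systematic offset survives unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
true_temp = 15.0      # hypothetical true air temperature, C
n = 100_000           # number of repeated measurements

# Purely random error: the mean of many readings converges on the truth.
random_only = true_temp + rng.normal(0.0, 0.5, n)

# Random error plus a 0.3 C systematic warm offset (e.g., shield heating):
# the mean converges on the biased value, no matter how large n becomes.
with_bias = true_temp + 0.3 + rng.normal(0.0, 0.5, n)

print(f"random error only: mean = {random_only.mean():.3f} C")
print(f"with 0.3 C bias:   mean = {with_bias.mean():.3f} C")
```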
2. Systematic Error in Surface Temperature Measurements
2.1. Land Surface Air Temperature
During most of the 20th century, land surface air temperatures were measured using a liquid-in-glass (LiG) thermometer housed in a box-like louvered shield (Stevenson screen or Cotton Region Shelter (CRS)). [5, 6] After about 1985, thermistors or platinum resistance thermometers (PRT) housed in an unaspirated cylindrical plastic shield replaced the CRS/LiG sensors in Europe, the Anglo-Pacific countries, and the US. Beginning in 2000, the US Climate Reference Network deployed sensors consisting of a trio of PRTs in an aspirated shield. [5, 7-9] An aspirated shield includes a small fan or impeller that ventilates the interior of the shield with outside air.
Unaspirated sensors rely on the prevailing wind for ventilation. Solar irradiance can heat the sensor shield, warming the interior atmosphere around the sensor. In the winter, upward radiation reflected from a high-albedo snow-covered surface can also produce a warm bias. [10] Significant systematic measurement error occurs when air movement is less than 5 m/sec. [9, 11]
Figure 3: Alpine Plaine Morte Glacier, Switzerland, showing the air temperature sensor calibration experiment carried out by Huwald, et al., in 2007 and 2008. [12] Insets: close-ups of the PRT and the sonic anemometer sensors. Photo credit: Bou-Zeid, Martinet, Huwald, Couach, © 2006 EPFL-ENAC.
In 2007 and 2008, calibration experiments carried out on the Plaine Morte Glacier (Figure 3) tested the field accuracy of the RM Young PRT housed in an unaspirated louvered shield, situated over a snow-covered surface. In a laboratory setting, the RM Young sensor is capable of ±0.1 C accuracy. Field accuracy was determined by comparison with air temperatures measured using a sonic anemometer, which takes advantage of the impact of temperature on the speed of sound in air and is insensitive to irradiance and wind-speed.
Figure 4: Temperature trends recorded simultaneously on Plaine Morte Glacier during February–April 2007 by the sonic anemometer and the RM Young PRT probe.
Figure 4 shows that under identical environmental conditions, the RM Young probe recorded significantly warmer winter air temperatures than the sonic anemometer. The slope of the RM Young temperature trend is also more than three times greater. Referenced against a common mean, the RM Young error would introduce a spurious warming trend into a global temperature average. The larger significance of this result is that the RM Young probe is very similar in design and response to the more advanced temperature probes in use world-wide since about 1985.
Figure 5 shows a histogram of the systematic temperature error exhibited by the RM Young probe.
Figure 5. RM Young probe systematic error on Plaine Morte Glacier. Daytime error averages 2.0±1.4 C; night-time error averages 0.03±0.32 C.
The RM Young systematic errors mean that, absent an independent calibration instrument, any given daily mean temperature has an associated 1σ uncertainty of 1±1.4 C. Figure 5 shows this uncertainty is neither randomly distributed nor constant. It cannot be removed by averaging individual measurements or by taking anomalies. Subtracting the average bias will not remove the non-normal 1σ uncertainty. Entry of the RM Young station temperature record into a global average will carry that error along with it.
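The debiasing point can be illustrated numerically. In the sketch below (an illustration only; the skewed gamma shape for the daytime errors is an assumption standing in for the real Figure 5 distribution), subtracting the average bias re-centers the combined day/night errors but leaves their roughly 1.4 C non-normal spread intact.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Skewed daytime errors with mean ~2.0 C and spread ~1.4 C (assumed gamma
# shape), plus near-normal night errors of 0.03±0.32 C, as quoted above.
day_err = rng.gamma(shape=2.04, scale=0.98, size=n)
night_err = rng.normal(0.03, 0.32, n)
errors = np.concatenate([day_err, night_err])

# Subtracting the average bias re-centers the errors but cannot shrink
# the non-normal spread: the 1-sigma uncertainty survives debiasing.
debiased = errors - errors.mean()
print(f"residual 1-sigma after debiasing: {debiased.std():.2f} C")
skew = (debiased**3).mean() / debiased.std()**3
print(f"residual skewness (0 for a normal distribution): {skew:.2f}")
```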
Before inclusion in a global average, temperature series from individual meteorological stations are subjected to statistical tests for data quality. [13] Air temperatures are known to show correlation R = 0.5 over distances of about 1200 km. [14, 15] The first quality control test for any given station record includes a statistical check for correlation with temperature series among nearby stations. Figure 6 shows that the RM Young error-contaminated temperature series will pass this most basic quality control test. Further, the erroneous RM Young record will pass every single statistical test used for the quality control of meteorological station temperature records worldwide. [16, 17]
Figure 6: Correlation of the RM Young PRT temperature measurements with those of the sonic anemometer. Inset: Figure 1a from [14] showing correlation of temperature records from meteorological stations in the terrestrial 65-70º N, 0-5º E grid. The 0.5 correlation length is 1.4×10³ km.
Figure 7: Calibration experiment at the University of Nebraska, Lincoln (ref. [11], Figure 1); E, MMTS shield; F, CRS shield; G, the aspirated RM Young reference.
Figure 7 shows the screen-type calibration experiment at the University of Nebraska, Lincoln. Each screen contained the identical HMP45C PRT sensor. [11] The calibration reference temperatures were provided by an aspirated RM Young PRT probe, rated as accurate to <±0.2 C below 1100 W/m² solar irradiance.
These independent calibration experiments tested the impact of a variety of commonly used screens on the fidelity of air temperature measurements from PRT probes. [10, 11, 18] Screens included the traditional Cotton Region Shelter (CRS, Stevenson screen), and the MMTS screen now in common use in the US Historical Climatology Network, among others.
Figure 8: Average systematic measurement error of an HMP45C PRT probe within an MMTS shelter over a grass (top) or snow-covered (bottom) surface. [10, 11]
Figure 8, top, shows the average systematic measurement error an MMTS shield imposed on a PRT temperature probe, found during the calibration experiment displayed in Figure 7. [11] Figure 8, bottom, shows the results of an independent PRT/MMTS calibration over a snow-covered surface. [10] The average annual systematic uncertainty produced by the MMTS shield can be estimated from these data as 1σ = 0.32±0.23 C. The skewed warm-bias distribution of error over snow is similar in magnitude to the unaspirated RM Young shield in the Plaine Morte experiment (Figure 5).
Figure 9 shows the average systematic measurement error produced by a PRT probe inside a traditional CRS shield. [11]
Figure 9. Average day-night 1σ = 0.44±0.41 C systematic measurement error produced by a PRT temperature probe within a traditional CRS shelter.
The warm bias in the data is apparent, as is the non-normal distribution of error. The systematic uncertainty from the CRS shelter was 1σ = 0.44±0.41 C. The HMP45C PRT probe is at least as accurate as the traditional LiG thermometers housed within the CRS shield. [19, 20] The PRT/CRS experiment thus provides an estimated lower limit of the systematic measurement uncertainty present in the land-surface temperature record covering all of the 19th and most of the 20th century.
2.2 Sea-Surface Temperature
Although considerable effort has been expended to understand sea-surface temperatures (SSTs), [21-28] there have been very few field calibration experiments of sea-surface temperature sensors. Bucket- and steamship engine cooling-water intake thermometers provided the bulk of early and mid-20th century SST measurements. Sensors mounted on drifting and moored buoys have come into increasing use since about 1980, and now dominate SST measurements. [29] Attention is focused on calibration studies of these instruments.
The series of experiments reported by Charles Brooks in 1926 are by far the most comprehensive field calibrations of bucket and engine-intake thermometer SST measurements carried out by any individual scientist. [30] Figure 10 presents typical examples of the systematic error in bucket and engine intake SSTs that Brooks found.
Figure 10: Systematic measurement error in one set of engine-intake (left) and bucket (right) sea-surface temperatures reported by Brooks. [30]
Brooks also recruited an officer to monitor the ship-board measurements after he concluded his experiments and disembarked. The errors after he had departed the ship were about twice as large as they were when he was aboard. The simplest explanation is that care deteriorated, perhaps back to normal, when no one was looking. This result violates the standard assumption in the field that temperature sensor errors are constant for each ship.
In 1963 Saur reported the largest field calibration experiment of engine-intake thermometers, carried out by volunteers aboard twelve US military transport ships engaged off the US central Pacific coast. [31] The experiment included 6826 pairs of observations. Figure 11 shows the experimental results from one voyage of one ship.
Figure 11: Systematic error in recorded engine intake temperatures aboard one military transport ship operating June-July, 1959. The mean systematic bias and uncertainty represented by these data are 1σ = 0.9±0.6 C.
Saur reported Figure 11 as, “a typical distribution of the differences” reported from the various ships. The ±0.6 C uncertainty about the mean systematic error is comparable to the values reported by Brooks, shown in Figure 10.
Saur concluded his report by noting that, “The average bias of reported sea water temperatures as compared to sea surface temperatures, with 95 percent confidence limits, is estimated to be 1.2±0.6 F [0.67±0.33 C] on the basis of a sample of 12 ships. The standard deviation of differences [between ships] is estimated to be 1.6 F [0.9 C]. Thus, without improved quality control the sea temperature data reported currently and in the past are for the most part adequate only for general climatological studies. [bracketed conversions added]” Saur’s caution is instructive, but has apparently been mislaid by consensus scientists.
Measurements from bathythermograph (BT) and expendable bathythermograph (XBT) instruments have also made significant contributions to the SST record. [32] Extensive BT and XBT calibration experiments revealed multiple sources of systematic error, principally stemming from mechanical problems and calibration errors. [33-35] Relative to a reversing thermometer standard, field BT measurements exhibited 1σ = 0.34±0.43 C error. [35] This standard deviation is more than twice as large as the manufacturer-stated accuracy of ±0.2 C and reflects the impact of uncontrolled field variables.
The SST sensors in deployed floating and moored buoys were never field-calibrated during the 20th century, allowing no general estimate of systematic measurement error.
However, Emery estimated a 1σ = ±0.3 C error by comparison of SSTs from floating buoys co-located to within 5 km of each other. [28] SST measurements separated by less than 10 km are considered coincident.
A similar ±0.26 C buoy error magnitude was found relative to SSTs retrieved from the Advanced Along-Track Scanning Radiometer (AATSR) satellite. [36] The error distributions were non-normal.
More recently, Argo buoys were field calibrated against very accurate CTD (conductivity-temperature-depth) measurements and exhibited average RMS errors of ±0.56 C. [37] This is similar in magnitude to the reported average ±0.58 C buoy-Advanced Microwave Scanning Radiometer (AMSR) satellite SST difference. [38]
3. Discussion
Until recently, [39, 40] systematic temperature sensor measurement errors were neither mentioned in reports communicating the origin, assessment, and calculation of the global averaged surface air temperature record, nor were they included in error analyses. [15, 16, 39-46] Even after the recent arrival of systematic errors in the published literature, however, the Central Limit Theorem is adduced to assert that they average to zero. [36] However, systematic temperature sensor errors are neither randomly distributed nor constant over time, space, or instrument. There is no theoretical reason to expect that these errors follow the Central Limit Theorem, [47, 48] or that such errors are reduced or removed by averaging multiple measurements, even when the measurements number in the millions. A complete inventory of contributions to uncertainty in the surface air temperature record must include, indeed must start with, the systematic measurement error of the temperature sensor itself. [39]
The World Meteorological Organization (WMO) offers useful advice regarding systematic error. [20]
“Section 1.6.4.2.3 Estimating the true value – additional remarks.
“In practice, observations contain both random and systematic errors. In every case, the observed mean value has to be corrected for the systematic error insofar as it is known. When doing this, the estimate of the true value remains inaccurate because of the random errors as indicated by the expressions and because of any unknown component of the systematic error. Limits should be set to the uncertainty of the systematic error and should be added to those for random errors to obtain the overall uncertainty. However, unless the uncertainty of the systematic error can be expressed in probability terms and combined suitably with the random error, the level of confidence is not known. It is desirable, therefore, that the systematic error be fully determined.”
Thus far, in production of the global averaged surface air temperature record, the WMO advice concerning systematic error has been followed primarily in the breach.
Systematic sensor error in air and sea-surface temperature measurements has been woefully under-explored and field calibrations are few. Nevertheless, the reported cases make it clear that the surface air temperature record is contaminated with a very significant level of systematic measurement error. The non-normality of systematic error means that subtracting an average bias will not discharge the measurement uncertainty about the global temperature mean.
Further, the magnitude of the systematic error bias in surface air temperature and SST measurements is apparently as variable in time and space as the standard deviation of systematic uncertainty about the mean bias. That is, the mean systematic bias was 2 C over snow on the Plaine Morte Glacier, Switzerland, but 0.4 C over snow at Lincoln, Nebraska. Similar differences accrue to the engine-intake systematic error means reported by Brooks and Saur. Therefore, removing an estimate of mean bias will always leave an ambiguous residual mean-bias uncertainty. In any complete evaluation of error, that residual uncertainty combines with the 1σ standard deviation of measurement uncertainty into the uncertainty total.
A complete evaluation of systematic error is beyond the analysis presented here. However, to the extent that the above errors are representative, a set of estimated uncertainty bars due to systematic error in the global averaged surface air temperature record can be calculated, Figure 12.
The uncertainty bars in Figure 12 (right) reflect a 0.7:0.3 SST:land surface ratio of systematic errors. Combined in quadrature, bucket and engine-intake errors constitute the SST uncertainty prior to 1990. Over the same time interval the systematic error of the PRT/CRS sensor [39, 49] constituted the uncertainty in land-surface temperatures. Floating buoys made a partial contribution (0.25 fraction) to the SST uncertainty between 1980 and 1990. After 1990 the uncertainty bars are further steadily reduced, reflecting the increasing contribution and smaller errors of MMTS (land) and floating buoy (sea surface) sensors.
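The essay does not spell out its combination formula, so the sketch below is only a plausible reading of the procedure: a fractionally weighted root-sum-square of the component 1σ values. All of the input numbers are illustrative stand-ins, not the values used to draw Figure 12.

```python
import math

# Assumed illustrative 1-sigma inputs (C); stand-ins, not the essay's inputs.
sigma_bucket = 0.6    # bucket SST spread, cf. the Brooks/Saur-scale values
sigma_intake = 0.6    # engine-intake SST spread
sigma_crs = 0.41      # PRT/CRS land spread, cf. Figure 9

f_bucket, f_intake = 0.5, 0.5    # assumed fractional use within SST
w_sst, w_land = 0.7, 0.3         # SST:land weighting stated in the text

# Bucket and engine-intake errors combined in quadrature -> SST uncertainty
sigma_sst = math.sqrt((f_bucket * sigma_bucket)**2 + (f_intake * sigma_intake)**2)

# Fractionally weighted root-sum-square of SST and land uncertainties
sigma_global = math.sqrt((w_sst * sigma_sst)**2 + (w_land * sigma_crs)**2)
print(f"illustrative global 1-sigma uncertainty: {sigma_global:.2f} C")
```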
Figure 12: The 2010 global average surface air temperature record obtained from website of the Climate Research Unit (CRU), University of East Anglia, UK. http://www.cru.uea.ac.uk/cru/data/temperature/. Left, error bars following the description provided at the CRU website. Right, error bars reflecting the uncertainty width due to estimated systematic sensor measurement errors within the land and sea surface records. See the text for further discussion.
Figure 12 (right) is very likely a more accurate representation of the state of knowledge than is Figure 12 (left), concerning the rate or magnitude of change in the global averaged surface air temperature since 1850. The revised uncertainty bars represent non-normal systematic error. Therefore the air temperature mean trend loses any status as the most probable trend.
Finally, Figure 13 pays attention to the instrumental resolution of the historical meteorological thermometers.
Figure 13 caused some angry shouts from the audience at Erice, followed by some very rude approaches after the talk, and a lovely debate by email. The argument presented here prevailed.
Instrumental resolution defines the measurement detection limit. For example, the best-case historical 19th to mid-20th century liquid-in-glass (LiG) meteorological thermometers included 1 C graduations. The best-case laboratory-conditions reportable temperature resolution is therefore ±0.25 C. There can be no dispute about that.
The standard SST bucket LiG thermometers from the Challenger voyage on through the 20th century also had 1 C graduations. The same resolution limit applies.
The very best American ship-board engine-intake thermometers included 2 F (~1 C) graduations; on British ships they were 2 C. The very best resolution is then about ±(0.25 – 0.5) C. These are known quantities. Resolution uncertainty, like systematic error, does not average away. Knowing the detection limits of the classes of instruments allows us to estimate the limit of resolution uncertainty in any compiled historical surface air temperature record.
Figure 13 shows this limit of resolution. It compares the historical instrumental ±2σ resolution with the ±2σ uncertainty in the published Berkeley Earth air temperature compilation. The analysis applies equally well to the published surface air temperature compilations of GISS or CRU/UKMet, which feature the same uncertainty limits.
Figure 13: The Berkeley Earth global averaged air temperature trend with the published ±2σ uncertainty limits in grey. The time-wise ±2σ instrumental resolution is in red. On the right in blue is a compilation of the best resolution limits of the historical temperature sensors, from which the global resolution limits were calculated.
The globally combined instrumental resolution was calculated using the same fractional contributions as were noted above for the lower limit estimate of systematic measurement error. That is, 0.30:0.70 land : sea-surface instruments, and the published historical fractional use of each sort of instrument (land: CRS vs. MMTS; sea surface: buckets vs. engine intakes vs. buoys).
The record shows that during the years 1800-1860, the published global uncertainty limits of field meteorological temperatures equal the accuracy of the best possible laboratory-conditions measurements.
After about 1860 through 2000, the published resolution is smaller than the detection limits — the resolution limits — of the instruments themselves. From at least 1860, accuracy has been magicked out of thin air.
Does anyone find the published uncertainties credible?
All you engineers and experimental scientists out there may go into shock after reading this. I was certainly shocked by the realization. Espresso helps.
The people compiling the global instrumental record have neglected an experimental limit even more basic than systematic measurement error: the detection limits of their instruments. They have paid no attention to it.
Resolution limits and systematic measurement error produced by the instrument itself constitute lower limits of uncertainty. The scientists engaged in consensus climatology have neglected both of them.
It’s almost as though none of them have ever made a measurement or struggled with an instrument. There is no other rational explanation for that sort of negligence than a profound ignorance of experimental methods.
The uncertainty estimate developed here shows that the rate or magnitude of change in global air temperature since 1850 cannot be known within ±1 C prior to 1980 or within ±0.6 C after 1990, at the 95% confidence interval.
The rate and magnitude of temperature change since 1850 is literally unknowable. There is no support at all for any claim of “unprecedented” change in the surface air temperature record.
Claims of highest air temperature ever, based on even 0.5 C differences, are utterly insupportable and without any meaning.
All of the debates about highest air temperature are no better than theological arguments about the ineffable. They are, as William F. Buckley called them, “Tedious speculations about the inherently unknowable.”
There is no support in the temperature record for any emergency concerning climate. Except, perhaps an emergency in the apparent competence of AGW-consensus climate scientists.
4. Acknowledgements: Prof. Hendrik Huwald and Dr. Marc Parlange, Ecole Polytechnique Federale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland, are thanked for generously providing the Plaine Morte sensor calibration data entering into Figure 4, Figure 5, and Figure 6. This work was carried out without any external funding.
5. References
[1] JCGM, Evaluation of measurement data — Guide to the expression of uncertainty in measurement 100:2008, Bureau International des Poids et Mesures: Sevres, France.
[2] Frank, P., et al., Determination of ligand binding constants for the iron-molybdenum cofactor of nitrogenase: monomers, multimers, and cooperative behavior. J. Biol. Inorg. Chem., 2001. 6(7): p. 683-697.
[3] Frank, P. and K.O. Hodgson, Cooperativity and intermediates in the equilibrium reactions of Fe(II,III) with ethanethiolate in N-methylformamide solution. J. Biol. Inorg. Chem., 2005. 10(4): p. 373-382.
[4] Hinkley, N., et al., An Atomic Clock with 10⁻¹⁸ Instability. Science, 2013. 341: p. 1215-1218.
[5] Parker, D.E., et al., Interdecadal changes of surface temperature since the late nineteenth century. J. Geophys. Res., 1994. 99(D7): p. 14373-14399.
[6] Quayle, R.G., et al., Effects of Recent Thermometer Changes in the Cooperative Station Network. Bull. Amer. Met. Soc., 1991. 72(11): p. 1718-1723; doi: 10.1175/1520-0477(1991)072<1718:EORTCI>2.0.CO;2.
[7] Hubbard, K.G., X. Lin, and C.B. Baker, On the USCRN Temperature system. J. Atmos. Ocean. Technol., 2005. 22: p. 1095-1101.
[8] van der Meulen, J.P. and T. Brandsma, Thermometer screen intercomparison in De Bilt (The Netherlands), Part I: Understanding the weather-dependent temperature differences. International Journal of Climatology, 2008. 28(3): p. 371-387.
[9] Barnett, A., D.B. Hatton, and D.W. Jones, Recent Changes in Thermometer Screen Design and Their Impact, in Instruments and Observing Methods WMO Report No. 66, J. Kruus, Editor. 1998, World Meteorological Organization: Geneva.
[10] Lin, X., K.G. Hubbard, and C.B. Baker, Surface Air Temperature Records Biased by Snow-Covered Surface. Int. J. Climatol., 2005. 25: p. 1223-1236; doi: 10.1002/joc.1184.
[11] Hubbard, K.G. and X. Lin, Realtime data filtering models for air temperature measurements. Geophys. Res. Lett., 2002. 29(10): p. 1425, 1-4; doi: 10.1029/2001GL013191.
[12] Huwald, H., et al., Albedo effect on radiative errors in air temperature measurements. Water Resources Res., 2009. 45: p. W08431, 1-13.
[13] Menne, M.J. and C.N. Williams, Homogenization of Temperature Series via Pairwise Comparisons. J. Climate, 2009. 22(7): p. 1700-1717.
[14] Briffa, K.R. and P.D. Jones, Global surface air temperature variations during the twentieth century: Part 2, implications for large-scale high-frequency palaeoclimatic studies. The Holocene, 1993. 3(1): p. 77-88.
[15] Hansen, J. and S. Lebedeff, Global Trends of Measured Surface Air Temperature. J. Geophys. Res., 1987. 92(D11): p. 13345-13372.
[16] Brohan, P., et al., Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850. J. Geophys. Res., 2006. 111: p. D12106, 1-21; doi:10.1029/2005JD006548; see http://www.cru.uea.ac.uk/cru/info/warming/.
[17] Karl, T.R., et al., The Recent Climate Record: What it Can and Cannot Tell Us. Rev. Geophys., 1989. 27(3): p. 405-430.
[18] Hubbard, K.G., X. Lin, and E.A. Walter-Shea, The Effectiveness of the ASOS, MMTS, Gill, and CRS Air Temperature Radiation Shields. J. Atmos. Oceanic Technol., 2001. 18(6): p. 851-864.
[19] MacHattie, L.B., Radiation Screens for Air Temperature Measurement. Ecology, 1965. 46(4): p. 533-538.
[20] Rüedi, I., WMO Guide to Meteorological Instruments and Methods of Observation: WMO-8 Part I: Measurement of Meteorological Variables, 7th Ed., Chapter 1. 2006, World Meteorological Organization: Geneva.
[21] Berry, D.I. and E.C. Kent, Air–Sea fluxes from ICOADS: the construction of a new gridded dataset with uncertainty estimates. International Journal of Climatology, 2011: p. 987-1001.
[22] Challenor, P.G. and D.J.T. Carter, On the Accuracy of Monthly Means. J. Atmos. Oceanic Technol., 1994. 11(5): p. 1425-1430.
[23] Kent, E.C. and D.I. Berry, Quantifying random measurement errors in Voluntary Observing Ships’ meteorological observations. Int. J. Climatol., 2005. 25(7): p. 843-856; doi: 10.1002/joc.1167.
[24] Kent, E.C. and P.G. Challenor, Toward Estimating Climatic Trends in SST. Part II: Random Errors. Journal of Atmospheric and Oceanic Technology, 2006. 23(3): p. 476-486.
[25] Kent, E.C., et al., The Accuracy of Voluntary Observing Ships’ Meteorological Observations-Results of the VSOP-NA. J. Atmos. Oceanic Technol., 1993. 10(4): p. 591-608.
[26] Rayner, N.A., et al., Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century. Journal of Geophysical Research-Atmospheres, 2003. 108(D14).
[27] Emery, W.J. and D. Baldwin. In situ calibration of satellite sea surface temperature. in Geoscience and Remote Sensing Symposium, 1999. IGARSS ’99 Proceedings. IEEE 1999 International. 1999.
[28] Emery, W.J., et al., Accuracy of in situ sea surface temperatures used to calibrate infrared satellite measurements. J. Geophys. Res., 2001. 106(C2): p. 2387-2405.
[29] Woodruff, S.D., et al., The Evolving SST Record from ICOADS, in Climate Variability and Extremes during the Past 100 Years, S. Brönnimann, et al. eds, 2007, Springer: Netherlands, pp. 65-83.
[30] Brooks, C.F., Observing Water-Surface Temperatures at Sea. Monthly Weather Review, 1926. 54(6): p. 241-253.
[31] Saur, J.F.T., A Study of the Quality of Sea Water Temperatures Reported in Logs of Ships’ Weather Observations. J. Appl. Meteorol., 1963. 2(3): p. 417-425.
[32] Barnett, T.P., Long-Term Trends in Surface Temperature over the Oceans. Monthly Weather Review, 1984. 112(2): p. 303-312.
[33] Anderson, E.R., Expendable bathythermograph (XBT) accuracy studies; NOSC TR 550 1980, Naval Ocean Systems Center: San Diego, CA. p. 201.
[34] Bralove, A.L. and E.I. Williams Jr., A Study of the Errors of the Bathythermograph 1952, National Scientific Laboratories, Inc.: Washington, DC.
[35] Hazelworth, J.B., Quantitative Analysis of Some Bathythermograph Errors 1966, U.S. Naval Oceanographic Office Washington DC.
[36] Kennedy, J.J., R.O. Smith, and N.A. Rayner, Using AATSR data to assess the quality of in situ sea-surface temperature observations for climate studies. Remote Sensing of Environment, 2012. 116(0): p. 79-92.
[37] Hadfield, R.E., et al., On the accuracy of North Atlantic temperature and heat storage fields from Argo. J. Geophys. Res.: Oceans, 2007. 112(C1): p. C01009.
[38] Castro, S.L., G.A. Wick, and W.J. Emery, Evaluation of the relative performance of sea surface temperature measurements from different types of drifting and moored buoys using satellite-derived reference products. J. Geophys. Res.: Oceans, 2012. 117(C2): p. C02029.
[39] Frank, P., Uncertainty in the Global Average Surface Air Temperature Index: A Representative Lower Limit. Energy & Environment, 2010. 21(8): p. 969-989.
[40] Frank, P., Imposed and Neglected Uncertainty in the Global Average Surface Air Temperature Index. Energy & Environment, 2011. 22(4): p. 407-424.
[41] Hansen, J., et al., GISS analysis of surface temperature change. J. Geophys. Res., 1999. 104(D24): p. 30997–31022.
[42] Hansen, J., et al., Global Surface Temperature Change. Rev. Geophys., 2010. 48(4): p. RG4004 1-29.
[43] Jones, P.D., et al., Surface Air Temperature and its Changes Over the Past 150 Years. Rev. Geophys., 1999. 37(2): p. 173-199.
[44] Jones, P.D. and T.M.L. Wigley, Corrections to pre-1941 SST measurements for studies of long-term changes in SSTs, in Proc. Int. COADS Workshop, H.F. Diaz, K. Wolter, and S.D. Woodruff, Editors. 1992, NOAA Environmental Research Laboratories: Boulder, CO. p. 227–237.
[45] Jones, P.D. and T.M.L. Wigley, Estimation of global temperature trends: what’s important and what isn’t. Climatic Change, 2010. 100(1): p. 59-69.
[46] Jones, P.D., T.M.L. Wigley, and P.B. Wright, Global temperature variations between 1861 and 1984. Nature, 1986. 322(6078): p. 430-434.
[47] Emery, W.J. and R.E. Thomson, Data Analysis Methods in Physical Oceanography. 2nd ed. 2004, Amsterdam: Elsevier.
[48] Frank, P., Negligence, Non-Science, and Consensus Climatology. Energy & Environment, 2015. 26(3): p. 391-416.
[49] Folland, C.K., et al., Global Temperature Change and its Uncertainties Since 1861. Geophys. Res. Lett., 2001. 28(13): p. 2621-2624.
I’ve always wondered about those magical air temperature thermometers.
Thanks for your interest, Rich.
And if the moderator doesn’t mind, I’ll use this reply to say, Thanks, Anthony for posting my essay! 🙂
Thanks also to everyone for offering your thoughts and responses. It’s very appreciated. I promise to respond to the various questions and challenges, but this is evening and weekend work for me, so it may take a while to work down the thread.
But I’ll get there. Thanks again to everyone, and especially to you, Anthony. Especially for just being there, doing the work you do, and contributing so much to everyone’s sanity.
This is *brilliant*, entirely. It is quite literally *fundamental* science! The kind of thing I was taught in Physics 1 labs too many decades ago, but with up-to-date numbers about a topic of public concern. (And of course we must suspect systematic error in any proxy until it has been measured, if that’s even possible.) From now on, the touchstone for whether any climate scientist is competent to express an opinion will be “have they addressed this issue.” Thank you very much for a clear explanation.
I have been saying for years that the surface temperature data record would be laughed out of existence by any other physical science outside of Climatology because of lack of standardization, quality control and calibration.
I could not agree more. But to add on to your comment, I don’t understand why we are not talking about the energy content of the atmosphere. We only talk temperature. And we only talk temperature at the surface, where the temperature is the hottest due to gravity and other factors. And we don’t take humidity into account here on water world.
And then there is the fact that for most of the record we could only measure to about plus/minus half a degree, but we claim accuracy to the 2nd decimal place. What is up with that?
Getting an honest temperature data set out of the government “scientists” is like trying to get a fair loan from a Mafia Don.
~ Mark
Sometimes those surface measurements are not the warmest in the air column. On clear, windless nights radiational cooling in New Hampshire valleys occasionally results in daily low temps that are below the temperature atop Mt Washington. Once the inversion breaks, the valleys wind up some 30°F warmer than Mt Washington.
Satellite readings of the lower (which is not so low) troposphere are a lot more meaningful when considering energy budgets.
Every day we beat Mt Washington or there’s frost on my car I observe that CO2 (and H2O!) levels haven’t stopped all the longwave IR from leaving Earth.
Exactly. If CO2 is causing anything to be retained, it is enthalpy. Temperature at the surface is a very, very poor estimate of retained enthalpy. Most especially with the contamination due to human activity that changes the albedo, hydrology, etc. in large areas and injects enthalpy in the areas of the sensors.
I think you’re overestimating the crookedness of Mafia Dons 🙂
This article raises a very important point. What Judith Curry calls the uncertainty monster.
One of the key problems here is that the officially quoted error estimations are usually taken from the variance of the data, i.e. a certain number of standard deviations is taken to give an x% uncertainty range for the result.
This again assumes constant, normally distributed causes of error. It does not account for systematic errors.
For example, hadSST2 made a clunky 0.5 deg C “correction” to SST in 1946. This was obviously clumsy and wrong. So hadSST3 decided it was “right for the wrong reason” and muddled together a system which, instead of a step change, phased in the same thing exponentially over 20 or so years. It was less shocking to the eye but gave about the same long-term tendency, so no one had to redo all their climate science or rewrite the carefully tuned models to produce a different result. IPCC is saved (phew!).
However, some of the data from 1946-1960, which had changed by up to 0.5 deg C, were stated as having an uncertainty of 0.1 deg C just as they had before the changes.
Well, both can’t be correct. One set of data must have an uncertainty of ±0.1 ±0.5 deg C, or the error assessment technique is being done in a fundamentally incorrect manner.
You cannot seriously put forward two sets of data with a declared 0.1 deg C uncertainty that differ by 0.5 deg C.
The unique purpose of these blatantly optimistic uncertainty claims is to determine what can be called “statistically significant” and “unprecedented”.
Oh, and BTW adding land and sea temps is not legit in the first place since they are different physical media.
https://judithcurry.com/2016/02/10/are-land-sea-temperature-averages-meaningful/
Greg, you’re on the right track. Systematic uncertainty is roundly neglected throughout consensus climatology.
Random measurement error is the unvarying assumption among the air temperature compilers. The assumption allows them to discharge all the error when taking averages.
That assumption is also unwarranted.
Climatology is also the only field where you can intensively measure 3 to 5% of the whole, and then declare you have perfect knowledge of the whole.
…and when they splice that surface temperature record to an even dodgier tree-ring proxy…big yucks, that.
Bingo.
That’s the real elephant in the room.
The historic temperature record is a mess. It is completely unreliable. For large swathes of the planet, there aren’t any records at all; for other large swathes what is being palmed off as temperature records are a joke.
How can anybody seriously believe accurate temperatures were being recorded in China from, say 1910-1960, or in Russia from, say 1917- 1945?
“How can anybody seriously believe accurate temperatures were being recorded in China from, say 1910-1960, or in Russia from, say 1917- 1945?”
Well, I’ve read that with central planning, the colder it was, the more coal your city got. Nope, can’t see any potential gaming there at all.
I really need to find the article I read years ago in a “real” sciency magazine. Basically, the author admitted that he really, really hated to say that tree rings were, at best, accurate to within 2 degrees F because it would be used by “deniers”. Seriously, that’s the mindset of these folks: use inaccurate data, but blame someone for pointing out you are using it. Like a drunk who gets upset when you count his drinks I guess.
Not saying I like their measurements, or their science, but if you take a series of measurements with a known accuracy you can extract a mean value with higher accuracy than the accuracy of individual measurements. The standard error on the mean value falls as you add more data.
I did a simple test to make sure I was right: I made a million Gaussian-distributed numbers offset to the value 10.215, rounded the series to the nearest 0.1, averaged, and got numbers close to 10.215, closer than the accuracy of each number, which was 0.1.
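For what it’s worth, Will’s test is easy to reproduce (a sketch; the 1.0 width of the Gaussian is an assumption, since he doesn’t state one). As the replies below point out, the result depends entirely on the error being purely random: add a fixed offset and the average recovers the biased value instead.

```python
import numpy as np

rng = np.random.default_rng(3)
true_value = 10.215

# A million Gaussian-distributed numbers centered on 10.215 (1.0 spread
# assumed), quantized to the nearest 0.1 as described above.
samples = true_value + rng.normal(0.0, 1.0, 1_000_000)
rounded = np.round(samples, 1)
print(f"mean of quantized values: {rounded.mean():.4f}")   # ~10.215

# The same test with a 0.3 offset standing in for a systematic bias:
# averaging converges on the biased value, not the true one.
biased = np.round(samples + 0.3, 1)
print(f"mean with systematic bias: {biased.mean():.4f}")   # ~10.515
```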
I still do not think that they understand the systematics of their temperature experiments very well, and do a poor job on data reduction which lacks in transparency and simplicity.
Will
As the paper says, and as they teach experimental physicists, do the groundwork on your systematic errors. If you don’t, you wind up with an experiment which shows nothing but that you didn’t do the critical calibration work to show what the systematic error was. There is no magic statistical bullet which eliminates that error. Dr. Frank’s first graphic showed that conclusively.
To make it simple: you have a highly accurate gun with extreme precision; it places the projectile on the target every time in the same hole, but it’s pointed at the shooting bay next to yours, not your target. That is systematic error.
William, you’re dead-on right. And we can augment your great shooter analogy with systematic error by adding the notion that serious hand tremors (uncontrolled variables) strongly compound the targeting error.
Will – your result is only valid for unsystematic error. Systematic error or bias cannot be averaged out – it needs to be corrected for and that is where the uncertainty lies.
Exactly right, Bob Ryan.
And given that the air temperatures are historical, none of the contaminating systematic error in the past temperature record can be known or corrected out.
Will, My key takeaway from the article is that the errors of thermometers are being treated as your gaussian-distribution experiment: as symmetrical, “random” noise that will cancel out more and more as we collect more and more samples. To the extent that I understand the article, he shows that the errors are not symmetrical and are not “random”, but are in fact systemically biased.
So you are correct that many measurements can be used to more accurately determine a value than any individual measurement can. Under the right conditions, and given that the measurements/equipment/environment are consistent with a particular kind of error. The problem is, outdoor temperature measurements aren’t consistent with that kind of error.
For example, look at Figure 10. The bucket measurements are fairly symmetrical. The engine intake measurements are not symmetrical, so your nice gaussian simulation isn’t applicable.
Did you notice Figure 5? The instrument error there is not Gaussian.
You can’t average out these failings.
If you could then just taking an anomaly would fix the problem.
That is the biggest error I have seen: assuming error is normal without testing to prove the assumption. Non-normal errors do not average out but add a systematic error to the process every time! The errors shown in this report are all skewed with long right tails, meaning they all add a warm bias that is still there after a million readings.
Wayne, M Courtney, Owen, you’ve all got it exactly right. And given the publication record, and the invariable assumption therein of random measurement error, you all understand something that is apparently completely lost to the professionals in the field.
Measuring the same thing 100 times with the same, accurate, measuring device is not the same as measuring it with one hundred different, accurate, devices.
Yes indeed, measure the diameter of a drilled hole with calipers, a coordinate measuring machine, dedicated bore scope, a set of gauge pins, and a yard stick and you will get five different answers.
And now, just for fun, think about the fact that there are at least 1250 tide gauges around the world and we are treated to scientists telling us they know the rate of sea level rise to a tenth of a millimeter per year.
This sounds similar to the law of large numbers, where averaging allows you to narrow the error band below the inherent accuracy of the instrument. However, that doesn’t apply here. In order to apply, you have to measure the same thing in space and time, not different things in different places at different times. Proper application: go to the hardware store and look at the display of outdoor thermometers. Quickly record all 200 readings. Now you can use the law of large numbers. Improper application: go to the hardware store and look at the display of outdoor thermometers. Record ONE reading. Repeat over the course of a year. Now you CAN’T use the law of large numbers. Which situation is more like the attempt to measure “global” temperature?
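A sketch of the two hardware-store scenarios (all numbers invented for illustration): averaging 200 simultaneous readings of one temperature beats down random instrument error, while one reading a day for a year averages different true values and passes any fixed bias straight through.

```python
import numpy as np

rng = np.random.default_rng(2)

# Proper use: 200 thermometers read once, same place, same moment.
# Averaging beats down each instrument's random error around one true value.
true_now = 21.3
same_thing = true_now + rng.normal(0.0, 0.5, 200)
print(f"one moment, 200 readings: mean = {same_thing.mean():.2f} C")

# Improper use: one thermometer read once a day for a year. The average
# is an average of different true values, and a fixed 0.3 C warm bias
# passes straight through into it, untouched by the number of readings.
days = np.arange(365)
daily_truth = 15.0 + 10.0 * np.sin(2 * np.pi * days / 365)
one_per_day = daily_truth + 0.3 + rng.normal(0.0, 0.5, 365)
print(f"bias surviving the annual mean: {one_per_day.mean() - daily_truth.mean():.2f} C")
```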
In addition, your experiment ignores the finding that the error distributions are NOT Gaussian as your test assumes. Look at the Winter errors for MMTS and the daytime errors for the Young PRT above.
Finally, your test assumes that there are no complex systemic errors, only a simple offset error.
Uh look at the data.
The systematic error is not Gaussian around the mean.
Oh – let’s take a thermometer with a 1 degree C accuracy. Let’s take 10,000 measurements – voila, we know the temperature to 1/100 degree C accuracy.
Not on my planet.
I could see this argument if you made the measurements of the exact same thing at the exact same time, as if one took a thousand thermometers and measured the temperature in one place at the same time. That is not what we have here. They’re taking thousands of measurements from different places on the earth and claiming they can use the same technique to get a very accurate measure of the Earth’s temperature, but what they have is thousands of individual measurements of different things, while claiming they are of the same thing.
James Schrumpf, I’ve published a paper on systematic error in the temperature record that discusses the cases you raise.
It’s open access.
In short, so long as thermometer error is random, measurement error will decrease in the mean of measurements from different thermometers. But as soon as systematic error enters, all bets are off.
I would accept that your experiment is valid for random errors. I believe it is important to understand that temperature monitoring includes a natural bias toward higher readings due to the fact that all extraneous heat must be excluded from the sensor to get a good reading. One cannot remove more than all extraneous heat so that aspect of error is eliminated. Beyond instrument error and calibration error, this fact is the constant inherent bias in temperature readings that makes it different than most other readings and causes readings to show upside error preferentially.
William Handler, first, your analysis is correct only if error is randomly distributed.
Second, accuracy is determined by calibration experiments. The random error approach to a better mean is strictly true for precision, but not for accuracy.
That is, repetition of inaccurate measurements that are also contaminated with random error will converge to a true mean. But that mean will remain inaccurate.
The systematic error discussed in my essay derives from uncontrolled environmental variables, mostly wind speed and irradiance. The induced error is variable in time and in space, is not randomly distributed, and does not average away. Systematic error may even get larger with repeated measurements.
Thanks Pat, makes sense. I read all the other replies as well!
Thanks right back, William.
I had trouble getting this one across to people who bow to the law of large numbers. You can get the centre of a dart board to the nearest millimetre if you throw a million darts perfectly randomly and measure them to the nearest millimetre. If you are measuring where the darts hit to the nearest metre then it’s pointless. You can’t tell if it’s perfectly random or there is a systematic error, and you most likely have a systematic error because the resolution is so poor.
Excuse the systematic spelling errors.
Suppose you can measure to an accuracy of +/-0.1 degree (random) PLUS a bias of one degree high. No amount of averaging will do anything to that bias. How could it.
“Suppose you can measure to an accuracy of +/-0.1 degree (random) PLUS a bias of one degree high. No amount of averaging will do anything to that bias. How could it.”
The day-to-day change in min temp removes that bias, but you don’t get an absolute value. But for this issue (CO2 warming) you don’t need the absolute value, you really want the change. And if an absolute value is required, you can get that from an accurate station and use that value to start tracking the day-to-day change from.
“But for this issue (CO2 warming) you don’t need the absolute value, you really want the change. And if an absolute value is required, you can get that from an accurate station and use that value to start tracking the day-to-day change from.”
You are assuming that the systematic error is stationary and/or Gaussian. Since, as the article explains, the systematic error is caused primarily by environmental conditions, do we expect those conditions to stay the same, get better, or get worse with time?
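The disagreement here turns on whether the bias is fixed. A short sketch (all values invented): day-to-day differencing cancels a constant offset exactly, but a bias that drifts with environmental conditions survives differencing and leaks a spurious trend into the record.

```python
import numpy as np

rng = np.random.default_rng(4)
days = np.arange(365)
true_min = 10.0 + 8.0 * np.sin(2 * np.pi * days / 365)  # invented true daily mins

# Constant 1.0 C warm bias: day-to-day differences are unaffected by it.
measured = true_min + 1.0 + rng.normal(0.0, 0.1, 365)
diff_error = np.diff(measured) - np.diff(true_min)
print(f"mean difference error with fixed bias: {diff_error.mean():+.3f} C")

# Bias drifting from 0 to 1 C over the year (e.g., changing environment):
# the drift is not cancelled and accumulates as a spurious trend.
drifting = true_min + np.linspace(0.0, 1.0, 365) + rng.normal(0.0, 0.1, 365)
trend = np.polyfit(days, drifting - true_min, 1)[0] * 365.0
print(f"spurious trend from drifting bias: {trend:+.2f} C/yr")
```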
I came up with my own method that reduces these kinds of systematics to the smallest value possible.
I was interested in how fast it cools. How fast does it cool at sunset, through the blanket of CO2 that was supposed to be our death? I knew after spending many evenings setting up my telescope that it cools fast. I theorized that if CO2 were causing warming, and we couldn’t tell that it had slowed the fastest cooling rates the planet sees every day, then CO2 was ineffective, no matter what a jar in a lab or a made-up global temperature series says.
So I look at the difference between today’s min temp and this afternoon’s max temp, and between that same max and tomorrow morning’s min temp.
Now, what kinds of systematics will a station see in one 24-hour cycle?
1) Slow changes in everything: grass growing, trees growing, things getting dirty. Things on a slow cycle that eventually go away will be removed from future 24-hour cycles by an opposite-signed event of likely the same magnitude, and over a year they will drop out of the overall average.
2) Sharp events: the parking lot gets paved with asphalt and there will be a shift. But a week later, what is the difference in the cycles? I’m not looking at the temp, I look at how much it changed. Two weeks ago it changed about 18F per day on average, and this week it’s going to change near 18F per day; it will be at a higher temp, and that 24-hour cycle will have a bump in its derivative (which is basically what I’m calculating, since I have a constant cycle time). That bump will go into the average for a year’s worth of that station. But by winter, that asphalt is as cold as everything else. Same with an instrument error: if a sensor goes bad it’s removed, and if a station doesn’t produce enough days per year I don’t include it. If it reads high, it still drops the excess error by the next morning, so the station’s rate of warming is higher than it should be, but so is its rate of cooling; it averages out. I can also take advantage of the slow change as the length of day changes: for each station I track how fast the temp changes as the amount of energy applied changes.
If I were designing an experiment to understand the thermal response of a complex system, I’d apply a given amount of energy and measure the response, and then the next period I’d increase it a little and measure that.
You might be able to determine its dynamic response, and whether CO2 was altering it.
And the answer is: if it is doing anything, it’s barely visible, and there are other processes altering the regional climates that are orders of magnitude larger than what CO2 is capable of doing.
All of the kvetching of the last 30 years is from the oceans, land use, and the Sun, not CO2. It’s obvious.
But you’ll never see it when you look at GAT.
https://micro6500blog.wordpress.com/2015/11/18/evidence-against-warming-from-carbon-dioxide/
micro6500, taking anomalies accomplishes nothing when the systematic error is non-normal and varies with every measurement.
“…taking anomalies accomplishes nothing when the systematic error is non-normal and varies with every measurement.”
Can you provide an example of this type of error?
Are you referring to measurement uncertainty? That I agree is not removed in the method I use.
But that’s why I ask for an example, so I understand.
micro6500, all of the land surface temperature errors in my article are examples of that sort of error.
The errors vary with wind speed and irradiance, which vary in time and space.
“The errors vary with wind speed and irradiance, which vary in time and space.”
Ok, thanks.
So, with a methodology trying to resolve a unique value, I agree with you: solving for an anomaly or averaging imprints the error into the data.
But I’m not solving for a temperature, I’m solving for a derivative based on the min and max temps. When trying to resolve mean temp to a hundredth or thousandth of a degree, the wind makes an important difference, depressing say the max temp on one day, which alters the daily mean. But the way I use it, the wind would depress both the rate of warming and the rate of cooling equally, and when you subtract the two rates, the wind didn’t change anything.
In other cases, with any stack-up of errors, as best as I can think of, maybe I get a low difference-rate value, but it has to return back to normal, and then I get the reverse sign, and it averages away.
micro6500, assuming the wind effect always changes sign identically in warming and cooling is exactly the same assumption as that all measurement error is random.
That is, you’re making the same assumption as the compilers of the published temperature record.
There is no physical reason to think the wind effect auto-cancels. Nor the effect of irradiance.
Pat, it cancels because the sample is at the min or max.
Look, max can be artificially high or low, same with the following min, but if it is, it’s no different than weather; it has to return at some point to the local macro climate for that time of year.
And you can see that if you take the derivative of daily max, its annual average is a few hundredths of a degree; it averages out over a year. Min, however, doesn’t. Go look if you haven’t, and then look at the daily rate of change during the year; this is a strong signal based on the change in length of day.
https://micro6500blog.wordpress.com/2015/11/18/evidence-against-warming-from-carbon-dioxide/
Wow, what a great exposition. I spent a few years gathering data and studying the effects of temperature on linewidth control in ICs during the 80s, and beat the process down to ±0.1°F control to achieve a linewidth of 3 sigma = ±0.1 micron.
As a sarcastic remark regarding the long term temperature records and the modern ‘adjustments’ to them, I once observed that people are getting taller. Therefore, a fixed height thermometer would be consistently under-read in the past and over-read today. Thus spuriously explaining the ‘adjusting’ to lower, past records, and raising modern ones.
[The opposite though? People are taller now, and so reading the fixed height thermometer from a higher relative point of the top of the mercury column. .mod]
You’re right, not enough coffee yet!!
With one or two exceptions. I have to get a ladder just to see the bulb, let alone where the top of the mercury is in the tube.
Beautiful piece of work!! Instrument accuracy, measurement precision, siting biases and retroactive “fixing” of the data have always left me feeling that the instrumental record tells us nothing about historic global average temperature.
I think I understand why you say the central limit theorem doesn’t apply, but that doesn’t rule out that the average is more accurate than the individual measurements, just that you can’t prove it with that. Could you explain please?
The average (mean) is more precise than the individual measurements, not necessarily more accurate. For some purposes — and measuring “global temperature change” may be one of them — precision is good enough.
For others like evaluating basic physical equations that have temperature terms, you need accuracy and the fact that you know an incorrect T value to several decimal places isn’t going to improve your (probably wrong) answer.
Every measurement includes a systematic and random error component. The random errors from many measurements will form a normal Gaussian distribution and the systematic error will shift (or offset) that distribution away from the true value. Averaging would only move the measurement closer to the true value plus the systematic error component (i.e. reduce the magnitude of the random error). This may sound beneficial at first but systematic error is often a few orders of magnitude greater than random error. You may use repeatability of measurement as a measure of accuracy but you are only confirming the precision of the measurement and not its absolute accuracy.
What is described here is not really systematic error, which is constant. This should mean that the trend of world temperatures is correct but the absolute value of the baseline is not. This would still confirm global warming. However, if temperature sensors have a tendency to drift higher over time, then the measurement bias would be toward higher temperatures. But this article doesn’t really address this.
Precision and accuracy are very different. The average could be more precise, but not more accurate than the individual measurements. However, precision would only increase with more measurements if the distribution of the measurement values were a Gaussian distribution.
Exactly right, isthatright. 🙂
Andrew, given accurate measurements with only random error, then the mean of many measurements has improved accuracy.
Systematic error from uncontrolled variables does not reduce in a mean, because it’s not normally distributed.
I am stunned. As an engineering student, measurement and the error associated with it were very important in all our experimental reports. It always seemed to me that the published temperature records had errors that were too small. But I never delved into how the measurements were made or processed. Thank you for a clear and compelling analysis!
You’re well-trained, MikeC, and in a position to speak out about the total neglect of error in consensus climatology.
Maybe as a layman I just don’t understand, but (and I asked about this in a recent Bob Tisdale topic) why is it that over land we measure the air temperature about 1 meter above the surface, while over the oceans we measure the temperature of the ocean surface? Is there an exact correlation between the ocean surface temperature and the air temperature 1 meter directly above it? If it isn’t exact, then it would seem to me that this is another area where an uncertainty bias is being recorded and ignored.
Added to what is being discussed, the “Global Average Surface Temperature” becomes even more uncertain.
The ocean ‘temperature’ is a real problem. The surface skin temperature can remain incredibly stable while the temperature 6 inches down can easily vary 20˚ depending on the solar insolation or lack of. Pretty much the opposite problem of measuring the actual surface temperature of a driveway compared to the air.
No.
https://www.researchgate.net/profile/Akiyoshi_Wada/publication/225649603_Diurnal_sea_surface_temperature_variation_and_its_impact_on_the_atmosphere_and_ocean_A_Review/links/0a85e539f0d1e1c144000000.pdf
Toneb – “No.”
Did you read your reference? “While the diurnal thermocline vanishes by sunrise next morning, the skin layer usually exists in both the daytime and nighttime, even in windy conditions.”
Diurnal sea surface temperature variation and its impact on the atmosphere and ocean: A Review. Available from: https://www.researchgate.net/publication/225649603_Diurnal_sea_surface_temperature_variation_and_its_impact_on_the_atmosphere_and_ocean_A_Review [accessed Apr 19, 2016].
Jinghis:
Yes I did read it thanks.
Try reading the post I replied to.
The “no” was in reference to his absurd 20 deg variance in temp 6 ins down.
Toneb
Ahh sorry, I see we are on the same page.
JohnWho,
When I was serving on a weather observing cargo ship the sea temperature was taken anything from 15 to 30 feet below the sea surface, because that is where the engine cooling water intake was.
My question still remains:
Is there a direct relationship between the SST and the air temperature 1 meter above the sea surface?
“Oddly” is right, Bob. And yet, it passed peer review. And so it goes, these days.
JohnWho, the marine air temperatures are not used in part because they represent different (unknown) ship heights, wind speeds, and irradiance.
So, SSTs are used on the pretty good assumption that the air immediately above the sea surface is at pretty much the same temperature as the sea surface.
The problem is, of course, as jinghis has noted, that SSTs were not necessarily measured at the sea surface.
Pat Frank says: “JohnWho, the marine air temperatures are not used in part because they represent different (unknown) ship heights, wind speeds, and irradiance.”
I’m commenting on the “…marine air temperatures are not used…” part of that sentence. Oddly, NOAA used night marine air temperatures (an inferior dataset based on the quantity of observations) to bias adjust sea surface temperature data.
“Pat Frank
April 19, 2016 at 9:49 pm
JohnWho, the marine air temperatures are not used in part because they represent different (unknown) ship heights, wind speeds, and irradiance.
So, SSTs are used on the pretty good assumption that the air immediately above the sea surface is at pretty much the same temperature as the sea surface.”
“Pretty much the same temperature” plus/minus how much?
Seems this is just one more of the systematic errors not being addressed.
Especially if a GASTA is being determined by mixing surface station air temperatures with sea surface temperatures.
“In graphical presentations of measurement or calculational data, systematic error is represented using uncertainty bars. [1]” Is this correct? I usually think of error bars as representing random rather than systematic error. If the systematic error were known, it would be corrected.
Systematic error does not refer only to simple offset error. Look at the results for the Young PRT again. Notice that the daytime errors are a range, and non-Gaussian to boot. The size of the error is different for each measurement. You’d have to construct an algorithm that takes into account the time of measurement and the distribution of the error at that time of day, assuming that, based on TOD, the errors were in fact normally distributed.
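[Sketched in Python, such an algorithm might look like the following. The calibration residuals here are synthetic stand-ins (a skew-right gamma for daytime, a small Gaussian for night), not the actual Young PRT data; real residuals would come from side-by-side field calibration against an aspirated reference sensor.]

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic calibration residuals (sensor minus reference) by hour of day:
# skew-right, irradiance-like warm error in daytime; small scatter at night.
calib = {}
for hour in range(24):
    if 8 <= hour <= 18:
        calib[hour] = rng.gamma(shape=2.0, scale=0.2, size=500)  # daytime
    else:
        calib[hour] = rng.normal(0.0, 0.05, size=500)            # night

def correct_reading(temp_c, hour):
    """Remove the hour-of-day mean bias; report the leftover spread
    (non-Gaussian in daytime) as the residual uncertainty."""
    errs = calib[hour]
    return temp_c - errs.mean(), errs.std(ddof=1)

corrected, spread = correct_reading(25.3, 14)
print(f"corrected: {corrected:.2f} C  +/- {spread:.2f} C")
```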
seaice1, typically systematic error is estimated by doing calibration experiments under the experimental (observational) conditions. The average of the error is calculated as the usual root-mean-square and then the “±average uncertainty” is appended to every experimental (observational) result.
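[In code, the procedure Pat describes amounts to a single line; a minimal sketch, assuming `residuals` holds hypothetical calibration errors (sensor minus reference):]

```python
import numpy as np

# Hypothetical calibration residuals (sensor reading minus reference), deg C
residuals = np.array([0.21, -0.08, 0.35, 0.12, -0.19, 0.28, 0.05, 0.31])

# Root-mean-square of the calibration errors; this value is then
# appended as "+/-" to every subsequent field measurement.
rms = np.sqrt(np.mean(residuals**2))
print(f"+/- {rms:.2f} C")
```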
“Resolution limits and systematic measurement error produced by the instrument itself constitute lower limits of uncertainty. The scientists engaged in consensus climatology have neglected both of them.”
This sums up all of experimental climate science. I have yet to see a data plot with error bars of any kind in this field unlike say, physics, where every data plot includes them.
Mean sea level and sea ice extent are also prime candidates for a serious examination of measurement errors. Anyone who’s spent an hour at the ocean and seen how the wind, the currents, and the tides affect the surface has to wonder how they get 1 mm or better precision for sea level.
Yep, over 1250 tide gauges around the world, and satellites that measure the radar reflections from the bottom of wave troughs.
Boggles the mind it does.
In some ways this provides justification for the treatment of seawater temperatures in Karl et al 2015.
The whole field is rife with meaningless measurements so one more won’t matter.
Man, I missed a good conference. I can imagine the acrimony flying about at this presentation. I am surprised the speaker made it out of the venue to publish this as those who truly believe tend to get a bit violent in their rhetoric (if not in an actual physical sense – though with the AGs prosecuting “heretics” it is getting there). All the conferences I attend are fairly staid with nary a raised voice and lots of pats on the back.
Indeed, Owen (in GA). My first thought upon seeing what group it was that Dr. Frank presented his excellent-as-usual findings to was, “that took guts.” The spirit of Dr. William Gray (and John Daly and Bob Carter and Hal Lewis, et al.) is alive and well. Well done, Dr. Frank! We are proud of you.
Thank-you, Janice. 🙂 If I may reassure you, though, the World Federation of Scientists is not at all like the Union of Concerned Scientists. The WFS is, well, actually concerned with dispassionate, objective science.
Owen, most of the audience members were polite and interested, some were very accepting of the points made, and only a few were upset. So, it wasn’t dangerous.
The email debate was a bit fraught, though, and could have had an unpleasant personal fallout if it had gone the other way.
Compare that to the previous 30 years without that error, or with a different error, and you have artificially created an anomaly.
But here we are discussing thousands of instruments subject to numerous changes over a long time period, often with instrument changes at various locations. I do not think the systematic error can be so easily dismissed.
Now we see why the NOAA NCEI and Karl et al 2015 were so eager to adjust the buoy and Argo float SST record up to match the ship engine intakes. There is a +0.5 C average error in the ship engine intakes.
Here are the problems I have found with the ARGO buoy system; actually, un-intended or ignored biases.
1. The ARGO Sales Brochure and calibration sheet (Dec 8 handout) claim it is calibrated against a TTS (Temperature Transfer Standard). It is not calibrated against an actual Triple Point of Water (TPW) bath and a Gallium Melt Point (GMP) bath, or even boiling water at standard temperature and pressure. A TTS is an ultra-high-accuracy resistor that can be used to provide the EXACT resistance of various temperature sensors (e.g. RTD, PRT, etc.). It does not “magically” produce the exact temperature specified (think oven or refrigerator). The TTS is hooked up to the electronics with wires (calibrated leads); thus the temperature probe is not in the loop.
2. All of the data about accuracy is for the electronics only. PERIOD. It does not include, or even state, the accuracy of the sensor, or how it is affected by pressure (depth) or temperature. They just claim the sensors are highly accurate and fast. (See #1)
3. Also not included are the effects of ambient temperature on the electronic equipment. All of the SBE equipment that I have seen data sheets for will repeatedly and flawlessly provide the data they profess on the data sheets when, and only WHEN, operated in a laboratory environment. They do not provide any data as to what happens to the equipment or electronics under different ambient conditions. Read them yourself; it is not there. Real scientific instruments provide this data. Look through a Fisher Scientific, Omega Engineering, or other scientific instrument supplier catalog and the ambient data is given for the better equipment (or will be with a phone call). Why isn’t SBE providing this data?
4. Where is the data for the PRT sensor? Are they telling us that every one is exactly the same, and provides exactly the same resistance as every other PRT they make, with exactly the same curve and readings at each of the ONLY two reference points at which they have calibrated this super-expensive boondoggle? I have only one word for that claim – B…… Again, why is this data missing?
5. Also missing are the effects of the probe (the enclosure surrounding the “highly accurate” PRT temperature sensor). It is designed to withstand test depth plus some unknown margin (not specified in their “Sales Brochure”), or it would leak. That means the temperature of the ocean must transfer through to the PRT, and that there is a gap between the two surfaces. That gap causes four (readily apparent) things: 1. it decreases the speed; 2. it causes latency, because the probe itself must also change temperature; 3. the biggest problem, it causes a difference in temperature; 4. with cyclical pressure increases/decreases the gap will grow, aggravating the condition. (I have seen it happen many times.) This is the reason you calibrate the equipment with a real triple-point bath, etc. But this is EXPENSIVE, VERY EXPENSIVE. I have done it. You could buy a house, or at least a very nice car, for the price of calibrating the entire set of sensors in each probe. That is why they use a TTS, etc.
6. As explained earlier, all of the “Sales Brochures” indicate accuracy that would be obtainable only if the equipment were used in laboratory conditions (an environmentally controlled area, with at least temperature, humidity, and pressure held to the conditions of calibration, ± a few degrees). Equipment like this (this expensive) can show a difference in displayed value of over 1-2% when subjected to a temperature 100 degrees different from the factory calibration ambient. The ARGO probes are, from my understanding, subject to about a 50 degree F change from bottom to top of travel. That tells me you will get about a 1% error that they neither discuss nor deny.
7. Most of the above is also applicable to the surface thermometers. The electronics are stuck in a small shelter that will be at the same temperature as the “measured” temperature. This means the electronics are no longer at the same temperature as they were when calibrated.
Wow.
Having personally experienced the difficulties of weighing industrial ceramic components in the green state to four decimal places, I can appreciate what you are saying. If the same measured conditions were not maintained in the lab, it played havoc with the finished product.
Excellent description of errors and calibration. In addition, almost any instrument will drift over time. Users should be aware of the magnitude of the drift and use that data to determine calibration frequencies.
usurbrain, you’re right. The accuracy and precision statements given in sensor brochures are only for ideal laboratory conditions.
Hence the need for field calibrations, as you so thoroughly explain. Field calibrations of meteorological air and SS temperature sensors are rare birds. This is almost criminal negligence, given the huge value of the decisions that are based on them.
Thanks for the insights about TTS calibration. I wasn’t aware of that.
Pat Frank, Thanks.
Not saying they do not exist; however, with all the information I found on the internet, I never found a “Calibration Curve” for the RTD (high-accuracy resistor) cited or offered. This, after spending more than a week looking (I am retired and spent more than 8-hour days looking). I worked with precision RTDs (0.1%) and all came with a calibration curve providing the exact resistance at 5 specified points. I would use these to adjust the function curve for the electronics, rather than using the standard “Alpha” curve. [“Alpha” is the slope of the resistance between 0°C and 100°C. This is also referred to as the temperature coefficient of resistance, with the most common being 0.00385 Ω/Ω/°C.]

I then “field calibrated” them at five points, two of which were the triple point and the boiling point at STP/elevation for water; the other three usually were “within tolerance.” The only thing I ever used the TTS for was an “is it broke” test. These were also three-wire RTDs, so that the lead/termination/junction resistance temperature effects could be compensated for. I could find no evidence of this being done. Again, if used in a laboratory with a controlled environment there is no need for that; the leads and junctions are not going to see a temperature change. Look at any precision calibration facility: on the wall you will see a thermometer, barometer, humidity gauge, and other important environmental gauges, usually recording, or connected to a computer, nowadays.
In a nutshell, this all looks like an expensive scam: expensive “laboratory” equipment used, electronically speaking, for a purpose and in an environment it was not designed for. Or, at least, done so by some graduate assistants who had the technical/book knowledge but none of the real-world practical experience.
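[For readers who haven’t met the “Alpha” curve usurbrain describes above, here is a minimal sketch of the textbook conversion for a nominal 100 Ω PRT; the resistance reading is hypothetical, and a real multi-point calibration would replace the single slope with a fitted curve.]

```python
# Standard "Alpha" approximation for a nominal Pt100 PRT:
#   R(T) = R0 * (1 + alpha * T), alpha = 0.00385 ohm/ohm/degC.
# A five-point calibration, as described above, replaces this
# single slope with a fitted curve; this is only the default.

R0 = 100.0       # resistance at 0 degC, ohms
ALPHA = 0.00385  # temperature coefficient of resistance, per degC

def temp_from_resistance(r_ohms):
    """Invert the linear alpha curve: T = (R/R0 - 1) / alpha."""
    return (r_ohms / R0 - 1.0) / ALPHA

print(f"{temp_from_resistance(109.73):.2f} degC")  # about 25.3 degC
```

[Note the leverage of lead resistance here: at 0.00385 Ω/Ω/°C, an uncompensated 0.1 Ω in the leads reads as roughly 0.26 °C of spurious temperature, which is why the three-wire compensation usurbrain mentions matters.]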
usurbrain, those are all extremely salient criticisms, and seem centrally important.
I’d wonder if there were a way you could write them up analytically, and submit that to an instrumentation journal.
The problem is that the error isn’t always 1 degree too high. It has a skew-right distribution that is not normal about some consistent high point. Your accuracy will never be better than the error bars in the skew. The law of averages ONLY applies to NORMAL DISTRIBUTIONS.
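[A quick numerical check of hunter’s point; the error distribution below is a hypothetical lognormal, standing in for a skew-right instrumental error. The average of many readings does converge, but to the true value plus the skew’s nonzero mean, never to the true value itself.]

```python
import numpy as np

rng = np.random.default_rng(2)

true_temp = 20.0
# Hypothetical skew-right error (e.g. solar heating of the shield):
# rarely below zero, occasionally far above it.
errors = rng.lognormal(mean=-1.0, sigma=0.8, size=100_000)

readings = true_temp + errors
print(f"mean of readings: {readings.mean():.3f}")
print(f"residual bias:    {readings.mean() - true_temp:+.3f}")
# The mean converges, but to true_temp plus the skew's mean
# (about exp(-1 + 0.8**2 / 2) ~ 0.51 C), not to true_temp.
```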
Only problem is that we no longer use computer systems for 10 years (how old is your PC?); over 100 years, even the NWS will replace equipment several times.
This is pitiful. The Alarmists completely ignore their own surface temperature instruments’ shortcomings, but bug the poor purveyors of satellite temperature data to death over their potential instrument errors.
At least we have the satellites. We should get more of them up there. The more, the merrier.
Pat
Many thanks for putting this together. On the occasions that I comment here and on Bishop Hill it seems like I’m a broken record when something so obvious and normal to experimenters and engineers is ignored in favour of blind theory.
Your Figure 12 (right) could be captioned “I’ll see your global warming and raise you a London Bus,” such is the size of those error bars.
Thanks, mickyh. Investigating the whole AGW thing has been like entering quicksand. The deeper I got into it, the more it sucked me in. That essay is part of the result of that struggle.
Re: “More recently, Argo buoys were field calibrated against very accurate CTD (conductivity-temperature-depth) measurements and exhibited average RMS errors of ±0.56 C.”
I thought that only the cool argo results were the ones that needed tossing. (sarc)
At least that is what I learned from the words of the master argo data tosser:
“First, I identified some new Argo floats that were giving bad data; they were too cool compared to other sources of data during the time period. It wasn’t a large number of floats, but the data were bad enough, so that when I tossed them, most of the cooling went away. But there was still a little bit, so I kept digging and digging.”
http://earthobservatory.nasa.gov/Features/OceanCooling/
Obviously, being a scientist, Willis must have then also conducted the same shambolic hand cherry-picking exercise but assuming that he had a preferential bias to discover argo floats that were “too WARM compared to other sources of data during the time period” – and tossing them instead.
Oh, no wait a minute – nope, he obviously ran out of time and then forgot what it was that he was supposed to be doing. Never mind.
preferential bias ?
I’d call it a political bias, but then what do I know.
Ad hominems against Willis?
No reason? Just your typical assumed errors?
For the sake of clarification. My comment refers to Josh Willis.
The character at NASA, who is charitably depicted in my link.
Not Willis E. – the author of the O.P.
I think that may explain where our wires got crossed.
Since, I am certainly not presenting an ad-hom.
indefatigablefrog:
This comment refers to Josh? You mean that the sentence should start with “Josh must have…”, not Willis?
indefatigablefrog:
OK, I’ve got it now. You mean Josh Willis, keeper of the earth observatory.
Whenever I read Willis, I think Willis Eschenbach. Implicit versus explicit logic drives me crazy.
You have my apology for misreading and confusing your post.
I actually like Earthobservatory and have had a few discussions back and forth. To me it is a beautiful programming artifact, and one that someday may also present accurate data.
Right now, they push the NASA/NOAA line that the world is on the edge of climate-change disaster.
I do wish that Earthobservatory would actually identify what is modeled versus what is real-data visualization, e.g. Earthobservatory uses the NASA/NOAA modeled CO2 data instead of pulling actual data right from the satellite databases.
Temperature, Sea surface temperatures, etc are all the NOAA highly processed muck, not actual data.
OMG, I can’t believe NOAA would even publish such a thing. It’s scientific rape. Willis basically went out and looked for any reason he could think up to warm the record.
Sorry NASA, but that is even worse. That NASA could think that such an analysis is valid is baffling.
I’ve been saying for years that when you add together the resolution limits of the instruments, the lack of basic site and sensor maintenance, and the lack of spatial coverage, the true error bars for the ground-based system are around 5 C for modern measurements, and they increase as you go backwards in time.
“The first quality control test for any given station record includes a statistical check for correlation with temperature series among near-by stations.”
Wrong.
Depending on the source there are other QC tests that are performed first. Many in fact.
And depending on the processing correlation can be performed on a variety of metrics.
And in some cases (to test the importance of this decision) you can drop this testing altogether.
QC or no QC the answer is the same: Its warming. There was an LIA
Mosher,
Figure 12 says you are wrong. We don’t actually know what the temperatures were, so we don’t know how they have changed. This is why the phrase “Climate Scientists” should always be inside quotation marks…
A tour de force presentation. The author brings a Big Bertha cannon argument and all Mosher can do is attempt to use a spit-clogged pea shooter. That is the wimpiest non-rebuttal rebuttal ever by the Oz. This tells me the post deserves an A+. I enjoyed it very much.
Bookmarked under the ever growing list of reasons to, at the very least, have some doubts about the consensus view.
Steven Mosher April 19, 2016 at 7:11 am
Steven, always good to hear from you. However, I think you might have missed the word “includes” in the statement that “the first test … includes a statistical check for correlation ….”.
I know that in my own work, if I’m looking at a new dataset, comparing it to other nearby measurements is included in my list of early tests.
Steven, “QC” is a red herring; it has little to do with what Pat is saying. Pat’s claim is that you have greatly underestimated the true error in your Berkeley Earth results. As a response to Pat’s claim, your comment is … well … unresponsive.
However, given that you never did reply to my pointing out other issues with the Berkeley Earth methods in my post Problems With The Scalpel Method, while your response is unresponsive, it is not unrepresentative.
Zeke Hausfather did reply, he’s good about that, saying:
Zeke Hausfather June 29, 2014 at 12:01 pm
So he says that
a) they haven’t actually tested their algorithm by e.g. adding synthetic sawtooth anomalies to real data, and
b) even if the error is there, it would be removed because the station would diverge from its neighbors over time.
Perhaps … and perhaps not. The point of relevance to this discussion is that there is additional uncertainty introduced by the use of the scalpel method, and I don’t see that being accounted for in the Berkeley Earth error estimates.
Best regards,
w.
Thank-you, Willis. 🙂
“Steven Mosher
April 19, 2016 at 7:11 am
…
Its warming. There was an LIA”
Better might be “It has probably warmed. There was an LIA”, leaving the question regarding the amount of warming, and whether it continues, both scientifically debatable and, based on the quality of the historical atmospheric temperature data, probably unknowable.
Well, the last half of that is correct. Normal warming since the last LIA.
“It’s almost as though none of them have ever made a measurement or struggled with an instrument. There is no other rational explanation for that sort of negligence than a profound ignorance of experimental methods.”
I put an FOI request in to the UEA to ask them what quality standard they used (e.g. ISO 9000). As expected, they didn’t have any quality standard. Likewise, I doubt any of them have ever actually made any serious measurements in real-life situations – let alone had thousands of sensors that all needed calibrating.
That is an utterly astounding admission.
Anybody know if ARGO has fixed its digital “trust” problem? Based on the technical information that was published after the start of this program, there is no digital proof that a given set of measurements came from a given probe, at a given place, at a given time. They instead trust the person(s) who collect the raw data from the buoys — and publish it. This failure to secure the real world to digital world interface would be totally unacceptable in any trusted system. The methods and technology to do this were well understood at the time this system was designed (e.g. your cell phone/network does it every time you make a call) so this is pretty inexcusable.
Any digital reporting protocol I designed would include who I am, where I think I am, what time I think it is, how long since I last reported, my internal state ( battery level, temperature, humidity, vibration ) and so on before I got as far as sending the data. And that’s before putting serious thought into it. Where was the ARGO technical information published?
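[As a sketch of the kind of protocol hotrod describes, here is a minimal signed-report scheme using a per-probe shared key and an HMAC. This is purely illustrative: it is not anything the ARGO program actually implements, and the field names and key are made up.]

```python
import hashlib, hmac, json, time

PROBE_KEY = b"per-probe secret, provisioned at manufacture"  # hypothetical

def build_report(probe_id, lat, lon, temp_c, battery_v):
    """Assemble a self-describing report and sign it, so the shore
    station can verify which probe sent it and that nothing changed."""
    payload = {
        "probe_id": probe_id,
        "lat": lat, "lon": lon,      # where I think I am
        "utc": int(time.time()),     # when I think it is
        "battery_v": battery_v,      # internal state
        "temp_c": temp_c,            # the measurement itself
    }
    msg = json.dumps(payload, sort_keys=True).encode()
    payload["hmac"] = hmac.new(PROBE_KEY, msg, hashlib.sha256).hexdigest()
    return payload

def verify_report(report):
    report = dict(report)            # don't mutate the caller's copy
    sig = report.pop("hmac")
    msg = json.dumps(report, sort_keys=True).encode()
    good = hmac.new(PROBE_KEY, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, good)

r = build_report("ARGO-0001", 35.2, -140.7, 14.83, 12.6)
print(verify_report(r))  # True unless the report was tampered with
```

[hmac.compare_digest avoids timing leaks when checking the signature; a real deployment would also need per-probe key provisioning and replay protection, but even this much would bind each measurement to a specific probe, place, and time.]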
Good discussion of instrument errors. The existence of non-Gaussian (non-random) errors seems to be an example of the old parable about the drunk looking under the streetlight for the item he dropped in the dark area farther down the block; treating errors as random is so much easier to deal with mathematically.
You’re right, Tom. Accepting that the measurement error is non-random and systematic would leave them with nothing to talk about. Where’s the fun in that? 🙂