Guest essay by Pat Frank
Presented at World Federation of Scientists, Erice, Sicily, 20 August 2015
This is a version of the talk I gave about uncertainty in the global average air temperature record at the 48th Conference of the World Federation of Scientists on “Planetary Emergencies and Other Events,” at Erice, Sicily, in August of 2015.
It was a very interesting conference and, as an aside, for me the take home message was that the short-term emergency is Islamic violence while the long-term emergency is some large-scale bolide coming down. Please, however, do not distract conversation into these topics.
Abstract: I had a longer abstract, but here’s the short form. Those compiling the global averaged surface air temperature record have not only ignored systematic measurement error, but have even neglected the detection limits of the instruments themselves. Since at least 1860, thermometer accuracy has been magicked out of thin air. Also since then, and at the 95% confidence interval, the rate or magnitude of the global rise in surface air temperature is unknowable. Current arguments about air temperature and its unprecedentedness are speculative theology.
1. Introduction: systematic error
Systematic error enters into experimental or observational results through uncontrolled and often cryptic deterministic processes.  These can be as simple as a consistent operator error. More typically, error emerges from an uncontrolled experimental variable or instrumental inaccuracy. Instrumental inaccuracy arises from malfunction or lack of calibration. Uncontrolled variables can impact the magnitude of a measurement and/or change the course of an experiment. Figure 1 shows the impact of an uncontrolled variable, taken from my own published work. [2, 3]
Figure 1: Left, titration of dissolved ferrous iron under conditions that allowed an unplanned trace of air to enter the experiment. Inset: the incorrect data precisely followed equilibrium thermodynamics. Right, the same experiment but with the appropriately strict exclusion of air. The data are completely different. Inset: the correct data reflect distinctly different thermodynamics.
Figure 1 shows that the inadvertent entry of a trace of air was enough to completely change the course of the experiment. Nevertheless, the erroneous data display coherent behavior and follow a trajectory completely consistent with equilibrium thermodynamics. To all appearances, the experiment was completely valid. In isolation, the data are convincing. However, they are completely wrong because the intruded air chemically modified the iron.
Figure 1 exemplifies the danger of systematic error. Contaminated experimental or observational results can look and behave just like good data, and can rigorously follow valid physical theory. Without care, such data invite erroneous conclusions.
By its nature, systematic error is difficult to detect and remove. Methods of elimination include careful instrumental calibration under conditions identical to the observation or experiment. Methodologically independent experiments that access the same phenomena provide a check on the results. Careful attention to these practices is standard in the experimental physical sciences.
The recent development of a new and highly accurate atomic clock illustrates the extreme care physicists take to eliminate systematic error. Critical to achieving its 10^-18 second accuracy was the removal of systematic error produced by the black-body radiation of the instrument itself. 
Figure 2: Close-up picture of the new atomic clock. The timing element is a cluster of fluorescing strontium atoms trapped in an optical lattice. Thermal noise is removed using data provided by a sensor that measures the black-body temperature of the instrument.
As a final word, systematic error does not average away with repeated measurements. Repetition can even increase error. When systematic error cannot be eliminated and is known to be present, uncertainty statements must be reported along with the data. In graphical presentations of measurement or calculational data, systematic error is represented using uncertainty bars.  Those uncertainty bars communicate the reliability of the result.
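The claim that repetition does not remove systematic error can be illustrated with a small simulation. This is a minimal sketch and not part of the talk: the true value, the 0.5-unit bias, and the noise level are made-up numbers, and the bias is assumed constant for simplicity, which the field errors discussed below are not.

```python
import random
import statistics

def mean_error(n, true_value=10.0, bias=0.5, noise_sd=1.0, seed=0):
    """Error of the mean of n simulated readings (all parameters hypothetical)."""
    rng = random.Random(seed)
    # Each reading = true value + constant systematic bias + random noise.
    readings = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(readings) - true_value

for n in (10, 1_000, 100_000):
    # The random part shrinks roughly as 1/sqrt(n); the bias does not shrink.
    print(n, round(mean_error(n), 3))
```

As n grows, the error of the mean settles near the assumed bias rather than near zero, which is the behavior described above.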
2. Systematic Error in Surface Temperature Measurements
2.1. Land Surface Air Temperature
During most of the 20th century, land surface air temperatures were measured using a liquid-in-glass (LiG) thermometer housed in a box-like louvered shield (Stevenson screen or Cotton Regional Shelter (CRS)). [5, 6] After about 1985, thermistors or platinum resistance thermometers (PRT) housed in an unaspirated cylindrical plastic shield replaced the CRS/LiG sensors in Europe, the Anglo-Pacific countries, and the US. Beginning in 2000, the US Climate Reference Network deployed sensors consisting of a trio of PRTs in an aspirated shield. [5, 7-9] An aspirated shield includes a small fan or impeller that ventilates the interior of the shield with outside air.
Unaspirated sensors rely on prevailing wind for ventilation. Solar radiance can heat the sensor shield, warming the interior atmosphere around the sensor. In the winter, upward radiance from the albedo of a snow-covered surface can also produce a warm bias.  Significant systematic measurement error occurs when air movement is less than 5 m/sec. [9, 11]
Figure 3: Alpine Plaine Morte Glacier, Switzerland, showing the air temperature sensor calibration experiment carried out by Huwald, et al., in 2007 and 2008.  Insets: close-ups of the PRT and the sonic anemometer sensors. Photo credit: Bou-Zeid, Martinet, Huwald, Couach, 2.2006 EPFL-ENAC.
In 2007 and 2008 calibration experiments carried out on the Plaine Morte Glacier (Figure 3) tested the field accuracy of the RM Young PRT housed in an unaspirated louvered shield, situated over a snow-covered surface. In a laboratory setting, the RM Young sensor is capable of ±0.1 C accuracy. Field accuracy was determined by comparison with air temperatures measured using a sonic anemometer, which takes advantage of the impact of temperature on the speed of sound in air and is insensitive to irradiance and wind-speed.
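The dependence of sound speed on air temperature that a sonic anemometer exploits can be sketched with the ideal-gas relation c = sqrt(γRT/M). This is only an illustration under stated assumptions: dry air, textbook constants, and hypothetical function names. A real sonic anemometer reports a "sonic" temperature that also depends on humidity, and nothing here reproduces the Huwald et al. processing.

```python
import math

GAMMA = 1.4        # ratio of specific heats for dry air (assumed)
R = 8.314462618    # molar gas constant, J/(mol K)
M_AIR = 0.0289647  # molar mass of dry air, kg/mol (assumed)

def speed_of_sound(temp_c):
    """Ideal-gas speed of sound (m/s) in dry air at temp_c degrees Celsius."""
    return math.sqrt(GAMMA * R * (temp_c + 273.15) / M_AIR)

def temperature_from_sound_speed(c):
    """Invert the relation: air temperature (C) from a measured sound speed (m/s)."""
    return c * c * M_AIR / (GAMMA * R) - 273.15

c0 = speed_of_sound(0.0)
print(round(c0, 1))                                   # about 331 m/s at 0 C
print(round(temperature_from_sound_speed(c0 + 1.0), 2))  # effect of a 1 m/s change
```

Because the measurement depends on the sound transit time rather than on a solid sensor element reaching thermal equilibrium with the air, radiative heating of a shield does not enter this relation.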
Figure 4: Temperature trends recorded simultaneously on Plaine Morte Glacier during February – April 2007 by the sonic anemometer and by the RM Young PRT probe.
Figure 4 shows that under identical environmental conditions, the RM Young probe recorded significantly warmer winter air temperatures than the sonic anemometer. The slope of the RM Young temperature trend is also more than 3 times greater. Referenced against a common mean, the RM Young error would enter a spurious warming trend into a global temperature average. The larger significance of this result is that the RM Young probe is very similar in design and response to the more advanced temperature probes in use world-wide since about 1985.
Figure 5 shows a histogram of the systematic temperature error exhibited by the RM Young probe.
Figure 5. RM Young probe systematic error on Plaine Morte Glacier. Day time error averages 2.0±1.4 C; night-time error averages 0.03±0.32 C.
The RM Young systematic errors mean that, absent an independent calibration instrument, any given daily mean temperature has an associated 1σ uncertainty of 1±1.4 C. Figure 5 shows this uncertainty is neither randomly distributed nor constant. It cannot be removed by averaging individual measurements or by taking anomalies. Subtracting the average bias will not remove the non-normal 1σ uncertainty. Entry of the RM Young station temperature record into a global average will carry that average error along with it.
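The statement that subtracting the average bias leaves the non-normal uncertainty in place can be checked numerically. The sketch below uses a made-up, skewed (exponential) error distribution as a stand-in; it is not the Plaine Morte data, and the helper name `skewness` is hypothetical.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical warm-biased, skewed errors in C (exponential, mean 1.0).
errors = [rng.expovariate(1.0) for _ in range(50_000)]

mean_bias = statistics.fmean(errors)
residuals = [e - mean_bias for e in errors]  # "bias-corrected" errors

def skewness(xs):
    """Population skewness: zero for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

print("spread before:", round(statistics.pstdev(errors), 3))
print("spread after: ", round(statistics.pstdev(residuals), 3))
print("skewness after:", round(skewness(residuals), 2))
```

Removing the mean shifts the distribution to zero but changes neither its width nor its asymmetry.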
Before inclusion in a global average, temperature series from individual meteorological stations are subjected to statistical tests for data quality.  Air temperatures are known to show correlation R = 0.5 over distances of about 1200 km. [14, 15] The first quality control test for any given station record includes a statistical check for correlation with temperature series among near-by stations. Figure 6 shows that the RM Young error-contaminated temperature series will pass this most basic quality control test. Further, the erroneous RM Young record will pass every single statistical test used for the quality control of meteorological station temperature records worldwide. [16, 17]
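Why an error-contaminated series can still pass a correlation check may be easier to see with a toy example. This is a sketch under invented assumptions: the "true" series, the 1 C offset, and the variable warm error are all synthetic, and the correlation function is a plain Pearson r, not the actual quality-control procedure of any data center.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2)
# Synthetic "true" daily temperatures (C): seasonal cycle plus weather noise.
true_t = [-5.0 + 8.0 * math.sin(2 * math.pi * d / 365) + rng.gauss(0, 2)
          for d in range(365)]
# Synthetic contaminated record: constant warm offset plus a variable warm error.
biased_t = [t + 1.0 + abs(rng.gauss(0, 1.0)) for t in true_t]

r = pearson_r(true_t, biased_t)
mean_error = sum(b - t for t, b in zip(true_t, biased_t)) / len(true_t)
print("correlation:", round(r, 3))
print("mean error (C):", round(mean_error, 2))
```

The contaminated series tracks the true one closely enough to give a very high correlation while carrying a mean warm error well above 1 C, so a correlation test alone cannot flag it.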
Figure 6: Correlation of the RM Young PRT temperature measurements with those of the sonic anemometer. Inset: Figure 1a from  showing correlation of temperature records from meteorological stations in the terrestrial 65-70º N, 0-5º E grid. The 0.5 correlation length is 1.4×10^3 km.
Figure 7: Calibration experiment at the University of Nebraska, Lincoln (ref. , Figure 1); E, MMTS shield; F, CRS shield; G, the aspirated RM Young reference.
Figure 7 shows the screen-type calibration experiment at the University of Nebraska, Lincoln. Each screen contained the identical HMP45C PRT sensor.  The calibration reference temperatures were provided by an aspirated RM Young PRT probe, rated as accurate to <±0.2 C below 1100 W m^-2 solar irradiance.
These independent calibration experiments tested the impact of a variety of commonly used screens on the fidelity of air temperature measurements from PRT probes. [10, 11, 18] Screens included the traditional Cotton Regional Shelter (CRS, Stevenson screen), and the MMTS screen now in common use in the US Historical Climatology Network, among others.
Figure 8: Average systematic measurement error of an HMP45C PRT probe within an MMTS shelter over a grass (top) or snow-covered (bottom) surface. [10, 11]
Figure 8, top, shows the average systematic measurement error an MMTS shield imposed on a PRT temperature probe, found during the calibration experiment displayed in Figure 7.  Figure 8, bottom, shows the results of an independent PRT/MMTS calibration over a snow-covered surface.  The average annual systematic uncertainty produced by the MMTS shield can be estimated from these data as 1σ = 0.32±0.23 C. The skewed warm-bias distribution of error over snow is similar in magnitude to the unaspirated RM Young shield in the Plaine Morte experiment (Figure 5).
Figure 9 shows the average systematic measurement error produced by a PRT probe inside a traditional CRS shield. 
Figure 9. Average day-night 1σ = 0.44 ± 0.41 C systematic measurement error produced by a PRT temperature probe within a traditional CRS shelter.
The warm bias in the data is apparent, as is the non-normal distribution of error. The systematic uncertainty from the CRS shelter was 1σ = 0.44 ± 0.41 C. The HMP45C PRT probe is at least as accurate as the traditional LiG thermometers housed within the CRS shield. [19, 20] The PRT/CRS experiment may then estimate a lower limit of systematic measurement uncertainty present in the land-surface temperature record covering all of the 19th and most of the 20th century.
2.2. Sea-Surface Temperature
Although considerable effort has been expended to understand sea-surface temperatures (SSTs), [21-28] there have been very few field calibration experiments of sea-surface temperature sensors. Bucket- and steamship engine cooling-water intake thermometers provided the bulk of early and mid-20th century SST measurements. Sensors mounted on drifting and moored buoys have come into increasing use since about 1980, and now dominate SST measurements.  Attention is focused on calibration studies of these instruments.
The series of experiments reported by Charles Brooks in 1926 are by far the most comprehensive field calibrations of bucket and engine-intake thermometer SST measurements carried out by any individual scientist.  Figure 10 presents typical examples of the systematic error in bucket and engine intake SSTs that Brooks found.
Figure 10: Systematic measurement error in one set of engine-intake (left) and bucket (right) sea-surface temperatures reported by Brooks. 
Brooks also recruited an officer to monitor the ship-board measurements after he concluded his experiments and disembarked. The errors after he had departed the ship were about twice as large as they were when he was aboard. The simplest explanation is that care deteriorated, perhaps back to normal, when no one was looking. This result violates the standard assumption in the field that temperature sensor errors are constant for each ship.
In 1963 Saur reported the largest field calibration experiment of engine-intake thermometers, carried out by volunteers aboard twelve US military transport ships engaged off the US central Pacific coast.  The experiment included 6826 pairs of observations. Figure 11 shows the experimental results from one voyage of one ship.
Figure 11: Systematic error in recorded engine intake temperatures aboard one military transport ship operating June-July, 1959. The mean systematic bias and uncertainty represented by these data are 1σ = 0.9±0.6 C.
Saur reported Figure 11 as, “a typical distribution of the differences” reported from the various ships. The ±0.6 C uncertainty about the mean systematic error is comparable to the values reported by Brooks, shown in Figure 10.
Saur concluded his report by noting that, “The average bias of reported sea water temperatures as compared to sea surface temperatures, with 95 percent confidence limits, is estimated to be 1.2±0.6 F [0.67±0.33 C] on the basis of a sample of 12 ships. The standard deviation of differences [between ships] is estimated to be 1.6 F [0.9 C]. Thus, without improved quality control the sea temperature data reported currently and in the past are for the most part adequate only for general climatological studies. [bracketed conversions added]” Saur’s caution is instructive, but has apparently been mislaid by consensus scientists.
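The bracketed conversions in the quotation apply to temperature differences, so only the 5/9 scale factor is used and the 32-degree offset drops out. A small check of that arithmetic (the function name is hypothetical):

```python
def f_interval_to_c(delta_f):
    """Convert a temperature difference (not an absolute reading) from F to C."""
    return delta_f * 5.0 / 9.0

for label, delta_f in [("mean bias", 1.2),
                       ("95% confidence limit", 0.6),
                       ("between-ship standard deviation", 1.6)]:
    print(f"{label}: {delta_f} F = {f_interval_to_c(delta_f):.2f} C")
```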
Measurements from bathythermograph (BT) and expendable bathythermograph (XBT) instruments have also made significant contributions to the SST record.  Extensive BT and XBT calibration experiments revealed multiple sources of systematic error, principally stemming from mechanical problems and calibration errors. [33-35] Relative to a reversing thermometer standard, field BT measurements exhibited ±σ = 0.34±0.43 C error.  This standard deviation is more than twice as large as the manufacturer-stated accuracy of ±0.2 C and reflects the impact of uncontrolled field variables.
The SST sensors in deployed floating and moored buoys were never field-calibrated during the 20th century, allowing no general estimate of systematic measurement error.
However, Emery estimated a 1σ = ±0.3 C error by comparison of SSTs from floating buoys co-located to within 5 km of each other.  SST measurements separated by less than 10 km are considered coincident.
A similar ±0.26 C buoy error magnitude was found relative to SSTs retrieved from the Advanced Along-Track Scanning Radiometer (AATSR) satellite.  The error distributions were non-normal.
More recently, Argo buoys were field calibrated against very accurate CTD (conductivity-temperature-depth) measurements and exhibited average RMS errors of ±0.56 C.  This is similar in magnitude to the reported average ±0.58 C buoy-Advanced Microwave Scanning Radiometer (AMSR) satellite SST difference. 
Until recently, [39, 40] systematic temperature sensor measurement errors were neither mentioned in reports communicating the origin, assessment, and calculation of the global averaged surface air temperature record, nor were they included in error analysis. [15, 16, 39-46] Even after the recent arrival of systematic errors in the published literature, the Central Limit Theorem is adduced to assert that they average to zero.  However, systematic temperature sensor errors are neither randomly distributed nor constant over time, space, or instrument. There is no theoretical reason to expect that these errors follow the Central Limit Theorem, [47, 48] or that such errors are reduced or removed by averaging multiple measurements, even when measurements number in the millions. A complete inventory of contributions to uncertainty in the surface air temperature record must include, indeed must start with, the systematic measurement error of the temperature sensor itself. 
The World Meteorological Organization (WMO) offers useful advice regarding systematic error. 
“Section 1.6.4.2.3 Estimating the true value – additional remarks.
“In practice, observations contain both random and systematic errors. In every case, the observed mean value has to be corrected for the systematic error insofar as it is known. When doing this, the estimate of the true value remains inaccurate because of the random errors as indicated by the expressions and because of any unknown component of the systematic error. Limits should be set to the uncertainty of the systematic error and should be added to those for random errors to obtain the overall uncertainty. However, unless the uncertainty of the systematic error can be expressed in probability terms and combined suitably with the random error, the level of confidence is not known. It is desirable, therefore, that the systematic error be fully determined.”
Thus far, in production of the global averaged surface air temperature record, the WMO advice concerning systematic error has been followed primarily in the breach.
Systematic sensor error in air and sea-surface temperature measurements has been woefully under-explored and field calibrations are few. Nevertheless, the reported cases make it clear that the surface air temperature record is contaminated with a very significant level of systematic measurement error. The non-normality of systematic error means that subtracting an average bias will not discharge the measurement uncertainty about the global temperature mean.
Further, the magnitude of the systematic error bias in surface air temperature and SST measurements is apparently as variable in time and space as the magnitude of the standard deviation of systematic uncertainty about the mean error bias. For example, the mean systematic bias error was 2 C over snow on the Plaine Morte Glacier, Switzerland, but was 0.4 C over snow at Lincoln, Nebraska. Similar differences accrue to the engine-intake systematic error means reported by Brooks and Saur. Therefore, removing an estimate of mean bias will always leave the magnitude ambiguity of the residual mean bias uncertainty. In any complete evaluation of error, the residual uncertainty in mean bias will combine with the 1σ standard deviation of measurement uncertainty into the uncertainty total.
A complete evaluation of systematic error is beyond the analysis presented here. However, to the extent that the above errors are representative, a set of estimated uncertainty bars due to systematic error in the global averaged surface air temperature record can be calculated, Figure 12.
The uncertainty bars in Figure 12 (right) reflect a 0.7:0.3 SST:land surface ratio of systematic errors. Combined in quadrature, bucket and engine-intake errors constitute the SST uncertainty prior to 1990. Over the same time interval, the systematic error of the PRT/CRS sensor [39, 49] constituted the uncertainty in land-surface temperatures. Floating buoys made a partial contribution (0.25 fraction) to the uncertainty in SST between 1980 and 1990. After 1990, uncertainty bars are further steadily reduced, reflecting the increasing contribution and smaller errors of MMTS (land) and floating buoy (sea surface) sensors.
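The two operations named above, combination in quadrature and a 0.7:0.3 sea:land weighting, can be sketched as follows. This is an illustration only and is not claimed to reproduce Figure 12: the text does not state whether the 0.7:0.3 weighting was applied linearly or in quadrature (a linear weighting is assumed here), the bucket value is a placeholder set equal to the Saur engine-intake figure, and the function names are hypothetical.

```python
import math

def quadrature(*sigmas):
    """Root-sum-square combination of uncertainty components."""
    return math.sqrt(sum(s * s for s in sigmas))

def weighted_linear(sst, land, w_sst=0.7, w_land=0.3):
    """Area-weighted combination; linear weighting is an assumption."""
    return w_sst * sst + w_land * land

bucket = 0.6     # C, placeholder: taken as comparable to the Saur value
intake = 0.6     # C, spread quoted above for Saur's engine-intake data
land_crs = 0.44  # C, PRT/CRS value quoted above

sst = quadrature(bucket, intake)
print("SST component (C):", round(sst, 2))
print("weighted total (C):", round(weighted_linear(sst, land_crs), 2))
```

Swapping in a root-sum-square weighting, or different component values, changes the total; the sketch only shows the mechanics of the two steps.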
Figure 12: The 2010 global average surface air temperature record obtained from the website of the Climatic Research Unit (CRU), University of East Anglia, UK. http://www.cru.uea.ac.uk/cru/data/temperature/. Left, error bars following the description provided at the CRU website. Right, error bars reflecting the uncertainty width due to estimated systematic sensor measurement errors within the land and sea surface records. See the text for further discussion.
Figure 12 (right) is very likely a more accurate representation of the state of knowledge than is Figure 12 (left), concerning the rate or magnitude of change in the global averaged surface air temperature since 1850. The revised uncertainty bars represent non-normal systematic error. Therefore the air temperature mean trend loses any status as the most probable trend.
Finally, Figure 13 pays attention to the instrumental resolution of the historical meteorological thermometers.
Figure 13 caused some angry shouts from the audience at Erice, followed by some very rude approaches after the talk, and a lovely debate by email. The argument presented here prevailed.
Instrumental resolution defines the measurement detection limit. For example, the best-case historical 19th to mid-20th century liquid-in-glass (LiG) meteorological thermometers included 1 C graduations. The best-case laboratory-conditions reportable temperature resolution is therefore ±0.25 C. There can be no dispute about that.
The standard SST bucket LiG thermometers from the Challenger voyage on through the 20th century also had 1 C graduations. The same resolution limit applies.
The very best American ship-board engine-intake thermometers included 2 F (~1 C) graduations; on British ships they were 2 C. The very best resolution is then about ±(0.25 – 0.5) C. These are known quantities. Resolution uncertainty, like systematic error, does not average away. Knowing the detection limits of the classes of instruments allows us to estimate the limit of resolution uncertainty in any compiled historical surface air temperature record.
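The resolution figures quoted above are consistent with taking one quarter of the smallest graduation as the best-case reading limit. That quarter-graduation rule is inferred here from the ±0.25 C stated for 1 C graduations; it is an assumption of this sketch, as are the function and constant names.

```python
F_TO_C = 5.0 / 9.0  # scale factor for temperature intervals, F to C

def resolution_limit(graduation_c, fraction=0.25):
    """Best-case reading resolution (C) for a scale with the given graduation."""
    return graduation_c * fraction

print(resolution_limit(1.0))                     # 1 C graduations (land, bucket)
print(round(resolution_limit(2.0 * F_TO_C), 2))  # 2 F graduations (US engine intake)
print(resolution_limit(2.0))                     # 2 C graduations (British engine intake)
```

The three cases span roughly 0.25 C to 0.5 C, matching the range given in the text.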
Figure 13 shows this limit of resolution. It compares the instrumental historical ±2σ resolution with the ±2σ uncertainty in the published Berkeley Earth air temperature compilation. The analysis applies equally well to the published surface air temperature compilations of GISS or CRU/UKMet, which feature the same uncertainty limits.
Figure 13: The Berkeley Earth global averaged air temperature trend with the published ±2σ uncertainty limits in grey. The time-wise ±2σ instrumental resolution is in red. On the right in blue is a compilation of the best resolution limits of the historical temperature sensors, from which the global resolution limits were calculated.
The globally combined instrumental resolution was calculated using the same fractional contributions as were noted above for the lower limit estimate of systematic measurement error. That is, 0.30:0.70, land : sea surface instruments, and the published historical fractional use of each sort of instrument (land: CRS vs. MMTS, and; SS: buckets vs. engine intakes vs. buoys).
The record shows that during the years 1800-1860, the published global uncertainty limits of field meteorological temperatures equal the accuracy of the best possible laboratory-conditions measurements.
After about 1860 through 2000, the published resolution is smaller than the detection limits (the resolution limits) of the instruments themselves. From at least 1860, accuracy has been magicked out of thin air.
Does anyone find the published uncertainties credible?
All you engineers and experimental scientists out there may go into shock after reading this. I was certainly shocked by the realization. Espresso helps.
The people compiling the global instrumental record have neglected an experimental limit even more basic than systematic measurement error: the detection limits of their instruments. They have paid no attention to it.
Resolution limits and systematic measurement error produced by the instrument itself constitute lower limits of uncertainty. The scientists engaged in consensus climatology have neglected both of them.
It’s almost as though none of them have ever made a measurement or struggled with an instrument. There is no other rational explanation for that sort of negligence than a profound ignorance of experimental methods.
The uncertainty estimate developed here shows that the rate or magnitude of change in global air temperature since 1850 cannot be known within ±1 C prior to 1980 or within ±0.6 C after 1990, at the 95% confidence interval.
The rate and magnitude of temperature change since 1850 is literally unknowable. There is no support at all for any “unprecedented” in the surface air temperature record.
Claims of highest air temperature ever, based on even 0.5 C differences, are utterly insupportable and without any meaning.
All of the debates about highest air temperature are no better than theological arguments about the ineffable. They are, as William F. Buckley called them, “Tedious speculations about the inherently unknowable.”
There is no support in the temperature record for any emergency concerning climate. Except, perhaps, an emergency in the apparent competence of AGW-consensus climate scientists.
4. Acknowledgements: Prof. Hendrik Huwald and Dr. Marc Parlange, Ecole Polytechnique Federale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland, are thanked for generously providing the Plaine Morte sensor calibration data entering into Figure 4, Figure 5, and Figure 6. This work was carried out without any external funding.
[1] JCGM, Evaluation of measurement data - Guide to the expression of uncertainty in measurement 100:2008, Bureau International des Poids et Mesures: Sevres, France.
[2] Frank, P., et al., Determination of ligand binding constants for the iron-molybdenum cofactor of nitrogenase: monomers, multimers, and cooperative behavior. J. Biol. Inorg. Chem., 2001. 6(7): p. 683-697.
[3] Frank, P. and K.O. Hodgson, Cooperativity and intermediates in the equilibrium reactions of Fe(II,III) with ethanethiolate in N-methylformamide solution. J. Biol. Inorg. Chem., 2005. 10(4): p. 373-382.
[4] Hinkley, N., et al., An Atomic Clock with 10^-18 Instability. Science, 2013. 341: p. 1215-1218.
[5] Parker, D.E., et al., Interdecadal changes of surface temperature since the late nineteenth century. J. Geophys. Res., 1994. 99(D7): p. 14373-14399.
[6] Quayle, R.G., et al., Effects of Recent Thermometer Changes in the Cooperative Station Network. Bull. Amer. Met. Soc., 1991. 72(11): p. 1718-1723; doi: 10.1175/1520-0477(1991)072<1718:EORTCI>2.0.CO;2.
[7] Hubbard, K.G., X. Lin, and C.B. Baker, On the USCRN Temperature system. J. Atmos. Ocean. Technol., 2005. 22: p. 1095-1101.
[8] van der Meulen, J.P. and T. Brandsma, Thermometer screen intercomparison in De Bilt (The Netherlands), Part I: Understanding the weather-dependent temperature differences. International Journal of Climatology, 2008. 28(3): p. 371-387.
[9] Barnett, A., D.B. Hatton, and D.W. Jones, Recent Changes in Thermometer Screen Design and Their Impact, in Instruments and Observing Methods WMO Report No. 66, J. Kruus, Editor. 1998, World Meteorological Organization: Geneva.
[10] Lin, X., K.G. Hubbard, and C.B. Baker, Surface Air Temperature Records Biased by Snow-Covered Surface. Int. J. Climatol., 2005. 25: p. 1223-1236; doi: 10.1002/joc.1184.
[11] Hubbard, K.G. and X. Lin, Realtime data filtering models for air temperature measurements. Geophys. Res. Lett., 2002. 29(10): p. 1425 1-4; doi: 10.1029/2001GL013191.
[12] Huwald, H., et al., Albedo effect on radiative errors in air temperature measurements. Water Resources Res., 2009. 45: p. W08431; 1-13.
[13] Menne, M.J. and C.N. Williams, Homogenization of Temperature Series via Pairwise Comparisons. J. Climate, 2009. 22(7): p. 1700-1717.
[14] Briffa, K.R. and P.D. Jones, Global surface air temperature variations during the twentieth century: Part 2, implications for large-scale high-frequency palaeoclimatic studies. The Holocene, 1993. 3(1): p. 77-88.
[15] Hansen, J. and S. Lebedeff, Global Trends of Measured Surface Air Temperature. J. Geophys. Res., 1987. 92(D11): p. 13345-13372.
[16] Brohan, P., et al., Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850. J. Geophys. Res., 2006. 111: p. D12106 1-21; doi:10.1029/2005JD006548; see http://www.cru.uea.ac.uk/cru/info/warming/.
[17] Karl, T.R., et al., The Recent Climate Record: What it Can and Cannot Tell Us. Rev. Geophys., 1989. 27(3): p. 405-430.
[18] Hubbard, K.G., X. Lin, and E.A. Walter-Shea, The Effectiveness of the ASOS, MMTS, Gill, and CRS Air Temperature Radiation Shields. J. Atmos. Oceanic Technol., 2001. 18(6): p. 851-864.
[19] MacHattie, L.B., Radiation Screens for Air Temperature Measurement. Ecology, 1965. 46(4): p. 533-538.
[20] Rüedi, I., WMO Guide to Meteorological Instruments and Methods of Observation: WMO-8 Part I: Measurement of Meteorological Variables, 7th Ed., Chapter 1. 2006, World Meteorological Organization: Geneva.
[21] Berry, D.I. and E.C. Kent, Air–Sea fluxes from ICOADS: the construction of a new gridded dataset with uncertainty estimates. International Journal of Climatology, 2011: p. 987-1001.
[22] Challenor, P.G. and D.J.T. Carter, On the Accuracy of Monthly Means. J. Atmos. Oceanic Technol., 1994. 11(5): p. 1425-1430.
[23] Kent, E.C. and D.I. Berry, Quantifying random measurement errors in Voluntary Observing Ships’ meteorological observations. Int. J. Climatol., 2005. 25(7): p. 843-856; doi: 10.1002/joc.1167.
[24] Kent, E.C. and P.G. Challenor, Toward Estimating Climatic Trends in SST. Part II: Random Errors. Journal of Atmospheric and Oceanic Technology, 2006. 23(3): p. 476-486.
[25] Kent, E.C., et al., The Accuracy of Voluntary Observing Ships’ Meteorological Observations-Results of the VSOP-NA. J. Atmos. Oceanic Technol., 1993. 10(4): p. 591-608.
[26] Rayner, N.A., et al., Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century. Journal of Geophysical Research-Atmospheres, 2003. 108(D14).
[27] Emery, W.J. and D. Baldwin. In situ calibration of satellite sea surface temperature. in Geoscience and Remote Sensing Symposium, 1999. IGARSS ’99 Proceedings. IEEE 1999 International. 1999.
[28] Emery, W.J., et al., Accuracy of in situ sea surface temperatures used to calibrate infrared satellite measurements. J. Geophys. Res., 2001. 106(C2): p. 2387-2405.
[29] Woodruff, S.D., et al., The Evolving SST Record from ICOADS, in Climate Variability and Extremes during the Past 100 Years, S. Brönnimann, et al. eds, 2007, Springer: Netherlands, pp. 65-83.
[30] Brooks, C.F., Observing Water-Surface Temperatures at Sea. Monthly Weather Review, 1926. 54(6): p. 241-253.
[31] Saur, J.F.T., A Study of the Quality of Sea Water Temperatures Reported in Logs of Ships’ Weather Observations. J. Appl. Meteorol., 1963. 2(3): p. 417-425.
[32] Barnett, T.P., Long-Term Trends in Surface Temperature over the Oceans. Monthly Weather Review, 1984. 112(2): p. 303-312.
[33] Anderson, E.R., Expendable bathythermograph (XBT) accuracy studies; NOSC TR 550 1980, Naval Ocean Systems Center: San Diego, CA. p. 201.
[34] Bralove, A.L. and E.I. Williams Jr., A Study of the Errors of the Bathythermograph 1952, National Scientific Laboratories, Inc.: Washington, DC.
[35] Hazelworth, J.B., Quantitative Analysis of Some Bathythermograph Errors 1966, U.S. Naval Oceanographic Office Washington DC.
[36] Kennedy, J.J., R.O. Smith, and N.A. Rayner, Using AATSR data to assess the quality of in situ sea-surface temperature observations for climate studies. Remote Sensing of Environment, 2012. 116(0): p. 79-92.
[37] Hadfield, R.E., et al., On the accuracy of North Atlantic temperature and heat storage fields from Argo. J. Geophys. Res.: Oceans, 2007. 112(C1): p. C01009.
[38] Castro, S.L., G.A. Wick, and W.J. Emery, Evaluation of the relative performance of sea surface temperature measurements from different types of drifting and moored buoys using satellite-derived reference products. J. Geophys. Res.: Oceans, 2012. 117(C2): p. C02029.
[39] Frank, P., Uncertainty in the Global Average Surface Air Temperature Index: A Representative Lower Limit. Energy & Environment, 2010. 21(8): p. 969-989.
[40] Frank, P., Imposed and Neglected Uncertainty in the Global Average Surface Air Temperature Index. Energy & Environment, 2011. 22(4): p. 407-424.
[41] Hansen, J., et al., GISS analysis of surface temperature change. J. Geophys. Res., 1999. 104(D24): p. 30997–31022.
[42] Hansen, J., et al., Global Surface Temperature Change. Rev. Geophys., 2010. 48(4): p. RG4004 1-29.
[43] Jones, P.D., et al., Surface Air Temperature and its Changes Over the Past 150 Years. Rev. Geophys., 1999. 37(2): p. 173-199.
[44] Jones, P.D. and T.M.L. Wigley, Corrections to pre-1941 SST measurements for studies of long-term changes in SSTs, in Proc. Int. COADS Workshop, H.F. Diaz, K. Wolter, and S.D. Woodruff, Editors. 1992, NOAA Environmental Research Laboratories: Boulder, CO. p. 227–237.
[45] Jones, P.D. and T.M.L. Wigley, Estimation of global temperature trends: what’s important and what isn’t. Climatic Change, 2010. 100(1): p. 59-69.
[46] Jones, P.D., T.M.L. Wigley, and P.B. Wright, Global temperature variations between 1861 and 1984. Nature, 1986. 322(6078): p. 430-434.
[47] Emery, W.J. and R.E. Thomson, Data Analysis Methods in Physical Oceanography. 2nd ed. 2004, Amsterdam: Elsevier.
[48] Frank, P., Negligence, Non-Science, and Consensus Climatology. Energy & Environment, 2015. 26(3): p. 391-416.
[49] Folland, C.K., et al., Global Temperature Change and its Uncertainties Since 1861. Geophys. Res. Lett., 2001. 28(13): p. 2621-2624.