There is a new paper out that investigates an issue that, as far as the author knows, has not previously been well addressed in the surface temperature record: sensor measurement uncertainty. The author has derived a lower limit for the uncertainty in the instrumental surface temperature record.

UNCERTAINTY IN THE GLOBAL AVERAGE SURFACE AIR TEMPERATURE INDEX: A REPRESENTATIVE LOWER LIMIT
Patrick Frank, Palo Alto, CA 94301-2436, USA, Energy and Environment, Volume 21, Number 8 / December 2010 DOI: 10.1260/0958-305X.21.8.969
Abstract
Sensor measurement uncertainty has never been fully considered in prior appraisals of global average surface air temperature. The estimated average ±0.2 C station error has been incorrectly assessed as random, and the systematic error from uncontrolled variables has been invariably neglected. The systematic errors in measurements from three ideally sited and maintained temperature sensors are calculated herein. Combined with the ±0.2 C average station error, a representative lower-limit uncertainty of ±0.46 C was found for any global annual surface air temperature anomaly. This ±0.46 C reveals that the global surface air temperature anomaly trend from 1880 through 2000 is statistically indistinguishable from 0 C, and represents a lower limit of calibration uncertainty for climate models and for any prospective physically justifiable proxy reconstruction of paleo-temperature. The rate and magnitude of 20th century warming are thus unknowable, and suggestions of an unprecedented trend in 20th century global air temperature are unsustainable.
INTRODUCTION
The rate and magnitude of climate warming over the last century are of intense and
continuing international concern and research [1, 2]. Published assessments of the
sources of uncertainty in the global surface air temperature record have focused on
station moves, spatial inhomogeneity of surface stations, instrumental changes, and
land-use changes including urban growth.
However, reviews of surface station data quality and time series adjustments, used
to support an estimated uncertainty of about ±0.2 C in a centennial global average
surface air temperature anomaly of about +0.7 C, have not properly addressed
measurement noise and have never addressed the uncontrolled environmental
variables that impact sensor field resolution [3-11]. Field resolution refers to the ability
of a sensor to discriminate among similar temperatures, given environmental exposure
and the various sources of instrumental error.
In their recent estimate of global average surface air temperature and its uncertainties,
Brohan, et al. [11], hereinafter B06, evaluated measurement noise as discountable,
writing, “The random error in a single thermometer reading is about 0.2 C (1σ) [Folland, et al., 2001] ([12]); the monthly average will be based on at least two readings a day throughout the month, giving 60 or more values contributing to the mean. So the error in the monthly average will be at most 0.2/sqrt(60) = 0.03 C and this will be uncorrelated with the value for any other station or the value for any other month.”
Paragraph [29] of B06 rationalizes this statistical approach by describing monthly surface station temperature records as consisting of a constant mean plus weather noise, thus: “The station temperature in each month during the normal period can be considered as the sum of two components: a constant station normal value (C) and a random weather value (w, with standard deviation σi).” This description, plus the use of a 1/sqrt(60) reduction in measurement noise, together indicate a signal-averaging statistical approach to monthly temperature.
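A minimal simulation sketch (an illustration, not from the paper) of that signal-averaging model, assuming, as B06 does, that each reading carries an independent, zero-mean 0.2 C (1σ) error; this is exactly the assumption the paper goes on to challenge:

```python
# Sketch of the B06 signal-averaging model: a constant monthly normal plus
# random weather, where every reading carries an independent, zero-mean
# 0.2 C (1-sigma) measurement error. Under that assumption, the error in
# the monthly mean shrinks as 1/sqrt(N).
import numpy as np

rng = np.random.default_rng(0)
n_readings = 60        # two readings a day for a month
sigma_read = 0.2       # assumed random error per reading, C
n_months = 100_000     # simulated months

errors = rng.normal(0.0, sigma_read, size=(n_months, n_readings))
monthly_mean_error = errors.mean(axis=1)

# Standard deviation of the monthly-mean error: ~0.026 C, i.e. 0.2/sqrt(60)
print(monthly_mean_error.std())
```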
…
I and the volunteers get some mention:
The quality of individual surface stations is perhaps best surveyed in the US by way of the commendably excellent independent evaluations carried out by Anthony Watts and his corps of volunteers, publicly archived at http://www.surfacestations.org/ and approaching in extent the entire USHCN surface station network. As of this writing, 69% of the USHCN stations were reported to merit a site rating of poor, and a further 20% only fair [26]. These and more limited published surveys of station deficits [24, 27-30] have indicated far from ideal conditions governing surface station measurements in the US. In Europe, a recent wide-area analysis of station series quality under the European Climate Assessment [31], did not cite any survey of individual sensor variance stationarity, and observed that, “it cannot yet be guaranteed that every temperature and precipitation series in the December 2001 version will be sufficiently homogeneous in terms of daily mean and variance for every application.”
…
Thus, there apparently has never been a survey of temperature sensor noise variance or stationarity for the stations entering measurements into a global instrumental average, and stations that have been independently surveyed have exhibited predominantly poor site quality. Finally, Lin and Hubbard have shown [35] that variable field conditions impose non-linear systematic effects on the response of sensor electronics, suggestive of likely non-stationary noise variances within the temperature time series of individual surface stations.
…
…
The ±0.46 C lower limit of uncertainty shows that between 1880 and 2000, the
trend in averaged global surface air temperature anomalies is statistically
indistinguishable from 0 C at the 1σ level. One cannot, therefore, avoid the conclusion
that it is presently impossible to quantify the warming trend in global climate since
1880.
The journal paper is available from Multi-Science publishing here
I ask anyone who values this work and wants to know more to support this publisher by purchasing a copy of the article at the link above.
Congratulations to Mr. Frank for his hard work and successful publication. I know his work will most certainly be cited.
Jeff Id at the Air Vent has a technical discussion going on about this as well, and it is worth a visit.
As others have already noted, this paper addresses the measurement ACCURACY at a SINGLE station. But what is of real interest on a global climatic basis is the RESOLUTION of changes in the AGGREGATE of all stations. The year-to-year VARIABILITY of the aggregate mean for relatively small samples of stations repeatedly shows agreement well within ~0.25 K at the 2-sigma level. That provides a more appropriate bound for the sampling and measurement uncertainty associated with climate indices. This, of course, does not apply to the effects of any time-dependent BIAS in the data, which is the ultimate source of uncertainty in estimating secular trend.
I also have some professional CO2 measurement experience.
The errors in temperature measuring systems are actually quite minor compared to the possible errors in CO2 measurement systems.
Take any CO2 data presented with a large dose of skepticism, unless the complete parameters regarding the measurement system are openly disclosed.
@Patrick Frank
So the actual global average temperature change from 1880 to 2000 could conceivably be any value from 0 C to +1.75 C. Unless one can show why there’s a systematic error favoring the lower end of the range or the higher end of the range, pretty much everyone is going to presume the middle of that range is accurate. If it were just a single instrument the presumption wouldn’t be made, but we’re talking thousands of instruments with a good deal of corroboration & agreement with the commonly employed observation methods from different types of instruments, ranging from high precision/accuracy laboratory measurements to satellites, to say nothing of other means of corroboration like sea level rise, glacier retreat, sea ice changes, and so forth. It also happens to agree with what the physics of GHGs predict we should see if the physics are correct.
This attempt to indict the presumed temperature rise isn’t going to fly. Not one tiny bit. A large number of skeptics might buy it but they’ll pretty much buy anything that is agreeable with their skeptical opinions just like AGW faithful will buy anything that agrees with theirs. Nice try but no cigar.
Hi Pat,
I just read Anthony’s article regarding your paper “UNCERTAINTY IN THE GLOBAL AVERAGE SURFACE AIR TEMPERATURE INDEX: A REPRESENTATIVE LOWER LIMIT.” Congratulations on that.
I have purchased it right now but haven’t had time to read it carefully. From the abstract, though, I see that you came to a very similar result as I did when I submitted my dissertation on the data quality of recent historical temperature and sea level measurements back to 1880 at the German Leipzig University in March 2010. It is written in German and to this day has not been reviewed by external peers, so I couldn’t publish it before the peer review (to become a PhD) is finished.
Unlike you, I didn’t feel able to calculate any boundary or lower/upper limit for the uncertainty, but I found that the remaining systematic and coarse errors within the data will at least exceed the total increase of the last century.
I found three major components:
1. Urban Heat Island effect and associated errors: guessing, it may be in the range of +0.4 to +0.6 K. Based in part on Anthony’s surface station project and the papers of many others.
2. The systematic error introduced by using (up to) 100 (!), but mainly 3, different algorithms to calculate the daily mean temperature might have a similar value. This has never been corrected and might again contribute 0.4 to 0.6 K. Because we cannot find out whether this error is positive only or not, one might guess it may be ±0.2 to ±0.3 K. For example, the German Met Office changed in 2002 from the “Mannheimer Stunden” algorithm to an hourly (24x) mean value calculation. The results show an increase of +0.1 K between the two algorithms since that time. (A sketch of the two algorithms appears below.)
3. The rest is a collection of systematic errors (biases), which can be named: painting error (Anthony’s work), variations of sensors (replacement of thermometers by more sensitive electronic ones), measurement height of the weather shelter (how high the thermometer is installed above ground: this varies historically from 1.2 m to 3.2 m), ground changes at places where weather shelters had been installed, differences between SST and MAT (completely unknown, especially at higher latitudes with air temperatures below 0 °C), and finally coverage error due to the sparse number of stations with small coverage of area, on both land and sea.
All uncertainties may sum up to 1 K or more, but “very likely” (almost surely) not less.
As a side result, my work shows that the mean global temperature (if you insist on calculating this nonsense figure) should be at least 1 to 1.5 K lower than Phil Jones’s (guessed) number of 14 °C. This is due to the fact that the temperature in the weather shelter is nearly always higher than the true temperature outside, which is neither measured nor estimated. The WMO itself defines temperature as that measured within the shelter.
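Below is a minimal sketch of the two daily-mean algorithms mentioned in point 2, assuming the classical Mannheim-hours weighting (readings at roughly 07:00, 14:00, and 21:00 local time, with the evening reading counted twice); historical practice varied by network and era, so treat the details as illustrative only:

```python
# Illustrative comparison of two daily-mean temperature algorithms.
# The Mannheim-hours weighting below is the classical textbook form;
# actual historical practice varied.

def mannheim_mean(t07: float, t14: float, t21: float) -> float:
    """Daily mean from three fixed-hour readings ("Mannheimer Stunden"),
    with the 21:00 reading counted twice."""
    return (t07 + t14 + 2.0 * t21) / 4.0

def hourly_mean(hourly_temps: list[float]) -> float:
    """Daily mean as the plain average of 24 hourly readings."""
    assert len(hourly_temps) == 24
    return sum(hourly_temps) / 24.0

# Applied to the same true temperature curve, the two algorithms generally
# disagree by a few tenths of a degree: a systematic, algorithm-dependent
# offset of the kind described in point 2.
```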
As soon as I have read your paper I will tell you more. Please feel free to ask for more details
best regards
Michael Limburg
Vizepräsident EIKE (Europäisches Institut für Klima und Energie)
Tel: +49-(0)33201-31132
http://www.eike-klima-energie.eu/
Michael Limburg says:
January 22, 2011 at 3:19 am
Hi Pat,
I just read Anthony’s article regarding your paper “UNCERTAINTY IN THE GLOBAL AVERAGE SURFACE AIR TEMPERATURE INDEX: A REPRESENTATIVE LOWER LIMIT.” Congratulations on that.
I have purchased it right now but haven’t had time to read it carefully.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Or, if you read the Air Vent;
http://noconsensus.wordpress.com/2011/01/20/what-evidence-for-unprecedented-warming/
“This analysis is now out in Energy and Environment [6], and anyone who’d like a reprint can contact me at pfrank830 AT earthlink DOT net.”
And get a copy for free, as I did, and read the whole thing yesterday, too. 🙁
@David L. Hagen:
“NIST provides an introduction: Essentials of expressing measurement uncertainty”
Bad link to NIST.
Thanks to everyone for your continued interest. There are lots of cogent comments here, but I hope you don’t mind that I single out only a few for reply.
eadler, you’re right to be skeptical. In the article, I point out that the problem with the common estimate of random station error is that it’s a guesstimate and is in fact not known to be random. That changes its statistics. To answer your other question, the systematic error due to uncontrolled environmental variables does vary with time, virtually minute by minute, as Domenic pointed out. Domenic also mentions the very interesting problem produced by a change in instrumental time constant. The LIG thermometers were phased out after 1990. As he notes, one wonders, now, how much of the recent higher temperatures can be assigned to the much smaller time constant of modern electronic sensors.
Ira, the uncertainty bars don’t say anything about where the temperature trend may be. They say that we don’t know where the trend is within their limits.
There is a huge number of possible trends within those limits. So, supposing that the limits allow a warming of 1.8 C or a cooling of 0.2 C picks out two possibilities among very many. But those two, individually, are very improbable given the huge number of possible trends. So, no one can make any hay from them, and we needn’t worry about them.
Agust, Kenneth Hubbard and Xiaomao Lin at University of Nebraska have published on the accuracy of electronic sensors in the field. Anthony has blogged about their work, and his article is worth a careful look.
JJ, the problem is two-fold. Guesstimated errors are not known to be random and propagate as 1/sqrt[N/(N-1)], which approaches the magnitude of the original guesstimate as N becomes large.
Likewise, systematic error is unlikely to be correlated globally, but it’s not random either. Nor are systematic error variances known to be normally distributed. So, that error propagates as 1/sqrt[N/(N-1)] as well. The problem with the global temperature record is that it’s been promoted to enormous scientific and political importance without ever having been globally validated. We’ll see about longevity.
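A quick numerical sketch of the contrast drawn here, assuming the ±0.2 C per-reading guesstimate:

```python
# Compare two propagation rules for an assumed +/-0.2 C station error:
# 1/sqrt(N) (valid only for random, stationary error) versus the
# 1/sqrt(N/(N-1)) propagation described above for non-random error.
from math import sqrt

sigma = 0.2
for N in (2, 60, 1000):
    if_random = sigma / sqrt(N)                # shrinks toward zero
    if_not_random = sigma / sqrt(N / (N - 1))  # stays near sigma
    print(f"N={N}: random {if_random:.4f} C, non-random {if_not_random:.4f} C")

# N=60: random 0.0258 C vs non-random 0.1983 C -- essentially undiminished,
# which is the point about the guesstimated station error above.
```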
Guenther, let’s not go too far. It’s just basic error analysis. On the other hand, Jim Hansen is on record saying that, “If anybody could show that the global warming curve was wrong they would become famous, maybe win a Nobel Prize.” If Jim Hansen wants to recommend Anthony, Warwick Hughes, myself, and Joe D’Aleo to Stockholm, I wouldn’t mind splitting the windfall. Maybe then I could finally buy a house and move out of my apartment. 🙂
Doug, I think you’ve made a very good analysis. I’ve been looking at Sea Surface temperatures, and especially in the earlier record, the problems you describe show up in serious spades there.
Scott, you’re right, the (+/-)0.46 C lower limit is a “best case.” Anthony’s work shows the true error is certainly much larger.
EFS_Junior says:
January 22, 2011 at 8:54 am
Junior, glad to hear you read the whole paper. I’ll try to get to it when I get more time (I’ve been hammered with work the last few months and will be through at least the first week of Feb). I’d be curious to hear your thoughts on it.
Quickly glancing through the summary and some comments here, the results look interesting, but I don’t know if there’s much impact to it with the exception that uncertainties may now be better quantified than previously. The slope in temps is obviously unchanged, and presumably the uncertainty in that slope has increased a bit and that’s all. Is that a valid interpretation?
Thanks,
-Scott
Sky wrote, “As others have already noted, this paper addresses the measurement ACCURACY at a SINGLE station.”
This isn’t strictly correct. The paper addresses the measurement accuracy found during a very careful and extended field calibration experiment that extensively tested multiple sensors and shields. That isn’t a single station. It’s an experiment that has relevance to the accuracy of surface temperature measurements at all stations. To deny that relevancy is to deny the principle of replication that is the very basis of empirical science.
But Sky’s incorrect point opens the door to some history about the paper. Before submitting to E&E, I submitted a longer paper to the AMS Journal of Applied Meteorology and Climatology. That paper was later divided into two shorter papers for E&E. The second will be published in a later issue.
But at JAMC, the original manuscript went through three rounds of review, which took a full year, before being rejected on Sky’s grounds. All that took 38 pages of reviews and responses, which is far from the recent 88-page standard of the OLMC Steig critique, but it was a personal best. 🙂
The story is classic climate science. In the first round, the paper was conditionally rejected, but reviewers “A” and “C” gave constructive comments; especially reviewer “A.” Reviewer “B” was critical to the point of hostility, essentially calling it the worst paper he’d ever read, that it contained “scientific horrors,” and that I was guilty of a “bait-and-switch” tactic; guilty of dishonesty, in other words.
The re-submission was heavily revised, especially in light of the helpful comments of reviewer “A.” In reply after the second review, the editor said that previous reviewer “C” had declined a re-review. This usually means that a reviewer has read the revised manuscript and is now agreeable to publication. But the editor substituted a new third reviewer, who now was reviewer “B.” The hostile reviewer, previously “B,” became reviewer “C.”
In this case, both previous reviewer “A” and the new reviewer “B” recommended publication. Reviewer “A” also wrote that, “The material seems appropriate for the issues at hand. The author’s revision is clear, well-organized, and to the point.”
Reviewer “B” wrote, “The study by author in this paper is a good attempt to explore the possibilities of uncertainty existing in the global surface temperature data sets. Author also did an excellent literature review regarding the data quality and averaging uncertainty analysis. Author’s presentation is clear and well-organized.”
Reviewer “C” wrote that, “the [author’s] uncertainty bars assigned to the global temperature curve (Fig. 6) are unjustified because they are derived from the uncertainty estimates from a single station. The author therefore makes no accounting for the reduction in error that results from the overwhelming redundancy in the global observing system … For this reason, I must again recommend rejection of the paper.”
It’s clear that for Reviewer “C,” the random nature of measurement error is an assumption, when good empirical science requires that it must be demonstrated. That point is, and was, addressed in detail in the submitted paper and in my response to that reviewer. But somehow reviewer “C” either didn’t read that part of the study, or else decided to ignore it. He never actually addressed the argument presented in the paper. He also clearly did not understand the concept of a widely applicable lower limit of error obtained by testing an instrument under ideal operating conditions; something no other reviewer had any apparent trouble understanding.
The third round solely concerned the comments of reviewer “C.” In this round, he repeated the objections he’d made earlier. For example, he wrote, “Unfortunately, in spite of the equations and assumptions that the author presents, his conclusion can only be reached by dismissing the huge redundancy in the observational networks as well as the close agreement of temperature trends from sensors with different design and error characteristics.”
In reply, I wrote, “Once again: when noise is not known to be stationary, the degree of measurement redundancy is virtually irrelevant to the magnitude of propagated uncertainty. This straightforward rule of measurement statistics has consistently escaped the Reviewer’s grasp. The analysis within [the manuscript] makes no assumptions about noise error. Rather, there is an assumption of stationary noise within Brohan, et al., 2006, which is shown to be unjustifiable in light of the published record. The Reviewer here has supposed an assumption not in evidence, and has ignored a falsified assumption in plain view.”
At the end of that, things got a little irregular. Three of four reviewers were now evidently favorably disposed, and one was opposed. Instead of making a critical judgment, the editor solicited the further views of two Associate Editors.
In his final letter, the manuscript editor just paraphrased the views of the two AEs. He said one AE thought the study should be published elsewhere (i.e., no mention of any scientific problem) and the other AE agreed with the “single station” objection and thought it was fatal. The manuscript editor agreed with the second AE.
When I asked for a copy of their actual comments, the editor declined to reveal them. So, I had no opportunity to evaluate the scope of their actual criticisms, and compose a proper reply to them. In all my prior experience publishing in scientific journals, I’ve never been disallowed from seeing the actual reviewer comments before, and consider that to be a very irregular practice.
So, in short, a majority of the reviewers and one AE passed the study. Apparently they understood the criticism of an unwarranted assumption about random error, and understood the concept of an instrumental lower limit obtained under ideal conditions.
One reviewer apparently understood neither concept, along with one AE and apparently the editor. At the end, the editor went with the minority view and rejected the manuscript.
At that point, I thought of perhaps submitting to an AGU journal, but on considering that organization had taken a public and partisan stand on AGW, I couldn’t see risking another year of difficult reviews.
So, I went to E&E expecting the usual scientific standard of a relatively dispassionate critical review, and that’s what I got from their two reviewers. Neither of them raised any objection to the idea of an instrumental lower limit. And, in fact, one of E&E’s reviewers found an error in one of the equations that had gotten by three rounds at JAMC.
sky also wrote that, “The year-to-year VARIABILITY of the aggregate mean for relatively small samples of stations repeatedly shows agreement well within ~0.25 K at the 2-sigma level. That provides a more appropriate bound for the sampling and measurement uncertainty associated with climate indices.”
This is the usual approach in surface climate science, to examine the numbers themselves and look at their aggregate variance as a guide to their validity without paying any attention to the instruments. This reveals nothing about any systematic error that may be in the data.
Systematic error does not cancel in an average. It just averages as the data. There hasn’t been any survey for how instrumental systematic error may correlate across small regions. So, to suppose that correlated trends imply low-error data is to assume what should be demonstrated.
Whether trends approximately reproduce locally doesn’t reveal anything about instrumental accuracy in the subordinate data. The only way to assess the magnitude of systematic error is to test an instrumental measurement against a known standard.
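A minimal simulation sketch of this point, with a purely illustrative shared 0.3 C bias: averaging many stations shrinks the random part of the error toward zero but leaves the systematic part fully intact.

```python
# Averaging cancels random error but not a shared systematic bias:
# the bias simply "averages as the data." All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(1)
true_temp = 15.0                 # hypothetical true temperature, C
bias = 0.3                       # hypothetical shared systematic offset, C
n_stations = 1000

readings = true_temp + bias + rng.normal(0.0, 0.2, n_stations)

print(readings.mean() - true_temp)           # ~0.3 C: the bias survives
print(readings.std() / np.sqrt(n_stations))  # the random part: ~0.006 C
```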
Dave, why would any scientist assume that the middle of an uncertainty range is accurate? It’s no more accurate than the fringes. If not the scientists, who is “everyone,” and why should their uninformed opinion matter?
“we’re talking thousands of instruments with a good deal of corroboration & agreement…” Are we? We’re talking about instruments that have been unassessed with respect to systematic error, whose noise variances are not known to be random, and that have been shown in the US to be predominantly poorly sited. Elsewhere, apart from Europe, is likely to be worse. What is the meaning of agreement among such instruments? Asserting conclusions based on correlations among poorly functioning instruments is hardly science.
We all know that invoking melting glaciers, etc., is a specious argument, so don’t even try. No one here is denying that the climate has warmed. The question concerns metrics.
We also know that “the physics of GHGs predict” nothing about small changes in climate, so please don’t try that around here, either.
As to the rest, we’ll see.
Pat Frank says:
January 22, 2011 at 7:51 pm
Hi Pat,
Thanks for all the gory details on your peer review experience. I’ve published 7 peer-reviewed publications as first author, and I’ve never had an experience like you describe. It really goes to show how political the process can be (most of my work had little or no political relevance on a larger scale, so I avoided those problems).
Heck, several times I had two reviews, one favorable and the other not so much and the editor chose to publish anyway. It seems that when politics aren’t involved the general approach is to publish and if the work has some issues it will become known by the readers easily enough. Basically a “if in doubt, put it out there anyway” kind of philosophy. I think this philosophy is the way to go but clearly didn’t happen in your case.
Thanks again for your details,
-Scott
Scott says:
January 22, 2011 at 7:25 pm
EFS_Junior says:
January 22, 2011 at 8:54 am
And get a copy for free, as I did, and read the whole thing yesterday, too. 🙁
Junior, glad to hear you read the whole paper. I’ll try to get to it when I get more time (I’ve been hammered with work the last few months and will be through at least the first week of Feb). I’d be curious to hear your thoughts on it.
Quickly glancing through the summary and some comments here, the results look interesting, but I don’t know if there’s much impact to it with the exception that uncertainties may now be better quantified than previously. The slope in temps is obviously unchanged, and presumably the uncertainty in that slope has increased a bit and that’s all. Is that a valid interpretation?
Thanks,
-Scott
_____________________________________________________________
I’ve now read through this “paper” twice, from beginning to end and everywhere in between. I also have most of the cited references, particularly reference #17, whence the gross assumptions are made in this so-called “derivation.”
Short answer? Typical E&E stuff.
Long answer? Separation of the total temperature error term (it only considers the +/- term; the “paper” never discusses bias offset errors, and assumes all errors are symmetric about a zero mean) into two mutually exclusive, independent, and unrelated terms is bogus, unnecessary, and without merit.
The total error does NOT propagate through the temperature array system as sigma; it also does not propagate through the system as sigma/SQRT(N). As usual, the truth lies somewhere in between these two limits. I could post the right answer once I’ve processed more data for different N (I have N = 1, N = 7 (needs a little more work), and N = 23 so far; I need larger N before I can make a definitive statement). For N = 23 the total error is reduced by a factor of ~2. Autocorrelation is your friend, but it also complicates things, as I have to first remove all artifacts of autocorrelation; I have done so successfully for the 23 HAWS (all station correlations of anomalies between any two stations have been done; the resulting plots all look Gaussian (need to do some binning to be sure) with zero mean).
Note that the 23 HAWS include the original DAILY raw Canadian data sets;
http://www.climate.weatheroffice.gc.ca/climateData/canada_e.html
The anomaly period I used was 1951-2010 (N = 60) (I had to wait a few weeks beyond my previous anomaly baseline of 1951-2009). The data sets are all of (IMHO) very high quality with all temperatures reported at 0.1C resolution (for all years I downloaded to date, 1982-2010 inclusive). My analysis has been through several iterations, to arrive at a complete and entirely internally consistent procedure for handling all individual data sets prior to extracting the low frequency temperature signature.
I have used the 23 HAWS in relationship to Arctic sea ice extent/area/volume, these stations all represent the Canadian Archipelago and Hudson Bay. There is an overall method to my madness! 🙂
The seven highest-latitude (all at or above 72N) HAWS (Alert, Eureka, Isachsen, Resolute, Mould Bay, Sachs Harbour, and Pond Inlet) are there because the core MYI Arctic sea ice field lies predominantly directly above these stations.
The results from the seven HAWS and 23 HAWS would shock you, particularly for the period from the ~early 70’s to present: currently these areas are warming at ~0.25C/year (+/- 0.15 C/year). I won’t extrapolate these numbers into the future though, because, if true, we are in for some very serious problems with Arctic sea ice extents/areas/volumes.
My sincere hope is that these current temperature rates slow down, seriously.
See this post where I briefly describe my own analysis of 23 HAWS stations;
http://wattsupwiththat.com/2011/01/22/the-metrology-of-thermometers/#comment-580961
Bottom line? If the total error was as claimed, I would never be able to extract a very real low frequency temperature trend line signature with a resulting R^2= 0.99.
I’ve been doing this type of stuff, on and off, for ~35 years at the USACE ERDC CHL (US Army lab of the year 3 of the last 4 years), including moored ship motion in LA/LB harbours (BTW, the largest combined harbour complex in the world; I do prefer the Canadian spelling), where the ships are excited by the low frequency component of the total ocean wave signature. Believe me when I say that the ratios of the low frequency wave signatures are very much smaller (by several orders of magnitude) than the ratios of the low frequency temperature signatures published to date.
There’s no getting around the basic fact that the low frequency temperature signatures are indeed very real and very accurate, regardless of how one goes about describing the accuracy of the total temperature signatures.
In closing, my analysis of Arctic sea ice extents/areas/volumes and HAWS temperatures is definitely journal publication worthy, and whatever journal they do make it into, it won’t be E&E.
[Edited. Robt]
Scott says:
January 22, 2011 at 9:56 pm
Pat Frank says:
January 22, 2011 at 7:51 pm
Hi Pat,
Thanks for all the gory details on your peer review experience. I’ve published 7 peer-reviewed publications as first author, and I’ve never had an experience like you describe. It really goes to show how political the process can be (most of my work had little or no political relevance on a larger scale, so I avoided those problems).
Heck, several times I had two reviews, one favorable and the other not so much and the editor chose to publish anyway. It seems that when politics aren’t involved the general approach is to publish and if the work has some issues it will become known by the readers easily enough. Basically a “if in doubt, put it out there anyway” kind of philosophy. I think this philosophy is the way to go but clearly didn’t happen in your case.
Thanks again for your details,
-Scott
_____________________________________________________________
Actually, Scott, I agree with you 100% that this “paper” should have been given a much higher profile in a well-respected peer-reviewed climate science journal. Or any other journal for that matter.
It does spur debate, and follow-up publications, either supporting or rejecting the basics presented in any paper.
Published in E&E, it sort of lies dormant relative to the majority of the broader climate science community.
The AGW skeptics eat it up though, as is quite evident in the metrology thread.
Where is true agnostic skepticism when you need it?
I really do wish that others in the AGW community (or the climate science community in total, and yes, that includes all the skeptics too) would take the time to read this “paper” and go through the “theory” in the greatest detail.
As it sits now, apparently no “respectable” climate scientist (and that includes all climate scientists, skeptical or not) is likely to take serious notice.
Sad but true IMHO.
I’d like to call attention to Michael Limburg’s post above about his Ph.D. work. It looks quite comprehensive, and if so, publication of the results will finally disabuse everyone of the notion of climatologically useful accuracy in the 20th century global surface air temperature record.
Scott, thanks. Steve McIntyre has published that kind of story about his review processes. Ross McKitrick has several similar stories — I’d guess one for each submission that challenged the prevailing climate wisdom. So does Richard Lindzen, and we’ve all read the story of the OLMC 88 pages. For my Skeptic article, one reviewer gratuitously accused me of scientific dishonesty. If, after this is all over, some social scientist were to interview climatologists, I’d suspect a large number of presently invisible but discouraged scientists would be found, who had been similarly oppressed. The evidence seems to be that unfair reviews are at least a common minority practice in climatology, but one which does not bring a reprimand or disqualification from journal editors. It should do. After all, climate science is about physics, not a branch of philosophy where peer review is apparently often about who one supports. I certainly have never, ever experienced such attacks in Chemistry, and have heard of none.
EFS, thanks for your sympathy, despite the quotation marks.
By the way, here’s the title and abstract of the second paper, to also be published in E&E and then ignored. This work was the other part of the original longer paper rejected by JAMC.
Title: “Imposed And Neglected Uncertainty In The Global Average Surface Air Temperature Index”
Abstract: “The statistical error model commonly applied to monthly surface station temperatures assumes a physically incomplete climatology that forces deterministic temperature trends to be interpreted as measurement errors. Large artefactual uncertainties are thereby imposed onto the global average surface air temperature record. To illustrate this problem, representative monthly and annual uncertainties were calculated using air temperature data sets from globally distributed surface climate stations, yielding (+/-)2.7 C and (+/-)6.3 C, respectively. Further, the magnitude uncertainty in the 1961-1990 global air temperature annual anomaly normal, entirely neglected until now, is found to be (+/-)0.17 C. After combining magnitude uncertainty with the previously reported (+/-)0.46 C lower limit of measurement error, the 1856-2004 global surface air temperature anomaly with its 95% confidence interval is 0.8(+/-)0.98 C. Thus, the global average surface air temperature trend is statistically indistinguishable from 0 C. Regulatory policies aimed at influencing global surface air temperature are not empirically justifiable.”
The (+/-)0.17 C “magnitude uncertainty” in the normal mean is the standard deviation of the 1961-1990 yearly anomalies around the CRU mean anomaly normal of the same period. This gives a measure of the climate ‘jitter’ during the normal period, and amounts to an uncertainty in the magnitude of the normal mean. As such, it must be propagated into the annual anomalies that are calculated by subtraction from the normal mean. This statistical practice, after all, is merely the standard for calculating empirical uncertainties in physical science.
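For readers checking the arithmetic, here is a sketch of how the quoted 0.8(+/-)0.98 C interval can be reproduced, assuming the two 1-sigma terms combine in quadrature and the 95% interval is taken as 2 sigma (standard conventions, but assumptions on my part):

```python
# Combine the lower-limit measurement error with the "magnitude uncertainty"
# in quadrature, then expand to an approximate 95% (2-sigma) interval.
from math import sqrt

measurement_err = 0.46  # lower-limit measurement uncertainty, C (1 sigma)
magnitude_err = 0.17    # 1961-1990 normal "magnitude uncertainty", C (1 sigma)

combined_1sigma = sqrt(measurement_err**2 + magnitude_err**2)  # ~0.49 C
print(round(2 * combined_1sigma, 2))  # ~0.98 C, matching the quoted interval
```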
Anyway, it was also interesting to discover that the ‘jitter’ standard deviation of annual anomalies over the total of years from 1856 – 2004, relative to the normal mean, was (+/-)0.28 C. The paper points out that if the governing climate regime was unchanging for the duration, then (+/-)0.28 C is a 1-sigma measure of the “natural variability” of climate during this entire period.
The 95% confidence interval of recent natural variability for 1856-2004 is then (+/-)0.56 C, which all by itself discounts causality in 70% of the total warming since 1856. That is, provisionally crediting the mean temperature trend, it’s hardly different from a random process that exhibits the pseudo-trends expected from a stochastic process plus persistence.
Pat Frank says:
January 23, 2011 at 2:32 pm
I’d like to call attention to Michael Limburg’s post above about his Ph.D. work. It looks quite comprehensive, and if so, publication of the results will finally disabuse everyone of the notion of climatologically useful accuracy in the 20th century global surface air temperature record.
_____________________________________________________________
Are you sure about that? I seriously doubt that the global mean temperature curve will just go away because of one additional Ph.D. thesis.
_____________________________________________________________
Scott, thanks. Steve McIntyre has published that kind of story about his review processes. Ross McKitrick has several similar stories — I’d guess one for each submission that challenged the prevailing climate wisdom. So does Richard Lindzen, and we’ve all read the story of the OLMC 88 pages. For my Skeptic article, one reviewer gratuitously accused me of scientific dishonesty. If, after this is all over, some social scientist were to interview climatologists, I’d suspect a large number of presently invisible but discouraged scientists would be found, who had been similarly oppressed. The evidence seems to be that unfair reviews are at least a common minority practice in climatology, but one which does not bring a reprimand or disqualification from journal editors. It should do. After all, climate science is about physics, not a branch of philosophy where peer review is apparently often about who one supports. I certainly have never, ever experienced such attacks in Chemistry, and have heard of none.
_____________________________________________________________
You should try to get SM’s attention over at CA. Perhaps Anthony can help you out over there. One thing’s for certain with CA, spirited debate.
_____________________________________________________________
EFS, thanks for your sympathy, despite the quotation marks.
_____________________________________________________________
Go ahead and take out the quotation marks then, I do need to turn down my snark meter below 11teen!
This will be a point-by-point rebuttal of EFS_Junior’s criticism.
First, as we go through this, keep in mind that my paper discusses unaddressed uncertainty in the standard calculation of global average surface temperature, not recent trends in the regional temperatures that form the basis of EFS_Junior’s critique.
However, on to EFS_Junior’s answer about why my analysis is wrong:
Point 1: “Long answer? Separation of the total temperature error term … into two mutually exclusive, independent, and unrelated terms is bogus, unnecessary, and without merit.”
Reply 1: Let’s look at what’s done in Brohan, 2006: Under Section 2.3.1 “Station Errors,” they write T_actual = T_ob + e_ob + C_H + e_H + e_RC, where T_ob, e_ob are observed temperature and its error, C_H, e_H are homogenization and its error, and e_RC is miscalculation or misreporting error.
So, Brohan, et al. separated their temperature errors into three (not two) mutually exclusive, independent, and unrelated terms, and presumably that is bogus, unnecessary, and without merit.
EFS_Junior mentioned in his preamble that he has a copy of my article reference 17, which is Bevington and Robinson, “Data Reduction and Error Analysis for the Physical Sciences.” Pages 1 & 2 in B&R say this: “Our interest is in uncertainties introduced by random fluctuations in our measurements, and systematic errors that limit the precision and accuracy of our results…” (italics in original). They go on to distinguish between precision and accuracy, and on page 3 between systematic errors and random errors.
On page 3, systematic errors are described as due to observer mistakes, equipment failures, and experimental conditions. Random errors are described as fluctuational. Bevington and Robinson thus distinguish these error types as mutually exclusive, independent, and unrelated as to both source and behavior.
But, according to EFS_Junior, separation of error terms into mutually exclusive, independent, and unrelated terms is bogus, unnecessary, and without merit.
In the JCGM 100:2008 “Evaluation of measurement data — Guide to the expression of uncertainty in measurement,” put out by Working Group 1 of the Joint Committee for Guides in Metrology (JCGM/WG 1), random error is orthogonally distinguished from systematic error.
Random error is “stochastic temporal and spatial variations of influence quantities [and its] expectation value is zero.” In contrast, systematic error is defined as the “effect of an influence quantity on a measurement result,” that produces a mean value different from the true value and further that, “systematic error and its causes cannot be completely known.”
Once again, error types are separated into mutually exclusive, independent, and unrelated terms, which we now know is bogus, unnecessary, and without merit.
In both Bevington & Robinson (p. 11) and in the JCGM manual (Section C.2.20), the variance about a mean goes as s^2 = [1/(N-1)] * [sum over N of (x - x_bar)^2]. As, by definition, systematic error yields an error mean different from zero, the dispersion of systematic error about its mean goes as s^2 and does not diminish as 1/sqrt(N). This is the approach taken in my paper, and it is the documented correct approach.
EFS_Junior’s critique against separating total error into its component types is wrong.
Further, evaluating separate sources of error is standard practice in experimental science. Doing so is really the only way to determine the contribution each sort of error makes to a measurement. EFS_Junior, in his critique, is dismissing perhaps the most critical method of assessing the meaning and significance of an experimental result.
In Brohan 2006, e_ob is represented as the guesstimated (+/-)0.2 C average of Folland, et al, 2001, which is discussed in detail in my paper.
The contribution of systematic error to e_ob, due to uncontrolled environmental variables, is not discussed at all in Brohan, 2006. This analytical lacuna is what sparked my inquiry.
Point 2: EFS_Junior also wrote that, “[he] only considers the +/- term, the “paper” never discusses bias offset errors, and assumes all errors are symmetric about a zero mean.”
Reply 2: The first part of my paper describes the basic sorts of random error, and with that as context goes on to show that the average (+/-)0.2 C error taken by Brohan, 2006, and Folland, 2001 isn’t random at all.
After that, Section 3.2.2 discusses systematic error and its sources in surface air temperature measurements. Bias mean offsets and standard deviations for three sensor systems are calculated and presented in Table 1. The top of page 978 has this sentence: “All these systematic errors, including the microclimatic effects, vary erratically in time and space [40-45], and can impose nonstationary and unpredictable biases and errors in sensor temperature measurements and data sets.”
Figures 1 and 2 show fits to three data sets of systematic error in temperature measurements. For clarity of graphical presentation, the bias means were subtracted to produce an artificial mean of zero. Maybe that misled EFS_Junior, although it’s explicitly mentioned in both figure legends.
That is, I directly and obviously discussed bias error offsets and nonstationary error, which by definition does not have a mean of zero. So, I did not assume all errors are symmetrical about a zero mean, and it’s obvious that I did not.
Somehow, EFS_Junior apparently missed all of this, even after two readings.
Let’s also add that if I had believed all errors are symmetrical about zero, I’d have had to also believe that all these errors would diminish as 1/sqrt(N), and would go to zero at large N. The lower limit of error would then be uniformly zero, and my paper would have no point. So, EFS_Junior is claiming that my paper reflects an assumption about error that completely contradicts the actual and explicit analysis of error present in the paper. That, after two readings.
More about bias errors: under “Systematic Errors” page 3 of Bevington and Robinson gives an example where subtracting a bias offset error actually increased the inaccuracy of a result, because the bias correction itself was an estimate. Climate scientist colleagues of EFS_Junior regularly “correct” their data sets by subtracting estimated bias offsets, and then go on to assume improved accuracy.
Bevington and Robinson, in contrast, go on to advise that one must explicitly take account of new uncertainties that may be introduced by bias corrections. How is that done in bias-corrected temperature data sets, when the uncontrolled systematic effects on the initial measurements are unknown?
Point 3: EFS_Junior wrote, “The total error does NOT propagate through the temperature array system as sigma, it also does not propagate through the system as sigma/SQRT(N), as usual, the truth lies somewhere in between these two limits.”
Reply 3: I didn’t calculate “total error.” I calculated a lower limit of sensor measurement error, principally due to systematic effects right at the instrument.
Total error also includes discontinuities due to instrumental changes, extrapolations due to sparse data, missing data, dropped stations, moved stations, station spatial inhomogeneity, observer read bias, time-of-observation bias, albedo changes, the hotly-debated UHI, and what else? So what if these things propagate differently from 1/sqrt(N), or differently from sigma itself?
My paper isn’t concerned with them. That’s why “lower limit” is included in the title and in the text.
Systematic instrumental error does propagate as 1/sqrt[N/(N-1)], and at large N the guesstimated (+/-)0.2 C station error of Brohan, 2006/Folland, 2001 also propagates as sigma.
All the rest of the errors must be calculated in their own particular way, and summed with the basic instrumental error to get the “total error.” All the other sources of error would merely add to the instrumental error. EFS_Junior’s point 3 is irrelevant.
Finally, the point about the HAWS data.
Point 4: EFS_Junior wrote, “for the 23 HAWS (all station correlations of anomalies between any two stations have been done; the resulting plots all look Gaussian (need to do some binning to be sure) with zero mean)” and “Bottom line? If the total error was as claimed, I would never be able to extract a very real low frequency temperature trend line signature with a resulting R^2= 0.99.”
Reply 4: This claim deserves some comment. EFS_Junior wrote that, “the data sets are all of (IMHO) very high quality with all temperatures reported at 0.1C resolution (for all years I downloaded to date, 1982-2010 inclusive).”
The newest sensors used in Canada were reported in E. Milewska and W. D. Hogg (2002) Atmos-Ocean 40, 333-359, to be “Yellow Springs International (YSI) Model 44212 thermistors in a Stevenson screen.” The Campbell Scientific manual (pdf download) reports the manufacturer specification average accuracy for these probes to be (+/-)0.1 C. Well and good.
K. G. Hubbard and X. Lin (2002) GRL 29, 1425 (doi:10.1029/2001GL013191) reported on the field resolution of the PRT HMP45C probe (pdf download), also inside a Stevenson screen. The HMP45C probe has a closely comparable manufacturer’s specification of (+/-)0.2 C accuracy at 20 C.
Systematic error reduced the field resolution of the HMP45C in a Stevenson screen to an average bias offset of 0.34(+/-)0.53 C (article Table 1). Subtracting 0.34 C from all the recorded temperatures would not remove the dispersion uncertainty of (+/-)0.53 C. The (+/-)0.53 C propagates as 1/sqrt[N/(N-1)] into any average of that data.
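A tiny numerical illustration of that point, treating the dispersion as a simple distribution purely for demonstration (the argument above is that such error need not be distributional at all): subtracting the mean bias centers the errors but leaves their spread untouched.

```python
# Removing a known mean bias does not remove the dispersion around it.
# The 0.34 C offset and 0.53 C spread follow the Hubbard & Lin numbers
# quoted above; modeling them as Gaussian here is for illustration only.
import numpy as np

rng = np.random.default_rng(3)
field_error = rng.normal(0.34, 0.53, 10_000)  # illustrative error model, C
corrected = field_error - 0.34                # subtract the mean bias

print(round(corrected.mean(), 3))  # ~0.0 C: the offset is gone
print(round(corrected.std(), 3))   # ~0.53 C: the dispersion remains
```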
Maybe in Canada the resolution of thermistor probes is not degraded by systematic effects in the field while housed inside Stevenson screens, but unless that has been demonstrated, EFS_Junior’s claim of (+/-)0.1 C accuracy is no more than special pleading.
It’s worth quoting Milewska and Hogg a little more extensively: “Climatological records of high temporal resolution have been generating interest recently, because of their direct applicability to climate change impact studies. Finding adjustment factors for these daily and sub-diurnal observations – synoptic, hourly – is a challenging task. A single bias adjustment value will not work well on any day or time of the day, it might even introduce additional uncertainty by over or under correcting on a given day. The magnitude of the adjustment factor depends on meteorological conditions and thus should vary from one day to another. For example, wind speed and cloudiness are the two main controlling weather elements that determine the value of the adjustment in the case of temperature. Larger factors may be required for calm clear nights, when siting biases are magnified due to the increased response of the ground surface to radiative cooling… (bolding added)”
Milewska and Hogg seem to know about B&R’s caution concerning bias offset subtraction. One is led to wonder whether EFS_Junior worried about whether any bias removal might degrade his data further when he compiled his temperature series. Did he know even the average magnitude of any systematic biases in his data? Well, but of course the reported accuracy was (+/-)0.1 C.
Milewska and Hogg don’t report any attempt to quantify the field resolutions of their YSI sensors in Stevenson screens by comparison with a high-precision sensor like the R. M. Young aspirated probe, but do mention the manufacturer’s estimated field resolution. Thus, Milewska and Hogg: “The sensors are generally reliable with manufacturer specified accuracy, which is the closeness of the agreement between the result of a measurement and the “true” value, of (+/-)0.3°C.”
So, Canadian AWOS and RTD sensors in Stevenson screens are admitted to have a field accuracy of (+/-)0.3 C. If we believe Hubbard and Lin’s actual field test, they’re likely to have an accuracy of no better than (+/-)0.5 C. But, they were reported to (+/-)0.1 C, and we’re told that was apparently good enough to trust uncritically.
EFS_Junior reports that arctic “areas are warming at ~0.25C/year…“, which is 0.05 C inside the estimated inaccuracy envelope reported by Milewska and Hogg, and half the 1-sigma uncertainty reported by Hubbard and Lin. So, EFS_Junior’s “(+/- 0.15 C/year)” is clearly over-optimistic.
Note that the (+/-)0.3 C is an accuracy measure, not a precision measure. There is no reason to think that in the field the true inaccuracy is symmetric about a mean of zero. An assumption that the field inaccuracy diminishes as 1/sqrt(N) is empirically unjustified.
Finally, point 5: EFS_Junior wrote, “Bottom line? If the total error was as claimed, I would never be able to extract a very real low frequency temperature trend line signature with a resulting R^2= 0.99.”
Reply 5: EFS_Junior emailed me about this earlier. In reply, I pointed out that if sensor measurement error = e_tot = e_sys + e_ran, then averaging daily temperatures would make e_ran diminish as 1/sqrt(N). However, e_sys remains undiminished, and would just factor into the mean. The systematic error would be completely invisible, and hide its contribution within the data. That is, without prior methodological testing, data + systematic error looks just like data + no error. Apparently EFS_Junior found this very standard caution unconvincing.
In the case of data distorted by resident e_sys, any trend that emerges would be contaminated by the uncompensated systematic error. This error may even reflect the degradation of the sensor over time. Any “true” trend may be greater or smaller than the observed trend, but whatever the true trend is, it’s unknown when systematic error is uncompensated.
Even comparing trends from adjoining stations, to show “homogeneity” of data, gives no reason to dismiss the impact of systematic error. Uncontrolled environmental variables can be regional as much as local. Regionally extensive environmental variables can impact regional sensors in analogous systematic ways, to impose analogous systematic errors. In data averaged over longer times, one might expect the systematic effects of regional environmental variables to emerge most strongly. It’s conceivable that the systematic effects that follow meteorological trends (e.g., of trends in insolation or wind) could impose analogous but spurious low-frequency trends on independent data sets from stations scattered across a contiguous region.
This possibility has never been tested — at least to my knowledge after searching the literature. So, EFS_Junior’s claim of “never be able to extract…” is without empirical merit.
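As a toy illustration of why a high R^2 cannot, by itself, rule out systematic error, consider a series contaminated by a slow, hypothetical sensor drift (all numbers invented for the sketch): the fitted trend is badly wrong, yet the fit statistics look excellent.

```python
# A slow drift in sensor bias adds to the true trend; the least-squares
# fit to the contaminated series is nearly perfect, but the slope is wrong.
import numpy as np

years = np.arange(1950, 2011)
true_trend = 0.010 * (years - 1950)    # hypothetical true warming, C
bias_drift = 0.005 * (years - 1950)    # hypothetical sensor drift, C
noise = np.random.default_rng(2).normal(0.0, 0.05, years.size)

observed = true_trend + bias_drift + noise
slope, intercept = np.polyfit(years, observed, 1)
r = np.corrcoef(years, observed)[0, 1]

print(round(slope, 4))  # ~0.015 C/yr: 50% too steep, yet
print(round(r**2, 3))   # R^2 ~ 0.96: the fit looks excellent
```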
On the other hand, I do know of one sensor test of a different sort that does indicate the possibility of regionally extensive environmental systematic effects on temperature sensors. At some point I intend to write this up, along with some related analysis.
The rest of EFS_Junior’s critique was polemical, with an appeal to authority and considerable gratuitous disrespect directed toward E&E. There’s no need to reply to that.
EFS_Junior: “I seriously doubt that the global mean temperature curve will just go away because of one additional Ph.D. thesis.”
Newtonian physics was overturned by one additional patent clerk. Other examples abound.
I’m reposting this from tAV. Anthony expressed a similar sentiment at the head of this post:
I’d like to add that though I’ve offered to send reprints on request, I’d ask that those of you who have access to academic accounts, or who have fine personal incomes, to please purchase the article from Energy and Environment, here.
Energy and Environment has proved to be one of the few remaining journals in climate science where one can anticipate a uniformly dispassionate and even constructive critical review. In the principled stand of its editor, Dr. Sonja Boehmer-Christiansen and its Multi-Science publisher, Dr. William Hughes, E&E has been thoroughly in support of open and transparent science when so many have abridged it.
The journal merits support, and deserves to recover a fair profit for publishing my article.
Thanks very much,
Pat
Pat Frank says:
January 23, 2011 at 10:34 pm
Reply 5: EFS_Junior emailed me about this earlier. In reply, I pointed out that if sensor measurement error = e_tot = e_sys + e_ran, then averaging daily temperatures would make e_ran diminish as 1/sqrt(N). However, e_sys remains undiminished, and would just factor into the mean. The systematic error would be completely invisible, and hide its contribution within the data. That is, without prior methodological testing, data + systematic error looks just like data + no error. Apparently EFS_Junior found this very standard caution unconvincing.
In the case of data distorted by resident e_sys, any trend that emerges would be contaminated by the uncompensated systematic error. This error may even reflect the degradation of the sensor over time. Any “true” trend may be greater or smaller than the observed trend, but whatever the true trend is, it’s unknown when systematic error is uncompensated.
Even comparing trends from adjoining stations, to show “homogeneity” of data, gives no reason to dismiss the impact of systematic error. Uncontrolled environmental variables can be regional as much as local. Regionally extensive environmental variables can impact regional sensors in analogous systematic ways, to impose analogous systematic errors. In data averaged over longer times, one might expect the systematic effects of regional environmental variables to emerge most strongly. It’s conceivable that the systematic effects that follow meteorological trends (e.g., of trends in insolation or wind) could impose analogous but spurious low-frequency trends on independent data sets from stations scattered across a contiguous region.
This possibility has never been tested — at least to my knowledge after searching the literature. So, EFS_Junior’s claim of “never be able to extract…” is without empirical merit.
_____________________________________________________________
You need to simplify your reply above. As it stands now I don’t even know what you mean.
The bottom line is that the 23 HAWS stations all show (roughly) the same low frequency trend lines. And that’s just a plain cold hard fact. 23 HAWS stations all showing the same systematic errors? I think NOT!
Further, you claim that the systematic error would be “completely invisible”; well then, that works for me! This is the same as stating that systematic error averages out to zero for each station, or for the ensemble average of any N stations.
Good to know that, thanks for providing the necessary confirmation.
Your second paragraph makes no sense whatsoever, and is quite obviously circuitous. If this argument were actually true, then no analyses could ever be conducted on any raw data set whatsoever. It’s akin to saying collect the raw data, then do nothing with the raw collected data. Sad, really sad.
Your third paragraph plays with the concept of region, so is the Northern Hemisphere (NH) a region? Because I’m quite sure that I can show some autocorrelation/cross correlation with any two NH regional stations simply due to similar seasonal and diurnal behaviors (note that these would have to be cross correlated first to determine the diurnal lag coefficient). This also plays somewhat into the LLN (law of large numbers), as it would be virtually impossible (p ~ 0) to take multiple stations from any region (no matter how small or how large) with the expectation (p ~ 1) that all these stations used the exact same sensors over time, have identical systematic errors over time, et cetera.
Autocorrelations exceeding 0.8, 0.9, 0.99, 0.999 with slopes of 0.98, 0.99, 1.00, 1.01, 1.02 for any two stations, are empirical proof that there is an underlying relationship between them, that whatever errors exist in these measurements, these errors are not systematic in nature, to the degree, or in the manner, that the author claims.
It’s akin to saying “you can’t do that because x, y, z, …” but then doing so anyway, and then producing results with R^2 > 0.99 (and no, the “correlation does not imply causation” argument holds no water here, as the comparison is between temperature records from two random stations, one is not causing the other, both are measuring a nearly identical response to TBD external forcings).
In your 4th paragraph/sentence you again make a baseless statement, as empirical evidence abounds as to the sameness of the global temperature trends (land based and satellite eras, a random selection of a small group of land based records, say, for example, 10 < N < 100), that these trends are very similar for the 23 HAWS as well as to larger networks (but with lower trend lines for the global mean temperature trends), that these 23 HAWS trend lines are, in the aggregate, quite similar in shape to the global trend lines.
In short, saying something is "without empirical merit" flies directly in the face of the wealth of empirical evidence that does exist with respect to low frequency temperature trendlines.
So obviously, the author clearly does not understand the issues at hand, thus the author's so-called "theory" is bogus, unnecessary, and truly without merit.
In closing, any temperature record has a total error associated with it, whatever that error is, it does not stop one from extracting very real low frequency information with an associated very high degree of statistical confidence.
If this were not true, if this were not the case, then all low frequency temperature trend lines would indeed exhibit truly random behaviors, with R^2 approaching zero in all cases. That because we can demonstrate well defined low frequency trend lines with associated high degrees of statistical confidence, that we can choose large N, and still obtain well defined low frequency trend lines with associated high degrees of statistical confidence, suggests unambiguously, based on the LLN, that these empirical results are real with probability approaching one (p ~ 1).
Reply, Part 1.
This is Part 1 of what will be a point-by-point reply to EFS_Junior’s follow-up critique of my article. I’ll begin with what he did discuss, and finish with what he did not. EFS_Junior’s comments are quoted.
EFS Point 1. “You need to simplify your reply above. As it stands now I don’t even know what you mean.”
Reply 1. I wish you’d been more specific. But assuming it’s the bit about e_tot: the total instrumental error in any single measurement, e_tot, will be the sum of the random error e_ran and the systematic error e_sys. Each kind of error contributes a spurious magnitude to the observation.
In measuring a temperature, for example, the magnitude of the observed temperature “t_i” is a sum of the “true” magnitude, “tau_i” plus the magnitudes of all the errors affecting that particular measurement.
So for any measured temperature, t_i = tau_i + e_ran_i + e_sys_i, where “i” is the measurement index (i = 1,2, …, n).
As usual, e_ran is stationary by definition. When multiple measurements of temperature are averaged, the total of e_ran will diminish as 1/sqrt(N) in the mean temperature, T_bar. The details of this are discussed in Cases 1-3 in my paper.
However, e_sys is not stationary. It typically arises from uncontrolled variables that are of unknown magnitude and unknown variability. Therefore, e_sys can vary between sequential measurements, doesn’t have a mean of zero, and permanently impacts the magnitude of the measurement.
The e_sys of each observation enters into any mean of multiple observations. In the mean, the total of e_sys goes as sqrt[(1/(N-1)) * sum-over-N (e_sys_i)^2], and never diminishes to zero. At large N, the total of e_sys approaches (e_sys)avg, and the mean temperature is T_bar (+/-) (e_sys)avg.
The only way to know the magnitude of (e_sys)avg in a set of measurements is to have done prior tests of the methodology, using the same instrument to measure precisely known standards under conditions as close as possible to those to be used for the experimental measurements.
This is the message in the statistical sources mentioned in my previous post.
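For concreteness, here is a minimal numerical sketch of that arithmetic (my own illustration; the 0.3 C mean and 0.1 C spread assumed for e_sys are invented, since real uncontrolled variables are by definition of unknown magnitude):

import numpy as np

rng = np.random.default_rng(1)
N = 60                             # e.g., two readings a day for a month
tau = 20.0                         # "true" temperature, held constant for simplicity
e_ran = rng.normal(0.0, 0.2, N)    # random error, mean zero, sigma = 0.2 C
e_sys = rng.normal(0.3, 0.1, N)    # hypothetical systematic errors with a nonzero mean

t = tau + e_ran + e_sys            # t_i = tau_i + e_ran_i + e_sys_i
T_bar = t.mean()

sigma_ran = 0.2 / np.sqrt(N)                       # diminishes as 1/sqrt(N)
sigma_sys = np.sqrt((e_sys ** 2).sum() / (N - 1))  # the sqrt[(1/(N-1)) * sum (e_sys_i)^2] term

print(f"T_bar = {T_bar:.3f} C (true value {tau} C)")
print(f"random uncertainty in the mean:     +/- {sigma_ran:.3f} C")
print(f"systematic uncertainty in the mean: +/- {sigma_sys:.3f} C")

The random term shrinks toward zero as N grows; the systematic term settles near (e_sys)avg and stays there, exactly as described above.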
EFS Point 2. “The bottom line is that 23 HAWS stations all show (roughly) the same low frequency trend lines. And that’s just a plain cold hard fact. 23 HAWS stations all showing the same systematic errors? I think NOT!”
Reply 2. First, I refer you to the classic paper of Hansen and Lebedeff [1]. I’m sure you have a copy. Please look at their Figure 3, where Hansen and Lebedeff show pair-wise correlation coefficients of temperature measurements among many hundreds of surface stations.
Here it is, in their own words: “At middle and high latitudes the correlations approach unity as the station separation becomes small; the correlations fall below 0.5 at a station separation of about 1200 km, on the average. At low latitudes the mean correlation is only 0.5 at small station separation. The distance over which strong correlations are maintained at high latitudes probably reflects the dominance of mixing by large-scale eddies. At low latitudes the most active atmospheric dynamical scales are smaller, but apparently there are also substantial coherent temperature variations on very large scales (for example, due to the quasi-biennial oscillation, Southern Oscillation, and El Nino phenomena), which account for the slight tendency toward positive correlations at large station separations. We examined the dependence of the correlations on the direction of the line connecting the two stations. For the regions for which this check was performed, the United States and Europe, no substantial dependence on direction was found. For example, in these regions the average correlation coefficient for 1000-km separation was found to be within the range 0.5-0.6 for each of the directions defined by 45 [degree] intervals.”
In short, Hansen and Lebedeff show large correlations of recorded temperature compared between pair-wise surface stations. The correlations extend for considerable distances. Temperature correlation becomes more than 0.8 at distances less than 250 km for latitudes greater than 23 degrees. At distances less than 250 km between the equator and 23 degrees, regional correlations are ~0.5.
Next, we turn to the work of Hubbard and Lin [2]. They tested the temperature measurements “of dual temperature systems for ASOS, MMTS, and Gill shield (with HMP45C sensor), as well as one aspirated (ASP-ES) and one non-aspirated (NON-ES) shield from Eastern Scientific Inc. (with HMP45C), and one CRS (with HMP45C)” against the simultaneous measurements from a high-precision R. M. Young probe in an aspirated shield. “CRS” is the Cotton Region Shelter (the Stevenson Screen).
The three-panel Figure 2 in their paper shows the distribution of bias temperatures during day, night, and average for all six of the tested sensors.
The Figure 2 legend is: “Statistical distributions of (a) daytime air temperature biases, (b) nighttime air temperature biases, and (c) overall air temperature biases for all air temperature systems in the measurements.” The data include thousands of temperatures (April through August, 2000), recorded at 0.1 Hz for 5 minutes each, representing 30 measurements per recorded temperature. So, random error was diminished by a factor of sqrt(30) ≈ 5.5 in each reported temperature.
The average bias envelope for the HMP45C probe in the CRS shield was asymmetric. It showed a large excess of too-warm temperatures. The bias mean (+/-) sigma was 0.34 (+/-) 0.53 C. However, this isn’t the whole story.
A distribution of temperatures skewed toward warm inserts a variable warm bias into the temperature record over the thousands of measurements. This systematic temperature bias was primarily the impact of solar loading and wind speed on the sensor.
This warm bias is a systematic error in the instrument that will not be revealed by any test looking for errors due to random artifacts, or any changes external to the instrument.
If the HMP45C/CRS warm bias were not distributed randomly in time, it would induce a spurious trend into the temperature data. A spurious warming trend will show up if the systematic bias is more pronounced late in the data set. Or, counterintuitively, if a warm systematic bias is asymmetrically distributed into the early part of the data set, it can produce a spurious cooling trend.
Figure 2 of Hubbard and Lin shows that every single one of the six sensor systems they examined showed an excess bias asymmetry toward too warm temperatures (also shown in their Table 1).
Now we combine the results of Hansen and Lebedeff with those of Hubbard and Lin:
1. High correlations of temperatures exist among regionally adjacent surface stations, extending over at least 250 km.
2. Regional correlations of temperature require regional correlations of climate. The results of Hansen and Lebedeff mean that climate is correlated over distance.
3. Local temperatures are governed by local climate (sun and wind, and even snow albedo [3]).
4. Systematic biases are induced into air temperature measurements by the same climatic effects that produce the air temperature itself.
5. Regionally correlated air temperatures therefore require regionally correlated systematic effects.
6. Temperature sensors of regionally adjacent surface stations systematically biased by correlated climates produce correlated systematic biases.
7. Correlated systematic errors will be present in the air temperature records of regionally adjacent surface stations.
8. The tested sensors were of different configurations and all of them displayed systematic error similarly skewed to excess warm temperatures.
9. Surface stations with different types of sensors will produce correlated systematic errors.
Conclusion 1: Systematic errors are inevitably present in air temperature measurements.
Conclusion 2: These systematic errors will certainly propagate into the recorded air temperature time series of surface stations.
Conclusion 3: Systematic errors in surface air temperature records will be correlated among regional surface stations.
Therefore: Spurious temperature trends within the data of one surface station will correlate with similar spurious trends in the temperature data of regionally adjacent surface stations.
Observation: This strong likelihood of regionally correlated temperature biases, which is predicted by published work, has completely escaped the notice of the climate science community.
Finally: Spurious trends due to systematic error that appear in data, and that correlate among the records of adjacent surface stations, will look exactly like real trends.
Examining correlations within temperature-time series trends from adjacent surface stations will not reveal the contamination of the temperature series by systematic error.
Correlations of temperature-time series among adjacent surface stations do not, and in-and-of-themselves will never, disprove the existence of systematic error in the data.
It’s particularly relevant to your HAWS data that Hansen and Lebedeff noted the strongest regional temperature correlations in the highest latitudes. That means the systematic error will also be most strongly correlated at the same high latitudes where HAWS data were collected.
Hansen and Lebedeff compared station temperatures, not anomalies. Their temperatures were large in magnitude compared to the (+/-)0.46 C lower limit of error. Therefore their correlations are real. However, anomalies are of the same magnitude as the (+/-)0.46 C lower limit. This is evidenced by the 0.25(+/-)0.15 C HAWS trend you mentioned.
As the HAWS trend is smaller than the 1-sigma lower limit systematic error, there is no reason to believe it is real.
This leads to:
EFS Point 3. “Further, you claim that systematic error would be “completely invisible.” Well then, that works for me! This is the same as stating that systematic error averages out to zero for each station or to the ensemble average of any N stations.”
Reply 3. “Invisible” means invisible, not ‘averages out to zero.’ Invisible means systematic error will not show up in any statistical test posterior to collecting the data.
Data contaminated with systematic error will look like real data, except that it will be false data. In the case of temperature, the magnitudes will be wrong, and the trends may be spurious. How the spurious data will appear depends on the magnitudes, distributions, and changeability of the uncontrolled variables that affected the sensor while the temperatures were being measured.
Overcoming these problems is why Hubbard and Lin have spent so much effort developing real time filtering methods for surface station temperatures.
EFS Point 4: “Your second paragraph makes no sense whatsoever, and is quite obviously circuitous. If this argument were actually true, then no analyses could ever be conducted on any raw data set whatsoever. It’s akin to saying collect the raw data, then do nothing with the raw collected data. Sad, really sad.”
Reply 4. We can agree on the “sad.” By now, you should have realized that surface air temperatures contaminated by systematic error are useless for detailed comparisons. You should also have realized by now that field air temperatures are undoubtedly contaminated by systematic error.
It means that the conclusion of my E&E paper obtains, namely that, “no analyses could ever be conducted on any raw [surface air temperature] data set whatsoever” that are more accurate than about (+/-)0.46 C.
Part 2 will be forthcoming.
References:
1. Hansen, J. and Lebedeff, S., Global Trends of Measured Surface Air Temperature, J. Geophys. Res. (1987) 92 (D11), 13345-13372.
2. Hubbard, K.G. and Lin, X., Realtime data filtering models for air temperature measurements, Geophys. Res. Lett. (2002) 29(10), 1425 1-4; doi: 10.1029/2001GL013191.
3. Lin, X., Hubbard, K.G. and Baker, C.B., Surface Air Temperature Records Biased by Snow-Covered Surface, Int. J. Climatol. (2005) 25, 1223-1236; doi: 10.1002/joc.1184.
During the course of the year the insolation that heats the planet, averaged out at about 288 W/m2, varies by about +/- 20 W/m2 due to orbital eccentricity and albedo. These changes (roughly 15% peak-to-trough) are global but not randomly distributed in space or time. The actual impact on particular regions is not known (to my knowledge).
We worry about instrumental or measurement accuracy (and precision), yet what of the impact of long-term local variations in albedo and timing of albedo? Do these disproportionately alter temperature readings so as to distort the “global” temperature? We know a 1.5% change in albedo equals the impact of doubled CO2 (by some calculations). Can a 3% change in one area, say, the Arctic, cause a 0.9K apparent increase in a portion of the year’s global temperature?
Non-random variations in the timing of in-out energy look like they have far greater possible impact than measurement accuracy and precision. Trenberth looks for 0.85 W/m2 of “missing” heat when the solar insolation value just got knocked down by 1.6 W/m2 (correctly or not). Our ability to ACCURATELY know what is going on is greater than our ability to understand the significance of the variance from the long term “normal”, at least at the level of variance we are seeing.
This is Part II of my reply to the critique of my paper by EFS_Junior. Part I of my reply is here.
EFS Point 5. “Your third paragraph plays with the concept of region, so is the Northern Hemisphere (NH) a region? Because I’m quite sure that I can show some autocorrelation/cross correlation with any two NH regional stations simply due to similar seasonal and diurnal behaviors (note that these would have to be cross correlated first to determine the diurnal lag coefficient).”
Reply 5. “Region” can be empirically defined in terms of the 1987 results of Hansen and Lebedeff [1], who showed air temperature correlations across 1200 km. For convenience, we can define a region as the set of locales where correlation is, say, 0.68 (1-sigma) or better. For latitudes north or south of 23 degrees, that might be 500 km. For zero to 23 degrees, that might be 100 km. Of course, one can define regions as one likes, but an evidence-based qualifier is required.
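As a sketch of what such an evidence-based qualifier could look like in practice (everything here is invented for illustration: the station positions, the series, and the noise model):

import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

# Synthetic monthly series for hypothetical stations along a transect: a
# shared regional signal plus local noise that grows with separation.
n_months = 240
regional = rng.normal(0.0, 1.0, n_months)

positions_km = {"A": 0.0, "B": 150.0, "C": 600.0, "D": 1400.0}
series = {
    name: regional + rng.normal(0.0, 0.3 + pos / 1000.0, n_months)
    for name, pos in positions_km.items()
}

THRESHOLD = 0.68  # the 1-sigma qualifier suggested above

for a, b in combinations(positions_km, 2):
    r = np.corrcoef(series[a], series[b])[0, 1]
    sep = abs(positions_km[a] - positions_km[b])
    verdict = "same region" if r >= THRESHOLD else "different regions"
    print(f"{a}-{b}: {sep:6.0f} km apart, r = {r:.2f} -> {verdict}")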
As pointed out in Reply Part I, instrumental systematic error is produced by the same uncontrolled environmental variables that determine surface air temperature. So, systematic errors in the temperature record will be about as regionally correlated as the temperature measurements themselves.
EFS Point 6: “This also plays somewhat into the LLN, as it would be virtually impossible (p ~ 0) to take multiple stations from any region (no matter how small or how large) with the expectation (p ~ 1) that all these stations used the exact same sensors over time, have identical systematic errors over time, et cetera.”
Reply 6. Again as pointed out in Reply Part I, the field calibrations from Hubbard and Lin [2] show that sensors in six very different shields exhibit the same direction of systematic skew in measured temperatures (see also [3-5]). Therefore, it is an empirically justifiable surmise that regional systematic errors will remain correlated no matter the sort of temperature sensor employed. Your analysis is therefore not appropriate to the question.
EFS Point 7: “Autocorrelations exceeding 0.8, 0.9, 0.99, 0.999 with slopes of 0.98, 0.99, 1.00, 1.01, 1.02 for any two stations, are empirical proof that there is an underlying relationship between them, that whatever errors exist in these measurements, these errors are not systematic in nature, to the degree, or in the manner, that the author claims.”
Reply 7. Climate stations measure surface air temperatures (among other observables). We know that surface air temperatures are regionally correlated. Clearly, then, any long-term trends in air temperatures will also be regionally correlated. The total average lower limit systematic-plus-station-error of (+/-)0.32 C is much smaller than the magnitude of the recorded air temperatures. Therefore, although the 20th century temperature measurements are contaminated with at least (+/-)0.32 C of error, our knowledge of these air temperatures is not seriously impacted by that lower limit of error.
However, the (+/-)0.46 C lower limit of error in an annual anomaly is not smaller than the air temperature anomalies themselves, which are calculated by subtraction of station air temperatures from a climate normal. That means any anomaly trend in temperatures smaller than (+/-)0.46 C, over the time-range of interest, is not physically distinguishable from zero.
As noted above, the regional correlation of systematic error in surface air temperature, guaranteed by the systematic correlation of surface air temperature itself plus the analogous response errors of various air temperature sensors, will also guarantee that spurious anomaly trends smaller than (+/-)0.46 C will be regionally correlated. This spurious regional correlation will mislead any scientist who neglects the effects of systematic error, or anyone who fails to understand that systematic errors follow uncontrolled systematic climate variables.
As the systematic errors entering the measurements of 20th century air temperatures are unknown, it is impossible to correct 20th century air temperatures for these errors. It is only possible to measure contemporary systematic errors by monitoring sensor systems similar to 20th century systems, in order to estimate an average representative uncertainty due to the systematic error present in the 20th century measurements. This experiment would require deploying high precision temperature sensors at multiple (≥100) locations world-wide to monitor “true” air temperatures – making a precision data-base against which concurrently recorded standard sensor temperature measurements could be compared. See reference [2] for what such an experiment might look like.
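A sketch of the data reduction such an experiment might perform at a single site (the arrays are entirely hypothetical stand-ins for the paired records):

import numpy as np

rng = np.random.default_rng(3)

# Hypothetical paired hourly records for one year: a high-precision
# aspirated reference probe vs. a standard operational sensor.
n_obs = 8760
t_true = 10.0 + 10.0 * np.sin(2 * np.pi * np.arange(n_obs) / n_obs)  # idealized annual cycle
reference = t_true + rng.normal(0.0, 0.02, n_obs)   # near-"true" air temperature
standard = (t_true + rng.normal(0.0, 0.2, n_obs)
            + 0.3 * rng.random(n_obs))              # random error plus a skewed warm bias

diff = standard - reference           # per-observation error estimate (systematic + random)
mean_bias = diff.mean()
rms_error = np.sqrt((diff ** 2).sum() / (n_obs - 1))

print(f"mean bias:       {mean_bias:+.3f} C")
print(f"RMS uncertainty: +/- {rms_error:.3f} C")

# Aggregating such RMS estimates over >= 100 sites would yield the
# representative average uncertainty described above.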
EFS Point 8. “It’s akin to saying “you can’t do that because x, y, z, …” but then doing so anyway…”
Reply 8. This point is answered in Reply 7.
EFS Point 9. “In your 4th paragraph/sentence you again make a baseless statement, as empirical evidence abounds as to the sameness of the global temperature trends (land based and satellite eras, a random selection of a small group of land based records, say, for example, 10 < N < 100), that these trends are very similar for the 23 HAWS as well as to larger networks (but with lower trend lines for the global mean temperature trends), that these 23 HAWS trend lines are, in the aggregate, quite similar in shape to the global trend lines.”
Reply 9. As noted in Reply 7, regional trends in temperature do not necessarily indicate the credibility of regional anomaly trends. Climate stations globally all use similar temperature sensors. Over the bulk of the 20th century, these were LIG thermometers in CRS shelters, or worse. There is every reason to think that the systematic error contaminating their temperature measurements would vary as the air temperature. When regional and global annual anomalies are calculated, the trends should usually (but perhaps not always) be preserved, except that any trend smaller than (+/-)0.46 C would be indistinguishable from spurious (zero).
Your empirical correlations would be credible indicators of real annual anomaly trends only if all measurement errors were random.
EFS Point 10. “In closing, any temperature record has a total error associated with it, whatever that error is, it does not stop one from extracting very real low frequency information with an associated very high degree of statistical confidence.”
Reply 10. Only when the low frequency trend is greater than (+/-)0.46 C. And that emergence would only grant 68% confidence. You’d need a 0.92 C trend (2-sigma) to attain the usual 95% (p<0.05) statistical confidence.
EFS Point 11. “If this were not true, if this were not the case, then all low frequency temperature trend lines would indeed exhibit truly random behaviors, with R^2 approaching zero in all cases.”
Reply 11. Not when the error is systematic and regionally correlated.
EFS Point 12. “That because we can demonstrate well defined low frequency trend lines with associated high degrees of statistical confidence, that we can choose large N, and still obtain well defined low frequency trend lines with associated high degrees of statistical confidence, suggests unambiguously, based on the LLN, that these empirical results are real with probability approaching one (p ~ 1).”
Reply 12. 1/sqrt(N) reduction of error works only with random error.
Your entire analysis rests upon the assumption that random error exhausts all the instrumental error present in the surface air temperature record. It doesn’t. Unfortunately, the climate scientists who compile surface air temperature-time series have thoroughly neglected the systematic error suffered by temperature sensors. This neglect has led them to false confidence and to publish empirically unjustifiable conclusions.
The unappreciated error in the surface air temperature has further misled climate modelers to place false confidence in the temperature calibration accuracy of GCMs, and even further has led to false precision when so-called proxy paleo-temperature reconstructions are normalized to the instrumental record.
Final Part III is forthcoming.
References:
1. Hansen, J. and Lebedeff, S., Global Trends of Measured Surface Air Temperature, J. Geophys. Res., 1987, 92 (D11), 13345-13372.
2. Hubbard, K.G. and Lin, X., Realtime data filtering models for air temperature measurements, Geophys. Res. Lett., 2002, 29 (10), 1425 1-4; doi: 10.1029/2001GL013191.
3. Hubbard, K.G., Lin, X., Baker, C.B. and Sun, B., Air Temperature Comparison between the MMTS and the USCRN Temperature Systems, J. Atmos. Ocean. Technol., 2004, 21 1590-1597.
4. Lin, X. and Hubbard, K.G., Sensor and Electronic Biases/Errors in Air Temperature Measurements in Common Weather Station Networks, J. Atmos. Ocean. Technol., 2004, 21 1025-1032.
5. Lin, X., Hubbard, K.G. and Meyer, G.E., Airflow Characteristics of Commonly Used Temperature Radiation Shields, J. Atmos. Oceanic Technol., 2001, 18 (3), 329-339.
Even the 95% confidence level is soft and squishy. Suitable for Psychology and other opinion-contaminated fields. It is “accepted” in Climate Science because it’s the only standard it has a hope of (occasionally) reaching. Real science goes for 5-6 sigma confidence. Never, ever, ever, will Climate Science attempt that.