Guest Post by Werner Brozek, Excerpted from Professor Robert Brown from Duke University, Conclusion by Walter Dnes and Edited by Just The Facts:
A while back, I had a post titled HadCRUT4 is From Venus, GISS is From Mars, which showed wild monthly variations from which one could wonder if GISS and Hadcrut4 were talking about the same planet. In comments, mark stoval posted a link to this article, “Why ‘GISTEMP LOTI global mean’ is wrong and ‘HadCRUt3 gl’ is right”, whose title speaks for itself, and Bob Tisdale has a recent post, Busting (or not) the mid-20th century global-warming hiatus, which could explain the divergence seen in the chart above.
The graphic at the top is the last plot from Professor Brown from his comment I’ve excerpted below, which shows a period of 19 years where the slopes go in opposite directions by a fairly large margin. Is this reasonable? Think about this as you read his comment below. His comment ends with rgb.
November 10, 2015 at 1:19 pm
“Werner, if you look over the length and breadth of the two on WFT, you will find that over a substantial fraction of the two plots they are offset by less than 0.1 C. For example, for much of the first half of the 20th century, they are almost on top of one another with GISS rarely coming up with a patch 0.1 C or so higher. They almost precisely match in a substantial part of their overlapping reference periods. They only start to substantially split in the 1970 to 1990 range (which contains much of the latter 20th century warming). By the 21st century this split has grown to around 0.2 C, and is remarkably consistent. Let’s examine this in some detail:
We can start with a very simple graph that shows the divergence over the last century:
The two graphs have widening divergence in the temperatures they obtain. If the two measures were in mutual agreement, one would expect the linear trends to be in good agreement — the anomaly of the anomaly, as it were. They should, after all, be offset by only the difference in mean temperatures in their reference periods, which should be a constant offset if they are both measuring the correct anomalies from the same mean temperatures.
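The point about reference periods can be checked with a short sketch. It builds two anomaly series from the same synthetic temperatures (not real GISS or HadCRUT4 data), using two reference periods chosen purely for illustration, and shows that the gap between them is constant and the linear trends are identical. The function names are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx

def to_anomaly(years, temps, base_start, base_end):
    """Subtract the mean temperature over the reference period (inclusive)."""
    base = [t for y, t in zip(years, temps) if base_start <= y <= base_end]
    base_mean = sum(base) / len(base)
    return [t - base_mean for t in temps]

# Synthetic annual temperatures, purely illustrative (an assumption, not data).
random.seed(0)
years = list(range(1900, 2016))
temps = [14.0 + 0.007 * (y - 1900) + random.gauss(0.0, 0.1) for y in years]

# Same temperatures, two different reference periods.
anom_a = to_anomaly(years, temps, 1951, 1980)
anom_b = to_anomaly(years, temps, 1961, 1990)

gaps = [a - b for a, b in zip(anom_a, anom_b)]
print("slope A (C/yr):", ols_slope(years, anom_a))
print("slope B (C/yr):", ols_slope(years, anom_b))
print("gap range (C):", min(gaps), "to", max(gaps))
```

If two real anomaly products behaved this way, any divergence in their trends would have to come from the methods or the input data, not from the choice of baseline.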
Obviously, they do not. There is a growing rift between the two and, as I noted, they are split by more than the 95% confidence that HadCRUT4, at least, claims even relative to an imagined split in means over their reference periods. There are, very likely, nonlinear terms in the models used to compute the anomalies that are growing and will continue to systematically diverge, simply because they very likely have different algorithms for infilling and kriging and so on, in spite of them very probably having substantial overlap in their input data.
In contrast, BEST and GISS do indeed have similar linear trends in the way expected, with a nearly constant offset. One presumes that this means that they use very similar methods to compute their anomalies (again, from data sets that very likely overlap substantially as well). The two of them look like they want to vote HadCRUT4 off of the island, 2 to 1:
Until, of course, one adds the trends of UAH and RSS:
All of a sudden consistency emerges, with some surprises. GISS, HadCRUT4 and UAH suddenly show almost exactly the same linear trend across the satellite era, with a constant offset of around 0.5 C. RSS is substantially lower. BEST cannot honestly be compared, as it only runs to 2005ish.
One is then very, very tempted to make anomalies out of our anomalies, and project them backwards in time to see how well they agree on hindcasts of past data. Let’s use the reference period shown and subtract around 0.5 C from GISS and 0.3 C from HadCRUT4 to try to get them to line up with UAH in 2015 (why not, good as any):
We check to see if these offsets do make the anomalies match over the last 36 most accurate years (within reason):
and see that they do. NOW we can compare the anomalies as they project into the indefinite past. Obviously UAH does have a slightly slower linear trend over this “re-reference period” and it doesn’t GO any further back, so we’ll drop it, and go back to 1880 to see how the two remaining anomalies on a common base look:
We now might be surprised to note that HadCRUT4 is well above GISS LOTI across most of its range. Back in the 19th century splits aren’t very important because they both have error bars back there that can forgive any difference, but there is a substantial difference across the entire stretch from 1920 to 1960:
This reveals a robust and asymmetric split between HadCRUT4 and GISS LOTI that cannot be written off to any difference in offsets, as I renormalized the offsets to match them across what has to be presumed to be the most precise and accurately known part of their mutual ranges, a stretch of 36 years where in fact their linear trends are almost precisely the same so that the two anomalies differ only BY an offset of 0.145 C with more or less random deviations relative to one another.
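The renormalization described above can be sketched in a few lines. This is only an illustration, not the procedure actually used on WFT (where round offsets were picked by eye): it assumes the offset is taken as the mean difference over a common window. The function names and all anomaly values are hypothetical.

```python
def mean_offset(series, reference, window):
    """Mean of (series - reference) over the years of `window` present in both."""
    common = [y for y in window if y in series and y in reference]
    return sum(series[y] - reference[y] for y in common) / len(common)

def shift(series, offset):
    """Return a copy of the series with a constant offset subtracted."""
    return {y: v - offset for y, v in series.items()}

# Hypothetical annual anomalies in C (NOT real UAH or GISS values).
uah = {2011: 0.10, 2012: 0.12, 2013: 0.15, 2014: 0.18, 2015: 0.26}
giss = {2011: 0.61, 2012: 0.63, 2013: 0.65, 2014: 0.70, 2015: 0.81}

window = range(2011, 2016)
giss_offset = mean_offset(giss, uah, window)
giss_on_uah_base = shift(giss, giss_offset)

print("offset removed from GISS (C):", round(giss_offset, 3))
print("residual mean difference (C):",
      round(mean_offset(giss_on_uah_base, uah, window), 6))
```

Once both series sit on a common base over the window where they are most trusted, any systematic gap that opens up outside that window can no longer be blamed on the offset.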
We find that except for a short patch right in the middle of World War II, HadCRUT4 is consistently 0.1 to 0.2 C higher than GISStemp. This split cannot be repaired — if one matches it up across the interval from 1920 to 1960 (pushing GISStemp roughly 0.145 HIGHER than HadCRUT4 in the middle of WW II) then one splits it well outside of the 95% confidence interval in the present.
Unfortunately, while it is quite all right to have an occasional point higher or lower between them — as long as the “occasions” are randomly and reasonably symmetrically split — this is not an occasional point. It is a clearly resolved, asymmetric offset in matching linear trends. To make life even more interesting, the linear trends do (again) have a more or less matching slope, across the range 1920 to 1960 just like they do across 1979 through 2015 but with completely different offsets. The entire offset difference was accumulated from 1960 to 1979.
Just for grins, one last plot:
Now we have a second, extremely interesting problem. Note that the offset between the linear trends here has shrunk to around half of what it was across the bulk of the early 20th century with HadCRUT4 still warmer, but now only warmer by maybe 0.045 C. This is in a region where the acknowledged 95% confidence range is order of 0.2 to 0.3. When I subtract appropriate offsets to make the linear trends almost precisely match in the middle, we get excellent agreement between the two anomalies.
Too excellent. By far. All of the data is within the mutual 95% confidence interval! This is, believe it or not, a really, really bad thing if one is testing a null hypothesis such as “the statistics we are publishing with our data have some meaning”.
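To see why perfect agreement is suspicious, here is a rough back-of-envelope sketch. It assumes, unrealistically, that each monthly difference is an independent draw with a correctly stated 95% interval, and it assumes a 40-year stretch of 480 monthly values. Both assumptions are mine, for illustration only.

```python
def expected_outside(n_points, coverage=0.95):
    """Expected number of points falling outside the stated interval."""
    return n_points * (1.0 - coverage)

def prob_all_inside(n_points, coverage=0.95):
    """Chance that every one of n independent points lands inside the interval."""
    return coverage ** n_points

n_months = 40 * 12  # assumed length of the stretch, in months
print("expected outliers:", expected_outside(n_months))
print("P(no outliers at all):", prob_all_inside(n_months))
```

Under these assumptions one would expect roughly two dozen excursions outside the interval, and seeing none at all would be wildly improbable. Overlapping input data weakens the independence assumption, which is the caveat taken up next.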
We now have a bit of a paradox. Sure, the two data sets that these anomalies are built from very likely have substantial overlap, so the two anomalies themselves cannot properly be viewed as random samples drawn from a box filled with independent and identically distributed but correctly computed anomalies. But their super-agreement across the range from 1880 to 1920 and 1920 to 1960 (with a different offset) and across the range from 1979 to 2015 (but with yet another offset) means serious trouble for the underlying methods. This is absolutely conclusive evidence, in my opinion, that “According to HadCRUT4, it is well over 99% certain GISStemp is an incorrect computation of the anomaly” and vice versa. Furthermore, the differences between the two cannot be explained by the fact that they draw on partially independent data sources. If that were the explanation, the coincidences between the two across piecewise blocking of the data would be too strong: obviously the independent data is not sufficient to generate a symmetric and believable distribution of mutual excursions with errors that are anywhere near as large as they have to be, given that both HadCRUT4 and GISStemp if anything underestimate probable errors in the 19th century.
Where is the problem? Well, as I noted, a lot of it happens right here:
The two anomalies match up almost perfectly from the right hand edge to the present. They do not match up well from 1920 to 1960, except for a brief stretch of four years or so in early World War II, but for most of this interval they maintain a fairly constant, and identical, slope to their (offset) linear trend! They match up better (too well!), again with a very similar linear trend but yet another offset, across the range from 1880 to 1920. But across the range from 1960 to 1979, Ouch! That’s gotta hurt. Across 20 years, HadCRUT4 cools Earth by around 0.08 C, while GISS warms it by around 0.07 C.
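The size of the 1960 to 1979 split can be illustrated by turning a fitted slope into a net change over the interval. The two series below are synthetic straight lines constructed to reproduce the net changes quoted above. They are not the real HadCRUT4 or GISS values, and the helper names are hypothetical.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx

def change_over(years, values, start, end):
    """Fitted linear change (slope times span) between start and end, inclusive."""
    pts = [(y, v) for y, v in zip(years, values) if start <= y <= end]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return ols_slope(xs, ys) * (end - start)

years = list(range(1960, 1980))
# Synthetic lines built from the quoted figures (illustration only).
cooling_series = [-0.08 * (y - 1960) / 19 for y in years]
warming_series = [0.07 * (y - 1960) / 19 for y in years]

d_cool = change_over(years, cooling_series, 1960, 1979)
d_warm = change_over(years, warming_series, 1960, 1979)
print("net change, cooling series (C):", round(d_cool, 3))
print("net change, warming series (C):", round(d_warm, 3))
print("accumulated split (C):", round(d_warm - d_cool, 3))
```

With the quoted figures the accumulated split comes to about 0.15 C, which is close to the 0.145 C offset mentioned earlier.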
So what’s going on? This is a stretch in the modern era, after all. Thermometers are at this point pretty accurate. World History seems to agree with HadCRUT4, since in the early 70’s there was all sorts of sound and fury about possible ice ages and global cooling, not global warming. One would expect both anomalies to be drawing on very similar data sets with similar precision and with similar global coverage. Yet in this stretch of the modern era with modern instrumentation and (one has to believe) very similar coverage, the two major anomalies don’t even agree in the sign of the linear trend slope and more or less symmetrically split as one goes back to 1960, a split that actually goes all the way back to 1943, then splits again all the way back to 1920, then slowly “heals” as one goes back to 1880.
As I said, there is simply no chance that HadCRUT4 and GISS are both correct outside of the satellite era. Within the satellite era their agreement is very good, but they split badly over the 20 years preceding it in spite of the data overlap and quality of instrumentation. This split persists over pretty much the rest of the mutual range of the two anomalies except for a very short period of agreement in mid-WWII, where one might have been forgiven for a maximum disagreement given the chaotic nature of the world at war. One must conclude, based on either one, that it is 99% certain that the other one is incorrect.
Or, of course, that they are both incorrect. Further, one has to wonder about the nature of the errors that result in a split that is so clearly resolved once one puts them on an equal footing across the stretch where one can best believe that they are accurate. Clearly it is an error that is a smooth function of time, not an error that is in any sense due to accuracy of coverage of the (obviously strongly overlapping) data.
This result just makes me itch to get my hands on the data sets and code involved. For example, suppose that one feeds the same data into the two algorithms. What does one get then? Suppose one keeps only the set of sites that are present in 1880 when the two have mutually overlapping application (or better, from 1850 to the present) and runs the algorithm on them. How much do the results split from a) each other; and b) the result obtained from using all of the available sites in the present? One would expect the latter, in particular, to be a much better estimator of the probable method error in the remote past — if one uses only those sites to determine the current anomaly and it differs by (say) 0.5 C from what one gets using all sites, that would be a very interesting thing in and of itself.
Finally, there is the ongoing problem with using anomalies in the first place rather than computing global average temperatures. Somewhere in there, one has to perform a subtraction. The number you subtract is in some sense arbitrary, but any particular number you subtract comes with an error estimate of its own. And here is the rub:
The place where the two global anomalies develop their irreducible split is square inside the mutually overlapping part of their reference periods!
That is, the one place where they most need to be in agreement, at least in the sense that they reproduce the same linear trends (that is, the same anomalies), is the very place where they most greatly differ. Indeed, their agreement is suspiciously good, as far as linear trend is concerned, everywhere else, in particular in the most recent present, where one has to presume that the anomaly is most accurately being computed, and in the most remote past, where one expects to get very different linear trends but instead gets almost identical ones!
I doubt that anybody is still reading this thread to see this — but they should.
P.S. from Werner Brozek:
On Nick Stokes’ Temperature Trend Viewer, note the HUGE difference in the lower number of the 95% confidence limits (Cl) between Hadcrut4 and GISS from March 2005 to April 2016:
GISS temperature anomaly trend, Mar 2005 to Apr 2016: Cl from 0.433 to 3.965
Hadcrut4 temperature anomaly trend, Mar 2005 to Apr 2016: Cl from -0.023 to 3.850
In the sections below, we will present you with the latest facts. The information will be presented in two sections and an appendix. The first section will show for how long there has been no statistically significant warming on several data sets. The second section will show how 2016 so far compares with 2015 and the warmest years and months on record so far. For three of the data sets, 2015 also happens to be the warmest year. The appendix will illustrate sections 1 and 2 in a different way. Graphs and a table will be used to illustrate the data. The two satellite data sets go to May and the others go to April.
For this analysis, data was retrieved from Nick Stokes’ Trendviewer available on his website. This analysis indicates for how long there has not been statistically significant warming according to Nick’s criteria. Data go to their latest update for each set. In every case, note that the lower error bar is negative so a slope of 0 cannot be ruled out from the month indicated.
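The test being applied here can be sketched as follows. This is a naive ordinary least-squares version that assumes independent residuals. It is not Nick Stokes’ actual code, and it ignores autocorrelation, which generally widens the limits for monthly temperature data. The function names and the toy series are hypothetical.

```python
import math

def trend_with_ci(xs, ys, z=1.96):
    """OLS slope with a naive 95% interval (assumes independent residuals)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope - z * std_err, slope + z * std_err

def warming_is_significant(lower_limit):
    """Warming counts as significant only if the lower limit is above zero."""
    return lower_limit > 0.0

# Toy series (illustration only): pure wiggle, and a steady rise plus small wiggle.
xs = list(range(10))
wiggle = [0.1 if x % 2 == 0 else -0.1 for x in xs]
rising = [0.1 * x + 0.1 * w for x, w in zip(xs, wiggle)]

for name, ys in (("wiggle", wiggle), ("rising", rising)):
    slope, lower, upper = trend_with_ci(xs, ys)
    print(name, round(slope, 4), round(lower, 4), round(upper, 4),
          warming_is_significant(lower))
```

Applied to the two lower limits quoted in the P.S., `warming_is_significant(-0.023)` is false and `warming_is_significant(0.433)` is true.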
On several different data sets, there has been no statistically significant warming for between 0 and 23 years according to Nick’s criteria. Cl stands for the confidence limits at the 95% level.
The details for several sets are below.
For UAH6.0: Since May 1993: Cl from -0.023 to 1.807
This is 23 years and 1 month.
For RSS: Since October 1993: Cl from -0.010 to 1.751
This is 22 years and 8 months.
For Hadcrut4.4: Since March 2005: Cl from -0.023 to 3.850
This is 11 years and 2 months.
For Hadsst3: Since July 1996: Cl from -0.014 to 2.152
This is 19 years and 10 months.
For GISS: The warming is significant for all periods above a year.
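The durations above appear to count both the first and the last month. A small sketch under that assumption reproduces all four figures, taking May 2016 as the last month for the satellite sets and April 2016 for the others, as stated earlier. The function name is hypothetical.

```python
def span_inclusive(start_year, start_month, end_year, end_month):
    """Length of a period in (years, months), counting both end months."""
    total = (end_year - start_year) * 12 + (end_month - start_month) + 1
    return divmod(total, 12)

periods = {
    "UAH6.0":     (1993, 5, 2016, 5),   # May 1993 to May 2016
    "RSS":        (1993, 10, 2016, 5),  # October 1993 to May 2016
    "Hadcrut4.4": (2005, 3, 2016, 4),   # March 2005 to April 2016
    "Hadsst3":    (1996, 7, 2016, 4),   # July 1996 to April 2016
}

for name, args in periods.items():
    years, months = span_inclusive(*args)
    print(f"{name}: {years} years and {months} months")
```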
This section shows data about 2016 and other information in the form of a table. The table shows the five data sources along the top and other places so they should be visible at all times. The sources are UAH, RSS, Hadcrut4, Hadsst3, and GISS.
Down the column are the following:
1. 15ra: This is the final ranking for 2015 on each data set.
2. 15a: Here I give the average anomaly for 2015.
3. year: This indicates the warmest year on record so far for that particular data set. Note that the satellite data sets have 1998 as the warmest year and the others have 2015 as the warmest year.
4. ano: This is the average of the monthly anomalies of the warmest year just above.
5. mon: This is the month where that particular data set showed the highest anomaly prior to 2016. The months are identified by the first three letters of the month and the last two numbers of the year.
6. ano: This is the anomaly of the month just above.
7. sig: This is the first month for which warming is not statistically significant according to Nick’s criteria. The first three letters of the month are followed by the last two numbers of the year.
8. sy/m: This is the years and months for row 7.
9. Jan: This is the January 2016 anomaly for that particular data set.
10. Feb: This is the February 2016 anomaly for that particular data set, etc.
14. ave: This is the average anomaly of all months to date taken by adding all numbers and dividing by the number of months.
15. rnk: This is the rank that each particular data set would have for 2016 without regards to error bars and assuming no changes. Think of it as an update 20 minutes into a game.
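Rows 14 and 15 can be sketched as follows. The monthly values are hypothetical placeholders, and the list of past annual averages holds only the two UAH figures quoted in this post (0.484 for 1998 and 0.261 for 2015). A real ranking would need the full annual record. The function names are also hypothetical.

```python
def average_to_date(monthly_anomalies):
    """Row 14: sum of the monthly anomalies divided by the number of months."""
    return sum(monthly_anomalies) / len(monthly_anomalies)

def provisional_rank(current_average, past_annual_averages):
    """Row 15: 1 plus the number of past years that were warmer."""
    warmer = sum(1 for v in past_annual_averages if v > current_average)
    return 1 + warmer

# Hypothetical monthly anomalies for the year so far (placeholders only).
months_so_far = [0.5, 0.8, 0.7, 0.7, 0.6]

# Only the two UAH annual averages quoted in this post, not the full record.
past_years = [0.484, 0.261]

avg = average_to_date(months_so_far)
print("average to date:", round(avg, 3))
print("provisional rank:", provisional_rank(avg, past_years))
```

Like the score 20 minutes into a game, the rank can change with every new month added to the average.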
If you wish to verify all of the latest anomalies, go to the following:
For UAH, version 6.0beta5 was used. Note that WFT uses version 5.6. So to verify the length of the pause on version 6.0, you need to use Nick’s program.
For Hadsst3, see: https://crudata.uea.ac.uk/cru/data/temperature/HadSST3-gl.dat
For GISS, see:
To see all points since January 2015 in the form of a graph, see the WFT graph below. Note that UAH version 5.6 is shown. WFT does not show version 6.0 yet. Also note that Hadcrut4.3 is shown and not Hadcrut4.4, which is why many months are missing for Hadcrut.
As you can see, all lines have been offset so they all start at the same place in January 2015. This makes it easy to compare January 2015 with the latest anomaly.
In this part, we are summarizing data for each set separately.
For UAH: There is no statistically significant warming since May 1993: Cl from -0.023 to 1.807. (This is using version 6.0 according to Nick’s program.)
The UAH average anomaly so far for 2016 is 0.673. This would set a record if it stayed this way. 1998 was the warmest at 0.484. Prior to 2016, the highest ever monthly anomaly was in April of 1998, when it reached 0.743. The average anomaly in 2015 was 0.261 and it was ranked 3rd.
For RSS: There is no statistically significant warming since October 1993: Cl from -0.010 to 1.751.
The RSS average anomaly so far for 2016 is 0.753. This would set a record if it stayed this way. 1998 was the warmest at 0.550. Prior to 2016, the highest ever monthly anomaly was in April of 1998, when it reached 0.857. The average anomaly in 2015 was 0.358 and it was ranked 3rd.
For Hadcrut4: There is no statistically significant warming since March 2005: Cl from -0.023 to 3.850.
The Hadcrut4 average anomaly so far for 2016 is 0.990. This would set a record if it stayed this way. Prior to 2016, the highest ever monthly anomaly was in December of 2015, when it reached 1.010. The average anomaly in 2015 was 0.746 and this set a new record.
For Hadsst3: There is no statistically significant warming since July 1996: Cl from -0.014 to 2.152.
The Hadsst3 average anomaly so far for 2016 is 0.671. This would set a record if it stayed this way. Prior to 2016, the highest ever monthly anomaly was in September of 2015, when it reached 0.725. The average anomaly in 2015 was 0.592 and this set a new record.
For GISS: The warming is significant for all periods above a year.
The GISS average anomaly so far for 2016 is 1.21. This would set a record if it stayed this way. Prior to 2016, the highest ever monthly anomaly was in December of 2015, when it reached 1.10. The average anomaly in 2015 was 0.87 and it set a new record.
If GISS and Hadcrut4 cannot both be correct, could the following be a factor?
* “Hansen and Imhoff used satellite images of nighttime lights to identify stations where urbanization was most likely to contaminate the weather records.” GISS
* “Using the photos, a citizen science project called Cities at Night has discovered that most light-emitting diodes — which are touted for their energy-saving properties — actually make light pollution worse. The changes in some cities are so intense that space station crew members can tell the difference from orbit.” Tech Insider
Question… is the GISS “nightlighting correction” valid any more? And what does that do to their “data”?
GISS for May came in at 0.93. While this is the warmest May on record, it is the first time that the anomaly fell below 1.00 since October 2015. As for June, present indications are that it will drop by at least 0.15 from 0.93. All months since October 2015 have been record warm months so far for GISS. Hadsst3 for May came in at 0.595. All months since April 2015 have been monthly records for Hadsst3.