Guest essay by M.S. Hodgart (Visiting Reader, Surrey Space Centre, University of Surrey)
A feature of the politicised debate – if such it may be called – over AGW (anthropogenic global warming) and so-called ‘climate change’ is the tendency on both sides to cite only the evidence supporting their views and to ignore what does not. Scientists, of course, are supposed to be above this sort of thing and to take into account all relevant evidence.
One finds a lot of partiality when it comes to interpretation of the trend in climate data – particularly the available time series of average temperature measurements on the surface of this planet. Is it going up or down or has it paused? What is happening?
Sceptical commentators were the first to draw attention to a recent pause or hiatus in global temperatures and are naturally tempted to see it as persisting for as long as possible. The ‘warmist’ climate scientists – those who compiled the IPCC reports, including those who work for or presumably get their research funding from the UK Meteorological Office – have tended the other way. For a long time they were in a state of denial about any pause – not even conceding any reduction in the warming rate – presumably because anything that detracted from the sacred dogma that an uncontested increase in atmospheric CO2 must entail a rise in temperature was very unwelcome.
But where both sides of the debate are often referring to the same data one must ask why it is not possible to come to a more objective conclusion.
I focus first on the time series of remote-sensed TLT satellite measurements released by Remote Sensing Systems. I also look again at the HadCRUT4 data which were the object of my analysis in WUWT in September 2013. It should be emphasised that the physical accuracy of any of these data is not under review here; that is a separate issue.
Plotted either as monthly or annual updates, the time series of globally averaged temperature measurements shows a substantial random-looking scatter from one month to the next (or one year to the next). This scatter, and a general lack of knowledge as to what exactly drives the temperatures, makes it difficult to determine the trend. Yet so many people debate, write and comment as if the trend in these data were entirely obvious. They think they know – ignoring the fact that the scatter in the data makes for a significant problem, not least in establishing what a trend means. The distinguished econometrician Phillips has memorably written (see his introduction):
No one understands trends. Everyone sees them in data.
also (and not altogether ironically)
A statistician is a fellow that draws a line through a set of points based on unwarranted assumptions with a foregone conclusion.
In other words, be careful if you run a linear regression on data like these. In the spirit of impartiality, and with all respect for his warning, I try here to draw reliable conclusions about the trend from these particular cited data. I must however put on record that, like our ‘climate lord’ Matt Ridley, I am a ‘luke-warmist’. My sympathies are with the ‘sceptics’ because there seems to have arisen an officially-sponsored global warming industry and a general scare-mongering by and of the scientifically ignorant. It has for example become a political ‘fact’ – contrary to all biology and chemistry – that CO2 in the atmosphere at present or worst-case future concentrations is or will be a pollutant, i.e. a poison. It is not: its presence is essential to plant growth and therefore to our survival. The material bulk of all trees and crops derives from, and is converted out of, CO2 in the air. Trees and crops grow out of the air, not the ground! See the brilliant “Fun to Imagine” TV series by Feynman. It is difficult to take seriously an unremitting propaganda that is prepared to distort the science as badly as this.
Lord Monckton and the RSS data
Viscount Monckton of Brenchley is a prominent climate sceptic. In a recent release to WUWT he emphasises what seems to him an obvious fact: that global surface temperatures have paused for almost two decades. He is not alone in this view, but let us see how he comes to this conclusion. He appeals first to the TLT satellite measurements released by Remote Sensing Systems (RSS). By the simple procedure of linear regression on their monthly data he finds an effectively zero slope going back to February 1997 (his last cited month was September 2015). I replicate his result in my fig 1 (the red line). In consequence it seems obvious to him – and to so many others – that indeed global warming has stopped for all this time. But has it?
The problem is that he has chosen to disregard all the prior months of available measurements going back to January 1979. A linear regression over all these months yields a line (brown) with a slope of 0.12 deg C/decade. Although he acknowledges this effect, he does not seem to realise that this longer regression makes his conclusion untenable, whatever assumptions are made as to what the linear regression achieves.
He probably assumes that the slope resulting from linear regression determines the trend in global temperature. In other words: “whatever I choose to calculate, and the way I do it, defines the observed effect”. If he does, then he runs into a flat contradiction. The red line gives him his “Pause” (he uses a capital letter); but the brown line says that over the same time interval temperatures continued to rise. So which is it? The trend can’t be doing both. The RSS website plots only the longer-span regression. For them there is no pause.
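As a rough illustration of how strongly an ordinary least-squares slope can depend on the chosen start month, the following sketch runs the same regression over the whole of a synthetic series and over only its later portion. The numbers are purely illustrative stand-ins, not the actual RSS data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly anomaly series standing in for the RSS data:
# a warming trend that slows toward the end, plus white noise.
# (The real series also has month-to-month correlation.)
n = 440                                # roughly Jan 1979 .. Sep 2015
k = np.arange(n)
trend = 0.001 * k - 5e-7 * k**2        # deg C; slope flattens late on
z = trend + rng.normal(0, 0.1, n)

def ols_slope(y):
    """Least-squares slope, converted to deg C per decade (120 months)."""
    return np.polyfit(np.arange(len(y)), y, 1)[0] * 120

print("slope over the full record: %+.3f deg C/decade" % ols_slope(z))
print("slope from month 216 only:  %+.3f deg C/decade" % ols_slope(z[216:]))
```

The two printed slopes will generally differ even though both are computed from the same underlying series, which is exactly the situation the two lines in fig 1 present.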
If however he were to make the more orthodox assumption that linear regression estimates a linear trend, there are still difficulties. It could be that the data back to 1997 conform to a classical signal + noise model: a straight line of some slope and offset (the signal) which one cannot see because of an obscuring random variation (the noise). The standard model is

s[k] = a + b·k    (i)
z[k] = s[k] + v[k]    (ii)    …(1)

where z[k] is the time series, the variable k is a count in months or years (it is easiest to start at zero), and the signal = trend in (i) is defined by the offset a and rate b. The noise terms v[k] in (ii) are introduced in order to give an account of that random-looking fluctuation we can see in the time series. Ideally they would answer to a description of ‘white’ noise, but the terms here exhibit some limited correlation – approximating what electrical engineers call ‘low-pass noise’. Linear regression estimates an offset and slope which are in error from the true a and b because of that scatter. There are then two problems – the minor one being that his zero slope is at best a likely estimate; it is not definite.
More importantly, it is difficult to decide over just what span of years this model (1) could be valid. We could postulate that model (1) applies over a limited span, but it is asking a lot of Nature to oblige Monckton with even an approximation to a linear model which just happens to start in February 1997. If it applies over all the years, then the two regressions are estimating the same trend and the flat red regression is a ‘freak’ due to a chance combination of noise terms. Again one would conclude that only the longer regression had any validity.
Fig 1 RSS monthly data and linear regressions. Red line: regression from Feb 1997 to September 2015 (Monckton’s regression). Blue line: regression from mid-1993. Brown line: regression through all data from January 1979.
But there is hope for Lord Monckton still. It can be shown that the assumption that a linear trend runs over the whole record is unlikely to be true. The difference in slope between the two regressions of 0.12 deg C/decade is too large to be attributable to ‘chance’ – as one can readily determine. The two regressions, and also a third regression (blue line) calculated from mid-1993 with an intermediate slope, strongly suggest that beneath the noise the trend is not following a straight line.
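The “too large to be attributable to chance” claim can be checked, at least informally, by comparing the slope difference against the regression standard errors. The sketch below uses synthetic data and naive white-noise standard errors; the real series’ autocorrelation would inflate the errors, and the two spans overlap, so this is indicative only:

```python
import numpy as np

def slope_and_se(y):
    """OLS slope and its naive standard error (white-noise assumption;
    autocorrelated noise would make the true error larger)."""
    x = np.arange(len(y), dtype=float)
    x -= x.mean()
    b = (x @ (y - y.mean())) / (x @ x)
    resid = (y - y.mean()) - b * x
    s2 = (resid @ resid) / (len(y) - 2)
    return b, np.sqrt(s2 / (x @ x))

rng = np.random.default_rng(1)
z = 0.001 * np.arange(440) + rng.normal(0, 0.1, 440)   # synthetic series

b_all, se_all = slope_and_se(z)           # full-record regression
b_late, se_late = slope_and_se(z[216:])   # regression from a later start

# Crude comparison: treats the two estimates as independent, which
# they are not (the spans overlap), so this is only indicative.
diff = b_all - b_late
print("difference = %.5f deg C/month, ~%.1f naive s.e." %
      (diff, abs(diff) / np.hypot(se_all, se_late)))
```

A difference of many standard errors would argue, as in the text, that the two regressions cannot both be estimating the same straight line.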
All three lines can be reconciled if we allow that there is a non-linear trend – as indeed the IPCC scientists readily concede in ‘Box 2.2’ of their latest report AR5. There has to be something more complicated than a straight line beneath the noise. A generalisation of (1) is the classic

z[k] = s[k] + v[k]    …(2)

where z[k] are again the data points, and the signal = trend s[k] follows an assumed but unknown curve. The v[k] are again noise terms. The curve hidden in the data can be assumed to cover the whole span of years; model (1) is at best an approximation over a limited span.
A linear regression is not invalidated by this model but the computed slope has to be interpreted differently. It will have to be seen as an average of a trend with some actual variation within the span of years.
Accordingly the overall regression (brown line) computes an average trend of something which is non-linear between the years 1979 and 2015. But Monckton’s regression in principle is also no more than an average trend. So yes: there is a ‘Pause’, but its strict interpretation is that “an estimate of the average trend from Feb 1997 to Sept 2015 happens to have a zero slope”. But no: he has not demonstrated what is the most likely actual trend over this time.
As I show below it is much more likely that temperatures were still rising past 1997 and that Monckton only gets his Pause from a later date. As many others have pointed out it is easy to get fooled in statistical analysis by an apparent pattern suggested by what turns out to be the influence of a random component in the data.
Monckton’s construction does have one useful consequence: he has shown that none of these linear regressions (including his own) is likely to be estimating a straight line.
Alternative stochastic model?
In this deterministic trend model (2) there is assumed to be some unknown but well-defined curve or line concealed by low-pass noise, i.e. strictly a weak-sense stationary stochastic process. We need to be aware of a substantial literature which views the entire time series as a generalised non-stationary stochastic process. It is ‘all noise’. This approach is the preferred choice of econometricians who have taken a look at climate data. In his extensive publications Professor Terence Mills has looked at both approaches but favours the all-stochastic. If identification of ARIMA processes is your meat then there is plenty to work on. I wish you luck! In my opinion the stochastic approach leads to paradox and a terminological confusion. The data series has to be regarded as the output of a feed-forward and feed-back machine whose input is white noise. If this were true then every possible time series is ‘random’. So where is your anthropogenic global warming? I will follow the climate scientists and stay with deterministic trend estimation in general and (2) in particular.
Estimating a non-linear trend
If we have to fall back on the generalisation which is (2) then we shall have to estimate s[k] while only having access to the data z[k]. This is an exercise in curve fitting – for which there is a plethora of methods.
The difficulty with all methods of curve fitting is that there are essentially two kinds of error to contend with: the random error or variance due to the omni-present noise v[k] ; and a systematic error or bias due to the poor fit of a proposed fitting function to the unknown hidden signal s[k]. Whatever method is adopted the unavoidable problem is to decide if the computed curve is over-fitting (too much random error) or is under-fitting (too much bias error). There is a model selection problem.
In my earlier 2013 release to WUWT analysing the HadCRUT4 data I proposed using a cubic loess – which Mills shows is superior to quadratic or linear loess – and also a polynomial regression. In the case of loess the problem is to decide on the effective window width, and with a polynomial to decide on the degree.
For loess, if the window width is too narrow random error dominates over the systematic, and if too wide vice versa. For a polynomial regression, if the degree is too high random error dominates over the systematic, and if too low vice versa. There are many model identification methods designed to guide a choice – starting perhaps with the Akaike Information Criterion, modifications such as that by Hurvich and Tsai, and many more. There are also various forms of cross-validation technique. But having tried some of them they seem to me uncertain and unreliable. Statistical experts may disagree.
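For concreteness, the small-sample corrected criterion of Hurvich and Tsai (AICc) applied to the choice of polynomial degree can be sketched as follows. The data here are synthetic and the scoring uses the standard Gaussian-likelihood form; it is one of the methods mentioned above, not the procedure ultimately adopted in this essay:

```python
import numpy as np

def aicc_poly_degree(y, max_deg=10):
    """Choose a polynomial degree by the small-sample AIC (AICc)
    of Hurvich and Tsai, in the usual Gaussian-likelihood form."""
    n = len(y)
    x = np.linspace(-1.0, 1.0, n)        # scaled abscissa for conditioning
    scores = {}
    for d in range(1, max_deg + 1):
        resid = y - np.polyval(np.polyfit(x, y, d), x)
        rss = resid @ resid
        p = d + 2                        # d+1 coefficients + noise variance
        scores[d] = n * np.log(rss / n) + 2 * p + 2 * p * (p + 1) / (n - p - 1)
    return min(scores, key=scores.get), scores

# Synthetic test: a cubic signal under white noise.
rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 300)
y = 0.3 * x**3 - 0.2 * x + rng.normal(0, 0.1, 300)
best, _ = aicc_poly_degree(y)
print("AICc-selected degree:", best)
```

On clean synthetic data like this the criterion behaves well; on real climate series, as noted above, such criteria can prove less decisive.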
Corroborating curve fitting
Whatever the procedure, the would-be statistician is left with a degree of freedom in allocating a crucial parameter. Some years ago, however, I stumbled on the fact that a combination of cubic polynomial loess and a standard polynomial regression offers a unique choice of window width for the former and degree for the latter which gives the least disparity between the two generated curves. The one selects the other; the combination is self-selective. This idea seemed to work well on the HadCRUT4 data, and this serendipitous result is now found to apply to the RSS data. In fig 2 a (half) window width of 168 months for a cubic polynomial loess and a polynomial degree of 5 give the closest agreement to each other (shown as blue dashed lines with no attempt to distinguish between them).
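A minimal sketch of this self-selecting combination, with a simplified tricube-weighted local cubic fit standing in for a full loess (no robustness iterations), could look like the following; the function names and candidate half-widths are illustrative, not the exact implementation used for fig 2:

```python
import numpy as np

def local_cubic(y, half_width):
    """Loess-style smoother: tricube-weighted cubic fit around each
    point. Simplified - no robustness iterations, unlike a full loess."""
    n = len(y)
    x = np.arange(n, dtype=float)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        d = np.abs(x[lo:hi] - i) / half_width
        w = (1.0 - np.clip(d, 0.0, 1.0) ** 3) ** 3     # tricube weights
        c = np.polyfit(x[lo:hi] - i, y[lo:hi], 3, w=np.sqrt(w))
        out[i] = c[-1]                  # fitted value at the centre point
    return out

def self_selecting_pair(y, widths=(60, 96, 132, 168, 204), degree=5):
    """Pick the loess half-width whose curve lies closest (r.m.s.) to
    the fixed-degree polynomial regression - 'the one selects the other'."""
    xs = np.linspace(-1.0, 1.0, len(y))  # scaled abscissa for conditioning
    poly = np.polyval(np.polyfit(xs, y, degree), xs)
    best = min(widths,
               key=lambda h: np.mean((local_cubic(y, h) - poly) ** 2))
    return best, local_cubic(y, best), poly
```

In practice one would scan both the half-width and the polynomial degree and take the pair with the overall least disparity; the sketch fixes the degree at 5 for brevity.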
These very similar curves are perhaps the most likely deterministic estimates of the trend, but they cannot be the exact truth. The uncertainty is again due to the noise present in the data. Assuming however that they are ‘close enough’, what they have in common, if we disregard the discernible oscillation, is a depiction of a rising trend followed by a pause effectively starting around 2003 – and not 1997.
Alternative segmented linear regression
The shape of these curves provides also a motivation for a different idea: to apply a split or segmented regression. The idea is to run two regressions over all the data years but with a break point which offers the least discontinuity between the two segments.
The break point is found after a trial-and-error search to be September 2003. Monckton still gets his pause, but it is now reduced to the last 12 years. The first segment of the proposed regression in fig 2, from 1979 to 2003, finds a computable rate of 0.16 deg C/decade. There is a pause after that over which the trend is indeed flat. The trend does not literally switch in slope in the month of September 2003; the purpose is to provide a meaningful computable rate.
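The least-discontinuity criterion described above can be sketched as a direct search. The synthetic series below (a rise followed by a flat section, plus white noise) stands in for the RSS data; the break position and rates printed are illustrative only:

```python
import numpy as np

def two_segment_fit(y, t):
    """Separate OLS lines before and after break index t, plus the
    discontinuity (jump) between them at the break."""
    x = np.arange(len(y), dtype=float)
    c1 = np.polyfit(x[:t], y[:t], 1)
    c2 = np.polyfit(x[t:], y[t:], 1)
    jump = abs(np.polyval(c1, t) - np.polyval(c2, t))
    return c1, c2, jump

def find_break(y, margin=24):
    """Trial-and-error search for the break of least discontinuity."""
    return min(range(margin, len(y) - margin),
               key=lambda t: two_segment_fit(y, t)[2])

# Synthetic stand-in for the RSS series: a rise then a flat section.
rng = np.random.default_rng(3)
k = np.arange(440)
signal = np.where(k < 296, 0.0013 * k, 0.0013 * 296)
z = signal + rng.normal(0, 0.1, 440)

t = find_break(z)
c1, _, _ = two_segment_fit(z, t)
print("break at month %d; first-segment rate %.2f deg C/decade"
      % (t, c1[0] * 120))
```

The `margin` keeps the search away from the ends of the record, where a segment would be too short to regress sensibly.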
Fig 2 RSS monthly data Jan 1979 to September 2015. Dashed blue curves: cubic polynomial loess with 168 month half window width; polynomial regression with degree 5. Continuous red lines: segmented linear regression with break point September 2003.
However, each regression is seen by comparison with the loess and polynomial curves to be an acceptable approximation. The two segments are plausible averages over their respective ranges of data. The apparently contradictory or competing regressions in fig 1 are now explained by more than just positing average slopes of a non-linear trend: some information has been gleaned as to what that trend consists of.
Application to HadCRUT4 data
The RSS data tell us nothing about global trends before 1979 and one has to turn to the publicly available land- and sea-based surface measurements. The UK compilation HadCRUT4 goes back to 1850, but the two US series go back only to 1880. It is not my intention to try and assess the accuracy and reliability of any of these compilations. It is clearly a difficult exercise relying on measurements which were never intended for a systematic global experiment. Particular difficulties must be associated with sea temperature measurements, which historically were very crude indeed. The series is of course under continual review from both its compilers and from sceptical critics – which can only be a good thing.

Avoiding the very important issue of measurement error, what can be inferred about the trend in global temperature if we should decide to trust HadCRUT4? To repeat: in my previous submission to WUWT in September 2013 I used the self-checking combination of a high-degree polynomial fit and a cubic loess. But now let us try something simpler – a succession of split linear regressions. We will need more than one break year. The same criterion will be adopted: that there needs to be the least discontinuity between successive regressions. All the break years meeting this requirement have to be searched and discovered by trial and error.
The result of this exercise is shown in fig.3 on the annually updated time series.
Fig 3 HadCRUT4.4 annual boxed connected points to 2014. Discrete heavy spots are Met Office approved discrete decadal averages. Brown lines are sequential regression segments; arbitrary start from 1870; break-point years 1910, 1942, 1975, 2005. Estimated r.m.s. noise = 0.098 deg C. Red lines estimate the average trend; discovered break-point year 1941; post-war average trend 0.087 ± 0.012 (2 s.d.) deg C/decade from 1941 to 2014.
I start on the same year, 1870, as in my previous report to WUWT. We need four break years – splitting the trend estimate into five segments (see brown lines). It should be noted that these break years are discovered – not arbitrary choices. The heavy points also depicted are discrete decadal averages of temperature located in the middle of each decade – a simple statistic which the UK Met Office has long favoured and which was adopted for the first time by the IPCC in their AR5 report (see section 2.4.3 of AR5).
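Extending the same least-discontinuity criterion to several break years amounts to minimising the total jump of a piecewise-linear fit over candidate break combinations. A brute-force sketch, on a coarse search grid and with illustrative parameters (the demo uses one break; fig 3 used four):

```python
import numpy as np
from itertools import combinations

def total_jump(y, breaks):
    """Sum of discontinuities of a piecewise-linear OLS fit whose
    segments meet at the given break indices."""
    x = np.arange(len(y), dtype=float)
    edges = [0, *breaks, len(y)]
    jumps, prev_end = 0.0, None
    for lo, hi in zip(edges, edges[1:]):
        c = np.polyfit(x[lo:hi], y[lo:hi], 1)
        if prev_end is not None:
            jumps += abs(np.polyval(c, lo) - prev_end)
        prev_end = np.polyval(c, hi)
    return jumps

def find_breaks(y, n_breaks, step=5, margin=10):
    """Brute-force least-discontinuity search on a coarse grid."""
    cands = range(margin, len(y) - margin, step)
    return min(combinations(cands, n_breaks), key=lambda b: total_jump(y, b))

# Demo with one break on a noiseless kinked series.
k = np.arange(140.0)
y = np.where(k < 70, 0.01 * k, 0.7 - 0.004 * (k - 70))
print("discovered break(s):", find_breaks(y, 1))
```

With four breaks a full combinatorial search gets expensive, so in practice one would coarsen the grid or search the breaks sequentially.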
As can be seen, the proposed linear regressions are in excellent agreement with these averages. This agreement surely promotes confidence in both procedures. Comparison with my earlier presentation also shows good agreement with the optimally chosen cubic loess and polynomial regression. One can see a broad similarity with the RSS time series from the 80s onwards. The temperatures started rising from 1975 and no pause is found until a break year of 2005 (two years later than for the RSS data). With this latest version of HadCRUT4 (now issue 4.4) we now get a low warming rate (of about 0.01 deg C/decade) from 2005 (compare the flat response of the RSS data). I have not included the year 2015, which was not complete when running all these calculations.
One should emphasise that (i) these computed lines are probabilities not certainties; (ii) they are not meant to be taken literally but to be seen as approximants to some postulated smooth curve which is hidden from view and for which the loess and polynomial regressions may be better estimates.
The split regression segments graphically convey the impression that there were two long periods when temperatures were actually falling. Temperatures fell from at least 1870 to 1910, rose from 1910 to 1942, and fell again from 1942 to 1975. From 1975 to 2005 warming resumed at a probable rate of 0.20 deg C/decade. But the warming did not persist at this rate. It seems to me probable that a third such period has begun, in which there is now a pause (though with the revised HadCRUT4.4 it appears instead as a very slow warming).
This recent pause looks to be the continuation of an oscillation of global temperatures, with a period of slightly more than 60 years going right back through the record, imposed on a generally rising mean trend. I am not of course the first ‘sceptic’ to point this out.
I come to much the same conclusion as in my 2013 report. It seems that the much simpler sequential regressions are as convincing a way of specifying the trend in the data as my previous effort using polynomial regression and cubic loess.
What is the matter with the UK Met Office and the IPCC scientists?
In the summer of 2013 the UK Met Office, and the academics whom they support, called a press conference in London to concede (reluctantly) a pause or ‘hiatus’ in global temperatures and also to confess that they hadn’t a clue as to why it was happening. The rather critical BBC journalist David Shukman, who was present, noted that
….the scientists say .. pauses in warming were always to be expected. This is new – at least to me…I asked why this had not come up in earlier presentations. No one really had an answer, except to say that this “message” about pauses had not been communicated widely..
Indeed! The press conference coincided with reports by the Met Office (report 1, report 2, report 3) on the same theme. What the Met Office scientists did not discuss or even concede in that 3-part report is the presence of a substantial oscillation over the historical record. This oscillation surely cannot be attributed to an increasing concentration of atmospheric CO2, and it accounts for half the faster rate of warming in the 80s and 90s.
I find it troubling that presumably intelligent scientists (and they have competent statisticians also) cannot bring themselves to acknowledge – let alone explain or even properly discuss – the statistical fact that two extended cooling periods have featured in the past while CO2 levels were presumably always rising.
The reader will find the same statistical obfuscation in the two most recent reports (AR4 and AR5) released by the IPCC. A pause (or hiatus or standstill) is most unwelcome. Yet there is surely something to explain here for those who believe in the dominant anthropogenic effect on global warming. Since at least 1958 with the Keeling measurements (Mauna Loa etc) – and no doubt long before that – atmospheric CO2 levels have been rising monotonically (after seasonal averaging). It is hard to avoid the impression that there has been political pressure not to acknowledge the obvious: that an ever-rising concentration of atmospheric CO2 cannot be the only effect determining global surface temperature.
Trend v. average trend
In principle an oscillation does not have a trend. There is therefore a need to identify a mean trend which discounts that obvious oscillation. As suggested before, one can make the decomposition
trend in the data = mean trend in the data + quasi-periodic oscillation
How then to estimate this mean trend? My previous effort was perhaps too elaborate. The following may be more convincing. One can construct a split regression with just two segments (the two red lines in fig. 3). To my mind the lines steer a convincing middle course through the oscillating trend conveyed by the multiple split regressions. They may be about right. The break year of 1941 is again not an arbitrary choice: it has to be searched for in order to ensure the least discontinuity between the two regressions with this construction. This notional mean trend is being estimated by two average trends computed by linear regression between favourable years. The post-war average trend is found to be 0.087 ± 0.011 (2 s.d.) deg C/decade, i.e. less than 0.1 deg/decade, which is half the rate of the actual trend that peaked (temporarily) in the 80s and 90s. The error limits are computed after first estimating the standard deviation of the noise as 0.098 deg C.
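The quoted 2 s.d. error limits follow from the standard OLS slope-variance formula once the noise standard deviation has been estimated. A sketch with illustrative numbers (74 annual points, standing in for 1941–2014, and the 0.098 deg C noise level mentioned above; the slope of the synthetic series is an assumption, not the HadCRUT4 result):

```python
import numpy as np

def trend_with_2sd(y, sigma):
    """OLS slope with a 2-s.d. error bar, given an externally estimated
    noise standard deviation (treated here as white noise)."""
    x = np.arange(len(y), dtype=float)
    x -= x.mean()
    b = (x @ (y - y.mean())) / (x @ x)
    return b, 2.0 * sigma / np.sqrt(x @ x)   # var(b) = sigma^2 / sum(x^2)

# Illustrative: 74 annual points with the 0.098 deg C noise level
# estimated in the text; the underlying slope here is synthetic.
rng = np.random.default_rng(4)
y = 0.0087 * np.arange(74) + rng.normal(0, 0.098, 74)
b, err = trend_with_2sd(y, 0.098)
print("trend = %.4f +/- %.4f deg C/yr (2 s.d.)" % (b, err))
```

With 74 points and sigma = 0.098 the 2 s.d. bound works out at roughly 0.011 deg C/decade, consistent in magnitude with the error limits quoted above.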
It is extraordinary that in their various releases neither the UK Met Office nor the IPCC seem willing to confront these statistical facts in their own data. It is of course unwise to make a projection into the future, but if we trust neither the elaborate computer climate models favoured by the Met Office nor the projection of Mills-type all-stochastic models, this is all we have got. One can only note that over the 85 years from now to 2100 the projected increase could be around 0.0087 × 85 ≈ 0.74 degrees. Could this be realistic, and if so is that a cause for alarm? I only ask.