
By Walter Dnes – Edited by Just The Facts
Investopedia defines “Leading Indicator” thusly…
A measurable economic factor that changes before the economy starts to follow a particular pattern or trend. Leading indicators are used to predict changes in the economy, but are not always accurate.
Economics is not the only area where a leading indicator is nice to have. A leading indicator that could predict, in February, whether this calendar year’s temperature anomaly will be warmer or colder than the previous calendar year’s anomaly would also be nice to have. I believe that I’ve stumbled across exactly that. Using data from 1979 onwards, the rule goes like so…
- If this year’s January anomaly is warmer than last year’s January anomaly, then this year’s annual anomaly will likely be warmer than last year’s annual anomaly.
- If this year’s January anomaly is colder than last year’s January anomaly, then this year’s annual anomaly will likely be colder than last year’s annual anomaly.
This is a “qualitative” forecast. It doesn’t forecast a number, but rather a boundary, i.e. greater than or less than a specific number. I don’t have an explanation for why it works. Think of it as the climatological equivalent of “technical analysis”; i.e. event X is usually followed by event Y, leaving it to others to figure out the underlying “fundamentals”, i.e. the physical theory. I’ve named it the “January Leading Indicator”, abbreviated as “JLI” (which some people will probably pronounce as “July”). The JLI has been tested on the following six data sets: GISS, HadCRUT3, HadCRUT4, UAH5.6, RSS, and NOAA.
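For readers who think in code, the rule fits in a few lines. Here is a minimal sketch in Python (the function name and data layout are illustrative, not from any released tool); the sample values are GISS anomalies ×100 from the table further below.

```python
def jli_forecast(jan_current, jan_previous):
    """January Leading Indicator (JLI): a qualitative over/under call on
    whether this year's annual anomaly will beat last year's."""
    if jan_current > jan_previous:
        return "warmer"
    if jan_current < jan_previous:
        return "colder"
    return "no call"  # a January tie gives no signal

# GISS anomalies x100: January 1980 = 25, January 1979 = 10
print(jli_forecast(25, 10))  # "warmer" -- 1980's annual anomaly (23) did beat 1979's (12)
```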
In this post I will reference this zipped GISS monthly anomaly text file and this spreadsheet. Note that one of the tabs in the spreadsheet is labelled “documentation”. Please read that tab first if you download the spreadsheet and have any questions about it.
The claim of the JLI would arouse skepticism anywhere, and doubly so in a forum full of skeptics. So let’s first look at one data set, and count the hits and misses manually, to verify the algorithm. The GISS text file has to be reformatted before importing into a spreadsheet, but it is well suited to direct viewing by humans. The relevant data from the GISS text file is abstracted below.
Note: GISS numbers are the temperature anomaly, multiplied by 100, and shown as integers. Divide by 100 to get the actual anomaly. E.g. “43” represents an anomaly of 43/100=0.43 Celsius degrees. “7” represents an anomaly of 7/100=0.07 Celsius degrees.
- The first 2 columns on the left of the GISS text file are year and January anomaly * 100.
- The column after “Dec” (labelled “J-D”) is the January-December anomaly * 100.
The verification process is as follows:
- Count all the years where the current year’s January anomaly is warmer than the previous year’s January anomaly. Add a 1 in the Counter column for each such year.
- Of those years, count all where the current year’s annual anomaly is warmer than the previous year’s annual anomaly. Add a 1 in the Hit column for each such year.
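For anyone who would rather automate the tally than do it by hand, here is a sketch of the counting procedure in Python. It assumes the GISS rows have already been parsed into (year, January anomaly, annual anomaly) tuples in chronological order; the parsing itself is omitted since, as noted above, the text file needs reformatting first.

```python
def verify_jli(records, direction="warmer"):
    """Count JLI candidates and hits in one direction.

    records: chronological list of (year, jan_anomaly, annual_anomaly)
    tuples, e.g. GISS integer anomalies (x100).
    Returns (hits, counter).
    """
    counter = hits = 0
    for (_, jan_prev, ann_prev), (_, jan_curr, ann_curr) in zip(records, records[1:]):
        if direction == "warmer" and jan_curr > jan_prev:
            counter += 1
            hits += ann_curr > ann_prev
        elif direction == "colder" and jan_curr < jan_prev:
            counter += 1
            hits += ann_curr < ann_prev
    return hits, counter

# records = [(1979, 10, 12), (1980, 25, 23), (1981, 52, 28), ...]
# verify_jli(records, "warmer")  # should give (17, 20) for GISS 1979-2013
# verify_jli(records, "colder")  # should give (11, 14)
```

The table below performs the same count by hand for the “warmer” direction.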
| Year | Counter | Jan(current) > Jan(previous) | Hit | J-D(current) > J-D(previous) | Comment |
|------|---------|------------------------------|-----|------------------------------|---------|
| 1980 | 1 | 25 > 10 | 1 | 23 > 12 | |
| 1981 | 1 | 52 > 25 | 1 | 28 > 23 | |
| 1983 | 1 | 49 > 4 | 1 | 27 > 9 | |
| 1986 | 1 | 25 > 19 | 1 | 15 > 8 | |
| 1987 | 1 | 30 > 25 | 1 | 29 > 15 | |
| 1988 | 1 | 53 > 30 | 1 | 35 > 29 | |
| 1990 | 1 | 35 > 11 | 1 | 39 > 24 | |
| 1991 | 1 | 38 > 35 | 0 | 38 < 39 | Fail |
| 1992 | 1 | 42 > 38 | 0 | 19 < 38 | Fail |
| 1995 | 1 | 49 > 27 | 1 | 43 > 29 | |
| 1997 | 1 | 31 > 25 | 1 | 46 > 33 | |
| 1998 | 1 | 60 > 31 | 1 | 62 > 46 | |
| 2001 | 1 | 42 > 23 | 1 | 53 > 41 | |
| 2002 | 1 | 72 > 42 | 1 | 62 > 53 | |
| 2003 | 1 | 73 > 72 | 0 | 61 < 62 | Fail |
| 2005 | 1 | 69 > 57 | 1 | 66 > 52 | |
| 2007 | 1 | 94 > 53 | 1 | 63 > 60 | |
| 2009 | 1 | 57 > 23 | 1 | 60 > 49 | |
| 2010 | 1 | 66 > 57 | 1 | 67 > 60 | |
| 2013 | 1 | 63 > 39 | 1 | 61 > 58 | |
| Totals | 20 | | 17 | | |
Of 20 candidates flagged (Jan(current) > Jan(previous)), 17 are correct (i.e. J-D(current) > J-D(previous)). That’s 85% accuracy for the qualitative annual anomaly forecast on the GISS data set where the current January is warmer than the previous January.
And now for the years where January is colder than the previous January. The procedure is virtually identical, except that we count all where the current year’s annual anomaly is colder than the previous year’s annual anomaly, adding a 1 in the Hit column for each such year.
| Year | Counter | Jan(current) < Jan(previous) | Hit | J-D(current) < J-D(previous) | Comment |
|------|---------|------------------------------|-----|------------------------------|---------|
| 1982 | 1 | 4 < 52 | 1 | 9 < 28 | |
| 1984 | 1 | 26 < 49 | 1 | 12 < 27 | |
| 1985 | 1 | 19 < 26 | 1 | 8 < 12 | |
| 1989 | 1 | 11 < 53 | 1 | 24 < 35 | |
| 1993 | 1 | 34 < 42 | 0 | 21 > 19 | Fail |
| 1994 | 1 | 27 < 34 | 0 | 29 > 21 | Fail |
| 1996 | 1 | 25 < 49 | 1 | 33 < 43 | |
| 1999 | 1 | 48 < 60 | 1 | 41 < 62 | |
| 2000 | 1 | 23 < 48 | 1 | 41 < 41 | 0.406 < 0.407 |
| 2004 | 1 | 57 < 73 | 1 | 52 < 61 | |
| 2006 | 1 | 53 < 69 | 1 | 60 < 66 | |
| 2008 | 1 | 23 < 94 | 1 | 49 < 63 | |
| 2011 | 1 | 46 < 66 | 1 | 55 < 67 | |
| 2012 | 1 | 39 < 46 | 0 | 58 > 55 | Fail |
| Totals | 14 | | 11 | | |
Of 14 candidates flagged (Jan(current) < Jan(previous)), 11 are correct (i.e. J-D(current) < J-D(previous)). That’s 79% accuracy for the qualitative annual anomaly forecast on the GISS data set where the current January is colder than the previous January. Note that the 1999 annual anomaly is 0.407, and the 2000 annual anomaly is 0.406, when calculated to 3 decimal places. The GISS text file only shows 2 (implied) decimal places.
The scatter graph at the head of this article compares the January and annual GISS anomalies for visual reference.
Now for a verification comparison amongst the various data sets, from the spreadsheet referenced above. First, all years during the satellite era that were forecast to be warmer than the previous year:
| Data set | Had3 | Had4 | GISS | UAH5.6 | RSS | NOAA |
|----------|------|------|------|--------|-----|------|
| Ann > previous | 16 | 15 | 17 | 18 | 18 | 15 |
| Jan > previous | 19 | 18 | 20 | 21 | 20 | 18 |
| Accuracy | 0.84 | 0.83 | 0.85 | 0.86 | 0.90 | 0.83 |
Next, all years during the satellite era that were forecast to be colder than the previous year:
| Data set | Had3 | Had4 | GISS | UAH5.6 | RSS | NOAA |
|----------|------|------|------|--------|-----|------|
| Ann < previous | 11 | 11 | 11 | 11 | 11 | 11 |
| Jan < previous | 15 | 16 | 14 | 13 | 14 | 16 |
| Accuracy | 0.73 | 0.69 | 0.79 | 0.85 | 0.79 | 0.69 |
The following are scatter graphs comparing the January and annual anomalies for the other five data sets:
HadCRUT3 [scatter graph]
HadCRUT4 [scatter graph]
UAH 5.6 [scatter graph]
RSS [scatter graph]
NOAA [scatter graph]
The forecast methodology had problems during the Pinatubo years, 1991 and 1992. And 1993 also had problems, because the algorithm compares with the previous year, in this case Pinatubo-influenced 1992. The breakdowns were…
- For 1991, all 6 data sets were forecast to be above their 1990 values. The 2 satellite data sets (UAH and RSS) were above their 1990 values, but the 4 surface-based data sets were below.
- For 1992, the 4 surface-based data sets (HadCRUT3, HadCRUT4, GISS, and NCDC/NOAA) were forecast to be above their 1991 values, but were below.
- The 1993 forecast was a total bust. All 6 data sets were forecast to be below their 1992 values, but all finished the year above.
In summary, during the 3 years 1991/1992/1993, there were 6*3=18 over/under forecasts, of which 14 were wrong. In plain English, if a Pinatubo-like volcano dumps a lot of sulfur dioxide (SO2) into the stratosphere, the JLI will not be usable for the next 2 or 3 years:
“The most significant climate impacts from volcanic injections into the stratosphere come from the conversion of sulfur dioxide to sulfuric acid, which condenses rapidly in the stratosphere to form fine sulfate aerosols. The aerosols increase the reflection of radiation from the Sun back into space, cooling the Earth’s lower atmosphere or troposphere. Several eruptions during the past century have caused a decline in the average temperature at the Earth’s surface of up to half a degree (Fahrenheit scale) for periods of one to three years. The climactic eruption of Mount Pinatubo on June 15, 1991, was one of the largest eruptions of the twentieth century and injected a 20-million ton (metric scale) sulfur dioxide cloud into the stratosphere at an altitude of more than 20 miles. The Pinatubo cloud was the largest sulfur dioxide cloud ever observed in the stratosphere since the beginning of such observations by satellites in 1978. It caused what is believed to be the largest aerosol disturbance of the stratosphere in the twentieth century, though probably smaller than the disturbances from eruptions of Krakatau in 1883 and Tambora in 1815. Consequently, it was a standout in its climate impact and cooled the Earth’s surface for three years following the eruption, by as much as 1.3 degrees at the height of the impact.” USGS
For comparison, here are the scores with the Pinatubo-affected years (1991/1992/1993) removed. First, the years forecast to be warmer than the previous year:
| Data set | Had3 | Had4 | GISS | UAH5.6 | RSS | NOAA |
|----------|------|------|------|--------|-----|------|
| Ann > previous | 16 | 15 | 17 | 17 | 17 | 15 |
| Jan > previous | 17 | 16 | 18 | 20 | 19 | 16 |
| Accuracy | 0.94 | 0.94 | 0.94 | 0.85 | 0.89 | 0.94 |
And for years where the anomaly was forecast to be below the previous year:
| Data set | Had3 | Had4 | GISS | UAH5.6 | RSS | NOAA |
|----------|------|------|------|--------|-----|------|
| Ann < previous | 11 | 11 | 11 | 10 | 10 | 11 |
| Jan < previous | 14 | 15 | 13 | 11 | 12 | 15 |
| Accuracy | 0.79 | 0.73 | 0.85 | 0.91 | 0.83 | 0.73 |
Given the existence of January and annual data values, it’s possible to do linear regressions and even quantitative forecasts for the current calendar year’s annual anomaly. With the slope and y-intercept available, one merely has to wait for the January data to arrive in February and run the basic “y = mx + b” equation. The correlation is approximately 0.79 for the surface data sets, and 0.87 for the satellite data sets, after excluding the Pinatubo-affected years (1991 and 1992).
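Here is a sketch of that quantitative step in Python, using numpy’s polyfit for the least-squares fit; the array names are placeholders for the paired January and annual anomalies of one data set, with the Pinatubo-affected years dropped.

```python
import numpy as np

def fit_jli_regression(jan, annual):
    """Least-squares fit of annual anomaly on January anomaly: y = m*x + b."""
    m, b = np.polyfit(jan, annual, 1)
    r = np.corrcoef(jan, annual)[0, 1]  # correlation, for comparison with the text
    return m, b, r

# jan, annual: paired anomalies for one data set, Pinatubo years excluded
# m, b, r = fit_jli_regression(jan, annual)
# Once January data arrives in February:
# forecast_annual = m * january_anomaly + b
```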
There will probably be a follow-up article a month from now, when all the January data is in, and forecasts can be made using the JLI. Note that data downloaded in February will be used. NOAA and GISS use a missing-data algorithm which results in minor changes for most monthly anomalies, every month, all the way back to day 1, i.e. January 1880. The monthly changes are generally small, but in borderline cases, the changes may affect rankings and over/under comparisons.
The discovery of the JLI was a fluke based on a hunch. One can only wonder what other connections could be discovered with serious “data-mining” efforts.
walterdnes says: February 1, 2014 at 6:55 pm
Here’s my competition…
Barely, i.e. “Met Office global forecasts too warm in 13 of last 14 years”:
http://www.bbc.co.uk/blogs/paulhudson/posts/Met-Office-global-forecasts-too-warm-in-13-of-last-14-years
Out of sheer curiosity, is anybody else out there making forecasts about the 2014 annual temperature anomaly?
walterdnes;
Your competition has nothing to do with it. You tested to see if the data in a given set follows the same general trend as does a subset of that same data. It does. If it didn’t that would be significant.
Not on Intrade. It worked like a futures market. Betting was open on the annual anomaly until the day the contract expired; IOW, until the year ended. The price (odds) offered or bid upon moved up and down to take into account traders’ estimates of how much what had gone before was likely to influence the final outcome.
Even in Vegas, one can bet on ongoing major sports events like the super bowl or world series. The odds adjust to take into account the score so far.
How is the JLI for THIS January shaping up?
walterdnes says: February 1, 2014 at 7:04 pm
Out of sheer curiosity, is anybody else out there making forecasts about the 2014 annual temperature anomaly?
It’s equivocating, but here is the Hansen et al., 2014 prediction:
“So what are the near-term prospects? El Niño depends on fickle wind anomalies for initiation, so predictions are inherently difficult, but conditions are ripe for El Niño initiation in 2014. About half of the climate models catalogued by the International Research Institute predict that the next El Niño will begin by summer 2014, with the other half predicting ENSO neutral conditions [21]. The mean NCEP forecast [21] issued 13 January has an El Niño beginning in the summer of 2014, although a significant minority of the ensemble members predicts ENSO neutral conditions for 2014.
The strength of an El Niño, too, depends on the fickle wind anomalies at the time of initiation. We speculated [22] that the likelihood of “super El Niños”, such as those in 1982–3 and 1997–8, has increased. Our rationale was that global warming increased SSTs in the Western Pacific, without yet having much effect on the temperature of upwelling deep water in the Eastern Pacific (Fig. 2 above), thus allowing the possibility of a larger swing of Eastern Pacific temperature. Recent paleoclimate [23] and modeling [24] studies find evidence for an increased frequency of extreme El Niños with global warming.
Assuming that an El Niño begins in summer 2014, 2014 is likely to be warmer than 2013 and perhaps the warmest year in the instrumental record. However, given the lag between El Niño initiation and global temperature, 2015 is likely to have a temperature even higher than in 2014.”
http://www.columbia.edu/~jeh1/mailings/2014/20140121_Temperature2013.pdf
davidmhoffer says:
> February 1, 2014 at 7:09 pm
> Your competition has nothing to do with it. You tested to see
> if the data in a given set follows the same general trend as
> does a subset of that same data. It does. If it didn’t that
> would be significant.
The point of this article was to show a useful forecast tool. Yes, it looks obvious now. How many people were using this method in the past?
I like this work! I like it because it has features of a scientific study that are missing from the studies of global warming that are referenced by the IPCC in its assessment reports. There are events (with durations of 1 calendar year each). Each event has an outcome (whether or not the current year’s annual anomaly exceeds the previous year’s annual anomaly). Each event has a condition (whether the current year’s January anomaly exceeds the previous year’s January anomaly). Observed events in which the annual anomaly exceeds the previous year’s anomaly have the count that statisticians call the “frequency.” Observed events in which the January anomaly exceeds the previous year’s anomaly have a frequency. The ratio of the two frequencies is an example of the idea that statisticians call a “relative frequency.” A relative frequency is the empirical counterpart of a probability. Probabilities are an essential component of logic.
There are the makings here for a scientific theory. Steps along the path toward such a theory would include adapting the model to predict the relative frequencies of the outcomes of the future and the uncertainties in these relative frequencies. Going forward, the question should be asked of whether the predicted relative frequencies and uncertainties are a match for the observed relative frequencies. If they are a match, the model is validated. Otherwise, it is falsified.
Also, the list of independent variables of the model should be expanded beyond the amount of the January anomaly and the question should be asked of whether a condition other than the one assumed would provide more information about the outcome. Among the independent variables considered for inclusion should be the CO2 concentration.
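To make the relative-frequency idea concrete, here is an illustrative sketch (not from the comment itself): the 17-of-20 GISS result treated as a binomial proportion, with a Wilson score interval as one simple way to express the uncertainty.

```python
from math import sqrt

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (default ~95%)."""
    p = hits / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# GISS "warmer" direction: 17 hits out of 20 candidates
print(wilson_interval(17, 20))  # roughly (0.64, 0.95) around the 0.85 relative frequency
```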
walterdnes;
The point of this article was to show a useful forecast tool.
>>>>>>>>>>>
But it isn’t. All it shows is that warm years are comprised of warm months. What else would a warm year be comprised of?
Curious to know if there is any correlation between December of the previous year and the anomaly of the next 12 months… especially since we already have that for this year and won’t get January for a couple more weeks.
walterdnes
Also, for reference:
“Physical barriers to prediction
Regardless of what type of ENSO forecast model one uses, forecasting ENSO is considerably more difficult during certain seasons of the year than others. Individual El Niño or La Niña episodes tend to develop between the months of April and June, and, once developed, last until the following February through May. Thus, once an episode has developed in early northern summer, forecasting its evolution through the remainder of its life cycle is not difficult. A much harder task is to forecast what will happen between March and June, when a forecast is being made in the preceding January through April. The difficulty in forecasting at this time of year is often called the “spring barrier” (in the Northern Hemisphere), or the “autumn barrier” (in the Southern Hemisphere).
After April has finished, while there still is uncertainty, it starts becoming easier to see in the latest observations how the stage is being set for the remainder of the calendar year and the first few months of the following year. By June, the uncertainty becomes still less: if there is nothing new developing, the chances of new development are small. While ENSO forecasting is most difficult through the late northern spring, the spring barrier is not impenetrable. Signs of changes in the ENSO state, such as increased heat content in the western equatorial Pacific Ocean, are available, so that at least a probability forecast can be made through the spring barrier. As April, May and June come along, such probabilities normally become more robust.”
http://iri.columbia.edu/climate/ENSO/background/prediction.html#barrier
I don’t understand. I thought this was meant to be humorous, yet from the comments it appears all are taking it seriously.
joshuah says:
> February 1, 2014 at 7:40 pm
> Curious to know if any correlation between December of
> previous year and the anomaly of the next 12 months…
> especially since we already have that for this year and
> won’t get January for a couple more weeks
Good question. Can’t do that quickly. Fortunately, my spreadsheet calculates annual anomalies on-the-fly. So I was able to do a quick-n-dirty hack a couple of minutes ago. I…
* created a copy of the spreadsheet
* copied the monthly data to a blank area (as values, not pointers)
* copied the values back to the data area, offset by 1 month
This hack pushes December data into January, etc., and effectively compares December anomalies against the 12-month period December to next November. The accuracy fell off approx 10% for most of the years where a warmer anomaly was forecast. It fell off 10% to almost 30% (averaging approx 20%) for years forecast to be cooler. If it had compared to the 12-month period starting in January (i.e. one month later), the accuracy probably would’ve fallen off even more. This could be the basis of a follow-up article, i.e. which month is the best/worst leading indicator for the 12-month mean starting with itself.
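The same experiment can be sketched in Python without a spreadsheet, assuming a flat chronological list of monthly anomalies that begins in January of the first year; note that this version pools the warmer and colder calls into a single hit rate, unlike the separate tallies in the post.

```python
def indicator_skill(monthly, start_offset=0):
    """Hit rate when the month at start_offset (0 = January) is used as a
    leading indicator for the 12-month mean beginning with that month."""
    # Slice the record into consecutive 12-month "years" starting at the offset.
    pairs = []  # (indicator month, 12-month mean) per shifted year
    i = start_offset
    while i + 12 <= len(monthly):
        window = monthly[i:i + 12]
        pairs.append((window[0], sum(window) / 12.0))
        i += 12
    hits = total = 0
    for (m0, a0), (m1, a1) in zip(pairs, pairs[1:]):
        if m1 != m0:  # an indicator tie gives no signal
            total += 1
            hits += (m1 > m0) == (a1 > a0)
    return hits / total if total else float("nan")

# start_offset=0 is the JLI proper; start_offset=11 starts each window in
# December, mimicking the "push December into January" hack described above.
```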
Gary Pearse says:
February 1, 2014 at 5:38 pm
Also, note the unfudged satellite data gave the highest correlation. This might be a “test” of the data sets.
Gary,
That is what struck me, as I looked at the comparative data plots and correlations.
Mac
Walter
Thought provoking.
That looks like further evidence of Hurst-Kolmogorov dynamics (aka “climate persistence”); see especially Demetris Koutsoyiannis, e.g. Climatic variability over time scales spanning nine orders of magnitude: Connecting Milankovitch cycles with Hurst-Kolmogorov dynamics.
For a possible cause, may I recommend David Stockwell’s Solar Accumulative theory, e.g.
Key evidence for the accumulative model of high solar influence on global temperature
Regards
David
A warmer January alone should not have such a disproportionate effect on the average for the rest of the year, unless there is a hidden linkage to something in the climate system.
You can do similar analyses for predicting solar cycle maximum amplitude based on the first one, two or three years of any particular cycle.
A variety of rules of thumb emerge which can be used to predict, correctly or otherwise, how a particular cycle will evolve. Broadly, they involve the SSN for Year 1 of a new cycle being less than something or greater than something. Less than a threshold implies a high likelihood of a weaker cycle, greater than a certain threshold is indicative of a strong cycle.
Compare the Global Warming Prediction Project
davidmhoffer says:
February 1, 2014 at 6:20 pm
I agree. I don’t find this result to be anything other than expected. Since your “leading indicator” is included in the data you are trying to predict, of course it will be correlated.
As to the question you raise above about different months, David, there’s not a whole lot of difference. When I do the analysis on the whole 132 years of the GISS LOTI dataset, I get the following results:
Jan, 0.70
Feb, 0.62
Mar, 0.57
Apr, 0.61
May, 0.56
Jun, 0.58
Jul, 0.55
Aug, 0.65
Sep, 0.55
Oct, 0.59
Nov, 0.70
Dec, 0.67
Average, 0.61
No obvious pattern, nobody really shines.
Finally, the author hasn’t adjusted for the fact that the data has a trend … and that means that, on average, both the January-to-January and the year-to-year differences will be positive.
I may run a Monte Carlo analysis on the data to confirm what it looks like, but as far as I’m concerned, and with my apologies to the author, this is a non-event. This is what you’d expect.
w.
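The Monte Carlo null test mentioned above is easy to sketch (illustrative only, not Willis’s actual analysis): synthesize records that share a warming trend but have no month-to-month persistence, and see what hit rate the January rule earns from the trend alone. The trend and noise scales below are rough, satellite-era-like guesses.

```python
import random

def jli_hit_rate(jan, annual):
    """Fraction of January over/under calls that the annual value confirms."""
    hits = total = 0
    for i in range(1, len(jan)):
        if jan[i] != jan[i - 1]:
            total += 1
            hits += (jan[i] > jan[i - 1]) == (annual[i] > annual[i - 1])
    return hits / total

def null_trial(n_years=35, trend=0.017, noise=0.10):
    """One synthetic record: a linear warming trend plus independent monthly
    noise (no persistence at all); parameter values are illustrative."""
    jan, annual = [], []
    for y in range(n_years):
        months = [trend * y + random.gauss(0.0, noise) for _ in range(12)]
        jan.append(months[0])
        annual.append(sum(months) / 12.0)
    return jli_hit_rate(jan, annual)

random.seed(42)
rates = [null_trial() for _ in range(1000)]
print(sum(rates) / len(rates))  # baseline hit rate from trend alone, no persistence
```

Comparing that baseline against the 0.7-0.9 scores in the post would show how much, if anything, the JLI adds beyond the shared trend.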
It seems to me this would hold true with a random data set.
Walter,
The heading of your first data table has the ” >” sign switched to a “<" sign, unless I am misreading something.
David L. Hagen says:
February 1, 2014 at 8:21 pm
> Compare the Global Warming Prediction Project
> This project is initiated, run, and maintained by
> KnowledgeMiner Software, a research, consulting and
> software development company in the field of high-end
> predictive modeling.
That’s what I meant by “data mining”.
Thanks Willis.
Walter, allow me another analogy. Walk up a hill and down the other side, measuring your altitude at each step. Suppose the whole trip is 10,000 steps. Break your trip into 100 step increments. Now, compare the first step from each group of 100 to the average altitude of the entire trip. Classify each step as either higher than average, or lower than average.
Is the first step a good leading indicator of the next 99 steps being higher or lower than average? Of course it is. You should get a correlation very close to 1. In other words, information that is completely accurate and entirely useless.
Knowing that steps at higher than average altitude are very likely to be followed by more steps that are higher than average in altitude tells you absolutely zero in regard to any given step being uphill or downhill, which would be a coin flip.
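That analogy is easy to check with a short simulation (a sketch of the thought experiment, not anything from the original comment):

```python
import random

random.seed(1)

# Walk up a hill and down the other side: 10,000 steps with noisy footing.
altitude, h = [], 0.0
for step in range(10000):
    h += 1.0 if step < 5000 else -1.0  # up for half the trip, down for the rest
    h += random.uniform(-0.5, 0.5)     # per-step noise
    altitude.append(h)

trip_mean = sum(altitude) / len(altitude)
groups = [altitude[i:i + 100] for i in range(0, 10000, 100)]

# Does the first step of each 100-step group land on the same side of the
# trip-wide average as the group's own average altitude? Almost always.
agree = sum((g[0] > trip_mean) == (sum(g) / 100 > trip_mean) for g in groups)
print(f"{agree} of {len(groups)} groups agree with their first step")
```

High agreement here says nothing about whether any individual step goes uphill or downhill, which is the point.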
Fred Souder says:
> February 1, 2014 at 8:45 pm
> Walter,
> The heading of your first data table has the ” >” sign
> switched to a “<" sign, unless I am misreading something.
I think you're right. "Just The Facts", can you correct that? I don't have authorization to edit this blog.
davidmhoffer says:
February 1, 2014 at 7:30 pm
All it shows is that warm years are comprised of warm months. What else would a warm year be comprised of?
However, it also gives other information. For example, what would the ranking be for 2014 on HadCRUT4 if the January anomaly is 0.4 or 0.5?
With this tool, I can say with 75% certainty that if the anomaly is 0.4, then the ranking will be larger than 8th. But if it is 0.5, we can be 75% certain it will be less than 8th. Do you think the Met Office would be this close if they had the January numbers? Now I know they do not set the bar too high! ☺