The January Leading Indicator

GISS Data – Image Credit: Walter Dnes

By Walter Dnes – Edited by Just The Facts

Investopedia defines “Leading Indicator” as follows:

A measurable economic factor that changes before the economy starts to follow a particular pattern or trend. Leading indicators are used to predict changes in the economy, but are not always accurate.

Economics is not the only area where a leading indicator is nice to have. A leading indicator that could predict in February whether this calendar year’s temperature anomaly will be warmer or colder than the previous calendar year’s anomaly would also be nice to have. I believe that I’ve stumbled across exactly that. Using data from 1979 onwards, the rule is as follows:

  1. If this year’s January anomaly is warmer than last year’s January anomaly, then this year’s annual anomaly will likely be warmer than last year’s annual anomaly.
  2. If this year’s January anomaly is colder than last year’s January anomaly, then this year’s annual anomaly will likely be colder than last year’s annual anomaly.

This is a “qualitative” forecast. It doesn’t forecast a number, but rather a boundary, i.e. greater than or less than a specific number. I don’t have an explanation for why it works. Think of it as the climatological equivalent of “technical analysis”: event X is usually followed by event Y, leaving it to others to figure out the underlying “fundamentals”, i.e. the physical theory. I’ve named it the “January Leading Indicator”, abbreviated “JLI” (which some people will probably pronounce as “July”). The JLI has been tested on the following 6 data sets: GISS, HadCRUT3, HadCRUT4, UAH5.6, RSS, and NOAA.

In this post I will reference this zipped GISS monthly anomaly text file and this spreadsheet. Note that one of the tabs in the spreadsheet is labelled “documentation”. Please read that tab first if you download the spreadsheet and have any questions about it.

The claim of the JLI would arouse skepticism anywhere, and doubly so in a forum full of skeptics. So let’s first look at one data set and count the hits and misses manually, to verify the algorithm. The GISS text file has to be reformatted before being imported into a spreadsheet, but it is well suited to direct viewing by humans. The relevant data from the GISS text file are summarized below.

Note: GISS numbers are the temperature anomaly, multiplied by 100 and shown as integers. Divide by 100 to get the actual anomaly, e.g. “43” represents an anomaly of 43/100 = 0.43°C, and “7” represents an anomaly of 7/100 = 0.07°C.

  • The first 2 columns on the left of the GISS text file are the year and the January anomaly × 100.
  • The column after “Dec” (labelled “J-D”) is the January–December (annual) anomaly × 100.
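Since the file is plain, regularly laid-out text, extracting the two needed fields is easy to script. Below is a minimal parsing sketch in Python, based only on the layout described above; the exact field positions and the handling of header lines are assumptions about the file, not guaranteed.

```python
# Minimal sketch of a parser for the GISS monthly anomaly text file.
# Assumes whitespace-separated rows starting with a 4-digit year, the
# January anomaly (x100) in the second field, and the J-D (annual)
# anomaly (x100) in the field right after December, i.e. field 14.
def parse_giss(path):
    records = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 14 or not fields[0].isdigit():
                continue  # header, blank, or footer line
            try:
                year = int(fields[0])
                jan = int(fields[1]) / 100.0    # January anomaly, deg C
                ann = int(fields[13]) / 100.0   # J-D annual anomaly, deg C
            except ValueError:
                continue  # row with non-numeric (missing) entries
            records[year] = (jan, ann)
    return records
```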

The verification process is as follows:

  • Count all the years where the current year’s January anomaly is warmer than the previous year’s January anomaly, and add a 1 in the Counter column for each such year.
  • Among those years, count the ones where the annual (J-D) anomaly is also warmer than the previous year’s annual anomaly, and add a 1 in the Hit column for each.
Year   Counter   Jan(current) > Jan(previous)   Hit   J-D(current) > J-D(previous)   Comment
1980      1              25 > 10                 1             23 > 12
1981      1              52 > 25                 1             28 > 23
1983      1              49 >  4                 1             27 >  9
1986      1              25 > 19                 1             15 >  8
1987      1              30 > 25                 1             29 > 15
1988      1              53 > 30                 1             35 > 29
1990      1              35 > 11                 1             39 > 24
1991      1              38 > 35                 0             38 < 39               Fail
1992      1              42 > 38                 0             19 < 38               Fail
1995      1              49 > 27                 1             43 > 29
1997      1              31 > 25                 1             46 > 33
1998      1              60 > 31                 1             62 > 46
2001      1              42 > 23                 1             53 > 41
2002      1              72 > 42                 1             62 > 53
2003      1              73 > 72                 0             61 < 62               Fail
2005      1              69 > 57                 1             66 > 52
2007      1              94 > 53                 1             63 > 60
2009      1              57 > 23                 1             60 > 49
2010      1              66 > 57                 1             67 > 60
2013      1              63 > 39                 1             61 > 58
Predicted: 20 > previous year                  Actual: 17 > previous year

Of the 20 candidates flagged (Jan(current) > Jan(previous)), 17 are correct (i.e. J-D(current) > J-D(previous)). That’s 85% accuracy for the qualitative annual anomaly forecast on the GISS data set when the current January is warmer than the previous January.
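For readers who would rather script the tally than count by hand, here is a minimal sketch of the same counting rule, written against the hypothetical `parse_giss` output from the earlier sketch. The `warmer` flag also covers the colder-January table that follows.

```python
# Sketch of the hit/miss tally from the tables: a year is a "candidate"
# when its January beats (or trails) the previous January, and a "hit"
# when its annual J-D anomaly does the same relative to the previous year.
def jli_score(records, warmer=True):
    candidates, hits = 0, 0
    for year in sorted(records):
        if (year - 1) not in records:
            continue
        jan, ann = records[year]
        prev_jan, prev_ann = records[year - 1]
        jan_matches = jan > prev_jan if warmer else jan < prev_jan
        ann_matches = ann > prev_ann if warmer else ann < prev_ann
        if jan_matches:
            candidates += 1
            if ann_matches:
                hits += 1
    return candidates, hits

# On GISS data restricted to 1979-2013, this should reproduce the
# counts above: jli_score(recs, warmer=True) -> (20, 17), i.e. 85%.
```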

And now for the years where January is colder than the previous January. The procedure is virtually identical, except that we count the years where the annual anomaly is colder than the previous year’s annual anomaly, adding a 1 in the Hit column for each such year.

Year   Counter   Jan(current) < Jan(previous)   Hit   J-D(current) < J-D(previous)   Comment
1982      1               4 < 52                 1              9 < 28
1984      1              26 < 49                 1             12 < 27
1985      1              19 < 26                 1              8 < 12
1989      1              11 < 53                 1             24 < 35
1993      1              34 < 42                 0             21 > 19               Fail
1994      1              27 < 34                 0             29 > 21               Fail
1996      1              25 < 49                 1             33 < 43
1999      1              48 < 60                 1             41 < 62
2000      1              23 < 48                 1             41 < 41               0.406 < 0.407
2004      1              57 < 73                 1             52 < 61
2006      1              53 < 69                 1             60 < 66
2008      1              23 < 94                 1             49 < 63
2011      1              46 < 66                 1             55 < 67
2012      1              39 < 46                 0             58 > 55               Fail
Predicted: 14 < previous year                  Actual: 11 < previous year

Of the 14 candidates flagged (Jan(current) < Jan(previous)), 11 are correct (i.e. J-D(current) < J-D(previous)). That’s 79% accuracy for the qualitative annual anomaly forecast on the GISS data set when the current January is colder than the previous January. Note that, calculated to 3 decimal places, the 1999 annual anomaly is 0.407 and the 2000 annual anomaly is 0.406; the GISS text file shows only 2 (implied) decimal places, which is why 2000 appears as “41 < 41” in the table yet still counts as a hit.

The scatter graph at the head of this article compares the January and annual GISS anomalies for visual reference.

Now for a verification comparison among the various data sets, from the spreadsheet referenced above. First, all years during the satellite era that were forecast to be warmer than the previous year:

Data set                      Had3    Had4    GISS    UAH5.6    RSS    NOAA
Ann > previous (hits)          16      15      17       18       18     15
Jan > previous (candidates)    19      18      20       21       20     18
Accuracy (hits/candidates)    0.84    0.83    0.85     0.86     0.90   0.83

Next, all years during the satellite era that were forecast to be colder than the previous year:

Data set                      Had3    Had4    GISS    UAH5.6    RSS    NOAA
Ann < previous (hits)          11      11      11       11       11     11
Jan < previous (candidates)    15      16      14       13       14     16
Accuracy (hits/candidates)    0.73    0.69    0.79     0.85     0.79   0.69

The following are scatter graphs comparing the January and annual anomalies for the other 5 data sets:

HadCRUT3

HadCRUT3 Data – Walter Dnes

HadCRUT4

HadCRUT4 Data – Walter Dnes

UAH 5.6

UAH 5.6 Data – Walter Dnes

RSS

RSS Data – Walter Dnes

NOAA

NOAA Data – Walter Dnes

The forecast methodology had problems during the Pinatubo years, 1991 and 1992. It also had problems in 1993, because the algorithm compares against the previous year, in this case the Pinatubo-influenced 1992. The breakdowns were as follows:

  • For 1991, all 6 data sets were forecast to be above their 1990 values. The 2 satellite data sets (UAH and RSS) finished above their 1990 values, but the 4 surface-based data sets finished below.
  • For 1992, the 4 surface-based data sets (HadCRUT3, HadCRUT4, GISS, and NCDC/NOAA) were forecast to be above their 1991 values, but finished below.
  • The 1993 forecast was a total bust: all 6 data sets were forecast to be below their 1992 values, but all finished the year above.

In summary, during the 3 years 1991/1992/1993 there were 6 × 3 = 18 over/under forecasts, of which 14 were wrong. In plain English: if a Pinatubo-like volcano dumps a lot of sulfur dioxide (SO2) into the stratosphere, the JLI will not be usable for the next 2 or 3 years. As the USGS explains:

“The most significant climate impacts from volcanic injections into the stratosphere come from the conversion of sulfur dioxide to sulfuric acid, which condenses rapidly in the stratosphere to form fine sulfate aerosols. The aerosols increase the reflection of radiation from the Sun back into space, cooling the Earth’s lower atmosphere or troposphere. Several eruptions during the past century have caused a decline in the average temperature at the Earth’s surface of up to half a degree (Fahrenheit scale) for periods of one to three years. The climactic eruption of Mount Pinatubo on June 15, 1991, was one of the largest eruptions of the twentieth century and injected a 20-million ton (metric scale) sulfur dioxide cloud into the stratosphere at an altitude of more than 20 miles. The Pinatubo cloud was the largest sulfur dioxide cloud ever observed in the stratosphere since the beginning of such observations by satellites in 1978. It caused what is believed to be the largest aerosol disturbance of the stratosphere in the twentieth century, though probably smaller than the disturbances from eruptions of Krakatau in 1883 and Tambora in 1815. Consequently, it was a standout in its climate impact and cooled the Earth’s surface for three years following the eruption, by as much as 1.3 degrees at the height of the impact.” USGS

For comparison, here are the scores with the Pinatubo-affected years (1991/1992/1993) removed. First, the years forecast to be warmer than the previous year:

Data set                      Had3    Had4    GISS    UAH5.6    RSS    NOAA
Ann > previous (hits)          16      15      17       17       17     15
Jan > previous (candidates)    17      16      18       20       19     16
Accuracy (hits/candidates)    0.94    0.94    0.94     0.85     0.89   0.94

And for the years where the anomaly was forecast to be below the previous year’s:

Data set                      Had3    Had4    GISS    UAH5.6    RSS    NOAA
Ann < previous (hits)          11      11      11       10       10     11
Jan < previous (candidates)    14      15      13       11       12     15
Accuracy (hits/candidates)    0.79    0.73    0.85     0.91     0.83   0.73

Given the existence of January and annual data values, it’s possible to do linear regressions and even quantitative forecasts for the current calendar year’s annual anomaly. With the slope and y-intercept available, one merely has to wait for the January data to arrive in February and run the basic “y = mx + b” equation. The correlation is approximately 0.79 for the surface data sets, and 0.87 for the satellite data sets, after excluding the Pinatubo-affected years (1991 and 1992).
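As a sketch of that calculation (assuming NumPy and the same hypothetical year-to-(January, annual) mapping used in the earlier sketches):

```python
# Sketch of the quantitative forecast: fit annual = m * january + b over
# past years, then plug the newly published January anomaly into y = mx + b.
import numpy as np

def jli_regression(records, skip_years=(1991, 1992)):
    years = [y for y in sorted(records) if y not in skip_years]
    jan = np.array([records[y][0] for y in years])
    ann = np.array([records[y][1] for y in years])
    m, b = np.polyfit(jan, ann, 1)       # slope and y-intercept
    r = np.corrcoef(jan, ann)[0, 1]      # January/annual correlation
    return m, b, r

# Once the January number arrives in February:
# m, b, r = jli_regression(recs)
# forecast_annual = m * new_january + b
```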

There will probably be a follow-up article a month from now, when all the January data are in and forecasts can be made using the JLI. Note that data downloaded in February will be used: NOAA and GISS use a missing-data algorithm that results in minor changes to most monthly anomalies, every month, all the way back to day 1, i.e. January 1880. The monthly changes are generally small, but in borderline cases they may affect rankings and over/under comparisons.

The discovery of the JLI was a fluke based on a hunch. One can only wonder what other connections could be discovered with serious “data-mining” efforts.



165 Comments
Bernie Hutchins
February 5, 2014 12:18 am

Dr. Strangelove said on February 4, 2014 at 10:29 pm
“Bernie
The concepts are simple enough to understand without computer programming, at least for those familiar with probability theory. If not, a google search on statistics would be more informative.”
Wow – that was dismissive! I am familiar enough with probability theory. What I do not understand is the “nuts and bolts” of what you claim to be doing in your post of 5:46pm. It is vague and makes no sense – what you did, or why you are even doing certain things. If it’s not BS on your part, then post some code. Or are you just hand-waving?

Editor
February 5, 2014 3:44 pm

walterdnes says:
February 4, 2014 at 1:30 am

Willis, does the “skill level” become significant if you use GISS data 1955 to 2013, especially with 1991-1994 removed? See below for the long explanation.

Sorry, I’ve said this before, but it bears repeating. You don’t get to pick and choose what data to use based on whether it fits your theory.
More to the point … who cares? To give a real-world example, the farmers around where I live like hot summers for the grapes. So I took a look at your JLI for my local weather station, Santa Rosa. I used all of the months, not just January, as the leading indicator for that month plus the next 11 months.
Just like you said, it works a treat: it gives me a 59% success rate. So I set myself up as Nostradamus of the North, the Weather Prognosticator.
So now, when the January is warmer than last year, the good farmers around here come to me and I’ll tell them “Yep, Walter’s indicator says it will be warmer”. And they all go away satisfied, because they can now plan for the future … except for one ornery old geezer who comes back and says, “Hang on … how much warmer than last year will it be?”
So I go back to my data, I average out all of the results, and I tell him “Walter’s method says it will be a bit more than a tenth of a degree warmer than last year” … he considers that a moment, then asks for the standard deviation of the results … I go back and calculate that one … “Plus or minus half a degree”, I tell him.
And the farmer says “You’re telling me that this year will be a tenth of a degree warmer than last year, plus or minus half a degree? Have you lost your mind? What do I care about a tenth of a degree, particularly with that wide an error in the results?”
I’m sure you can see the moral of the story. It’s a difference that doesn’t make a difference, and things are even worse (much smaller values) at a global level. In fact, using January alone for all of the GISS LOTI data, yes, there is a real result (Average of positive = 0.05, average of negative = -0.05), but the standard deviation is twice that value (0.10).
Finally, upstream someone commented:

Whether or not the JLI has skill, it appears to provide us with an essential ingredient for policy making. This ingredient is information about the outcomes from policy decisions. None of the climate models referenced by AR4 provide us with information.

Using the GISS LOTI data, we can say that if this January is warmer than last January, this year will be a heart-stopping 0.05°C warmer than last year ON AVERAGE, with a 95% confidence interval from -0.15°C to 0.25°C.
Anyone who thinks that a projected possible warming of five hundredths of a degree is an “essential ingredient for policy making” hasn’t thought this all the way through.
w.
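The interval quoted in the comment above is the standard normal approximation. A minimal sketch of the arithmetic, using the mean (0.05°C) and standard deviation (0.10°C) stated in the comment:

```python
# ~95% interval from the comment's numbers: mean +/- 1.96 standard deviations.
mean, sd = 0.05, 0.10
low, high = mean - 1.96 * sd, mean + 1.96 * sd
print(low, high)  # roughly -0.15 and 0.25, matching the quoted interval
```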

Editor
February 5, 2014 3:50 pm

Dr. Strangelove says:
February 4, 2014 at 5:46 pm

Willis
I’m not making things up. Do a Monte Carlo simulation the way I described it. You can probably do it on Excel spreadsheet. You need a random number generator. Assign two possible outcomes: Outcome 1 and Outcome 2 as explained previously. Each with equal probability P = 0.5. Plot the histogram of the 20,000 runs. You will see a normal curve. Then compute the probabilities from the normal curve of the actual results of outcomes 1 and 2 obtained from temperature data. This is not a hoax.

Read up on the difference between “white noise” and “red noise”, Doc. You’ve used white noise, perhaps not even random normal white noise (excel “RAND” function gives uniform random rather than normal random numbers) … but the temperature data you are testing is red noise, actually very red noise. As a result, you need red noise pseudodata for the monte carlo test.
w.
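A minimal sketch of the red-noise Monte Carlo test described above, assuming NumPy; the AR(1) coefficient `phi` is a placeholder that would need to be estimated from the actual anomaly series before trusting the result.

```python
# Sketch: build a null distribution for the JLI hit rate from red
# (AR(1)) pseudodata with normal innovations, rather than white noise.
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, phi, sigma=1.0):
    # AR(1) "red noise": each value is phi * previous + normal innovation
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, sigma)
    return x

def jli_hit_rate(monthly):
    # monthly: array of length 12 * n_years; score the warmer-January rule
    years = monthly.reshape(-1, 12)
    jan, ann = years[:, 0], years.mean(axis=1)
    cand = jan[1:] > jan[:-1]           # candidate years
    hit = ann[1:] > ann[:-1]            # annual anomaly also rose
    return (cand & hit).sum() / max(cand.sum(), 1)

# 20,000 synthetic 35-year series; phi=0.9 is illustrative only.
scores = [jli_hit_rate(ar1(12 * 35, phi=0.9)) for _ in range(20_000)]
# Compare the observed hit rates (e.g. 85% for GISS) against this
# distribution instead of a white-noise one.
```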

Editor
February 5, 2014 3:52 pm

Guy says:
February 4, 2014 at 6:40 pm

Should we not be looking at the prediction for “the rest of the year”? If January is up and the rest of the year is down less than January was up, the year will still be up.

Thanks for the support, Guy. I’ve pointed that out a bunch of times, only to be told it doesn’t matter. It does matter, but these guys love their “positive” results … go figure.
w.

Editor
February 5, 2014 4:07 pm

Terry Oldberg says:
February 4, 2014 at 9:01 pm

Dr. Strangelove:
Thanks for sharing your view. It looks to me as though you’ve conflated some concepts.
As I’ll use the terms, a “Condition” is a condition on the Cartesian product of the values that are taken on by a model’s independent variables. An “Outcome” is a condition on the Cartesian product of the values that are taken on by a model’s dependent variables.

I suppose I should try decoding Terry again, although it’s not been too productive in the past … Terry, defining a “Condition” in capital letters as a “condition on the Cartesian product of the values” of the independent variables, doesn’t mean anything to me.
Suppose we have two independent variables, J and K. The values of J are {a, b, c} and the values of K are {d, e, f}. The Cartesian product of those two sets, more commonly called the “cross product”, is the set of all possible pairs:
{ { a, d }, { a, e }, { a, f }, { b, d }, { b, e }, { b, f }, { c, d }, { c, e }, { c, f } }
OK, that’s our Cartesian product of the values of the independent variables. But what on earth is a “condition on” the set { { a, d }, { a, e }, { a, f }, { b, d }, { b, e }, { b, f }, { c, d }, { c, e }, { c, f } }? That makes no sense at all, and its application to the current situation is completely unclear.
w.

Editor
February 5, 2014 4:11 pm

Bernie Hutchins says:
February 4, 2014 at 9:30 pm

Strangelove,
People here need to see your code.

Dr. Strangelove says:
February 4, 2014 at 10:29 pm

Bernie
The concepts are simple enough to understand without computer programming, at least for those familiar with probability theory.

Doc, Bernie’s not asking about your concepts. He wants to see exactly what you did. Not what your concepts say you did. Not what you truly believe you did.
What you actually did.
Either show your code or we are under no obligation to listen to a word you say. This is a scientific site.
w.

February 5, 2014 5:41 pm

Willis Eschenbach says:
February 5, 2014 at 3:44 pm
Anyone who thinks that a projected possible warming of five hundredths of a degree is an “essential ingredient for policy making” hasn’t thought this all the way through.
I agree. And the MET office is no better. See
http://www.metoffice.gov.uk/media/pdf/1/8/decadal_forecast_2014-2018_jan2014.pdf
“• Averaged over the 5-year period 2014-2018, global average temperature is expected to remain high and is likely to be between 0.17°C and 0.43°C above the long-term (1981–2010) average.”
“Conclusions
It also has a broad range of potential applications in terms of policy making and investment decisions.”

Brian H
February 6, 2014 12:17 am

Aren’t we talking about autocorrelation here? Each month is the start point for the next.

Editor
February 6, 2014 12:33 am

Brian H says:
February 6, 2014 at 12:17 am

Aren’t we talking about autocorrelation here? Each month is the start point for the next.

Umm … yep, I talked about that very thing, as have other folks. A search for “autocorr…” on the page will find much discussion of the subject.
w.
