The January Leading Indicator

GISS Data – Image Credit: Walter Dnes

By Walter Dnes – Edited by Just The Facts

Investopedia defines “Leading Indicator” thusly…

A measurable economic factor that changes before the economy starts to follow a particular pattern or trend. Leading indicators are used to predict changes in the economy, but are not always accurate.

Economics is not the only area where a leading indicator is nice to have. A leading indicator that could predict, in February, whether this calendar year’s temperature anomaly will be warmer or colder than the previous calendar year’s anomaly would also be nice to have. I believe that I’ve stumbled across exactly that. Using data from 1979 onwards, the rule goes like so…

  1. If this year’s January anomaly is warmer than last year’s January anomaly, then this year’s annual anomaly will likely be warmer than last year’s annual anomaly.
  2. If this year’s January anomaly is colder than last year’s January anomaly, then this year’s annual anomaly will likely be colder than last year’s annual anomaly.


This is a “qualitative” forecast. It doesn’t forecast a number, but rather a boundary, i.e. greater than or less than a specific number. I don’t have an explanation for why it works. Think of it as the climatological equivalent of “technical analysis”; i.e. event X is usually followed by event Y, leaving it to others to figure out the underlying “fundamentals”, i.e. physical theory. I’ve named it the “January Leading Indicator”, abbreviated as “JLI” (which some people will probably pronounce as “July”). The JLI has been tested on the following 6 data sets: GISS, HadCRUT3, HadCRUT4, UAH5.6, RSS, and NOAA.
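Stated as code, the rule is a single comparison. Here is a minimal Python sketch (the original analysis was done in a spreadsheet; the function name and units are mine, not part of that workflow):

```python
def jli(jan_this_year, jan_last_year):
    """January Leading Indicator: a qualitative call on whether this
    year's annual anomaly will be warmer or colder than last year's."""
    if jan_this_year > jan_last_year:
        return "warmer"
    if jan_this_year < jan_last_year:
        return "colder"
    return "no call"  # an exact tie; the rule as stated is silent here

# GISS January anomalies: 0.63 C for 2013 vs 0.39 C for 2012
print(jli(0.63, 0.39))  # warmer
```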

In this post I will reference this zipped GISS monthly anomaly text file and this spreadsheet. Note that one of the tabs in the spreadsheet is labelled “documentation”. Please read that tab first if you download the spreadsheet and have any questions about it.

The claim of the JLI would arouse skepticism anywhere, and doubly so in a forum full of skeptics. So let’s first look at one data set, and count the hits and misses manually, to verify the algorithm. The GISS text file has to be reformatted before importing into a spreadsheet, but it is well suited to direct viewing by humans. The data contained within the GISS text file is summarized below.

Note: GISS numbers are the temperature anomaly, multiplied by 100, and shown as integers. Divide by 100 to get the actual anomaly. E.g. “43” represents an anomaly of 43/100 = 0.43 degrees Celsius; “7” represents an anomaly of 7/100 = 0.07 degrees Celsius.

  • The first 2 columns on the left of the GISS text file are the year and the January anomaly * 100.
  • The column after “Dec” (labelled “J-D”) is the January–December anomaly * 100.

The verification process is as follows:

  • Count all the years where the current year’s January anomaly is warmer than the previous year’s January anomaly. Add a 1 in the Counter column for each such year.
  • For each such year, count those where the year’s annual anomaly is also warmer than the previous year’s annual anomaly, and add a 1 in the Hit column for each such year.
Year Counter Jan(current) > Jan(previous) Hit J-D(current) > J-D(previous) Comment
1980 1 25 > 10 1 23 > 12
1981 1 52 > 25 1 28 > 23
1983 1 49 > 4 1 27 > 9
1986 1 25 > 19 1 15 > 8
1987 1 30 > 25 1 29 > 15
1988 1 53 > 30 1 35 > 29
1990 1 35 > 11 1 39 > 24
1991 1 38 > 35 0 38 < 39 Fail
1992 1 42 > 38 0 19 < 38 Fail
1995 1 49 > 27 1 43 > 29
1997 1 31 > 25 1 46 > 33
1998 1 60 > 31 1 62 > 46
2001 1 42 > 23 1 53 > 41
2002 1 72 > 42 1 62 > 53
2003 1 73 > 72 0 61 < 62 Fail
2005 1 69 > 57 1 66 > 52
2007 1 94 > 53 1 63 > 60
2009 1 57 > 23 1 60 > 49
2010 1 66 > 57 1 67 > 60
2013 1 63 > 39 1 61 > 58
Predicted: 20 warmer than previous year; Actual: 17 warmer than previous year

Of 20 candidates flagged (Jan(current) > Jan(previous)), 17 are correct (i.e. J-D(current) > J-D(previous)). That’s 85% accuracy for the qualitative annual anomaly forecast on the GISS data set where the current January is warmer than the previous January.
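The manual count above (and the colder-January count that follows) can be reproduced mechanically. A Python sketch, with the GISS values transcribed from the two tables in this post; the 1999/2000 annual values are entered to three decimals, per the tie noted below the second table:

```python
# GISS anomalies * 100, transcribed from the two tables in this post.
jan = {1979: 10, 1980: 25, 1981: 52, 1982: 4, 1983: 49, 1984: 26,
       1985: 19, 1986: 25, 1987: 30, 1988: 53, 1989: 11, 1990: 35,
       1991: 38, 1992: 42, 1993: 34, 1994: 27, 1995: 49, 1996: 25,
       1997: 31, 1998: 60, 1999: 48, 2000: 23, 2001: 42, 2002: 72,
       2003: 73, 2004: 57, 2005: 69, 2006: 53, 2007: 94, 2008: 23,
       2009: 57, 2010: 66, 2011: 46, 2012: 39, 2013: 63}
# Annual (J-D) values; 1999/2000 use the 3-decimal figures (0.407, 0.406)
# since they tie at 2 decimals.
ann = {1979: 12, 1980: 23, 1981: 28, 1982: 9, 1983: 27, 1984: 12,
       1985: 8, 1986: 15, 1987: 29, 1988: 35, 1989: 24, 1990: 39,
       1991: 38, 1992: 19, 1993: 21, 1994: 29, 1995: 43, 1996: 33,
       1997: 46, 1998: 62, 1999: 40.7, 2000: 40.6, 2001: 53, 2002: 62,
       2003: 61, 2004: 52, 2005: 66, 2006: 60, 2007: 63, 2008: 49,
       2009: 60, 2010: 67, 2011: 55, 2012: 58, 2013: 61}

warm_cand = warm_hit = cold_cand = cold_hit = 0
for y in range(1980, 2014):
    if jan[y] > jan[y - 1]:            # JLI flags "warmer"
        warm_cand += 1
        warm_hit += ann[y] > ann[y - 1]
    elif jan[y] < jan[y - 1]:          # JLI flags "colder"
        cold_cand += 1
        cold_hit += ann[y] < ann[y - 1]

print(warm_hit, warm_cand, round(warm_hit / warm_cand, 2))  # 17 20 0.85
print(cold_hit, cold_cand, round(cold_hit / cold_cand, 2))  # 11 14 0.79
```

Running it reproduces the 17-of-20 and 11-of-14 tallies.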

And now for the years where January is colder than the previous January. The procedure is virtually identical, except that we count the years where the annual anomaly is colder than the previous year’s annual anomaly, adding a 1 in the Hit column for each such year.

Year Counter Jan(current) < Jan(previous) Hit J-D(current) < J-D(previous) Comment
1982 1 4 < 52 1 9 < 28
1984 1 26 < 49 1 12 < 27
1985 1 19 < 26 1 8 < 12
1989 1 11 < 53 1 24 < 35
1993 1 34 < 42 0 21 > 19 Fail
1994 1 27 < 34 0 29 > 21 Fail
1996 1 25 < 49 1 33 < 43
1999 1 48 < 60 1 41 < 62
2000 1 23 < 48 1 41 < 41 0.406 < 0.407
2004 1 57 < 73 1 52 < 61
2006 1 53 < 69 1 60 < 66
2008 1 23 < 94 1 49 < 63
2011 1 46 < 66 1 55 < 67
2012 1 39 < 46 0 58 > 55 Fail
Predicted: 14 colder than previous year; Actual: 11 colder than previous year

Of 14 candidates flagged (Jan(current) < Jan(previous)), 11 are correct (i.e. J-D(current) < J-D(previous)). That’s 79% accuracy for the qualitative annual anomaly forecast on the GISS data set where the current January is colder than the previous January. Note that the 1999 annual anomaly is 0.407, and the 2000 annual anomaly is 0.406, when calculated to 3 decimal places. The GISS text file only shows 2 (implied) decimal places.

The scatter graph at the head of this article compares the January and annual GISS anomalies for visual reference.

Now for a verification comparison amongst the various data sets, from the spreadsheet referenced above. First, all years during the satellite era which were forecast to be warmer than the previous year:

Data set                     Had3   Had4   GISS   UAH5.6   RSS    NOAA
Hits (Ann > previous)        16     15     17     18       18     15
Candidates (Jan > previous)  19     18     20     21       20     18
Accuracy                     0.84   0.83   0.85   0.86     0.90   0.83

Next, all years during the satellite era which were forecast to be colder than the previous year:

Data set                     Had3   Had4   GISS   UAH5.6   RSS    NOAA
Hits (Ann < previous)        11     11     11     11       11     11
Candidates (Jan < previous)  15     16     14     13       14     16
Accuracy                     0.73   0.69   0.79   0.85     0.79   0.69

The following are scatter graphs comparing the January and annual anomalies for the other 5 data sets:

HadCRUT3

HadCRUT3 Data – Walter Dnes

HadCRUT4

HadCRUT4 Data – Walter Dnes

UAH 5.6

UAH 5.6 Data – Walter Dnes

RSS

RSS Data – Walter Dnes

NOAA

NOAA Data – Walter Dnes

The forecast methodology had problems during the Pinatubo years, 1991 and 1992. And 1993 also had problems, because the algorithm compares with the previous year, in this case Pinatubo-influenced 1992. The breakdowns were…

  • For 1991 all 6 data sets were forecast to be above their 1990 values. The 2 satellite data sets (UAH and RSS) were above their 1990 values, but the 4 surface-based data sets were below their 1990 values
  • For 1992 the 4 surface-based data sets (HadCRUT3, HadCRUT4, GISS, and NCDC/NOAA) were forecast to be above their 1991 values, but were below
  • The 1993 forecast was a total bust. All 6 data sets were forecast to be below their 1992 values, but all finished the year above

In summary, during the 3 years 1991/1992/1993, there were 6*3=18 over/under forecasts, of which 14 were wrong. In plain English, if a Pinatubo-like volcano dumps a lot of sulfur dioxide (SO2) into the stratosphere, the JLI will not be usable for the next 2 or 3 years, i.e.:

“The most significant climate impacts from volcanic injections into the stratosphere come from the conversion of sulfur dioxide to sulfuric acid, which condenses rapidly in the stratosphere to form fine sulfate aerosols. The aerosols increase the reflection of radiation from the Sun back into space, cooling the Earth’s lower atmosphere or troposphere. Several eruptions during the past century have caused a decline in the average temperature at the Earth’s surface of up to half a degree (Fahrenheit scale) for periods of one to three years. The climactic eruption of Mount Pinatubo on June 15, 1991, was one of the largest eruptions of the twentieth century and injected a 20-million ton (metric scale) sulfur dioxide cloud into the stratosphere at an altitude of more than 20 miles. The Pinatubo cloud was the largest sulfur dioxide cloud ever observed in the stratosphere since the beginning of such observations by satellites in 1978. It caused what is believed to be the largest aerosol disturbance of the stratosphere in the twentieth century, though probably smaller than the disturbances from eruptions of Krakatau in 1883 and Tambora in 1815. Consequently, it was a standout in its climate impact and cooled the Earth’s surface for three years following the eruption, by as much as 1.3 degrees at the height of the impact.” USGS

For comparison, here are the scores with the Pinatubo-affected years (1991/1992/1993) removed. First, the years which were forecast to be warmer than the previous year:

Data set                     Had3   Had4   GISS   UAH5.6   RSS    NOAA
Hits (Ann > previous)        16     15     17     17       17     15
Candidates (Jan > previous)  17     16     18     20       19     16
Accuracy                     0.94   0.94   0.94   0.85     0.89   0.94

And for years where the anomaly was forecast to be below the previous year’s:

Data set                     Had3   Had4   GISS   UAH5.6   RSS    NOAA
Hits (Ann < previous)        11     11     11     10       10     11
Candidates (Jan < previous)  14     15     13     11       12     15
Accuracy                     0.79   0.73   0.85   0.91     0.83   0.73

Given the existence of January and annual data values, it’s possible to do linear regressions and even quantitative forecasts for the current calendar year’s annual anomaly. With the slope and y-intercept available, one merely has to wait for the January data to arrive in February and run the basic “y = mx + b” equation. The correlation is approximately 0.79 for the surface data sets, and 0.87 for the satellite data sets, after excluding the Pinatubo-affected years (1991 and 1992).
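As an illustration of that quantitative step, here is a least-squares fit sketched in Python on a subset of the GISS (January, annual) pairs from the tables above, in hundredths of a degree. This shows the “y = mx + b” mechanics only; it is not the exact regression behind the quoted correlations:

```python
# (January, annual) GISS anomaly pairs * 100, taken from the tables above,
# skipping the Pinatubo-affected years.
pairs = [(10, 12), (25, 23), (52, 28), (4, 9), (49, 27), (26, 12),
         (19, 8), (25, 15), (30, 29), (53, 35), (60, 62), (72, 62),
         (73, 61), (69, 66), (94, 63), (63, 61)]
n = len(pairs)
mx = sum(x for x, _ in pairs) / n
my = sum(y for _, y in pairs) / n
sxx = sum((x - mx) ** 2 for x, _ in pairs)
sxy = sum((x - mx) * (y - my) for x, y in pairs)
m = sxy / sxx                 # slope
b = my - m * mx               # y-intercept
# Once the January number arrives in February, the forecast is y = m*x + b:
forecast_2013 = m * 63 + b    # January 2013 anomaly was 63 (0.63 C)
print(round(m, 3), round(b, 2), round(forecast_2013, 1))
```

With the slope and intercept in hand, a February forecast is a single multiply-and-add once the January anomaly is published.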

There will probably be a follow-up article a month from now, when all the January data is in, and forecasts can be made using the JLI. Note that data downloaded in February will be used. NOAA and GISS use a missing-data algorithm which results in minor changes for most monthly anomalies, every month, all the way back to day 1, i.e. January 1880. The monthly changes are generally small, but in borderline cases, the changes may affect rankings and over/under comparisons.

The discovery of the JLI was a fluke based on a hunch. One can only wonder what other connections could be discovered with serious “data-mining” efforts.

This entry was posted in Forecasting, Lower Troposphere Temperature, Temperature.

165 Responses to The January Leading Indicator

  1. Rhoda R says:

    I think I’d rather look at jet stream and positions of highs and lows as well as what the various ENSO readings are doing.

  2. timetochooseagain says:

    USGS claim of a cooling of 1.3 degrees is pure bullshit, I’m sorry to say that as indelicately as I possibly can.

  3. philjourdan says:

    Is that pre or post adjustments?

  4. Eric Worrall says:

    So, BAU is still the best short-term climate model, despite billions of taxpayers’ money spent on developing analytical approaches.

    I want a refund :-)

  5. Col Mosby says:

    What’s left is to use each of the year’s months as a leading indicator and see what, if any, differences you might find. Common sense says that these leading indicators will work best during periods of strong temperature trending, up or down. Also experiment with algorithm in which a “no prediction” call is made unless the previous and current month’s anomaly difference exceeds a selected magnitude.

  6. Walter Dnes says:

    philjourdan says:
    > February 1, 2014 at 4:47 pm
    > Is that pre or post adjustments?

    The algorithm uses the monthly anomaly numbers downloaded from the various URLs for Hadley/GISS/NOAA/UAH/RSS, with whatever adjustments they’ve included. The only adjustment I’ve added is to remove the 1991 and 1992 data due to Pinatubo interference.

  7. MikeN says:

    That could just be luck.

  8. noaaprogrammer says:

    Another variation to explore would be seasonal leading indicators. Many people are interested to know whether or not this winter/summer is going to be colder/hotter than most other winters/summers.

  9. Werner Brozek says:

    We have all heard of numerous adjustments, but I think that this is one time that the adjustments are not relevant. After all, we are not interested in the rate of warming over the last several decades but in how the January anomalies predict annual anomalies. And any adjustments that are made would affect the January anomaly and the annual anomaly more or less equally.

  10. Gary Pearse says:

    Rhoda R says:
    February 1, 2014 at 4:37 pm

    “I think I’d rather look at jet stream and positions of highs and lows as well as what the various ENSO readings are doing.”

    Gee Rhoda, with an 80% track record, why wouldn’t you look at both? In any case, you don’t get to look at future jet stream positions. A good measure of your preference would be to go back 30 years, “use” jet stream and ENSO, and compare that to this.

    Walter, it might be further refinable. The “fails” were cases where only 1-4 units difference occurred. You could make such small differences equal to “no change” in annual anomaly. Also, note the unfudged satellite data gave the highest correlation. This might be a “test” of the data sets.

  11. Kip Hansen says:

    If this were a medical issue, I would ask to see some prior explanations regarding biological plausibility.

  12. walterdnes says:

    Col Mosby says:
    > February 1, 2014 at 4:56 pm

    > What’s left is to use each of the year’s months as a leading
    > indicator and see what, if any, differences you might find.

    That goes under “further data-mining”.

    > Also experiment with algorithm in which a “no prediction”
    > call is made unless the previous and current month’s anomaly
    > difference exceeds a selected magnitude.

    Stuff like that also goes under “further data-mining”. I’ll see what else I can dig up. I wanted to get word out about what I’ve found out so far.

  13. wbrozek says:

    MikeN says:
    February 1, 2014 at 5:24 pm
    That could just be luck.

    I believe there are sound scientific reasons for these numbers. Here are five reasons:
    1. By the laws of averages, half of all Januaries should be above the yearly average and half should be below. So with a number of high Januaries, the final anomalies would be higher than for a number of low Januaries.
    2. Related to the above, if the January anomaly went from 0.4 to 0.3, and if we assume the previous year also had an average anomaly of 0.4, and with the chances being 50% for an anomaly of less than 0.3 for the new year, the odds are greater than 50% for an anomaly of less than 0.4.
    3. The number in January may be so much higher or lower that it takes 11 months of normal values to partially negate the effect of the high or low January value. To use a sports analogy, two teams may be very equal, but one team has the jitters for the first 5 minutes and is down by 3 goals in this time. It is quite possible that the rest of the game is not long enough for this deficit to be overcome.
    4. According to Bob Tisdale, effects of El Nino or La Nina often show themselves in January so in those cases, it would be obvious why the rest of the year follows.
    5. Any other cycle, such as a sun that is getting quieter every year, would automatically be reflected in the anomalies for January and the rest of the year as well.
    6. Can you think of others?

  14. walterdnes says:

    noaaprogrammer says:
    > February 1, 2014 at 5:25 pm

    > Another variation to explore would be seasonal
    > leading indicators. Many people are interested to
    > know whether or not this winter/summer is going to
    > be colder/hotter than most other winters/summers.

    An interesting idea. I.e. rather than sticking to calendar years, go N months forward.

  15. Robert of Ottawa says:

    Coin toss.

  16. walterdnes says:

    MikeN says:
    > February 1, 2014 at 5:24 pm

    > That could just be luck.

    If you get 70% to 90% accuracy at blackjack in Vegas, you get thrown out of the casino for being a card-counter
    http://en.wikipedia.org/wiki/Card_counting#Countermeasures

  17. walterdnes says:

    Robert of Ottawa says:
    > February 1, 2014 at 5:47 pm

    > Coin toss.

    * Take 6 different coins.
    * Flip each coin 20 times

    What are the odds that *ALL 6 COINS* come up heads 14 to 18 times out of 20?
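    The arithmetic behind that rhetorical question: for a fair coin, the chance of 14 or more heads in 20 flips is a binomial tail, and for six independent coins it is that tail raised to the sixth power. (The six data sets are correlated rather than independent, so this overstates the improbability, but it gives the order of magnitude.)

```python
from math import comb

# P(X >= 14) for X ~ Binomial(n=20, p=0.5): sum the upper tail
p_one = sum(comb(20, k) for k in range(14, 21)) / 2**20
p_all_six = p_one ** 6     # six independent "coins"

print(round(p_one, 4))     # 0.0577
print(p_all_six)           # about 3.7e-08
```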

  18. wws says:

    I think I reacted the same way Wbrozek did, a couple replies above. Let me try and state what I think may be a rather simple explanation, simply:

    to quote: “Using data from 1979 onwards, the rule goes like so…

    1. If this year’s January anomaly is warmer than last year’s January anomaly, then this year’s annual anomaly will likely be warmer than last year’s annual anomaly.
    2. If this year’s January anomaly is colder than last year’s January anomaly, then this year’s annual anomaly will likely be colder than last year’s annual anomaly.”

    Allow me to restate those rules slightly:
    1. After 1/12 of the year has already been measured, and if that measurement is found to be higher than last year’s measurement for the same time period, then the entire year’s measurements are *at least* 1/12 more likely to be higher than the last entire year’s measurements.

    2. After 1/12 of the year has already been measured, and if that measurement is found to be colder than last year’s measurement for the same time period, then the entire year’s measurements are *at least* 1/12 more likely to be colder than the last entire year’s measurements.

    And to put it even more simply yet: You are predicting that the numbers you have already measured are likely to influence your final measurement.

    There’s a reason oddsmakers generally don’t take any more bets once the game has started.

  19. walterdnes says:

    wws says:
    > February 1, 2014 at 5:58 pm

    > And to put it even more simply yet: You are predicting
    > that the numbers you have already measured are likely
    > to influence your final measurement.

    > There’s a reason oddsmakers generally don’t take any
    > more bets once the game has started.

    I agree with what you’ve said. That’s how leading indicators work. There is still some value in getting a future forecast.

  20. davidmhoffer says:

    I’m sorry, I just don’t see how one could get any result other than this!

    For a given year in which the anomaly for the year is positive, by definition, the average of the monthly anomalies within the year must also be positive. In other words, my expectation is that if you do this with any given month, you will find the exact same thing. If you could show that some months are consistently good leading indicators and others poor ones, that might be more interesting. But showing that in a warmer than usual year, one given month is also warmer, all I can think of is….yeah, what else would you expect? The only place where this is unlikely to be true is where a short term perturbation of the system (ie Pinatubo) is introduced, or the direction of the long term trend reverses in that given year.

  21. darrylb says:

    Thank you Walter D.
    At a certain location, the temps were in the upper third of the range of temps for 13 months running. Jeff Masters of Weather Underground stated that the likelihood of that happening was one chance in three to the thirteenth power!
    A good example to talk about white and red noise in statistics. Month-to-month fluctuations are usually not that great.
    Your empirical data would tend to show that heat does enter and leave the ocean continually for extended periods of time.
    How about plotting the last eight months of the year anomalies as a function those in January?

  22. Jeff Alberts says:

    Once you realize there is no global temperature, and therefore no anomaly, this article becomes pretty much moot.

  23. walterdnes says:

    davidmhoffer says:
    > February 1, 2014 at 6:20 pm

    > I’m sorry, I just don’t see how one could get any
    > result other than this!

    Here’s my competition…

    http://www.metoffice.gov.uk/news/releases/archive/2013/global-temperature-2014
    > 19 December 2013 – The global average temperature
    > in 2014 is expected to be between 0.43 C and 0.71 C
    > above the long-term (1961-1990) average of 14.0 C,
    > with a central estimate of 0.57 C, according to the Met
    > Office annual global temperature forecast.
    >
    > Taking into account the range of uncertainty in the
    > forecast, it is likely that 2014 will be one of the warmest
    > ten years in the record which goes back to 1880.
    >
    > The forecast range and central estimate for 2014 are
    > the same as were forecast by the Met Office for 2013.

    Note that their anomaly is the average of the HadCRUT4, NOAA/NCDC, and NASA/GISS data sets. My article uses the name “NOAA” for NOAA/NCDC, and “GISS” for NASA/GISS. Those 3 data sets are among the 6 I use. In 2013, the annual anomalies were…
    HadCRUT4 0.488
    NASA/GISS 0.607
    NOAA/NCDC 0.621
    Average 0.572

    I mentioned in passing that you can do a linear regression of the January anomaly versus the annual anomaly to generate a quantitative forecast. I acknowledge having a 2-month advantage on the UK Met Office for my forecast, which should be out the 3rd week of February. Come next January, we’ll see how my forecast fared versus the UK Met Office. The UK Met Office doesn’t provide separate values for the 3 data sets, so I’ll take the average of my forecasts for HadCRUT4, GISS, and NOAA. This will enable an apples-to-apples comparison.

  24. OssQss says:

    Logical as an indicator when considering the solstice as a starting point.

  25. walterdnes says: February 1, 2014 at 6:55 pm

    Here’s my competition…

    Barely, i.e. “Met Office global forecasts too warm in 13 of last 14 years”:
    http://www.bbc.co.uk/blogs/paulhudson/posts/Met-Office-global-forecasts-too-warm-in-13-of-last-14-years

  26. walterdnes says:

    Out of sheer curiosity, is anybody else out there making forecasts about the 2014 annual temperature anomaly?

  27. davidmhoffer says:

    walterdnes;

    Your competition has nothing to do with it. You tested to see if the data in a given set follows the same general trend as does a subset of that same data. It does. If it didn’t that would be significant.

  28. rogerknights says:

    wws says:

    There’s a reason oddsmakers generally don’t take any more bets once the game has started.

    Not on Intrade. It worked like a futures market. Betting was open on the annual anomaly until the day the contract expired; IOW, until the year ended. The price (odds) offered or bid upon moved up and down to take into account traders’ estimates of how much what had gone before was likely to influence the final outcome.

    Even in Vegas, one can bet on ongoing major sports events like the super bowl or world series. The odds adjust to take into account the score so far.

  29. rogerknights says:

    How is the JLI for THIS January shaping up?

  30. walterdnes says: February 1, 2014 at 7:04 pm

    Out of sheer curiousity, is anybody else out there making forecasts about the 2014 annual temperature anomaly?

    It’s equivocating, but here is the Hansen et al., 2014 prediction:

    “So what are the near-term prospects? El Niño depends on fickle wind anomalies for initiation, so predictions are inherently difficult, but conditions are ripe for El Niño initiation in 2014. About half of the climate models catalogued by the International Research Institute predict that the next El Niño will begin by summer 2014, with the other half predicting ENSO neutral conditions. The mean NCEP forecast issued 13 January has an El Niño beginning in the summer of 2014, although a significant minority of the ensemble members predicts ENSO neutral conditions for 2014.

    The strength of an El Niño, too, depends on the fickle wind anomalies at the time of initiation. We speculated that the likelihood of “super El Niños”, such as those in 1982–83 and 1997–98, has increased. Our rationale was that global warming increased SSTs in the Western Pacific, without yet having much effect on the temperature of upwelling deep water in the Eastern Pacific (Fig. 2 above), thus allowing the possibility of a larger swing of Eastern Pacific temperature. Recent paleoclimate and modeling studies find evidence for an increased frequency of extreme El Niños with global warming.

    Assuming that an El Niño begins in summer 2014, 2014 is likely to be warmer than 2013 and perhaps the warmest year in the instrumental record. However, given the lag between El Niño initiation and global temperature, 2015 is likely to have a temperature even higher than in 2014.”
    http://www.columbia.edu/~jeh1/mailings/2014/20140121_Temperature2013.pdf

  31. walterdnes says:

    davidmhoffer says:
    > February 1, 2014 at 7:09 pm

    > Your competition has nothing to do with it. You tested to see
    > if the data in a given set follows the same general trend as
    > does a subset of that same data. It does. If it didn’t that
    > would be significant.

    The point of this article was to show a useful forecast tool. Yes, it looks obvious now. How many people were using this method in the past?
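    One way to see that it isn’t entirely obvious: under a null model where the twelve monthly anomalies are independent draws with no trend and no persistence, the sign of the January change matches the sign of the annual change only about 59% of the time, well short of the 80-90% in the tables above. A Monte Carlo sketch (synthetic data, not the real series):

```python
import random

random.seed(42)
YEARS = 20000
# Null model: every monthly anomaly is an independent standard normal draw,
# with no trend and no month-to-month persistence.
years = [[random.gauss(0.0, 1.0) for _ in range(12)] for _ in range(YEARS)]

agree = total = 0
for prev, cur in zip(years, years[1:]):
    d_jan = cur[0] - prev[0]
    d_ann = sum(cur) / 12 - sum(prev) / 12
    total += 1
    agree += (d_jan > 0) == (d_ann > 0)

print(round(agree / total, 3))  # about 0.59 under this null
```

    Whatever pushes the real-world score from roughly 59% up to 80-90% (trend, ENSO, month-to-month persistence) is doing the forecasting work.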

  32. I like this work! I like it because it has features of a scientific study that are missing from the studies of global warming that are referenced by the IPCC in its assessment reports. There are events (with durations of 1 calendar year each). Each event has an outcome (whether or not the current year’s annual anomaly exceeds the previous year’s annual anomaly). Each event has a condition (whether the current year’s January anomaly exceeds the previous year’s January anomaly). Observed events in which the annual anomaly exceeds the previous year’s annual anomaly have the count that statisticians call the “frequency.” Observed events in which the January anomaly exceeds the previous year’s January anomaly have a frequency. The ratio of the two frequencies is an example of the idea that statisticians call a “relative frequency.” A relative frequency is the empirical counterpart of a probability. Probabilities are an essential component of logic.

    There are the makings here for a scientific theory. Steps along the path toward such a theory would include adapting the model to predict the relative frequencies of the outcomes of the future and the uncertainties in these relative frequencies. Going forward, the question should be asked of whether the predicted relative frequencies and uncertainties are a match for the observed relative frequencies. If they are a match, the model is validated. Otherwise, it is falsified.

    Also, the list of independent variables of the model should be expanded beyond the amount of the January anomaly and the question should be asked of whether a condition other than the one assumed would provide more information about the outcome. Among the independent variables considered for inclusion should be the CO2 concentration.

  33. davidmhoffer says:

    walterdnes;
    The point of this article was to show a useful forecast tool.
    >>>>>>>>>>>

    But it isn’t. All it shows is that warm years are comprised of warm months. What else would a warm year be comprised of?

  34. joshuah says:

    Curious to know if any correlation between December of previous year and the anomaly of the next 12 months… especially since we already have that for this year and won’t get january for a couple more weeks

  35. walterdnes

    Also, for reference:

    “Physical barriers to prediction

    Regardless of what type of ENSO forecast model one uses, forecasting ENSO is considerably more difficult during certain seasons of the year than others. Individual El Niño or La Niña episodes tend to develop between the months of April and June, and, once developed, last until the following February through May. Thus, once an episode has developed in early northern summer, forecasting its evolution through the remainder of its life cycle is not difficult. A much harder task is to forecast what will happen between March and June, when a forecast is being made in the preceding January through April. The difficulty in forecasting at this time of year is often called the “spring barrier” (in the Northern Hemisphere), or the “autumn barrier” (in the Southern Hemisphere).

    After April has finished, while there still is uncertainty, it starts becoming easier to see in the latest observations how the stage is being set for the remainder of the calendar year and the first few months of the following year. By June, the uncertainty becomes still less: if there is nothing new developing, the chances of new development are small. While ENSO forecasting is most difficult through the late northern spring, the spring barrier is not impenetrable. Signs of changes in the ENSO state, such as increased heat content in the western equatorial Pacific Ocean, are available, so that at least a probability forecast can be made through the spring barrier. As April, May and June come along, such probabilities normally become more robust.”
    http://iri.columbia.edu/climate/ENSO/background/prediction.html#barrier

  36. Tom says:

    I don’t understand. I thought this was meant to be humorous, yet from the comments it appears all are taking it seriously.

  37. walterdnes says:

    joshuah says:
    > February 1, 2014 at 7:40 pm

    > Curious to know if any correlation between December of
    > previous year and the anomaly of the next 12 months…
    > especially since we already have that for this year and
    > won’t get january for a couple more weeks

    Good question. Can’t do that quickly. Fortunately, my spreadsheet calculates annual anomalies on-the-fly. So I was able to do a quick-n-dirty hack a couple of minutes ago. I…
    * created a copy of the spreadsheet
    * copied the monthly data to a blank area (as values, not pointers)
    * copied the values back to the data area, offset by 1 month

    This hack pushes December data into January, etc., and effectively compares December anomalies against the 12-month period December to next November. The accuracy fell off approx 10% for most of the years where a warmer anomaly was forecast. It fell off 10% to almost 30% (averaging approx 20%) for years forecast to be cooler. If it had compared to the 12-month period starting in January (i.e. one month later), the accuracy probably would’ve fallen off even more. This could be the basis of a follow-up article, i.e. which month is the best/worst leading indicator for the 12-month mean starting with itself.
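    The generalization hinted at here (and in Col Mosby’s comment) fits in a few lines: given a flat list of monthly anomalies, score any month as the leading indicator for the 12-month mean starting with it. A sketch with toy data; the function and data layout are mine, not the spreadsheet’s:

```python
def lead_accuracy(monthly, lead=0):
    """Fraction of year-pairs where the sign of the year-over-year change
    in month `lead` (0 = first month of the series) matches the sign of
    the change in the 12-month mean starting at that month."""
    # Align the series so each "year" starts at the lead month.
    series = monthly[lead:]
    years = [series[i:i + 12] for i in range(0, len(series) - 11, 12)]
    hits = total = 0
    for prev, cur in zip(years, years[1:]):
        d_lead = cur[0] - prev[0]
        d_mean = sum(cur) / 12 - sum(prev) / 12
        if d_lead != 0:
            total += 1
            hits += (d_lead > 0) == (d_mean > 0)
    return hits / total if total else float("nan")

# On a steadily warming toy series, every month is a perfect indicator:
toy = [0.01 * i for i in range(120)]   # 10 years of monotonic rise
print(lead_accuracy(toy, lead=0))      # 1.0
print(lead_accuracy(toy, lead=11))     # 1.0
```

    On a monotonic toy series every month scores 1.0; on real data the scores would differ by month, which is what a follow-up could tabulate.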

  38. Mac the Knife says:

    Gary Pearse says:
    February 1, 2014 at 5:38pm
    Also, note the unfudged satellite data gave the highest correlation. This might be a “test” of the data sets.

    Gary,
    That is what struck me, as I looked at the comparative data plots and correlations.
    Mac

  39. David L. Hagen says:

    Walter
    Thought provoking.

    That looks like further evidence of Hurst Kolmogorov dynamics (aka “climate persistence”). e.g. especially by Demetris Koutsoyianis e.g. Climatic variability over time scales spanning nine orders of magnitude: Connecting Milankovitch cycles with Hurst-Kolmogorov dynamics

    For a possible cause, may I recommend David Stockwell’s Solar Accumulative theory. e.g.
    Key evidence for the accumulative model of high solar influence on global temperature

    Regards
    David

  40. rogerknights says:

    davidmhoffer says:
    February 1, 2014 at 7:30 pm

    walterdnes;
    The point of this article was to show a useful forecast tool.
    >>>>>>>>>>>

    But it isn’t. All it shows is that warm years are composed of a certain warm month. What else would a warm year be composed of?

    A warmer January alone should not have such a disproportionate effect on the average for the rest of the year, unless there is a hidden linkage to something in the climate system.

  41. rtj1211 says:

    You can do similar analyses for predicting solar cycle maximum amplitude based on the first one, two or three years of any particular cycle.

    A variety of rules of thumb emerge which can be used to predict, correctly or otherwise, how a particular cycle will evolve. Broadly, they involve the SSN for Year 1 of a new cycle being less than something or greater than something. Less than a threshold implies a high likelihood of a weaker cycle, greater than a certain threshold is indicative of a strong cycle.

  42. David L. Hagen says:

    Compare the Global Warming Prediction Project

    This project is initiated, run, and maintained by KnowledgeMiner Software, a research, consulting and software development company in the field of high-end predictive modeling.

    The objective is doing modeling and prediction of global temperature anomalies through self-organizing knowledge extraction using public data. It predicts temperatures of nine latitudinal bands 36 months ahead, . . .

    Still confirming forecast of Apr 2011 at 73% accuracy. IPCC forecast at 10%. What drives Global Warming? (Update 2)

  43. Willis Eschenbach says:

    davidmhoffer says:
    February 1, 2014 at 6:20 pm

    I’m sorry, I just don’t see how one could get any result other than this!

    For a given year in which the anomaly for the year is positive, by definition, the average of the monthly anomalies within the year must also be positive. In other words, my expectation is that if you do this with any given month, you will find the exact same thing. If you could show that some months are consistently good leading indicators and others poor ones, that might be more interesting. But showing that in a warmer than usual year, one given month is also warmer, all I can think of is….yeah, what else would you expect? The only place where this is unlikely to be true is where a short term perturbation of the system (ie Pinatubo) is introduced, or the direction of the long term trend reverses in that given year.

    I agree. I don’t find this result to be anything other than expected. Since your “leading indicator” is included in the data you are trying to predict, of course it will be correlated.

    As to the question you raise above about different months, David, there’s not a whole lot of difference. When I do the analysis on the whole 132 years of the GISS LOTI dataset, I get the following results:

    Jan, 0.70
    Feb, 0.62
    Mar, 0.57
    Apr, 0.61
    May, 0.56
    Jun, 0.58
    Jul, 0.55
    Aug, 0.65
    Sep, 0.55
    Oct, 0.59
    Nov, 0.70
    Dec, 0.67
    Average, 0.61

    No obvious pattern, nobody really shines.

    Finally, the author hasn’t adjusted for the fact that the data has a trend … and that means that on average, both the January-to-January and the year-to-year data will have a positive value.

    I may run a monte carlo analysis on the data to confirm what it looks like, but as far as I’m concerned, and with my apologies to the author, this is a non-event. This is what you’d expect.

    w.

  44. Joseph Murphy says:

    It seems to me this would hold true with a random data set.

  45. Fred Souder says:

    Walter,
    The heading of your first data table has the ” >” sign switched to a “<" sign, unless I am misreading something.

  46. walterdnes says:

    David L. Hagen says:
    February 1, 2014 at 8:21 pm

    > Compare the Global Warming Prediction Project

    > This project is initiated, run, and maintained by
    > KnowledgeMiner Software, a research, consulting and
    > software development company in the field of high-end
    > predictive modeling.

    That’s what I meant by “data mining”.

  47. davidmhoffer says:

    Thanks Willis.

    Walter, allow me another analogy. Walk up a hill and down the other side, measuring your altitude at each step. Suppose the whole trip is 10,000 steps. Break your trip into 100 step increments. Now, compare the first step from each group of 100 to the average altitude of the entire trip. Classify each step as either higher than average, or lower than average.

    Is the first step a good leading indicator of the next 99 steps being higher or lower than average? Of course it is. You should get a correlation very close to 1. In other words, information that is completely accurate and entirely useless.

    Knowing that steps at higher than average altitude are very likely to be followed by more steps that are higher than average in altitude tells you absolutely zero in regard to any given step being uphill or downhill, which would be a coin flip.

  48. walterdnes says:

    Fred Souder says:
    > February 1, 2014 at 8:45 pm

    > Walter,
    > The heading of your first data table has the ” >” sign
    > switched to a “<" sign, unless I am misreading something.

    I think you're right. "Just The Facts", can you correct that? I don't have authorization to edit this blog.

  49. Werner Brozek says:

    davidmhoffer says:
    February 1, 2014 at 7:30 pm
    All it shows is that warm years are comprised of warm months. What else would a warm year be comprised of?

    However it also gives other information. For example, what would the ranking be for 2014 on Hadcrut4 if the January anomaly is 0.4 or 0.5?
    With this tool, I can say with a certainty of 75% that if the anomaly is 0.4, then it will be larger than 8th. But if it is 0.5 we can be 75% certain it will be less than 8th. Do you think the MET office would be this close if they had the January numbers? Now I know they do not set the bar too high! ☺

  50. walterdnes says:

    Willis Eschenbach says:
    > February 1, 2014 at 8:21 pm

    > As to the question you raise above about different
    > months, David, there’s not a whole lot of difference. When
    > I do the analysis on the whole 132 years of the GISS LOTI
    > dataset, I get the following results:

    In addition to less coverage back in the past, the people behind GISS have had more opportunity to “adjust” the data from longer ago. See http://wattsupwiththat.com/2011/01/13/tale-of-the-global-warming-tiger/ and http://wattsupwiththat.com/2011/01/16/the-past-is-not-what-it-used-to-be-gw-tiger-tale/ Not to mention that the mid-1940’s warm period has been “disappeared” just like the MWP. I selected 1979-to-2013 in order to be able to do an apples-to-apples comparison between land and satellite data.

    > Finally, the author hasn’t adjusted for the fact that the
    > data has a trend … and that means that on average,
    > both the January to January and the year to year data
    > both will have a positive value.

    I provided separate numbers for forecasts of warmer years versus forecasts of colder years. The warmer-forecast years do have a higher percentage. That alone should be enough to infer a warming trend.

  51. Willis Eschenbach says:

    Recall from above that the average of all of the months of “leading indicators” in the GISS LOTI data is 0.61. That is to say, 61% of the time, if a given month is warmer (colder) than the same month a year previous, that year’s average (12 months starting with and including the given month) is warmer (colder) than the previous 12 months. Recall also that for January the relevant figure was 70%.

    What was missing, and needed, were the details of the “null hypothesis”, which is that this is a random event. To see what that value would be, I just finished running the Monte Carlo analysis.

    I took the GISS LOTI data, detrended it, and calculated the AR and MA coefficients (0.93 and -0.48, typical values for global temperature datasets). Then I generated random ARIMA proxy temperature datasets with those parameters, and set them to the trend of the GISS dataset … here is a group of the randomly generated proxies:

    Just kidding, the bottom left one is the actual GISS LOTI data. I do this when I’m doing a Monte Carlo analysis, to make sure I’m actually comparing apples to apples.

    So … what did I find from that? Well, the so-called “leading indicator”, which isn’t leading, agrees with the annual results some 66 percent of the time, with a standard deviation of ± 4%. This means that 95% of the “leading indicator” results for the proxy temperature datasets fell between 58% and 74%.

    And this, as I suspected, means that at 70%, the author’s “leading indicator” is not doing any better than random chance … and as such, it is useless as a prognostication device.

    w.
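
    [The Monte Carlo test can be sketched in a few lines. This is a simplified illustration, not Willis’s actual code: it uses plain AR(1) “red noise” plus a small linear trend, rather than his fitted ARIMA model with AR 0.93 and MA -0.48, but it shows the mechanics of scoring the January “indicator” against autocorrelated pseudo-data.]

```python
import numpy as np

rng = np.random.default_rng(42)

def january_hit_rate(monthly):
    """Fraction of year-on-year January changes whose sign matches the
    sign of the corresponding annual-mean change."""
    years = monthly.reshape(-1, 12)
    d_jan = np.diff(years[:, 0])
    d_ann = np.diff(years.mean(axis=1))
    return float(np.mean(d_jan * d_ann > 0))

def red_noise(n_months, phi=0.9, sigma=0.1, trend=0.0005):
    """AR(1) red-noise proxy series with a small linear warming trend.
    (phi, sigma, trend are illustrative values, not fitted to GISS.)"""
    x = np.empty(n_months)
    x[0] = 0.0
    for t in range(1, n_months):
        x[t] = phi * x[t - 1] + rng.normal(0.0, sigma)
    return x + trend * np.arange(n_months)

# Distribution of the "leading indicator" hit rate in pure pseudo-data,
# 200 synthetic 132-year datasets.
rates = np.array([january_hit_rate(red_noise(132 * 12)) for _ in range(200)])
print(f"mean = {rates.mean():.2f}, sd = {rates.std():.2f}")
```

    [With these parameters the mean hit rate typically lands well above 50%, which is the point being made here: an elevated January “hit rate” arises in autocorrelated, gently trending noise with no physical linkage at all.]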

  52. walterdnes says: February 1, 2014 at 8:55 pm

    can you correct that?

    Corrected.

  53. Nick Stokes says:

    I’ve put here a table of the correlation coefficients for the six land/ocean indices I deal with. Each is over the whole length of data. The correlations are between each month and the annual (calendar) average. Naturally they improve as you advance in the year; mid-year months are a more representative sample.

    I don’t think the result is worthless. As said, Jan is a leading indicator. It has limited use as a predictor.
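
    [Nick’s month-versus-annual correlations can be reproduced from any monthly anomaly series in a few lines. A minimal sketch (not Nick’s script), assuming a 1-D array of complete calendar years starting in January.]

```python
import numpy as np

def month_vs_year_corr(monthly):
    """Pearson correlation between each calendar month's anomaly and the
    calendar-year (Jan-Dec) mean anomaly, over all complete years."""
    years = np.asarray(monthly, dtype=float).reshape(-1, 12)
    annual = years.mean(axis=1)
    return [round(float(np.corrcoef(years[:, m], annual)[0, 1]), 3)
            for m in range(12)]

# Illustration: when a common year-to-year signal dominates, every month
# correlates strongly with the annual mean.
rng = np.random.default_rng(0)
signal = np.repeat(rng.normal(0.0, 0.5, 30), 12)   # common yearly level
noise = rng.normal(0.0, 0.05, 30 * 12)             # small monthly scatter
print(month_vs_year_corr(signal + noise))
```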

  54. walterdnes says:

    Willis Eschenbach says:
    > February 1, 2014 at 9:15 pm

    > And this, as I suspected, means that at 70%, the author’s
    > “leading indicator” is not doing any better than random
    > chance … as as such, it is useless as a prognostication
    > device.

    For the satellite era, the GISS numbers I get are 85% (17 of 20 when forecast warmer than previous year) and 79% (11 of 14 when forecast cooler than previous year). That’s with the Pinatubo years included. The numbers look even better with Pinatubo years eliminated.

  55. Joel O'Bryan says:

    Mt Sinabung eruptions, if they continue and strengthen, could make this a bust year and 2015 too. The bright side is that a Sinabung cooling effect could dampen any Summer-Fall El Nino, thus spoiling Trenberth’s hoped-for El Nino warming.

  56. Willis Eschenbach says:

    walterdnes says:
    February 1, 2014 at 9:44 pm

    For the satellite era, the GISS numbers I get are 85% (17 of 20 when forecast warmer than previous year) and 79% (11 of 14 when forecast cooler than previous year). That’s with the Pinatubo years included. The numbers look even better with Pinatubo years eliminated.

    Thanks for the quick response, Walter.

    I’m sorry, but that is special pleading. You need to use the full dataset, not just the section that might be favorable to your theory. Yes, if you throw out the data that gives poor results, your results will get stronger and stronger … consider what that means. It means nothing.

    In any case, using a much shorter subset of the data greatly widens the variations in the results. Recall from my analysis above that the results from using each of the 12 months as a “leading indication” for the year that starts with that month were:

    Jan, 70%
    Feb, 62%
    Mar, 57%
    Apr, 61%
    May, 56%
    Jun, 58%
    Jul, 55%
    Aug, 65%
    Sep, 55%
    Oct, 59%
    Nov, 70%
    Dec, 67%
    AVERAGE, 61%
    95% CI 50% to 72%

    If, on the other hand, we use only the satellite era data we get

    Jan, 83%
    Feb, 59%
    Mar, 62%
    Apr, 59%
    May, 50%
    Jun, 62%
    Jul, 44%
    Aug, 50%
    Sep, 44%
    Oct, 47%
    Nov, 71%
    Dec, 68%
    AVERAGE, 58%
    95% CI 35% to 81% 

    Note that while the average is not much different, the spread is wider. Now, my Monte Carlo analysis for the full GISS LOTI gave me a 95% CI from 54% to 71%. Hang on, let me recalculate the results …

    OK. As you’d expect, the average is about the same, but now the confidence interval has widened, to from 43% to 77%.

    So once again, the results that you are finding are not at all surprising. Instead, we find them in random pseudo-data. They are a consequence of three things. The first is that the temperature data is highly autocorrelated. The lag-1 autocorrelation term AR of the total GISS LOTI dataset is > 0.9. This means that once a trend is started, it tends to persist … which in turn means that January is more likely to resemble the following year.

    The second thing that increases correlation is any multi-year overall trend, for any reason. If the temperatures drop for a few years, January will be more similar to the yearly average. Given that the world has been warming for the last few centuries …

    Finally, you’ve done something which is an absolute no-no in the forecasting world. This is to include the predictor data in the response. Since January is a part of the yearly average, if there were no other factors (no trend, no autocorrelation), we’d expect the January trend to agree with the yearly trend some 53% of the time.

    The difficulty is that these three factors conspire together to give results from random “red noise” pseudo-data which are indistinguishable from the results we find in the GISS LOTI data.

    As a result, you have not been able to falsify the null hypothesis. You have not shown that your results are different from what we find in random red-noise pseudo-data.

    Please take this in the supportive sense in which it is offered. You need to learn to do a Monte Carlo analysis in order to see if your results from this (or any other) “indicator” are doing any better than random chance.

    Best regards, and thanks for all the work,

    w.

  57. What’s the point? Isn’t this just suggesting the existence of multiyear trends, which is already well known?

    You can get 75+% success in weather forecasting as well by simply saying “the weather tomorrow will be like the weather today”.

  58. walterdnes says:

    Joel O’Bryan says:
    > February 1, 2014 at 10:42 pm

    > Mt Sinabung eruptions, if they continue and strengthen,
    > could make this a bust year and 2015 too. The bright side
    > is that a Sinabung cooling effect could dampen any
    > Summer-Fall El Nino, thus spoiling Trenberth’s hoped-for
    > El Nino warming.

    Volcanic activity is an excuse only when a volcano *EXPLODES VERTICALLY*, pumping a lot of SO2, etc., into the *STRATOSPHERE*, like Pinatubo, Tambora, etc. Spewing lava (Etna, Mauna Loa, etc.) causes localized damage, and maybe some forest-fire-equivalent smoke from burning buildings and vegetation, but doesn’t cool the planet noticeably.

  59. Greg Goodman says:

    “The 1993 forecast was a total bust. All 6 data sets were forecast to be below their 1992 values, but all finished the year above”

    Which pretty much shows that the effect of even the largest stratospheric eruption of the 20th century was very limited in duration. This is what was found by my volcano stack analysis, which superimposed six major eruptions and looked at the evolution of the degree.day (growing days) integral.

    In these graphs a straight downward slope indicates cooler temps, rather than actual cooling. Flat portions, even if lower, are where the temp has recovered to pre-eruption levels. In the tropics, where the degree.days integral comes back to the same level, temps actually got warmer and made up for the loss of growing days during the post-eruption years; i.e. climate responded to the loss of solar input in some way and self-corrected, not just restoring temperature to previous levels but compensating for the lost growth by staying warmer for an equivalent period.

    http://climategrog.wordpress.com/?attachment_id=285

    Follow the links for similar plots of SST and NH/SH comparisons.

  60. Greg Goodman says:

    Willis: “I’m sorry, but that is special pleading. You need to use the full dataset, not just the section that might be favorable to your theory. Yes, if you throw out the data that gives poor results, your results will get stronger and stronger … consider what that means. It means nothing.”

    That would be a reasonable comment if he was arbitrarily removing “inconvenient” sections. However, there is a good, accepted, physical reason why that section should have a different behaviour. Removing it is a legitimate step.

    If you were investigating the variation of diurnal temperature against solar elevation, it would be legitimate to remove a day that had a solar eclipse at 2pm.

    Other than that, a lot of this is about auto-correlation as you rightly say.

  61. walterdnes says:

    Willis Eschenbach says:
    > February 1, 2014 at 10:49 pm

    > If, on the other hand, we use only the satellite era data we get
    >
    > Jan, 83%
    > Feb, 59%
    > Mar, 62%
    > Apr, 59%
    > May, 50%
    > Jun, 62%
    > Jul, 44%
    > Aug, 50%
    > Sep, 44%
    > Oct, 47%
    > Nov, 71%
    > Dec, 68%
    > AVERAGE, 58%
    > 95% CI 35% to 81%

    OK, maybe it’s been a freaky/flukey 1/3rd of a century, but it’s nice to know that my calculations agree with yours about GISS having approx 80% correlation for January versus the entire year (I get similar numbers for the other data sets). The 80%+ correlation is the whole point of the article. I don’t have a physical explanation for why that is, versus the lower numbers for other months. I get that you’re saying it could be entirely due to chance. Maybe it is. But I’ll stick my neck out this month and make forecasts. A year from now you may be laughing at me.

    As I mentioned previously, I used satellite-era data to enable an apples-to-apples comparison between the surface-based data sets, and the satellite-based data sets.

  62. Greg Goodman says:

    wbrozek: “4. According to Bob Tisdale, effects of El Nino or La Nina often show themselves in January so in those cases, it would be obvious why the rest of the year follows.”

    That does not explain it, it is just saying the same thing. It is another ‘predictive’ observation that a warm Jan is often followed by a warm year. Call it El Niño or whatever; when it happens, it’s just giving a name to the same observation.

    What this means is that the auto-correlation is longer than AR(1) in monthly data; there would seem to be some annual AR(1) as well.

  63. Greg Goodman says:

    Willis: “And this, as I suspected, means that at 70%, the author’s “leading indicator” is not doing any better than random chance … as as such, it is useless as a prognostication device.”

    What you are doing is a valid attempt at assessing the effect and the need to test a null is a very good point.

    However, your data is not “random” . You have constructed pseudo data with a similar statistical structure based on an analysis of the data and find the “predictor” works similarly. This demonstrates that at least a large part of the effect is due to the auto-correlation structure.

    This does not mean the predictor is useless; it means it will retain its rather limited predictive ability as long as the data retains its auto-regressive nature. That is probably a reasonable expectation (and as long as there are no major volcanoes).

  64. Hoser says:

    It’s just diagnostic testing. http://en.wikipedia.org/wiki/Likelihood_ratios_in_diagnostic_testing
    Use this calculator to see how well the test works. http://www.medcalc.org/calc/diagnostic_test.php
    You fill in a 2×2 grid like this:
                Cold Year   Warm Year
    Cold Jan        11           3
    Warm Jan         3          17

    Results:
                 a            c
                 b            d

    Positive Predictive Value (Warm)
    a/(a+c) = 78.57% (95% CI: 49.21% to 95.09%)
    Negative Predictive Value (Cold)
    d/(b+d) = 85.00% (95% CI: 62.08% to 96.62%)

    More likely to be right when the test indicates cold.
    Nobody needs to get all cranky. Interesting test. But more than likely the FDA would prefer at least 1000 tests. Oops! There goes the share price.
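
    [The predictive values follow from the 2×2 counts in a couple of lines. A minimal sketch of the arithmetic only; the confidence intervals above come from the MedCalc tool and are not recomputed here. Note that with these counts, which match Walter’s satellite-era GISS tallies of 17-of-20 warm and 11-of-14 cold, it is the warm-January forecast that comes out with the higher predictive value.]

```python
# 2x2 contingency table from the comment:
#              cold year   warm year
# cold Jan         11          3
# warm Jan          3         17
cold_cold, cold_warm = 11, 3   # cold-January row
warm_cold, warm_warm = 3, 17   # warm-January row

# Probability a warm-January forecast of a warm year is right:
pv_warm = warm_warm / (warm_warm + warm_cold)   # 17/20 = 0.85
# Probability a cold-January forecast of a cold year is right:
pv_cold = cold_cold / (cold_cold + cold_warm)   # 11/14 ~ 0.786
print(f"warm forecast right {pv_warm:.2%}, cold forecast right {pv_cold:.2%}")
```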

  65. daddylonglegs says:

    Nick Stokes on February 1, 2014 at 9:43 pm

    I’ve put here a table of the correlation coefficients for the six land/ocean indices I deal with. Each is over the whole length of data. The correlations are between each month and the annual (calendar) average. Naturally they improve as you advance in the year; mid-year months are a more representative sample.

    I don’t think the result is worthless. As said, Jan is a leading indicator. It has limited use as a predictor.

    Thanks – I was just about to ask what about the other months.

  66. Berényi Péter says:

    Is it not possible, that any month could be used as a “leading indicator” for the average temperature of the 12 month period starting with it? Just because things tend to change slowly, perhaps.

  67. lemiere jacques says:

    The January anomaly is used to calculate the yearly anomaly.
    Let’s imagine the anomaly is random on a monthly basis and zero on average.
    If you pick a year with a given anomaly in January, the anomaly of the year will, on average, be that anomaly, from a statistical point of view.

    I just want to say: autocorrelation.

    Fewer than 40 points in your graph…

  68. This is completely unsurprising: there is substantial autocorrelation in the anomalies, so any point measurement is a reasonable predictor of the coming year, barring “funny things” happening, such as major volcanoes. As Nick Stokes points out, measurements in the middle of the year (June/July) would be expected to have the best correlation with the calendar year, and it looks like they do.

    The fact that January is a reasonable predictor tells you that the correlation time scale is not “short” compared with one year; the fact that June/July do better tells you that the correlation time scale is not “very long” compared with one year. But we already knew that.

  69. A C Osborn says:

    Jonathan Jones says: February 2, 2014 at 3:05 am
    “the fact that June/July do better tells you that the correlation time scale is not “very long” compared with one year”

    I don’t know whose data you are looking at, but it certainly isn’t the OP’s.
    > Jan, 83%
    Versus
    > Jun, 62%
    > Jul, 44%

  70. son of mulder says:

    As January is being used as an indicator and is part of the annual average being predicted and compared to, hence introducing a bias, what happens if you use the preceding December as an indicator ie if Dec 2001 is warmer than Dec 2000 will 2002 be warmer than 2001? This would also have the advantage of “predicting” a year before it starts.

  71. Steve from Rockwood says:

    Looking at Willis’ numbers, the winter months give higher predictive coefficients than the summer months, suggesting average winter temperatures are a better predictor of annual temperatures. The fact that even the summer predictive coefficients are so high is likely a result of the length of time it takes for the trends to turn around (longer trends producing a higher correlation than shorter ones). If you compare the change in coefficients from a min/max point of view, the difference between summer and winter becomes even greater (so does the variance).

    I liked the article Walter and the comments even more.

  72. Verity Jones says:

    I suspect this JLI works because January is the month in which we see how much the Northern Hemisphere has cooled from the previous summer warmth, and overall the NH has tended to warm more than the Southern Hemisphere, therefore having a greater effect on the global average anomaly.
    GISS Graph of Hemispheric temperature change

  73. Martin 457 says:

    Layman perspective:

    Isn’t this how the climate models failed? Not by ‘flipping coins’, but by ‘rolling loaded dice’.

  74. herkimer says:

    In the contiguous US, January temperatures have been declining for 15 years at -1.49F/decade, as have the winter temperatures at -1.57F/decade and the annual at -0.16F/decade. The reason the annual is also declining is that 7 out of 12 months are declining. The fall temperatures are declining, and spring as well, but only April and May. Only the summer temperatures are still increasing. So with North America cooling, it is probable that if January is cold, the rest of the year is cold as well, since most months are cooling due to the Northern Hemisphere oceans, which were temperature-flat for 10 years and have been cooling year-round since about 2005.

  75. North of 43 and south of 44 says:

    Werner Brozek says:
    February 1, 2014 at 5:32 pm

    We have all heard of numerous adjustments, but I think that this is one time that the adjustments are not relevant. After all, we are not interested in the rate of warming over the last several decades but how the January anomalies predict annual anomalies. And any adjustments that are made would affect January and the annual anomaly more or less equally.
    _____________________________________________________________________

    Only if the adjustment system isn’t being mucked with on the whim of fitting the curves to a predetermined end state.

    In this case, verify before using, let alone trusting.

  76. phlogiston says:

    Verity Jones says:
    February 2, 2014 at 5:12 am
    I suspect this JLI works because January is the month in which we see how much the Northern Hemisphere has cooled from the previous summer warmth, and overall the NH has tended to warm more than the Southern Hemisphere, therefore having a greater effect on the global average anomaly.

    Verity check out this data posted above by Nick Stokes:

    http://www.moyhu.org.s3.amazonaws.com/misc/janlead.txt

    January is near the worst, not the best, correlated with the year average.
    But it has the advantage of coming at the beginning of the year.

  77. Claude Harvey says:

    I favor “chicken droppings”, myself. The whiter they get in January, the cooler the following year will be (except in years when you’ve overdone it with the grit).

  78. walterdnes says:

    phlogiston says:
    > February 2, 2014 at 7:49 am

    > Verity check out this data posted above by Nick Stokes:

    > http://www.moyhu.org.s3.amazonaws.com/misc/janlead.txt

    I believe that the difference between these numbers and Willis Eschenbach’s numbers is that…

    * Nick’s numbers are Month N, versus the same year’s January through December

    * Willis’ numbers are Month N, versus the 12 month series, Month N this year through Month (N – 1) next year.

    Re: Nick’s data…
    > January is near the worst, not the best, correlated
    > with the year average. But it has the advantage of
    > beginning of the year.

    Using December as a leading indicator for the current calendar year is pointless. Once the December data is in, you already have the current calendar year’s data. Similarly, using July data means that you’re only “predicting” 5 months forward versus January, which “predicts” 11 months forward. The fact that shorter range predictions have higher accuracy is not a shock.

  79. Verity Jones says:

    @phlogiston
    Now that I’ve looked at Nick Stokes’ data and thought about it, the correlations are not that surprising. The correlations are highest in June-Oct. The seasonal year ends in Nov. December’s data will usually be carried into the following year (Winter = DJF), so December is the first month, with the expected lowest correlation.

  80. Keith Sketchley says:

    Is it proper to use a single line best fit rather than segmented or curved?

  81. marcjf says:

    Taking into account the valid points raised here about statistical bias etc., it seems to me that if this simple observation holds good then we do have a short/medium-term leading indicator. Which is more than most “models” can manage. It ought to be interesting to see how this pans out.

  82. Willis Eschenbach says:

    walterdnes says:
    February 2, 2014 at 12:11 am

    Willis Eschenbach says:

    > February 1, 2014 at 10:49 pm

    > If, on the other hand, we use only the satellite era data we get
    >
    > Jan, 83%
    > Feb, 59%
    > Mar, 62%
    > Apr, 59%
    > May, 50%
    > Jun, 62%
    > Jul, 44%
    > Aug, 50%
    > Sep, 44%
    > Oct, 47%
    > Nov, 71%
    > Dec, 68%
    > AVERAGE, 58%
    > 95% CI 35% to 81%

    OK, maybe it’s been a freaky/flukey 1/3rd of a century, but it’s nice to know that my calculations agree with yours about GISS having approx 80% correlation for January versus the entire year (I get similar numbers for the other data sets). The 80%+ correlation is the whole point of the article. I don’t have a physical explanation for why that is, versus the lower numbers for other months. I get that you’re saying it could be entirely due to chance. Maybe it is. But I’ll stick my neck out this month and make forecasts. A year from now you may be laughing at me.

    You seem to not understand what I am saying, likely my fault. I am saying that the January results are EXPECTED when the data has the shape and form of the temperature data. You say that you “don’t have a physical explanation” … why would you need a “physical explanation” for a random occurrence?

    That’s like putting up a post on WUWT saying “WOW, I just grabbed a coin, flipped three heads in a row, and I have no physical explanation for why that is.” …

    It has a “physical explanation”. It is an EXPECTED RANDOM OCCURRENCE. Not only is the mean value expected, but the range of the values is also expected. Therefore, the high January “versus the lower numbers for the other months” is also an EXPECTED RANDOM OCCURRENCE. My Monte Carlo analysis finds a variation in the monthly results which almost exactly matches the range of values we find in the GISS LOTI data, including the increased range when you move from the full dataset to the shorter satellite era data.

    Next, you say “it could be entirely due to chance. Maybe it is” … but you are still going to use it for forecasts anyways.

    Hey, nobody can stop you. But nobody’s going to be impressed if your “forecasts” come true. You see, we expect such a forecast to come true, it’s inherent in the data …

    As I mentioned previously, I used satellite-era data to enable an apples-to-apples comparison between the surface-based data sets, and the satellite-based data sets.

    That’s fine … but then you left out the full analysis of the individual datasets.

    Walter, you seem like one of the good guys, but you are chasing a total chimera here. The result that seems to impress and surprise you so much occurs in “red noise” pseudodata … you’re looking at expected results, my friend, there’s nothing there.

    Truly, my friend, you need to take up the study of the Monte Carlo analysis, along with ARIMA datasets. Highly autocorrelated datasets like the temperature data have funny properties, and you’ve stumbled across one of them.

    As a result, your observation about January being a “leading indicator” is no more surprising than finding a large number of warm years in the most recent decade of the temperature record of a planet which has been gradually warming for a few hundred years … which is to say, not surprising in the slightest.

    w.

  83. Willis Eschenbach says:

    Greg Goodman says:
    February 2, 2014 at 12:10 am

    Willis :

    “I’m sorry, but that is special pleading. You need to use the full dataset, not just the section that might be favorable to your theory. Yes, if you throw out the data that gives poor results, your results will get stronger and stronger … consider what that means. It means nothing.”

    That would be a reasonable comment if he was arbitrarily removing “inconvenient” sections. However, there is a good, accepted, physical reason why that section should have a different behaviour. Removing it is a legitimate step.

    If you were investigating the variation of diurnal temperature against solar elevation, it would be legitimate to remove a day that had a solar eclipse at 2pm.

    Greg, while in other cases you might be right, in this case he’s arbitrarily removed about three-quarters of the GISS LOTI data.

    Are you seriously arguing that there is a “good, accepted, physical reason” why three quarters of the data should have a different behavior??? Or the same for the data around the time of Pinatubo?

    I ask in part because Walter himself has said that he does not have “a physical explanation” for the claimed correlation. Given that there is no “good, accepted physical reason” for the putative phenomenon itself … then how can we possibly know what might be a “good, accepted physical reason” that some data should be excluded?

    Best regards,

    w.

  84. A C Osborn says:

    As usual Mr Eschenbach uses maths to show something, in this case that he can reproduce the same results as the OP.
    When his results are nowhere near the OP’s, unless of course he can’t see the difference between

    Jan	0.7	Jan	0.83
    Feb	0.62	Feb	0.59
    Mar	0.57	Mar	0.62
    Apr	0.61	Apr	0.59
    May	0.56	May	0.5
    Jun	0.58	Jun	0.62
    Jul	0.55	Jul	0.44
    Aug	0.65	Aug	0.5
    Sep	0.55	Sep	0.44
    Oct	0.59	Oct	0.47
    Nov	0.7	Nov	0.71
    Dec	0.67	Dec	0.68
    Average	0.61	AVERAGE	0.58
    Max	0.7	Max	0.83
    Min	0.55	Min	0.44
    Range	0.15	Range	0.39
    Std.	0.055616381	Std	0.119401919
    
    

    [You will find it better to use "pre" within angle brackets to format tables, but don't bother to copy-and-paste all 10 digits of a std dvt when the data is only 2 digits. 8<0 Mod]

  85. Willis Eschenbach says:

    Greg Goodman says:
    February 2, 2014 at 12:35 am

    Willis:

    “And this, as I suspected, means that at 70%, the author’s “leading indicator” is not doing any better than random chance … and as such, it is useless as a prognostication device.”

    What you are doing is a valid attempt at assessing the effect and the need to test a null is a very good point.

    However, your data is not “random”. You have constructed pseudo data with a similar statistical structure based on an analysis of the data and find the “predictor” works similarly. This demonstrates that at least a large part of the effect is due to the auto-correlation structure.

    This does not mean the predictor is useless; it means it will retain its rather limited predictive ability as long as the data retains its auto-regressive nature. That is probably a reasonable expectation (and as long as there are no major volcanoes).

    Greg, the null hypothesis is that the behavior pointed out by Walter is expected. That is to say, it is an inherent feature of the nature of the dataset itself. I have shown that indeed, this is the case. We find the same thing in “red noise”. So Walter has not falsified the null hypothesis …

    Does that make it “useless”? Well, that depends on how you define “useful”. For example, if you want to predict tomorrow’s weather, your best guess is today’s weather … is that a useful prediction?

    Generally, in climate science, this is not seen as “useful”. Instead, it just forms the baseline that any practical forecasting system has to beat. If you can’t do better than saying tomorrow will be like today, your forecast sucks … but that doesn’t make “tomorrow will be like today” into a useful forecast.

    If it were “useful”, as you claim, then there would be weather forecasters out there every day forecasting that tomorrow will be like today … funny, I don’t find them. Nor do I find people forecasting the year based on January, and for the same reason.

    That’s not a forecast. That’s just the predictability that is inherent in the data, so it is not a prediction of any kind.

    Instead, that’s merely the baseline that a real forecast has to beat in order to be of any value.

    w.
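
The “baseline any forecast has to beat” idea can be made concrete. In the Python sketch below (illustrative only; the AR(1) coefficient and series length are arbitrary choices), a bare persistence forecast of an autocorrelated series easily beats climatology (always forecasting the long-run mean), which is exactly why persistence is treated as a yardstick rather than as a praiseworthy forecast.

```python
import random

def forecast_errors(phi=0.8, n=20000, seed=42):
    """Mean absolute error of two trivial forecasts of an AR(1) series:
    persistence ('tomorrow = today') vs climatology (always forecast the
    long-run mean, here zero)."""
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        xs.append(x)
    persistence = sum(abs(xs[t] - xs[t - 1]) for t in range(1, n)) / (n - 1)
    climatology = sum(abs(v) for v in xs) / n
    return persistence, climatology
```

On a persistent series the persistence error comes out well below the climatology error; a real forecasting system has to do better still before it has demonstrated any skill.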

  86. Willis Eschenbach says:

    A C Osborn says:
    February 2, 2014 at 10:07 am

    As usual Mr Eschenbach uses maths to show something, in this case that he can reproduce the same results as the OP.
    When his results are nowhere near the OP’s, unless of course he can’t see the difference between

    Jan	0.7	Jan	0.83
    Feb	0.62	Feb	0.59
    Mar	0.57	Mar	0.62
    Apr	0.61	Apr	0.59
    May	0.56	May	0.5
    Jun	0.58	Jun	0.62
    Jul	0.55	Jul	0.44
    Aug	0.65	Aug	0.5
    Sep	0.55	Sep	0.44
    Oct	0.59	Oct	0.47
    Nov	0.7	Nov	0.71
    Dec	0.67	Dec	0.68
    Average	0.61	AVERAGE	0.58
    Max	0.7	Max	0.83
    Min	0.55	Min	0.44
    Range	0.15	Range	0.39
    Std.	0.055616381	Std	0.119401919

    I have no idea why you are comparing those results, A.C. The numbers in the right column are from my analysis of the satellite era GISS LOTI data … and the numbers in the left column are from my analysis of the full GISS LOTI data. Neither of them is “the OP’s” results.

    It appears you think you are comparing my results to Walter’s, and getting all snippy about the fact that they don’t match … bad news. You’re comparing two things that we DON’T EXPECT TO BE THE SAME. You are comparing MY analysis of the full data set with MY analysis of a quarter of it, and you haven’t compared either one to the OP’s results.

    Sorry, A.C., but your comment is a colossal fail. You don’t even seem to notice what you are comparing … but of course, in your inimitable way, you attempt to use your totally bogus results to get all nasty about me and claim that I don’t know what I’m doing.

    Nice try …

    w.

  87. Hoser says:

    Willis Eschenbach says:
    February 2, 2014 at 9:48 am

    “As a result, your observation about January being a “leading indicator” is no more surprising than finding a large number of warm years in the most recent decade of the temperature record of a planet which has been gradually warming for a few hundred years … which is to say, not surprising in the slightest.”

    W, you missed the point. Not surprised you got lost in the weeds of your own analysis. Walterdnes does have something interesting. It isn’t earth shattering, it might not hold true, but it seems to have some merit.

    See Hoser says:
    February 2, 2014 at 12:53 am

    Walterdnes, strap on a pair, and don’t let W walk all over you with too many paragraphs, and too much so-called analysis.

  88. A C Osborn says:

    As usual Mr Eschenbach you are having reading difficulties, just like the last time I communicated with you.
    Let me correct that for you: the right-hand column comes from here

    walterdnes says:
    February 2, 2014 at 12:11 am

    Willis Eschenbach says:
    > February 1, 2014 at 10:49 pm

    > If, on the other hand, we use only the satellite era data we get
    >
    > Jan, 83%
    > Feb, 59%
    > Mar, 62%
    > Apr, 59%
    > May, 50%
    > Jun, 62%
    > Jul, 44%
    > Aug, 50%
    > Sep, 44%
    > Oct, 47%
    > Nov, 71%
    > Dec, 68%
    > AVERAGE, 58%
    > 95% CI 35% to 81%

    As we say in the UK, “He should have gone to Specsavers.”

  89. A C Osborn says:

    This time I do apologise, you are correct, they are your numbers.
    But they do show exactly what the OP is saying, that for that period January stands out like a sore thumb.

  90. A C Osborn says:

    Your 0.83 compares very well with his overall results, but with volcanic activity taken out he gets

    For comparison, here are the scores with the Pinatubo-affected years (1991/1992/1993) removed. First, where the years were forecast to be warmer than the previous year

    Data set 	Had3 	Had4 	GISS 	UAH5.6 	RSS 	NOAA
    Ann > previous 	16 	15 	17 	17 	17 	15
    Jan > previous 	17 	16 	18 	20 	19 	16
    Accuracy 	0.94 	0.94 	0.94 	0.85 	0.89 	0.94
    

    And for years where the anomaly was forecast to be below the previous year

    Data set 	Had3 	Had4 	GISS 	UAH5.6 	RSS 	NOAA
    Ann < previous 	11 	11 	11 	10 	10 	11
    Jan < previous 	14 	15 	13 	11 	12 	15
    Accuracy 	0.79 	0.73 	0.85 	0.91 	0.83 	0.73
    

    These are even more impressive when compared to the UK Met Office forecasts, which are never correct and have been shown to be worse than a coin toss or a dart throw.

    So how do you explain those results compared to your Monte Carlo analysis?

  91. DS says:

    I’m not sure if this has been mentioned as I didn’t read all of the comments yet, but there is likely a much easier way to find this trend. That is, solely looking at El Niño/La Niña.

    [IMG]http://i57.tinypic.com/w1to9f.jpg[/IMG]

    The first column is your January “Warmer/Colder” prediction, with your misses highlighted in yellow. The yearly data that follows is color-coded to your prediction to make for easy comparison with my prediction. The data I used is the Oceanic Niño Index, as available here
    http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ensoyears.shtml
    The white calculation column at the end is the Oct/Nov/Dec change between the previous 2 years (e.g., 1981 OND – 1980 OND = my 1982 prediction). My prediction is based solely on that year-over-year change, and can be seen in the color-coded year column directly before it.

    1985 was a toss-up for my prediction method as there was a perfect 0 trend between the prior two years. Otherwise 1994 & 2012 are the only years where my method varies from your method – and my method predicted those years correctly, yours did not.

    I continue your trend of missing the Pinatubo years of 1991-1993 (I came to the same outcome you did) and otherwise only miss on 2003, a year you similarly missed. (I wonder if it is an extremely high volcanic activity year.)

    So ignoring the Pinatubo years of 1991-1993 and removing my non-prediction for 1985, I was able to correctly predict 27 of 28 years. I can also already make a prediction for 2014 – Colder

    Now, my 96% looks fantastic at first glance, but (like yours) maybe someone would like to check these, what would be my 1952-1982 predictions
    [IMG]http://i59.tinypic.com/2vwcxe1.jpg[/IMG]
    Unfortunately, I can already see I missed on 1981 (and ironically, you were correct)

    And all of that said, maybe I failed to use the most predictive of trend patterns from the data I referenced. It is also possible my findings are a complete fluke. Oh, but if anyone uses my quick initial thoughts for an actual predictive model, please feel free to give me a nod ;-)
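
DS’s rule can be written down in a few lines. In the Python sketch below the OND ONI values are placeholders invented purely for illustration (the real values come from the NOAA CPC page linked in the comment); the function only encodes the stated rule that the sign of the change between the previous two OND readings gives the call for the following year.

```python
# Hypothetical OND (Oct-Nov-Dec) ONI values, for illustration only --
# real values come from NOAA CPC's ENSO page linked above.
ond_oni = {1980: 0.0, 1981: -0.2, 1982: 2.2, 1983: -0.9}

def enso_prediction(year, oni):
    """DS's rule as described: predict the direction of a year's anomaly
    from the year-over-year change between the previous two OND ONI
    values, e.g. (1981 OND - 1980 OND) -> the prediction for 1982."""
    change = oni[year - 1] - oni[year - 2]
    if change > 0:
        return "warmer"
    if change < 0:
        return "colder"
    return "no call"   # e.g. DS's 1985 toss-up, a perfect zero trend
```

For these made-up inputs, `enso_prediction(1982, ond_oni)` returns `"colder"` and `enso_prediction(1983, ond_oni)` returns `"warmer"`; nothing here should be read as a reconstruction of DS’s actual spreadsheet.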

  92. William T Reeves says:

    One would expect that in a variable with twelve equal components knowledge of the direction of change of the first component would be correlated with the change of all because Jan is not statistically independent of the entire year. And if that’s the case knowledge of two of the components should increase accuracy and so on.

    It is my foggy understanding that generally this sort of thing is considered an error in statistical reasoning. Independent variables are used to build predictive models because variables where the predictive variable and the outcome are not independent tend to confound the analysis.

    But I defer to the statisticians in the audience.
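
Reeves’s non-independence worry can be quantified directly. In the toy Python simulation below every month of every year is independent, equal-variance noise, so there is no trend and no autocorrelation whatsoever; any sign agreement above 50% between the January change and the annual-mean change is due solely to January being one of the twelve numbers inside the mean it is compared against. Under these idealized assumptions the agreement comes out near 59%.

```python
import random

def overlap_agreement(n_years=100000, seed=7):
    """Sign agreement between the January change and the annual-mean change
    when all twelve months are independent noise: anything above 50% here
    comes solely from January being one of the twelve numbers in the mean."""
    rng = random.Random(seed)
    years = [[rng.gauss(0, 1) for _ in range(12)] for _ in range(n_years)]
    hits = 0
    for t in range(1, n_years):
        d_jan = years[t][0] - years[t - 1][0]
        d_ann = sum(years[t]) / 12 - sum(years[t - 1]) / 12
        if (d_jan > 0) == (d_ann > 0):
            hits += 1
    return hits / (n_years - 1)
```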

  93. DougByMany says:

    I did not read your entire post or the comments, but I suspect the “homogenization” adjustments are the source of this correlation. If bad actors wanted to show an erroneous up trend, then they would push the oldest January temperatures down, along with the entire oldest year. They would push more recent temperatures up, along with the entire year. You get the picture. Perhaps your analysis is more evidence that the fix is in on our temperature record.

  94. Dr. Deanster says:

    I haven’t done it … and probably won’t do it … but I suspect that if you line up the Januarys with ENSO data, you are going to see a close relationship, given that El Niños and La Niñas usually take shape right around January.

    ALSO … I think you should omit 1992-1995, as those were the years when a volcano was influencing climate … thus, it is no surprise that your hypothesis fails in those years.

  95. Lars P. says:

    Latitude says:
    February 1, 2014 at 4:49 pm
    Note: GISS numbers are…….fake

    That was my first thought too regarding the GISS data. I wonder if the same trend is in the unadjusted data?
    However, the same can be seen for the satellites. As Willis mentioned, the data has a warming trend – which should explain why the case was more valid for January warmer = year warmer (85%) than for January colder = year colder (79%). The opposite should appear in a cooling trend.

    As a general forecast it still looks like better odds than random, so there might be some other effect; maybe Verity is right:

    Verity Jones says:
    February 2, 2014 at 5:12 am
    I suspect this JLI works because January is the month in which we see how much the Northern Hemisphere has cooled from the previous summer warmth, and overall the NH has tended to warm more than the Southern Hemisphere, having therefore a greater effect on the global average anomaly.

  96. Werner Brozek says:

    walterdnes says:
    February 1, 2014 at 6:55 pm
    HadCRUT4 0.488

    This is the number obtained from WFT, or by adding all anomalies and dividing by 12. However the HadCRUT4 site itself gives 0.486, presumably by taking into account such things as February having fewer days. It should make no difference most of the time, but in the case of a very close call, it may matter, just like certain ranks for GISS had to be determined by going to the third decimal place. Then there may also be future adjustments, especially for GISS.
    Of course we do not know things to 3 decimal places so you may wish to consider certain ranges to be virtual ties, but that is another matter.
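
Werner’s point about month lengths is easy to see with a toy example. The anomaly values below are made up; the only claim is that a day-weighted annual mean can differ from the simple 12-month average in the third decimal place, which is exactly the kind of close call that matters for rankings.

```python
# Days per month in a non-leap year.
DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def annual_means(monthly):
    """Return (simple 12-month average, day-weighted average) of anomalies."""
    simple = sum(monthly) / 12
    weighted = sum(a * d for a, d in zip(monthly, DAYS)) / sum(DAYS)
    return simple, weighted

# Hypothetical anomalies: +0.5 all year except a cold February at +0.2.
simple, weighted = annual_means([0.5, 0.2] + [0.5] * 10)
```

Here the simple mean is 0.475 while the day-weighted mean is about 0.477, because the short February carries slightly less weight in the weighted version.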

  97. walterdnes says:

    DS says:
    > February 2, 2014 at 11:51 am

    > I’m not sure if this has been mentioned as I didn’t read
    > all of the comments yet, but there is likely a much
    > easier way to find this trend. That is, solely looking at
    > El Nino/La Nina
    >
    > [IMG]http://i57.tinypic.com/w1to9f.jpg[/IMG]

    The more, the merrier. Seriously, if multiple predictive tools are available, let’s use them.

  98. walterdnes says:

    William T Reeves says:
    > February 2, 2014 at 12:21 pm
    >
    > It is my foggy understanding that generally this sort of
    > thing is considered an error in statistical reasoning.
    > Independent variables are used to build predictive
    > models because variables where the predictive variable
    > and the outcome are not independent tend to confound
    > the analysis.

    I’m perfectly happy to accept that the January anomaly is a co-dependent variable with the annual anomaly. I’m not arguing about dependent/independent variables or against auto-correlation. All I’m claiming is that the January anomaly is a good leading indicator of the annual anomaly.

  99. george e. smith says:

    Now I have seen everything there is to know about the climate. A graph with absolutely no time scale at all.

    What a great idea, to plot fictional annual anomalies against fictional monthly anomalies, without regard to any time correspondence between them.

    Sort of like plotting the number of plant species in the pre-Cambrian, against the number of animal species in the Plasticine age; well of course I meant to say to plot the plant species anomalies in the pre-Cambrian, against the animal species anomalies in the Plasticine. Really informative graph, giving a scatter plot around a straight line axis (inclined). They should be well correlated out to about one million years.

    Wunnerful !!

  100. george e. smith says:

    Of course, the cognoscenti will fully recognize that the Plasticine era was at the very height of the “modeling” tradition of theoretical science.

  101. KRJ Pietersen says:

    I thought this post was a quirky fun piece of weekend trivia until I read all the comments. I, for one, will keep the JLI in mind in future to see how it fares. Thank you for intriguing me, Walter. Since I suppose that Mother Nature doesn’t know her Gregorian calendar from her elbow, I would say that any observed correlation would be a fluke, but time will tell. That’s science. Observation leading to hypothesis and so on.

    Leading on from your research here, I’d be fascinated to find out that a particular individual date had a colossally high correlation in terms of the annual anomaly, either plus or minus. Kind of a Groundhog Day that we should all eagerly anticipate.

  102. Greg says:

    Indeed, George, during the Plastocene optimum global mean temperature was notably higher than it is today. That’s why nowadays temperature data needs to be massaged for some considerable time to make it malleable enough to be suitable for modelling purposes.

    ;)

  103. Greg says:

    Willis: “Greg, while in other cases you might be right, in this case he’s arbitrarily removed about three-quarters of the GISS LOTI data.

    Are you seriously arguing that there is a “good, accepted, physical reason” why three quarters of the data should have a different behavior??? Or the same for the data around the time of Pinatubo?”

    I read that he removed 1991-93. How do you get from there to “three quarters of the GISS LOTI data “?

  104. Terry Oldberg says:

    William T Reeves:

    It is the events of Mr. Dnes’s model that need to be statistically independent. He ensures that his events are statistically independent by defining each of them on a different calendar year.

    By the way, in his write-up Mr. Dnes references event X and event Y. As his is a multivariate model, it would have been appropriate to reference event (X, Y) where the elements of X are the possible conditions and the elements of Y are the possible outcomes. A pairing of a condition with an outcome is a description of an event for a multivariate model.

  105. walterdnes says:

    Werner Brozek says:
    > February 2, 2014 at 2:01 pm
    >
    > walterdnes says:
    > February 1, 2014 at 6:55 pm
    > HadCRUT4 0.488
    >
    > This is the number obtained from WFT or where you
    > add all anomalies and divide by 12. However the
    > HadCRUT4 site itself gives 0.486, presumably by taking
    > into account such things as February having fewer days.

    Interesting. I double-checked, and it is 0.488 from the downloaded data. I even edited out the leading “0.” from the text file, to make it all integers, and re-imported the year. This leaves a lot fewer possible round-off errors. It came out as 488. One more thing to watch for down the road.

  106. Willis Eschenbach says:

    A C Osborn says:
    February 2, 2014 at 10:46 am

    This time I do apologise, you are correct, they are you numbers.
    But they do show exactly what the OP is saying, that for that period January stand out like a sore thumb.

    So … you ragged all over me, and you accused me of not being able to read, but the reading error was yours.

    And that is your apology?

    All too typical of your style, A. C.

    w.

  107. Willis Eschenbach says:

    Hoser says:
    February 2, 2014 at 10:41 am

    Willis Eschenbach says:
    February 2, 2014 at 9:48 am

    “As a result, your observation about January being a “leading indicator” is no more surprising than finding a large number of warm years in the most recent decade of the temperature record of a planet which has been gradually warming for a few hundred years … which is to say, not surprising in the slightest.”

    W, you missed the point.

    In other words, you don’t like my results, but you’re not saying why.

    Not surprised you got lost in the weeds of your own analysis.

    In other words, you don’t like my results, but you’re not saying why.

    Walterdnes does have something interesting. It isn’t earth shattering, it might not hold true, but it seems to have some merit.

    In other words, you like Walter’s results, but you’re not saying why.

    See Hoser says:
    February 2, 2014 at 12:53 am

    OK, I saw it. You’ve calculated the results of Walter’s claims in a slightly different manner … which has nothing to do with what I said. In other words, you don’t like what I said, but you’re not saying why.

    Walterdnes, strap on a pair, and don’t let W walk all over you with too many paragraphs, and too much so-called analysis.

    In other words, you can’t find one single flaw in my analysis … but you don’t like it. Oh, and you think Walter’s got no balls.

    OK, Hoser, let me summarize your contribution to this thread. To date, you have seriously and firmly established that:

    1. You don’t like my analysis, but you haven’t put forward one single solitary reason why you don’t like it. You have not pointed out any scientific flaws in my analysis, heck, you haven’t even tried … but by heaven, you don’t like it. OK, got it.

    2. You think Walter has no balls.

    Regarding the second one of these, I disagree entirely. Walter had the necessary nerve to put his scientific ideas out here, and to defend them. It takes real nerve to put a radical idea out in a head post on WUWT, hand around the hammers, and invite people to see if they can demolish the idea. As someone who does that regularly, I salute him.

    And meanwhile, all you’ve had the blanquillos to do is complain about my analysis without providing even the slightest scientific objection to what I’ve done.

    I’ll take Walter’s approach over yours any day,

    w.

  108. Willis Eschenbach says:

    Greg says:
    February 2, 2014 at 3:08 pm

    Willis:

    “Greg, while in other cases you might be right, in this case he’s arbitrarily removed about three-quarters of the GISS LOTI data.

    Are you seriously arguing that there is a “good, accepted, physical reason” why three quarters of the data should have a different behavior??? Or the same for the data around the time of Pinatubo?”

    I read that he removed 1991-93. How do you get from there to “three quarters of the GISS LOTI data “?

    Sorry for the lack of clarity. The GISS LOTI data is 133 years long. He’s arbitrarily decided to look at only a bit more than thirty years of the data, starting in 1979. Since the dataset starts in 1880, that’s one hundred years of data he’s removed from the GISS LOTI dataset.

    How do you get from there to him not removing heaps of data?

    Now, if later on he wants to do the analysis on the satellite data, fine. But that’s not a reason to ignore a hundred years of perfectly valid data.

    w.

  109. Willis Eschenbach says:

    William T Reeves says:
    February 2, 2014 at 12:21 pm

    One would expect that in a variable with twelve equal components knowledge of the direction of change of the first component would be correlated with the change of all because Jan is not statistically independent of the entire year. And if that’s the case knowledge of two of the components should increase accuracy and so on.

    It is my foggy understanding that generally this sort of thing is considered an error in statistical reasoning. Independent variables are used to build predictive models because variables where the predictive variable and the outcome are not independent tend to confound the analysis.

    But I defer to the statisticians in the audience.

    Couldn’t agree more. As I said above,

    Finally, you’ve done something which is an absolute no-no in the forecasting world. This is to include the predictor data in the response. Since January is a part of the yearly average, if there were no other factors (no trend, no autocorrelation), we’d expect the January trend to agree with the yearly trend some 53% of the time.

    Terry Oldberg says:
    February 2, 2014 at 3:44 pm

    William T Reeves:

    It is the events of Mr. Dnes’s model that need to be statistically independent. He ensures that his events are statistically independent by defining each of them on a different calendar year.

    I agree that the events need to be statistically independent of each other, as you say.

    In addition, however, it’s considered bad form at a minimum to include your predictor variables in what you are trying to predict. Suppose I said “WOW! The average of the year up to September and the average for the whole year have the same sign almost every year!” Would you consider that impressive?

    Of course not, because the data you know is included in what you are trying to predict.

    Now, that’s by no means everything that’s wrong with Walter’s analysis. However, it is one of the things wrong with it, as William Reeves correctly surmised.

    w.

  110. Willis Eschenbach says:

    Let me take another shot at explaining my objection, since some folks don’t get it.

    In a dataset that contains multi-year, decadal, and multidecadal trends, we EXPECT that if this January is warmer than last January, that this year will be warmer than last year. It is not a peculiarity of the Earth’s climate system. It is not a “leading indicator”. It is the EXPECTED AND INEVITABLE result of the “persistence” in the dataset. It is just another example of the well-known fact that the best estimate for tomorrow’s temperature is today’s temperature … and the same is true for next week’s temperature, next month’s temperature, and (as Walter points out in the head post) next year’s temperature.

    That is why we find Walter’s indicator operating in “red noise”, which is autocorrelated random noise. It’s not a secret window into the climate, it is a consequence of the nature of the dataset. We’ll find it in a host of natural datasets—the best estimate of tomorrow is today.

    Now, weather forecasters of any sagacity know this. As a result, they are not surprised or impressed by it, as many people here seem to be.

    The sagacious forecasters know about and use things like Walter’s indicator, but not as a forecasting method.

    Instead, they use things like Walter’s indicator as a yardstick to measure their forecasts. Remember, Walter’s indicator is present in red noise data … so if you can’t do at least as well as Walter did, then your forecast sucks.

    Well, except for the fact that Walter’s predictand contains the predictor variable … but you take my meaning, and Walter could correct that.

    In that regard, wasn’t there a post here on WUWT a while back regarding how well the climate models did, year over year, compared with just assuming that next year would be the same as this year?

    Now that would be a good analysis. Use a lag-1 based forecast like Walter’s (without the predictor/predictand overlap) to see if the climate models can beat a Walter-style analysis …

    My regards to everyone,

    w.

  111. Willis Eschenbach says:

    george e. smith says:
    February 2, 2014 at 2:18 pm

    Now I have seen everything there is to know about the climate. A graph with absolutely no time scale at all.

    George, you might google “scatterplot”. They don’t have time scales. Instead, they have pairs of values.

    What a great idea, to plot fictional annual anomalies against fictional monthly anomalies, without regard to any time correspondence between them.

    Huh? Walter is plotting what happened in a given year against what happened the year before. This kind of time-lagged plot is quite common, and for a good reason. It is often quite informative. And there is total regard given to the “time correspondence between them”. It is the basis upon which they are paired—any given year versus the previous year.

    Sort of like plotting the number of plant species in the pre-Cambrian, against the number of animal species in the Plasticine age; well of course I meant to say to plot the plant species anomalies in the pre-Cambrian, against the animal species anomalies in the Plasticine. Really informative graph, giving a scatter plot around a straight line axis (inclined). They should be well correlated out to about one million years.

    Wunnerful !!

    While I do love the idea of the “Plasticine Age”, I fear that your scorn and sarcasm merely reveal that you don’t understand what’s going on.

    Nor is your example of plotting “plant species anomalies in the pre-Cambrian, against the animal species anomalies in the Plasticine” valid. Why? Because a scatterplot is used, as Walter did, for paired data, where each “x” is logically paired with a corresponding “y”.

    Now as you point out, no such correspondence exists between the plant and animal species … but that just means you shouldn’t use a scatterplot for that particular pile of data. However, it means nothing about the utility of the scatterplot … there’s a good reason why it’s used so much, and Walter is using it properly.

    w.

  112. Werner Brozek says:

    Willis Eschenbach says:
    February 2, 2014 at 6:41 pm
    The sagacious forecasters know about and use things like Walter’s indicator, but not as a forecasting method.

    I believe the MET office would do better if they did use some version of what Walter used.

  113. Willis Eschenbach says:

    walterdnes says:
    February 2, 2014 at 2:13 pm

    William T Reeves says:

    > February 2, 2014 at 12:21 pm
    >
    > It is my foggy understanding that generally this sort of
    > thing is considered an error in statistical reasoning.
    > Independent variables are used to build predictive
    > models because variables where the predictive variable
    > and the outcome are not independent tend to confound
    > the analysis.

    I’m perfectly happy to accept that the January anomaly is a co-dependent variable with the annual anomaly. I’m not arguing about dependent/independent variables or against auto-correlation. All I’m claiming is that the January anomaly is a good leading indicator of the annual anomaly.

    Thanks, Walter. Yes, you are right.

    It is also an equally good “leading” indicator of red noise random data … and in neither case is that a significant, unexpected, or unusual finding. It is an EXPECTED RESULT in any autocorrelated, persistent dataset.

    As a result, your “leading” indicator is useful, but only as a yardstick against which to compare a real forecast system.

    Best regards,

    w.

    PS—Calling it a “leading indicator” is totally misleading, since you are using data that does NOT lead the result you are trying to predict. You need to either change the name or change the method … my suggestion would be to compare January to the following Feb to Jan year.

    A leading indicator is called a “leading indicator” because, well, it leads …

  114. Willis Eschenbach says:

    Walter, one more thing. I just ran the numbers. When you actually turn your method into a true leading indicator, by having January predict the 12 months following January (and NOT including that January), your January correlation in the satellite era drops from 83% down to 74%. In addition, if we look at the entire GISS LOTI record, the January indicator drops from 70% to 61%.

    So by what are usually considered to be improper methods in the world of forecasting (including your prediction variable January in the yearly average to be predicted), you’ve inflated your results by no less than nine percentage points … which is why it’s generally considered to be a forecasting no-no.

    My spreadsheet is available here, showing these results.

    w.

  115. Willis Eschenbach says:

    Werner Brozek says:
    February 2, 2014 at 7:27 pm

    Willis Eschenbach says:
    February 2, 2014 at 6:41 pm

    The sagacious forecasters know about and use things like Walter’s indicator, but not as a forecasting method.

    I believe the MET office would do better if they did use some version of what Walter used.

    Thanks, Werner. Actually, the MET office uses a variety of metrics to assess the skill of their forecasts, which is an entire field of study in itself. See e.g. here, here, and here.

    Google “forecast skill” for various discussions of the issues in the field.
    w.

  116. Terry Oldberg says:

    Willis Eschenbach (Feb. 2 at 8:17 pm):

    You seem to base your conclusions regarding Mr. Dnes’s work upon the claim of David B. Stockwell that “if a methodology generates the same results with random data as with real data it is highly likely the methodology simply embodies a logical fallacy know [sic] as petitio principii, or the circular argument, where the conclusions are assumed in the premises.” Stockwell does not provide a citation to a proof of his generalization that it is “highly likely the methdology simply embodies a logical fallacy…” Can you supply a citation?

  117. Willis Eschenbach says:

    Terry Oldberg says:
    February 2, 2014 at 9:14 pm

    Willis Eschenbach (Feb. 2 at 8:17 pm):

    You seem to base your conclusions regarding Mr. Dnes’s work upon the claim of David B. Stockwell that “if a methodology generates the same results with random data as with real data it is highly likely the methodology simply embodies a logical fallacy know [sic] as petitio principii, or the circular argument, where the conclusions are assumed in the premises.”

    Thanks, Terry. Interesting question, but I’m absolutely not basing anything on that. First, I’ve never read the Stockwell quote.

    Second, only a part of Walter’s results are due to petitio principii. Both the premises and the conclusion contain January … but that’s only one month in twelve.

    The other part of Walter’s positive results are due to the autocorrelation structure of the data.

    Stockwell does not provide a citation to a proof of his generalization that it is “highly likely the methdology [sic] simply embodies a logical fallacy…” Can you supply a citation?

    I do not agree with his generalization, and I have no idea what he bases it on. In any case, I’m allergic to providing citations for another man’s claims. More to the point, I say something quite different.

    I say the “Monte Carlo Method” which I used above is a valid way to estimate the effect of a methodology. So my citation would be to e.g. Wolfram Mathworld, which says:

    Monte Carlo Method

    Any method which solves a problem by generating suitable random numbers and observing that fraction of the numbers obeying some property or properties. The method is useful for obtaining numerical solutions to problems which are too complicated to solve analytically. It was named by S. Ulam, who in 1946 became the first mathematician to dignify this approach with a name, in honor of a relative having a propensity to gamble (Hoffman 1998, p. 239). Nicolas Metropolis also made important contributions to the development of such methods.

    I’ve used it for exactly that purpose, for “obtaining numerical solutions to problems which are too complicated to solve analytically”, in this case the expected success of Walter’s non-leading indicator.

    At the Wolfram site, there are ten references to the history, development, and details of the method, along with four interactive examples (right column).

    In addition, Wikipedia has a good description of the Monte Carlo method here.

    Regards,

    w.

  118. walterdnes says:

    Willis, you seem to be hung up on the term “leading indicator”. Would you be happier if I simply called it a “Rule of Thumb” or “Heuristic” or “Algorithm”? I’m using a term from economics, to refer to a climate event. Is there a technical “Statistics 101” definition of “Leading Indicator”? The January anomaly leads the annual (calendar-year) anomaly by 11 months, by definition. A more detailed definition is at http://www.investorwords.com/2741/leading_indicator.html

    > An economic indicator that changes before the economy
    > has changed. Examples of leading indicators include
    > production workweek, building permits, unemployment
    > insurance claims, money supply, inventory changes, and
    > stock prices. The Fed watches many of these indicators
    > as it decides what to do about interest rates. There are
    > also coincident indicators, which change about the same
    > time as the overall economy, and lagging indicators,
    > which change after the overall economy, but these are
    > of minimal use as predictive tools.

    As far as I’m concerned, a growing workweek, rising building permit numbers, and falling unemployment means the economy has improved already. Some aspects of the economy react before others.

    Another question; Given that last year’s January…

    UAH5.6 anomaly was 0.497 and RSS anomaly was 0.439

    assume that this year the anomalies are lower, 0.183 and 0.175 respectively (You don’t want to know how I got these numbers)

    Do you not see any value in saying that 17 out of the last 19 times a similar event happened for RSS, and 17 out of 20 for UAH, during a non-volcanic year, the current year’s annual anomaly was lower than the previous year’s anomaly?

  119. Dr. Strangelove says:

    Just the facts, Willis
    I’m impressed with the results of JLI. Offhand I see it’s not trivial. Do the math to prove the results are non-trivial.

    Case A – this year’s January anomaly is greater (warmer) than last year’s January anomaly. Two possible outcomes: 1) This year’s annual anomaly is greater than last year’s annual anomaly; 2) This year’s annual anomaly is less than last year’s annual anomaly. Apply probability theory and null hypothesis. Assume each outcome is random and equally probable. P = 0.5 for outcomes 1 and 2.

    From 1980-2013 data, actual results: Outcome 1 = 17/20 = P = 0.85; Outcome 2 = 3/20 = P = 0.15

    Case B – this year’s January anomaly is less (colder) than last year’s January anomaly. Two possible outcomes: 1) This year’s annual anomaly is less than last year’s annual anomaly; 2) This year’s annual anomaly is greater than last year’s annual anomaly. P = 0.5 for outcomes 1 and 2.

    From 1982-2012 data, actual results: Outcome 1 = 11/14 = P = 0.78; Outcome 2 = 3/14 = P = 0.22

    Are these results due to chance? Apply Monte Carlo simulation to answer this. For Case A, run the simulation 20,000 times divided into 1,000 sets of 20 runs per set. Assign P = 0.5 for outcome 1 and 2. Plot outcome 1 in a histogram, each set represented by a frequency bar. Since this is a random event, the histogram will resemble a normal curve. Calculate the mean and the standard deviation.

    Using the normal curve, you can compute the probability of the actual results in Case A occurring by chance. This is equivalent to the area under the curve. Do this also for Case B. I predict the probability is very small indicating the actual results are non-trivial.
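
    Dr. Strangelove’s proposed simulation can be sketched in a few lines of Python. This is an editor’s sketch, not anyone’s actual code; the 17-of-20 figure is Case A from the comment above, and the fixed seed is purely for reproducibility.

    ```python
    import random

    random.seed(42)  # fixed seed so the run is reproducible

    N_SETS, N_RUNS = 1_000, 20   # 1,000 sets of 20 trials each (Case A)
    OBSERVED = 17                # actual result: 17 of 20 "warmer" calls verified

    # Under the null hypothesis each outcome is a fair coin (P = 0.5).
    # Count "outcome 1" successes in each simulated set.
    counts = [sum(random.random() < 0.5 for _ in range(N_RUNS))
              for _ in range(N_SETS)]

    # Fraction of null-hypothesis sets doing at least as well as 17/20.
    p_value = sum(c >= OBSERVED for c in counts) / N_SETS
    print(f"P(>= {OBSERVED}/{N_RUNS} by chance) is roughly {p_value:.3f}")
    ```

    For a case this small the exact binomial tail, P(X ≥ 17 | n = 20, p = 0.5) ≈ 0.0013, can also be computed directly; the simulation is just the brute-force version of the same calculation, and it comes out near zero, consistent with the prediction above that the probability is very small.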

  120. Greg Goodman says:

    Willis:” When you actually turn your method into a true leading indicator, by having January predicting the 12 months following January (and NOT including that January), your January correlation in the satellite era drops from 83% down to 74%. ”

    That step makes sense, excluding correlating something with itself.

    So 74% is pretty much the 75% that someone (Willis?) said earlier would be expected from the autocorrelated nature of the data.

    You derived that from the “satellite era”, which includes the Mt Pinatubo perturbation, so I would presume there would be a somewhat higher result if 1991-93 were excluded. That means that this “predictor” is picking up something about the structure of the data beyond monthly AR1, though it is not a huge difference.

    That presumably results from some longer term (inter-annual) structure in the data.

    AR1 can easily be removed by taking the first difference of the data, which is what I do pretty much systematically when analysing any climate time series. Perhaps doing the same process on the first diff would be more enlightening.
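
    Greg’s first-difference point can be checked numerically. The sketch below is an editor’s illustration in plain Python, assuming an AR(1) coefficient of 0.9 purely for demonstration; it shows the lag-1 autocorrelation collapsing once the series is differenced.

    ```python
    import random

    random.seed(1)

    def ar1(n, phi=0.9):
        """AR(1) 'red noise': x[t] = phi * x[t-1] + white noise."""
        x, out = 0.0, []
        for _ in range(n):
            x = phi * x + random.gauss(0.0, 1.0)
            out.append(x)
        return out

    def lag1_autocorr(series):
        """Sample lag-1 autocorrelation."""
        n = len(series)
        mean = sum(series) / n
        num = sum((series[i] - mean) * (series[i - 1] - mean) for i in range(1, n))
        den = sum((v - mean) ** 2 for v in series)
        return num / den

    x = ar1(50_000)
    dx = [b - a for a, b in zip(x, x[1:])]   # first difference

    print(f"raw series lag-1 autocorrelation:  {lag1_autocorr(x):+.2f}")   # near +0.90
    print(f"first-diff lag-1 autocorrelation: {lag1_autocorr(dx):+.2f}")   # near zero
    ```

    One caveat: the first difference of an AR(1) is not perfectly white; its lag-1 autocorrelation is -(1-φ)/2, about -0.05 here, which is why the second number is near zero rather than exactly zero.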

  121. Greg Goodman says:

    Willis: “Sorry for the lack of clarity. The GISS LOTI data is 133 years long. He’s arbitrarily decided to look at only a bit more than thirty years of the data, starting in 1979. Since the dataset starts in 1880, that’s one hundred years of data he’s removed from the GISS LOTI dataset.

    How do you get from there to him not removing heaps of data?

    Now, if later on he wants to do the analysis on the satellite data, fine. But that’s not a reason to ignore a hundred years of perfectly valid data.”

    ===

    Thanks for the clarification. It would make sense to use all the data unless there is an explicit reason for not doing so.

    However, calling GISS LOTI “perfectly valid data” I would question. It’s one of the datasets I have the least faith in, due to the constant cooling of the past by “corrections” and the declared activist stance of those in charge of creating, adjusting and maintaining the data.

  122. Willis Eschenbach says:

    walterdnes says:
    February 2, 2014 at 11:54 pm

    Willis, you seem to be hung up on the term “leading indicator”. Would you be happier if I simply called it a “Rule of Thumb” or “Heuristic” or “Algorithm”? I’m using a term from economics, to refer to a climate event. Is there a technical “Statistics 101” definition of “Leading Indicator”? The January anomaly leads the annual (calendar-year) anomaly by 11 months, by definition. A more detailed definition is at http://www.investorwords.com/2741/leading_indicator.html

    > An economic indicator that changes before the economy
    > has changed. Examples of leading indicators include
    > production workweek, building permits, unemployment
    > insurance claims, money supply, inventory changes, and
    > stock prices. The Fed watches many of these indicators
    > as it decides what to do about interest rates. There are
    > also coincident indicators, which change about the same
    > time as the overall economy, and lagging indicators,
    > which change after the overall economy, but these are
    > of minimal use as predictive tools.

    As far as I’m concerned, a growing workweek, rising building permit numbers, and falling unemployment means the economy has improved already. Some aspects of the economy react before others.

    Walter, it’s fine whatever you want to call it. I’m trying to emphasize that including your predictor variable in the results you are trying to predict has given you a totally bogus 9% inflation of your results. To emphasize that, I pointed out that a leading indicator has to occur BEFORE what you are trying to predict … otherwise it’s hardly leading, is it?

    Another question; Given that last year’s January…

    UAH5.6 anomaly was 0.497 and RSS anomaly was 0.439

    assume that this year the anomalies are lower, 0.183 and 0.175 respectively (You don’t want to know how I got these numbers)

    Do you not see any value in saying that 17 out of the last 19 times a similar event happened for RSS, and 17 out of 20 for UAH, during a non-volcanic year, the current year’s annual anomaly was lower than the previous year’s anomaly?

    First off, in the UAH dataset, out of the 34 January-to-January intervals, only 13 have been negative. Of these, 11 times the year has gone the same way that the January went. So I haven’t a clue where you are getting your data.

    The problem is that your results are statistically indistinguishable from the results I get by applying your method to random “red noise”. In other words they are not a characteristic of the climate. They are a characteristic of the type of dataset.

    Now, is there value in saying that in a red-noise dataset of this type, January change and annual change are correlated? Yes, there is … but not what you think. It is valuable because it gives you a baseline that you have to beat in order for your actual forecast to have skill. Otherwise, it’s no more than we’d expect.

    That’s why I implored you a while back to learn to do Monte Carlo analysis … so you could see for yourself that you get the same results on red noise. Let me suggest again that you invest the time and effort to learn to do that, so you can determine the odds for yourself.
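
    Willis’s red-noise baseline can be reproduced in outline. The sketch below is an editor’s illustration, not Willis’s actual code: it applies the JLI rule to synthetic AR(1) monthly “anomalies”, where φ = 0.9 and the 35-year span are assumed, illustrative values rather than fitted ones.

    ```python
    import random

    random.seed(7)

    PHI = 0.9          # assumed month-to-month autocorrelation (illustrative)
    N_YEARS = 35       # roughly the length of the satellite era
    N_TRIALS = 500     # number of synthetic "eras" to average over

    def red_noise_years(n_years, phi=PHI):
        """AR(1) monthly anomalies, chopped into calendar years of 12 months."""
        x, months = 0.0, []
        for _ in range(n_years * 12):
            x = phi * x + random.gauss(0.0, 1.0)
            months.append(x)
        return [months[i * 12:(i + 1) * 12] for i in range(n_years)]

    def jli_hit_rate(years):
        """Fraction of years where the Jan-vs-Jan sign matches the
        annual-vs-annual sign (the annual mean deliberately still
        contains January, as in the JLI)."""
        hits = 0
        for prev, cur in zip(years, years[1:]):
            jan_up = cur[0] > prev[0]
            year_up = sum(cur) / 12 > sum(prev) / 12
            hits += (jan_up == year_up)
        return hits / (len(years) - 1)

    rates = [jli_hit_rate(red_noise_years(N_YEARS)) for _ in range(N_TRIALS)]
    print(f"mean JLI hit rate on pure red noise: {sum(rates) / len(rates):.2f}")
    ```

    If this prints a number well above 50%, it illustrates Willis’s point: a high hit rate is a property of autocorrelated noise itself, and is therefore the baseline a real forecast has to beat.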

    w.

  123. A C Osborn says:

    I would like to take a look at what Mr Eschenbach said at February 2, 2014 at 6:07 pm and I quote
    “He’s arbitrarily decided to look at only a bit more than thirty years of the data, starting in 1979.
    Since the dataset starts in 1880, that’s one hundred years of data he’s removed from the GISS LOTI dataset.”
    and
    “But that’s not a reason to ignore a hundred years of perfectly VALID data.”

    I am sure Mr Eschenbach is aware that GISS data has been the subject of massive adjustments over the past 20+ years.
    The least adjusted data is during the Satellite era where their readings have kept it slightly more honest than the data prior to 1979.
    The data prior to 1979 has been adjusted to lower the older temperatures and to remove some of the variability of the 1930/40 period and the 1940/1975 period, Steven Goddard and many others have done a great deal of work on this.
    So why would you want to conduct analysis on data that you are fairly sure does not represent what the Climate actually did and for that reason is NOT “valid”?
    Even the period that he has used has shown marked differences between all the datasets from 1979 onwards.

    Although I think he has found something very interesting, I am not sure of its value. It is nice to know that in January you can quite accurately predict the Global Temperature movement for that year, especially compared to other GCMs’ output.
    But it doesn’t seem to be very useful, it doesn’t help Farmers plan their crops, or Authorities plan for Ice, Snow, Flood or Drought etc.
    However if what he has shown also works at the more “Local” level, Continental, Country, State or area that could prove to be much more useful for planners.

  124. A C Osborn says:

    Willis Eschenbach says:February 1, 2014 at 9:15 pm

    The mangled GISS data of the last 130 years was used to create some “Proxy” random data: the GISS data was detrended, and random data was generated using the AR and MA coefficients (0.93 and -0.48, typical values for global temperature datasets).

    How then do all 7 sets of random data show almost identical trends rather than 7 straight lines?
    I assume that the random data was therefore placed around the original GISS trend line.
    So what was actually measured when the Leading Indicator was used is the effectiveness of its use to predict a fixed trend using random values, and as you would expect, due to the trend it worked 66 percent of the time.
    But as we all know the Global Temperature WAS NOT just a fixed trend prior to NASA/GISS involvement; it was a low-level fixed trend with an approximately 60-year cycle overlaid on it, where the temperature in the 1930s/40s period was at least as high as it is now.

    So was it a fair test, was it really comparing Oranges with Oranges?
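
    For reference, random data with the quoted coefficients can be generated with a standard ARMA(1,1) recursion. The sketch below is an editor’s illustration: the AR and MA coefficients are the ones quoted above, while the noise scale and series length are assumptions.

    ```python
    import random

    random.seed(11)

    def arma11(n, ar=0.93, ma=-0.48):
        """ARMA(1,1) recursion x[t] = ar*x[t-1] + e[t] + ma*e[t-1], using
        the AR and MA coefficients quoted above; unit noise scale is an
        illustrative assumption."""
        x, e_prev, out = 0.0, 0.0, []
        for _ in range(n):
            e = random.gauss(0.0, 1.0)
            x = ar * x + e + ma * e_prev
            e_prev = e
            out.append(x)
        return out

    series = arma11(133 * 12)   # 133 years of monthly values, like GISS LOTI
    print(len(series))
    ```

    Note the recursion contains no trend term at all: the process is stationary, so any apparent trend in a single realisation is a wandering artefact of the strong autocorrelation. That bears directly on the question of where the trends in the seven random series come from, though whether a trend was additionally re-added only Willis can confirm.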

  125. walterdnes says:

    Willis Eschenbach says:
    > February 3, 2014 at 2:56 am

    > Now, is there value in saying that in a red-noise dataset
    > of this type, January change and annual change are
    > correlated? Yes, there is … but not what you think. It is
    > valuable because it gives you a baseline that you have to
    > beat in order for your actual forecast to have skill.
    > Otherwise, it’s no more than we’d expect.

    Are we possibly talking past each other? I don’t care if hitting 80%+ correct is considered “zero skill”, due to “cheating” by using January data and taking advantage of autocorrelation. I’ll gladly settle for it. Or are you saying that this was a fluke, and could be 80% wrong over the next 20 or 30 years?

    > That’s why I implored you a while back to learn to do
    > Monte Carlo analysis … so you could see for yourself
    > that you get the same results on red noise. Let me
    > suggest again that you invest the time and effort to
    > learn to do that, so you can determine the odds for yourself.

    Do you have pointers to tutorials on general Monte Carlo simulation? My Google-searching returns either PDFs full of complex equations and integrals, or some very problem-specific apps.
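
    For what it’s worth, the core of a Monte Carlo simulation needs no integrals at all: generate random trials and count how often the event of interest occurs. A minimal self-contained example (the dice question is an editor’s illustration):

    ```python
    import random

    random.seed(0)

    # Question: what is the probability that two dice total 10 or more?
    # Analytic answer: 6/36, about 0.167. Monte Carlo just plays the game
    # many times and counts how often the event happens.
    TRIALS = 100_000
    hits = sum(random.randint(1, 6) + random.randint(1, 6) >= 10
               for _ in range(TRIALS))
    print(f"estimated probability: {hits / TRIALS:.3f}")   # ~ 0.167
    ```

    Everything else in Monte Carlo work is elaboration on this loop: choosing a realistic random model for the trials, and running enough of them that the counted fraction is stable.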

  126. Greg Goodman says:

    Willis: “Now, is there value in saying that in a red-noise dataset of this type, January change and annual change are correlated? Yes, there is … but not what you think. It is valuable because it gives you a baseline that you have to beat in order for your actual forecast to have skill. Otherwise, it’s no more than we’d expect.”

    This is an interesting point. ENSO index is a fairly crude and clunky three month running mean, so Jan ENSO is NDJ mean.

    Now if we apply the same logic, the “ENSO-meter” has no more predictive ability than the OP’s effort. It is simply a restatement of the basic monthly AR1 structure of the data.

    ENSO has the added interest that this relatively small region of SST seems to lag-correlate with many other basins. Though to what extent this represents common cause rather than the assumed causation has never been assessed to my knowledge.

    So does ENSO have any predictive ability?

  127. Richard Mallett says:

    Doing the same exercise on the Met Office Hadley Centre Central England Temperature record from 1659-2013 gives the following results :-
    January higher than previous and year also higher: 106; year lower: 66 (success 62%)
    January lower than previous and year also lower: 106; year higher: 63 (success 63%)
    There were 13 Januaries, and 2 years, that were the same as previous years.
    The January Leading Indicator is not supported by the longest temperature record.

  128. Whether or not the JLI has skill, it appears to provide us with an essential ingredient for policy making. This ingredient is information about the outcomes from policy decisions. None of the climate models referenced by AR4 provide us with information.

  129. A C Osborn says:

    Richard Mallett says:February 3, 2014 at 8:18 am

    Do you know what the post 1979 data shows?

  130. Richard Mallett says:

    January high and year high: 9; year low: 8 (success 53%)
    January low and year low: 10; year high: 7 (success 59%)
    That coin toss theory is looking better and better.

  131. Willis Eschenbach says:

    Terry Oldberg says:
    February 3, 2014 at 8:30 am

    Whether or not the JLI has skill, it appears to provide us with an essential ingredient for policy making. This ingredient is information about the outcomes from policy decisions. None of the climate models referenced by AR4 provide us with information.

    Walter says:

    Do you not see any value in saying that 17 out of the last 19 times a similar event happened for RSS, and 17 out of 20 for UAH, during a non-volcanic year, the current year’s annual anomaly was lower than the previous year’s anomaly?

    OK, let me see if an example makes it any clearer. In the spirit of “January Leading Indicator” (JLI), let me propose a Monthly Leading Indicator (MLI). The Monthly Leading Indicator says that if a given month is warmer than last year, the average of that given month and the following month has an astounding 84% chance of being warmer than the average of those same two months in the previous year.

    That is a much more impressive performance than the January Leading Indicator. Eighty-four percent correct, not just on some specially selected subset of the data (e.g. the subset of non-volcano years after 1978 where the dataset shows cooling, as in Walter’s example above), but on the entire GISS LOTI dataset, every year, N = 1595, a very solid result.

    So, Walter and Terry … does my MLI (Monthly Leading Indicator) “provide us with an essential ingredient for policy making”?

    Or to put it another way, “do you not see any value” in the impressive MLI results?

    Me, I say a very positive no to both questions … I say the MLI is neither meaningful nor valuable. I say it’s in the category of such “successful” predictions as saying tomorrow will be like today. Yes, that is true in general … and no, it’s not particularly valuable information.

    So I ask you both … does the 84% success rate of the MLI make it a valuable leading indicator providing us with essential information for policy-making? And if not, why not?

    Bear in mind that the MLI and the JLI are identical in form, suffer from the same flaw (predictand includes the predictor variable), and differ only in the length of the average …
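
    The MLI is easy to count mechanically. The sketch below is an editor’s reading of the definition above (not Willis’s code), applied to synthetic red noise rather than GISS LOTI; the φ = 0.9 coefficient is an illustrative assumption.

    ```python
    import random

    random.seed(3)

    def mli_hit_rate(monthly):
        """MLI as described above: if month m beats the same month a year
        earlier, predict that the (m, m+1) two-month mean also beats the
        corresponding two-month mean a year earlier."""
        hits = total = 0
        for m in range(12, len(monthly) - 1):
            month_up = monthly[m] > monthly[m - 12]
            pair_now = (monthly[m] + monthly[m + 1]) / 2
            pair_then = (monthly[m - 12] + monthly[m - 11]) / 2
            hits += (month_up == (pair_now > pair_then))
            total += 1
        return hits / total

    # Synthetic red-noise "temperature record" (phi = 0.9, illustrative only).
    x, series = 0.0, []
    for _ in range(12 * 1_000):
        x = 0.9 * x + random.gauss(0.0, 1.0)
        series.append(x)

    print(f"MLI hit rate on red noise: {mli_hit_rate(series):.2f}")
    ```

    On autocorrelated noise this comes out far above 50%, which is the point being made: the MLI “works” for structural reasons, not because it knows anything about climate.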

    w.

  132. A C Osborn says:

    Willis Eschenbach says: February 3, 2014 at 10:58 am
    The answer is No, it can only be used as a Global Indicator, which is of no practical use.
    If JLI worked at local levels it would be of some practical use in terms of at least ensuring that you had enough Salt & Grit to last the cold period, so that you don’t run out as we did in the UK a few years ago.
    However as Richard Mallett has shown it certainly doesn’t work for the locale of Central England.

  133. For completeness, I’d like to point out that use of the term “noise” is inappropriate in reference to the problem of prediction. This is true though the use of this term is appropriate in reference to the problem of retrodiction.

    A communications system is an example of a retrodictor, for the outcomes of events lie in the past. A predictive system is an example of a predictor, for the outcomes lie in the future. Under Einsteinian relativity, a physical signal may travel from the present toward the future, as it can do so without its speed exceeding the speed of light. However, a physical signal may not travel from the future toward the present, as to do so its speed would have to exceed the speed of light. It follows that “noise” does not exist for a predictive system. Thus, the IPCC’s argument that fluctuating temperatures prior to the recent run-up in CO2 concentrations constitute “noise” with respect to a postulated “anthropogenic signal” fails from its violation of relativity.

    Einsteinian relativity presents no barrier to the flow of information from the future to the present as information is non-physical. While relativity does not bar the flow of information, professional climatologists have barred this flow. They have done so by their choices in designing their studies of global warming. The net result is that we have gained nothing of any value in regulating the climate from the $200 billion thus far spent in pursuit of this goal.

  134. Richard Mallett says:

    This illustrates a more general problem – that one can debate about how much the globe is warming or cooling per century; but this tells us little or nothing about whether individual governments at national / state / county level have a climate problem, and, what (if anything) they should be doing about it.

  135. David L. Hagen says:

    Terry Oldberg
    Re: “petitio principii, or the circular argument”
    See a search on the petitio principii fallacy. This is also known as “begging the question”.

  136. george e. smith says:

    “””””…..Willis Eschenbach says:

    February 2, 2014 at 6:53 pm

    george e. smith says:
    February 2, 2014 at 2:18 pm

    Now I have seen everything, there is to know about the climate. A graph with absolutely no time scale at all.

    George, you might google “scatterplot”. They don’t have time scales. Instead, they have pairs of values……”””””

    Willis, I truly do appreciate your effort to explain Walters plots of annual anomalies versus monthly anomalies; but I must admit, I just don’t get it.
    Now I do know what scatter plots are; I use them ALL the time. It is one of the few graphing means in Micro$oft Excel that is of any use for engineering or scientific graphing. I use M$ Excel exclusively to do all my math analysis, even though it is the most brain-dead piece of trash I have ever encountered, and I plot my results using the scatter plot, as pie charts or bar graphs are simply not a good way to plot the surface profile of a precision optical element.

    But when I saw both year and month, in the same graph, I fully expected to see no more than 12 plotted points, that being how many months on average, there are in an average year.

    So if I look at any one dot on Walter’s first graph, I see an X-axis value on the monthly anomaly axis, with a yearly anomaly value on the y-axis, but no way to know what month or what year, these two numbers are extracted from, or even if they are from the same year or the same month.

    I also know what Monte Carlo methods are. They are also used extensively in circuit design for manufacturing purposes (they were, in the good old days).

    You took a circuit design with a list of all of its component values, and their value tolerances. Then the computer ran a simulation of the circuit performance, with random values chosen for each element parameter value, within the stated tolerance range. The result was checked for compliance with the manufacturing spec, and then repeated many times, to find out how many of the circuits, would perform to the manufacturing spec. This information would be used to calculate manufacturing yields of good circuits, and figure the cost of the yield loss.

    The designer, could then consider buying cheaper looser spec components to save some manufacturing cost; balancing that cost, against the loss to lower manufactured yields.

    I don’t believe I ever designed a circuit that way. We always plugged in the worst case component tolerance value; say perhaps the value that would give the lowest gain for an amplifier, and did that simultaneously for ALL parameters, Then a SPICE or hand analysis, would calculate the result, to see if the gain (or whatever) exceeded the minimum manufacturing spec.

    So all my designs were guaranteed by design, and always worked correctly unless some component had a catastrophic failure (from its stated spec).

    That probably costs more, than Monte Carlo methodology; but how do you put a cost on a pissed off customer, who got one of your gizmos that was a fringe performer that MC said was a low probability occurrence.

    Just like buying a winning lottery ticket; a non functioning gizmo, is not appreciated, no matter how unlikely the MC analysis said it would be. Somebody wins the lottery; and someone always buys your lemon; and then never buys another thing from you.

    g

    And yes, thanks for trying Willis.

  137. Dr. Strangelove says:

    Just the facts, Willis
    Do a Monte Carlo simulation. Case A = 20,000 runs and Case B = 14,000 runs, in 1,000 sets of 20 and 14 runs respectively, to simulate the results obtained in your exercise using 1,000x the actual data (20 and 14). Compute the probabilities. You will see the results are statistically significant. The null hypothesis will be rejected.

    It’s a non-issue that the predictor is included in the prediction. What’s important is the certainty of the prediction. This exercise is equivalent to a statistical sampling where a sample size of n = 1 (one month) results in an estimator with a confidence level of approximately 80%. In statistical sampling, the sample (predictor) is included in the population (prediction).

    Extend the analysis. Try n = 2 (Jan. and Feb.) and see if the confidence level will increase to 90-95%. Try a one-year predictor to predict 5-year and 10-year periods. Try a two consecutive-year predictor and see if the confidence level will increase.

  138. Dr. Strangelove says:

    Willis
    I see the value of leading indicators. A colder winter in January than last year tells us that there’s 80% chance the rest of the year will be colder than last year. Do the analysis in many countries and it tells us the ‘global warming pause’ will likely extend for another year. At the very least, it debunks the claim that colder winter is also due to global warming. Colder winter indicates colder climate ahead. Perhaps global cooling.

  139. Werner Brozek says:

    george e. smith says:
    February 3, 2014 at 4:47 pm
    I must admit, I just don’t get it.

    Please allow me to try. I will just use the RSS plot and illustrate a single point. There would be as many points as years, so if Walter plotted 30 years, there are 30 Januaries and 30 years. Each January and each year are represented by a single point for the 2 pieces of information. For 2013, the January anomaly on RSS was 0.439. And the average anomaly for all of 2013 was 0.218. So on the x axis, go to 0.439 which is the January anomaly. Then on the y axis, go to 0.218 which is the average anomaly for 2013. And you see a little diamond where 0.218 on the y axis intersects 0.439 on the x axis. As you noted, there is absolutely nothing to indicate that this point represents 2013.

  140. walterdnes says:

    Early on in the comments
    philjourdan says:
    > February 1, 2014 at 4:47 pm
    >
    > Is that pre or post adjustments?

    Latitude says:
    >February 1, 2014 at 4:49 pm
    >
    > Note: GISS numbers are…….fake
    >
    > http://stevengoddard.files.wordpress.com/2014/01/hidingthedecline1940-19671.gif

    I decided to go back and do the full GISS data set. Then I remembered that I had once pulled the GISS data to the end of 2005 from “the wayback machine”. It was still kicking around on my hard drive. I wrote a script to process the data to the end of 2005. I ran the script on the old GISS data set, and the current data set. Note; both data sets were processed to the end of 2005 to provide an apples-to-apples comparison. The results…

    data set issued January 2006, with numbers to December 2005
    Warmer forecast: count 65, hit 45, fail 20
    Colder forecast: count 58, hit 37, fail 21
    And 2 Januaries equal to previous January

    data set issued January 2014, with numbers to December 2013
    Warmer forecast: count 70, hit 51, fail 19
    Colder forecast: count 54, hit 35, fail 19
    And 1 January equal to previous January

    The *NET* difference was a handful of years changing. But “it’s worse than I thought”… in the 8 years between the 2 GISS outputs, 17 out of 125 forecast years (i.e. 13.6%) had some change that affected the analysis. This was any of current versus previous January over/under, or current year versus previous year over/under. Willis may be right about me “analysing red noise”, but not the way we thought. There’s an expression in computing “bit rot”, but this is ridiculous.
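
    The bookkeeping behind these hit/fail tables can be expressed compactly. The sketch below is an editor’s illustration of the counting rule, run on toy numbers rather than the actual GISS anomalies:

    ```python
    def jli_tally(jan, annual):
        """Tally JLI forecasts the way the tables above are laid out.
        jan[i] and annual[i] are the January and calendar-year anomalies
        for year i, in chronological order."""
        warm = {"count": 0, "hit": 0, "fail": 0}
        cold = {"count": 0, "hit": 0, "fail": 0}
        equal = 0
        for i in range(1, len(jan)):
            if jan[i] > jan[i - 1]:
                warm["count"] += 1
                warm["hit" if annual[i] > annual[i - 1] else "fail"] += 1
            elif jan[i] < jan[i - 1]:
                cold["count"] += 1
                cold["hit" if annual[i] < annual[i - 1] else "fail"] += 1
            else:
                equal += 1
        return warm, cold, equal

    # Toy data, just to show the bookkeeping (not real GISS numbers):
    warm, cold, equal = jli_tally(
        jan=[0.10, 0.30, 0.20, 0.20, 0.50],
        annual=[0.15, 0.25, 0.30, 0.22, 0.40],
    )
    print(warm, cold, equal)
    ```

    Running the same tally over two vintages of the same dataset, as done above, makes any revision-driven flips in the over/under comparisons immediately countable.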

  141. Bernie Hutchins says:

    Obviously, it is sometimes possible and useful to indicate the time-ordering (where appropriate) of points on a scatter plot. For example, if there are just a few points, you may be able to put a number near each point. Or, you can mark one point and then connect successive points with unobtrusive straight lines (like a Poincare phase plot). Doesn’t always work. It may be too messy. And oftentimes, there is no inherent ordering of scatter plot points anyway (like students scores in math vs. scores in physics).

  142. walterdnes says:

    Willis, does the “skill level” become significant if you use GISS data 1955 to 2013, especially with 1991-1994 removed? See below for the long explanation.

    One of the advantages of being retired is that I can stay up late to work on hunches. Given that the data from 1979 onwards shows high JLI correlation, and analysis of the entire GISS data set shows lower correlation, the correlation must’ve been really bad sometime before 1979. I wrote a bash script to parse the data and spit out CSV-formatted running total counts in 3 columns…

    Column 1 Year
    Column 2 “Greater than” (hits – misses)
    Column 3 “Less than” (hits – misses)

    Looking at a graph shows
    “Greater Than” forecasts verified
    * flatline barely positive 1881 to 1918
    * good correlation 1919 to late 1930’s
    * flatline late 1930’s to 1955 (WWII and aftermath?)
    * good correlation from 1955 onwards, except 1991 and 1992

    “Less Than” forecasts verified
    * flatline barely positive 1881 to 1926
    * good correlation 1927 to late 1930’s
    * flatline late 1930’s to 1955 (WWII and aftermath?)
    * good correlation from 1955 onwards, except 1993 and 1994

    Pinatubo hit the analysis badly. 1991-1994 was one of only two “4-consecutive-years-bust” situations in the analysis. The other one was 1897-1900. See http://en.wikipedia.org/wiki/Mayon_Volcano#1897_eruption
    > Mayon Volcano’s longest uninterrupted eruption
    > occurred on June 23, 1897 (VEI=4), which lasted for
    > seven days of raining fire. Lava once again flowed down
    > to civilization. Eleven kilometers (7 miles) eastward, the
    > village of Bacacay was buried 15 m (49 ft) beneath the
    > lava. In Libon 100 people were killed by steam and
    > falling debris or hot rocks. Other villages like
    > San Roque, Misericordia and Santo Niño became
    > deathtraps. Ash was carried in black clouds as far as
    > 160 kilometres (99 mi) from the catastrophic event,
    > which killed more than 400 people
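
    The running-total script described above (originally bash) can be sketched in Python. This is an editor’s sketch of the three-column CSV layout, with toy numbers rather than GISS values:

    ```python
    import csv
    import sys

    def running_totals(years, jan, annual, out=sys.stdout):
        """Running (hits - misses) totals for 'greater than' and
        'less than' JLI forecasts, in the three-column CSV layout
        described above."""
        writer = csv.writer(out)
        writer.writerow(["year", "gt_hits_minus_misses", "lt_hits_minus_misses"])
        gt = lt = 0
        for i in range(1, len(years)):
            if jan[i] > jan[i - 1]:
                gt += 1 if annual[i] > annual[i - 1] else -1
            elif jan[i] < jan[i - 1]:
                lt += 1 if annual[i] < annual[i - 1] else -1
            writer.writerow([years[i], gt, lt])

    # Toy values, purely to show the bookkeeping (not real GISS anomalies):
    running_totals(
        years=[1979, 1980, 1981, 1982],
        jan=[0.1, 0.3, 0.2, 0.4],
        annual=[0.1, 0.2, 0.3, 0.5],
    )
    ```

    Graphing the two cumulative columns against year is what reveals the flat and well-correlated stretches listed above: a rising segment means hits are outpacing misses, a flat one means roughly coin-toss performance.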

  143. A C Osborn says:

    walterdnes says: February 3, 2014 at 8:59 pm “But “it’s worse than I thought”… in the 8 years between the 2 GISS outputs, 17 out of 125 forecast years (i.e. 13.6%) had some change that affected the analysis”
    That is why I said A C Osborn says: February 3, 2014 at 3:39 am “So why would you want to conduct analysis on data that you are fairly sure does not represent what the Climate actually did and for that reason is NOT “valid”?”

  144. David L. Hagen:

    Thanks for the citations!

    The petitio principii fallacy does not apply. It is a consequence of an argument under classical logic, but this logic applies only to situations in which no information needed for a deductive conclusion is missing. Here, information is missing. Had Mr. Dnes’s model claimed that the sign of the change in the annual temperature anomaly determines the sign of the change in the annual temperature anomaly, information would not have been missing and the model would have been guilty of this fallacy.

  145. Willis Eschenbach says:

    Dr. Strangelove says:
    February 3, 2014 at 6:16 pm

    Just the facts, Willis
    Do a Monte Carlo simulation. Case A = 20,000 runs and Case B = 14,000 runs at 1,000 runs per set to simulate the results obtained in your exercise using 1,000x the actual data (20 and 14). Compute the probabilities. You will see the results are statistically significant. The null hypothesis will be rejected.

    I did a Monte Carlo simulation, several of them in fact, and reported on them above. In none of them were the results statistically significant. You’re simply making things up.

    w.

  146. philjourdan says:

    @Walterdnes – Thank you for checking. But I was curious about the numbers you had used originally given the recent revelation of the changes by Goddard.

    You are just affirming his work as well as your own

  147. Willis Eschenbach:

    Thank you for taking the time to respond.

    Whether the MLI provides information about the outcomes of events is unclear pending further work on your part. The JLI has the same shortcoming. Notwithstanding the alleged “flaw,” with further work either indicator might be found to provide information to a policy maker. It is doubtful that this information, if present, would be of interest to a policy maker on global warming. Currently, policy makers on global warming make policy without having information. The IPCC seems to have duped policy makers into thinking they have information when they have none.

    My interest in Mr. Dnes’s work flows from my belief that nearly all climatologists need tutoring in elementary ideas of probability theory, statistics, information theory and logic. Mr. Dnes’s work is of didactic value because recognizable in his methodology are the ideas of event, observed event, unobserved but observable event, outcome, condition, population, sample, frequency, relative frequency, probability and inference. All of these ideas are crucial to the creation of a model that: a) supplies information to a policy maker about the outcomes from his policy decisions and b) makes falsifiable claims. None of these ideas are evident in the report of Working Group I, AR4 on the allegedly scientific (but actually pseudo-scientific) basis for the IPCC’s claims, thus the need for tutoring climatologists.

  148. Dr. Strangelove says:

    Willis
    I’m not making things up. Do a Monte Carlo simulation the way I described it. You can probably do it in an Excel spreadsheet. You need a random number generator. Assign two possible outcomes, Outcome 1 and Outcome 2, as explained previously, each with equal probability P = 0.5. Plot the histogram of the 20,000 runs. You will see a normal curve. Then compute, from the normal curve, the probabilities of the actual results of Outcomes 1 and 2 obtained from temperature data. This is not a hoax.
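    As a minimal sketch, the coin-flip simulation described here can be done in a few lines of Python rather than Excel; the set sizes below are illustrative stand-ins, not the 20,000-run cases mentioned in the comment:

```python
# Sketch of the coin-flip Monte Carlo described above (Python in
# place of Excel).  n_sets and flips_per_set are illustrative values.
import random

def coin_flip_counts(n_sets, flips_per_set, p=0.5, seed=42):
    """Simulate n_sets sets of flips_per_set coin flips with success
    probability p; return the success count of each set (the data
    behind the histogram)."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(flips_per_set))
            for _ in range(n_sets)]

counts = coin_flip_counts(n_sets=1000, flips_per_set=20)
avg = sum(counts) / len(counts)  # should land near 10 for p = 0.5
```

    A histogram of `counts` approximates the binomial (roughly normal) curve against which the observed hit counts would be compared. Note that this generates i.i.d. (white-noise) pseudodata.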

  149. Guy says:

    Should we not be looking at the prediction for “the rest of the year”? If January is up and the rest of the year is down less than January was up, the year will still be up.

  150. Dr. Strangelove says:

    Willis
    My proposed MC simulation is to mathematically prove that the results are statistically significant and reject the null hypothesis. However it is conceptually obvious why the results must be significant. The predictor is actually part of the prediction (population), as you pointed out. The former accounts for 1/12 ≈ 8% of the latter. Clearly the two are causally linked, hence disproving the null hypothesis that they are completely independent events, i.e. that any correlation between the two is purely due to chance. The MC simulation is a simple mathematical proof of what is conceptually obvious.

    You also did MC simulation but you correctly demonstrated a different point from the one I made. You demonstrated that the predictor can also predict a random function. It does not actually make the results trivial. It only proves that the method applies to both non-random and random functions. For example, we can do statistical sampling in the population of New York City. These are real data and the results are statistically significant. Then we can invent an imaginary city with random population characteristics and do statistical sampling. These are imaginary random data but the results are also statistically significant. It only proves the rules of statistics apply to both real and imaginary data.

  151. Dr. Strangelove:

    Thanks for sharing your view. It looks to me as though you’ve conflated some concepts.

    As I’ll use the terms, a “Condition” is a condition on the Cartesian product of the values that are taken on by a model’s independent variables. An “Outcome” is a condition on the Cartesian product of the values that are taken on by a model’s dependent variables. A pairing of a Condition with an Outcome provides a description of an event.

    In establishment of the significance level of the predictions of a model, the appropriate null hypothesis is that the Conditions of the events are independent of the associated Outcomes. In this case, the so-called “mutual information” between the Conditions and the Outcomes is nil. With the mutual information nil, knowing the Condition of an event provides one with no information about the Outcome.

    A “population” is a set of events. A “prediction” is an extrapolation from an observed Condition of an event to an unobserved but observable Outcome of the same event. A “predictor” is a condition. Thus, it is untrue that (as you claim) “the predictor is actually part of the prediction (population)” for a population is not a prediction and a predictor is neither a part of a prediction nor a part of a population.

  152. Bernie Hutchins says:

    Strangelove,

    People here need to see your code. Here the code is necessary (or at least very useful) not because they doubt your results (although they may) but because it may give better clues about what you are talking about. Descriptions in English and even in mathematical formulas may be ambiguous and awkward. Computer code (often, even if in an unfamiliar language) is unambiguous. Computers won’t tolerate ambiguity. The code posted first need not be elegant – in fact the more simple-minded it is, the better at this point.

  153. Dr. Strangelove says:

    Terry
    Apparently we are talking of different things. You are talking about functions and regression analysis. I’m talking about statistics and probability theory (though statistics is also used in regression analysis). In this context “outcome,” “population” and “prediction” are not how you define them.

    Bernie
    The concepts are simple enough to understand without computer programming, at least for those familiar with probability theory. If not, a google search on statistics would be more informative.

  154. Bernie Hutchins says:

    Dr. Strangelove said on February 4, 2014 at 10:29 pm
    “Bernie
    The concepts are simple enough to understand without computer programming, at least for those familiar with probability theory. If not, a google search on statistics would be more informative.”

    Wow – that was dismissive! I am familiar enough with probability theory. What I do not understand is the “nuts and bolts” of what you claim to be doing in your post of 5:46 pm. It is vague and makes no sense – what you did, or why you are even doing certain things. If it’s not BS on your part, then post some code. Or are you just hand-waving?

  155. Dr. Strangelove:

    Thank you for taking the time to reply and for sharing your ideas.

    In this thread, I have not made reference to regression analysis. I have made reference to probability theory, statistics and information theory. In doing so, I have used the terminology that I have always used in discourse with professional statisticians. They have used the same terminology. If you favor a terminology that differs from this one, what is it?

  156. Willis Eschenbach says:

    walterdnes says:
    February 4, 2014 at 1:30 am

    Willis, does the “skill level” become significant if you use GISS data 1955 to 2013, especially with 1991-1994 removed? See below for the long explanation.

    Sorry, I’ve said this before, but it bears repeating. You don’t get to pick and choose what data to use based on whether it fits your theory.

    More to the point … who cares? To give a real-world example, the farmers around where I live like hot summers for the grapes. So I took a look at your JLI for my local weather station, Santa Rosa. I used all of the months, not just January, as the leading indicator for that month plus the next 11 months.

    Just like you said, it works a treat, it gives me a 59% success rate. So I set myself up as Nostradamus of the North, the Weather Prognosticator.

    So now, when the January is warmer than last year, the good farmers around here come to me and I’ll tell them “Yep, Walter’s indicator says it will be warmer”. And they all go away satisfied, because they can now plan for the future … except for one ornery old geezer who comes back and says, “Hang on … how much warmer than last year will it be?”

    So I go back to my data, I average out all of the results, and I tell him “Walter’s method says it will be a bit more than a tenth of a degree warmer than last year” … he considers that a moment, then asks for the standard deviation of the results … I go back and calculate that one … “Plus or minus half a degree”, I tell him.

    And the farmer says “You’re telling me that this year will be a tenth of a degree warmer than last year, plus or minus half a degree? Have you lost your mind? What do I care about a tenth of a degree, particularly with that wide an error in the results?”

    I’m sure you can see the moral of the story. It’s a difference that doesn’t make a difference, and things are even worse (much smaller values) at a global level. In fact, using January alone for all of the GISS LOTI data, yes, there is a real result (Average of positive = 0.05, average of negative = -0.05), but the standard deviation is twice that value (0.10).
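    The summary statistics described here reduce to a mean, a standard deviation, and a roughly 95% interval at mean ± 1.96 sd; a minimal sketch (the input list of year-over-year deltas is illustrative, not the Santa Rosa or GISS values):

```python
# Sketch of the summary statistics described above: mean, standard
# deviation, and an approximate 95% interval (mean +/- 1.96 * sd).
# The deltas passed in are illustrative year-over-year changes.
from statistics import mean, stdev

def summarize(deltas):
    """deltas: year-over-year temperature changes (illustrative)."""
    m, s = mean(deltas), stdev(deltas)
    return m, s, (m - 1.96 * s, m + 1.96 * s)
```

    With a mean of 0.05 and a standard deviation of 0.10, this yields an interval of roughly -0.15 to 0.25, matching the figures quoted in this comment.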

    Finally, upstream someone commented:

    Whether or not the JLI has skill, it appears to provide us with an essential ingredient for policy making. This ingredient is information about the outcomes from policy decisions. None of the climate models referenced by AR4 provide us with information.

    Using the GISS LOTI data, we can say that if this January is warmer than last January, this year will be a heart-stopping 0.05°C warmer than last year ON AVERAGE, with a 95% confidence interval from -0.15°C to 0.25°C.

    Anyone who thinks that a projected possible warming of five hundredths of a degree is an “essential ingredient for policy making” hasn’t thought this all the way through.

    w.

  157. Willis Eschenbach says:

    Dr. Strangelove says:
    February 4, 2014 at 5:46 pm

    Willis
    I’m not making things up. Do a Monte Carlo simulation the way I described it. You can probably do it in an Excel spreadsheet. You need a random number generator. Assign two possible outcomes, Outcome 1 and Outcome 2, as explained previously, each with equal probability P = 0.5. Plot the histogram of the 20,000 runs. You will see a normal curve. Then compute, from the normal curve, the probabilities of the actual results of Outcomes 1 and 2 obtained from temperature data. This is not a hoax.

    Read up on the difference between “white noise” and “red noise”, Doc. You’ve used white noise, perhaps not even normally distributed white noise (Excel’s RAND function returns uniformly rather than normally distributed random numbers) … but the temperature data you are testing is red noise, actually very red noise. As a result, you need red noise pseudodata for the Monte Carlo test.
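    As a minimal sketch, red-noise pseudodata of the kind Willis is asking for can be generated with an AR(1) process; the persistence parameter phi = 0.9 is an illustrative assumption, not a value from the thread:

```python
# Sketch of AR(1) ("red noise") pseudodata generation for a Monte
# Carlo test.  phi = 0.9 is an illustrative persistence parameter.
import random

def red_noise(n, phi=0.9, seed=1):
    """Return n samples of an AR(1) process: x[t] = phi*x[t-1] + e[t],
    where e[t] is drawn from a standard normal (white) noise source."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out
```

    Larger phi gives a “redder” (more persistent) series; phi = 0 recovers white noise.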

    w.

  158. Willis Eschenbach says:

    Guy says:
    February 4, 2014 at 6:40 pm

    Should we not be looking at the prediction for “the rest of the year”? If January is up and the rest of the year is down less than January was up, the year will still be up.

    Thanks for the support, Guy. I’ve pointed that out a bunch of times, only to be told it doesn’t matter. It does matter, but these guys love their “positive” results … go figure.

    w.

  159. Willis Eschenbach says:

    Terry Oldberg says:
    February 4, 2014 at 9:01 pm

    Dr. Strangelove:

    Thanks for sharing your view. It looks to me as though you’ve conflated some concepts.

    As I’ll use the terms, a “Condition” is a condition on the Cartesian product of the values that are taken on by a model’s independent variables. An “Outcome” is a condition on the Cartesian product of the values that are taken on by a model’s dependent variables.

    I suppose I should try decoding Terry again, although it’s not been too productive in the past … Terry, defining a “Condition” in capital letters as a “condition on the Cartesian product of the values” of the independent variables, doesn’t mean anything to me.

    Suppose we have two independent variables, J and K. The values of J are {a, b, c} and the values of K are {d, e, f}. The Cartesian product of those two sets, more commonly called the “cross product” is the set of all possible pairs,

    { { a, d }, { a, e }, { a, f }, { b, d }, { b, e }, { b, f }, { c, d }, { c, e }, { c, f } }

    OK, that’s our Cartesian product of the values of the independent variables. But what on earth is a “condition on” the set { { a, d }, { a, e }, { a, f }, { b, d }, { b, e }, { b, f }, { c, d }, { c, e }, { c, f } }? That makes no sense at all, and its application to the current situation is completely unclear.
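    For concreteness, the nine pairs enumerated above are exactly what Python’s `itertools.product` generates from J and K:

```python
# Reproduces the enumeration of the Cartesian (cross) product of
# J = {a, b, c} and K = {d, e, f} given above.
from itertools import product

J = ['a', 'b', 'c']
K = ['d', 'e', 'f']
pairs = list(product(J, K))  # 9 pairs: ('a','d'), ('a','e'), ..., ('c','f')
```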

    w.

  160. Willis Eschenbach says:

    Bernie Hutchins says:
    February 4, 2014 at 9:30 pm

    Strangelove,

    People here need to see your code.

    Dr. Strangelove says:
    February 4, 2014 at 10:29 pm

    Bernie
    The concepts are simple enough to understand without computer programming, at least for those familiar with probability theory.

    Doc, Bernie’s not asking about your concepts. He wants to see exactly what you did. Not what your concepts say you did. Not what you truly believe you did.

    What you actually did.

    Either show your code or we are under no obligation to listen to a word you say. This is a scientific site.

    w.

  161. Werner Brozek says:

    Willis Eschenbach says:
    February 5, 2014 at 3:44 pm
    Anyone who thinks that a projected possible warming of five hundredths of a degree is an “essential ingredient for policy making” hasn’t thought this all the way through.

    I agree. And the MET office is no better. See
    http://www.metoffice.gov.uk/media/pdf/1/8/decadal_forecast_2014-2018_jan2014.pdf

    “• Averaged over the 5-year period 2014-2018, global average temperature is expected to remain high and is likely to be between 0.17°C and 0.43°C above the long-term (1981–2010) average.”

    “Conclusions
    It also has a broad range of potential applications in terms of policy making and investment decisions.”

  162. Brian H says:

    Aren’t we talking about autocorrelation here? Each month is the start point for the next.

  163. Willis Eschenbach says:

    Brian H says:
    February 6, 2014 at 12:17 am

    Aren’t we talking about autocorrelation here? Each month is the start point for the next.

    Umm … yep, I talked about that very thing, as have other folks. A search for “autocorr…” on the page will find much discussion of the subject.

    w.
