Sunspots: Labitzke Meets Bonferroni

Guest Post by Willis Eschenbach

In a previous thread here on WUWT, a commenter said that the sunspot-related variations in solar output were shown by Labitzke et al. to affect the stratospheric temperature over the North Pole, viz:

Karin Labitzke cracked that nut. She was the first one to find a correlation between not one but two atmospheric parameters and solar activity. After almost 40 years her findings are still solid, and thanks to her we know that the strength of the polar vortex depends on solar activity modulated by the quasi-biennial oscillation.

And when I went and got the data from the Freie Universität Berlin, I was able to replicate their result. Here’s the relationship Dr. Labitzke et al. found between sunspots and polar stratospheric temperatures:

Figure 1. Sunspots versus north pole stratospheric temperatures. Red line shows the trend.

So … what’s not to like? To lay the groundwork for the answer to that question, let me refer folks to my previous post, Sea Level and Effective N, which discusses the Bonferroni Correction and long-term persistence (LTP).

The Bonferroni Correction is needed when you’ve looked in more than one place or looked more than one time for something unusual. 

For example, suppose we throw three dice at once and all three of them come up showing fours … that’s a bit suspicious, right? Might even be enough to make you say the dice were loaded. The chance of three 4’s in a single throw of three dice is only about five in a thousand.

But suppose you throw your three dice say a hundred times. Would it be strange or unusual to find three 4’s in one of the throws among them? Well … no. Actually, with that many tries, you have roughly a 37% chance of getting three 4’s in there somewhere.
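
Here’s a quick sketch of that arithmetic, for anyone who wants to check it:

# Chance of three 4s on one throw of three dice, and the chance of seeing
# that at least once somewhere in 100 throws.
p_one_throw = (1 / 6) ** 3                   # ~0.0046, about five in a thousand
p_in_100 = 1 - (1 - p_one_throw) ** 100      # ~0.37
print("one throw: %.4f" % p_one_throw)
print("at least once in 100 throws: %.2f" % p_in_100)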

In other words, if you look in enough places or you look enough times, you’ll find all kinds of unusual things happening purely by random chance.

Now in climate science, for something to be considered statistically significant, the odds of it happening by random chance alone have to be less than five in a hundred. Or to put it in the terms commonly used, what is called the “p-value” needs to be less than five hundredths, which is usually written as “p-value < 0.05”.

HOWEVER, and it’s a big however, when you look in more than one place, for something to be significant it needs to have a lower p-value. The Bonferroni Correction says you need to divide the desired p-value (0.05) by the number of places that you’ve looked. So for example, if you look in ten places for some given effect, for the effect to be significant it would have to have a p-value less than 0.05 divided by ten, because ten is the number of places you’ve looked. This means it would have to have a p-value of 0.005 or less to be statistically significant.
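
Here’s that logic in a small Python sketch, showing how quickly the chance of a spurious “significant” result grows with the number of looks, and how the Bonferroni threshold shrinks to compensate:

# With m looks at a 5% significance level, the chance of at least one false
# positive is 1 - 0.95^m, and the Bonferroni-corrected threshold is 0.05 / m.
alpha = 0.05
for m in (1, 5, 10, 20):
    p_false_alarm = 1 - (1 - alpha) ** m
    print("m = %2d: P(false alarm) = %.2f, threshold = %.4f" % (m, p_false_alarm, alpha / m))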

So … how many places were examined? To answer that, let me be more specific about what was actually found.

The chart above shows their finding … which is that if you look at the temperature in February, at one of seven sampled levels of the stratosphere, over the North Pole, compared to the January sunspots lagged by one month, during the roughly half of the time when the equatorial stratospheric winds are going west rather than east, the p-value is 0.002.

How many different places have they looked for a relationship? Well, they’ve chosen the temperature of one of twelve months, in one of seven atmospheric levels, with one of three sunspot lag possibilities (0, 1, or 2 months lag), and one of two equatorial stratospheric wind conditions.

That gives 504 different combinations. Heck, even if we leave out the seven levels, that’s still 72 different combinations. So at a very conservative estimate, we’d need to find something with a p-value of 0.05 divided by 72, which is 0.0007 … and the p-value of her finding is about three times that. Not significant.
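
The counting, and the resulting thresholds, are simple to check:

# 12 months x 7 levels x 3 lags x 2 QBO phases, and the Bonferroni thresholds
# to compare against the reported p-value of 0.002.
months, levels, lags, qbo_phases = 12, 7, 3, 2
print(months * levels * lags * qbo_phases)            # 504 combinations
print(0.05 / (months * lags * qbo_phases))            # ~0.0007, leaving the levels out
print(0.05 / (months * levels * lags * qbo_phases))   # ~0.0001, with the full correction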

And this doesn’t even account for the spatial sub-selection. They’re looking just at temperatures over the North Pole, and the area north of the Arctic Circle is only 4% of the planet … which would make the Bonferroni Correction even larger.
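
That 4% figure is easy to verify with the spherical-cap formula:

# The fraction of a sphere's surface poleward of latitude phi is (1 - sin(phi)) / 2.
import math
phi = math.radians(66.56)          # latitude of the Arctic Circle
print((1 - math.sin(phi)) / 2)     # ~0.041, i.e. about 4% of the planet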

That’s the first problem, a very large Bonferroni Correction. The second problem, as I discussed in my post linked to above, is that we have to account for long-term persistence (LTP). After accounting for LTP, the p-value of what is shown in Figure 1 above rises to 0.09 … which is not statistically significant, even without considering the Bonferroni Correction.
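
To give a flavor of how persistence inflates p-values, here is a minimal sketch using the simple lag-1 autocorrelation (“effective N”) adjustment. It is only an illustration of the mechanics; the full long-term-persistence treatment discussed in the linked post typically shrinks the effective sample size even further.

# A sketch of how autocorrelation shrinks the effective sample size and raises
# the p-value of a correlation. Uses the common AR(1) adjustment,
# N_eff = N * (1 - r1) / (1 + r1); a Hurst/LTP treatment shrinks N_eff more.
import numpy as np
from scipy import stats

def effective_n(x):
    """Effective sample size of a series, from its lag-1 autocorrelation."""
    x = np.asarray(x, dtype=float)
    r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
    return len(x) * (1 - r1) / (1 + r1)

def correlation_p_value(x, y, n_eff):
    """Two-sided p-value of the correlation of x and y, treating n_eff as the sample size."""
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt((n_eff - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=n_eff - 2)

# Toy example: a drifting AR(1) series regressed against time. Substitute the
# sunspot and temperature series to see the same kind of adjustment in action.
rng = np.random.default_rng(1)
noise = np.zeros(200)
for i in range(1, 200):
    noise[i] = 0.6 * noise[i - 1] + rng.normal()
y = 0.02 * np.arange(200) + noise
x = np.arange(200.0)

print(correlation_p_value(x, y, len(y)))           # naive p-value
print(correlation_p_value(x, y, effective_n(y)))   # larger p-value after the adjustment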

To summarize:

  • As Labitzke et al. found, February temperatures at 22 kilometres altitude over the North Pole during the time when the equatorial stratospheric winds are blowing to the west are indeed correlated with January sunspots lagged one month.
  • The nominal p-value without accounting for LTP or Bonferroni is 0.002, which appears significant.
  • However, when you account just for LTP, the p-value rises to 0.09, which is not significant.
  • And when you use the Bonferroni Correction to account just for looking in a host of locations and conditions, you’d need a p-value less than about 0.0007 to be statistically significant.
  • So accounting for either the LTP or the Bonferroni Correction is enough, all by itself, to establish that the claimed correlation is not statistically significant … and when we account for both LTP and Bonferroni, we see that the results are far, far from being statistically significant.

Unfortunately, the kind of slipshod statistical calculation reflected in the study is far too common in the climate debate, on both sides of the aisle …

ADDENDUM: I was lying in bed last night after writing this and I thought “Wait … what??” Here’s the idea that made me wonder—if you were going to look for some solar-related effect in February, where is the last place on Earth you’d expect to find it?

Yep, you’re right … the last place you’d expect to find a solar effect in February would be the North Polar region, where in February there is absolutely no sun at all. That doesn’t make it impossible, just less probable.

Finally, does this mean that the small sunspot-related solar variations have no effect on the earth? Not at all. As a ham radio operator myself (H44WE), I know for example that sunspots affect the electrical qualities of the ionosphere.

What I have NOT found is any evidence that the small sunspot-related solar variations have any effect down here at the surface. Doesn’t mean it doesn’t exist … just that despite extensive searching I have not found any such evidence.

My best regards to all,

w.

PS—As usual, I request that when you comment, you quote the exact words you are referring to, so that we can all be clear about just who and what you are discussing.

February 27, 2019 5:06 am

To prove my point above that this problem should be treated as a regression, I constructed a regression tree with Cubist (aka “M5”), using the data referenced in Willis’ scatter plot above:
https://www.geo.fu-berlin.de/en/met/ag/strat/produkte/northpole

Using only that data, I built a piecewise linear Cubist regression tree to predict the 30hPa temperature, using the sunspot counts and other attributes contained in that file. [A small Python script (below) preprocesses the file into the standard Cubist training data format.]

The resulting model has an average error of 2.9 degrees in predicting the 30hPa temperature from the other attributes. The ‘relative error’ compares the prediction errors with those of a default model that uses only the target mean as a predictor. The correlation is between the predicted temperature and the temperature in the training data.

Evaluation on training data (270 cases):
Average |error| 2.9
Relative |error| 0.29
Correlation coefficient 0.94

To mitigate over-fitting of the data, I also ran a 10-fold cross validation, using 90% of the data for training and 10% for blind validation, repeating that 10 times, so that all of the data was used for training and validation. This resulted in a slightly higher average error of 3.2 degrees.

Summary:
Average |error| 3.2
Relative |error| 0.32
Correlation coefficient 0.93
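
For anyone who wants to try a similar 10-fold cross-validation in Python (Cubist itself is a command-line tool, not a Python library), here is a rough sketch using scikit-learn’s decision tree as a stand-in for the Cubist model. It assumes the ‘labitzke.data’ file written by the preprocessing script further down in this comment, so the numbers will not match the Cubist figures exactly:

# 10-fold cross-validated mean absolute error, with a plain regression tree
# standing in for Cubist.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

cols = ['ID', 'RJ', 'QBO', 'TRT', 'C', 'CW', 'FW', 'STAR', 'MONTH', 'TEMP']
df = pd.read_csv('labitzke.data', header=None, names=cols, na_values='.')
df = df.dropna(subset=['TEMP'])                        # drop months with no temperature

X = pd.get_dummies(df.drop(columns=['ID', 'TEMP']))    # one-hot encode the categoricals
y = df['TEMP'].astype(float)

scores = cross_val_score(DecisionTreeRegressor(min_samples_leaf=10), X, y,
                         cv=10, scoring='neg_mean_absolute_error')
print("10-fold mean |error|: %.1f degrees" % -scores.mean())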

Here are the attributes declared in the model, and extracted from the training data. (Pipe symbol ‘|’ denotes comment)

| Predict stratosphere 30hPa temperature given sunspot number and attributes
| https://www.geo.fu-berlin.de/en/met/ag/strat/produkte/northpole

TEMP. | target
ID: label. | year ID
RJ: continuous. | Sunspot count
QBO: east,west,e/w,w/e. | QBO phase
TRT: early,late. | Transition time
C: True,False. | Cold flag
CW: True,False. | Canadian Warmings
FW: True,False. | Major Final Warming
STAR: True,False. | Major Mid-winter warming
MONTH: nov,dec,jan,feb,mar,apr. | Month
TEMP: continuous. | 30hPa temperature

| Uncomment these to exclude or include only specific attributes etc.
| attributes excluded: RJ.
| attributes included: RJ,QBO,MONTH.

Here is the script which converts the tab-delimited training data table to a Cubist .data file. ‘lbzke.table’ was created simply by using a mouse to cut and paste the HTML table into a text file. In Chrome, the tab characters are automatically created to delimit the table fields correctly. Afterwards, a few records with blank temperatures were deleted.

import re

# month columns in the 'northpole' table, in order
month = ['nov', 'dec', 'jan', 'feb', 'mar', 'apr']

with open('lbzke.table', 'r') as f:
    for line in f:
        word = line.strip().split('\t')
        id = word[0]
        if id[0] != '1': continue       # skip rows whose first field is not a year ID
        rj = word[1]                    # sunspot count
        qbo = word[5]                   # QBO phase
        trt = word[9]                   # transition time
        t = word[2:5] + word[6:9]       # the six monthly temperature fields

        for i in range(6):  # iterate over monthly values
            # split the numeric temperature from any trailing warming flags
            r = re.findall(r"([-+]?[0-9]*\.?[0-9]+)(.*)", t[i])
            temp = '.'
            flag = '.'
            if len(r) == 1:
                temp = r[0][0]
                flag = r[0][1]

            star = ('*' in flag)    # major mid-winter warming
            cw = ('CW' in flag)     # Canadian warming
            fw = ('FW' in flag)     # major final warming
            c = ('C' == flag)       # cold flag

            # ID,RJ,QBO,TRT,C,CW,FW,STAR,MONTH,TEMP
            print("%s_%d,%s,%s,%s,%s,%s,%s,%s,%s,%s" %
                  (id, i + 1, rj, qbo, trt, c, cw, fw, star, month[i], temp))

In using this multivariate regression, the important question is not “Did I count my Bonferronies?”
Rather, it is “How much did each variable contribute to the construction of this piecewise linear regression tree?”

Surprisingly, or not, the sunspot counts and QBO flags were not needed at all to achieve this result:

Attribute usage:
Conds Model

100% MONTH
90% C
37% STAR
11% FW

So 100% of the regression rules used MONTH, but only 11% used the major-final-warming flag etc. RJ and QBO were available, but not needed to achieve the given results.
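
As a rough cross-check in Python, a random forest’s impurity-based feature importances answer a similar “which attributes matter” question. It is not the same statistic as Cubist’s attribute-usage table, so treat the numbers as indicative only; the sketch assumes the same ‘labitzke.data’ file as above:

# Rank the attributes by impurity-based importance in a random forest.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

cols = ['ID', 'RJ', 'QBO', 'TRT', 'C', 'CW', 'FW', 'STAR', 'MONTH', 'TEMP']
df = pd.read_csv('labitzke.data', header=None, names=cols, na_values='.')
df = df.dropna(subset=['TEMP'])

X = pd.get_dummies(df.drop(columns=['ID', 'TEMP']))
y = df['TEMP'].astype(float)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
for name, importance in sorted(zip(X.columns, rf.feature_importances_), key=lambda z: -z[1]):
    print("%-12s %.3f" % (name, importance))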

Conclusion
Therefore, for this dataset, I also reject the sunspot/QBO hypothesis. Just like Willis did. The difference is that my modeling tool told me this up front. Willis’ rejection was based on non-explanatory blind principles (Bonferroni etc), which resulted in a lot of hand waving and polemics.

BTW, Cubist is available as an R package (as is M5, Quinlan’s original tree regression tool). I use the C command line version because it has more options.

Here is the complete regression tree, for those who are interested. It has only 9 rules:

Cubist [Release 2.07 GPL Edition] Wed Feb 27 08:04:24 2019
———————————

Options:
Application `labitzke’

Target attribute `TEMP’

Read 270 cases (10 attributes) from labitzke.data

Model:

Rule 1: [43 cases, mean -79.0, range -84 to -76, est err 1.5]

if
C = True
MONTH in {dec, jan}
then
TEMP = -79

Rule 2: [37 cases, mean -73.5, range -83 to -70, est err 2.6]

if
C = True
MONTH in {nov, feb}
then
TEMP = -73

Rule 3: [82 cases, mean -68.0, range -77 to -49, est err 3.8]

if
C = False
STAR = False
MONTH in {nov, dec, jan, feb}
then
TEMP = -68

Rule 4: [16 cases, mean -65.8, range -79 to -61, est err 3.5]

if
C = True
MONTH = mar
then
TEMP = -65

Rule 5: [18 cases, mean -55.3, range -69 to -38, est err 5.8]

if
STAR = True
MONTH in {nov, dec, jan, feb}
then
TEMP = -56

Rule 6: [20 cases, mean -55.0, range -61 to -45, est err 3.2]

if
C = False
FW = False
MONTH = mar
then
TEMP = -56

Rule 7: [13 cases, mean -53.7, range -57 to -51, est err 1.9]

if
C = True
MONTH = apr
then
TEMP = -53

Rule 8: [9 cases, mean -48.0, range -53 to -42, est err 3.9]

if
FW = True
MONTH = mar
then
TEMP = -48

Rule 9: [32 cases, mean -44.8, range -50 to -34, est err 2.8]

if
C = False
MONTH = apr
then
TEMP = -45

Evaluation on training data (270 cases):

Average |error| 2.9
Relative |error| 0.29
Correlation coefficient 0.94

Attribute usage:
Conds Model

100% MONTH
90% C
37% STAR
11% FW

Time: 0.0 secs


February 27, 2019 7:54 am


For more info on Cubist and related data-mining tools from Ross Quinlan (an early pioneer in this area):
https://www.rulequest.com/cubist-info.html
https://www.rulequest.com/Personal/

A cogent survey of recursive partitioning classification and regression tree builders, from a distinguished stat.wisc.edu statistician (Loh) who dislikes these tools in general, but is amazed how many researchers still find them useful.
http://washstat.org/presentations/20150604/loh_slides.pdf

February 27, 2019 10:55 am

Willis,
I would be interested in hearing your views on my Cubist experiment, which indicates that there is no support for the solar cycle/QBO hypothesis in the Labitzke datafile you used for your scatter plot.

I’ve always been skeptical of the solar-cycle connection to terrestrial weather. But this file looked interesting because the temperature “measurements” [actually ‘reanalysis’ forecasts] were made in the mid-stratosphere, where warming due to enhanced UV scattering could conceivably be a little stronger. Even more tantalizing, your scatter plot does show some kind of mild correlation (0.28) between 30hPa temp and sunspot count.

Actually, I was able to strengthen that correlation, using Cubist, to 0.94, but also demonstrate, in my regression, that the correlation came from the layer attributes in the dataset, not from sunspot counts or QBO.

I would also like to find the data for the other layers you examined. They don’t appear to be in the other data link you provided.

Thanks,
Johanus

Julian Forbes-Laird
February 27, 2019 2:27 pm

Well it took a while to read through the comments to date, but I am pleased to report that I have reached a five star, fur lined, diamond studded, gold plated, ocean going conclusion re the great solar/ non solar debate: the science is not settled.

Reply to  Julian Forbes-Laird
February 27, 2019 3:00 pm

“the science is not settled.”

Which is good. When science becomes settled, I believe it starts to turn into religion, making it much harder to falsify.

February 27, 2019 4:08 pm

I am using Cubist to further study the structure of the Arctic winter temperature problem. I excluded the attributes RJ, C, CW, FW, and STAR, leaving only MONTH and QBO. Obviously MONTH is important for explaining temperatures, and I am curious what predictive power QBO has in this ‘northpole’ Labitzke dataset.

Not surprisingly, the mean error doubled and the correlation fell to 0.77, but that is still a respectable value. Only 3 rules were generated, which neatly divided this winter regime into 3 segments: {nov,dec,jan,feb}, {mar}, and {apr}.

There is a standard default for creating estimators: if you don’t have the time or skill to write an accurate estimator, then just use the mean value of the data as the first approximation. And that is exactly what Cubist did here, assigning the mean value of each segment as the regression.

So, even though the temperature varies from -84 to -34, this very simple “bottom line” estimator has a respectable average error of only 5.7 degrees.

Read 270 cases (10 attributes) from labitzke.data
Attributes excluded:
RJ
C
CW
FW
STAR

Model:
Rule 1: [180 cases, mean -70.5, range -84 to -38, est err 6.1]
if
MONTH in {nov, dec, jan, feb}
then
TEMP = -71
Rule 2: [45 cases, mean -57.4, range -79 to -42, est err 6.6]
if
MONTH = mar
then
TEMP = -57
Rule 3: [45 cases, mean -47.4, range -57 to -34, est err 4.3]
if
MONTH = apr
then
TEMP = -47
Evaluation on training data (270 cases):
Average |error| 5.7
Relative |error| 0.57
Correlation coefficient 0.77
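
For comparison, that segment-mean baseline is only a few lines of pandas, assuming the ‘labitzke.data’ file and column order from the preprocessing script in the earlier comment:

# Predict each month's 30hPa temperature by the mean of its segment and report
# the mean absolute error of that baseline.
import pandas as pd

cols = ['ID', 'RJ', 'QBO', 'TRT', 'C', 'CW', 'FW', 'STAR', 'MONTH', 'TEMP']
df = pd.read_csv('labitzke.data', header=None, names=cols, na_values='.')
df = df.dropna(subset=['TEMP'])
df['TEMP'] = df['TEMP'].astype(float)

# the three segments found above: {nov,dec,jan,feb}, {mar}, {apr}
segment = df['MONTH'].where(df['MONTH'].isin(['mar', 'apr']), 'nov-feb')
prediction = df.groupby(segment)['TEMP'].transform('mean')
print("average |error|: %.1f degrees" % (df['TEMP'] - prediction).abs().mean())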

Why does this study limit itself to the winter months? Because of ‘data dredging’ or cherry picking? No, I don’t think so. I think the answer involves the so-called stratospheric polar vortex, which only exists during these months, unlike the tropospheric PV, which is completely separated from it vertically and lives all year long.

During the winter in the NH, the Sun heats the stratosphere, but not the ground, creating a powerful temperature gradient and the stratospheric PV.
Waugh et al., “WHAT IS THE POLAR VORTEX AND HOW DOES IT INFLUENCE WEATHER?”,
https://journals.ametsoc.org/doi/pdf/10.1175/BAMS-D-15-00212.1

Javier’s concept of the 30hPa geopotential height acting like a thermometer bulb is a key here:

30 hPa geopotential height for high solar activity years and low solar activity years. The difference is a whopping 240 meters. A quarter of a kilometer. Indicating profound differences in the density and temperature of the air column below 30 hPa all the way to the surface.

But it doesn’t heat up all the way to the surface; only the top part heats up, because the ground is in relative darkness at this time.

I think that’s why this solar activity only involves the stratosphere. But the effects on the stratospheric PV may very well extend down to the surface.

I think the attributes C,CW,STAR, and FW will explain a lot more about this mysterious problem.

Johann Wundersamer
February 28, 2019 5:45 am

Willis, lying in bed you won’t find the sun, the North Pole, or the equator.

Be assured the sun is always out there doing its work, whatever that may be.

G Griers
March 1, 2019 10:55 am

There is another great sunspot to climate correlation don by Scarfeta et al at MIT. They, too received enormous criticism from the alarmist community, even though all they did was show data and mathematics. But hey, data is data what can you do?