Sunspots: Labitzke Meets Bonferroni

Guest Post by Willis Eschenbach

In a previous thread here on WUWT, a commenter said that the sunspot-related variations in solar output were shown by Labitzke et al. to affect the stratospheric temperature over the North Pole, viz:

Karin Labitzke cracked that nut. She was the first one to find a correlation between not one but two atmospheric parameters and solar activity. After almost 40 years her findings are still solid, and thanks to her we know that the strength of the polar vortex depends on solar activity modulated by the quasi-biennial oscillation.

And when I went and got the data from the Freie Universität Berlin, I was able to replicate their result. Here’s the relationship Dr. Labitzke et al. found between sunspots and polar stratospheric temperatures:

Figure 1. Sunspots versus north pole stratospheric temperatures. Red line shows the trend.

So … what’s not to like? To lay the groundwork for the answer to that question, let me refer folks to my previous post, Sea Level and Effective N, which discusses the Bonferroni Correction and long-term persistence (LTP).

The Bonferroni Correction is needed when you’ve looked in more than one place or looked more than one time for something unusual. 

For example, suppose we throw three dice at once and all three of them come up showing fours … that’s a bit suspicious, right? Might even be enough to make you say the dice were loaded. The chance of three 4’s in a single throw of three dice is only about five in a thousand.

But suppose you throw your three dice say a hundred times. Would it be strange or unusual to find three 4’s in one of the throws among them?  Well … no. Actually, with that many tries, you have about a 40% chance of getting three 4’s in there somewhere.
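
For anyone who wants to check those dice numbers, here is a quick back-of-the-envelope sketch in Python (purely illustrative; the figures are just the ones described above):

    # Chance of three 4's on one throw of three fair dice, and the chance of
    # seeing at least one such throw somewhere in 100 throws.
    p_single = (1 / 6) ** 3                  # about 0.0046, roughly 5 in 1000
    p_in_100 = 1 - (1 - p_single) ** 100     # about 0.37, roughly 40%
    print(p_single, p_in_100)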

In other words, if you look in enough places or you look enough times, you’ll find all kinds of unusual things happening purely by random chance.

Now in climate science, for something to be considered statistically significant, the odds of it happening by random chance alone have to be less than five in a hundred. Or to put it in the terms commonly used, what is called the “p-value” needs to be less than five hundredths, which is usually written as “p-value < 0.05”.

HOWEVER, and it’s a big however, when you look in more than one place, for something to be significant it needs to have a lower p-value. The Bonferroni Correction says you need to divide the desired p-value (0.05) by the number of places that you’ve looked. So for example, if you look in ten places for some given effect, for the effect to be significant it would have to have a p-value less than 0.05 divided by ten, because ten is the number of places you’ve looked. This means it would have to have a p-value of 0.005 or less to be statistically significant.
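
In code form the rule is a one-liner (a minimal sketch; the function name is mine, not from any statistics package):

    def bonferroni_threshold(alpha, n_looks):
        # Per-test p-value needed for significance after looking in n_looks places.
        return alpha / n_looks

    print(bonferroni_threshold(0.05, 10))    # 0.005, the ten-places example above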

So … how many places were examined? To answer that, let me be more specific about what was actually found.

The chart above shows their finding … which is that if you look at the temperature in February, at one of seven different possible sampled levels of the stratosphere, over the North Pole, compared to the January sunspots lagged by one month, during the approximately half of the time when the equatorial stratospheric winds are going west rather than east, the p-value is 0.002.

How many different places have they looked for a relationship? Well, they’ve chosen the temperature of one of twelve months, in one of seven atmospheric levels, with one of three sunspot lag possibilities (0, 1, or 2 months lag), and one of two equatorial stratospheric wind conditions.

That gives 504 different combinations. Heck, even if we leave out the seven levels, that’s still 72 different combinations. So at a very conservative estimate, we’d need to find something with a p-value of 0.05 divided by 72, which is 0.0007 … and the p-value of her finding is about three times that. Not significant.
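
Here is the same arithmetic spelled out (a sketch using only the counts given above):

    months, levels, lags, winds = 12, 7, 3, 2
    full = months * levels * lags * winds    # 504 combinations
    no_levels = months * lags * winds        # 72 combinations, ignoring the seven levels
    print(0.05 / full)                       # ~0.0001
    print(0.05 / no_levels)                  # ~0.0007, versus the observed p-value of 0.002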

And this doesn’t even account for the spatial sub-selection. They’re looking just at temperatures over the North Pole, and the area north of the Arctic Circle is only 4% of the planet … which would make the Bonferroni Correction even larger.

That’s the first problem, a very large Bonferroni Correction. The second problem, as I discussed in my post linked to above, is that we have to account for long-term persistence (LTP). After accounting for LTP, the p-value of what is shown in Figure 1 above rises to 0.09 … which is not statistically significant, even without considering the Bonferroni Correction.
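
To see how persistence alone can do that, here is a hedged sketch of the general idea: autocorrelated data contain fewer independent observations than their raw count, so the same correlation is less surprising. LTP-aware adjustments are usually based on the Hurst exponent; the sketch below uses the simpler lag-1 autocorrelation version instead, and every number in it is illustrative rather than taken from Labitzke's data:

    import numpy as np
    from scipy import stats

    def corr_pvalue(r, n):
        # Two-sided p-value for a Pearson correlation r based on n (effective) points.
        t = r * np.sqrt((n - 2) / (1 - r ** 2))
        return 2 * stats.t.sf(abs(t), df=n - 2)

    n = 30                   # hypothetical number of west-phase Februaries
    r = np.sqrt(0.28)        # correlation corresponding to the r^2 = 0.28 mentioned in the comments
    r1 = 0.45                # hypothetical lag-1 autocorrelation of the data

    n_eff = n * (1 - r1) / (1 + r1)    # Quenouille-style effective sample size
    print(corr_pvalue(r, n))           # ~0.003 using the raw count
    print(corr_pvalue(r, n_eff))       # ~0.09 using the effective count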

To summarize:

  • As Labitzke et al. found, February temperatures at 22 kilometres altitude over the North Pole during the time when the equatorial stratospheric winds are blowing to the west are indeed correlated with January sunspots lagged one month.
  • The nominal p-value without accounting for LTP or Bonferroni is 0.002, which appears significant.
  • However, when you account just for LTP, the p-value rises to 0.09, which is not significant.
  • And when you use the Bonferroni Correction to account just for looking in a host of locations and conditions, you’d need a p-value less than about 0.0007 to be statistically significant.
  • So accounting for either the LTP or the Bonferroni Correction is enough, all by itself, to establish that the claimed correlation is not statistically significant … and when we account for both LTP and Bonferroni, we see that the results are far, far from being statistically significant.

Unfortunately, the kind of slipshod statistical calculation reflected in the study is far too common in the climate debate, on both sides of the aisle …

ADDENDUM: I was lying in bed last night after writing this and I thought “Wait … what??” Here’s the idea that made me wonder—if you were going to look for some solar-related effect in February, where is the last place on Earth you’d expect to find it?

Yep, you’re right … the last place you’d expect to find a solar effect in February would be the North Polar region, where in February there is absolutely no sun at all … doesn’t make it impossible. Just less probable.

Finally, does this mean that the small sunspot-related solar variations have no effect on the earth? Not at all. As a ham radio operator myself (H44WE), I know for example that sunspots affect the electrical qualities of the ionosphere.

What I have NOT found is any evidence that the small sunspot-related solar variations have any effect down here at the surface. Doesn’t mean it doesn’t exist … just that despite extensive searching I have not found any such evidence.

My best regards to all,

w.

PS—As usual, I request that when you comment, you quote the exact words you are referring to, so that we can all be clear about just who and what you are discussing.

Latitude
February 25, 2019 2:08 pm

“Suppose we throw three dice at once and all three of them come up showing fours … that’s a bit suspicious, right?”…

no not at all…..Hillary won every coin toss………../snark

Kerry Eubanks
February 25, 2019 2:10 pm

I think some would call this study a data dredge.

Greg
Reply to  Willis Eschenbach
February 27, 2019 12:02 am

Nice brief and well argued article.

The common term is cherry picking. If this effect is not found in January or March, or any other month of the year, it should be clear to anyone with a modicum of reflection that it is an accidental, not a meaningful, result. I’m always very skeptical of studies which pull out winter months or some other subset to produce a “finding”.

Here’s the idea that made me wonder—if you were going to look for some solar-related effect in February, where is the last place on Earth you’d expect to find it?

The equinox is in March, when the Earth is ‘side-on’ to the Sun and the Arctic does receive sun. February may be cold, but it’s not the middle of winter in celestial terms. The stratosphere will get direct insolation anyway; it is the surface which is in shadow.

Also there is the funneling effect of the magnetic field and a cascade of high-energy particles entering the Arctic atmosphere. This will be affected by the solar wind etc. It is not unreasonable to propose an atmospheric effect in February. The problem comes when it is ONLY seen in that one month.

Greg
Reply to  Willis Eschenbach
February 27, 2019 12:25 am

And when I went and got the data from the Freie Universität Berlin

https://www.geo.fu-berlin.de/met/ag/strat/produkte/qbo/qbo.dat

Hang on Willis, that is zonal wind speed data. Is that what you intended to link to?

Lorenzo
Reply to  Willis Eschenbach
February 27, 2019 2:49 pm

p hacking! William M. Briggs, call your office.

RCS
February 25, 2019 2:15 pm

A well-known example of similar reasoning is
“If you have 20 people in a room, what is the chance that two will have the same birthday?”
The answer is 0.5. If you have 40 people the answer is over 90%.

However, the Bonferroni correction assumes that all the possible choices are uncorrelated. If they are not, the correction is less straightforward and it demands lower p-values than are really needed, simply because one is re-sampling the same effect.

In this particular case, I’ve no idea of the correlation between the different possibilities but, since they are connected, correlations may well exist and the Bonferroni correction may over-correct the required p-value.
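
As a quick check of those birthday figures (an illustrative sketch, not part of the comment above): the probability is about 41% for 20 people, it crosses 50% at 23 people, and it is about 89% for 40 people.

    # Probability that at least two of n people share a birthday (ignoring leap years).
    def p_shared_birthday(n):
        p_distinct = 1.0
        for i in range(n):
            p_distinct *= (365 - i) / 365
        return 1 - p_distinct

    for n in (20, 23, 40):
        print(n, round(p_shared_birthday(n), 3))   # 0.411, 0.507, 0.891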

February 25, 2019 2:22 pm
Reply to  Dave Burton
February 25, 2019 4:10 pm

That’s not an image, it’s HTML.

Reply to  Anthony Watts
February 25, 2019 4:30 pm

comment image

happy?

Greg
Reply to  Leo Smith
February 27, 2019 12:18 am

I’ll be happy if this posts as an image and not a hyper link.

comment image

If that is a link, please explain how you posted an image.

Greg
Reply to  Greg
February 27, 2019 2:06 am

let’s try html tags, but that used to get filtered out:

Greg
Reply to  Greg
February 27, 2019 2:15 am

OK IMG tags still get removed.

How did you post an image Leo ?

Pft
February 25, 2019 2:22 pm

Not significant is not insignificant. That’s the p-value fallacy so common in science today that stifles scientific advance.

Tom Halla
Reply to  Pft
February 25, 2019 2:53 pm

Oh? Why? If a result is not distinguishable from the noise in the system, what does it mean?

WXcycles
Reply to  Tom Halla
February 25, 2019 7:36 pm

It means you can’t actually isolate solar cycles in weather cycles and then claim them to be climate cycles.

LdB
Reply to  Tom Halla
February 26, 2019 12:47 am

Correlation does not equal causation; it likely means nothing, just another coincidence.

Hivemind
Reply to  Pft
February 26, 2019 3:58 am

Statistical significance is often incorrectly used to mean significant. A more modern term is medically significant. E.g., drug trials of blood-pressure-reducing medicines can show that they reduce patient blood pressure by a statistically significant amount, but in medical terms the reduction can be too small to be meaningful.

February 25, 2019 2:29 pm

My biggest smile:

ADDENDUM: I was lying in bed last night after writing this and I thought “Wait … what??” Here’s the idea that made me wonder—if you were going to look for some solar-related effect in February, where is the last place on Earth you’d expect to find it?

One of those great observations so obvious that I’m annoyed that I didn’t think of it.

Reply to  Dave Burton
February 25, 2019 4:34 pm

Yes, but the solar activity affects the North (and South) polar zone also in winter. See, for example, the frequency of aurora related to solar activity :

https://pwg.gsfc.nasa.gov/polar/EPO/auroral_poster/aurora_all.pdf

Geoff Sherrington
February 25, 2019 2:29 pm

Lack of such correction is a widespread deficiency in climate research. Why, the very last WUWT article before this allegedly attributes wildfire frequency in California to global warming, against the odds.
We need a quiet citizen scientist rebellion, with a memorable flag noting “Fails Bonferroni Test.”

WILLIAM ABBOTT
February 25, 2019 2:40 pm

Is there a correlation between a west QBO, a quiescent sun, and SSW events? I have seen graphs that seem to indicate a correlation.

Dave Fair
February 25, 2019 2:53 pm

Willis, I really do appreciate all the unpaid work you do to further true climate science. Skewering skeptic claims is a huge service, as is skewering alarmist claims.

EdB
Reply to  Dave Fair
February 25, 2019 4:43 pm

100%

Dave Yaussy
Reply to  Dave Fair
February 26, 2019 7:46 am

+10

Mike Thies
February 25, 2019 2:57 pm

“ADDENDUM: I was lying in bed last night after writing this and I thought “Wait … what??” Here’s the idea that made me wonder—if you were going to look for some solar-related effect in February, where is the last place on Earth you’d expect to find it?”

I agreed at first but…

1) There is no doubt that the sun directly influences the earth’s atmosphere via its magnetic field regardless of where the sun shines. The fact that the auroral oval shifts to the “dark side” is just one proof of this.

2) That the solar radiation constant (measured by satellites perpendicular to the sun) varies by only about 0.1% regardless of the number of sunspots implies that, should sunspots be an indicator of an unknown influence upon the global climate of the earth, the north pole during winter is an excellent place to search for evidence.

Gibo
February 25, 2019 3:00 pm

H44WE defunct Solomon Island Call sign !!

AJ6FK
Reply to  Gibo
February 25, 2019 4:33 pm

Noticed the same thing on QRZ.

Roy Lofquist
February 25, 2019 3:16 pm

Long story short, sunspots affect the aurorae. Aurorae = electrons from the sun = energy.

peterh
Reply to  Roy Lofquist
February 25, 2019 4:30 pm

If Aurora can account for enough energy transfer, they could account for stratosphere temperature at the north pole in the dead of winter. This being the case, temperature distribution there during periods of high solar activity should reflect the ease with which electrons can reach the atmosphere at different locations.

Farmer Ch E retired
Reply to  Roy Lofquist
February 26, 2019 4:27 am

“Aurorae = electrons from the sun = energy.”

Please note – temperature is not a measure of energy (or heat).

Steve Reddish
Reply to  Farmer Ch E retired
February 26, 2019 9:45 am

Spewing energy into something always results in a temperature increase in that something.

SR

Farmer Ch E retired
Reply to  Steve Reddish
February 26, 2019 10:29 am

Explain then how 75F air at near 100% RH in Louisiana has over twice the heat content per unit volume of 100F air in Phoenix at near 0% RH? (This was blogged on WUWT a few days ago.)

Farmer Ch E retired
Reply to  Farmer Ch E retired
February 26, 2019 1:31 pm

Hint to Steve Reddish – when you spew energy into water at 100C and 1-atm (like in a pan on your stove), you get steam at 100C with no temperature increase.

John F. Hultquist
February 25, 2019 3:17 pm

r^2 = 0.28

She (they) should try The Journal of Disappointing Results.

Donald Horne
Reply to  John F. Hultquist
February 25, 2019 8:06 pm

Maybe even JIR, Journal of Irreproducible Results.

unka
February 25, 2019 3:19 pm

Excellent example of an extreme unbeliever fighting the result he does not like. P-values and Bonferroni mumbo jumbo is his last refuge.

D. J. Hawkins
Reply to  unka
February 25, 2019 3:52 pm

@unka
Well, that was certainly a cogent reply…NOT. Did you intend to offer something constructive, Snowflake?

unka
Reply to  D. J. Hawkins
February 25, 2019 5:58 pm

Willis does not know what he is talking about. You have N data (X,Y) and you can always calculate a correlation. You can check some outliers or check the hypothesis of whether the relationship is linear or not, but beyond that there is no point in talking about p-values, and on top of it, bringing up Bonferroni is a manipulation by a dishonest and/or ignorant person.

Greg
Reply to  unka
February 27, 2019 12:12 am

You spend all your effort attacking what you perceive to be Willis’ motivations without making any scientific point.

If there is “no point in talking about p-values” maybe you should be criticising the authors, not Willis.

unka
Reply to  Greg
February 27, 2019 11:37 am

They did not talk about p-values.

February 25, 2019 3:32 pm

@Willis

The Bonferroni Correction is needed when you’ve looked in more than one place or looked more than one time for something unusual.

I’m no expert on the Bonferroni Correction, having just read about it here, but the Wikipedia description of it is somewhat different from yours:

The Bonferroni correction is named after Italian mathematician Carlo Emilio Bonferroni for its use of Bonferroni inequalities.[1] Its development is often credited to Olive Jean Dunn, who described the procedure’s application to confidence intervals.[2][3]

Statistical hypothesis testing is based on rejecting the null hypothesis if the likelihood of the observed data under the null hypotheses is low. If multiple hypotheses are tested, the chance of a rare event increases, and therefore, the likelihood of incorrectly rejecting a null hypothesis (i.e., making a Type I error) increases.[4]

So it’s used for statistical testing, under _multiple hypotheses_, to prevent a false rejection of the null hypothesis.

What are the multiple hypotheses that you are testing? And what is the null hypothesis that you are trying to reject?

BTW, I tend to be skeptical of claims that solar activity has an effect on surface weather and climate, because there is simply no compelling evidence of it. Yet. I’m not saying the effects are zero, but they are probably very small, commensurate with the tiny 0.1% variation in TSI.

But I think in this case it may not be surprising to find a solar connection between sunspot counts and temperature, because this event is taking place in the stratosphere.

At the poles, the tropopause is very low (about 10 km), so this relationship is observed at 22 km, well into the stratosphere. Ultraviolet heating (ozone etc.) is a very well known phenomenon there and may have some variability dependent on solar activity.

And going higher, in the thermosphere (> 80 km), solar magnetic activity definitely affects temperature. It is a virtual proxy for sunspot activity.

D. J. Hawkins
Reply to  Johanus
February 25, 2019 3:59 pm

Each instance of the investigation constitutes a new hypothesis. Hence, Willis’ observation that there are 504 different combinations. Or 72, to be generous. There is an xkcd cartoon that illustrates it nicely:

https://xkcd.com/882/

Reply to  D. J. Hawkins
February 25, 2019 4:39 pm

“each instance of the investigation constitutes a new hypothesis”

I see it more as a “multivariate” regression rather than “multihypothesis” test. How would you formulate this as a statistical test? What is the null hypothesis? Exactly what are the hypotheses being tested?

As a regression it is simple. The dependent variable is temperature and the independent variables are QBO_polarity, sunspot_count, geopotential heights of effective layers, etc.

How does the existence of multiple variables destroy the temperature relationship being investigated? If a regression turns out to have predictive power/skill, who cares about its lack of Bonferronies?

Nylo
Reply to  Johanus
February 25, 2019 11:42 pm

Johanus,

What are the multiple hypotheses that you are testing? And what is the null hypothesis that you are trying to reject?

I will try to advance the likely responses by Willis. The multiple hypotheses would be that the Sunspots affect layer 1, or layer 2, or layer 3… or layer 7, in January, or in February, or in March… with this wind, or with this other wind, as he has already explained. Over 500 different hypotheses.
The Null hypothesis would be that Sunspots do NOT affect significantly any of the layers in any month with any wind.
In order to reasonably assume causation from what they have (a mere correlation), you need a strong correlation, which they have. But that would only be true if they had looked for it solely in February, in that layer and with that wind. That’s not what they did. They looked at more than 500 combinations, and if you look in many places, you need a much stronger correlation, because merely a strong correlation of p<0.05 actually has a good chance of occurring by pure luck in some of the places where you looked for it (more than 500 in this case). Rejecting the Null hypothesis means rejecting that the correlation can happen by pure chance. Therefore they have NOT rejected the Null hypothesis.
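
A quick sketch of Nylo's point in numbers (illustrative only, and it assumes the looks are independent, which they are not entirely): with roughly 500 looks, even a per-test p-value of 0.002 has a good chance of turning up somewhere by luck alone.

    # Chance of at least one "significant" hit across n_looks independent looks.
    n_looks = 504
    for p in (0.05, 0.002):
        print(p, 1 - (1 - p) ** n_looks)    # ~1.00 for p = 0.05, ~0.64 for p = 0.002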

Reply to  Nylo
February 26, 2019 2:56 am


“The Null hypothesis would be that Sunspots do NOT affect significantly any of the layers in any month with any wind.”

Thanks. This is indeed a complicated model with many layers and several additional “qualifications” of features. It even has a bit of “magic”: equatorial-polar teleconnection (‘action at a distance’), where events at the equator affect events at the poles.

So I think all of these dataset features just support a somewhat simpler hypothesis: solar activity affects polar stratospheric temperatures.

That is, we are trying to predict a temperature, given a set of qualified features, i.e. multivariate regression. It is also a search problem. Willis refers to it pejoratively with the term “data dredging”. I prefer to call it “data mining”. We are trying to optimize the qualified feature set to minimize the regression error.

Simply rejecting the hypothesis, because it doesn’t have enough “Bonferronies”, is throwing the baby out with the bath water.

There seems to be a very interesting correlation of solar activity to temperature, which I can see easily, but roughly, in Willis’ plot. The feature set needs to be refined to strengthen that correlation, and learn more about the underlying physics, which I suspect might be heating caused by scattering of far ultraviolet (i.e. the UV normally blocked by the troposphere).

Yes, I’m waving my hands a bit here. But that’s how science often happens.

Krishna Gans
February 25, 2019 3:34 pm
Reply to  Krishna Gans
February 25, 2019 4:06 pm

Well, the TCI seems to correlate well with SC24. And the physics make sense too, because high sunspot activity is correlated to enhanced EUV irradiance, which has no effect in the troposphere because it is all absorbed in the thermosphere.

So, in the thermosphere, the enhanced EUV does indeed create hotter “weather”. But this heating has zero effect on the troposphere below.

Krishna Gans
Reply to  Johanus
February 26, 2019 5:23 am

Low sun activity and a reduced height of the thermosphere have an impact on the polar circulation, with all the consequences we see and feel, e.g. this winter in the NH.

David S
February 25, 2019 3:43 pm

Draw a straight line through the data points when the data points look like a shotgun blast.

Reply to  David S
February 25, 2019 4:10 pm

If you remove the outliers at -50C, the remaining points seem to correlate much better to the trend line, such that we can infer lower temperatures at lower sunspot counts, higher temp at higher activity, etc.

Donald Kasper
Reply to  Johanus
February 25, 2019 4:28 pm

There simply is no sunspot to temperature correlation. Use the Central England temperature record back to 1850, take the monthly mean to monthly mean sunspots and graph it. Total blob. With fewer data points, correlations can look better, but several hundred Central England plots show the reality. It is long past due to debunk the myth of correlating global or local temperature records to sunspot count.

Reply to  Donald Kasper
February 25, 2019 4:44 pm

Donald, I totally agree, at the surface of the Earth.

But in the thermosphere the correlation is remarkably high, because the air heats up as a result of EUV scatter.

Does this kind of heating occur also in the upper stratosphere, maybe due to UV or UV-C? Maybe. Worth investigating.

Does it affect tropospheric weather? Probably not, but I would shut the book on it.

Reply to  Johanus
February 25, 2019 5:02 pm

… would _not_ shut the book on it. 😐

Samuel C Cogar
Reply to  Donald Kasper
February 26, 2019 8:21 am

Are not the “optics” in use nowadays (2019) for viewing/counting “sunspot” numbers ….. far, far better than they were during the 100 years post-1850?

And how much better is today’s (2019) “optics” than in yesteryears when sunspots were first observed and recorded in China like 3,700 years ago, …… during the Shang Dynasty (~1700 BC to ~1027 BC)?

Does it make “logical” science sense to append the Historical Sunspot Numbers Records ….. to ….. the modern or Late 20th Century / Early 21st Century Sunspot Numbers Record?

Sure it does, ….. as long as one doesn’t try to “squeeze” a few science fiction “truths” out of the composite Record.

Reply to  Samuel C Cogar
February 26, 2019 9:09 am

It’s my understanding that the same telescopes that Wolf used are still in use today in order not to bias towards the modern era. Leif has posted here about that several times.

meiggs
February 25, 2019 4:00 pm

Heck, last night the “Science Channel” was saying we’ve got to bring woolly mammoths back so they can knock down all the trees in the Arctic to save the permafrost and, of course, the planet……..proof positive it’s getting too cool too fast……….

https://corporate.discovery.com/discovery-newsroom/science-channel-goes-in-search-of-woolly-mammoths-and-other-creatures-frozen-in-time-in-lost-beasts-of-the-ice-age-premiering-sunday-february-24-at-8pm/

Dr. Tori Herridge, Palaeontologist of the Natural History Museum, continued: “The quest to understand the extinction of so many large animals at the end of the last ice age – and whether humans, or climate change, or both, were responsible – has never felt so important in a world where wildlife is under increasing threat.”

Professor George Church, Geneticist, Harvard Medical School, talking about his plan to use genetic engineering to bring mammoths back to Siberia, said: “The project really feels like it’s leaping forward. We didn’t expect so many high-quality specimens. It’s just very exciting.”

ferd berple
February 25, 2019 4:06 pm

p-value of 0.05
=========
No science or business should ever accept such a weak result.

How much would you trust an airplane where 1 in 20 parts may be faulty?

So why trust a scientific paper with such a high probability of error? Peer review doesn’t improve the p value.

This may well explain the lack of progress in climate science. The foundation is riddled with false facts let in by a weak p value criterion.

Steven Mosher
Reply to  ferd berple
February 26, 2019 12:24 am

“No science or business should ever accept such a weak result.”

That’s funny.

Car radiators are known to fail in extremely hot weather.
Suppose you run an auto store and you want to know if you should increase your stock
of Corvette radiators for the next two weeks.

I tell you:

1. Most years you consume 5 Corvette radiators in these two weeks of the summer.
2. Looking at a 14-day forecast, I’m 70% confident that number will double.

Do you require 95% confidence to increase your inventory?

nope.

You should go out and try to make a living.

Bob boder
Reply to  Steven Mosher
February 26, 2019 10:04 am

“1. Most years you consume 5 Corvette radiators in these two weeks of the summer.
2. Looking at a 14-day forecast, I’m 70% confident that number will double.

Do you require 95% confidence to increase your inventory?”

And then none blow for the rest of the year, and you carry more than one in inventory for 52 weeks, soaking up capital you need to run your business for the rest of the year. Sorry Steven, that’s not how inventory management works. The magic eight ball says try again later.

Dave Fair
Reply to  Steven Mosher
February 26, 2019 12:02 pm

The number of unsupported assumptions in your missive is impressive, Mr. Mosher! You qualify for a career in CliSci, but not a real business.

BTW, whatever happened to your foray into hustling Chinese bitcoin mining hardware?

Dave Fair
Reply to  Steven Mosher
February 26, 2019 12:05 pm

P.S. Wandering In The Weeds is much easier than running a business. I know; I have relevant experience.

Reply to  Steven Mosher
February 26, 2019 5:47 pm

Gotta go with Mr. Mosher on this. ANY individual time or place is one of an infinite number of times or places you could have looked. Using this bonfireoni standard would invalidate ANY correlation, ever.

Take the correlation: the warmest part of the day is 4 in the afternoon. Not a perfect correlation, but good enough to validate some truth. By this bonfire standard the statistical significance is vacated because you didn’t check the moon, Jupiter, etc.

ferd berple
February 25, 2019 4:23 pm

“p-value < 0.05”
=========/
With thousands of climate researchers looking in hundreds of thousands of places for something unusual, and then only publishing where the results are positive, the Bonferroni Correction tells us that hundreds of published studies are likely false positives.

Say for instance that only 1 in 20 studies on X get published. That means as many as 19 other people were also looking for X. And the p value published may not be significant at all. But we cannot tell because we don't know how many studies were not published.

Crispin in Waterloo
Reply to  ferd berple
February 26, 2019 8:42 am

Ferd

Getting a p value of more than value X does not mean it is a false positive. It means it is a less confident “Yes”. There are no absolute “Yesses”. Every Yes is accompanied by an uncertainty.

Choosing a conventional <1/20 chance of a false positive is just that: a convention. Nothing more.

One can say that a result failed to meet a 99.99% confidence test, but it might meet a 99.9% confidence test. It is not “wrong” because it doesn’t get a fourth nine. It is certain to one part in a thousand, but not to one part in ten thousand. So what?

No one can demonstrate mathematically that they know the global average temperature to within one degree C with 99% confidence. The propagation of uncertainties doesn't permit that sort of claim. Yet people make claims to know it to 0.01 degrees with 95% confidence. Impossible. They know no such thing.

It is over such issues that "climate experts" lose credibility. They literally don't know what they are talking about. They don't know enough to know that such a claim is impossible, based on the measurements available.

R2DToo
Reply to  Crispin in Waterloo
February 26, 2019 11:59 am

You have that right Crispin. Most disciplines have an established conventional significance level that is accepted by practitioners. This ranges from P<0.10 for some social sciences, to p<0.001 for some physical sciences. I worked in a small science faculty with 8 departments and +/- 50 faculty members. The chemists and physicists did most research under lab controlled conditions and demanded stringent p-values. Those of us who worked outside the lab (literally outside=the field) could only dream of collecting data/measurements with that kind of accuracy, and p<0.05 was standard. There was a period of time when convention called for reporting the actual probability rather than an arbitrary cutoff value. It is all about the nature of the beast. The "voted on" values in IPCC reports are a bad joke. "I think what we have is important, with very high confidence" has no place in science.

Donald Kasper
February 25, 2019 4:24 pm

A correlation coefficient of 0.28 is called random noise. Sunspots since 1850 plotted against Central England monthly temperature give a blob of total noise. There is no sunspot-to-temperature correlation. There is somewhat of a pyramid, as with a lower sunspot count there is more variability, but this may be an artifact of few high-sunspot months compared to low-sunspot months.

Donald Kasper
Reply to  Donald Kasper
February 25, 2019 4:26 pm

No correlation with a correlation coefficient under 0.7 is worth publishing.

ferd berple
Reply to  Donald Kasper
February 25, 2019 4:35 pm

Don’t agree. When negative results are not published this leaves a vacuum for false positives.

Right now there is an epidemic of false positives in science because negative results aren’t published.

Jurgen
Reply to  ferd berple
February 26, 2019 9:13 am

Ferd Berple: “Right now there is an epidemic of false positives in science because negative results aren’t published.”

Not published even within the scientific community itself? That would be detrimental indeed. Or do you rather mean “false positives about science in qualified science journals and the popular media”?

Jurgen
Reply to  ferd berple
March 1, 2019 6:57 am

Hmmm… still wondering… how can you be sure about something that isn’t “published”…

Reply to  Donald Kasper
February 26, 2019 3:16 am

No, the accepted interpretation of r^2 = 0.28 would be that the sunspot number accounts for 28% of the variation in temperature.

ferd berple
February 25, 2019 4:29 pm

Correction says you need to divide the desired p-value (0.05) by the number of places that you’ve looked.
≠=========
With thousands of researchers looking in hundreds of thousands of places it could well be that none of the published findings in climate science are significant.

Clyde Spencer
Reply to  ferd berple
February 25, 2019 5:29 pm

ferd berple
You’ve touched on something that I was going to challenge Willis with. Assume that you have done sampling with very fine temporal and spatial resolution in the collection of data, and used all of your data. At first blush, it would appear that one has done a good job. Yet, as I understand Willis’ claim, all that data means dividing the necessary p-value by a very large number, which as it approaches infinity as a limit, means that the required p-value approaches zero. Detailed sampling seems to be counter productive in that the required p-value behaves like a mirage on the horizon, always receding as you approach it!

There is another aspect to the conundrum. What if one’s budget was very tight, and the researcher could only afford to examine one altitude, and they got lucky and picked the altitude where a p-value was smaller than 5%. Should that be accepted as a correlation with statistical significance, even though the sampling was less than thorough? After all, the researcher only looked in one place so there is no requirement to reduce the p-value requirement.

Something doesn’t seem right here.

Dave Fair
Reply to  Clyde Spencer
February 25, 2019 6:08 pm

You’re right, Clyde. Only looking at one or a few items leads to lack of understanding of the whole. One need look at the whole experiment/study to determine its validity.

ferd berple
Reply to  Clyde Spencer
February 25, 2019 8:42 pm

Something doesn’t seem right here.
=≠========
You are confusing the number of data points with the number of times a study is performed.

More data points in a study yields smaller statistical error in that study. More times a study is performed, the less significant each occurrence.

The problem comes when a study is performed 100 times and you throw away the 99 that don’t fit the hypothesis and publish the one that does fit.

The 99 you didn’t publish makes the one you did publish roughly 99 times less likely to be correct.

But since the 99 that were never published remain hidden no one can tell that the one you did publish is likely garbage.

And with the publishing bias against negative results it is possible and even likely that p values across science are a load of rubbish except the first time a study is performed.

noaaprogrammer
Reply to  ferd berple
February 25, 2019 10:03 pm

It would be interesting to apply this to all the versions of code that a climate modeler goes through before settling on the version that he deems best.

Clyde Spencer
Reply to  ferd berple
February 26, 2019 9:52 am

ferd berple
You said, “You are confusing the number of data points with the number of times a study is performed.” However, what Willis said was, “How many different places have they looked for a relationship? Well, they’ve chosen the temperature of one of twelve months, in one of seven atmospheric levels, with one of three sunspot lag possibilities (0, 1, or 2 months lag), and one of two equatorial stratospheric wind conditions.”

Let’s take a situation where we sample data from a phenomena or location with no expectation of there being a correlation. Do we tighten the p-value requirement because we have looked at places or phenomena that do not plausibly have a a relationship, spurious correlations? This is beginning to sound a little like the Heisenberg Uncertainty Principle where the act of observation impacts the result.

David L. Hagen
February 25, 2019 5:05 pm

Willis
Thanks for your mathematical explorations and explanation.
PS Caution: Statistician WM Briggs strongly argues against using p values.
Briggs, William M., 2019. Everything Wrong with P-Values Under One Roof. In Beyond Traditional Probabilistic Methods in Economics, V Kreinovich, NN Thach, ND Trung, DV Thanh (eds.), pp 22–44. DOI 10.1007/978-3-030-04200-4_2
Abstract

P-values should not be used. They have no justification under frequentist theory; they are pure acts of will. Arguments justifying p-values are fallacious. P-values are not used to make all decisions about a model, where in some cases judgment overrules p-values. There is no justification for this in frequentist theory. Hypothesis testing cannot identify cause. Models based on p-values are almost never verified against reality. P-values are never unique. They cause models to appear more real than reality. They lead to magical or ritualized thinking. They do not allow the proper use of decision making. And when p-values seem to work, they do so because they serve as loose proxies for predictive probabilities, which are proposed as the replacement for p-values.

Another Proof Against P-Value Reasoning

Think of it this way: you begin by declaring “The null is true!”; therefore, it becomes almost impossible to move from that declaration to concluding it is false.

ferd berple
Reply to  David L. Hagen
February 25, 2019 9:28 pm

WM Briggs strongly argues against using p values
=======
I strongly agree because the number of times a study has been performed cannot be reliably known so you cannot apply the Bonferroni Correction and thus cannot trust the p value.

Bart Tali
February 25, 2019 5:23 pm

Fewer sunspots would mean more cosmic rays, and with Svensmark’s theory, that means more cloud formation. I would expect clouds to have some effect on stratospheric temperature.

So, though the statistics may not be good enough from Labitzke, I would not be surprised that with some more work, the theoretical relationship is borne out experimentally.
