Guest Post by Willis Eschenbach
OK, quick gambler’s question. Suppose I flip seven coins in the air at once and all seven come up heads. Are the coins loaded?
Near as I can tell, statistics was invented by gamblers to answer this type of question. The seven coins are independent events. If they are not loaded, the chance of heads on any one coin is fifty percent. The odds of seven heads is the product of the individual odds, or one-half to the seventh power. This is 1/128, less than 1%, less than one chance in a hundred that this is just a random result. Possible but not very likely. As a man who is not averse to a wager, I’d say it’s a pretty good bet the coins were loaded.
However, suppose we take the same seven coins, and we flip all seven of them not once, but ten times. Now what are our odds that seven heads show up in one of those ten flips?
Well, without running any numbers we can immediately see that the more seven-coin-flip trials we have, the better the chances are that seven heads will show up. I append the calculations below, but for the present just note that if we do the seven-coin-flip as few as ten times, the odds of finding seven heads by pure chance go up from less than 1% (a statistically significant result at the 99% significance level) to 7.5% (not statistically unusual in the slightest).
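For anyone who wants to check those numbers, here is a minimal Python sketch that reproduces both figures:

```python
# Probability of seven heads on a single throw of seven fair coins,
# and of at least one all-heads throw somewhere in ten repeated trials.
p_seven_heads = 0.5 ** 7                                 # 1/128, a bit under 0.8%
p_at_least_once_in_10 = 1 - (1 - p_seven_heads) ** 10    # about 7.5%

print(f"one trial:  {p_seven_heads:.4f}")
print(f"ten trials: {p_at_least_once_in_10:.4f}")
```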
So in short, the more places you look, the more likely you are to find rarities, and thus the less significant they become. The practical effect of this is that you need to adjust your significance level for the number of trials. If the significance level is 95%, as is common in climate science, then with five trials you need to find something significant at the 99% level before you can call the result unusual. Here’s a quick table that relates number of trials to significance level, if you are looking for the equivalent of a single-trial significance level of 95%:
Trials, Required Significance Level
1, 95.0%
2, 97.5%
3, 98.3%
4, 98.7%
5, 99.0%
6, 99.1%
7, 99.3%
8, 99.4%
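The table comes straight from the repeated-trials formula given in the appendix below (0.95^(1/N)); a short loop regenerates it:

```python
# Per-trial significance level needed to keep the overall chance of a
# false positive across N independent trials at 5%, i.e. 0.95^(1/N).
for n in range(1, 9):
    print(n, f"{100 * 0.95 ** (1 / n):.1f}%")
```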
Now, with that as prologue, following my interest in things albedic I went to examine a study entitled Spring–summer albedo variations of Antarctic sea ice from 1982 to 2009:
ABSTRACT: This study examined the spring–summer (November, December, January and February) albedo averages and trends using a dataset consisting of 28 years of homogenized satellite data for the entire Antarctic sea ice region and for five longitudinal sectors around Antarctica: the Weddell Sea (WS), the Indian Ocean sector (IO), the Pacific Ocean sector (PO), the Ross Sea (RS) and the Bellingshausen–Amundsen Sea (BS).
Remember, the more places you look, the more likely you are to find rarities … so how many places are they looking?
Well, to start with, they’ve obviously split the dataset into five parts. So that’s five places they’re looking. Already, to claim 95% significance we need to find 99% significance.
However, they are also only looking at a part of the year. How much of the year? Well, most of the ice is north of 70°S, so it will get measurable sun eight months or so out of the year. That means they’re using half the yearly albedo data. The four months they picked are the four when the sun is highest, so it makes sense … but still, they are discarding data, and that affects the number of trials.
In any case, even if we completely set aside the question of how much the year has been subdivided, we know that the map itself is subdivided into five parts. That means that to be significant at 95%, you need to find one of them that is significant at 99%.
However, in fact they did find that the albedo in one of the five ice areas (the Pacific Ocean sector) has a trend that is significant at the 99% level, and another (the Bellingshausen-Amundsen sector) is significant at the 95% level. And these would be interesting and valuable findings … except for another problem. This is the issue of autocorrelation.
“Autocorrelation” is how similar the present is to the past. If the temperature can be -40°C one day and 30°C the next day, that would indicate very little autocorrelation. But if (as is usually the case) a -40°C day is likely to be followed by another very cold day, that would mean a lot of autocorrelation. And climate variables in general tend to be autocorrelated, often highly so.
Now, one oddity of autocorrelated datasets is that they tend to be “trendy”. You are more likely to find a trend in autocorrelated datasets than in perfectly random datasets. In fact there was an article in the journals not long ago entitled Nature’s Style: Naturally Trendy. (I said “not long ago” but when I looked it was 2005 … carpe diem indeed.) Many people seem to have understood that concept of natural trendiness; the paper was widely discussed at the time.
What seems to have been less well understood is the following corollary:
Since nature is naturally trendy, finding a trend in observational datasets is less significant than it seems.
In this case, I digitized the trends. Their two “significant” trends, in the Bellingshausen–Amundsen Sea (BS) at 95% and the Pacific Ocean sector (PO) at 99%, were as advertised and matched my calculations. Unfortunately, I also found that, as I suspected, they had indeed ignored autocorrelation.
Part of the reason that the autocorrelation is so important in this particular case is that we’re only starting with 27 annual data points. As a result, we’re starting with large uncertainties due to small sample size. The effect of autocorrelation is to reduce that already inadequate sample size, so the effective N is quite small. The effective N for the Bellingshausen–Amundsen Sea sector (BS) is 19, and the effective N for the Pacific Ocean sector (PO) is only 8. Once autocorrelation is taken into account both of the trends were not statistically significant at all, as both were down around the 90% significance level.
Combining the effect of autocorrelation with the effect of repeated trials means that in fact not one of their reported trends in “spring–summer albedo variations” is statistically significant, nor even close to being significant.
Conclusions? Well, I’d have to say that in climate science we’ve got to up our statistical game. I’m no expert statistician, far from it. For that you want someone like Matt Briggs, Statistician to the Stars. In fact, I’ve never taken even one statistics class ever. I’m totally self-taught.
So if I know a bit about the effects of subdividing a dataset on significance levels, and the effects of autocorrelation on trends, how come these guys don’t? To be clear, I don’t think they’re doing it on purpose. I think this was just an honest mistake on their part; they simply didn’t realize the effect of their actions. But dang, seeing climate scientists making these same two mistakes over and over and over is getting boring.
To close on a much more positive note, I read that Science magazine is setting up a panel of statisticians to read the submissions in order to “help avoid honest mistakes and raise the standards for data analysis”.
Can’t say fairer than that.
In any case, the sun has just come out after a foggy, overcast morning. Here’s what my front yard looks like today …
The redwood tree is native here, the nopal cactus not so much … I wish just such sunny skies for you all.
Except those needing rain, of course …
w.
AS ALWAYS: If you disagree with something I or someone else said, please quote their exact words that you disagree with. That way we can all understand the exact nature of what you find objectionable.
REPEATED TRIALS: The actual calculation of how much better the odds are with repeated trials is done by taking advantage of the fact that if the odds of something happening are X, say 1/128 in the case of flipping seven heads, the odds of it NOT happening are 1-X, which is 1 – 1/128, or 127/128. It turns out that the odds of it NOT happening in N trials is
(1-X)^N
or (127/128)^N. For N = 10 flips of seven coins, this gives the odds of NOT getting seven heads as (127/128)^10, or 92.5%. This means that the odds of finding seven heads in ten flips is one minus the odds of it not happening, or about 7.5%.
Similarly, if we are looking for the equivalent of a 95% confidence level in repeated trials, the required confidence level for N repeated trials is
0.95^(1/N)
AUTOCORRELATION AND TRENDS: I usually use the method of Nychka, which utilizes an “effective N”, a reduced number of degrees of freedom for calculating statistical significance:
neff = n × (1 – r) / (1 + r)
where n is the number of data points, r is the lag-1 autocorrelation, and neff is the effective n.
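For anyone who wants to reproduce that adjustment, here is a minimal Python sketch of the lag-1 effective-N calculation; the helper function and the example data are illustrative only, and the AR(1) parameter is made up rather than taken from the paper:

```python
import numpy as np

def effective_n(series):
    """Lag-1 effective sample size: n_eff = n * (1 - r) / (1 + r),
    where r is the lag-1 autocorrelation of the detrended series."""
    x = np.asarray(series, dtype=float)
    t = np.arange(len(x))
    detrended = x - np.polyval(np.polyfit(t, x, 1), t)    # remove the linear trend first
    r = np.corrcoef(detrended[:-1], detrended[1:])[0, 1]  # lag-1 autocorrelation
    return len(x) * (1 - r) / (1 + r)

# Example with synthetic AR(1) pseudo-data (phi = 0.6, a made-up value):
# the effective N comes out well below the raw N of 28.
rng = np.random.default_rng(0)
x = np.zeros(28)
for i in range(1, 28):
    x[i] = 0.6 * x[i - 1] + rng.normal()
print(28, round(effective_n(x), 1))
```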
However, if it were mission-critical, rather than using Nychka’s heuristic method I’d likely use a Monte Carlo method. I’d generate say 100,000 instances of ARMA model (auto-regressive moving-average model) pseudo-data which matched well with the statistics of the actual data, and I’d investigate the distribution of trends in that dataset.
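Here is a rough sketch of that Monte Carlo idea, using a plain AR(1) process rather than a fitted ARMA model, and made-up parameters rather than statistics matched to the actual albedo data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed illustrative values -- not the paper's statistics:
n, phi, sigma = 28, 0.5, 1.0   # series length, lag-1 autocorrelation, noise sd
n_sims = 20_000                # fewer than the 100,000 mentioned above, just to keep it quick

t = np.arange(n)
trends = np.empty(n_sims)
for k in range(n_sims):
    x = np.zeros(n)
    eps = rng.normal(scale=sigma, size=n)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + eps[i]   # AR(1) pseudo-data with no underlying trend
    trends[k] = np.polyfit(t, x, 1)[0]   # OLS slope fitted to the trendless noise

# The middle 95% of these slopes is the range of trends that pure autocorrelated noise
# produces; an observed trend is only "significant" if it falls outside that range.
print(np.percentile(trends, [2.5, 97.5]))
```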
[UPDATE] I found a better way to calculate effective N. See below.
The Famous Monty Hall Problem
As the finalist on a game show, you’re given the choice of three doors: Behind one door is a car. Behind the others, some goats (which you don’t want). If you choose the door with the car behind it, you get the car.
You pick one of the doors. At this point the host, Monty Hall, goes to one of the other doors and opens it to reveal a goat.
He then gives you a last chance to change your mind and switch your choice to the other closed door.
Should you change your mind?
a) It’s now 50-50, so it doesn’t matter
b) Yes, by switching you greatly increase your chance of winning the car
c) You should stick with your first choice.
Surprisingly, the answer is b).
How so Mike?
Fascinating:
https://en.wikipedia.org/wiki/Monty_Hall_problem
The Monty Hall problem as stated by MikeB is another example of the Bertrand Paradox (see my comment above). In this case the statement of the problem doesn’t include the “method the host uses to select the door he/she chooses to open.” Most readers make the eminently reasonable assumptions (a) that the host knows which door the car is behind, and (b) that the host uses that knowledge to open a door with a goat. [Note: For the three-door problem–one and only one door leads to the car and two doors lead to a goat–if the host knows which door leads to the car, then independent of the door the contestant chooses, at least one of the remaining two doors will lead to a goat.] However, neither assumption is explicitly stated.
If the host possesses knowledge of the car’s location and he uses that knowledge to always open a door that leads to a goat, then I agree the contestant should switch, because by switching he/she increases the probability of getting the car. However, if either (a) the host has no knowledge of which door contains the car, or (b) the host possesses that knowledge but chooses a door to open by flipping an unbiased coin (i.e., a coin that has a 50% probability of landing heads-up and a 50% probability of landing tails-up) and selecting a door based on the result of the coin flip, then I believe there is no benefit to switching–i.e., keeping the original choice and switching to the unopened door are equally likely to win the contestant the car. The reason being that, using knowledge of which door leads to the car, the host can always open a door with a goat; but, lacking such knowledge or using the coin flip to choose which door to open, the host will open the door that leads to the car 1/3 of the time. This changes the likelihood that by switching doors the contestant improves his chances of winning the car.
For this case the probability is now similar to two contestants each choosing a door (not the same door), opening the third door, and asking the contestants if they want to switch doors with each other. One-third of the time, the opened door (i.e., the door not chosen by either contestant) will contain the car, in which case switching or not switching will produce the same result–i.e., both contestants get a goat. Two-thirds of the time, the opened door will contain a goat, in which case the probability of each contestant getting the car is 1/2, and switching gains one contestant the car but costs the other contestant the car. Since, as far as the likelihood of winning the car goes, the two contestants are identical, if it’s advantageous for one contestant to switch, it must be advantageous for the other contestant to switch–which is impossible.
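A quick Monte Carlo sketch of the two host behaviours Reed describes, a host who knowingly opens a goat door versus a host who opens one of the remaining doors at random (with the accidental car reveals discarded), shows the difference:

```python
import random

def simulate(host_knows, n_games=100_000):
    """Return P(win | switch) and P(win | stay), conditioned on the host showing a goat."""
    switch_wins = stay_wins = games_counted = 0
    for _ in range(n_games):
        car = random.randrange(3)
        pick = random.randrange(3)
        others = [d for d in range(3) if d != pick]
        if host_knows:
            opened = next(d for d in others if d != car)   # host always shows a goat
        else:
            opened = random.choice(others)                 # host opens blindly
            if opened == car:
                continue                                   # car revealed: game doesn't count
        games_counted += 1
        stay_wins += (pick == car)
        switch_pick = next(d for d in range(3) if d not in (pick, opened))
        switch_wins += (switch_pick == car)
    return switch_wins / games_counted, stay_wins / games_counted

print("host knows:   switch %.3f  stay %.3f" % simulate(True))   # ~0.667 vs ~0.333
print("host guesses: switch %.3f  stay %.3f" % simulate(False))  # ~0.500 vs ~0.500
```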
Yes, the discussion in the Wikipedia entry goes into some of these details, and both wikis contain a link to the Bertrand problem.
I was taken as well by the possible psychological reasons why people would not switch, as I was by the possibility of cognitive overload making an assessment of the odds difficult to compute on the spot.
Ex: People hate it more if they switch and are wrong than if they stick with the door they “own” already and are wrong.
In the latter case, one might feel more like they gave the car away after winning it to begin with, so to speak.
But to the specifics of the unspoken rules…I think most people, and especially those old enough to remember the show or how game shows in general work, will intuitively know that Monty knows where the car is, and that he will only open a door with a goat, never one with the car.
But figuring the actual odds out while standing there seems to be almost impossible, psychologically speaking.
Note that in some studies, only 13% of people switched. About the number that believe Elvis is still alive.
And note too, that in the original Marilyn vos Savant column, thousands of people, including PhD math teachers, excoriated her in writing for being so wrong…which of course she was not!
[BTW, what are the odds that the smartest person in the world would happen to have the last name of “Savant”? That seems more unlikely than the library detective named “Bookman” on Seinfeld! 😉 ]
I was pondering how these same psychological factors might play a large role in cognitive bias in general, and also considered how these might relate to the reluctance of warmistas to consider contrary evidence.
Of course, skeptics are immune to such bias by our more logical, even tempered and methodical nature.
*insert/smirk*
Reed,
You have an interesting take on the problem, but you must address the key fact, namely that Monty NEVER opens the door that reveals the car. The reasonable assumption is that Monty knows where the car is and this knowledge guides his choice; this is scenario 1. They told the contestant the truth: the car is there. But suppose, as you do, that Monty, like the contestant, does not know where the car is. Then the only way for Monty to consistently never reveal the car is that there is no car when he makes his choice. They lied about the car being there; this is scenario 2. In this scenario the game is also rigged, but in a different way.
In scenario 2 Monty’s choice does not matter: all 3 doors conceal a goat, and the car is only placed into the game after he has randomly opened one of the two doors available to him. We now have the interesting question of how the game is fixed. Is it fixed by knowledge (scenario 1) or by subterfuge (scenario 2)? If we have the results of a large number of games and assume a consistent scenario is being used for all games played, then we should see for scenario 1 (the truth scenario) a benefit in switching, while for scenario 2, if the car is introduced randomly, the contestant’s choice collapses to 50/50 as you suggest.
Phillip,
I agree with everything you wrote. In my original comment, maybe I should have written “overwhelmingly reasonable” instead of “eminently reasonable.” I’m old enough to have watched the original show (on television, not in person) with Monty Hall as the host. Furthermore, in all the times I watched the show, Monty always opened a door with a booby prize (goat). Having watched the show many times, if given the chance to switch doors, I, too, would switch. Thus, in light of my observations, I’d give odds way better than even money that your scenario 1 case represents reality. It’s only when I dissected the statement of the problem in light of the Bertrand Paradox that it occurred to me that the problem as stated in this thread (and in Marilyn vos Savant’s original write up in Parade magazine) did not have a unique solution because the answer depends on conditions not explicitly stated.
For what it’s worth: as mentioned by Menicholas (June 28, 2015 at 5:32 pm), I, too, was surprised by the vituperative tone of the letters written to her. As I recall, one person who excoriated her wrote a second letter admitting he was wrong and apologizing for his initial letter. Among the important life lessons I have learned are: (1) don’t be absolutely positive about anything, and (2) when addressing probability and/or relativity problems, don’t trust your intuition. At the time Ms. vos Savant first published the problem in Parade magazine, I wrote a letter to her describing my concerns. I received no response–either in Parade or in personal correspondence.
Probability of a climate “scientist” finding a statistically significant result is 1.0000
…if the result supports the consensus.
You can’t tell if 7 coins are loaded or not by tossing all of them once. 7 heads is as rare as the first 3 coins coming up heads and the last 4 coming up tails. Bad experimental design, W.
Hi Willis,
There isn’t any certainty about the best way to test multiple hypotheses, although the Bonferroni correction of taking some statistical test for significance and dividing the margin by m (for m hypotheses) is as reasonable as any. The problem arises when considering non-independent hypotheses, which in the case at hand involves spatial as well as temporal correlation. A second but nontrivial problem arises in precisely the context you refer to in your example — in the words of the late, great George Marsaglia, “p happens”. Precisely the same problem that requires one to use Bonferroni (at least) to assess significance when testing multiple hypotheses means that you have to take the rejection of the null hypothesis at the level of p = 0.05 with a very distinct grain of salt. After all, this happens one in twenty times by pure chance. If you had twenty perfectly fair coins or tested one perfectly fair coin twenty times, you would have a very good chance of observing a trial that fails at the p = 0.05 level. The difficulty is that this trial can happen first, last, or in the middle, and rejecting the coin as being fair on its basis would be a false positive.
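For concreteness, a small sketch comparing the Bonferroni per-test threshold with the exact independent-trials (Šidák) value used in the head post, for m = 5 sectors:

```python
# Per-test significance threshold needed to keep the family-wise error rate at 5%
# across m tests: Bonferroni uses alpha/m; the exact independent-trials value is
# 1 - 0.95**(1/m), the same correction as the table in the head post.
alpha, m = 0.05, 5
print("Bonferroni per-test alpha:  ", alpha / m)             # 0.0100 -> 99.0% level
print("Exact (Sidak) per-test alpha:", 1 - 0.95 ** (1 / m))  # ~0.0102 -> ~99.0% level
```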
Since I wrote Dieharder (based on Marsaglia’s Diehard random number generator tester), which is basically nothing but a framework for testing the null hypothesis “this is a perfect random number generator” in test after test after test, I have had occasion to look at the issue of how p happens in considerably more detail. For example, p itself should, for a perfect random number generator used to generate some statistic, be distributed uniformly. One can test for uniformity itself given enough trials.
This is relevant to the Antarctica question in that if one tested the probability of some “extreme” against the null hypothesis for some statistic in five non-independent zones, and all five of them (say) produced p < 0.5, that is unlikely in and of itself at the level of 1/32, sufficient to fail a p = 0.05 test (which I consider nearly useless, but who can argue with tradition even when it is silly). Except that this is again confounded when one looks at spatial correlation and/or temporal correlation.
The fundamental problem is that the axioms of statistics generally require independent and identically distributed samples for most of the simple results based on the Central Limit Theorem to hold. When the samples are not independent, and are not identically distributed, and when the degree of dependence and multivariate distribution is not a priori known, doing the statistical analysis is just plain difficult.
This applies to this whole issue in so very many ways. I’ve recommended the papers by Koutsoyiannis repeatedly on WUWT, especially:
https://www.itia.ntua.gr/en/docinfo/673/
as a primer for analyzing climate timeseries and the enormous, serious difficulty with looking at some timeseries within some selected window and making complex inferences on the basis of some hypothesized functional trend (linear, quadratic, exponential, sinusoidal).
rgb
Robert, as always your contributions are both clear and valuable. The question of how to do corrections for lack of I.I.D. data is a vexing one. As I said, I try to use Monte Carlo methods, but the trick is the generation of appropriate pseudo-data … not easy even in the easy cases.
Thanks for the link to the Koutsoyiannis paper. I find him to be one of the few sane voices crying in a statistical wilderness …
My best regards to you,
w.
Re your Koutsoyiannis reference, a related issue is the allegation of nonstationarity in recent hydrologic processes caused by manmade global warming. The implication is that at some time in the Edenic past, climate and the resulting hydrologic processes were stationary. If in fact climate has always changed naturally, then there is no Edenic past with respect to hydrologic stationarity.
On the topic of nonstationarity and climate change, “Hirsch and Ryberg, 2012” was referenced in a 2013 flow-frequency statistics workshop I attended. An online check led to CO2 Science:
http://www.co2science.org/articles/V15/N34/C3.php
Reference
Hirsch, R.M. and Ryberg, K.R. 2012. Has the magnitude of floods across the USA changed with global CO2 levels? Hydrological Sciences Journal 57: 10.1080/02626667.2011.621895.
Link to article:
http://www.tandfonline.com/doi/pdf/10.1080/02626667.2011.621895
[begin excerpt]
What it means
In discussing the meaning of their findings, Hirsch and Ryberg state that “it may be that the greenhouse forcing is not yet sufficiently large to produce changes in flood behavior that rise above the ‘noise’ in the flood-producing processes.” On the other hand, it could mean that the “anticipated hydrological impacts” envisioned by the IPCC and others are simply incorrect.
[end excerpt]
Watershed development, dams, or diversions as well as natural climate change are causes of nonstationarity (e.g. CT Haan, “Statistical Methods in Hydrology”). For example, see Yevjevich (referenced by Koutsoyiannis):
https://books.google.com/books/about/Stochastic_processes_in_hydrology.html?id=uPJOAAAAMAAJ
Yevjevich provides an explanation and tests for stationarity. In the simplest application, the record is divided into two or more parts and analyzed independently. “In general, if the subseries parameters are confined within the 95 percent tolerance limits about the corresponding value of the parameter for the entire series, the process is inferred to be self-stationary.”
If climate is considered to have been naturally changing in the past and continues to naturally change in the present and future, then climatic nonstationarity has always existed and is embedded in the hydrologic record. Can the hydrologic record be dusted to reveal the fingerprint of man? A hydrologic record long enough to reveal a fingerprint would need to be longer than the present warming cycle and would stretch back through the previous cooling cycle.
I agree that the five zones need to be taken into consideration when looking at probabilities. However whether you are right in taking into account the fact that they only used half of the daylight months is more contentious. If they actually checked other options and selected four months because they gave the best results you would be right. But if they settled on the four months on the basis that they were the “brightest” and did not even look at the others then the other four daylight months are no more relevant than the four dark months. It is not a question of what other data exists – it is a question of what other data was analyzed and then discarded.
Auto-correlation is a tricky beast. It is true that auto-correlated data can give rise to spurious trends, but it is also true that trending data can give rise to spurious auto-correlation. To test this, create your own series by adding a straight-line trend to random white noise. Provided that the standard deviation of the noise is reasonable relative to the trend, you will find some spurious auto-correlation. To correct for this one should estimate and remove the trend before performing the auto-correlation calculation – you would of course adjust the degrees of freedom in the auto-correlation calculation accordingly. Maybe you did this; if not, you should have.
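Paul’s suggested experiment is easy to run; here is a minimal sketch (the trend and noise values are arbitrary) that adds a straight-line trend to white noise and compares the lag-1 autocorrelation before and after detrending:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100)
x = 0.05 * t + rng.normal(size=100)   # straight-line trend plus white noise

def lag1(series):
    s = series - series.mean()
    return np.corrcoef(s[:-1], s[1:])[0, 1]

detrended = x - np.polyval(np.polyfit(t, x, 1), t)
print("lag-1 r, raw:      ", round(lag1(x), 3))         # spuriously high, from the trend
print("lag-1 r, detrended:", round(lag1(detrended), 3))  # near zero, as white noise should be
```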
Thanks, Paul. Of course I removed the trend before estimating the autocorrelation, anything else introduces spurious autocorrelation into the equation.
w.
Not necessarily. But the coin tosser could be cheating on the toss.
How can you cheat at tossing a coin? Easy: catch it in the air at as close to the same height as it left the hand on the flip. All fair flips should be allowed to land on the surface beneath one’s feet.
Try it for yourself. I could, significantly more often than not, obtain the result I wanted by:
1. Turning the opposite face from the desired one up (and showing it).
2. Flipping the coin.
3. Catching the coin at as close as I could judge it to the height where it left my hand on the flip.
4. Turning the coin over onto the back of my hand.
Et voila.
The closer to the release point the catch is made, the less chance is involved. The coin rotates in the air. If there is the same number of rotations coming down as there were going up, then chance isn’t a factor. The starting face is known, the coin has rotated an even number of times, and the result is known.
Yes, it’s cheating. It doesn’t take much practice, either, if your kinesthetic senses are reasonable.
Please note: this is just from personal observation. I have never put it to a rigorous test. I will say, if flipping a fair coin to make a decision, always insist it is not caught but allowed to land.
“The odds of seven heads is the product of the individual odds, or one-half to the seventh power. This is 1/128, less than 1%, less than one chance in a hundred that this is just a random result. Possible but not very likely.”
I’m afraid this is nonsense. You confuse the a priori probability of throwing 7 heads in a row with an a posteriori assessment of its likelihood given the data, i.e. that it has occurred. Seven heads or even more in a row in a long series of throws are common events, and in this case, with only seven throws, there is no way to decide whether your dice is loaded or not. In fact, the series (of seven throws) is too short to see the difference between a fair and a fully loaded dice.
Agreed. The correct answer – and in many other instances – is “I do not know, because of insufficient data and an uncontrolled design”.
Ed Zuiderwijk June 28, 2015 at 2:44 am
Ed, I’m afraid that what you wrote is nonsense. There is no series of throws that is long enough to “see the difference between a fair and a fully loaded dice”. Any series of throws could have been thrown by either loaded or unloaded coins.
As a result, all that we can do is to look at the odds. The odds of throwing seven heads in one single toss of seven coins is indeed 1/128. You are free to claim that such a throw tells us nothing about whether the coins are loaded.
w.
Seven heads twice in a row would tell me more.
He’s trying to explain the Monty Hall problem to you dumb bunny.
Gambling fail, statistics fail 🙂 You absolutely cannot tell whether the coins are tampered with or not – not after tossing them once. If you tossed them ten times and they always came up the same way, ONLY THEN could you make that call. You don’t complain that the national lottery is rigged either, if you play it once for fun and happen to win 20 million, on the basis that it was highly unlikely that you would win, right? In the case of the national lottery, you’d even have to assume it is rigged after playing for your entire life – that’s how unlikely it is – but still, you don’t complain. Why is that? It’s called ‘luck’.
There is nothing special about your coins toss – it is only special because you have decided to assign significance to that particular 1 in a 128 outcome. There is nothing special about a full house, either. It is only special because we have assigned significance to that particular set of cards. If it weren’t in the rule book, you’d throw it straight out the window if I dealt you a full house…. (which wouldn’t even be known by that name in that case…)
I am a bit of a gambler myself, and lately, every week the results are out for this particular type of game, there is a poor, poor, soul that will take to the forum and bitch about how the game was rigged (because he didn’t win, of course) 🙂 Two things to note: WHY OH WHY does he come back every time like a true masochist if he either knows or thinks the game is rigged?! And: surely, if it should ever be his turn to win, that will be the week where the game wasn’t rigged for once, needless to say 😉 He will say adorable things, like how it was ‘predictable’ that whoever won would win — reaaally. So he could predict the winner, and hence knew it wouldn’t be him, yet he played anyway?! Seems more like hindsight to me… and a few other less charming things that could be said.
Agreed, one cannot conclude that the coins are loaded, but to see something like that on the first toss is in fact highly unusual and would be cause to raise suspicions. That was the way I took his illustration.
Matt June 28, 2015 at 2:47 am
Gambling fail, statistics fail indeed … except the fails are yours.
Neither one throw nor ten throws can let us “make that call” that the dice are loaded … but then I never said either one could. All that ten throws can tell us is the odds, which is all that one or a hundred throws can tell us. And while the odds do go up with ten throws, you are 100% wrong that one throw tells us nothing.
Please note that I don’t say that there are any firm conclusions, whether from one throw or ten. All we can do is check the odds and place the bet. I said:
So at the end of the day, I fear that the fails are indeed yours. First, you failed to note that even ten throws only establish odds, just like one throw does. Second, you failed to read what I’d actually said, so you were attacking a straw man. Third, you got all cute and snarky, which works fine when you are right. But when you are wrong, as in this case, it just makes you look foolish.
w.
Doesn’t it only work if you called it in advance? “The next 10 flips will come up heads.”
I have seen 16 heads thrown in two-up followed by 11 tails. Each result is completely independent of the previous ones. Statistics in nature are only valid on the day of printing and have little relevance to future events.
Ah, yes, I was thinking of “two up” when I described the seven coins tossed at once …
w.
Any outcome of coin tosses has the same probability. If the same outcome occurs over multiple throws, then you can conclude the coins were fixed.
I do not think so. Would you bet 7 heads against 3 heads and 4 tails?
There are more different ways to wind up with 3 heads and 4 tails.
If you specified ahead of time which coin would get heads and which tails, that would be closer to the same thing.
With a pair of dice, every face is equally likely to appear, but seven is the most likely roll by far…for this same reason. More ways to get it.
“Autocorrelation” makes me wonder if en route from the lab to the Guardian/BBC some pieces of Climate Science research go through AUTO-TUNING, i.e. filter out all the boring results and hype and emphasise the bits which are ‘on message’.
Of course the systematic exclusion of skeptical criticism is part of that autotuning.
Something interesting is happening with NH ice extent this year…
http://ocean.dmi.dk/arctic/old_icecover.uk.php
WOW, thanks for the link Eliza, NH ice extent doing something interesting indeed!
Ditto with Greenland ice balance. Slowest melt season on record.
Snow still on the ground in the capital city, with only a few weeks left in the melt season.
Yes, indeed, I watch this daily, and results are back well within 2 Std deviations. Note that this link is to the 30% extent metric. The 15% extent is also available on the same site, and is more widely used. However I like the 30% analysis, because I think it may be more predictive of the trend (guessing that 15% might be more affected by wind shift)
Taylor
according to the arctic sea ice forum it is all going to disappear next week .
I thought it was last week.
It looks like NOAA’s prediction of above average September minimum is on track:
http://origin.cpc.ncep.noaa.gov/products/people/wwang/cfsv2fcst/imagesInd3/sieMon.gif
Excellent post Willis! I took several probability and statistics classes in college and loved it when I was young. But after going to work as a petrophysicist (earth scientist) I was introduced to SAS (the Statistical Analysis System) and used it on a couple of early projects. I quickly found that it was so flexible and easy to use (abuse?) that you could get any answer you wanted and make it look real. I abandoned it and stochastic methods and never looked back. I’m strictly a deterministic guy to this day. You have put your finger on a very serious problem in science.
Use of probability and statistics is essential if one is to reach a logically valid conclusion in cases in which information needed for a deductive conclusion is missing. However, it is not enough. Both Bayesian and frequentist statistics have logical shortcomings with the result that we cannot rely on either of them by itself. To overcome the shortcomings one needs the help of an idea which, while 50 years old, has not yet penetrated the skulls of many researchers. This idea is information theoretic optimization: the induced generalization maximizes the entropy of each inference that is made by it or minimizes the entropy under constraints.
Excellent essay. Your point, “So in short, the more places you look, the more likely you are to find rarities, and thus the less significant they become.” is not unique but it is profound.
I think there is a corollary hidden in that quote someplace. Something to the effect that:
The rational mind does not lightly infer the universal from the particular; the faith based mind sees the unusual and concludes the universal…..
still needs work, but no coffee and very early…..
My new favorite example of the misuse of statistics involves chocolate:
link
A real study was conducted.
They didn’t cheat on the methodology to get a pre-determined result.
The data wasn’t cooked.
They used standard formulas to calculate their p-values.
The outcome was entirely due to bad experimental design.
Journalists were completely sucked in. Not one journalist questioned the results. The very good news is that some people did see the problem. Those were people like us, posting comments on blogs.
Your post is very interesting but the link seems to be non-working.
Thanks hunter. Here’s another attempt at a link plus an url if that doesn’t work.
link
url = http://io9.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800
I said “not long ago” but when I looked it was 2005 … carpe diem indeed.
Carpe diem, or Tempus fugit?
Sadly, both …
w.
Apropos of nothing much, but illustrating why statistics are so hard to deal with.
Are you asking what the odds are of seven heads appearing at least once in ten flips, or are you asking the odds that seven heads appear exactly once in ten flips? I’m quite sure that (127/128)^10 is the answer to one of those two questions, but I’m far from sure off the top of my head which one it answers.
I suspect that careful examination of climate science, medicine, or any other discipline where predictions can’t be checked against precise answers would find a lot of cases of statistics answering a slightly different question than the authors intended to ask.
A long time ago a professor observed to me that students, having found a formula, would try to apply it no matter how inappropriate it was given the circumstances. Sadly, I’ve met a few people who never grew out of that habit.
When you are holding a hammer, everything looks like a nail.
The physicist Luis Alvarez, who was most famous for identifying an asteroid impact as the cause of the mass extinction of dinosaurs (https://en.wikipedia.org/wiki/Alvarez_hypothesis) , used to say that everyone will encounter a 5 sigma event in his lifetime, which is a confidence limit of 0.00006 %. In high energy physics, no one will even consider a result that is less significant than that because they deal with huge amounts of data and understand that random fluctuations can create spurious results quite easily.
Random time series, like the coin toss, often have very counter-intuitive behavior. Consider the famous drunkard’s walk. A drunk starts at a lamp post and takes one step north each time the coin toss comes up heads and one step south if it’s tails. Most people think that the drunk will wind up hanging around the lamp post. In fact, the most likely outcome is that he will wander far away, his typical distance from the lamp post growing like the square root of the number of steps, purely due to the randomness of a fair coin. If you plot his position versus time it will be a slow, oscillating drift away from the lamp post. See the first figure, Example of eight random walks in one dimension, in https://en.wikipedia.org/wiki/Random_walk.
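A small sketch of that drunkard’s walk shows how far a perfectly fair coin wanders:

```python
import numpy as np

rng = np.random.default_rng(7)
steps = rng.choice([-1, 1], size=(8, 1000))   # eight independent 1000-step walks
walks = steps.cumsum(axis=1)                  # position versus time for each walk

# Final positions are typically on the order of sqrt(1000), about 32 steps from
# the lamp post, even though each individual step is a fair 50/50 coin toss.
print(walks[:, -1])
```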
Climate “scientists” don’t seem to understand (a) that highly improbable random events can occur while you’re watching, and (b) that random events will wander around and look like something deterministic is happening.
I worked for years in inspection/quality control departments. Human bias is overwhelming. Sometimes you want to find the “Bad” one and sometimes you don’t. Our good friends in climate science desperately want to find the “Bad” one and folks on my side don’t.
Then there’s Alpha and Beta error.
What are the odds that the corrections GISS/NASA makes to the GHCN temperature record don’t reflect bias?
I thought I could put up an image?
http://oi57.tinypic.com/11ui0cp.jpg
there are no “honest mistakes” in science … they are willfully ignorant … and it’s that willfulness that removes the honesty …
Really? Then how do you account for N-Rays?
https://en.wikipedia.org/wiki/N_ray
It is interesting to note that most of the reports into climate issues conducted by public bodies suggest that a statistics professional be included in each research team, to prevent some of these gross statistical errors from recurring.
If someone tossed seven coins and got seven heads, I would be mildly suspicious
If that person was a paid entertainer and the only reason he was tossing the coins was to please a crowd
I would be much more suspicious
If the paid entertainer had a peer group clapping away assuring me that it was all fair, I would find them to be suspect as well.
If the paid entertainer then offered me a tin foil hat for a thousand bucks so that I could toss seven heads, but only after he had left town, I would be extremely suspicious.
probably
Statistically speaking there is a huge autocorrelation between climate “psience” papers that attempt to demonstrate that CO2 is the primary driver of Earth’s climate.
And as Willis so eloquently states, “Once autocorrelation is taken into account both of the trends were not statistically significant at all”