Guest Post by Willis Eschenbach
Following up on his brilliant earlier work “The Black Swan”, Taleb has written a paper called Error, Dimensionality, and Predictability (draft version). I could not even begin to do justice to this tour-de-force, so let me just quote the abstract and encourage you to read the paper.
Abstract—Common intuitions are that adding thin-tailed variables with finite variance has a linear, sublinear, or asymptotically linear effect on the total combination, from the additivity of the variance, leading to convergence of averages. However it does not take into account the most minute model error or imprecision in the measurement of probability. We show how adding random variables from any distribution makes the total error (from initial measurement of probability) diverge; it grows in a convex manner. There is a point in which adding a single variable doubles the total error. We show the effect in probability (via copulas) and payoff space (via sums of r.v.).
Higher dimensional systems – if unconstrained – become eventually totally unpredictable in the presence of the slightest error in measurement regardless of the probability distribution of the individual components.
The results presented are distribution free and hold for any continuous probability distribution with support in R.
Finally we offer a framework to gauge the tradeoff between added dimension and error (or which reduction in the error at the level of the probability is necessary for added dimension).
Dang … talk about alarmism, that’s scary stuff. Here’s one quote:
In fact errors are so convex that the contribution of a single additional variable could increase the total error more than the previous one. The nth variable brings more errors than the combined previous n-1 variables!
The point has some importance for “prediction” in complex domains, such as ecology or in any higher dimensional problem (economics). But it also thwarts predictability in domains deemed “classical” and not complex, under enlargement of the space of variables.
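To get a feel for what “grows in a convex manner” means, here’s a toy Python sketch. It is emphatically not Taleb’s construction, just simple error propagation through a product of measured quantities, each known to within 5%: every additional variable adds a bigger increment to the total relative error than the one before it.

```python
# Toy illustration only, not Taleb's derivation: propagate a 5% relative error
# through a product of n measured variables. The total relative error
# (1 + eps)**n - 1 grows convexly in n, so each added variable contributes a
# larger increment than the previous one did.
eps = 0.05
prev_total = 0.0
for n in range(1, 11):
    total = (1 + eps) ** n - 1
    increment = total - prev_total
    print(f"n = {n:2d}: total relative error = {total:7.2%}, added by variable {n} = {increment:.2%}")
    prev_total = total
```

The magnitudes are nothing like what the paper shows for errors in probability; only the accelerating shape is the point.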
Read the paper. Even without an understanding of the math involved, the conclusions are disturbing, and I trust Taleb on the math … not that I have much option.
H/T to Dr. Judith Curry for highlighting the paper on her excellent blog.
w.
As Usual: Let me request that if you disagree with someone, please quote the exact words you are referring to. That way we can all understand the exact nature of your objections.
The real issue here is that computers have made it ridiculously easy to take any parameter at all and run it to infinity. Very few systems in nature run to infinity, but computers do it easily; it is natural for them. I remember the University of Chicago’s Univac system, which my father used for celestial calculations and which was also used for nuclear bomb data.
The scientists back then discussed the Univac’s tendency to run to infinity and how to avoid it. It was treated as a problem with the machine, not as something real to worry about in the real world. That is, the scary runaway ‘global warming’ produced by computer programs is really an artifact of how computers respond to data inputs that have not been constrained to keep them from running off to infinity!
Do note that more and more real astronomers are fed up with these stupid predictive computer programs for the weather.
This brings to mind a chemical synthesis I was responsible for many years ago. We were making a component of “carbonless carbon paper”. Sometimes it worked well, the next time we got garbage. We tried to hold all the known variables constant, but the result was unpredictable. After a total failure one day, I was called to a meeting to answer for my failure. After I had been berated sufficiently, the boss asked the development chemist what happened. His answer was “I don’t know, the same thing happens to us in the lab and we have no idea why”. Some unknown in the process was controlling the reaction path and we didn’t know what it was. Likely an unknown unknown was at work. I suspect there are several such unknown unknowns at work in weather and climate.
This is my thing. I have written predictive modeling code for models with hundreds of dimensions, and the well-known curse of dimensionality makes building usable models an exercise in highly advanced mathematics, especially optimization theory. It appears in many specific fields of physics as well — e.g. systems with broken ergodicity, complex systems, spin glasses and the like (which is where I first learned of it).
In part it is related to the way volume scales in high dimensional Euclidean spaces (although modelling may be a mix of discrete and continuous inputs). Almost all of the volume of a high-dimensional hypersphere is in a thin differential shell at its maximum radius. I rather suspect that this is the fundamental geometric origin of Taleb’s observation regarding error. A single step in a single variable can alter the total hyperspherical volume of possibilities by more than the total volume of the system before the step.
One does need to be a bit careful in one’s conclusions, though. Many systems with high dimensionality have many irrelevant or insensitive dimensions. Others have structure that is projective — that lives in a (possibly convoluted!) hypervolume of much lower dimensionality than the full space. The trick is in being able to construct a joint probability distribution function that has the same projective structure and dimensionality. So it isn’t quite fair to state that one can’t do anything with 100 dimensional modeling. I’ve built very good predictive models on top of up to 400-500 variables, and that was using computers twenty years ago. There are methods that would probably work for 4000 variables — for some problems. OTOH, he is quite right — for other problems even 200 variables are inaccessible.
I like to illustrate the curse of dimensionality by imagining a mere 100 dimensional space of binary variables. That is, suppose you were trying to describe a person with nothing but yes/no questions: Smoker (Y/N), Male (Y/N), etc. Then your space would, in principle, have 2^100 (about 10^30) cells, each of which holds a unique combination for a person. If you sample from this space, there aren’t enough humans, even counting every human who has ever lived, to have a good chance of populating more than a vanishing fraction of the cells. It seems as though forming an accurate joint probability distribution for anything on top of this space is pointless — one needs at least 30 or so individuals in a cell for its probability and variance to start to have much predictive meaning.
But of course, this is not true! We can predict the probability of an individual going into the men’s room or the ladies room with remarkable accuracy given a few hundred random samples drawn from the population of individuals who use a public bathroom. Nearly all of the variables are irrelevant but gender and age (with a tail involving probable occupation that might be much more difficult to resolve). The trick is finding a way of building the model that can discover the correct projective subspace from comparatively sparse data on the larger space and that is capable of building a nonlinear function on this space that approximates the joint probability distribution for the targeted behavior or result.
There are a number of methods one can use for this, but they are not trivial! They involve some of the most difficult math and computation on the planet, and success or failure is often empirical — beyond around 20 variables, one cannot be certain of success because a sufficiently convoluted joint probability distribution or one with pathological structure simply cannot be discovered with the methods available and a “reasonable” amount of data. A model that is truly (projectively) 100 dimensional in a 1000 dimensional space is very, very difficult to discover or build.
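To put rough numbers on two of the points above, the thin-shell concentration of hypersphere volume and the 2^100 cell count, here is a minimal back-of-the-envelope Python sketch (the 5% shell width and the ~110 billion “humans ever born” figure are just convenient round assumptions):

```python
# Back-of-the-envelope illustration of two curse-of-dimensionality facts.
# Assumptions: a unit-radius ball with a 5% outer shell; ~110 billion humans
# ever born is a commonly quoted rough figure.

def shell_fraction(d, eps=0.05):
    """Fraction of a d-dimensional unit ball's volume lying within eps of the
    surface. Volume scales as r**d, so the inner ball of radius (1 - eps)
    holds (1 - eps)**d of the total."""
    return 1.0 - (1.0 - eps) ** d

for d in (3, 10, 100, 1000):
    print(f"d = {d:4d}: {shell_fraction(d):.3%} of the volume lies in the outer 5% shell")

# 100 yes/no questions define 2**100 distinct cells.
cells = 2 ** 100
humans_ever = 1.1e11  # rough estimate of all humans who have ever lived
print(f"cells = {cells:.2e}, humans ever born ~ {humans_ever:.1e}")
print(f"one person per cell would fill about {humans_ever / cells:.1e} of the space")
```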
rgb
I read this first in a Facebook group, not the WUWT group, and the poster mentioned:
I replied:
The Abstract starts with:
I guess it can be applied. There’s some discussion that weather extremes are more “thick tailed” than a normal (Gaussian) distribution, and that’s presented as an explanation of why extreme weather is more frequent than expected (e.g. the joke that one-in-a-hundred-year events seem to happen every 20 years).
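As a toy illustration of that thick-tail point (my own example, not from the paper; the Student-t with 4 degrees of freedom is an arbitrary stand-in for a heavy-tailed distribution): scale both distributions to unit variance and compare how often each exceeds the Gaussian “3-sigma” level.

```python
# Toy example (not from the paper): how much more often a "3-sigma" event occurs
# under a heavy-tailed Student-t than under a Gaussian, once both are scaled to
# unit variance. The 4 degrees of freedom are an arbitrary choice.
import math
from scipy.stats import norm, t

df = 4
scale = math.sqrt(df / (df - 2))   # t(4) has variance 2, so standardize by sqrt(2)

p_gauss = norm.sf(3)               # P(X > 3) for a unit-variance Gaussian
p_heavy = t.sf(3 * scale, df)      # same threshold for a unit-variance t(4)

print(f"Gaussian:     about 1 in {1 / p_gauss:,.0f}")
print(f"Student-t(4): about 1 in {1 / p_heavy:,.0f}")
print(f"The heavy tail makes the event roughly {p_heavy / p_gauss:.1f}x more frequent")
```

That factor of roughly five is about what it takes to turn a notional hundred-year event into a twenty-year one.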
Warmists like to point to major floods in the last 15 years as evidence that the weather is becoming more extreme, but they look at the event distribution instead of looking at the whole weather record. For example, in New England we had several major events between 1927 and 1938. While 1938 involved a hurricane, the first of a couple of decades in which several hurricanes reached New England, the ground was already saturated from previous heavy rains.
See my http://wattsupwiththat.com/…/weather-before-and-after…/ for more.
Dang FB cut & paste. The link is http://wattsupwiththat.com/2013/09/21/weather-before-and-after-the-hurricane-of-1938/
Have to admit – the hockey stick graph cracked me up.
Given the birthday paradox, the probability that two of n independent variables both fall outside the bog-standard 95% confidence interval is approximately 1 - e^(-n(n-1)/(2*possibilities)), where possibilities is 20. With n = 2 variables that’s 4.9%, with 5 variables it’s about a 40% chance, and with 10 variables it’s about a 90% chance that two variables are outside their confidence intervals. I haven’t calculated it for “two or more” yet, but I should.
Too bad there are two threads on this. Not sure where to post this.
If you reverse this and you want 95% confidence that no two independent variables are outside their respective 95% confidence interval, for n=5 variables you need p=0.008 for each variable and for n=10 you need p=0.001 for each variable. (found via goal-seek in Excel).
This analysis should be distribution independent but IANNT (I Am Not Nicholas Taleb)
Since most measurement error bars are posted at 95% confidence (2-sigma), this applies to real-world measurements. If I combine those measurements into a model, I’ll quickly get an increasing likelihood of GIGO as I add measurements to the model. It should also apply to multiple ANOVA or any model that involves multiple variables, each with some sort of distribution.
Feel free to smash away at my bad assumptions and math. If you really need help programming the simple equation into Excel I’ll post it on request…
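In the meantime, here is a minimal Python sketch of the approximation from the Wikipedia reference below (the function name is mine; the reverse calculation can be done by solving the same formula for p instead of goal-seeking):

```python
import math

def p_some_pair_outside(n, p=0.05):
    """Birthday-paradox approximation: chance that at least two of n independent
    variables land outside their individual (1 - p) confidence intervals,
    treating 1/p as the number of 'possibilities'."""
    possibilities = 1.0 / p
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * possibilities))

for n in (2, 5, 10):
    print(f"n = {n:2d}: {p_some_pair_outside(n):.1%}")
# prints roughly 4.9%, 39.3%, and 89.5%, in line with the figures above
```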
Peter
references:
https://en.wikipedia.org/wiki/Birthday_problem#Approximations
Don’t waste your time on this paper. It’s written without acceptable definitions and is filled with non sequiturs. Because he fails to establish a basis for his discussion and fails to proceed logically, his conclusions (whatever they may be) are irrelevant. Tragically, many mathematics papers are published in this format.
@Willis; I’m surprised that a bunch of mathematical techno-babble impresses you.
But you like quotations, so let’s start with the first paragraph:
“Let us explore how errors and predictability degrades with dimensionality.” (Sophomoric prose)
“Some mental bias leads us to believe that as we add random variables to a forecasting system (or some general model), the total error would grow in a concave manner (sqrt{n} as is assumed or some similar convergence speed), just as we add observations.” (Sophomoric, and assuming that the rate of growth goes as sqrt{n} seems to be about right, so what is this guy talking about? Furthermore, he’s editorializing about mental bias in a math paper before he’s demonstrated anything. Very bad form!)
“An additional variable would increase error, but less and less marginally.” (He isn’t proposing anything here. It’s uninterpretable.)
[Next paragraph]
“The exact opposite happens. This is reminiscent of the old parable of the chess game, with the doubling of the grains of rice leading to more and more demands for each incremental step, an idea also used with the well-known intractability of the traveling salesman problem.” (He has given no reason to believe that the opposite happens. Pure editorial!)
[Next paragraph]
“In fact errors are so convex that the contribution of a single additional variable could increase the total error more than the previous one. The nth variable brings more errors than the combined previous n!” (This is in the *BACKGROUND* section!?! Indeed, the nth variable could make the error transfinite, but that is left as an exercise. 🙂)
[END QUOTATIONS]
I refuse to go on in this vein. If you can’t explain yourself properly in the first few paragraphs, why is everything going to go smoothly after that?
“I refuse to go on in this vein.”
It couldn’t be that his paper is another reason GCMs are worthless?
Here was my favorite,
“There are ways to derive the PDF of the product of beta distributed independent random variables in terms of Meijer functions (Tang and Gupta, [7], Bhargaval et al, [8], most recent Dunkl, [9]). But it is not necessary at this stage as we can get the moments using the product rule and compute the relative errors.”
Why mention “There are ways..”? Who cares? If it is not “necessary at this stage”, then why the non sequitur? Taleb is just trying to show us how much of a smarty pants he is, and for those who have published such derivations, he’s not coming across as a smarty pants. He’s just baffling those who can’t follow the details of his argument.
“Don’t waste your time on this paper. It’s written without acceptable definitions and is filled with non sequiturs.”
+1
“There are three kinds of lies. Lies, damned lies, and statistics.”
This guy is saying something about the lack of proven facts in, what, GCMs, statistical analyses, or so-called meta-analyses? Duh. Some of the most unintelligible language I have seen on this blog, even worse than Tisdale. Really?
Scammer.
QED. Otherwise, what exactly have you done for us???
At first glance, it appears that extrapolationist concavities are abstrusely ill-confabulated in this paper.
You guys realize NNT has gone on record saying he’s “super-Green” and that “my [Taleb’s] position on the climate is to avoid releasing pollutants into the atmosphere, regardless of current expert opinion”, right? Hell, he even took part in some climate change conference hosted by the king of Sweden.
Read on for yourselves: http://www.theguardian.com/commentisfree/2009/aug/27/climate-change-taleb-tax-conservatives
Nassim Taleb writes clever theoretical stuff about “black swans” and predictability but to me, he is like the wise professors I knew in college who were brilliant but couldn’t forecast their way out of a paper bag. The ability to forecast is a unique trait which requires knowledge of the operable “physics”, statistical smarts, and a good dose of common sense. Academics, and esp climate scientists, generally just have one of these traits.
This is one reason good weather forecasters tend to be climate skeptics. They understand the complexity of forecasting especially with a small verification sample size.
Taleb ran a tail-hedging hedge fund. It did not do well and closed down in 2004. Now he writes about risk but doesn’t make his money from actual trading. As a real-world forecast modeller, I’m unimpressed.