Guest Post by Willis Eschenbach
Among the papers in the Copernicus Special Issue of Pattern Recognition in Physics we find a paper from R. J. Salvador in which he says he has developed A mathematical model of the sunspot cycle for the past 1000 yr. Setting aside the difficulties of verification of sunspot numbers for say the year 1066, let’s look at how well his model can replicate the more recent record of the last few centuries.
Figure 1. The comparison of the Salvador model (red line) and the sunspot record since 1750. Sunspot data is from NASA, kudos to the author for identifying the data.
Dang, that’s impressive … so what’s not to like?
Well, what’s not to like is that this is just another curve fitting exercise. As old Joe Fourier pointed out, any arbitrary wave form can be broken down into a superposition (addition) of a number of underlying sine waves. So it should not be a surprise that Mr. Salvador has also been able to do that …
However, it should also not be a surprise that this doesn’t mean anything. The problem is that no matter how well we can replicate the past with this method, it doesn’t mean that we can then predict the future. As the advertisements for stock brokers say, “Past performance is no guarantee of future success”.
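A minimal sketch of the Fourier point, assuming a synthetic random walk stands in for the record (it is not the NASA sunspot data, and all names here are illustrative). NumPy's real FFT breaks the series into sine waves and rebuilds it to within rounding error, even though the series has no cycles built into it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a yearly record: a random walk with no cycles in it.
series = np.cumsum(rng.normal(size=264))

# Decompose into sine waves (one complex amplitude per frequency) ...
coeffs = np.fft.rfft(series)

# ... and add those sine waves back together.
rebuilt = np.fft.irfft(coeffs, n=series.size)

max_error = float(np.max(np.abs(series - rebuilt)))
print(f"largest in-sample error: {max_error:.2e}")
```

The match is essentially perfect, yet extending those same sine waves forward would only replay the series from its start, which is no forecast at all.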
One interesting question in all of this is the following: how many independent tunable parameters did the author have to use in order to get this fit?
Well, here’s the equation that he used … the sunspot number is the absolute value of
Figure 2. The Salvador Model. Unfortunately, in the paper he does not reveal the secret values of the parameters. However, he says you can email him if you want to know them. I passed on the opportunity.
So … how many parameters is he using? Well, we have P1, P2, P3, P4, F1, F2, F3, F4, N1, N2, N3, N4, N5, N6, N7, N8, L1, L2, L3, and L4 … plus the six decimal parameters, 0.322, 0.316, 0.284, 0.299, 0.00501, and 0.0351.
Now, that’s twenty tunable parameters, plus the six decimal parameters … plus of course the free choice of the form of the equation.
With twenty tunable parameters plus free choice of equation, is there anyone who is still surprised that he can get a fairly good match to the past? With that many degrees of freedom, you could make the proverbial elephant dance …
Now, could it actually be possible that his magic method will predict the future? Possible, I suppose so. Probable? No way. Look, I’ve done dozens and dozens and dozens of such analyses … and what I’ve found out is that past performance is assuredly no guarantee of future success.
So, is there a way to determine if such a method is any good? Sure. Not only is there such a method, but it’s a simple method, and we have discussed the method here on WUWT. And not only have we discussed the testing method, we’ve discussed the method with various of the authors of the Special Issue … to no avail, so it seems.
The way to test this kind of model is bozo-simple. Divide the data into the first half and the second half. Train your model using only the first half of the data. Then see how it performs on the second half, what’s called the “out of sample” data.
Then do it the other way around. You train the model on the second half, and see how it does on the first half, the new out-of-sample data. If you want, as a final check you can do the training on the middle half, and see how it works on the early and late data.
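The split test described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Salvador model: the data are a synthetic random walk, the model is a plain sum of ten fixed-period sinusoids (twenty fitted sin/cos weights plus a constant), and the helper names `design`, `r_squared` and `split_test` are made up for the example.

```python
import numpy as np

def design(t, periods):
    """Columns: a constant, then a sin and a cos column for each period."""
    cols = [np.ones_like(t)]
    for p in periods:
        cols.append(np.sin(2 * np.pi * t / p))
        cols.append(np.cos(2 * np.pi * t / p))
    return np.column_stack(cols)

def r_squared(y, y_hat):
    """Coefficient of determination; it can go negative for a bad fit."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def split_test(t, y, periods, train):
    """Fit on the points where `train` is True, then score both parts."""
    coef, *_ = np.linalg.lstsq(design(t[train], periods), y[train], rcond=None)
    fit = design(t, periods) @ coef
    return r_squared(y[train], fit[train]), r_squared(y[~train], fit[~train])

rng = np.random.default_rng(1)
t = np.arange(264, dtype=float)
y = np.cumsum(rng.normal(size=t.size))        # synthetic data, no real cycles
periods = [132.0 / k for k in range(1, 11)]   # ten periods -> twenty weights

first_half = t < 132
in_a, out_a = split_test(t, y, periods, first_half)    # train early, test late
in_b, out_b = split_test(t, y, periods, ~first_half)   # train late, test early

print(f"train first half:  in-sample R^2 {in_a:.2f}, out-of-sample R^2 {out_a:.2f}")
print(f"train second half: in-sample R^2 {in_b:.2f}, out-of-sample R^2 {out_b:.2f}")
```

Reporting the in-sample and out-of-sample scores side by side, in both directions, is the measurement that the test calls for.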
I would be shocked if the author’s model could pass that test. Why? Because if it could be done, it could be done easily and cleanly by a simple Fourier analysis. And if you think scientists haven’t tried Fourier analysis to predict the future evolution of the sunspot record, think again. Humans are much more curious than that.
In fact, the Salvador model shown in Figure 2 above is like a stone-age version of a Fourier analysis. But instead of simply decomposing the data into the simple underlying orthogonal sine waves, it decomposes the data into some incredibly complex function of cosines of the ratio of cosines and the like … which of course could be replaced by the equivalent and much simpler Fourier sine waves.
But neither one of them, the Fourier model or the Salvador model, can predict the future evolution of the sunspot cycles. Nature is simply not that simple.
I bring up this study in part to point out that it’s like a Fred Flintstone version of a Fourier analysis, using no less than twenty tunable parameters, that has not been tested out-of-sample.
More importantly, I bring it up to show the appalling lack of peer review in the Copernicus Special Issue. There is no way that such a tuned, adjustable parameter model should have been published without being tested using out of sample data. The fact that the reviewers did not require that testing shows the abysmal level of peer review for the Special Issue.
w.
UPDATE: Greg Goodman in the comments points out that they appear to have done out-of-sample tests … but unfortunately, either they didn’t measure or they didn’t report any results of the tests, which means the method is still untested. At least where I come from, “test” in this sense means measure, compare, and report the results for the in-sample and the out-of-sample tests. Unless I missed it, nothing like that appears in the paper.
NOTE: If you disagree with me or anyone else, please QUOTE WHAT YOU DISAGREE WITH, and let us know exactly where you think it went off the rails.
NOTE: The equation I show above is the complete all-in-one equation. In the Salvador paper, it is not shown in that form, but as a set of equations that are composed of the overall equation, plus equations for each of the underlying composite parameters. The Mathematica code to convert his set of equations into the single equation shown in Figure 2 is here.
BONUS QUESTION: What the heck does the note in Figure 1 mean when it says “The R^2 for the data from 1749 to 2013 is 0.85 with radiocarbon dating in the correlation.”? Where is the radiocarbon dating? All I see is the NASA data and the model.
BONUS MISTAKE: In the abstract, not buried in the paper but in the abstract, the author makes the following astounding claim:
The model is a slowly changing chaotic system with patterns that are never repeated in exactly the same way.
Say what? His model is not chaotic in the slightest. It is totally deterministic, and will assuredly repeat in exactly the same way after some unknown period of time.
Sheesh … they claim this was edited and peer reviewed? The paper says:
Edited by: N.-A. Mörner
Reviewed by: H. Jelbring and one anonymous referee
Ah, well … as I said before, I’d have pulled the plug on the journal for scientific reasons, and that’s just one more example.
In fact I could probably get a curve to closely match the red one with only 3 or so parameters, one of 11 years, one of around 100 years and one longer than the whole time period plus offsets, etc.
Would be just as meaningless though!
Strange then that various commentators did actually predict the current solar quietness whilst the establishment was still predicting that cycle 24 would be another strong one.
http://personal.inet.fi/tiede/tilmari/sunspots.html
No doubt mention of Timo’s work will cause apoplexy in some quarters but he wasn’t the only one.
Stephen Wilde says:
January 22, 2014 at 2:44 am
Not sure how this relates to the lack of peer-review and the lack of out-of-sample testing of the Salvador model …
w.
PS—there are a whole lot of folks out there guessing the size of the next solar cycle, based on various things. One thing is for sure … the next solar cycle will be either larger or smaller than this one.
And that means that in a general sense, your best bet is that half of the prognosticators will be right and half wrong.
I am sure that there is a chaotic input into solar activity that would throw this. Also the sun does not have an inexhaustible supply of fuel. This is gradually changing with nuclear fusion reactions so this will alter output as time goes by.
Willis,
That equation doesn’t appear like that in the published paper. It looks like your own expansion of the SNC that somewhat obfuscates the origin of the (decimal) numbers.
Salvador describes the origin of each of the constants in his equations. Including their physical basis. And the methods of derivation of the phase parameters and scalars from physical observations via non-linear least squares optimisation of Salvador’s SNC equation.
There actually is quite a bit of evidence supporting a millennial-scale Holocene climate cycle (quasi periodic fluctuation if you prefer). A power spectrum of the GISP2 ice core indicates a very significant 950-1100 year “cycle”…
http://i90.photobucket.com/albums/k247/dhm1353/DavisandBohlingFig6a.png
This doesn’t necessarily validate the solar model in question.
von Neumann’s observation regarding elephants is valid. A model based on the physics of a phenomenon will allow maximum precision of ‘forecasting’ with a minimum number of ‘adjustable parameters’.
A mathematical model not based entirely on physics is not empirically testable. If it is not testable, it is not science.
In the 90’s I published a model for calculating the anomalous viscosity of mixing for mixtures of gases. Because I based the model on kinetic molecular theory, I needed only one parameter which I could not adequately justify theoretically – an exponent of exactly 1/3 in the mixing term. The model gave results that removed all secular variability from the residuals.
The procedure you outline for testing the model on out-of-sample data may be useful for determining whether or not the model is reflective of a real-world physical process, but provides only hints at best of where to look for the physics involved.
Huh Willis, you’ve gone into overdrive…
If the constants (fixed numbers) have now physical interpretation then forget it; if they do it may mean something but not implicitly so
http://www.vukcevic.talktalk.net/PF.htm
Correction: in the above comment it should be NO for now
Stephen Wilde says:
January 22, 2014 at 2:44 am
Actually, most “establishment” predictions weren’t for a strong cycle, they were for a weak cycle. There is a list of them here, along with an interesting analysis of the various methods. See Table 1.
w.
“It is totally deterministic, and will assuredly repeat in exactly the same way after some unknown period of time.”
Deterministic it is. Periodic, it’s not. Except if N1~N8 are zero.
It’s obvious it’s just a curve fitting exercise but I believe the number of free parameters is not the main argument here. It’s just the easiest to reach argument. Even such a number of parameters could be excused if the formula was making physical sense. But it does not.
Yes sure. Salvador only needs to tell Jupiter, Uranus, Earth and Venus to change the rates they orbit at in order to tune his parameters.
It must be great playing god. Willis should know.
Bernd Felsche says:
January 22, 2014 at 3:14 am
It “looks like my own expansion”? I can put the words up on the silver screen … but you have to read them. In the head post I pointed out that the equation doesn’t appear like that in the paper. I discussed the exact expansion I used. I posted a link for the Mathematica code for the expansion … and now you come along to repeat what I said, like you’ve made some discovery?
Yes, I know that the “phase parameters and scalars” are fit, I discussed that as well. You seem impressed that he used twenty fitted parameters. Did you read the link about the elephant?
Next, while there is a “physical basis” and an “origin” of the constants in that they represent real astronomical ratios, given that there are hundreds and hundreds of equally real astronomical ratios, their choice of which ones to use is equally arbitrary.
Finally, they’ve used, not the exact timing of the astronomical cycles, but a series of slightly different values near to the exact timing … which of course is how they got the beat frequencies you see in Figure 1.
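To illustrate how two nearby periods give a slow beat, here is a small sketch. The two periods are hypothetical round numbers chosen for the arithmetic, not values from the Salvador paper, and `beat_period` is a made-up helper name.

```python
def beat_period(p1: float, p2: float) -> float:
    """Period of the slow envelope produced by adding two sinusoids.

    From cos(a) + cos(b) = 2 * cos((a - b) / 2) * cos((a + b) / 2), the
    envelope |cos((a - b) / 2)| repeats at the difference of the two
    frequencies, so its period is 1 / |1/p1 - 1/p2|.
    """
    if p1 == p2:
        raise ValueError("identical periods produce no beat")
    return 1.0 / abs(1.0 / p1 - 1.0 / p2)

# Hypothetical example: an 11-year cycle next to a slightly detuned 12-year one.
slow = beat_period(11.0, 12.0)
print(f"beat period: {slow:.1f} years")
```

Nudging either period by a small tunable amount moves the beat period a great deal, which is why such offsets give so much freedom in shaping a long-term envelope.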
Are you really impressed by this curve fitting exercise? Why not just use Fourier analysis? Do you believe, as the author does, that his is a chaotic, non-repeating model? Do you think his method will work on out-of-sample data?
w.
Kasuha says:
January 22, 2014 at 3:29 am
Thanks, Kasuha. Each of the individual sin and cosine functions that make up the equation repeats in a regular periodic manner. How can their sum not be periodic?
If what you are saying is true, seems like it would make a theoretically perfect random number generator, one that never, ever repeats… and I doubt that.
Seems to me that the sum/product/difference whatever of a finite number of infinitely repeating cyclical functions has to be a repeating cyclical function, and that that is a recurring and big problem in random number generators … but I’ve been wrong before …
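One way to examine the question, for the simple case of a plain sum of sinusoids only (not the nested cosine-of-cosine form of the Salvador equation): the sum repeats exactly at the least common multiple of the individual periods, which exists when the periods are in rational ratio. When the ratio is irrational there is no common multiple, and the sum comes arbitrarily close to repeating without ever doing so exactly. The sketch below uses hypothetical periods and a made-up helper name, `common_period`.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm   # math.lcm needs Python 3.9 or later

def common_period(periods):
    """Smallest time after which every sinusoid has completed whole cycles.

    For fractions in lowest terms, lcm(a/b, c/d) = lcm(a, c) / gcd(b, d).
    """
    def lcm_frac(x, y):
        return Fraction(lcm(x.numerator, y.numerator),
                        gcd(x.denominator, y.denominator))
    return reduce(lcm_frac, periods)

# Hypothetical periods in years, written as exact rational numbers.
print(common_period([Fraction(11), Fraction(100)]))         # whole-year periods
print(common_period([Fraction("11.1"), Fraction("9.9")]))   # detuned periods
```

Any set of parameters stored as finite decimals is rational, so in that narrow sense such a sum does repeat eventually, though the repeat time can be far longer than the record being fitted.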
w.
No (computer) model or analysis will ever give correct results if (key) parameters are missing. Whether those are missing due to ignorance or deliberate behaviour does not matter (obviously). That is the reason why you (Willis) have not found something that works in this matter. As you point out, the Salvador model is too simple to generate any substance for any conclusion (other than crap).
Reverse engineering works in general, but it requires a fundamental understanding, i.e. knowledge of the big picture, and nobody today is in that position yet. To perform reverse engineering of a chaotic system, however, is impossible in practice due to its complexity.
(Extremely/Very) Low-frequency parameters can be very difficult to identify, but can not be ignored.
What is the most complex – the climate or the human brain? (The human brain is still not fully mapped yet …)
tallbloke says:
January 22, 2014 at 3:30 am
Tallbloke, first, I made a clear distinction between the “tunable parameters” and the decimal constants. So your objection makes no sense, the tunable parameters have nothing to do with Jupiter or anything at all … that’s why they are “tunable”.
Next, given twenty tunable parameters, plus an infinite choice of forms for the equation, I don’t care what six astronomical constants you might pick—with 20 tunable parameters and my choice of equations, I guarantee you I can make the curves fit no matter what the six other constants you might hand me.
It’s easy because I can just do what Salvador did. It appears that you didn’t notice that he doesn’t actually use the astronomical constants themselves. Instead, he uses the astronomical constants either increased or decreased by the value of one of the many tunable parameters. That’s how he gets the beat frequencies shown in Figure 1 … and since I have free choice of the form of my equation, the choice of astronomical constants doesn’t matter. I’ll just change the tunable parameters to make up the difference. If you choose a parameter that is 178.8 years, and I need 76.4 years to make my formula work, I’ll just multiply it by an appropriately sized parameter.
Regards,
w.
He seems to have reduced 260+ years of data points to only 26 values with his rather lossy compression. He has indeed modeled the past, but I think its predictive ability is somewhat worse than how a random 30 second clip of MP3 lossy compressed music can “model” the next 30 seconds.
Perhaps off topic, but do the global climate models also suffer from lack of out of sample testing?
Reg. Blank says:
January 22, 2014 at 4:12 am
—————————————–
LoL! I just spit coffee all over my key board. I need to avoid Willis posts first thing in the morning, they end up being too entertaining!
So the ‘official’ SIDC sunspot numbers can be fitted with an expression with many parameters. But there is more than one sunspot series. There is the Group Sunspot Series and there are the Wolf Numbers corrected for Waldmeier’s weighting of sunspots since 1947. These series are different from the SIDC series so presumably must have their own fitting expression. If so, the whole thing is just different curve fittings with no physical content. http://www.leif.org/research/Long-term-Variation-Solar-Activity.pdf
Willis says:
“I guarantee you I can make the curves fit no matter what the six other constants you might hand me. “
OK, game on.
278 days
1.3 years
8.4 years
98 years
for the four planets’ orbital periods. Now, not all the other parameters are completely free, as you would know if you’ve read R.J.’s paper carefully. Some of their cyclicities are bound to the planetary orbital periods. So bearing that in mind, off you go, play fair, and don’t forget to show your working.
lsvalgaard says:
January 22, 2014 at 5:22 am
……………..
Hi Dr. S. Thanks for the reply (Danish data).
Agree with the above, that is why ‘superior elegance’ (??!!) of my formula doesn’t fit any of the above, but tells in its crude simplicity what the ‘mother nature’ had in mind some (was it ?) 4 billion of years ago, but again things have moved on since then.
🙂 🙂
By the way, the latest iteration of R.J.’s model is up to R^2=0.91
http://tallbloke.files.wordpress.com/2013/09/rjs-model-9-9-13.jpg
The paper lays out the physics that were used to derive the parameters in the math. So you appear to be misrepresenting the paper. So then, please show us your math where you successfully reduced the physics of the entire planetary system to fewer than twenty or so. Oh, you can’t do that? So then why attempt to ridicule a paper that has twenty or so parameters that are linked to physical attributes of the planetary system? I’ve seen some poor arguments against papers before but your “arguments” against this paper – aren’t scientific.
AFAIK, or probably best to say AFAINK, the sun is a stochastic process at least to a certain unknown extent.
Those equations may be useful to detect the deterministic component, but it is still just a nice numerology example.