Riding A Mathemagical Solarcycle

Guest Post by Willis Eschenbach

Among the papers in the Copernicus Special Issue of Pattern Recognition in Physics we find a paper by R. J. Salvador in which he says he has developed “A mathematical model of the sunspot cycle for the past 1000 yr”. Setting aside the difficulties of verifying sunspot numbers for, say, the year 1066, let’s look at how well his model can replicate the more recent record of the last few centuries.

Figure 1. The comparison of the Salvador model (red line) and the sunspot record since 1750. Sunspot data is from NASA; kudos to the author for identifying the data.

Dang, that’s impressive … so what’s not to like?

Well, what’s not to like is that this is just another curve-fitting exercise. As old Joe Fourier pointed out, any arbitrary waveform can be broken down into a superposition (sum) of a number of underlying sine waves. So it should not be a surprise that Mr. Salvador has been able to do that too …
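Fourier’s point is easy to demonstrate. Here’s a minimal sketch in Python (mine, not from the paper) showing that any finite record, even pure noise, can be reproduced exactly as a sum of sinusoids, and that the reconstruction is necessarily periodic, so it carries no predictive content:

```python
import numpy as np

# An arbitrary "record": pure random noise, no physics at all.
rng = np.random.default_rng(0)
signal = rng.normal(size=256)

# Decompose into sinusoids (discrete Fourier transform) ...
coeffs = np.fft.rfft(signal)

# ... and reconstruct from the full set of components.
reconstructed = np.fft.irfft(coeffs, n=signal.size)

# With enough components, the match is essentially perfect ...
assert np.allclose(signal, reconstructed)

# ... but the reconstruction simply repeats with period 256 outside
# the fitted window: a perfect hindcast with zero forecast skill.
```

The same trick works on any record of any origin, which is exactly why a good in-sample fit, by itself, proves nothing.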

However, it should also not be a surprise that this doesn’t mean anything. The problem is that no matter how well we can replicate the past with this method, it doesn’t mean that we can then predict the future. As the advertisements for stock brokers say, “Past performance is no guarantee of future success”.

One interesting question in all of this is the following: how many independent tunable parameters did the author have to use in order to get this fit?

Well, here’s the equation that he used … the sunspot number is the absolute value of

Figure 2. The Salvador Model. Unfortunately, in the paper he does not reveal the secret values of the parameters. However, he says you can email him if you want to know them. I passed on the opportunity.

So … how many parameters is he using? Well, we have P1, P2, P3, P4, F1, F2, F3, F4, N1, N2, N3, N4, N5, N6, N7, N8, L1, L2, L3, and L4 … plus the six decimal parameters, 0.322, 0.316, 0.284, 0.299, 0.00501, and 0.0351.

Now, that’s twenty tunable parameters, plus the six decimal parameters … plus of course the free choice of the form of the equation.

With twenty tunable parameters plus free choice of equation, is there anyone who is still surprised that he can get a fairly good match to the past? With that many degrees of freedom, you could make the proverbial elephant dance …

Now, could it actually be possible that his magic method will predict the future? Possible, I suppose so. Probable? No way. Look, I’ve done dozens and dozens and dozens of such analyses … and what I’ve found out is that past performance is assuredly no guarantee of future success.

So, is there a way to determine if such a method is any good? Sure. Not only is there such a method, it’s a simple one, and we have discussed it here on WUWT. And not only have we discussed the testing method, we’ve discussed it with several of the authors of the Special Issue … to no avail, it seems.

The way to test this kind of model is bozo-simple. Divide the data into the first half and the second half. Train your model using only the first half of the data. Then see how it performs on the second half, what’s called the “out of sample” data.

Then do it the other way around. You train the model on the second half, and see how it does on the first half, the new out-of-sample data. If you want, as a final check you can do the training on the middle half, and see how it works on the early and late data.
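As a sketch of the split-sample test (my toy code, not the author’s; the harmonic fit here is a hypothetical stand-in for any tunable cycle model, run on made-up data):

```python
import numpy as np

def design(t, periods):
    """Design matrix: a constant plus a sine/cosine pair per cycle period."""
    cols = [np.ones_like(t)]
    for p in periods:
        cols += [np.sin(2 * np.pi * t / p), np.cos(2 * np.pi * t / p)]
    return np.column_stack(cols)

# A toy "sunspot-like" record: an 11 yr cycle plus noise.
t = np.arange(264.0)
rng = np.random.default_rng(1)
y = np.sin(2 * np.pi * t / 11.0) + 0.3 * rng.normal(size=t.size)

periods = [11.0, 90.0]   # the model's tunable cycle lengths
half = t.size // 2

# Train on the first half of the data ONLY ...
beta, *_ = np.linalg.lstsq(design(t[:half], periods), y[:half], rcond=None)

# ... then measure error both in sample and out of sample.
in_err = np.mean((design(t[:half], periods) @ beta - y[:half]) ** 2)
out_err = np.mean((design(t[half:], periods) @ beta - y[half:]) ** 2)

# A model with real skill keeps out_err comparable to in_err;
# a pure curve fit typically blows up out of sample.
print(in_err, out_err)
```

Swap the halves and repeat, and optionally train on the middle half, exactly as described above. The whole point is to measure, compare, and report both numbers.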

I would be shocked if the author’s model could pass that test. Why? Because if it could be done, it could be done easily and cleanly by a simple Fourier analysis. And if you think scientists haven’t tried Fourier analysis to predict the future evolution of the sunspot record, think again. Humans are much more curious than that.

In fact, the Salvador model shown in Figure 2 above is like a stone-age version of a Fourier analysis. But instead of simply decomposing the data into the simple underlying orthogonal sine waves, it decomposes the data into some incredibly complex function of cosines of the ratio of cosines and the like … which of course could be replaced by the equivalent and much simpler Fourier sine waves.

But neither one of them, the Fourier model or the Salvador model, can predict the future evolution of the sunspot cycles. Nature is simply not that simple.

I bring up this study in part to point out that it’s like a Fred Flintstone version of a Fourier analysis, using no less than twenty tunable parameters, that has not been tested out-of-sample.

More importantly, I bring it up to show the appalling lack of peer review in the Copernicus Special Issue. There is no way that such a tuned, adjustable-parameter model should have been published without being tested on out-of-sample data. The fact that the reviewers did not require that testing shows the abysmal level of peer review for the Special Issue.

w.

UPDATE: Greg Goodman in the comments points out that they appear to have done out-of-sample tests … but unfortunately, either they didn’t measure or they didn’t report any results of the tests, which means the method is still untested. At least where I come from, “test” in this sense means measure, compare, and report the results for the in-sample and the out-of-sample tests. Unless I missed it, nothing like that appears in the paper.

NOTE: If you disagree with me or anyone else, please QUOTE WHAT YOU DISAGREE WITH, and let us know exactly where you think it went off the rails.

NOTE: The equation I show above is the complete all-in-one equation. In the Salvador paper, it is not shown in that form, but as a set of equations that are composed of the overall equation, plus equations for each of the underlying composite parameters. The Mathematica code to convert his set of equations into the single equation shown in Figure 2 is here.

BONUS QUESTION: What the heck does the note in Figure 1 mean when it says “The R^2 for the data from 1749 to 2013 is 0.85 with radiocarbon dating in the correlation.”? Where is the radiocarbon dating? All I see is the NASA data and the model.

BONUS MISTAKE: In the abstract, not buried in the paper but in the abstract, the author makes the following astounding claim:

The model is a slowly changing chaotic system with patterns that are never repeated in exactly the same way.

Say what? His model is not chaotic in the slightest. It is totally deterministic, and will assuredly repeat in exactly the same way after some unknown period of time.

Sheesh … they claim this was edited and peer reviewed? The paper says:

Edited by: N.-A. Mörner

Reviewed by: H. Jelbring and one anonymous referee

Ah, well … as I said before, I’d have pulled the plug on the journal for scientific reasons, and that’s just one more example.

dikranmarsupial
January 22, 2014 8:16 am

Dan says: “Maybe we need Google to invent an algorithm to score “open access” papers (a substitute for peer review) in order to assist in this process of finding the quality needles in the paper glut haystack.”
Actually, it does: Google Scholar will tell you how many times a paper has been cited, and that is usually a pretty good indication of the value of a paper. Impact factor is a reasonable measure of the quality of a journal (although you can’t directly compare the impact factors of journals in different fields, as it depends on the number of researchers publishing in the field, etc.).

gdp
January 22, 2014 8:37 am

Willis — You write:

Each of the individual sin and cosine functions that make up the equation repeats in a regular periodic manner. How can their sum not be periodic?
If what you are saying is true, seems like it would make a theoretically perfect random number generator, one that never, ever repeats… and I doubt that.
Seems to me that the sum/product/difference (whatever) of a finite number of infinitely repeating cyclical functions has to be a repeating cyclical function, and that that is a recurring and big problem in random number generators.

You are falsely conflating “periodic” with “quasiperiodic,” and conflating “nonperiodic” with “random.”
A sum/product/difference of periodic functions will only be periodic if all the periods are commensurate, i.e., have a least common multiple. If the periods are irrationally related rather than commensurate, then while the sum/product/difference will resemble a periodic function, it will never repeat exactly — hence the term “quasiperiodic.” A simple example is the sum cos(t) + cos(sqrt(2)*t): Because the two periods are related by an irrational number, sqrt(2), there is no value of T such that cos(t+T) + cos(sqrt(2)*(t+T)) == cos(t) + cos(sqrt(2)*t) for all t; hence, the sum of two noncommensurate periodic functions is NOT periodic. While one can find T’s such that the average absolute error between f(t) and f(t+T) of a “quasiperiodic” function becomes smaller than some specified value “epsilon,” it can never be exactly zero; in the given example, these “quasiperiods” are related to the “continued fraction expansion” of sqrt(2).
Admittedly, any finite-precision approximation to a quasiperiodic function will be periodic, because any set of finite-precision numbers necessarily has a finite least common multiple — but that false “periodicity” is a property of the finite precision approximation, not the underlying quasiperiodic function — which itself can never repeat exactly, but only approximately in an “average absolute error smaller than some specified epsilon” sense.
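[gdp’s example is easy to check numerically. The sketch below (mine, not the commenter’s) uses the continued-fraction convergent 99/70 of sqrt(2) to pick the best candidate “period”; the repetition error comes out small but strictly nonzero:]

```python
import numpy as np

t = np.linspace(0.0, 200.0, 20001)
f = np.cos(t) + np.cos(np.sqrt(2.0) * t)

# Candidate quasiperiod from the convergent 99/70 of sqrt(2):
# T is exactly 70 periods of cos(t), and very nearly (but not
# exactly) 99 periods of cos(sqrt(2)*t).
T = 2.0 * np.pi * 70.0
err = np.max(np.abs(np.cos(t + T) + np.cos(np.sqrt(2.0) * (t + T)) - f))

# err is small (the waveform almost repeats) but strictly positive:
# quasiperiodic, not periodic.
print(err)
```

Better convergents (239/169, 577/408, …) shrink the error further, but it never reaches zero, exactly as the comment says.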

Andyj
January 22, 2014 8:43 am

So, get drawing the future solar activity.
Let’s see how the solar tides work out.
Sure, it’s not based on scientific principles, but it’s a fair basis for a hypothesis.
If the numbers work back 1000 yrs, what are the odds it will aid a reasonable guess for the next 90 solar cycles?

January 22, 2014 8:47 am

tallbloke says:
January 22, 2014 at 3:30 am
“Yes sure. Salvador only needs to tell Jupiter, Uranus, Earth and Venus to change the rates they orbit at in order to tune his parameters.”
If there is a physical basis underlying the “fit” equation, presumably it would be the perfect case for classical Fourier analysis. Why would any physically based generator need to be tunable? Also, with perhaps hundreds of effects, surely the lesser-order ones would give apparent ‘noise’.

Rhys Jaggar
January 22, 2014 8:48 am

Perhaps a more useful suggestion would be to gather those physicists with the greatest insights into which leading indicators of solar behaviour are most useful in predicting amplitude, length and year of maximum within the next 1, 2 or possibly 3 cycles. I get the impression that there are a few out there who got it quite right this time, so either they know the right parameters or they just got lucky.
At least it would tell the world whether or not the scientists can agree on which are the most important solar parameters to measure right now…….

January 22, 2014 9:06 am

Unfortunately this modelling technique is fully specious. The cluster of typically two to three weaker solar cycles that occur on average every 10 solar cycles, are periodic events and in no way are they a continuous cycle that is modulated by any other cycles. The real timing of these events can be away from the average of 110.7yrs by as much as two solar cycles, due to slips in the planetary harmonic periods producing uneven gaps between events, and the non circularity of orbits. So any fixed period used for modelling is guaranteed to be off target at times, e.g. at fixed 110.7yr intervals for the start of weak solar cycle clusters: 1680, 1792, 1902 and 2013, has the 1902 minimum starting too late, as it actually started around 1880. Where these clusters occur is always at a critical breakdown in the harmony of Jupiter, Earth and Venus, with one other planet.
There are a number of problems with the periods in the paper. The 178.8 yr period is a local planetary cycle and drifts way out of sync in more than one step. The 1253 yr period was not explained; if it is based on 7 × 179 then it is spurious. The 19.528 seems to be the axial period of 165.5 and 22.14 and not the beat frequency, and why Hale and not Schwabe? And as the 178.8 is not repeatable, it cannot produce a 208 yr beat with 1253 yrs. I left an original and plausible explanation for a 207 yr period at tallbloke’s on the thread where he banned me. It’s an event series though, not a cycle as such.

Joe Chang
January 22, 2014 9:08 am

On the test strategy of dividing the data set (1749-2013) into two, we would have 132 years in each subset. From a 132 year set, we could get a reasonable idea of patterns with cycles of less than 132 years, but no idea of patterns longer than 132 years. A significant pattern with a cycle in the 132-264 year range would render the predictive capability of the 132 year set invalid. On the question of whether the full dataset has predictive power: that would depend on whether there are patterns somewhat longer than 264 years. Any pattern of very long periodicity would not affect a short-term horizon.

January 22, 2014 9:10 am

If the sunspot numbers are due to the interaction of a variety of constant, cyclical processes, then curve fitting may indeed serve as an excellent predictor of future behavior, especially over the near term. In addition, the longer your calibration period is, the more likely your near term predictions are to be accurate (in the absence of chaotic processes). However, if chaotic processes are also involved then Willis is right. So in my opinion this curve fitting exercise is more a test of chaos vs the interaction of repeating cyclic phenomena.

Dan
January 22, 2014 9:11 am

dikranmarsupial
I am not convinced that the author citation index or journal “impact factor” is a reliable metric of quality. Maybe at first they are reliable, but only before the mass of low-quality authors figure out how to game them. The only really reliable way to figure out quality is to be embedded in a field and pay close attention to your established set of peers and the occasional upstart. For outsiders, this doesn’t work and it is these outsiders who develop aversions to “the establishment” and get taken for rides by junk authors and journals.

dikranmarsupial
January 22, 2014 9:17 am

Impact factor is not that easy to game, as it can be computed so that it discounts self-citations and citations from papers in the same journal. Google Scholar also helps you judge the citations a paper has received, because it will list for you all the papers that have cited it, and you can judge the quality from the reputations of the authors of the citing papers.
It is unlikely that predatory journals will try to game impact factors etc. There is no incentive for them to increase the quality of the journal; while performance is measured by quantity over quality, there will always be a ready supply of authors wanting their papers published in the journal, no matter what its reputation.

Alan Millar
January 22, 2014 9:20 am

We humans love patterns; we see them everywhere.
A lot of people don’t get it. I can have a known random generator, and within each block of random numbers there will be obvious or hidden patterns.
Once I know the data, I can model the output with a few free parameters. Anybody can do it, folks; it is easy!
The trick is: will the model match the next batch of random numbers, and the one after that, etc.? The answer is “not likely”, eventually progressing to a definite “no”.
Don’t believe any model constructed on known data, unless all the physical processes involved are known and agreed and its parameters are based on observation.
Alan

Curious George
January 22, 2014 9:35 am

Nitpicking on Chaotic and Deterministic: A deterministic system can be chaotic. In fact, chaotic attractors have first been described in deterministic systems.

David in Texas
January 22, 2014 9:37 am

“His model is not chaotic in the slightest. It is totally deterministic, and will assuredly repeat in exactly the same way after some unknown period of time.”
Willis,
With all due respect and great humility, I believe your definition of “chaotic” is not universally accepted. A system can be both deterministic and chaotic. Of course, it can also be probabilistic and chaotic. It is fair to say his system is not probabilistic, but probabilistic is not interchangeable with chaotic.
“Chaos theory studies the behavior of dynamical systems that are highly sensitive to initial conditions…” (Wikipedia).
The three-body problem is an example of a deterministic system that is also chaotic. Every time you run the calculation you get the same results: deterministic. If you change the initial conditions slightly, you get much more than a slight change in the results: chaotic.
Hence, I believe that his model is chaotic. Any slight change in the initial conditions/parameters will dramatically change the results.

Michael D
January 22, 2014 9:54 am

I agree with you, Willis: this is an astonishingly gimcrack approach. I would add that the fit is not even very good (note the poor fits around 1780-1790, 1840, 1870, 1895, 1960). Perhaps it looks good because of the colours used. As you say, a Fourier transform model with high frequencies removed would almost certainly do better.
I also agree with your statement “this model is not chaotic in the slightest. It is totally deterministic, and will assuredly repeat in exactly the same way after some unknown period of time.” For one thing, it is not sensitively dependent on initial conditions (unless he incorrectly thinks that the model parameters are initial conditions), because future states are completely determined by the equation, not by previous states.

Michael D
January 22, 2014 10:02 am

To clarify with respect to previous commenters: agreed that deterministic systems may be chaotic, but truly periodic systems cannot. The key distinguishing fact in this case is that chaotic systems are (as far as I know) always driven by dynamical equations in which future states evolve from past states, thus enabling sensitive dependence on initial conditions and strange attractors.

toms3d
January 22, 2014 10:06 am

Dear Willis, what is the point of this article? I have no problem with your comments on testing by using subsets of the data to train and then testing that training against the alternate subset. My approach to assessing what anyone says is first to try to understand the message and whether it conveys useful truth. I do not care what degrees or background whoever is communicating has; I look at what they say and what they conclude.
I have read several of your pieces in the past and was very impressed.
The conclusion of the author is simply:
” Fortunately because the changes to the base frequencies and phasing occur slowly in terms of human life spans, we can make forecasts that may be useful”
I would proffer that nothing you have discussed or argued would quantitatively improve the usefulness of any forecasts made. Well, by accident it might, but we both know that no model predicts with certainty. With thousands of tests and observations we may be able to statistically attach some level of confidence to what a particular model is forecasting. But such is not the case in the modelling described in R. J. Salvador’s “A mathematical model of the sunspot cycle for the past 1000 yr”.
So, in that there are no real technical issues that I see in Salvador’s article and its conclusions, and in that you are not adding any real technical insights, my conclusion is that all your arguments seem to support, or be about, your current opinion that some sin was committed en masse by the authors of the articles in the now-cancelled Pattern Recognition in Physics, in particular assigning vast import to proper peer review.
I am not anywhere near as articulate as Jo Nova, so I would only suggest all read:
http://joannenova.com.au/2014/01/science-is-not-done-by-peer-or-pal-review-but-by-evidence-and-reason/ “Science is not done by peer or pal review, but by evidence and reason”
I would suggest Salvador has not said “my paper is true because it is published”; he, and they all, say “judge me by my work”.
In my decades in doing engineering science my work was mostly reviewed by pals, good friends and colleagues. The closer my friendship was with the reviewer the more brutal they were. When I reviewed other work I was most brutal or argumentative with my friends. To me a friend really wants to make sure I know exactly everything I am really stupid about.
Some of my own work on using models to predict the proper applications of energy are shown here. http://watman.com/PASTWORK/#ifsar
It was all about:
“Let’s follow due process in science, but that is not by review whether peer-or-pal, it’s by prediction, test, observation, and repeat.”

January 22, 2014 10:19 am

Alan Millar says:
January 22, 2014 at 9:20 am
“Within each block of random numbers there will be obvious or hidden patterns.”
=======================================================================
But by definition (of “random”) that is impossible. –AGF

dikranmarsupial
January 22, 2014 10:23 am

agfosterjr you are missing the point, the patterns are “obvious” to the human observer *even though they don’t exist*. That is the problem.

dikranmarsupial
January 22, 2014 10:33 am

Think about it again, perhaps the point was that for a finite length string of random digits there will be a pattern that allows the string to be algorithmically compressed, even though such compression is impossible for an infinite length string. In statistical modelling, this problem is called over-fitting, because it allows you to make a model that explains a particular sample of data well, but is not able to predict future data because the pattern it exploits is spurious.
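[The over-fitting point is easy to illustrate with a deliberately silly model. In this sketch (mine, not from the thread), a many-parameter polynomial “explains” pure noise beautifully in sample and fails badly out of sample:]

```python
import numpy as np
from numpy.polynomial import Polynomial

# Pure noise: by construction there is no real pattern to find.
rng = np.random.default_rng(42)
x = np.arange(20.0)
y = rng.normal(size=x.size)

# "Train" a 9-coefficient (degree-8) polynomial on the first half.
# Polynomial.fit rescales the domain internally for numerical stability.
p = Polynomial.fit(x[:10], y[:10], deg=8)

in_err = np.mean((p(x[:10]) - y[:10]) ** 2)
out_err = np.mean((p(x[10:]) - y[10:]) ** 2)

# The spurious in-sample "pattern" is fit almost perfectly, while the
# out-of-sample error is typically orders of magnitude worse.
print(in_err, out_err)
```

With nine free parameters chasing ten noisy points, the model memorizes the sample rather than learning anything, which is the statistical-modelling sense of exploiting a spurious pattern.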

Matthew R Marler
January 22, 2014 11:02 am

Good presentation. Personally, I think the best test of predictivity is future out of sample data, but splitting the sample should at least be tried.
About this: BONUS MISTAKE: In the abstract, not buried in the paper but in the abstract, the author makes the following astounding claim:
The model is a slowly changing chaotic system with patterns that are never repeated in exactly the same way.
Say what? His model is not chaotic in the slightest. It is totally deterministic, and will assuredly repeat in exactly the same way after some unknown period of time.

A chaotic system is deterministic (unless it is a stochastic chaotic system, but no such claim is made here), but their model is perfectly periodic. Lots of chaotic systems can appear periodic over a few to many cycles, an example being the near periodicity of Earth’s revolution about the sun; another is the near periodicity of the heartbeat.

John West
January 22, 2014 11:18 am

I don’t understand why anyone would have issue with curve fitting in a paper in a journal named “Pattern Recognition”.
As of yet I have seen no one meet the challenge of finding something at least as erroneous as MBH98 in any of the papers in PRP.
Sure some people don’t like curve fitting but others do, and certainly we can find examples where curve fitting and pattern recognition were used to forecast the future long before the underlying physics was understood. (i.e.: seasons, tides)
Sure the reviewers were in general agreement with each other on a particular issue, so what, most all papers are reviewed by people who agree the world is round.
The response by the publisher is out of proportion to the supposed crime. They swatted a mosquito with a sledgehammer. It’s censorship. It’s not allowing dissent. It’s suppressing free speech. It’s wrong.
Why couldn’t counter arguments be published in other peer reviewed journals instead of obliterating the dissidents’ voice?
Is anyone forced to cite these papers?
This is craziness gone mad!
We should be attacking the publisher for hypocrisy, oppression, and religious fanaticism; not supporting the oppression of people just because we think they’re wrong or dislike their methodology (even though it’s been successfully used for millennia to advance our understanding of the universe).
When they came for the sky dragon slayers, I did nothing because I wasn’t a sky dragon slayer.
When they came for the curve fitters, I did nothing because I wasn’t a curve fitter.
When they came for the low sensitivity proponents, we were too few to resist.
(With apologies to Pastor Martin Niemöller)

Greg Goodman
January 22, 2014 11:22 am

marsupial says: “Actually, it does, Google scholar will tell you how many times a paper has been cited; ”
Citation counting is a measure of conformity, not quality.
The Met Office cites a Japanese paper that is complete bunk and that eliminates data which does not agree before “finding” that the data “validates” Hadley “bias corrections”.
All this means is that Hadley find a geographically subjective paper, which says their adjustments are good, to be a convenient citation supporting their speculative correction methods.
In engineering this would be recognised as a positive feedback, which is inherently unstable and is inevitably bounded only by a negative feedback. The result is a system that latches to an extreme.
The current AGW paradigm is such a latched in extreme.
Quality is not a key factor in such a process.

January 22, 2014 11:28 am

dikranmarsupial says:
January 22, 2014 at 10:33 am
=========================
You may be right about my missing Millar’s point, but about compressing random data, how is THAT possible (except by using the same generator)? –AGF

Bernie Hutchins
January 22, 2014 11:32 am

Fitting a “Fourier series” is easy if there is a “fundamental”, which there doesn’t seem to be for planetary orbits. (Well, you can always choose a low-enough harmonic: the Earth’s orbit is the 8766th harmonic of a period of 1 hour!) You can also exactly fit a time series, GUARANTEED, to an artificially chosen set (your choice) of harmonics by a Discrete Fourier Transform (the FFT). Or choose your own basis functions in a wavelet-like approach. Gram-Schmidt them if you like. Choosing your parameters based on physical data also sounds very advisable. Whatever you like.
Now, having fit your elephant, test it. Wait 50 years for the new data! Not wishing to wait, you can simply go back, discarding say 50 years of data, and recalculate the parameters to the then-available data. Calculate forward to today. Does it fit the data you threw out? It better. And it better not BE diverging more and more as you approach the present.
About 45 years ago in a physics experiment I tried curve fitting. I tried graph paper from every bin in the campus store. Some results were quite lovely. My Professor (Herbert Mahr – bless his heart) was kind enough to commend my industry and my artistic effort and then remark that “this doesn’t mean anything.” Curiously, the exact same four words Willis said above.

dikranmarsupial
January 22, 2014 11:36 am

agfosterjr wrote “You may be right about my missing Millar’s point, but about compressing random data, how is THAT possible (except by using the same generator)? –AGF”
On average you can’t compress random data, but that doesn’t mean you can never compress *any* sequence of random data. Say I flip a coin and it comes down H T H T H T; that is a random sequence that I can compress because it has a pattern, but it isn’t a pattern that extends beyond the six coin flips I have observed; the next one may well be a tail.