Sunny Spots Along the Parana River

Guest Post by Willis Eschenbach

In a comment on a recent post, I was pointed to a study making the following surprising claim:

Here, we analyze the stream flow of one of the largest rivers in the world, the Paraná in southeastern South America. For the last century, we find a strong correlation with the sunspot number, in multidecadal time scales, and with larger solar activity corresponding to larger stream flow. The correlation coefficient is r = 0.78, significant to a 99% level.

I’ve seen the Parana River … where I was, it was too thick to drink and too thin to plow. So this was interesting to me. Particularly interesting because in climate science a correlation of 0.78 combined with a 99% significance level (p-value of 0.01) would be a very strong result … in fact, to me that seemed like a very suspiciously strong result. After all, here is their raw data used for the comparison:

Figure 1. First figure in the Parana paper, showing the streamflow in the top panel, and sunspot number (SN) and total solar irradiance (TSI) in the lower two panels.

They are claiming a 0.78 correlation between the data in panel (a) and the data in panel (b) … I looked at Figure 1 and went “Say what?”. Call me crazy, but do you see any kind of strong 11-year cycle in the top panel? Because I sure don’t. In addition, when the long-term average of sunspots rises, I don’t see the streamflow rising. If there is a correlation between sunspots and streamflow, why doesn’t a several-decade period of increased sunspots lead to increased streamflow?

So how did they get the apparent correlation? Well, therein lies a tale … because Figure 2 shows what they ended up analyzing.

parana streamflow fig 2

And wow, that sure looks like a very, very strong correlation … so how did they get there from such an unpromising start?

Well, first they took the actual data. Then, from the actual data they subtracted the “secular trends” (see the dark smooth lines in Figure 1). The effect of this first processing step is curious.

Look back at Figure 1. IF streamflow and sunspots were correlated, we’d expect them to move in parallel in the long term as well as the short term. But inconveniently for their theory … they don’t move in parallel. How to resolve it? Well, since the long-term secular trend data doesn’t support their hypothesis, their solution was to simply subtract that bad-mannered part out from the data.

I’m sure you can see the problems with that procedure. But we’ll let that go, the damage is fairly minor, and look at the next step, where the real destruction is done.

They say in Figure 2 that the sunspot data was “smoothed by an 11-yr running mean to smooth out the solar cycle”. However, it is apparent that the authors didn’t realize the effect of what they were doing. Calling what they did “smoothing” is a huge stretch. Figure 3 shows the residual sunspot anomaly (in blue) after removing the secular trend (as the authors did in the paper), along with the 11-year moving average of that exact same data (in red). Again as the authors did, I’ve normalized the two to allow for direct comparison:

Figure 3. Sunspot anomaly data (blue line), compared to the eleven-year centered moving average of the sunspot anomaly data (red line). Both datasets have been normalized to a mean of zero and a standard deviation of one.

Talk about a smoothing horror show, that has to be the poster child for bad smoothing. For starters, look at what the “smoothing” does to the sunspot data from 1975 to 2000 … instead of having two peaks at the tops of the two sunspot cycles (blue line, 1980 and 1991), the “smoothed” red line shows one large central peak, and two side lobes. Not only that, but the central low spot around 1986 has now been magically converted into a peak.

Now look at what the smoothing has done to the 1958 peak in sunspot numbers … it’s now twice as wide, and it has two peaks instead of one. Not only that, but the larger of the two peaks occurs where the sunspots actually bottomed out around 1954 … YIKES!

Finally, I knew this was going to be ugly, but I didn’t realize how ugly. The most surprising part to me is that their “smoothed” version of the data is actually negatively correlated to the data itself … astounding.

Part of the problem is the use of a running mean to smooth the data … a Very Bad Idea™ in itself. However, in this case it is exacerbated by the choice of the length of the average, 11 years. Sunspot cycles range from something like nine to thirteen years or so. As a result, cycles longer and shorter than the 11 year filter get averaged very differently. The net result is that we end up with some of the frequency data aliased into the average as amplitude data … resulting in the very different results from about 1945-60 versus the results 1975-2000.
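For readers who want to check the sign flip themselves, here is a minimal sketch in Python. It is not the paper's code or the spreadsheet calculation; it assumes annual data, a centered 11-point running mean, and idealized pure sinusoids standing in for sunspot cycles of different lengths. The function names are made up for illustration. The gain of an N-point running mean for a cycle of period P samples is sin(πN/P) / (N sin(π/P)), which is negative whenever P lies between N/2 and N.

```python
import math


def running_mean_gain(period, window=11):
    """Gain applied by a centered `window`-point running mean to a pure
    sinusoid with the given period (in samples). Negative means inverted."""
    x = math.pi / period
    return math.sin(window * x) / (window * math.sin(x))


def centered_running_mean(values, window=11):
    """Centered running mean; the output is shorter by window - 1 points."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


# Gain for idealized cycle lengths spanning the rough nine-to-thirteen-year range.
for period in (9, 10, 11, 12, 13):
    print(period, round(running_mean_gain(period), 3))

# Time-domain check on a synthetic 10-year cycle (an assumption, not real data):
# the smoothed value at each year is the original value times the gain above.
cycle = [math.cos(2 * math.pi * year / 10) for year in range(200)]
smoothed = centered_running_mean(cycle)
print(cycle[5], smoothed[0])  # same year, opposite signs
```

Under these assumptions the gain comes out at roughly -0.17 and -0.09 for 9- and 10-year cycles, zero at exactly 11 years, and roughly +0.09 and +0.18 for 12- and 13-year cycles. So shorter-than-average cycles come through upside down while longer ones come through right side up, which is consistent with the peaks-into-valleys behavior seen in Figure 3.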

Overall? I don’t care what they end up comparing to the red line … they are not comparing it to sunspots, not in any way, shape, or form. The blue line shows sunspots. The red line shows a mathematician’s nightmare.

How about the fact that they performed the same procedure on the Parana streamflow data? Does that make a difference? Figure 4 shows that result:

Figure 4. Parana streamflow anomaly data (blue line), compared to the eleven-year centered moving average of the streamflow anomaly data (red line). Both datasets have been normalized to a mean of zero and a standard deviation of 1.

As you can see, the damage done by the running mean is nowhere near as severe in this streamflow dataset as it was for the sunspots. Although there still are a lot of reversals, and turning peaks into valleys, at least the correlation is still positive. This is because the streamflow data does NOT contain the ± eleven-year cycles present in the sunspot data.

Conclusions? Well, my first conclusion is that as a result of doing what the authors did, comparing the red line in Figure 3 with the red line in Figure 4 says absolutely nothing about whether the Parana river streamflow is related to sunspots or not. The two red lines have very little to do with anything.

My second conclusion is, NEVER RUN STATISTICAL ANALYSES ON SMOOTHED DATA. I don’t care if you use gaussian smoothing or Fourier smoothing or boxcar smoothing or loess smoothing, if you want to do statistical analyses, you need to compare the datasets themselves, full stop. Statistically analyzing a smoothed dataset is a mug’s game. The problem is that as in this case, the smoothing can actually introduce totally false, spurious correlations. There’s an old post of mine on spurious correlation and Gaussian smoothing here for those interested in an example.
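The effect is easy to reproduce with made-up data. The following is a minimal sketch, not the analysis from the earlier post: it assumes pairs of independent Gaussian white-noise series of 100 points each (so the true correlation is zero by construction), smooths both with an 11-point centered running mean, and compares how far the sample correlations stray from zero before and after smoothing. All names and parameter values are illustrative.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def centered_running_mean(values, window=11):
    """Centered running mean; the output is shorter by window - 1 points."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


def correlation_spread(trials=200, length=100, window=11, seed=0):
    """Return the RMS correlation of independent series, raw and smoothed."""
    rng = random.Random(seed)
    raw, smooth = [], []
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(length)]
        b = [rng.gauss(0, 1) for _ in range(length)]
        raw.append(pearson(a, b))
        smooth.append(
            pearson(centered_running_mean(a, window),
                    centered_running_mean(b, window))
        )

    def rms(rs):
        return math.sqrt(sum(r * r for r in rs) / len(rs))

    return rms(raw), rms(smooth)


raw_spread, smooth_spread = correlation_spread()
print("typical |r|, raw data:     ", round(raw_spread, 2))
print("typical |r|, smoothed data:", round(smooth_spread, 2))
```

With these settings the smoothed pairs show correlations several times larger in magnitude than the raw pairs, even though every pair is unrelated by design. A significance test that treats the smoothed points as independent observations will therefore badly overstate the significance.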

Please be clear that I’m not accusing the authors of any bad intent in this matter. To me, the problem is simply that they didn’t understand and were unaware of the effect of their “smoothing” on the data.

Finally, consider how many rivers there are in the world. You can be assured that people have looked at many of them to find a connection with sunspots. If this is the best evidence, it’s no evidence at all. And with that many rivers examined, a p-value of 0.05 is now far too generous. The more places you look, the more chance of finding a spurious correlation. This means that the more rivers you look at, the stronger your results must be to be statistically significant … and we don’t yet have even passable results from the Parana data. So as to rivers and sunspots, the jury is still out.

How about for sea level and sunspots? Are they related? I can’t do better than to direct you to the 1985 study by Woodworth et al. entitled A world-wide search for the 11-yr solar cycle in mean sea-level records, whose abstract says:

Tide gauge records from throughout the world have been examined for evidence of the 11-yr solar cycle in mean sea-level (MSL). In Europe an amplitude of 10-15 mm is observed with a phase relative to the sunspot cycle similar to that expected as a response to forcing from previously reported solar cycles in sea-level air pressure and winds. At the highest European latitudes the MSL solar cycle is in antiphase to the sunspot cycle while at mid-latitudes it changes to being approximately in phase. Elsewhere in the world there is no convincing evidence for an 11-yr component in MSL records.

So … of the 28 geographical locations examined, only four show a statistically significant signal. Some places it’s acting the way that we’d expect … other places it’s not. Nowhere is it strong.

I haven’t bothered to go through their math, except for their significance calculations. They appear to be correct, including the adjustment to the required significance given the fact that they’ve looked in 28 places, which means that the significance threshold has to be adjusted. Good on them 1980s scientists, they did the numbers right back then.
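To put rough numbers on why looking in 28 places changes the required threshold, here is a small sketch. The post does not say which adjustment Woodworth used, so the Bonferroni and Šidák formulas below are standard textbook corrections offered as an assumption, and the 0.05 starting threshold and the independence of the 28 tests are likewise assumed for illustration.

```python
def familywise_error(alpha, tests):
    """Chance of at least one false positive across `tests` independent tests,
    each run at significance level `alpha`."""
    return 1 - (1 - alpha) ** tests


def bonferroni_threshold(alpha, tests):
    """Per-test threshold that keeps the overall error rate at or below alpha."""
    return alpha / tests


def sidak_threshold(alpha, tests):
    """Per-test threshold giving exactly alpha overall for independent tests."""
    return 1 - (1 - alpha) ** (1 / tests)


locations = 28   # number of places examined, from the text above
alpha = 0.05     # assumed conventional threshold

print("chance of a false hit somewhere:", round(familywise_error(alpha, locations), 2))
print("Bonferroni per-test threshold:  ", round(bonferroni_threshold(alpha, locations), 4))
print("Sidak per-test threshold:       ", round(sidak_threshold(alpha, locations), 4))
```

Under these assumptions, testing 28 locations at p = 0.05 gives about a three-in-four chance of at least one spurious "significant" result, and each individual location needs a p-value below roughly 0.002 before it counts.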

However, and it is a very big however, as is common with such analyses from the 1980s, I see no sign that the results have been adjusted for autocorrelation. Given that both the sunspot data and the sea level data are highly autocorrelated, this can only move the results in the direction of less statistical significance … meaning, of course, that the four results that were significant are likely not to remain so once the results are adjusted for autocorrelation.
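One common way to make that adjustment, sketched below, is to shrink the sample size to an "effective" number of independent points before computing significance. This is an assumption about method, not what the Woodworth paper or this post's spreadsheet does: it uses a widely quoted approximation for two series that each behave roughly like a first-order autoregressive process, n_eff = n (1 - r1 r2) / (1 + r1 r2), where r1 and r2 are the lag-1 autocorrelations of the two series. The numbers in the example are illustrative, not taken from any dataset.

```python
import math


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation of a series (covariance of neighbors / variance)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / sum(d * d for d in dev)


def effective_sample_size(n, r1_a, r1_b):
    """Approximate number of independent points when correlating two
    autocorrelated series with lag-1 autocorrelations r1_a and r1_b."""
    return n * (1 - r1_a * r1_b) / (1 + r1_a * r1_b)


def correlation_t_statistic(r, n_eff):
    """t statistic for a correlation r based on n_eff effective points."""
    return r * math.sqrt((n_eff - 2) / (1 - r * r))


# Illustrative values only: 100 points, both series strongly autocorrelated.
n, r1_a, r1_b, r = 100, 0.9, 0.9, 0.5
n_eff = effective_sample_size(n, r1_a, r1_b)
print("effective sample size:", round(n_eff, 1))
print("t, ignoring autocorrelation:", round(correlation_t_statistic(r, n), 2))
print("t, adjusted:                ", round(correlation_t_statistic(r, n_eff), 2))
```

In this made-up case 100 data points carry the information of only about ten independent ones, and the t statistic drops accordingly. That is the direction of the correction described above: a result that looks significant can stop being so once autocorrelation is accounted for.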

Is there a sunspot effect on the climate? Maybe so, maybe no … but given the number of hours people have spent looking for it, including myself and many, many others, if it is there, it’s likely very weak.

My best regards to all,

w.

NOTA BENE! If you disagree with something I said, please quote my exact words, and then tell me why you think I’m wrong. Telling me things like that my science sucks or baldly stating that I don’t understand the math doesn’t help me in the slightest. If I’m wrong I want to know it, but I have no use for claims like “Willis, you are so off-base in this case that you’re not even wrong.” Perhaps I am, but we’ll never know unless you specify exactly what I said that was wrong, and what was wrong with it.

So if you want me to treat you and your comments with respect, quote what you object to, and specify your objection. It’s the only way I can know what the heck you are talking about, and I’ve had it up to here with vague unsupported accusations of wrongdoing.

DATA: Digitized Parana streamflow data from the paper plus SIDC Sunspot data and all analyses for this post are on an Excel spreadsheet here. You’ll have to break the links, they are to my formula for Gaussian smoothing.

PS—Thanks to my undersea contacts for coming up with a copy of the thirty-year-old Woodworth study, and a hat tip to Dr. Holgate and Steve McIntyre at Climate Audit for the lead to the study. Dr. Holgate is well-known in sea level circles, here’s his comment on the sunspot question:

Many people have tried to link climate variations to sunspot cycles. My own feeling is that they both happen to exhibit variability on the same timescales without being causal. No one has yet shown a mechanism you understand. There is also no trend in the sunspot cycle so that can’t explain the overall rise in sea levels even if it could explain the variability. If someone can come up with a mechanism then I’d be open to that possibility but at present it doesn’t look likely to me.

If you’re interested in solar cycles and sea level, you might look at a paper written by my boss a few years back: Woodworth, P.L. “A world-wide search for the 11-yr solar cycle in mean sea-level records.” Geophysical Journal of the Royal Astronomical Society. 80(3) pp743-755

You’ll appreciate that this is a well-trodden path. My own feeling is that it’s not the determining factor in sea level rise, or even accounts for the trend, but there may be something in the variability. I’m just surprised that if there is, it hasn’t been clearly shown yet.

I can only agree … 

John West
January 25, 2014 8:11 pm

So, when will PRL be canceled?

goldminor
January 25, 2014 8:12 pm

They miss what the data truly shows with all of the mathematical gymnastics. I see very high correlation with the flood cycle of the Pacific NW. An example is the 1996/97 high water, which is the second highest on the graph. That was a semi biblical flood event in No California. These flood events occurred on the ascent after a solar minimum. The year 1964/65, a big year for the Parana. In the Pacific NW, a huge rain event that stretches from SF/Bay Area through to British Columbia. The floods occur shortly after the solar minimum and on the ascent side. The year 1955/56 shows the Parana River at a high level, and moving higher over time. In the Pacific NW, there is a massive flood, although it did not impact as large of an area as 1964/65. The 1955/56 floods occur on the middle of the ascent after the solar minimum and before the max. In 1975/76, the Parana has a high flow. In the Pacific NW there is a drought in No California, and the climate shifts at this point. This is the first year since the 20s where the 9 year flood cycle breaks. It seems to now be between 11 and 12 years between high water events. The years 1975/76 are just prior to a solar minimum. In 1984/85, the Parana River is flowing strong, and the years 1983/84 mark the highest peak recorded in the chart. The Pacific NW had strong rains in areas of the coast and the new cycle of over 11 years per flood is in place, with the following flood cycle in 1996/97. There is some high water in the Pacific NW around 2007/08. It would be interesting to see an updated graph to see if the Parana had a spike at that time. That is on the way down to the solar minimum. The correlations stretch back into the 20s from what I can see and mesh with what I know.
So once again straightforward observations and historical data would have served them well, in seeing deeper into these charts.

January 25, 2014 8:38 pm

Willis, I have enjoyed reading your stuff over the years and have rooted for you against the warmista. But, Mate, do us a favour, please, less of the sanctimonious BS … you’re better than that; and you’re coming across like a bit of a Mikey Mann.

January 25, 2014 8:40 pm

Willis: As I recall the MOTHER of all “correlation errors” has to be the wolf and moose population on Isle Royale, Lake Superior. “Closed system” obviously, and fairly good tracking of the number of wolves and moose (fly overs, good spotters, consistent methods) from the 30’s through the ’80’s AND, paper after paper after PAPER showed this WONDERFUL correlation showing the wolves controlling the number of moose, etc, yada, and so on… BUT some HERETIC like yourself, took a GOOD statistical look, and said, “PURE NONSENSE”, forcing a re-evaluation. Eventually a plant with a 7 year cycle of abundance and retreat, provided a vital nutrient, which controlled the fertility of the meese (haha, I know MOOSE!) …and the wolf population would more or less correlate with the amount of moose to loose…and all the preceding scholarly work became WORTHLESS. Again, the “prima facie” example of years of “academics” fooling themselves. Delicious. All I have to say is:
KEEP THROWING THOSE MONKEY WRENCHES IN THE WORKS!

GaryM
January 25, 2014 8:44 pm

Were any of the co-authors statisticians? I googled them and found nothing suggesting any of them were. Professional journals ought to require a statistician as a co-author of these papers that conflate statistics with science. But I suspect there aren’t enough statisticians to go around.

Paul Westhaver
January 25, 2014 8:58 pm

Wow…

Paul Westhaver
January 25, 2014 9:01 pm

911

January 25, 2014 9:21 pm

The correlation coefficient is r = 0.78, significant to a 99% level.

Thanks Willis. That mediocre correlation and unbelievable significance triggered my BS meter too. But unlike you, I didn’t have the faintest idea what to do about it. Well done.

george e. smith
January 25, 2014 9:56 pm

I always thought filters throw away information. Don’t see how they can add information.
If you do enough low pass filtering, you end up with a single value.
Then anything correlates with anything else.
It also seems to me that if you actually have two phenomena that really do have a physical cause and effect relationship, then the one that is the effect will generally be delayed from the one that is the cause (changes). Doing a correlation for various values of time offset, should enable that physical delay to be determined.
Funny thing Willis, is those Piranhas didn’t seem to do any such analysis.
If CO2 lags behind temperature by 800 years, How would a correlation at zero delay reveal any connection ?

Leonard Lane
January 25, 2014 10:19 pm

Willis. You mentioned some information on how smoothing introduced spurious correlations. Here is the good place to start.
Loynes, R. M. 2005. Slutzky–Yule Effect. Encyclopedia of Biostatistics.
Abstract
Smoothing a time series by forming a moving average is a commonly employed approach. In this article, some of the problems that arise are discussed, in particular, the introduction of correlations even when the observations in the original series were independent.
Your work with independent or random series supported what was said in this paper.
Take care.

January 25, 2014 11:41 pm

Can somebody direct me to the data of that river’s flow? Henry

Joe Bloggs
January 25, 2014 11:48 pm

Streetcred says:
January 25, 2014 at 8:38 pm
Willis, I have enjoyed reading your stuff over the years and have rooted for you against the warmista. But, Mate, do us a favour, please, less of the sanctimonious BS … you’re better than that; and you’re coming across like a bit of a Mikey Mann.
Well said that man, Willis gets all twitchy with the “If you disagree with something I said, please quote my exact words, and then tell me why you think I’m wrong.”
I’m sure that most here understand that game, but some here are trying to point out to you that the ‘attack dog’ writing style really does you no favours.
I’m in the same boat as Streetcred above, the lack of humility is outstanding. I have no argument with the way you want to ‘do your science’ but you could show a little less aggression in your ‘tone’ when writing your ‘science’ essays.
Maybe I should just stick to reading your well crafted life experiences and avoid the ‘shove it down your throat’ science articles.
I’m sorry Willis but the schizophrenic writing style is not so nice to digest.
yours in honest disappointment
Joe B

goldminor
January 26, 2014 12:38 am

@Willis…I always thought of the Pacific Northwest as SF/BayArea to southern British Columbia. Large storms that cross through this boundary then go on to affect states well to the east. I first heard about the 9 year flood cycle in 1971, when I moved up to the Klamath River. I knew of two of the 9 year floods from personal experience, the 1955/56 and 1964/65. In the summer of 1965 I took a Greyhound bus up to Seattle to stay with cousins for the summer. That was the summer without sun in Seattle. The bus ride took 38 hours from SF to Seattle. It was supposed to be an 18 hour route. The devastation of the flood stretched all the way to Seattle.
I just looked at a revised ssn chart that was produced by Dr Svalgaard. It has a much higher resolution than most, and I can see that the connection between ssn and high water events is not as clear. The connection with the Pacific NW high water events and the Parana River is right on, though. I saved a link for San Francisco rainfall, 1849 to present. The Parana and SF share some years 1996/97, 1982/83, 1972/73, 1941/42, 1930/31, 1911/12 with peak rain years, but there are some SF years that are moderate to low against the Parana highs. Although, I notice that for many of the Pacific coastal heavy rain events, San Francisco had a below average rainfall. The No California/Oregon/ Washington big rains in the 40s through the 60s run counter to SF rainfall, during that period, then it changes in the 70s and synchronizes for the 70s, 80s, and 90s. Never noticed that before…. http://www2.ucar.edu/sites/default/files/news/2013/rainfall_chart_orig.jpg

January 26, 2014 12:51 am

I just had a chance to have a good look at that figure 1 a)
which represents the flow rate.
I suspect that the curved line in that figure 1a) is a best fit of a polynomial of the third order? Or is it a running mean of some sort? What period?
Either way, looking at that curved line I conclude:
The minimum flowrate of the Parana river was in 1953 or 1954, average..
The maximum flowrate appears to be around 1990, average.
There is no data before 1905, but it seems the curve came down from a maximum flow rate at around 1895.
Now look here:
There are good records of the flooding of the Nile, for example here:
http://www.cyclesresearchinstitute.org/cycles-astronomy/arnold_theory_order.pdf
to quote from the above paper:
“A Weather Cycle as observed in the Nile Flood cycle, Max rain followed by Min rain, appears discernible with maximums at 1750, 1860, 1950 and minimums at 1670, 1800, 1900 and a minimum at 1990 predicted.
The range in meters between a plentiful flood and a drought flood seems minor in the numbers but real in consequence….
end quote
According to my table for maxima,
http://blogs.24.com/henryp/2013/02/21/henrys-pool-tables-on-global-warmingcooling/
I calculate the date where the sun decided to take a nap (that is just a figure of speech, in fact it is probably a “wake-up”), as being around 1995, and not 1990 as William Arnold predicted.
This is looking at energy-in. I think earth reached its maximum output (means) a few years later, around 1998/1999.
Anyway, either way, (a few years error is fine!), look again at my best sine wave plot for my data,
http://blogs.24.com/henryp/2012/10/02/best-sine-wave-fit-for-the-drop-in-global-maximum-temperatures/
now see:
1900 minimum flooding – end of the warming
1950 maximum flooding – end of cooling
1995 minimum flooding – end of warming.
predicted 2035-2040 – maximum flooding – end of cooling.
There is a clear and pertinent correlation with the best fit sine wave that I proposed for the observed current drop in global maximum temperatures, both for the Parana and Nile rivers.
What causes the current decrease in these rivers’ flow is fairly simple: As the temperature differential between the poles and equator grows larger due to the cooling from the top, very likely something will also change on earth. Predictably, there would be a small (?) shift of cloud formation and precipitation, more towards the equator, on average. At the equator insolation is 684 W/m2 whereas on average it is 342 W/m2. So, if there are more clouds in and around the equator, this will amplify the cooling effect due to less direct natural insolation of earth (clouds deflect a lot of radiation). Furthermore, in a cooling world there is more likely less moisture in the air, but even assuming equal amounts of water vapour available in the air, a lesser amount of clouds and precipitation will be available for spreading to higher latitudes. So, a natural consequence of global cooling is that at the higher latitudes it will become cooler and/or drier.
In a cooling world such as ours now,
http://www.woodfortrees.org/plot/hadcrut4gl/from:1987/to:2014/plot/hadcrut4gl/from:2002/to:2014/trend/plot/hadcrut3gl/from:1987/to:2014/plot/hadcrut3gl/from:2002/to:2014/trend/plot/rss/from:1987/to:2014/plot/rss/from:2002/to:2014/trend/plot/hadsst2gl/from:1987/to:2014/plot/hadsst2gl/from:2002/to:2014/trend/plot/hadcrut4gl/from:1987/to:2002/trend/plot/hadcrut3gl/from:1987/to:2002/trend/plot/hadsst2gl/from:1987/to:2002/trend/plot/rss/from:1987/to:2002/trend
it will simply become wetter at the lower latitudes…..
A clever farmer living at high latitude, who already experienced drought situations, would realize that it is not going to get better for the next three decades. He would now pack up his bags and move to a place of lower latitude.

garymount
January 26, 2014 1:15 am

goldminor says: January 25, 2014 at 8:12 pm
– – –
I have lived just north of the Pacific NW, in the Canadian Pacific SW for nearly 50 years. My knowledge of historic flooding is that the amount of flooding is related to both the quantity of the previous winter’s snow pack and the timing of the hot weather in spring. About 100 years ago there was a great flood that today would have wiped out thousands of homes including mine. Since that time dikes have been built, hundreds and hundreds of miles of them, and I have ridden my bike on a large number of them. Just a few years ago there was a great scare of spring flooding which resulted in millions of additional dollars being spent to increase the height of the dikes. The flooding turned out to be a dud, with levels even below average or at least nothing to write home about. I did get a lovely extra height for my bike rides for better sight seeing, at taxpayer expense. But I suppose it’s insurance for any future potential flooding. A couple of years ago, the Fraser river and Pitt river were at high levels, so much so that a tower holding power lines for crossing the Fraser was knocked out of commission and there were fish on my bike path, at the location where it passes below the railroad tracks.
Perhaps not a lot of science in my comment, just anecdotal observations. However there is more to flooding in this area than just the amount of rain.

Greg Goodman
January 26, 2014 1:19 am

Excellent Willis. This is a perfect example of the kind of garbage that can result from these ubiquitous running mean “smoothers”. In fact I’ve never seen such wholesale inversion. It would have made an ideal example for my article on running mean distortion on Judith’s Climate Etc.
http://judithcurry.com/2013/11/22/data-corruption-by-running-mean-smoothers/
I’m glad the issue is getting some coverage. This may be a bit of an odd-ball paper but this kind of filtering is de rigueur in climate science. There seems to be barely a paper that does not use it somewhere and of course the processing of the “gold standard” hadSST dataset uses it over adjacent grid cells in an iterative loop to determine their background climatology.
The other main application is our friend the monthly average, which is mathematically equivalent to using a monthly running mean before resampling at monthly intervals, whereas correct processing would require a 2 month anti-alias filter.
Most of the current data processing being done in climate science is doing more to ensure that they do not identify any natural periodic forcing than anything else. But that probably plays to the “consensus” view that it’s all stochastic ‘internal’ variation plus CO2.
Bias confirmation at work.
I used SSN in my article as an example of the effect of the monthly running mean. It is used in determining the date of the “peak” of each solar cycle. In the current cycle it finds the peak to be in the month that has the lowest SSN for the last 2.5 years !!

Ox AO
January 26, 2014 1:31 am

Willis said, “I can’t understand his method.”
Neither do I. But if you look at the Normalized Sunspot Anomaly and 11-year Running Mean, and you invert the blue line Sunspot Anomaly, it looks close to me. Willis, why don’t you ask them first?