Noise Assisted Data Analysis

Guest Post by Willis Eschenbach

Once again, Dr. Curry’s “Week in Review-Science and Technology” doesn’t disappoint. I find the following:

Evidence of a decadal solar signal in the Amazon River: 1903 to 2013 [link]  by Antico and Torres

So I go to the link, and I find the abstract:


It has been shown that tropical climates can be notably influenced by the decadal solar cycle; however, the relationship between this solar forcing and the tropical Amazon River has been overlooked in previous research. In this study, we reveal evidence of such a link by analyzing a 1903-2013 record of Amazon discharge. We identify a decadal flow cycle that is anticorrelated with the solar activity measured by the decadal sunspot cycle. This relationship persists through time and appears to result from a solar influence on the tropical Atlantic Ocean. The amplitude of the decadal solar signal in flow is apparently modulated by the interdecadal North Atlantic variability. Because Amazonia is an important element of the planetary water cycle, our findings have implications for studies on global change.

The study is paywalled, but to their credit they’ve archived the data here as an Excel workbook. Let me start where I usually start, by looking at all of the raw data, warts and all.

Amazon River Flow Data 1902 2014

Figure 1. Monthly average Amazon river flow (thousands of cubic metres per second). The violet colored sections are not observations. Instead, they are estimations based on the river levels in two locations on the Amazon.

Now to me, that’s a big problem right there. One violet section is based on river levels at one location, and the other violet section is based on river levels from another location. It’s clear from the annual average (red/black line) that the variances of those two river level datasets are very different. One river level dataset has big swings, the other has small swings … not good. So first I’d say that any results from such a spliced dataset need to be taken, as the old Romans said, “cum grano salis” …

Setting that question of spliced data aside, I next looked at the periodogram of the data. This shows the strength of the signal at various periods. If the ~11-year solar cycle is affecting the river flow, it will show a peak in the 11-year range.

Periodogram Amazon River full and half

Figure 2. Periodogram of the monthly Amazon river flow data shown in Figure 1.

It appears at first blush as if there is a very small 11-year signal in the full data (black), about 6% of the total range of the overall data swing. But when we split the data into the first half and the last half (red and blue), the 11-year signal disappears. This is not at all uncommon in observational datasets. Apparent cycles are often just the result of the analysis method averaging a changing signal.
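The split-half check described above can be sketched on a toy series. This is a minimal, dependency-free illustration, not the code behind Figure 2: the function name `amplitude_at_period` and the synthetic data are assumptions made up for the example. The toy record carries a unit-amplitude 11-year cycle in its first half only, so the full-record amplitude is an average over a changing signal.

```python
import cmath
import math

def amplitude_at_period(x, period):
    """Fourier amplitude of x at the given period (period measured in samples)."""
    n = len(x)
    mean = sum(x) / n
    z = sum((v - mean) * cmath.exp(-2j * math.pi * k / period)
            for k, v in enumerate(x))
    return 2 * abs(z) / n

# Hypothetical monthly record, 110 years long: the 11-year cycle
# is present in the first half and absent in the second half.
months = 110 * 12
period = 11 * 12
series = [math.sin(2 * math.pi * k / period) if k < months // 2 else 0.0
          for k in range(months)]

full = amplitude_at_period(series, period)
first = amplitude_at_period(series[: months // 2], period)
second = amplitude_at_period(series[months // 2:], period)
print(f"full {full:.2f}, first half {first:.2f}, second half {second:.2f}")
# full 0.50, first half 1.00, second half 0.00
```

The full-record number alone cannot distinguish a steady half-strength cycle from a full-strength cycle that comes and goes, which is why splitting the record is a useful sanity check.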

Next, in Antico2015, the authors use the annual average data. To me, this is a poor choice. If you wish to remove the annual fluctuations, that’s fine … but using annual average data cuts your number of data points by a factor of 12. And this can lead to spurious results by inflating the apparent significance. But let us set that aside as well.

Finally, there is no statistically significant correlation between sunspots and Amazon river flow levels at any lag (max. monthly correlation ~ 0.1, p-value = 0.3 …).
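For readers who want to try a lag scan of this kind themselves, here is a minimal sketch in plain Python. The helper names `pearson` and `lagged_correlations` are made up for the example, and the data are synthetic (a random "driver" and a copy delayed by three steps), so it only shows the mechanics, not the Amazon or sunspot result.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def lagged_correlations(x, y, max_lag):
    """Correlation of x[t] with y[t + lag] for each lag from 0 to max_lag."""
    n = len(x)
    return {lag: pearson(x[: n - lag], y[lag:]) for lag in range(max_lag + 1)}

# Synthetic example: the response is the driver delayed by 3 steps.
rng = random.Random(1)
driver = [rng.gauss(0, 1) for _ in range(500)]
response = [0.0] * 3 + driver[:-3]

corr = lagged_correlations(driver, response, max_lag=12)
best_lag = max(corr, key=lambda lag: abs(corr[lag]))
print(best_lag, round(corr[best_lag], 3))
```

With real data the interesting part is the opposite outcome: when no lag produces a correlation that stands out from the rest, there is little to build on.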

Having seen that, my next step was to see how the authors of Antico2015 decided that there was a solar signal in the Amazon. And this was a most fascinating voyage. The best thing about climate science is that there is no end of the opportunities to learn. In this case, I learned from the Supplemental Online Information that they were using a method I’d never heard of, ensemble empirical mode decomposition, or EEMD. It’s one of many methods for decomposing a signal into the sum of other signals. Fourier analysis is the best known type of signal decomposition, and I’ve written before about the “periodicity” decomposition of Sethares, but there are other methods.

The details of EEMD are laid out by its developers in a paper called “Ensemble Empirical Mode Decomposition: A Noise Assisted Data Analysis Method” (hereinafter “EEMD2005”) … how could a data junkie like myself not like something called “noise assisted data analysis”?

The concept itself is quite simple. First, you identify local maxima and minima. See e.g. Figure 3 Panel b below, from the EEMD2005 paper, that shows the local maxima.

EEMD Process

Figure 3. Graphic explaining the EEMD process, from the EEMD2005 paper. ORIGINAL CAPTION: The very first sifting process. Panel a is the input; panel b identifies local maxima (red dots); panel c plots the upper envelope (red) and low envelope (blue) and their mean (black); and panel d is the difference between the input and the mean of the envelopes.

Then after you identify local maxima (panel b) and local minima (not shown), you draw two splines, one through the local maxima and the other through the local minima of the dataset (red and blue lines, panel c). The first component C1 is the difference between the data and the local mean of the two splines (panel d).

Then you take the resulting empirical mode C1 as your dataset and do the same—you draw two splines, one through the local maxima and the other through the local minima of C1. The second component C2 is again the difference between the data and the local mean of those two splines.

Repeat that until you have a straight line.

How do you aid that with noise? Well, you repeat it a couple thousand times using the original data plus white noise, and you average the results. According to the paper, this acts as a bank of bandpass filters, and prevents the mixing of very different frequencies in any one component of the decomposition. What do I know, I was born yesterday … read the paper for the math and the full explanation.
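The procedure above (find extrema, draw envelopes, subtract their mean, repeat, then average over noisy copies) can be sketched in a few dozen lines. This is a rough illustration under loudly stated assumptions, not the EEMD2005 implementation and not the code used for this post: the envelopes are straight-line segments rather than splines, both ends are simply pinned to the data, all function names are made up, and the defaults (50 trials, 10 sifting passes) are far smaller than the 2000 and 50 used in the paper. As in the standard algorithm, each new mode is extracted from what is left after the earlier modes have been subtracted.

```python
import math
import random
import statistics

def local_extrema(x):
    """Indices of interior local maxima and minima of the list x."""
    inner = range(1, len(x) - 1)
    maxima = [i for i in inner if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in inner if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima

def envelope(x, knots):
    """Piecewise-linear envelope through x at the knot indices.

    Assumption: straight segments stand in for the cubic splines of the
    paper, and both end points are pinned to the data.
    """
    idx = [0] + knots + [len(x) - 1]
    out = []
    for a, b in zip(idx, idx[1:]):
        out.extend(x[a] + (x[b] - x[a]) * (i - a) / (b - a) for i in range(a, b))
    out.append(x[-1])
    return out

def sift(x, n_sift=10):
    """Extract one oscillatory mode by repeatedly removing the envelope mean."""
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if not maxima or not minima:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
    return h

def emd(x, n_modes=3, n_sift=10):
    """Return n_modes oscillatory components plus a residual; they sum to x."""
    modes, residual = [], list(x)
    for _ in range(n_modes):
        maxima, minima = local_extrema(residual)
        if maxima and minima:
            mode = sift(residual, n_sift)
        else:
            mode = [0.0] * len(x)  # nothing left to extract; residual is the trend
        modes.append(mode)
        residual = [r - m for r, m in zip(residual, mode)]
    return modes + [residual]

def eemd(x, n_trials=50, noise_sd=0.6, n_modes=3, n_sift=10, seed=0):
    """Average the EMD of many noisy copies of x (noise_sd in units of x's std)."""
    rng = random.Random(seed)
    sigma = noise_sd * statistics.pstdev(x)
    totals = [[0.0] * len(x) for _ in range(n_modes + 1)]
    for _ in range(n_trials):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        for total, comp in zip(totals, emd(noisy, n_modes, n_sift)):
            for i, c in enumerate(comp):
                total[i] += c
    return [[t / n_trials for t in total] for total in totals]

# Toy input: a fast cycle riding on a slow one.
signal = [math.sin(2 * math.pi * k / 8) + 2 * math.sin(2 * math.pi * k / 60)
          for k in range(240)]
parts = emd(signal)        # plain EMD: components add back to the signal exactly
components = eemd(signal)  # ensemble version: they add back up to leftover noise
```

Note the one property that is easy to verify: the plain EMD components always sum back to the input, while the ensemble-averaged components sum to the input plus whatever noise has not yet averaged out.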

In any case, when they use EEMD to decompose the Amazon flow data, here’s what they get. Each panel shows the resulting curve from each step in the decomposition.

Amazon Antico EEMD

Figure 4. This shows Figure S1 from the Supplementary Online Information of the Antico paper. ORIGINAL CAPTION: (Left) Annual mean (October-September) Amazon flow record at Obidos station, its oscillatory EEMD modes (C1-6), and its residual trend. (Right) Raw periodograms of flow modes. In these power spectra, the frequency band of the decadal sunspot cycle, at 1/13 to 1/9 cycles per year, is depicted by the shaded region, and the oscillatory period of the most prominent spectral peak of C3 is given in years. In the left panels, the fraction of total variance accounted by each mode is shown in parentheses. For a particular mode, this fraction is the square of the Pearson correlation coefficient between the mode and the raw data record. The sum of these fractions may be greater than 100% because EEMD is a nonlinear decomposition of data; therefore, the EEMD modes are not necessarily linearly independent. To obtain the EEMD decomposition of the annual mean flow record, we considered an ensemble number of 2000, a noise amplitude of 0.6 standard deviations of the original signal, and 50 sifting iterations.
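The caption's remark that the variance fractions can add up to more than 100% is easy to check on a toy case. In this sketch (synthetic data, made-up names, no connection to the actual Amazon modes) a "record" is built from two components with the same period and a small phase offset, so they are strongly correlated with each other, and each fraction is computed the way the caption describes, as the squared Pearson correlation between the component and the record.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

# Two hypothetical "modes": same 10-sample period, phase offset of 0.5 radians.
n = 100
mode_a = [math.sin(2 * math.pi * k / 10) for k in range(n)]
mode_b = [math.sin(2 * math.pi * k / 10 + 0.5) for k in range(n)]
raw = [a + b for a, b in zip(mode_a, mode_b)]

fractions = [pearson(m, raw) ** 2 for m in (mode_a, mode_b)]
print([round(f, 3) for f in fractions], round(sum(fractions), 3))
# [0.939, 0.939] 1.878
```

Only when the components are uncorrelated with each other do the squared correlations sum to one, so fractions defined this way are best read one mode at a time.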

This is a curious kind of decomposition. Because of the use of the white noise, each panel in the left column shows a curve that contains a group of adjacent frequencies, as shown in the right column. No panel shows a pure single-frequency curve, and there is significant overlap between the groups. And as a result of each panel containing a mix of frequencies and amplitudes, each curve varies in both amplitude and frequency over time. This can be seen in the breadth of the spectral density plots on the right.

For the next obvious step, I used their data and variables, and I repeated their analysis.

Amazon EEMD analysis mine

Figure 5. My EEMD analysis of the Amazon river flow. Like the paper, I used an ensemble number of 2000, a noise amplitude of 0.6 standard deviations of the original signal, and 50 sifting iterations maximum.

I note that while my results are quite similar to theirs, they are not identical. The intrinsic modes C1 and C2 are apparently identical, but they begin to diverge starting with C3. The difference may be due to pre-processing which they have not detailed in their methods. However, I tried prefiltering with a Hanning filter, and it’s not that. Alternatively, it may have to do with how they treat the creation of the splines at the endpoints of the data. However, I tried with end conditions of “none”, “wave”, “symmetric”, “periodic”, and “evenodd”. It’s none of those. I then tried an alternative implementation of the EEMD algorithm. The results were quite similar to the first implementation. Finally, I tried the CEEMD (complete ensemble empirical mode decomposition) method, which was nearly identical to my analysis shown above in Figure 5.

I also could not replicate their results regarding the periodograms that they show in their Figure S1 (shown in Figure 4 above), although again I was close. Here are my results:

periodograms of flow modes amazon river

Figure 6. Periodograms of the six flow modes, Amazon River data.

This makes it clear how the modes C1 to C6 each contain a variety of frequencies, and how they overlap with each other. However, I do not see a strong signal in the 9-13 year range in the intrinsic mode C3 as the authors found. Instead, the signals in that range are split between modes C2 and C3.

Now, their claim is that because mode C3 of the intrinsic modes of the Amazon River flow contains a peak at around 11 years (see Figure 4 above), it must be related to the sunspot cycle … while I find this method of decomposing a signal to be quite interesting, I don’t think it can be used in that manner. Instead, what I think is necessary is to compare the actual intrinsic modes of the Amazon flow with the intrinsic modes of the sunspots. This is the method used in EEMD2005. Here are the modes C3 of the Amazon flow and of the sunspots:

eemd analysis amazon flow sunspot C3

Figure 7. Raw data and intrinsic empirical mode C3 for the Amazon (top two panels) and for the sunspots (bottom two panels).

Now, it is true that intrinsic modes C3 of both the sunspot and the Amazon data contain a signal at around the general sunspot frequency. But other than that, the two C3 modes are quite dissimilar. Note for example that the sunspot mode C3 is phase-locked to the raw data. And in addition, the sunspot C3 amplitude is related to the amplitude of the raw sunspot data.

But to the contrary, the Amazon mode C3 goes into and out of sync with the sunspots. And in addition, the amplitude of the Amazon mode C3 has nothing to do with the amplitude of either the sunspot data or the sunspot C3 mode.

This method, of directly comparing the relevant intrinsic modes, is the method used in the original EEMD2005 paper linked to above. See for example their Figure 9 showing the synchronicity of the intrinsic modes C3 – C7 and higher of the Southern Oscillation Index (SOI) and the El Nino Cold Tongue Index (CTI).

I find this to be a fascinating way to decompose a signal. It is even more interesting when all of the intrinsic modes are plotted to the same scale. Here are the sunspot intrinsic modes to the same scale.

Sunspot EEMD analysis true size mine

Figure 8. EEMD analysis of the annual mean sunspot numbers. All panels are printed to the same scale.

Note that the overwhelming majority of the information is in the first three intrinsic modes. Beyond that, they are nearly flat. This is borne out by showing the periodograms to the same scale:

periodograms intrinsic modes sunspots

Figure 9. Periodograms of the EEMD analysis of the annual mean sunspot numbers. All panels are printed to the same (arbitrary) scale.

Now, this shows something fascinating. The EEMD analysis of the sunspots has two very closely related intrinsic modes. Mode C2 shows a peak at ten or eleven years, plus some small strength at shorter periods. Mode C3 shows a smaller peak at the same location, ten or eleven years, and an even smaller peak at sixteen years. This is interesting because not all of the strength of the ~ eleven-year sunspot signal falls into one intrinsic mode. Instead it is spread out between mode C2 and mode C3.

DISCUSSION: First, let me say that I would never have guessed that white noise could function as a bank of bandpass filters that automatically group related components of a signal into a small number of intrinsic modes. To me that is a mathematically elegant discovery, and one I’ll have to think about. Unintuitive as it may seem, noise aided data analysis is indeed a reality.

This method of signal decomposition has some big advantages. One is that the signal divides into intrinsic modes, which group together similar underlying wave forms. Another is that as the name suggests, the division is empirical in that it is decided by the data itself, without requiring the investigator to make subjective judgements.

What is most interesting to me is the showing by the authors of EEMD2005 that EEMD can be used to solidly establish a connection between two phenomena such as the Southern Oscillation Index (SOI) and the El Nino Cold Tongue Index (CTI). For example, the authors note:

The high correlations on interannual and short interdecadal timescales between IMFs [intrinsic mode functions] of SOI and CTI, especially in the latter half of the record, are consistent with the physical explanations provided by recent studies. These IMFs are statistically significant at 95% confidence level based on a testing method proposed in Wu and Huang (2004, 2005) against the white noise null hypothesis. The two inter-annual modes (C4 and C5) are also statistically significant at 95% confidence level against the traditional red noise null hypothesis.

Indeed, Jin et al. (personal communications, their manuscript being under preparation) has solved a nonlinear coupled atmosphere-ocean system and showed analytically that the interannual variability of ENSO has two separate modes with periods in agreement with the results obtained here. Concerning the coupled short interdecadal modes, they are also in good agreement with a recent modeling study by Yeh and Kirtman (2004), which demonstrated that such modes can be a result of a coupled system in response to stochastic forcing. Therefore, the EEMD method does provide a more accurate tool to isolate signals with specific time scales in observational data produced by different underlying physics. SOURCE:EEMD2005 p. 20

Now of course, the question we are all left with at the end of the day is, to what extent do these empirical intrinsic modes actually represent physical reality, and to what extent are they merely a way to mathematically confirm or falsify the connections between two datasets at a variety of timescales? I fear I have no general answer to that question.

Finally, contrary to the authors of the paper, I would hold that the great disparity between all of the intrinsic modes of the Amazon flow data and of the sunspot data, especially mode C3 (Fig. 7), strongly suggests that there is no significant relationship between them.

Always more to learn … I have to think about this noise assisted data analysis lark some more …


My Usual Request: If you disagree with me or anyone, please quote the exact words you disagree with. I can defend my own words. I cannot defend someone’s interpretation of my words.

My New Request: If you think that e.g. I’m using the wrong method on the wrong dataset, please educate me and others by demonstrating the proper use of the right method on the right dataset. Simply claiming I’m wrong doesn’t advance the discussion.

Data: Available as an Excel workbook from the original article.

Code: Well, it’s the usual ugly mish-mash of user-aggressive code, but it’s here …  I used two EEMD implementations, from the packages “hht” and “Rlibeemd”. If you have questions about the code, ask …

December 10, 2015 8:09 pm

Off the cuff and without working any examples, maybe the addition of white noise seems to assist problems of the type of subtraction of 2 large numbers whose very small difference is the sought effect. Add noise again and again, then each subtraction of 2 noisier numbers than the originals gives a range of noisy difference numbers, which is given a statistical analysis purporting to show confidence estimates and a best estimate of mean or median or whatever. Sorry, this is an abstract comment and one should not be too much led by the ‘feel’ of data and rushing it into print.
It is like satisfaction of entropy conditions. No amount of added white noise should modify an outcome. Surely the best outcome is there in the raw data, insensitive to how much noise is added.

Reply to  Willis Eschenbach
December 11, 2015 1:20 am

Photographers used to use an unsharp mask to sharpen images and to restore movies. Old Westerns were probably restored this way.
I found a similar technique online years ago that I use to bring up murky images and to process faded text using Photoshop.
I convert the image to RGB color or to greyscale and create a duplicate image. I then add gaussian blur (noise) to the original. Next I modify the duplicate by “hard light” or “vivid light” and then filter using Hi-pass. Finally I adjust the opacity of the duplicate layer and merge the two layers. The result is a non-destructive sharpening superior to an unsharp mask.
After using this technique I have been able to use OCR on documents that seemed to require retyping.
The key to sharpening is the gaussian blur which first makes the image unsharp, an approach that seems to me to have something in common with EEMD.
If after submerging your data in noise, you remove the noise, what you will be left with is the signal. And if there are two signals both should be recoverable. The noise in the original data will be washed away along with the noise you added.

Reply to  Willis Eschenbach
December 11, 2015 3:02 am

Addition of noise helps when you are applying a non linear – an extreme non linear – function to a sample set.
For example when digitising sound, noise is added so that sub-bit data doesn’t vanish, but becomes statistically represented by an average over time.
That is, if you have a meter that consists of, say, a red light when the voltage is over 1V and a green light when it’s less, then without noise a steady value will result in a steady lamp. Adding random noise with a peak of ±1V or so and sampling over time will result in some red lights and some green, and the ratio of the two will give the actual voltage.
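The red-light/green-light meter is simple enough to simulate. The sketch below is only an illustration of the idea in the comment: the function names are made up, and it assumes the added noise is uniform between -1V and +1V, in which case the fraction of red readings maps linearly onto the input voltage.

```python
import random

def red_fraction(voltage, n_samples=100_000, noise_peak=1.0, threshold=1.0, seed=0):
    """Fraction of samples for which (voltage + noise) exceeds the threshold."""
    rng = random.Random(seed)
    red = sum(voltage + rng.uniform(-noise_peak, noise_peak) > threshold
              for _ in range(n_samples))
    return red / n_samples

def estimate_voltage(fraction_red, noise_peak=1.0, threshold=1.0):
    """Invert P(red) = (voltage + noise_peak - threshold) / (2 * noise_peak).

    Valid only under the uniform-noise assumption stated above.
    """
    return threshold - noise_peak + 2 * noise_peak * fraction_red

true_voltage = 0.7
without_noise = red_fraction(true_voltage, noise_peak=0.0)  # the lamp never turns red
with_noise = red_fraction(true_voltage)
print(without_noise, round(estimate_voltage(with_noise), 2))
```

Without noise the one-bit meter only says "below 1V"; with noise and averaging, the same meter resolves the voltage to a couple of decimal places, which is the sense in which noise helps a strongly nonlinear measurement.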
Apart from that adding random noise should not actually make any difference at all.
This leads me to suspect that in this case they are applying some non linear function. And that I am afraid puts the whole of the approach in a suspect place.
All too much research these days, and indeed the whole AGW thing is an extreme case of what we, as apprentices, used to call ‘BBB’.
Bullshit Baffles Brains
The technique of constructing complex descriptions and explanations that no one understood, but which sounded plausible if well presented, and could be guaranteed to fool all the wannabe clever clogs and know-it-alls. Who were placed in the position of either having to admit they didn’t understand a word of it, or agreeing with it.
This has all the hallmarks of a superb piece of BBB.

Reply to  Willis Eschenbach
December 11, 2015 11:03 am

people should remember this when they make blanket statements about the priority of raw data

Reply to  Willis Eschenbach
December 11, 2015 11:10 am

Frederick Colbourne: Photographers used to use an unsharp mask to sharpen images and to restore movies. Old Westerns were probably restored this way.
I only quote the opening lines to identify which post I am responding to. That was a most interesting post that you wrote. Thank you.

Reply to  Willis Eschenbach
December 11, 2015 6:31 pm

Geoff Sherrington speculated: Off the cuff and without working any examples, maybe the addition of white noise seems to assist problems of the type of subtraction of 2 large numbers whose very small difference is the sought effect.
I have subtracted one two-dimensional matrix from another of the same dimensions: satellite images of the same scene, each of which had several hundred thousand data points.
Taking the difference between only two numbers can create serious problems: such as when the two numbers represent incoming and outgoing radiation at the top of the atmosphere and the difference is the net radiative imbalance of the Earth.
I demonstrate this below with reference to a series of papers by NASA scientists and their colleagues in other government agencies and the private sector.
My understanding of Stephens et al. The authors pointed out that the net energy balance is the difference between incoming and outgoing radiation, two numbers around 340 Wm-2. The authors pointed out that the energy imbalance from ocean heat content (OHC) was only 0.6 Wm-2. Since both numbers (incoming and outgoing) have errors, the figure 0.6 divided by 680 is the relevant statistic. Radiation at the top of the atmosphere has to be measured with a precision and accuracy of about 0.1 of 1%, about one part in a thousand, something that these NASA scientists and their colleagues state is not possible with present technology.
“The net energy balance is the sum of individual fluxes. The current uncertainty in this net surface energy balance is large, and amounts to approximately 17 Wm-2. This uncertainty is an order of magnitude larger than the changes to the net surface fluxes associated with increasing greenhouse gases in the atmosphere (Fig. 2b). The uncertainty is also approximately an order of magnitude larger than the current estimates of the net surface energy imbalance of 0.6 ±0.4 Wm-2 inferred from the rise in OHC. The uncertainty in the TOA net energy fluxes, although smaller, is also much larger than the imbalance inferred from OHC.”
Stephens, Graeme L., et al. “An update on Earth’s energy balance in light of the latest global observations.” Nature Geoscience 5.10 (2012): 691-696.
My understanding of Loeb et al: The authors worked to correct errors in the standard values for: TSI; the ratio of Earth’s surface area to the disc presented to the Sun (taking into account the flattening at the poles and the fuzziness at the terminator); albedo; etc.
My estimates of precision and accuracy effects: if incoming energy is 340 Wm-2, then with albedo 0.300 the net incoming energy would be 238 Wm-2, whereas with albedo 0.305 the net incoming would be 236.3 Wm-2, a difference of 1.7 Wm-2, about 3 times the rate estimated from OHC. An error this great (1.7%) could signal a bigger difference in energy imbalance than that between 1650 and 1950 (the Maunder Minimum and the date at which CO2 began its steep increase). NASA scientists and their colleagues have said that the accuracy and precision of their satellite data (from CERES) is insufficient to support firm statements about whether the Earth is warming or cooling.
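The arithmetic in this comment can be checked in a few lines. The snippet only reuses the numbers quoted in the comment (340 Wm-2 incoming, albedo 0.300 versus 0.305, and the 0.6 Wm-2 imbalance inferred from OHC); the variable names are made up for the example.

```python
incoming = 340.0       # W/m^2, incoming radiation as quoted in the comment
ohc_imbalance = 0.6    # W/m^2, imbalance inferred from ocean heat content

absorbed_low_albedo = incoming * (1 - 0.300)
absorbed_high_albedo = incoming * (1 - 0.305)
difference = absorbed_low_albedo - absorbed_high_albedo

print(round(absorbed_low_albedo, 1), round(absorbed_high_albedo, 1), round(difference, 1))
# 238.0 236.3 1.7
print(round(difference / ohc_imbalance, 1))      # 2.8, i.e. "about 3 times"
print(f"{ohc_imbalance / (2 * incoming):.2%}")   # 0.09%, i.e. about one part in a thousand
```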
Loeb et al (2012) summarized the combined effect of all errors found. When estimates of solar irradiance, SW and LW TOA fluxes are combined, taking account of +0.85+/-0.15 Wm-2 heat storage by the oceans (Hansen’s 2005 estimate), the possible range of TOA flux becomes minus 2.1 to plus 6.7 Wm-2. Based on well-established physical theory, the instruments tell us that net radiative flux is either positive or negative. The Earth is either radiating more energy than it receives or less energy than it receives.
Loeb, Norman G., et al. “Observed changes in top-of-the-atmosphere radiation and upper-ocean heating consistent within uncertainty.” Nature Geoscience 5.2 (2012): 110-113.
Updated from this earlier paper: Loeb et al, Toward Optimal Closure of the Earth’s Top-of-Atmosphere Radiation Budget, Journal of Climate, 2009.
My understanding of work by James Hansen et al: James Hansen’s reported estimate of energy imbalance from OHC 0.85 Wm-2 in 2005 declined to 0.58 Wm-2 in 2011. This figure was refined to 0.5 Wm-2 by Loeb et al in 2012.
“Earth has been steadily accumulating energy at a rate of 0.50 ± 0.43 Wm-2 (uncertainties at the 90% confidence level). We conclude that energy storage is continuing to increase in the sub-surface ocean.”
The problem here is that, if these numbers mean anything at all, they seem to indicate that the estimates of energy imbalance have been decreasing from 2005 to 2013, from 0.85 to 0.58 to 0.50 Wm-2.
But looking at the error bars and reading the reservations concerning the uncertainties, I do not believe the numbers tell us much for certain and I believe that the authors are trying to tell us this the best way they know how without being subject to ostracism.
I think we owe these people a debt of gratitude for being as candid as they could be given their personal circumstances.
(So much for taking the difference between two big numbers, something our professors used to warn us about in the days when we still used slide rules.)

Reply to  Willis Eschenbach
December 13, 2015 6:55 pm

Frederick Colbourne wrote: “I found a similar technique online years ago that I use to bring up murky images and to process faded text using Photoshop”
What a great example, and comparing it with the technique of adding noise to noisy data before “sharpening” it makes lots of sense. It gave me a way to visualize the process I’d never had before.
Linear filters assume random noise. If you have patterned noise in the signal they won’t work very well. Adding a gaussian blur makes the noise more random (normalizes it) around the mean without changing the mean. Taking the least squares fit (or some equivalent linear filter) after randomizing the noise should result in a much sharper signal. It would be a great improvement when you were dealing with a signal that contains patterned, non-random noise. Very elegant!
Maybe that’s not exactly how it works, but it certainly makes “visual” sense to me and I’ve also had fair experience both filtering noise from data and from photos. Thanks for describing that method.

Mark T
Reply to  Geoff Sherrington
December 11, 2015 1:51 am

Adding noise is not uncommon at all, particularly to avoid singularity issues with matrix inverse problems (even if the inverse is shortcut somehow). Decompositions such as these often involve matrix inversions.
Admittedly, I have not read your exposition in detail – I’m in a bar making a detailed analysis difficult, but I wanted to point this out.

george e. smith
Reply to  Geoff Sherrington
December 11, 2015 9:15 am

And in this case half of the raw data is just totally made up out of whole cloth.
The Amazon river system is an enormous assemblage of rivers and streams and creeks, and who knows what else, and it is a living organism. Everything moves around or is moved around by local rains, and flash floods and human activities, and the idea that the water flow rate at the mouth, into the Atlantic ocean can be estimated by measuring a local water level in a single spot, is just asinine. Yes they measured two spots; whoopee !
Do these EEMD geniuses have a mathematics background that might extend to the general theory of sampled data systems? Would they perhaps have any knowledge of the Nyquist criterion, and what it might have to say about the validity of vastly under-sampled “data” samples of a continuous function (and probably a continuous function of multiple variables)?
But I’m with you Willis, I was not impressed with the obviousness of their analysis discovery of a previously unknown effect. It stood out like a black cat in a cellar at midnight.

Gloateus Maximus
Reply to  george e. smith
December 11, 2015 10:50 am

IMO finding a statistically significant correlation at even one spot would be meaningful, if it were on the main stem, ie downstream from either Iquitos, Peru or Manaus, Brazil, depending upon whose definition of main stem you adopt, Brazil’s or everyone else’s.

george e. smith
Reply to  Geoff Sherrington
December 11, 2015 10:30 am

Well your example of subtracting two large numbers to obtain the value of their small difference, is about what Lord Rutherford was talking about, when he famously said: “If you have to use statistics, you should have done a better experiment.” which in your case would be to alter the experiment so as to directly observe the difference quantity. There can also be a mathematical solution to remove the problem completely.
For example, If I have a sphere of radius (R), and I cut it with a plane giving an intercept circle of radius (r), I can calculate the “sag” of the spherical cap using Pythagoras, giving: s = R – sqrt(R^2 – r^2)
Now if I want to know the sag of a one meter radius cap cut from a one km radius sphere, you can see I have your small difference problem.
We encounter exactly this problem in optics with the sag of lens or mirror surfaces, when R >>> r, and we end up at a dead end if the sphere is actually a plane so R = infinity. How do I enter infinity as the sphere radius value in my computer, or what if my computer wanted to pick an infinite-radius planar surface itself?
So we don’t use the formula above at all.
I can rewrite that equation as:
s = [ (R - sqrt(R^2 - r^2)).(R + sqrt(R^2 - r^2)) ] / [ R + sqrt(R^2 - r^2) ]
= (R^2 - (R^2 - r^2)) / (R + sqrt(R^2 - r^2)) = r^2 / (R + sqrt(R^2 - r^2))
So I have exchanged the small difference of two large numbers for the sum of those same two numbers.
Then I can multiply top and bottom by C = 1/R and get.
s = Cr^2 / (1+sqrt(1-C^2.r^2))
So I now have an exact equation for the sag no matter how large the sphere radius (R) and I can enter C = 0 to take care of that pesky case of infinite radius.
As a practical matter, we use curvatures of optical surfaces rather than radii, because the optical power varies directly with the curvature, and not the radius.
That final sag equation can be further modified with the addition of a parameter K (kappa), called the conic constant, to give the sag for a conic section such as ellipse, parabola or hyperbola. The value of K is simply – (e^2) where (e) is the eccentricity of the conic section.
s = Cr^2 / (1+sqrt(1-(1+K)C^2.r^2))
Hyperbola K < -1; Parabola K = -1; Ellipse -1 < K < 0
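The difference between the two sag formulas shows up directly in floating point. The sketch below is an illustration with made-up function names, and it uses a deliberately extreme radius (R = 1e8 with r = 1, a choice made for the example and not taken from the comment) so that the cancellation is total in double precision. The rewritten form also accepts C = 0 for a flat surface and the conic constant K.

```python
import math

def sag_naive(R, r):
    """Sag as a small difference of two large numbers: s = R - sqrt(R^2 - r^2)."""
    return R - math.sqrt(R * R - r * r)

def sag_stable(C, r, K=0.0):
    """Sag from curvature C = 1/R: s = C r^2 / (1 + sqrt(1 - (1 + K) C^2 r^2))."""
    return C * r * r / (1 + math.sqrt(1 - (1 + K) * C * C * r * r))

# Ordinary case: both forms agree (R = 10, r = 6 gives a sag of 2).
print(sag_naive(10.0, 6.0), sag_stable(0.1, 6.0))

# Extreme case: the true sag is close to r^2 / (2R) = 5e-9.
R, r = 1e8, 1.0
print(sag_naive(R, r), sag_stable(1 / R, r))

# Flat surface: curvature zero, no infinity needed.
print(sag_stable(0.0, r))
```

In the extreme case the naive form loses essentially all of its significant digits, while the curvature form stays accurate because it never subtracts two nearly equal numbers.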
You can’t calculate Amazon river flow or global Temperature from this equation.
I’m not in favor of adding noise to anything, and then believing that some signal I might ” recover ” is the actual signal that might have been there, without the added noise.
Some ” noisy ” signals, actually contain amplitude dependent noise, so the noise content is a non linear function of the real signal amplitude. Adding white noise is going to corrupt that relationship, and give a systematic error in any recovered “signal”.

Reply to  george e. smith
December 13, 2015 7:12 pm

George Smith wrote: “I’m not in favor of adding noise to anything, and then believing that some signal I might ” recover ” is the actual signal that might have been there, without the added noise.”
I would argue that, in situations that you can be reasonably certain the noise *should* be random, or when the noise filter you’re using assumes random noise, adding noise to it to force it to be random will result in improved signal recovery, as exemplified by Frederick’s Photoshop example. That only makes sense when you have good reason to believe, by way of experimental evidence, that the signal you’re extracting experiences random noise. In that case what you’re really doing is normalizing systematic non-random noise that’s produced by your instruments or collection methods to make it compatible with the assumptions of your filter.

Reply to  george e. smith
December 13, 2015 7:20 pm

Sorry for the redundant “exemplified by … example.” Should have been “demonstrated by … example”.
Also, it’s important to remember that whenever we take the mean of a time series we’re removing noise to extract the real signal. The assumption built into that moment (the mean) is that noise is normally distributed. If that isn’t the case, using the mean to represent the signal is statistically invalid to begin with. Normalizing the noise is a concession made to your instruments. When you have some reason to believe the noise isn’t fundamentally normal, you probably shouldn’t be using the mean (least squares regression, etc.) in the first place.

December 10, 2015 8:19 pm

“Simply claiming I’m wrong doesn’t advance the discussion.”
No but then sometimes a discussion isn’t warranted. A dismissive comment goes a long way towards helping the writer understand his work is not worth much.
In this case, I did kinda like this post. I like the fact that a certain amount of humility was present and that you didn’t try to one up the authors nor especially become critical when you were not able to understand what they were doing.
Progress. We are making progress.

Reply to  Willis Eschenbach
December 11, 2015 2:16 am

“why on earth should anyone pay the slightest attention to you?”
Usually because Willis is wrong. Even you had to admit that you fubared the units on something as simple as the evaporation of water.

Alan Robertson
Reply to  Dinostratus
December 11, 2015 3:38 am

One thing is certain, Dinostratus. All you’ve done is play the man and not the ball. Get off the field, you’re holding up the game.

Reply to  Dinostratus
December 11, 2015 4:20 am

…Strrrrrrrrrike three, you’re out !!!

Reply to  Dinostratus
December 11, 2015 9:39 am

You’ve never demonstrated any error in Willis’s writing.
As with most CAGW trolls, insult and significant hubris define your actions.
Dino: Show your work! Ya got evidence that any of Willis’s final posts are erroneous? Out with it!

Reply to  Dinostratus
December 11, 2015 9:39 am

+1 Marcus!

December 10, 2015 9:09 pm

Very non-intuitive. It seems akin to pouring sand into a telescope to see Mars better. Does this addition of white noise dilute the autocorrelation of the signal of interest? Or is it some other form of juju? Is a puzzlement…

December 10, 2015 9:28 pm

Willis –
This addition of noise sounds crazy of course. BUT – it does remind me of the so-called “stochastic resonance” apparently used by some biological creatures (crayfish?) and is related to the classic engineering use of “dither” (to achieve a resolution of data BELOW the least significant bit), typically in digital audio. Possibly relevant. Being so counter-intuitive such measures constitute extraordinary claims and require an extraordinary degree of explanation.
But interesting. Thanks.
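For anyone who wants to see the dither effect for themselves, here is a minimal sketch. It assumes an idealized quantizer with a step of 1 and a constant input of 0.3, both made up for illustration, and it is not anything taken from the paper. Without dither every sample rounds to the same level; with uniform noise spanning one step, the average of many quantized samples settles near the true sub-step value.

```python
import numpy as np

def quantize(x, step=1.0):
    """Round to the nearest multiple of `step` (an idealized ADC)."""
    return step * np.round(x / step)

rng = np.random.default_rng(0)
true_value = 0.3          # constant input, well below one quantization step
n_samples = 10_000

# Without dither every sample rounds to the same level,
# so averaging gains nothing.
plain_mean = quantize(np.full(n_samples, true_value)).mean()

# With dither (uniform noise spanning one step) the rounding error is
# randomized, and the average of the quantized samples approaches 0.3.
dither = rng.uniform(-0.5, 0.5, n_samples)
dithered_mean = quantize(true_value + dither).mean()

print(plain_mean, dithered_mean)
```

The catch, as others note in this thread, is that this only works because there are many samples of the same underlying value to average over.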

Keith Minto
Reply to  Bernie Hutchins
December 10, 2015 10:23 pm

Yep, adding white noise to an audio signal randomises error in digital processing making it more uniform and less distracting. Good (audio) explanation (5 minutes) and, more broadly, wiki
I understand that the human ear does something similar to low level signals, wiki says “The human ear functions much like a Fourier transform, wherein it hears individual frequencies.[8] The ear is therefore very sensitive to distortion, or additional frequency content that “colors” the sound differently, but far less sensitive to random noise at all frequencies.” but I am not clear about this, sounds like a fascinating topic.

Reply to  Keith Minto
December 10, 2015 11:11 pm

Keith –
thanks for the great reference to Nigel’s superb video on dither. I remember him from many years back, but never saw that.
I think that the Wu/Hwang reference suggests that their method really is stochastic resonance but with repetition and averaging out of noise (more or less straightforward). Isn’t this what the ear does with dithering of consecutive cycles and the formation (somehow) of an overall audio impression?

Reply to  Keith Minto
December 10, 2015 11:20 pm

Willis –
I think it really is stochastic resonance. I don’t see it as a “filter bank” in any common usage. On the other hand, I have always disliked the term resonance as it is used in “stochastic resonance” – no filtering there either.
Looks worth exploring with Matlab toys.

Joe Born
Reply to  Keith Minto
December 11, 2015 2:28 am

Thanks for the links.
The video was particularly interesting to me because digital silk-screening had occurred to me as an analogy (or, depending on how you look at it, an example) when you guys mentioned dither, and, voila! that’s what the video used.

Stemboat McGoo
Reply to  Bernie Hutchins
December 10, 2015 10:45 pm

Exactly, Bernie. Stochastic resonance occurred to me immediately when I saw the article title.

Joe Born
Reply to  Bernie Hutchins
December 11, 2015 7:21 am

Bernie Hutchins:
“Stochastic resonance” falls outside my experience set, but to me it appears that you and Mr. Eschenbach may be agreeing in substance but not in nomenclature. Here’s what I inferred from the paper:
In basic empirical mode decomposition (“EMD”), each of the so-called intrinsic mode functions (“IMFs”) into which the signal is decomposed can be thought of as a local-AC component of the previous IMF, leaving a local-DC residue. How “local” the local DC is can vary along the input record; since the “DC” signal is made of splines through the maxima and minima, the degree to which the “DC” is local depends on how far apart in time the local minima’s and maxima’s occurrences are. So removal of a given IMF may remove only the very high-frequency components from one portion of the record but remove even fairly low-frequency components from another.
Nonetheless, each successive IMF removes more AC until there’s only a single global maximum and a single global minimum left in the residue, “DC” signal.
That’s the basic EMD method. Ensemble empirical mode decomposition (“EEMD”) adds white noise, thereby making the time differences between maxima and minima relatively uniform and thus making successive IMFs’ frequency ranges overlap less than would otherwise be the case.
So, when the authors talk about a “dyadic filter bank,” they’re talking about a behavior that can be thought of as being approached very roughly by basic EMD and more closely by EEMD.
How well that matches one’s concept of filter is in the eye of the beholder, I suppose.
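To make the “local AC / local DC” picture concrete, here is a rough sketch of a single sifting pass, under my own assumptions: cubic splines through the maxima and minima, envelopes crudely pinned to the data at both ends, and a synthetic two-tone test signal. The function name `sift_once` is mine. Real EMD repeats this pass until a stopping criterion is met and treats the ends more carefully, so take this as an illustration of the idea, not the authors’ code.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    # Pin both envelopes to the data at the record ends (crude end treatment).
    maxima = np.concatenate(([0], maxima, [len(x) - 1]))
    minima = np.concatenate(([0], minima, [len(x) - 1]))
    upper = CubicSpline(maxima, x[maxima])(t)
    lower = CubicSpline(minima, x[minima])(t)
    local_dc = (upper + lower) / 2.0      # the "local DC" residue
    return x - local_dc, local_dc         # ("local AC" candidate, residue)

# Synthetic test: a fast oscillation riding on a slow one.
t = np.arange(500) / 500.0
slow = 2.0 * np.sin(2 * np.pi * 2 * t)
fast = np.sin(2 * np.pi * 40 * t)
x = fast + slow

ac, dc = sift_once(x)
```

Away from the ends, `dc` tracks the slow component and `ac` tracks the fast one, which is the sense in which each IMF peels off the locally fastest oscillation.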

December 10, 2015 9:45 pm

Many results from Information Theory sound counter intuitive.

Carl Chapman
December 10, 2015 10:28 pm

They look for an 11.5 year cycle, but the solar cycle varies in length. Would they get more useful results by calculating the years of the solar maxima, then calculating for each year what % of the way through a solar cycle that year was, then seeing if there is a correlation between river flow and % of the way through the solar cycle? From looking at the graphs, I can’t see any obvious correlation, but maybe for variables other than Amazon flow, that would be a more useful way of looking for a correlation with the solar cycle.
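The “% of the way through the cycle” idea could be sketched roughly as follows. The function name `cycle_phase` is hypothetical, and the maxima years in the example are made-up round numbers, not real solar maxima; years that fall outside the first and last listed maxima are left undefined.

```python
import numpy as np

def cycle_phase(years, maxima_years):
    """Fraction (0 to 1) of the way from the previous maximum to the next."""
    years = np.asarray(years, dtype=float)
    maxima = np.sort(np.asarray(maxima_years, dtype=float))
    # Index of the most recent maximum at or before each year.
    idx = np.searchsorted(maxima, years, side="right") - 1
    phase = np.full(years.shape, np.nan)
    # A phase is only defined when there is a maximum on both sides.
    ok = (idx >= 0) & (idx < len(maxima) - 1)
    start = maxima[idx[ok]]
    end = maxima[idx[ok] + 1]
    phase[ok] = (years[ok] - start) / (end - start)
    return phase

# Made-up maxima years, for illustration only.
example_maxima = [1900, 1910, 1922]
phase = cycle_phase([1905, 1916, 1899], example_maxima)
print(phase)   # halfway through a 10-year and a 12-year cycle, then undefined
```

One would then bin or regress the flow against `phase` rather than against calendar time, so that cycles of different lengths line up.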

Lewis P Buckingham
December 10, 2015 10:38 pm

Is there a possible response from Dr Curry?
Would the more passes made through the noise added filter actually add error to the original data due to floating point calculations?

Reply to  Lewis P Buckingham
December 11, 2015 1:17 am

I second the nomination. If you cannot duplicate her results, I think you at least owe her the courtesy of an inquiry.

Reply to  wxobserver
December 11, 2015 2:24 am

Dr Curry?

Reply to  wxobserver
December 11, 2015 4:12 pm

Ooops…fingers too fast on the keys. What I meant to say is that it would be a good idea to check with the paper’s authors if you cannot duplicate their results. Perhaps they don’t respond, but at least you can say you tried.

Reply to  wxobserver
December 11, 2015 6:43 pm

You got it. I just sent an e-mail inquiry off to the address listed for the first author (Antico). We’ll see what if anything happens.

Reply to  wxobserver
December 14, 2015 4:48 pm

I just got a reply from one of the authors, Andres Antico. He sent me a link to some Matlab code and I was able to easily duplicate his results. The link to that code is below. You will need to edit the eemd.m file and change “iter<=10" to "iter<=50" on line 95.
The author pointed out that you need to fix the sifting iteration count at 50 in order to duplicate his result.
Load the annual mean flow data into a variable (e.g. "flow") and then run this:
It worked like a charm for me.
Finally, the author seems like a very personable fellow — I'm sure you could have a fruitful exchange with him if you were to reach out.

Reply to  wxobserver
December 15, 2015 12:31 pm

I was going to guess you didn’t have Matlab. I tried to tweak the code to run in FreeMat but it was using a lot of intrinsic Matlab functions not available in FreeMat; I gave up.
I used the same data you did — from the Excel spreadsheet in the supplemental data, the second page with annual mean flows. It will take me a few hours but I’ll re-run the analysis and put up some plots and try to duplicate the periodogram — if I can’t I’m sure the author will offer some tips.
P.S. The author, Antico stated in his e-mail that one reviewer was able to duplicate his results so they seem to have dotted i’s and crossed t’s there.

Reply to  wxobserver
December 15, 2015 5:04 pm

Okay, I have managed to duplicate the periodograms also. Not exactly but very close, close enough that I don’t think it’s worth contacting the author for his method. You can see plots of the duplicated IMF modes and periodograms here:
There’s one file with a logarithmic X-axis in frequency (cycles per year) and another with a linear X-axis in period (years/cycle).
Here’s how I did the periodograms. First there’s not enough data points to get much frequency resolution, so I zero-extended the data out to 1024 points, then took the magnitude-squared of the DFT with appropriate amplitude scaling. The peak on C3 is near 10.7 year period (closer to 10.8 in my result but that’s close enough I think).
I also tried applying a window to the data before zero-extending…results very similar for C3 at least.
If you don’t zero-extend the data, there’s still a peak there but the low resolution means the peak is at about 11 years instead of 10.7

Reply to  wxobserver
December 15, 2015 9:00 pm

wxobserver wrote December 15, 2015 at 5:04 pm in part:
“… Here’s how I did the periodograms. First there’s not enough data points to get much frequency resolution, so I zero-extended the data out to 1024 points, …”
Zero padding does NOT increase resolution. It interpolates the spectrum smoothly, but does not make it possible to see any new frequency points. Basically the “uncertainty principle”. The zeros do NOT give any NEW information. This has been known a very long time, but is still widely misunderstood. For a full recent explanation, please see:
See in particular starting at page 23.

Reply to  wxobserver
December 15, 2015 11:17 pm

Thanks, and I’m already 100% aware of that (my professional background includes DSP). I did not take the time to explain that just cuz I’m being lazy. One of the things I like about this blog is that it is populated by a lot of very sharp folks.
I am trying to duplicate the author’s work and he claims there’s a peak in the spectrum around 10.7 years. A straight DFT of the 111-point C3 signal does not (cannot) give that answer. With a sampling interval of 1 year and 111 points it is impossible for the DFT of C3 to have a peak at 10.7 years. The nearest candidates are 11.1 and 10.09 years. I figured the author probably interpolated the spectrum, so I did the same. The results I got are so close to the author’s that I did not feel the effort to contact him about the periodogram algorithm he used was warranted.
Let me know if I missed something else there, and I can still write the author if you’re not satisfied with my attempt to duplicate his results.
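The bin arithmetic above is easy to check. The sketch below uses a pure 10.7-year sinusoid as a stand-in for C3 (it is not the Amazon data), and `peak_period` is just a helper name for this illustration. The raw 111-point DFT can only report periods of 111/k years, while zero-padding to 1024 points interpolates the same spectrum onto a finer grid without adding any information.

```python
import numpy as np

n, dt = 111, 1.0                      # 111 annual samples
k = np.arange(1, n // 2 + 1)          # DFT bin indices, skipping the mean (k = 0)
periods = n * dt / k                  # the only periods a raw 111-point DFT reports
print(periods[9:11])                  # bins k = 10 and k = 11: 11.1 and about 10.09 years

# Synthetic stand-in for C3: a pure 10.7-year sinusoid.
t = np.arange(n) * dt
x = np.sin(2 * np.pi * t / 10.7)

def peak_period(x, nfft):
    """Period (years) of the largest periodogram peak, zero-padded to nfft."""
    spec = np.abs(np.fft.rfft(x, n=nfft)) ** 2
    freqs = np.fft.rfftfreq(nfft, d=dt)
    i = np.argmax(spec[1:]) + 1       # ignore the zero-frequency bin
    return 1.0 / freqs[i]

raw_peak = peak_period(x, n)          # lands on one of the coarse bins
padded_peak = peak_period(x, 1024)    # interpolated spectrum, finer grid
print(raw_peak, padded_peak)
```

The padded peak sits close to 10.7 years only because the finer grid has a point near there; the width of the peak, which is what sets the ability to separate nearby cycles, is unchanged.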

Reply to  wxobserver
December 16, 2015 3:31 am

I can’t believe I’m getting sucked into this but I am. I decided to plot sunspot number versus C3, like Willis did except I inverted C3 and scaled the data so it could be overlaid on the same plot. You can see it at the same place linked above:
I think I partially agree with Willis’ conclusion that there’s no solar signal. Certainly not at the level of being a predictor of river flows. Not at the level of being useful in itself. But still, I think there is an unmistakeable correlation there too. Take a look at the graph linked above, I’m curious what others think.

Reply to  wxobserver
December 16, 2015 10:43 am

wxobserver wrote December 15, 2015 at 11:17 pm in part:
“ ….One of the things I like about this blog is that it is populated by a lot of very sharp folks. …..”
Exactly so – many of whom have just enough knowledge of DSP to be “dangerous”!
I concur with your comments. My real concern was probably that many do associate “resolution” with “detection” (as in ordinary speech). Here we were concerned with determining if a 10.7 year cycle was present, or not (detection).
Many suppose that if you do not resolve a frequency component its energy is lost. This would be true if we had a tunable sharp bandpass in use for spectral analysis, and did not ever “park” it over the 1/10.7 frequency. The FFT is a filter-bank and always puts the energy somewhere, usually very close to where it belongs.
Glad you are having fun.

Reply to  wxobserver
December 16, 2015 1:10 pm

Yes. I skipped over a lot of material in my post. Nobody wants to read a 10-page post I suspect. For example, I did not bother to mention that the DFT presumes this sequence is stationary (repeats forever w/o change). Or that it is merely fortuitous that the end-points sort-of match up so that using a rectangular window doesn’t add much in the way of artifacts. Or that using a tool that assumes stationarity (DFT) to analyze a non-stationary process is risky. We’re looking at a short segment of data that has behaviors on the scale of 1000’s of years and trying to draw meaningful conclusions. All very dodgy business. But it’s what we have to work with, so we try…hopefully w/o losing sight of the limitations.

December 10, 2015 10:38 pm

” Unintuitive as it may seem, noise aided data analysis is indeed a reality.” Ugh. Like doping a silicon chip? Weeds as a cover crop? Perhaps one can parse out constructive and destructive interference?

December 10, 2015 11:05 pm

Good work as always, Willis. I read the paper when Judith linked it. I also thought their interpretation of their results was very odd. I downloaded their SI and put it to one side to look at the data later.
At best they found a very weak signal and showed that if there is an 11y solar influence, it is very small. Even having pre-selected a small range of frequencies, the 11y is not particularly striking. It should perhaps be noted that this method has split the power of this cycle between C2 and C3, thus making it look even less important.
However, your reproduction of their working may have some other interesting results.
In your fig 6 there are some notable peaks in each band. This would be worth noting. Just reading off the graph and taking the anti-log to get the period, I estimate the following.
C1: a pair of peaks around 2.4,2.5 years, probably QBO
C2: a very strong peak at 6y
C3: a secondary peak close to 18y
C4: circa 34y
I would not give much importance to the latter on the basis of this short dataset but it is interesting since it is found in much longer, multi-century datasets.
This paper may be more important for what it fails to notice than what it did find. Anyway the method itself is worth knowing about.
Could you post accurate figures for those peaks please. This is interesting.

Reply to  Mike
December 10, 2015 11:29 pm

PS. Where can I find the various R files required by your code linked here. I have several of them in my “Willis” directory but others are missing. I have an SFT directory with several R files which you may have now grouped together.
could you provide a zip of all the ancillary function files in their current form, please.

Reply to  Mike
December 11, 2015 9:46 am

What I found interesting was that, comparing the graphs, most reflections of solar cycle influence are in the reconstructed sections of the data. There is minimal appearance of solar cycle influence in the measured flow levels.

Reply to  ATheoK
December 11, 2015 9:48 am

Bandaid on fingertip, tired, dense, keyboard challenged even after thirty plus years of typing…

Don K
December 10, 2015 11:30 pm

It’s cute, I’ll give it that. But I’d like to see a whole lot more analysis on how adding noise — white or otherwise — reveals valid signals that Fourier analysis doesn’t show. If Fourier analysis somehow doesn’t work, I think that’s worthy of a lot of discussion.
One editorial quibble. I misread “but using annual average data cuts your number of data points by 12.” the first time I read it to mean that “annual averaging cuts a few points off the ends” Clearly not what you intended. Maybe “… by a factor of 12.”
Do keep us posted.

Don K
Reply to  Willis Eschenbach
December 11, 2015 1:31 am

EEMD does NOT show that Fourier analysis doesn’t work. What EEMD is is a different way of decomposing a signal.

Maybe. Off hand, it seems to me that when you ask a question that has a small range of answers — is the sunspot cycle correlated with Amazon River flow ? — you ought to get consistent answers no matter what valid approach you use. It’s OK if some approaches say “maybe” or “insufficient data” while others give a definite answer. It’s not so OK if one says “Yes” and another says “No”.
We very likely know what question Fourier is answering. Maybe EEMD — which I do not remotely understand — is asking a different question? What question?
But what do I know?

December 10, 2015 11:34 pm

Interesting concept… perhaps it is related to the observation that while the randomness/white noise should disappear (or produce no significant periodicity) when analysing a really, really long signal, a segment of white noise/randomness may have very significant, but random, periodicity associated with it. By adding and processing repeated segments of white noise to the original signal, you are in effect reconstructing any actual signal periodicity over a much, much longer data set, thus allowing the white noise/random side of the measurements to completely drop out.
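The averaging half of that argument can be sketched in a few lines. This assumes a synthetic 11-year sinusoid and unit-variance Gaussian noise, both invented for illustration, and it leaves out the decomposition step entirely, so it shows only why the added noise largely cancels across the ensemble, not how EEMD separates modes.

```python
import numpy as np

rng = np.random.default_rng(1)
n_points, n_trials = 111, 400
t = np.arange(n_points)
signal = np.sin(2 * np.pi * t / 11.0)      # stand-in "true" signal

# Each trial is the same signal plus a fresh white-noise realization.
trials = signal + rng.normal(0.0, 1.0, size=(n_trials, n_points))

# Error of one noisy copy: roughly the noise sigma.
single_error = np.std(trials[0] - signal)

# Error of the ensemble mean: roughly sigma / sqrt(n_trials).
ensemble_error = np.std(trials.mean(axis=0) - signal)

print(single_error, ensemble_error)
```

With 400 trials the residual noise is about a twentieth of what it was in any single trial, which is the sense in which the added noise “drops out”.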

Dr. S. Jeevananda Reddy
December 10, 2015 11:38 pm

I analysed northeast Brazil rainfall. A 52-year cycle and its submultiples were present. Long rainfall series are available for Fortaleza. A similar 52-year cycle was noted in the onset of the southwest monsoon over Kerala.
Dr. S. Jeevananda Reddy

Dr. S. Jeevananda Reddy
Reply to  Willis Eschenbach
December 11, 2015 2:59 am

You can see the results in:Pesq. Agroec.Bras., Brasilia, 19[5]:529-543, maio 1994.
WMO Secretary General issued a statement on World Meteorological Day on 2013 — dry & warm in Southern Hemisphere. Fortaleza completed three cycles [above average peaks and two below average peaks and 2013 comes under third below the average cycle.
2013 — Durban [South Africa] — 66 year cycle, Maalypye [Botswana] 60 year cycle and Catuane [Mozambique] 54 years cycle also presented below the average — W followed by M shape at all the three stations.
Dr. S. Jeevananda Reddy

Reply to  Willis Eschenbach
December 11, 2015 5:12 am

I am always very cautious about saying a signal is present in observational data unless I can see a minimum of at least three cycles, and preferably four or more.

Amen, Amen, Amen, a thousand times Amen. Actually, it’s worse because most climate data is quasi-periodic at best.
That invalidates any analysis that relies on periodic signals. ie. Fourier is right out. 🙂

Gloateus Maximus
Reply to  Willis Eschenbach
December 11, 2015 4:53 pm

Here is Dr. Reddy’s 1989 paper on rainfall and the solar cycle:

Reply to  Willis Eschenbach
December 11, 2015 6:43 pm

This is a controversial subject. Here is the full text of a paper that claims no skill for 11-year solar cycle in predicting Indian monsoon.
Claud, C., Duchiron, B. and Terray, P., 2008. On associations between the 11-year solar cycle and the Indian Summer Monsoon system. J. Geophys. Res, 113, p.D09105.

Gloateus Maximus
Reply to  Dr. S. Jeevananda Reddy
December 11, 2015 8:24 am

The effect of solar activity on Asian monsoons has been known for over a century. This paper from last year provides details:

December 10, 2015 11:45 pm

“A 52-year cycle and its submultiples were present. Long rainfall series are available Fortaleza. ”
Nullius in verba , where’s the data please.

December 11, 2015 12:08 am

While I think your assessment is fair, and I have no strong feelings about whether or not sunspots correlate with decadal climate shifts, for years before getting into this climate lark I was aware of a strong correlation with wine vintages and sunspots. It used to be a way of coarse filtering for decent vintages. This has been “known” for a long time, much longer than the wrangling over AGW.
I don’t know if you have thought about it or if it has been discussed here, but it would make for an interesting post one day. I’m afraid all I have is “if there is smoke there’s probably fire” – nothing more substantial than that. But it certainly predates hand-wringing over the climate.

Reply to  Willis Eschenbach
December 11, 2015 1:00 am

If someone sends me sufficient funds to buy the raw materials I will be delighted to start recording liquid data for others to pour over.

Reply to  Willis Eschenbach
December 11, 2015 3:24 am

Well just for fun I dug this up:
Sunspots and their Effects
….which was first published in 1937.
It’s not exactly riveting science, but it is a little bit interesting. Have a scroll a little further on – he makes a correlation with rabbit, lynx and fox pelts and sunspots – I thought that was amusing.
I think my point is, that people have “noticed” a correlation with sunspot numbers and climatically favourable/less favourable conditions going back quite a long way, and well before there was some politically correct stigma attached to the notion.
I stress, no I don’t think that proves there is a proven correlation any more than I think centuries of believing the Sun revolved around the Earth validates the idea. But I do think it is one of the great unknowns and worthy of skeptical consideration.

Lee Osburn
Reply to  Willis Eschenbach
December 11, 2015 6:25 am

Another oddity with sun cycles.
For the last several years, my chili petines (with black fruit) developed a deep dark green color. Recently for the last part of this summer they began taking on a lighter color. Two years ago I tested them by using shade cloth, and within a week they began to fade. When I took it down, they began to regain the darkness. It wasn’t until this year that they did this without the cloth.
I am not sure if they are being affected by the cycle but within the next several years I will watch them to see. I believe they develop the color change to protect themselves from the (X?-rays) being put out by the sun.
Also, Willis, as a sailor on the Navy “big boats” we had a dither signal used to keep the ink pens from sticking to the charts. Like a vibrator signal superimposed on the deflection motors. There was an adjustment to increase and decrease this signal until the deflection would not be noticed by the blob of ink from the wiggle. I am interested in knowing what percentage of dither the digital filtering has to offer.
In my study of the diurnal cycles (real time) I can see the very small noise being generated on the pressure signal (goes from 0 to .01″ sawtooth pattern). Presently we are coming up on a new moon (between us and the sun) and will spend some time comparing it to other orb passing events to see how much noise is injected into the diurnal.
Excellent post, keep em coming

Steve Fraser
Reply to  Willis Eschenbach
December 11, 2015 9:32 am

Maybe in this case, Tasting for it…

Reply to  Willis Eschenbach
December 11, 2015 7:07 pm

How about the integral of sunspot numbers?
Or the integral of group numbers, since the correlation between revised SN and revised GN is around 88% and the GN series have a few more 11-year cycles.

December 11, 2015 12:36 am

Willis, can the EEMD be used with Mauna Loa CO2 data that is trend stationary?

Reply to  Willis Eschenbach
December 11, 2015 7:09 pm

Thanks. I have been pondering using the annual maxima and minima and have been procrastinating.

December 11, 2015 1:39 am

OK Willis, I’ve made some progress. It’s turning into quite a dependency chase to get R to load this so far. Hopefully there will not be any blockers.
I seem to be lacking and it’s also trying to reference “plotts2 function.R” in my home dir that does not seem to be in the zip. Can you help with those?

Reply to  Mike
December 11, 2015 1:41 am

correction that was ‘New Sunspot Data’

Reply to  Mike
December 11, 2015 1:49 am

… and ‘LandSea’

Reply to  Mike
December 11, 2015 2:06 am

looking ahead it looks like I will also be missing ‘decompts function.R’ which also references the home dir. ( I can obviously change the path but the those files are missing from the zip).

Reply to  Willis Eschenbach
December 11, 2015 11:05 am

OK, thanks again Willis, this is getting close now. I’m getting a basic plot of fig 1 but without the annual averages line.
I’m getting a few messages about things being “masked”. I’m not sure if that is supposed to be an error or a warning or what. Do you know whether they matter?
When it has finished plotting the monthly data it drops out with an actual error.
Any suggestions?
Attaching package: ‘lubridate’
The following object is masked from ‘package:seewave’:
Error in lines(x, lwd = backline, col = backcol) :
object ‘flowannall’ not found

Reply to  Willis Eschenbach
December 11, 2015 1:20 pm

Thanks for taking a look, Willis. I did load it but there is no mention of that in “Willis Functions.R” of the zip you provided.
It was assigned in the Amazon file but commented out. Seems to need uncommenting there.
I get it, it’s just a record of snippets of code. What bits do I need to reproduce figs 6 and 9 ?

Reply to  Willis Eschenbach
December 11, 2015 1:50 pm

Thanks Willis, I did as you suggested.
I’m looking at the section that begins at “# plot spectra ——–”
If I get as far as the multiline plot command, I get the structure, panels and labels, but only C3 and C6 show any data and it’s just a kind of sinusoidal bulge around +0.5; the rest is empty.
If I do the section down to “for (i in 1:6){” etc. it all seems to go badly wrong.
I’m not able to produce anything resembling fig 6 as it should be.

Reply to  Willis Eschenbach
December 11, 2015 1:52 pm

The following object is masked from ‘package:seewave’:

maybe that is broken.

Reply to  Willis Eschenbach
December 11, 2015 2:10 pm

Willis, I’m also getting something quite close to fig 7 except that the first two panels are the same as the last two and show SSN not rain. 🙁

December 11, 2015 2:31 am

OK, it looks like I’ve installed all the relevant R libs and their deps. The seamask file is not relevant to this stuff and I’ve commented out refs to it, but it would be good for access to the CERES work, whenever I get time to look at that, that was very interesting.
In summary, it seems the following files are missing to be able to look at what you’ve provided:
‘New Sunspot Data’
‘plotts2 function.R’
‘decompts function.R’
It would be great if you provide a link, Thanks.

Rainer Bensch
December 11, 2015 2:43 am

From where comes the water and why?

December 11, 2015 3:30 am

Willis: Have you considered the fact that most of the processing steps in Climate data are implementations of a Heterodyne circuit?
The Input signal is the Raw Climate Data. The Local Oscillator is the Climate Reference Period and the output is the Anomaly Data.
Climatology uses a Subtractive Mixer rather than the ‘Ideal’ Multiplier.
Climatology also uses the Sum (range shifted by 2 to produce Average) and discards the Difference signal at earlier steps in the process.

Scottish Sceptic
December 11, 2015 4:02 am

3 hours of the Cruz hearings is a bit much for anyone so I’ve cut it down to 9 minutes of the key evidence:

Alan Robertson
Reply to  Scottish Sceptic
December 11, 2015 6:21 am

Good job and thanks, but the short version misses out on a lot of posturing and blather by the alarmist Senators.
If one is unfamiliar with just how low those boys will go and if you have the stomach for endless logical fallacies backed up by misdirection and blatant “lying with facts” type arguments, then watch the whole thing.

Joe Brown
Reply to  Alan Robertson
December 11, 2015 8:09 am

Is this an example of adding noise to the signal?

December 11, 2015 4:28 am

Adding noise to a signal is a marvelous technique for extracting waveforms from periodic signals if you have enough cycles.
As a lab I used to have my students get the waveform of a periodic signal using a comparator to digitize the signal. If the instantaneous voltage was greater than 0, the comparator would output 1. If the voltage was less than 0, the output would be 0. Without the added noise the result would be something like a square wave. Adding noise and averaging samples over enough cycles would reproduce the waveform to whatever resolution was required.
The technique works if you have enough cycles of a signal whose period doesn’t change. Neither of those conditions is met by Antico and Torres. What I see here is way too much processing on way too little signal. My first reaction is that it’s crap all the way down.
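For anyone who wants to try the comparator lab in software, here is a minimal simulation under assumptions of my own: a made-up two-harmonic waveform with amplitude below 1, 64 samples per cycle, 5000 cycles, and uniform noise in [-1, 1]. Uniform noise is chosen because it makes the probability of a 1 a linear function of the voltage, so the per-phase average maps straight back to the waveform.

```python
import numpy as np

rng = np.random.default_rng(2)
samples_per_cycle, n_cycles = 64, 5000
phase = 2 * np.pi * np.arange(samples_per_cycle) / samples_per_cycle
waveform = 0.6 * np.sin(phase) + 0.3 * np.sin(3 * phase)   # one period, |v| < 1

def comparator(v):
    """1-bit digitizer: 1 where the voltage is above zero, else 0."""
    return (v > 0).astype(float)

# Without noise every cycle is identical, so the average is still
# just a square-ish wave of 0s and 1s.
no_noise = comparator(np.tile(waveform, (n_cycles, 1))).mean(axis=0)

# With uniform noise in [-1, 1], P(output = 1) = (v + 1) / 2, so the
# per-phase average over many cycles maps linearly back to v.
noise = rng.uniform(-1.0, 1.0, size=(n_cycles, samples_per_cycle))
p_one = comparator(waveform + noise).mean(axis=0)
recovered = 2.0 * p_one - 1.0

max_error = np.max(np.abs(recovered - waveform))
print(max_error)
```

Note that the simulation quietly relies on both conditions mentioned above: thousands of cycles, and a period that is known and fixed so the samples can be averaged phase by phase.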

Reply to  Willis Eschenbach
December 11, 2015 11:25 am

and yet it moves

… the phrase is used today as a sort of pithy retort implying that “it doesn’t matter what you believe; these are the facts” wiki
1 – Antico and Torres’ data is spotty. About half of it is infilled with data calculated on the basis of river levels.
2 – The eleven year period is only approximate and it changes from cycle to cycle.
Those are fatal problems. No amount of sophisticated analysis is going to fix that.

Curious George
Reply to  commieBob
December 11, 2015 10:57 am

Just curious: What is the advantage of adding a noise compared to adding a periodic high frequency signal? Does any noise (red, white) lead to the same result?

Reply to  Curious George
December 11, 2015 12:04 pm

If you add a fixed frequency you will get all kinds of unwanted artifacts. The noise you add should be as close to random as possible.
This isn’t to say that some math genius will not find a reason to do otherwise. My career seemed to be filled with a succession of: “Holy **** I didn’t know you could do that” moments. My great joy was to be able to inflict such moments on others. 🙂

Curious George
Reply to  Curious George
December 11, 2015 2:07 pm

Please name two kinds of unwanted artifacts.

Reply to  Curious George
December 11, 2015 4:29 pm

If you add something that isn’t random then it won’t average out and will show up on the output. Depending on the frequencies and amplitudes you can get a real mess.

Curious George
Reply to  Curious George
December 11, 2015 5:12 pm

A sinusoidal signal averages out. Both white and red noise are random.

Reply to  Curious George
December 12, 2015 7:14 am

Curious George says:
December 11, 2015 at 5:12 pm
A sinusoidal signal averages out. Both white and red noise are random.

A sine wave averages out if you average all the samples over one cycle. That’s not what’s happening here. In the case where the input waveform’s period is equal to the window, the value you get for the first sample in the output cycle is the average of all the first samples in the input cycles. etc. etc.
A hardware system is inherently band limited so any discussion about the difference between white noise and red noise is moot.
If you want to play with this it’s fairly easy to set up a simulation in a spreadsheet.

December 11, 2015 5:15 am

We found a similar decadal signal in autumn precipitation over the US that could very well be associated with solar variability.

Gloateus Maximus
Reply to  David Small
December 11, 2015 9:03 am

A growing number of atmospheric and oceanic phenomena are being shown to fluctuate under solar influences, and the mechanisms whereby these patterns emerge are becoming better understood:

Gloateus Maximus
Reply to  Gloateus Maximus
December 11, 2015 9:04 am

Oops. Should be number…is, not are.

Gloateus Maximus
Reply to  Gloateus Maximus
December 11, 2015 10:44 am

What percentage of the at least thousands of papers finding significant solar effects on climate have you analyzed?
Here are hundreds, on only five of which was Meehl lead author, although it’s unclear to me why is particularly anathema in your book:
Please for instance take a look at the 2014 monsoon paper I linked above.

Gloateus Maximus
Reply to  Gloateus Maximus
December 11, 2015 11:14 am

Another typo. Left out the “he” in “why he is”.
Correlation of solar activity with various climatic phenomena predates the CAGW hypothesis by about 180 years.
Dunno how big your raft of analyses is, but to be meaningful it would have to include hundreds of recent and classic papers, at a minimum. Thousands would of course be better.

Gloateus Maximus
Reply to  Gloateus Maximus
December 11, 2015 11:56 am

Stipulating that you actually did find insurmountable problems with the 17 studies you’ve analyzed here, do you feel that a fraction of one percent of papers is liable to be representative of thousands to tens of thousands of relevant studies in the past century or so?
I have been impressed by the results derived from SORCE data, showing that, while TSI indeed doesn’t change much, its spectral composition does, ie the share of the total in UV bands, and that various climatic phenomena are clearly associated with these fluctuations, plus plausible to demonstrated mechanisms explaining these correlations are on offer.
The 2011 Nature Geoscience article itself:
I wonder if you find this paper junk as well.

Curious George
Reply to  Gloateus Maximus
December 11, 2015 2:19 pm

Maximus, there is certainly an overproduction of ‘scientific’ papers. Why don’t you wade through one of your choice and perform a Willis-like analysis?

Gloateus Maximus
Reply to  Gloateus Maximus
December 13, 2015 6:28 am

OK. Here’s one by Japanese scientists from 2009:
Influence of the Schwabe/Hale solar cycles on climate change during the Maunder Minimum
I find nothing to fault in the data analysis of this paper. As for the validity of its data, IMO proxies are often better than the cooked to a crisp, so-called instrumental “observations”, adjusted beyond recognition by their own mothers.

Reply to  Gloateus Maximus
December 13, 2015 5:03 pm

Thanks for the Miyahara paper. As for the Space_com paper, hardly authoritative. Try this one, which has the same theme but shows the physics.
Shaviv, Nir J. “Using the oceans as a calorimeter to quantify the solar radiative forcing.” Journal of Geophysical Research: Space Physics (1978–2012) 113.A11 (2008).

Gloateus Maximus
Reply to  Gloateus Maximus
December 13, 2015 10:08 am

The study you cited on Herschel’s anticorrelation of sunspots with wheat prices doesn’t falsify the connection, but finds it statistically insignificant. This conclusion however is at odds with other recent work, including papers cited in the analysis you linked, so it’s hardly dispositive:
Pustil’nik, L. A., and G. Yom Din (2004a), Influence of solar activity on the state of the wheat market in medieval Europe, Solar Phys., 223, 335–356.
Pustil’nik, L. A., and G. Yom Din (2004b), Space climate manifestations in Earth prices – from medieval England up to modern U.S.A., Solar Phys., 224, 473–481.
Pustil’nik, L. A., and G. Yom Din (2009), Possible space weather influence on the Earth wheat markets, Sun Geosphere, 4, 35–41.
Pustil’nik, L. A., and G. Yom Din (2013), On possible influence of space weather on agricultural markets: Necessary conditions and probable scenarios, Astrophys. Bull., 68, 107–124.

Gloateus Maximus
Reply to  Gloateus Maximus
December 13, 2015 11:46 am

It’s absurd to claim that people have presented you “the best papers” out of thousands or tens of thousands over the past 215 years, including by some of the best scientists of those centuries.

December 11, 2015 5:25 am

Willis, page 16 shows how much can be gained by adding noise to a signal (a time series effectively) prior to being digitised. shows how by sampling more or less (daily average/monthly average/annual average etc) changes in the resolution of the ‘whole’system can be garnered.
Is adding noise to the system either prior or post data capture, equally valid? I have not yet determined.

Reply to  steverichards1984
December 11, 2015 6:47 am

Adding noise to a signal prior to digitally quantising it is also called dithering. It is a way to overcome the limitations in the precision of the A-to-D sampling.
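A small Python sketch may make the dithering idea concrete. The numbers are assumptions chosen for illustration (a quantizer step of 1, a sine with a peak of 0.3 of a step, 2000 repeated captures); they do not come from any of the linked material. Without dither the signal never crosses a quantization boundary and is lost entirely; with uniform dither one step wide, the average of many quantized captures tracks the original.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_samples = 2000, 100
t = np.arange(n_samples)
signal = 0.3 * np.sin(2 * np.pi * t / n_samples)   # peak is below half a step

# Quantizer with a step of 1: round to the nearest integer.
plain = np.round(signal)                            # no dither: every sample rounds to 0

# Add uniform dither one step wide before quantizing, then average the trials.
dither = rng.uniform(-0.5, 0.5, (n_trials, n_samples))
dithered_mean = np.round(signal + dither).mean(axis=0)

max_error = float(np.max(np.abs(dithered_mean - signal)))
print(np.all(plain == 0), max_error)
```

Note that in this sketch the dither is added before quantization. Adding noise after the data have already been rounded cannot bring back what the quantizer discarded.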

December 11, 2015 6:47 am

Is this in line with the politically correct science?
Will this study simply be yet another one of many good scientific papers that will never be considered by the IPCC when it does its Sixth Assessment Report?

December 11, 2015 7:22 am

“…our findings have implications for studies on global change.”
“Global change” – we’re doooomed!!!!

December 11, 2015 7:46 am

Wondering if you have seen Paul Pukite’s work on SOI and QBO.

December 11, 2015 8:06 am

It makes for a very long punch-line but thank you for the LOL, Willis!

December 11, 2015 8:08 am

Thanks for the post Willis.
Have to figure out how to apply this type of analysis to my data, Center of Pressure data on a force plate. But looks like a promising method of analysis. Been typically using Correlation Dimension, Sample Entropy, and Higuchi’s Fractal Dimension.

December 11, 2015 8:33 am

Next the adjustments will be installed on the satellites via the operating system NOAA installs before launch.

December 11, 2015 9:02 am

Picking cherries no longer is sufficient. To get data that supports the preordained conclusion now requires picking fruit salad.
The periodogram shows peaks in the Fourier space that are not sharp. There are ‘sidebands’ of the main peak, especially at the longer periods such as 2.5 yr and 3.7 yr. The ‘beating’ of these frequencies is responsible for the periodic modulation of periodic signals (e.g. C2, C3, and C4 in figure 5). To some extent there is a contribution to these rounded peaks due to the truncation of the raw data at the ends of the domain. These rounded peaks are what is passing through the computational ‘bandpass filters’ as closely related cycles.
A more ‘elegant’ approach, in my opinion, would be to work in the Fourier space, to sequentially identify statistically significant peaks (amplitude > 2 sigma for the linear regression of the FFT) one at a time, separate them from the data, subtract the Inverse FFT of the peaks (boxcar truncation) from the original data, and reprocess the reduced data set. This can be repeated as long as statistically significant peaks can be identified within the FFT space.
A familiarity with the FFTs of pure noise of various types is helpful here. The FFT of white noise is ‘flat’ – the linear regression of amplitude vs frequency (alternatively amplitude vs period) has no statistically significant slope. Brown noise has an FFT in which the ‘envelope’ of amplitude shows a linear dependence on frequency/period, as does the periodogram of this data when one looks at the non-peak amplitudes in Fourier space.
But what do I know?
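The peel-off procedure described above can be sketched in Python as follows. This is a rough illustration, not tadchem's actual method: it assumes an even-length series, uses a flat mean-plus-k-sigma threshold over all bins instead of a regression line, and uses k = 6 rather than 2, because with a few hundred bins a 2-sigma rule would flag several pure-noise bins by chance. The function name `remove_significant_peaks` and the test signal are hypothetical.

```python
import numpy as np

def remove_significant_peaks(x, n_sigma=6.0, max_peaks=10):
    """Repeatedly strip the largest spectral peak while it stands out from the rest."""
    residual = np.asarray(x, dtype=float).copy()
    n = residual.size                          # assumed even
    found = []                                 # (period in samples, amplitude) per removed peak
    for _ in range(max_peaks):
        spectrum = np.fft.rfft(residual)
        amp = np.abs(spectrum[1:-1])           # skip the DC and Nyquist bins
        k = int(np.argmax(amp)) + 1            # bin index of the largest peak
        threshold = amp.mean() + n_sigma * amp.std()
        if amp[k - 1] <= threshold:
            break                              # nothing significant left
        peak_only = np.zeros_like(spectrum)
        peak_only[k] = spectrum[k]             # boxcar truncation: keep this one bin
        residual -= np.fft.irfft(peak_only, n) # subtract the inverse FFT of the peak
        found.append((n / k, 2.0 * np.abs(spectrum[k]) / n))
    return found, residual

# Toy data: two sinusoids on exact FFT bins plus unit white noise.
rng = np.random.default_rng(0)
n = 512
t = np.arange(n)
x = (1.0 * np.sin(2 * np.pi * 8 * t / n)
     + 0.8 * np.sin(2 * np.pi * 20 * t / n + 1.0)
     + rng.normal(0.0, 1.0, n))

found, residual = remove_significant_peaks(x)
periods = sorted(p for p, _ in found)
print(periods)
```

Real data will rarely sit on exact FFT bins, so a single-bin boxcar would leave leakage behind; widening the truncation to neighbouring bins is one option in that case.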

Reply to  tadchem
December 11, 2015 10:42 am

tadchem said December 11, 2015 at 9:02 am in part “ The FFT of white noise is ‘flat’ “
Surprisingly, NOT so. Here I am NOT referring to the fact that any one white sequence is FAR from flat. Rather, even if you average the FFT magnitudes of perhaps a million different white sequences, there will always be a (very significant) dip in the magnitude at DC (k=0), and at k=N/2 (only when N is even). This dip is to just about 90% of the “plateau” level: it’s 2(sqrt(2))/pi or 0.9003163. For k=0 and k=N/2 (for N even) we have, for a REAL time signal (usual case), an ordinary one-dimensional “Drunkard’s walk” with the mean of a folded normal distribution. For all other k, the “walk” is two-dimensional and the mean is that of a Rayleigh distribution.
A full detailed description from 2012 is here:
You probably will want to verify this is TRUE before considering my explanations! Email me for additional details. Yes – another way the FFT can sneak up and eat your lunch on you.
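One quick way to check this claim numerically is sketched below in Python. The sequence length (64) and the number of sequences (20,000) are arbitrary choices for illustration, and Gaussian white noise is assumed. The script averages FFT magnitudes over many real white sequences and compares the k=0 and k=N/2 bins with the plateau formed by the bins in between.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 64, 20000                        # N is even, so k = N/2 exists
x = rng.normal(size=(trials, n))             # real Gaussian white sequences

# Mean FFT magnitude per bin, averaged over all sequences.
mean_mag = np.abs(np.fft.fft(x, axis=1)).mean(axis=0)

plateau = mean_mag[1:n // 2].mean()          # bins strictly between DC and N/2
ratio_dc = float(mean_mag[0] / plateau)
ratio_half = float(mean_mag[n // 2] / plateau)
predicted = 2.0 * np.sqrt(2.0) / np.pi       # ratio of folded-normal to Rayleigh means

print(ratio_dc, ratio_half, predicted)
```

Both ratios should land near the predicted 0.9003, to within the sampling error of the run.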

Curious George
Reply to  tadchem
December 11, 2015 11:02 am

We are a way beyond picking data. Today we are adjusting data.

December 11, 2015 11:08 am

As always, quite interesting, and thank you again.
Some time ago I read papers on how the sensory organs (of which animals, I do not remember) make use of the random thermal noise to sharpen discrimination. This is at least “associatively” related, if not directly comparable in detail, to the method you presented here.

December 11, 2015 11:19 am

Recordings on CD used to have noise added to the analogue signal to cover up quantisation distortion (audible volume stepping) at low signal levels to compensate for the logarithmic response of the ear.
Probably still do, in fact.

December 11, 2015 11:29 am

Excellent article Willis. Thank you.
What I noticed in your journey through EEMD and CEEMD land was that most of the solar signal appears in the reconstructed data.
Another tidbit caught my eye. That is the Amazon river output has increased.
From around 1930 to 1948-9 the Amazon river flow appears to be approximately 16,000 – 17,000 cubic meters of water per second. This range appears limited in variability.
From approximately 1968 to the current period, the flow has greater variability. What is interesting is how the overall flow increases during that time period; with peaks as high as 20,000 cubic meters/sec. The average also appears to be higher, perhaps a 1,000 cubic meter/sec, though that is a guess.
Looking around seeking Amazon River discharge rates was enlightening.
From Earth Observatory:

“…In large floodplain-dominated rivers such as the Amazon, variations in water heights create a signal that is detectable by 37 GHz passive microwave radiometers. From these signals, Vorosmarty was able to piece together a picture of river discharge, pixel by pixel, including input from tributaries and periodic differences in water height due to seasonal variations.
Ground-based meteorological station data from the Global River Discharge Database (RivDIS), archived at the Oak Ridge National Laboratory DAAC and the Brazilian Departmento Nacional de Aguas e Energia Eletrica (DNAEE), were used to calibrate and validate hydrology models for the river. From these data, scientists created models from which they could generate a time series of discharge and runoff simulations.
When it came time for ground-truth, Vorosmarty and colleagues found that the satellite imagery and hydrology models agreed. Both data sources reflected the progressive increase in discharge and magnitude, as well as the influence of tributary inflow. Measurements of seasonal high and low water levels also matched. Both also showed interannual variations in discharge caused by the 1982/1983 El Niño event…”

Research carried out in:

Vorosmarty, C. J., C. J. Willmott, B. J. Choudhury, A. L. Schloss, T. K. Stearns, S. M. Robeson, and T. J. Dorman. 1996. Analyzing the discharge regime of a large tropical river through remote sensing, ground-based climatic data, and modeling. Water Resources Research 32(10): 3137-3150.

So, beginning around 1996-1998, Amazon River discharge rates are measured via satellite.
Meaning that Willis was trying to analyze data originating from four, perhaps five, data sources, not three:
Two separate reconstruction locations.
One, likely two, different river mouth estimation series; the second estimation is when the series begins again in 1968-9 until replaced by satellite measurement.
One satellite measurement series.
Without trying to locate Vorosmarty’s paper, I am left wondering how the satellite measures water levels where fresh water meets salt water and South Atlantic tidal forces.
I no longer worry about the Amazon River discharge rates perhaps increasing; now that the latter period is measured by satellites.
Doesn’t anyone in NASA ever wonder about the conjugal habits of their data?
I haven’t bothered to seek the ‘Global River Discharge Database’ yet. Because that way lies more work, as I am curious just what Amazon river discharge levels are used in calculating sea level rise.

Gloateus Maximus
Reply to  ATheoK
December 12, 2015 5:30 am

IMO largely from deforestation rather than increased precipitation.

Gary Pearse
December 11, 2015 1:12 pm

Willis, you seem to find surprises in every study you undertake and you don’t tend to leave stones unturned. Thank you again for your fine work. I’m left with some questions as I’m sure you are, but maybe not the same questions.
1) I haven’t read the paper and won’t pay for it, but I’m puzzled if they didn’t look at the baseline causes of flooding in this mighty river. First, the baseline cause is precipitation: rain and snowfall/melt. If this information is available, it should give an even more convincing connection between sunspots and flooding if there is one. Also, flooding is unlikely to be a simple what-goes-in-must-come-out situation. In dry years, there is low ground (swamps) that may drain, and in some basins aquifers decant. Heavy precipitation may be partly or largely swallowed up by these and by thirsty vegetation by the time the water has traversed a good part of the basin. Also, after a dry period, sloughing of the banks is likely to dam water temporarily or divert it through other low ground, causing delay. Precipitation is best for the analysis.
2) You said there were some other methods that might have been used. The suspicious nature I’ve developed over the past decade or so gives me cause to think that the authors likely tried the usual methods and had to go to an unusual one to get their fit. Also, with two stations in the stew, I’m seeing red flags. I didn’t grasp the reason for this but assumed one would be a check for the other, or some such (perhaps revealing the data hiding in the noise). I can imagine if they initially tried several other stations on the river, they could cull out the ones they didn’t like and go with what they did go with. This seems to me a good way to get a high probability of a fit that probably is meaningless.
In any case, I note you are intrigued with the method and that tells me you will be reporting this in detail at some later date. I look forward to it.

December 11, 2015 2:00 pm

Rainfall time series do not contain periodic components other than those due to the seasonal cycle. Once that is eliminated, the remaining variability is entirely stochastic; introducing white noise adds nothing to the discovery of random signals. Apparently unaware of rigorous cross-spectral methods to establish relationships between stochastic time series (and entirely ignoring the very apparent lack of coherence between Amazon rainfall and the sunspot cycle), Antico and Torres resort to analysis methodology that is as quaint as it is inappropriate. Sadly, analytically misguided attempts to squeeze a tendentious result out of the data are par for the course in “climate science.” They’re simply not worth the baffled attention given them.

Reply to  1sky1
December 11, 2015 2:14 pm

Well done, that is exactly the unsubstantiated assumption that the whole of AGW is based on: “the remaining variability is entirely stochastic”.
Everything is “stochastic” except CO2, so whatever happens it must be due to CO2. QED.

Gary Pearse
Reply to  1sky1
December 11, 2015 6:44 pm

Thank you 1sky1, that would answer my question as to why they didn’t use precipitation /snow melt. What else can they imagine causes flooding of a river? Could they really have not asked this question first and if the answer was that of 1sky1’s information above, then why do the study?
One possible way stochastic precipitation could be irrelevant to the flood cycle(?) would be if snow tended to have net accumulation for 10yrs and then melts were greater at the end of a sunspot cycle. But surely this would be well known for a long time and would be obvious without the need for iffy methods to tease out a signal. This one using interacting data from two stations on the river is what has got me. One station in the lower reaches of the river should do the job if it has any significance at all. What is the difference between floods on the river in between sun cycle maxima and minima? Volumetrically it must be small or it would be well known long before now. If small, we get into the problem of “small differences” with large error bars. I’m assuming there aren’t numerous, high precision stream flow gauges, snow pack and melt gauges. This isn’t Squaw Valley.
I can understand why Willis hasn’t been able to find any sunspot cycle signal in climate. With only a small change (1%) in TSI, and large error bars in all climate data, it’s more than just noise. One percent sounds non existent when you are measuring with axe handles and plugs of chewing tobacco and then adjusting the data on an algorithm daily. I think this quest to find a connection with the sun spots is becoming pathological. Whatever there may be has to be small. Other very long term variations in the sun’s output are another thing, but Willis has already pointed out that the much stronger annual variations from orbital eccentricity don’t even show, that clouds forming 15 minutes earlier in the afternoon can wipe out this difference in potential heating. This all makes this paper ridiculous.

December 11, 2015 3:39 pm

Willis Eschenbach wrote December 10, 2015 at 11:54 pm:
“Sorry, guys, but the original paper differentiates this use of white noise from the “stochastic resonance” use. See page 7 for their discussion of the differences between the two.”
I assume “original paper” meant Zhaohua Wu and Norden E. Huang (2005) and that the relevant comments on page 7 are:
“Adding noise to the input to specifically designed nonlinear detectors could be also beneficial to detecting weak periodic or quasi-periodic signals based on a physical process called stochastic resonance. The study of stochastic resonance was pioneered by Benzi and his colleagues in early 1980’s. The details of the development of the theory of stochastic resonance and its applications can be found in a lengthy review paper by Gammaitoni et al. (1998). It should be noted here that most of the past applications (including these mentioned earlier) have not used the cancellation effects associated with an ensemble of noise-added cases to improve their results.”
I had already considered this and it was the basis for which I commented about averaging stochastic resonance. It seemed quite clear and I don’t see what else it could mean.
It is in fact, apparently, according to my quick look below, a pretty good idea, although I doubt that there is any averaging in most SR situations except as I noted for audio dither. But I am impressed. The figure shows the average output of 400 runs of SR where the signal is five cycles of a sinewave of amplitude 0.1, the added noise is uniform of amplitude 0.5, and the threshold (non-linear detector) is set at 0.5. Hence there is a detection when the noise boosts the sinewave peaks to between 0.5 and 0.6, which seems to be an average of about 25 of the 400 trials for any time point. Here is my result:
I am impressed. Beyond this finding, the local jargons get in the way (as Joe Born suggested).
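Since the figure does not survive in this text, here is a minimal Python sketch of the same kind of experiment. The signal, threshold and trial count follow the description above; the reading of "uniform of amplitude 0.5" as uniform on [-0.5, 0.5) is an assumption, and the exact detection counts depend on that choice, so the sketch makes no attempt to reproduce the count quoted above.

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials, n_points = 400, 500
t = np.linspace(0.0, 5.0, n_points, endpoint=False)   # five cycles
signal = 0.1 * np.sin(2 * np.pi * t)                  # never reaches the threshold alone
threshold = 0.5

# Assumption: "uniform of amplitude 0.5" is read as uniform on [-0.5, 0.5).
noise = rng.uniform(-0.5, 0.5, (n_trials, n_points))
detections = (signal + noise) > threshold             # True where the detector fires

average_output = detections.mean(axis=0)              # fraction of trials firing per time point
expected = np.clip(signal, 0.0, None)                 # firing probability under this noise model
print(average_output.max(), np.corrcoef(average_output, expected)[0, 1])
```

With this noise model the detector can only fire during the positive half-cycles, and the averaged output traces out those half-cycles. A single run is just a sparse scatter of detections; it is the ensemble average that makes the sub-threshold sine visible.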

Reply to  Bernie Hutchins
December 11, 2015 5:13 pm

Bernie Hutchins: I assume “original paper” meant Zhaohua Wu and Norden E. Huang (2005) and that the relevant comments on page 7 are:
Thank you for your post. That looks like a good example. I am impressed.

December 11, 2015 4:23 pm

Nobody’s mentioned this yet. This whole EEMD process was invented to deal with (among other things) non-stationary data. In other words, data where the solar signal is there…but not there either all the time or at the same frequency, phase or amplitude all the time. If the signal is not there all the time then Fourier analysis would have a harder time seeing it. That would have the effect of spreading out the energy over a range of frequencies and lowering the peaks, thus making it more difficult to spot.
However, what it means if the solar signal really is there but varies significantly over time is beyond me. Anyway, just an observation here.
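The point about Fourier analysis and a wandering signal can be illustrated with a short Python sketch. The numbers are assumptions picked to exaggerate the effect (110 years of monthly samples, a steady 11-year sine versus one whose period drifts from 8 to 14 years); nothing here is taken from the Amazon data.

```python
import numpy as np

n = 1320                                   # 110 years of monthly samples
t = np.arange(n)

steady = np.sin(2 * np.pi * t / 132.0)     # fixed 11-year period

# Frequency drifts linearly from an 8-year to a 14-year period (exaggerated on purpose).
freq = np.linspace(1.0 / 96.0, 1.0 / 168.0, n)     # cycles per month
drifting = np.sin(2 * np.pi * np.cumsum(freq))

def peak_amplitude(x):
    """Height of the tallest Fourier peak, scaled so a pure unit sine gives 1."""
    return float((2.0 / x.size) * np.abs(np.fft.rfft(x)).max())

steady_peak = peak_amplitude(steady)
drifting_peak = peak_amplitude(drifting)
print(steady_peak, drifting_peak)
```

Both series have the same amplitude in the time domain, but the drifting one spreads its energy over several neighbouring frequency bins, so its tallest peak is noticeably lower than the steady one's.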

December 11, 2015 4:30 pm

Dear Willis,
As a late-career physicist who routinely uses advanced signal-processing techniques, all I can say is that this approach is BS. First, the addition of truly “stochastic” noise to data is a well known technique and has applicability IF the recording technique is non linear or quantized. This should be obvious in the case where a digitization recording technique has a bit resolution SMALLER than the signal. The addition of STOCHASTIC noise LARGER than the signal can raise the signal level above the one bit resolution of the digitizer. Then a repetitively recorded signal can eliminate the noise through ensemble averaging (whose definition includes the removal of stochastic noise – but I digress). FUNDAMENTALLY, the addition of stochastic noise to a signal is easily discernible in Wigner Space where the data is spread out in phase space. Stochastic noise is 2-D randomly scattered in Wigner Space. Localization of energy density in Wigner space corresponds to true “non-stochastic” information. Stochastic noise addition will not help this situation with the exception noted above. Finally, the use of multi-resolution decomposition is well known in Wavelet Theory. The approach described in the article is a mathematically non-rigorous decomposition. True wavelet mathematicians would laugh at their approach. (See Ingrid Daubechies’ books.) I don’t laugh, I just shake my head. There are numerous ways to mathematically attack this problem in a rigorous fashion. You described one approach. The other is a true time domain autocorrelation approach which would give nearly the same result. I have no bias as to the result of the authors’ study, but such a data analysis used in a physics article in any reputable journal (Phys Rev comes to mind) would have NEVER made it past the reviewers. C students – what can you say. Thanks so much for all of your effort. It is always a pleasure.

Reply to  Willis Eschenbach
December 11, 2015 5:17 pm

Willis Eschenbach: At present, there are 31,000 hits on google for “EEMD signal”, and another 3,600 hits on Google Scholar.

Reply to  Willis Eschenbach
December 11, 2015 5:34 pm

[Reply: ‘Chaam Jamal’ is a sockpuppet. Also posts under the name ‘Richard Molineux’ and others (K. Pittman, etc.) As usual, his sad life writing comments has been completely wasted, as they are now deleted. –mod]

Reply to  Willis Eschenbach
December 12, 2015 8:02 am

And about 27,300 results for “Chaam Jamal is a jerk” too.
You don’t understand how to use Google, do you?
[Reply: ‘Chaam Jamal’ is a sockpuppet. Also posts under the name ‘Richard Molineux’ and others (K. Pittman, etc.) As usual, his sad life writing comments has been completely wasted, as they are now deleted. –mod]

William Larson
December 11, 2015 5:10 pm

All this is way, way over my head, but still fascinating. I have a question: Does it make a difference what kind of noise is used? What if one uses red noise instead of white?

December 11, 2015 8:50 pm

A form of this matter of noise addition to improve detection was put to me as follows by an analytical chemist in my lab about 1970. We were using the new technique of atomic absorption spectrometry on an instrument with a logarithmic calibrated dial/needle readout.
He theorised “At very low concentrations the needle barely moves up from zero. Therefore, a bias is introduced because all readings below zero are assigned a zero, all just above have discrete values and therefore the mean of repeated readings is affected. If, before analysis, we add a small, known amount of extra analyte, we will shift the base from zero and allow an unbiased average.”
Eventually he agreed that the step of addition of analyte was equivalent to adding a source of noise and more error; and that in this case, the situation was made worse by taking the meter needle into the more compressed part of the visual scale, adding more noise through worse visual discrimination.
In the case being discussed by Willis, one approach is to categorise cases such as this and then strip them from the list of possible ways that the signal is allegedly enhanced.
Other bloggers have been attempting this by reference to human detection, such as adding noise to CD music, adding noise to visual imagery, dithering, etc. Examples that rely on human response should not be used because only a mathematical expression can be devoid of human frailties. The mathematical approach should avoid relative comparisons like whether the Floyd-Steinberg or Stucki dither method is the best. We need illustrations of how the addition of (first, perhaps) white noise has improved signal strength, expressed in quantitative mathematics.
(Unrelated factors have not allowed me to study the original paper yet – please excuse, I am trying. The example I gave is not trivial. I suspect it applies at least in part to sampling sub-pixel sizes.)
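The zero-clipping bias in the chemist's argument is easy to reproduce in a Python sketch. All numbers are invented for illustration (true value 0.2, noise standard deviation 1, a known offset of 5), and the sketch deliberately ignores the two complications raised above, namely the error in measuring the added analyte and the compressed part of a logarithmic scale.

```python
import numpy as np

rng = np.random.default_rng(5)
true_value, noise_sd, n_readings = 0.2, 1.0, 100_000   # illustrative numbers
offset = 5.0                                           # known amount of added analyte

noise = rng.normal(0.0, noise_sd, n_readings)

# Readings below zero are recorded as zero, which biases the mean upward.
clipped_mean = float(np.clip(true_value + noise, 0.0, None).mean())

# Shift well clear of zero, read, then subtract the known offset.
spiked = np.clip(true_value + offset + noise, 0.0, None)
spiked_estimate = float(spiked.mean() - offset)
print(clipped_mean, spiked_estimate)
```

In this idealised setting the clipped mean comes out at roughly 0.5 instead of 0.2, while the offset version recovers the true value. Whether that advantage survives in practice depends on how well the offset itself is known, which is the objection raised in the comment.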

Reply to  Geoff Sherrington
December 11, 2015 9:15 pm

Thanks for the example, a great lesson in thinking.
The problem in the lab would be to ensure precision and accuracy in measuring the small amount of extra analyte. If the resulting value after subtracting the bias is close to zero there is a risk of multiplying any error in measuring the analyte. In a lab with closely controlled conditions this may be routine.
But in the wild, such precision is rare and that is the problem we find in climatology.

Reply to  Willis Eschenbach
December 12, 2015 7:49 am

Wow, that is nice, Willis. Certainly puts a bit of context for those dismissing EEMD as BS.
I’ve got the SFT of the flowts with and without removeann=T, nearly identical in where peaks lie.
Now I want to get the C3 spectrum at monthly resolution; yearly is too crude to be much use. I’ve tried to adapt the code but it’s not going too well. Could you suggest how to do this?

Reply to  Mike
December 12, 2015 10:25 am

Mike –
Yes – it looks neat. But don’t you think it looks very much like a wavelet decomposition. (Not to mention the Fourier decomposition of a Delta.)
And I thought we were talking about detecting a very weak periodic component – exactly what a Delta function ISN’T. Your own work seems related to the periodic case?

Reply to  Mike
December 12, 2015 11:47 am

Yes, Bernie. I think that plot shows the result is very similar to FFT, SFT or wavelet decomposition. The claim is that it is more robust in noisy data. Remains to be tested.
It’s quite surprising in view of the difference in method, but again reassuring that it is consistent with FT.
I have now managed to do EEMD on the monthly Amazon data and the peaks are very close, though not always identical, to what I get using Willis’ SFT. The periods around 20-odd years changed by a couple of months; the others were exactly the same number of months.
The band-pass effect may be very useful in more noisy data but here it was detrimental in the case of the 18.6y peak. On C6 it is in one transition band of the filter; on C7 it is on the other side. Due to very poor resolution of the adjacent peaks, this meant that I could not determine the peak centre when it got bent downwards.
Using SFT I get a peak value. Here is what I find from the Amazon rain data:
I did the SFT of the monthly flowts with and without the annual cycle and it was nearly identical in where the peaks lay.
26.08 y
I’m tempted to see 8.91y as the lunar apside period of 8.85 years, but I’d be a little cautious since I would have been more convinced were it nearer, in view of the length of the data sample.
18.58 is clearly a lunar cycle. This year is a “minor lunar standstill”, which is when the latitude of the moon comes closest to the ecliptic (plane of the solar system). Thus it also was in 1997 when the last major El Nino developed.
10.75 and 21.25 are strongly suggestive of Schwabe and Hale, though the 10.75 is fairly small. Finding a solar signal and demonstrating it is small is probably as informative as not finding one.
However, with equally strong 18.58 and 21.25 barely being resolved from each other it is clear that ignoring possible lunar influence will beggar attempts to find or dismiss a solar signal. Those periods will go from in phase to opposite phase in about 74 years.
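The 74-year figure follows from the beat period of the two cycles. A short Python check of that arithmetic, using only the two periods quoted above:

```python
# Periods in years, as quoted in the comment.
p_lunar, p_hale = 18.58, 21.25

# Two cycles drift through one full relative cycle (in phase back to in phase)
# over the beat period; in phase to opposite phase takes half of that.
beat_period = 1.0 / (1.0 / p_lunar - 1.0 / p_hale)
half_beat = beat_period / 2.0
print(round(beat_period, 1), round(half_beat, 1))   # about 148 and about 74 years
```

Since the record is about 110 years long, it covers less than one full beat, which is another way of saying the two peaks are barely resolved.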
I was unimpressed by this paper when I read it last week. Having analysed the data I’m even less impressed since I think they failed to see the stronger Hale cycle and did not even consider the presence of lunar influence.
Anyway, mighty thanks to Willis for digging into this EEMD method and making his code available.
This is another useful tool to have available.

Reply to  Mike
December 12, 2015 8:25 pm

Just ran this through the spectral software I usually use and I don’t see any sign of the circa 21y peak!
The circa 18 is still there, though nearer 18 than 18.6; circa 8y is 8.8, just the other side of 8.85 to that found with this technique. 10.8 is still there and is the strongest of the decadal scale peaks.
Looks like Amazonian climate is affected by both the sun and the moon, but I suppose the Aztecs could have told us that. 😉

Reply to  Willis Eschenbach
December 12, 2015 10:32 am

Willis –
What is the provenance of the paper you have linked? No date! It looks like IEEE format, but even students submit assignments using that! Thanks.

Dr. S. Jeevananda Reddy
December 12, 2015 5:23 am

Frederick Colbourne, you presented good work on the 11 year cycle with Indian monsoon rainfall.
Solar radiation presented an 11 year cycle but rainfall did not.
All-India Southwest Monsoon rainfall presented a 60-year cycle. The third cycle started in 1987 [the starting year of the 60 year astrological calendar, but lagging the Chinese 60 year astrological cycle by three years]. You can go backward and forward from 1987, take 10 year averages and plot them on a graph. You get a clear sine curve.
In the case of the Southwestern parts of India, with northeast monsoon and pre-monsoon and post-monsoon cyclonic activity, the annual rainfall presented a 132 year cycle. The new cycle started in 2001. If you look at the data by separating SWM and NEM, they presented 56 year cycles but in opposite directions. The NEM 56 year cycle is also reflected in the cyclonic activity. Both NEM & SWM precipitation showed an increasing trend, basically because the first 66 years fall below the average and the next 66 years fall above the average.
So, the Indian monsoon is a complicated system, as it is modified by orographic systems.
Dr. S. Jeevananda Reddy

December 12, 2015 9:27 am

Willis – What we really need here is:
(1) A well-defined (toy) test signal with the results compared for FFT, stochastic averaging, and any proposed new method. Has this been done?
(2) The essential, smiling first-class grad students at our doors to tell us what, if anything, it means and is good for!
Alas – I am retired and only hoping for (1).

Reply to  Willis Eschenbach
December 12, 2015 4:35 pm

Thanks Willis –
I think this EEMD was new to you until a few days ago, and I had never heard of it until you posted. What I have not seen here or immediately adjacent is a consistent description of HOW one computes this. Thus I can not duplicate or assess the EEMD procedure at present. [ I was however, familiar with the use of noise to enhance detection (stochastic resonance) for some 22 years (thanks C.H.).]
I do not work well in a mode where I have to rely on a canned program (in R or even in Matlab). I prefer to (first) write my own often cumbersome equivalent code. Alternatively, I can consider running a whole menagerie of test signals (sines, steps, ramps, sines plus noise etc) through the can to see WHAT the EEMD does DO. The last figure you posted of the decomposition of the Delta is the sort of thing that helps – but it does look very wavelet-like.
An additional difficulty here is the application of an unfamiliar (to me) function to look for solar signals that most likely don’t even exist. These are poor tests. I don’t know what species of failure a negative result would mean.
I do greatly appreciate your calling attention to interesting things hiding in the corners. Do keep doing that for us. Thanks.
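For readers who also prefer to write their own code first, the following is a rough sketch of how the computation is usually described: find the local extrema, draw upper and lower envelopes, subtract their mean, repeat (the "sifting"), peel off the resulting mode, and continue on what is left. The ensemble version repeats this on many noise-added copies and averages the modes. Two simplifications are assumptions made here to keep it short and stdlib-only: the envelopes are piecewise linear (published EMD uses cubic splines), and the sifting runs a fixed number of times rather than using a stopping criterion. It is not the code behind the figures in the head post.

```python
import math
import random

def local_extrema(x):
    """Indices of interior local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(knots, x):
    """Piecewise-linear curve through x at the knots, pinned to both end points.
    ASSUMPTION: linear interpolation stands in for the usual cubic spline."""
    pts = [0] + knots + [len(x) - 1]
    env = [0.0] * len(x)
    for a, b in zip(pts, pts[1:]):
        for i in range(a, b + 1):
            w = (i - a) / (b - a)
            env[i] = (1 - w) * x[a] + w * x[b]
    return env

def sift(x, n_sift=10):
    """Repeatedly subtract the mean of the upper and lower envelopes."""
    h = list(x)
    for _ in range(n_sift):  # ASSUMPTION: fixed count instead of a stopping test
        maxima, minima = local_extrema(h)
        if len(maxima) + len(minima) < 3:
            break
        upper, lower = envelope(maxima, h), envelope(minima, h)
        h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
    return h

def emd(x, max_imfs=6):
    """Peel off one mode at a time; returns (modes, residual)."""
    residual, imfs = list(x), []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) + len(minima) < 3:
            break
        imf = sift(residual)
        imfs.append(imf)
        residual = [r - m for r, m in zip(residual, imf)]
    return imfs, residual

def eemd(x, noise_sd, n_ensemble=20, max_imfs=6, seed=0):
    """Decompose many noise-added copies and average mode by mode.
    Returns max_imfs + 1 rows; the last row is the averaged residual."""
    rng = random.Random(seed)
    rows = [[0.0] * len(x) for _ in range(max_imfs + 1)]
    for _ in range(n_ensemble):
        noisy = [v + rng.gauss(0.0, noise_sd) for v in x]
        imfs, residual = emd(noisy, max_imfs)
        for k, mode in enumerate(imfs):
            rows[k] = [s + v for s, v in zip(rows[k], mode)]
        rows[-1] = [s + v for s, v in zip(rows[-1], residual)]
    return [[s / n_ensemble for s in row] for row in rows]

# Toy input: a fast sine riding on a slow one (illustrative values).
n = 200
signal = [math.sin(2 * math.pi * t / 10) + 0.5 * math.sin(2 * math.pi * t / 50)
          for t in range(n)]
imfs, residual = emd(signal)
rebuilt = [sum(m[t] for m in imfs) + residual[t] for t in range(n)]
averaged = eemd(signal, noise_sd=0.2, n_ensemble=5)
```

By construction the modes plus the residual add back up to the input, which gives a simple sanity check when running a menagerie of test signals through it.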

Reply to  Willis Eschenbach
December 13, 2015 11:12 am

Willis –
Much thanks. Even as I admire your initiative and curiosity I also admire your patience and energy, quantities I myself find in decreasing supply in my dotage!
So you show me that the EEMD is really TWO things: the detection and removal of patterns, in turn; and the addition of noise.
In a noise-free signal, the pattern detection is essentially just what the human brain does without being specifically instructed. It is the basic approach of the reduced math (just “eyeball” it) Fourier Series articles of the 1950’s era Popular Electronics Magazine. We easily extend this, even to impromptu basis functions, and welcome the mathematical aids as encountered. So EEMD looks like a method of teaching a robot to do our own pattern recognitions. Fair enough.
The addition of noise is not so clearly warranted, except as one compares it to classic stochastic resonance. In the case of a noise-free signal, we can contemplate adding intentionally generated random noise – for some purpose. In the case of an already noisy signal, why add more? Indeed! You may already have enough, or more than enough. For example, “adding dither” may be just a matter of not trying so hard to reduce noise coming in.
With stochastic resonance however, it is CLEAR that the addition of noise HELPS. Consider the crayfish in the stream not wishing to encounter a bass who is looking for lunch. In a very quiet stream, the stealthy swishing tail of the bass may be insufficient to trigger the crayfish’s sensors. Now, the crayfish does not add noise, but the stream does: the turbulent “babbling” as water randomly encounters rocks. Enough that the peaks of the swish are now a thump, thump, thump. (My figure at December 11, 2015 at 3:39 pm) Essential here is the non-linearity (threshold). Once again (as with sonar), nature got there first.
So while the noise-free case of eyeballing components is perhaps efficient (the computation cost of one-time processes doesn’t really matter), it is not clear to me why and how noise helps, unless it is similar to stochastic resonance. Perhaps.
As for computer code, I did find some Matlab functions which I need to study and put together.
Thanks again Willis.
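The crayfish example can be put into a few lines as a threshold detector. This is only a sketch of classic stochastic resonance, with made-up numbers: a "swish" of amplitude 0.5 that never reaches a threshold of 1.0 on its own, plus Gaussian noise of adjustable strength standing in for the babbling stream. It says nothing about how EEMD itself uses noise.

```python
import math
import random

def detector_firings(noise_sd, amplitude=0.5, threshold=1.0,
                     period=50, n=5000, seed=1):
    """Count threshold crossings, split by the sign of the hidden swish.
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    in_phase = out_of_phase = 0
    for t in range(n):
        swish = amplitude * math.sin(2 * math.pi * t / period)
        if swish + rng.gauss(0.0, noise_sd) > threshold:
            if swish > 0:
                in_phase += 1      # fired while the swish was pushing upward
            else:
                out_of_phase += 1  # fired on noise alone
    return in_phase, out_of_phase

quiet = detector_firings(0.0)     # silent stream: the swish alone never triggers
babbling = detector_firings(0.3)  # moderate noise: firings cluster on the peaks
torrent = detector_firings(3.0)   # heavy noise: firings happen at any time

for name, (hit, miss) in [("quiet", quiet), ("babbling", babbling), ("torrent", torrent)]:
    print(name, "in phase:", hit, "out of phase:", miss)
```

The non-linearity is the whole point: without the threshold, adding noise would only ever make the swish harder to see.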

Gunga Din
December 12, 2015 10:59 am

The Amazon doesn’t pass through a flow meter. River and stream flows are estimates.
A common method the USGS uses is to set up a gauge to measure stream depth. The surface area of a cross section of the stream at the site is ascertained. The velocity of the stream is measured. From that the flow is estimated and tables produced giving so much flow for so much depth. Periodically that area of the cross section and the velocity are checked and the tables adjusted.
Methods may have changed over the years. I don’t know how they measure the flow of the Amazon, but it all does remind me of surface stations and temperature.
How accurately do the numbers reflect reality? The numbers, such as they are, may be the best we have but adjusting an estimate using another estimate might give you more decimal points but no more accuracy.
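The gauge-and-table procedure described above can be sketched as follows. The gaugings are invented numbers for a hypothetical small stream, the function names are made up for illustration, and the linear interpolation between gauged depths is an assumption; it is not a description of how any agency actually builds its tables.

```python
def discharge(area_m2, velocity_m_s):
    """Velocity-area estimate: flow = cross-section area x mean velocity."""
    return area_m2 * velocity_m_s

def build_rating_table(gaugings):
    """gaugings: (depth_m, area_m2, velocity_m_s) -> sorted (depth_m, flow_m3_s)."""
    return sorted((depth, discharge(area, vel)) for depth, area, vel in gaugings)

def flow_from_depth(table, depth_m):
    """Read a flow off the table by linear interpolation between gauged depths."""
    for (d0, q0), (d1, q1) in zip(table, table[1:]):
        if d0 <= depth_m <= d1:
            return q0 + (q1 - q0) * (depth_m - d0) / (d1 - d0)
    raise ValueError("depth outside the gauged range; refusing to extrapolate")

# Hypothetical field measurements: depth, cross-section area, mean velocity.
gaugings = [(1.0, 10.0, 0.5), (2.0, 24.0, 0.75), (3.0, 40.0, 1.0)]
table = build_rating_table(gaugings)
print(table)
print(flow_from_depth(table, 1.5))
```

Every later "flow" reading is then a depth reading passed through the table, so any error in the area or velocity measurements is carried into the whole record until the table is next adjusted.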

December 14, 2015 12:13 am

After more digging I will accept that this is a neat example of counter intuition. I still have uncertainties about identification of types of data when noise should be added, but that is me.
It is a bad day when one does not learn something new.

Frederik Michiels
December 14, 2015 8:53 am

For some reason it more feels like what they call in digital audio as the process of “dithering”, just in reverse mode.
Dithering adds and subtracts noise to counter known distortions when you convert the bit depth of an audio signal. It adds general background noise but reduces the sharper, more audible distortion caused by the bit reduction.
The difference here is that it uses “known” outcomes of distorted patterns in the sound and fills them in with the interpolated values of the reduced grid.
Why do I say reverse? Well, here the signals are not known. It is as if the “first step of dithering (= applying noise)” has already been done, but this first step is unknown. Some cycles are known, so you can then “guess dither” the second step with the Amazon values for any known cycle. However, as Fourier analysis of the raw data shows, that may just be a small, nearly undetectable signal that may add to the reduction towards a straight line.
EEMD is the second step of dithering but with the first step unknown, so it can magnify this signal to a level that’s blown out of proportion. This is because the added “noise” in your “audio” (here the values of the Amazon River) that is being decomposed is unknown, and all the sine waves that compose that “noise” are unknown. Therefore there may be a catch in this that can put scientists/mathematicians on the wrong foot, even if the whole methodology is scientifically or mathematically correct.
So in short, with EEMD you can find any cycle that can link. However, as the interaction is unknown and the weather-related noise follows an unknown pattern, working backwards this way can be misleading. The best approach is to just apply a Fourier analysis to the raw data and then draw conclusions.
It’s a bit like “torturing the data till it shows what you are looking for”, but in a scientific way: nothing is wrong with the methods used, but it can give false (mainly exaggerated) correlations.
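For readers unfamiliar with the audio analogy used in this comment, here is a minimal sketch of ordinary (forward) dithering. A tone smaller than half a quantization step vanishes completely when rounded to whole steps; adding uniform noise one step wide before rounding lets the tone survive in the average of many passes. The tone level, the number of passes and the seed are illustrative assumptions, and real audio dither is applied once per sample rather than averaged over repeats.

```python
import math
import random

def average_quantized(signal, dither_width, trials, seed=2):
    """Round signal + uniform dither to whole steps, averaged over many passes."""
    rng = random.Random(seed)
    totals = [0.0] * len(signal)
    for _ in range(trials):
        for i, s in enumerate(signal):
            d = rng.uniform(-dither_width / 2, dither_width / 2)
            totals[i] += round(s + d)
    return [t / trials for t in totals]

# A tone of 0.3 steps: too small to reach the next quantization level by itself.
tone = [0.3 * math.sin(2 * math.pi * i / 40) for i in range(40)]

undithered = average_quantized(tone, dither_width=0.0, trials=1)
dithered = average_quantized(tone, dither_width=1.0, trials=2000)
worst_error = max(abs(a - b) for a, b in zip(dithered, tone))
print("undithered output is all zeros:", all(v == 0 for v in undithered))
print("worst error after dither and averaging:", round(worst_error, 3))
```

The averaging over many noisy passes is loosely similar in spirit to the ensemble step in EEMD, but the sketch is about quantization only and does not test the "in reverse" argument made above.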

Reply to  Frederik Michiels
December 14, 2015 10:30 am

Frederik –
I, like you, would like to understand EEMD based on ideas from digital audio like dither (and stochastic resonance). I have been unable so far to complete the connections!
One thing that I think helps is to turn the problem around. You are not adding noise to a perfectly good signal, but adding a signal to perfectly good noise.
Another thing is to recognize that the digital audio “art” is not so much science as it is (fiendishly clever) ENGINEERING aimed at a practical product. Essential here are the ideas of “over-sampling” and “noise-shaping” to manage noise. Over-sampling drastically increases the sampling rate during playback far far above the audio needs. This is by temporarily generating extra samples (locally on the fly by interpolation), using them, and discarding them. The “quantization noise” is then shaped into the high frequency range thus opened up (inaudible). In addition, the number of bits can thereby be reduced (resolution below LSB), even to just one bit! Deterministic, but the various waveforms look a lot like dither had been used.
Comparing something new to something you already understand is of immense value.

Reply to  Bernie Hutchins
December 14, 2015 5:38 pm

Thanks Bernie, you added nicely to what I tried to explain when I said “in reverse”.
Actually I am a digital sound creator, and from any random noise of nature I can make beats and sounds that sound melodic.
To achieve this, I filter random noises such as thunder and other sounds down to a specific range so that the result sounds like a beat or an instrument. In other words, I filter out the noise until what is left over is a specific set of harmonics, by “filtering out the interfering harmonics that create noise”.
This is what I would call applying EEMD to perfect noise: take out the blur that masks the sound and intonation you hear in, for example, thunder, to create a more perfect sound by eliminating the other frequencies.
So I do this daily, but just by ear, to get a good-sounding beat without too much interfering random noise.
This principle reminds me very strongly of what I do. The problem is that focusing on nonexistent sine waves in natural events will give you one anyway. The Amazon River data will show an 11-year cycle, or more accurately, “an 11-year sine wave with varying intensity.”
I believe they hurried to their conclusions. However, in genuine noise it is not good to average a “wave varying in intensity that is inherent to the noise”.
That effect explains why Willis can’t find an 11-year cycle when he splits the data in half. Here, instead of cycles per second as in audio, we are talking about cycles of 11 years and more.
So this means that in 40 years the Fourier analysis may suddenly show that the 11-year cycle has been canceled out or has increased in strength. Both can happen.
I often use Fourier analysis in my sound work to find the “base wave”, so now I’m in for an analogy. When you take the raw Amazon data, we have an exception: seasonal variability that gives a clear signal. In the same way, I would not be surprised to also see a signal for other events in our solar system, which can either intensify if well aligned or cancel each other out if aligned in opposition. This will create “noise”: even in the “clear signal” you see “noise”, as not every wave has the same amplitude.
So yes, with enough data and very precise data you will see our “celestial harmonics” in the Fourier analysis (in fact all of them). The question then becomes: “If this signal has a range of 6%, is it significant? Or is the signal of the seasonal variability so strong that it can easily and repeatedly cancel out the 11-year cycle?”
When I look at the split data Willis provided, I suspect the second question will be answered with “yes”, which would then make the influence insignificant.
If this wave then changes amplitude like the basic wave, varying in a chaotic pattern, it looks more like an artifact of the “noise of the seasonal cycle” rather than a real cycle.
I hope I made some sense, as English is not my native language… if anything is unclear, just ask.

Reply to  Frederik Michiels
December 14, 2015 7:16 pm

“…for some reason it {EEMD} more feels like what they call in digital audio as the process of “dithering”. just in reverse mode.”
Nice example and exposition.

Reply to  Willis Eschenbach
December 14, 2015 10:43 am

Willis –
Right – BUT. Recall that Principal Components was a well-established method at the point where Mann invented a new way of normalizing the data which brought out a hockey stick. Who knows for sure, but don’t we all at times suppose that we have “finally done something right” when we see what we were expecting, if not outright hoping for? There seem to be a fair number of users who compare EEMD to wavelets and prefer wavelets. Caution per se has merit.

Reply to  Bernie Hutchins
December 14, 2015 1:23 pm

Willis my friend – you said December 14, 2015 at 11:17 am, in part:
“I say again … what is your point here?”
Three points.
POINT 1: First, you said in the top post:
“Finally, contrary to the authors of the paper, I would hold that the great disparity between all of the intrinsic modes of the Amazon flow data and of the sunspot data, especially mode C3 (Fig. 7), strongly suggests that there is no significant relationship between them.”
Now, if I understand you correctly (here and in the past) you don’t see evidence of any 11 year cycles, specifically not in the case of river flow rate. So a tool that shows such a mode is in some way flawed or being misused? The fact that the same tool is properly used elsewhere by others is not relevant to the Amazon River. Thus if it found an artifact, it is misleading us all. It’s like claiming that a FFT could find a linear trend, and pointing out that the FFT is highly regarded as useful.
POINT 2: (peripherally related) you said:
“….He made a stupid mistake and didn’t notice it because it fit his preconceptions …”
while I said:
“….we all at times suppose that we have ‘finally done something right’ when we see what we were expecting, if not outright hoping for?….”
These are much the same – the same human foible.
POINT 3 – Certainly MY failing, but I still do not have much idea how or why noise is used in EEMD (unless it is a stochastic resonance means of detecting weaker modes) and I have not seen any basic “tutorial” on EEMD (one good ppt never gets to noise) that demonstrates the procedure and puts it through its paces, comparing to FFT, wavelets, etc.
Thanks for your time.

Reply to  Willis Eschenbach
December 14, 2015 8:00 pm

I will quote you on this:

First, let me say that I would never have guessed that white noise could function as a bank of bandpass filters that automatically group related components of a signal into a small number of intrinsic modes. To me that is a mathematically elegant discovery, and one I’ll have to think about. Unintuitive as it may seem, noise aided data analysis is indeed a reality.
This method of signal decomposition has some big advantages. One is that the signal divides into intrinsic modes, which group together similar underlying wave forms. Another is that as the name suggests, the division is empirical in that it is decided by the data itself, without requiring the investigator to make subjective judgements.

When you add noise to a noisy signal and then decompose it into groups of related components, you are in a dangerous zone. So I say that, though the method is entirely correct in itself, it does not work entirely correctly on noisy data.
The Amazon River data is noise, but it is not “white noise only”, so as with sound it will resonate with some parts of the white noise and not with others. So when the white noise is used as a bandpass filter it will indeed divide the signal into intrinsic modes. It will do that with every signal you apply it to, so yes, some “harmonics” will pass through.
I think you missed the “dithering in reverse” point I made, as that was the point I was trying to make.
The point with “torturing the data” here is thus a “reverse one”, the point being: “With EEMD you will find an 11-year cycle in all river patterns; even more, you will find it with this method in every aspect of our planet’s weather behavior. I do not even need proof of that; it’s obvious that this influence is there and that it is measurable,
I think you missed the analogy with sound I made in that regard: in sound, each IMF would be a harmonic component of the result of band-passing white noise, a sound with flutter in it, consistent with the variable frequency of EMD processing. Each IMF would then be seen in the spectral analysis of that resulting sound (sound works additively, which is why the analogy is maybe not entirely empirically correct).
but does it also prove a huge impact, or is that impact too small on these scales to make a difference?” So if yes, for which parts would it be significant, and for which parts would it be too small to have an impact?
