Noise Assisted Data Analysis

Guest Post by Willis Eschenbach

Once again, Dr. Curry’s “Week in Review-Science and Technology” doesn’t disappoint. I find the following:

Evidence of a decadal solar signal in the Amazon River: 1903 to 2013 [link]  by Antico and Torres

So I go to the link, and I find the abstract:

Abstract

It has been shown that tropical climates can be notably influenced by the decadal solar cycle; however, the relationship between this solar forcing and the tropical Amazon River has been overlooked in previous research. In this study, we reveal evidence of such a link by analyzing a 1903-2013 record of Amazon discharge. We identify a decadal flow cycle that is anticorrelated with the solar activity measured by the decadal sunspot cycle. This relationship persists through time and appears to result from a solar influence on the tropical Atlantic Ocean. The amplitude of the decadal solar signal in flow is apparently modulated by the interdecadal North Atlantic variability. Because Amazonia is an important element of the planetary water cycle, our findings have implications for studies on global change.

The study is paywalled, but to their credit they’ve archived the data here as an Excel workbook. Let me start where I usually start, by looking at all of the raw data, warts and all.

Amazon River Flow Data 1902 2014

Figure 1. Monthly average Amazon river flow (thousands of cubic metres per second). The violet-colored sections are not observations. Instead, they are estimates based on the river levels at two locations on the Amazon.

Now to me, that’s a big problem right there. One violet section is based on river levels at one location, and the other violet section is based on river levels from another location. It’s clear from the annual average (red/black line) that the variances of those two river-level datasets are very different. One river-level dataset has big swings, the other has small swings … not good. So first I’d say that any results from such a spliced dataset need to be taken, as the old Romans said, “cum grano salis” …

Setting that question of spliced data aside, I next looked at the periodogram of the data. This shows the strength of the signal at various periods. If the ~11-year solar cycle is affecting the river flow, it will show a peak in the 11-year range.

Figure 2. Periodogram of the monthly Amazon river flow data shown in Figure 1.

It appears at first blush as if there is a very small 11-year signal in the full data (black), about 6% of the total range of the overall data swing. But when we split the data into the first half and the last half (red and blue), the 11-year signal disappears. This is not at all uncommon in observational datasets. Apparent cycles are often just the result of the analysis method averaging a changing signal.
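To see how a “cycle” can show up in a full record yet vanish in its halves, here’s a minimal sketch with a synthetic series whose period changes partway through (the periods and lengths are chosen purely for illustration, not taken from the Amazon data):

```python
import numpy as np

def periodogram(x):
    """Simple FFT periodogram: spectral power at each Fourier frequency."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x))
    return freqs[1:], power[1:]          # drop the zero-frequency bin

def peak_period(x):
    """Period (in samples) carrying the most spectral power."""
    freqs, power = periodogram(x)
    return 1.0 / freqs[np.argmax(power)]

# 55 "years" of a ~9-year cycle followed by 55 years of a ~14-year
# cycle, in monthly samples. Each half peaks at its own period, while
# the full record smears the power across the whole band in between.
t = np.arange(660)
first = np.sin(2 * np.pi * t / (9 * 12))
second = np.sin(2 * np.pi * t / (14 * 12))
full = np.concatenate([first, second])
```

Run the periodogram on `first`, `second`, and `full` and you see exactly the effect described above: the full record shows an apparent middling cycle that neither half actually contains.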

Next, in Antico2015, the authors use the annual average data. To me, this is a poor choice. If you wish to remove the annual fluctuations, that’s fine … but using annual average data cuts your number of data points by a factor of 12. And this can lead to spurious results by inflating the apparent significance. But let us set that aside as well.
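The inflation from annual averaging is easy to demonstrate with two completely unrelated white-noise series (synthetic data, purely for illustration): chance correlations between the annual means run about sqrt(12) times larger than chance correlations between the monthly values.

```python
import numpy as np

monthly_r, annual_r = [], []
for seed in range(200):
    rng = np.random.default_rng(seed)
    # two independent 111-year monthly series with no real relationship
    a = rng.normal(size=111 * 12)
    b = rng.normal(size=111 * 12)
    monthly_r.append(abs(np.corrcoef(a, b)[0, 1]))
    # the same two series reduced to annual means
    a_ann = a.reshape(-1, 12).mean(axis=1)
    b_ann = b.reshape(-1, 12).mean(axis=1)
    annual_r.append(abs(np.corrcoef(a_ann, b_ann)[0, 1]))
```

Typical |r| for unrelated series scales like 1/sqrt(N), so cutting N by a factor of 12 inflates the spurious correlations by about sqrt(12) ≈ 3.5 … which is why the significance tests have to account for the reduced sample size.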

Finally, there is no statistically significant correlation between sunspots and Amazon river flow levels at any lag (max. monthly correlation ~ 0.1, p-value = 0.3 …).
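For the record, a lagged correlation of this kind can be computed as below. This is a sketch using synthetic white noise, not the actual flow and sunspot series; with unrelated series, every lag gives a correlation near zero.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t - lag]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[lag:], y[:-lag]
    elif lag < 0:
        x, y = x[:lag], y[-lag:]
    return np.corrcoef(x, y)[0, 1]

# two independent stand-in "monthly" series
rng = np.random.default_rng(0)
flow, spots = rng.normal(size=1000), rng.normal(size=1000)
corrs = [lagged_corr(flow, spots, lag) for lag in range(-24, 25)]
```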

Having seen that, my next step was to see how the authors of Antico2015 decided that there was a solar signal in the Amazon. And this was a most fascinating voyage. The best thing about climate science is that there is no end of opportunities to learn. In this case, I learned from the Supplemental Online Information that they were using a method I’d never heard of, ensemble empirical mode decomposition, or EEMD. It’s one of many methods for decomposing a signal into the sum of other signals. Fourier analysis is the best-known type of signal decomposition, and I’ve written before about the “periodicity” decomposition of Sethares, but there are other methods.

The details of EEMD are laid out by its developers in a paper called “Ensemble Empirical Mode Decomposition: A Noise Assisted Data Analysis Method” (hereinafter “EEMD2005”) … how could a data junkie like myself not like something called “noise assisted data analysis”?

The concept itself is quite simple. First, you identify local maxima and minima. See e.g. Figure 3 Panel b below, from the EEMD2005 paper, that shows the local maxima.

EEMD Process

Figure 3. Graphic explaining the EEMD process, from the EEMD2005 paper. ORIGINAL CAPTION: The very first sifting process. Panel a is the input; panel b identifies local maxima (red dots); panel c plots the upper envelope (red) and low envelope (blue) and their mean (black); and panel d is the difference between the input and the mean of the envelopes.

Then after you identify local maxima (panel b) and local minima (not shown), you draw two splines, one through the local maxima and the other through the local minima of the dataset (red and blue lines, panel c). The first component C1 is the difference between the data and the local mean of the two splines (panel d).

Then you subtract C1 from the data and repeat the process on the residual—you draw two splines, one through its local maxima and the other through its local minima. The second component C2 is again the difference between the residual and the local mean of those two splines. (In practice each component is “sifted” this way several times before moving on, which is where the sifting iterations mentioned in the figure captions below come in.)

Repeat that until the residual has too few local maxima and minima to continue, leaving you with an essentially straight trend line.
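As a sketch of that first sifting step in code (using straight-line interpolation for the envelopes instead of the cubic splines the method actually calls for, and simply pinning the envelopes to the endpoints—one of several possible end conditions):

```python
import numpy as np

def sift_once(x):
    """One sifting step: subtract the mean of the upper and lower
    extrema envelopes from the data. Linear interpolation stands in
    for the cubic splines of the real method."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(n)
    maxima = [i for i in range(1, n - 1) if x[i] > x[i - 1] and x[i] >= x[i + 1]]
    minima = [i for i in range(1, n - 1) if x[i] < x[i - 1] and x[i] <= x[i + 1]]
    # envelopes through the extrema, pinned to the endpoints
    up = np.interp(t, [0] + maxima + [n - 1], x[[0] + maxima + [n - 1]])
    lo = np.interp(t, [0] + minima + [n - 1], x[[0] + minima + [n - 1]])
    mean_env = (up + lo) / 2
    return x - mean_env, mean_env       # (candidate component, local mean)

# A fast oscillation riding on a slow one: the local mean of the
# envelopes recovers the slow part, and the difference the fast part.
t = np.arange(300)
fast = np.sin(2 * np.pi * t / 8)
slow = np.sin(2 * np.pi * t / 100)
c1, m1 = sift_once(fast + slow)
```

The point of the splines is exactly what the toy version shows: the envelope mean tracks the slower underlying variation, and subtracting it isolates the fastest oscillation present.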

How do you aid that with noise? Well, you repeat it a couple thousand times using the original data plus white noise, and you average the results. According to the paper, this acts as a bank of bandpass filters, and prevents the mixing of very different frequencies in any one component of the decomposition. What do I know, I was born yesterday … read the paper for the math and the full explanation.
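Here is a minimal sketch of the whole noise-assisted loop. This is a toy version only—linear-envelope sifting, a small ensemble, and none of the refinements of the actual implementations (the R packages mentioned at the end of the post)—but it shows the structure: decompose (data + fresh white noise) many times, then average the modes.

```python
import numpy as np

def emd(x, n_modes=4, n_sift=10):
    """Toy EMD: extract intrinsic modes by repeated sifting, with
    linear-interpolation envelopes pinned to the endpoints."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(n)
    modes, resid = [], x.copy()
    for _ in range(n_modes):
        h = resid.copy()
        for _ in range(n_sift):
            mx = [i for i in range(1, n - 1) if h[i] > h[i - 1] and h[i] >= h[i + 1]]
            mn = [i for i in range(1, n - 1) if h[i] < h[i - 1] and h[i] <= h[i + 1]]
            if len(mx) < 2 or len(mn) < 2:   # nothing left to sift
                break
            up = np.interp(t, [0] + mx + [n - 1], h[[0] + mx + [n - 1]])
            lo = np.interp(t, [0] + mn + [n - 1], h[[0] + mn + [n - 1]])
            h -= (up + lo) / 2
        modes.append(h)
        resid = resid - h
    return modes, resid

def eemd(x, n_ens=50, noise_amp=0.6, n_modes=4, seed=0):
    """EEMD: decompose (data + white noise) n_ens times and average.
    The paper used an ensemble of 2000 and noise of 0.6 standard
    deviations; a much smaller ensemble is used here for speed."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    sd = noise_amp * x.std()
    mode_sum = np.zeros((n_modes, len(x)))
    resid_sum = np.zeros(len(x))
    for _ in range(n_ens):
        modes, resid = emd(x + rng.normal(0, sd, len(x)), n_modes)
        mode_sum += np.array(modes)
        resid_sum += resid
    return mode_sum / n_ens, resid_sum / n_ens

# demo: a fast and a slow sine
tt = np.arange(600)
data = np.sin(2 * np.pi * tt / 16) + np.sin(2 * np.pi * tt / 128)
avg_modes, avg_resid = eemd(data)
```

Because each ensemble member decomposes exactly (its modes plus residual reconstruct the noisy input), the averaged modes plus residual reconstruct the original data up to the averaged noise, which shrinks as the ensemble grows.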

In any case, when they use EEMD to decompose the Amazon flow data, here’s what they get. Each panel shows the resulting curve from each step in the decomposition.

Figure 4. This shows Figure S1 from the Supplementary Online Information of the Antico paper. ORIGINAL CAPTION: (Left) Annual mean (October-September) Amazon flow record at Obidos station, its oscillatory EEMD modes (C1-6), and its residual trend. (Right) Raw periodograms of flow modes. In these power spectra, the frequency band of the decadal sunspot cycle, at 1/13 to 1/9 cycles per year, is depicted by the shaded region, and the oscillatory period of the most prominent spectral peak of C3 is given in years. In the left panels, the fraction of total variance accounted by each mode is shown in parentheses. For a particular mode, this fraction is the square of the Pearson correlation coefficient between the mode and the raw data record. The sum of these fractions may be greater than 100% because EEMD is a nonlinear decomposition of data; therefore, the EEMD modes are not necessarily linearly independent. To obtain the EEMD decomposition of the annual mean flow record, we considered an ensemble number of 2000, a noise amplitude of 0.6 standard deviations of the original signal, and 50 sifting iterations.

This is a curious kind of decomposition. Because of the use of the white noise, each panel in the left column shows a curve that contains a group of adjacent frequencies, as shown in the right column. No panel shows a pure single-frequency curve, and there is significant overlap between the groups. And as a result of each panel containing a mix of frequencies and amplitudes, each curve varies in both amplitude and frequency over time. This can be seen in the breadth of the spectral density plots on the right.

For the next obvious step, I used their data and variables, and I repeated their analysis.

Figure 5. My EEMD analysis of the Amazon river flow. Like the paper, I used an ensemble number of 2000, a noise amplitude of 0.6 standard deviations of the original signal, and 50 sifting iterations maximum.

I note that while my results are quite similar to theirs, they are not identical. The intrinsic modes C1 and C2 are apparently identical, but the results begin to diverge starting with C3. The difference may be due to pre-processing that they have not detailed in their methods; however, I tried prefiltering with a Hanning filter, and it’s not that. Alternatively, it may have to do with how they handle the creation of the splines at the endpoints of the data; however, I tried end conditions of “none”, “wave”, “symmetric”, “periodic”, and “evenodd”, and it’s none of those. I then tried an alternative implementation of the EEMD algorithm, with results quite similar to the first implementation. Finally, I tried the CEEMD (complete ensemble empirical mode decomposition) method, which was nearly identical to my analysis shown above in Figure 5.

I also could not replicate their results regarding the periodograms that they show in their Figure S1 (shown in Figure 4 above), although again I was close. Here are my results:

Figure 6. Periodograms of the six flow modes, Amazon River data.

This makes it clear how the modes C1 to C6 each contain a variety of frequencies, and how they overlap with each other. However, I do not see a strong signal in the 9-13 year range in the intrinsic mode C3 as the authors found. Instead, the signals in that range are split between modes C2 and C3.

Now, their claim is that because mode C3 of the intrinsic modes of the Amazon River flow contains a peak at around 11 years (see Figure 4 above), it must be related to the sunspot cycle … while I find this method of decomposing a signal to be quite interesting, I don’t think it can be used in that manner. Instead, what I think is necessary is to compare the actual intrinsic modes of the Amazon flow with the intrinsic modes of the sunspots. This is the method used in EEMD2005. Here are the modes C3 of the Amazon flow and of the sunspots:

Figure 7. Raw data and intrinsic empirical mode C3 for the Amazon (top two panels) and for the sunspots (bottom two panels).

Now, it is true that intrinsic modes C3 of both the sunspot and the Amazon data contain a signal at around the general sunspot frequency. But other than that, the two C3 modes are quite dissimilar. Note for example that the sunspot mode C3 is phase-locked to the raw data. And in addition, the sunspot C3 amplitude is related to the amplitude of the raw sunspot data.

But to the contrary, the Amazon mode C3 goes into and out of sync with the sunspots. And in addition, the amplitude of the Amazon mode C3 has nothing to do with the amplitude of either the sunspot data or the sunspot C3 mode.

This method, of directly comparing the relevant intrinsic modes, is the method used in the original EEMD2005 paper linked to above. See for example their Figure 9 showing the synchronicity of the intrinsic modes C3 – C7 of the Southern Oscillation Index (SOI) and the El Nino Cold Tongue Index (CTI).

I find this to be a fascinating way to decompose a signal. It is even more interesting when all of the intrinsic modes are plotted to the same scale. Here are the sunspot intrinsic modes to the same scale.

Figure 8. EEMD analysis of the annual mean sunspot numbers. All panels are printed to the same scale.

Note that the overwhelming majority of the information is in the first three intrinsic modes. Beyond that, they are nearly flat. This is borne out by showing the periodograms to the same scale:

Figure 9. Periodograms of the EEMD analysis of the annual mean sunspot numbers. All panels are printed to the same (arbitrary) scale.

Now, this shows something fascinating. The EEMD analysis of the sunspots has two very closely related intrinsic modes. Mode C2 shows a peak at ten or eleven years, plus some small strength at shorter periods. Mode C3 shows a smaller peak at the same location, ten or eleven years, and an even smaller peak at sixteen years. This is interesting because not all of the strength of the ~ eleven-year sunspot signal falls into one intrinsic mode. Instead it is spread out between mode C2 and mode C3.

DISCUSSION: First, let me say that I would never have guessed that white noise could function as a bank of bandpass filters that automatically group related components of a signal into a small number of intrinsic modes. To me that is a mathematically elegant discovery, and one I’ll have to think about. Unintuitive as it may seem, noise-assisted data analysis is indeed a reality.

This method of signal decomposition has some big advantages. One is that the signal divides into intrinsic modes, which group together similar underlying wave forms. Another is that as the name suggests, the division is empirical in that it is decided by the data itself, without requiring the investigator to make subjective judgements.

What is most interesting to me is the demonstration by the authors of EEMD2005 that EEMD can be used to solidly establish a connection between two phenomena such as the Southern Oscillation Index (SOI) and the El Nino Cold Tongue Index (CTI). For example, the authors note:

The high correlations on interannual and short interdecadal timescales between IMFs [intrinsic mode functions] of SOI and CTI, especially in the latter half of the record, are consistent with the physical explanations provided by recent studies. These IMFs are statistically significant at 95% confidence level based on a testing method proposed in Wu and Huang (2004, 2005) against the white noise null hypothesis. The two inter-annual modes (C4 and C5) are also statistically significant at 95% confidence level against the traditional red noise null hypothesis.

Indeed, Jin et al. (personal communications, their manuscript being under preparation) has solved a nonlinear coupled atmosphere-ocean system and showed analytically that the interannual variability of ENSO has two separate modes with periods in agreement with the results obtained here. Concerning the coupled short interdecadal modes, they are also in good agreement with a recent modeling study by Yeh and Kirtman (2004), which demonstrated that such modes can be a result of a coupled system in response to stochastic forcing. Therefore, the EEMD method does provide a more accurate tool to isolate signals with specific time scales in observational data produced by different underlying physics. SOURCE:EEMD2005 p. 20

Now of course, the question we are all left with at the end of the day is, to what extent do these empirical intrinsic modes actually represent physical reality, and to what extent are they merely a way to mathematically confirm or falsify the connections between two datasets at a variety of timescales? I fear I have no general answer to that question.

Finally, contrary to the authors of the paper, I would hold that the great disparity between all of the intrinsic modes of the Amazon flow data and of the sunspot data, especially mode C3 (Fig. 7), strongly suggests that there is no significant relationship between them.

Always more to learn … I have to think about this noise assisted data analysis lark some more …

w.

My Usual Request: If you disagree with me or anyone, please quote the exact words you disagree with. I can defend my own words. I cannot defend someone’s interpretation of my words.

My New Request: If you think that e.g. I’m using the wrong method on the wrong dataset, please educate me and others by demonstrating the proper use of the right method on the right dataset. Simply claiming I’m wrong doesn’t advance the discussion.

Data: Available as an Excel workbook from the original article.

Code: Well, it’s the usual ugly mish-mash of user-aggressive code, but it’s here …  I used two EEMD implementations, from the packages “hht” and “Rlibeemd”. If you have questions about the code, ask …


190 thoughts on “Noise Assisted Data Analysis”

  1. Off the cuff and without working any examples: maybe the addition of white noise assists with problems of the type where the subtraction of two large numbers yields the very small difference that is the sought effect. Add noise again and again; then each subtraction of two noisier numbers than the originals gives a range of noisy difference numbers, which can be given a statistical analysis purporting to show confidence estimates and a best estimate of the mean or median or whatever. Sorry, this is an abstract comment, and one should not be too much led by the ‘feel’ of data and rush it into print.
    It is like satisfaction of entropy conditions. No amount of added white noise should modify an outcome. Surely the best outcome is there in the raw data, insensitive to how much noise is added.

    • Geoff Sherrington December 10, 2015 at 8:09 pm

      No amount of added white noise should modify an outcome. Surely the best outcome is there in the raw data, insensitive to how much noise is added.

      Thanks, Geoff. You’d think that the raw data would be best, and that’s what I’d thought … but it seems that it is demonstrably not the case.
      Go figure …
      w.

      • Photographers used to use an unsharp mask to sharpen images and to restore movies. Old Westerns were probably restored this way.
        I found a similar technique online years ago that I use to bring up murky images and to process faded text using Photoshop.
        I convert the image to RGB color or to greyscale and create a duplicate image. I then add gaussian blur (noise) to the original. Next I modify the duplicate by “hard light” or “vivid light” and then filter using Hi-pass. Finally I adjust the opacity of the duplicate layer and merge the two layers. The result is a non-destructive sharpening superior to an unsharp mask.
        After using this technique I have been able to use OCR on documents that seemed to require retyping.
        The key to sharpening is the gaussian blur which first makes the image unsharp, an approach that seems to me to have something in common with EEMD.
        If after submerging your data in noise, you remove the noise, what you will be left with is the signal. And if there are two signals both should be recoverable. The noise in the original data will be washed away along with the noise you added.

      • Addition of noise helps when you are applying a non-linear – an extremely non-linear – function to a sample set.
        For example when digitising sound, noise is added so that sub-bit data doesn’t vanish, but becomes statistically represented by an average over time.
        That is, if you have a meter that consists of, say, a red light when the voltage is over 1 V and a green light when it’s less, then without noise a steady value will result in a steady lamp. Adding random noise with a peak of +/-1 V or so and sampling over time will result in some red lights and some green, and the ratio of the two will be the actual voltage.
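That dithering idea can be sketched in a few lines (using a 0.5 V threshold and +/-0.5 V of noise purely for illustration, rather than the 1 V figures above):

```python
import random

random.seed(0)
TRUE_V = 0.3        # a steady voltage below the threshold
THRESHOLD = 0.5     # "red light" above this, "green light" below
N = 100_000

# Without noise the lamp never changes: the sub-threshold value vanishes.
steady = sum(TRUE_V > THRESHOLD for _ in range(N)) / N

# With uniform +/-0.5 V noise, the fraction of red flashes
# converges on the true voltage.
dithered = sum(TRUE_V + random.uniform(-0.5, 0.5) > THRESHOLD
               for _ in range(N)) / N
```

The steady meter reads 0 forever, while the fraction of red flashes in the dithered version settles near 0.3, the sub-threshold voltage recovered by averaging.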
        Apart from that adding random noise should not actually make any difference at all.
        This leads me to suspect that in this case they are applying some non-linear function. And that I am afraid puts the whole of the approach in a suspect place.
        All too much research these days, and indeed the whole AGW thing is an extreme case of what we, as apprentices, used to call ‘BBB’.
        Bullshit Baffles Brains
        The technique of constructing complex descriptions and explanations that no one understood, but which sounded plausible if well presented, and could be guaranteed to fool all the wannabe clever clogs and know-it-alls. Who were placed in the position of either having to admit they didn’t understand a word of it, or agreeing with it.
        This has all the hallmarks of a superb piece of BBB.

      • Leo Smith December 11, 2015 at 3:02 am


        Apart from that adding random noise should not actually make any difference at all.

        Yes, that seems logical …

        This leads me to suspect that in this case they are applying some non linear function. And that I am afraid puts the whole of the approach in a suspect place.

        Dear heavens, I described the exact function that they are using, as did the EEMD2005 authors. I fear that your “suspicions” do not intersect with reality.

        All too much research these days, and indeed the whole AGW thing is an extreme case of what we, as apprentices, used to call ‘BBB’.

        Bullshit Baffles Brains

        The technique of constructing complex descriptions and explanations that no one understood, but which sounded plausible if well presented, and could be guaranteed to fool all the wannabe clever clogs and know-it-alls. Who were placed in the position of either having to admit they didn’t understand a word of it, or agreeing with it.
        This has all the hallmarks of a superb piece of BBB.

        While it is indeed somewhat baffling, your assumption that that means it is bullshit is a bridge too far. The problem with your claim is that the method does in fact work. Remember that this is an improvement on the original EMD method, and the results are demonstrably better. It also clearly shows the close relationship between the Southern Oscillation Index and the El Nino Cold Tongue Index.
        So it may indeed be baffling … but as Galileo is rumored to have said, “E pur si muove”.
        w.

      • Frederick Colbourne: Photographers used to use an unsharp mask to sharpen images and to restore movies. Old Westerns were probably restored this way.
        I only quote the opening lines to identify which post I am responding to. That was a most interesting post that you wrote. Thank you.

      • Geoff Sherrington speculated: Off the cuff and without working any examples, maybe the addition of white noise seems to assist problems of the type of subtraction of 2 large numbers whose very small difference is the sought effect.
        I have subtracted one two-dimensional matrix from another of the same dimensions: satellite images of the same scene, each of which had several hundred thousand data points.
        Taking the difference between only two numbers can create serious problems: such as when the two numbers represent incoming and outgoing radiation at the top of the atmosphere and the difference is the net radiative imbalance of the Earth.
        I demonstrate this below with reference to a series of papers by NASA scientists and their colleagues in other government agencies and the private sector.
        My understanding of Stephens et al.: The authors pointed out that the net energy balance is the difference between incoming and outgoing radiation, two numbers around 340 Wm-2, while the energy imbalance inferred from ocean heat content (OHC) was only 0.6 Wm-2. Since both numbers (incoming and outgoing) have errors, the figure 0.6 divided by 680 is the relevant statistic. Radiation at the top of the atmosphere would have to be measured with a precision and accuracy of about 0.1%, roughly one part in a thousand, something that these NASA scientists and their colleagues state is not possible with present technology.
        “The net energy balance is the sum of individual fluxes. The current uncertainty in this net surface energy balance is large, and amounts to approximately 17 Wm-2. This uncertainty is an order of magnitude larger than the changes to the net surface fluxes associated with increasing greenhouse gases in the atmosphere (Fig. 2b). The uncertainty is also approximately an order of magnitude larger than the current estimates of the net surface energy imbalance of 0.6 ±0.4 Wm-2 inferred from the rise in OHC. The uncertainty in the TOA net energy fluxes, although smaller, is also much larger than the imbalance inferred from OHC.”
        Stephens, Graeme L., et al. “An update on Earth’s energy balance in light of the latest global observations.” Nature Geoscience 5.10 (2012): 691-696.
        URL: http://www.nature.com/ngeo/journal/v5/n10/full/ngeo1580.html
        My understanding of Loeb et al.: The authors worked to correct errors in the standard values for TSI; the ratio of Earth’s surface area to the disc presented to the Sun (taking into account the flattening at the poles and the fuzziness at the terminator); albedo; etc.
        My estimates of precision and accuracy effects: if incoming energy is 340 Wm-2, then with albedo 0.300 the net incoming energy would be 238 Wm-2, whereas with albedo 0.305 the net incoming would be 236.3 Wm-2, a difference of 1.7 Wm-2, about 3 times the rate estimated from OHC. An albedo error this great (1.7%) could signal a bigger difference in energy imbalance than that between 1650 and 1950 (the Maunder Minimum and the date at which CO2 began its steep increase). NASA scientists and their colleagues have said that the accuracy and precision of their satellite data (from CERES) is insufficient to support firm statements about whether the Earth is warming or cooling.
        Loeb et al (2012) summarized the combined effect of all errors found. When estimates of solar irradiance, SW and LW TOA fluxes are combined, taking account of 0.85 +/- 0.15 Wm-2 heat storage by the oceans (Hansen’s 2005 estimate), the possible range of TOA flux becomes minus 2.1 to plus 6.7 Wm-2. Based on well-established physical theory, then, the instruments tell us only that the net radiative flux is either positive or negative: the Earth is either radiating more energy than it receives or less energy than it receives.
        Loeb, Norman G., et al. “Observed changes in top-of-the-atmosphere radiation and upper-ocean heating consistent within uncertainty.” Nature Geoscience 5.2 (2012): 110-113.
        URL: http://www.met.reading.ac.uk/~sgs02rpa/PAPERS/Loeb12NG.pdf
        Updated from this earlier paper: Loeb et al, Toward Optimal Closure of the Earth’s Top-of-Atmosphere Radiation Budget, Journal of Climate, 2009.
        URL: http://www.nsstc.uah.edu/~naeger/references/journals/Sundar_Journal_Papers/2008_JC_Loeb.pdf
        My understanding of work by James Hansen et al.: James Hansen’s reported estimate of the energy imbalance from OHC of 0.85 Wm-2 in 2005 declined to 0.58 Wm-2 in 2011. This figure was refined to 0.5 Wm-2 by Loeb et al in 2012.
        “Earth has been steadily accumulating energy at a rate of 0.50 +/- 0.43 Wm-2 (uncertainties at the 90% confidence level). We conclude that energy storage is continuing to increase in the sub-surface ocean.”
        The problem here is that, if these numbers mean anything at all, they seem to indicate that the estimates of energy imbalance have been decreasing from 2005 to 2013, from 0.85 to 0.58 to 0.50 Wm-2.
        But looking at the error bars and reading the reservations concerning the uncertainties, I do not believe the numbers tell us much for certain and I believe that the authors are trying to tell us this the best way they know how without being subject to ostracism.
        I think we owe these people a debt of gratitude for being as candid as they could be given their personal circumstances.
        (So much for taking the difference between two big numbers, something our professors used to warn us about in the days when we still used slide rules.)

      • Frederick Colbourne wrote: “I found a similar technique online years ago that I use to bring up murky images and to process faded text using Photoshop”
        What a great example, and comparing it with the technique of adding noise to noisy data before “sharpening” it makes lots of sense. It gave me a way to visualize the process I’d never had before.
        Linear filters assume random noise. If you have patterned noise in the signal they won’t work very well. Adding a gaussian blur makes the noise more random (normalizes it) around the mean without changing the mean. Taking the least squares fit (or some equivalent linear filter) after randomizing the noise should result in a much sharper signal. It would be a great improvement when you were dealing with a signal that contains patterned, non-random noise. Very elegant!
        Maybe that’s not exactly how it works, but it certainly makes “visual” sense to me and I’ve also had fair experience both filtering noise from data and from photos. Thanks for describing that method.

    • Adding noise is not uncommon at all, particularly to avoid singularity issues with matrix inverse problems (even if the inverse is shortcut somehow). Decompositions such as these often involve matrix inversions.
      Admittedly, I have not read your exposition in detail – I’m in a bar making a detailed analysis difficult, but I wanted to point this out.
      Mark

    • And in this case half of the raw data is just totally made up out of whole cloth.
      The Amazon river system is an enormous assemblage of rivers and streams and creeks, and who knows what else, and it is a living organism. Everything moves around or is moved around by local rains, and flash floods, and human activities, and the idea that the water flow rate at the mouth, into the Atlantic Ocean, can be estimated by measuring a local water level in a single spot is just asinine. Yes, they measured two spots; whoopee!
      Do these EEMD geniuses have a mathematics background that might extend to the general theory of sampled data systems? Would they perhaps have any knowledge of the Nyquist criterion, and what it might have to say about the validity of vastly under-sampled “data” samples of a continuous function (and probably a continuous function of multiple variables)?
      But I’m with you Willis, I was not impressed with the obviousness of their analysis’s discovery of a previously unknown effect. It stood out like a black cat in a cellar at midnight.
      g

      • George,
        IMO finding a statistically significant correlation at even one spot would be meaningful, if it were on the main stem, ie downstream from either Iquitos, Peru or Manaus, Brazil, depending upon whose definition of main stem you adopt, Brazil’s or everyone else’s.

    • Well, your example of subtracting two large numbers to obtain the value of their small difference is about what Lord Rutherford was talking about when he famously said: “If you have to use statistics, you should have done a better experiment.” The better experiment in your case would be one altered so as to directly observe the difference quantity. There can also be a mathematical solution that removes the problem completely.
      For example, If I have a sphere of radius (R), and I cut it with a plane giving an intercept circle of radius (r), I can calculate the “sag” of the spherical cap using Pythagoras, giving: s = R – sqrt(R^2 – r^2)
      Now if I want to know the sag of a one meter radius cap cut from a one km radius sphere, you can see I have your small difference problem.
      We encounter exactly this problem in optics with the sag of lens or mirror surfaces, when R >>> r, and we end up at a dead end if the sphere is actually a plane, so R = infinity. How do I enter infinity as the sphere radius value in my computer, and what if my computer wanted to pick an infinite-radius planar surface itself?
      So we don’t use the formula above at all.
      I can rewrite that equation as:
      s = [ (R – sqrt(R^2 – r^2)) . (R + sqrt(R^2 – r^2)) ] / [ R + sqrt(R^2 – r^2) ]
      = (R^2 – (R^2 – r^2)) / (R + sqrt(R^2 – r^2)) = r^2 / (R + sqrt(R^2 – r^2))
      So I have exchanged the small difference of two large numbers for the sum of those same two numbers.
      Then I can multiply top and bottom by C = 1/R and get:
      s = Cr^2 / (1+sqrt(1-C^2.r^2))
      So I now have an exact equation for the sag no matter how large the sphere radius (R) and I can enter C = 0 to take care of that pesky case of infinite radius.
      As a practical matter, we use curvatures of optical surfaces rather than radii, because the optical power varies directly with the curvature, and not the radius.
      That final sag equation can be further modified with the addition of a parameter K (kappa), called the conic constant, to give the sag for a conic section such as ellipse, parabola or hyperbola. The value of K is simply – (e^2) where (e) is the eccentricity of the conic section.
      s = C·r^2 / (1 + sqrt(1 – (1+K)·C^2·r^2))
      Hyperbola: K < -1; Parabola: K = -1; Ellipse: -1 < K < 0
      You can’t calculate Amazon river flow or global Temperature from this equation.
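      To make the cancellation concrete, here is a minimal Python sketch (illustrative only, nothing to do with the paper) comparing the naive sag formula with the curvature form derived above:

```python
import math

def sag_naive(R, r):
    """Sag via s = R - sqrt(R^2 - r^2): cancels badly when R >> r."""
    return R - math.sqrt(R**2 - r**2)

def sag_stable(C, r, K=0.0):
    """Sag via curvature C = 1/R, with conic constant K:
    s = C r^2 / (1 + sqrt(1 - (1+K) C^2 r^2)).
    C = 0 handles the planar (infinite-radius) case exactly."""
    return C * r**2 / (1.0 + math.sqrt(1.0 - (1.0 + K) * C**2 * r**2))

# A 1 m aperture on a 100,000 km radius sphere: true sag ~ r^2/(2R) = 5e-9 m.
print(sag_naive(1e8, 1.0))      # cancellation destroys the answer
print(sag_stable(1e-8, 1.0))    # ~5e-9, full precision
print(sag_stable(0.0, 1.0))     # flat surface: exactly 0.0
```

      On the big sphere the naive form loses the answer entirely in double precision, while the curvature form keeps full precision and takes C = 0 in stride.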
      I’m not in favor of adding noise to anything, and then believing that some signal I might “recover” is the actual signal that might have been there without the added noise.
      Some “noisy” signals actually contain amplitude-dependent noise, so the noise content is a nonlinear function of the real signal amplitude. Adding white noise is going to corrupt that relationship and give a systematic error in any recovered “signal”.
      g

      • George Smith wrote: “I’m not in favor of adding noise to anything, and then believing that some signal I might ” recover ” is the actual signal that might have been there, without the added noise.”
        I would argue that, in situations where you can be reasonably certain the noise *should* be random, or where the noise filter you’re using assumes random noise, adding noise to force randomness will result in improved signal recovery, as exemplified by Frederick’s Photoshop example. That only makes sense when you have good reason to believe, by way of experimental evidence, that the signal you’re extracting experiences random noise. In that case what you’re really doing is normalizing systematic, non-random noise produced by your instruments or collection methods to make it compatible with the assumptions of your filter.

      • Sorry for the redundant “exemplified by … example.” Should have been “demonstrated by … example”.
        Also, it’s important to remember that whenever we take the mean of a time series we’re removing noise to extract the real signal. The assumption built into that moment (the mean) is that the noise is normally distributed. If that isn’t the case, using the mean to represent the signal is statistically invalid to begin with. Normalizing the noise is a concession made to your instruments. When you have some reason to believe the noise isn’t fundamentally normal, you probably shouldn’t be using the mean (least squares regression, etc.) in the first place.

  2. “Simply claiming I’m wrong doesn’t advance the discussion.”
    No but then sometimes a discussion isn’t warranted. A dismissive comment goes a long way towards helping the writer understand his work is not worth much.
    In this case, I did kinda like this post. I like the fact that a certain amount of humility was present, and that you didn’t try to one-up the authors nor especially become critical when you were not able to understand what they were doing.
    Progress. We are making progress.

  3. Dinostratus December 10, 2015 at 8:19 pm

    “Simply claiming I’m wrong doesn’t advance the discussion.”

    No but then sometimes a discussion isn’t warranted. A dismissive comment goes a long way towards helping the writer understand his work is not worth much.

    No, Dino, a “dismissive comment” doesn’t “help the author” understand anything unless it is supported by showing the right way to do it. Which was my point, although obviously you missed it … no surprise, being as how you are one of the worst offenders in this area.
    You have popped up again and again to make “dismissive comment” after “dismissive comment”, but despite being asked again and again, somehow you never have gotten around to proving that you actually know something by showing us how the tasks should be done.
    And since you are not willing to show us that you know a damned thing, and you are not willing to put your claims out for public inspection, and you say nothing but an endless repetition of what boils down to nothing more than “Willis is wrong, Willis is wrong, Willis is wrong” … why on earth should anyone pay the slightest attention to you?
    Sorry, Dino, but once again you’ve proven that you’re all hat and no cattle …
    w.

    • “why on earth should anyone pay the slightest attention to you?”
      Usually because Willis is wrong. Even you had to admit that you fubared the units on something as simple as the evaporation of water.

  4. Very non-intuitive. It seems akin to pouring sand into a telescope to see Mars better. Does this addition of white noise dilute the autocorrelation of the signal of interest? Or is it some other form of juju? Is a puzzlement…

  5. Willis –
    This addition of noise sounds crazy of course. BUT – it does remind me of the so-called “stochastic resonance” apparently used by some biological creatures (crayfish?) and is related to the classic engineering use of “dither” (to achieve a resolution of data BELOW the least significant bit), typically in digital audio. Possibly relevant. Being so counter-intuitive, such measures constitute extraordinary claims and require an extraordinary degree of explanation.
    But interesting. Thanks.

    • Yep, adding white noise to an audio signal randomises error in digital processing, making it more uniform and less distracting. Good (audio) explanation https://www.youtube.com/watch?v=zWpWIQw7HWU (5 minutes) and, more broadly, wiki https://en.wikipedia.org/wiki/Dither
      I understand that the human ear does something similar with low-level signals. Wiki says “The human ear functions much like a Fourier transform, wherein it hears individual frequencies.[8] The ear is therefore very sensitive to distortion, or additional frequency content that “colors” the sound differently, but far less sensitive to random noise at all frequencies.” I am not clear about this, but it sounds like a fascinating topic.
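      As a concrete illustration of what dither buys you, here is a small Python toy (my own sketch, not from either link): a constant signal at 0.3 of one quantizer step is completely lost by plain rounding, but comes back in the average once TPDF dither is added before quantization:

```python
import numpy as np

rng = np.random.default_rng(42)
lsb = 1.0                      # quantizer step (one least-significant bit)
x = 0.3 * lsb                  # a constant signal well below one LSB

# Without dither the quantizer just rounds to 0 forever: the signal is lost.
undithered = np.round(x / lsb) * lsb
print(undithered)              # 0.0

# With TPDF dither (sum of two uniforms, 1 LSB peak-to-peak each) added
# before rounding, averaging many quantized samples recovers the sub-LSB value.
n = 200_000
dither = rng.uniform(-0.5, 0.5, n) + rng.uniform(-0.5, 0.5, n)
dithered = np.round((x + dither * lsb) / lsb) * lsb
print(dithered.mean())         # ~0.3
```

      The dither decorrelates the quantization error from the signal, which is exactly the “trade distortion for benign noise” idea in the video.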

      • Thanks, Keith. What you are talking about is different from the use of noise in the paper. As they describe it, the white noise effectively acts as a “dyadic filter bank”, viz:

        … EMD is effectively an adaptive dyadic¹ filter bank when applied to white noise.

        Footnote 1 in turn says:

        1 A dyadic filter bank is a collection of band pass filters that have a constant band pass shape (e.g., a Gaussian distribution) but with neighboring filters covering half or double of the frequency range of any single filter in the bank. The frequency ranges of the filters can be overlapped. For example, a simple dyadic filter bank can include filters covering frequency windows such as 50 to 120 Hz, 100 to 240 Hz, 200 to 480 Hz, and et al.

        Who knew?
        Best regards,
        w.

      • Keith –
        thanks for the great reference to Nigel’s superb video on dither. I remember him from many years back, but never saw that.
        I think that the Wu/Hwang reference suggests that their method really is stochastic resonance but with repetition and averaging out of noise (more or less straightforward). Isn’t this what the ear does with dithering of consecutive cycles and the formation (somehow) of an overall audio impression?
        Bernie

      • Willis –
        I think it really is stochastic resonance. I don’t see it as a “filter bank” in any common usage. On the other hand, I have always disliked the term resonance as it is used in “stochastic resonance” – no filtering there either.
        Looks worth exploring with Matlab toys.
        Bernie

      • Thanks for the links.
        The video was particularly interesting to me because digital silk-screening had occurred to me as an analogy (or, depending on how you look at it, an example) when you guys mentioned dither, and, voila! that’s what the video used.

      • Sorry, guys, but the original paper differentiates this use of white noise from the “stochastic resonance” use. See page 7 for their discussion of the differences between the two.
        w.

    • Bernie Hutchins:
      “Stochastic resonance” falls outside my experience set, but to me it appears that you and Mr. Eschenbach may be agreeing in substance but not in nomenclature. Here’s what I inferred from the paper:
      In basic empirical mode decomposition (“EMD”), each of the so-called intrinsic mode functions (“IMFs”) into which the signal is decomposed can be thought of as a local-AC component of the previous IMF, leaving a local-DC residue. How “local” the local DC is can vary along the input record; since the “DC” signal is made of splines through the maxima and minima, the degree to which the “DC” is local depends on how far apart in time the local minima and maxima occur. So removal of a given IMF may remove only the very high-frequency components from one portion of the record but remove even fairly low-frequency components from another.
      Nonetheless, each successive IMF removes more AC until there’s only a single global maximum and a single global minimum left in the residual “DC” signal.
      That’s the basic EMD method. Ensemble empirical mode decomposition (“EEMD”) adds white noise, thereby making the time differences between maxima and minima relatively uniform and thus making successive IMFs’ frequency ranges overlap less than would otherwise be the case.
      So, when the authors talk about a “dyadic filter bank,” they’re talking about an idealization of the method, one that basic EMD approaches only very roughly and EEMD approaches more closely.
      How well that matches one’s concept of filter is in the eye of the beholder, I suppose.
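      For anyone who finds the verbal description abstract, here is a rough Python sketch of a single sifting step of basic EMD (my own toy, not the authors’ code; real EMD iterates the sift and treats the record ends carefully):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift_once(x):
    """One sifting step of basic EMD: cubic-spline envelopes through the
    local maxima and minima, then subtract the mean envelope (the local DC)."""
    t = np.arange(len(x))
    # interior local extrema (a real implementation also handles the ends)
    maxima = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    minima = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return np.zeros_like(x), x        # too few extrema: x is all residue
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    local_mean = (upper + lower) / 2      # the "local-DC" part
    return x - local_mean, local_mean     # (proto-IMF "AC" part, "DC" part)

# A fast oscillation riding on a slow one:
t = np.linspace(0, 10, 1000)
x = np.sin(2 * np.pi * 3 * t) + np.sin(2 * np.pi * 0.3 * t)
ac, dc = sift_once(x)
# ac is dominated by the 3 Hz component, dc tracks the 0.3 Hz component
```

      EEMD then simply repeats the whole decomposition on many noise-added copies of the signal and averages the resulting IMFs.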

  6. They look for an 11.5 year cycle, but the solar cycle varies in length. Would they get more useful results by calculating the years of the solar maxima, then calculating for each year what % of the way through a solar cycle that year was, then seeing if there is a correlation between river flow and % of the way through the solar cycle? From looking at the graphs, I can’t see any obvious correlation, but maybe for variables other than Amazon flow, that would be a more useful way of looking for a correlation with the solar cycle.

    • Thanks for the comment, Carl. In general I use a cross-correlation function (ccf) to see if there is any lagged response to a purported driver like the solar cycles. In this case, as I commented in the head post, using the CCF there is little correlation at any lag …
      w.

  7. Is there a possible response from Dr Curry?
    Would more passes through the noise-added filter actually add error to the original data due to floating-point calculations?

      • Ooops…fingers too fast on the keys. What I meant to say is that it would be a good idea to check with the paper’s authors if you cannot duplicate their results. Perhaps they don’t respond, but at least you can say you tried.

      • wxobserver December 11, 2015 at 4:12 pm

        Ooops…fingers too fast on the keys. What I meant to say is that it would be a good idea to check with the paper’s authors if you cannot duplicate their results. Perhaps they don’t respond, but at least you can say you tried.

        Go for it, wxobserver, and report back with what you find out from them. There’s an off chance it might be interesting. Won’t change the conclusions of my study, however. Their problems of data and results are there whether I can exactly duplicate their results or not.
        w.

      • I just got a reply from one of the authors, Andres Antico. He sent me a link to some Matlab code and I was able to easily duplicate his results. The link to that code is below. You will need to edit the eemd.m file and change "iter<=10" to "iter<=50" on line 95.
        The author pointed out that you need to fix the sifting iteration count at 50 in order to duplicate his result.
        http://rcada.ncu.edu.tw/research1_clip_program.htm
        Load the annual mean flow data into a variable (e.g. "flow") and then run this:
        imfs=eemd(flow,0.6,2000);
        It worked like a charm for me.
        Finally, the author seems like a very personable fellow — I'm sure you could have a fruitful exchange with him if you were to reach out.

      • Thanks greatly for that, wx. However, I don’t have MatLab so I fear his code won’t help. Did he send you the data as well, or did you use his online dataset?
        Also, I did restrict the sifting to 50, so that cannot be the problem.
        You then say:

        Load the annual mean flow data into a variable (e.g. “flow”) and then run this:
        imfs=eemd(flow,0.6,2000);

        Unfortunately, that is exactly what I did above, and I did NOT get his results (see Figs. 4 & 5 above). Here’s the code I used, which I’d set up to match his code exactly, and “flowcheck” is the annual mean flow data:
        imfs <- eemd(flowcheck, num_imfs = 6, num_siftings = 50,
          ensemble_size = 2000,
          threads = 1, noise_strength = 0.6 * sd(flowcheck))
        This is the same as your shorter code above, but I don't get the answer that he got.
        Could you plot up the results that you got so I can take a look?
        Finally, did you do a periodogram on IMF3, the third intrinsic mode, to see if you can find his claimed 10.7 year cycle?
        Thanks,
        w.

      • Willis,
        I was going to guess you didn’t have Matlab. I tried to tweak the code to run in FreeMat but it was using a lot of intrinsic Matlab functions not available in FreeMat; I gave up.
        I used the same data you did — from the Excel spreadsheet in the supplemental data, the second page with annual mean flows. It will take me a few hours but I’ll re-run the analysis and put up some plots and try to duplicate the periodogram — if I can’t I’m sure the author will offer some tips.
        P.S. The author, Antico, stated in his e-mail that one reviewer was able to duplicate his results, so they seem to have dotted their i’s and crossed their t’s there.

      • Okay, I have managed to duplicate the periodograms also. Not exactly but very close, close enough that I don’t think it’s worth contacting the author for his method. You can see plots of the duplicated IMF modes and periodograms here:
        https://wxobserver.wordpress.com/2015/12/16/32/
        There’s one file with a logarithmic X-axis in frequency (cycles per year) and another with a linear X-axis in period (years/cycle).
        Here’s how I did the periodograms. First there’s not enough data points to get much frequency resolution, so I zero-extended the data out to 1024 points, then took the magnitude-squared of the DFT with appropriate amplitude scaling. The peak on C3 is near 10.7 year period (closer to 10.8 in my result but that’s close enough I think).
        I also tried applying a window to the data before zero-extending…results were very similar, for C3 at least.
        If you don’t zero-extend the data, there’s still a peak there but the low resolution means the peak is at about 11 years instead of 10.7
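        To illustrate the zero-extension step with a synthetic example (my own sketch, not my actual analysis code): a pure 10.7-year sinusoid sampled at 111 annual points can only land on DFT bins at 11.1 or 10.09 years, but padding to 1024 points interpolates the spectrum finely enough to put the peak near 10.7:

```python
import numpy as np

n = 111                              # annual samples, like 1903-2013
t = np.arange(n)
x = np.sin(2 * np.pi * t / 10.7)     # a pure 10.7-year cycle, for illustration

# Raw periodogram: bins at k/111 cycles/year, so the candidate periods
# bracketing 10.7 are only 111/10 = 11.1 and 111/11 = 10.09 years.
raw = np.abs(np.fft.rfft(x)) ** 2
k = int(np.argmax(raw[1:])) + 1
print(n / k)                         # lands on one of the two bracketing bins

# Zero-extend to 1024 points: no new information, just a finely
# interpolated spectrum whose peak falls much nearer 10.7.
padded = np.abs(np.fft.rfft(x, 1024)) ** 2
kp = int(np.argmax(padded[1:])) + 1
print(1024 / kp)
```

        As Bernie points out below the padding adds no information; it only interpolates between the bins you already have.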

      • wxobserver wrote December 15, 2015 at 5:04 pm in part:
        “… Here’s how I did the periodograms. First there’s not enough data points to get much frequency resolution, so I zero-extended the data out to 1024 points…”
        Zero padding does NOT increase resolution. It interpolates the spectrum smoothly, but does not make it possible to see any new frequency points. Basically the “uncertainty principle”. The zeros do NOT give any NEW information. This has been known a very long time, but is still widely misunderstood. For a full recent explanation, please see:
        http://electronotes.netfirms.com/EN222.pdf
        See in particular starting at page 23.

      • Bernie,
        Thanks, and I’m already 100% aware of that (my professional background includes DSP). I did not take the time to explain that just cuz I’m being lazy. One of the things I like about this blog is that it is populated by a lot of very sharp folks.
        I am trying to duplicate the author’s work and he claims there’s a peak in the spectrum around 10.7 years. A straight DFT of the 111-point C3 signal does not (cannot) give that answer. With a sampling interval of 1 year and 111 points it is impossible for the DFT of C3 to have a peak at 10.7 years. The nearest candidates are 11.1 and 10.09 years. I figured the author probably interpolated the spectrum, so I did the same. The results I got are so close to the author’s that I did not feel the effort to contact him about the periodogram algorithm he used was warranted.
        Let me know if I missed something else there, and I can still write the author if you’re not satisfied with my attempt to duplicate his results.

      • I can’t believe I’m getting sucked into this but I am. I decided to plot sunspot number versus C3, like Willis did, except I inverted C3 and scaled the data so it could be overlaid on the same plot. You can see it at the same place linked above:
        https://wxobserver.wordpress.com/2015/12/16/32/
        I think I partially agree with Willis’ conclusion that there’s no solar signal. Certainly not at the level of being a predictor of river flows. Not at the level of being useful in itself. But still, I think there is an unmistakable correlation there too. Take a look at the graph linked above; I’m curious what others think.

      • wxobserver wrote December 15, 2015 at 11:17 pm in part:
        “ ….One of the things I like about this blog is that it is populated by a lot of very sharp folks. …..”
        Exactly so – many of whom have just enough knowledge of DSP to be “dangerous”!
        I concur with your comments. My real concern was probably that many do associate “resolution” with “detection” (as in ordinary speech). Here we were concerned with determining if a 10.7 year cycle was present, or not (detection).
        Many suppose that if you do not resolve a frequency component its energy is lost. This would be true if we had a tunable sharp bandpass in use for spectral analysis, and did not ever “park” it over the 1/10.7 frequency. The FFT is a filter-bank and always puts the energy somewhere, usually very close to where it belongs.
        Glad you are having fun.
        Bernie

      • Bernie,
        Yes. I skipped over a lot of material in my post. Nobody wants to read a 10-page post I suspect. For example, I did not bother to mention that the DFT presumes this sequence is stationary (repeats forever w/o change). Or that it is merely fortuitous that the end-points sort-of match up so that using a rectangular window doesn’t add much in the way of artifacts. Or that using a tool that assumes stationarity (DFT) to analyze a non-stationary process is risky. We’re looking at a short segment of data that has behaviors on the scale of 1000’s of years and trying to draw meaningful conclusions. All very dodgy business. But it’s what we have to work with, so we try…hopefully w/o losing sight of the limitations.

  8. ” Unintuitive as it may seem, noise aided data analysis is indeed a reality.” Ugh. Like doping a silicon chip? Weeds as a cover crop? Perhaps one can parse out constructive and destructive interference?

  9. Good work as always, Willis. I read the paper when Judith linked it. I also thought their interpretation of their results is very odd. I downloaded their SI and put it to one side to look at the data later.
    At best they found a very weak signal and showed that if there is an 11y solar influence, it is very small. Even having pre-selected a small range of frequencies, the 11y is not particularly striking. It should perhaps be noted that this method has split the power of this cycle between C2 and C3, thus making it look even less important.
    However, your reproduction of their working may have some other interesting results.
    In your fig 6 there are some notable peaks in each band. This would be worth noting. Just reading off the graph and taking the anti-log to get the period, I estimate the following.
    C1: a pair of peaks around 2.4,2.5 years, probably QBO
    C2: a very strong peak at 6y
    C3: a secondary peak close to 18y
    C4: circa 34y
    I would not give much importance to the latter on the basis of this short dataset, but it is interesting since it is found in much longer, multi-century datasets.
    This paper may be more important for what it fails to notice than what it did find. Anyway, the method itself is worth knowing about.
    Could you post accurate figures for those peaks please. This is interesting.

    • PS. Where can I find the various R files required by your code linked here? I have several of them in my “Willis” directory but others are missing. I have an SFT directory with several R files which you may have now grouped together.
      Could you provide a zip of all the ancillary function files in their current form, please?

    • Mike:
      What I found interesting was that, comparing the graphs, most reflections of solar cycle influence are in the reconstructed sections of the data. There is minimal appearance of solar cycle influence in the measured flow levels.

  10. It’s cute, I’ll give it that. But I’d like to see a whole lot more analysis on how adding noise — white or otherwise — reveals valid signals that Fourier analysis doesn’t show. If Fourier analysis somehow doesn’t work, I think that’s worthy of a lot of discussion.
    One editorial quibble. I misread “but using annual average data cuts your number of data points by 12.” the first time I read it to mean that “annual averaging cuts a few points off the ends” Clearly not what you intended. Maybe “… by a factor of 12.”
    Do keep us posted.

    • Don K December 10, 2015 at 11:30 pm

      It’s cute, I’ll give it that. But I’d like to see a whole lot more analysis on how adding noise — white or otherwise — reveals valid signals that Fourier analysis doesn’t show. If Fourier analysis somehow doesn’t work, I think that’s worthy of a lot of discussion.

      EEMD does NOT show that Fourier analysis doesn’t work. What EEMD is is a different way of decomposing a signal. Every signal decomposition method, including the many forms of Fourier analysis, has its pluses and minuses.

      One editorial quibble. I misread “but using annual average data cuts your number of data points by 12.” the first time I read it to mean that “annual averaging cuts a few points off the ends” Clearly not what you intended. Maybe “… by a factor of 12.”

      Thanks, Don, fixed.

      Do keep us posted.

      Always.
      w.

      • EEMD does NOT show that Fourier analysis doesn’t work. What EEMD is is a different way of decomposing a signal.

        Maybe. Off hand, it seems to me that when you ask a question that has a small range of answers — is the sunspot cycle correlated with Amazon River flow ? — you ought to get consistent answers no matter what valid approach you use. It’s OK if some approaches say “maybe” or “insufficient data” while others give a definite answer. It’s not so OK if one says “Yes” and another says “No”.
        We very likely know what question Fourier is answering. Maybe EEMD — which I do not remotely understand — is asking a different question? What question?
        But what do I know?

  11. Interesting concept… perhaps it is related to the observation that while randomness/white noise should disappear (or produce no significant periodicity) when analysing a really, really long signal, a segment of white noise/randomness may have very significant, but random, periodicity associated with it. By adding and processing repeated segments of white noise to the original signal, you are in effect reconstructing any actual signal periodicity over a much, much longer data set, thus allowing the white noise/random side of the measurements to completely drop out.

  12. I analysed northeast Brazil rainfall. A 52-year cycle and its submultiples were present. Long rainfall series are available for Fortaleza. A similar 52-year cycle was noted in the onset of the southwest monsoon over Kerala.
    Dr. S. Jeevananda Reddy

    • Thanks, Dr. Reddy. I am always very cautious about saying a signal is present in observational data unless I can see a minimum of at least three cycles, and preferably four or more. I say this because what I call “phantom signals” are extremely common in climate science. These are apparent cycles that appear, may last for several cycles or more, and then disappear. They can easily be demonstrated by dividing a long climate time series in half and doing a Fourier analysis on the two halves.
      To see three cycles of a 52-year cycle, you’d need to have 156 years of Brazilian rainfall data, and I don’t think that exists.
      Best regards,
      w.
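      To show what I mean by phantom signals, here’s a small Python sketch (illustrative only): generate red noise containing no cycles at all, split it in half, and ask each half for its “dominant cycle”:

```python
import numpy as np

rng = np.random.default_rng(0)

# AR(1) "red noise": persistence, but no true cycles whatsoever.
n = 220
x = np.zeros(n)
for i in range(1, n):
    x[i] = 0.9 * x[i - 1] + rng.normal()

# Fourier analysis of each half separately, as described above.
half1, half2 = x[:n // 2], x[n // 2:]
spec1 = np.abs(np.fft.rfft(half1 - half1.mean())) ** 2
spec2 = np.abs(np.fft.rfft(half2 - half2.mean())) ** 2

# "Dominant cycle" in each half, searching periods of ~5 to ~27 samples
# (away from the red-noise pileup at the lowest frequencies).
lo, hi = 4, 23
k1 = lo + int(np.argmax(spec1[lo:hi]))
k2 = lo + int(np.argmax(spec2[lo:hi]))
print(110 / k1, 110 / k2)   # the two halves usually disagree
```

      Run it with different seeds and the “cycles” wander all over, because they were never there in the first place.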

      • You can see the results in: Pesq. Agroec. Bras., Brasilia, 19[5]:529-543, May 1994.
        The WMO Secretary-General issued a statement on World Meteorological Day in 2013 — dry and warm in the Southern Hemisphere. Fortaleza completed three cycles [above-average peaks and two below-average peaks], and 2013 comes under the third below-average cycle.
        In 2013, Durban [South Africa] (66-year cycle), Maalypye [Botswana] (60-year cycle) and Catuane [Mozambique] (54-year cycle) also presented below-average values — a W followed by an M shape at all three stations.
        Dr. S. Jeevananda Reddy

      • I am always very cautious about saying a signal is present in observational data unless I can see a minimum of at least three cycles, and preferably four or more.

        Amen, Amen, Amen, a thousand times Amen. Actually, it’s worse because most climate data is quasi-periodic at best.
        That invalidates any analysis that relies on periodic signals, i.e., Fourier is right out. 🙂

      • Thanks, Gloateus, but that is not a study of solar activity on Asian monsoons. It is a study of the effect of solar activity on a computer model’s idea of what the monsoons look like. The problem is that the climate models used are linear in nature, with the outputs being essentially (lagged) linear transformations of the input. So IN THE MODELS, solar signals in equals solar signals out. But in the real world, we have no such linearity of input and output, and signals can get swallowed with little trace.
        In addition, there are problems with the solar itself. Here’s a quote from one of the people doing the reanalysis:
        “Note that the shortwave [solar] from reanalysis has some real problems (it depends a lot on the cloud cover, which is not simulated well in the AGCM used in the reanalsyis).”
        So yes, the paper you linked to definitely shows a solar influence on a climate model … but that’s not what we’re looking for, is it?
        w.

  13. “A 52-year cycle and its submultiples were present. Long rainfall series are available Fortaleza. ”
    Nullius in verba — where’s the data, please?

    • To my surprise, there is a dataset of Fortaleza rainfall from 1849 to 2011 here … however, it doesn’t show anything like a 52 year cycle.
      Regards,
      w.

  14. Willis,
    While I think your assessment is fair, and I have no strong feelings about whether or not sunspots correlate with decadal climate shifts, for years before getting into this climate lark I was aware of a strong correlation between wine vintages and sunspots. It used to be a way of coarse filtering for decent vintages. This has been “known” for a long time, much longer than the wrangling over AGW.
    I don’t know if you have thought about it or if it has been discussed here, but it would make for an interesting post one day. I’m afraid all I have is “if there is smoke there’s probably fire” – nothing more substantial than that. But it certainly predates hand-wringing over the climate.

    • Thanks, agnostic. If there were a correlation between vintages and sunspots, I don’t even know how you’d go about testing for it.
      w.

      • If someone sends me sufficient funds to buy the raw materials I will be delighted to start recording liquid data for others to pour over.

      • Well just for fun I dug this up:
        Sunspots and their Effects
        ….which was first published in 1937.
        It’s not exactly riveting science, but it is a little bit interesting. Have a scroll a little further on – he makes a correlation between rabbit, lynx and fox pelts and sunspots – I thought that was amusing.
        I think my point is that people have “noticed” a correlation between sunspot numbers and climatically favourable/less favourable conditions going back quite a long way, and well before there was some politically correct stigma attached to the notion.
        I stress, no I don’t think that proves there is a proven correlation any more than I think centuries of believing the Sun revolved around the Earth validates the idea. But I do think it is one of the great unknowns and worthy of skeptical consideration.

      • Another oddity with sun cycles.
        For the last several years, my chili petines (with black fruit) developed a deep dark green color. Recently for the last part of this summer they began taking on a lighter color. Two years ago I tested them by using shade cloth, and within a week they began to fade. When I took it down, they began to regain the darkness. It wasn’t until this year that they did this without the cloth.
        I am not sure if they are being affected by the cycle, but within the next several years I will watch them to see. I believe they develop the color change to protect themselves from the (X-rays?) that were being put out by the sun.
        Also, Willis, as a sailor on the Navy “big boats” we had a dither signal used to keep the ink pens from sticking to the charts. Like a vibrator signal superimposed on the deflection motors. There was an adjustment to increase and decrease this signal until the deflection would not be noticed by the blob of ink from the wiggle. I am interested in knowing what percentage of dither the digital filtering has to offer.
        In my study of the diurnal cycles (real time) I can see the very small noise being generated on the pressure signal (goes from 0 to .01″ sawtooth pattern). Presently we are coming up on a new moon (between us and the sun) and will spend some time comparing it to other orb passing events to see how much noise is injected into the diurnal.
        Excellent post, keep em coming
        LeeO

      • How about the integral of sunspot numbers?
        Or the integral of group numbers, since the correlation between revised SN and revised GN is around 88% and the GN series has a few more 11-year cycles.

    • Not sure what you mean by “trend stationary” but I see no reason it could not be used. Hang on … yes, it works fine. The only difficulty is that the CO2 record is so short that it’s not possible to see anything longer than about 20 years …
      w.

  15. OK Willis, I’ve made some progress. It’s turning into quite a dependency chase to get R to load this so far. Hopefully there will not be any blockers.
    I seem to be lacking sunspots.tab and it’s also trying to reference “plotts2 function.R” in my home dir that does not seem to be in the zip. Can you help with those?

      • Looking ahead, it looks like I will also be missing ‘decompts function.R’, which also references the home dir. (I can obviously change the path, but those files are missing from the zip.)

      • OK, thanks again Willis, this is getting close now. I’m getting a basic plot of fig 1 but without the annual averages line.
        I’m getting a few messages about things being “masked”; I’m not sure if that is supposed to be an error or a warning or what. Do you know whether they matter?
        When it has finished plotting the monthly data it drops out with an actual error.
        Any suggestions?
        Attaching package: ‘lubridate’
        The following object is masked from ‘package:seewave’:
        duration
        Error in lines(x, lwd = backline, col = backcol) :
        object ‘flowannall’ not found

      • Onward, Mike, thanks for your perseverance. My programs are not written in chronological order. When you run into something like “object ‘flowannall’ not found”, do a search for “flowannall=” to find out where it was defined.
        The annual average line uses the function “blackline” which (I hope) is defined in the program … hang on … no, it’s defined in the file “Willis Functions.R”. Make sure it is loaded.
        I don’t think that the masking is a problem, I don’t use the function “duration”.
        Keep the questions coming …
        w.

      • Thanks for taking a look, Willis. I did load it, but there is no mention of that in “Willis Functions.R” in the zip you provided.
        It was assigned in the Amazon file but commented out. Seems to need uncommenting there.
        I get it, it’s just a record of snippets of code. What bits do I need to reproduce figs 6 and 9?
        Thanks.

      • Mike, I checked. The problem is that “flowannall” is wrongly divided by 10 (I had changed the underlying data to match that of the paper after I’d made the graph).
        Your best solution is to change
        flowannall = ts(sapply(theseq, function(x) {
          mean(flowts[x:(x + 11)])
        }) / 10, start = 1903, frequency = 1)
        and remove the “/10” part of the line.
        w.

      • Thanks Willis, I did as you suggested.
        I’m looking at the section that begins at “# plot spectra ——–”
        If I get as far as the multiline plot command, I get the structure, panels and labels, but only C3 and C6 show any data, and it’s just a kind of sinusoidal bulge around +0.5; the rest is empty.
        If I do the section down to “for (i in 1:6){” etc. it all seems to go badly wrong.
        I’m not able to produce anything resembling fig 6 as it should be.
        plot.zoo(speczoo*embiggen,

      • Willis, I’m also getting something quite close to fig 7 except that the first two panels are the same as the last two and show SSN not rain. 🙁

      • Mike, regarding the lack of the blackline, try this version of “Willis Functions.R”.
        Regarding the spectra, they use the “imfs” block generated in the section called

        # Decompose with EEMD IMFS —–

        You need to set the “flowcheck” variable to whatever you want to look at, and then run the section below that. It will both set and display the imfs block. Once that is generated you can use it in the “spectra” section.
        If things don’t show, remark out the “ylim=…” line, it may just be that they are at a different scale.
        Regarding “rescale” being masked, I don’t use it so it’s not an issue.
        All the best, no question is too small.
        w.

  16. OK, it looks like I’ve installed all the relevant R libs and their deps. The seamask file is not relevant to this stuff and I’ve commented out refs to it, but it would be good to have for access to the CERES work whenever I get time to look at that; that was very interesting.
    In summary, it seems the following files are missing to be able to look at what you’ve provided:
    ‘New Sunspot Data 2015.tab’
    ‘plotts2 function.R’
    ‘decompts function.R’
    ‘LandSea Masks.tab’
    It would be great if you could provide a link. Thanks.

  17. Willis: Have you considered the fact that most of the processing steps in climate data are implementations of a Heterodyne circuit (https://en.wikipedia.org/wiki/Heterodyne)?
    The Input signal is the Raw Climate Data. The Local Oscillator is the Climate Reference Period and the output is the Anomaly Data.
    Climatology uses a Subtractive Mixer rather than the ‘Ideal’ Multiplier.
    Climatology also uses the Sum (range shifted by 2 to produce Average) and discards the Difference signal at earlier steps in the process.
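    The mixer analogy is easy to see numerically. A minimal Python sketch (with made-up monthly data, not any actual climate series): subtracting the reference-period climatology acts as the “local oscillator,” stripping the seasonal carrier and leaving the slow “difference” component.

```python
import numpy as np

rng = np.random.default_rng(0)
years, months = 30, 12
t = np.arange(years * months)

# "Input signal": synthetic monthly data = seasonal carrier + slow trend + noise
raw = 10 * np.sin(2 * np.pi * t / 12) + 0.01 * t + rng.normal(0, 1, t.size)

# "Local oscillator": mean seasonal cycle over a reference period (first 20 years)
ref = raw[: 20 * 12].reshape(20, 12).mean(axis=0)

# "Subtractive mixer": anomaly = raw data minus the month-by-month climatology
anomaly = raw - np.tile(ref, years)

# The seasonal carrier is gone; the slow trend and the noise remain in the anomaly
```

Here the seasonal cycle dominates the raw series’ variance, and almost all of it is removed by the subtraction, which is exactly the anomaly step described above.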

    • Good job and thanks, but the short version misses out on a lot of posturing and blather by the alarmist Senators.
      If one is unfamiliar with just how low those boys will go and if you have the stomach for endless logical fallacies backed up by misdirection and blatant “lying with facts” type arguments, then watch the whole thing.

  18. Adding noise to a signal is a marvelous technique for extracting waveforms from periodic signals if you have enough cycles.
    As a lab I used to have my students get the waveform of a periodic signal using a comparator to digitize the signal. If the instantaneous voltage was greater than 0, the comparator would output 1. If the voltage was less than 0, the output would be 0. Without the added noise the result would be something like a square wave. Adding noise and averaging samples over enough cycles would reproduce the waveform to whatever resolution was required.
    The technique works if you have enough cycles of a signal whose period doesn’t change. Neither of those conditions is met by Antico and Torres. What I see here is way too much processing on way too little signal. My first reaction is that it’s crap all the way down.
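    The lab exercise above takes only a few lines to reproduce (a sketch with synthetic numbers, not the original lab code): a bare 1-bit comparator returns a square wave, but noise plus averaging over many cycles recovers a smooth image of the waveform.

```python
import numpy as np

rng = np.random.default_rng(1)
samples_per_cycle, n_cycles = 100, 2000
phase = np.linspace(0, 2 * np.pi, samples_per_cycle, endpoint=False)
signal = 0.5 * np.sin(phase)          # the periodic input waveform

# Comparator alone: 1 if the instantaneous voltage is > 0, else 0 -> square wave
hard = (signal > 0).astype(float)

# Comparator with added noise, averaged sample-by-sample over many cycles
bits = np.array([(signal + rng.normal(0, 1, samples_per_cycle) > 0).astype(float)
                 for _ in range(n_cycles)])
recovered = bits.mean(axis=0)         # fraction of 1s at each point in the cycle

# 'recovered' is a smooth, monotone image of the sine; more cycles, more resolution
```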

      • and yet it moves

        … the phrase is used today as a sort of pithy retort implying that “it doesn’t matter what you believe; these are the facts” wiki
        1 – Antico and Torres’ data is spotty. About half of it is infilled with data calculated on the basis of river levels.
        2 – The eleven year period is only approximate and it changes from cycle to cycle.
        Those are fatal problems. No amount of sophisticated analysis is going to fix that.

        commieBob, I was using the phrase to refer to the analysis method, not to the Antico/Torres data (which is half infilled) nor to their conclusions, which I said in the head post were incorrect.
        Best regards,
        w.

    • Just curious: What is the advantage of adding noise compared to adding a periodic high-frequency signal? Does any noise (red, white) lead to the same result?

      • If you add a fixed frequency you will get all kinds of unwanted artifacts. The noise you add should be as close to random as possible.
        This isn’t to say that some math genius will not find a reason to do otherwise. My career seemed to be filled with a succession of: “Holy **** I didn’t know you could do that” moments. My great joy was to be able to inflict such moments on others. 🙂

      • If you add something that isn’t random then it won’t average out and will show up on the output. Depending on the frequencies and amplitudes you can get a real mess.

      • Curious George says:
        December 11, 2015 at 5:12 pm
        A sinusoidal signal averages out. Both white and red noise are random.

        A sine wave averages out if you average all the samples over one cycle. That’s not what’s happening here. In the case where the input waveform’s period is equal to the window, the value you get for the first sample in the output cycle is the average of all the first samples in the input cycles. etc. etc.
        A hardware system is inherently band limited so any discussion about the difference between white noise and red noise is moot.
        If you want to play with this it’s fairly easy to set up a simulation in a spreadsheet.
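        Here is that simulation in a few lines of Python rather than a spreadsheet (a sketch; the 0.3-step level and the dither choices are arbitrary). It shows why the added signal should be random: a DC level smaller than one quantization step is invisible to the bare quantizer, is recovered on average by random dither, and is biased by a fixed-frequency dither.

```python
import numpy as np

rng = np.random.default_rng(2)
true_value = 0.3              # a DC level of 0.3 LSB, below one quantization step
n = 200_000

# No dither: the quantizer rounds to 0 every time, so averaging cannot help
plain = np.round(np.full(n, true_value)).mean()

# White (uniform) dither spanning one step: the average is an unbiased estimate
dithered = np.round(true_value + rng.uniform(-0.5, 0.5, n)).mean()

# A fixed-frequency "dither" with the same spread: the average comes out biased
theta = 2 * np.pi * 0.37 * np.arange(n)      # arbitrary dither frequency
sinusoid = np.round(true_value + 0.5 * np.sin(theta)).mean()
```

Running this, `plain` is 0, `dithered` lands within a few thousandths of 0.3, and `sinusoid` settles near 0.37, a systematic artifact of the periodic dither.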

      • Gloateus Maximus December 11, 2015 at 9:03

        A growing number of atmospheric and oceanic phenomena are being shown to fluctuate under solar influences …

        NO. A growing number of atmospheric and oceanic phenomena, including the Amazon River in this post, are being CLAIMED to fluctuate under solar influences. I have not found one yet that passes even the simplest test. See my previous examinations of a whole raft of such claims.
        Finally, a protip. When you see something written by Gerald Meehl, be “vewy, vewy cautious”, as Elmer Fudd used to say …
        w.

      • Gloateus Maximus December 11, 2015 at 10:44 am

        Willis,
        What percentage of the at least thousands of papers finding significant solar effects on climate have you analyzed?
        Here are hundreds, on only five of which was Meehl lead author, although it’s unclear to me why he is particularly anathema in your book:
        http://solar-center.stanford.edu/sun-on-earth/2009RG000282.pdf
        Please for instance take a look at the 2014 monsoon paper I linked above.

        I looked at the 2014 monsoon paper, and commented on it in this thread. It’s junk. And the paper you send me now has Meehl as a lead author …
        You are correct that I can’t analyze the thousands of crappy papers published about the purported solar influence, with new junk coming every day. So instead, I have repeatedly asked people to send me two links, one to the paper that they think is the strongest, and a second link to the data used in the paper because I can’t analyze the paper without the data.
        To date, I have not found one that has passed muster. Here is the current roll call of papers that folks thought were the best that they knew of … but weren’t.
        All the best,
        w.
        Congenital Cyclomania Redux 2013-07-23
        Well, I wasn’t going to mention this paper, but it seems to be getting some play in the blogosphere. Our friend Nicola Scafetta is back again, this time with a paper called “Solar and planetary oscillation control on climate change: hind-cast, forecast and a comparison with the CMIP5 GCMs”. He’s…
        Cycles Without The Mania 2013-07-29
        Are there cycles in the sun and its associated electromagnetic phenomena? Assuredly. What are the lengths of the cycles? Well, there’s the question. In the process of writing my recent post about cyclomania, I came across a very interesting paper entitled “Correlation Between the Sunspot Number, the Total Solar Irradiance,…
        Sunspots and Sea Level 2014-01-21
        I came across a curious graph and claim today in a peer-reviewed scientific paper. Here’s the graph relating sunspots and the change in sea level: And here is the claim about the graph: Sea level change and solar activity A stronger effect related to solar cycles is seen in Fig.…
        Riding A Mathemagical Solarcycle 2014-01-22
        Among the papers in the Copernicus Special Issue of Pattern Recognition in Physics we find a paper from R. J. Salvador in which he says he has developed A mathematical model of the sunspot cycle for the past 1000 yr. Setting aside the difficulties of verification of sunspot numbers for…
        Sunny Spots Along the Parana River 2014-01-25
        In a comment on a recent post, I was pointed to a study making the following surprising claim: Here, we analyze the stream flow of one of the largest rivers in the world, the Parana ́ in southeastern South America. For the last century, we find a strong correlation with…
        Usoskin Et Al. Discover A New Class of Sunspots 2014-02-22
        There’s a new post up by Usoskin et al. entitled “Evidence for distinct modes of solar activity”. To their credit, they’ve archived their data, it’s available here. Figure 1 shows their reconstructed decadal averages of sunspot numbers for the last three thousand years, from their paper: Figure 1. The results…
        Solar Periodicity 2014-04-10
        I was pointed to a 2010 post by Dr. Roy Spencer over at his always interesting blog. In it, he says that he can show a relationship between total solar irradiance (TSI) and the HadCRUT3 global surface temperature anomalies. TSI is the strength of the sun’s energy at a specified distance…
        Cosmic Rays, Sunspots, and Beryllium 2014-04-13
        In investigations of the past history of cosmic rays, the deposition rates (flux rates) of the beryllium isotope 10Be are often used as a proxy for the amount of cosmic rays. This is because 10Be is produced, inter alia, by cosmic rays in the atmosphere. Being a congenitally inquisitive type…
        The Tip of the Gleissberg 2014-05-17
        A look at Gleissberg’s famous solar cycle reveals that it is constructed from some dubious signal analysis methods. This purported 80-year “Gleissberg cycle” in the sunspot numbers has excited much interest since Gleissberg’s original work. However, the claimed length of the cycle has varied widely.
        The Effect of Gleissberg’s “Secular Smoothing” 2014-05-19
        ABSTRACT: Slow Fourier Transform (SFT) periodograms reveal the strength of the cycles in the full sunspot dataset (n=314), in the sunspot cycle maxima data alone (n=28), and the sunspot cycle maxima after they have been “secularly smoothed” using the method of Gleissberg (n = 24). In all three datasets, there…
        It’s The Evidence, Stupid! 2014-05-24
        I hear a lot of folks give the following explanation for the vagaries of the climate, viz: It’s the sun, stupid. And in fact, when I first started looking at the climate I thought the very same thing. How could it not be the sun, I reasoned, since obviously that’s…
        Sunspots and Sea Surface Temperature 2014-06-06
        I thought I was done with sunspots … but as the well-known climate scientist Michael Corleone once remarked, “Just when I thought I was out … they pull me back in”. In this case Marcel Crok, the well-known Dutch climate writer, asked me if I’d seen the paper from Nir…
        Maunder and Dalton Sunspot Minima 2014-06-23
        In a recent interchange over at Joanne Nova’s always interesting blog, I’d said that the slow changes in the sun have little effect on temperature. Someone asked me, well, what about the cold temperatures during the Maunder and Dalton sunspot minima? And I thought … hey, what about them? I…
        Splicing Clouds 2014-11-01
        So once again, I have donned my Don Quijote armor and continued my quest for a ~11-year sunspot-related solar signal in some surface weather dataset. My plan for the quest has been simple. It is based on the fact that all of the phenomena commonly credited with affecting the temperature,…
        Volcanoes and Sunspots 2015-02-09
        I keep reading how sunspots are supposed to affect volcanoes. In the comments to my last post, Tides, Earthquakes, and Volcanoes, someone approvingly quoted a volcano researcher who had looked at eleven eruptions of a particular type and stated: …. Nine of the 11 events occurred during the solar inactive phase…
        Early Sunspots and Volcanoes 2015-02-10
        Well, as often happens I started out in one direction and then I got sidetractored … I wanted to respond to Michele Casati’s claim in the comments of my last post. His claim was that if we include the Maunder Minimum in the 1600’s, it’s clear that volcanoes with a…
        Sunspots and Norwegian Child Mortality 2015-03-07
        In January there was a study published by The Royal Society entitled “Solar activity at birth predicted infant survival and women’s fertility in historical Norway”, available here. It claimed that in Norway in the 1700s and 1800s the solar activity at birth affected a child’s survival chances. As you might imagine, this…

      • Willis,
        Thanks.
        Stipulating that you actually did find insurmountable problems with the 17 studies you’ve analyzed here, do you feel that a fraction of one percent of papers is liable to be representative of thousands to tens of thousands of relevant studies in the past century or so?
        I have been impressed by the results derived from SORCE data, showing that, while TSI indeed doesn’t change much, its spectral composition does, i.e. the share of the total in UV bands, and that various climatic phenomena are clearly associated with these fluctuations, plus plausible-to-demonstrated mechanisms explaining these correlations are on offer.
        http://phys.org/news/2011-10-link-solar-winter-weather-revealed.html
        The 2011 Nature Geoscience article itself:
        http://www.nature.com/ngeo/journal/v4/n11/full/ngeo1282.html
        I wonder if you find this paper junk as well.

      • Maximus, there is certainly an overproduction of ‘scientific’ papers. Why don’t you wade through one of your choice and perform a Willis-like analysis?

      • Gloateus Maximus December 11, 2015 at 11:56 am

        Willis,
        Thanks.
        Stipulating that you actually did find insurmountable problems with the 17 studies you’ve analyzed here, do you feel that a fraction of one percent of papers is liable to be representative of thousands to tens of thousands of relevant studies in the past century or so?

        No. But I do think my life is not long enough to analyze tens of thousands of studies.
        I also think that they are representative of the studies that folks have told me are the best, most solid studies on the subject.
        But hey, if you think I’m wrong, send me the TWO LINKS (one to the paper and one to the data) of the paper that you think is the pick of the litter, the one that shows how sunspots affect some weather variable (temperature, pressure, sea level, etc.) here at the surface (i.e. not in the ionosphere) and I’m happy to look at it … however, a single link (as you’ve sent me above) is not adequate. I cannot analyze the paper without the data. I look forward to the two links to whatever paper and data you think is the best, most scientifically solid study of the question.
        Please be aware that the 17 studies above are only the ones that I took the trouble to deconstruct in a detailed manner. For example, you sent me the 2014 paper on the effects of solar variations on reanalysis climate models … and it was not worth deconstructing, it just proves that the sun affects climate models. So the count is more like thirty or forty studies that I’ve examined and found wanting, either for obvious reasons (as in your monsoon link above) or after careful analysis as in the other 17 studies. And it doesn’t include my unpublished analysis of the original Herschel claims that British wheat prices followed the sunspots … I didn’t publish that one because I found a better falsification in the literature.
        And after finding nothing in those forty or fifty of what folks have told me are the best papers, all I can say is what I’ve said many times before:

        There may be a solar-cycle related effect in the weather data somewhere, but despite extensive examination of what people say are the best studies, I haven’t found it.

        Please note that this is quite different from saying that such an effect doesn’t exist, which is not provable.
        Best regards,
        w.

      • Willis,
        OK. Here’s one by Japanese scientists from 2009:
        Influence of the Schwabe/Hale solar cycles on climate change during the Maunder Minimum
        http://journals.cambridge.org/download.php?file=%2FIAU%2FIAU5_S264%2FS1743921309993048a.pdf&code=b2a101953465b8eebb451e4f5abcf862
        George,
        I find nothing to fault in the data analysis of this paper. As for the validity of its data, IMO proxies are often better than the cooked to a crisp, so-called instrumental “observations”, adjusted beyond recognition by their own mothers.

        Willis,
        The study you cited on Herschel’s anticorrelation of sunspots with wheat prices doesn’t falsify the connection, but finds it statistically insignificant. This conclusion however is at odds with other recent work, including papers cited in the analysis you linked, so it’s hardly dispositive:
        Pustil’nik, L. A., and G. Yom Din (2004a), Influence of solar activity on the state of the wheat market in medieval Europe, Solar Phys., 223, 335–356.
        Pustil’nik, L. A., and G. Yom Din (2004b), Space climate manifestations in Earth prices – from medieval England up to modern U.S.A., Solar Phys., 224, 473–481.
        Pustil’nik, L. A., and G. Yom Din (2009), Possible space weather influence on the Earth wheat markets, Sun Geosphere, 4, 35–41.
        Pustil’nik, L. A., and G. Yom Din (2013), On possible influence of space weather on agricultural markets: Necessary conditions and probable scenarios, Astrophys. Bull., 68, 107–124.

      • Willis,
        It’s absurd to claim that people have presented you “the best papers” out of thousands or tens of thousands over the past 215 years, including by some of the best scientists of those centuries.

      • Gloateus Maximus December 13, 2015 at 6:28 am

        Willis,
        OK. Here’s one by Japanese scientists from 2009:
        Influence of the Schwabe/Hale solar cycles on climate change during the Maunder Minimum
        http://journals.cambridge.org/download.php?file=%2FIAU%2FIAU5_S264%2FS1743921309993048a.pdf&code=b2a101953465b8eebb451e4f5abcf862

        Let me remind you of my request:

        But hey, if you think I’m wrong, send me the TWO LINKS (one to the paper and one to the data) of the paper that you think is the pick of the litter, the one that shows how sunspots affect some weather variable (temperature, pressure, sea level, etc.) here at the surface (i.e. not in the ionosphere) and I’m happy to look at it … however, a single link (as you’ve sent me above) is not adequate. I cannot analyze the paper without the data. I look forward to the two links to whatever paper and data you think is the best, most scientifically solid study of the question.

        I’m sure you can see the problem with your response …
        w.

      • Gloateus Maximus December 13, 2015 at 11:46 am

        Willis,
        It’s absurd to claim that people have presented you “the best papers” out of thousands or tens of thousands over the past 215 years, including by some of the best scientists of those centuries.

        Since I never claimed that, it’s unclear what you are referring to …
        w.

      • Gloateus Maximus December 13, 2015 at 10:08 am

        Willis,
        The study you cited on Herschel’s anticorrelation of sunspots with wheat prices doesn’t falsify the connection, but finds it statistically insignificant. This conclusion however is at odds with other recent work, including papers cited in the analysis you linked, so it’s hardly dispositive:
        Pustil’nik, L. A., …

        No paper in and of itself is dispositive. And while you may think Pustil’nik is the cat’s pajamas, I note that you have not answered the authors’ objections to Pustil’nik in the paper I cited, viz (emphasis mine):

        On the other hand, our results stand in curious juxtaposition with the reports of Pustil’nik and Yom Din [2004a, 2004b, 2009, 2013]. They cited Herschel [1801] as motivation for their analyses. However, they identified a seemingly significant tendency for increases in American wheat prices from solar-cycle minimum to subsequent maximum (MinMax, cycles 15–22, years 1913–1989), the opposite of the relationship suggested by Herschel. We have reproduced the one-sided Student t test significance probability reported by Pustil’nik and Yom Din [2004b, pp. 479–480] that was based on USDA wheat price data, p=0.0335. But when we add USDA data for cycle 23, 1996–2000, and earlier NBER wheat price data, cycles 9–14, 1843–1907, we obtain a much larger probability, p=0.2907.
        Since Pustil’nik and Yom Din were not apparently testing a hypothesis corresponding exactly to Herschel’s MinMax price decreases, but were, it seems, open to consideration of a separate hypothesis for MinMax increases, it is more reasonable to use a two-sided t test. This doubles the probability to p=0.5814, which is not small and certainly not indicative of a statistically significant relationship between sunspot number and the American price of wheat. In a different study, Pustil’nik and Yom Din [2004a, pp. 347–350] performed an interval analysis on “bursts” in wheat prices. These intervals were derived using an 11-year filter, and so their statistical comparison with the actual distribution of solar-cycle durations was, at least partially, predetermined. We regard the results of Pustil’nik and Yom Din with skepticism.

        Seems like good reasons for skepticism to me, and since you’ve found no fault with their reasons, I’ll hold to that …
        w.

    • Thanks, David. Upon a quick reading, I fear that you have not applied either the Bonferroni correction or accounted for autocorrelation in your work. This renders it quite suspect.
      w.

  19. Willis, http://www.analog.com/library/analogDialogue/cd/vol40n1.pdf page 16 shows how much can be gained by adding noise to a signal (a time series effectively) prior to being digitised.
    https://www.silabs.com/Support%20Documents/TechnicalDocs/an118.pdf shows how, by sampling more or less (daily average/monthly average/annual average etc.), changes in the resolution of the ‘whole’ system can be gained.
    Is adding noise to the system, either before or after data capture, equally valid? I have not yet determined that.

    • Adding noise to a signal prior to digitally quantising it is also called dithering. It is a way to overcome the limitations in the precision of the A-to-D sampling.

  20. Is this in line with the politically correct science?
    Will this study simply be yet another one of many good scientific papers that will never be considered by the IPCC when it does its Sixth Assessment Report?

  21. Thanks for the post Willis.
    Have to figure out how to apply this type of analysis to my data: Center of Pressure data on a force plate. But it looks like a promising method of analysis. I’ve typically been using Correlation Dimension, Sample Entropy, and Higuchi’s Fractal Dimension.

  22. Picking cherries no longer is sufficient. To get data that supports the preordained conclusion now requires picking fruit salad.
    The periodogram shows peaks in the Fourier space that are not sharp. There are ‘sidebands’ of the main peak, especially at the longer periods such as 2.5 yr and 3.7 yr. The ‘beating’ of these frequencies is responsible for the periodic modulation of periodic signals (e.g. C2, C3, and C4 in figure 5). To some extent there is a contribution to these rounded peaks due to the truncation of the raw data at the ends of the domain. These rounded peaks are what is passing through the computational ‘bandpass filters’ as closely related cycles.
    A more ‘elegant’ approach, in my opinion, would be to work in the Fourier space, to sequentially identify statistically significant peaks (amplitude > 2 sigma for the linear regression of the FFT) one at a time, separate them from the data, subtract the Inverse FFT of the peaks (boxcar truncation) from the original data, and reprocess the reduced data set. This can be repeated as long as statistically significant peaks can be identified within the FFT space.
    A familiarity with the FFTs of pure noise of various types is helpful here. The FFT of white noise is ‘flat’ – the linear regression of amplitude vs frequency (alternatively amplitude vs period) has no statistically significant slope. Brown noise has an FFT in which the ‘envelope’ of amplitude shows a linear dependence on frequency/period, as does the periodogram of this data when one looks at the non-peak amplitudes in Fourier space.
    But what do I know?
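    The sequential peak-removal procedure described above is easy to prototype (a toy sketch with synthetic data; a crude mean-plus-4-sigma amplitude cut stands in for the regression-based significance test):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1024
t = np.arange(n)
# Two exact-bin periodic components buried in white noise
data = (2.0 * np.sin(2 * np.pi * 50 * t / n)
        + 1.5 * np.sin(2 * np.pi * 11 * t / n)
        + rng.normal(0, 1, n))

residual = data.copy()
found = []
for _ in range(10):                           # safety cap on iterations
    spec = np.fft.rfft(residual)
    amp = np.abs(spec)
    k = int(amp.argmax())
    if amp[k] <= amp.mean() + 4 * amp.std():  # crude significance cut
        break                                 # no significant peak remains
    found.append(k)
    peak = np.zeros_like(spec)
    peak[k] = spec[k]                         # isolate just this peak (boxcar)
    residual = residual - np.fft.irfft(peak, n)  # subtract its inverse FFT

# The two injected frequencies (bins 50 and 11) are identified first
```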

    • tadchem said December 11, 2015 at 9:02 am in part “ The FFT of white noise is ‘flat’ “
      Surprisingly, NOT so. Here I am NOT referring to the fact that any one white sequence is FAR from flat. Rather, even if you average the FFT magnitudes of perhaps a million different white sequences, there will always be a (very significant) dip in the magnitude at DC (k=0), and at k=N/2 (only when N is even). This dip goes down to just about 90% of the “plateau” level: it’s 2(sqrt(2))/pi or 0.9003163. For k=0 and k=N/2 (for N even) we have, for a REAL time signal (the usual case), an ordinary one-dimensional “Drunkard’s walk” with the mean of a folded normal distribution. For all other k, the “walk” is two-dimensional and the mean is that of a Rayleigh distribution.
      http://electronotes.netfirms.com/NotFlat.jpg
      A full detailed description from 2012 is here:
      http://electronotes.netfirms.com/EN208.pdf
      You probably will want to verify this is TRUE before considering my explanations! Email me for additional details. Yes – another way the FFT can sneak up and eat your lunch on you.
      Bernie
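      Taking up that invitation, the 2·sqrt(2)/pi dip is straightforward to check with a quick Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
n_seq, n = 20_000, 64
# Average the FFT magnitudes of many independent white (real, Gaussian) sequences
mags = np.abs(np.fft.fft(rng.normal(0, 1, (n_seq, n)), axis=1)).mean(axis=0)

plateau = mags[1 : n // 2].mean()  # ordinary bins: Rayleigh-distributed magnitudes
dip_dc = mags[0] / plateau         # DC bin: folded-normal magnitude
dip_ny = mags[n // 2] / plateau    # k = N/2 bin (N even): also folded-normal

# Both ratios come out near 2*sqrt(2)/pi = 0.9003..., not 1.0
```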

  23. As always, quite interesting, and thank you again.
    Some time ago I read papers on how the sensory organs (I do not remember of which animals) make use of random thermal noise to sharpen discrimination. This is at least “associatively” related, if not directly comparable in detail, to the method you presented here.

  24. Recordings on CD used to have noise added to the analogue signal to cover up quantisation distortion (audible volume stepping) at low signal levels to compensate for the logarithmic response of the ear.
    Probably still do, in fact.

  25. Excellent article Willis. Thank you.
    What I noticed in your journey through EEMD and CEEMD land was that most of the solar signal appears in the reconstructed data.
    Another tidbit caught my eye. That is the Amazon river output has increased.
    From around 1930 to 1948-9 the Amazon river flow appears to be approximately 16,000 – 17,000 cubic meters of water per second. This range appears limited in variability.
    From approximately 1968 to the current period, the flow has greater variability. What is interesting is how the overall flow increases during that time period, with peaks as high as 20,000 cubic meters/sec. The average also appears to be higher, perhaps by 1,000 cubic meters/sec, though that is a guess.
    Looking around seeking Amazon River discharge rates was enlightening.
    From Earth Observatory:

    “…In large floodplain-dominated rivers such as the Amazon, variations in water heights create a signal that is detectable by 37 GHz passive microwave radiometers. From these signals, Vorosmarty was able to piece together a picture of river discharge, pixel by pixel, including input from tributaries and periodic differences in water height due to seasonal variations.
    Ground-based meteorological station data from the Global River Discharge Database (RivDIS), archived at the Oak Ridge National Laboratory DAAC and the Brazilian Departmento Nacional de Aguas e Energia Eletrica (DNAEE), were used to calibrate and validate hydrology models for the river. From these data, scientists created models from which they could generate a time series of discharge and runoff simulations.
    When it came time for ground-truth, Vorosmarty and colleagues found that the satellite imagery and hydrology models agreed. Both data sources reflected the progressive increase in discharge and magnitude, as well as the influence of tributary inflow. Measurements of seasonal high and low water levels also matched. Both also showed interannual variations in discharge caused by the 1982/1983 El Niño event…”

    Research carried out in:

    Vorosmarty, C. J., C. J. Willmott, B. J. Choudhury, A. L. Schloss, T. K. Stearns, S. M. Robeson, and T. J. Dorman. 1996. Analyzing the discharge regime of a large tropical river through remote sensing, ground-based climatic data, and modeling. Water Resources Research 32(10): 3137-3150.

    So, beginning around 1996-1998, Amazon River discharge rates are measured via satellite.
    Meaning that Willis was trying to analyze data originating from four, perhaps five data sources, not three:
    Two separate reconstruction locations;
    One, likely two, different river-mouth estimation series (the second estimation being when the series begins again in 1968-9, until replaced by satellite measurement);
    One satellite measurement series.
    Without trying to locate Vorosmarty’s paper, I am left wondering how the satellite measures water levels where fresh water meets salt water and South Atlantic tidal forces.
    I no longer worry about the Amazon River discharge rates perhaps increasing, now that the latter period is measured by satellites.
    Doesn’t anyone in NASA ever wonder about the conjugal habits of their data?
    I haven’t bothered to seek the ‘Global River Discharge Database’ yet. Because that way lies more work, as I am curious just what Amazon river discharge levels are used in calculating sea level rise.

  26. Willis, you seem to find surprises in every study you undertake and you don’t tend to leave stones unturned. Thank you again for your fine work. I’m left with some questions as I’m sure you are, but maybe not the same questions.
    1) I haven’t read the paper and won’t pay for it, but I’m puzzled if they didn’t look at the baseline causes of flooding in this mighty river. First, the baseline cause is precipitation: rain and snowfall/melt. If this information is available, it should give an even more convincing connection between sunspots and flooding, if there is one. Also, flooding is unlikely to be a simple “what goes in must come out” situation. In dry years, there is low ground (swamps) that may drain, and in some basins, aquifers decant. Heavy precipitation may be partly or largely swallowed up by these, and by thirsty vegetation, by the time the water has traversed a good part of the basin. Also, after a dry period, sloughing of the banks is likely to dam water temporarily or divert it through other low ground, causing delay. Precipitation is best for the analysis.
    2) You said there were some other methods that might have been used. The suspicious nature I’ve developed over the past decade or so gives me cause to think that the authors likely tried the usual methods and had to go to an unusual one to get their fit. Also, with two stations in the stew, I’m seeing red flags. I didn’t grasp the reason for this but assumed one would be a check for the other, or some such (perhaps revealing the data hiding in the noise). I can imagine if they initially tried several other stations on the river, they could cull out the ones they didn’t like and go with what they did go with. This seems to me a good way to get a high probability of a fit that probably is meaningless.
    In any case, I note you are intrigued with the method and that tells me you will be reporting this in detail at some later date. I look forward to it.

  27. Rainfall time series do not contain periodic components other than those due to the seasonal cycle. Once that is eliminated, the remaining variability is entirely stochastic; introducing white noise adds nothing to the discovery of random signals. Apparently unaware of rigorous cross-spectral methods to establish relationships between stochastic time series (and entirely ignoring the very apparent lack of coherence between Amazon rainfall and the sunspot cycle), Antico and Torres resort to analysis methodology that is as quaint as it is inappropriate. Sadly, analytically misguided attempts to squeeze a tendentious result out of the data are par for the course in “climate science.” They’re simply not worth the baffled attention given them.
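    For readers curious what the “rigorous cross-spectral methods” alluded to here look like in practice, below is a minimal Python sketch of coherence estimation between two noisy series sharing a common periodic driver. The series, frequency, and noise levels are all invented for illustration; a genuine solar-discharge link would be expected to show up as a coherence peak at the solar frequency.

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(4)
n = 4096
# Shared forcing at 0.05 cycles per sample (frequency chosen arbitrarily).
drive = np.sin(2 * np.pi * 0.05 * np.arange(n))

a = drive + rng.normal(size=n)        # series 1: forcing buried in noise
b = 0.5 * drive + rng.normal(size=n)  # series 2: weaker response, independent noise

# Magnitude-squared coherence: near 1 at frequencies where the two series
# are linearly related, near 0 where they are unrelated noise.
f, Cxy = coherence(a, b, nperseg=256)
peak = Cxy[np.argmin(np.abs(f - 0.05))]
```

The coherence spectrum sits near zero everywhere except at the shared forcing frequency; that is the kind of test the commenter says the paper never applies.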

    • Well done, that’s exactly the unsubstantiated assumption that the whole of AGW is based on: “the remaining variability is entirely stochastic.”
      Everything is “stochastic” except CO2, so whatever happens it must be due to CO2. QED.

    • Thank you 1sky1, that would answer my question as to why they didn’t use precipitation/snow melt. What else can they imagine causes flooding of a river? Could they really have not asked this question first? And if the answer was 1sky1’s information above, then why do the study?
      One possible way stochastic precipitation could be irrelevant to the flood cycle(?) would be if snow tended to have net accumulation for 10 years and then melts were greater at the end of a sunspot cycle. But surely this would be well known for a long time and would be obvious without the need for iffy methods to tease out a signal. This business of using interacting data from two stations on the river is what has got me. One station in the lower reaches of the river should do the job if it has any significance at all. What is the difference between floods on the river between sun cycle maxima and minima? Volumetrically it must be small or it would be well known long before now. If small, we get into the problem of “small differences” with large error bars. I’m assuming there aren’t numerous high-precision stream flow gauges, snow pack and melt gauges. This isn’t Squaw Valley.
      I can understand why Willis hasn’t been able to find any sunspot cycle signal in climate. With only a small change (1%) in TSI, and large error bars in all climate data, it’s no more than just noise. One percent sounds nonexistent when you are measuring with axe handles and plugs of chewing tobacco and then adjusting the data with an algorithm daily. I think this quest to find a connection with the sunspots is becoming pathological. Whatever there may be has to be small. Other very long-term variations in the sun’s output are another thing, but Willis has already pointed out that the much stronger annual variations from orbital eccentricity don’t even show, that clouds forming 15 minutes earlier in the afternoon can wipe out this difference in potential heating. This all makes this paper ridiculous.

  28. Willis Eschenbach wrote December 10, 2015 at 11:54 pm:
    “Sorry, guys, but the original paper differentiates this use of white noise from the “stochastic resonance” use. See page 7 for their discussion of the differences between the two.
    w.”
    I assume”original paper” meant Zhaohua Wu and Norden E. Huang (2005) and that the relevant comments on page 7 are:
    “Adding noise to the input to specifically designed nonlinear detectors could be also beneficial to detecting weak periodic or quasi-periodic signals based on a physical process called stochastic resonance. The study of stochastic resonance was pioneered by Benzi and his colleagues in early 1980’s. The details of the development of the theory of stochastic resonance and its applications can be found in a lengthy review paper by Gammaitoni et al. (1998). It should be noted here that most of the past applications (including these mentioned earlier) have not used the cancellation effects associated with an ensemble of noise-added cases to improve their results.”
    I had already considered this, and it was the basis on which I commented about averaging stochastic resonance. It seemed quite clear, and I don’t see what else it could mean.
    It is in fact, apparently, according to my quick look below, a pretty good idea, although I doubt that there is any averaging in most SR situations except as I noted for audio dither. But I am impressed. The figure shows the average output of 400 runs of SR where the signal is five cycles of a sinewave of amplitude 0.1, the added noise is uniform of amplitude 0.5, and the threshold (non-linear detector) is set at 0.5. Hence there is a detection when the noise boosts the sinewave peaks to between 0.5 and 0.6, which seems to be an average of about 25 of the 400 trials for any time point. Here is my result:
    http://electronotes.netfirms.com/AvStoRes.jpg
    I am impressed. Beyond this finding, the local jargons get in the way (as Joe Born suggested).
    Bernie
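    Bernie’s experiment is easy to reproduce. Here is a Python sketch rather than his own code; the exact noise distribution he used is an assumption (uniform on [-0.5, 0.5) is assumed here), so the detection counts will not match his figure exactly, but the shape of the result is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 5, 500)             # five cycles of a unit-frequency sine
signal = 0.1 * np.sin(2 * np.pi * t)   # amplitude 0.1, far below the threshold
threshold = 0.5
n_runs = 400

# Nonlinear detector: fires only when signal + noise crosses the threshold.
detections = np.zeros_like(t)
for _ in range(n_runs):
    noise = rng.uniform(-0.5, 0.5, size=t.shape)
    detections += (signal + noise >= threshold)

avg = detections / n_runs  # average detection rate at each time point
# The average traces the positive peaks of the sub-threshold sine:
# the detector never fires at the troughs and fires most often at the crests.
```

Plotting `avg` against `t` gives a train of bumps at the sine peaks, which is what Bernie’s linked figure shows.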

    • Bernie Hutchins:I assume”original paper” meant Zhaohua Wu and Norden E. Huang (2005) and that the relevant comments on page 7 are:
      thank you for your post. That looks like a good example. I am impressed.

  29. Nobody’s mentioned this yet. This whole EEMD process was invented to deal with (among other things) non-stationary data. In other words, data where the solar signal is there…but not there either all the time or at the same frequency, phase or amplitude all the time. If the signal is not there all the time then Fourier analysis would have a harder time seeing it. That would have the effect of spreading out the energy over a range of frequencies and lowering the peaks, thus making it more difficult to spot.
    However, what it means if the solar signal really is there but varies significantly over time is beyond me. Anyway, just an observation here.
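    The point about energy spreading is easy to demonstrate numerically. A Python sketch (signal lengths and frequencies invented for illustration) comparing a steady tone with one whose frequency drifts over the record:

```python
import numpy as np

n = 1024
t = np.arange(n) / n

steady = np.sin(2 * np.pi * 50 * t)                 # constant-frequency signal
drifting = np.sin(2 * np.pi * (50 * t + 5 * t**2))  # frequency drifts from 50 to 60

def peak_fraction(x):
    """Fraction of total spectral power in the single largest FFT bin."""
    p = np.abs(np.fft.rfft(x)) ** 2
    return p.max() / p.sum()

# The steady tone concentrates its power in one bin; the drifting tone
# spreads the same energy over a band of bins, lowering the peak and
# making it harder to spot against background noise.
```

This is the Fourier-side cost of non-stationarity that EEMD is claimed to sidestep.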

  30. Dear Willis,
    As a late-career physicist who routinely uses advanced signal-processing techniques, all I can say is that this approach is BS. First, the addition of truly “stochastic” noise to data is a well known technique and has applicability IF the recording technique is nonlinear or quantized. This should be obvious in the case where a digitization recording technique has a bit resolution SMALLER than the signal. The addition of STOCHASTIC noise LARGER than the signal can raise the signal level above the one bit resolution of the digitizer. Then a repetitively recorded signal can eliminate the noise through ensemble averaging (whose definition includes the removal of stochastic noise – but I digress). FUNDAMENTALLY, the addition of stochastic noise to a signal is easily discernible in Wigner Space where the data is spread out in phase space. Stochastic noise is 2-D randomly scattered in Wigner Space. Localization of energy density in Wigner space corresponds to true “non-stochastic” information. Stochastic noise addition will not help this situation with the exception noted above. Finally, the use of multi-resolution decomposition is well known in Wavelet Theory. The approach described in the article is a mathematically non-rigorous decomposition. True wavelet mathematicians would laugh at their approach. (See Ingrid Daubechies’ books.) I don’t laugh, I just shake my head. There are numerous ways to mathematically attack this problem in a rigorous fashion. You described one approach. The other is a true time domain autocorrelation approach which would give nearly the same result. — I have no bias as to the result of the authors’ study but such a data analysis used in a physics article in any reputable journal (Phys Rev comes to mind) would have NEVER made it past the reviewers. C students – what can you say. Thanks so much for all of your effort. It is always a pleasure.
    Rick
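    The one mechanism Rick concedes is legitimate (sub-resolution signal, stochastic noise larger than the signal, repeated recordings, ensemble averaging) is worth seeing work. A Python sketch with invented signal, noise level, and run count:

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 500, endpoint=False)
true = 0.3 * np.sin(2 * np.pi * 3 * t)   # signal smaller than half a quantizer step

def record(x):
    """Coarse digitizer: rounds to whole quantizer steps (step size = 1)."""
    return np.round(x)

# A single noiseless recording loses the signal entirely: everything rounds to zero.
silent = record(true)

# Add stochastic noise LARGER than the signal, record repeatedly, and
# ensemble-average: the noise pushes each sample over the rounding boundary
# in proportion to its sub-step value, and averaging cancels the noise out.
n_runs = 2000
avg = np.mean(
    [record(true + rng.normal(0.0, 1.0, t.shape)) for _ in range(n_runs)],
    axis=0,
)
```

The averaged record closely follows the original sub-resolution sine, even though every individual recording is pure staircase noise.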

    • rbspielman December 11, 2015 at 4:30 pm

      Dear Willis,
      As a late-career physicist who routinely uses advanced signal-processing techniques, all I can say is that this approach is BS.

      Rick, as a late-career troglodyte who routinely uses a stone for a hammer, all I can say is that you are arguing with the wrong man. At present, there are 31,000 hits on google for “EEMD signal”, and another 3,600 hits on Google Scholar. Go impress them all by telling them all about how you are a “late-career physicist”, and let them know that the 3,600 scientific studies referencing EEMD are “BS”. Here are the top four from google’s list:

      Application of the EEMD method to rotor fault diagnosis of rotating machinery
      Y Lei, Z He, Y Zi – Mechanical Systems and Signal Processing, 2009 – Elsevier
      Empirical mode decomposition (EMD) is a self-adaptive analysis method for nonlinear and non-stationary signals. It may decompose a complicated signal into a collection of intrinsic
      mode functions (IMFs) based on the local characteristic time scale of the signal. The EMD …
      Cited by 205
      Comparing the applications of EMD and EEMD on time–frequency analysis of seismic signal
      T Wang, M Zhang, Q Yu, H Zhang – Journal of Applied Geophysics, 2012 – Elsevier
      The Hilbert–Huang transform (HHT) is a novel signal analysis method in seismic exploration. It integrates empirical mode decomposition (EMD) and classical Hilbert transform (HT), which can express the intrinsic essence using simple and understandable …
      Cited by 42
      EEMD method and WNN for fault diagnosis of locomotive roller bearings
      Y Lei, Z He, Y Zi – Expert Systems with Applications, 2011 – Elsevier
      … Features are extracted from the sensitive IMF of EEMD in this method. … Because vibration signalscarry a great deal of information representing mechanical equipment health conditions, the vibration based signal processing technique is one of the principal tools …
      Cited by 68 Related articles
      A complete ensemble empirical mode decomposition with adaptive noise
      ME Torres, M Colominas… – … , Speech and Signal …, 2011 – ieeexplore.ieee.org
      … ing and processing non-linear and non-stationary signals. The new method was successfully tested on artificial and real signals. The method here proposed has the advantages of requiring
      less than half the sifting iterations that EEMD does, and that the original signal can be …
      Cited by 134

      I note that just these four studies of what you call “BS” have been cited by 439 other studies … I suppose those are all BS too …
      Report back with your findings, Rick. I’m interested in what those folks say when you tell them that they don’t know what they are talking about, and that they are peddling BS and should read Daubechies …
      w.

      • [Reply: ‘Chaam Jamal’ is a sockpuppet. Also posts under the name ‘Richard Molineux’ and others (K. Pittman, etc.) As usual, his sad life writing comments has been completely wasted, as they are now deleted. –mod]

      • Chaam Jamal December 11, 2015 at 5:34 pm

        “there are 31,000 hits on google for…..”

        Willis … do you realize how meaningless that is?

        You will get 44,300 hits for “Eschenbach is a jerk”

        Yeah, but that’s just because it’s true …
        It appears you missed the small print. When you do a Google search for “Eschenbach is a jerk”, it says at the top of the page:

        No results found for “eschenbach is a jerk”.
        Results for eschenbach is a jerk (without quotes): …

        Since Google ignores “a” and “is”, you’ve counted web pages with the words “Eschenbach” and “jerk” on them somewhere. I note that the top hit is for Canadian circle dancing … me, I do much more focused searches than that.
        So I’d agree that in a meaningless search such as your example, the Google results are meaningless. I encourage you to do the narrow focused search I did for EEMD signal, and note what you find. There is not one single article in the top thirty that is not about the subject under discussion, including things like “Search for the 531-day-period wobble signal in the polar motion based on EEMD”. I got tired after reading the names of the first thirty or so of them. I strongly encourage you to do the same.
        In any case, I started by mentioning 31,000 Google hits in the hopes he’d look at them. Then I noted the 3,600 Google Scholar hits, and I listed the top four of those google scholar hits. They show the EEMD analysis being used in the real world. I have conclusively shown that EEMD is not “BS” as the commenter rashly claimed.
        w.
        [Reply: ‘Chaam Jamal’ is a sockpuppet. Also posts under the name ‘Richard Molineux’ and others (K. Pittman, etc.) As usual, his sad life writing comments has been completely wasted, as they are now deleted. –mod]

      • And about 27,300 results for “Chaam Jamal is a jerk” too.
        You don’t understand how to use Google, do you?
        [Reply: ‘Chaam Jamal’ is a sockpuppet. Also posts under the name ‘Richard Molineux’ and others (K. Pittman, etc.) As usual, his sad life writing comments has been completely wasted, as they are now deleted. –mod]

  31. All this is way, way over my head, but still fascinating. I have a question: Does it make a difference what kind of noise is used? What if one uses red noise instead of white?
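    For anyone wanting to experiment with this question, red noise is simple to generate from white noise; the Python sketch below (an AR(1) process with an invented memory coefficient) shows the defining difference, which is sample-to-sample memory. Whether EEMD’s results change when red noise is substituted is an empirical question; the method as published adds white noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000
white = rng.normal(size=n)

# Red noise as an AR(1) process: each sample remembers the previous one,
# so power piles up at low frequencies instead of being flat.
phi = 0.9
red = np.empty(n)
red[0] = white[0]
for i in range(1, n):
    red[i] = phi * red[i - 1] + white[i]

def lag1_autocorr(x):
    """Correlation between each sample and its successor."""
    x = x - x.mean()
    return float((x[:-1] * x[1:]).sum() / (x * x).sum())
```

White noise has essentially zero lag-1 autocorrelation; the red series is strongly correlated with itself, which is why it can mimic slow “cycles” in short records.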

  32. Thanks for this example of thoughtful confirmation and extension of previous work…exactly what the climate community, or any other scientific community, should be doing.
    I’ve never used R before, only Matlab for frequency and time domain analyses and Minitab for statistical inference. This has encouraged me to take a look at R. Thanks again.

  33. A form of this matter of noise addition to improve detection was put to me as follows by an analytical chemist in my lab about 1970. We were using the new technique of atomic absorption spectrometry on an instrument with a logarithmic calibrated dial/needle readout.
    He theorised “At very low concentrations the needle barely moves up from zero. Therefore, a bias is introduced because all readings below zero are assigned a zero, all just above have discrete values and therefore the mean of repeated readings is affected. If, before analysis, we add a small, known amount of extra analyte, we will shift the base from zero and allow an unbiased average.”
    Eventually he agreed that the step of addition of analyte was equivalent to adding a source of noise and more error; and that in this case, the situation was made worse by taking the meter needle into the more compressed part of the visual scale, adding more noise through worse visual discrimination.
    In the case being discussed by Willis, one approach is to categorise cases such as this and then strip them from the list of possible ways that the signal is allegedly enhanced.
    Other bloggers have been attempting this by reference to human detection, such as adding noise to CD music, adding noise to visual imagery, dithering, etc. Examples that rely on human response should not be used because only a mathematical expression can be devoid of human frailties. The mathematical approach should avoid relative comparisons like whether the Floyd-Steinberg or Stucki dither method is the best. We need illustrations of how the addition of (first, perhaps) white noise has improved signal strength, expressed in quantitative mathematics.
    (Unrelated factors have not allowed me to study the original paper yet – please excuse, I am trying. The example I gave is not trivial. I suspect it applies at least in part to sampling sub-pixel sizes.)
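    The chemist’s problem described above is easy to simulate. The numbers in this Python sketch are invented, but the bias mechanism (readings clipped at zero, then un-biased by a known added offset) is exactly as described:

```python
import numpy as np

rng = np.random.default_rng(2)
true_conc = 0.05   # true analyte level, just above the instrument's zero floor
sigma = 0.2        # measurement noise
n = 100_000
noise = rng.normal(0.0, sigma, n)

# Readings below zero are recorded as zero, so the mean of repeated
# readings is biased high.
clipped = np.maximum(0.0, true_conc + noise)

# Spiking with a known 1.0 of analyte moves the readings off the floor;
# subtracting the spike afterwards leaves an unbiased mean.
spiked = np.maximum(0.0, 1.0 + true_conc + noise) - 1.0
```

As the reply below notes, this only works if the spike itself is measured accurately; any error in the added amount passes straight through to the result.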

    • Thanks for the example, a great lesson in thinking.
      The problem in the lab would be to ensure precision and accuracy in measuring the small amount of extra analyte. If the resulting value after subtracting the bias is close to zero there is a risk of multiplying any error in measuring the analyte. In a lab with closely controlled conditions this may be routine.
      But in the wild, such precision is rare and that is the problem we find in climatology.

  34. There is also an improved method, called CEEMD, for “Complete EEMD”. It appears to be a significant improvement on the original EEMD method in the higher numbered intrinsic modes, and is able to reproduce the original signal with greater fidelity.
    The discoverers’ paper explaining the method is here. I tried the CEEMD method as an alternate analysis method in the head post, but the differences were minor and I didn’t want to have to explain the CEEMD method; start at the start, I figured … plus it’s slooow to compute, and the authors used EEMD, so that’s what I used.
    The CEEMD function is a part of the “hht” package in R.
    Anyhow, here’s a lovely test of the two methods from the paper above, which explores how the two methods decompose the Dirac delta function (a one-time spike in a signal):
    https://i0.wp.com/wattsupwiththat.files.wordpress.com/2015/12/ceemg-eemg-delta-dirac.jpg?w=640
    Gotta say, that is sweet as!
    w.

    • Wow, that is nice, Willis . Certainly puts a bit of context for those dismissing EEMD as BS.
      I’ve got the SFT of the flowts with and without removeann=T, nearly identical in where the peaks lie.
      Now I want to get the C3 spectrum at monthly resolution; yearly is too crude to be much use. I’ve tried to adapt the code but it’s not going too well. Could you suggest how to do this?
      Thx.

      • Mike –
        Yes – it looks neat. But don’t you think it looks very much like a wavelet decomposition? (Not to mention the Fourier decomposition of a Delta.)
        And I thought we were talking about detecting a very weak periodic component – exactly what a Delta function ISN’T. Your own work seems related to the periodic case?

      • yes Bernie. I think that plot shows the result is very similar to FFT, SFT or wavelet decomposition. The claim is that it is more robust in noisy data. Remains to be tested.
        It’s quite surprising in view of the difference in method but again reassuring that it is consistent with FT.
        I have now managed to do EEMD on the monthly Amazon data and the peaks are very close, though not always identical, to what I get using Willis’ SFT. The periods around 20-odd years changed by a couple of months; the others were exactly the same number of months.
        The band-pass effect may be very useful in noisier data, but here it was detrimental in the case of the 18.6y peak. On C6 it is in one transition band of the filter; on C7 it is on the other side. Due to the very poor resolution of the adjacent peaks, this meant that I could not determine the peak centre when it got bent downwards.
        Using SFT I get a peak value. Here is what I find from the Amazon rain data:
        I did the SFT of the monthly flowts with and without the annual cycle and it was nearly identical in where the peaks lay.
        8.91
        10.75
        13.5
        18.58
        21.25
        26.08 y
        I’m tempted to see 8.91y as the lunar apside period of 8.85 years, but I’d be a little cautious; I would have been more convinced had it come out nearer, given the length of the data sample.
        18.58 is clearly a lunar cycle. This year is a “minor lunar standstill”, which is when the latitude of the moon comes closest to the ecliptic (the plane of the solar system). Thus it also was in 1997, when the last major El Nino developed.
        10.75 and 21.25 are strongly suggestive of Schwabe and Hale, though the 10.75 is fairly small. Finding a solar signal and demonstrating it is small is probably as informative as not finding one.
        However, with equally strong 18.58 and 21.25 barely being resolved from each other it is clear that ignoring possible lunar influence will beggar attempts to find or dismiss a solar signal. Those periods will go from in phase to opposite phase in about 74 years.
        I was unimpressed by this paper when I read it last week. Having analysed the data I’m even less impressed since I think they failed to see the stronger Hale cycle and did not even consider the presence of lunar influence.
        Anyway, mighty thanks to Willis for digging into this EEMD method and making his code available.
        This is another useful tool to have available.

      • Just ran this through the spectral software I usually use and I don’t see any sign of the circa 21y peak!
        The circa 18 is still there, though nearer 18 than 18.6; circa 8y is 8.8, just the other side of 8.85 to that found with this technique. 10.8 is still there and is the strongest of the decadal-scale peaks.
        Looks like Amazonian climate is affected by both the sun and the moon, but I suppose the Aztecs could have told us that. 😉

    • Willis –
      What is the provenance of the paper you have linked? No date! It looks like IEEE format, but even students submit assignments using that! Thanks.

      • After I found the paper on the web, I checked it against the CEEMD implementation in R that I’d used in the head post, and found that they were both the same. Hang on … OK, the documentation for the package “hht” (Hilbert-Huang Transform) gives it as:

        Torres, M. E., Colominas, M. A., Schlotthauer, G., Flandrin, P. (2011). A complete ensemble empirical mode decomposition with adaptive noise. 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.4144-4147, doi: 10.1109/ICASSP.2011.5947265.

        w.

  35. Frederick Colbourne — you presented good work on the 11-year cycle in Indian monsoon rainfall.
    Solar radiation presented an 11-year cycle, but rainfall did not.
    All-India Southwest Monsoon rainfall presented a 60-year cycle. The third cycle started in 1987 [the starting year of the 60-year astrological calendar, lagging the Chinese 60-year astrological cycle by three years]. You can go backward and forward from 1987, take 10-year averages, and plot them on a graph. You get a clear sine curve.
    In the case of the southwestern parts of India, with the northeast monsoon plus pre-monsoon and post-monsoon cyclonic activity, the annual rainfall presented a 132-year cycle. The new cycle started in 2001. If you look at the data separately for SWM and NEM, they present 56-year cycles, but in opposite directions. The NEM 56-year cycle is also reflected in the cyclonic activity. Both NEM and SWM precipitation showed an increasing trend, basically because the first 66 years fall below the average and the next 66 years above it.
    So the Indian monsoon is a complicated system, as it is modified by orographic systems.
    Dr. S. Jeevananda Reddy

  36. Willis – What we really need here is:
    (1) A well-defined (toy) test signal with the results compared for FFT, stochastic averaging, and any proposed new method. Has this been done?
    (2) The essential, smiling first-class grad students at our doors to tell us what, if anything, it means and is good for!
    Alas – I am retired and only hoping for (1).

    • Thanks, Bernie. Since the EEMD method is widely used in industry and science, I’m not sure that we need some abstract test.
      Next, if you just take a look at the Google search results for “EEMD signal” (without quotes) you’ll find literally dozens of places where people are using it for a variety of things that it is good for.
      w.

      • Thanks Willis –
        I think this EEMD was new to you until a few days ago, and I had never heard of it until you posted. What I have not seen here or immediately adjacent is a consistent description of HOW one computes this. Thus I can not duplicate or assess the EEMD procedure at present. [ I was however, familiar with the use of noise to enhance detection (stochastic resonance) for some 22 years (thanks C.H.).]
        I do not work well in a mode where I have to rely on a canned program (in R or even in Matlab). I prefer to (first) write my own often cumbersome equivalent code. Alternatively, I can consider running a whole menagerie of test signals (sines, steps, ramps, sines plus noise etc) through the can to see WHAT the EEMD does DO. The last figure you posted of the decomposition of the Delta is the sort of thing that helps – but it does look very wavelet-like.
        An additional difficulty here is the application of an unfamiliar (to me) function to look for solar signals that most likely don’t even exist. These are poor tests. I don’t know what species of failure a negative result would mean.
        I do greatly appreciate your calling attention to interesting things hiding in the corners. Do keep doing that for us. Thanks.
        Bernie

      • Bernie Hutchins December 12, 2015 at 4:35 pm Edit

        Thanks Willis –
        I think this EEMD was new to you until a few days ago, and I had never heard of it until you posted.

        Indeed, until I read the Amazon paper I’d never heard of EEMD or CEEMD.

        What I have not seen here or immediately adjacent is a consistent description of HOW one computes this. Thus I can not duplicate or assess the EEMD procedure at present. [ I was however, familiar with the use of noise to enhance detection (stochastic resonance) for some 22 years (thanks C.H.).]
        I do not work well in a mode where I have to rely on a canned program (in R or even in Matlab). I prefer to (first) write my own often cumbersome equivalent code.

        I can only agree wholeheartedly with the idea that the more of the nuts and bolts we understand, the better off we are. Here’s the matchbook explanation of EMD (not EEMD), which is the underlying algorithm.
        Identify the local maxima and minima, and run a spline through them.
        Average the two splines, and subtract that curve from the signal. What remains is the first intrinsic mode.
        For EEMD, you do that say a hundred times or so, adding random noise each time, and average the results.
        However, those are generalities, not the details wherein the devil resides. But there’s a way around that. One of the beauties of R is that if you type in the function name, you get the nuts and bolts of the function. For example, here’s the standard deviation function “sd”, obtained by typing in “sd” and hitting Enter:
        > sd
        function (x, na.rm = FALSE)
        sqrt(var(if (is.vector(x)) x else as.double(x), na.rm = na.rm))
        Here, we can see that R calculates the standard deviation function “sd” as the square root of the variance. (The variable “na.rm” specifies whether to ignore missing values indicated by NA).
        If you use this same technique on the R function “EEMD”, you will assuredly have a “consistent description of HOW one computes this”. You’ll find the actual calculations are done by a call to a more basic function, “Sig2IMF”, which in turn calls a more basic function yet called “emd”.
        Alternatively, the R program “seewave” contains a function “discrets” which identifies the local minima and maxima. You could use that plus the normal spline function to create the two splines yourself.
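        The matchbook recipe above can also be sketched directly. This is illustrative Python, not the R “hht” code used in the head post, and it performs only a single sifting pass (real EMD iterates the sift against stopping criteria, and handles the envelope ends more carefully):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift_once(x):
    """One EMD sifting pass: cubic-spline envelopes through the local
    maxima and minima, then subtract the envelope mean from the signal."""
    n = len(x)
    i = np.arange(1, n - 1)
    maxima = i[(x[i] > x[i - 1]) & (x[i] > x[i + 1])]
    minima = i[(x[i] < x[i - 1]) & (x[i] < x[i + 1])]
    # Pin the endpoints so the splines do not run wild at the edges.
    maxima = np.concatenate(([0], maxima, [n - 1]))
    minima = np.concatenate(([0], minima, [n - 1]))
    idx = np.arange(n)
    upper = CubicSpline(maxima, x[maxima])(idx)
    lower = CubicSpline(minima, x[minima])(idx)
    return x - (upper + lower) / 2.0   # candidate first intrinsic mode

# A fast oscillation riding on a slow one: one sift largely recovers the fast part.
t = np.linspace(0, 2, 1000)
fast = np.sin(2 * np.pi * 5 * t)
slow = np.sin(2 * np.pi * 0.5 * t)
h = sift_once(fast + slow)

# EEMD wraps this whole procedure: repeat with independent white noise
# added each time, then average the resulting modes across the ensemble.
```

The extracted component tracks the fast oscillation, with the slow one left behind in the envelope mean; that separation-by-local-timescale is the heart of the method.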

        Alternatively, I can consider running a whole menagerie of test signals (sines, steps, ramps, sines plus noise etc) through the can to see WHAT the EEMD does DO. The last figure you posted of the decomposition of the Delta is the sort of thing that helps – but it does look very wavelet-like.

        Since the Dirac delta is a simple function, the last figure I posted was a proof of concept that the CEEMD function operates the same as the more familiar versions. Where it differs is with complex functions, where the signal varies in both frequency and amplitude over time. Here, for instance, is the CEEMD analysis of 315 years of sunspot data.
        https://i2.wp.com/wattsupwiththat.files.wordpress.com/2015/12/ceemd-sunspots-1700-2014.jpg?w=640
        Go figure …

        An additional difficulty here is the application of an unfamiliar (to me) function to look for solar signals that most likely don’t even exist. These are poor tests. I don’t know what species of failure a negative result would mean.

        Nor do I. However, in the original 2005 EEMD paper they showed how they used EEMD to establish the similarity by comparing the intrinsic modes C3 and up of the SOI and the El Nino CTI.
        In the Amazon paper, all they did was show that one of the intrinsic modes had power in the sunspot range, which they define as 9-13 years. To me, that seems far from adequate.

        I do greatly appreciate your calling attention to interesting things hiding in the corners. Do keep doing that for us. Thanks.
        Bernie

        You are more than welcome. Life is an unending mystery to me, and I can only report back on what I find in my wanderings.
        Regards,
        w.

      • Willis –
        Much thanks. Even as I admire your initiative and curiosity I also admire your patience and energy, quantities I myself find in decreasing supply in my dotage!
        So you show me that the EEMD is really TWO things: the detection and removal of patterns, in turn; and the addition of noise.
        In a noise-free signal, the pattern detection is essentially just what the human brain does without being specifically instructed. It is the basic approach of the reduced math (just “eyeball” it) Fourier Series articles of the 1950’s era Popular Electronics Magazine. We easily extend this, even to impromptu basis functions, and welcome the mathematical aids as encountered. So EEMD looks like a method of teaching a robot to do our own pattern recognitions. Fair enough.
        The addition of noise is not so clearly warranted, except as one compares it to classic stochastic resonance. In the case of a noise-free signal, we can contemplate adding intentionally generated random noise – for some purpose. In the case of an already noisy signal, why add more? Indeed! You may already have enough, or more than enough. For example, “adding dither” may be just a matter of not trying so hard to reduce noise coming in.
        With stochastic resonance however, it is CLEAR that the addition of noise HELPS. Consider the crayfish in the stream not wishing to encounter a bass who is looking for lunch. In a very quiet stream, the stealthy swishing tail of the bass may be insufficient to trigger the crayfish’s sensors. Now, the crayfish does not add noise, but the stream does: the turbulent “babbling” as water randomly encounters rocks. Enough that the peaks of the swish are now a thump, thump, thump. (My figure at December 11, 2015 at 3:39 pm) Essential here is the non-linearity (threshold). Once again (as with sonar), nature got there first.
        While the noise-free case of eyeballing components is perhaps efficient (the computation cost of a one-time process doesn’t really matter), it is not clear to me why and how noise helps, unless it is similar to stochastic resonance. Perhaps.
        As for computer code, I did find some Matlab functions which I need to study and put together.
        Thanks again Willis.
        Bernie
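Bernie’s crayfish example (a subthreshold signal becoming detectable once the stream’s own noise pushes its peaks over a sensory threshold) can be sketched in a few lines of Python. Every amplitude, threshold, and noise level below is invented purely for illustration:

```python
import math
import random

def threshold_detect(signal, threshold=1.0):
    """Hard nonlinearity: fire (1) only when the input exceeds the threshold."""
    return [1 if x > threshold else 0 for x in signal]

random.seed(42)
n = 20000
# Subthreshold "swish": peak amplitude 0.8 never reaches the threshold of 1.0
swish = [0.8 * math.sin(2 * math.pi * 0.5 * (i * 0.01)) for i in range(n)]

quiet = threshold_detect(swish)  # quiet stream: the sensor never fires

# "Babbling" stream: the same swish plus random turbulence
noisy = threshold_detect([s + random.gauss(0.0, 0.4) for s in swish])

# Noise-driven firings should cluster near the signal's peaks, not its troughs
near_peak = sum(d for d, s in zip(noisy, swish) if s > 0)
near_trough = sum(d for d, s in zip(noisy, swish) if s <= 0)
print(sum(quiet), near_peak > near_trough)  # 0 True
```

The threshold (the crayfish’s sensor) is the essential nonlinearity here; without it, adding noise could only hurt.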

  37. The Amazon doesn’t pass through a flow meter. River and stream flows are estimates.
    A common method the USGS uses is to set up a gauge to measure stream depth. The surface area of a cross section of the stream at the site is ascertained. The velocity of the stream is measured. From that the flow is estimated and tables produced giving so much flow for so much depth. Periodically that area of the cross section and the velocity are checked and the tables adjusted.
    Methods may have changed over the years. I don’t know how they measure the flow of the Amazon, but it all does remind me of surface stations and temperature.
    How accurately do the numbers reflect reality? The numbers, such as they are, may be the best we have, but adjusting an estimate using another estimate might give you more decimal places but no more accuracy.
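The stage-discharge procedure described above can be sketched as follows; the channel geometry, velocities, and “rating” here are toy numbers of my own, not anything from actual USGS tables:

```python
def build_rating(width_m, velocity_ms):
    """Toy stage-discharge rating for a rectangular channel:
    flow = cross-section area (width * depth) * velocity."""
    def rating(depth_m):
        return width_m * depth_m * velocity_ms  # m^3/s
    return rating

# Calibration from one hypothetical field survey ...
rating_a = build_rating(width_m=200.0, velocity_ms=1.5)
print(rating_a(4.0))  # 1200.0 m^3/s estimated from a 4 m gauge reading

# ... and a later re-survey finding the channel wider and slower.
rating_b = build_rating(width_m=220.0, velocity_ms=1.4)
print(rating_b(4.0))  # ~1232 m^3/s: same gauge depth, different estimate
```

The commenter’s caution follows directly: the published “flow” is only as good as the rating in force when the depth was read.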

  38. Willis,
    After more digging I will accept that this is a neat example of counter intuition. I still have uncertainties about identification of types of data when noise should be added, but that is me.
    It is a bad day when one does not learn something new.
    Geoff.

  39. For some reason it feels more like what they call “dithering” in digital audio, just in reverse.
    Dithering adds noise to mask known distortions when you convert the bit depth of an audio signal. It raises the general background noise, but it reduces the sharper, more audible distortion that bit reduction creates.
    The difference here is that dithering works from “known” outcomes of distortion patterns in the sound and fills them in with interpolated values on the reduced grid.
    Why do I say reverse? Here the signals are not known: it is as if the first step of dithering (applying noise) has already been done, but by an unknown process. Some cycles are known, so you can “guess dither” the second step with the Amazon values for any known cycle. However, as a Fourier analysis of the raw data shows, that may just be a small, nearly undetectable signal that mostly adds to the flattening toward a straight line.
    EEMD is like the second step of dithering but with the first step unknown; it can therefore magnify this signal to a level that is blown out of proportion. That is because the “noise” of your “audio” (here, the values of the Amazon river) being decomposed is unknown, and all the sine waves that compose that “noise” are unknown. There may therefore be a catch in this that can put scientists/mathematicians on the wrong foot, even if the whole methodology is scientifically or mathematically correct.
    So in short, you can find with EEMD any cycle that can link; however, as the interaction is unknown and the weather-related noise follows an unknown process, working backwards this way can be misleading. The best approach is to just apply a Fourier analysis to the raw data and then draw conclusions.
    It’s a bit like “torturing the data till it shows what you are looking for”, but in a scientific way: nothing is wrong with the methods used, but it can give false (mainly exaggerated) correlations.
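For readers unfamiliar with audio dithering, here is a minimal sketch of the basic effect Frederik describes: a sub-LSB level that a coarse quantizer simply erases survives, on average, once uniform dither is added before quantization. The step size, level, and dither distribution are illustrative choices, not any particular converter’s design:

```python
import random

STEP = 1.0  # quantizer step (one LSB)

def quantize(x):
    """Round to the nearest quantizer level (coarse bit-depth reduction)."""
    return STEP * round(x / STEP)

random.seed(1)
level = 0.3  # a signal level sitting below one LSB

plain = quantize(level)  # without dither, the level is simply erased

# With uniform dither of +/- half an LSB added before quantizing, the level
# survives as the *average* of the quantized output: distortion becomes noise.
n = 100_000
dithered = sum(quantize(level + random.uniform(-0.5, 0.5)) for _ in range(n)) / n

print(plain)     # 0.0
print(dithered)  # close to 0.3
```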

    • Frederik –
      I, like you, would like to understand EEMD based on ideas from digital audio like dither (and stochastic resonance). I have been unable so far to complete the connections!
      One thing that I think helps is to turn the problem around. You are not adding noise to a perfectly good signal, but adding a signal to perfectly good noise.
      Another thing is to recognize that the digital audio “art” is not so much science as it is (fiendishly clever) ENGINEERING aimed at a practical product. Essential here are the ideas of “over-sampling” and “noise-shaping” to manage noise. Over-sampling drastically increases the sampling rate during playback far far above the audio needs. This is by temporarily generating extra samples (locally on the fly by interpolation), using them, and discarding them. The “quantization noise” is then shaped into the high frequency range thus opened up (inaudible). In addition, the number of bits can thereby be reduced (resolution below LSB), even to just one bit! Deterministic, but the various waveforms look a lot like dither had been used.
      Comparing something new to something you already understand is of immense value.
      Bernie
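Bernie’s point that the deterministic “noise-shaping” trick produces dither-like waveforms can be illustrated with a toy first-order error-feedback requantizer (all parameters invented for the sketch):

```python
def noise_shape(samples, step=1.0):
    """First-order error feedback: subtract each sample's quantization error
    from the next input, pushing the error toward high frequencies."""
    out, err = [], 0.0
    for s in samples:
        v = s - err                 # feed back the previous error
        q = step * round(v / step)  # coarse quantizer
        err = q - v                 # error to carry into the next sample
        out.append(q)
    return out

x = [0.25] * 2000        # a constant quarter-LSB input
y = noise_shape(x)

print(sorted(set(y)))    # [0.0, 1.0]: the output is only ever two levels ...
print(sum(y) / len(y))   # 0.25: ... yet its average preserves the input
```

The output toggling is fully deterministic, yet it carries sub-LSB information, which is why the waveforms look as if dither had been used.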

      • Thanks Bernie, you nicely added to what I tried to explain when I said “in reverse”.
        Actually, I am a digital sound creator, and from any random noise in nature I can make beats and sounds that are melodic.
        To achieve this, I filter random noises such as thunder and other sounds down to a specific range so that they sound like a beat or an instrument: filtering out the noise until you are left with a specific set of harmonics, by “filtering out the interfering harmonics that create noise”.
        This is what I would call applying EEMD to pure noise: take out the blur that masks the sound and intonation you hear in, for example, a thunderclap, creating a more perfect sound by eliminating the other frequencies.
        So I do this daily, but just by ear, to get a good-sounding beat without too much interfering random noise.
        This principle strongly reminds me of what I do. The problem is that focusing on nonexistent sine waves in natural events will produce one anyway. The Amazon river data will show an 11-year cycle, or more accurately: “an 11-year sine wave of varying intensity.”
        I believe they hurried to their conclusions. However, with genuine noise it is not good to average “a wave of varying intensity that is inherent to the noise”.
        That effect explains why Willis can’t find an 11-year cycle when he splits the data in half. Here, instead of the cycles per second of audio, we are talking about cycles of 11 years and more.
        So this means that in 40 years the Fourier analysis may suddenly show that the 11-year cycle has been canceled out, or has increased in strength. Both can happen.
        I often use Fourier analysis in my sound work to find the “base wave”, so here is an analogy. The Amazon raw data contains one exception: seasonal variability, which gives a clear signal. Likewise, I would not be surprised to also see signals from other events in our solar system, which can intensify when well aligned or cancel each other out when aligned in opposition. This will create “noise” even in the clear signal: you see “noise” because the waves are not all of the same amplitude.
        So yes, with enough very precise data you will see our “celestial harmonics” in the Fourier analysis (in fact all of them). The question then becomes: “If this signal has a range of 6%, is it significant? Or is the seasonal variability signal so strong that it can easily and repeatedly cancel out the 11-year cycle?”
        When I look at the split data Willis provided, I suspect the second question will be answered with “yes”, which would make the influence insignificant.
        If this wave then changes amplitude like the base wave, varying in a chaotic pattern, it looks more like an artifact of the “noise of the seasonal cycle” than a real cycle.
        I hope I made some sense, as English is not my native language… if anything is unclear, just ask.
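Frederik’s suggestion, that a plain Fourier analysis of the raw data shows the seasonal cycle clearly while any 11-year signal is comparatively tiny, can be illustrated on synthetic “Amazon-like” data. The series below is entirely made up (a strong annual sine, a weak 11-year sine, and random noise); it only shows how a dominant seasonal peak looks in a spectrum:

```python
import math
import random

def dft_power(x):
    """Naive DFT power spectrum in pure Python (fine for short series)."""
    n = len(x)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spec.append(re * re + im * im)
    return spec

random.seed(7)
n = 110 * 12  # ~110 years of monthly values, as in the 1903-2013 record
flow = [100.0
        + 40.0 * math.sin(2 * math.pi * t / 12)        # strong seasonal cycle
        + 3.0 * math.sin(2 * math.pi * t / (11 * 12))  # weak 11-year cycle
        + random.gauss(0.0, 10.0)                      # "weather" noise
        for t in range(n)]

p = dft_power(flow)
peak = max(range(1, len(p)), key=lambda k: p[k])  # strongest non-DC bin
print(peak == n // 12)                       # True: the annual cycle dominates
print(p[n // 12] > 10 * p[n // (11 * 12)])   # True: the 11-year peak is far weaker
```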

  40. Frederik Michiels December 14, 2015 at 8:53 am


    There may therefore be a catch in this that can put scientists/mathematicians on the wrong foot, even if the whole methodology is scientifically or mathematically correct.
    So in short, you can find with EEMD any cycle that can link; however, as the interaction is unknown and the weather-related noise follows an unknown process, working backwards this way can be misleading. The best approach is to just apply a Fourier analysis to the raw data and then draw conclusions.
    It’s a bit like “torturing the data till it shows what you are looking for”, but in a scientific way: nothing is wrong with the methods used, but it can give false (mainly exaggerated) correlations.

    Frederik, you seem to have missed the fact that EEMD is an analysis method that is used successfully in a wide range of real-world applications. Do you think that they are doing that because EEMD “can give false (mainly exaggerated) correlations”? Do you reckon they use EEMD because it “can be misleading”, or because using EEMD you can find “any cycle that can link”?
    All you’ve given us are handwaving objections and imaginary problems. You have not shown that even one of those objections is true; you have not supported even one of those objections with a single example; and you have not explained the details of even one of your claims that EEMD might give you the wrong answers.
    In short, you’re just waving your hands and saying that you think EEMD is “torturing the data till it shows what you are looking for” … and without a single fact, citation, or reference to back you up, I’m sorry, but I simply don’t believe you know what you are talking about.
    w.

    • Willis –
      Right – BUT. Recall that Principal Components was a well-established method at the point where Mann invented a new way of normalizing the data which brought out a hockey stick. Who knows for sure, but don’t we all at times suppose that we have “finally done something right” when we see what we were expecting, if not outright hoping for? There seem to be a fair number of users who compare EEMD to wavelets and prefer wavelets. Caution per se has merit.
      Bernie

      • Bernie, I’m not following you. What is your point here?
        Should we use caution? Sure. Can EEMD be compared to wavelets? Sure, you can compare anything to anything. Is EEMD the same as wavelets? Absolutely not.
        Next, Mann did not “invent a new way of normalizing the data”; I hope that was sarcasm. He made a stupid mistake and didn’t notice it because it fit his preconceptions … but what on earth does that have to do with whether EEMD is a valuable tool? We already know for a fact that it is a valuable tool, because people are using it all over the world for real-world problems.
        I say again … what is your point here?
        w.

      • Willis my friend – you said December 14, 2015 at 11:17 am, in part:
        “I say again … what is your point here?”
        Three points.
        POINT 1: First, you said in the top post:
        “Finally, contrary to the authors of the paper, I would hold that the great disparity between all of the intrinsic modes of the Amazon flow data and of the sunspot data, especially mode C3 (Fig. 7), strongly suggests that there is no significant relationship between them.”
        Now, if I understand you correctly (here and in the past) you don’t see evidence of any 11 year cycles, specifically not in the case of river flow rate. So a tool that shows such a mode is in some way flawed or being misused? The fact that the same tool is properly used elsewhere by others is not relevant to the Amazon River. Thus if it found an artifact, it is misleading us all. It’s like claiming that an FFT could find a linear trend, and pointing out that the FFT is highly regarded as useful.
        POINT 2: (peripherally related) you said:
        “….He made a stupid mistake and didn’t notice it because it fit his preconceptions …”
        while I said:
        “….we all at times suppose that we have ‘finally done something right’ when we see what we were expecting, if not outright hoping for?….”
        These are much the same – the same human foible.
        POINT 3 – Certainly MY failing, but I still do not have much idea how or why noise is used in EEMD (unless it is a stochastic-resonance means of detecting weaker modes), and I have not seen any basic “tutorial” on EEMD (one good ppt never gets to noise) that demonstrates the procedure and puts it through its paces, comparing it to FFT, wavelets, etc.
        Thanks for your time.
        Bernie

      • Bernie Hutchins December 14, 2015 at 1:23 pm

        Willis my friend – you said December 14, 2015 at 11:17 am, in part:
        “I say again … what is your point here?”
        Three points.
        POINT 1: First, you said in the top post:
        “Finally, contrary to the authors of the paper, I would hold that the great disparity between all of the intrinsic modes of the Amazon flow data and of the sunspot data, especially mode C3 (Fig. 7), strongly suggests that there is no significant relationship between them.”
        Now, if I understand you correctly (here and in the past) you don’t see evidence of any 11 year cycles, specifically not in the case of river flow rate. So a tool that shows such a mode is in some way flawed or being misused? The fact that the same tool is properly used elsewhere by others is not relevant to the Amazon River. Thus if it found an artifact, it is misleading us all. It’s like claiming that an FFT could find a linear trend, and pointing out that the FFT is highly regarded as useful.

        It’s hard for me to say much about them purportedly finding the 11-year cycle, mostly because I found no such cycle either in my Fourier analysis or the EEMD analysis. I can’t reproduce their results. So I don’t know whether “it found an artifact” as you say, or whether they were using a different dataset, or using it incorrectly.
        I also disagreed with their method, which was different from the method of the authors of the EEMD2005 paper. The Antico2015 authors merely determined whether one isolated frequency was present, while the EEMD2005 authors compared four different intrinsic modes.

        POINT 2: (peripherally related) you said:
        “….He made a stupid mistake and didn’t notice it because it fit his preconceptions …”
        while I said:
        “….we all at times suppose that we have ‘finally done something right’ when we see what we were expecting, if not outright hoping for?….”
        These are much the same – the same human foible.

        I was just objecting to your claim that “Mann invented a new way of normalizing the data”, when all he did was make a math error.

        POINT 3 – Certainly MY failing, but I still do not have much idea how or why noise is used in EEMD (unless it is a stochastic-resonance means of detecting weaker modes), and I have not seen any basic “tutorial” on EEMD (one good ppt never gets to noise) that demonstrates the procedure and puts it through its paces, comparing it to FFT, wavelets, etc.

        Did you read the EEMD2005 document? I thought it explained the reason for the noise quite clearly, and why noise improved the previously-used EMD method. Even the abstract explains what the noise does …

        A new Ensemble Empirical Mode Decomposition (EEMD) is presented. This new approach consists of sifting an ensemble of white noise-added signal and treats the mean as the final true result. Finite, not infinitesimal, amplitude white noise is necessary to force the ensemble to exhaust all possible solutions in the sifting process, thus making the different scale signals to collate in the proper intrinsic mode functions (IMF) dictated by the dyadic filter banks. As the EMD is a time space analysis method, the white noise is averaged out with sufficient number of trials; the only persistent part survives the averaging process is the signal, which is then treated as the true and more physical meaningful answer. The effect of the added white noise is to provide a uniform reference frame in the time-frequency space; therefore, the added noise collates the portion of the signal of comparable scale in one IMF. With this ensemble mean, one can separate scales naturally without any a priori subjective criterion selection as in the intermittence test for the original EMD algorithm.

        And it is discussed in greater detail further down in the study, viz:

        The principle of the EEMD is simple: the added white noise would populate the whole time-frequency space uniformly with the constituting components of different scales separated by the filter bank. When signal is added to this uniformly distributed white background, the bits of signal of different scales are automatically projected onto proper scales of reference established by the white noise in the background. Of course, each individual trial may produce very noisy results, for each of the noise-added decompositions consists of the signal and the added white noise. Since the noise in each trial is different in separate trials, it is canceled out in the ensemble mean of enough trails. The ensemble mean is treated as the true answer, for, in the end, the only persistent part is the signal as more and more trials are added in the ensemble.

        Not sure what more in the way of explanation you were looking for. They also have precise step-by-step details on the method in section 3.3.

        Thanks for your time.
        Bernie

        And yours as well.
        w.
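The ensemble-mean principle quoted above (the noise differs between trials and cancels in the mean, while the signal persists) can be illustrated without implementing real EMD sifting. The sketch below substitutes a crude moving-average scale split for the sifting step, purely to show the averaging behavior; it is not the EEMD algorithm itself:

```python
import math
import random

def split_scales(x, win=25):
    """Crude two-scale decomposition: a centered moving average is the 'slow'
    mode and the residual the 'fast' mode. This is only a stand-in for real
    EMD sifting, used to show the ensemble-averaging behavior."""
    h = win // 2
    slow = []
    for i in range(len(x)):
        lo, hi = max(0, i - h), min(len(x), i + h + 1)
        slow.append(sum(x[lo:hi]) / (hi - lo))
    fast = [xi - si for xi, si in zip(x, slow)]
    return fast, slow

random.seed(3)
n, trials = 600, 200
signal = [math.sin(2 * math.pi * t / 100) for t in range(n)]  # the "truth"

# Each trial decomposes the signal plus *fresh* white noise; the 'slow'
# modes are then averaged across trials, as in the quoted EEMD recipe.
slow_mean = [0.0] * n
for _ in range(trials):
    noisy = [s + random.gauss(0.0, 1.0) for s in signal]
    slow_mean = [m + s / trials for m, s in zip(slow_mean, split_scales(noisy)[1])]

# The per-trial noise cancels in the mean; the underlying cycle persists.
rmse = math.sqrt(sum((m - s) ** 2 for m, s in zip(slow_mean, signal)) / n)
print(rmse < 0.1)  # True, even though each trial added noise of amplitude 1
```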

    • I will quote you on this:

      First, let me say that I would never have guessed that white noise could function as a bank of bandpass filters that automatically group related components of a signal into a small number of intrinsic modes. To me that is a mathematically elegant discovery, and one I’ll have to think about. Unintuitive as it may seem, noise aided data analysis is indeed a reality.
      This method of signal decomposition has some big advantages. One is that the signal divides into intrinsic modes, which group together similar underlying wave forms. Another is that as the name suggests, the division is empirical in that it is decided by the data itself, without requiring the investigator to make subjective judgements.

      When you add noise to a noisy signal and then decompose it into groups of related components, you are in a danger zone. I would say that, though entirely correct as a procedure, this does not work entirely correctly on noisy data.
      The Amazon river data is noise, but it is not purely white noise, so as with sound it will resonate with some parts of the added white noise and not with others. Used as a bandpass filter it will indeed divide the signal into intrinsic modes, but it will do that with every signal you feed it, so yes, some “harmonics” will pass through.
      I think you missed the “dithering in reverse” point I made, as that was the point I was trying to make.
      The point about “torturing the data” here is thus a reverse one, the point being: you will find with EEMD an 11-year cycle in all river patterns, and even more, you will find it with this method in every aspect of our planet’s weather behavior. I do not even need proof for that; it’s obvious that this influence is there and that it is measurable.
      I think you missed the analogy with sound I made in that regard: in sound, each IMF would be a harmonic component of the result of white noise bandpassing a sound with flutter in it, consistent with the variable frequency of EMD processing. Each IMF would then be seen in the spectral analysis of the resulting sound (sound works additively, which is why the analogy is maybe not entirely correct empirically).
      Does it also prove a huge impact, or is that impact too small on these scales to make a difference? And if yes, for which parts would it be significant, and for which parts would it be too small to have an impact?

  41. Frederik December 14, 2015 at 8:00 pm

    I think you missed the “dithering in reverse” point I made, as that was the point I was trying to make.
    The point about “torturing the data” here is thus a reverse one, the point being: you will find with EEMD an 11-year cycle in all river patterns, and even more, you will find it with this method in every aspect of our planet’s weather behavior. I do not even need proof for that; it’s obvious that this influence is there and that it is measurable.

    Frederik, thanks for your thoughts. No, you will NOT “find with EEMD an 11-year cycle in all river patterns”, much less “in every aspect of our planet’s weather behavior”. In fact, I didn’t find the 11-year signal in the Amazon using this method.
    How about you actually USE THE METHOD AND GET SOME RESULTS before lecturing us on what the method will and will not do?
    Come back with some real information, like an actual river or other observational data series that you have actually analyzed and actually found the 11-year cycle in BEFORE you try to lecture us all about how much you know.
    Finally, your idea that you “do not need proof” for your claims merely reveals that you don’t understand the scientific method. First, you can’t prove anything in science. More importantly, you need support (logic, math, observations, previous studies, etc.) for any scientific claim that you might make. It’s a brand-new method to you, and near as I can tell you’ve never actually used it. Get some backup for your claims.
    w.
