Extreme Times

Guest Post by Willis Eschenbach

I read a curious statement on the web yesterday, and I don’t remember where. If the author wishes to claim priority, here’s your chance. The author said (paraphrasing):

If you’re looking at any given time window on an autocorrelated time series, the extreme values are more likely to be at the beginning and the end of the time window.

“Autocorrelation” is a way of measuring how likely it is that tomorrow will be like today. For example, daily mean temperatures are highly auto-correlated. If it’s below freezing today, it’s much more likely to be below freezing tomorrow than it is to be sweltering hot tomorrow, and vice-versa.

Anyhow, being a suspicious fellow, I thought “I wonder if that’s true …”. But I filed it away, thinking, I know that’s an important insight if it’s true … I just don’t know why …

Last night, I burst out laughing when I realized why it would be important if it were true … but I still didn’t know if that was the case. So today, I did the math.

The easiest way to test such a statement is to do what’s called a “Monte Carlo” analysis. You make up a large number of pseudo-random datasets which have an autocorrelation structure similar to some natural autocorrelated dataset. This highly autocorrelated pseudo-random data is often called “red noise”. Because it was handy, I used the HadCRUT global surface air temperature dataset as my autocorrelation template. Figure 1 shows a few “red noise” autocorrelated datasets in color, along with the HadCRUT data in black for comparison.

Figure 1. HadCRUT3 monthly global mean surface air temperature anomalies (black), after removal of seasonal (annual) swings. Cyan and red show two “red noise” (autocorrelated) random datasets.
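For those who want to play along at home, here is one simple way to generate this kind of red noise in R — a minimal sketch assuming an AR(1) model, not necessarily the exact procedure behind Figure 1, and with the lag-1 coefficient as a placeholder rather than one estimated from the real temperature data:

# minimal sketch: AR(1) "red noise" pseudo-temperature series
set.seed(42)
phi   <- 0.9                                          # placeholder lag-1 autocorrelation
noise <- arima.sim(model = list(ar = phi), n = 2000)  # one 2,000-point pseudo-temperature series
plot(noise, type = "l",
     main = sprintf("AR(1) red noise, phi = %.2f", phi))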

The HadCRUT3 dataset is about 2,000 months long. So I generated a very long series (two million data points) as a single continuous red noise “pseudo-temperature” dataset. Of course, this two-million-point dataset is stationary, meaning that it has no trend over time and that the standard deviation is stable over time.

Then I chopped that dataset into sequential 2,000-data-point chunks, and I looked at each chunk to see where the maximum and the minimum data points occurred within that chunk. If the minimum value was the third data point, I recorded it as “3”, and correspondingly, if the maximum was the next-to-last data point, it was recorded as “1999”.

Then I made a histogram showing, across all of those chunks, how many of the extreme values fell in the first hundred data points, the second hundred points, and so on. Figure 2 shows that result. Individual runs of a thousand chunks vary, but the general form is always the same.

Figure 2. Histogram of the location (from 1 to 2000) of the extreme values in the 2,000-datapoint chunks of “red noise” pseudodata.
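Here is a minimal R sketch of that whole procedure, assuming AR(1) red noise with a high lag-1 autocorrelation (the noise model actually matched to the HadCRUT autocorrelation structure may differ in detail):

set.seed(1)
n.total <- 2e6                                    # two million points, as in the text
chunk   <- 2000                                   # window length
phi     <- 0.99                                   # assumed (high) lag-1 autocorrelation
x       <- as.numeric(arima.sim(model = list(ar = phi), n = n.total))
m       <- matrix(x, nrow = chunk)                # one 2,000-point chunk per column (1,000 chunks)
extremes <- c(apply(m, 2, which.max),             # position of the maximum in each chunk
              apply(m, 2, which.min))             # position of the minimum in each chunk
hist(extremes, breaks = seq(0, chunk, by = 100),  # bins of 100 points, as in Figure 2
     xlab = "Position of extreme within chunk", main = "")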

So dang, the unknown author was perfectly correct. If you take a random window on a highly autocorrelated “red noise” dataset, the extreme values (minimums and maximums) are indeed more likely, in fact twice as likely, to be at the start and the end of your window rather than anywhere in the middle.

I’m sure you can see where this is going … you know all of those claims about how eight out of the last ten years have been extremely warm? And about how we’re having extreme numbers of storms and extreme weather of all kinds?

That’s why I busted out laughing. If you say “we are living today in extreme, unprecedented times”, mathematically you are likely to be right, even if there is no trend at all, purely because the data is autocorrelated and “today” is at one end of our time window!

How hilarious is that? We are indeed living in extreme times, and we have the data to prove it!

Of course, this feeds right into the AGW alarmism, particularly because any extreme event counts as evidence of how we are living in parlous, out-of-the-ordinary times, whether hot or cold, wet or dry, flood or drought …

On a more serious level, it seems to me that this is a very important observation. Typically, we consider the odds of being in extreme times to be equal across the time window. But as Fig. 2 shows, that’s not true. As a result, we incorrectly consider the occurrence of recent extremes as evidence that the bounds of natural variation have recently been overstepped (e.g. “eight of the ten hottest years”, etc.).

This finding shows that we need to raise the threshold for what we are considering to be “recent extreme weather” … because even if there are no trends at all we are living in extreme times, so we should expect extreme weather.

Of course, this applies to all kinds of datasets. For example, currently we are at a low extreme in hurricanes … but is that low number actually anomalous when the math says that we live in extreme times, so extremes shouldn’t be a surprise?

In any case, I propose that we call this the “End Times Effect”, the tendency of extremes to cluster in recent times simply because the data is autocorrelated and “today” is at one end of our time window … and the corresponding tendency for people to look at those recent extremes and incorrectly assume that we are living in the end times when we are all doomed.

All the best,

w.

Usual Request. If you disagree with what someone says, please have the courtesy to quote the exact words you disagree with. This avoids misunderstandings.

 

cd
April 27, 2014 1:52 pm

Willis
The question is not why, as the effect is assuredly real.
As I said I’m quite happy to take your word for it.
Note that the step size in my loop (window = 1:1000) is one. This means the window steps just the way you specified, “continuously through the series” one step at a time.
Thanks for doing that. But there was no need, as I took all your points. What strikes me, though, is the exact symmetry in your histogram. This is almost starting to look as if, in a roundabout way, you’re actually producing the series’ non-centred spectrogram (for the window’s temporal resolution). I know there are many methods of producing certain types of power spectra using moving windows, and since the series is red noise, more signal (i.e. exponential decay with wavenumber for w = 1 to N/2) is contained in the end members of the non-centred power spectra. I hasten to add that I only know of them, but have never used them and don’t know exactly what they do, so I could be talking out of my hat.
Because statistically, the moving window is no different than randomly selected windows.
I understand all that as stated; I don’t refute anything you say. And I’m not suggesting that it isn’t real. But what I want to know is why – mathematically, not just that it is.

cd
April 27, 2014 2:21 pm

Bernie Hutchins
After I said that the issue of cross- vs auto- is a matter of terminology, you provide a link to definitions! Not only do I know the definitions, I know what they MEAN.
Are you sure? They refer to two very different algorithms.
As for a computer package, you hardly need anything like that. Have you looked at my Matlab code?
As I say, I’ve written the math libraries to do this. I only need a simple executable, one main function, and the gcc compiler. It costs absolutely nothing, and I have the advantage of knowing what the business code is doing, which you don’t have with something like R.
As for Matlab, it is way over priced for what it does. Scientists should be taught to code in C and Fortran. This move to scripting languages such as R concerns me.
Sorry if I am sounding sanctimonious to you. Apologies if I have crossed the line between persistently trying to be helpful (a habit as an educator) and showing impatience with a lack of progress.
I put “sanctimonious” in quotes for a reason – I didn’t mean it literally. Writing code will certainly help you understand certain algorithms when you have to code them up in something like C (code akin to proof). This is not true for R, which, while having a lot of powerful functionality, shields the user from doing any of the actual maths.

Frederick Michael
Reply to  cd
April 27, 2014 2:36 pm

Maybe the best way to understand this is to think of how a point “competes” with its neighbors for the title of maximum (or minimum). Let’s just stick with maximum for now. Any point has a 50% chance of losing to either neighbor (if it has 2), and those two probabilities are independent. So, if you only have 3 points, the middle one only has a 25% chance of being the max. The remaining 75% is split between the other two (symmetry arguments are very helpful here). So, the dist is 3/8, 1/4, 3/8.
Now, let’s extend this. Since we know the 3/8 for an endpoint, we know the prob of a point being beaten by either its immediate neighbor on one side or the next neighbor on the same side. This raises the prob of losing from 1/2 to 5/8, because the conditional prob of the second neighbor winning (given the first didn’t) is 1/4 (50% chance the jump was in the opposite direction and 50% chance it was larger than the jump from our subject to the first neighbor).
Using this, we can derive the probs for 4 points (5/16, 3/16, 3/16, 5/16). Now we know, from the endpoints that the prob that a point isn’t beaten by its 3 nearest neighbors on one side is 5/16.
Using this, we can derive the dist for 5 points (35/128, 5/32, 9/64, 5/32, 35/128).
That’s enough for now.
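Those fractions can be checked with a quick Monte Carlo, assuming a random walk with symmetric continuous steps (for example, cumulative sums of Gaussian noise):

set.seed(1)
n.trials <- 100000
# position of the maximum within a 5-point random walk
pos <- replicate(n.trials, which.max(cumsum(rnorm(5))))
round(table(pos) / n.trials, 3)
# theory: 35/128, 5/32, 9/64, 5/32, 35/128  ~  0.273, 0.156, 0.141, 0.156, 0.273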

cd
April 27, 2014 2:44 pm

Nick Stokes
Right, that is interesting. It’s like bloody magic. What was your explanation again – slowly, this time? A link to something describing your explanation would be good.

Bernie Hutchins
April 27, 2014 3:27 pm

Above, a number of comments seemed right on as to the essential reason for the clustering of max/mins of a correlated signal at the ends, but I did not see a picture. Histograms such as Willis used tell the correct story, but perhaps a medium-sized set (10) of actual red-noise signals will be useful.
Here I generated (in Matlab, similar to comments above) 10 length-1000 white noise signals (uniform distributions between -1 and +1), integrated them, and kept the red signals from sample 500 to 599. These are plotted as a-j:
http://electronotes.netfirms.com/redguys.jpg
The red dots show the max and the blue dots the min. We see the extremes tending toward the ends, as was found in the histogram view. Note (for example) that 60% of the max values are outside the center 60% of the range, as are 70% of the min values.
It is instructive to look at the individual red signals. They are all over the place (see the vertical scale). In 100 steps, there were 100 integration contributions, averaging 1/2 in magnitude. Most of the 10 examples moved considerably less than 10 of the possible 50, as positives cancelled negatives, as expected. Yet we also see some gratuitous trends of significance (up or down for much of the 100 samples, like c, e, and h), which force extreme values to the ends.
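For anyone who wants to reproduce this, a rough R equivalent of the procedure just described (the original was Matlab; the seed and exact numbers will of course differ) is:

set.seed(7)
for (i in 1:10) {
  red <- cumsum(runif(1000, min = -1, max = 1))   # integrate uniform white noise (-1 to +1)
  win <- red[500:599]                             # keep samples 500 to 599
  cat(sprintf("signal %2d: max at sample %3d, min at sample %3d of the window\n",
              i, which.max(win), which.min(win)))
}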

Nick Stokes
April 27, 2014 3:46 pm

cd,
I don’t have a link. But imagine looking at the daily sequence without knowing the weekday status. You see various clusters of warm days (2,3,4..), which will probably provide many of the week max’s when the division is known.
Clusters that are split by a week boundary may well provide two adjacent week max’s. Others, probably only one. The split clusters get overcounted. And the weekend days are the chief beneficiaries.
Or to see it another way. How can a Wed top the week? It has to beat Tue and Thu, at least. But if Wed is warm, they are likely to be warm too. Tough competition.
Sun has to beat Mon, but not the adjacent Sat. It has to beat the following Sat, but there’s much less reason to expect that to be warm.
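A quick numerical illustration of this, assuming AR(1) daily “temperatures” chopped into seven-day weeks (a toy model, not real weather data):

set.seed(3)
n.weeks <- 20000
days    <- as.numeric(arima.sim(model = list(ar = 0.8), n = 7 * n.weeks))
weeks   <- matrix(days, nrow = 7)            # one week per column; row 1 = Mon, row 7 = Sun
hottest <- apply(weeks, 2, which.max)        # day of the week with the weekly maximum
round(table(hottest) / n.weeks, 3)           # the end days (1 and 7) should come out ahead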

cd
April 27, 2014 4:06 pm

Nick
Thanks for your patience. I see what you’re saying. I thought I had stated as much.
You see, I can imagine a sine wave series with a small amount of noise added to each point (+/- say 0.05 of the amplitude). This is both stationary and autocorrelated. Any sample window with a length that is not an integer multiple of the original sine wave’s wavelength will have a regression slope that is not equal to zero (local drift); therefore the highest and lowest values are likely to lie at the ends of the window (no matter where you sample).
This may seem a little contrived, but it satisfies the conditions of the experiment (and is easy to communicate), and should apply to red noise for windows with lengths less than the range of the red noise signal. So could you run your experiment again with very large (continuous) windows, say 1/4 the length of the series?

cd
April 27, 2014 4:09 pm

Bernie
If I follow your link you prove my point.
Most of your samples have drift, so it follows from my post to Nick above:
You see, I can imagine a sine wave series with a small amount of noise added to each point (+/- say 0.05 of the amplitude). This is both stationary and autocorrelated. Any sample window with a length that is not an integer multiple of the original sine wave’s wavelength will have a regression slope that is not equal to zero (local drift); therefore the highest and lowest values are likely to lie at the ends of the window (no matter where you sample).
This may seem a little contrived, but it satisfies the conditions of the experiment (and is easy to communicate), and should apply to red noise for windows with lengths less than the range of the red noise signal. So could you run your experiment again with very large (continuous) windows, say 1/4 the length of the series?

cd
April 27, 2014 4:13 pm

Nick/Bernie
“…should apply to red noise for windows with lengths less than the range of the red noise signal.”
should be:
“…should apply to red noise for windows with lengths less than the range of the red noise series.”

Bernie Hutchins
April 27, 2014 5:15 pm

cd – replying to your April 27, 2014 at 2:21 pm – haven’t read your latest yet
cd-
A couple of points:
I am not sure what you mean by “two very different algorithms.” I don’t think of correlation as anything like an algorithm. Perhaps it is an operation. Once you decide what correlation is, you apply it to two signals. If the signals are different, it is cross-; if the signals are the same, it is auto-. But this is much like the way a multiplication such as AxB is called “squaring” when A = B.
The only possible correlation algorithm I can even imagine would be the use of a fast convolution algorithm to do correlation. Perhaps this is where your ideas are coming from, because you originally proposed getting red noise by using an FFT. This bothered me because you said “3) apply an exponential decay as function of wave number” and I didn’t know what “apply” means. I assumed you meant to take the inverse FFT of the exponential decay and multiply it “bin-by-bin” (k values) by the FFT of the white noise. Because this uses an FFT algorithm (Fast), it can be fast for very large convolutions (used for correlations). But it is circular, so you have to live with the periodicity, or extensively zero-pad if you want linear convolution. It takes too long just to figure out unless you have immense amounts of data.
And as I have emphasized, it is terminology and no one is talking about running a correlation. Correlation is a pre-existing property which a sequence has or does not have. We can correlate white noise into red or pink noise, for example, but we use a filter (usually low-pass) to do this. The filtering correlates successive samples. [ Correlating is filtering. Computing a correlation is analysis. ]
I agree that software that involves complicated functions and scripts does not always teach you (show you) much in the sense that Fortran, C, or Basic can. Matlab has powerful functions and scripts, but it is interpretive and can be used simply. For example, here is the “core” of my white-to-red converter:
xr(1) = x(1);               % start the running sum at the first white-noise sample
for m = 2:100
  xr(m) = xr(m-1) + x(m);   % integrate: each red sample is the sum of the white samples so far
end
It’s just a discrete integrator. You know exactly what you did. And you can ask, for example, what would happen IF you instead used xr(m) = 0.9*xr(m-1) + x(m), etc.

Bernie Hutchins
April 27, 2014 5:49 pm

cd said April 27, 2014 at 4:09 pm
“ Bernie If I follow your link you prove my point.
Most of your samples have drift so it follows from (see above post to Nick):”
cd –
Nope, it can’t be “most” because either they ALL have “drifts” or NONE of them do. What do you mean by “drift”? Apparently something you SEE that you choose to call a drift?
You could have a white noise sequence -1 2 0 1 -1 -2 5 7 8 9 7 8 11 9 and claim it is drifting. Fooled by randomness. You will be quick to point out that it in all likelihood will come back down soon. Red noise sequences (random walk, drunkard walk) show features like this often, and much more extreme in durations and magnitudes. But they TOO always come back if you are patient. And then they “drift” again, and so on.
All of mine are pieces of sequences that are really infinite. We generally look at them only briefly!
But your quarrel should be with the mathematics of the random walk itself, not with me or Nick. No magic – Nature IS subtle.
Possibly the app note link I provided to my “Fun with Red Noise”
http://electronotes.netfirms.com/AN384.pdf
would be entertaining to you at some point. It really is Fun.
Bernie

cd
April 28, 2014 2:07 am

Bernie
I am not sure what you mean by “two very different algorithms.” I don’t think of correlation as anything like an algorithm
This is getting tiresome (I’m sure for you too). All math procedures boil down to an algorithm. For a continuous series, the integrand for cross-correlation has two samples from two series; for the autocorrelation there are two samples from one series. You might think this is trivial, as the expressions have a lot in common, but that’s not the case. For example, because the autocorrelation is from one series, one can assume that the variance of both sets is the same and that the value at lag = 0 is equal to that variance. You cannot make such assumptions with cross-correlation, because the two series are different.
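To make the lag = 0 point concrete, a quick sketch in R (assuming the usual sample definitions, with two hypothetical series x and y):

set.seed(5)
x <- rnorm(1000)                 # hypothetical series
y <- rnorm(1000)                 # a second, unrelated series
acf(x, lag.max = 0, type = "covariance", plot = FALSE)$acf[1]    # ~ var(x) (with a 1/n divisor)
ccf(x, y, lag.max = 0, type = "covariance", plot = FALSE)$acf[1] # ~ cov(x, y), near zero here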
The only possible correlation algorithm I can even imagine would be the use of a fast convolution algorithm to do correlation. Perhaps this is where your ideas are coming from, because you originally proposed getting red noise by using an FFT. This bothered me because you said “3) apply an exponential decay as function of wave number” and I didn’t know what “apply” means. I assumed you meant to take the inverse FFT of the exponential decay and multiply it “bin-by-bin” (k values) by the FFT of the white noise.
No. You misunderstand.
And as I have emphasized, it is terminology and no one is talking about running a correlation.
Jeez … read the article; note the reference, time and again, to autocorrelation.
What do you mean by “drift”?
And yet you dispute what I’m saying even though you don’t understand it.
Apparently something you SEE that you choose to call a drift?
Fit a simple linear regression line through your sample series, and I’m sure that for most of them you’ll get statistically significant trends that are not equal to zero! Drift is a term used widely to denote a trend; it is commonly used to denote the emergence of a trend in a process (such as a Markov process).
But your quarrel should be with the mathematics of the random walk itself, not with me or Nick. No magic – Nature IS subtle.
I’m not quarreling with anyone. I find the article truly interesting and want to know why. I don’t need to invoke a random walk, and shouldn’t do so, as I want to be 100% sure I’m sub-sampling a stationary process.
My own hunch is that below a certain window size, as with my sine wave example, there will be local drift (a trend) that will result in extreme values likely being at the ends of each sampling window.
I think Willis has done a great service here. I, like many others, tend to assume that we know all there is to know about the methods we use routinely. But their nuances are so great and varied that perhaps we need to reappraise how we use them all the time.

cd
April 28, 2014 2:52 am

Willis
the idea that anything but R should be used for this kind of scientific work is … well, not a brilliant plan
That is a personal opinion.
In R, on the other hand, you do this
This highlights the issue. R is essentially a scripting language for a given environment, akin to VBA in Excel. It’s quick and easy to use (as you show), but it isn’t very efficient.
Take your addition, for example: R scripts are interpreted, so that under the hood the R script will likely be interpreted into C, so that in the end – for the CPU – it all looks the same. The only thing is that if you build and compile the “native” C code, you can be guaranteed that it will be far, far more efficient in terms of memory and CPU. So if you’re dealing with big data sets and very complex problems, R sort of runs out of gas quickly. Furthermore, with C you can exploit the power of the GPU, which is tailored to very specific mathematical problems and is incredibly efficient. On top of this, as you build up your catalogue of C functions, as with R, doing very complex operations takes a few lines of code, but with all the benefits of greater speed and management of resources.
In short, it’s a balance between efficiency+power and ease. This is a common issue; the judgement depends on the situation at hand. So in your instance, for a small study, R might be best. Once the processing time takes longer than the development time, then it’s time to make the switch.
Personally, I see the use of R as something that can be useful at the design stage but in the end you should build all your number crunching in C.

cd
April 28, 2014 2:56 am

Willis
for the CPU – it all looks the same
By that I mean a series of incremental steps; the “native” C code should be compiled more tightly.

cd
April 28, 2014 5:28 am

Oops…
Take your addition for example, the R scripts are interpreted, so that under the hood, the R script will likely be interpreted into C so that in the end…
Sorry that is just lazy and wrong, should be:
“…the R script will likely be interpreted by an interpreter and use math libraries (.so/.dll) written in C…”

Bernie Hutchins
April 28, 2014 8:30 am

cd said various things April 28, 2014 at 2:07 am:
“All math procedures boil down to an algorithm.”
If you wish, but using terms according to the common usage, in context, avoids the confusion that occurs if you redefine or misuse them as you go.
I said “This bothered me because you said ‘3) apply an exponential decay as function of wave number’ and I didn’t know what ‘apply’ means. I assumed you meant to take the inverse FFT of the exponential decay and multiply it “bin-by-bin” (k values) by the FFT of the white noise.”
to which cd replied:
“No. You misunderstand.” [That’s all cd wrote!]
Then what WERE you talking about? The term “apply” does not mean anything in this context. (Paint can be applied!) Are you adding, multiplying, convolving? If so, what and how? How about some code or pseudo-code or a formula – or at least something. You dodged the question.
cd then also said:
“I don’t need to invoke a random walk”
But this whole thing got started with your telling us (incorrectly or at least not with adequate information) how YOU proposed to generate red noise. Now – under the bus?
cd also said:
“My own hunch is that below a certain window size, as with my sine wave example, there will be local drift (a trend) that will result in extremes values likely being at the end of each sampling window.”
I should certainly think so! Try length 2. Even a length-2 of constants.
Last word is yours if you want it.

cd
April 28, 2014 11:04 am

Bernie
If you wish, but using terms according to the common usage, in context, avoids confusion that occurs if you try to redefine or misuse as you go.
This sounds like waffle.
I assumed you meant to take the inverse FFT of the exponential decay and multiply it “bin-by-bin” (k values) by the FFT of the white noise.”
The step is outlined. Do I need to spell out what an exponential decay is? Do I have to spell out how you’d apply it (and yes, it is “apply”, as in applying a scalar, a smooth, etc. to any series) to the spectral information (both real and imaginary terms, or just real if a cosine transform is used)?
Anyway, this is all immaterial because it is now quite clear that this is not the reason for the given distribution of extremes for sub-windows. By the way, this is the standard method for creating stationary red noise.
How about some code
This is a blog, for heaven’s sake; get a sense of propriety. Look up “generating red noise (Brownian noise) using an FFT” and you’ll see exactly what I meant – as outlined.
But this whole thing got started with your telling us (incorrectly or at least not with adequate information) how YOU proposed to generate red noise. Now – under the bus?
Oh bloody hell here you go:
http://en.wikipedia.org/wiki/Brownian_noise
I hate quoting Wikipedia but for some people needs must!
Last word is yours if you want it.
I already have. My explanation seems fine to me.

cd
April 28, 2014 11:35 am

Bernie
Sorry, that last post was terse and bordering on rude. My only excuse is that at this time of year (spring here) I get sinus pain, and it puts me in an awful mood.
Anyway. Look, I thought that there was something quite remarkable going on here. I looked at the problem and I just couldn’t see why it was – and even invented some spurious theories. When I sat down and thought about it – pen and paper, no need for code – it occurred to me (several posts up, in fact) that this might all just be down to local drift and short sampling windows (capturing this drift). But this is just “commonsensical” and obvious to anyone analysing such sets – it seemed too easy, so why all the fuss? Could still be wrong, but I don’t really care anymore.
There, the last word.

ICU
April 28, 2014 7:46 pm

Willis,
Any chance that you could post the actual time series (somewhere/anywhere) that you used in the above analysis?
I know it’s two million points long, but all I need is a linear array of monthly y-values as a text file, or any format that you would prefer.
Thanks

Bernie Hutchins
April 28, 2014 8:24 pm

cd –
Making the certain transgression of returning here after vowing to give you the last word, and taking the risk of aggravating your sinus condition, you did after all ask:
“Do I need to spell what an exponential decay is?”
No, but you still have not said WHAT you are doing WITH an exponential decay. If you are filtering to red in the frequency domain, you would multiply the FFT, point-by-point, with a RECIPROCAL of k, mirrored at the midpoint of course, etc. etc. This is the way Matlab programmers generate red noise using the FFT.
The rest I understand (and have for ages) but I haven’t a clue why you say “exponential decay”.
[ Any conceivable reddening or pinking filter would have an impulse response consisting of a sum of decaying complex exponentials, but this would involve the inverse FFT first.]
Thanks.
Bernie

Bernie Hutchins
April 28, 2014 10:02 pm

Additional on Red Noise by FFT
If we use the conventional method of converting white—>red by the use of the FFT, something curious happens. Because for zero frequency (k=0) we would be multiplying by the reciprocal of k as 1/0, this is not allowed; and instead we multiply this one point by 0 (Matlab for example discards this as “NAN” – not a number). This removes the mean – automatically! In my experiment of generating length-1000 red noises, from which I then snipped a subset of 100, removing the mean from the length-1000 had a strong tendency to (of course) greatly reduce any dc offset or “drift” apparent in the length-100 subset. So it may automatically APPEAR superior:
http://electronotes.netfirms.com/redguysbyFFT.jpg
which can be compared to the original result redguys.jpg also there (time-series integration method). But of course, it is easy to directly remove the means intentionally, completely if we wish. Obviously – nothing we do with means changes the positions of max and min.

cd
April 29, 2014 2:07 am

Bernie
No, but you still have not said WHAT you are doing WITH an exponential decay.
What I have presented should’ve been enough; I’ll state it again:
1) Take a white noise series (spatial/time domain)
2) Forward FFT
3) FFT series (real and imaginary terms):
-> for each wavenumber, apply a scalar to both real and imaginary components
(scalar = 1/|w|^B, where B is arbitrarily chosen depending on whether you want red or fractal noise, for example)
4) Back transform FFT -> red noise series.
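A minimal sketch of those four steps in R, taking B = 1 (red noise) and leaving the w = 0 term at zero (normalisation details glossed over):

set.seed(11)
n   <- 1024
X   <- fft(rnorm(n))                        # steps 1 and 2: white noise, forward FFT
w   <- c(0, 1:(n/2), (n/2 - 1):1)           # |wavenumber| for each FFT bin (0, 1..N/2, mirrored)
Xr  <- X * ifelse(w == 0, 0, 1/w)           # step 3: scale by 1/|w|^B with B = 1; w = 0 left at zero
red <- Re(fft(Xr, inverse = TRUE)) / n      # step 4: back transform (real part, normalised)
plot(red, type = "l")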
But this is all immaterial now. I thought there was something more profound at the time going on.
exponential decay
Because it communicates what the power spectrum looks like without actually having to define a flipping equation. I had assumed most people would know what I meant.
Because for zero frequency (k=0) we would be multiplying by the reciprocal of k as 1/0, this is not allowed; and instead we multiply this one point by 0
Ah, by k you mean w (wavenumber, w = -N/2 to N/2)? We’re talking at cross purposes here. You’re the type of chap that uses j instead of i when writing complex numbers? You’re an engineer – right?
The noise in step 1 is always centred, so w = 0 should be immaterial, and yes, I don’t do anything with it; step 3 is only for w != 0 (in C you typically get an arithmetic exception, so it is even more critical to skip it).
And finally, AND MOST IMPORTANTLY: is my explanation for the distribution of extremes correct (localised drift)? If so, then what the hell was all the fuss about? I thought there was something truly remarkable going on.

cd
April 29, 2014 3:10 am

Bernie
I think I’ve realised where most of the confusion is coming from: I’m thinking spatially, hence the wavenumber (for the purposes of the experiment it doesn’t really matter as far as I’m concerned). I note you correctly refer to frequency (given that Willis refers to a time series, though as I say this could be a cross-section of altitude), but the “equivalence” is there.

Bernie Hutchins
April 29, 2014 9:12 am

cd –
We are getting close.
Yes I am an engineer (engineering physics) so I understood your term “wave number” to be my k, and I did explicitly define the DFT in a comment well above as involving n (time), k (frequency) and of course j. I taught signal processing for 35 years.
I still do not see why you refer to an exponential decay. You write out “scalar = 1/|w|^B”, which would be a correct frequency weighting for red if B=1 (AND you are apparently thinking of a cosine transform, and not an FFT as you say). Because w is the variable here, an exponential series would be, for example, B^w, not w^B. That’s why we should write things in math language.
As a concrete example: If you have a length-7 signal x(n), n=0…6, and take its FFT X(k), k=0…6, then you would achieve a red filtering by multiplying each X(k) by 1/k, a series of reciprocals, reflected at the midpoint. Specifically, for length 7 this would be the series 0 1/2 1/3 1/4 1/4 1/3 1/2, where the value of 0 for k=0 is necessary to avoid infinity, and does remove the DC term. When you take the inverse FFT following the multiply, the result is real (you usually have to remove a tiny imaginary part that is due to roundoff).
Bernie