AR6 and Sea Level, Part 3, A Statistically Valid Forecast

Lots of pressure to publish this post early and it is raining this morning here in The Woodlands, so no golf. I checked it over and think it is OK. Here you go!

By Andy May

In Part 1 of this series, we examined the data and analysis that was presented in AR6 to support their conclusion that sea level rise is accelerating. In Part 2 we looked at a serious examination of the observational record for sea level rise over the past 120 years and the modeled components of that rise. We concluded in Part 1 that the statistical evidence presented in AR6 for acceleration was crude and cherry-picked. In Part 2 we saw that the error in both the estimates of sea level rise and in estimating the components of that rise is very large. The error precluded determining acceleration with any confidence, but the data revealed an approximately 60-year oscillation of the rate of sea level rise that matches known natural ocean cycles.

Modern statistical tools allow us to forecast time series, like GMSL (global mean sea level) change, in a more valid and sophisticated way than simply comparing cherry-picked least squares fits as the IPCC does in AR6. Our forecast is based on pure statistics. It is done in the correct way, but the result is not necessarily correct; statistics are like that. We will not know for sure until 2100. That said, let’s do it. If you have a certain kind of nerdy mind, you will enjoy this.

Figure 1 is a plot of the data we will use—the NOAA sea level dataset. Simply looking at it we can tell it is autocorrelated, which means that each quarter’s mean sea level estimate is highly dependent upon the previous quarter’s value. Autocorrelation is important to consider in least squares regression, especially when forecasting time series, but routinely ignored by the IPCC.

Figure 1. NOAA mean sea level anomaly, 1900 to 2022. Each dot is one quarter (3 months).

Figure 2 plots each sea level estimate versus the previous estimate. This is called a plot of the first lag, and the correlation of the two is a measure of autocorrelation. The R² of the first lag is 0.97, so sea level is very autocorrelated. This is obvious, but it means that normal least squares linear fit statistics are invalid, because least squares statistics, such as R², assume that the errors of regression are independent. Least squares, as used in AR6 to show acceleration, is inappropriate with a dataset like this, since most of any given value depends upon the previous value. The mean-square-error (MSE) will therefore be much too small, causing the estimated error of the fit to be too small. As a result, any least squares line fit to the data in Figure 1, or to any portion of that data, is statistically useless unless the autocorrelation is accounted for.

Figure 2. Plot of GMSL versus the previous GMSL, the first lag. The values are highly correlated. The small autocorrelation plot shows that GMSL is highly autocorrelated for at least 7 years.
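As a rough illustration of the first-lag check, the following R sketch computes the lag-1 R² for a synthetic quarterly series. The series `gmsl_demo` is simulated (a random walk with drift), not the NOAA data, and the drift and noise values are arbitrary assumptions chosen only so that the series trends upward in the way Figure 1 does.

```r
# Synthetic stand-in for a quarterly sea level series (NOT the NOAA data).
set.seed(42)
n <- 488                                     # roughly 122 years of quarters
gmsl_demo <- cumsum(0.4 + rnorm(n, sd = 3))  # random walk with drift, "mm"

# Pair each value with the one before it (the first lag).
current  <- gmsl_demo[-1]
previous <- gmsl_demo[-n]

# R-squared of the first lag is the squared correlation of the two.
lag1_r2 <- cor(current, previous)^2
print(round(lag1_r2, 3))

# The same number comes out of regressing each value on the previous one.
lag1_fit <- lm(current ~ previous)
print(round(summary(lag1_fit)$r.squared, 3))
```

A value near one, as here, is the signature of a series in which each point is mostly determined by the point before it.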

So how can we forecast GMSL in a statistically valid way? We clearly cannot use least squares and need to apply more advanced techniques. The first step is to remove the autocorrelation from the data. This is normally done by subtracting the previous GMSL value from the current one, progressing in this way through the whole dataset. We have done this and show a plot of the result in Figure 3.

Figure 3. A plot of the first difference of GMSL. The plot is random and fairly uniform left to right, suggesting that the autocorrelation is removed, and the data are stationary.
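The differencing step can be sketched in a few lines of base R. Again the input is a simulated random walk with drift rather than the NOAA series, so the numbers only illustrate the mechanics; `lag1_r2` is a helper defined here for the example.

```r
# Synthetic stand-in series (NOT the NOAA data).
set.seed(42)
n <- 488
gmsl_demo <- cumsum(0.4 + rnorm(n, sd = 3))

# First difference: each value minus the previous value.
gmsl_diff <- diff(gmsl_demo)                 # one element shorter than the input

# diff() gives the same result as subtracting the lagged series by hand.
manual_diff <- gmsl_demo[-1] - gmsl_demo[-n]
print(all.equal(gmsl_diff, manual_diff))

# Helper (defined for this example): R-squared of the first lag.
lag1_r2 <- function(x) cor(x[-1], x[-length(x)])^2

# Compare the first-lag R-squared before and after differencing.
print(round(c(level = lag1_r2(gmsl_demo),
              difference = lag1_r2(gmsl_diff)), 3))
```

For this simulated series the first-lag R² drops from close to one to close to zero after a single difference, which is the behaviour the "d = 1" choice relies on.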

The first difference data from GMSL looks pretty good, very much like white noise. This is exactly what we want for valid statistical analysis and forecasting. We will be using an R function called “arima” to create our GMSL forecast. This function requires three parameters, called p, d, and q, which tell arima how to condition the input data and build a model that can project valid future values. The plot in Figure 3 shows us that “d” is one, meaning that taking one difference of adjoining values removes the autocorrelation. We also need the data to be stationary, that is, the statistical properties do not change with time (left to right). The original dataset (Figure 1) was clearly not stationary, and this is OK; we just do not want the way GMSL changes to be a function of time for this analysis. The R Augmented Dickey-Fuller Test (ADF) function confirms this: the original dataset has an ADF p value of 0.79, so the test cannot reject non-stationarity. Note that the arima parameter p is not the same thing as the p value of a statistical test.

The differences plotted in Figure 3 have an ADF p value of 0.01, well below 0.05, the threshold needed to show stationarity. Data are stationary when the values over the period being studied are evenly distributed around the mean. That is, the distribution, up and down, does not vary significantly along the time axis (x).
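A minimal sketch of the stationarity check follows. It assumes the `adf.test()` function from the `tseries` package, which is one common ADF implementation in R; the post does not name the package, so that choice is an assumption, and the data are simulated rather than the NOAA series.

```r
library(tseries)  # assumed package; provides adf.test()

# Synthetic stand-in series (NOT the NOAA data).
set.seed(42)
n <- 488
gmsl_demo <- cumsum(0.4 + rnorm(n, sd = 3))

# Null hypothesis of the ADF test: the series has a unit root
# (is non-stationary). A small p value rejects that null.
adf_level <- suppressWarnings(adf.test(gmsl_demo))
adf_diff  <- suppressWarnings(adf.test(diff(gmsl_demo)))

# adf.test() warns when the true p value lies outside its lookup table;
# the warnings are suppressed above only to keep the output short.
print(c(level = adf_level$p.value, difference = adf_diff$p.value))
```

With this implementation the reported p value is interpolated from a table, so a printed 0.01 should be read as "0.01 or smaller".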

Next, we need to derive the arima p and q values. For this we need the ACF (autocorrelation) and PACF (partial autocorrelation) plots shown in Figure 4.

Figure 4. ACF and PACF plots to determine p and q. The top plot shows that any value of one or over is possible for p since the series is very strongly autocorrelated at all lags. The lower plot shows that once the first level autocorrelation is removed only two significant autocorrelations remain (1 and 3) so q=2.
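For readers who want to inspect autocorrelations numerically rather than from a plot, here is a small sketch using base R `acf()` and `pacf()`. It is not the post's own code: the series is simulated, and the ±1.96/√n band is the usual approximate 95% bound for white noise.

```r
# Synthetic stand-in series (NOT the NOAA data).
set.seed(42)
n <- 488
gmsl_demo <- cumsum(0.4 + rnorm(n, sd = 3))
gmsl_diff <- diff(gmsl_demo)

# Autocorrelation of the level series and of the differenced series.
acf_level <- acf(gmsl_demo, lag.max = 20, plot = FALSE)
acf_diff  <- acf(gmsl_diff, lag.max = 20, plot = FALSE)
pacf_diff <- pacf(gmsl_diff, lag.max = 20, plot = FALSE)

# Approximate 95% significance bound for a white-noise series.
bound <- 1.96 / sqrt(length(gmsl_diff))

# acf() reports lag 0 first (always 1); pacf() starts at lag 1.
acf_vals  <- as.numeric(acf_diff$acf)[-1]
pacf_vals <- as.numeric(pacf_diff$acf)

# Lags whose (partial) autocorrelation falls outside the bound.
sig_acf_lags  <- which(abs(acf_vals) > bound)
sig_pacf_lags <- which(abs(pacf_vals) > bound)
print(list(acf = sig_acf_lags, pacf = sig_pacf_lags))

# For contrast, the lag-1 autocorrelation of the undifferenced series.
print(round(as.numeric(acf_level$acf)[2], 3))
```

Because the simulated differences are pure noise, few or no lags should exceed the bound here; on real data the pattern of significant lags is what guides the choice of p and q.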

Analyzing the GMSL time series gives us an arima parameter set of (1,1,2) for (p,d,q). We can also run an R function called auto.arima to see what parameters it recommends. We find that it settles on (1,1,2) as well. This is good confirmation that our parameter selection is correct. Figure 5 plots the results.
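The fitting step can be sketched with the base R `arima()` function. The data below are simulated from an ARIMA(1,1,2) process with arbitrary coefficients plus a linear drift, so the fitted numbers say nothing about real sea level. One assumption worth flagging: base `arima()` does not add a drift term when d is greater than zero, so the sketch passes a time index through `xreg` to estimate one; whether the post's model included drift is not stated.

```r
# Simulate an ARIMA(1,1,2) series with drift (NOT the NOAA data).
# The coefficients, noise level, and drift are arbitrary assumptions.
set.seed(42)
n <- 488
sim <- arima.sim(model = list(order = c(1, 1, 2),
                              ar = 0.5, ma = c(-0.3, 0.2)),
                 n = n, sd = 3)
t_index   <- seq_along(sim)                  # 1, 2, ..., length of series
gmsl_demo <- as.numeric(sim) + 0.4 * t_index # add 0.4 "mm" per quarter

# Fit ARIMA(p = 1, d = 1, q = 2). The time index in xreg becomes a
# constant after differencing, so its coefficient is the drift per step.
fit <- arima(gmsl_demo, order = c(1, 1, 2), xreg = t_index, method = "ML")
print(fit)

# AIC allows this fit to be compared with alternative (p, d, q) choices.
print(AIC(fit))
```

The `auto.arima` function mentioned in the post comes from the `forecast` package and automates this kind of comparison across candidate orders; it is not run here.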

Figure 5. The results of an arima forecasting model. The top plot shows the residuals, next we see the ACF of the residuals and the Q-Q plot, both look good. The bottom plot gives the Ljung-Box statistics for various lags and they are all over 0.05, which means that the residuals are consistent with white noise, exactly what we want.
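The Ljung-Box check on the residuals can be reproduced in outline with base R `Box.test()`. The sketch below refits an ARIMA(1,1,2) to simulated data (not the NOAA series) and tests the residuals at several lags; the lag choices are arbitrary, and `fitdf = 3` reflects the three estimated ARMA coefficients (p + q).

```r
# Simulated ARIMA(1,1,2) series with drift (NOT the NOAA data).
set.seed(42)
n <- 488
sim <- arima.sim(model = list(order = c(1, 1, 2),
                              ar = 0.5, ma = c(-0.3, 0.2)),
                 n = n, sd = 3)
t_index   <- seq_along(sim)
gmsl_demo <- as.numeric(sim) + 0.4 * t_index
fit <- arima(gmsl_demo, order = c(1, 1, 2), xreg = t_index, method = "ML")

# Residuals of the fitted model.
res <- residuals(fit)

# Ljung-Box test at several lags. The null hypothesis is that the
# residuals are uncorrelated up to that lag, so a p value ABOVE 0.05
# means white noise cannot be rejected.
lags <- c(5, 10, 15, 20)
lb_p <- sapply(lags, function(k) {
  Box.test(res, lag = k, type = "Ljung-Box", fitdf = 3)$p.value
})
names(lb_p) <- lags
print(round(lb_p, 3))
```

A large p value does not prove the residuals are white noise; it only says the test found no evidence against it.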

Figure 5 tells us that the model is successfully capturing the essence of the trends in mean sea level from 1880 through 2020. The model residuals show no trend and they are not autocorrelated. Figure 6 shows the arima forecast from the (1,1,2) model.

Figure 6. The arima forecast of mean sea level to 2100. The confidence limits plotted are 95% limits. A histogram of the model residuals is shown to the lower left. The residuals are pleasingly normal.
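A forecast with 95% limits can be sketched with `predict()` on a fitted `arima` object. As before, the series is simulated, the drift is supplied through a time-index regressor (an assumption about how drift is handled), and the limits use the normal approximation of ±1.96 standard errors.

```r
# Simulated ARIMA(1,1,2) series with drift (NOT the NOAA data).
set.seed(42)
n <- 488
sim <- arima.sim(model = list(order = c(1, 1, 2),
                              ar = 0.5, ma = c(-0.3, 0.2)),
                 n = n, sd = 3)
t_index   <- seq_along(sim)
gmsl_demo <- as.numeric(sim) + 0.4 * t_index
fit <- arima(gmsl_demo, order = c(1, 1, 2), xreg = t_index, method = "ML")

# Forecast 312 quarters ahead (78 years); the future time index must be
# supplied because the model was fitted with xreg.
h <- 312
future_t <- max(t_index) + seq_len(h)
fc <- predict(fit, n.ahead = h, newxreg = future_t)

# Approximate 95% limits from the forecast standard errors.
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se
print(round(c(lower = lower[h], forecast = fc$pred[h], upper = upper[h]), 1))
```

Because the model is differenced once, the forecast standard error grows with the horizon, which is why the limits fan out toward the end of the forecast.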

Figure 7 is a plot of the forecast from Excel that is easier to read. The forecast we created predicts that GMSL will rise between 148 mm (6 inches) and 258 mm (10 inches) by 2100. Many researchers call this alarming, but humans have successfully adapted to much higher rates of sea level rise in the past, as we can see in Figure 2 of Part 1, and they did so without the technology we have today. When we consider that the average open ocean daily tide range is 1,000 mm, or about three feet, eight inches of sea level rise over 100 years does not seem like much. In the 20th century sea level rose 5.5 inches. Did anyone notice or care, aside from a few researchers?

Figure 7. The forecast with more detail. The model predicts mean sea level in 2100 of 203 mm over the 1993-2008 average. The 95% confidence limits are 148 mm (6 inches) to 258 mm (10 inches) and are marked with a curly brace. The range of predictions is not alarming; it is just over the 140 mm (5.5 inches) observed in the 20th century.
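The unit conversions and the width of the interval quoted above can be checked with a few lines of arithmetic. The only assumption in this sketch is that the 95% limits are symmetric normal limits, which is what lets a standard error be backed out of them.

```r
mm_per_inch <- 25.4

# Forecast values quoted in the text, in mm above the 1993-2008 average.
limits_mm <- c(lower = 148, central = 203, upper = 258)
limits_in <- limits_mm / mm_per_inch
print(round(limits_in, 1))                   # roughly 6, 8, and 10 inches

# The limits sit 55 mm either side of the central forecast.
half_width_mm <- (limits_mm[["upper"]] - limits_mm[["lower"]]) / 2

# Under a normal approximation the 95% half-width is about 1.96 standard
# errors, so the implied forecast standard error in 2100 is:
implied_se_mm <- half_width_mm / qnorm(0.975)
print(round(implied_se_mm, 1))

# 20th-century rise quoted in the text, converted to inches.
century20_in <- 140 / mm_per_inch
print(round(century20_in, 1))
```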

Conclusions

In the United States we would call the AR6 attempt to convince us that the rate of GMSL rise is accelerating, using adjoining cherry-picked least squares lines “high school,” meaning unsophisticated. Their method is problematic because GMSL is heavily autocorrelated and non-stationary, rendering their cherry-picked least squares fits and least squares statistics invalid.

Our fit, using the R function arima, is at least statistically valid. We specifically corrected for autocorrelation and forced the series to be stationary. We also addressed the minor partial autocorrelation that was left at one quarter and three quarters. The residuals of our model passed both the overall Ljung-Box test and multiple-lag Ljung-Box tests for white noise, meaning the arima model properly captured the 140-year trend in the NOAA sea level data.

Thus, while AR6 cherry-picked periods to support their conclusion that GMSL is accelerating, we reached the opposite conclusion using all the data in a statistically valid way. This does not mean that our forecast is correct, but it does mean that the AR6 speculation that sea level might rise 5 meters by 2150 is extremely unlikely and is best characterized as irresponsible speculation. Our analysis found no statistical evidence of acceleration and produced a linear extrapolation.

While warming of Earth’s surface is clearly the reason land-based glaciers are melting, which does contribute to rising sea level, AR6 provides no evidence the warming is caused by human activities. They use models to infer humans caused it, but unfortunately their models are also not statistically valid as shown in Part 2, here, and by McKitrick and Christy (McKitrick & Christy, 2018). We can all agree that humans probably have some impact on atmospheric warming, but we do not know how much is caused by humans and how much is natural, because we are emerging from the unusually cold Little Ice Age—the “preindustrial” period. Further, as we saw in Part 2, the 30-year rates of sea level rise reveal a distinctly natural-looking oscillation. Glacial ice and ice sheet melting is likely responsible for most of sea level rise, as AR6 states, but the human fraction of that warming might be quite small.

Thus, from a purely statistical point of view, the AR6 claims are childishly invalid. A proper analysis of the data leads to a forecast of roughly 20 cm (~8 inches) of sea level rise by 2100. In the year 2100, our descendants will know who was right.

The data and R code to create the figures in this chapter can be downloaded here. The R code and spreadsheet provide much more detail about the arima forecast, including references not supplied below.

Works Cited

McKitrick, R., & Christy, J. (2018, July 6). A Test of the Tropical 200- to 300-hPa Warming Rate in Climate Models. Earth and Space Science, 5(9), 529-536. Retrieved from https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2018EA000401

106 Comments

Alex

March 22, 2022 10:07 am

Forecasts are difficult. Especially about the future.

Scissor

Reply to Alex

March 22, 2022 12:01 pm

Like deja vu all over again.

LdB

Reply to Alex

March 22, 2022 5:54 pm

As I was schooling Bindidon, he put up an image of the best-fit quadratic; it’s hilarious. Not only is the future prediction wrong, the hindcast is wrong. That fit is just random noise, and anyone who puts any faith in its prediction needs a lesson in basic science.

Bindidon

Reply to LdB

March 23, 2022 3:32 am

Instead of playing the big teacher, try to contribute in a meaningful way.

For example, by doing the same job as I did:

https://wattsupwiththat.com/2022/03/21/ar6-and-sea-level-part-2-the-complexity-of-measuring-gmsl/#comment-3482268

When you have done that with the same success, come back here and show us what it looks like.

It was not my idea to compare quadratic fits: I am an engineer, and no statistician; and hence can’t contribute to this current discussion anymore.

What Frederikse & alii, Dangendorf & alii have achieved with their studies is and remains what people like you have to be measured against.

You, LdB, are light years away from that.

LdB

Reply to Bindidon

March 23, 2022 4:59 am

I doubt you are an engineer, a student maybe, but if you can’t see the problem with the quadratic fit given then you probably need to stop posting now, because you aren’t helping, you are just looking more stupid.

The moment you start to fit anything you need to stop and be critical, because it’s easy to fool yourself, and there are lots of correlations that are nothing more than chance or come about through badly wrong statistical analysis. One of the first things to do is a sanity check: look at the forecast and the hindcast and ask whether they make sense.

This crap fails both, so there is a problem. The issue is very straightforward, so see if you can work it out. Perhaps let’s give you another point to consider: if you melted all the ice on Earth, sea level would rise by 80-90 m from present and then could rise no further (it flatlines) 🙂

Tom Halla

March 22, 2022 10:11 am

My problem is that I do not know stat well enough to take this on anything but trust, but eyeballing the graph gives pretty much the same result.

Andy May

Author

Reply to Tom Halla

March 22, 2022 10:31 am

Tom,
It usually does. Eyeballs are pretty good forecasting tools. The methods I used are correct and commonly used in financial and economic forecasting. This is one reason I do my own investing. I only rarely use these techniques in choosing investments, they are mathematically correct, but often wrong.

Scissor

Reply to Andy May

March 22, 2022 12:02 pm

Eyeballs are good to two significant figures.

Rud Istvan

Reply to Tom Halla

March 22, 2022 10:48 am

I know ARIMA inside out. AM did a masterful job of it. This is as theoretically solid as anything in statistics can be.

M.W.Plia

March 22, 2022 10:15 am

I agree, the indicated, measured rise is a “natural-looking oscillation” as it appears the sea level is returning to previous normal (higher) levels achieved during the medieval warm period.

Chaswarnertoo

Reply to M.W.Plia

March 22, 2022 4:46 pm

Harlech castle sea gate, built 1000 years ago.

Alan M

Reply to Chaswarnertoo

March 22, 2022 5:21 pm

Like so much climate related stuff, all upside down

Bryan A

Reply to Alan M

March 22, 2022 5:50 pm

Must be Down Under…Or at least Wales

Rud Istvan

March 22, 2022 10:46 am

Ah, a blast from the past. I got a summa in economics as an undergrad, had my thesis accepted for the PhD, and had already passed all the general PhD exams except history of economics. Chose the joint JD/MBA program rather than finish the PhD, as there was nothing left to learn. My general interest was math modeling, so my economics degree was largely econometrics. ARIMA (auto regressive integrated moving average) is a BIG deal in econometrics.
Well done, Andy May. So much for IPCC AR6 ‘science’.

Andy May

Author

Reply to Rud Istvan

March 22, 2022 11:48 am

Thanks Rud, appreciate the kind words.

MarkW2

March 22, 2022 11:06 am

The stats seem pretty valid and far more rigorous than we typically see in models from climate ‘scientists’. Indeed, it never ceases to amaze me how any so-called ‘scientists’, from climate disciplines or otherwise, can claim the climate models are remotely accurate given the number of variables involved for what is very much a non-deterministic system. It is, quite frankly, farcical.

The conclusion also points out that the author’s forecast might not be correct, which anyone who knows anything about modelling will understand is very sensible. The key point of this analysis, however, is to test the predictions of AR6; and nobody should be surprised that the AR6 ‘findings’ are nonsensical.

It’s a dreadful reflection on the state of ‘science’ in today’s world that this type of analysis should even be necessary, but it is.

Andy May

Author

Reply to MarkW2

March 22, 2022 11:51 am

Mark, I agree. I should not have had to write this. It should be obvious, and it clearly is to many.

Bob

Reply to Andy May

March 22, 2022 12:54 pm

Andy, always keep in mind it was necessary for you to write this. People like me don’t know these things, we deserve to know them but info like this will never be given to us unless people like you do it. Well done.

Andy May

Author

Reply to Bob

March 22, 2022 12:57 pm

Thanks Bob, much appreciated.

MarkW2

Reply to Andy May

March 22, 2022 4:30 pm

What I just cannot understand, Andy, is why statisticians don’t question these things? I know that many academics are afraid they’ll lose their careers if they dare to challenge the climate religion, but the statistics involved here are basic. As you say, it’s High-School stuff.

How can any statistician with any degree of ethics or principles just let this stuff go without challenging it? The basis on which climate models are built means their predictions have pretty much zero statistical validity.

Saying that isn’t even a criticism of the models. It’s just a basic function of numerous variables, many being collinear, for a system that is inherently non-deterministic.

The “king has no clothes”. It’s a crazy situation.

Bob Tisdale

Editor

March 22, 2022 11:13 am

Andy May, The Woodlands as in Texas? If so, I can recall playing golf there many years (decades) ago, but I don’t remember what course. But I can remember playing the Tour 18 in nearby Humble.

Regards,
Bob

Andy May

Author

Reply to Bob Tisdale

March 22, 2022 11:53 am

Hi Bob. Tour 18 is fun and very popular. I haven’t played there in many years though. I play the Woodlands courses mostly, Tournament, Player, Oaks, and Panther Trail. Rarely I will play at Palmer.

DMacKenzie

March 22, 2022 11:27 am

It’s interesting number crunching…. but really all we need to know is that we should check how much safety factor the engineers built into the sea walls…you know, above the highest expected storm surge, and probably build them a foot higher, heck, let’s make it 2 ft, over the next century…..

Andy May

Author

Reply to DMacKenzie

March 22, 2022 11:55 am

Agreed. With a fraction of what we are paying for climate modeling and unreliable energy we could shore up all our storm defenses and be done with it!

Rick C

Reply to DMacKenzie

March 22, 2022 5:56 pm

Yes! And keep in mind that sea wall design will include maximum expected storm-driven wave height and surge, plus a safety factor. SLR may mean a bit more spray coming over a sea wall in bad weather, but is unlikely to result in serious flooding anywhere with properly designed seawalls.

Waza

Reply to DMacKenzie

March 22, 2022 8:35 pm

Example: Melbourne, Australia
Highest astronomical tide (HAT): 0.6 m (theoretical still water height)
HAT + normal storm surge: 1.1 m
Highest ever recorded height: 1.4 m (1934)
100-year return event used for planning purposes, with safety margin: 1.6 m
Maximum allowable floor level for all new dwellings: 2.2 m

So in Melbourne the lowest house floor level is 800 mm above the highest ever recorded tide.
While there is no doubt historic homes may be low lying, new homes already have more than a century of buffer against SLR.

Steve Case

March 22, 2022 11:43 am

Some quotes that I can understand and/or offer some critique on:

Autocorrelation is important to consider in least squares regression, especially when forecasting time series, but routinely ignored by the IPCC.

The question to ask is why don’t the IPCC scientists use up to date sophisticated methods?
________________________________________________________________

This does not mean that our forecast is correct, but it does mean that the AR6 speculation that sea level might rise 5 meters by 2150 is extremely unlikely and is best characterized as irresponsible speculation.

It’s more than irresponsible, they are obviously producing propaganda. You can’t prove that so you didn’t say that, but the “Duck Test” says so.

I didn’t realize they said 5 meters by 2150. And if that’s so, it’s obvious that they’ve moved out 50 years past 2100 which has been the standard target date for the previous IPCC assessment reports.
________________________________________________________________

While warming of Earth’s surface is clearly the reason land-based glaciers are melting, … Glacial ice and ice sheet melting is likely responsible for most of sea level rise, as AR6 states, …

I take issue with the notion that the ice sheets (Greenland & Antarctica) are melting. There is some surface melting during several weeks during the summer, but the calving of icebergs occurs year round, or at least I think it does. I’ve never confirmed that though.

But the water has to be coming from somewhere, and the ice caps are a good bet, but how much ice is gained or lost from the ice caps is a function of snow fall and calving of icebergs. Temperature on the ice caps that is well below freezing nearly everywhere nearly all of the time, has nothing to do with it. (In my opinion)

Six to ten inches of sea level rise by 2100 is very close to what I’ve always come up with after fooling around with PSMSL data. Thanks for confirming that for me.

Andy May

Author

Reply to Steve Case

March 22, 2022 12:02 pm

“The question to ask is why don’t the IPCC scientists use up to date sophisticated methods?”

If you can get an answer to this, let me know too. I have no idea. The statistical methods I used to make this post are well known and have been around a long time.

“I didn’t realize they said 5 meters by 2150.”

Page SPM-28, AR6. [You knew it was in the SPM, didn’t you?]

Here is the full quote:

Global mean sea level rise above the likely range – approaching 2 m by 2100 and 5 m by 2150 under a very high GHG emissions scenario (SSP5-8.5) (low confidence) – cannot be ruled out due to deep uncertainty in ice sheet processes

This is an expert lie; notice the clever use of “low confidence” and “cannot be ruled out.”

Dave Fair

Reply to Andy May

March 24, 2022 7:58 pm

Our tax dollars at work: Leftist propaganda.

Scissor

Reply to Steve Case

March 22, 2022 12:13 pm

Sometimes people don’t seek answers to questions that they would rather not know.

In any case, Ph.D. atmospheric chemists usually only have one semester of graduate level statistics and the rest is on the job training. In short, the training is inadequate and staff statisticians, like good service providers, produce the desired result.

Tim Gorman

Reply to Scissor

March 22, 2022 12:57 pm

My immunology PhC son was told by his college advisor, when he was starting his microbiology degree, not to worry about taking any math courses. If he needed an experiment analyzed just find a math major to do it for him. Thankfully he listened to me and took several math courses including some advanced mathematics. He does all his own work now and does it well. I can’t tell you how scary it is in that field because of the inability of so many to make even basic judgements on their data and what it means.

It’s really the blind leading the blind. The scientists can’t understand their data from a statistical analysis basis and the statisticians can’t understand the data from a real world biology basis. And people wonder why so much of experimental science today can’t be duplicated. It’s so much like climate science it’s unbelievable.

Geoff Sherrington

Reply to Tim Gorman

March 22, 2022 4:29 pm

Tim G,
We helped our 2 sons through University, one with Surveying and its heavy math, the other with Commerce hence econometrics. It turned out to be a useful combo for finding some of the many problems in climate research. Geoff S

Andy May

Author

Reply to Scissor

March 22, 2022 1:03 pm

A real statistician, especially one with a finance background is like gold. They live and die by their predictions and know this stuff inside and out. Every branch of science should pay attention to them, especially meteorologists and climate scientists – but they don’t. Climate scientists know everything, why should they listen to anyone else, right?

Dave Fair

Reply to Andy May

March 24, 2022 8:01 pm

Read Andrew Montfort’s “The Hockey Stick Illusion:” CliSciFi is worse than you thought.

old engineer

Reply to Scissor

March 22, 2022 2:07 pm

Scissor,

“… the training is inadequate and staff statisticians, like good service providers, produce the desired result.”

I guess it depends on where you worked. At the place I worked, the engineers had mostly B.S. and M.S. degrees, our three staff statisticians were PhDs. Their goal seemed to be to keep the engineers out of trouble. I remember two pieces of advice I received early in my employment. The first: “Too many engineers think statistics is a black box into which you can pour bad data and crank out good answers. IT IS NOT!” The second was: “Next time come to me BEFORE you write the test plan.”

I am indebted to these three for keeping me out of trouble for all of my 30 year career.

AndyHce

Reply to Steve Case

March 23, 2022 2:07 am

The question to ask is why don’t the IPCC scientists use up to date sophisticated methods?

Does the IPCC have scientists or merely political writers? More to the point, do they do any statistical analysis or do they just report on the statistics of journal papers they have selected as working material?

I do recall a position paper a few years ago, probably featured here, wherein the National Academy of Sciences acknowledged the woeful state of statistical analysis in “climate” papers and strongly suggested that any ‘climate scientists’ using statistics to analyze or explain their work co-author with a real, card-carrying statistician. Of course that would never do in practice, for obvious enough reasons, and so has been ignored.

Mark BLR

Reply to AndyHce

March 23, 2022 6:13 am

Does the IPCC have scientists or merely political writers?

My understanding is that the IPCC has access to (climate) scientists and their work, in addition to their “inter-governmental” (the “I” in “IPCC” …) political appointees.

Most problems seem to arise because the “political writers” have priority, AKA the “final editing” rights, not the scientists.

Dave Fair

Reply to Mark BLR

March 24, 2022 8:19 pm

Partially true Mark: The political types appoint the Authors and Lead Authors (supposedly scientists) that pick and choose which scientific documentation to include in various chapters and sections. During development of the UN IPCC Third Assessment Report, for the paleo climate section they picked Michael E. Mann who had just completed his PhD, a unique political pick.

Lead Authors have absolute control of what goes into their section of the reports. Mann selected his MBH98 “Hockey Stick” study as the premier study to be used in his section, elevated above all the rest. The Leftist Sir John Houghton, then IPCC head, elevated the Hockey Stick graph as the grand representation of the whole report and spread it all over the world. Al Gore picked up on it for his “An Inconvenient Truth” movie and the paleo climatological community has been politically corrupted ever since.

Mark BLR

Reply to Dave Fair

March 25, 2022 5:43 am

During development of the UN IPCC Third Assessment Report …

My experience means I think of “The Chapter 8 Controversy” for the SAR (in 1995) as the first public display of this modus operandi.

It turns out a post titled “Can We ‘Trust the Science’?” has just been posted here at WUWT that includes the following :

Ben Santer was appointed the convening Lead-author of Chapter 8 of the 1995 IPCC Report titled “Detection of Climate Change and Attribution of Causes.” In that position, Santer created the first clear example of the IPCC manipulation of science for a political agenda. He used his position to establish the headline that humans were a factor in global warming by altering the meaning of what was agreed by the committee as a whole at the draft meeting in Madrid.
The consensus of the large group of scientists assigned with assessing the proposed effects agreed in their summary of the main chapter of the report was: “None of the studies cited above has shown clear evidence that we can attribute the observed [climate] changes to the specific cause of increases in Greenhouse gases.”
Santer as Lead Author replaced it with: “There is evidence of an emerging pattern of climate response to forcing by greenhouse gases and sulfate aerosol… from the geographical, seasonal and vertical patterns of temperature change… These results point toward a human influence on global climate.”
It was just a start of central planning and control.

The Santer and Mann examples show just how fuzzy the line between “political writers” and “scientists” can get.

Skeptic JR

Reply to Steve Case

March 23, 2022 8:20 am

Oh yes, the outlying forecasts presume utter doom, 12 feet, 20 feet, 30 feet. This is all predicated on total ice sheet collapse because occasionally a piece of an iceberg breaks off.

rah

March 22, 2022 12:46 pm

Hmmm. Since NOAA and the GISS have obviously been adjusting their temperatures, both historic and present, to match the rise in atmospheric CO2 I wonder just how cold the past will have to be and how hot it will have to be by 2091 if they continue that practice.

Steve Case

Reply to rah

March 22, 2022 6:13 pm

Mark Steyn famously said: How are we supposed to have confidence in what the temperature will be in 2100 when we don’t know what it WILL be in 1950!!

Bob

March 22, 2022 12:50 pm

Outstanding. A little technical but that was needed to show how dishonest and corrupt the AR6 message is and to show how proper work should be done. I salute you.

Tim Gorman

March 22, 2022 12:58 pm

Andy,

Good job. I hope someone in the government reads this. But I’m not going to hold my breath waiting.

Fredrik

March 22, 2022 1:02 pm

Quite a number of problems in the application of ARIMA here. The ACF and the ADF test suggest that the sea level is driven by a stochastic trend. The only way to forecast a stochastic trend variable is to find another stochastic trend variable that cointegrates with the sea level. For this ARIMA model the PACF suggests p = 2, not q = 2. And you must remove the stochastic trend by taking the first difference in order to determine p and q correctly. When you use the level of the series you cannot see the order of p and q because it will be completely covered by the stochastic trend. This work is hastily done and not correct. Remove the post and redo it correctly. However, ARIMA is only good for short-term forecasts, and in this case only for changes in the dependent variable.

Frank from NoVA

Reply to Fredrik

March 22, 2022 2:34 pm

Been a million years, but would agree that the cut-off in the PACF after two lags implies p=2, hence AR(2). Not sure if Andy tested the actual or differenced series, but if the latter, would suggest ARIMA(2,1,0). We were always advised to try several models and to go with the most parsimonious model that resulted in iid residuals. It’s important to remember that ARIMA is a ‘black box’ approach to forecasting – there’s no ‘causality’.

Andy May

Author

Reply to Fredrik

March 22, 2022 2:56 pm

Fredrik and Frank,
You are both making several assumptions that could easily be checked by downloading the R code and doing the analysis yourself. Until you’ve done that, I’ve nothing to say. I’m pretty sure I did it correctly. I would be willing to wager, that you will get the same answer I did.

The reason for posting the code and data is to answer exactly the questions you are asking.

Rud Istvan

Reply to Andy May

March 22, 2022 3:49 pm

Ouch! Great reply, as I am reasonably sure what you carefully explained is exactly correct ARIMA.

Frank from NoVA

Reply to Andy May

March 22, 2022 4:25 pm

Andy,

You’re a lot smarter than I am, and like I said, it’s been a million years, so I’ll just quote from ‘Forecasting: Methods and Applications’, Makridakis, Wheelwright and McGee, 1983:

Page 380 –

“Trends of any kind result in positive autocorrelations that dominate the autocorrelation diagram, and…it is important to remove the non-stationarity before proceeding with the time-series model building.”

I assume your spectra in Figure 4 were run on the differenced (stationary) data. If so, then –

Page 375 –

“In summary when there are only p partial autocorrelations that are significantly different from zero, the process is assumed to be an AR(p).”

If your diagnostic R spectra were run on the original (non-stationary) data series, then I would have to agree with Fredrik.

Andy May

Author

Reply to Frank from NoVA

March 22, 2022 6:33 pm

The data used to make Figure 4 were stationary, as clearly stated in the post.

“The differences plotted in Figure 3 have an ADF p value of 0.01, well below 0.05, the threshold needed to show stationarity.”

The next step was Figure 4. Remember AR does not equal arima, they are two different things. If you have any further questions, download the R code and run it yourself step-by-step. Otherwise, we could go around in circles for days. Running it would take you less time than it has taken for me to write this reply.

Frank from NoVA

Reply to Andy May

March 22, 2022 10:09 pm

Okay. I’m not an R maven, so I am just looking at your code (copied below). I see that both ‘acf2’ (the function used to produce the ACF and PACF diagnostics shown in Figure 4) and the ‘auto.arima’ function are run on a file named MSL. Since you previously created another file named MSL_diff from MSL using R’s diff() function, I’m assuming that MSL is not differenced and, therefore, is non-stationary. If that’s correct, I go back to the quote from Makridakis et al. (page 380) I provided above.

# Build arima model, need parameters (p,d,q)
# p is the autocorrelation term, after removing autocorrelation, which is high
# are the residuals white noise?
acf2(MSL, main="Autocorrelation and Partial Autocorrelation plots of Mean Sea Level")
#
# sarima takes three numbers, p,d,q
# p are the number of significant acf lags, here p=1 or greater, 1 is sufficient
# d is the number of differences required for stationarity, d=1
# q is the number of significant pacf lags, that is the lags after
# the first order acf lags are removed. q is sometimes called the MA term,
# or the moving average term. here q=2 (lags=1 and 3)
#
# Let’s check if auto.arima gets the same answer
MSL_fit = auto.arima(MSL, approximation=FALSE, trace=FALSE)
summary(MSL_fit)


Andy May

Author

Reply to Frank from NoVA

March 23, 2022 4:24 am

Frank and Fredrik,
Thanks for downloading the code! That allows us to discuss the details, even if you do not have R on your computer. I tried to keep the post as brief as possible, so the code has the details.
Remember we are talking about arima, NOT ARMA or AR, which are different.

Figure 2 shows that one difference (lag) is highly correlated with GMSL, so we need d=1 to reach stationarity. Figure 3 shows that d=1 is sufficient for stationarity.

Figure 4 uses the original dataset: MSL, not MSL_diff which is lag=1. It is the function acf2 you list, line 172 in the R code.

The ACF plot shows all lags are significant, but p cannot be that high. We need the minimum p that works; in this case it was 1, by trial and error. Fredrik wants to use p=2 due to the PACF plot. This is possible, but the results are not as good as p=1, see attached plot. Compare to Figure 3, which is p=1. The ACF is the same and the Ljung-Box stats are worse when p=2, especially for the critical short lags. Besides, I believe in using the smallest p that works.

The PACF plot fairly clearly points to 2 as q. Also, auto.arima selects q=2, which is the clincher here.

This gives us p=1, d=1, and q=2 for sarima.

auto.arima derives the same values. auto.arima uses various statistics to derive the parameters, mainly the Hyndman-Khandakar algorithm, and it works well in simple cases like this one.
This is a good reference that explains most of the methodology I used:
8.7 ARIMA modelling in R | Forecasting: Principles and Practice (2nd ed) (otexts.com)
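For readers unfamiliar with the Ljung-Box statistic mentioned above, here is a minimal sketch of how it is computed from a set of residuals. It is plain Python on synthetic residuals, not the post's R code or its results; the helper names are hypothetical. The statistic is Q = n(n+2) Σ r_k²/(n−k) over lags k = 1..h, where r_k is the lag-k autocorrelation of the residuals. A large Q means autocorrelation is still left in the residuals.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    r = acf(residuals, max_lag)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, max_lag + 1))

random.seed(1)
n = 400
# Residuals from a model that captured everything: white noise.
white = [random.gauss(0.0, 1.0) for _ in range(n)]
# Residuals from a model that missed an AR(1) component (assumed phi = 0.6).
leftover = [white[0]]
for t in range(1, n):
    leftover.append(0.6 * leftover[-1] + white[t])

q_white = ljung_box_q(white, 10)
q_leftover = ljung_box_q(leftover, 10)

# 5% critical value of chi-square with 10 degrees of freedom.
CHI2_95_DF10 = 18.307
print(round(q_white, 1), round(q_leftover, 1), CHI2_95_DF10)
```

One caveat: when the residuals come from a fitted ARIMA(p,d,q) model, the degrees of freedom are normally reduced by p+q, so the critical value used here applies only to the raw-series case shown.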

Frank from NoVA

Reply to Andy May

March 23, 2022 7:53 am

Andy,

You’ve been very kind and patient in your correspondence, as usual, so will let you go off to more important things – like golf! I also note that there’s been some heavy weather down your way, and hope that you’ve been spared.

I would, however, like to make a few points in passing:

First, as you indicate above, the ACF and PACF plots in Figure 4 were run on the undifferenced, non-stationary data, and are therefore not useful in positing possible models. If the data were stationary, then plots for an ARMA(p,q) would have shown both the ACF and the PACF plots ‘tailing off’.

The fact that you obtained a similar form of the model from auto.arima, which, if I correctly understand from the paper you graciously attached, initially performs any differencing required to achieve stationarity, seems coincidental to me.

Second, the main reason I questioned the model selection process was that a plot of the raw data (MSL) shows that the series is not exactly linear and, in fact, the slope seems to be increasing with level.

This is confirmed by the plot of the differenced series, which is not stationary in the variance, so maybe using a different functional form to transform the data would have been better.

The end result of both these points is a linear forecast whose slope appears to be visually lower than that of any sub-segment of the original data.

And finally, I’m no expert, and it’s been a while since I’ve done any such modeling, but I do recall being taught the importance of selecting models using a subset of the data, say, 80%, and then comparing the forecast of the last 20% to the actual data to see how each model fared.

The ‘best’ model would then be re-estimated using the entire data set, and only then used to forecast beyond the existing data. As an aside, I don’t see any evidence that many climate scientists of the alarmist persuasion actually do this, regardless of what forecasting tools they are using.

Anyways, that’s just my two cents, and, again, thanks for your patience and kind consideration.

Andy May

Author

Reply to Frank from NoVA

March 23, 2022 9:05 am

Frank,
Thank you for explaining your concerns. I didn’t understand what you meant before. It seems this was your main concern:

“First, as you indicate above, the ACF and PACF plots in Figure 4 were run on the undifferenced, non-stationary data, and are therefore not useful in positing possible models. If the data were stationary, then plots for an ARMA(p,q) would have shown both the ACF and the PACF plots ‘tailing off’.”

Attached is a plot of ACF and PACF for MSL_diff, the first difference of GMSL, which is stationary.

It suggests, as we previously determined, that p=2 and q=2. But p=1 turned out to be better. Both work OK though.

Frank from NoVA

Reply to Andy May

March 23, 2022 10:04 am

Thanks Andy. You’re correct that I was uncomfortable trying to diagnose the undifferenced data series. Based on the new plots, I would have started with ARIMA(1,1,1), but would also have tried others to see if I could improve on the residuals. So it’s possible we could have come out at the same place.


Andy May

Author

Reply to Frank from NoVA

March 23, 2022 9:10 am

“The fact that you obtained a similar form of the model from auto.arima, which, if I correctly understand from the paper you graciously attached, initially performs any differencing required to achieve stationarity, seems coincidental to me.”

It is not a coincidence and, yes, auto.arima must be fed the actual series, not the first lag. Also, I prefer to use the ACF and PACF of the time series itself, rather than the first lag, which is obviously what you were taught. I’ll have to think about that; perhaps in the future I will do both. Either methodology works, and in this case the two methodologies very quickly get to the right answer. I reached p=1 first, whereas your method got p=2 first and then p=1 (the correct answer) the second time around.

Frank from NoVA

Reply to Andy May

March 23, 2022 10:10 am

The auto.arima function looks like a real time saver, compared to what we had to go through in the old SCA days. Obviously, a good place to start, although I would remain wary of trusting the software too much, as I’ve seen automatic regression packages that will just start adding variables to maximize the ‘fit’ while resulting in serious multi-collinearity issues.

Andy May

Author

Reply to Frank from NoVA

March 23, 2022 9:13 am

“And finally, I’m no expert, and it’s been a while since I’ve done any such modeling, but I do recall being taught the importance of selecting models using a subset of the data, say, 80%, and then comparing the forecast of the last 20% to the actual data to see how each model fared.”

You are correct, that would be the next step and in a real study it would be required. So, I will do that next. You are the second person to ask for this. The point of the post was only to show how primitive AR6’s justification for acceleration was, but one more step won’t take long.
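The 80/20 check Frank describes can be sketched in a few lines. This is an illustration only, in plain Python on synthetic data, not the post's R code and not a result for GMSL. The two candidate models are deliberately simple stand-ins: a naive "last value" forecast and a random walk with drift, which is ARIMA(0,1,0) with a constant. All function names are hypothetical.

```python
import random

def split(series, frac=0.8):
    """Split a series into a training head and a hold-out tail."""
    cut = int(len(series) * frac)
    return series[:cut], series[cut:]

def naive_forecast(train, h):
    """Repeat the last observed value h times."""
    return [train[-1]] * h

def drift_forecast(train, h):
    """Random walk with drift: extend the last value by the mean first difference."""
    step = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + step * (i + 1) for i in range(h)]

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5

random.seed(4)
# Synthetic series: steady rise of 0.5 per step plus unit-variance noise.
series = [0.5 * t + random.gauss(0.0, 1.0) for t in range(200)]

train, test = split(series, 0.8)
rmse_naive = rmse(test, naive_forecast(train, len(test)))
rmse_drift = rmse(test, drift_forecast(train, len(test)))

print("naive RMSE:", round(rmse_naive, 2))
print("drift RMSE:", round(rmse_drift, 2))
```

The model with the lower hold-out error would then be re-estimated on the full series before forecasting beyond the data, as Frank outlines.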

Frank from NoVA

Reply to Andy May

March 23, 2022 10:11 am

Agreed and thanks!

Fredrik

Reply to Andy May

March 23, 2022 2:45 am

We are not making assumptions. We simply point out some methodological flaws. Thank you for sharing the data. I will get back and tell you what it says. In the meantime I recommend Terence Mills’s excellent introduction to the topic. https://www.thegwpf.org/content/uploads/2016/02/Forecasting-3.pdf

Andy May

Author

Reply to Fredrik

March 23, 2022 4:35 am

Fredrik,
Thanks for the reference, I will read it. I don’t think there are any “methodological flaws.” If after reading the code, and hopefully running it in R, you still think there are flaws, point them out here and refer to the code. As I discuss above, p=2 works OK, but the Ljung-Box statistics are not as good as they are with p=1. Further, auto.arima also picks p=1. q is fairly clearly 2.

Andy May

Author

Reply to Fredrik

March 23, 2022 4:31 am

“Remove the post and redo it correctly.”

It was done correctly. See below where I tried p=2. p=2 is OK, just not as good as p=1. I agree that sea level is a stochastic trend.

“The only way to forecast a stochastic trend variable is to find another stochastic trend variable that cointegrates with the sea level.”

This is true in real life. But the point of this exercise was to show the cherry-picked OLS line comparisons in AR6 are juvenile and invalid. This was meant to show the proper way to forecast GMSL using statistics only.

bigoilbob

Reply to Andy May

March 23, 2022 11:18 am

“But the point of this exercise was to show the cherry-picked OLS line comparisons in AR6 are juvenile and invalid.”

But your final point was to minimize the now-2100 ranged increase in sea level, with an expected value of ~ 20cm, as inconsequential. That is why I was interested in a similar evaluation of a few more recent time periods, to see those other 2100 ranges.

FYI, no gotcha intended. I think you can make the case that even larger increases are much less damaging to lower income levels and more remediable generally than other AGW consequences. I even vaguely remember Nick Stokes writing something similar. I am just interested in the more relevant evaluations….

Waza

March 22, 2022 1:04 pm

How do we know the acceleration or deceleration of the glacial rebound or subsidence of individual locations?
There could have easily been periods of higher SLR rate pre 1900.

Andy May

Author

Reply to Waza

March 22, 2022 1:09 pm

There were many times when it was much higher, see post #1, Figure 2.
AR6 and Sea Level Rise, Part 1 – Andy May Petrophysicist

Javier

March 22, 2022 1:15 pm

Statistics will not provide the answer to any scientific question; they only tell you how much trust you can have in a certain answer based on probability analysis.

But having a lot of statistical trust does not mean it is correct, nor does having very little trust mean it is incorrect. Deep knowledge of a process can beat statistical analysis.

As an example, in 2016 I was among several people who noticed that despite the 2012 low in Arctic sea ice, the trend in September sea ice extent was no longer down. Knowledge of multidecadal variability led me to understand that a climate shift had taken place and we should not expect a continuation of the previous worrisome (to some) decreasing trend. I published an article here at WUWT communicating that Arctic sea ice had turned a corner. Tamino (a statistician) used statistics to say I was wrong. Well, six years later I am still right and the statistics are starting to show the change of trend I was able to spot based on knowledge.

The lesson is that statistics is not the final arbiter of scientific questions, just a tool to make better decisions.

Andy May

Author

Reply to Javier

March 22, 2022 1:32 pm

I had forgotten that post, thanks for reminding me. I definitely agree with you on statistics. We should all know the rules but be skeptical and practical about it.

Rud Istvan

Reply to Andy May

March 22, 2022 2:51 pm

The joke back then was consulting three econometricians will result in 6 different opinions, each ‘on the one hand, but then on the other hand’, since each has different hands.

cerescokid

Reply to Rud Istvan

March 23, 2022 3:39 am

That is why Truman said he wanted a one-armed economist, because they always came in and after their briefing said “on the other hand.”

Warren

March 22, 2022 1:49 pm

Sounds very much like the UN are angling for more money to ‘combat accelerating sea level rise’ to me.

Steve Richards

March 22, 2022 1:56 pm

“On March 22, the newest U.S.-European sea level satellite, named Sentinel-6 Michael Freilich, became the official reference satellite for global sea level measurements. This means that sea surface height data collected by other satellites will be compared to the information produced by Sentinel-6 Michael Freilich to ensure their accuracy.”

So, a new satellite from today…

https://sealevel.nasa.gov/news/233/international-sea-level-satellite-takes-over-from-predecessor/

bigoilbob

March 22, 2022 2:01 pm

FYI, I get all of the “cherry picked” periods in your first post. They all improve their R^2 values with consideration of acceleration included.

Using normal OLS for linear and accelerating trends, and all of the data, I get changes from 2022.25 to 2100 of ~128 and 242 mm respectively. So, your number is reasonable. I also understand how autocorrelation must be accounted for. I can’t replicate your plots, but what do you get if you plot more recent periods, using the same process? Say 1950-present, 1960-present, 1970-present, 1980-present. These are the relevant time periods w.r.t. AGW emissions. When you do normal OLS, with acceleration considered, the difference between now and 2100 increases between trends calculated before and after 1950, 1960, 1970, 1980.

I know it’s a lot of work (that I can’t replicate) but I wonder if your process shows anything different.
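One caution on comparing R^2 between a linear and an accelerating (quadratic) OLS fit: because the linear model is nested inside the quadratic one, the quadratic R^2 can never be lower, whether or not there is any real acceleration. The sketch below illustrates this in plain Python on synthetic data (a straight line plus autocorrelated noise, with no acceleration built in). It is not the post's R code or the NOAA data, and the helper names are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = s / aug[r][r]
    return sol

def polyfit(x, y, degree):
    """Ordinary least squares polynomial fit via the normal equations."""
    m = degree + 1
    ata = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    aty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(m)]
    return solve(ata, aty)

def r_squared(x, y, coeffs):
    """Coefficient of determination for a fitted polynomial."""
    mean_y = sum(y) / len(y)
    fitted = [sum(c * xi ** k for k, c in enumerate(coeffs)) for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

random.seed(2)
n = 200
x = [2.0 * i / (n - 1) - 1.0 for i in range(n)]  # time rescaled to [-1, 1]
noise = [0.0]
for _ in range(n - 1):                           # AR(1) noise, assumed phi = 0.9
    noise.append(0.9 * noise[-1] + random.gauss(0.0, 0.05))
y = [xi + e for xi, e in zip(x, noise)]          # linear truth, no acceleration

r2_lin = r_squared(x, y, polyfit(x, y, 1))
r2_quad = r_squared(x, y, polyfit(x, y, 2))
print(round(r2_lin, 4), round(r2_quad, 4))
```

A small gain in R^2 from the extra term is therefore expected by construction, and with autocorrelated errors the usual significance tests on that term are too optimistic.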

bigoilbob

Reply to bigoilbob

March 22, 2022 2:39 pm

Also, your trend seems linear, except for some possible days of month departures. Why is it that its linear R^2 for the forecast period of 1900-late 2020 is ~0.983, but the R^2 for a normal acceleration treatment is a slightly larger 0.987? Not challenging your process, just curious.


Andy May

Author

Reply to bigoilbob

March 22, 2022 3:05 pm

bigoilbob,
The process you describe is taken care of and much more by arima. Arima checks for curves as well as linear processes. What this process tells you is there is no improvement in the fit with a curve, at least a power or quadratic curve. The arima process is not perfect, but since the original fit was 97%, there is not much chance of improving the fit with a curve anyway. Visually you can see what arima is saying.

bigoilbob

Reply to Andy May

March 23, 2022 10:57 am

Fair enough. The difference I found was admittedly tiny.

Now, per my comment above that, what about performing the same eval on the more recent data? The data that is more likely to be influenced by increasing [CO2] and other GHG’s.

Paul Chernoch

March 22, 2022 2:14 pm

I am curious about the effect of homogenization on autocorrelation within climate signals like sea level and temperature. Do the kinds of homogenization procedures employed by the climate modelers tend to increase autocorrelation, decrease it, or have an unpredictable effect? I would think they might increase it, thus amplifying the kinds of effects that you are trying to expose and reject.

Rud Istvan

Reply to Paul Chernoch

March 22, 2022 2:53 pm

Homogenization is used for temperature, not sea level. You correctly intuit that it will increase general autocorrelation.

Andy May

Author

Reply to Paul Chernoch

March 22, 2022 3:06 pm

They increase it. All smoothing or homogenization functions increase it.
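The effect of smoothing is easy to see on a simple case. The sketch below, in plain Python with hypothetical helper names, applies a 5-point moving average to synthetic white noise. It only demonstrates the moving-average case, not every smoothing or homogenization method. Consecutive averages share four of their five inputs, so the lag-1 autocorrelation moves from roughly 0 toward 4/5.

```python
import random

def lag1_autocorr(x):
    """Sample autocorrelation of x at lag 1."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)

def moving_average(x, window):
    """Simple moving average; output is shorter than x by window - 1 points."""
    return [sum(x[i:i + window]) / window for i in range(len(x) - window + 1)]

random.seed(3)
raw = [random.gauss(0.0, 1.0) for _ in range(2000)]  # independent values
smooth = moving_average(raw, 5)

r1_raw = lag1_autocorr(raw)
r1_smooth = lag1_autocorr(smooth)
print("raw lag-1:     ", round(r1_raw, 3))
print("smoothed lag-1:", round(r1_smooth, 3))
```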

Matt Kiro

March 22, 2022 2:40 pm

Are any of your graphs broken down into regions? Because there are some places, like the Chesapeake Bay area, where the MSL is rising, mostly due to subsidence from what I’ve read. These places do need to invest in the appropriate infrastructure.

While I appreciate all your hard work in showing that GMSLs are not accelerating, I find that global averages and anomalies from some arbitrary zero point just give in to the narrative that there is some climate utopia point.

Andy May

Author

Reply to Matt Kiro

March 22, 2022 3:07 pm

Nope, I just worked with the data shown. This was a purely statistical exercise. I did it to make a point about the AR6 methods.

Neville

March 22, 2022 2:43 pm

Very interesting Andy, but I’m a very poorly educated bloke who isn’t very good at stats or maths.
But here’s a question for Andy or Rud or anyone else to ponder. It took fully evolved humans 200K years to reach a global population of 1 billion by about 1800, or about the start of the Industrial Revolution, so why (how?) did it take just another 127 years (1927) to reach 2 billion and then just another 33 years to reach 3 billion?
In 1970 the global population was about 3.7 billion and today it is about 7.9 billion, so it has more than doubled in about 50 years.
So how can this occur so quickly when the Biden donkey etc. have told us repeatedly that we’re facing an EXISTENTIAL threat?
BTW, for the first 200K years human life expectancy was under 40; today it is about 73, and much higher in wealthy OECD countries.
So how is our climate so dangerous today when all the real data proves they’re wrong? Africa (53 countries) is even more extreme: in 1970 the population was just 363 million and today it is about 1,400 million, and life expectancy was about 46 years then and is about 64 years now.
And Africa has experienced the misery of HIV/AIDS over the last 50 years as well. Anyway, I’d like someone to point out why we’re committed to wasting endless TRILLIONS of dollars on this so-called EXISTENTIAL threat when we can find so little evidence for it.

AndyHce

Reply to Neville

March 23, 2022 2:25 am

Because the things they say are problems are just the smoke screen they are using to gain control.


BallBounces

March 22, 2022 4:10 pm

It boggles the mind how an “amateur” working solo in his spare time can provide more convincing, cogent argumentation than a raft of taxpayer-paid “professionals”.

Geoff Sherrington

March 22, 2022 4:23 pm

Andy,
1. About 7 years ago I did similar autocorrelation studies on annual temperatures for several Australian cities including Melbourne. The results looked broadly similar to yours on sea level. Given that it is plausible that one affects the other (in part) there could be scope for stats nerds to look deeper.
2. To round off your present work, why not use the first half of your data to forecast the second half, then compare it to the measured values?
3. Why do you tack your forecast onto your last observation, which has a chance of being an outlier? It might be better to tack it onto the average over the last 10 years.
4. Are you comfortable that the NOAA data that you use in Figure 1 are reliable? It looks more curved than most individual tide gauge results I have studied. Good work like yours might entrench the curve, which would be a pity if it has been through an Establishment process with a purpose. Geoff S

Andy May

Author

Reply to Geoff Sherrington

March 22, 2022 6:42 pm

All very good points Geoff. As your homework for tonight, I expect you to download the R code and perform the exercises you outline. Post the results for my review tomorrow my time. As for #2, I would expect a projection within the margin of error.

Choosing the NOAA data was arbitrary. I would expect very similar results with all the datasets, they are all pretty much the same. The differences are minor.

Geoff Sherrington

Reply to Andy May

March 22, 2022 7:01 pm

Andy,
Sadly I have an impediment that prevents me from using R, so I place a lot of value on the work of people like you. (My start in multivariate stats was with pencil, paper and eraser using Fisher’s method of analysis of variance.)
My comments were intended to be helpful, anticipating objections that readers might like settled before they understand. No way did I mean to criticize. Regards Geoff S

Neil Jordan

March 22, 2022 5:38 pm

Throwing a bit of fat on the fire, the American Council on Surveying and Mapping published a bulletin in December 2008 explaining sea level change.
Link to PDF here: https://tidesandcurrents.noaa.gov/publications/Understanding_Sea_Level_Change.pdf
Note the ~19-year periodicity in sea level caused by the lunar Metonic Cycle.
WUWT 2017 reference:
https://wattsupwiththat.com/2017/02/07/even-more-on-the-david-rose-bombshell-article-how-noaa-software-spins-the-agw-game/
My comment at WUWT:
Neil Jordan
Reply to
Willis Eschenbach
February 7, 2017 6:27 pm
Let me add to what the surveyors have written. It gets even better (or worse) when the cooked temperature is used to generate an also-cooked future sea level to establish a coastal boundary. ACSM BULLETIN December 2008 covers how it should be done, averaging over the previous 19-year tidal epoch.
https://www.tidesandcurrents.noaa.gov/…/Understanding_Sea_Level_Change.pdf
Note that sea level has a periodic component from the ~19-year lunar/solar Metonic cycle that the local coastal agency attributed to wind-driven upwelling. As a result of this and other shaky reasoning, there are three different sea level rises for 2100 at the same location, 11″, 42″, and 66″.

kzb

Reply to Neil Jordan

March 23, 2022 5:32 am

My understanding was that the purpose of autocorrelation is precisely to detect these hidden signals in the data. Detecting this Metonic cycle in the data would actually be a valid use of this technique. But I am not sure if Andy’s work here is.

Andy May

Author

Reply to Neil Jordan

March 23, 2022 9:20 am

Neil and kzb,
Correct, this methodology can be used to look for specific suspected periodic components and tell you how significant they are. But it is a lot of work and involves much more than what I did here.

Geoff Sherrington

March 22, 2022 6:43 pm

Ferdinand,
All of the papers I have read on the derivation of Henry’s Law factors were done in a constrained lab setting. The water in contact with the air held only so much CO2, there were small volumes and a steady state was soon met.
In the open oceans, of course, the sea surface is constantly replenished with water that can have various levels of CO2 depending on factors like how old the water was when it reached the atmosphere interface. Much more dynamic and longer term concepts than the lab work.
Ferdinand, can you refer me to any Henry’s Law work that is studied in experiments closer to the natural case? I have searched and not found.

Regards Geoff S

kzb

March 22, 2022 7:01 pm

As a non-specialist I don’t know if a technique used in econometrics is appropriate for this use. It seems to me that every simple relation in science could be said to be autocorrelated if this is true.

Switch on an immersion heater in a vessel of water and plot the temperature increase with time. That would be autocorrelated by the definition used here.

I was taught that the way to detect if a fitted equation is satisfactory is to plot the residuals.

Bindidon

Reply to kzb

March 23, 2022 8:41 am

Sounds pretty good to another non-specialist.

Andy May

Author

Reply to kzb

March 23, 2022 9:21 am

arima is a form of residual analysis. Simply plotting them is useful, but you can do a much more thorough analysis with arima.
