How error propagation works with differential equations (and GCMs)

guest post by Nick Stokes

There has been a lot of discussion lately of error propagation in climate models, eg here and here. I have spent much of my professional life in computational fluid dynamics (CFD), dealing with exactly that problem. GCMs are a special kind of CFD, and both are applications of the numerical solution of differential equations (DEs). Propagation of error in DEs is a central concern. It is usually described under the heading of instability, which is what happens when errors grow rapidly, usually due to a design fault in the program.

So first I should say what error means here. It is just a discrepancy between a number that arises in the calculation, and what you believe is the true number. It doesn’t matter for DE solution why you think it is wrong; all that matters is what the iterative calculation then does with the difference. That is the propagation of error.

A general linear equation in time can be formulated as

y’ = A(t)*y + f(t) ………. (1)
y(t) could be just one variable or a large vector (as in GCMs); A(t) will be a corresponding matrix, and f(t) could be some external driver, or a set of perturbations (errors). The y’ means time derivative. With a non-linear system such as Navier-Stokes, A could be a function of y, but locally (in space and time) this dependence is small; the basics of error propagation follow from the linearised version.

I’ll start with some bits of DE theory that you can skip (I’ll get more specific soon). If you have another solution z, the solution following an error, then the difference satisfies

(y-z)’=A*(y-z)
The dependence on f(t) has gone. Error propagation is determined by the homogeneous part y’=A*y.
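As a minimal numerical sketch of that fact (scalar A, forward Euler, and an arbitrary made-up forcing — all illustrative choices, none from the post): the gap between a solution and its perturbed copy evolves by the homogeneous part alone, so its size is predictable without knowing f at all.

```python
import math

def euler(A, f, y0, dt, steps):
    """Forward Euler for the scalar version of equation (1): y' = A*y + f(t)."""
    y, t = y0, 0.0
    for _ in range(steps):
        y += dt * (A * y + f(t))
        t += dt
    return y

A = -0.5
f = lambda t: math.sin(t)          # some arbitrary external driver
dt, steps = 0.001, 2000            # integrate to t = 2

y = euler(A, f, 1.0, dt, steps)    # a solution
z = euler(A, f, 1.1, dt, steps)    # the same run after an initial error of 0.1

# The difference obeys (y-z)' = A*(y-z); f has dropped out entirely,
# so after t = 2 it should be close to 0.1 * exp(A*2), regardless of f.
print(abs(z - y), 0.1 * math.exp(A * 2.0))
```

Swapping in a completely different f changes y and z but leaves their difference essentially untouched, which is the point of the homogeneous equation.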

You can write down the solutions of this equation explicitly:

y(t) = W(t)*a, W(t) = exp(∫ A(u) du )
where the exp() is in general a matrix exponential, and the integral is from starting time 0 to t. Then a is a vector representing the initial state, where the error will appear, and the exponential determines how it is propagated.

You can get a long way by just analysing a single error, because the system is linear and instances can be added (superposed). But what if there is a string of sequential errors? That corresponds to the original inhomogeneous equation, where f(t) is some kind of random variable. So then we would like a solution of the inhomogeneous equation. This is

y(t) = W(t) ∫ W⁻¹(u) f(u) du, where W(t) = exp(∫ A(v) dv ), and integrals are from 0 to t
To get the general solution, you can add any solution of the homogeneous equation.
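The variation-of-parameters formula can be checked numerically in the scalar case. This sketch uses illustrative choices not in the post (c = −1, f = cos, forward Euler for the direct solution, a midpoint rule for the integral):

```python
import math

c = -1.0                       # scalar A
f = lambda t: math.cos(t)      # forcing
T, n = 2.0, 20000
dt = T / n

# Direct numerical solution of y' = c*y + f(t), y(0) = 0 (forward Euler).
y, t = 0.0, 0.0
for _ in range(n):
    y += dt * (c * y + f(t))
    t += dt

# The formula: y(T) = W(T) * integral of W^-1(u)*f(u) du, with W(t) = exp(c*t)
# in the scalar case.  Midpoint rule for the integral.
integral = sum(math.exp(-c * (k + 0.5) * dt) * f((k + 0.5) * dt) * dt
               for k in range(n))
formula = math.exp(c * T) * integral

print(y, formula)   # the two agree to within the Euler truncation error
```

For this particular c and f the exact answer is (cos t + sin t − e^(−t))/2, and both numbers land on it.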

For the particular case where A=0, W is the identity, and the solution is a random walk. But only in that particular case. Generally, it is something very different. I’ll describe some special cases, in one or a few variables. In each case I show a plot with a solution in black, a perturbed solution in red, and a few random solutions in pale grey for context.

Special case 1: y’=0

This is the simplest differential equation you can have. It says no change; everything stays constant. Every error you make continues in the solution, but doesn’t grow or shrink. It is of interest, though, in that if you keep making errors, the result is a random walk.
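A quick sketch of that random walk (Gaussian unit errors and the step counts are arbitrary illustrative choices): with y’=0, every error persists unchanged, so the accumulated result spreads like the square root of the number of steps.

```python
import random
import statistics

random.seed(0)

# y' = 0: the solution just sits still, so each new error persists unchanged
# and a stream of errors accumulates as a random walk.
def walk(steps):
    y = 0.0
    for _ in range(steps):
        y += random.gauss(0.0, 1.0)   # a fresh unit error at every step
    return y

finals = [walk(400) for _ in range(500)]
spread = statistics.stdev(finals)
print(spread)   # close to sqrt(400) = 20: growth like sqrt(t), not t
```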

Special case 2: y”=0

The case of no acceleration. Now if there is an error in the velocity, the error in location will keep growing. Already different, and already the simple random walk solution for successive errors doesn’t work. The steps of the walk would expand with time.
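A sketch of the velocity-error case (step size and error magnitude are arbitrary): write y”=0 as the first-order pair p’=v, v’=0 and perturb v.

```python
# y'' = 0 as a first-order system: p' = v, v' = 0 (no acceleration).
dt, steps = 0.01, 1000             # integrate to t = 10

def position(p0, v0):
    p, v = p0, v0
    for _ in range(steps):
        p += dt * v                # v never changes, since v' = 0
    return p

drift = position(0.0, 1.01) - position(0.0, 1.0)
print(drift)   # ≈ 0.01 * t = 0.1: a velocity error grows linearly in position
```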

Special case 3: y’=c*y

where c is a constant. If c>0, the solutions are growing exponentials. The errors are also solutions, so they grow exponentially. This is a case very important to DE practice, because it is the mode of instability. For truly linear equations the errors increase in proportion to the solution, and so maybe don’t matter much. But for CFD it is usually a blow-up.

But there are simplifications, too. For the case of continuous errors, the earlier ones have grown a lot by the time the later ones get started, and really are the only ones that count. So it loses the character of a random walk, because of the skewed weighting.

If c<0, the situation is reversed (in fact, it corresponds to the case above with time reversed). Both the solutions and the errors diminish. For continuously created errors, this has a kind of reverse simplifying effect: only the most recent errors count. But if the errors do not reduce in magnitude while the solutions do, then they will overwhelm the solutions, not by growing, but just by starting big. That is why you couldn’t calculate a diminishing solution in fixed-point arithmetic, for example.

This special case is important, because it corresponds to the behaviour of eigenvalues in the general solution matrix W. A single positive eigenvalue of A can produce growing solutions which, started from any error, will grow and become dominant. Conversely the many solutions that correspond to negative eigenvalues will diminish and have no continuing effect.
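The asymmetry between the two signs can be sketched directly: an error injected at time u into y’=c*y just evolves as exp(c*(T−u)) from then on. (The horizon T, injection times, and the thousand-fold threshold below are arbitrary.)

```python
import math

T = 10.0

def surviving_error(c, inject_time):
    """Size at time T of a unit error injected into y' = c*y at inject_time.
    The error itself satisfies the same equation, so it evolves as
    exp(c * (T - inject_time)) from injection to T."""
    return math.exp(c * (T - inject_time))

# Growing case c > 0: the earliest errors dominate overwhelmingly.
assert surviving_error(1.0, 0.0) > 1000 * surviving_error(1.0, 9.0)

# Decaying case c < 0: only the most recent errors still count.
assert surviving_error(-1.0, 9.0) > 1000 * surviving_error(-1.0, 0.0)
```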

Special case 4: Non-linear y’ = 1 − y²

Just looking at linear equations gives an oversimplified view where errors and solutions change in proportion. The solutions of this equation are the functions tanh(t+a) and coth(t+a), for arbitrary a. They tend to 1 as t→∞ and to -1 as t→-∞. Convergence is exponential. So an error made near t=-1, while the solution is still close to -1, will grow rapidly for a while, then plateau, then diminish, eventually decaying rapidly to zero.
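This grow-then-die behaviour is easy to reproduce (forward Euler, a perturbation of 0.05, and the sample times are arbitrary illustrative choices):

```python
import math

def integrate(y0, T, dt=1e-3):
    """Forward Euler for y' = 1 - y^2."""
    y = y0
    for _ in range(round(T / dt)):
        y += dt * (1.0 - y * y)
    return y

y0 = math.tanh(-1.0)               # the solution tanh(t - 1), sampled at t = 0
d0 = 0.05                          # initial error

# While the solution is still near -1, the error grows...
grown = abs(integrate(y0 + d0, 1.0) - integrate(y0, 1.0))
assert grown > d0

# ...but both solutions are drawn to the attracting value 1, so it dies away.
final = abs(integrate(y0 + d0, 8.0) - integrate(y0, 8.0))
assert final < 1e-3 and abs(integrate(y0, 8.0) - 1.0) < 1e-3
```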

Special case 5: the Lorenz butterfly

This is the poster child for vigorous error propagation. It leads to chaos, which I’ll say more about. But there is a lot to be learnt from analysis. I have written about the Lorenz attractor here and in posts linked there. At that link you can see a gadget that will allow you to generate trajectories from arbitrary start points and finish times, and to see the results in 3D using webGL. A typical view is like this

Lorenz derived his equations to represent a very simple climate model. They are:

x’ = σ*(y − x)
y’ = x*(ρ − z) − y
z’ = x*y − β*z

The parameters are conventionally σ=10, β=8/3, ρ=28. My view above is in the x-z plane and emphasises symmetry. There are three stationary points of the equations: one at (0,0,0), and two at (a, a, 27) and (-a, -a, 27), where a = sqrt(72). The last two are the centres of the wings. Near the centres, the equations linearise to give a solution which is a logarithmic spiral. You can think of it as a version of y’=c*y, where c is complex with a small positive real part. So trajectories spiral outward, and at this stage errors will propagate with exponential increase. I have shown the trajectories on the plot with rainbow colors, so you can see where the bands repeat, and how the colors gradually separate from each other. Paths near the wing but not on it are drawn rapidly toward the wing.

As the paths move away from the centres, the linear relation erodes, but really fails approaching z=0. Then the paths pass around that axis, also dipping towards z=0. This brings them into the region of attraction of the other wing, and they drop onto it. This is where much mixing occurs, because paths that were only moderately far apart fall onto very different bands of the log spiral of that wing. If one falls closer to the centre than the other, it will be several laps behind, and worse, velocities drop to zero toward the centre. Once on the other wing, paths gradually spiral outward toward z=0, and repeat.
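The sensitive dependence is easy to demonstrate with a crude integration (forward Euler and the particular start point are illustrative choices, not anything canonical):

```python
def lorenz_step(s, dt, sigma=10.0, beta=8.0/3.0, rho=28.0):
    """One forward-Euler step of the Lorenz system (crude, but fine for a sketch)."""
    x, y, z = s
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

def run(s, steps, dt=1e-3):
    for _ in range(steps):
        s = lorenz_step(s, dt)
    return s

a = run((1.0, 1.0, 20.0), 25000)            # integrate to t = 25
b = run((1.0 + 1e-6, 1.0, 20.0), 25000)     # same start, one-millionth error

sep = max(abs(p - q) for p, q in zip(a, b))
print(sep)   # the tiny error has grown by orders of magnitude
```

Both trajectories stay on the bounded attractor; only their relative position is lost.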

Is chaos bad?

Is the Pope Catholic? you might ask. But chaos is not bad, and we live with it all the time. There is a lot of structure to the Lorenz attractor, and if you saw a whole lot of random points and paths sorting themselves out into this shape, I think you would marvel not at the chaos but the order.

In fact we deal with information in the absence of solution paths all the time. A shop functions perfectly well even though it can’t trace which coins came from which customer. More scientifically, think of a cylinder of gas molecules. Computationally, it is impossible to follow their paths. But we know a lot about gas behaviour, and can design efficient internal combustion engines, for example, without tracking molecules. In fact, we can infer almost everything we want to know from the statistical mechanics that started with Maxwell and Boltzmann.

CFD embodies chaos, and it is part of the way it works. People normally think of turbulence there, but it would be chaotic even without it. CFD solutions quickly lose detailed memory of initial conditions, but that is a positive, because in practical flow we never knew them anyway. Real flow has the same feature as its computational analogue, as one would wish. If it did depend on initial conditions that we could never know, that would be a problem.

So you might do wind tunnel tests to determine lift and drag of a wing design. You never know initial conditions in tunnel or in flight but it doesn’t matter. In CFD you’d start with initial conditions, but they soon get forgotten. Just as well.

GCMs and chaos

GCMs are CFD and also cannot track paths. The same loss of initial information occurs on another scale. GCMs, operating as weather forecasts, can track the scale of things we call weather for a few days, but not further, for essentially the same reasons. But, like CFD, they can generate longer term solutions that represent the response to the balance of mass, momentum and energy over the same longer term. These are the climate solutions. Just as we can have a gas law which gives bulk properties of molecules that move in ways we can’t predict, so GCMs give information about climate with weather we can’t predict.

What is done in practice? Ensembles!

Analysis of error in CFD and GCMs is normally done to design for stability. It gets too complicated for quantitative tracing of error, and so a more rigorous and comprehensive solution is used, which is … just do it. If you want to know how a system responds to error, make one and see. In CFD, where a major source of error is the spatial discretisation, a common technique is to search for grid invariance. That is, solve with finer grids until refinement makes no difference.

With weather forecasting, a standard method is the use of ensembles. If you are unsure of input values, try a range and see what range of output you get. And this is done with GCMs. Of course there the runs are costlier, and so they can’t do a full range of variations with each run. On the other hand, GCMs are generally surveying the same climate future with just different scenarios. So any moderate degree of ensemble use will accumulate the necessary information.

Another thing to remember about ensemble use in GCMs is this. You don’t have to worry about testing a million different possible errors. The reason is related to the loss of initial information. Very quickly, one error starts to look pretty much like another. This is the filtering that results from the very large eigenspace of modes that are damped by viscosity and other diffusion. It is only the effect of error on a quite small space of possible solutions that matters.
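The weather/climate distinction can be sketched with the Lorenz system standing in as a toy GCM (forward Euler, a crude spin-up, and the ensemble of shifted starts are all illustrative choices): individual trajectories diverge completely, but a long-run statistic agrees across the ensemble.

```python
def lorenz_step(s, dt, sigma=10.0, beta=8.0/3.0, rho=28.0):
    """One forward-Euler step of the Lorenz system."""
    x, y, z = s
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

def mean_z(start, dt=1e-3, spinup=5000, steps=100000):
    """Long-run average of z (a 'climate' statistic), after a spin-up to
    forget the transient approach to the attractor."""
    s = start
    for _ in range(spinup):
        s = lorenz_step(s, dt)
    total = 0.0
    for _ in range(steps):
        s = lorenz_step(s, dt)
        total += s[2]
    return total / steps

# An ensemble of runs with different initial errors: the "weather" (the
# individual paths) decorrelates, but the climate statistic agrees.
members = [mean_z((1.0 + 0.01 * k, 1.0, 20.0)) for k in range(5)]
print(members)   # all close together, near the attractor's mean z
```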

If you look at the KNMI CMIP5 table of GCM results, you’ll see a whole lot of models, scenarios and result types. But if you look at the small number beside each radio button, it is the ensemble size. Sometimes it is only one – you don’t have to do an ensemble in every case. But very often it is 5, 6 or even 10, just for one program. CMIP has a special notation for recording whether the ensembles are varying just initial conditions or some parameter.

Conclusion

Error propagation is very important in differential equations, and is very much a property of the equation. You can’t analyse error without taking the equation into account. Fast-growing errors are the main cause of instability, and must be attended to. The best way to test error propagation, if computing resources are adequate, is by an ensemble method, where a range of perturbations is made. This is done with earth models, both for forecasting and for climate.

Appendix – emulating GCMs

One criticised feature of Pat Frank’s paper was the use of a simplified equation (1) which was subjected to error analysis in place of the more complex GCMs. The justification given was that it emulated GCM solutions (actually an average). Is this OK?

Given a solution f(t) of a GCM, you can actually emulate it perfectly with a huge variety of DEs. For any coefficient matrix A(t), the equation

y’ = A*y + f’ – A*f

has y=f as a solution (substitute y=f: the A terms cancel, leaving f’ = f’). A perfect emulator. But as I showed above, the error propagation is given by the homogeneous part y’ = A*y. And that could be anything at all, depending on the choice of A. Sharing a common solution does not mean that two equations share error propagation. So it’s not OK.
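This can be demonstrated concretely. Below, a made-up curve f stands in for a GCM output (the curve, the two values of A, and the 0.01 error are all illustrative assumptions): both choices of A emulate f exactly, yet the same initial error dies out under one and blows up ~e⁴-fold under the other.

```python
import math

# A hypothetical "GCM output" curve to emulate.
f  = lambda t: 1.0 - math.exp(-t)
fp = lambda t: math.exp(-t)        # its derivative f'

def run(A, y0, T=2.0, dt=1e-4):
    """Forward Euler for the emulator equation y' = A*y + f'(t) - A*f(t)."""
    y, t = y0, 0.0
    for _ in range(round(T / dt)):
        y += dt * (A * y + fp(t) - A * f(t))
        t += dt
    return y

errs = {}
for A in (-2.0, 2.0):
    # Started on f, either choice of A reproduces f(t)...
    assert abs(run(A, f(0.0)) - f(2.0)) < 1e-2
    # ...but a small initial error propagates via y' = A*y alone.
    errs[A] = abs(run(A, f(0.0) + 0.01) - run(A, f(0.0)))

print(errs)   # the A = -2 error has decayed; the A = +2 error has grown
```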

Cephus0
September 20, 2019 4:29 pm

You can model A climate but you cannot model THE climate. There is a computational solution to THE climate and it is currently being run on a platform we call the Galaxy using software we call the laws of physics.

Gerald Browning
Reply to  Cephus0
September 22, 2019 1:27 pm

Cephus0

Great comment!

Jerry

Michael S. Kelly LS, BSA Ret.
September 22, 2019 6:03 pm

This was a nice opening discussion of one aspect of the problems with solving systems of differential equations (error propagation), and gives some perspective on the subject. I appreciate Nick Stokes’ laying it out, and recommend reading his own blog post, which is more thorough.

However, this discussion is flawed (a term I’ve come to dislike, but it’s the best one applicable) from the beginning, where Mr. Stokes loosely defines “error”: “It is just a discrepancy between a number that arises in the calculation, and what you believe is the true number.”

I’m at a loss to even guess what is meant by “what you believe is the true number.” The error term in a numerical integration scheme is given by the order of magnitude of the sum of all of the terms following the last in a truncated series extrapolation; it’s the difference between the series extrapolation one has performed and the series extrapolation of an infinite series that one has not performed. It has no defined algebraic sign, let alone magnitude. There isn’t a number representing “what you believe is the true number.” There is just an undefined term.

In ordinary differential equation integration, error “control” is often applied in the form of either variable time step, use of a “mop up step”, or both. In each case, the integration is performed using a certain order of approximation most of the time, but occasionally performing a higher order approximation. The difference is taken, and if it exceeds a certain percentage of the difference between the low order time step values, then this difference is added to the last integration step. This is a “mop up step,” and its utility has been the subject of debate for decades.

The variable time step check is somewhat better. Two integrations between t0 and t1 are performed, first with the normal time step t1-t0 = h, and then two more time steps with step size h/2. If the second differs from the first by some prescribed amount, the time step going forward is reduced (it isn’t necessarily halved – there are a number of algorithms having more sophisticated time step control).
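One simple form of the step-halving check described above can be sketched as follows (a first-order Euler stepper and the test problem y’ = −y are illustrative choices; production integrators use higher-order pairs and more sophisticated step control):

```python
import math

def euler_step(f, t, y, h):
    return y + h * f(t, y)

def step_with_check(f, t, y, h):
    """One Euler step of size h, and the same interval done as two h/2 steps.
    Their difference serves as the error estimate for step-size control."""
    full = euler_step(f, t, y, h)
    half = euler_step(f, t + h / 2, euler_step(f, t, y, h / 2), h / 2)
    return half, abs(half - full)

f = lambda t, y: -y                          # test problem y' = -y, y(0) = 1
y1, estimate = step_with_check(f, 0.0, 1.0, 0.1)
actual = abs(y1 - math.exp(-0.1))            # true error of the two-half-step value
print(estimate, actual)   # the estimate tracks the true error closely
```

If the estimate exceeds a tolerance, the controller would retry with a smaller h; if it is far below, h can be grown.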

In short, the treatment of “error” herein is based on a faulty definition. But there are other problems with the discussion.

Probably the biggest argument I have with it is in the assertion that “loss of memory” of initial conditions in the integration of the equations of motion in CFD is a feature. What that implies is that questions of existence and uniqueness of solutions of the Navier Stokes equations are superfluous. We may not be able to find a closed form solution to the general Navier Stokes equations, but we can in restricted cases. And in those (or any closed form) solutions, there can be no “loss of memory” of conditions at any point.

One of my career fields of interest is astrodynamics, in which we deal with large systems of ordinary non-linear differential equations. The number of scales of forces, masses, and times involved is very large, and dealing with them computationally is extremely challenging. Nevertheless, we are able to start and stop a simulation of the entire solar system (not just planets, but all of the known moons and asteroids) at any point over a span of millions of years, and achieve the same results no matter how we do it. The solar system is no less chaotic than a fluid system. But “chaos” is not randomness: it’s deterministic. And a numerical solution that does not exhibit deterministic behavior by being able to run in reverse does not fit the definition of a “solution.” It’s simply garbage.

There is much, much more to the bad behavior of computational CFD. For example, the treatment of turbulence Mr. Stokes gives in his (quite good) blog post hints of something that, unfortunately, he doesn’t develop. He cites the k-epsilon model, where k is a parameter applied to the flow kinetic energy, while epsilon is a diffusion parameter. They are introduced into the Reynolds-averaged Navier Stokes equations to make up for the fact that there are more variables than equations – that is, the problem is mathematically undefined just on the basis of physics. Though a “kinetic energy” and “diffusion” parameter sound physical, they are simply adjustable parameters in an equation having no basis in physics, and are tuned to make numerical results agree with experimental measurements. No climatic experiments have been or can be performed to define these values for GCMs.

I applaud Nick Stokes for opening this part of the discussion, and hope that more people familiar with computational physics will join in.

Reply to  Michael S. Kelly LS, BSA Ret.
September 22, 2019 6:59 pm

“where Mr. Stokes loosely defines “error””
It was deliberate, and I made that clear. Differential equations don’t care about causes of perturbation. They just take points and map them onto new domains. And my analysis was all about how two neighboring points are either moved apart from each other, or brought together. If you have a separation that you attribute to error, then the first amplifies it, and the second attenuates it. That is all.

The same amplification or not will apply to distributions that you might wish to call uncertainty.

” If the second differs from the first by some prescribed amount”
I think that sums up the “error” situation. You don’t have one that is right and one wrong. You have one that you believe is probably better, but the key metric is the difference. You take action based on that.

“We may not be able to find a closed form solution to the general Navier Stokes equations, but we can in restricted cases. And in those (or any closed form) solutions, there can be no “loss of memory” of conditions at any point.”
I would invite you to nominate any non-trivial closed form solution which is usefully specified as an initial value problem. They are mostly steady state – Poiseuille pipe flow is a classic. In its applications, at least, you just take it as a parabolic velocity profile, and accept that that will be the converged result, however it started. It’s true that, subject to a lot of assumptions, you can find one convergent path. But it is far from unique.

The reason I described it as a “feature” is this. We are trying to model real world flows, like that pipe flow. Those flows are often steady or periodic. If not, we probably want statistics of a transient flow, eg lift/drag. The real flows do not have known or knowable initial conditions. CFD is structured as time-stepping, and so does have to start somewhere. If the start state mattered, the CFD would not be emulating the real world. Fortunately, it usually doesn’t.

Jim G
September 28, 2019 4:04 am

It appears to me that a good analogy here would be the ensemble modeling of hurricanes.
The result is that all the solutions are pretty close together in the first day. After several days, a few solutions will have trajectories that are fairly close together, with some that are obvious outliers.

With hurricanes, we have a system of models that can be verified rather quickly.
With GCMs, not so much, although within 20 years the models have been shown to be somewhat faulty.

A question though:
Do the hurricane forecast models work by tweaking the initial conditions and running the program?
If that is the case, then the math behind the model is inaccurate to some degree. The results then will always be wrong. Some results however are just less wrong than others and may be useful.