Guest post by Kevin Kilty
Introduction
I had contemplated a contribution involving feedback diagrams, systems of equations, differential equations, and propagation of error ever since Nick Stokes's original contribution about a system of differential equations involving feedback way back in June of this year. A couple of days ago the user Bartemis posted a pair of differential equations under a post by Roy Spencer, which provided some inspiration for tying all these thoughts together in one posting. Finally, an added inspiration came from the controversy over Pat Frank's recent contribution. Without taking a stand on the numerical values Pat calculated, or his approach, I hope to demonstrate why he and his critics, Nick for example, are really discussing different things; and why Pat's take on this matter deserves very careful consideration. Here goes.
Let’s consider the following system of equations:
(1) dT/dt = −a · T⁴ + b · C

(2) dC/dt = c · T − d · C
We can view this as a model of water vapor feedback, where T is a surface temperature, C is a concentration of water vapor, and a, b, c, d are constants. The set of equations is a system of first order differential equations, but non-linear. One can also recast it as a single second order equation by differentiating the first equation and substituting the second into the result. For temperature (T) this produces:
(3) d²T/dt² + (4a · T³ + d) · dT/dt + a · d · T⁴ − b · c · T = 0
This, too, is non-linear, but it is now a second order differential equation. Thus, we can look at the temperature problem either as part of a first order system or as a second order differential equation. It doesn't matter which, except that the first order system is easier to deal with mathematically.
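To make this concrete, here is a minimal numerical sketch (my own illustration, not part of the original derivation) that integrates the system (1)-(2) with made-up constants and shows a perturbed solution settling back onto the non-trivial steady state derived in the next section:

```python
# Minimal sketch: integrate the nonlinear system (1)-(2).
# The constants a, b, c, d are illustrative assumptions only.
import numpy as np
from scipy.integrate import solve_ivp

a, b, c, d = 1.0e-3, 2.0, 5.0, 1.0

def rhs(t, y):
    T, C = y
    dT = -a * T**4 + b * C    # Eq. (1)
    dC = c * T - d * C        # Eq. (2)
    return [dT, dC]

# Non-trivial steady state: T0 = (b*c/(d*a))**(1/3), C0 = (c/d)*T0
T0 = (b * c / (d * a)) ** (1.0 / 3.0)
C0 = (c / d) * T0

# Start slightly away from the steady state and watch the deviation decay.
sol = solve_ivp(rhs, (0.0, 20.0), [1.05 * T0, 0.95 * C0])
print("T0 =", T0, "  final T =", sol.y[0, -1])   # final T is close to T0
```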
1. Making a linear approximation
The system of equations has two steady solutions, which a person can verify by simple substitution. These are (T0 = 0, C0 = 0) or (T0 = (bc/(da))^(1/3), C0 = (c/d) · T0). The first is a trivial solution of no interest. The second becomes the point around which we will make a linear approximation to the original equation set. The algebra is a little tedious and has little bearing on the issues at hand. But if we call ζ a small deviation from the steady solution in water vapor, and θ a small deviation in temperature, the final form of the set of equations is:
(4) dθ/dt = −h · θ + b · ζ

(5) dζ/dt = c · θ − d · ζ

Here h ≡ 4a · T0³ collects the constants that multiply θ in the linearization.
These are valid near the stationary solution (T0 = (bc/(da))^(1/3), C0 = (c/d) · T0). Doing as we did to produce the second order Equation (3), we arrive at the following linear second order approximation.
(6) d²θ/dt² + (h + d) · dθ/dt + (h · d − b · c) · θ = 0
2. Feedback Block Diagrams
Before continuing toward my main purpose, I'd like to show the relationship of the differential equations above to the feedback diagrams over which people have spilled electrons galore (although only green ones) on these pages. Figures 1a and 1b show two possible block models of the second order differential equation in θ. The block models and the differential equations are just different descriptions of the same thing; one has no mojo that the other doesn't have. They do have different utility, however. For example, engineers must turn a system into electronic or mechanical hardware that realizes it, and the block diagram is useful for making this transformation.
![Figure 1](https://i0.wp.com/wattsupwiththat.com/wp-content/uploads/2019/10/clip_image0101.jpg?resize=377%2C348&quality=83&ssl=1)
Figure 1. Panels a) and b) show alternative block models of the second order differential equation in θ. In a) the model consists of two integrators, which turn d²θ/dt² into θ, and the feedback loops contain gain blocks. In b) there is a filter that functions as a leaky integrator. Block representations are not unique.
3. Stability of the Linear System
We are now in a position to apply Nick's stability analysis. Finding the eigenvalues of a two by two matrix is relatively easy; it just makes use of the quadratic formula. The interested reader might consult Wolfram MathWorld online (mathworld.wolfram.com/Eigenvalue.html), which shows the eigenvalues in this instance explicitly.
(7) λ1,2 = [ −(h + d) ± √((h − d)² + 4 · b · c) ] / 2
There are two negative eigenvalues for some combinations of (h, b, c, d), meaning the linearized system is stable, and the original non-linear system is then stable against infinitesimal disturbances. As Nick Stokes indicated, initial errors in this system will damp out. This, however, is not the full story. In my opinion it is the correct answer to a wrong question. The question of stability is not just a matter of the behavior of the matrix A, where:
A = [ −h   b ]
    [  c  −d ]
but also the question of what occurs with particular input to the system. In a stable system like A we can state that a bounded input produces a bounded output. That is nice to know, but bounded does not answer the more complex question of whether the output is useful for a specific purpose. This is a domain that propagation of error addresses. It is, I sense, the position Pat Frank was taking in his essay, and while I don’t speak for him, one cannot dismiss the importance of what he was trying to do.
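Before moving on, here is a quick numerical check of the eigenvalue claim; (h, b, c, d) are the same illustrative constants as in the earlier sketch (h = 4a · T0³ = 40 for those values), not numbers taken from any climate model:

```python
# Sketch: eigenvalues of the linearized feedback matrix A.
import numpy as np

h, b, c, d = 40.0, 2.0, 5.0, 1.0    # illustrative values only
A = np.array([[-h,  b],
              [ c, -d]])

eigenvalues = np.linalg.eigvals(A)
print(eigenvalues)                    # both negative -> stable
print(np.all(eigenvalues.real < 0))   # True: initial errors damp out

# The closed form of Eq. (7) gives the same two values:
lam = (-(h + d) + np.array([1.0, -1.0]) * np.sqrt((h - d)**2 + 4*b*c)) / 2
print(np.sort(lam))
```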
4. The Real Concern of Error Propagation
Let's return to the linearized system (Eqs. 4 and 5). The system really doesn't do anything interesting because there is nothing to drive it. We must have a vector of driving terms, one involving the driver of temperature, and possibly one to drive water vapor. Without this, all the solution ever does is decay back to the steady solution; i.e., its errors vanish. But this misses several important considerations. In the design of control systems, the engineer has to deal with inputs and disturbances. Thus, the solar constant varies slightly and pushes the solution away from the equilibrium one.[1] We would put this in the vector U in Equation 8). There are undoubtedly disturbances driving the amount of water vapor as well. The next level of difficulty is having a model that is not fully specified. For instance, El Nino is not part of the state vector, but it does supply a disturbance to temperature and humidity; thus it belongs in the vector e in Equation 8). Perhaps these missing parameters provide random influences which then appear to have come from the solar constant or from water vapor. Finally, being only a model, we cannot possibly know the true values of the matrix elements of A; we estimate them as best we can, but they are uncertain.
A more realistic model of what we are dealing with looks like this state space model involving the vector of two state variables, temperature and humidity (X), and the drivers and random errors of input (U + e). Just to be complete I have noted that what we observe is not necessarily the state variables, but rather some function of them that may have passed through instruments first. What we observe is then some other vector, Y, which might have its own added errors, w, completely independent of errors added to the drivers.
(8) dX/dt = A · X + B · (U + e)
(9) Y = C · X + D · (U + w)
Even though it is a more realistic model, it still doesn't describe propagation of error; but its solution is the machinery needed to get at propagated error. What we would like to estimate, using a solution to these equations, is an expectation of the difference between what we would observe with a best model, which we can't possibly know exactly, and what the model equations above produce with uncertainties considered.
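As a sketch of the difference a driver makes (all numbers assumed for illustration), the following toy simulation pushes Equation 8) with U = 0 and a random disturbance e, treated as white noise. The ensemble mean of θ decays toward zero, but its variance settles at a nonzero level rather than vanishing:

```python
# Toy simulation of Eq. (8) with random disturbances e.
# The sqrt(dt) factor is the usual Euler-Maruyama scaling for white noise.
import numpy as np

rng = np.random.default_rng(0)
h, b, c, d = 40.0, 2.0, 5.0, 1.0       # illustrative values only
A = np.array([[-h, b], [c, -d]])
B = np.eye(2)                          # assume disturbances enter both states

dt, n_steps, n_runs = 1e-3, 20_000, 500
X = np.zeros((n_runs, 2))              # ensemble of (theta, zeta) states
for _ in range(n_steps):
    e = rng.normal(0.0, 1.0, size=(n_runs, 2))
    X = X + dt * (X @ A.T) + np.sqrt(dt) * (e @ B.T)

print(X[:, 0].mean())   # ensemble mean of theta: near zero (stability wins)
print(X[:, 0].var())    # ensemble variance: nonzero, it does not damp out
```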
Here is a steady solution for our state variables: X = G · (U + e). The matrix G comes from a combination of the matrices A and B, the details of which do not matter here.[2] As a matrix we can say simply that G looks like this:
G = [ g11  g12 ]
    [ g21  g22 ]
5. An Estimate of Uncertainty
There is no purpose in becoming bogged down in excessive mathematics, so let's focus attention on only the equation for temperature, θ. It is apparent that we do not have a complete or accurate model of temperature. Thus, let whatever equations 8) and 9) produce as a solution be called an estimate of temperature. We use the symbol θ̂ for this; the caret is the symbol statisticians generally use for an estimator. Then let the true temperature we would have found from a perfect model be θ. Even though we don't actually know the true θ, we figure our model is not too bad, and so θ̂ is nearby despite uncertainties.
Generally people use a Taylor series to build a linear approximation to θ̂, and calculate error propagation from it. This Taylor series is written in terms of partial derivatives with respect to all of the uncertain independent variables and parameters, the pi, in the following.
(10) (θ̂ − θ) ≈ Σi (∂θ/∂pi) · Δpi
But in our specific case the terms of G, the gij, are coefficients of the linear approximation, with the drivers, the elements of the vector (U + e), as inputs. Thus our first order Taylor series looks like:
(11) (θ̂ − θ) = g11 · (u1 + e1) + g12 · (u2 + e2)
As an estimate of propagated error, what we require is the variance of (θ̂ − θ) because, now having random input, this difference has become a random variable itself. Variance is defined in general as E((Z − E(Z))²), where E(…) means the expectation value of the random variable within the parentheses.[3] Therefore we square equation 11) and take the expectation value, so that what resides on the left hand side is the variance we seek. Call this Sθ². The right hand side of equation 11) produces as many terms as there are uncertain parameters, plus cross products.
(12) Sθ² = g11² · E[(u1 + e1)²] + g12² · E[(u2 + e2)²] + 2 · g11 · g12 · E[(u1 + e1) · (u2 + e2)]
The terms within square brackets are variances or covariances. Even if the expectation values of the random inputs are zero, their variances are not. Thus, despite having a stable system of differential equations, the variance of the state variables probably will not tend to zero as time progresses.
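Equation 12) is easy to sanity-check numerically. Here is a short sketch with hypothetical gains g11, g12 and input variances; the analytic propagation and a brute-force Monte Carlo through Equation 11) agree closely:

```python
# Sketch of Eq. (12): analytic propagation vs. Monte Carlo.
import numpy as np

rng = np.random.default_rng(1)
g11, g12 = 0.8, 0.3                    # hypothetical elements of G
var1, var2, cov12 = 0.5, 0.2, 0.05     # hypothetical input (co)variances

# Analytic propagation, Eq. (12):
S2 = g11**2 * var1 + g12**2 * var2 + 2 * g11 * g12 * cov12

# Monte Carlo: draw correlated inputs and push them through Eq. (11).
cov = np.array([[var1, cov12], [cov12, var2]])
u = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
theta_err = g11 * u[:, 0] + g12 * u[:, 1]

print(S2, theta_err.var())             # the two agree to sampling error
```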
There is a further point to discuss. The matrix A is not known exactly; each element has some uncertainty which equations 8) and 9) do not indicate explicitly. One way to include this is to place an uncertainty matrix in series with A, which then becomes A + Z. Z is a matrix of random variables which we assume have expectation values of zero for all elements but, once again, do not have zero variance. This matrix will produce uncertainty in G, through its relationship to A. A complete propagation of error takes some thought and care.
6. Conclusion
The value of Sθ², which results from a complete analysis of the contributors to uncertainty, compared against the precision needed, is what really determines whether or not model results are fit for purpose. As I wrote in a comment at one point on Pat's original posting, the topic of propagation of error is complex; I was told that it is indeed not complex. I think this discussion shows that it is more complex than many people suppose, and I hope it helps reinforce Pat Frank's points.
7. Notes:
(1) Mototaka Nakamura, in his recent monograph, which is available for download at Amazon.com, alludes to small variations in the solar constant. These would go into the vector U, as an example.
(2) G = (A − λ·I)⁻¹ · B. See, for instance, Ogata, System Dynamics, 2004, Pearson Prentice Hall.
(3) This is explained well in Bevington, Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill, 1969. Bevington illustrates a number of simple cases. Neither his, nor any other reference I know, tackles propagated error through a system of linear equations.
Here is the follow-up. I argue that a uniform uncertainty distribution (u.d.) is impossible except in the presence of an infinitely precise instrument, and I provide mathematical definitions and means of calculation which may be useful.
We shall assume digital output readings so, by scaling, we can assume that the possible outputs are a complete range of integers, e.g. 0 to 1000. Now use Bayesian statistics to describe the problem.
Let X be a random variable for the true value of a quantity which we attempt to measure.
Let x be the value of X actually occurring at some particular time.
Let M be our measurement, a random variable but including the possibility of zero variance. Note that M is an integer.
Let D be the error, D = M − x.
Let f(x) be a chosen (Bayesian) prior probability density function (p.d.f.) for X.
Let g(y;x) be a probability function (p.f.) for M over a range of integer y values, dependent on x (PRECISION DISTRIBUTION).
Let c be a constant of proportionality determined by making relevant probabilities add up to 1. Then after measurement M, the posterior probability for X taking the value x is
P[X=x | M=y] = c P[M=y | X=x] P[X=x] = c g(y;x) f(x)
Usually we will take f() to be an “uninformative” prior, i.e. uniform over a large range bound to contain x, so it has essentially no influence. In this case,
P[X=x | M=y] = c g(y;x), where c = 1/int g(y;x)dx (UNCERTAINTY DISTRIBUTION).
Then P[D=z | M=y] = P[X=M-z | M=y] = c g(y;y-z). Now assume that g() is translation invariant and symmetric, so g(y;y-z) = g(0;-z) = h(-z) = h(z) defines function h(), and c = 1/int h(z)dz. Then
P[D=z | M=y] = c h(z), independent of y (ERROR DISTRIBUTION = shifted u.d.).
The assertion of a +/-0.5 uncertainty interval would correspond to h(z) = 1 for -0.5<z<0.5 and 0 elsewhere (let’s call that function h_U). However, this is unrealistic when you think about how uncertainty creeps into a measurement. If you have an accurate ruler marked with 1/10” intervals and you are trying to record X to the nearest 0.1”, then if X is actually near one of the marks then you will be confident in your result. But if X is close to the middle of an interval, you will be far less confident, especially when you think about whether you took sufficient care to align your material to the zero end of the ruler.
Taking an example in our integer case, it means that if M = y = 314, then if x is truly 314.159 you might be confident that 313.5 < x < 314.5. But if x is truly 314.459, then there may be a chance that instead of recording 314, you record 315. In this case there is actual uncertainty about what you will record. But if between x = 314.4999 and x = 314.5001 you were bound to flip from recording 314 to 315, then you had almost perfect precision which you wantonly chose to throw away. I would call that “damaged precision”.
So here is an example of an h(), with a trapezium shape (call it h_T), which allows for a perfect result near integers and an interpolating mixed result away from them.
h(z) =
{ 2(z+3/4) for -3/4<z<-1/4
{ 1 for -1/4<z<1/4
{ 2(3/4-z) for 1/4<z<3/4
For any function h() the mean m and variance s^2 of the error D = M-x, given X=x, may be calculated as follows. Let a be the fractional part of -x put into the range (-1/2,1/2). Then
m = c sum_y (y-x)g(y;x) = c sum_k (a+k)h(a+k)
s^2 = c sum_k (a+k-m)^2 h(a+k), where c = 1/sum_k h(a+k)
For h() = h_T(), we find
if a<1/4, m = a and s^2 = 0,
if 1/4<a<3/4, m = 1/2-a, s^2 = 4(a-1/4)(3/4-a)
with mirror results for a<0. Averaged over all a, m is 0 and s^2 is 1/12 (compared with 0 and 0 for h_U).
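These figures are easy to verify numerically. The sketch below implements h_T and the moment formulas above; the grid over a is just for averaging:

```python
# Sketch: moments of the error D for the trapezium precision function h_T.
import numpy as np

def h_T(z):
    z = abs(z)
    if z < 0.25:
        return 1.0
    if z < 0.75:
        return 2.0 * (0.75 - z)
    return 0.0

def moments(a, h, kmax=3):
    ks = np.arange(-kmax, kmax + 1)
    w = np.array([h(a + k) for k in ks])
    w = w / w.sum()                        # normalising constant c
    m = np.sum((a + ks) * w)               # mean error m
    s2 = np.sum((a + ks - m) ** 2 * w)     # variance s^2
    return m, s2

print(moments(0.1, h_T))   # (0.1, 0.0): the "confident" zone near a mark
print(moments(0.4, h_T))   # m = 0.1, s2 = 4*(0.15)*(0.35) = 0.21

a_grid = np.linspace(-0.5, 0.5, 10001)
res = np.array([moments(a, h_T) for a in a_grid])
print(res[:, 0].mean(), res[:, 1].mean())  # ~0 and ~1/12, as quoted above
```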
Advertising disclaimer: other h()'s are available and your mileage may vary depending on which you use.
Overnight I realized that astute readers might have noticed an omission in my analysis above, but none has mentioned it yet. It is that in my discussion about the nature of error I used an accurate ruler as the base example. Therefore the above covers precision, which is important, but not accuracy. Here are some words that Tim related above about those:
“Accuracy — a measure of the capability of the instrument to faithfully indicate the value of the measured signal.”
“Precision — a measure of the stability of the instrument and its capability of resulting in the same measurement over and over again for the same input signal.”
I can certainly look at improving my analysis above to take accuracy into account, but for the purposes of analyzing Pat Frank’s paper I do not think it is important. This is because, as far as I can tell, the uncertainty spread in his Figure 6 is all about model consistency and predictability, and not at all about how well the mean lines correspond to reality.
So there is a wide uncertainty distribution around when I might get round to adding in the accuracy component…
I have now thought further on the “accuracy” angle. Unusually, I’ll explain in words rather than mathematics, which may be a blessing to some.
We have an observable, which we wish to measure as best we can. There exists a population of instruments capable of such measurement, and we obtain N of them. The manufacturer asserts that any single instrument will have a bias which is a number randomly and independently drawn from a distribution of known mean (which might or might not be zero) and a known standard deviation. On top of that there is observational error when we perform the measurement. This might be due to a human reading from an analogue instrument as in the case of the rulers, or in the case of a digital instrument be due to its output resolution and any fluctuations in the readings taken.
The bias may be considered to be a systematic error, and relates to accuracy. The observational error relates to precision. The total error is the sum of the two (I believe kribaez mentioned 5 different types of error, but I am concentrating on these two; perhaps a third one is errors in the manufacturer’s assertions). As such, the distribution of the total error is the convolution of the systematic and observational errors, and may be calculated, and its standard deviation might be called by some an “uncertainty interval”. The standard deviation interval does not, by its definition, encompass the total range of error, and the total range of error is very unlikely to have uniform probability.
If N > 1 then the sum of the N results has standard deviation sqrt(N) times that of any single result, and the mean of the results has s.d. 1/sqrt(N) times that of a single result.
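A numerical sketch of these last two points, with assumed shapes for the two error components (normal instrument bias, uniform observational error); the convolution is sampled by simply adding independent draws:

```python
# Sketch: total error = systematic bias + observational error.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

bias_sd = 0.3              # assumed spread of per-instrument bias
half_width = 0.5           # assumed half-width of uniform reading error

bias = rng.normal(0.0, bias_sd, n)               # accuracy component
obs = rng.uniform(-half_width, half_width, n)    # precision component
total = bias + obs         # samples from the convolution of the two

print(total.std())                               # combined spread
print(np.sqrt(bias_sd**2 + half_width**2 / 3))   # root-sum-square check

# sqrt(N) behaviour: the mean of N independent results tightens as 1/sqrt(N).
N = 16
means = rng.normal(0.0, bias_sd, (n // N, N)).mean(axis=1)
print(means.std(), bias_sd / np.sqrt(N))         # these agree
```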
I have derived this from first principles and helpful comments and questions along the way. Perhaps a true expert in the subject will tell me it is junk. I should like to thank Tim/Jim Gorman for some very challenging questions along the way.
Tim, thanks for replying. You write:
“Uncertainty intervals are not probability functions.”
Of course they are not; by the respective definitions, intervals can’t be functions. But such intervals can be, and typically are, derived from probability distributions.
Alternatively, you can split an empirical distribution into quantiles of your choice.
Can you explain the term "probability function"? Give an example?
“Therefore they have no variance.”
Not true. When the distribution parameters are estimates from a sample, which is the common case, they bear their own uncertainty, so there is a variance of the variance. This, in principle, would carry over to the interval limits. But I'm afraid this is not what you meant.
“Since uncertainty intervals are not probability functions”…..see above
“they have no mean from which a variance can be calculated. ”
Leaves me puzzled. Error intervals are centered on a mean, spanning between upper and lower limits. "Uncertainty intervals", according to you, have no mean and no variance, so what are they at all in numerical terms? Could you give an example?
“…. uncertainty adds as root-sum-square…. ”
Reads to me: the larger the sum, the more uncertainty? There is one undeniable certainty about it: the term grows with sample size. May I then correctly conclude that reducing the sample size results in less uncertainty?
And: the full term for "sum-squares" reads as the "sum of the squared deviations from the mean", so there *has to be* a mean, otherwise no "sum-squares". The root of it I have never seen applied to anything, let alone divided by root(degrees of freedom, df) to get a standard deviation. The (total) sum of squares, otoh, is the basic quantity in the GLM approach, where it is split into components of sources of deviation, but the partial sums of squares are finally divided by their dfs to give the errors of the model parameters. But you don't want this error stuff.
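As a quick numerical illustration (made-up numbers): the root-sum-square of N equal, independent contributions grows as sqrt(N), while the uncertainty of their mean shrinks as 1/sqrt(N):

```python
# Sketch: root-sum-square for a sum vs. a mean of N measurements.
import numpy as np

u_single = 0.5                        # assumed uncertainty of one measurement
for N in (1, 4, 16, 64):
    u_sum = np.sqrt(N) * u_single     # root-sum-square of N equal terms
    u_mean = u_sum / N                # = u_single / sqrt(N)
    print(N, u_sum, u_mean)
```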
“– uncertainty and error are not the same thing.”
Right, semantically not. Instead one might say that the presence of error, assumed or observed, gives rise to uncertainty.
But how is it in numerical terms? Let's look into the "Guide to the expression of uncertainty in measurement" (also mentioned by Pat):
Introduction, §0.6:
[Begin citation]
Recommendation INC-1 (1980) Expression of experimental uncertainties
1) The uncertainty in the result of a measurement generally consists of several components which may be grouped into two categories according to the way in which their numerical value is estimated:
A. those which are evaluated by statistical methods,
B. those which are evaluated by other means.
There is not always a simple correspondence between the classification into categories A or B and the previously used classification into “random” and “systematic” uncertainties. The term “systematic uncertainty” can be misleading and should be avoided.
Any detailed report of the uncertainty should consist of a complete list of the components, specifying for each the method used to obtain its numerical value.
2) The components in category A are characterized by the estimated variances s(i)^2,
or the estimated “standard deviations” s(i) and the number of degrees of freedom v(i). Where appropriate, the covariances should be given.
3) The components in category B should be characterized by quantities u(j)^2, which may be considered as approximations to the corresponding variances, the existence of which is assumed. The quantities u(j)^2 may be treated like variances and the quantities u(j) like standard deviations. Where appropriate, the covariances should be treated in a similar way.
4) The combined uncertainty should be characterized by the numerical value obtained by applying the usual method for the combination of variances. The combined uncertainty and its components should be expressed in the form of “standard deviations”.
5) If, for particular applications, it is necessary to multiply the combined uncertainty by a factor to obtain an overall uncertainty, the multiplying factor used must always be stated.
[End citation; terms with subscripts and powers edited for readability. U.]
So, in this framework, there is no contrast between "error" and "uncertainty"; rather, the conventional error analysis and the usage of the respective terms is proposed as "category A" uncertainty analysis. (Category B is not relevant here because it does not deal with measurement-derived errors, but with a kind of educated guess as an approximation thereof. Nevertheless, Pat uses the "u" notation, though not consistently.)
‘So, in this framework, there is no contrast between “error” and “uncertainty”’.
Exactly.
Except I am happy to allow the use of the term “uncertainty” to describe the distribution of errors. At least, I am on a Tuesday.
Tim writes:
“Uncertainty intervals are not probability functions.”
Of course they are not; by the respective definitions, intervals can’t be functions. But such intervals can be, and typically are, derived from probability distributions.
Alternatively, you can split an empirical distribution into quantiles of your choice.
Can you explain the term "probability function"? Give an example?
After sleeping a night over it, I regretfully withdraw this comment of mine. Of course, an interval can be defined in terms of the desired area under the probability density curve and a multiplier for the respective sd units, such that for +/- 1 sd and an assumed Normal Distribution you get the well-known 68% of potential measurements.
Still, I’m eager to learn how such an interval is defined in the uncertainty domain.
U.
Sorry again, Tim; having been busy with my post, it escaped me that you had already given your explanation of uncertainty intervals:
Tim Gorman
October 14, 2019 at 3:10 pm
“Think of the uncertainty interval as an indicator of the maximum possible plus and minus associated with the output of a model. The uncertainty interval tells you nothing about what is going on inside the interval but it describes where the object under discussion might be found.”
Thanks, I can see more clearly now what you mean. You are dealing with ranges, which are imposed arbitrarily, hopefully according to best knowledge and possibly, but not necessarily, based on observations. But this is a Type B approach in the sense of the Guide I cited upthread, thus without relevance to the root of our dispute, Pat Frank's study. The only uncertainty term he deals with is the 4 W/m² sd for cloud forcing, a conventional error term, i.e. a Type A approach. I see nowhere in his text that a range in your definition is defined.
Assuming he had done so, how could his claim be corroborated that uncertainty intervals widen out in the GCM processing, when the range is "the maximum possible plus and minus associated with the output of a model"? The range is fixed; no widening into the impossible is allowed.