While the crisis of statistics has made it to the headlines, that of mathematical modelling hasn’t. Something can be learned comparing the two, and looking at other instances of production of numbers. Sociology of quantification and post-normal science can help.
While statistical and mathematical modelling share important features, they don’t seem to share the same sense of crisis. Statisticians appear mired in an academic and mediatic debate where even the concept of significance appears challenged, while more sedate tones prevail in the various communities of mathematical modelling. This is perhaps because, unlike statistics, mathematical modelling is not a discipline. It cannot discuss possible fixes in disciplinary fora under the supervision of recognised leaders. It cannot issue authoritative statements of concern from relevant institutions such as e.g., the American Statistical Association or the columns of Nature.
Additionally, the practice of modelling is spread among different fields, each characterised by its own quality assurance procedures (see ref. 1 for references and discussion). Finally, being the coalface of research, statistics is often blamed for the larger reproducibility crisis affecting scientific production (ref. 2).
Yet if statistics is coming to terms with methodological abuse and wicked incentives, it appears legitimate to ask if something of the sort might be happening in the multiverse of mathematical modelling. A recent work in this journal reviews common critiques of modelling practices, and suggests—for model validation, to complement a data-driven with a participatory-based approach, thus tackling the dichotomy of model representativeness—model usefulness3. We offer here a commentary which takes statistics as a point of departure and comparison.
For a start, modelling is less amenable than statistics to structured remedies. A statistical experiment in medicine or psychology can be pre-registered, to prevent changing the hypothesis after the results are known. The preregistration of a modelling exercise before the model is coded is unheard of, although without assessing model purpose one cannot judge its quality. For this reason, while a rhetorical or ritual use of methods is lamented in statistics2, it is perhaps even more frequent in modelling1. What is meant here by ritual is the going through the motions of a scientific process of quantification while in fact producing vacuous numbers1.
All model-knowing is conditional on assumptions4. Techniques for model sensitivity and uncertainty quantification can answer the question of what inference is conditional on what assumption, helping users to understand the true worth of a model. This understanding is identified in ref. 3 as a key ingredient of validation. Unfortunately, most modelling studies don’t bother with a sensitivity analysis—or perform a poor one5. A possible reason is that a proper appreciation of uncertainty may locate an output on the right side of Fig. 1, which is a reminder of the important trade-off between model complexity and model error. Equivalent formulations of Fig. 1 can be seen in many fields of modelling and data analysis, and if the recommendations of the present comment should be limited to one, it would be that a poster of Fig. 1 hangs in every office where modelling takes place.
Fig. 1: Model error as ideally resulting from the superposition of two curves: (i) model inadequacy error, due to using too simple a model for the problem at hand. This term goes down by making the model more complex; (ii) error propagation, which results from the uncertainty in the input variables propagating to the model output. This term grows with model complexity. Whenever the system being modelled is not elementary, overlooking important processes leaves us on the left-hand side of the plot, while modelling hubris can take us to the right-hand side.
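The trade-off in the caption can be sketched numerically. This is only an illustration under assumed functional forms, none of which come from the paper: inadequacy error is taken to fall as `a / k` with complexity `k`, propagation error is taken to grow as `b * sqrt(k)` (as if `k` independent input errors added in quadrature), and the two are simply summed. The names `inadequacy_error`, `propagation_error`, `total_error` and the constants `a` and `b` are hypothetical.

```python
import math


def inadequacy_error(k: int, a: float = 1.0) -> float:
    """ASSUMED form: error from omitted processes shrinks as complexity k grows."""
    return a / k


def propagation_error(k: int, b: float = 0.05) -> float:
    """ASSUMED form: k independent uncertain inputs adding in quadrature."""
    return b * math.sqrt(k)


def total_error(k: int, a: float = 1.0, b: float = 0.05) -> float:
    """Superposition of the two curves, as in the caption of Fig. 1."""
    return inadequacy_error(k, a) + propagation_error(k, b)


# Scan a range of complexities and pick the one with the lowest total error.
complexities = range(1, 101)
best_k = min(complexities, key=total_error)

for k in (1, best_k, 100):
    print(f"k={k:3d}  inadequacy={inadequacy_error(k):.3f}  "
          f"propagation={propagation_error(k):.3f}  total={total_error(k):.3f}")
```

With these assumed curves the minimum sits at an intermediate complexity: too simple a model lands on the left of the plot, too detailed a model on the right. Where the minimum falls, and whether the propagation term grows at all, depends entirely on the forms chosen here.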
Sadly, I’m not smart enough to understand what, if anything, this means: “A recent work in this journal reviews common critiques of modelling practices, and suggests—for model validation, to complement a data-driven with a participatory-based approach, thus tackling the dichotomy of model representativeness—model usefulness.”3
Well said. Sadly, the academic paper was not well said.
Agreed, this sentence is completely unclear.
The paper itself, though, is quite simple and clear. The point it’s making will be familiar to anyone who has lived through the process of business case evaluation on some very large proposal involving big investments with possible future payoffs from new market developments.
An example was the auctions of radio frequencies during the first dot-com bubble. All the mobile operators constructed enormously detailed spreadsheet models which attempted to assess how much it was worth paying for the different frequency groups. They were absolutely huge, full of the equivalent of ‘go-tos’, and because of this turned into black boxes.
As the paper suggests, the more detailed they got, the more specific the inputs they required, and the less clear it was how plausible the overall assumptions were as a set.
As an example, you could do a one-page model in which revenue per customer and number of customers were assumptions. Then at least you could argue about whether these were plausible given what we presently observe about consumer behavior, you could argue about whether the necessary changes in society which would justify a given assumption would come about, and in doing so you would tease out the pros and cons and risks and uncertainties.
In the models as constructed, these assumptions were deeply buried and were themselves derived from lots of other detailed assumptions by complex calculations, so management found themselves, without intending to or realizing what had happened, arguing about the merits of the model. The question under discussion stopped being whether a given revenue per customer assumption was plausible and in what sorts of futures it would happen, and became whether revenue per customer could plausibly be predicted from a whole heap of other micro assumptions.
In the end, under time pressure of an auction, management typically gave up, accepted the models, reassured themselves that so much detail must mean thoroughness and accuracy… and grossly overbid to an extent they never would have done if it had been a one pager which forced them to think for themselves about the key variables.
This was what the author describes as the increase in error propagation as a function of complexity. What management failed to grasp was that each detailed assumption had a margin of error, and the more of them you had to make, the greater the total uncertainty, since the error of each one propagated through the system.
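A small Monte Carlo sketch can make this concrete. It assumes something the comment does not state: that the output is a simple product of `n_assumptions` factors, each known to within an independent 10% error drawn from a normal distribution. The function name `relative_spread` and all numbers are hypothetical and chosen only for illustration.

```python
import random
import statistics


def relative_spread(n_assumptions: int, rel_sigma: float = 0.10,
                    trials: int = 20_000, seed: int = 0) -> float:
    """Relative standard deviation of a product of uncertain factors.

    ASSUMPTION: every factor has true value 1.0 and an independent
    Gaussian error with standard deviation rel_sigma.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(trials):
        value = 1.0
        for _factor in range(n_assumptions):
            value *= rng.gauss(1.0, rel_sigma)
        outputs.append(value)
    return statistics.stdev(outputs) / statistics.mean(outputs)


# Compare a one-pager (1 assumption) with progressively deeper chains.
spreads = {n: relative_spread(n) for n in (1, 5, 20)}
for n, spread in spreads.items():
    print(f"{n:2d} chained assumptions -> relative spread of output ~ {spread:.2f}")
```

Under these assumptions the spread of the output widens roughly with the square root of the number of chained factors. Correlated errors, or inputs that are pinned down by data or by constraints, would behave differently, so this shows the mechanism rather than a general law.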
Pat Frank made a similar point about the detailed climate models that policy makers are using to justify very large investments and huge public policy proposals. It’s much harder with climate of course, because you cannot simply look at it and ask in the same way whether we really believe that in five years’ time this many people will be spending this much a month, or whether we really believe that network cost is going to fall at a given rate to given levels.
But you have the same underlying phenomenon: when you add detail you do not add certainty, nor do you add usefulness to the policy makers who have to decide. On the contrary, you give them a spurious feeling of confidence and you obscure the key driving variables which they ought to be thinking about.
I once, quite early on in my practice, saw this effect at first hand on a micro scale with a model I had constructed, and it served as a red flag to me. A group were trying to decide what to do with a service. To try to help them, I noted what they thought the drivers were and put them into a fairly simple spreadsheet. But it did make assumptions which to them were black boxes, in the form of experience curves, the shape of which you set by entering parameters.
To my surprise and dismay (I was very young then….) they did not engage with the curves and their drivers, which they couldn’t get their heads around at all; they just accepted the outputs given my preliminary assumptions as gospel. After all, they were generated by a computer!
It was a very valuable lesson. But of course, you had to be paying attention in order to hear what it was telling you. In most modelling processes there is no-one listening to this kind of thing. My impression is this applies to climate as much as to business modelling.
Model validation comes in many flavors.
Quantitative approaches tend to focus on validation against observed data (model representativeness). Qualitative approaches (sometimes used in decision sciences) tend to focus on usefulness and involve stakeholder participation.
On one hand you can have a model that is accurate but useless; on the other hand, one that is useful (for policy makers) but not very accurate.
An example of an accurate but useless model is one that takes time T to compute an answer that the user requires at time T-x: accurate, but too late or too complex to give an answer when the customer needs it.
The suggestion is that you need both approaches.
Very interesting. I’m looking forward to reading Nick Stokes’ remarks, should he make them. He always has something enlightening to add to the conversation.
I think the paper is waffly, and obviously itself quite unquantitative. It mainly seems to be a recommendation for sensitivity analysis, which is fine. But there is unsupported stuff like this:
“(ii) error propagation, which results from the uncertainty in the input variables propagating to the model output. This term grows with model complexity.”
It just isn’t necessarily true. A lot of complexity is created by a need to better constrain the output. That also reduces error propagation. A recent example was where a physics-free model was used to estimate error propagation in GCMs. An uncertainty region was then quoted which a more complex model would have unequivocally rejected due to conservation principles.
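The general point that a constraint can limit error growth can be illustrated with a toy calculation. It is not a model of any GCM and is not taken from the comment; the recursion, the `damping` parameter (standing in loosely for a restoring constraint such as a conservation law) and the function names are all assumptions made for the sketch. Each step adds an independent error with standard deviation `step_sigma`.

```python
import math


def unconstrained_sigma(steps: int, step_sigma: float) -> float:
    """Random walk x[t+1] = x[t] + e[t]: spread grows as sqrt(steps)."""
    return step_sigma * math.sqrt(steps)


def constrained_sigma(steps: int, step_sigma: float, damping: float) -> float:
    """Damped recursion x[t+1] = damping * x[t] + e[t], with 0 <= damping < 1.

    The variance is a geometric sum, so the spread saturates at
    step_sigma / sqrt(1 - damping**2) instead of growing without bound.
    """
    variance = step_sigma**2 * (1 - damping**(2 * steps)) / (1 - damping**2)
    return math.sqrt(variance)


# Hypothetical numbers: unit error per step, damping of 0.9.
for steps in (1, 10, 100, 1000):
    free = unconstrained_sigma(steps, 1.0)
    bound = constrained_sigma(steps, 1.0, 0.9)
    print(f"steps={steps:5d}  unconstrained={free:7.2f}  constrained={bound:5.2f}")
```

In this toy, the extra structure (the damping term) is what keeps the accumulated error bounded. Whether a given real model behaves like the first recursion or the second is exactly what is in dispute in the thread, and the sketch does not settle it.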
Where is the author’s sensitivity analysis of his Fig 1?
its model of modelling is rather simplistic
Nick, I think the underlying point is quite a serious one. It’s that as the assumptions become more numerous and more detailed, the uncertainty does not necessarily reduce. Though it does become less visible.
And therefore, if you are a general manager looking to use the thing, you feel more confident because it’s so detailed, when actually you should be feeling less because you can’t any longer rely on or debate your intuitions about the key large variables driving the business case.
Someone who has been in a business for years, making mistakes and getting things right about consumer behavior, will have an ability to argue his way through assumptions about that. Bury this as an output of a twenty deep spreadsheet model of great detail, half of which is written in VB by non-programmers, and he will find himself unconvinced, baffled, unable to argue, and finally going along with something he knows in his heart to be nonsense.
Seen it happen.
“While the crisis of statistics has made it to the headlines”
Really? No headlines cited. Has anyone seen any?