Elevated from a WUWT comment by Dr. Robert G. Brown, Duke University
Frank K. says: You are spot on with your assessment of ECIMs/GCMs. Unfortunately, those who believe in their ability to predict future climate really don’t want to talk about the differential equations, numerical methods or initial/boundary conditions which comprise these codes. That’s where the real problems are…
Well, let’s be careful how you state this. Those who believe in their ability to predict future climate who aren’t in the business don’t want to talk about all of this, and those who aren’t expert in predictive modeling and statistics in general in the business would prefer in many cases not to have a detailed discussion of the difficulty of properly validating a predictive model — a process which basically never ends as new data comes in.
However, most of the GCMs and ECIMs are well, and reasonably publicly, documented. It’s just that unless you have a Ph.D. in (say) physics, a knowledge of general mathematics and statistics and computer science and numerical computing that would suffice to earn you at least masters degree in each of those subjects if acquired in the context of an academic program, plus substantial subspecialization knowledge in the general fields of computational fluid dynamics and climate science, you don’t know enough to intelligently comment on the code itself. You can only comment on it as a black box, or comment on one tiny fragment of the code, or physics, or initialization, or methods, or the ode solvers, or the dynamical engines, or the averaging, or the spatiotemporal resolution, or…
Look, I actually have a Ph.D in theoretical physics. I’ve completed something like six graduate level math classes (mostly as an undergraduate, but a couple as a physics grad student). I’ve taught (and written a textbook on) graduate level electrodynamics, which is basically a thinly disguised course in elliptical and hyperbolic PDEs. I’ve written a book on large scale cluster computing that people still use when setting up compute clusters, and have several gigabytes worth of code in my personal subversion tree and cannot keep count of how many languages I either know well or have written at least one program in dating back to code written on paper tape. I’ve co-founded two companies on advanced predictive modelling on the basis of code I’ve written and a process for doing indirect Bayesian inference across privacy or other data boundaries that was for a long time patent pending before trying to defend a method patent grew too expensive and cumbersome to continue; the second company is still extant and making substantial progress towards perhaps one day making me rich. I’ve did advanced importance-sampling Monte Carlo simulation as my primary research for around 15 years before quitting that as well. I’ve learned a fair bit of climate science. I basically lack a detailed knowledge and experience of only computational fluid dynamics in the list above (and understand the concepts there pretty well, but that isn’t the same thing as direct experience) and I still have a hard time working through e.g. the CAM 3.1 documentation, and an even harder time working through the open source code, partly because the code is terribly organized and poorly internally documented to the point where just getting it to build correctly requires dedication and a week or two of effort.
Oh, and did I mention that I’m also an experienced systems/network programmer and administrator? So I actually understand the underlying tools REQUIRED for it to build pretty well…
If I have a hard time getting to where I can — for example — simply build an openly published code base and run it on a personal multicore system to watch the whole thing actually run through to a conclusion, let alone start to reorganize the code, replace underlying components such as its absurd lat/long gridding on the surface of a sphere with rescalable symmetric tesselations to make the code adaptive, isolate the various contributing physics subsystems so that they can be easily modified or replaced without affecting other parts of the computation, and so on, you can bet that there aren’t but a handful of people worldwide who are going to be able to do this and willing to do this without a paycheck and substantial support. How does one get the paycheck, the support, the access to supercomputing-scale resources to enable the process? By writing grants (and having enough time to do the work, in an environment capable of providing the required support in exchange for indirect cost money at fixed rates, with the implicit support of the department you work for) and getting grant money to do so.
And who controls who, of the tiny handful of people broadly enough competent in the list above to have a good chance of being able to manage the whole project on the basis of their own directly implemented knowledge and skills AND who has the time and indirect support etc, gets funded? Who reviews the grants?
Why, the very people you would be competing with, who all have a number of vested interests in there being an emergency, because without an emergency the US government might fund two or even three distinct efforts to write a functioning climate model, but they’d never fund forty or fifty such efforts. It is in nobody’s best interests in this group to admit outsiders — all of those groups have grad students they need to place, jobs they need to have materialize for the ones that won’t continue in research, and themselves depend on not antagonizing their friends and colleagues. As AR5 directly remarks — of the 36 or so named components of CMIP5, there aren’t anything LIKE 36 independent models — the models, data, methods, code are all variants of a mere handful of “memetic” code lines, split off on precisely the basis of grad student X starting his or her own version of the code they used in school as part of newly funded program at a new school or institution.
IMO, solving the problem the GCMs are trying to solve is a grand challenge problem in computer science. It isn’t at all surprising that the solutions so far don’t work very well. It would rather be surprising if they did. We don’t even have the data needed to intelligently initialize the models we have got, and those models almost certainly have a completely inadequate spatiotemporal resolution on an insanely stupid, non-rescalable gridding of a sphere. So the programs literally cannot be made to run at a finer resolution without basically rewriting the whole thing, and any such rewrite would only make the problem at the poles worse — quadrature on a spherical surface using a rectilinear lat/long grid is long known to be enormously difficult and to give rise to artifacts and nearly uncontrollable error estimates.
But until the people doing “statistics” on the output of the GCMs come to their senses and stop treating each GCM as if it is an independent and identically distributed sample drawn from a distribution of perfectly written GCM codes plus unknown but unbiased internal errors — which is precisely what AR5 does, as is explicitly acknowledged in section 9.2 in precisely two paragraphs hidden neatly in the middle that more or less add up to “all of the `confidence’ given the estimates listed at the beginning of chapter 9 is basically human opinion bullshit, not something that can be backed up by any sort of axiomatically correct statistical analysis” — the public will be safely protected from any “dangerous” knowledge of the ongoing failure of the GCMs to actually predict or hindcast anything at all particularly accurately outside of the reference interval.