The claim is often made that climate models should be believed because they are just physics. In a paper I published a few years ago, I argued that this is not how science works. Even valid scientific theories may fail to yield precise predictions, for reasons such as heterogeneity of the system being studied (e.g., the Earth’s crust in the case of earthquakes). In this post I extract some of the key results, about one-third of the full paper. If you cannot access the journal version, just email me. Here is the citation:
Loehle, C. 2018. Epistemological Status of General Circulation Models. Climate Dynamics 50:1719-1731. DOI 10.1007/s00382-017-3737-7.
The epistemological status of general circulation models
Craig Loehle, Ph.D.
National Council for Air and Stream Improvement, Inc. (NCASI)
Craigloehl@aol.com
Abstract. Forecasts of both likely anthropogenic effects on climate and consequent effects on nature and society are based on large, complex software tools called general circulation models (GCMs). Forecasts generated by GCMs have been used extensively in policy decisions related to climate change. However, the relation between underlying physical theories and results produced by GCMs is unclear. In the case of GCMs, many discretizations and approximations are made, and simulating Earth system processes is far from simple and currently leads to some results with unknown energy balance implications. Statistical testing of GCM forecasts for degree of agreement with data would facilitate assessment of fitness for use. If model results need to be put on an anomaly basis due to model bias, then both visual and quantitative measures of model fit depend strongly on the reference period used for normalization, making testing problematic. Epistemology is here applied to problems of statistical inference during testing, the relationship between the underlying physics and the models, the epistemic meaning of ensemble statistics, problems of spatial and temporal scale, the existence or not of an unforced null for climate fluctuations, the meaning of existing uncertainty estimates, and other issues. Rigorous reasoning entails carefully quantifying levels of uncertainty.
1 Introduction
General circulation models (GCMs) attempt to embody the current understanding of climate dynamics via process equations and numerically solve these equations to simulate climate with various scenarios of human influences (Taylor et al. 2012). These models are complex and have been evolving since the 1960s (Manabe and Wetherald 1967). The output of GCMs is given a central place in formulating public energy policy. The basis for this central policy position is that the models are based on physics (IPCC 2013), with high confidence (>95%) given to many attribution and forecast results (IPCC 2013 SPM). IPCC also reports that GCMs do a good job of matching historical data and that without including greenhouse gases the match is not good (IPCC 2013, Fig. SPM.6).
There is a vast literature that compares GCM outputs to various climate features (see following sections). Such tests are complicated by the stochastic nature of both climate and the models. GCM vs. data comparisons are judged to be poor, adequate, good, or excellent, depending on the variable and the study (McWilliams 2007). This ambiguity results from a multiplicity of criteria of model goodness as well as varying results.
Evaluating knowledge claims (of which there are several) based on GCMs can be aided by a consideration of epistemology (see Williams 2001 for an overview), which is the logical framework for evaluating how we know and what is knowable. With an epistemological analysis, we can assess the status of a theory/model in terms of its logical basis, reliability, and rigor. With this framework we can evaluate both the tests of model goodness and the consistency of results derived from GCMs with known physics. I first illustrate these issues from several areas of science and then return to the question of the epistemological status of climate models.
2 Models and epistemology
Science is the process of formally discovering regularities in nature. An explanation of or formal model for a regularity in nature is called a theory (or law if it is well-supported). Newton’s law of gravity is a classic and simple example. In this case, the obedience of objects to this law at human scales is apparently exact. Such highly accurate theories are commonly treated as explanatory.
The ideal case of testable theories can be found in classical physics. Newton’s and Maxwell’s laws make very specific predictions and also forbid certain things from happening. These laws were convincingly demonstrated by experiments, but note that even here confounding factors such as friction must be controlled in order to test them. In these cases, the standard of theory validity is very high. Experimental data often match theory almost perfectly, and events such as the return of a comet can be predicted decades in advance. The apparent perfection of these laws has perhaps led to a belief that they are “true” in the absolute, logical sense, but even gravity has some unexplained features.
Valid and useful theories, however, do not spring into life fully formed and perfect, nor are they always as accurate as Maxwell’s equations. When Alfred Wegener (trans. 1966) proposed the theory of continental drift in 1912, it cannot in any sense be said that his theory was mature. A mechanism for continental movement was lacking (and it seemed impossible to many that continents could move), as was sufficient supporting data. As data were gathered, particularly on sea floor spreading and the process of subduction, a coherent picture came into existence of plate movements, the rise of mountain ranges, the origin of volcanoes, and the reason for the location of earthquake zones. However, after a century of maturation of this theory, it remains a qualitative theory because while it can explain the general locations of earthquake and volcanic zones, it cannot predict the size, precise location, or timing of either earthquakes or volcanic eruptions due to the heterogeneity of the Earth’s crust and the impossibility of obtaining detailed data. Thus, even a mechanistic and well-tested theory need not be able to make precise predictions, perhaps ever. As a theory matures, it hopefully becomes more precise, but this is not guaranteed (Loehle 1983).
There is an asymmetry noted by Popper (1959, 1963) in his famous Principle of Demarcation: it is possible to reliably disprove a theory, but a theory can never be proven. Instead, successive successful tests of a theory only increase our confidence in it. This does not mean that we know nothing, as knowledge relativists might assert, but rather that scientific knowledge is provisional, bounded (gravity is not clearly explicable at the atomic level), and a matter of degree (Loehle 2011). In some cases this knowledge can encompass many significant digits, but in others, it may be more qualitative.
Critically, testing an evolving theory does not and should not follow the simple hypothesis testing model used in empirical experimentation. When testing a medicine vs. a placebo, a simple better or worse or a “how much” answer often results from statistical tests. When testing a theory, there are multiple aspects of the theory that may each receive partial support at a particular time, and alternate explanations that may need to be ruled out (Reiss 2015). A network of confirmation, mathematics, and causal explanation supports belief in a theory at any moment, not a simple yes/no. As a theory becomes more mature and more rigorously tested, we ascend the scale of epistemic certainty. There is an asymmetry, however, from proving a theory to using it for some calculation. The tests that lead to acceptance of a theory as “true” are often done under carefully controlled and ideal conditions, such as a vacuum. In any calculation based on a theory we may instead be using it under non-ideal conditions. For example, a falling feather behaves differently in a vacuum compared to in air. The bridge from idealized physics to real world applications is the set of approximations, simplifications, discretizations, empirical relationships, estimated initial conditions, and numerical methods used to create a calculation tool (Loehle 1983) that can be used to compute some result. These bridge relationships are what prevent a calculation tool from being a perfect representation of the underlying physical (or other) theory. If these confounding factors are sufficiently difficult to quantify and model, we may not be able to make any predictions (e.g., for the path of a dropped feather). The correctness of a calculation tool is thus an empirical question of how accurate or useful it is, rather than a question of true or false as we take it to be for theories/laws.
3 Basis of climate models in physics
What then is the epistemological status of GCMs in terms of their basis in physics? GCMs are a mix of simulated processes that are viewed as well-understood physics (e.g., radiative transfer) and those that are poorly understood (e.g., cloud microphysics; IPCC 2013, p. 599). To what extent can we trace the algorithms used directly back to known physics? To what extent does the basis in physics prove their truth value, explanatory power, or reliability? As we have seen above, theories in physics that approximate our common notions of “truth” are, at least in idealized settings (e.g., frictionless vacuums), able to make very precise real-world predictions. Can GCMs approximate such clean physical theories as Newton’s laws of motion in a vacuum? If so, then a great deal of confidence in their results is warranted. However, even for a simple problem like tossing a die or flipping a coin, sensitivity to initial conditions means that the outcome cannot be predicted even though it is governed by known physics. In the case of climate models, Rougier and Goldstein (2014) state that the laws of the Earth’s climate system are not all known and are not explicitly solvable at sufficient resolution. Katzav et al. (2012) note that model completeness and structural stability are unknown. The lack of explicit solutions is a particular issue for the Navier-Stokes (N-S) equations of fluid dynamics, for which no general analytic solution is known. This inability to explicitly solve the equations is why numerical simulation is used. However, the proper simulation of the equations of fluid dynamics is far from straightforward (Thuburn 2008). A particular problem is that the proper solution of these equations requires conservation of mass, energy, momentum, and other properties in a continuous fashion (at infinitely many scales) because they are partial differential equations, whereas the models are discrete.
Processes such as dissipation of energy and the propagation of vortices occur below the grid scale and no theory exists to guarantee that the gridded model handles them properly (McWilliams 2007; Marston et al. 2016). Simulated processes within a grid cell may not propagate smoothly to neighboring cells, creating the potential for ringing, for the accumulation of numerical solution errors with time, and for errors in winds or in the modeling of phenomena such as the Quasi-Biennial Oscillation (Thuburn 2008). These issues have not been adequately resolved (e.g., Katzav et al. 2012) and, in fact, the existence and smoothness of solutions to the N-S equations remains a Millennium problem (see http://www.claymath.org/millennium-problems/navier-stokes-equation). Thus, the models may violate conservation laws and exhibit numerical solution artifacts. Stevens and Bony (2013) showed, for example, that even in an idealized model of a water planet with prescribed surface temperatures, the spatial responses of clouds and precipitation to warming are quite different depending on the model. This illustrates that agreement has not been reached on how to represent or compute these processes on a grid. Zhou et al. (2015) document errors in how solar radiation is zonally averaged in some models. Staniforth and Thuburn (2012) document that all existing grid numerical solution schemes have known problems, including grid imprinting and the excitation of computational modes. The inadequacy of current gridding schemes is shown by the fact that a higher resolution model often produces many differences compared to current models (Sakamoto et al. 2012). Improved numerical methods continue to be introduced to resolve the known problems with solving N-S PDEs (e.g., Marston et al. 2016). In addition, sub-grid parameterizations exist in all models (McWilliams 2007; Katzav et al. 2012; Hourdin et al. 2016), increasing uncertainty.
McWilliams (2007) notes that small structural (equation form) differences in sub-grid parameterizations can lead to different dynamical attractors in such fluid dynamics systems.
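The conservation issue can be illustrated with a deliberately tiny toy problem. The sketch below is not drawn from any GCM or from the paper; the harmonic oscillator, the step size, and all function names are assumptions chosen for brevity. The continuous equations conserve energy exactly, yet one common discretization (forward Euler) steadily gains energy, while a closely related scheme (symplectic Euler) keeps it nearly constant.

```python
def energy(x, v):
    """Total energy of a unit-mass, unit-frequency harmonic oscillator."""
    return 0.5 * (x * x + v * v)


def forward_euler(x, v, dt, steps):
    """Explicit Euler: both updates use the old state."""
    for _ in range(steps):
        x, v = x + dt * v, v - dt * x
    return x, v


def symplectic_euler(x, v, dt, steps):
    """Semi-implicit Euler: the position update uses the new velocity."""
    for _ in range(steps):
        v = v - dt * x
        x = x + dt * v
    return x, v


def energy_ratio(scheme, dt=0.01, steps=10_000):
    """Final energy divided by initial energy for the given scheme."""
    x0, v0 = 1.0, 0.0  # hypothetical initial condition
    x, v = scheme(x0, v0, dt, steps)
    return energy(x, v) / energy(x0, v0)


if __name__ == "__main__":
    print("forward Euler   :", energy_ratio(forward_euler))
    print("symplectic Euler:", energy_ratio(symplectic_euler))
```

For forward Euler each step multiplies the energy by exactly (1 + dt^2), so the drift is a property of the numerical scheme and not of the physics. Real dynamical cores are far more sophisticated, but the example shows why conservation in the discrete model has to be demonstrated and cannot simply be inherited from the underlying equations.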
Let us consider the most fundamental physics of climate models: the radiative properties of CO2 in the atmosphere. While there is indeed a basic theory for this process, there are many radiative transfer software tools (Oreopoulos and Mlawer 2010) because calculation of radiative transfer on a globe with a heterogeneous atmosphere is a difficult numeric problem, unlike the acceleration of a falling body in a vacuum. The spectrum is evaluated at different resolutions using various geometric assumptions and methods in each of these tools. More seriously, Oreopoulos and Mlawer (2010) document that 1) the basic theory itself continues to evolve; 2) the algorithms used in GCMs are much simplified due to computational considerations; and 3) different GCMs do not use the same radiative transfer algorithms. It is thus clear that even here there is a gap between basic theory and what is computed, with unclear consequences.
Likewise, each GCM makes different assumptions about forcing histories, clouds, land surfaces, spatial gridding, etc., and uses different numerical methods for solution. Estimated forcings changed considerably between the IPCC AR4 and AR5 reports, and the effect of aerosols is still being revised (e.g., Stevens 2015) with major differences in representation between models (Wilcox et al. 2013). Parameterizations (i.e., empirical relationships) are used for processes that take place below the grid resolution, such as cloud behaviors and precipitation (McWilliams 2007). These empirical relationships have free parameters that must be tuned (Lahsen 2005; McWilliams 2007; Mauritsen et al. 2012; Schmidt and Sherwood 2015; Hargreaves 2010; Hourdin et al. 2016) and these tunings can be arbitrary (e.g., Soon et al. 2001, their Fig. 4). Errors in these approximations are difficult to quantify, but certainly take the models far from the domain of pure representation of ideal laws of physics such as black-body radiation from a uniform surface of known temperature, as also argued by Katzav et al. (2012). Arguments can also be made that significant physical processes are left out of the models, such as effects of the Earth’s electric field (Andersson et al. 2014).
If GCMs cannot be viewed as precise representations of theory based on the derivation of some components from well-supported physics (per above), what epistemological status do they have? One approach to assessing their truth value is to argue, not forward from the underlying physics, but back from the quality of their outputs. It can be successfully argued that they do embody aspects of current understanding of the Earth climate system or they would not work at all. Katzav (2014) and Schmidt and Sherwood (2015), for example, argue that this knowledge embodiment is indicated by the superiority of current models compared to a naïve model or compared to previous generation climate models. Smith (2002) and Oreskes et al. (1994) suggest that the models are a useful analogy or heuristic. McWilliams (2007) argues that because of irreducible uncertainty in model outputs due to chaotic dynamics, GCMs should be judged based on plausibility rather than whether they are correct or best. He argues that the models “yield space-time patterns reminiscent of nature … thus passing a meaningful kind of Turing test between the artificial and the actual.” The IPCC (2013, p. 145) states that these models can be viewed as tools for learning about the climate system. Many outputs (particularly temperature) show good agreement between models, indicating some sort of truth value to the models (Räisänen 2007). However, inter-model agreement can arise from common assumptions, shared algorithms, and similar data used for tuning. Parker (2011) argues that agreement of predictions across models, while providing some supporting evidence, is not sufficient to establish any epistemic certainty in their truth value. For these reasons, efforts to confirm (verify) climate models (e.g., Lloyd 2010, discussion in Katzav et al. 2012) are missing the point. 
While these models can be plausible, pass a Turing test of sorts, and agree with each other, the problems of irreducible dynamics and numeric uncertainty (e.g., McWilliams 2007) and other issues mean that the theoretical underpinning of the models cannot be assumed to imply validity for making useful predictions. This raises the question of their usefulness as predictive tools, discussed next.
4 Climate models as calculation tools
Because GCMs are continuously evolving and some aspects may lack a rigorous and close link to the underlying physics, they are unfalsifiable by Popper’s criteria (see Curry and Webster 2011), and must be judged as calculation tools. It is thus necessary to test the models in some way before using them.
Testing complex simulation models is difficult. The large number of tuned (estimated from data) parameters in these models (Murphy et al. 2004; Hargreaves 2010; Schmidt and Sherwood 2015; Hourdin et al. 2016) suggests that model parametric uncertainty could be high but this has been insufficiently evaluated to date (Guttorp 2014). There are potential structural (equation form), parameter, and data error issues (Loehle 1987, 1988; Hourdin et al. 2016) that have been little explored. There are many specific types of sensitivity and error analyses that can be conducted (e.g., Falloon et al. 2014; Guttorp 2014; Rougier and Goldstein 2014) to evaluate the reliability of model outputs, but these methods have almost never been applied to GCMs because of their large computational burden (Falloon et al. 2014). Allen and Ingram (2002) and McWilliams (2007) argue that ensembles of opportunity (a collection of models) do not adequately sample model uncertainty and recommend a full uncertainty (initial condition, parametric, equation functional form, numerical method, etc.) analysis in order to bound possible forecasts, an analysis which has still not been performed for GCMs. Thus, critical information for decision makers on model uncertainty is not available for GCMs.
Models of turbulent dynamics exhibit sensitivity to initial conditions (Frigg et al. 2014). Given a structurally perfect model (i.e., all equations and parameters are correct; numerical methods work correctly), the effect of initial condition uncertainty can be estimated by making multiple runs with perturbed initial conditions, giving a probability distribution for the outputs. This assumes that the errors in initial conditions can be characterized and that a sufficient number of runs can be made, neither of which is usually true in the case of climate models (McWilliams 2007). In a unique case study, Deser et al. (2016) perturbed a base run with machine error-level noise (i.e., round-off error) applied to the initial temperature field. Across 30 runs, they found very large differences, of several °C, in 50-year winter trends for regions of North America. They also found that an ensemble approach could separate internal variability from the forced signal to give better agreement with historical data. However, this result is based on an infinitesimal initial condition perturbation. True initial condition uncertainties are many orders of magnitude greater. More significantly, if there are any structural errors (wrong equation form to represent a process), this stochastic perturbation of initial conditions can be not only uninformative, but misleading (Smith 2002; Frigg et al. 2014; Hourdin et al. 2016).
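The way a round-off-sized perturbation can grow is easy to demonstrate in a low-dimensional chaotic system. The sketch below uses the classic Lorenz (1963) three-variable system as a stand-in; it is an assumption made for illustration only and is in no sense a climate model. The initial state, the perturbation size, and the function names are likewise hypothetical. Two runs that differ by one part in 10^10 are integrated side by side and their largest separation is recorded.

```python
import math


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 equations (standard parameters)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def max_separation(perturbation=1e-10, dt=0.01, steps=5000):
    """Largest distance reached between a base run and a perturbed run."""
    base = (1.0, 1.0, 1.0)  # hypothetical initial state
    twin = (1.0 + perturbation, 1.0, 1.0)
    largest = 0.0
    for _ in range(steps):
        base = rk4_step(base, dt)
        twin = rk4_step(twin, dt)
        largest = max(largest, math.dist(base, twin))
    return largest


if __name__ == "__main__":
    print("max separation:", max_separation())
```

The separation grows from the perturbation size to the size of the attractor itself, so the two runs end up as different as two unrelated states. Note that this toy has a structurally perfect "model" by construction; it says nothing about the additional effect of structural error discussed above.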
It may be more informative to examine GCM outputs more narrowly rather than as a whole to see what can be predicted with sufficient accuracy. The IPCC (2013) graphs GCM outputs of global mean temperature since 1850 on an anomaly basis (as departures from the mean), but if plotted on an absolute temperature basis, the time series differ by up to 4 °C (SI Fig. 2). A similar result (offsets of up to 4 °C) was found for the continental US (Anagnostopoulos et al. 2010). This is not a trivial difference because long-wave radiation from an object, by the Stefan-Boltzmann relation, is proportional to the fourth power of the absolute surface temperature (Anagnostopoulos et al. 2010). If models differ in mean temperature by this much, are they handling the basic physics in the same ways or implementing the physics with correct algorithms? This raises epistemic questions about the forecasts produced by GCMs. Hawkins and Sutton (2016) note that it has been argued that if the response to increased forcing is linear, then the absolute temperature does not matter much for estimating a response to increased forcing. However, if there is strong positive feedback, then the response to increased forcing is greater at higher temperatures (Bloch-Johnson et al. 2015; Gregory et al. 2015). If, in contrast, negative feedback acts to dampen CO2 forcing (e.g., Spencer and Braswell 2011), this would also depend on actual temperature. In either case, absolute temperature would matter (i.e., the response is nonlinear) and the use of anomalies cannot be justified. Anomalies, sometimes called “bias-correction”, are also used for comparing other climate outputs. However, crops, biodiversity, sea level, and ice sheets all respond to actual precipitation and temperatures, and thus the different models would forecast very different impacts even if their anomaly trends matched, as noted by Hawkins and Sutton (2016).
The net effect of bias correction or use of anomalies is to obscure the epistemological status of the models by reducing the spread of the model outputs with respect to each other and making disagreements with data difficult to determine.
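To give a sense of scale for a 4 °C offset, the Stefan-Boltzmann relation can be evaluated directly. The sketch below treats the surface as an ideal black body, which is an assumption made only for illustration, and the two temperatures (12 °C and 16 °C) are hypothetical values chosen to be 4 °C apart; they are not taken from any particular model.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
KELVIN_OFFSET = 273.15


def blackbody_flux(temp_kelvin):
    """Long-wave flux (W m^-2) emitted by an ideal black body."""
    return SIGMA * temp_kelvin ** 4


def flux_gap(temp_low_c, temp_high_c):
    """Difference in emitted flux between two surface temperatures in deg C."""
    low = blackbody_flux(temp_low_c + KELVIN_OFFSET)
    high = blackbody_flux(temp_high_c + KELVIN_OFFSET)
    return high - low


if __name__ == "__main__":
    # Hypothetical global-mean temperatures for two models, 4 deg C apart.
    print("flux gap (W m^-2):", flux_gap(12.0, 16.0))
```

Under these assumptions the gap in emitted flux is on the order of 20 W m^-2. This is only the idealized surface emission, not a top-of-atmosphere energy balance, but it shows why an offset of this size is not a trivial bookkeeping detail.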
The use of bias correction can cause other difficulties with testing. Consider the case of comparing global temperature histories to model outputs. Whether data are in actual °C or are shifted to a common baseline over some period, the correlation statistic is not affected because the constant term drops out of the computation. For other measures, however, the baseline can have an effect. For example, the R² statistic for model goodness of fit will be different for actual vs. anomaly series, and can actually be negative for unshifted series (i.e., the fit to data is worse than to a simple mean of the data). Hawkins and Sutton (2016) note that normalization (baseline shifting) of a climate series is based on a reference period, typically 30 years, but it can be the entire period of record. Both data and model output are shifted up or down so that their respective means over the reference period are zero. When comparing multiple runs of a single model or of multiple models vs. data, they will all agree most closely during the reference period. This means that the visual impression of model fit or the timing of model good or bad performance can depend completely on the reference period chosen (see Hawkins and Sutton 2016 for examples). This impacts, for example, the question of whether models are currently running hotter than the data. The closer the chosen reference period is to the present, the greater the apparent agreement between the models and data in recent years. For fit statistics such as R², the choice of reference period can also affect the result and thus the implied model fit. An artificial example is shown in Figure 2. In Figure 2a, the data and model are both shifted to the 100 year reference period (mean 0). The fit appears visually to be quite good, and R² = 0.79. However, in Figure 2b the most recent 30 years is used as the reference period. Now the model appears to fit worse in the past and better (almost perfectly) in recent decades, but now R² = 0.54, a considerable degradation.
This raises an epistemic dilemma. If correlation is used as a measure of common trend and pattern (e.g., ups and downs of temperature), this does not account for the bias (offset) in model outputs. If models and data are put on an anomaly basis, this assumes for temperature and precipitation that actual values don’t matter, only the trend, but this is still open to debate. Furthermore, the reference period chosen affects both the visual impression of model goodness-of-fit (for both ensemble spread and pattern of fit over time) and all fit statistics except simple correlation. Issues such as this have implications for epistemic certainty.
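The dependence of fit statistics on the reference period can be checked with a small synthetic experiment. The sketch below does not reproduce Figure 2 or its R² values; the two series, their offset, and the choice of the last 30 points as the "recent" reference period are all hypothetical. Here R² is computed as 1 - SSE/SST, with the model series treated as the prediction of the data, which is one common definition and is assumed rather than taken from the paper.

```python
import math


def mean(values):
    return sum(values) / len(values)


def anomalies(series, ref):
    """Shift a series so that its mean over the reference slice is zero."""
    base = mean(series[ref])
    return [v - base for v in series]


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def r_squared(obs, model):
    """1 - SSE/SST, treating the model series as the prediction of obs."""
    mo = mean(obs)
    sse = sum((o - m) ** 2 for o, m in zip(obs, model))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


# Hypothetical 100-point series in absolute deg C: the "model" runs about
# 2 deg C warmer than the "data" and has a somewhat steeper trend.
YEARS = range(100)
OBS = [14.0 + 0.008 * t + 0.1 * math.sin(t / 5.0) for t in YEARS]
MODEL = [16.0 + 0.012 * t for t in YEARS]

FULL = slice(0, 100)     # whole record as reference period
RECENT = slice(70, 100)  # last 30 points as reference period


def compare(ref=None):
    """Return (correlation, R^2) for raw series or anomalies over ref."""
    if ref is None:
        obs, model = OBS, MODEL
    else:
        obs, model = anomalies(OBS, ref), anomalies(MODEL, ref)
    return correlation(obs, model), r_squared(obs, model)


if __name__ == "__main__":
    for label, ref in (("raw", None), ("full", FULL), ("recent", RECENT)):
        corr, r2 = compare(ref)
        print(f"{label:>6}: correlation={corr:.3f}  R^2={r2:.3f}")
```

In this construction the correlation is the same in all three cases, R² is negative for the unshifted series, and R² is lower with the recent reference period than with the full record. The last point is general for this definition: shifting both series by their full-period means makes the mean residual zero, which minimizes the sum of squared errors over all possible constant shifts.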
5 Conclusions
What, then, of the knowledge question posed by GCMs? As parameterized simulators that generate climate behavior, these tools must fundamentally be judged statistically, quantitatively. Qualitative assessments do not answer the key policy-relevant questions of how much warming, when, and where. Held (2005) argues that achieving improved knowledge of the climate requires the development of simplified, idealized “worlds” (e.g., see SI Fig. 1) to enable an exploration of the processes of large-scale turbulence, heat transfer to the poles, ocean circulation, and particularly how large climate features such as ENSO can persist. Without this exploration of mechanisms, Held argues, it is not possible to explain why different GCMs produce different outputs, why they differ from data, and how they can be improved. This is because the complexity of the models results in epistemic opacity. Proper explanations of the behavior of complex hierarchical systems such as the climate must usually be multilevel and account for factors such as ocean currents, continents, and clouds. Improved understanding achieved in this way could lead to better sub-grid parameterizations. An example is the recent work by Moncrieff et al. (2017) which derives a multi-scale approach to understanding of organized tropical convection that can be used to develop sub-grid parameterizations.
If climate models are only “similar to” the real Earth system and act more as an analogy (Oreskes et al. 1994) or as exploratory tools, then they are most useful as a basis for qualitative predictions such as that some warming is likely. If the models can make some predictions (e.g., global temperature) with acceptable precision, it is important to determine which variables can be so predicted. If models exhibit a common bias, perhaps this bias can be accounted for in making policy decisions. Explanations for model performance differences should be pursued, especially the wide range of future trajectories. Given the complexity of the Earth climate system, the foundational basis for the knowledge claims made based on GCMs deserves greater attention. Epistemology, properly applied, can help clarify what we know, how we know it, and the limits of rigorous reasoning that can be justified.
Climate change poses a wicked policy problem. There is a high risk both from action and inaction. This paper does not lead to any particular policy conclusion. Rather, it focuses on the methods that lead to rigorous reasoning. Policy decisions necessarily also involve perceptions of risk, tolerance of risk, cultural values, economics, and other factors beyond the scope of this analysis. However, any policy can only benefit from a better understanding of how climate models are constructed, their physical basis, how they can be tested, and how to assess their outputs.
References
Allen MR, Ingram WJ (2002) Constraints on future changes in climate and the hydrological cycle. Nature 419:224-232.
Anagnostopoulos GG, Koutsoyiannis D, Christofides A, Efstratiadis A, Mamassis N (2010) A comparison of local and aggregated climate model outputs with observed data. Hydrol Sci J 55:1094-1110.
Andersson ME, Verronen PT, Rodger CJ, Clilverd MA, Seppälä A (2014) Missing driver in the Sun-Earth connection from energetic electron precipitation impacts mesospheric ozone. Nat Comm 5:5197.
Bloch-Johnson J, Pierrehumbert RT, Abbot DS (2015) Feedback temperature dependence determines the risk of high warming. Geophys Res Lett 42:4973-4980.
Curry JA, Webster PJ (2011) Climate science and the uncertainty monster. Bull Am Meteorol Soc 92:1667-1682.
Deser C, Terray L, Phillips AS (2016) Forced and internal components of winter air temperature trends over North America during the past 50 years: mechanisms and implications. J Climate 29:2237-2258.
Falloon P, Challinor A, Dessai S, Hoang L, Johnson J, Koehler A-K (2014) Ensembles and uncertainty in climate change impacts. Front Environ Sci 2:33.
Frigg R, Bradley S, Du H, Smith LA (2014) Laplace’s demon and the adventures of his apprentices. Philos Sci 81:31-59.
Gregory JM, Andrews T, Good P (2015) The inconstancy of the transient climate response parameter under increasing CO2. Philos Trans Roy Soc A 373:20140417.
Guttorp P (2014) Statistics and climate. Ann Rev Stat Appl 1:87-101.
Hargreaves JC (2010) Skill and uncertainty in climate models. Wiley Interdisciplinary Reviews: Climate Change 1:556-564.
Hawkins E, Sutton R (2016) Connecting climate model projections of global temperature change with the real world. Bull Am Meteorol Soc 97:963-980.
Held IM (2005) The gap between simulation and understanding in climate modeling. Bull Am Meteorol Soc 86:1609-1614.
Hourdin F, Mauritsen T, Gettelman A, Golaz J-C, Balaji V, Duan Q, Folini D, Ji D, Klocke D, Qian Y, Rauser F, Rio C, Tomassini L, Watanabe M, Williamson D (2016) The art and science of climate model tuning. Bull Am Meteorol Soc in press.
IPCC (2013) Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change [Stocker, T.F., D. Qin, G.-K. Plattner, M. Tignor, S.K. Allen, J. Boschung, A. Nauels, Y. Xia, V. Bex and P.M. Midgley (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, 1535 pp.
Katzav J (2014) The epistemology of climate models and some of its implications for climate science and the philosophy of science. Studies in History and Philosophy of Modern Physics 46:228-238.
Katzav J, Dijkstra HA, de Laat ATJ (2012) Assessing climate model projections: state of the art and philosophical reflections. Studies in History and Philosophy of Modern Physics 43:258-276.
Lahsen M (2005) Seductive simulations? Uncertainty distribution around climate models. Social Studies of Science 35:895-922.
Lloyd EA (2010) Confirmation and robustness of climate models. Philos Sci 77:971-984.
Loehle C (1983) Evaluation of theories and calculation tools in ecology. Ecol Modell 19:239-247.
Loehle C (1987) Errors of construction, evaluation, and inference: a classification of sources of error in ecological models. Ecol Modell 36:297-314.
Loehle C (1988) Philosophical tools: potential contributions to ecology. Oikos 51:97-104.
Loehle C (2011) The logic of scientific discovery. Current Trends in Ecology 2:75-81.
Marston JB, Chini GP, Tobias SM (2016) Generalized quasilinear approximation: application to zonal jets. Phys Rev Lett 116:214501.
Mauritsen T, Stevens B, Roeckner E, Crueger T, Esch M, Giorgetta M, Haak H, Jungclaus J, Klocke D, Matei D, Mikolajewicz U, Notz D, Pincus R, Schmidt H, Tomassini L (2012) Tuning the climate of a global model. J Adv Model Earth Sys 4:M00A01.
McWilliams JC (2007) Irreducible imprecision in atmospheric and oceanic simulations. Proc Natl Acad Sci 104:8709-8713.
Moncrieff MW, Liu C, Bogenschutz P (2017) Simulation, modeling, and dynamically based parameterization of organized tropical convection for global climate models. J Atmos Sci. doi:10.1175/JAS-D-16-0166.1.
Murphy JM, Sexton DMH, Barnett DN, Jones GS, Webb MJ, Collins M, Stainforth DA (2004) Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature 430:768-772.
Oreopoulos L, Mlawer E (2010) The Continual Intercomparison of Radiation Codes (CIRC): assessing anew the quality of GCM radiation algorithms. Bull Am Meteorol Soc 91:305-310.
Oreskes N, Shrader-Frechette K, Belitz K (1994) Verification, validation, and confirmation of numerical models in the earth sciences. Science 263:641-646.
Parker WS (2011) When climate models agree: the significance of robust model predictions. Philos Sci 78:579-600.
Popper KR (1959) The logic of scientific discovery. Hutchinson, London.
Popper KR (1963) Conjectures and refutations: the growth of scientific knowledge. Harper & Row, New York.
Reiss J (2015) A pragmatist theory of evidence. Philos Sci 82:341-362.
Rougier J, Goldstein M (2014) Climate simulators and climate projections. Ann Rev Stat Appl 1:103-123.
Sakamoto TT, Komuro Y, Nishimura T, Ishii M, Tatebe H, Shiogama H, Hasegawa A, Toyoda T, Mori M, Suzuki T, Imada Y, Nozawa T, Takata K, Mochizuki T, Ogochi K, Emori S, Hasumi H, Kimoto M (2012) MIROC4h – a new high resolution atmosphere-ocean coupled general circulation model. J Meteorol Soc Japan 90:325-359.
Schmidt GA, Sherwood S (2015) A practical philosophy of complex climate modelling. Eur J Philos Sci 5:149-169.
Smith LA (2002) What might we learn from climate forecasts? Proc Natl Acad Sci 99:2487-2492.
Soon W, Baliunas S, Idso SB, Kondratyev KY, Posmentier ES (2001) Modeling climatic effects of anthropogenic carbon dioxide emissions: unknowns and uncertainties. Climate Research 18:259-275.
Spencer RW, Braswell WD (2011) On the misdiagnosis of surface temperature feedbacks from variations in Earth’s radiant energy balance. Remote Sensing 3:1603-1613.
Staniforth A, Thuburn J (2012) Horizontal grids for global weather and climate prediction models: a review. Quart J Royal Meteorol Soc 138:1-26.
Stevens B (2015) Rethinking the lower bound on aerosol radiative forcing. J Climate 28:4794-4819.
Stevens B, Bony S (2013) What are climate models missing? Science 340:1053.
Taylor KE, Stouffer RJ, Meehl GA (2012) An overview of CMIP5 and the experiment design. Bull Am Meteorol Soc 93:485-498.
Thuburn J (2008) Some conservation issues for the dynamical cores of NWP and climate models. J Comput Phys 227:3715-3730.
Wegener A (1966) The origin of continents and oceans (Biram J, trans.). Courier Dover. p 246.
Wilcox LJ, Highwood EJ, Dunstone NJ (2013) The influence of anthropogenic aerosol on multi-decadal variations of historical global climate. Environ Res Lett 8:024033.
Williams M (2001) Problems of knowledge: a critical introduction to epistemology. Oxford University Press.
Zhou L, Zhang M, Bao Q, Liu Y (2015) On the incident solar radiation in CMIP5 models. Geophys Res Lett 42:1930-1935.