Guest Post by Willis Eschenbach (@WEschenbach on X, my personal blog is here)
Dear friends, pull up a chair and pour a good strong coffee.
Because today’s topic is a real treat: yet another “Indicators of Global Climate Change” paper, this time the 2025 update, in which a team of the usual suspects announces (for the thirty’leventh time) that the planet is warming, that evil humans did it, and that they can now put single-decimal numbers on “human-induced warming” and the remaining carbon budget, as though the climate system were a well-debugged spreadsheet rather than a barely observed, non-linear, chaotic, multiscale nightmare.
People misunderestimate the complexity of the climate. It has no less than six major subsystems: atmosphere, hydrosphere, lithosphere, biosphere, cryosphere, and electrosphere. Each of these systems has its own internal cycles, forcings, responses, and resonances. And all of them are constantly interacting at spatial scales from the molecular to the planetary, and temporal scales from nanoseconds to millennia. Willis’s First Rule of Climate states, “In the climate, everything is connected to everything else, which in turn is connected to everything else … except when it’s not”. It’s the most complex system we’ve ever tried to model, and we’ve barely scratched the surface.
Now, don’t get me wrong. I’ve written lots of computer models. I like indicators. I like data. I like having regular snapshots of where we are.
But what I don’t like is pretending that running the same assumptions through a slightly updated sausage machine counts as validation of anything. These folks are not testing the system. They’re just doing annual bookkeeping within an untested framework and then declaring victory. That’s not science; that’s climate accounting with a side of unwarranted confidence.
Let’s start with the basics. The paper is explicitly aligned with IPCC AR6 methods. It tracks emissions, concentrations, effective radiative forcing, surface temperature, Earth’s energy imbalance, sea level, and so on, then uses a simple climate model (FaIR) tuned to AR6 and a handful of observational constraints to spit out “human-induced warming”.
In other words, they take AR6’s structure, plug in updated emissions and temperature series, and out pops the updated “human-caused warming is now 1.37 °C” kind of number. Handy, perhaps. But notice what they’re not doing: they’re not independently checking whether the underlying model family can actually reproduce crucial features of the real climate system when it’s not being hand-held.
This brings us to a topic near and dear to my heart—verification and validation, or as grown-up modelers call it, V&V. I wrote my first computer program sixty-three years ago this month, so I know more than a bit about computer models and V&V.
Let me strip it down to the bare essentials. Here’s the TL;DR version.
- Verification is asking “Did we solve the equations right?”
- Validation is asking “Did we solve the right equations?”
Two very different questions.
First, some background. In the adult parts of engineering—nuclear, aerospace, mechanical, structural, medical devices—V&V isn’t some optional nicety. It’s standard operating procedure. If you build a computational model that will be used to design a bridge, a reactor, or a heart valve, you’re expected to show that (1) the code does what you think it does, and (2) the whole model is actually a decent representation of the real physical system in the situations you care about.
Those are verification and validation, respectively. Miss either one and you’re in the realm of wishful thinking, not engineering.
So what is “verification”? At its core, verification is about the implementation, not the real world. It’s the process of checking that the computer code accurately solves the mathematical model you’ve chosen. You can think of it as asking “Are the numerics right?”
That includes all the unglamorous but vital stuff: does the code solve the equations you intended it to solve; are the boundary conditions implemented correctly; do conservation laws hold; does the solution converge as you refine the grid or timestep; and does the model reproduce known analytical or benchmark solutions when you feed it simple test cases? You’re comparing the code against the intended equations and algorithms, not against observations of nature. If your conceptual model is wrong, perfect verification won’t save you, but at least you know the error isn’t a typo in the code.
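To make the convergence check concrete, here is a minimal Python sketch of a refinement test on a toy problem. It is an illustration only, not anything taken from a climate model: the function names `euler_decay` and `observed_order`, the decay equation dy/dt = -k*y, and the forward Euler scheme are all assumptions, chosen because the exact answer is known. The idea is to halve the timestep and check that the error shrinks at the rate the scheme is supposed to deliver (first order for forward Euler).

```python
import math


def euler_decay(n_steps, t_end=1.0, k=1.0):
    """Integrate dy/dt = -k*y with y(0) = 1 using forward Euler."""
    dt = t_end / n_steps
    y = 1.0
    for _ in range(n_steps):
        y += dt * (-k * y)
    return y


def observed_order(n_coarse, t_end=1.0, k=1.0):
    """Estimate the convergence order from two runs, at dt and at dt/2."""
    exact = math.exp(-k * t_end)  # known analytical solution
    err_coarse = abs(euler_decay(n_coarse, t_end, k) - exact)
    err_fine = abs(euler_decay(2 * n_coarse, t_end, k) - exact)
    # For a scheme of order p, halving dt divides the error by about 2**p.
    return math.log(err_coarse / err_fine, 2)


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        print(n, "steps -> observed order", round(observed_order(n), 3))
```

If the observed order wanders away from 1 as the step shrinks, either the code or the assumed scheme is wrong, and that is exactly the kind of signal a verification suite exists to catch.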
Here, we run into the problem that causes climate modelers to smile weakly and deliberately look the other way when it is mentioned. We do not have a general proof that the full, parameterized, rotating, moist Navier–Stokes system used in operational climate models converges in any rigorous, global sense as you push the grid spacing toward zero.
This is more than a theoretical problem. It means that no, we do NOT know if the models are actually using the right math in the right way. And in fact, in some models, error norms or diagnostics do not decrease smoothly with grid refinement; instead, they can flatten or even increase over certain resolution ranges because new scales of motion are partially resolved, numerical diffusion changes regime, or wave–mean flow interactions behave differently as the grid gets finer.
How do the modelers get around this? By using dozens and dozens of tunable parameters that nudge the climate model toward the desired centerline when it gets too near the ditches, as I described in the post below.
So we have no guarantee that the climate models are actually doing the right math in the right way.
Then we get to “validation”. This is where you step out of the mathematical sandbox and look up at the real world. Validation asks a harder and much more interesting question: given that the code solves some set of equations correctly (which we don’t know), does that set of equations provide an adequate representation of the actual physical system for the purpose you care about?
Here, you’re no longer comparing the model to itself. You’re comparing model outputs to measurements, experiments, or observational data that weren’t already used to tune the model. You run the model in regimes where you know what nature does, and you ask whether the model’s predictions fall within acceptable bounds. Crucially, “acceptable” is not a metaphysical word—it’s defined relative to the decisions you’re going to base on that model. Designing a rocket nozzle and estimating the global mean temperature in 2100 have very different tolerance requirements.
So, in practice, verification answers “is the code faithful to the math?” while validation answers “is the math faithful to the world, for this job?” You need both if you want to claim that your model is more than an elaborate curve-fitting exercise.
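A hold-out test of this kind can be sketched in a few lines of Python. This is a toy under stated assumptions, not the procedure of any modeling group: the straight-line “model”, the helper names (`fit_line`, `rmse`, `validate`), the synthetic series, and the tolerance are all hypothetical. What it shows is the shape of the exercise: calibrate on one stretch of data, predict a stretch the model never saw, and compare the miss against a tolerance chosen before looking.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))


def validate(xs, ys, n_calib, tolerance):
    """Calibrate on the first n_calib points, then score the rest.

    Returns (held-out RMSE, passed), where passed means RMSE <= tolerance.
    """
    slope, intercept = fit_line(xs[:n_calib], ys[:n_calib])
    held_x, held_y = xs[n_calib:], ys[n_calib:]
    predicted = [slope * x + intercept for x in held_x]
    error = rmse(predicted, held_y)
    return error, error <= tolerance


if __name__ == "__main__":
    # Synthetic series with a slow curve that a straight line cannot follow.
    xs = list(range(100))
    ys = [0.02 * x + 0.0003 * x * x for x in xs]
    error, passed = validate(xs, ys, n_calib=60, tolerance=0.1)
    print("held-out RMSE:", round(error, 3), "passed:", passed)
```

The design choice that matters is that the tolerance and the held-out window are fixed before the comparison is run. Scoring a model only on the data it was calibrated to says nothing about whether it passes.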
And this is why I’m talking about V&V in climate modeling.
A model that hasn’t been rigorously verified might just be a fancy random number generator with good PR.
A model that hasn’t been rigorously validated—against out-of-sample phenomena, with clear pass/fail criteria—is simply not entitled to strut around as if it “understands” the climate system.
At best, it’s a hypothesis generator. At worst, it’s a very expensive way of confirming what you already believe.
Look, if an engineer builds a bridge model, you don’t just eyeball the output and say “looks about right”. You verify that the code is solving the equations you think it is, and you validate the model by comparing its predictions to independent reality it hasn’t already been tuned to. You push it into regimes where you actually know the answer. If it fails, you fix it, or you stop trusting it. Simple.
In the IPCC world, though, “validation” mostly means “it lines up with AR6 and some global mean time series we already used to calibrate it.” FaIR, the simple climate model they use, is calibrated to the CMIP6 ensemble and to global mean temperature and ocean heat content. The paper then reuses that same model to say how much of the observed warming is human-caused and what the remaining carbon budget is. That’s not validation; that’s circular reasoning dressed up as modeling elegance. The sad truth is that the software that runs a high-rise elevator has had far more V&V than modern climate models.
They never show, for example, that this whole AR6–FaIR pipeline can reproduce any major out-of-sample behavior of the climate system—say, the vertical structure of tropical temperatures, or the last millennium’s slow cooling into the Little Ice Age and subsequent recovery.
Now, if you were serious about V&V, there’s one huge, glaring test sitting there like the proverbial elephant in the tropical living room: the vertical structure of warming in the tropics. The greenhouse story doesn’t just say “it warms.” It says it warms in a particular pattern—more in the tropical upper troposphere than at the surface, roughly following a moist adiabatic profile. The models love this. They produce a big warm “hot spot” around 200–300 hPa in the tropics. It’s one of their most robust fingerprints.
The atmosphere, on the other hand, is not impressed. When you look at radiosonde and satellite datasets, the observed tropical troposphere warming is much, much weaker than the CMIP multi-model mean would have you believe, especially in that upper troposphere region where the models are so eager to heat things up. Douglass and Santer pointed this out years ago: model tropical tropospheric temperature trends substantially exceed those from observations, often by more than twice the observational uncertainty, especially over the satellite era.
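For readers who want to see what “exceeds by more than twice the observational uncertainty” means operationally, the comparison can be sketched as below. Everything here is an assumption for illustration: the function names `trend_with_se` and `exceeds_two_sigma` are hypothetical, the series is synthetic rather than radiosonde or satellite data, and the standard error is the plain white-noise least-squares one. Real temperature series are autocorrelated, which widens the true uncertainty, so published comparisons use more careful error models than this.

```python
import math


def trend_with_se(ts, ys):
    """Least-squares trend of ys against ts, with its white-noise standard error."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    stt = sum((t - mean_t) ** 2 for t in ts)
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / stt
    intercept = mean_y - slope * mean_t
    residual_ss = sum((y - (slope * t + intercept)) ** 2 for t, y in zip(ts, ys))
    # Residual variance uses n - 2 degrees of freedom (slope and intercept fitted).
    slope_se = math.sqrt(residual_ss / (n - 2) / stt)
    return slope, slope_se


def exceeds_two_sigma(model_trend, obs_trend, obs_se):
    """True if the model trend differs from the observed trend by more than 2 SE."""
    return abs(model_trend - obs_trend) > 2.0 * obs_se


if __name__ == "__main__":
    # Synthetic "observations": a steady trend plus an alternating wiggle.
    years = list(range(40))
    obs = [0.015 * t + 0.05 * (-1) ** t for t in years]
    obs_trend, obs_se = trend_with_se(years, obs)
    hypothetical_model_trend = 0.030  # made-up value, same units per year
    print("observed trend:", round(obs_trend, 4), "+/-", round(obs_se, 4))
    print("model outside 2 SE:",
          exceeds_two_sigma(hypothetical_model_trend, obs_trend, obs_se))
```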
Later authors have tried to soften this clash by waving their hands and invoking “internal variability” (which is code for “some natural cause we don’t understand but we want to sound all sciency about it”) and data uncertainties, but even they concede that the models warm the tropical upper troposphere faster than most observational datasets.
If your model family can’t get the vertical temperature structure right in the tropics over the last few decades, where we actually have satellites and balloons, why on earth would you assume that the same model family is capable of delivering tenths of a degree precision on global “human-induced warming”? If the lapse rate feedback and moist convection are off, your global sensitivity and feedback structure are suspect by definition. Yet the IPCC indicator framework just shrugs and marches on, quoting 1.37 °C of “human-caused” warming as though the underlying physics had passed all the obvious tests.
No. Just no.
While we’re on the topic of failed reality checks, let’s talk about the track record of what I call the serial doomcast industry—those dramatic, headline-friendly projections that were supposed to scare the public straight. You’ve seen them: ice-free Arctic summers “by 2020” or “by 2040”, small atolls drowning imminently, millions of “climate refugees”, and on and on. The IPCC paper is full of language about high rates of warming, record extremes, and shrinking carbon budgets.
What it does not do is step back and ask, “How have previous dramatic projections fared against empirical reality, and what does that say about the models and assumptions we’re using now?”
Take the “sunken atolls” claim. For years now, projections built on coupled climate models and a generous helping of arm waving have promised that low-lying coral atolls were about to vanish beneath the waves any minute now—pick your favorite deadline from the 1990s onward, there’s probably a paper or press release for it. Depending on the emissions scenario and the particular brand of coastal model, we were supposed to see islands “disappearing”, nations becoming “uninhabitable”, and whole cultures turned into climate refugees on a tidy timetable suitable for grant applications and NGO fundraising brochures.
Meanwhile, when you stop reading the press releases and actually look at the tide gauges, aerial photos, and satellite imagery, an inconvenient thing happens: the islands mostly refuse to cooperate. Yes, the sea level has been rising. Yes, there are local problem spots. But many atolls have stayed roughly the same size or even grown in area over the last half century or more, because coral reefs build vertically, storms move sediment around, and shorelines are not passive bathtub rims waiting for the water to creep over them.
The “sunken atolls” narrative, sold as an inevitable outcome of modelled sea level rise, keeps smashing into a messy, dynamic reality in which the islands are a lot more resilient—and a lot less obedient to simple doom graphs—than the marketing would have you believe.
That doesn’t mean they are immune to future problems. It does mean that early simplistic models and narratives were systematically wrong about how these systems respond, and climate indicators built on those same assumptions ought to treat that as a serious warning shot.
This is not new news. Charles Darwin pointed out a couple hundred years ago that atolls are created by rising sea levels, not destroyed. And I wrote about it in a scientific journal two decades ago, and detailed it in the post below.
But the IPCC doesn’t even admit its egregious error as an honest group of scientists would do. It just logs sea level rise, says impacts are increasing, and moves on.
At this point, someone will say, “But Willis, those are just details around the edges. The big picture is clear: recent warming is unprecedented, and we know it’s humans.” To which I reply: let’s take a walk through the last thousand years and see how “clear” that really is.
Reconstructions like the Ljungqvist study below show a general pattern: a relatively warm Medieval Climate Anomaly somewhere around 900–1200 CE, followed by a drawn-out cooling into the so-called Little Ice Age, with minimum temperatures roughly in the 17th–18th centuries, and then a warming into the 19th and 20th centuries.

The exact timing and magnitude vary by region, but the broad picture is of a climate system that can cool and then warm over centuries without help from SUVs or Chinese coal.
Now, look at that sequence: roughly speaking, the Earth cooled from about 1000 AD to around 1700 AD, then it stopped cooling, and then, instead of staying cold, it warmed in fits and starts up to the present. The first couple of centuries of that recent warming phase—from, say, 1700 to 1900—cannot plausibly be blamed on CO₂, because the human contribution to atmospheric CO₂ was still small. Even by 1900, concentrations were only modestly above pre-industrial levels.
So something else—some combination of natural changes in solar output, volcanic activity, ocean dynamics, and internal variability—drove a big, slow reversal of a multi-century cooling trend.
Here’s the key point: nobody really knows, in a mechanistic, predictive sense, why that long cooling happened, why it bottomed out when it did, or why the subsequent natural warming unfolded in the stepwise way it did. Reconstructions and model experiments can point to candidate drivers (clusters of big volcanic eruptions, small variations in solar output, maybe some internal modes), but there is no consensus quantitative model that says “if you dial the forcings like this, you reliably get the specific amplitude and timing of the Medieval Climate Anomaly–Little Ice Age–modern sequence.”
And it gets worse the further back you look. In paleo times, in general, the planet has been warmer, and sometimes far warmer, than today. Why? Well … we have no clue. And in paleo times, as shown below, there’s little correlation between CO₂ and temperature. Temperatures in the Carboniferous Period, for example, were similar to today but with double the CO₂ level.

And yet, when it comes to the 1850–1900 baseline and everything after, the IPCC team serenely acts as though that messy, poorly understood millennial to million-year background can be cleanly subtracted off. They define 1850–1900 as “pre-industrial”, assume that the complex multi-century dynamics that led us into and out of the Little Ice Age are adequately captured by their chosen forcing reconstructions and internal variability assumptions, and then confidently assign tenths of a degree of the subsequent warming to “human-induced” causes.
That leap—from “we know temperatures and forcings roughly” to “we can decompose twentieth and twenty-first-century warming into human versus natural with one decimal place of certainty”—is precisely where hubris enters stage left.
The IPCC folks love their numbers. For the 2016–2025 decade, they report observed warming of about 1.26 °C relative to 1850–1900, with human-induced warming at 1.24 °C, and a rate of human-caused warming of about 0.27 °C per decade. For 2025 itself, they peg human-induced warming at about 1.37 °C. The uncertainties they quote are impressively tight—hundredths of a degree here, a few tenths there. Very official. Very polished.
But what are these numbers actually measuring? They’re not “what the climate did all by itself” versus “what humans did” in some controlled experiment. They are nothing but the outputs of a particular simple model family (FaIR) calibrated to a particular set of assumptions about forcings, feedbacks, aerosol cooling, and internal variability.
Change the model structure, and you change the attribution. Include a different representation of low-frequency ocean variability, you change the internal variability that has to be “explained away” by forcing. Adjust the aerosol forcing a bit, and your inferred CO₂-driven sensitivity and “human-induced warming” shift.
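The dependence on assumptions can be illustrated with a deliberately crude toy. This is not FaIR and not the paper’s method: it assumes observed warming scales linearly with total forcing, ignores ocean lag and internal variability entirely, and every number in it is made up for illustration. The function name `attribute` and its parameters are hypothetical. The point is only that the same observed warming gets split differently once you change the assumed aerosol forcing.

```python
def attribute(delta_t_obs, f_ghg, f_aerosol, f_natural):
    """Split observed warming in proportion to assumed forcings.

    Toy assumption: warming = response * total forcing, with one shared
    response coefficient (K per W/m^2) inferred from the observed warming.
    """
    f_total = f_ghg + f_aerosol + f_natural
    response = delta_t_obs / f_total
    return {
        "response": response,
        "ghg": response * f_ghg,
        "aerosol": response * f_aerosol,
        "natural": response * f_natural,
        "human": response * (f_ghg + f_aerosol),
    }


if __name__ == "__main__":
    # Illustrative values only; none are taken from the paper.
    observed = 1.2  # K
    for aerosol in (-1.0, -0.5):  # W/m^2, two assumed aerosol forcings
        result = attribute(observed, f_ghg=3.0, f_aerosol=aerosol, f_natural=0.2)
        print("aerosol forcing", aerosol,
              "-> response", round(result["response"], 3),
              "human", round(result["human"], 3),
              "ghg", round(result["ghg"], 3))
```

Even in this toy, weakening the assumed aerosol cooling lowers the inferred response per unit forcing and moves the greenhouse gas share, while the observed warming stays exactly the same.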
None of that structural uncertainty is included in the neat little uncertainty ranges they quote.
So where does that leave us? In my view, with three big takeaways.
First, the Indicators of Global Climate Change paper is not doing what many people assume it is doing. It is not an independent test of climate models. It is not a fresh attribution study starting from scratch. It is an annual balance sheet inside the AR6 paradigm, taking the structural assumptions as given and reporting updated indicators. If you already believe AR6 has the right model family and the right forcings, the paper will give you prettier, more up-to-date numbers. If you doubt that the family of models has passed basic V&V, the paper does nothing to change your mind.
Second, the awkward facts haven’t gone away. The models still struggle to match the tropical troposphere. The last millennium still shows big, poorly explained swings in climate that predate major human forcing. The paleo record shows huge swings in temperature without obvious causes. The doomier end of the projection spectrum—ice-free Arctic dates, simplistic drowning atoll narratives, and the like—has a generally poor empirical track record.
The IPCC framework treats these as background noise rather than as central tests of whether the modeled system is structurally adequate. That’s backwards. In real science, failures are where you learn the most.
Third, and most important, nobody knows enough about the climate system to justify the tone of precision and inevitability that pervades this paper. We know the planet has warmed. We know CO₂ and other greenhouse gases absorb and emit radiation and will tend to warm the surface. We know humans have changed land cover, aerosols, and more.
But we do not have a validated, quantitatively reliable model of the whole multi-century climate evolution, from the Medieval warmth through the Little Ice Age to now, nor of the vertical structure in the key regions where greenhouse theory should shine, nor of the decadal interplay between internal variability and forcing.
Until we do (until climate models are treated like serious engineering tools, forced to pass hard out-of-sample tests, and until we can reproduce the major swings of the past millennium and the vertical structure of the modern atmosphere without hand-waving), it is pure hubris to claim we “understand” today’s climate in the sense implied by single-decimal attribution numbers, projections 75 years into the future, and tight remaining budget estimates. We understand some pieces. We guess at others. And we paper over the gaps with annual indicator updates that look very official.
Now, if you can show me a model family that can quantitatively reproduce the last thousand years, the tropical troposphere, and the realized sea ice behavior under clearly specified forcings, I’ll happily update my views. But until then, I’ll keep my skepticism, and I’d recommend you keep yours as well.
Here on the redwood-forested ridge I call home, I’ve been having fun. A few afternoons ago, I stepped off a curb and it turned out to be a 24″ drop, not a 12″ drop. As a result, I made a spectacular two-foot (60 cm) faceplant into the street. Got up, dusted off, looked around all embarrassed to see if anyone noticed. They hadn’t, of course; self-importance strikes again. The good news was, no visible damage other than a scratch by the side of my eye.
Imagine my surprise when I looked in the mirror the next morning and I’d become half a raccoon …

Do I know how to party, or what?
My very best to everyone, and my advice?
… look before you leap …
w.
AS USUAL: I ask that when you comment, please QUOTE the exact words you are commenting on. I can defend my own words. I can’t defend your understanding of my words.