Guest Post by Willis Eschenbach
Anthony has pointed out a new paper by McKinley et al. regarding the carbon sinks of the oceans (preprint available here, supplementary online information here). The oceans absorb and sequester carbon from the atmosphere. As usual in this world of “science by press release”, the paper has already been picked up and circulated around the planet. CNN says:
The ability of oceans to soak up atmospheric carbon dioxide is being hampered by climate change, according to a new scientific study.
A fresh analysis of existing observational data taken from locations across the North Atlantic Ocean recorded over a period of almost three decades (1981-2009) has revealed that global warming is having a negative impact on one of nature’s most important carbon sinks.
“Warming in the past four to five years has started to reduce the amount of carbon that large areas of the (North Atlantic) Ocean is picking up,” said Galen McKinley, lead author and assistant professor of atmospheric and oceanic sciences at the University of Wisconsin-Madison.
The lead author says in the press release that things are getting worse … but since it is nearly guaranteed that the paper says something different from the spin the press release authors put on it, what does their paper actually say?
The first oddity about the paper is that they are discussing changes in the partial pressure of CO2 in the ocean (written as “pCO2”). But they’re not actually measuring the pCO2. They are calculating it from the dissolved inorganic carbon (DIC), alkalinity (ALK), sea surface salinity (SSS), and sea surface temperature (SST). Now, this is a standard scientific procedure used to estimate unknown variables in the oceanic carbon balance. But while it is generally a good estimate, it is still an estimate. It is calculated using an empirical formula, that is to say, a formula which is not based on physical first principles. Instead, an empirical formula uses observation-derived parameters in an iterative goal-seeking algorithm to solve a complex system of equations.
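To give a feel for what such an iterative calculation involves, here is a deliberately simplified sketch in Python. This is not the authors’ method and not seacarb: it ignores borate, phosphate, and silicate entirely, and the equilibrium constants are rough assumed values for roughly 25°C and S=35. It solves the carbonate-system equations for the hydrogen ion concentration by bisection, then converts dissolved CO2 to pCO2:

```python
# Toy carbonate-system solver: estimate pCO2 from DIC and carbonate
# alkalinity. Illustration only -- it omits borate, phosphate, and
# silicate, and the constants below are rough assumed values for
# T = 25 C, S = 35 (not seacarb's fitted formulations).

K0 = 2.84e-2  # CO2 solubility, mol/(kg*atm)
K1 = 1.38e-6  # first dissociation constant of carbonic acid
K2 = 1.20e-9  # second dissociation constant

def pco2_from_dic_alk(dic, ca, lo=1e-10, hi=1e-6):
    """dic and ca in mol/kg; returns pCO2 in microatmospheres.

    Bisect on [H+]: carbonate alkalinity falls as [H+] rises,
    so the root is bracketed between lo (pH 10) and hi (pH 6).
    """
    def ca_calc(h):
        denom = h * h + K1 * h + K1 * K2
        return dic * (K1 * h + 2.0 * K1 * K2) / denom

    for _ in range(100):      # bisection, converges well past float precision
        h = 0.5 * (lo + hi)
        if ca_calc(h) > ca:
            lo = h            # too alkaline -> need more H+
        else:
            hi = h
    co2_star = dic * h * h / (h * h + K1 * h + K1 * K2)
    return co2_star / K0 * 1e6   # atm -> microatm

# Representative open-ocean values (illustrative numbers only)
print(pco2_from_dic_alk(2.00e-3, 2.20e-3))  # a few hundred microatm
```

The point of the sketch is not the exact answer but the structure: pCO2 falls out at the end of a chain of estimated constants and an iterative solve, so every choice along the way carries its error forward.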
As you might imagine, different authors use different parameters in the equation. There is a good overview of the function as it is used in the R computer language located in the “seacarb” package. If we take a look at the function “carb” in that package we see that in addition to the pCO2 depending on the variables they have measured, it is also affected by the levels of phosphate and silicate (which apparently the authors have not included). They give details of the different possible choices of values for the various parameters. From the description of the function “carb”:
The Lueker et al. (2000) constants for K1 and K2, the Perez and Fraga (1987) constant for Kf and the Dickson (1990) constant for Ks are recommended by Dickson et al. (2007). It is, however, critical to consider that each formulation is only valid for specific ranges of temperature and salinity:
For K1 and K2:
• Roy et al. (1993): S ranging between 0 and 45 and T ranging between 0 and 45°C.
• Lueker et al. (2000): S ranging between 19 and 43 and T ranging between 2 and 35°C.
• Millero et al. (2006): S ranging between 0.1 and 50 and T ranging between 1 and 50°C.
• Millero (2010): S ranging between 1 and 50 and T ranging between 0 and 50°C.
Millero (2010) provides a K1 and K2 formulation for the seawater, total and free pH scales. Therefore, when this method is used and if P=0, K1 and K2 are computed with the formulation corresponding to the pH scale given in the flag “pHscale”.
• Perez and Fraga (1987): S ranging between 10 and 40 and T ranging between 9 and 33°C.
• Dickson and Riley (1979 in Dickson and Goyet, 1994): S ranging between 0 and 45 and T ranging between 0 and 45°C.
• Dickson (1990): S ranging between 5 and 45 and T ranging between 0 and 45°C.
• Khoo et al. (1977): S ranging between 20 and 45 and T ranging between 5 and 40°C.
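A simple way to see how restrictive these validity ranges are is to encode them and check a given water sample against each formulation. The ranges below are copied from the K1/K2 list above; the checking code itself is just an illustration, not anything from seacarb:

```python
# Validity ranges (salinity, then temperature in deg C) for the
# K1/K2 formulations listed above, per the seacarb documentation.
K1K2_RANGES = {
    "Roy et al. (1993)":     ((0,   45), (0, 45)),
    "Lueker et al. (2000)":  ((19,  43), (2, 35)),
    "Millero et al. (2006)": ((0.1, 50), (1, 50)),
    "Millero (2010)":        ((1,   50), (0, 50)),
}

def valid_formulations(S, T):
    """Return the K1/K2 formulations whose stated S and T ranges
    cover this water sample."""
    return [name for name, ((s_lo, s_hi), (t_lo, t_hi)) in K1K2_RANGES.items()
            if s_lo <= S <= s_hi and t_lo <= T <= t_hi]

print(valid_formulations(35, 20))  # open ocean: all four qualify
print(valid_formulations(5, 20))   # low salinity: Lueker et al. drops out
```

In open-ocean water all four formulations are nominally valid, so the choice among them is up to the researcher; in fresher water the recommended Lueker et al. constants are outside their stated range.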
As you might imagine, results depend on the choice of parameters.
In addition, McKinley et al. do not have observations for all input variables for all periods. Their study says:
For 2001-2007, ALK [total alkalinity] was directly measured. For 1993-1997, ALK was estimated from the ALK-SSS [sea surface salinity] relationship derived from 2001-2006 data (ALK = 43.857 * SSS + 773.8).
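That regression is easy to reproduce, and it shows how any error in salinity propagates directly into the estimated alkalinity. The coefficients are the paper’s; the error figures in the comments simply follow from the slope:

```python
# The paper's ALK-SSS regression, fitted on 2001-2006 data and then
# used to fill in the unmeasured 1993-1997 period.
def alkalinity_from_salinity(sss):
    """Estimated total alkalinity (micromol/kg) from sea surface salinity."""
    return 43.857 * sss + 773.8

print(alkalinity_from_salinity(35.0))   # ~2308.8 micromol/kg

# Error propagation: the slope turns every salinity error directly
# into an alkalinity error of 43.857 micromol/kg per salinity unit,
# so a 0.1 error in SSS is ~4.4 micromol/kg of alkalinity error.
print(43.857 * 0.1)
```

And of course the regression assumes that the ALK–SSS relationship fitted on 2001–2006 data also held in 1993–1997, which is itself an untested assumption.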
I bring up these issues with the carbon calculations for a simple reason: errors. Obviously, when you are estimating a critical value (pCO2) using an empirical formula with a choice of parameter values, with missing observations, and without including all of the known variables, you will get errors. How big will the errors be? It depends on the exact location being studied, the values of the various input variables, and your choice of parameters. As a result you will have to “ground-truth” the formula for the various biomes of interest. “Ground-truthing” is the process of comparing your calculations to actual measurements in the physical locations of interest. Once you have done that, you can use the measured error, as well as any bias, in determining the significance of the results.
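Ground-truthing in this sense boils down to two numbers: the mean error (bias) and the scatter (RMSE) between calculated and measured values at the sites of interest. A minimal sketch, with invented numbers standing in for real measurements:

```python
import math

def ground_truth(calculated, measured):
    """Compare calculated values against in-situ measurements.

    Returns (bias, rmse): bias is the mean of calculated - measured,
    rmse is the root-mean-square of the same differences.
    """
    errors = [c - m for c, m in zip(calculated, measured)]
    n = len(errors)
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return bias, rmse

# Hypothetical pCO2 values (microatm) -- illustrative only, not
# data from the paper or from any real cruise.
calc = [355.0, 362.0, 349.0, 371.0]
meas = [351.0, 365.0, 344.0, 369.0]
bias, rmse = ground_truth(calc, meas)
print(f"bias = {bias:+.1f} microatm, rmse = {rmse:.1f} microatm")
```

The bias tells you whether the formula systematically runs high or low in that biome; the RMSE tells you how much random error any single calculated value carries, which is exactly what you need before claiming a trend is significant.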
There is a discussion here of the oceanic carbon calculations, and some graphic examples of both calculated and measured pH, showing the size of the errors in another similar study. See in particular their Figure 1, which shows that errors in the calculation of pH, while generally moderate in size, are pervasive, unpredictable, and at times large.
Whatever the size of the errors resulting from the oceanic carbon calculations, they need to be measured against observations in the regions studied, and then described and accounted for in the study. As far as I can tell the authors have done neither of these things.
The second oddity about the paper also involves errors. They have not (as far as I can tell) adjusted their error values for autocorrelation. Autocorrelation is a measure of how much tomorrow’s temperature is dependent on today’s temperature. As you know, warmer days are generally followed by warmer days, and colder by colder. It is unusual to see an ice-cold day in between two warm days.
Since when it is warmer it tends to stay warmer, and when it is cooler it tends to stay cooler (temperature records show positive autocorrelation), this means that the swings in the temperature will be larger and longer than we would find in purely random data. As a result, we need to adjust the calculations depending on the level of autocorrelation, in order to decide if the trends (or the difference between the trends) is statistically significant or not. As far as I can tell, the authors have not adjusted for autocorrelation.
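One common first-order correction (I am not claiming it is the only one, or the one any particular study should use) treats the series as AR(1) and shrinks the effective number of independent observations by a factor of (1 − r)/(1 + r), where r is the lag-1 autocorrelation. With monthly ocean data r can easily be 0.5 or higher, which inflates the standard error of a trend substantially:

```python
import math

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of the series x."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((xi - mean) ** 2 for xi in x)
    return num / den

def effective_n(n, r):
    """Effective sample size under an AR(1) model with lag-1
    autocorrelation r: the usual (1 - r)/(1 + r) correction."""
    return n * (1.0 - r) / (1.0 + r)

# 25 years of monthly data with r = 0.5: only 100 of the 300 points
# count as independent, so the standard error of the trend grows by
# roughly sqrt(300/100), about 1.7x.
print(effective_n(300, 0.5))                    # 100.0
print(math.sqrt(300 / effective_n(300, 0.5)))   # ~1.73
```

An error bar that is 1.7 times wider can easily turn a “significant” trend into noise, which is why skipping this adjustment matters.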
The third oddity is one that I really don’t understand. The authors use a standard method (a “Student’s t-test”) to determine the uncertainty in the two trends: the trend of pCO2 in the ocean, and the trend of CO2 in the atmosphere.
Then they use another test to determine if two trends (oceanic and atmospheric) are different. From their paper, here’s their description of the test, which contains the reason for the title of this piece, “Lowering the Bar”.
Figure 2. The description of the significance test used to determine if trends are significantly different or not.
The “p-value” that the authors discuss is a measure of how unusual a result is. For example, if we flip a coin five times and it comes up heads every time, does that mean that the coin is weighted to come up heads? Or is it just a random outcome? The p-value gives us the odds that it was just a random outcome.
In the hard sciences, people like to see a p-value that is less than 0.001 (written as “p<0.001”). This means that there is only one chance in a thousand (1 / 0.001) that it is just a random outcome.
In climate science, the bar is generally lower. A result with a p-value less than 0.05 is regarded as being statistically significant. A p-value of 0.05 means that there is one chance in twenty (1 / 0.05) that whatever you are looking at is just a random fluctuation.
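The coin example above makes these thresholds concrete. Five heads in a row has probability (1/2)^5 = 1/32, about 0.031: significant by the climate-science bar, but nowhere near the hard-science bar:

```python
# P-value for the coin example: the chance that five flips of a
# fair coin all come up heads is (1/2)**5.
p_five_heads = 0.5 ** 5
print(p_five_heads)          # 0.03125, i.e. one chance in 32

print(p_five_heads < 0.05)   # True:  passes the climate-science bar
print(p_five_heads < 0.001)  # False: fails the hard-science bar
```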
(As a brief aside regarding the use of p=0.05 as significant, consider that a scientist may look at a variety of datasets trying to find the “fingerprint” of a hypothesized mechanism such as anthropogenic global warming. Suppose on the sixth dataset he examines, he finds an effect which is significant at p=0.05. What are the odds that this is a chance occurrence? The odds are not one in twenty, because he’s looked at several datasets, so his odds of hitting a random jackpot have increased. In this case, if he finds it on the sixth try, the odds are already about one in four that it’s just random chance, not a real phenomenon. End of digression.)
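The arithmetic behind that aside is straightforward: if each independent look carries a 5% false-alarm rate, the chance of at least one spurious “significant” hit grows quickly with the number of looks:

```python
# Chance of at least one spurious "significant" result (at p = 0.05)
# after examining k independent datasets: 1 - 0.95**k.
def false_alarm_prob(k, alpha=0.05):
    return 1.0 - (1.0 - alpha) ** k

for k in (1, 6, 14):
    print(k, round(false_alarm_prob(k), 3))
# After six looks the false-alarm chance is about 0.265 -- the
# one-in-four figure. By fourteen looks it is better than even.
```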
Now, if I understand what McKinley et al. are saying above (which I may not, all corrections welcome), they are saying that in their study a p-value less than 0.317 is considered statistically significant. But at that level of p-value, the odds of what is observed being merely a random phenomenon, something occurring by pure chance, are about one in three. One in three? … what am I missing here? Is that really what they are claiming? I’ve read the paragraph backwards and forwards, and that’s how I understand it. And if that’s the case, they’ve lowered the bar all the way to the ground.
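For what it’s worth, 0.317 is not an arbitrary number. It is the two-sided tail probability of a normal distribution at one standard error, i.e. the familiar 68% (±1σ) confidence level. My guess, and it is only my reading of their paragraph, is that they are calling two trends “different” whenever they differ by more than one standard error. The correspondence is easy to verify:

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Two-sided p-value for a z-score of 1 (one standard error):
p_one_sigma = 2.0 * (1.0 - normal_cdf(1.0))
print(round(p_one_sigma, 3))  # 0.317

# For comparison, the usual p < 0.05 bar corresponds to |z| > 1.96,
# about two standard errors; p < 0.317 needs only one.
```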