pH Sampling Density

Guest Post by Willis Eschenbach

A recent post by Anthony Watts highlighted a curious fact: records of some two and a half million oceanic pH samples existed, but weren’t used in testimony before Congress about ocean pH. The post was accompanied by a graph which purported to show a historical variation in ocean pH.

I was unimpressed by the graph in that post, which seemed simplistic and, well, in a word, wrong. But on the other hand, I certainly found it bizarre and most interesting that someone would throw out that huge amount of scientific data. That was the reason I forwarded it to Anthony, in the hope of unraveling the actual truth of the matter.

So … as is my wont, I’ve now taken a look at the data myself, albeit at the moment a very preliminary look. The data was conveniently provided by a WUWT commenter in .csv format here, my compliments to him for the collation. He also has a good explanation of the process, along with R code. Note that there has been no quality control on the data. About 2% of the surface pH values are well outside the range of oceanic pH, and I removed them before looking further at the data.
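For readers who want to repeat the range filter mentioned above, here is a minimal sketch in Python (the commenter's own code is in R). The column name `pH`, the bounds of 7 to 9, and the sample rows are all assumptions for illustration; the post does not state which cutoffs were actually used.

```python
import csv
import io

# Hypothetical bounds: the post says only that values "well outside the
# range of oceanic pH" were removed, not which cutoffs were applied.
PH_MIN, PH_MAX = 7.0, 9.0

def filter_ph(rows, lo=PH_MIN, hi=PH_MAX):
    """Keep rows whose pH parses as a number and lies within [lo, hi]."""
    kept = []
    for row in rows:
        try:
            ph = float(row["pH"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric pH value
        if lo <= ph <= hi:
            kept.append(row)
    return kept

# Made-up rows standing in for the real .csv file.
SAMPLE = """lat,lon,depth,pH
35.0,140.0,0,8.12
35.5,141.0,0,0.81
60.0,20.0,0,8.05
10.0,-30.0,0,81.2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
clean = filter_ph(rows)
print(f"kept {len(clean)} of {len(rows)} rows")
```

A hard cutoff like this is crude quality control. It catches misplaced decimal points but says nothing about values that are plausible yet wrong.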

Now, the first question I asked was, where were the samples taken? The problem with the graph in the recent post linked to above is that it lumps together samples taken in various parts of the planet. And unless the sampling is uniform in time and space, this is a Very Bad Idea™.

So I made a map that shows where each surface sample was taken. For simplicity, and because this was my first cut, I restricted myself to those samples with a depth of 0 (right at the surface), which are a bit less than a tenth of the total samples. Here are two different views of the same location data.

[Images: Sampling Density Map Surface pH; Atlantic Sampling Density Map Surface pH]

Figures 1a and 1b. Two views of the location of the surface samples of the global pH dataset, centered on the Pacific and the Atlantic. In some regions you can see the tracks of the oceanographic expedition vessels quite clearly.

Now, I must confess that this was a surprise to me. I hadn’t expected the concentration of samples around Japan; it appears the Japanese oceanographers must have been quite busy. And I also hadn’t expected the high sample density in the Baltic Sea and the other enclosed seas (the Black Sea between Turkey and Russia, and the Caspian Sea to its right).

Finally, here are the average pH values by gridcell, for the entire period of record:

Figure 2. Average values of pH by gridcell in the record.

Now, you can see from these maps that we cannot simply put all of that data into a single box and extract a timeline from it.

So … was there “pHraud” in not utilizing this data? I say no, there was no fraud. I say this in part because it’s so difficult to infer intent. Because I have been falsely accused of having bad intent a number of times, I’m sensitive on the subject. I dislike accusations without evidence, and I see no evidence of fraud in this case.

However, it is a huge scientific resource, two million plus pH samples taken by oceanographers over decades, and not using it without some solid scientific reason for ignoring it just doesn’t work for me. What I suspect has happened is that the mass and complexity of the data was too overwhelming, and so the investigators simply put it into the “Too Hard” pile. But that’s just speculation, the real reason may be entirely different. Regardless of the reason, I do think that the authors should have explained their omission.

In any case, that’s the story so far. It certainly appears to me that there is plenty of data there for meaningful time series extractions in some areas. There are, for example, about 400 1°x1° gridcells that have more than a hundred observations per gridcell, and groups of nearby gridcells combined have much more data. The North Atlantic and the oceanic area off of Japan seem like they would have more than adequate data for time series extraction.
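For anyone taking up that invitation, the gridcell bookkeeping described above can be sketched roughly as follows. This is an illustration in Python, not the code used for the maps; the function names, the tuple layout `(lat, lon, pH)`, and the sample points are assumptions.

```python
import math
from collections import defaultdict

def gridcell(lat, lon, size=1.0):
    """Return the south-west corner (lat, lon) of the cell containing the point."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)

def cell_stats(samples, size=1.0):
    """Map each gridcell to (count, mean pH) for samples given as (lat, lon, pH)."""
    sums = defaultdict(lambda: [0, 0.0])
    for lat, lon, ph in samples:
        acc = sums[gridcell(lat, lon, size)]
        acc[0] += 1
        acc[1] += ph
    return {cell: (n, total / n) for cell, (n, total) in sums.items()}

def well_sampled(stats, min_count=100):
    """List the cells holding more than min_count observations."""
    return [cell for cell, (n, _) in stats.items() if n > min_count]

# Made-up surface samples: two off Japan in the same cell, one in the Atlantic.
samples = [(35.2, 140.7, 8.10), (35.9, 140.1, 8.20), (-0.5, -30.5, 8.00)]
stats = cell_stats(samples)
for cell, (n, mean_ph) in sorted(stats.items()):
    print(cell, n, round(mean_ph, 3))
print(well_sampled(stats, min_count=1))
```

Note that this treats every 1°x1° cell as equal, although cells shrink in area toward the poles, and it does not handle longitudes that wrap at ±180°.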

I may or may not do any followup on this dataset, but I invite readers to use the data for their own analyses.

Regards to all,

w.

ADDENDUM: As usual, I request that if you disagree with someone, please have the courtesy to QUOTE THEIR EXACT WORDS THAT YOU DISAGREE WITH, so that we can all understand the exact nature of your objections.

319 Comments
January 1, 2015 2:17 pm

Comparing the following Attribution focused Logic Constructs:
Logic Construct ‘A’
– Premise: decreasing the temperature of seawater causes pH of seawater to decrease
– Argued observation: the pH of seawater is decreasing
– Is the temperature of seawater decreasing?
– Arguably we do not know the temperature of seawater sufficiently to answer.
&
Logic Construct ‘B’
– Premise: increasing dissolved CO2 in seawater causes seawater pH to decrease
– Argued observation: the pH of seawater is decreasing
– Is the dissolved CO2 in seawater increasing?
– Arguably we do not know the dissolved CO2 of seawater sufficiently to answer.
John

Reply to  John Whitman
January 2, 2015 4:07 pm

About logic construct B:
There are far more abundant and more accurate measurements for DIC (total dissolved inorganic carbon) than for pH.
All series taken at the same place over time show an increase of DIC in ratio with the increase of CO2 in the atmosphere.
A gridded overview would show that over all oceans…

Reply to  John Whitman
January 3, 2015 9:32 am

Ferdinand Engelbeen on January 2, 2015 at 4:07 pm
“About logic construct B:
There are far more abundant and more accurate measurements for DIC (total dissolved inorganic carbon) than for pH. [. . .]”

Ferdinand Engelbeen,
Comparing the two Attribution focused Logic Constructs ‘A’ and ‘B’.
How can there be more seawater dissolved CO2 measurements than seawater temperature measurements and therefore how can there be sufficient dissolved CO2 measurements if there are insufficient temperature measurements? Logically necessary and sufficient answers accepted.
John

Greg Cavanagh
January 1, 2015 4:11 pm

I’m curious what could cause the nodes of blue in the ocean surrounded by a sea of red. Volcanic, or sea vent?
Also I notice the coast lines are more green and blue than the open oceans, which seems to contradict tty who says “pH over 8.25 is not unusual in areas with high biologic activity where algae “eat” the CO2. Over coral reefs pH can touch 9.0 in late afternoon.” The graph seems to show the opposite. The pH is lower along the coast lines.

Mark
January 1, 2015 5:19 pm

Accuracy of sea water measurements with glass pH electrodes:
http://cdiac.ornl.gov/ftp/cdiac74/sop06.pdf
SOP 6: Determination of the pH of seawater using a glass/reference electrode cell.
9.2.2 Precision (with care) 0.003 pH units (1 SD).
9.2.3 Bias <0.004 pH units.
REPRODUCIBILITY OF pH MEASUREMENTS IN SEAWATER’ (www.aslo.org/lo/toc/vol_11/issue_3/0417.pdf)
"The data obtained by current oceanographic procedures, in which dilute buffers are used, will be shown to be comparable within 0.006 pH units".
So the methods are fine if applied correctly…
HTH

ferdberple
January 1, 2015 5:23 pm

Is there a source where I can download an unadjusted global temperature dataset? Actual temperatures, not averages or anomalies, in an easy to use format. Preferably without any attempt to correct for sample density, time of day, or anything. land and ocean, something like:
location, date, time, temp.
I’d like to try and replicate what I did with ocean pH data, instead using temp data. I’m encouraged that the gridded result was virtually identical to simply treating the raw data as a random sample. thanks in advance.

ferdberple
January 1, 2015 11:03 pm

The gridded result:
Berényi Péter December 31, 2014 at 8:58 am
A preliminary look at the data shows beyond doubt, that ocean pH is decreasing indeed, at a rate of -0.002±0.038/decade. In other words, it is absolutely stable.

delivers virtually the exact same answer as simply treating the data as a random sample, without any gridding, averaging, anomalies, adjustments, etc.
ferdberple December 31, 2014 at 8:29 am

http://oi60.tinypic.com/9s7xvo.jpg

Cliff Mass
January 2, 2015 6:21 am

These scatter diagrams are worrisome. The variance changes substantially in time, suggesting a sampling problem. Why does the recent trend here disagree with the Feely et al results and the Japan results shown above? If the data set was broken up into separate regions (say eastern N. Pacific, western N. Pacific, Atlantic, Indian Ocean, South Pacific, South Atlantic), what would the trends and variability look like?
…cliff mass, UW

ferdberple
January 2, 2015 7:40 am

The variance changes substantially in time
============
Not correct. For almost every year the variance is well under 0.1 pH, which suggests the samples in the scatter diagram are consistent. Keep in mind these samples were taken from all the oceans over many years, while it would appear that Feely et al and the Japan samples were not.
http://oi59.tinypic.com/95u0ip.jpg

Cliff Mass
January 2, 2015 8:23 am

Ferd
I may not have been clear. Look at the variability of the pH measurements in the figure (pH versus time for all observations). It varies hugely: in some periods it ranges from 6 to 10, while in other periods (like the recent one) one quarter of that. Isn’t that a problem?..cliff

Lance Wallace
Reply to  Cliff Mass
January 2, 2015 8:50 am

Cliff
The variance is a function of the number of measurements. The graph shows both the number of measurements and the standard deviation of the measurements by year. Since the number of measurements sharply declined around 1990, the variance increased.

ferdberple
Reply to  Cliff Mass
January 2, 2015 9:56 am

some periods it ranges from 6 to 10, while in other periods (like the recent one) one quarter of that
===============
You would expect the visual spread to increase as the number of samples increases. For example, 1000 people will be expected to have a wider range of opinions than would 10 people, with the variance unchanged. In this case mostly < 0.1 pH.
The spread is not an issue. pH changes by season, time of day and location. Just like surface temperature. One could go through a complicated statistical process, and try and adjust for these variables, such as done for temperature. For example, by calculating averages, anomalies and grid the results by location. This will make the data all appear to be tightly grouped and very accurate.
But in reality, with enough random samples, the brute force approach should give approximately the same result as the more complicated and detailed approach. This was confirmed by the gridded analysis; see Berényi Péter December 31, 2014 at 8:58 am.
There is some small trend in the variance, maybe modern scientists are not quite as careful as in the past.
http://oi57.tinypic.com/211mji1.jpg
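The point about visual spread growing with sample count while the variance stays put can be checked with a small simulation. This is only a sketch: the normal distribution, the mean of 8.1, and the standard deviation of 0.3 are assumptions chosen for illustration, not properties of the pH dataset.

```python
import random
import statistics

random.seed(0)

# Hypothetical population of pH-like values (mean 8.1, SD 0.3 are assumed).
big = [random.gauss(8.1, 0.3) for _ in range(10_000)]
small = big[:10]  # a subset, so its range can never exceed the full set's

def value_range(xs):
    """Difference between the largest and smallest value."""
    return max(xs) - min(xs)

for label, xs in (("small", small), ("big", big)):
    print(f"{label:>5}  n={len(xs):>6}  "
          f"range={value_range(xs):.2f}  sd={statistics.stdev(xs):.2f}")
```

The range widens as more draws are added because extreme values become more likely to appear, while the standard deviation settles near the value used to generate the data. That holds only if the samples really are draws from one unchanging distribution, which is the assumption under dispute in this thread.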

Reply to  Cliff Mass
January 2, 2015 4:28 pm

The above trend is not a sample of the real world, as the year by year sampling differs in track, ocean and direction (east-west, north-south) and season.
Compare the cruises for each year:
http://www.abeqas.com/ph-1990s/ and
http://www.abeqas.com/ph-2000s/
Berényi Péter has looked at 5×5 grid boxes over the past decades and found:
If only grid boxes with at least 20 years of data are considered (204 items), ocean pH trend is -0.002±0.031/decade, therefore the null result is reasonably robust.
I have given a try to grid box / month combinations with at least 30 years of data (81 items). The result is +0.003±0.026/decade, therefore the null result is incontrovertible.
The trend measured by Sabine and others over the past 30 years was 0.013±0.007/decade, which is within the error margin of all (GE?) pH measurements, even though it contradicts the “null result” from Berényi Péter (the pH change caused by CO2 is very small).
That means that the glass electrode pH measurements simply can’t measure the faint change in pH over the past decades (which is what Dr. Sabine said), let alone the even smaller change per decade over the period 1850-1984.
What I wonder is when the glass electrode measurements were abandoned (for the Hawaii station that was 1992) and replaced by colorimetric or calculated values, also in the ship’s surveys.
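One plausible way to produce per-box trends like those quoted above is sketched below. It is an assumption, not Berényi Péter's actual method: it groups samples into 5×5 degree boxes, averages pH by year, and fits a straight line to each box with enough distinct years. The function names, the tuple layout `(lat, lon, year, pH)`, and the sample values are made up.

```python
import math
import statistics
from collections import defaultdict

def box_of(lat, lon, size=5.0):
    """Integer index of the size-degree box containing the point."""
    return (math.floor(lat / size), math.floor(lon / size))

def slope(xs, ys):
    """Ordinary least squares slope of ys against xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def box_trends(samples, min_years=20, size=5.0):
    """Per-decade pH slope for each box with at least min_years distinct years."""
    by_box = defaultdict(lambda: defaultdict(list))
    for lat, lon, year, ph in samples:
        by_box[box_of(lat, lon, size)][year].append(ph)
    trends = {}
    for box, years in by_box.items():
        if len(years) < min_years:
            continue  # too few years of data in this box
        xs = sorted(years)
        ys = [statistics.fmean(years[y]) for y in xs]
        trends[box] = 10.0 * slope(xs, ys)  # per year -> per decade
    return trends

# Made-up data: one box with three years of samples, one box with a single year.
samples = [
    (2.0, 2.0, 2000, 8.100),
    (2.5, 3.0, 2001, 8.099),
    (4.0, 1.0, 2002, 8.098),
    (12.0, 2.0, 2000, 8.200),
]
trends = box_trends(samples, min_years=3)
print(trends)
```

Averaging by year before fitting keeps a heavily sampled year from dominating a box, but it does nothing about seasonal or cruise-track differences within the box.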

Lance Wallace
January 2, 2015 9:15 am

Also, Cliff and Ferd B. and Ferd E.
Something else that changed around 1990 was the sampling depth: many more measurements were taken at depths of 1000 m and more.
Since the Feely data (both measured pH and calculated from dissolved inorganic carbon and alkalinity) covers only the time from about 1988-2007, you can see that coincides with the sharp drop in pH shown by the sampling data from the dataset used by Willis and Ferd B. (and myself). My data above is taken from the dataset with pH values bounded between 7 and 9 (N = 2.437 million). I believe a contributor to the decline in pH beginning around 1988 was the increasing fraction of samples taken at greater depths.
Dore et al shows a rather sharp drop in pH, from about 8.0 at the surface to 7.6 at 1000 m depth, then recovering partially to 7.8 at deeper levels.
https://dl.dropboxusercontent.com/u/75831381/pH%20PNAS-2009-Dore-12235-40.pdf
If we limit our dataset to the period 1988-2007, we get something like the Dore results, only not so extreme.
Using all the data, a similar result can be seen.

ferdberple
Reply to  Lance Wallace
January 2, 2015 9:58 am

I only used depth = 0 for my graphs, which show no trend.

Lance Wallace
Reply to  ferdberple
January 2, 2015 10:19 am

If I restrict the data to depth =0, I get N = 197762 with a slope of 0.0009 (SE=0.0003), z = 31 which, although significant, is close enough to your conclusion of “no trend” for me not to disagree.
However, to compare to Feely’s choice of the 1988-2007 time period, the result is N=27325, slope of -0.01025 (SE 0.0003), t-value = -34 (highly significant). And this slope is again much steeper than that provided by Feely.

ferdberple
Reply to  ferdberple
January 2, 2015 11:02 am

Feely’s choice of the 1988-2007 time period
=========
when I restrict the data to 1998-2007 I get a very slight negative trend for depth = 0. I would not consider this significant because the brute force approach is inherently crude. The full set of years showed a slight positive trend which I also did not consider significant.
http://oi59.tinypic.com/o8wk9g.jpg
I didn’t include data for depth > 0 because of the improvements in underwater technology. While surface transport has been largely unrestricted for the past 100+ years, the same cannot be said for underwater work. I would have needed to add a depth adjustment, which I didn’t want to do, because I might then end up measuring the adjustment, not the true trend.

ferdberple
Reply to  ferdberple
January 2, 2015 11:04 am

when I restrict the data to 1998-2007
=====
typo 1988-2007

ferdberple
Reply to  ferdberple
January 2, 2015 11:15 am

And if we extend the data to include the most recent results, the trend all but disappears:
http://oi57.tinypic.com/doxxf7.jpg
And if we remove the 2011 and 2005 outliers, there is no trend:
http://oi62.tinypic.com/xo3x1w.jpg

Lance Wallace
January 2, 2015 10:06 am

Fitting a time series using ordinary least squares (OLS) linear regression is frowned upon by many statisticians, since OLS is known to be biased toward low slopes and high intercepts, and the slope can be strongly affected by outliers. However, one can at least compare the results from this database with those of Feely. It is at once evident that both this database (censored to restrict pH to values between 7 and 9) and the different one used by Feely show a decline in pH between 1988 and 2007. Here are the results:
N = 585083
slope = -0.0113 (SE=0.000047)
z = -242
This slope is much STEEPER than that shown by Feely (about -0.002).
For the full set, the values were
N = 2,437 X 10^3
slope = -0.00178 (SE=0.000010)
z=-185
It seems to me that these data should NOT (as is apparently being advocated by Ferdinand Engelbeen) be ignored. But like the temperature data, the pH data need to be carefully studied and efforts made to apply QA/QC considerations.
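For reference, here is a sketch of how a slope, its standard error, and their ratio are conventionally computed for a simple linear regression. Whether the figures above were produced exactly this way is not stated; the function name and the sample values are made up for illustration.

```python
import math

def ols_slope(x, y):
    """Return (slope, intercept, standard error of slope, slope / SE).

    Assumes independent errors with constant variance and at least three
    points; a perfect fit (SE of zero) raises ZeroDivisionError.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, se, slope / se

# Made-up yearly values, not taken from the dataset.
years = [1988, 1989, 1990, 1991, 1992]
ph = [8.12, 8.10, 8.11, 8.08, 8.07]
slope, intercept, se, ratio = ols_slope(years, ph)
print(f"slope={slope:.4f}/yr  SE={se:.4f}  slope/SE={ratio:.1f}")
```

With hundreds of thousands of samples the standard error becomes tiny and the ratio enormous, but that standard error assumes the samples are independent. Samples clustered along the same cruise track or taken in the same season may well not be, in which case the quoted significance would be overstated.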

ferdberple
Reply to  Lance Wallace
January 2, 2015 10:31 am

If climate scientists are advocating that we ignore this data because it is not the most accurate available (while ignoring that it provides the best coverage for time and location), then by the very same argument we should ignore the thermometer data from weather stations and only use the much more accurate satellite data.
but instead what we are hearing is that we should only use the data-sets that co-incidentally show the results that match theory. because the other data-sets must be inaccurate, because they don’t match theory.
that isn’t how science works. you need to include all the data-sets. the uncertainty in your result is determined by the spread between the data-sets.

Reply to  ferdberple
January 2, 2015 4:33 pm

Sabine only said to ignore the data before 1984 as too unreliable.

ferdberple
January 2, 2015 10:13 am

Using all the data, a similar result can be seen.
==============
your result, showing pH of about 8.1 on the surface matches my results.

Cliff Mass
January 2, 2015 11:11 am

It appears that there was a jump in pH during the period when many more observations were available (roughly 1970-1990). Thus, I wonder whether the drop off in pH might be in part a sampling issue. Did the geographical areas of sampling change over time? Was there a change in the geographical distribution as N dropped substantially? And why is mean pH dropping in the 8:50 AM comment but not in the scatter diagrams?

ferdberple
January 2, 2015 11:20 am

And why is mean pH dropping in the 8:50 AM comment but not in the scatter diagrams?
=============
the scatter diagrams are depth = 0 only. The pH drop in the 8:50 comment appears to result from an increase in samples with depth > 0, starting in 1980 and accelerating in 1990.

ferdberple
January 2, 2015 11:23 am

see:

Lance Wallace
January 2, 2015 at 9:15 am
I believe a contributor to the decline in pH beginning around 1988 was the increasing fraction of samples taken at greater depths.

Lance Wallace
Reply to  ferdberple
January 2, 2015 2:36 pm

Well, actually at 10:19 AM I kind of disproved myself, because I regressed the 1988-2007 data on surface measurements only (N=27,000) and got almost exactly the same slope (-0.010 vs. -0.011) as had been found at 10:06 using all the data (N=585,000).

January 4, 2015 2:34 pm

Fascinating discussion. For my part, it is interesting to look at the apparent oscillatory behavior within the limited time span. Once one approaches this from a viewpoint of possible oscillations, then assertions regarding linear trends must share the stage with that. Moreover, by applying a variation of the 10 year moving average, I think I had begun to account for concerns regarding localized and moving data stations, given ocean circulation rates.
In addition, given the reality of ocean circulation and mixing, it is interesting to look at the 1910-1920 data http://www.abeqas.com/global-ocean-ph-measured-1910-1920/
and see how it roughly captures the profiles of later decades (see same site, other decades featured), even though the sampling locations in 1910 – 1920 set were severely limited in geographic extent compared to later decades.
For what it is worth, the relation of the data time series construction for a 10 year trailing average (yta) to the 10 yta for the Pacific Decadal Oscillation (PDO) is of great interest to me.
http://www.abeqas.com/global-ocean-ph-pdo/
In fact it was an inspiration for much of the work I have done this year regarding successful use of the PDO as a hydrologic forecasting tool in the US Southwest. Had I not observed this interesting similarity between the PDO and ocean pH curves (10 yta version), I might not have decided to attempt to produce a hydrologic forecast for my subject area. Now I have a successful forecast under my belt (is not that a possible signature of real scientific progress?) and I now collaborate with Dr. Petr Chylek of LANL on further integration of ocean oscillation based (and other drivers) hydrologic forecasting.
I note finally, that the omission of this data was the primary news feature. I don’t understand how anyone can conclude that the omission actions are remotely defendable. As many have stated, this is wrong, regardless of any opinion one develops concerning this data after the fact.