Argo Notes the Third

I got into this investigation of Argo because I disbelieved their claimed error of 0.002°C for the annual average temperature of the top mile of the ocean. I discussed this in “Decimals of Precision”, where I showed that the error estimates were overly optimistic.

I wanted to know more about what the structure of the data looked like, which led to my posts Jason and the Argo Notes, Argo Notes Part Two, and Argo and the Ocean Temperature Maximum.

This is the next part of my slow wander through the Argo data. In “How well can we derive Global Ocean Indicators from Argo data?” (PDF, hereinafter SLT2011), K. von Schuckmann and P.-Y. Le Traon describe their method for analyzing the Argo data:

To evaluate GOIs [global oceanic indicators] from the irregularly distributed global Argo data, temperature and salinity profiles during the years 2005 to 2010 are uploaded spanning 10 to 1500m depth.

and

To estimate the GOIs from the irregularly distributed profiles, the global ocean is first divided into boxes of 5° latitude, 10° longitude and 3 month size. This provides a sufficient number of observations per box.

So I thought I’d take a look at some gridcell boxes. I’ve picked one which is typical, and shows some of the issues involved in determining a trend. Figure 1 shows the location of the temperature profiles for that gridbox, as well as showing the temperatures by latitude and day. The data in all cases is for the first three months of the year. The top row of Figure 1 shows all of the temperatures for those three months (Jan-Feb-Mar) from all the years 2005-2011. The bottom row shows just the 2005 measurements. The following figures will show the other years.

Figure 1. Gridcell is in the Atlantic, from 25°-30°N, and 30°-40°W. Left column shows the physical location of the samples within the gridbox. Colors in the left column are randomly assigned to different floats, one color per float. Right column shows the temperature by latitude. Small numbers above each sample show the day of the year that the sample was taken. Colors in the right column show the day the sample was taken, with red being day one of the year, shading through orange, yellow and green to end at blue at day 91. Top row shows all years. Bottom row shows 2005. Text in the right column gives the mean (average) of the temperature measurements, the standard deviation (StdDev), and the 95% confidence interval (95%CI) of the mean of the temperature data. The 95% CI is calculated as the standard error of the mean times 1.96.
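For anyone who wants to reproduce the numbers in the right-hand panels, here is a minimal sketch of the calculation described in the caption. The temperature values are made up for illustration and are not Argo data, and the use of the sample (n - 1) standard deviation is an assumption, since the caption does not say which form was used.

```python
import math
import statistics

def mean_sd_ci95(temps):
    """Return (mean, sample standard deviation, 95% CI half-width)."""
    n = len(temps)
    mean = statistics.mean(temps)
    sd = statistics.stdev(temps)   # sample SD (n - 1 denominator); an assumption
    sem = sd / math.sqrt(n)        # standard error of the mean
    return mean, sd, 1.96 * sem    # 95% CI half-width = 1.96 x SEM, as in the caption

# Hypothetical temperatures (deg C) for nine profiles; NOT actual Argo data.
temps = [21.1, 21.4, 21.6, 21.3, 21.5, 21.2, 21.7, 21.4, 21.4]
mean, sd, ci95 = mean_sd_ci95(temps)
print(f"mean = {mean:.2f} C, StdDev = {sd:.2f} C, 95%CI = +/- {ci95:.2f} C")
```

Note that the 1.96 multiplier assumes independent samples and a large sample size. With only nine samples a Student's t multiplier (about 2.31 for eight degrees of freedom) would give a somewhat wider interval.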

Let’s consider the top row first. In the left column, we see the physical location of all samples that Argo floats took from 2005-2011. We have pretty good coverage of the area of the gridbox over that time. Note that the gridboxes are huge, half a million square kilometres for this particular one. So even with the 216 samples taken over the six-year period, that’s still only one sample per 2,500 square km.
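The size of the gridbox is easy to check. The sketch below assumes a spherical Earth with a mean radius of 6371 km (my assumption, not a figure from SLT2011) and uses the standard formula for the area of a latitude-longitude box on a sphere.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth approximation (assumption)

def latlon_box_area_km2(lat_south, lat_north, lon_width_deg):
    """Area of a lat-lon box on a sphere: R^2 * dLon * (sin(latN) - sin(latS))."""
    d_lon = math.radians(lon_width_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * d_lon * band

area = latlon_box_area_km2(25.0, 30.0, 10.0)  # the 25-30N, 10-degree-wide gridbox
per_sample = area / 216                        # 216 samples, as stated in the text
print(f"gridbox area ~ {area:,.0f} sq km, ~ {per_sample:,.0f} sq km per sample")
```

This comes out at roughly 550,000 square km for the box and a bit over 2,500 square km per sample, in line with the round numbers above.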

Next, let’s consider the top right image. This shows how the temperatures vary by time and by latitude. As you would expect, the further north you go, the colder the ocean, with a swing of about three degrees from north to south.

In addition, you can see that the ocean is cooling from day 1 (start of January) to day 91 (end of March). The early records (red and orange) are on the right (warmer) side of the graph. The later records (green and blue) are concentrated in the left-hand (cooler) side of the graph.

This leads to a curious oddity. The spread (standard deviation) of the temperature records from any given float depends on the direction that the float is moving. If the float is moving south, it is moving into warmer waters, but the water generally is cooling, so the spread of temperatures is reduced. If the float is moving north, on the other hand, it is moving into cooler waters, and in addition the water is generally cooling, so the spread is increased. It is unclear what effect this will have on the results … but it won’t make them more accurate. You’d think that the directions of the floats might average out, but no such luck: south is more common than north in these months for this gridcell.

A second problem affecting the accuracy can be seen in the lower left graph of Figure 1. It seems that we have nine measurements … but they’re all located within one tiny part of the entire gridbox. This may or may not make a difference, depending on exactly where the measurements are located, and which direction the float is moving. We can see this in the upper row of Figure 2.

Figure 2. As in Figure 1, with the top row showing 2006, and the bottom row 2007.

The effects I described above can be seen in the upper row, where the floats are in the northern half of the gridbox and moving generally southwards. There is a second effect visible, which is that one of the two floats (light blue circles) was only within the gridbox in the late (cooler) part of the period, with the first record being on day 62. As a result, the standard deviation of the measurements is small, and the temperature is anomalously low … which gives us a mean temperature of 20.8°C with a confidence interval of ± 0.36°C. In fact, the 95% confidence interval of the 2006 data does not overlap with the confidence interval of the mean of the entire 2005-2011 period (21.7° ± 0.12°C) … not a good sign at all.

The 2007 data offers another problem … there weren’t any Argo floats at all in the gridcell for the entire three months. The authors say that in that case, they replace the year’s data with the “climatology”, which means the long-term average for the time period … but there’s a problem with that. The climatology covers the whole period, but there are more gaps in the first half of the record than in the latter half. As a result, if there is a trend in the data, this procedure is guaranteed to reduce that trend, by some unknown amount.
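To illustrate the effect of that kind of infilling, here is a toy example. It is only a sketch under stated assumptions: a noise-free series with a made-up trend of 0.1°C/year, hypothetical gaps in 2005 and 2007, and a "climatology" taken as the plain average of the years that have data. It shows one simple case and is not the SLT2011 procedure itself.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

years = list(range(2005, 2012))
true_trend = 0.1                       # deg C per year; hypothetical
truth = [21.0 + true_trend * (y - 2005) for y in years]

missing = {2005, 2007}                 # hypothetical gaps, both in the early half
observed = [t for y, t in zip(years, truth) if y not in missing]
climatology = sum(observed) / len(observed)   # average of the years with data

# Replace each missing year with the climatology, then refit the trend.
infilled = [climatology if y in missing else t for y, t in zip(years, truth)]
slope_infilled = ols_slope(years, infilled)

print(f"true trend     = {ols_slope(years, truth):.3f} C/yr")
print(f"infilled trend = {slope_infilled:.3f} C/yr")
```

In this toy case the infilled values sit above the true early-year values, so the fitted trend comes out at roughly half of the true one.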

Figure 3 shows the next two years, 2008-2009.

Figure 3. As in Figure 1 and 2, for 2008 (top row) and 2009 (bottom row).

2008 averages out very close to the overall average … but that’s just the luck of the draw, as the floats were split between the north and south. 2009 wasn’t so lucky, with most of the records in the south. This leads to a warmer average, as well as a small 95%CI.

Finally, Figure 4 shows 2010 and 2011.

Figure 4. As in Figure 1 and 2, for 2010 (top row) and 2011 (bottom row).

In the final two years of the record, we are finally starting to get a more reasonable number of samples in the gridbox. However, there are still some interesting things going on. Look at the lower right graph. In the lower right of that graph there are two samples (day 71 and 81) from a float which didn’t move at all over that ten days (see bottom left graph, blue circles, with “81” on top of “71”). In that ten days, the temperature in that one single location dropped by almost half a degree …

DISCUSSION.

In this particular gridcell, the averages for each of the years 2005-2011 are 21.4°C, 20.8°C, no data, 21.7°C, 22.0°C, 21.9°C, and 21.7°C. A linear fit to the six years that have data gives a warming trend of 0.13°C/year, as shown in Figure 5.
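The trend can be checked with an ordinary least-squares fit to the yearly averages listed above. This sketch assumes the fit uses only the six years with data, leaving 2007 out rather than infilling it; that assumption reproduces the quoted figure.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Yearly Jan-Mar gridcell averages from the text; 2007 omitted (no data).
years = [2005, 2006, 2008, 2009, 2010, 2011]
means = [21.4, 20.8, 21.7, 22.0, 21.9, 21.7]

trend = ols_slope(years, means)
print(f"trend = {trend:.2f} C/yr")
```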

Figure 5. Trend of the gridcell three-month temperatures

My question is, how accurate is this trend? Me, I’d say we can’t trust it as far as we can throw it. The problem is that the early years (2005, ’06, and ’07) way undersample the gridcell, but this is hidden because they take a number of samples in one or two small areas. As a result, the confidence intervals are way understated, and the averages do not represent a valid sampling in either time or space.

My conclusion is that we simply do not have the data to say a whole lot about this gridcell. In particular, despite the apparent statistical accuracy of a trend calculated from these numbers, I don’t think we can even say whether the gridcell is warming or cooling.

Finally, the law of large numbers is generally understood to relate to repeated measurements of the same thing. But here, two measurements ten days apart are half a degree different, while two measurements at the same time in different areas of the gridcell are as much as three degrees apart … are we measuring the “same thing” here or not? And if not, if we are measuring different things, what effect does that have on the uncertainty? Finally, all of these error calculations assume what is called “stationarity”, that is to say that the mean of the data doesn’t change over a sufficiently long time period. However, there is no reason to believe this is true. What does this do to the uncertainties?

I don’t have any answers to these questions, and looking at the data seems to only bring up more questions and complications. However, I had said that I doubted we knew the temperature to anything like the precision claimed by the authors. Table 1 of the SLT2011 paper claims a precision for the annual average heat content of the top mile of the ocean of ± 0.21e+8 Joules. Given the volume involved (414e+8 cubic kilometres), this means they are claiming to measure the temperature of the top mile of the ocean to ± 0.002°C, two thousandths of a degree …

As cited above, I showed before that this was unlikely by noting that there are on the order of 3500 Argo floats. The standard error of a mean shrinks with the square root of the number of measurements. So if the SLT2011 numbers are correct and the error from 3500 floats is ± 0.002°C, it means that 35 floats (one hundredth as many) could measure the temperature of the top mile of the ocean to a tenth of that accuracy, or ± two hundredths of a degree. This is highly unlikely; the ocean is way too large to be measured to plus or minus two hundredths of a degree by 35 floats.
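The arithmetic behind that comparison is the usual square-root-of-N scaling of the standard error. A minimal sketch, assuming independent measurements of equal quality (the function name is mine, for illustration only):

```python
import math

def scaled_error(err_ref, n_ref, n_new):
    """Scale a standard error from n_ref samples to n_new, assuming error ~ 1/sqrt(N)."""
    return err_ref * math.sqrt(n_ref / n_new)

# Claimed error of +/- 0.002 C from about 3500 floats, rescaled to 35 floats.
err_35 = scaled_error(0.002, 3500, 35)
print(f"implied error with 35 floats = +/- {err_35:.3f} C")
```

With one hundredth of the floats the error grows by a factor of ten, which is where the two hundredths of a degree comes from.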

Finally, people have the idea that the ocean is well-mixed, and changes slowly and gradually from one temperature to another. Nothing could be further from the truth. The predominant feature of the ocean is eddies. These eddies have a curious property. They can travel, carrying the same water, for hundreds and hundreds of miles. Here’s an article on one eddy that they have studied. Their illustration is shown as Figure 6.

Figure 6. Illustration of an eddy transporting water for a long distance along the south coast of Australia.

Figure 7 shows another example of the eddying, non-uniform nature of the ocean. It is of the ocean off of the upper East Coast of the US, showing the Gulf Stream.

Figure 7. Oceanic temperature variation and eddies. Blue box is 5° latitude by 10° longitude. Temperature scale runs from blue (10°C, 50°F) to red (25°C, 77°F). SOURCE 

The blue rectangle shows the size of the gridcell used in SLT2011. The red circles approximate the distribution within the gridbox of the measurements shown in the bottom row of Figure 1 for 2005. As you can see, this number and distribution of samples is way below the number and breadth of samples required to give us any kind of accuracy. Despite that, the strict statistical standard error of the mean would be very small, since there is little change in temperature in the immediate area. This gives an unwarranted and incorrect appearance of an accuracy of measurement that is simply not attainable by sampling a small area.
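A toy simulation shows how a tight cluster of samples can give a small confidence interval around the wrong answer. Everything in it is hypothetical: a temperature field that falls linearly from 23°C at 25°N to 20°C at 30°N (roughly the three-degree swing described earlier), a small amount of random noise, and nine samples either bunched near 29°N or spread across the box. The "true" mean ignores the small change in area with latitude.

```python
import math
import random
import statistics

def field_temp(lat):
    """Hypothetical field: 23 C at 25N falling linearly to 20 C at 30N."""
    return 23.0 - 0.6 * (lat - 25.0)

def mean_and_ci95(lats, rng, noise_sd=0.1):
    """Sample the field at the given latitudes; return (mean, 95% CI half-width)."""
    temps = [field_temp(lat) + rng.gauss(0.0, noise_sd) for lat in lats]
    sem = statistics.stdev(temps) / math.sqrt(len(temps))
    return statistics.mean(temps), 1.96 * sem

rng = random.Random(0)                 # fixed seed so the run is repeatable
true_mean = field_temp(27.5)           # 21.5 C at the centre of the box

clustered = [29.0 + 0.05 * i for i in range(9)]    # nine samples, 29.0N to 29.4N
spread = [25.25 + 0.5625 * i for i in range(9)]    # nine samples, 25.25N to 29.75N

c_mean, c_ci = mean_and_ci95(clustered, rng)
s_mean, s_ci = mean_and_ci95(spread, rng)

print(f"true mean : {true_mean:.2f} C")
print(f"clustered : {c_mean:.2f} +/- {c_ci:.2f} C")
print(f"spread    : {s_mean:.2f} +/- {s_ci:.2f} C")
```

The clustered samples give the narrower interval, yet that interval misses the true mean by about a degree. The spread samples give a wider interval that contains it.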

Why is this important? It is important because measuring the ocean temperature is part of determining the changes in the climate. My contention is that we still have far too little information to give us enough accuracy to say anything meaningful about “missing heat”.

Anyhow, that’s my latest wander through the Argo data. I find nothing to change my mind regarding what I see as greatly overstated precision for the temperature measurements.

My regards to everyone,

w.

GeoLurking
March 1, 2012 7:58 am

Patrick de Boevere [March 1, 2012 at 7:28 am]
“I have learned at school that when experimenting and measuring one should change as little as possible. Why did they not position the floats at fixed anchored positions?”
Cost. That’s a lot of anchoring systems to install and maintain.

March 1, 2012 8:00 am

“The water nearer the equator is generally warmer at any given instant than the water further away.”
Willis, is this always really the case? The equator does not actually receive the greatest total amount of insolation. This is partly because the rate of change of solar angle is greatest at the equinoxes and least at the solstice (when it passes through zero). Think of the simple harmonic motion of a pendulum. As the sun moves from the zenith, the effect of increasing angle is actually outweighed by the length of the day in summer months, up to a certain latitude nearer the tropics.
Offhand I can’t recall the exact latitude of maximum insolation, but I remember a text book indicating that it was a long way from the equator. Also, within the tropics “high summer” is not the same time as locations north or south of the tropics.
I wouldn’t be surprised to see this leaving a temperature signal in surface water.

Judge
March 1, 2012 8:44 am

Robert Brown
The “box” data is the same as in figure 1. It is from one gridcell and only from January 1st to March 31st. This explains why the temperatures “jump” on Jan 1 – they do compared with the previous year’s March 31st. Polar regions are not included as it is only one gridcell.

March 1, 2012 8:58 am

Statistical analysis requires repeated measurements of the same thing under the same conditions.
Statistically speaking, I see a few problems analyzing this data.

Judge
March 1, 2012 9:09 am

Michael Hart
There is a difference between maximum daily insolation and annual mean insolation. Although the maximums will be north and south of the equator, depending on the season, the highest annual mean insolation is at the equator.
Google “insolation as a function of latitude” for better explanations.

Septic Matthew/Matthew R Marler
March 1, 2012 9:19 am

Willis: That’s the lag in the system, which is large in ocean temps because of the heat capacity of the water.
In principle I knew that, but having lived on land all my life (or perhaps merely being ignorant), I thought that the lag was only a month.
I de-anonymized on the other climate blogs I read, so it’s time I do so here as well. I strongly support anonymous commentary in public debates, the heritage of The Federalist and other great examples, but it’s not for me anymore.

Septic Matthew/Matthew R Marler
March 1, 2012 9:36 am

Willis: Thought y’all might enjoy this …
Thanks. I almost recommended more 3D plots, especially for the upper panels of figure 1. Also since you use color to identify floats in figure 1 Upper Left, and color to represent time in figure 1 Upper Right, I thought that you might assign a different plot symbol to each float within each grid cell. But as I wrote, it’s easy to recommend work to others, and I most like to wait and see what you’ll come up with next.
My public LinkedIn profile is here: http://www.linkedin.com/pub/matthew-marler/15/21b/9a9

Septic Matthew/Matthew R Marler
March 1, 2012 10:05 am

Willis: The colors are by float, and correspond to the floats colors in the top left graph in Figure 1.
Very few floats are within the grid cell more than 2 consecutive years. Unless you have reliable knowledge that the Argo floats are exquisitely precisely made, the fact that different floats float into and out of the grid cell confounds year with float. I’d recommend adding the orthogonal quadratic and cubic polynomials of year to the model (just on general principles, because a great many things are adequately modeled over short intervals of time by cubic polynomials). The residuals are surprisingly (to me) large.
It looks from the 3D graph as though day of the year, latitude and year are all confounded. Do you want to play with alternative models? Like, wouldn’t it make more sense to have sin(day) instead of day, and cos(latitude) instead of latitude? Naively, I’d expect changing day to sin(day) would matter more than changing latitude to cos(latitude) because of the fractions of the total ranges involved.
I see these data as being a useful addition for courses on applied data analysis. That’s just one grid cell.
As always, thank you for your work.

Septic Matthew/Matthew R Marler
March 1, 2012 10:12 am

Willis: I’m speaking solely about one gridcell (25°-30°N and 20°-30°W)
The plots have -40°– -30°W, and seemingly reversed. Another “Pfui!”, I wonder?

E.M.Smith
Editor
March 1, 2012 2:08 pm

We have to teach them about eddies? Oh My God…
I guess I spent too much time on the water. I thought everyone knew about eddies and their big brother, gyres. Next thing you know, you’ll be telling me that they don’t sample enough to allow for major ocean hot and cold currents and upwellings… Oh, wait, I think you just did…
“My conclusion is that we simply do not have the data to say a whole lot about this gridcell. In particular, despite the apparent statistical accuracy of a trend calculated from from these numbers, I don’t think we can even say whether the gridcell is warming or cooling.”
SOP for the Warmistas. Take a look at GIStemp where they had 1200 thermometers in GHCN and 8000 grid cells. (From 2007 until 2011 USHCN was not used for anything after 2007, after 2011 they added it back in, but that’s only the 2% of the surface (or gridcells) that are the USA, leaving the rest of the world covered by those original thermometers.)
Then again, they’ve now bumped the gridcell count up to 16,000.
Now, “do the math”… If you have 1200 thermometers covering 98% of 16,000 grid cells, the typical grid cell is EMPTY BY DEFINITION. I make that 15680 grid cells for that 1200 or so thermometers. Or 7.7% of the cells have a thermometer in them. 92.3% of grid cells are EMPTY.
From this GIStemp claims to compute the Global Average Temperature by imagination and handwaving… But at least the thermometers only move when they delete one, update it, move it, or build a new airport… Just so broken…
I’d love to do the kind of exposition you did above on the land grid cells, but with so much of it a complete fantasy value I’m not sure where to begin…
IMHO, the entire Global Average Temperature (values and trends) is nothing more than a statistical artifact of measurement error, splice artifacts inside grid cells, and ‘fabrication error’ in the calculation of the phantom grids.

Bart
March 1, 2012 6:50 pm

Robert Brown says:
February 29, 2012 at 2:09 pm
I have always thought the proper way would be to fit a truncated expansion in spherical harmonics, just like we do for gravity anomalies and Earth’s magnetic field.
Philip Bradley says:
February 29, 2012 at 4:52 pm
“Outside polar regions, downwelling water is warmer than upwelling water, and this will cause an increasing warm bias over the life of a float.”
Hmm… Sounds like this could be a reason the water below 700m appears to be trending upward, and that potentially invalidates the claim that this supports the notion that the “missing heat” is in the depths. I assume that is what you were getting at, but wanted to spell it out to be sure.

March 1, 2012 9:18 pm

@Judge,
What I should have said is that the eccentricity of the earth’s orbit causes the southern hemisphere to receive about 7 to 9% more solar input during the southern summer which shifts the annual mean away from the equator. [I forgot that I wrote the same thing myself on another blog by Willis only a few days ago]. In the northern hemisphere summer I think my original explanation may still hold in at least some of the locations. Finding the mathematically detailed description is not proving easy.
Many internet explanations do not acknowledge this. They tend to use simple illustrations derived from calculated values apparently modelled as a circular orbit. [Models which do not make accurate simulations of reality have been noted in other areas of climatology !]
Have a look at the link below, and scroll down to page 15.
http://www.docstoc.com/docs/9654876/Sun-Insolation

March 1, 2012 11:19 pm

I have always thought the proper way would be to fit a truncated expansion in spherical harmonics, just like we do for gravity anomalies and Earth’s magnetic field.
Yeah, but I suspect that this turns out not to be the case. Spherical harmonics are a vector decomposition based on projection via integration over the sphere. The problem there is that integration over the sphere for spherical harmonics involves spherical polar coordinates with the spherical polar Jacobian again. If you can do the integrals analytically it is great and you can use orthogonality and all that. If you have to do the integrals numerically you are screwed because of the horrible nonlinearities at the poles. You can’t just e.g. cover \phi,\theta with a uniform grid and get a decent result — you’ll be packed in like crazy at the poles and sparse as all hell at the equator. You can do an adaptive quadrature but almost any default gridding will oversample the poles.
I’ve tried things like Gauss-Legendre and Gauss-Chebyshev quadratures, where one basically picks special angles that reduce the number of points you have to evaluate to form the quadratures (via orthogonality) but they don’t work terribly well even for relatively smooth decompositions with \ell not too large. Better than rectangular quadratures I suppose.
Ultimately, spherical harmonics work well when they match the symmetry of the problem, e.g. expanding a multipolar field or potential. Not so well expanding an elephant or Rodin’s thinker or — the Earth’s continents and oceans.
Hence the two general approaches mentioned so far: covering the surface with a uniform tessellation, e.g. a subdivided icosahedron. That basically covers the surface with tiles that are all the same area and shape, and if you divide finely enough you can actually represent a fair amount of detail such as continental shapes and medium-small islands. Or kriging, a method of smoothly interpolating or approximating a random function supported on an irregular grid. Or a combination of the two: coarse graining the samples on the icosahedral grid at some granularity and then using the tile coordinates and centered data to krige a reasonably smooth interpolatory map.
I don’t really know what is best — I just have a pretty good idea of what is the worst, from days spent writing code to numerically decompose things in terms of spherical harmonics where there simply aren’t any really good algorithms for doing so. A good, fairly recent review of the math and methods is here:
http://www.maths.sussex.ac.uk/preprints/document/SMRR-2009-22.pdf
Some of the best (simplest) are basically tessellations, note well. Adaptive cubature/quadrature is just plain difficult for spheres.
rgb

Robin Hewitt
March 1, 2012 11:27 pm

I was wondering about eddies. If a float finds itself in a current does it naturally move to the current boundary and get trapped in the eddies? The stuck float recording different temperatures could be stuck in an eddy between two different temperatures of water, but wouldn’t the regular deep diving free it? Unlike an Argo I am way out of my depth.

Bart
March 2, 2012 1:48 am

Robert Brown says:
March 1, 2012 at 11:19 pm
“The problem there is that integration over the sphere for spherical harmonics involves spherical polar coordinates with the spherical polar Jacobian again.”
The idea would be to get an appropriate functional basis and do a numerical fit of the measurements to that basis. That way, the problems and kluges associated with non-uniform sampling go away. This can be done with least squares techniques to minimize the (possibly weighted) sum of the squared errors between measurements and truncated expansion model evaluated at the discrete set of coordinates where measurements are taken.
I haven’t done it for this application, so I cannot say whether there might not be complications, or that there might not be a more appropriate set of basis functions, but the technique is fairly straightforward. The basis functions used in the spherical harmonic models are the eigenfunctions of the Laplacian operator, and that is what makes them particularly appropriate for gravity and magnetic field modeling. Really, any linearly independent set of functions can be used, but the goal would be to provide the best approximation for a given number of terms in the truncated expansion.
“If you have to do the integrals numerically you are screwed because of the horrible nonlinearities at the poles.”
Not sure if I am interpreting you right. The standard way of dealing with that is to transform from latitude as a variable to normalized vertical height (sine of latitude) as the third coordinate, which is equivalent to the transformation given on page 4 of your link.

March 2, 2012 2:43 am

Robert Brown says: March 1, 2012 at 11:19 pm
“Some of the best (simplest) are basically tessellations, note well. Adaptive cubature/quadrature is just plain difficult for spheres.”

I’ve tried a few of these schemes for surface temperatures with varying success:
1. Equal area cells. It’s the one I regularly use now.
2. Triangular tessellation, Voronoi and similar
3. Spherical harmonics. The method is described here. I produce a map every month; here is September 2011, with a spherical projection. And a GISS comparison. Mainly used for display.