The Laws of Averages: Part 1, Fruit Salad

Guest Essay by Kip Hansen

[Image: "The First Law of Averages"]

This essay is long-ish — and is best saved for a time when you have time to read it in its entirety.  It will be worth the wait and the eventual effort.  It comes in three sections:  a Primer on Averages, a general discussion of Fruit Salad metrics, and a more in-depth discussion of an example using a published study.

NB:  While this essay is using as an example a fairly recent study by Catherine M. O’Reilly, Sapna Sharma, Derek K. Gray, and Stephanie E. Hampton, titled “Rapid and highly variable warming of lake surface waters around the globe”  [ .pdf here; poster here, AGU Meeting video presentation here ], it is not meant to be a critique of the paper itself — I will leave that to others with a more direct interest.  My interest is in the logical and scientific errors, the informational errors, that can result from what I have playfully coined “The First Law of Averages”.

Averages: A Primer

As both the word and the concept "average" are subject to a great deal of confusion and misunderstanding in the general public, and as both have seen an overwhelming amount of "loose usage" even in scientific circles, not excluding peer-reviewed journal articles and scientific press releases, let's have a quick primer (correctly pronounced "primmer"), or refresher, on averages (the cognoscenti can skip this bit and jump directly to Fruit Salad).

Average (noun): a number expressing the central or typical value in a set of data, calculated as the mode, median, or (most commonly) the mean.

and, of course, the verb meaning to mathematically calculate an average, as in “to average”.

Since there are three major types of “averages” — the mode, the median, and the mean — a quick look at these:

Mode: the value that appears most often in a set of data.

Median: the middle value of a data set ordered from smallest to largest; half the values fall below it and half above.

Mean: the sum of all the values in a data set divided by the number of values; this is what is most commonly meant by "the average".

Several of these definitions refer to “a set of data”… In mathematics, a set is a well-defined collection of distinct objects, considered as an object in its own right. (For example, the numbers 2, 4, and 6 are distinct objects when considered separately, but when they are considered collectively they form a single set of size three, written {2,4,6}.)

This image summarizes the three different common “averages”:

[Image: histogram of the ages at which patients develop Stage II Hypertension, with the mode, median, and mean marked]

Here we see the ages at which patients develop Stage II Hypertension (severe HBP, high blood pressure) along the bottom (x-axis) and the number of patients along the left vertical axis (y-axis).  This bar graph, or histogram, shows that some patients develop HBP fairly young, in their late 30s and 40s; after 45 the incidence increases more or less steadily with advancing age, peaking in the mid-60s and falling off after that age.  We see what is called a skewed distribution, skewed to the right.  This skewness (right or left) is typical of many real-world distributions.

What we would normally call the average, the mean (calculated by adding together all the patients' ages at which they developed HBP and dividing by the total number of patients), though mathematically correct, is not very clinically informative.  While it is true that the mean age for developing HBP is around 52, it is far more common to develop HBP in one's late 50s to mid-60s.  There are medical reasons for this skewing of the data, but for our purposes it is enough to know that those outlying patients who develop HBP at younger ages skew the mean; ignoring the outliers at the left would bring the mean more in line with the actual incidence figures.

For the medically inclined, this histogram hints that there may be two different causes or disease paths for HBP: one that causes early-onset HBP and one related to advancing age, sometimes known as late High Blood Pressure.

(In this example, the Median Age for HBP is not very informative at all.)

Our HBP example can be read as: "Generally, one's real risk of developing late HBP begins in the mid-40s, and the risk continues to increase until the mid-60s.  If you haven't developed HBP by 65 or so, your risk decreases with additional years, though you still must be vigilant."
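The way the three averages disagree in a right-skewed sample like this one can be sketched with Python's standard statistics module.  The ages below are invented for illustration; they are not the data behind the histogram above.

```python
# A small right-skewed sample of hypothetical "age at diagnosis" values,
# showing how the three common averages can disagree.
import statistics

ages = [38, 42, 55, 58, 60, 62, 63, 63, 64, 65, 66, 67]

mean_age = statistics.mean(ages)      # sum of values / count
median_age = statistics.median(ages)  # middle value of the sorted list
mode_age = statistics.mode(ages)      # most frequent value

print(f"mean={mean_age:.1f}, median={median_age}, mode={mode_age}")
```

Note how the two early-onset outliers (38 and 42) drag the mean well below both the median and the mode, exactly the effect described above.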

Different data sets have different information values for the different types of averages.

Housing prices for an area are often quoted as Median Housing Costs.  If we looked at the mean, the average would be skewed upward by the homes preferred by the wealthiest 1% of the population, homes measured in millions of dollars (see here, and here, and here).

Stock markets are often judged by things like the Dow Jones Industrial Average (DJIA) [which is a price-weighted average of 30 significant stocks traded on the New York Stock Exchange (NYSE) and the NASDAQ, and was invented by Charles Dow back in 1896].  A weighted average is a mean calculated by giving the values in a data set more or less influence according to some attribute of the data: each quantity to be averaged is assigned a weight, and these weightings determine the relative importance of each quantity in the average.  The S&P 500 is a stock market index that tracks the 500 most widely held stocks on the New York Stock Exchange or NASDAQ.  [A stock index is a measurement of the value of a section of the stock market.  It is computed from the prices of selected stocks, typically as a weighted average.]
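The difference between a simple mean and a weighted mean can be sketched in a few lines of Python.  The prices and weights below are invented for illustration; they are not actual index constituents.

```python
# Simple vs. weighted mean. The weights determine how much influence
# each price has on the result.
prices = [100.0, 50.0, 25.0]
weights = [0.5, 0.3, 0.2]   # weights sum to 1

weighted_mean = sum(p * w for p, w in zip(prices, weights))
simple_mean = sum(prices) / len(prices)

print(weighted_mean, simple_mean)  # 70.0 vs. ~58.3
```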

Family incomes are reported by the US Census Bureau annually as the Median Household Income for the United States [$55,775 in 2015].

Life Expectancy is reported by various international organizations as “average life expectancy at birth” (worldwide it was 71.0 years over the period 2010–2013).  “Mathematically, life expectancy is the mean number of years of life remaining at a given age, assuming age-specific mortality rates remain at their most recently measured levels. … Moreover, because life expectancy is an average, a particular person may die many years before or many years after the “expected” survival.” (Wiki).

Using any of the major internet search engines to search phrases including the word “average” such as “average cost of a loaf of bread”, “average height of 12-year-old children”  can keep one entertained for hours.

However, it is doubtful that you will be more knowledgeable as a result.

This series of essays is an attempt to answer this last point: Why studying averages might not make you more knowledgeable.

Fruit Salad

We are all familiar with the concept of comparing Apples and Oranges.

[Image: apples and oranges]

Sets to be averaged must be homogeneous, as in comparable and not so heterogeneous as to be incommensurable.

Homogeneous: of the same kind; alike; consisting of parts all of the same type.

Heterogeneous: diverse in character or content; consisting of dissimilar parts.

Problems arise, both physically and logically, when attempts are made to find “averages” of non-comparable or incommensurable objects — objects and/or  measurements, which do not logically or physically (scientifically) belong in the same “set”.

The discussion of sets can be confusing for Americans schooled in the 40s and 50s, but later, younger Americans were exposed to the concept of sets early on.  For our purposes, we can use a simple definition: a collection of data regarding a number of similar, comparable, commensurable, homogeneous objects, with the data itself being comparable and in compatible measurement units.  (Many data sets contain sub-sets of different information about the same set of objects.  A data set from a study of Eastern Chipmunks might include sub-sets such as height, weight, and estimated age.  Each sub-set must be internally homogeneous, as in "all weights in grams".)

One cannot average the weight and the taste of a basket of apples.  Weight and taste are not commensurable values.  Nor can one average the weight and color of bananas.

Likewise, one cannot logically average the height/length of a set like "all animals living in the contiguous North American continent (considered as USA, Canada, and Mexico)".  Why?  Besides the difficulty in collecting such a data set, even though one's measurements might all be in centimeters (whole or fractional), "all animals" is not a logical set of objects when considering height/length.  Such a set would include all animals from bison, moose, and Kodiak bears down through cattle, deer, dogs, cats, raccoons, rodents, worms, insects of all descriptions, multi-cellular but microscopic animals, and single-celled animals.  In our selected geographical area there are (very, very roughly) an estimated one quintillion five hundred quadrillion (1,500,000,000,000,000,000) insects alone.  There are only 500 million humans, 122 million cattle, 83 million pigs, and 10 million sheep in the same area.  Insects are small and many in number; some mammals are comparatively large but few in number.  Uni- and multicellular microscopic animals?  Each of the 500 million humans carries, on average, over 100 trillion (100,000,000,000,000) microbes in and on their body.  By any method (mean, median, or mode) the average height/length of all North American animals would be literally vanishingly small: so small that, "on average", you wouldn't expect to be able to see any animals with unaided eyes.
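A back-of-envelope sketch in Python makes the point concrete.  The population figures are the very rough ones quoted above, and the "typical lengths" are hypothetical round numbers supplied for illustration; the conclusion does not depend on their exact values.

```python
# Count-weighted mean length of "all North American animals".
# The tiny-but-astronomically-numerous organisms swamp everything else.
groups = [
    # (typical length in metres, approximate population) - illustrative only
    (1.7,  5.0e8),    # humans
    (2.0,  2.15e8),   # large livestock (cattle, pigs, sheep combined)
    (5e-3, 1.5e18),   # insects
    (1e-6, 5e22),     # microbes (~1e14 per human x 5e8 humans)
]

total_count = sum(n for _, n in groups)
mean_length = sum(length * n for length, n in groups) / total_count

print(f"mean length of 'all animals': {mean_length:.2e} m")
```

The result comes out on the order of a micrometre: far below the limit of unaided human vision, just as the paragraph above argues.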

To calculate an average of any type that will be physically and scientifically meaningful, as well as logical and useful, the set being averaged must itself make sense as a comparable, commensurable, homogeneous collection of objects, with the data about those objects being comparable and commensurable.

As I will discuss later, there are cases where the collection (the data set) seems proper and reasonable, the data about the collection seems to be measurements in comparable units and yet the resulting average turns out to be non-physical — it doesn’t make sense in terms of physics or logic.

These types of averages of disparate, heterogeneous data sets (in which either the measurements or the objects themselves are incommensurable), like comparing Apples and Oranges and Bananas, give results which can be labelled Fruit Salad, with applicability and meaning that ranges from very narrow through nonsensical to none at all.

“Climate Change Rapidly Warming World’s Lakes”

This is claimed as the major finding of a study by Catherine M. O'Reilly, Sapna Sharma, Derek K. Gray, and Stephanie E. Hampton, titled "Rapid and highly variable warming of lake surface waters around the globe"  [ .pdf here; poster here, AGU Meeting video presentation here ].  It is notable that the study is a result of the Global Lake Temperature Collaboration (GLTC), which states: "These findings, the need for synthesis of in situ and remote sensing datasets, and continued recognition that global and regional climate change has important impacts on terrestrial and aquatic ecosystems are the motivation behind the Global Lake Temperature Collaboration."

The AGU Press Release regarding this study begins thus: “Climate change is rapidly warming lakes around the world, threatening freshwater supplies and ecosystems, according to a new study spanning six continents.”

“The study, which was funded by NASA and the National Science Foundation, found lakes are warming an average of 0.61 degrees Fahrenheit (0.34 degrees Celsius) each decade. That’s greater than the warming rate of either the ocean or the atmosphere, and it can have profound effects, the scientists say.”

All this is followed by scary “if this trend continues” scenarios.

Nowhere in the press release do they state what is actually being measured, averaged and reported.  [See “What Are They Really Counting?”]

So, what is being measured and reported?  Buried in the AGU Video presentation, Simon Hook, of JPL and one of the co-authors, in the Q&A session, reveals that  “these are summertime nighttime surface temperatures.”   Let me be even clearer on that — these are summertime nighttime skin surface water temperatures as in “The SST directly at the surface is called skin SST and can be significantly different from the bulk SST especially under weak winds and high amounts of incoming sunlight …. Satellite instruments that observe in the infrared part of the spectrum in principle measure skin SST.” [source]   When pressed, Hook goes on to clarify that the temperatures in the study are greatly influenced by satellite measurement as the data is in large part satellite data, very little data is actually in situ  [“in its original place or in position “ — by hand or buoy, for instance] measurements.   This information is, of course, available to those who read the full study and carefully go through the supplemental information and data sets — but it is obscured by the reliance on stating, repeatedly “lakes are warming an average of 0.61 degrees Fahrenheit (0.34 degrees Celsius) each decade.“

What kind of average?  Apples and Oranges and Bananas.  Fruit Salad.

Here is the study’s map of the lakes studied:

[Image: world map of the lakes included in the study]

One does not need to be a lake expert to recognize that these lakes range from the Great Lakes of North America and Lake Tanganyika in Africa to Lake Tahoe in the Sierra Nevada Mountains on the border of California and Nevada.  Some lakes are small and shallow, some are huge and deep; some lakes are in the Arctic and some are in the deserts; some lakes are covered by ice much of the year and some are never iced over; some lakes are fed from melting snow and some are fed by slow-moving equatorial rivers.

Naturally, we would assume that, like Land Surface Temperature and Sea Surface Temperature, the lake water temperature average in this study is weighted by lake surface area.  No, it is not.  Each lake in the study is given equal weight, no matter how small or large, how deep or shallow, snow fed or river fed.  Since the vast majority of the study's data comes from satellite observations, the lakes are all "larger"; small lakes, like the reservoir for my town's water supply, are not readily discerned by satellite.
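How much the equal-weight choice matters can be sketched in Python.  The lake names, surface areas, and warming trends below are entirely invented for illustration; they are not values from the study.

```python
# Equal-weight mean vs. area-weighted mean of hypothetical lake
# warming trends. An unweighted mean treats a small alpine lake
# and a Great Lake as exact equals.
lakes = [
    # (name, surface area in km^2, trend in degC/decade) - invented
    ("Big Lake",   58_000, 0.10),
    ("Mid Lake",    1_000, 0.40),
    ("Small Lake",     50, 0.70),
]

unweighted = sum(trend for _, _, trend in lakes) / len(lakes)

total_area = sum(area for _, area, _ in lakes)
area_weighted = sum(area * trend for _, area, trend in lakes) / total_area

print(f"unweighted: {unweighted:.2f} degC/decade, "
      f"area-weighted: {area_weighted:.3f} degC/decade")
```

With these invented numbers the unweighted mean is 0.40 °C/decade while the area-weighted mean is about 0.11 °C/decade: the same data, two very different headline figures, depending entirely on the weighting choice.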

So what do we have when we “average” the [summertime nighttime skin surface] water temperature of 235 heterogeneous lakes? We get a Fruit Salad — a metric that is mathematically correct, but physically and logically flawed beyond any use [except for propaganda purposes].

This is freely admitted in the conclusion of the study, which we can look at piecemeal: [quoted Conclusion in italics]

“The high level of spatial heterogeneity in lake warming rates found in this study runs counter to the common assumption of general regional coherence.”

Lakes are not responding regionally to a single cause, such as "global warming".  Lakes near one another, or in a defined environmental region, are not necessarily warming in similar manners or for the same reasons, and some neighboring lakes show opposite signs of temperature change.  The study refutes the researchers' expectation that regional surface air temperature warming would correspond to regional lake warming.

“Lakes for which warming rates were similar in association with particular geomorphic or climatic predictors (i.e., lakes within a “leaf”) [see the study for the leaf chart] showed weak geographic clustering (Figure 3b), contrary to previous inferences of regional-scale spatial coherence in lake warming trends [Palmer et al., 2014; Wagner et al., 2012]. “

Lakes are warming for geomorphic reasons (having to do with the form of the landscape and other natural features of the Earth's surface) and for reasons of local climate: not regionally, but individually.  This heterogeneity implies the lack of a single cause, or even similar causes, within regions.  It also means that these lakes should not be considered a single set and thus should not be averaged together to find a mean.

“In fact, similarly responding lakes were broadly distributed across the globe, indicating that lake characteristics can strongly mediate climatic effects.”

Globally, lakes are not a physically meaningful set in the context of surface water temperature.

“The heterogeneity in surface warming rates underscores the importance of considering interactions among climate and geomorphic factors that are driving lake responses and prevents simple statements about surface water trends; one cannot assume that any individual lake has warmed concurrently with air temperature, for example, or that all lakes in a region are warming similarly.”

Again, their conclusion is that, globally, lakes are not a physically meaningful set in the context of surface water temperature yet they insist on finding a simple average, the mean, and basing conclusions and warnings on that mean.

“Predicting future responses of lake ecosystems to climate change relies upon identifying and understanding the nature of such interactions.”

The surprising conclusion shows that if they want to find out what is affecting the temperature of any given lake, they will have to study that lake and its local ecosystem for the causes of any change.

A brave attempt has been made at saving this study with ad hoc conclusions — but most are simply admitting that their original hypothesis of “Global Warming Causes Global Lake Warming” was invalidated.  Lakes (at least Summertime Nighttime Lake Skin Surface Temperatures) may be warming, but they are not warming even in step with air temperatures, not reliably in step with any other particular geomorphic or climatic factor, and not necessarily warming even if air temperatures in the locality are rising.  As a necessary outcome, they fall back on the “average” lake warming metric.

This study is a good example of what happens when scientists attempt to find the averages of things that are dissimilar — so dissimilar that they do not belong in the same “set”.    One can do it mathematically — all the numbers are at least in the same units of degrees C or F — but such averaging gives results that are non-physical and nonsensical — a Fruit Salad resulting from the attempt to average Apples and Oranges and Bananas.

Moreover, Fruit Salad averages not only can lead us astray on a topic, they obscure more information than they illuminate.  This is clearly shown by comparing the simplistic Press Release statement that "lakes are warming an average of 0.61 degrees Fahrenheit (0.34 degrees Celsius) each decade" to the actual, more scientifically valid findings of the study, which show that each lake's temperature is changing due to local, sometimes even individual, geomorphic and climatic causes specific to each lake, casting doubt on the idea of global or regional causes.

Another example of a Fruit Salad metric was shown in my long-ago essay Baked Alaska?   which highlighted the logical and scientific error of averaging temperatures for Alaska as a single unit, the “State of Alaska”, a political division, when Alaska, which is very large,  consists of 13 distinct differing climate regions, which have been warming and cooling at different rates (and obviously with different signs) over differing time periods.   These important details are all lost, obscured, by the State Average.

Bottom Line:

It is not enough to correctly mathematically calculate the average of a data set.

It is not enough to be able to defend the methods your Team uses to calculate the [more-often-abused-than-not] Global Averages of data sets.

Data sets must be homogeneous, physically and logically.  They must be data sets of like-with-like, not apples-and-oranges. Data sets, even when averages can be calculated with defensible methods, must have plausible meaning,  both physically and logically.

Careful critical thinkers will be on the alert for numbers which, though the results of simple addition and division,   are in fact Fruit Salad metrics, with little or no real meaning or with meanings far different than the ones claimed for them.

Great care must be taken before accepting that any number presented as an average actually represents the idea being claimed for it.  Averages most often have very narrow applicability, as they obscure the details that often reveal the much-more-important actuality [which is the topic of the next essay in this series].

# # # # #

Note on LOTI, HadCRUT4, etc.:  It is my personal opinion that all combined Land and Sea Surface Temperature metrics, by all their various names, including those represented as indexes, anomalies and ‘predictions of least error’,  are just this sort of Fruit Salad average.  In physics if not Climate Science,  temperature change is an indicator of change in thermal energy of an object (such as of a particular volume of air or sea water).  In order to calculate a valid average of mixed air and water temperatures,  the data set must first be equal units for equivalent volumes of same material (which automatically excludes all data sets of sea surface skin temperatures, which are volume-less).  The temperatures of different volumes of different materials, even air with differing humidity and density, cannot be validly averaged without being converted into a set of temperature-equivalent-units of thermal energy for that material by volume.  Air and water (and stone and road surfaces and plowed fields) have much different specific heat capacities thus a 1 °C temperature change of equal volumes of these differing materials represents greatly differing changes in thermal energy.  Sea Surface (skin or bulk) Temperatures cannot be averaged with Surface Air Temperatures to produce a physically correct representation claimed as a change in thermal (heat) energy — the two data sets are incommensurable and such averages are Fruit Salad.
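The scale of that incommensurability can be sketched with standard textbook values for density and specific heat capacity: the same 1 °C change in equal volumes of air and of water represents vastly different amounts of thermal energy.

```python
# Thermal energy implied by a 1 degC change in equal volumes (1 m^3)
# of air and of water, using standard textbook values.
rho_air, c_air = 1.2, 1005.0         # kg/m^3, J/(kg*K): air near surface
rho_water, c_water = 1000.0, 4186.0  # kg/m^3, J/(kg*K): liquid water

dT = 1.0  # the same 1 degC change for both materials

energy_air = rho_air * c_air * dT        # ~1.2e3 J per m^3
energy_water = rho_water * c_water * dT  # ~4.2e6 J per m^3

print(f"water stores ~{energy_water / energy_air:.0f}x the energy "
      f"of air for the same 1 degC change")
```

The ratio works out to roughly 3,500 to one, which is why averaging a sea surface temperature with a surface air temperature, degree for degree, says nothing coherent about thermal energy.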

And yet, we see every day, these surface temperature metrics represented in exactly that non-physical way — as if they are quantitative proof of increasing or decreasing energy retention of the Earth climate system.  This does not mean that correctly measured air temperatures at 2 meters above the surface and surface sea water temperatures (bulk — such as Argo floats at specific depths) cannot tell us something, but we must be very careful in our claims as to what they tell us.  Separate averages of these data sets individually are nonetheless still subject to all the pitfalls and qualifications being presented in this series of essays.

Our frequent commenter, Steven Mosher, recently commented that:

“The global temperature exists. It has a precise physical meaning. It’s this meaning that allows us to say…

The LIA was cooler than today…it’s the meaning that allows us to say the day side of the planet is warmer than the nightside…The same meaning that allows us to say Pluto is cooler than earth and mercury is warmer.”

I must say I agree with his statement, and if Climate Scientists would limit their claims for the various Global Temperature averages to these three concepts, their claims would be far more scientifically correct.

NB: I do not think it is correct to say “It has a precise physical meaning.”   It may have a precise description but what it means for the Earth’s climate is far from certain and does not approach precise by any measure.

I expect opinions may vary on this issue.

# # # # #

Author’s Comment Policy:

I am always anxious to read your ideas and opinions, and to answer your questions about the subject of the essay, which in this case is Fruit Salad Averages as defined above.

As regular visitors know, I do not respond to Climate Warrior comments from either side of the Great Climate Divide — feel free to leave your mandatory talking points but do not expect a response from me.

I am interested in examples of Fruit Salad Averages from the diverse range of science and engineering fields in which WUWT readers work.

# # # # #

Eustace Cranch

as to what they tells us
You seem to be channeling Gollum here… 🙂

MarkW

nasty hobitses

Greg

https://judithcurry.com/2016/02/10/are-land-sea-temperature-averages-meaningful/
Are land + sea temperature averages meaningful?

It is a classic case of ‘apples and oranges’. If you take the average of an apple and an orange, the answer is a fruit salad. It is not a useful quantity for physics based calculations such as earth energy budget and the impact of a radiative “forcings”.

firetoice2014

It is one thing to talk about “data sets” and another thing entirely to talk about “estimate sets” (data sets after “adjustment”).

commieBob

Why would neighbouring lakes have different? Maybe one is covered by algae.
My guess is that the researchers gathered the data, couldn’t find anything useful, and thrashed about with different methods of analysis until they found a ‘significant’ result. It’s similar to the dark chocolate hoax. This XKCD cartoon also explains the phenomenon.
The researchers did a bunch of work. They can’t say they didn’t find anything because nobody will publish that. The result is there for everyone to see.

Count to 10

This is where a principal component analysis would come in handy. If you do it right, you feed everything into it, and it tells you what the important modes of variation are. In the case of global temperature measurements, it would probably make the UHI effect and data tampering pop right out, but you would also get to see how things vary with the season.

Count to 10

Forest: not to my knowledge. Certainly nothing in the hockey stick had anything to do with PCA.

tty

“My guess is that the researchers gathered the data, couldn’t find anything useful, and thrashed about with different methods of analysis until they found a ‘significant’ result. ”
A practice now so common that it has even acquired a name: “significance-chasing” or “p-chasing”.

commieBob

I didn’t know it had a name but, yes, it does. link Scientists admit there’s a problem with science but then you get folks like Neil deGrasse Tyson telling us that we have to accept the authority of science.

tty

“Certainly nothing in the hockey stick had anything to do with PCA.”
You are wrong. The misuse of PCA is central to MBH 98 and MBH 99.

Don K

“Why would neighbouring lakes have different? Maybe one is covered by algae.”
I don’t know, but consider these NWS reports from Lake Champlain (One of their study lakes) yesterday. The temps are daytime and near shore. Yesterday was a clear, sunny, coolish day so clouds probably aren’t a factor.

USGS gage at Rouses Point NY 97.31 feet
USGS gage at Burlington VT 97.36 feet 51 degrees
King Street Ferry Dock 97.46 feet 58 degrees
USGS gage at Port Henry NY 97.48 feet
Colchester Reef 57 degrees
Diamond Island 55 degrees

Quite a lot of variation there on one largish lake.
Note: I don’t know where the USGS gauge at Burlington is located, but it’s unlikely to be even a kilometer from the gauge at the King St Ferry Dock. And Colchester Reef is only 12-15km NorthWest of Burlington. I’m guessing that the Burlington gauge is busted or was misread.

Bruckner8

Although you finally get to it at the end, even your assessment method for defining Fruit Salad Average (FSA) is your own opinion. The people who create metrics such as "Global Mean Temperature" have a different opinion, obviously. Thus we've really gained no ground here.
My foray into global warming skepticism falls along your reasonings, not to mention the act of measurement itself. But many many people in the sciences feel the methods of collection and interpretation are perfectly valid. If they weren’t, this all would’ve collapsed long ago.

firetoice2014

One might refer to an average calculated from a mix of data, "adjusted" data (estimates) and "infilled" data (SWAGs) as a "dog's breakfast" average, calculated after "processing".

JustAnOldGuy

I wonder – if we averaged a set composed of ‘dog’s breakfasts’ and ‘dog’s dinners’ could we outline the parameters of a ‘dog’s brunch’? We would have to make sure that all the dogs used the term ‘dinner’ to denote a midday repast. Additionally, does the use of one or the other of these expressions bear any correspondence to the user’s regional or social stratum origin? Anybody got a blank grant application that I could use?

Randy in Ridgecrest

Since forever we have kept horses, and dogs, the phrase instantly conjured images of my dogs (we loved them all!) happily banging out the doggie door first thing in the morning to make a cursory fence patrol then settling into the horse paddock for breakfast.

Ed

While “many, many people in science FEEL [their] methods of collection and interpretation are perfectly valid” doesn’t mean they are valid. Kip’s review/ analysis and lesson is pretty darn good and not just his opinion. I have had to deal with more than one “fruit salad” submitted paper or analysis in my career. We were using the work to make regulations that would dramatically affect lots of people’s livelihoods. The authors would argue vehemently that their analysis was valid. Often it took very little to demonstrate how invalid their “averaging” was.

tadchem

Did you really mean to refer to the "shrewdness" of a skewed distribution? Or "patients who develop HPB at younger ages shew the mean."?

Clyde Spencer

There is a typo’ (probably autocorrect!) just after the graph on the different averages: “… This shrewdness (right or left)…”

kokoda - the most deplorable

The last sentence in the paragraph below the histogram lists “shrewdness” – I believe you meant skewness.

Leo Smith

Skewidity, surely!
Or perhaps skewedness?
I am shrouded in obliquity today…
🙂

Clyde Spencer

” If you haven’t developed HPB by 65 or so, your risk decreases with additional years, though you still must be vigilant.”
It may well be that those who are prone to develop HPB have been removed from the pool by heart attacks before the age of 65. Thus, those remaining are less likely to develop HPB.
Incidentally, I spotted another typo’. “Skew” was changed to “shew.”

tadchem

Regarding the ‘meaning’ of a global average temperature, in the geophysical and the ecological senses of reality it is meaningless. No physical feature and no biological organism experiences an *average* temperature. All objects experience the local, transient temperature at their immediate location. There is no uniformity. There are only varying degrees of variability. The mesquite bush outside my office window can experience a minute change of temperature from minute to minute as clouds pass over, a small change from hour to hour as storms come and go, a larger change from night to day, and a tremendous change throughout the year. The bush has adapted to these changes and survives them all, and the ‘average’ temperature is irrelevant. It can survive temperatures over 100°F (as it did yesterday) and temperatures under 20° F (as it did 6 months ago). The pertinent life-threatening changes in its environment involve many other factors such as insects, browsing animals, wildfire, flash floods, etc.

firetoice2014

Yes, but could it survive 101.6F and 21.6F (another 1C increment)? “Enquiring minds want to know.”

skorrent1

Yes. One can imagine the limited utility of a number said to be the “average summer temperature” of Honolulu, or the even more limited utility of the “average summer temperature” of Denver; but the “average” of those two numbers has lost all utility. It describes nothing useful.

The biggest fruit salad of them all is … the average person.
This particular “salad” does not even consider any measurable quantities. It’s just a stick figure of the mind.

rocketscientist

Absolutely!! : “The average person has less than 2 legs, 2 eyes or 2 kidneys.”
While technically true it is a meaningless average. The “mode” would be far more meaningful.
“He uses statistics as a drunken man uses lampposts – for support rather than for illumination.”
(Andrew Lang)

AndyG55

“as a drunken man uses lampposts ”
Last drunken man I saw near a lamppost seemed to want to hold it up with a stream of liquid !

gnomish

a stick figure on the serengeti of the mind…lol

Outstanding post.

Hugs

Absolutely yes, well written stuff.

George W Childs

Just one more reason I do not weep when the budgets of government science are cut. Fewer of these ridiculous studies to pollute whatever remains of real science.
Science is hard to do. Good science takes years. These goombas found some data and ran it through their statistical packages that they don’t understand. I hope I didn’t pay for any of it.

John F. Hultquist

Thanks Kip – a very good start.
~ ~ ~ ~ ~
Many years ago as sensors and algorithms (S&A) were being introduced to environmental studies, I wrote an essay for my instructor that included a look to the future.
Prior to S&A, land cover studies involved field excursions – usually students with clipboards criss-crossing an area and writing things like “block A has Pine trees, block B is a grass field, block C is the 3rd fairway.”
During the introduction of S&A, automated reports would be generated and then a “ground-truth” field trip would be conducted to see if the algorithms did, in fact, distinguish between the land cover types.
My paper’s hypothesis was that reliance on satellites to do the work of poorly paid graduate students was not necessarily a good thing.
[PS: On one “ground truth” trip, while a student was intent on recording land cover, a dog walked up and peed on his leg.]

jsuther2013

Kip, the pronunciation is Prime-er, meaning basic or first. Children go to a primary school, not a primmery school. They are prime numbers, not primm numbers; the primary reason, not the primmery reason. Jodie Foster couldn’t pronounce it either.

Tom Halla

I have heard “prim-er” for an elementary reader, not “prime-er”, with a long I. The pronunciation is irregular.

Hivemind

That makes no sense, since the purpose of a primer is to prime the student’s reading skills. Like priming a pump. You never prim a pump.

Tom Halla

So what if it makes no sense? That was the pronunciation used in California some 50 years ago.

Clyde Spencer

Kip,
I agree completely with you that there is an overabundance of ‘Fruit Salad’ in climatology papers. I have complained previously here that averaging sea surface temperatures with land air temperatures distorts what is happening. Similarly, I have advocated reporting the global average of diurnal highs and lows individually, rather than a grand average, because there are different processes at work controlling the highs and lows and, again, those processes are hidden by a grand average. Further, I have suggested that land averages be grouped by climatic zones to see if all are responding similarly, which I believe they are not. It is surprising to me just how poor the analysis of temperature data has been, and that so few people have remarked that “the emperor has no clothes.”

Paul Penrose

Quite so.

OweninGA

Clyde,
With the current data quality and infill (guessing), your proposed analysis is likely impossible. We would have to deploy something similar to the Climate Reference Network sites to all the areas considered representative, and then wait a century to have enough data to do the analysis. People today don’t seem to have that sort of patience, especially with chicken littles out there proclaiming thermogeddon in the next 50 years.

Clyde Spencer

OweninGA,
There is no question that the global temperature data set was never intended for the use that climatologists are trying to apply it to. However, we do currently have some who are using the sparsely-monitored Arctic temperatures to make broad statements about what is happening in the Arctic. I’m suggesting that those who are the ‘professionals’ in the field should be similarly looking at the other climate zones. The data available may not be optimal, but they might still provide some insights, if someone bothered to look. That effort alone might help to provide specifications for what an optimal climate monitoring network might need. Yes, the mentality seems to be one of “Ready, fire, aim.”

Dave Fair

Kip, I had previously commented in other threads about the absurd “averaging” of SSTs and LSTs. Additionally, it is passing strange that we find nothing wrong with “averaging” Arctic and equatorial temperatures despite differences in humidity, height of the tropopause, etc.

+ Many. Thanks.
The global average temperature and GCMs are a construct. Maybe even a useful construct, but still a “fruit salad” and a casting of bones to attempt to foresee the future. What the results of this averaging exercise mean will only be known in a few hundred years, when folks look back on this period of time with the knowledge of hindsight.

INSIGHT: The Paris Accord is founded on fruit salad.
Bon appétit !

gnomish

careful there- the fruits are using the wrong restrooms…

AllyKat

An example that immediately comes to mind is the so-called “pay gap” between men and women. Claims of a significant difference in pay are based on averaging all salaries of men and all salaries of women. This averaging method does not consider factors such as different jobs (comparing an oil field worker with a fast food worker), different levels of experience, and different hours worked (full-time vs. part-time). No significant gap is found when comparing salaries of men and women with the exact same job, who work the exact same hours, who have the same levels of education and experience, etc.
More social science and politics than natural or hard science, but it is a good example of how to lie with statistics. It is also a good example of how entrenched false claims can become, even when debunked by disparate sources. Even when people know the truth, they may still promote fruit-salad-based conclusions that “support” their narrative (noble cause corruption). Example: the Department of Labor has tweeted the false statistic as fact, even though a) they should know better, and b) they do know better, as evidenced by their own website’s refutation.

+10….thanks for that. I had been wondering about this claim as it did not seem right, but I couldn’t put my finger on why.

Tom Halla

Very good argument on a basic point. Where I was raised in Northern California, much of the influence on temperature was just how close one was to the ocean, and how many mountains were between you and the ocean. It could be 64F at the coast, 80F in the Santa Clara Valley (one range of mountains), and 95F in the Central Valley (two ranges of mountains). An average temperature would not mean much.

John in Oz

No need to travel so far to disprove the concept of an average temperature.
A comparison of the temps at your house between the front (concrete/hard surface driveway) to the back (grass/soft surface) would suffice.

Tom Halla

“Travel far”? The coast was about an hour and a half travel time away, as is the Central Valley.

Excellent, excellent, excellent.

David Kleppinger

In your first figure you show the mean to be 52. This is the mean of the ages at which at least one patient developed HBP, not the mean of all the patients’ ages at which they developed HBP. That latter mean would be 57.38, if I read the chart correctly, and is actually the more usual meaning of “mean”.

Points adjacent in space or time are as different as apples and oranges.

Rick C PE

Excellent post Mr. Hansen. Reminds me of this.
A scientist is standing with one arm in a freezer and the other arm in an oven. He says “on average I’m quite comfortable”.

Thomas Homer

“The global temperature exists. It has a precise physical meaning.” – agreed, but we haven’t been measuring the true global temperature, and I’m not sure we can. Capturing the true global temperature would require taking all local temperatures at the same point in time and then averaging them. And then we would need this same capture to happen for each unique surface area that’s in sunlight. If there is a time variance among the temperature values included in the average, then we haven’t achieved the ‘precise physical meaning’ of a global temperature average.
“allows us to say the day side of the planet is warmer than the night side” – it does? Does it allow us to determine how the hemispheres compare? A single ‘global temperature’ value does not allow us to do any of that.
A ‘precise’ global temperature value would allow us to determine if the Earth is warming at the rate of 0.0000023 C degrees per hour to prove that we’ll have 2 C degrees warming in 100 years. I don’t believe we have that level of precision. And since we don’t, any proxy average global temperature value needs to have an error margin included.

To me the bigger question is the meaning of the average of a non-linear quantity, temperature, rather than of its 4th power, energy, in which computations are linear. That is, is the most meaningful average
temps 4. ^ avg .25 ^
rather than
temps avg
where
: avg dup +/ swap % ;
I’ve taken the liberty of expressing the computations in CoSy‘s RPN Forth syntax and left out the Stefan-Boltzmann constant, which drops out anyway. I think the difference in the computations is easily understood however expressed.
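For readers who don’t speak Forth, here is a sketch of the same pair of computations in Python, with made-up temperatures in kelvin. The 4th-power (“radiative”) mean comes out a few kelvin higher than the plain mean because it weights warm readings more heavily.

```python
# Plain arithmetic mean of temperatures vs. the "radiative" mean:
# average the 4th powers (proportional to emitted energy under
# Stefan-Boltzmann) and take the 4th root. The constant drops out.
temps_K = [220.0, 280.0, 310.0]  # illustrative values only

plain_mean = sum(temps_K) / len(temps_K)
radiative_mean = (sum(t ** 4 for t in temps_K) / len(temps_K)) ** 0.25

print(plain_mean)      # 270.0
print(radiative_mean)  # roughly 277 K -- not the same average
```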

BallBounces

Take-away: Global warming is threatening fruit salads everywhere.

Neil Jordan

Thank you for a peek behind the mystical statistical curtain. Let me offer a point of clarification with your description of the HBP figure:
” — but for our purposes, it is enough to know that those outlying patients who develop HPB at younger ages skew the mean — ignoring the outliers at the left would bring the mean more in line with the actual incidence figures.”
The distribution is indeed negatively skewed, but it is incorrect to arbitrarily lop off the “outliers” to create a classic symmetric distribution with zero skew. If you want to justify removing the inconvenient outliers like those on the left, there are statistical methods for doing so, such as Chauvenet’s Criterion:
http://www.statisticshowto.com/chauvenets-criterion/
Some might consider this as putting a mathematical thumb on the scales. Others might consider this as an excuse to clean up messy data instead of doing a better experiment.

Mariano Marini

Another fruit salad is the average of immigrants vs. local inhabitants. Saying that the average is 0.3% says nothing about a town of 120 (one hundred twenty) inhabitants with more than 1000 (one thousand) immigrants. A real situation here in Italy:

Duncan

Kip,
I hope this is related, from my project management experience. My company does this (incorrectly, IMO) when calculating GP (Gross Profit) for any particular project. Emphasis is placed on total GP: combining material and labor input ‘costs’ and subtracting them from total contract ‘value’, then dividing to get a percentage (i.e. $2000 value – $1000 labor/material cost = $1000 profit (50%)). Everyone is happy that we make 50% profit all the time. The problem with this is combining material costs and labor and equating them both to dollar amounts. Sure, it can be useful as an indicator, but it hides the weighting each one provides. Like your example of comparing two lakes of different sizes/depths, what lies beneath the surface is what matters.
I have argued that by combining these different metric values into one, the output does not provide meaningful information that helps drive decisions. If applying paint to the equipment, one paint system costs $1000 and takes 32 labor units (say $3200) to apply. Another paint system costs $4000 and only 2 labor units ($200) to apply. Both paint systems cost $4200 applied, but of course the system that took only two hours to apply is superior from a productivity standpoint; on paper they become equal when combined.
They also omit another very important variable: linear time. Did I spend 4 hours in one day to make $1000, or did I spend one hour a week over a month (4 hours total) to make $1000? Both on paper make the same amount but have very different implications. I have argued, to no avail, that a weighted duration factor needs to be incorporated into our costing model.
Of course other companies do this analysis much better than mine does. When the law of averages is applied to apples and oranges, the final number becomes meaningless for making educated decisions/conclusions.

RWturner

Hence the famous saying, “lies, damn lies, and statistics.” Any time I see the arithmetic mean being thrown around I think grade-school science, yet this grade-school science permeates government-sponsored green science.
One horrible example I’ve dealt with personally involved the lesser prairie chicken and the pseudoscience of “avoidance behavior.” The FWS used junk science from papers that determined the “mean avoidance behavior” of prairie chickens toward infrastructure. Basically, they captured the chickens, put tracking beacons on them, and then tracked their movements for a few days. They then selected a point, i.e. a pump jack or wind turbine, calculated the mean distance of the birds’ positions from this point, and called it mean avoidance. These means were then used to set buffer zones around certain infrastructure, which was then purportedly considered completely destroyed prairie chicken habitat; mitigation fees of up to $100,000 per acre were assessed, and that money was used to bribe landowners into joining a conservation program. Of course, you could search the literature and find myriad reasons why the lesser prairie chicken was in decline, each treated as equally pertinent except for the two reasons that could not be extorted for money: drought and booming Asian pheasant populations.
Most people basically saw this as two things: a way to extort industry (ones that actually create wealth) out of money to be funneled into the sue and settle green groups (most of these people are attorneys collecting HUGE paychecks from these settlements) and a way to fulfill a mandate of Agenda 21 — forcing people out of rural areas and into the cities. Luckily a federal judge in Texas saw it the same way and shut it down.

noaaprogrammer

Also, depending on the application, other types of means may be required such as geometric mean, harmonic mean, etc.

Good post.
Does this mean that the ‘average’ temp of the Earth from satellites should be dismissed due to the different nature of the Earth’s surface warming the air above?
Different surfaces (wet forests, dry forests, sand, grass, rock, etc.) will all have an effect similar to your differently contoured lakes…

Brian R

I’ve always found the idea of an average temperature, whether it’s for a country or continent or planet, to be absurd. In such a large system the task of coming up with a single number that represents the whole is a fool’s errand.

Kip, minor quibble … the first chart (hypertension vs age) is left or negative skewed, not right or positive. The long tail is on the left.

wyzelli

I came looking to make this point and found your comment. I believe you are correct – the ‘skewness’ goes in the direction of the long tail, not the direction of the ‘large bump’ of data.

Stanley Coker

Ian,
Correct. Another quick way to determine the direction of skew is: “The mean always chases the tail.” The median is resistant to outliers, while the mean is not. I teach it this way: if you want to impress someone you are interviewing to move to your company and city, you would say that the median price of a house is… and the mean (average) salary is…
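The “mean chases the tail” rule is easy to demonstrate with toy numbers (invented here for illustration): a single young outlier in the left tail drags the mean well below the median, which barely notices.

```python
from statistics import mean, median

ages = [55, 56, 57, 58, 59, 60, 61]   # symmetric: mean == median == 58
with_outlier = [20] + ages            # one outlier in the left tail

print(mean(ages), median(ages))                  # 58 58
print(mean(with_outlier), median(with_outlier))  # 53.25 57.5
```

The mean moves almost five years toward the tail; the median moves half a year.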

H. D. Hoese

While you never get too old to be refreshed, one would think that anyone doing science would not need a primer on averages, at least until you get to the statistical study of distributions and other uses like normalization. Noting a problem, about 1990 I started teaching ‘logical errors’ to biology students, including those in ecology and environmental assessment. These have fancy Latin names, most not in much usage, nor are their English equivalents. It is not difficult to find examples in ‘scientific papers.’ One such error, of course, is the linear extrapolation that gets you in trouble. These should be in basic education and would have nothing to do with politics, despite being in common use there. Argumentum ad misericordiam may be part of the climate and other environmental errors.
I have also watched weather forecasters since they first appeared on TV, and despite their great improvements you still get average temperatures listed as ‘normal.’ Explaining this, which some do, is a lot easier than explaining the complexities of rip currents, which they locally warn about. “Feel good” temperatures are an imprecise but useful metric, but that language strikes me as talking down to people.
I will look for ‘fruit salad’ examples, maybe one so old as to be putrefied. It would be useful to have a list and analysis of logical errors. I have a list somewhere that I used, but I think it has an error in it.

… just thinking about what someone wrote above:
Points adjacent in space or time are as different as apples and oranges.
This opens up a whole can of worms, which involves making judgments about what constitutes “comparable” and “non-comparable” entities.
We could say that nothing is comparable, right? Or we could say that everything is comparable. What are the formal rules, or do all such judgments involve mere agreement?

Kip,
“Sets to be averaged must be homogeneous”
There is actually nothing wrong with fruit salad. It is a popular and nutritious dish. I like it. The question is, why are you averaging? It is usually to get an estimate of a population average. So it is associated with a sampling problem. That is where homogeneity counts. But inhomogeneity doesn’t mean impossible – it just requires more care to avoid bias.
Polling is a case in point. It does predict elections, mostly. Averaging is useful, and people pay for it to be done well. But the difference between good and bad polling is in the sampling, because the population is heterogeneous. Men and women, say, think differently; that doesn’t invalidate polls, but it means you have to take care to get them represented in the right proportions.
The reason for care is that you easily might have a biased sample because of method. Blood groups are heterogeneous too, but pollsters don’t worry about that, because they have no evidence that it affects opinion. But even if it did, they can’t practically determine it in a polling sample, and it is unlikely that polling methods would systematically favour A over B, say. It would be part of the general randomness.
Homogeneity and sampling error is the basic reason for using anomalies in temperature averaging. Temperatures themselves are inhomogeneous, as you describe in the lake example. And it is very hard to get the sampling proportions right – in fact, it isn’t even clear what “right” is. But when you subtract the mean for each site, the anomaly is much more homogeneous, and good sampling is much less critical. That is just as well, because the ability to choose samples is limited. I write a lot about this on my blog – I set out the sum of squares arithmetic here.

Clyde Spencer

Stokes,
You said, “The question is, why are you averaging? It is usually to get an estimate of a population average.” Why are you trying “to get an estimate of a population average?” Your explanation is circular and doesn’t really explain anything.

“Your explanation is circular”
No, it is standard stats. The arithmetic done is averaging a sample. That is to serve as an estimate of a population mean. That means that you have to define the population (which is said to be inadequately done here), and establish that the sample is representative (or re-weight if it isn’t).

“when trying to “average” heterogeneous objects, there is no logically and scientifically “right”. “
We’re always averaging heterogeneous objects. If they were totally homogeneous, there would be no analysis needed. Take the Dow – much quoted by all sorts of authoritative people. It’s an average of averages of all sorts of entities. There is a lot that it doesn’t tell you about details. But few ignore it.
Or take your hypertension example. What is hypertension? It is BP considerably above average. Again, not to be ignored.

Bob Rogers

Regarding the Census —
They report not just median family income, but also average (mean) family income, and median and mean household income. In addition they report on per capita income, and quartile distributions for both families and households.
The point being, that all these different averages, for these various sets, are all useful.
I’ve always thought that knowing the average temperature was far less interesting than knowing the extremes. If it gets too hot my tomatoes will fail, but it’s the temperature extremes that will do them in, not the averages.

Crispin in Waterloo

Kip, what do you think about this issue:
A temperature for a lake surface is probably not the result of a single measurement. It is probably an average of multiple measurements. I work with a physicist who keeps me on the straight and narrow in these matters, and he says it is legitimate to average a set of numbers that are themselves averages only on condition that the results are expected to be the same. So I can take 30 measurements around the lake and average them to get an average temperature for that lake’s surface. Yes or no? Do I expect them to be the same?
If the temperature ‘is expected to be the same’, I can use that average as if it were a single accurate representation of the lake’s surface, correct?
If the paper claimed to have found an increase or decrease in the average of a number of lakes’ average temperatures, their claim is not very useful because the question is not one that leads to useful information.
However, just looking at the averaging of averages, when is enough averaging too much? In high school we were taught that you cannot average a pair of ratios. Well, averages are ratios. An average of a bunch of ratios each of which is an average of a bunch of other ratios seems to violate a pretty basic rule of mathematics.
If I take a number of measurements of a single lake, do I really expect the numbers to be the same? Does each measurement represent something inherent about the lake? In the shallows the temps will be higher, in the deeper areas, lower. I do not really expect the numbers to be the same. Now I average the numbers anyway, learning nearly nothing about the lake’s heat content, and move on to averaging the final number with those from other lakes.
Do I expect them to be the same? No. They are different lakes. Do I expect the average to represent the lake as accurately (or inaccurately) as the first lake? Nope. It will have a different shallow-vs-deep ratio fundamentally compromising the set’s ‘sameness’.
The average altitude may be different. It seems to me there are too many averages included in the analysis. Is that true?
If the atmosphere were warmer and the relative humidity lower, evaporation from the lakes would increase, lowering the surface temperature. Lake surface cooling could be an indicator of a warming world. I think their calculated number does not contain any information. It is just a number. You could divide it by Tuesday and multiply by purple and get another number. So what? Fruit salad.

Crispin
“he says it is legitimate to average a set of numbers that are themselves averages, only on condition that the results are expected to be the same”
That is the homogeneity issue. It isn’t an absolute condition; it’s just that if you don’t have it, more care is required. To go back to the polling example, you probably don’t expect the same answer asking women as for men, on average. So you make sure you include the right proportions.
“In the shallows the temps will be higher, in the deeper areas, lower. I do not really expect the numbers to be the same.”
They deal with that:
“For each lake, we calculated the temperature anomalies relative to its 1985–2009 mean.”
That is the point of anomalies – you take out the part that you expect to be different (deep, shallow etc). The anomalies you do expect to be the same, from the way you created them. Or rather, the expected value of the differences is zero. From an inhomogeneous set, you create something much more homogeneous.

Crispin in Waterloo

Kip
I work in the horrible world of rating performance, especially efficiencies, which are ratios. People will happily operate a device under 4 different loads, calculate the energy efficiency for each condition, then simple-average the four numbers and claim to have given it an “average efficiency” rating. It is just number-mumbling.
Even if there were a duty cycle that involved those particular loadings for those particular durations, each individual calculation hides so much information that the final ratio of ratios tells us little about how the device will perform.
WRT the lakes, it is obvious that the intent is to communicate that the entire lake temperature has changed, the enthalpy has risen or dropped, using the surface temperature as a proxy. It is nearly useless as a proxy for communicating, ‘Warming is upon us!’ Maybe I can go so far as to say it is literally useless.
If you want to estimate the heat content of a lake, you could do it with a reasonable number of measurements, and set a target error band. Suppose you accepted to have a number with an uncertainty of 50%. Let’s say the heat content is 1 TJ, ±0.5 TJ. Anyone reading this think they could get that from a satellite surface temperature reading?
Nick suggests it is all about anomalies. I understand the concept, so no need to repeat it. To get two readings a year apart, and then claim that the bulk temperature of the lake was known to within 50%, and then claim that the anomaly was meaningful, would require one helluva difference in surface temperature. I am not sure most readers will know immediately why, so give me one more paragraph before the numbers flow.
If the anomaly is smaller than the uncertainty of the total energy content, the anomaly tells us exactly nothing. If the surface temperature one year is 10 and the next year it is 11, and the uncertainty about the total energy stored in the lake is 15%, one cannot claim to have detected a difference so the proxy fails.
How good would the total energy number have to be in order for a 1 deg anomaly to mean anything real?
10 C = 283 K
11 C = 284 K
The heat content increase (assuming the lake contains exactly the same amount of water both years) is 0.35%. How many readings would one have to make in a lake to get the heat content to within 0.35%? That is 1 TJ ±0.0035 TJ. One part in two hundred and eighty-three. One?!?
Similarly, a measured increase of 0.1 degrees would have to be backed up with a way of modeling the total energy to another significant digit. 1 TJ ±0.00035 TJ. Good luck with that. If the limit of determination is 200 times the anomaly, you can claim absolutely nothing. Think about an ‘average global temperature’ increase of +0.001˚ ±0.2.
It is clownish efforts like this which render ‘ocean acidification’ numbers meaningless. The pH of the ocean can change 1 unit in 2 metres. Collect a trillion well-distributed one-off pH measurements. What is the uncertainty? Do it again after ten years. What’s the change and the uncertainty? Can’t claim a thing.
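The percentage in the comment above can be checked in a couple of lines, under the same assumptions it makes: stored sensible heat scaling with absolute temperature, and the water mass unchanged between years.

```python
T1, T2 = 283.0, 284.0  # kelvin: 10 C and 11 C

fractional_increase = (T2 - T1) / T1
print(f"{fractional_increase:.4%}")    # 0.3534% -- the ~0.35% quoted
print(round(1 / fractional_increase))  # 283: one part in 283
```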

“The heat content increase (assuming the lake contains exactly the same amount of water both years) is 0.35%.”
This is nonsense. On this basis, you could never know the temperature of anything. You couldn’t measure if your personal heat content increased by 1%, right? It couldn’t have any effect, could it?

Clyde Spencer

Generally when there are questions about what something means, definitions are a good place to start.

Dr. S. Jeevananda Reddy

A lake’s temperature varies with the water content of the lake at the time of measurement. Water in the lake varies with rainfall, with silting, and with encroachments. Thus averaging lake temperatures really has no meaning, like our global average temperature anomaly. Lakes should be studied individually; characterising the changes in temperature of a particular lake will have some meaning.
Hyderabad [where I live] gets water from two reservoirs [Himayatsagar and Osmansagar]. The rainfall shows a year-to-year variation with little or no trend, but the inflows show a gradual decrease. Lakes are now receiving wastewater, which is changing the chemistry of the lake water and thus its temperature; this is not associated with climate but with human greed.
Dr. S. Jeevananda Reddy

An aside, Dr. S.J.: I worked many years in water resources and sewage treatment. In several cases, adding properly treated effluent to receiving waters actually IMPROVED the water quality and stabilized the water levels. Wastewater used for agriculture and street-boulevard irrigation is also a good application that has been used for decades. Yes, removing water from rivers for irrigation can be a big problem. Hopefully we can learn a bit as we go along. The other side of the argument your comment reminds me of is the claim that oceans are rising because of our “mining” of groundwater resources. I doubt that the ocean is rising from this source, as we also have some folks claiming that increased rainfall is depleting the water in the ocean. It seems each issue can be argued several ways.
The point of my comment is that we can be hopeful for the future in spite of the alarmists out there. There will always be alarmists about something.
Pessimist: The glass is half empty.
Optimist: The glass is half full.
Engineer: Use a smaller glass.

Dr. S. Jeevananda Reddy

Wayne Delbeke — I presented factual information.
I live in Hyderabad city. Here about 2000 mld of sewage is generated [during the rainy season, rainwater is added to this]. The government established STPs to treat around 700 mld, but in reality less than half of that is actually treated. The untreated and treated sewage, along with untreated or partially treated industrial effluents, joins the River Musi. Using this water, food, milk, and meat are produced and supplied to the city. The lakes in the city are cesspools of poison, as sewage is released directly into them. I am part of one of the few environmental groups fighting this menace, with no success. Success depends upon the governance.
Dr. S. Jeevananda Reddy

Don K

Another interesting and thoughtful essay. I’m fine with your major points. Some rather scattershot observations, none of which really affect your arguments/conclusions much.
1. There’s what looks to be a good article on lake surface temperatures here: http://faculty.gvsu.edu/videticp/stratification.htm If I understand it, the “nighttime skin SST temperature” is the temperature of the epilimnion — the well-oxygenated surface layer which is well mixed by winds (and in windless periods …?) Anyway, I’ll just point you to the article.
2. There are a number of different means computed in different ways, some of which are actually useful. The one we are all familiar with is the arithmetic mean. But sometimes the geometric mean or the harmonic mean are more appropriate. There are probably yet other means that might be appropriate in some cases.
3. The hypertension chart is certainly an example of a messy data set, and it may well be an example of a collection of data that is not meaningfully analyzable with the usual tools. It may also be an example of how not to handle data if you want useful results.
Blood pressure is easily measured with non-intrusive instruments (and also easily mis-measured, but that’s another story). As a result it has been the subject of tens of thousands of published papers. I’ve read a fair sampling of the papers over the years, and have concluded that for the most part those folks don’t seem to have the slightest idea what they are about. They seem for the most part to be treating something that is often a symptom as if it were a disease. Where it is recognized as a symptom, I don’t have any problem with treating underlying causes of hypertension like kidney disease. That’s a great idea where it works. But most treatment appears to consist of cosmetically using drugs to reduce blood pressure with no idea of whether reduction is necessary, if/why reduction has any useful effect, or even if reduction is a good idea.
4. I don’t have any argument with the notion that planetary temperature is a badly defined and non-rigorous metric. If nothing else, it’s far too dependent on the state of ENSO to be treated with the respect it is given politically. But we probably do need a simple metric for the state of planetary glaciation and it’s not that easy to see a better alternative.

Crispin in Waterloo

Don K
You can average averages in some cases using the harmonic mean. A great example for teaching is ‘average miles per gallon’ vs. ‘average litres per hundred kilometres’. They have to be calculated differently.
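Crispin’s miles-per-gallon example can be checked with a small sketch (the mileage figures are invented for illustration): naively averaging the mpg numbers overstates fuel economy, while the harmonic mean matches the true miles-per-gallon over legs of equal distance.

```python
import statistics

# Two legs of equal distance, driven at different fuel economies.
mpg_legs = [20.0, 40.0]   # miles per gallon on each leg
distance = 100.0          # miles per leg

gallons = sum(distance / mpg for mpg in mpg_legs)  # 5.0 + 2.5 = 7.5
true_mpg = (distance * len(mpg_legs)) / gallons    # 200 / 7.5 = 26.67

naive = statistics.mean(mpg_legs)              # 30.0 -- wrong for this question
harmonic = statistics.harmonic_mean(mpg_legs)  # 26.67 -- matches true_mpg

print(true_mpg, naive, harmonic)
```

Litres per hundred kilometres is the reciprocal quantity (fuel per distance rather than distance per fuel), which is why its ordinary arithmetic mean over equal distances comes out right where averaging mpg does not.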

Don K

Afterthought: Reading randomly through the excellent commentary, it seems to me that much of the discussion is about sampling theory rather than averages per se. Ideally a sample is representative of the whole and large enough not to be skewed too badly by outliers. That’s simple in theory, but a nightmare in actuality.
Another afterthought. What we want to know is rarely something we can measure directly. So we measure what we can and hope it is close enough to what we want to know to produce a useful result. e.g. We measure the night time surface skin temperature of a spot somewhere on a lake and hope it is fairly representative of the temperatures of the lake as a whole. (And, BTW, what one gets from MODIS looks to be a set of “mechanical” averages over a region somewhat larger than 1km diameter at slightly different times and locations on different days).
Are you planning to address sampling theory and the delta between what we can measure and what we want to know in future essays?

K. Kilty

Very nice essay, Mr. Hansen. You did omit one additional definition of “average” that I always present to my engineering students, and which is pertinent to the concept of average Earth temperature: the midpoint of the maximum and minimum.
Also the phrase ” In order calculate” in your summary paragraphs ought to be ” In order to calculate”

Mark T

“The global temperature exists. It has a precise physical meaning.”
The mere fact that it exists does not mean that it is knowable. It is true that the “real” temperature has precise physical meaning, but it is not true that the average calculated from the data we have has even marginal physical meaning.
Averaging intensive variables is always a problem when you don’t know how much stuff there is. The lake example illustrates this problem quite well. It is much easier to see when you have two very disproportionate areas (volumes) that you are averaging. Area A1 has temperature T1, and area A2 has temperature T2, but area A1 is much larger than area A2. The “average” temperature is much closer to T1 than T2 (assuming they are different values as well). The same is true for averaging temperatures within the atmosphere.
Now we add in the problem of incomplete sampling and the precision becomes even more ambiguous. How can we know what the temperature is when we only cover a portion of the planet with sensors (a portion that does not properly meet sampling theorem limits)?
Then we have to consider the truth that our atmosphere’s temperature, whatever it is, changes constantly. What does an average mean when it is fluctuating non-randomly (and randomly) over time?
In general, averages don’t always (or often) have physical meaning. Rarely do the mean and median exist in the set itself (only the mode is guaranteed to be a member of the set). That there is a temperature and it has a precise physical meaning is meaningless, so claiming this as some sort of “truth” is silly. It allows us to make general statements like “it is warmer on the planet earth than it is on Pluto” and not much more.
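Mark T’s point about intensive variables can be made concrete with a toy weighted average (the areas and temperatures are invented): when one region dominates, the naive mean of the two readings misrepresents the whole.

```python
# Hypothetical regions: A1 is nine times the size of A2.
areas = [90.0, 10.0]  # arbitrary area units
temps = [10.0, 30.0]  # degrees C, T1 and T2

naive_mean = sum(temps) / len(temps)  # 20.0 -- treats both regions as equal

# Area-weighted mean: each temperature counts in proportion to its area.
weighted = sum(a * t for a, t in zip(areas, temps)) / sum(areas)  # 12.0

# The weighted result sits much closer to T1, as the comment argues;
# an unweighted mean silently assumes equal amounts of "stuff".
print(naive_mean, weighted)
```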

Clyde Spencer

Mark T,
I commented above about definitions. To follow up on that and respond to your comment, if we are trying to determine the average temperature of Earth, and we know that we don’t have complete or proper sampling, then we cannot compute what we are looking for. At best, we can claim to have computed an average temperature of the available data set, which almost certainly isn’t the average we are looking for.
The mode may not even be a member of a set because it may depend on how we bin the data. Also, there may be a tie for the most frequent value(s).

The mean of min and max temperatures is that of a probe that is in completely different environments for each measurement. When I became interested in this a few years ago, I found it strange that people were studying the trend in the global mean of these. I found it a complete joke that it’s pretty much the mean of infilled data obtained from actual measurements.
Surely as an indicator of very small change, you would look at the trends in the data that you have, only. If you can’t get something meaningful from that, you are not getting anything from a trend in a global average of infilled data.

Explains why we end up with applanges and orapples.

This was a wonderful essay. Truly useful and a keeper for me.
I have dealt with these issues since I taught my first math class to young lads and lasses who were more interested in getting laid than in statistics. (Although statistics could be useful to them even in that endeavor, one would think.)
I cannot even wrap my head around “averaging” temperatures on this planet since it is the energy content that matters. Water, rock, and air at the same temperature would have different energy content, or so my science teachers said back in the early 70s. Do we still admit that even with climate alarmism being the driving force behind government science today?
Loved the article, looking forward to the next one.
~ Mark

Clyde Spencer

Mark,
If the ocean and air temperatures were reported separately, then they would serve as proxies for the energy in the two domains. However, with something like a 4-fold difference in the heat capacity of water and other materials, averaging the temperatures together is comparing avocados with fruit salad.

Agreed. And a great turn of phrase there. 🙂

They would be somewhat better proxies – but, in my opinion, still not very good ones. The energy content of a cubic meter of air at 70% saturation is very different from that of a cubic meter of air at 7% saturation, even if they have the same temperature. Unless you are very careful in your selection, a cubic meter of seawater in one place has a different chemical composition than a cubic meter in some other place – with a corresponding difference in energy content even if at the same temperature.
Separating water and air only removes the biggest energy content differences – and we are commonly talking about rather small energy differences when saying something has “warmed” or “cooled” by tiny fractions of a degree.

About a 1% change in salinity makes a 0.1% change in specific heat capacity. Not a lot, but 0.1% of 300 K is 0.3 degrees. How much has the average ocean temperature changed? Something like 1/100th of that in the Argo data?

The problem I have with averaging is that it eliminates all of the specific information that might have been useful in projecting possible future scenarios for any given area. Climate by definition is regional. It consists of a range of weather patterns and temperatures common to a particular geographic area. Once temperatures all over the globe have been averaged together, it is then impossible to tease out any useful information about what that might mean for the climate, or even weather over a particular region in the future. It would be like averaging together all of the baseball scores over the 20th century and comparing it to the ongoing 21st century. If the average goes up, or down, or stays the same…what does that mean for next week’s score at the local stadium?
A global average is not useful information, especially since the area being measured isn’t the whole globe, but just an arbitrary 2 meters above the surface, except when temperatures measured outside that sliver of the world, deep in the oceans, or under the arctic ice, might help support someone’s contention.
Climate must be studied regionally to remain useful. Averaging seems to primarily be used to con people into thinking that if something scary is happening somewhere, it must be happening a little bit, close to where you live.

Clyde Spencer

Hoyt,
Amen!

“The global temperature exists. It has a precise physical meaning” No, it does not. For very simple systems it can be shown to be nonsense. Cooling systems would show pseudo-warming, warming systems would show pseudo-cooling (that is, ‘cooling’ based on the global pseudo-temperature nonsense). As for ‘warmer’ and ‘cooler’, that works only for largely non-overlapping ranges of temperatures. If they overlap a lot, it becomes nonsense.

Jeff Alberts

“The global temperature exists. It has a precise physical meaning”
Actually that statement is from Mosher, in a comment reply to me in another thread.

1sky1

Surprisingly, the averaging that is most often used with climate data, i.e., the mid-range value produced by the mean of daily Tmax and Tmin, is nowhere mentioned in this far-ranging tract. That confounding average, along with the mixed-type, heterogeneous temperatures (1.5-meter dry-bulb, SST) used in manufacturing “global” indices, is what renders the fruit salad so scientifically sour. And that bad taste is only multiplied by various ad hoc adjustments that provide the salad dressing.

Today’s recorded temperature that will become part of the historical record begins at 4 pm my local time, since the record day starts in Greenwich, and the last temperature of “today” will be recorded well after my local day has ended (midnight). So even the average global temperature for a day doesn’t cover a real day.

tony mcleod

Happy to see you write:
“I expect opinions may vary on this issue“.
Your opinion seems to be that “fruit salad” averages are useless. My opinion differs.
Taking a human patient’s temperature is analogous. It varies from person to person, there are all sorts of errors, it tells us nothing about fingers and toes, but it is still pretty accurate and quite useful. If it starts ‘trending’ – there may be a problem.
My opinion is that, likewise, global averages are pretty accurate and quite useful if they reveal a trend.
The problem many here have (imo) is not so much with how it is determined but that it appears to be trending in a certain direction. This post would not exist if they were going down.

Tony,
Taking a person’s temperature is only analogous to local sampling to determine what is going on locally. You would never take the temperature of 1000 random people and use the average to determine the health of the whole human population. Calling a global average “accurate” doesn’t make a lot of sense because a global average isn’t an actual measurable thing. It is a calculation. If it trends up, something might be happening. If it trends down, something might be happening. But if it stays the same, guess what, something might be happening. If the Earth began to exhibit wider extremes in temperature, say 20 degrees hotter in the day and 20 degrees colder at night, that would be something big, but it might average out to exactly the same average temperature we have today. On the other hand, if the Earth became less extreme, and temps were 30 degrees less warm in the day, and 30 degrees warmer at night, it could still theoretically average out the same, or with a trend up or down. That’s how averages hide what might really be going on.
If I wanted to, I could sample the light coming from a rainbow and register the wavelengths of all of the different colors of light and average them together. I would end up with an average wavelength that would correspond to a specific color, but calling that average color accurate is nonsense because calculating the “average color” of a rainbow is completely meaningless. Think of the Earth as a rainbow of temperatures. Averaging them all together is not data, and has no meaning.
In simpler terms, suppose I offered you one of two cups of coffee and you asked me if it’s hot. I might say that the average temperature of the two cups is 100f. Which cup do you choose and why?
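The “averages hide what might really be going on” argument above can be illustrated with two made-up temperature records that share a mean but differ wildly in range:

```python
import statistics

# Hypothetical day/night temperature readings (degrees), invented
# to mirror the "wider extremes, same average" scenario above.
mild = [15.0, 15.0, 15.0, 15.0]      # steady climate
extreme = [35.0, -5.0, 35.0, -5.0]   # 20 degrees hotter days, 20 colder nights

# The two records have identical arithmetic means...
assert statistics.mean(mild) == statistics.mean(extreme) == 15.0

# ...but the spread tells a completely different story.
print(statistics.pstdev(mild), statistics.pstdev(extreme))  # 0.0 20.0
```

Any summary that reports only the mean treats these two climates as identical, which is exactly the information loss the comment describes.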

tony mcleod

It is analogous to one person in that each individual is made up of countless systems and cells, and a temperature measurement of, say, 37.6 is just an otherwise meaningless number. If it begins to change, that is indicative and thus meaningful, as is the rate of change.
“On the other hand, if the Earth became less extreme, and temps were 30 degrees less warm in the day, and 30 degrees warmer at night, It could still theoretically average out the same, or with a trend up or down. That’s how averages hide what might really be going on.”
Perfectly true, but if the average is moving it indicates something probably IS going on.

tony mcleod

“A changes in temperature, which you call a trend, only tell you that the trend (by whatever method you used to measure, compute and determine trend) exists for the period of time you have selected.”
Hmm. A trend indicates a trend.
There is an upward trend. In conjunction with other data like sea-ice area (a sensitive yet slow-moving indicator of wider-area temperatures), it may be indicative of a warming ocean/atmosphere.
(On a side-note: maximum area moving from early November to mid-June. A new, geo-engineered dipole?)
I don’t think average temperature needs to be discarded.
“we have simply no idea”
What should we be measuring to find out? Do you think it matters?

Chimp

Tony,
Like all cultists of your ilk, you fail to realize that trends of ten, 20 or 39 years mean nothing in terms of climate, which is naturally cyclic.
There is zero evidence supporting the repeatedly falsified, baseless assertion that humans are responsible for whatever warming might actually have occurred between 1977 and late in the last or early in this century. And all the evidence in the world that mankind had nothing whatsoever to do with it, as it was well within normal bounds.
Let alone whether slight warming, from whatever cause, is good or bad. So far more plant food in the air has been greatly beneficial.
And please explain why for most of the postwar period, during which CO2 has steadily risen, global temperature has either fallen dramatically or stayed about even. Thanks.

Crispin in Waterloo

Tony M
All measurements are accompanied by an uncertainty. That is a physical fact. The true value of any measurement is not knowable. When the difference between two measurements is smaller than the uncertainty in each of them, no claim for ‘difference’ can be sustained.
It gets worse. If the difference is not three times the limit of detection, by convention it is not accepted as being different. Consider the ocean temperature-based claims for heat content. The measurements are made to 0.01 degrees C. The uncertainty is 0.02 degrees. The claim that the oceans were 0.001 degrees warmer than ‘last time’ is unsustainable, based on the limit of determination and the uncertainty.
Now, check the measurements for the average temperature of the world that underpin the claims that 2015 was hotter than 2014 and 2016 was hotter than 2015. The claimed increase is an order of magnitude less than the uncertainty. And on this we should spend 100 trillion Dollars, just in case.
End of short, stupid story.
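Crispin’s point about differences smaller than the measurement uncertainty can be sketched numerically. The figures are his illustrative ones; the root-sum-of-squares combination is the standard rule for independent measurements:

```python
import math

# Uncertainty Crispin cites for each ocean temperature measurement.
u_single = 0.02             # deg C
claimed_difference = 0.001  # claimed warming since "last time", deg C

# Uncertainty of the difference of two independent measurements
# combines in quadrature (root-sum-of-squares).
u_diff = math.sqrt(u_single**2 + u_single**2)  # about 0.028 deg C

# By the 3x limit-of-detection convention mentioned in the comment,
# a difference must exceed 3 * u_diff to be accepted as real.
detectable = claimed_difference > 3 * u_diff
print(round(u_diff, 4), detectable)  # 0.0283 False
```

On these numbers the claimed 0.001-degree difference falls nearly two orders of magnitude short of the detection threshold, which is the comment’s point.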

tony mcleod

Yes, if you only consider 3 years then it could be called a short, stupid story.
You could say hmm, it’s rising at 0.1C/decade for a hundred years, wuwt rate of change?

Crispin in Waterloo

Tony M
Perhaps you could amend your question to include the uncertainty. I know the ordinary man in the street does not think this way, but experimentalists do.
You could say hmm, it’s rising at 0.1C ±0.5/decade for a hundred years, wuwt rate of change?
That gives 1.0 ±0.5 degrees per century. So what? Since when have warmer temperatures over the whole globe, accentuated in the Arctic, been a cause of loss and misery? It is easy to show that the opposite, a cooling of 1C, has been.
8000 years ago the world was about 3 degrees C warmer than now. The Sahara and Gobi deserts were grasslands. The Arctic was largely ice-free in summer. That’s good medicine. Swallow it. It’s good for you.
If we are in for 3 centuries of warming at 1 C per century, it will be fabulous. I see no hope of that at present because we are going into a pretty steep decline, but hopefully the sun will wake up in a few decades. In the medium term, 5000 years, we are fatefully headed into the next ice age.

tony mcleod

“That’s good medicine. Swallow it. It’s good for you.
If we are in for 3 centuries of warming at 1 C per century, it will be fabulous.”
…for modern humans’ complex, entangled civilization, which has developed within a very narrow range of conditions?
Oh good, can I see the results of the experiment that leads you to this incredibly confident, incredibly optimistic prediction?

1sky1

“My opinion is that, likewise, global averages are pretty accurate and quite useful if they reveal a trend.”

This blind article of faith about GAST anomalies is wrong on two levels:
1. Accuracy is nowhere scientifically tested, let alone validated, in compiling global indices. What is mistaken for accuracy is the reproducibility of results from effectively the same unevenly distributed database of deeply flawed and variously biased records. This leaves unanswered the critical question of how representative that database is of the actual field of temperatures on the globe at any given time.
2. Trends are useful only if they are truly secular and pertain to the same location. Given the presence of strong oscillatory components in many local variations of temperature that reach well into the centennial range, shorter-term “trends” are highly changeable and offer no predictive value. Moreover, none of the indices that purport to represent GAST keep the measurement locations fixed throughout the entire duration of the index. That’s analogous to diagnosing a developing fever by comparing the temperatures of different patients.
The popular indices (which have steadily minimized the effects of multidecadal oscillations and increased the steepness of “trends”) have been elevated to the status of iconic measures precisely to ingrain such blind faith in the scientifically unaware.

There is data: 5C.
There is fact: the thermometer reading is 5C.
There is meaning: it is a cold Calgary June afternoon in our backyard.
There is implication: I’d better put a sweater on.
The climate change frenzy seems to me to often result from conflating data with implication. A Fruit Salad average is just data, but the warmists see it and say, “There it is! The end of the world!”

jorgekafkazar

A couple of typos:
“Lakes (at least Summertime Nighttime Lake Skin Surface Temperatures) may be warming, but they are not warming even in step with air temperatures, not reliably in step with any other particular geomorphic or climatic factor….”
S/B: nor reliably in step
“…Moreover, Fruit Salad averages not only can lead us astray on a topic but they obscure more information that they illuminate…”
S/B: than they illuminate