The Interpretation of Interpolation

Guest Post by Willis Eschenbach (see Arctic/Antarctic update at the end)

Over in the comments on a post about a totally different subject, you’ll find a debate about interpolation for areas where you have no data. Let me give a few examples, names left off.

Kriging is nothing more than a spatially weighted averaging process. Interpolated data will therefore show lower variance than the observations.

The idea that interpolation could be better than observation is absurd. You only know things that you measure.

I’m not saying that interpolation is better than observation. I’m saying interpolation using locality based approach is better than one that uses a global approach. Do you disagree?

I disagree, generally interpolation in the context of global temperature does not make things better. For surface datasets I have always preferred HadCRUT4 over others because it’s not interpolated.

Once you interpolate you are analysing a hybrid of data+model, not data. What you are analysing then takes on characteristics of the model as much as the data. Bad.

How do you estimate the value of empty grid cells without doing some kind of interpolation?

YOU DON’T! You tell the people what you *know*. You don’t make up what you don’t know and try to pass it off as the truth.

If you only know the temp for 85% of the globe then just say “our metric for 85% of the earth is such and such. We don’t have good data for the other 15% and can only guess at its metric value.”

If you don’t have the measurements, then you cannot assume anything about the missing data. If you do, then you’re making things up.

Hmmm … folks who know me know that I prefer experiment to theory. So I thought I’d see if I could fill in empty data and get a better answer than leaving the empty data untouched. Here’s my experiment. I start with the CERES estimate of the average temperature 2000 – 2020.

Figure 1. CERES surface temperature average, 2000-2020

Note that the average temperature of the globe is 15.2°C, the land is 8.7°C, and the ocean is 17.7°C. Note also that the Andes mountains on the left side of upper South America are much cooler than the other South American land.

Next, I punch out a chunk of the data. Figure 2 shows that result.

Figure 2. CERES surface temperature average with removed data, 2000-2020

Note that average global temperatures are now cooler with the missing data, with the globe at 14.6°C versus 15.2°C for the full data, a significant error of about 0.6°C. Land and sea temperatures are too low as well, by 1.3°C and 0.4°C respectively.

Next, I use a mathematical analysis to fill up the hole. Here’s that result:

Figure 3. CERES surface temperature average with patched data, 2000-2020

Note that the errors for land temperature, sea temperature, and global temperature have all gotten smaller. In particular, the land error has gone from 1.4°C to 0.1°C. The estimate for the ocean is warm in some areas, as can be seen in Figure 3. However, the global average ocean temperature is still better than just leaving the data out (0.1°C error rather than 0.4°C error).

My point here is simple. There are often times when you can use knowledge about the overall parameters of the system to improve the situation when you are missing data.
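
To make the size of the effect concrete, here is a minimal sketch on a synthetic temperature field. It is not the method used for Figure 3, which is left unspecified. The field (a smooth function of latitude plus an east-west ripple), the 1-degree grid, the location of the hole, and the fill rule (each missing cell takes the mean of the known cells in its latitude row) are all assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical 1-degree grid; values are cell centers in degrees.
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(0.5, 360.0, 1.0)
LON, LAT = np.meshgrid(lon, lat)

# Synthetic "truth": warm tropics, cold poles, small east-west ripple.
truth = 28.0 * np.cos(np.radians(LAT)) ** 2 - 12.0 + 2.0 * np.sin(np.radians(LON))

# Grid cells shrink toward the poles, so weight each cell by cos(latitude).
weights = np.cos(np.radians(LAT))

def area_mean(field, w):
    """Area-weighted mean that skips NaN (missing) cells."""
    ok = ~np.isnan(field)
    return np.sum(field[ok] * w[ok]) / np.sum(w[ok])

# Punch out a tropical block (assumed location: 20S-20N, 60E-180E).
holed = truth.copy()
hole = (np.abs(LAT) < 20) & (LON > 60) & (LON < 180)
holed[hole] = np.nan

# Fill each missing cell with the mean of the known cells in its latitude row.
row_means = np.nanmean(holed, axis=1, keepdims=True)
filled = np.where(np.isnan(holed), row_means, holed)

true_mean = area_mean(truth, weights)
err_ignore = area_mean(holed, weights) - true_mean
err_filled = area_mean(filled, weights) - true_mean
print(f"error ignoring the hole: {err_ignore:+.3f} C")
print(f"error after zonal fill:  {err_filled:+.3f} C")
```

On this made-up field, ignoring the hole biases the global mean cold because the missing cells are warmer than average, while the row-mean fill removes most of that bias. The fill still leaves a residual, since the hole sits on the warm side of the east-west ripple and a row mean cannot see that.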

And how did I create the patch to fill in the missing data?

Well … I think I’ll leave that unspecified at this time, to be revealed later. Although I’m sure that the readers of WUWT will suss it out soon enough …

My best wishes to all,

w.

PS—To avoid the misunderstandings that are the bane of the intarwebs, PLEASE quote the exact words that you are discussing.

[UPDATE] A commenter below said:

The obvious problem that WE creates with his “punch out” is he chose the big chunk of equatorial solar heated region.

If he did that extraction to Antarctica (or Greenland interior), where we really do have very few spatial measurements, the opposite would occur, the average Globe would be dramatically warmer, the SH (NH) would warm even more, and the NH (SH) would be unaffected.

Here you go …

You can see the outlines of the patches, but overall, I’d say my method is working well.

August 23, 2021 10:15 am

This is a misuse of the term “interpolation”. An old adage says: Interpolate at will. Extrapolate at your own peril.

commieBob
August 23, 2021 10:44 am

If an area is lacking temperature data, it is apparently unpopulated or something like that. Anyway it is somehow different than the areas that have temperature data. In that light, interpolating is probably not warranted.

August 23, 2021 12:01 pm

“Anyway it is somehow different than the areas that have temperature data.”
Grid boundaries are an artifice. That “difference” has no physical reality. You could have used a 2×2 grid, a 5×5 grid, or any other subdivision of the space that makes sense to you.

ThinkingScientist
August 23, 2021 1:06 pm

Yes, and then the correct approach is to define a block support with associated local variance attached to each cell.

However, you generally only krige under a constant block size assumption or you have to introduce an area weighted correction to the cells as well.

All of these things matter.

Felix
August 23, 2021 1:10 pm

National borders are an artifice. Coastlines, mountain ranges, desert boundaries, they are all artifices to some degree. Atmospheric upper limits are artifices.

Michael Jankowski
August 23, 2021 8:28 pm

Grid subdivision makes a hell of a difference. I’ll let you realize how obvious that is.

August 23, 2021 11:53 pm

I write a lot about optimal gridding. There is, for example, a discussion here.

n.n
August 23, 2021 12:19 pm

That’s correct. This is not an issue of grid construction, but of a geographical and environmental space/area.

Bindidon
August 24, 2021 3:59 pm

commieBob

” In that light, interpolating is probably not warranted. ”

1) What does population have to do with temperature?
Many parts of the world are very well populated, but nevertheless lack any measurement points over great distances.

2) What do you prefer?

• Interpolation giving an estimate made out of a few known values as near to the context as possible?

or

• the corner you don’t interpolate getting the average of ALL known values?

J.-P. D.

Julie
August 23, 2021 11:02 am

Ya he is right saying

Lance Flake
August 23, 2021 11:14 am

Extrapolation is assuming a trend continues past the data in a graph. Interpolation is filling in missing data between surrounding data points. This isn’t a misuse of the term.

BCBill
August 23, 2021 11:37 am

I believe interpolation assumes some sort of functional relationship for points on either side of an empty area and therefore extending through the empty area. Geography would be expected to cause discontinuities in the function and therefore filling in the gaps is not interpolation but extrapolation, a much more perilous undertaking.

Tim Gorman
August 23, 2021 11:47 am

Bingo!

Ron Long
August 23, 2021 12:51 pm

BCBill, you are correct and all anyone has to do is examine the CERES data in the first figure. I have worked during the summer and winter in the Andes Mountains, between Chile and Argentina. The mountains average about 15,000 feet elevation, and they are colored yellow in the CERES data, the same as the majority of the USA. Does anybody think 15,000 feet up in the Andes is the same temperature as the USA? Doesn’t matter whether it is winter versus summer or same season, the Andes are much colder, as demonstrated by persons rescued during snowstorms in January (southern hemisphere summer). Interpolation? Extrapolation!

Dean
August 24, 2021 1:38 am

In kriging this is called the nugget effect.

Say your drill hole hits a large nugget, you would be making a large modelling error in assuming the gold quantities from that one hole expand out to your next closest data points. Likewise if your hole just misses a nugget and shows only small quantities of gold.

The data is analysed to see how big nugget effects are and how far from a data point you can interpolate with an acceptable level of confidence. This can change with direction.

Alan M
August 24, 2021 2:34 am

Dean
Not quite correct. In variography it is called the nugget effect; kriging then uses parameters from the variogram in the estimation. The variogram is a measure of variance with distance and the nugget effect is the variance at zero distance, commonly calculated by looking at variance down-the-hole where sample spacing is small. For variography and subsequent estimation (interpolation) to be valid it should only be carried out within zones with similar properties, be that geology in mining or geography in, say, temperature or rainfall estimation.

Alan M
August 24, 2021 4:01 am

To avoid any nit-picking I should have said semi-variogram, but in practical circles it is known as the variogram.
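
For readers who have not met a semi-variogram, the following is a minimal sketch of how an empirical one can be computed from samples, and where the nugget shows up. The data are synthetic (a smooth signal plus uncorrelated noise of variance 1), and the function name `empirical_semivariogram` and the lag bins are assumptions made for this illustration, not anything taken from a geostatistics package.

```python
import numpy as np

def empirical_semivariogram(x, z, bin_edges):
    """Mean of 0.5*(z_i - z_j)^2 over all pairs whose separation falls in each lag bin."""
    i, j = np.triu_indices(len(x), k=1)        # every unordered pair once
    dist = np.abs(x[i] - x[j])
    half_sq = 0.5 * (z[i] - z[j]) ** 2
    which = np.digitize(dist, bin_edges) - 1   # lag-bin index for each pair
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        in_bin = which == b
        if in_bin.any():
            gamma[b] = half_sq[in_bin].mean()
    return gamma

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 100.0, 400))      # sample positions (arbitrary units)
smooth = 3.0 * np.sin(x / 8.0)                 # spatially correlated part
noise = rng.normal(0.0, 1.0, x.size)           # uncorrelated part, variance 1
z = smooth + noise

edges = np.array([0.0, 1.0, 2.0, 4.0, 8.0, 16.0])
gamma = empirical_semivariogram(x, z, edges)
for lo, hi, g in zip(edges[:-1], edges[1:], gamma):
    print(f"lag {lo:4.1f}-{hi:4.1f}: gamma = {g:.2f}")
```

In this sketch the shortest-lag bin does not drop to zero; it sits near the variance of the uncorrelated noise, which is the nugget. The values then rise with lag as the correlated part of the signal decorrelates.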

August 23, 2021 11:57 am

“This is a misuse of the term “interpolation”.”
You have information about some points (or regions) on the sphere. You interpolate to estimate other regions. There isn’t any distinction in which extrapolate makes sense. There is on a line segment when you go beyond the ends, but not on a sphere.

Last edited 28 days ago by Nick Stokes
August 23, 2021 12:44 pm

Nick, let’s consider an extreme case. You have temperature measurements along the equator. Can you interpolate them to the poles?

August 23, 2021 1:11 pm

Look at it from above the pole. You have values on the perimeter of a circle. You estimate a central value. Is that interpolation?

If you think it is sometimes interpolation, sometimes extrapolation, then where is the boundary?

August 23, 2021 3:15 pm

Exactly. You can “interpolate” the average equator temperature to poles.
I don’t think it is an interpolation or an extrapolation. It is a witchcraft masquerading as mathematics.

Last edited 28 days ago by Curious George
August 23, 2021 5:35 pm

“It is a witchcraft masquerading as mathematics.”
It is your invented example. No-one does it.

TimTheToolMan
August 24, 2021 3:33 am

If you have measurements around the edge of a hole and maybe even some measurements a little ways into a hole, you can’t interpolate the depth of the hole. That makes it an extrapolation.

You interpolate to known values. You extrapolate to unknown values and high uncertainty. The difference is in understanding and accepting which is which.

Jim Gorman
August 23, 2021 1:22 pm

You are assuming that the averages of temperatures at two distant points will provide a correct average for something in the middle. The basic assumption is that the daily temperatures between two points and the “middle” will be the same.

“BCBill” pointed out that discontinuities can cause this to not be true. Once you encounter a discontinuity, you simply can not “jump” over it. How do you know that an intermediate point is a discontinuity? You don’t! That is the problem that must be solved prior to interpolating.

Do you know that the intermediate point has temps that average to the interpolation between distant points? You can’t assume that since YOU HAVE NO DATA. The temps could be above or below or the same but you have no way to know.

H.R.
August 23, 2021 8:58 pm

Interpolation generally works just fine on an X-Y plot.

Once Mr. Z comes along, all bets are off.

Ozonebust
August 23, 2021 6:11 pm

Hi Nick
Arctic Amplification is a little understood phenomenon.
Historical methods had no way of recording the true extent of the temperature amplification in the Arctic during El Nino etc.

As an example, consider warm periods such as the well-known 1930s, with records in populated areas still in existence today. Can we expect that the historical records will receive the benefit of interpolation or extrapolation to show the past in a more accurate light? Why? Because now we are aware of it, even if we don’t understand it.

Let’s say an increase of 0.5 to 0.6C. Or has it already been accommodated?

Seems reasonable to me.
Regards

n.n
August 23, 2021 12:21 pm

Yes, they interpolate over closely connected areas and environments, and extrapolate where there are discontinuities, with often catastrophic effect (e.g. transient blocking effects, urban heat islands, sparse data collection over a diverse field).

Jim Gorman
August 23, 2021 4:13 pm

Define “close”. I showed this in another thread. It is an example of how “close” can’t be defined. Look at the line between Hiawatha and Salina. That’s 100+ miles away but close enough to be in the same grid. Define what interpolation function would interpolate from the end points, both with higher temps than the end points, and end up with a lower temperature. Do you apply that function everywhere? How do you choose when you have no data to define what is correct?

The normal report would be what an earlier post said, simply say 85% of the earth appears to be at XX C. No fiddling with data, consists of what you have truly measured, and no need to justify funky math.

bdgwx
August 23, 2021 9:05 pm

The challenge is to estimate the mean temperature of the whole Earth; not 85% of it and certainly not a non-stationary 85% at that. If you can’t provide a mean temperature for the whole Earth then I have no choice but to defer to those that can.

MAL
August 23, 2021 9:17 pm

We only have true measurements for about 3% of the earth, which is all that we humans inhabit. I did a back-of-the-envelope estimate of the share of the earth’s surface that humans inhabit and came up with 4%. I was wrong: we only occupy 3%, and that is what we measure semi-reliably. That 3% of measurement is supposed to show us what happens in the other 97%. What a sad joke. The saddest part is that people take said global temperature seriously!

Tim Gorman
August 24, 2021 2:07 pm

Why is a GAT a challenge? Can you point to a specific latitude/longitude location where I can go measure the GAT? If not then it is meaningless to me. Tell me what is happening locally (which actually is min temps going up thus raising the average) and I will be very interested. If that is happening because of CO2 then bring it on! I planted much of my garden before Memorial Day here in Kansas and I’m *still* harvesting tomatoes, potatoes, cucumbers, green beans, etc. Longest growing season we’ve had for a long time!

bdgwx
August 25, 2021 5:40 am

It’s challenging because of known biases like station moves, instrument changes, time of observations changes, etc. and because observations are sparse in some regions. If you are curious about mean temperature in your region then you can download the gridded data and focus your analysis on your region. And to answer your question…no, I cannot point you to specific lat/lon where the GAT can be measured. It doesn’t work like that.

Tim Gorman
August 26, 2021 12:00 pm

Everything you mention can be handled by properly evaluating the uncertainty in your data and then propagating that uncertainty through to the final result.

And of course the GAT doesn’t work like that! That’s the point. So what is the use of the GAT? If no place exists where the GAT can be measured then how is it the result of a Gaussian distribution where the mean has the highest probability? If it’s not a Gaussian distribution then you can’t use standard statistical tools to analyze it. The central limit theorem wouldn’t apply.

If the GAT isn’t a true value then who cares about it? I certainly don’t!

Trying to Play Nice
August 25, 2021 10:23 am

Since there is nobody who can provide a mean temperature for the whole Earth, then what are you going to do?

bdgwx
Reply to  Trying to Play Nice
August 25, 2021 11:02 am

Several groups provide a global mean temperature and have been doing it for years.

Jim Gorman
August 25, 2021 12:37 pm

You are making a claim with no proof! Why is estimating a mean temperature for the WHOLE earth important? Who says it is? Is the other 15% going to drastically change anything? Why?

Basically you are saying “I have FAITH that others know what they are doing.” You also accept their claim that it is necessary without knowing the necessity.

Bernie1815
August 23, 2021 10:16 am

How is the land 10C cooler than the ocean?

TonyL
August 23, 2021 10:54 am

Just a guess:
What is warm:
The tropics. Here the ocean far outweighs the land in area. Result – Warm ocean.

What is cold:
Antarctica, 100% land. There is no ocean cold like it. Note that the central area of the continent has an altitude of up to 12,000+ ft. This makes it really cold. The Arctic ocean is cold but not that cold. Result – Cold land.

John Tillman
August 23, 2021 11:04 am

Also, SSTs are at sea level. Much of land is high, as with Antarctica.

August 23, 2021 12:49 pm

All water not frozen should come in at 0 C (Or adjust by seawater having a lower freezing point) but no lower. Land next to open water could be -15 C or colder.

OweninGA
August 23, 2021 5:09 pm

Once it is frozen, the air above can get as cold as it likes because there is a big chunk of insulation between the heat in the ocean and the air at the surface. Ice surface can get quite cold.

August 23, 2021 4:54 pm

How is the land 10C cooler than the ocean?

Bernie,
I am going to assume that it’s a Scots how? that leads your question, so I will answer with an English why? explanation.

Put simply the oceans of the Earth are this planet’s central heating system.
Water is a mobile fluid with an especially high heat capacity for such a low molecular weight substance. Liquid water is transparent to incoming high frequency solar radiation so it both absorbs energy in the tropics and transports this captured energy towards the poles where the energy is subsequently lost to space by thermal radiation.

By contrast the land being composed of a static solid does not go anywhere. In addition, the land is not transparent and so the intercepted solar energy is concentrated at the surface. Also being a solid substance, it transmits shear waves. It is this attribute that allows for the coupling between mass and electromagnetic radiation which makes a solid surface an efficient thermal radiator compared with fluid water which cannot transmit shear waves. Consequently, the land loses energy to space more efficiently at night than water does and so the land will always on average (day and night) be colder than a static water body (e.g., a lake) at the same location that experiences the same daytime solar radiation loading.

Russell Klier
August 23, 2021 10:29 am

W, I’m a layman guessing how to fill in the hole, I would find geologically similar features with known numbers and just color them in.

Right-Handed Shark
August 23, 2021 11:17 am

Mickey Mann just called, you got the job!

bdgwx
August 23, 2021 12:14 pm

There are a lot of ways to skin the cat. Even HadCRUTv5’s more advanced gaussian process regression method is still trivial compared to even more advanced methods like 3D-VAR or 4D-VAR assimilation like what ERA uses. Those methods do take geography and hundreds of other parameters into account plus the vastly higher quantity of observations.

Robert of Texas
August 23, 2021 10:33 am

You know better than this… Punching out a rectangle and performing the “experiment” one time does not prove anything. It just gives one false confidence that the method works.

The “experiment” should take out random chunks of data that better represent the real case, and be rerun numerous times using different amounts of missing data (as well as random locations and sizes). You should find cases where interpolation works, where it makes no difference, and where it fails to provide a good result. I would imagine that as the ratio of known data to unknown data is reduced the interpolation goes wildly wrong:

Take the boundary conditions: You measure 100% of the area, therefore interpolation accounts for 0% of the data, and your result is as good as it gets. Next, you do not know 100% of the area, so interpolation is 100% guesswork (starting with educated guesses?) and your result is almost certainly wrong (but there is some tiny percent chance it is right). Now think of what happens as you add in 10% actually measured data and rerun: the result gets better each time you add more data in.

Interpolation works if there is enough surrounding data and transitions are smooth. If that is not the case, you are likely going to make the result less certain.

Now add in the process of homogenization and you have a recipe for really screwed up results. Homogenization assumes transitions should be smooth – not lumpy, so it hides things like the UHI effect. Then interpolate and bingo, you (the generic you) just produced extra warming over a larger area. Congratulations – you are ready now to become a data mangler for climate science.
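
The repeated-trials experiment proposed in this comment can be sketched on synthetic data. Everything here is an assumption made for illustration: the made-up temperature field, the 5-degree grid, the block sizes, the choice of 500 trials, and the fill rule (each missing cell takes the mean of the known cells in its latitude row). It says nothing about how any real dataset, or the patch in the head post, behaves.

```python
import numpy as np

rng = np.random.default_rng(42)
lat = np.arange(-87.5, 90.0, 5.0)      # 36 rows (assumed 5-degree grid)
lon = np.arange(2.5, 360.0, 5.0)       # 72 columns
LON, LAT = np.meshgrid(lon, lat)
truth = 28.0 * np.cos(np.radians(LAT)) ** 2 - 12.0 + 2.0 * np.sin(np.radians(LON))
w = np.cos(np.radians(LAT))            # area weight per cell
true_mean = np.sum(truth * w) / np.sum(w)

def area_mean(field):
    """Area-weighted mean that skips NaN (missing) cells."""
    ok = ~np.isnan(field)
    return np.sum(field[ok] * w[ok]) / np.sum(w[ok])

def random_block_mask(n_rows, n_cols):
    """Boolean mask with one randomly placed rectangular block set to True."""
    r0 = rng.integers(0, lat.size - n_rows + 1)
    c0 = rng.integers(0, lon.size - n_cols + 1)
    mask = np.zeros(truth.shape, dtype=bool)
    mask[r0:r0 + n_rows, c0:c0 + n_cols] = True
    return mask

def one_trial(n_rows, n_cols):
    """Return (|error| when the hole is ignored, |error| after the row-mean fill)."""
    holed = np.where(random_block_mask(n_rows, n_cols), np.nan, truth)
    row_means = np.nanmean(holed, axis=1, keepdims=True)
    filled = np.where(np.isnan(holed), row_means, holed)
    return abs(area_mean(holed) - true_mean), abs(area_mean(filled) - true_mean)

results = {}
for n_rows, n_cols in [(4, 8), (8, 16), (12, 36)]:
    errs = np.array([one_trial(n_rows, n_cols) for _ in range(500)])
    results[(n_rows, n_cols)] = errs.mean(axis=0)
    wins = np.mean(errs[:, 1] < errs[:, 0])
    print(f"block {n_rows}x{n_cols}: mean |err| ignore={errs[:, 0].mean():.3f}, "
          f"fill={errs[:, 1].mean():.3f}, fill better in {wins:.0%} of trials")
```

The per-trial comparison is the interesting part: it reports how often the fill beats leaving the hole empty for each block size, rather than relying on a single hand-picked rectangle.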

Tim Gorman
August 23, 2021 11:05 am

“Interpolation works if there is enough surrounding data and transitions are smooth.”

Transitions are key. Water (e.g. lakes, ponds, irrigation, rivers) vegetation, altitude, terrain, etc. all make big differences. The north side of the Kansas River valley is *never* exactly the same as the south side.

Interpolate the temperature of Pikes Peak from surrounding areas like Denver, Boulder, and Colorado Springs. I don’t care how you interpolate, you will not get it right unless you weight by altitude as well as distance.

“In particular, the land error has gone from 1.4°C to 0.1°C.”

What error? How was the error calculated?
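
The Pikes Peak point above can be put into numbers with a small sketch. The station temperatures are hypothetical, the elevations and distances are rough round figures, and the constant lapse rate of 6.5 °C per km is an assumption (real lapse rates vary with season and weather). The helper name `idw` is also just illustrative.

```python
# name: (elevation in m, rough distance to Pikes Peak in km, hypothetical temperature in C)
stations = {
    "Denver":           (1609.0, 100.0, 20.0),
    "Boulder":          (1624.0, 130.0, 19.5),
    "Colorado Springs": (1839.0,  20.0, 18.0),
}
PIKES_PEAK_ELEV_M = 4302.0
LAPSE_RATE_C_PER_M = 6.5 / 1000.0   # assumed constant lapse rate

def idw(values, distances, power=2.0):
    """Inverse-distance-weighted average of the given values."""
    weights = [1.0 / d ** power for d in distances]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

elev, dist, temp = zip(*stations.values())

# Distance only: the summit inherits a temperature typical of the lower stations.
naive = idw(temp, dist)

# Distance plus altitude: reduce each station to sea level, interpolate,
# then apply the lapse rate back up to the summit elevation.
sea_level = [t + LAPSE_RATE_C_PER_M * z for t, z in zip(temp, elev)]
adjusted = idw(sea_level, dist) - LAPSE_RATE_C_PER_M * PIKES_PEAK_ELEV_M

print(f"distance-only estimate:     {naive:.1f} C")
print(f"altitude-adjusted estimate: {adjusted:.1f} C")
```

With these made-up inputs the distance-only estimate stays close to the surrounding station temperatures, while the altitude-adjusted one is many degrees colder. Whether the adjusted figure is actually right depends on how well a single lapse rate describes the day in question.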

Thomas Gasloli
August 23, 2021 12:23 pm

There is a reason why no other science field allows this creation of “data”. But meteorology has a long list of odd data practices. They don’t even determine daily average temperature correctly (no, it isn’t the average of the highest & lowest temp for that day).

bdgwx
August 23, 2021 12:34 pm

Interpolation is ubiquitous in all disciplines of science. It is a core element of science actually. That is providing insights and understanding of the world around us using imperfect and incomplete data.

ThinkingScientist
August 23, 2021 1:21 pm

Interpolation is often used by people not understanding the consequences of that decision.

bdgwx
August 23, 2021 2:18 pm

Definitely. I think HadCRUTv4 is the classic example. A lot of people don’t realize that its global mean temperature timeseries interpolates a significant portion of Earth’s surface using a method that is known to be inferior to even some of the most trivial alternatives.

ThinkingScientist
August 23, 2021 2:24 pm

As has been repeatedly pointed out to you, HadCrut4 is not interpolated. That’s the whole point.

The point of this debate is whether interpolation improves the estimate of the mean over the complete domain. Continually repeating something incorrect about HadCrut4 makes you look uninformed or ignorant. Generally wise to stop repeating the same error if you want to be taken seriously.

bdgwx
August 23, 2021 2:58 pm

How do you compute the mean of the whole domain with only 85% of grid cells populated without using some form of interpolation?

ThinkingScientist
August 23, 2021 3:15 pm

How do you estimate the mean height of the population without measuring everyone?

In the stationary, independent sample case the mean of the samples is the estimate of the mean of the whole domain. The only further consideration is then the dependency function. This may lead to non-uniform weighting or declustering.

If the estimate of a node is further away from the observations such that the samples can be considered independent of that node being estimated then the kriging estimate tends to a declustered mean of the samples being used.

At that point the estimate of the mean largely depends on the range of the spatial dependency function (traditionally the variogram in geostatistics) and the choice of local samples for the estimate (called the neighbourhood search).

bdgwx
August 23, 2021 5:42 pm

I sample the population. My sample obviously includes the height of a person, but it also includes metadata like whether the person was male/female or other attributes. I then interpolate the unsampled portion of the population using the information in the sample and the knowledge that males correlate more closely to other males, females to other females, like attributes with like attributes, etc. In this manner I can provide a better estimate of the mean height of the population than had I just assumed (extrapolated) that the unsampled portion of the population behaves like sampled portion. This is particularly powerful when you know your sampling method is biased like would be the case if you had 75%/25% males/females. The point…interpolation is powerful and can be used in a wide variety of situations including the estimation of the mean height of a population.
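
The height example in this comment amounts to reweighting group means by known group shares. A minimal sketch, assuming a made-up population (50/50 male/female, invented mean heights and spreads) and assuming the true group shares are known from some other source:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: roughly 50% male / 50% female, heights in cm (made-up parameters).
n = 100_000
is_male = rng.random(n) < 0.5
height = np.where(is_male, rng.normal(176.0, 7.0, n), rng.normal(163.0, 6.5, n))
true_mean = height.mean()

# Biased sample: 750 males and 250 females.
males = rng.choice(np.flatnonzero(is_male), 750, replace=False)
females = rng.choice(np.flatnonzero(~is_male), 250, replace=False)
sample = np.concatenate([males, females])

# Option 1: plain average, treating the unsampled people like the sampled ones.
naive = height[sample].mean()

# Option 2: average within each group, then weight by the known population shares.
share_male = is_male.mean()
stratified = share_male * height[males].mean() + (1 - share_male) * height[females].mean()

print(f"true mean       {true_mean:.2f} cm")
print(f"plain average   {naive:.2f} cm")
print(f"reweighted      {stratified:.2f} cm")
```

The reweighted estimate only helps because the grouping variable is recorded for every sampled person and the population shares are known. Without that metadata, the plain average is all that is available.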

ThinkingScientist
August 24, 2021 1:15 am

Q: If you only measure a subset of the population and nothing else, how do you estimate the height of the whole population?

A: Your estimate from the subset is your estimate of the mean of the population providing you satisfy the condition of random sampling (and a few other prior assumptions).

You don’t even appear to have a basic grasp of the underlying idea of statistical inference.

bdgwx
August 24, 2021 5:27 am

If I have no metadata then I’m left with no other option than to sample the population and take the strict average. It would be the same problem with gridded data if you denied me the position and size of the cells and the nature of the field I’m averaging.

Fortunately, with a gridded temperature field we know the position and size of the cells and we know that temperature fields are constrained in a lot of ways all of which we can exploit to better estimate the average of the population given a smaller sample.

ThinkingScientist
August 24, 2021 6:17 am

Anomalies are not. You seem to think that you can estimate some non-stationary component of the anomalies over the entire lat/long domain in each time slice and that this will give you a better estimate of the temperature trend over time. You can’t do that.

bdgwx
August 24, 2021 7:24 am

I’m not sure what you mean by “anomalies are not”. Anyway, I want to spell out what I think in no uncertain terms. I think you can estimate the global mean temperature from gridded values better when you interpolate the unpopulated cells in a more robust manner than just assuming they behave like and inherit the average of the populated cells. And if you repeat the procedure for each time slice then your estimate of the global mean temperature trend will be better as well. And it’s not just me that thinks this. Everyone who provides a global mean temperature dataset thinks this including the Hadley Centre.

Clyde Spencer
August 24, 2021 8:07 am

If you use interpolated values in calculating an average, then you are effectively weighting the end-points more heavily because they are used in creating the interpolated value. The result would be the same whether one used interpolated values or just added a weight to the surrounding points used for interpolation.

bdgwx
August 24, 2021 8:41 am

I’m not sure what you are trying to say there. There are many different ways to perform the interpolation. Some are better than others.

Clyde Spencer
August 24, 2021 4:11 pm

I’m saying that using an interpolated value is functionally equivalent to weighting the actual real data points in a manner that they are used for calculating the interpolated value. Interpolated values are not real data points, they are artifacts derived from actual data.
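
The equivalence described here holds for any linear interpolation scheme and can be checked numerically. The sketch below uses a hypothetical five-cell transect with made-up temperatures and plain linear interpolation via `np.interp`; the weight matrix `W` is built by interpolating unit vectors, which is an illustrative device rather than anything a real gridding code necessarily does.

```python
import numpy as np

# Hypothetical 1-D transect: five grid cells, measurements only at x = 0, 1 and 4.
x_known = np.array([0.0, 1.0, 4.0])
t_known = np.array([10.0, 12.0, 18.0])    # made-up temperatures
x_grid = np.arange(5.0)                   # cells at x = 0, 1, 2, 3, 4

# Route 1: fill the empty cells by linear interpolation, then average all five cells.
t_grid = np.interp(x_grid, x_known, t_known)
mean_with_fill = t_grid.mean()

# Route 2: work out how much weight each real measurement carries in that average.
# Column j of W is the interpolated response to a unit value at measurement j.
W = np.column_stack([np.interp(x_grid, x_known, e) for e in np.eye(len(x_known))])
effective_weights = W.mean(axis=0)
mean_from_weights = effective_weights @ t_known

print("effective weights:", effective_weights)
print("mean via filled grid:   ", mean_with_fill)
print("mean via weights alone: ", mean_from_weights)
print("plain mean of the data: ", t_known.mean())
```

Both routes give the same number, so the filled cells add no information of their own. What the interpolation does change is the weighting: the measurements bordering the gap count for more than they would in a plain average of the three readings.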

Carlo, Monte
August 24, 2021 8:00 am

Who are “we”?

Clyde Spencer
August 24, 2021 8:02 am

A problem to consider is that temperatures are often NOT constrained as you suggest. Consider that with a cold front, there might be tens of degrees change across a few miles. Cold fronts are not a rare occurrence. Rather, they are a feature of weather. Similarly, temperatures downwind of a city will be elevated, and downwind of a lake or forest, depressed. Again, these are common occurrences!

bdgwx
August 24, 2021 8:51 am

Yes. The temperature field can present with areas of steep gradients. That doesn’t mean the temperature field isn’t constrained or that those constraints cannot still be exploited. For grids with high spatial and temporal resolution, in which the gradients can present more steeply and be more troublesome, more robust methods are used.

Clyde Spencer
August 24, 2021 4:13 pm

What I had in mind was the situation where the distance between weather stations was much greater than the distance where there was a steep gradient. You will not get good results interpolating in such a situation.

Tim Gorman
August 23, 2021 4:13 pm

You calculate the mean of what you know and admit you don’t *know* any more than that! This has been pointed out to you before.

bdgwx
August 23, 2021 5:27 pm

So how do you estimate the global mean temperature then?

MAL
August 23, 2021 9:22 pm

You don’t. It’s a fool’s errand; only fools think they can come up with a number that matters. All you can do is go by satellite data, which has its own problems.

bdgwx
August 24, 2021 5:33 am

Willis was able to do it. In fact, he came up with two methods. One had an error of -0.6C and another +0.1C.

Clyde Spencer
August 24, 2021 8:11 am

An error of the mean only. He did not address the change in standard deviation. That is, by interpolating he probably increased the uncertainty of the mean and thus the precision with which it should be represented. It is not unlike the Heisenberg Uncertainty Principle. There are tradeoffs in determining position and velocity of a particle.

bdgwx
August 24, 2021 8:35 am

Both methods employed interpolation. The first used a strategy that assumes the unpopulated cells inherit the average of the populated cells. The second used a more robust localized strategy the details of which I’m unfamiliar with. Both result in increased sampling uncertainty versus having all of the cells populated to begin with.

Lrp
August 24, 2021 12:37 am

You don’t

bdgwx
August 24, 2021 5:32 am

You can’t think of any way to do it?

Nicholas McGinley
August 23, 2021 5:46 pm

There is only one problem, with every thing else being discussed a consequence of it: There are people who are in charge of this all, who have a predetermined outcome, and they are forcing all information to comply with what they decided ahead of time must be the case.

Bill Rocks
August 24, 2021 8:34 am

Yes. The Ruling Theory problem. Can be intentional or unintentional.

Tim Gorman
August 23, 2021 3:31 pm

Someone else posted this observation, I’m just reposting the logic.

Interpolation *usually* involves two endpoints that are related functionally. E.g. Interpolating between two markings on a ruler or on a smoothly varying curve.

Interpolating temperatures between two different geographical locations doesn’t fit this restriction. There is no guarantee that a mid-point between two points on the Earth has any functional relationship to either of them.

If, as you say, 3-D and 4-D functions can be developed for a location between two other locations then perhaps a closer interpolation can be developed. The amount of data necessary for this would be huge. Since it would be non-stationary for many factors (e.g. humidity, wind, weather fronts, etc) how you would factor these in would seem to be impossible.

The amount of computer data just for the area around a lake is huge for a topographical map of the region. There are companies that exist to solely create these data sets and they get a lot of money from users – e.g. power companies, regulatory agencies, etc.

Are you saying this is being done for all data sets used to come up with the GAT? If not then you are posting red herrings.

Streetcred
August 23, 2021 4:24 pm

The Australian BoM can interpolate / extrapolate anything to fit its agenda.
https://kenskingdom.wordpress.com/2020/11/23/acorn-mishmash-part-1-they-cant-all-be-right/

bdgwx
August 23, 2021 6:34 pm

Only reanalysis datasets use variational data assimilation schemes (3D-VAR, 4D-VAR, etc). The Copernicus dataset is an example.

Jim Gorman
August 23, 2021 5:45 pm

Interpolation is ubiquitous in all disciplines of science. It is a core element of science actually.

Sorry duce but that just isn’t true except maybe in the new pseudoscience. Would you want your surveyor interpolating your property line from a couple of distant points? How about flying to a planet where the scientists have interpolated all the necessary navigation points from points not on the path? Would you fly in a fighter plane where envelope boundaries were interpolated from other data points? How about interpolating focusing beams in a collider from just a couple of distant reading? How about a quality engineer certifying the length of a rod from measures of rods made before and after a process? How about toxicologists interpolating a drug concentration from previous tests, rather than doing the proper calibration procedure and test procedure?

Scientists don’t rely on interpolating data to points without data. You betray your lack of education in physical science. This isn’t statistics or probability, it is real, touchable, reality.

nyolci
August 23, 2021 11:47 pm

You are advertising your stupidity again. bdgwx referred to the well known fact that you couldn’t measure all points, so you would necessarily interpolate. You are the resident expert of metrology, you should know this. Your examples are beyond stupid on multiple levels.

Surveyors establish certain points and “interpolate” between them with straight lines. They don’t use a “couple of distant points”. The whole thing is about using the appropriate number of actual measurements and the proper interpolation method. It’s not like some ad hoc points and some made up method, you genius.

How about flying to a planet where the scientists have interpolated all the necessary navigation points from points not on the path?

Navigation charts before the satellite era were made by triangulation, where they established the coordinates of certain points and interpolated in between. Actually, in the satellite era, you use something similar but with a much denser grid.

How about interpolating focusing beams in a collider from just a couple of distant readings?

What a stupid counter-“example”… What the hell is “interpolating focusing” supposed to be? The rest are the same in quality. Gee, you (and your evil twin) are disappointing. Again.

Jim Gorman
August 24, 2021 10:14 am

Every response you make proves nothing. As to surveying, look what you did: used other known data points. Your satellite/triangle example again involves additional data points. In all cases you are assuming UNVARYING space between points. Temperatures, as I have indicated, are not FIXED between points.

nyolci
August 24, 2021 10:56 am

Changing topics? You were bullshitting about how interpolation was not used, with examples I immediately showed to be complete bollocks. Now you bullshit about “additional data points”? Cartographers make a grid of reference points with well known coordinates, everything in between is interpolation. What the heck are you talking about?

In all cases you are assuming UNVARYING space between points.

Again deflecting and bullshitting. No one (except for you, genius) assumed unvarying space, and no one thinks space is unvarying. Continental drift etc. was already measured before the satellite era, to stick to the actual example. For that matter, I can hardly think of anything that is unvarying. Including the weight of an anvil ‘cos it does change with the scratches and deposits (like dirt) it gains, and the slightly different gravitational field at different places (we are talking about weight not mass here before you shit your pants in your excitement).

Temperatures, as I have indicated are not FIXED between points.

This is your favourite dead horse in the denier gish gallop. Again, no one thinks temperature is fixed between points. Interpolation is a mathematical method for gaining a (usually very good but almost always good enough) estimate that is much better than the “default” estimate. Which, as pointed out by knowledgeable people here, is the average of the known data points. A much worse interpolation.

Tim Gorman
August 24, 2021 2:17 pm

Does continental drift affect surveyed property lines? Can you *measure* the difference in weight of a 100lb anvil whose surface gets marred by a hammer?

Two points 100 yards apart on a survey line *are* fixed points. The temperature gradient between two points 5 miles apart is not fixed. Trying to interpolate the temperature on one side of a lake to the rest of the property around the lake is a losing proposition unless you take terrain into account.

Can *YOU* interpolate the temperature on top of Pikes Peak from the temperature in Denver and Colorado Springs?

nyolci
August 25, 2021 12:12 am

Does continental drift affect surveyed property lines?

??? Is this supposed to be a counter argument? Really? FYI isostatic rebound, a phenomenon similar in magnitude to continental drift, DOES affect surveyed property lines, this is a very real problem in Scandinavia. For the anvil example I was once working with very special scales that would’ve been able to measure the difference.

The temperature gradient between two points 5 miles apart is not fixed. Trying to interpolate the temperature on one side of a lake to the rest of the property around the lake

And what? No one said it would be accurate, regardless of the applied local interpolation method. It would be an approximation, a much better one than the implicit approximation of leaving the missing grid points empty. Because that is also an interpolation but with a much worse interpolation method. This is the thing you should grasp at last.

is a losing proposition unless you take terrain into account.

The “terrain” you speak about is also mostly the result of interpolation, you genius. You should be able to grasp it again. We are working with approximations of various accuracy everywhere.

Can *YOU* interpolate the temperature on top of Pikes Peak from the temperature in Denver and Colorado Springs?

Yes, of course I can. But a question. You are Tim, and I was talking to Jim. Perhaps you accidentally flipped character? Or there are really two of you? You are the kinda twins who finish the sentence the other one’s started?

TonyG
August 25, 2021 7:34 am

Can *YOU* interpolate the temperature on top of Pikes Peak from the temperature in Denver and Colorado Springs?

Yes, of course I can.

Please share how you would do this. I am genuinely curious how you would get an accurate result.

nyolci
August 25, 2021 7:49 am

I am genuinely curious how you would get an accurate result.

No one has ever said it’s accurate. That’s the whole point, you idiot. Even with simple local interpolation the error can be dramatically reduced as compared to the default interpolation method.

TonyG
August 25, 2021 4:32 pm

No one has ever said it’s accurate. That’s the whole point, you idiot

Your response speaks volumes to your character nyolci. I asked a genuine question and you respond with a personal insult. Nice.

So let me rephrase: How would you go about interpolating the temperature at Pike’s Peak from the temperatures at Denver and Colorado Springs and derive any useful value from it?

Are you going to insult me again or are you going to actually answer the question?

Tim Gorman
August 25, 2021 9:04 am

nyolci
August 25, 2021 10:18 am

Why? Do you really think I can’t come up with some kinda interpolation? What a pathetic idiot you are…

Tim Gorman
August 26, 2021 11:56 am

You haven’t yet. You haven’t told us how to interpolate the temperature at the top of Pikes Peak from the temperature in Denver and Colorado Springs.

bdgwx
August 25, 2021 9:33 am

The variational data assimilation techniques like 3D-VAR and 4D-VAR can handle this quite well at least on grids with a high enough resolution to capture the high topography and field gradients typical of mountain environments.

Tim Gorman
August 25, 2021 2:08 pm

TonyG
August 25, 2021 4:33 pm

The variational data assimilation techniques like 3D-VAR and 4D-VAR can handle this quite well at least on grids with a high enough resolution

Elaborate, please. Also, please stay specific to the question, which is interpolating Pike’s Peak from Denver and Colorado Springs. (Otherwise we’re not discussing the same topic.)

nyolci
August 26, 2021 5:45 am

No need for elaboration. Pls check relevant literature. This is the beauty of science (as opposed to science denial), you have all the shit already researched and developed.

TonyG
August 26, 2021 8:57 am

“No need for elaboration.”

nyolci
August 26, 2021 1:41 pm

This is a forum. We don’t want to spend valuable time for proving something to idiots that is well described in literature. Go learn, you obviously have to learn. Actually, I’m pretty sure you are not even really interested in this, and most probably you wouldn’t understand the answer either. You simply tried to push an idiotic point here. No, we don’t have to prove you anything. By the way, the bullshit of yours “otherwise we’re not discussing the same topic” is evidently false too.

TonyG
August 26, 2021 1:55 pm

nyolci,

Some of us are simply laymen following what we can. Many of us don’t even know where to begin to look. I asked an honest question and you responded with personal insults. I tried to clarify and you dismissed my question. You responded exactly as I expected: by refusing to answer my question or provide any more information that I MIGHT have been able to use to learn anything, and when I called you out on that you respond yet again with personal insults.

This is what I have found to be the case almost EVERY time I try to engage in good faith with AGW believers. Insults and dismissal, NEVER any actual answers.

It seems to me that if you actually had answers, you would be able to answer simple questions. But you don’t. You have no desire to convince anyone of anything. You do not engage in good faith. You exist on this board only to spew your bile and attack those who don’t see things the same way.

Thanks for reminding me.

As a postscript, to lurkers: Ask yourself why the bile and refusal to discuss. I think this exchange speaks volumes about this entire debate.

nyolci
August 27, 2021 2:40 am

Some of us are simply laymen following what we can

While I’m sure you’re a simple layman, I sincerely doubt the honesty and innocence of your request, for multiple good reasons. This request of yours is a very typical attempt to derail a conversation with an irrelevant topic, and I’m pretty sure quite a few of you can come up with a good set of “fallacy” names to describe this. Furthermore, I’m pretty sure even you know it well. The general tone of your post is very indicative. Just as is the fact that a simple internet search immediately gives you a fairly good, layman-level introduction (wikipedia) to 3d-var and 4d-var data assimilation, and quite a few studies for any deeper understanding. If you really wanted to learn you could. And in this age it is kinda expected that you do your homework.
So again. This whole question is completely irrelevant to the topic, and your pushing it is a deliberate attempt to derail the conversation. Furthermore:

This is top level dishonesty. These are, of course, not simple questions, you have to do a lot of calculations etc, and all for a bullshit question that’s irrelevant to the question at hand.
But there’s more. Even if we answered you, you would likely misunderstand it and/or come up with yet another bullshit request etc. You deniers have an exceptional talent for misunderstanding the simplest thing. Well, no wonder science denial is a symptom of your “condition”. And this is true for people with some degree here, like Tim (and/or Jim?) Gorman, who has a laughably bad grasp on error, accuracy, variance, etc, and very often the very sources he refers to directly contradict his claims.

TonyG
August 27, 2021 7:04 am

The general tone of your post is very indicative

It most certainly is:

That’s the whole point, you idiot.

What a pathetic idiot you are…

I’m pretty sure you are not even really interested in this, and most probably you wouldn’t understand the answer either.

You simply tried to push an idiotic point here

I sincerely doubt the honesty and innocence of your request

This is top level dishonesty

Well, no wonder science denial is a symptom of your “condition”

nyolci
August 27, 2021 1:28 pm

That’s the whole point, you idiot.

Yep, my mum told me “always tell the truth without hesitation”.

Tim Gorman
August 26, 2021 2:38 pm

Your entire statement is an argumentative fallacy known as Argument by Dismissal. It’s a fallacy that middle school debaters learn to challenge.

nyolci
August 26, 2021 11:05 pm

Your entire statement is an argumentative fallacy known as Argument by Dismissal

Ugh, really? 🙂 In a debate, where you are constantly coming up with fallacy after fallacy? You cannot be this stupid. Even if it was true (but not) that we cannot do the interpolation exercise, it has no relevance whatsoever to the fact that the (locally) interpolated grid is better than the “default”. This is the original point of Willis’ article.

Tim Gorman
August 27, 2021 8:25 am

You continue to show your ignorance. An argumentative fallacy is not the same as an untrue assertion.

You made no assertion of fact. You provided no evidence of any fact. You just dismissed the assertion you disagree with. Argument by Dismissal.

nyolci
August 28, 2021 1:14 am

“An argumentative fallacy is not the same as an untrue assertion.”

bdgwx
August 26, 2021 6:32 pm

I worked up a couple of lengthy posts and each time I realized the explanations of variational data assimilation are complex and I don’t think I can do it justice in a forum like this. However, I would like to provide a good faith explanation of the salient points. To do that I want to more narrowly focus my response.

Are you asking about Pike’s Peak and Colorado Springs because of the elevation difference, the distance between the two, the weather phenomenon typical of the region, or something else?

Also, how much do you already know about 3D-VAR and 4D-VAR? Are you curious about the difference between 3D-VAR and 4D-VAR?

TonyG
August 27, 2021 7:19 am

bdgwx, THANK YOU. I appreciate the good-faith effort and civility.

Are you asking about Pike’s Peak and Colorado Springs because of the elevation difference, the distance between the two, the weather phenomenon typical of the region, or something else?

I’m asking about it because (a) it was the specific example used and (b) given the other factors you list, I don’t see how you could interpolate anything useful given those parameters. N. claimed that he could and I just wanted to see how. (Of course, that was then immediately walked back to “No one has ever said it’s accurate.” – which then makes no sense: what’s the point if it’s not accurate?)

Absolutely nothing. I had not even heard about it prior to this discussion. I can probably follow along to at least a reasonable degree, especially if given a specific example.

bdgwx
August 27, 2021 9:47 am

So the salient point about 3D-VAR and 4D-VAR is that they incorporate vast amounts of information to provide the best fit possible for many fields including but not limited to the surface temperature field. The information they assimilate can be static like land use and topography, semi-static like seasonal agricultural/vegetation/snow/ice changes, and dynamic like the thermodynamic, mass, and momentum state of the atmosphere. They use this information to constrain the possible values that a field can take on. They can handle Pikes Peak and Colorado Springs (or any area) well because they know how the temperature (or any field) is changing horizontally, vertically, and temporally.

The primary difference between 3D-VAR and 4D-VAR is that the latter assimilates observations with timing information attached over a window of time, say from T-3hrs to T+3hrs, to produce its analysis at time T. It then evolves the fields in the time dimension in this window to best fit the fields at time T. Unsurprisingly 4D-VAR results in lower root mean squared errors than 3D-VAR.

The downsides to 3D-VAR and especially 4D-VAR are that they are very complicated, very expensive to operate, and very computationally demanding. There are entire graduate degrees devoted to learning about them. They can cost hundreds of millions of dollars to operate. And they require the world’s most powerful supercomputers.

Although there are at least a dozen of these datasets in use ERA5 is said to be the best.

Jim Gorman
August 27, 2021 12:40 pm

You talk like the algorithm knows something about a variable outside of what is programmed.

We are discussing unmeasured locations and there is no data to validate the estimation. Do you have a personal weather station? Do you belong to Weather Underground? Do you understand the vagaries of temperature? Programming a constant algorithm is a perfect way to ensure large errors.

TonyG
August 27, 2021 12:45 pm

“They can handle Pikes Peak and Colorado Springs (or any area) well because they know how the temperature (or any field) is changing horizontally, vertically, and temporally.”

So that’s where I’m having the problem understanding this: HOW do they “know”?

If it isn’t actually measured, then what data is used to determine that what they “know” (I assume you mean “programmed for”) is correct?

No matter how you go about it, anything that isn’t based on actual data is a guess. It might be a good guess, or it might be a bad guess, but it’s still a guess.

nyolci
August 27, 2021 1:35 pm

No matter how you go about it, anything that isn’t based on actual data is a guess. It might be a good guess, or it might be a bad guess, but it’s still a guess.

This is what I was talking about. You misunderstood this (or pretend to have misunderstood). Again, slowly: no one has claimed it is accurate. It will be a very good guess, giving a much smaller error than simply using the global average here (the default interpolation method). Actually, if you use a simple linear interpolation, you are still much better off than the default. Those methods above are extremely complicated both computationally and in their need of inputs.

bdgwx
August 27, 2021 4:51 pm

“So that’s where I’m having the problem understanding this: HOW do they “know”?”

A bunch of observations and physical laws. These variational data assimilation systems are incorporating about 250 million observations per day now (and that may actually be an underestimate). The 2m temperature field is not formed from direct temperature measurements alone.

“If it isn’t actually measured, then what data is used to determine that what they “know” (I assume you mean “programmed for”) is correct?”

Anything and everything you can think of and then a bunch of stuff you probably never even imagined. Here is an incomplete list of some of the observations incorporated into GDAS.

No matter how you go about it, anything that isn’t based on actual data is a guess. It might be a good guess, or it might be a bad guess, but it’s still a guess.

Yeah. Absolutely. Though even this topic is nuanced. As an example, if you know the pressure and volume of a parcel of the atmosphere you can calculate the temperature with the ideal gas law. Or given geopotential heights at two different pressure levels you can calculate the average temperature in the layer with the hypsometric equation. In that regard temperatures obtained in this manner are not guesses at all. What is unique about 3D-VAR and 4D-VAR that traditional interpolation schemes lack is the laws of physics and a model of how the land, atmosphere, and hydrosphere behave, and they exploit this information to constrain the fields they construct. Unsurprisingly the VAR schemes are more skillful, resulting in lower root mean squared errors than even the most robust traditional interpolation scheme by a wide margin.
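To make the two calculations mentioned above concrete, here is a minimal sketch in Python. It assumes dry air, and the input numbers (a 1 m³ parcel holding 1.225 kg of air at sea-level pressure, and a 5400 m thickness between 1000 hPa and 500 hPa) are illustrative values rather than anything taken from this thread. Note that the ideal gas law also needs the mass of air in the parcel, which the function makes explicit.

```python
import math

R_D = 287.05   # specific gas constant for dry air, J/(kg K)
G0 = 9.80665   # standard gravity, m/s^2


def parcel_temperature(pressure_pa, volume_m3, mass_kg):
    """Ideal gas law for dry air, p * V = m * R_d * T, solved for T in kelvin."""
    return pressure_pa * volume_m3 / (mass_kg * R_D)


def layer_mean_temperature(z_lower_m, z_upper_m, p_lower_pa, p_upper_pa):
    """Hypsometric equation solved for the layer-mean temperature in kelvin."""
    thickness_m = z_upper_m - z_lower_m
    return G0 * thickness_m / (R_D * math.log(p_lower_pa / p_upper_pa))


# Illustrative inputs only (assumptions, not measurements from the thread).
t_parcel = parcel_temperature(101325.0, 1.0, 1.225)
t_layer = layer_mean_temperature(0.0, 5400.0, 100000.0, 50000.0)

print(f"parcel temperature: {t_parcel:.1f} K")
print(f"1000-500 hPa layer mean temperature: {t_layer:.1f} K")
```

With these inputs the parcel comes out near 288 K and the layer mean near 266 K. Neither number was measured with a thermometer; both follow from other measured quantities plus a physical law.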

TonyG
August 29, 2021 8:37 am

bdgwx, again I thank you for the response, and for the link to that list.

I think the sticking point for me is that (as you even admit) it remains a guess. From what you’re saying, it’s a well-informed one, but still a guess. My next question then would be (and maybe you can answer) – how much has this guessing program been tested against verified data (i.e. used to interpolate a temperature for a location that we have data for) and what is its record doing so? That would at least tell us how useful its results are.

bdgwx
August 29, 2021 3:37 pm

That’s actually a good way of describing it. Interpolation is well-informed guessing. And different interpolation schemes have different levels of being informed.

One of the best ways of testing different interpolation schemes is to do data denial experiments like what Willis did in this blog post.

xD-VAR has a good track record. They are the only interpolation schemes robust and accurate enough to be used in the most demanding applications. As you probably know, numerical weather prediction is incredibly sensitive to the initial conditions. Even the slightest inaccuracies in the initial conditions cause forecasts to spiral out of control rapidly. xD-VAR are the only schemes in use today because they are the only schemes robust and accurate enough to even make it possible for numerical weather prediction to be as successful as it is today.

A pretty neat demonstration of just how robust these xD-VAR schemes can be is that they produce fields accurate enough that realistic simulations of individual thunderstorms can be performed from 100 years ago when the observational network was extremely sparse relative to today. See Becker 2013 as an example.

TonyG
August 30, 2021 8:17 am

bdgwx: Thank you for the link and for entertaining my questions.

I still have misgivings but I don’t have any more specific questions at this time – I need to review what you’ve provided, which will likely take some time. Probably catch you again some time down the line.

TonyG
August 27, 2021 12:48 pm

BTW, thank you for the response. I appreciate the effort. Your explanation helps me to understand what you’re trying to do with these interpolations, even if I still don’t quite buy it.

Tim Gorman
August 27, 2021 3:34 pm

They can handle Pikes Peak and Colorado Springs (or any area) well because they know how the temperature (or any field) is changing horizontally, vertically, and temporally.

Not with a 60km grid and a 12 hour window. That’s about 40 miles. The temperatures, wind, humidity, pressure, etc can vary significantly from the south side of the Kansas River valley to the north side of the Kansas River valley, a distance of 20 miles. Their grid won’t even come close to measuring those differences.

ERA5 may be better than past systems but it is still far from adequate for developing a GAT. Mostly because there is no GAT for them to measure!

bdgwx
August 27, 2021 7:14 pm

I was actually thinking of the HRRR grid when I made my post above.

Tim Gorman
August 27, 2021 8:35 am

Are you asking about Pike’s Peak and Colorado Springs because of the elevation difference, the distance between the two, the weather phenomenon typical of the region, or something else?

All of it. I am an engineer by trade. It’s been a long time since I’ve done vector analysis but I at least remember the basics.

You are trying to evaluate a vector field with at least four dimensions and most likely more, e.g. humidity and pressure where humidity = f(z,p,t) and pressure = f(z,t)

T = f(x,y,z,t,h,p,….) where f is a vector function

In order to evaluate T_1 vs T_2 you need to know the gradient and curl of each of the contributing independent variables.

The point being is that none of the GAT calculations that I have seen do anything more than use T = f(x,y).

So just how accurate can anything that is not actually measured be if the other independent variables contributing to the vector field are unknown (and likely unknowable because they are not measured at all!).

Tim Gorman
August 25, 2021 8:50 am

An inaccurate interpolation is worse than no interpolation.

If you don’t infill a grid then by default you assume it is the average of the known grids. And that is basically what you wind up with for a global average – you have to assume everyplace is the average. If you don’t then the GAT is of no use whatsoever.

nyolci
August 25, 2021 10:17 am

If you don’t infill a grid then by default you assume it is the average of the known grids

You accidentally got something right… Yes. By the way, this is interpolation too. With a very bad interpolation method.
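The point both commenters agree on here can be checked with a few lines of arithmetic. This is a minimal sketch with made-up cell temperatures and equal-area cells (both assumptions for illustration): averaging only the known cells gives exactly the same number as infilling every empty cell with the known-cell average and then averaging everything.

```python
from statistics import mean

# Hypothetical temperatures (deg C) for the cells that have data.
known_cells = [14.0, 9.5, 21.0, 17.5]
n_empty_cells = 2  # cells with no data at all

# Option 1: leave the empty cells out of the average.
avg_ignoring_empty = mean(known_cells)

# Option 2: fill each empty cell with the known-cell average, then average all cells.
infilled_cells = known_cells + [avg_ignoring_empty] * n_empty_cells
avg_after_mean_infill = mean(infilled_cells)

print(avg_ignoring_empty, avg_after_mean_infill)
```

Both averages are identical, which is why leaving cells empty is sometimes described as an implicit infill with the average of the populated cells.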

Gary Pearse
August 23, 2021 3:36 pm

Mining industry measuring reserves, and geological mapping do just this sort of thing and very effectively. You can know “other things” about the situation you are dealing with to guide you. I have a longer post on this:

MAL
August 23, 2021 9:24 pm

Yep, they do, and sometimes they come up with nothing. Thank God most of the time they don’t use government money.

ThinkingScientist
August 23, 2021 11:51 pm

Mining reserves are determined by conditional simulation, not by kriging.

Kriging is a part of conditional simulation, but reserve estimation is different to the temperature problem because it involves cutoffs.

Dean
August 24, 2021 1:44 am

Kriging is very commonly used in developing mining models.

It’s one of the methods we have for taking data points in XYZ from drill holes and building a geological model.

Then mining models are built over that.

One big difference is the model builders are financially responsible for their models. Get it wrong enough and you are finished.

ThinkingScientist
August 24, 2021 2:41 am

There is a difference between estimating a point value of something with an associated uncertainty and estimating reserves. We use kriging for estimating point (or block) value. In mining they use a change of support as well – block kriging.

This approach is appropriate for temperature – although really we should use an area-weighted form on the grids if they are on 5×5 lat/long.

However, reserve estimation depends on the introduction of a cutoff, a commercial minimum grade that determines whether you mill the extracted block or send it to the waste heap. In this situation kriging is biased and so you use conditional simulation instead to estimate reserves.

Danie Krige understood that there was a systematic bias when cutoffs are used to determine ore/waste. Danie’s ellipse is a diagram that explains why. The gold grade of the waste is, on average, systematically higher than expected and the gold grade of the recovered ore systematically lower than expected because of this bias – because the cutoff is generally lower than the mean of the samples and the ore we are interested in is comprised of higher grade samples.
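The bias described above can be reproduced with a toy simulation. This is only a sketch of the selection effect, not conditional simulation and not kriging: it assumes each block has a true grade drawn from a normal distribution, that the estimate used for the ore/waste decision is the true grade plus independent noise, and that the cutoff (4.5) sits below the mean grade (5.0). All of those numbers are hypothetical.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
CUTOFF = 4.5             # hypothetical commercial cutoff grade, below the mean

# Each block: (noisy estimate used for the decision, true grade).
blocks = []
for _ in range(20000):
    true_grade = rng.gauss(5.0, 1.0)
    estimate = true_grade + rng.gauss(0.0, 1.0)
    blocks.append((estimate, true_grade))


def average(values):
    values = list(values)
    return sum(values) / len(values)


ore = [(e, g) for e, g in blocks if e > CUTOFF]
waste = [(e, g) for e, g in blocks if e <= CUTOFF]

ore_expected = average(e for e, _ in ore)       # what the estimates promise
ore_actual = average(g for _, g in ore)         # what the mill receives
waste_expected = average(e for e, _ in waste)   # what the estimates say was dumped
waste_actual = average(g for _, g in waste)     # what was actually dumped

print(f"ore:   expected {ore_expected:.2f}, actual {ore_actual:.2f}")
print(f"waste: expected {waste_expected:.2f}, actual {waste_actual:.2f}")
```

Under these assumptions the recovered ore is poorer than the estimates promised and the waste is richer than the estimates said, which is the direction of bias described above.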

ResourceGuy
August 23, 2021 10:38 am

I hear the Sierra Nevada upland is pleasant this time of year.

Randle Dewees
August 23, 2021 3:37 pm

When it isn’t covered in wildfire smoke

Rud Istvan
August 23, 2021 10:46 am

Interpolation works in this case because the Ceres data is ‘homogeneous’ in the sense that it is all ‘the same Ceres’. Interpolation (aka infilling) doesn’t work well on something like land station meteorological records because they are inherently inhomogeneous. And using anomalies to make them trend comparable does not remove the underlying inhomogeneity.

So the comment to and fro arguments cited at the outset lack important context making them both true or not true depending…

Rud Istvan
August 23, 2021 5:54 pm

WE, always good to get your response. Yup.
But The BE stuff is in principle no different than Ceres, because otherwise homogenized first. See footnote 25 to Blowing Smoke essay ‘When Data Isnt’ for the ridiculous BEST station 166900 specific ‘QC’ example. Hint—warms by algorithm the most expensive and best maintained station on Earth, Amundsen-Scott at the geographic South Pole.

Ron
August 23, 2021 8:47 pm

So the punchline is: homogenized data in and then interpolation works more likely.

HadCRUT4 should therefore not yield as good results.

August 23, 2021 10:49 am

I love the oft used term “synthetic data.”
I love it because it tells me something about the user they probably don’t realize, in the same way that using “carbon pollution” informs me of the user’s analytic skills.

Note: I realize WE did NOT use that term here. But that is what his infilling method did make. I see that term used a lot, though, by Dark Art practitioners of CliSci.

Last edited 28 days ago by joelobryan
n.n
August 23, 2021 12:28 pm

Useful in a model or hypothesis, as a high-level overview (an executive summary), but, with rare exception, a progressive (e.g. cascading, cumulative) catastrophic failure in the real world outside of a limited frame of reference.

Ron
August 23, 2021 8:55 pm

It’s actually getting worse as biology is taking over imputation approaches in proteomics and RNA sequencing techniques to compensate for missing data points. And doing significance statistics on the imputed data which – as here – results in smaller errors so easier to get a “significant” result that is publishable.

That is so wrong…

But publish or perish and if the quality of the data is bad due to technical limitations and a re-run too expensive that is what people seem to do…

August 23, 2021 10:54 am

The obvious problem that WE creates with his “punch out” is he chose the big chunk of equatorial solar heated region.
If he did that extraction to Antarctica (or Greenland interior), where we really do have very few spatial measurements, the opposite would occur: the global average would be dramatically warmer, the SH (NH) would warm even more, and the NH (SH) would be unaffected.

Last edited 28 days ago by joelobryan
August 24, 2021 4:59 am

South Africa synthetically transferred its heat to Antarctica. SLR will most certainly synthetically accelerate.

Julie
August 23, 2021 11:02 am

Guys, why is my comment not being accepted?

Rob_Dawg
August 23, 2021 11:09 am

I don’t mind interpolation as long as the methodology is transparent. Better than the discordance of blank spots in either a chart or map. That said, an interpolation is an END PRODUCT. You cannot use it for further analysis.

Robert of Texas
August 23, 2021 3:10 pm

Well, you can…the question is always – should you?

If you are looking for new ideas that are testable, then I have no problem with filling in data. It isn’t “an answer” though, it’s just a “what if”. You have to then make some predictions and test those against real data to get any confidence that you have a good fit.

Just because your infilling works pretty well once is not a guarantee it works the next time. There just isn’t a good replacement for real data.

The big mistake is to then use infilled data on an hourly or daily basis to run future predictions – not one or even 10 iterations into the future but tens of thousands. Any error you have introduced is dramatically magnified. If you are producing a new data set for every 10 minutes of a day and run that to simulate 10 years (525,600 data sets), and if you start with 99.99% correct data the result has almost 0% chance of being right – or even close to right unless boundaries are used to manage the forecast. If you have to establish boundaries to control the results you KNOW your physical description is wrong – period.
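The arithmetic behind that paragraph can be sketched as follows. The step count follows directly from the numbers given; the compounding is one possible reading of the claim, and it rests on an assumption that is not stated above: that "99.99% correct" means each step independently has a 0.9999 chance of staying right, so the chances multiply.

```python
import math

# One data set every 10 minutes, for 10 years of 365 days.
steps_per_day = 24 * 60 // 10
n_steps = steps_per_day * 365 * 10

# Assumption: each step independently stays "right" with probability 0.9999.
per_step_correct = 0.9999
chance_all_correct = per_step_correct ** n_steps
log10_chance = n_steps * math.log10(per_step_correct)

print(f"steps: {n_steps}")
print(f"chance every step is right: about 10^{log10_chance:.1f}")
```

Under that multiplicative assumption the chance is vanishingly small. Real model error does not necessarily compound this way, so this shows only the scale implied by the claim, not how any particular model behaves.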

If climate has a chaotic component (almost certainly it does) then it’s impossible to know you made an accurate prediction for an arbitrary length of time – all you know is that the further out the prediction is, the more likely it becomes wildly wrong.

bdgwx
August 23, 2021 11:11 am

Thanks for doing the data denial experiment.

For those who are curious, that debate is centered around the HadCRUTv4 vs HadCRUTv5 methods. v4 ignores empty grid cells in its global averaging procedure. v5 uses Gaussian process regression, which is similar to the kriging procedure Cowtan & Way and Berkeley Earth use or the locally weighted linear regression approach by Nick Stokes.

Hausfather has some commentary on this as well.

ThinkingScientist
August 23, 2021 11:55 am

Oh god, here you go again! With moral support from Willis.

Willis’ example is not really like the problem of HadCRUT4 (and ergo all other surface obs. data sets – they contain pretty much the same underlying data).

Willis’ example is in absolute temps (that means you could use latitude to constrain the problem) as opposed to anomalies. In addition, filling in a single missing block is nothing like the problem of interpolating the sparse surface observations back in time. Finally, the surface obs. coverage of the polar regions is non-existent – and this is then an extrapolation problem.

Regarding the polar regions, because the area of the grid is so distorted by the 5×5 lat/long cells, they add very little to the global average anyway, so other than cosmetic “it’s worse than we thought” bright red colours it is less critical for the global average anomaly.

Here’s what the area weighting grid looks like:
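
The weighting itself can be sketched numerically. The snippet below is a minimal illustration assuming a perfect sphere and a regular 5×5 degree grid. The function and variable names are hypothetical and are not taken from any HadCRUT code.

```python
import numpy as np

def cell_area_fractions(cell_deg: float = 5.0) -> np.ndarray:
    """Fraction of the sphere's area in ONE cell of each latitude band."""
    edges = np.deg2rad(np.arange(-90.0, 90.0 + cell_deg, cell_deg))
    n_lon = int(round(360.0 / cell_deg))
    # The area of a latitude band is proportional to the difference in sin(lat).
    band = (np.sin(edges[1:]) - np.sin(edges[:-1])) / 2.0
    return band / n_lon

w = cell_area_fractions()
n_lon = 72
lat_centres = np.arange(-87.5, 90.0, 5.0)

# Share of the globe poleward of 80 degrees, both hemispheres combined.
polar_share = w[np.abs(lat_centres) > 80].sum() * n_lon
# How much more an equator-adjacent cell weighs than a pole-adjacent cell.
ratio = w[len(w) // 2] / w[-1]
print(round(polar_share, 4), round(ratio, 1))
```

On this grid the cells poleward of 80° make up roughly 1.5% of the sphere, which is why they move the global average so little.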

ThinkingScientist
August 23, 2021 11:58 pm

Under an assumption of changing climate (ie non stationary in time) then you don’t know the “climatology” of, say, the polar region in 1850.

You are then assuming stationarity in time for the background climatology. But that is part of what you are trying to estimate with a global mean over time.

ThinkingScientist
August 23, 2021 11:18 am

Willis,

I would suggest also bringing up the comments I made about OK, SK, stationarity and declustering as well. They are relevant.

Your example is quite trivial and relatively easy to fix. You are also working in absolute temperatures with that data I think, whereas HadCRUT4 is anomalies. It makes a difference and affects the stationarity assumption.

For perspective I have tried to add examples of images of the grids of surface obs (HadCRUT4) for Jan 1850. The interpolation challenge is much more problematic in the early part. But it should be noted in the later time slices (eg 1950 and 2000) that the issue is there is no data in the polar regions. This means we are talking about extrapolation, not interpolation.

ThinkingScientist
August 23, 2021 11:19 am

This is observations for 1900:

ThinkingScientist
August 23, 2021 11:19 am

For 1950:

ThinkingScientist
August 23, 2021 11:19 am

And for 2000:

ThinkingScientist
August 23, 2021 11:21 am

Finally, this is the percent of temporal coverage for each grid cell for the period 1850 – 2017. The high percent temporal coverage is basically showing shipping lanes dating back to Victorian times in the sea (an animation of the time evolution is quite interesting)

ThinkingScientist
August 23, 2021 12:35 pm

I think Willis’ example is a 20 year average of the absolute temps, which makes it very smooth. It’s therefore a trivial problem, not at all like interpolating sparse anomalies.

You could make a pretty good stab at that map just knowing the temperature relation with latitude.

It’s also clearly not stationary, and it is not at all like the anomaly grids I showed, which have much larger local variation within a generally quasi-stationary background.

As I said on the original thread, in the latter case kriging acts more as a declustering function rather than the interpolation actually adding to the mean estimate in any significant way.

ThinkingScientist
August 24, 2021 12:02 am

Hi Willis,
They are just the HadCRUT4 median grids from the website. I last downloaded in 2017 and had a bit of a nightmare converting from NetCDF.

If I get time I will post an interpolation with different range variograms. It won’t be kriged on a sphere but it might show the point about declustering.

Captain climate
August 23, 2021 11:22 am

You’re not adding information. You’re presuming that the best linear unbiased estimator is the midpoint between two locations, probably adjusting for altitude.

BCBill
August 23, 2021 11:23 am

Now repeat for twenty different locations and determine the average effect and sd and then the discussion can begin.

RMoore
August 23, 2021 12:36 pm

If you did this for the entire globe section by section, would you be able to identify the areas where a lack of information most alters the resulting temperature calculations? The final result would then be more accurate if the most sensitive areas were more finely measured.

ThinkingScientist
August 23, 2021 12:58 pm

Gain of information is a very old kriging application.

Captain climate
August 24, 2021 5:54 am

You don’t “gain information.” You add uncertainty by making an educated guess. All the statistical tricks in the world will not allow you to observe the non-observed.

ThinkingScientist
August 24, 2021 6:21 am

You misunderstand. There is no “gain of information” unless you actually add a new measurement location. But you can estimate the impact of adding another sample location. The gain of information is measured by the reduction in variance you would expect by adding another sample location. So you can estimate the impact of adding a new observation.

Note that kriging weights do not depend on the actual sample values, only on the spatial distribution of the observations and the location being estimated, plus the spatial correlation function, of course.
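
Both points can be illustrated with a small ordinary kriging sketch. It assumes an exponential covariance model with a made-up sill and range, flat 2-D coordinates, and hypothetical station positions. None of these come from the thread, and a real application would fit the covariance model to data.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=10.0):
    """Exponential covariance model (an assumed choice for illustration)."""
    return sill * np.exp(-np.asarray(h) / corr_range)

def ordinary_kriging(xy, target, sill=1.0, corr_range=10.0):
    """Return (weights, kriging variance) for one target location.

    Only coordinates enter the calculation; observed values are never used.
    """
    xy = np.asarray(xy, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(xy)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    d0 = np.linalg.norm(xy - target, axis=-1)
    # Ordinary kriging system with a Lagrange multiplier forcing sum(w) = 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_cov(d, sill, corr_range)
    A[n, n] = 0.0
    b = np.append(exp_cov(d0, sill, corr_range), 1.0)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    variance = sill - w @ b[:n] - mu
    return w, variance

stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]   # hypothetical sites
target = (6.0, 6.0)
w3, var3 = ordinary_kriging(stations, target)
# Candidate new site: its variance reduction is known before measuring there.
w4, var4 = ordinary_kriging(stations + [(7.0, 7.0)], target)
gain = var3 - var4
print(w3, var3, var4, gain)
```

Because no observed value appears anywhere in the function, the expected variance reduction from a candidate site can be computed before anything is measured there.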

Captain Climate
August 24, 2021 10:20 am

Okay sorry. Agreed. Sure, you can estimate information added but it certainly is no guarantee you’ll be right (see the volcano example above). Willis seems to think that adding made up data increases precision, which is insane. You can lower the standard deviation of any set by simply adding in data close to the averages. That says nothing about the precision. So much of this field seems to be statistical games trying to make up for bad data sets.

ASTONERII
August 23, 2021 11:25 am

Lowering a fake error by making up data where none exists is not a better result than using known numbers. It just gives a false feeling of accuracy where it does not exist.

You could have lowered the error bounds similarly by putting 100 degrees C for that entire cell and it would not have made any difference to error range.

August 23, 2021 11:32 am

“How do you estimate the value of empty grid cells without doing some kind of interpolation?”

How do you estimate the value of “full” grid cells without doing some kind of interpolation? In any continuum measurement, you only ever have a finite number of sample locations. Any extension beyond those points is some type of interpolation. Grid boundaries are your own construct, so “empty grid cells” are your own creation. They don’t change the main problem.

What is going on in Willis’ calculation is that when the chunk is removed and the average taken, the effect is that the chunk is treated as if it is an average part of the world. That would be cooler, so the average drops. It was a bad estimate; we know the region is warm. When Willis interpolates, he does so from nearby (warm) values. That is a better estimate than just global. The average is much closer to using actual data.

This point of using the best available estimate is poorly understood, as is the fact that “omitting” unknown cells is equivalent to assigning them the average value. HADCRUT used to make this error, and Cowtan and Way (2013) showed that kriging could overcome it. Actually just about any rational interpolation scheme will do that. HADCRUT 5 gets it right, although they still offer a wrong version for those who like that sort of thing.

More here

bdgwx
August 23, 2021 12:09 pm

“What is going on in Willis’ calculation is that when the chunk is removed and the average taken, the effect is that the chunk is treated as if it is an average part of the world.”

Exactly. In fact, the debate Willis is talking about here was initiated by me in the other thread. I made this EXACT same point in that thread.

ThinkingScientist
August 23, 2021 12:20 pm

Willis’ example is completely trivial and is nothing like the problem of actual surface obs. In Hadcrut4.

Working in absolute temps also has a huge impact on the global average temp calc. by cutting out a chunk of hot cells at the equator. It is completely different with large, irregularly sampled data which are anomalies.

August 23, 2021 1:50 pm

“nothing like the problem of actual surface obs. In Hadcrut4”
It shows the same issue. The data is not stationary, which means that the global average is not a good estimator of the chunk removed. You can and should do better, knowing it will be warmer than average.

Taking anomalies improves homogeneity, so there is less that can be predicted about omitted data. But not nothing. The point of the HAD 4 issue was that Arctic regions were warming faster than average, and had many missing cells, which were then replaced by global average behaviour. This washed out part of that warming. Again replacing the missing data with something based on the local region does much better.

ThinkingScientist
August 23, 2021 1:59 pm

Why do you think the anomalies are not stationary? Looking at the sample surface obs grids I posted, on a global scale they are approximately stationary. They look like realisations of a stationary random function model in the classic geostats sense, with a reasonably short range variogram.

And in the stationary case with a sparse set of strangely clustered observations the impact of kriging is predominantly declustering. The mean is not going to change if the domain is large and the variogram range relatively short. Which is fundamentally my point about this on the previous thread.

Willis example is not very relevant to the example observation grids I showed.

August 24, 2021 2:38 am

“Why do you think the anomalies are not stationary?”
In forming a temperature anomaly you subtract a predictor from the data. The resulting anomalies are thus less predictable (hence closer to stationary) but not totally unpredictable. A common predictor (base) is the station 1961-90 average. There may be regionally predictable variation since then, eg Arctic amplification. You can remove that from the anomalies to get still closer to stationary (it’s never perfect).

To the extent that data is not stationary, infilling methods should take advantage of whatever predictability exists.

Here is just one study of the nonstationarity of various levels of residuals (anomaly).

ThinkingScientist
August 24, 2021 4:10 am

Here non-stationary would refer to Willis’ example – the absolute temperature varies over the globe, predominantly as a function of latitude from equator (hot) to poles (cold). That is a non-stationary problem ie there is a global trend from equator to pole.

The HadCRUT4 anomalies should be generally stationary across the globe because the local mean is subtracted from each measure. So the global trend of absolute temperature from equator to poles is not present. So they should be stationary. And that is what we see – across the grid the anomalies are trendless, but do have localised highs/lows (which vary from time slice to time slice). This is a classic expression of a stationary random function realisation with some spatial correlation behaviour.

The problem being attempted is to try and estimate, somehow, a non-stationary time element (steady warming over the years from 1850) by interpolation of the stationary anomalies in each time step into the unsampled areas. Assuming stationary kriging in a time step, that is a hopeless task, it really only serves as a declustering model.

ThinkingScientist
August 24, 2021 4:18 am

Just to make the point clear – subtracting a global mean from the data does NOT make it a stationary problem. Stationarity in the simple form we are considering here refers to subtracting the local mean from each sample.

So to convert Willis’ example to stationary we might subtract a defined temperature trend as a function of latitude. We could then krige the residuals and add the trend back afterwards.
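
That detrend, interpolate, re-trend recipe can be sketched as below. This is an illustration on synthetic data only. The trend form (quadratic in sin(latitude)) is an assumption, and inverse-distance weighting is swapped in for kriging to keep the example short. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "stations": absolute temperature falls off towards the poles.
lat = rng.uniform(-80, 80, 200)
lon = rng.uniform(-180, 180, 200)
temp = 28.0 - 45.0 * np.sin(np.deg2rad(lat)) ** 2 + rng.normal(0, 1.0, 200)

# 1. Fit and remove a latitude trend (quadratic in sin(lat), an assumed form).
s = np.sin(np.deg2rad(lat))
coeffs = np.polyfit(s, temp, 2)
residual = temp - np.polyval(coeffs, s)

def idw_residual(lat0, lon0, power=2.0):
    """Inverse-distance-weighted residual; a simple stand-in for kriging."""
    # Crude flat distance in degrees, ignoring longitude wrap-around.
    d = np.hypot(lat - lat0, lon - lon0) + 1e-9
    w = 1.0 / d ** power
    return np.sum(w * residual) / np.sum(w)

def estimate(lat0, lon0):
    """2. Interpolate the residuals, then 3. add the latitude trend back."""
    trend = np.polyval(coeffs, np.sin(np.deg2rad(lat0)))
    return trend + idw_residual(lat0, lon0)

print(estimate(0.0, 0.0), estimate(60.0, 30.0))
```

The residuals are far closer to stationary than the raw temperatures, which is the property the interpolation step relies on.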

Note that this does not help with the problem of temperature trends with time as we are presuming that, for it to be important, the global trend with latitude would also vary as a function of time (eg equator-polar gradient reduces with warming world).

Clyde Spencer
August 24, 2021 8:19 am

The resulting anomalies are thus less predictable (hence closer to stationary) but not totally unpredictable.

Yes, to get it even closer to stationary, one should subtract the trend, not a constant.

Sparko
August 23, 2021 2:07 pm

It’s nonsense to use coastal stations to estimate over land areas and sea surfaces in the Arctic when the temperatures recorded are so affected by the weather off the sea, and the variable distance to the sea ice boundary.

Robert of Texas
August 23, 2021 3:16 pm

It’s also nonsense to use poorly located temperature stations at airports to infill large areas of missing Arctic data – but there they go again.

ThinkingScientist
August 23, 2021 1:02 pm

You need to start talking about point versus block support (with associated variance) at that point.

Jim Gorman
August 23, 2021 6:16 pm

Ask yourself why is it necessary to even “make up” data by whatever means. Is it wrong to say that 85% of the earth displays a temperature average of — C degrees? What is the driver to be able to say that we know what the Global Average Temperature (GAT) is over 100% of the earth? And, as ThinkingScientist has pointed out, in the early 19th and 20th centuries we only know 25% of the earth.

There is really only one reason to make up data, to satisfy politicians that “Climate Science” KNOWS what the real temperature is. Politicians need this assurance so they can justify the spending of trillions while pushing many people into the poor house. Any display of doubt would ruin their moral justification.

bdgwx
August 23, 2021 8:37 pm

Nick’s statement is in reference to the procedure you used to get 14.6C for an error of -0.6C. By excluding empty cells during averaging you are not producing a global grid average. To produce a global grid average you must project your sub grid average onto the global grid. You did this, perhaps without realizing it, by assuming that the empty cells behave like and take on the average you obtained in the earlier step for the sub grid. This is the no-effort interpolation strategy. It’s not a bad strategy per se, but as you showed in your post it’s not the best either. Using a more robust approach you get 15.3C for an error of +0.1C for the global grid.

August 24, 2021 2:52 am

“You did this, perhaps without realizing it, by assuming that the empty cells behave like and take on the average you obtained in the earlier step for the sub grid.”

Yes. Here is how the math of area weighted averaging goes. Suppose the chunk is fraction p of the sphere area and the average for globe is A₀, for globe without chunk A₁, and for chunk A₂.
Then A₀ = (1-p)*A₁ +p*A₂
or A₀ = A₁ +p*(A₂-A₁).
If you leave out A₂ and use A₁ to represent A₀, then that is equivalent to assuming that A₂=A₁. The error of that is proportional to the difference A₂-A₁, so any estimate of A₂ which takes into account its real warmth will give a better estimate of A₀.
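
A worked example of that identity follows. The 15.2 and 14.6 figures are the ones quoted in the post and thread. The area fraction p and the chunk average are hypothetical numbers chosen only so that the arithmetic reproduces those figures.

```python
def global_mean(p, a1, a2):
    """A0 = (1 - p) * A1 + p * A2, with p the chunk's area fraction."""
    return (1.0 - p) * a1 + p * a2

p = 0.05     # hypothetical area fraction of the removed chunk
a1 = 14.6    # average of the globe without the chunk (quoted in the thread)
a2 = 26.6    # hypothetical chunk average, chosen so that A0 comes out at 15.2

a0 = global_mean(p, a1, a2)          # true global average
omit = global_mean(p, a1, a1)        # "omitting" is the same as assuming A2 = A1
infill = global_mean(p, a1, 25.0)    # rough, imperfect local estimate of the chunk

print(round(a0, 2), round(omit - a0, 2), round(infill - a0, 2))
```

Even a local estimate that misses the chunk average by 1.6 degrees gives a much smaller error in A₀ than leaving the chunk out.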

Jim Gorman
August 24, 2021 6:41 am

Your math appears to assume that “p” is a known quantity. It is not if A2 is truly unknown. The only thing you know for sure is the size of the grid squares. That isn’t a good proxy for temperature.

You need to show how “p” is derived based on temperature.

August 24, 2021 8:48 am

p is the area fraction of the chunk. Just grid geometry.

Captain Climate
August 24, 2021 10:25 am

He’s going to try and do a gotcha later that says “ACTUALLY ALL I DID WAS INSERT RANDOM NUMBERS INTO THE AREA WITH AN AVERAGE OF THE BORDER VALUES” or filled it in with Photoshop or something.

Captain Climate
August 24, 2021 11:23 am

Looking forward to your “gotcha.” Nobody here is stupid. You’re going to describe some method, and irrespective of what it is, infilling does not increase information.

Gary Pearse
August 23, 2021 11:44 am

“There are often times when you can use knowledge about the overall parameters of the system to improve the situation when you are missing data.”

Willis, you intuited the key! You probably know that the term “kriging” came from a South African mining engineer by the name of Krige. He used the method to interpolate assay data from bore hole sampling to estimate the grades in ore blocks in between sample sites. The “extra knowledge” he employed was that of a) the geology and geometry of a given type of ore, b) plus variability in grades across the deposit from the drill core data (perhaps the highest grades are closer to one of the contacts) and c) a geophysical survey detecting continuity of the deposit “along strike”.

ThinkingScientist
August 23, 2021 12:50 pm

And the key follow on point to kriging is it leads to systematic bias in estimated grades when a commercial cutoff is applied.

That’s why conditional simulation was created, to address the problem correctly.

Look up Danie’s Ellipse to find out more.

The same bias pervades net:gross perm cutoffs in petroleum and gross rock volume calculations as well.

Kriging does not work and leads to a bias when the problem involves integration of a quantity between limits.
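
The cutoff effect can be illustrated with a toy example. This is not kriging: a 5-point moving average stands in for any smoothing estimator, the "grades" are synthetic random numbers, and the cutoff value is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "true" block grades along a line, mean 0 and unit variance.
truth = rng.normal(0.0, 1.0, 20_000)

# Stand-in for a smoothed estimate: 5-point moving average of the truth.
# (Kriging is also a weighted average, but this is NOT kriging itself.)
window = 5
estimate = np.convolve(truth, np.ones(window) / window, mode="same")

cutoff = 1.0   # hypothetical economic cutoff, set above the mean grade
frac_true = np.mean(truth > cutoff)      # share of blocks truly above cutoff
frac_est = np.mean(estimate > cutoff)    # share the smoothed map says is above

print(round(frac_true, 3), round(frac_est, 3))
```

The smoothed map has lower variance than the truth, so far fewer blocks clear the cutoff than actually exceed it.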

Please note this is an aside and not relevant to the problem of temperature estimation. Just pointing out that these issues are way more complex and sophisticated than the general knowledge being discussed at WUWT.

Arw
August 23, 2021 1:44 pm

Any resource estimate developed through kriging has levels of confidence applied: Measured, Indicated and Inferred, based on data density. Some similar disclaimers might be useful in stating the average temperature of the globe. Dropping the number of reporting stations over time must decrease the level of confidence of the estimate. Mixed data quality and “adjustments” to the data post recording are issues frowned upon in the mining industry.

Gary Pearse
August 23, 2021 4:39 pm

Danie’s ellipse is used in modelling ore bodies and is followed up in pit design, which has parameters separate from just ore grades and deposit geometry (pit slope design based on rock strength and stability, haulage road “grades”, area on the pit floor for turning trucks…).

Gary Pearse
August 23, 2021 6:36 pm

Thinking Scientist:

Without the space required to give you a structural geology, geophysics, ore deposit geology and geochemistry course, I have no way of telling you what a geological survey geologist or an exploration geologist already knows about the terrain he is working in, long before a hole is drilled or even mineralization discovered and trenched. You didn’t comment on my “The ‘extra knowledge’ he employed”. This is data!

Estimated temperatures in gridcells are far more uncertain than grade estimates, especially when the estimator believes in an already falsified theory of catastrophic warming.

Yes, the advent of computers has refined the process with new approaches, but there is a long history of successful estimations of grades and tonnages, upon which depend financing, capex and opex. Degrees of uncertainty are mitigated by contingency accounting and sensitivity analysis. The estimator was very aware of the importance of conservative reckoning, and he wasn’t working just with sterile numbers and no subject knowledge.

ThinkingScientist
August 24, 2021 12:04 am

Gary,

I have been using geostatistics in my day job since 1993. Not in mining though, in petroleum. I am aware of some of the issues in mining (which are like the gross rock volume estimation in petroleum).

Gary Pearse
August 24, 2021 8:38 am

The mining of ore is not just being a slave to the original estimations. Grade control is practiced as you go at every mine. When you are drilling blastholes, the dust from the holes is analyzed using hand held laser ablation spectroscopy tech or a lightweight portable lab unit. This is even done despite today’s computer modelling using Danie’s ellipse methods. It’s not a step that petroleum producers can do.

I had no doubt in my mind that you were an industry expert on this topic and that you were making an important contribution to the discussion. I thank you for that, and I will be reading up on Danie’s ellipses.

ren
August 23, 2021 12:02 pm

See what the temperature is now in Antarctica and Australia.

bdgwx
August 23, 2021 12:29 pm

And to provide relevance to the blog post this grid uses a method called 4DEnVar which is an order of magnitude more complex than the HadCRUTv5 gaussian process regression or the BEST kriging method.

John Tillman
August 23, 2021 12:45 pm

Also the frigid Southern Cone of South America, Southern Brazil had snow this winter. Brazilians visit Chile in order to experience snow. Tour buses take them from Santiago to the ski resorts in the Andes overlooking the city, which used to be obscured by smog, but is now sometimes visible.

Rob_Dawg
August 23, 2021 12:09 pm

Thought experiment. Concentric data points measuring temperature around a volcano. Interpolation would lead to a low value for the temperature of the volcano. Extrapolation would be better. Neither would provide data on the temperature of the volcano.

Add this to the climate “science” waste heap that includes accuracy v precision.

Sparko
August 23, 2021 3:26 pm

Exactly. And yet that’s what they do.

ThinkingScientist
August 23, 2021 12:32 pm

Willis,

Am I correct in assuming that you have used an average temp over a 20 year interval, as well as being absolute temps?

bdgwx
August 23, 2021 2:05 pm

That’s the way I’m interpreting it. I agree with your point that the long averaging period smooths the data. It would be more relevant if Willis did the data denial experiment using a monthly average as the control field instead.

bdgwx
August 24, 2021 1:25 pm

Thanks. That doesn’t invalidate your result, but I do think starting with a monthly period would be more interesting since there would be more regional variation.

The Dark Lord
August 23, 2021 12:40 pm

better ? lower error ? but how can you even begin to measure the error in fake data … nothing with guessed in data can EVER be better than without … I don’t care what statistical sleight of hand is done … apples and oranges … you can’t measure the error bars of fake data …

The Dark Lord
August 23, 2021 12:48 pm

ahh can’t you just take a single measured location (or locations, you could use hundreds of locations) … assume it’s not known, do your fancy interpolation and see how close your interpolation guess is to the known value we have ? my bet is it will be wrong 99% of the time … sometimes by huge amounts … and no, it’s not better than nothing, it’s injecting known bad data into the process …

Joe Crawford
August 23, 2021 3:22 pm

Expanding on what you said, I think a plot of the difference between the actual measured values and the interpolated values for grid squares that have missing measurements would show the accuracy of the method used to interpolate at each location. For example, take a fixed time frame of several years and for each grid square that has both measured values and missing values perform the interpolation calculation at each time slot. Then, for each time slot that has a measured value, plot the differences between the interpolated value and the measured value vs. time of year. This plot should then show the accuracy of the method used for interpolation at each grid square plotted, and a statistical analysis of those results should help determine the value of the interpolation algorithm.

Tim Gorman
August 23, 2021 3:38 pm

Ummm, the variable time will have a big impact on your comparison. A value calculated at July 1, 2019 may have little relation to July 1, 2020.

Joe Crawford
August 23, 2021 4:59 pm

I would expect more of a seasonal difference, mostly caused by the local weather for each season and the processions of frontal boundaries.

TonyG
August 24, 2021 10:57 am

“can’t you just take a single measured location (or locations, you could use hundreds of locations) … assume it’s not known, do your fancy interpolation and see how close your interpolation guess is to the known value we have ?”

That’s exactly what I was thinking too. Do the math for the interpolation by removing one known value, then compare the result to the known value. That would pretty quickly reveal if the interpolation is viable or not.
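
A leave-one-out check of that kind can be sketched as follows. Everything here is assumed for illustration: the field is synthetic, the coordinates are flat, and inverse-distance weighting is used as a simple example interpolator. It says nothing about how well any particular method performs on real station data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
x = rng.uniform(0, 100, n)
y = rng.uniform(0, 100, n)
# Synthetic smooth field plus measurement noise (made-up for the sketch).
value = 10 + 0.1 * x + 5 * np.sin(y / 15.0) + rng.normal(0, 0.5, n)

def idw_without(i, power=3.0):
    """Estimate station i from all OTHER stations by inverse-distance weighting."""
    mask = np.arange(n) != i
    d = np.hypot(x[mask] - x[i], y[mask] - y[i]) + 1e-9
    w = 1.0 / d ** power
    return np.sum(w * value[mask]) / np.sum(w)

# Withhold each station in turn and predict it two ways.
loo_interp = np.array([idw_without(i) for i in range(n)])
loo_mean = np.array([np.mean(np.delete(value, i)) for i in range(n)])

rmse_interp = np.sqrt(np.mean((loo_interp - value) ** 2))
rmse_mean = np.sqrt(np.mean((loo_mean - value) ** 2))
print(round(rmse_interp, 2), round(rmse_mean, 2))
```

Comparing the two RMSE values shows whether local interpolation beats simply assuming the average of everything else, for this field and this station layout.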

bdgwx
August 24, 2021 1:00 pm

That’s what Willis did. It is called a data denial experiment.

Vuk
August 23, 2021 1:00 pm

Not to be confused with interpellation.
“The term interpellation was an idea introduced by Louis Althusser  to explain the way in which ideas get into our heads and have an effect on our lives……………. (my rem: have you heard of global warming ?)
The mechanisms of (interpellation) power force us into our place.  These mechanisms operate through threats of punishment or through the explicit demonstration of power.  Repressive State Apparatuses (or RSA’s) include armies, police, prisons, or any outright threat of violence.”

Rory Forbes
August 23, 2021 4:27 pm

It sounds like you’re describing Australia’s struggle with reality at present.

Steve Z
August 23, 2021 1:05 pm

Interpolation for missing data might make sense if, and only if, the region with the missing data has similar geography (land or sea, latitude, altitude, land use patterns [urban, suburban, forest, grassland, farmland etc.]) to neighboring regions where detailed data are available.

But there are places where interpolation would result in huge errors. Suppose that we have detailed climate data for Seattle and for Quillayute, Washington (along the coast about 60 miles west of Seattle), but no data for Mount Olympus in between. Trying to guess the climate of Mount Olympus by interpolation would likely over-estimate the true temperatures of Mount Olympus, which is nearly 8,000 feet above sea level, while Seattle and Quillayute are near sea level.

bdgwx
August 23, 2021 1:31 pm

There are a lot of interpolation strategies. Many of them are designed to address the issues that varying geography presents on high resolution grids.

Tim Gorman
August 23, 2021 3:40 pm

And are any of these used to determine the GAT?

bdgwx
August 23, 2021 5:20 pm

Yes.

JCM
August 23, 2021 1:07 pm

Next, I use a mathematical analysis to fill up the hole.

AKA, a model.

anyway,

If we’re talking infilling schemes by regression or interpolation (or whatever) in space and time they all seem to assume a randomly distributed error surface i.e. one that sums to zero. This gives the impression of reducing total error by increasing n cells in a random error surface (mean error introduced = zero).

Of course we cannot meet the condition of a random distribution for standard error analysis. Systematic error propagates across the simulated error surface and a whole host of other problems such as limited range of covariance in space.

Add in a non-stationary surface in timeseries and things get really exciting, statistically speaking.

Personally I wouldn’t worry too much about the details though – the error bars on all this stuff are ridiculous anyway and it’s a matter of interpretation. They all hide reasonable (common-sense) uncertainty bounds – it doesn’t much matter how fancy your stats gets.

Either way, the newer faster warming data matches the solar proxies better anyway if that’s your game. If the older HadCRUTS are right then it suggests more emergent feedbacks from clouds or whatever else you fancy.

JCM

Killer Marmot
August 23, 2021 1:23 pm

Why does the kriging algorithm produce discontinuities at the patch boundaries?

Either (1) the kriging parameters need tweaking, or (2) a better interpolation algorithm should be used.

James Schrumpf
August 23, 2021 1:40 pm

NOAA says the temperature sensors are accurate to +/- 0.5 F or about +/- 0.28 C. You can’t improve on that by increasing the sample size. Why is this never mentioned, or seemingly accounted for?

bdgwx
August 23, 2021 2:11 pm

It’s worse than that for non-digital and pre-WW2 observations. And you’re right, you can’t improve the uncertainty on individual measurements by increasing the sample size. But you can improve the uncertainty on the global mean temperature by increasing the sample size (with caveats). The uncertainty on post-WW2 estimates of the global mean temperature is around ±0.05C (give or take a bit). See Lenssen 2019 and Rohde 2013 for details on two completely different uncertainty analyses that arrive at the same conclusion.

Tim Gorman
August 23, 2021 3:17 pm

No, you can’t do this. The uncertainty of the GAT is strictly dependent on the uncertainty of the data used and interpreted as root-sum-square.

No amount of averaging will ever eliminate this uncertainty. You can calculate the mean as precisely as you want, it will still have the root-sum-square uncertainty as the sum you use to calculate the mean.

Bellman
August 23, 2021 3:50 pm

Quite right. That’s why if you take the average of a million temperature readings each with an uncertainty of 0.5°C, you could well end up with a figure that is out by ±500°C.

Carlo, Monte
August 23, 2021 4:45 pm

Uncertainty is NOT error.

Bellman
August 23, 2021 5:38 pm

Never said it was.

Uncertainty is a measure of the expected range of errors. If you are saying that the uncertainty of a measurement could be ±500°C even though the actual error could never be more than ±0.5°C, then what exactly do you mean by “uncertainty”?

The GUM defines uncertainty of measurement as

parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand

Do you think the 500°C range could reasonably be attributed to the average of a million readings, given the uncertainty of each reading is only 0.5°C and say the range of the individual measurements was between -100 and +100°C?

Carlo, Monte
August 23, 2021 6:22 pm

Uncertainty is a measure of the expected range of errors.

Completely wrong and contradicted by your quote from the GUM. Uncertainty is a measure of the state of your knowledge about a measured quantity.

The GUM also tells you how to calculate uncertainty from Eq. 1; if you haven’t done this for this grand averaging of averages, then you don’t know what you don’t know. There is a whole lot more involved beyond your mindless division by the square root of n.

Bellman
August 24, 2021 3:45 am

The definition I quoted defines measurement uncertainty in terms of the state of your knowledge. Uncertainty will always be about the state of your knowledge. If you knew the size of the error you would have perfect knowledge and no uncertainty.

My point was the use of the word “reasonably”. Do you think it reasonable that an average of a million thermometers, each with a measurement uncertainty of 0.5°C could be out by 500°C?

Regarding the GUM, it goes on to say that their definition is “not inconsistent” with other concepts of uncertainty such as

a measure of the possible error in the estimated value of the measurand as provided by the result of a measurement

and

an estimate characterizing the range of values within which the true value of a measurand lies

Carlo, Monte
August 24, 2021 8:09 am

And my point is that your question is ill-formed (actually it is a loaded question). You need to do a formal uncertainty analysis before coming up with this absurd 500°C number.

bdgwx
August 24, 2021 8:21 am

That 500C number comes from Gorman’s insistence that the uncertainty of the mean follows RSS such that for an individual measurement uncertainty of ±0.5C and given 1 million measurements the final uncertainty of the mean is sqrt(0.5^2 * 1000000) = 500C. Does that sound like a plausible uncertainty to you?

Carlo, Monte
August 24, 2021 10:50 am

Still no uncertainty analysis, I wonder why.

bdgwx
August 24, 2021 12:57 pm

I did the analysis. The simulated uncertainty on the computed mean landed almost spot on to the 0.0005C value predicted by the formula σ^ = σ/sqrt(N) and miles away from the 500C that Gorman claims.

And I again ask…is 500C even remotely plausible? Does it even pass the sniff test? What if the mean were 50C…do you really think -450C would occur 16% of the time?
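A minimal sketch of this kind of simulation follows. It is not the code bdgwx ran; it assumes the ±0.5C figure is a standard deviation, that the errors are independent and normally distributed, and that the true values are spread between -100 and +100. The sample size is cut from 1 million to 2,500 so that plain Python finishes quickly, which moves the predicted value from 0.0005 to 0.01.

```python
import math
import random
import statistics

def simulate_mean_error(n_readings, sigma, n_trials, seed=0):
    """Spread (standard deviation) of measured mean minus true mean over many trials."""
    rng = random.Random(seed)
    errors_of_mean = []
    for _ in range(n_trials):
        # Every reading is of a different true value (assumed range -100..+100).
        true_values = [rng.uniform(-100.0, 100.0) for _ in range(n_readings)]
        # Each reading carries its own independent random error.
        readings = [t + rng.gauss(0.0, sigma) for t in true_values]
        errors_of_mean.append(
            statistics.fmean(readings) - statistics.fmean(true_values)
        )
    return statistics.pstdev(errors_of_mean)

N = 2500       # scaled down from 1,000,000 for speed
SIGMA = 0.5    # assumed standard uncertainty of one reading

observed = simulate_mean_error(N, SIGMA, n_trials=400)
predicted = SIGMA / math.sqrt(N)    # sigma / sqrt(N)
rss_of_sum = SIGMA * math.sqrt(N)   # root-sum-square, the uncertainty of the SUM

print(f"observed spread of the mean : {observed:.4f}")
print(f"sigma / sqrt(N)             : {predicted:.4f}")
print(f"root-sum-square of the sum  : {rss_of_sum:.1f}")
```

The root-sum-square figure describes the sum of the readings. Dividing the sum by N to get the mean divides that figure by N as well, which is where σ/sqrt(N) comes from.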

Tim Gorman
August 24, 2021 3:01 pm

Again, once your uncertainty exceeds what you are trying to measure then why would you continue?

Think about the boards again. If your result is 10 feet +/- 10 feet then why add any more boards? You can’t have negative lengths! And your uncertainty is only going to continue growing!

Have you *ever* done any physical science or only academic exercises?

Carlo, Monte
August 24, 2021 5:56 pm

You just demonstrated your abject ignorance of what uncertainty is.

Attempting to educate you is quite pointless.

Tim Gorman
August 24, 2021 2:59 pm

It is totally reasonable. Once your uncertainty exceeds what you are measuring it is time to stop. Once the uncertainty is more than what you are measuring then why would you continue to grow your data set?

Bellman
August 24, 2021 8:28 am

It’s not my absurd claim, it’s what Tim Gorman claims. I’m saying he’s wrong and using the absurdity to refute the claim that uncertainty of a mean increases by the square root of the sample size.

Carlo, Monte
August 24, 2021 10:50 am

Go argue with statisticians.

Bellman
August 24, 2021 12:27 pm

Because statisticians agree with me. So, as far as I can tell, do metrologists. It’s a small clique here who don’t seem to understand what either are saying.

Carlo, Monte
August 24, 2021 12:55 pm

Now you are just lying.

Bellman
August 24, 2021 1:13 pm

Point me to a statistician who thinks the uncertainty of a mean increases as the sample size increases, and we can discuss it.

Tim Gorman
August 24, 2021 3:06 pm

I gave you a site from the EPA that says just that. I guess you didn’t bother to go look at it, did you?

Standard propagation of uncertainty for random, independent measurements:

u_total^2 = u_x^2 + u_y^2 + correlation factor. Section 19.4.3, Table 2.

Bellman
August 24, 2021 3:23 pm

I gave you a site from the EPA that says just that. I guess you didn’t bother to go look at it, did you?

Sorry, must have missed that one. I’ve just noticed your link below. I’ll check it out.

Bellman
August 24, 2021 4:06 pm

Brief look through and the table you gave me shows that the EPA agree with me, as I explain below.

Tim Gorman
August 24, 2021 3:04 pm

Because most statisticians are physical scientists. Metrologists agree with me. Are you confusing metrology with meteorology?

Bellman
August 24, 2021 4:08 pm

No, I’m referring to all the metrology documents you keep chucking at me, which you think agree with you only because you are incapable of following any of their formulae or examples.

bdgwx
August 24, 2021 12:58 pm

We should we? We agree with them.

bdgwx
August 24, 2021 1:24 pm

Sorry, “We should we?” should have said “Why should we?”

Tim Gorman
August 24, 2021 3:03 pm

I’m not wrong. Think about the climate models. Once their uncertainty exceeds +/- 2C (a little more than the anomaly they are trying to predict) then why continue? When the uncertainty exceeds what you are trying to analyze then what’s the purpose for continuing?

Bellman
August 24, 2021 4:15 pm

Again, trying to disprove that you are wrong by changing the subject. This has nothing to do with climate models, it’s entirely about your claim that uncertainty increases with sample size. That’s what I’m saying is wrong.

Carlo, Monte
August 23, 2021 4:45 pm

They’ve been told this again and again, yet refuse to see reality.

Tim Gorman
August 24, 2021 3:08 pm

I gave bellman this site: https://www.epa.gov/sites/default/files/2015-05/documents/402-b-04-001c-19-final.pdf

Table 19.2 gives the first order uncertainty propagation formula.

It has nothing in it about dividing by N or the sqrt(N).

Bellman
August 24, 2021 3:41 pm

Probably because Table 19.2 isn’t talking about means. But you can derive the concept from the formula for the product, just as Taylor does, by making one of the measures a constant.

$u_c^2(xy) = u^2(x)y^2 + x^2u^2(y) + 2xy \cdot u(x, y)$

If y is a constant, B, then u(y) = 0 and u(x,y) = 0, so that becomes

$u_c^2(Bx) = B^2u^2(x) \implies u_c(Bx) = Bu(x)$.

Bellman
August 24, 2021 4:02 pm

You can also derive this from the Sums and Differences formula, where a and b are constants.

$u_c^2(ax \pm by) = a^2u^2(x) + b^2u^2(y) \pm 2ab \cdot u(x,y)$

If b = 0, then this becomes

$u_c^2(ax) = a^2u^2(x) \implies u_c(ax) = a u(x)$

Bellman
August 24, 2021 3:56 pm

And if you want to see an example of dividing by the sqrt(N) look at equation (19.3), where they show how to calculate the standard error of the mean.

Then you could look at Example 19.1, where they, guess what, take an average of 10 values given to 3 decimal places, and calculate the standard uncertainty as 0.0011.
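For anyone who wants to try the calculation, here is a minimal sketch. It assumes equation (19.3) is the usual standard error of the mean, s/sqrt(n), with s the sample standard deviation. The ten readings are made up for illustration; they are not the values from Example 19.1, so the result will not match the 0.0011 quoted above.

```python
import math
import statistics

# Hypothetical readings given to 3 decimal places (NOT the data from Example 19.1).
readings = [10.012, 10.009, 10.015, 10.011, 10.008,
            10.013, 10.010, 10.014, 10.007, 10.011]

n = len(readings)
mean = statistics.fmean(readings)
s = statistics.stdev(readings)   # sample standard deviation (n - 1 in the denominator)
u_mean = s / math.sqrt(n)        # standard uncertainty of the mean

print(f"mean = {mean:.4f}")
print(f"s = {s:.4f}")
print(f"u(mean) = {u_mean:.4f}")
```

The standard uncertainty of the mean comes out smaller than the scatter of the individual readings by a factor of sqrt(10).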

Carlo, Monte
August 24, 2021 4:37 pm

I am very glad that I did not have to take any university courses of which you were the instructor.

Bellman
August 24, 2021 5:02 pm

Feel free to explain where I’m wrong.

(Thanks for the compliment, by the way. I feel honoured that someone mistakes me for a university lecturer.)

Carlo, Monte
August 24, 2021 6:20 pm

You’ve been told again and again yet refuse to even consider that you might be wrong. The reason is obviously that if you were to perform an honest analysis, it would show how these averages of averages have very little useful information.

Bellman
August 25, 2021 4:43 am

What I’m told over and over again are assertions which are never backed up by evidence. The EPA document was meant to be a statistical source claiming that you never reduce uncertainty. I pointed out that it didn’t make that claim, and that the standard formula they use implies that you do indeed have to multiply the uncertainty by a constant when the measure is multiplied by a constant. As always, nobody explains why they think my reasoning is wrong; they just downvote and move on, only to repeat their mistakes again and again.

I do consider that I may be wrong. I’m not an expert, most of my statistical learning has fallen into obscurity, and many here who disagree seem to come with some experience of applying statistical methods to engineering. Which is why I keep asking for evidence that I might be wrong.

I keep pointing the Gormans to passages in their preferred sources that as far as I can tell directly contradict what they are saying. They never consider they might be wrong, but never constructively explain why my interpretation is wrong. There are only so many times you can have Tim Gorman insist that you never divide the uncertainty, or that you can only average dependent measures, or state a baseless equation as if it was a fact, before you begin to suspect that he is wrong, and just refuses to accept it.

So again, I’d like to see any evidence of any kind that

a) uncertainties in a mean increase as sample size increases
b) you never scale an uncertainty when you scale the measure
c) the calculation for the standard error of the mean cannot be used with independent data

Carlo, Monte
August 25, 2021 7:17 am

A total waste of time, request DENIED.

Bellman
August 25, 2021 3:08 pm

As I suspected, you won’t provide the evidence. Let me know if you change your mind.

Carlo, Monte
August 24, 2021 6:17 pm

110 pages, that is a ton of work.

Has anyone ever attempted a real uncertainty analysis of these GAT calculations? (Not that Bellman & Co would believe the results.)

bdgwx
August 24, 2021 7:55 pm

Yes. I posted links to Lenssen 2019 and Rohde 2013. The former uses a bottom-up approach while the latter uses a top-down approach via jackknife resampling. Despite wildly different techniques they both come to the same conclusion…the uncertainty is about ±0.05C. I then tested to see how much disagreement there was between monthly global mean temperature anomalies from HadCRUTv5, GISTEMP, BEST, and ERA. The disagreement is in line with expectations that the uncertainty is ±0.05C. This is remarkable considering that they all use wildly different methodologies and subsets of available data.
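For readers who have not met jackknife resampling, the textbook delete-one version is sketched below. This is not the procedure used in either paper, and the anomaly values are invented for illustration. The idea is to recompute the estimate with each data point left out in turn and use the spread of those estimates as the standard error.

```python
import math
import statistics

def jackknife_se(values, estimator):
    """Delete-one jackknife standard error of `estimator` applied to `values`."""
    n = len(values)
    # Recompute the estimate n times, each time leaving one value out.
    leave_one_out = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    center = statistics.fmean(leave_one_out)
    return math.sqrt((n - 1) / n * sum((t - center) ** 2 for t in leave_one_out))

# Hypothetical anomalies in degrees C (illustrative only).
anomalies = [0.41, 0.55, 0.38, 0.62, 0.47, 0.51, 0.44, 0.58, 0.49, 0.53]

se_jack = jackknife_se(anomalies, statistics.fmean)
se_formula = statistics.stdev(anomalies) / math.sqrt(len(anomalies))

print(f"jackknife SE of the mean: {se_jack:.4f}")
print(f"s / sqrt(n)             : {se_formula:.4f}")
```

For a plain mean the jackknife reproduces s/sqrt(n) exactly. Its practical use is for estimators where no simple formula is available.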

bdgwx
August 24, 2021 6:51 am

“You can calculate the mean as precisely as you want, it will still have the root-sum-square uncertainty as the sum you use to calculate the mean.”

You really think the uncertainty of the mean increases each time you add a new data point? Really?

Clyde Spencer
August 24, 2021 8:32 am

You really think the uncertainty of the mean increases each time you add a new data point?

Yes, for non-stationary data! If there is a trend, then every new data point changes the mean and standard deviation. Only for stationary data, where the changes are random, and tend to cancel, can the precision be improved by taking more measurements of the same thing with the same instrument!

bdgwx
August 24, 2021 9:15 am

Did you test that hypothesis with a monte carlo simulation?

Carlo, Monte
August 24, 2021 10:48 am

Where is the fixed population from which you are sampling?

bdgwx
August 24, 2021 12:49 pm

I’m asking if you tested your hypothesis that the uncertainty of the mean increases as you increase the sample size when the samples represent different measurements, using different instruments, and/or when the data is non-stationary. You can test this quite easily with a monte carlo simulation. Have you done that?

Tim Gorman
August 24, 2021 3:20 pm

I gave you the example of boards laid end-to-end. Why do you refuse to address that example? You don’t need a monte carlo simulation.

Nor is generating random numbers sufficient. They have to be independent. Something you don’t seem to understand.

Carlo, Monte
August 24, 2021 4:36 pm

If there is no fixed population, you don’t divide by the square root of N. This is Stats 101.

Clyde Spencer
August 24, 2021 4:18 pm

You don’t need a Monte Carlo simulation to demonstrate my claim!

bdgwx
August 24, 2021 7:49 pm

If your claim is right then the results of the monte carlo simulation would confirm it. They don’t. I know because I actually did it. It doesn’t matter if the measurement is of the exact same thing, done by the same instrument, or whether the data is stationary. I even simulated both precision and accuracy uncertainty on each measurement. It just doesn’t matter. The uncertainty of the mean is always less than the uncertainty of the individual measurements whenever the measurement error is randomly distributed (it doesn’t even need to be normally distributed).
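A sketch of how such a test could be set up is below. It is not the simulation described above. It assumes a made-up non-stationary series (a steady upward trend, so no two readings are of the same thing) and gives each reading an independent error drawn from a uniform rather than a normal distribution. It covers random error only; a bias shared by all readings would not average out.

```python
import math
import random
import statistics

HALF_WIDTH = 0.5                        # each error is uniform in [-0.5, +0.5]
SIGMA_ONE = HALF_WIDTH / math.sqrt(3)   # standard deviation of one such error

def sd_of_mean_error(n, n_trials=500, seed=1):
    """Spread of (mean of readings - mean of true values) across many trials."""
    rng = random.Random(seed)
    mean_errors = []
    for _ in range(n_trials):
        # Non-stationary truth: every reading measures a different value.
        truth = [10.0 + 0.02 * i for i in range(n)]
        readings = [t + rng.uniform(-HALF_WIDTH, HALF_WIDTH) for t in truth]
        mean_errors.append(statistics.fmean(readings) - statistics.fmean(truth))
    return statistics.pstdev(mean_errors)

results = {n: sd_of_mean_error(n) for n in (10, 100, 1000)}

for n, sd in results.items():
    print(f"n = {n:5d}  simulated = {sd:.4f}  sigma/sqrt(n) = {SIGMA_ONE / math.sqrt(n):.4f}")
```

Under these assumptions the simulated spread shrinks as n grows and tracks σ/sqrt(n), even though the underlying values are trending.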

Jim Gorman
August 25, 2021 1:21 pm

You and Bellman consistently misunderstand error and uncertainty. You CAN NOT reduce uncertainty through statistics. You can remove random error.

Take my word for it. Draw a normal distribution, only make the line width 1/2 of the distance between the horizontal graduations. That is uncertainty. You can not tell what the actual value is because it could be any value inside the line. It is something you don’t know and can never know. There is no statistical analysis you can do to reduce the width of that line.

Bellman
August 25, 2021 1:59 pm

You CAN NOT reduce uncertainty through statistics. You can remove random error.

I’ve written about this in more detail elsewhere, but you seem to have a definition of uncertainty that is very limited and not in keeping with things like the GUM.

Uncertainty of measurement can come from random error, it can come from systematic error. You can reduce the uncertainty caused by random error using statistical techniques. You can also try to reduce the effects of systematic errors, by adjusting the measurement to correct the error.

Draw a normal distribution, only make the line width 1/2 of the distance between the horizontal graduations. That is uncertainty. You can not tell what the actual value is because it could be any value inside the line.

Sorry, I’m not sure what you’re describing in the experiment.

But I don’t know why you keep repeating things like “you cannot tell what the actual value is”. Of course you cannot and nobody is saying you can. What you can do is try to reduce the size of the uncertainty.
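The difference between the two sources of uncertainty can be shown with a small sketch. All numbers are assumed for illustration: one instrument with a fixed bias of +0.3 and independent random error with a standard deviation of 0.5, measuring a true value of 20.

```python
import random
import statistics

rng = random.Random(42)

TRUE_VALUE = 20.0   # assumed true value
BIAS = 0.3          # assumed systematic error, the same on every reading
SIGMA = 0.5         # assumed standard deviation of the random error

readings = [TRUE_VALUE + BIAS + rng.gauss(0.0, SIGMA) for _ in range(10_000)]

# Averaging shrinks the random part but leaves the bias untouched.
mean_error = statistics.fmean(readings) - TRUE_VALUE

# If the bias is known (for example from calibration), it can be subtracted.
corrected_error = mean_error - BIAS

print(f"error of the mean before correction: {mean_error:+.3f}")
print(f"error of the mean after correction : {corrected_error:+.3f}")
```

With 10,000 readings the error of the uncorrected mean sits close to the bias, while the corrected mean is close to the true value. Averaging handles the random part, and only a correction handles the systematic part.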

Captain Climate
August 24, 2021 11:11 am

Are you intentionally not understanding what he’s saying?

bdgwx
August 24, 2021 12:44 pm

I know what is being said and I’ve done the monte carlo simulation of it.

Tim Gorman
August 24, 2021 3:18 pm

Yes, absolutely.

You refuse to answer my example using boards of random, independent measurements.

If the uncertainty in the overall length grows as you add boards then why do you think the uncertainty in the mean won’t do the same?

Again, if q=Bx then delta-q = delta-B + delta-x

Since B is a constant (be it 1/N or 1/sqrt(N)) then delta-B is zero.

delta-q = delta-x.

What is so hard about this? Why won’t you address how this contradicts your assertion concerning the mean? If delta-q grows then why doesn’t the uncertainty of the mean grow as well, since it is dependent on the uncertainty of q?

You simply cannot decrease the uncertainty by increasing the uncertainty.

Random, independent measurements do not meet the requirements for being a Gaussian distribution. Therefore you can’t find a true value that is the mean. In fact, none of the units making up the universe may actually equal the mean.

Neither is uncertainty a probability distribution. Uncertainty is, therefore, not amenable to statistical analysis.

You are laboring under the misconception that statistics is a hammer and everything in the world is a nail. Independent, random data is not a nail, it is a screw. Learn it, love it, live it.