Guest Post by Willis Eschenbach
The Berkeley Earth Surface Temperature (BEST) team is making a new global climate temperature record. Hopefully this will give us a better handle on what’s going on with the temperature.
BEST has put out a list of the four goals for their mathematical methods (algorithms). I like three of those goals a lot. One I’m not so fond of. Here are their goals:
1) Make it possible to exploit relatively short (e.g. a few years) or discontinuous station records. Rather than simply excluding all short records, we prefer to design a system that allow short records to be used with a low – but non‐zero – weighting whenever it is practical to do so.
2) Avoid gridding. All three major research groups currently rely on spatial gridding in their averaging algorithms. As a result, the effective averages may dependant on the choice of grid pattern and may be sensitive to effects such as the change in grid cell area with latitude. Our algorithms seek to eliminate explicit gridding entirely.
3) Place empirical homogenization on an equal footing with other averaging. We distinguish empirical homogenization from evidence‐based homogenization. Evidence‐based adjustments to records occur when secondary data and/or metadata is used to identify problems with a record and propose adjustments. By contrast, empirical homogenization is the process of comparing a record to its neighbors to detect undocumented discontinuities and other changes. This empirical process performs a kind of averaging as local outliers are replaced with the basic behavior of the local group. Rather than regarding empirical homogenization as a separate preprocessing step, we plan to incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.
4) Provide uncertainty estimates for the full time series through all steps in the process.
Using short series, avoiding gridding, and uncertainty estimates are all great goals. But the whole question of “empirical homogenization” is fraught with hidden problems and traps for the unwary.
The first of these is that nature is essentially not homogeneous. It is pied and dappled, patched and plotted. It generally doesn’t move smoothly from one state to another, it moves abruptly. It tends to favor Zipf distributions, which are about as non-normal (i.e. non-Gaussian) as a distribution can get.
So I object to the way that the problem is conceptualized. The problem is not that the data requires “homogenization”, that’s a procedure for milk. The problem is that there are undocumented discontinuities or incorrect data entries. But homogenizing the data is not the answer to that.
This is particularly true since (if I understand what they’re saying) they have already told us how they plan to deal with discontinuities. The plan, which I’ve been pushing for some time now, is to simply break the series apart at the discontinuities and treat it as two separate series. And that’s a good plan. They say:
Data split: Each unique record was broken up into fragments having no gaps longer than 1 year. Each fragment was then treated as a separate record for filtering and merging. Note however that the number of stations is based on the number of unique locations, and not the number of record fragments.
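To make the quoted "data split" step concrete, here is a minimal sketch of one way it could work. It is not BEST's code. The representation of a record as (month index, temperature) pairs and the name split_at_gaps are assumptions made for illustration, and "gap" is taken to mean the number of missing months between two consecutive observations.

```python
from typing import List, Tuple

# Hypothetical representation: (month index, temperature) pairs in time order.
Observation = Tuple[int, float]


def split_at_gaps(record: List[Observation],
                  max_gap_months: int = 12) -> List[List[Observation]]:
    """Split a record into fragments wherever more than max_gap_months
    consecutive months are missing between two observations."""
    fragments: List[List[Observation]] = []
    current: List[Observation] = []
    for obs in record:
        if current:
            missing = obs[0] - current[-1][0] - 1  # months with no data
            if missing > max_gap_months:
                fragments.append(current)
                current = []
        current.append(obs)
    if current:
        fragments.append(current)
    return fragments


# One station, three fragments: the 17-month and 19-month holes split the
# record, while the 8-month hole does not.
record = [(0, 9.8), (1, 10.1), (2, 10.4), (20, 11.0), (21, 10.7),
          (30, 10.9), (50, 11.2)]
fragments = split_at_gaps(record)
```

Each fragment is then handled as its own record, while the station count still refers to the one original location.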
So why would they deal with “empirical discontinuities” by adjusting them, and deal with other discontinuities in a totally different manner?
Next, I object to the plan that they will “incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.” This will make it very difficult to back it out of the calculations to see what effect it has had. It will also hugely complicate the question of the estimation of error. For any step-wise process, it is crucial to separate the steps so the effect of each single step can be understood and evaluated.
Finally, let’s consider the nature of the “homogenization” process they propose. They describe it as a process whereby:
… local outliers are replaced with the basic behavior of the local group
There are a number of problems with that.
First, temperatures generally follow a Zipf distribution (a distribution with a large excess of extreme values). As a result, what would definitely be “extreme outliers” in a Gaussian distribution are just another day in the life in a Zipf distribution. A very unusual and uncommon temperature in a Gaussian distribution may be a fairly common and mundane temperature in a Zipf distribution. If you pull those so-called outliers out of the dataset, or replace them with a local average, you no longer have temperature data – you have Gaussian data. So you have to be real, real careful before you declare an outlier. I would certainly look at the distributions before and after “homogenization”, to see if the Zipf nature of the distribution has disappeared … and if so, I’d reconsider my algorithm.
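One rough way to run that before-and-after check is to compare a tail-weight statistic on the raw and the processed series. The sketch below uses excess kurtosis, which is a choice made for this illustration and not something the BEST documents specify. The three-sigma replacement rule and the synthetic data are also assumptions.

```python
import statistics


def excess_kurtosis(values):
    """Population excess kurtosis: about 0 for Gaussian data,
    positive when the tails are heavier than Gaussian."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m4 = sum((v - mean) ** 4 for v in values) / len(values)
    return m4 / (m2 ** 2) - 3.0


def replace_outliers_with_mean(values, n_sigma=3.0):
    """Illustrative 'homogenizer': any point further than n_sigma standard
    deviations from the mean is replaced by the mean of the other points."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    kept = [v for v in values if abs(v - mean) <= n_sigma * sigma]
    fill = statistics.fmean(kept)
    return [v if abs(v - mean) <= n_sigma * sigma else fill for v in values]


# Synthetic series: mostly small swings plus a few large excursions
# that we are treating as genuine observations.
raw = [1.0, -1.0] * 48 + [10.0, -10.0, 10.0, -10.0]
cleaned = replace_outliers_with_mean(raw)

kurtosis_before = excess_kurtosis(raw)
kurtosis_after = excess_kurtosis(cleaned)
```

On this toy series the excess kurtosis is strongly positive before the replacement and negative after it, which is the kind of change in the shape of the distribution that should prompt a second look at the algorithm.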
Second, while there is a generally high correlation between temperature datasets out to 1200 km or so, that’s all that it is. A correlation. It is not a law. For any given station, there will often be nearby datasets that have very little correlation. In addition, for each of the highly correlated pairs, there will be a number of individual years where the variation in the two datasets is quite large. So despite high correlation, we cannot just assume that any record that disagrees with the “local group” is incorrect, as the BEST folks seem to be proposing.
Third, since nature itself is almost “anti-homogeneous”, full of abrupt changes and frequent odd occurrences and outliers, why would we want to “homogenize” a dataset at all? If we find data we know to be bad, throw it out. Don’t just replace it with some imaginary number that you think is somehow more homogeneous.
Fourth, although the temperature data is highly correlated out for a long distance, the same is not true of the trend. See my post on Alaskan trends regarding this question. Since the trends are not correlated, adjustment based on neighbors may well introduce a spurious trend. If the “basic behavior of the local group” is trending upwards, and the data being homogenized is trending horizontally, both may indeed be correct, and homogenization will destroy that …
Those are some of the problems with “homogenization” that I see. I’d start by naming it something else. It does not describe what we wish to do to the data. Nature is not homogeneous, and neither should our dataset be homogeneous.
Then I’d use the local group, solely to locate unusual “outliers” or shifts in variance or average temperature.
But there’s no way I’d replace the putative “outliers” or shifts with the behavior of the “local group”. Why should I? If all you are doing is bringing the data in line with the average of the local group, why not just throw it out entirely and use the local average? What’s the advantage?
Instead, if I found such an actual anomaly or incorrect data point, I’d just throw out the bad data point, and break the original temperature record in two at that point, and consider it as two different records. Why average it with anything at all? That’s introducing extraneous information into a pristine dataset, what’s the point of that?
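As a sketch of the "throw it out and break the record in two" approach, assuming the bad point has already been identified by some other means (the function name and the plain list representation are hypothetical):

```python
def split_at_bad_point(record, bad_index):
    """Drop record[bad_index] and return the two records on either side.
    Nothing is averaged in and no replacement value is invented."""
    if not 0 <= bad_index < len(record):
        raise IndexError("bad_index is outside the record")
    return record[:bad_index], record[bad_index + 1:]


temps = [10.1, 10.3, 55.0, 11.9, 12.1]  # 55.0 is the suspect entry
before, after = split_at_bad_point(temps, 2)
```

Every value that survives is an original measurement; the only information lost is the single rejected point.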
Lastly, a couple of issues with their quality control procedures. They say:
Local outlier filter: We tested for and flagged values that exceeded a locally determined empirical 99.9% threshold for normal climate variation in each record.
and
Regional filter: For each record, the 21 nearest neighbors having at least 5 years of record were located. These were used to estimate a normal pattern of seasonal climate variation. After adjusting for changes in latitude and altitude, each record was compared to its local normal pattern and 99.9% outliers were flagged.
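The neighbor-selection part of that regional filter might look something like the following. This is a guess at the mechanics, not BEST's implementation: the Station fields, the great-circle (haversine) distance, and the Earth radius of 6371 km are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    lat: float              # degrees
    lon: float              # degrees
    years_of_record: float


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_neighbors(target, stations, k=21, min_years=5.0):
    """Return the k closest stations with at least min_years of record."""
    candidates = [s for s in stations
                  if s is not target and s.years_of_record >= min_years]
    candidates.sort(key=lambda s: haversine_km(target.lat, target.lon,
                                               s.lat, s.lon))
    return candidates[:k]


target = Station("A", 0.0, 0.0, 50.0)
others = [Station("B", 0.0, 1.0, 10.0),
          Station("C", 0.0, 2.0, 3.0),   # too short, gets skipped
          Station("D", 0.0, 3.0, 20.0)]
neighbors = nearest_neighbors(target, [target] + others, k=2)
```

Note that nothing here limits how far away the 21st neighbor is, so in sparsely covered regions the "local" group may not be very local at all.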
Again, I’d be real, real cautious about these procedures. Since the value in both cases is “locally determined”, there will certainly not be a whole lot of data for analysis. Determination of the 99.9% exceedance level, based solely on a small dataset of Zipf-distributed data, will have huge error margins. Overall, what they propose seems like a procedure guaranteed to convert a Zipf dataset into a Gaussian dataset, and at that point all bets are off …
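The size of those error margins can be illustrated with a small simulation. The sketch below repeatedly draws samples from a fat-tailed distribution and looks at how much the estimated 99.9% level wanders from sample to sample. The Pareto distribution with shape 3 is only a stand-in for "some fat-tailed distribution"; it is not a claim about real temperature data, and the sample sizes are arbitrary.

```python
import random
import statistics


def empirical_quantile(values, q):
    """Simple order-statistic estimate of the q-th quantile."""
    ordered = sorted(values)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


def quantile_spread(sample_size, repeats, q=0.999, alpha=3.0, seed=0):
    """Standard deviation of the estimated q-quantile across repeated
    samples of the given size from a Pareto(alpha) distribution."""
    rng = random.Random(seed)
    estimates = [
        empirical_quantile(
            [rng.paretovariate(alpha) for _ in range(sample_size)], q)
        for _ in range(repeats)
    ]
    return statistics.pstdev(estimates)


spread_small = quantile_spread(sample_size=500, repeats=40)
spread_large = quantile_spread(sample_size=20_000, repeats=40)
```

With only a few hundred points the "99.9% level" is effectively just the largest value seen, so it jumps around far more between samples than the estimate based on tens of thousands of points.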
In addition, once the “normal pattern of seasonal climate variation” is established, how is one to determine what is a 99.9% outlier? The exact details of how this is done make a big difference. I’m not sure I see a clear and clean way to do it, particularly when the seasonal data has been “adjusted for changes in latitude and altitude”. That implies that they are not using anomalies but absolute values, and that always makes things stickier. But they don’t say how they plan to do it …
In closing, I bring all of this up, not to oppose the BEST crew or make them wrong or pick on errors, but to assist them in making their work bulletproof. I am overjoyed that they are doing what they are doing. I bring this up to make their product better by crowd-sourcing ideas and objections to how they plan to analyze the data.
Accordingly, I will ask the assistance of the moderators in politely removing any posts talking about whether BEST will or won’t come up with anything good, or of their motives, or whether the eventual product will be useful, or the preliminary results, or anything extraneous. Just paste in “Snipped – OT” to mark them, if you’d be so kind.
This thread is about how to do the temperature analysis properly, not whether to do it, or the doer’s motives, or whether it is worth doing. Those are all good questions, but not for this thread. Please take all of that to a general thread regarding BEST. This thread is about the mathematical analysis and transformation of the data, and nothing else.
w.

Willis Eschenbach says:
March 23, 2011 at 1:13 am
BigWaveDave says:
March 23, 2011 at 12:53 am
Ian W says:
March 23, 2011 at 3:06 am
. . . regarding pressure and humidity . . .
I agree with Ian. He stated it one way. I would like to state it in a slightly different way. Temperature is a poor, incomplete surrogate measure of local atmospheric energy content. So, what good can come out of determining a world wide “average” of nonuniformly measured temperature (an incomplete surrogate for energy) without corresponding world wide detail on the local time dependent concentrations of both the major energy carrying agent (water) and most importantly the local vertical concentration distribution of what everyone seems to be clamoring is the pesky AGW causative agent CO2? I agree that homogenizing temp data might allow introduction of a meaningless bias, but without any other more important MEASURED data as well, especially site specific data (e.g. water and CO2 concentration, vertical distribution, time dependent changes in local human population, local aircraft ‘concentration’, local changes in number, proximity and horsepower of engines, building ‘concentration’ and height, surface emissivity etc.) the CHANGE in temperature cannot in any meaningful way be attributed to a CAUSE.
And after all that they will reclaim a precision of tenths of degree. Ahaha!
diogenes says:
March 23, 2011 at 3:18 pm
“Sorry, that was my first dip into climate progress: it is like dealing with a horde of hysterical lunatics.”
Given your name, the worst of the worst should not disappoint you in any way. Those CP people must be “something else.”
steven mosher says:
March 23, 2011 at 12:15 pm
“Willis I think you’ve misunderstood empirical homogenization. Further, I’m not at all convinced that temperatures at a given location are universally described by a power law distribution, third I do not see the kind of discontinuous behavior that you speak of. 4th, the 1200km correlation distance is an integral part of the error calculation due to spatial sampling.”
Why do you post this? It’s just a list of “mentions” of points on which you disagree with Willis. I could understand a post in which you explain one of your points. Do you not have time to explain your points? Then why post at all? Anyone who demonstrates the temperament that you demonstrate in this post will inevitably bring upon himself a lot of unnecessary push-back. I hope BEST does not share your temperament. If they do, this entire exercise is wasted.
I would love to see an application similar to http://www.gapminder.org/, or at least some kind of sensitivity analysis. Also, the stock market approach sounds quite reasonable to me.
A “movie” of 15 four-year periods covering 525,960 hourly averaged data points (1951-2010) from De Bilt NL (station # 260).
http://boels069.nl/klimaat/Temperature.wmv
(Needs Windows MediaPlayer)
Data from:
http://www.knmi.nl/klimatologie/uurgegevens/#no
The plots represent counts of unique temperatures and the Excel (Office 2010) Norm.Dist function (Gaussian I believe).
I’m also wondering about the Zipf distribution.
Tim Folkerts says:
March 23, 2011 at 11:07 am
Willis,
I have to ask about the use of the Zipf Distribution.
A quick check of Google shows that you are about the only person who uses this distribution with respect to temperature (yet you seem very confident that it is THE distribution for temperature data). Typically this distribution is used to look at things like how often different words come up in language, how often cities of various sizes occur, how much traffic the 7th most popular web page gets compared to the 8th most popular.
This distribution is based on rankings of items. From Wikipedia on “Zipf’s Law”:
For example, in the Brown Corpus “the” is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf’s Law, the second-place word “of” accounts for slightly over 3.5% of words (36,411 occurrences), followed by “and” (28,852).
A similar statement for temperature would be something like:
For example, at some weather station “70 F” is the most frequently occurring temperature , and by itself accounts for nearly 7% of all temperature records. True to Zipf’s Law, the second-place temperature “69 F” accounts for slightly over 3.5% of temperature records followed by “71 F”.
Of course, the specific percentages would be different, but I don’t really see how a ranking of temperature frequencies will be an effective way to analyze the data. For one thing, it eliminates the sign of the deviation from the most common data (the mode).
I agree totally with your point. I might add that the language is a qualitative data set. Temperature is quantitative data. Also temperature has a periodic seasonality which will dominate the distribution of data. There is no assumption required that data be Gaussian to do the homogenization of the statistics to eliminate bad data.
An extensive review of different homogenization techniques was published by researchers from 11 different countries. There are different mathematical techniques in addition to labor-intensive techniques that make use of metadata.
http://docs.google.com/viewer?a=v&q=cache:fIHtfyKKinoJ:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.122.1131%26rep%3Drep1%26type%3Dpdf+homogenization+temperature+data+rationale&hl=en&gl=us&pid=bl&srcid=ADGEEShduEujlOnUZP5mmxCwyhSmq9kJPCwRXcYLO1wk8-0iS61vMlDsMZ8Fo1-zPI44u0nCgQHk_LOQaypaAY3hHO1VOeqTutMglkMdJ4K7Z3KR6u5nMN3wj9ihRzoz6S7pg-m-4AAG&sig=AHIEtbQI2CgKWKVQS1C43Eoa-Glzsi5A8Q
I think without looking at how these work in some detail, it seems cavalier to dismiss them with the remark used by Eschenbach:
“The problem is not that the data requires “homogenization”, that’s a procedure for milk. The problem is that there are undocumented discontinuities or incorrect data entries. But homogenizing the data is not the answer to that.”
The rhetoric here treats homogenization as if it were some kind of obscenity.
The definition of inhomogeneities in the data includes equipment failures, changes in location of stations, changes in environment, measurement practices, operator error etc., i.e. any changes other than changes in climate that affect the data. The process of homogenization is to extract the true behavior of the climate and eliminate the extraneous factors in the data. Part of the process of homogenization of the data is to find the discontinuities and errors in the data. The review makes that clear. Also, when multiple stations are used, and correlated with one another, the probability of making an error is significantly reduced.
There is no perfect way to homogenize data. As the review paper points out, the best method to use, depends on the state of the data, the distance between stations etc. The authors point out, in their conclusions section, that when significant errors need to be corrected, homogenized data sets done by different methods tend to resemble one another more than they resemble the original data. To me that would confirm that homogenization is a good idea. The review points out that when large numbers of stations are averaged, the homogenized and original average trends are very similar. Homogenization of data is most useful for regional temperature data.
Willis:
If I recall correctly, Ross McKitrick et al. concluded that the whole concept of a “global average temp.” is as goofy as the narrative gets. Do you deny this? If so, why? If not, what good is the BEST investigation? Is it not a waste of time and resources, if the “conclusions” mean nothing?
I personally agree with RP Sr. who maintains that the ocean temp. is the important metric.
What say you?
eadler says:
March 23, 2011 at 6:33 pm
“The rhetoric here treats homogenization as if it were some kind of obscenity.”
It is nice to see that you are reading better than you once did and that you are picking up nuance better than you once did.
The BEST people have an opportunity to connect with sceptics. They should take it. Otherwise, after November 2012 there will be no Democrats in Congress, EPA will be abolished, and the UN will be escorted by security to the new country of its choice.
If all BEST is going to do is discuss advanced statistical techniques then their effort is DOA, dead on arrival. They have to take up questions that interest sceptics if they want to appeal to sceptics.
When the actual temperature field is as inhomogeneous as careful measurements show it to be, the ex ante imposition of academic notions of homogeneity is scientifically misguided. There is no such thing as a unique “true” temperature at any averaging scale in any region that can be entirely divorced from the particular spot where the thermometer is placed. While the correlation of deviations from the station average can extend over considerable distances, that does not apply to the absolute temperatures themselves. Somehow that point escapes those who favor the statistical massage of scraps of data from a temporally non-uniform set of stations over actual measurements. No less than the vain exercises in homogenization, the abandonment of a uniform set of datum levels–which all too often are corrupted by urbanization–when the inclusion of very short records becomes a primary goal strikes me as being a grievous flaw.
Seeing as the error bars on any surface temperature measurement would have to be +/- 1 deg C at least I find it hard to believe that a conclusion other than “to within the measurement errors no temperature trend is discernible” could possibly be warranted.
This project is too silly for words.
Historical and agricultural records along with a little archaeology would seem to give a better idea of climate change in the last few thousand years of the holocene and those records integrate all aspects of climate, not just temperature.
The recent Russian heatwave was (finally) put down to a blocking high. It was unusual, it was unexpected, it could be determined to be an outlier but it occurred, therefore it is a valid piece of data.
Why should it be removed in favour of some-one’s arbitrary idea of what is/is not an outlier?
Theo Goodwin says:
March 23, 2011 at 7:55 pm
[eadler] It is nice to see that you are reading better than you once did and that you are picking up nuance better than you once did.
The BEST people have an opportunity to connect with sceptics. They should take it. Otherwise, after November 2012 there will be no Democrats in Congress, EPA will be abolished, and the UN will be escorted by security to the new country of its choice.
If all BEST is going to do is discuss advanced statistical techniques then their effort is DOA, dead on arrival. They have to take up questions that interest sceptics if they want to appeal to sceptics.
– – – – – – –
Theo Goodwin,
Thanks, that was simply and well said.
If it happens, for example with Anthony and the WUWT fellowship’s help, that there is increasingly more timely and direct engagement between BEST team ongoing efforts and the skeptics on the blogs, then I can only see it as being a very positively perceived enhancement of the stature of the BEST project.
John
Ocean temperatures;
There’s a hole in that bucket.
More missing than heat.
============
When the actual temperature field is as inhomogeneous as careful measurements show it to be, the ex ante imposition of academic notions of homogeneity is scientifically misguided. There is no such thing as a unique “true” temperature at any averaging scale in any region that can be entirely divorced from the particular spot where the thermometer is placed. While the correlation of deviations from the station average can extend over considerable distances, that does not apply to the absolute temperatures themselves. Somehow that point escapes those who favor the statistical massage of scraps of data from a temporally non-uniform set of stations over actual measurements. No less than the vain exercises in homogenization, the abandonment of a uniform set of datum levels–which all too often are corrupted by urbanization–when the inclusion of very short records becomes a primary goal strikes me as being a grievous flaw.
You have made an excellent rhetorical argument with powerful adjectives – vain, corrupted etc., but in essence, you are making a straw man argument here. The purpose of homogenization is to determine deviations from the average, i.e. temperature anomalies, rather than absolute temperatures. We don’t really care about the absolute numbers, only the trends.
David Socrates says: March 23, 2011 at 5:37 pm
There is another interesting aspect of “the [o]fficial HadCRUT3 data” plot http://www.thetruthaboutclimatechange.org/temps.png
It starts near a trough (1860) and ends near a peak (~ 2005).
Nonetheless, the HadCRUT3 trend is remarkably “consistent with” Akasofu’s view of temperature trends:
Akasofu, Syun-Ichi. 2009. Two Natural Components of the Recent Climate Change: University of Alaska Fairbanks Fairbanks, Alaska: International Arctic Research Center, April 30. http://people.iarc.uaf.edu/~sakasofu/pdf/Earth_recovering_from_LIA_R.pdf
Akasofu, Syun-Ichi. 2010. “‘On the recovery from the Little Ice Age’.” Natural Science 2 (11): 1211-1224. doi:10.4236/ns.2010.211149. http://www.scirp.org/journal/PaperInformation.aspx?PaperID=3217&JournalID=69
I second this.
curryja says:
March 23, 2011 at 10:19 am
Always good to hear from you, Judith. And I agree completely. Someone above said the effort was “doomed to failure”, but I see it as the exact opposite, a very important initiative whose outlines are not yet set.
The most important things to me are accessibility and transparency. If each step is visible, and the data and code is public, then whether their new methods are valuable additions will quickly become apparent.
One of the first things that struck me when I entered the field was the lack of an agreed upon temperature dataset. Before we can even begin to discuss such things as the effect of UHI upon the dataset, the data has to be clean and good and quality controlled and agreed upon.
I am, as I mentioned above, not in agreement with several of their methods. First, the idea that using only the data from a group of 21 nearby stations one can reliably determine whether a data point is less than a one in a thousand event (99.9% exceedance rate) or not seems impossible. I believe that they have a statistician among the team, but if so, they haven’t thought this one all of the way through.
Because if you want to reliably determine if a certain data point is a one-in-a-thousand event or not, you’ll need a minimum of something like thirty times that much data, or 30,000 data points … and I doubt if we have that for a number of the stations in question. And that’s without even including the complication of the Zipf distribution.
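For a sense of scale, the arithmetic behind that 30,000 figure can be spelled out. The helper names below are made up for the example, and the monthly and daily sampling rates are assumptions about how a station record might be stored, not figures from BEST.

```python
def expected_exceedances(n_points, tail_probability=0.001):
    """Average number of points expected beyond a 99.9% threshold."""
    return n_points * tail_probability


def years_needed(n_points, samples_per_year):
    """Record length needed to accumulate n_points observations."""
    return n_points / samples_per_year


tail_points = expected_exceedances(30_000)         # about 30 tail events
years_if_monthly = years_needed(30_000, 12)        # 2,500 years
years_if_daily = years_needed(30_000, 365.25)      # roughly 82 years
```

So 30,000 points yields only about thirty observations in the tail being characterized, and a single station would need roughly 82 years of daily values, or 2,500 years of monthly values, to get there on its own.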
Part of the problem is that most statisticians don’t even think about Zipf distributions at all, whereas nature thinks about them all the time. And extremes are the stock-in-trade of Zipf distributions. So the error margins on the calculations of which is a valid data point and which is bogus data will be huge.
Finally, as I indicated above, the whole idea of “homogenization” is anathema to me.
But that is just a disagreement about what to do with the data, once we have an agreed upon dataset. With the dataset, people around the world will be able to do their own analyses and come to their own conclusions.
All the best,
w.
PS – Please, please, please, if you have any power over BEST, have them make the data available as a single 2-D block, with rows as time and columns as stations. I’m not interested in 35,000 individual station records a la CRU … indeed, the design of the public availability gateway for this data is crucial. They should take a look at the KNMI website, where I can pull up any subset of the data I’m interested in, filtered in a host of possible ways.
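For anyone unsure what a "single 2-D block" means in practice, here is a minimal sketch that pivots per-station records into one table with rows as time and columns as stations. The (station, time, value) tuple layout, the station names, and the use of None for missing values are all assumptions made for the example.

```python
def to_block(records):
    """Pivot (station, time, value) tuples into a rows-by-time,
    columns-by-station table. Missing entries are left as None."""
    times = sorted({t for _, t, _ in records})
    stations = sorted({s for s, _, _ in records})
    row = {t: i for i, t in enumerate(times)}
    col = {s: j for j, s in enumerate(stations)}
    block = [[None] * len(stations) for _ in times]
    for station, time, value in records:
        block[row[time]][col[station]] = value
    return times, stations, block


records = [
    ("STN_A", "2010-01", 2.1), ("STN_A", "2010-02", 3.4),
    ("STN_B", "2010-02", 5.0), ("STN_B", "2010-03", 7.2),
]
times, stations, block = to_block(records)
```

With the data in this shape, pulling out a time slice is a single row lookup and pulling out a station is a single column lookup.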
Jeff Carlson says:
March 23, 2011 at 10:39 am
Jeff, all data is bad data to a greater or lesser degree … the only question is the degree. No measurement is ever exact. We don’t let that stop us in any field of science, it just affects the confidence intervals.
w.
Tim Folkerts says:
March 23, 2011 at 11:07 am
Thanks, Tim. The Zipf is one of a number of closely related power law distributions. They all differ from Gaussian distributions in that they have an excess of extreme events. For this reason they are sometimes called “fat-tailed” distributions.
My point is not that Zipf is the one and only relevant distribution. It is that we cannot use Gaussian statistics to identify extreme events in natural datasets.
If I ran the zoo, the first thing that I’d do is to take the highest quality records that I had and determine which particular power-law distribution gives the best fit to the temperature data.
Then, and only then, would I start talking about the “99.9% exceedance” limits …
I discuss the Zipf distribution some more in the appendices to this post.
w.
eadler says:
March 24, 2011 at 5:29 am
“The purpose of homogenization is to determine deviations from the average, i.e. temperature anomalies, rather than absolute temperatures. We don’t really care about the absolute numbers, only the trends.”
Pray tell, how does one establish reliable “averages” in the first place, without maintaining FIXED datum levels at a UNIFORM (unchanging) set of stations? You seem oblivious to the fact that by employing scraps of record from an EVER-CHANGING set of stations, you get a data sausage with mystery ingredients, rather than a physically meaningful ensemble average of station records. And that’s in the very best case, without any data handling issues or the datum-corrupting influences of UHI and site/instrumentation or land-use changes. You SHOULD care about absolute levels, because that’s the ONLY thing that thermometers measure.
Spurious offsets from datum level are readily lost from sight in the highly variable, stochastic temperature changes at any station, but even a half-dozen offset years near the beginning or end of the record can have a profound effect upon the regressional “trend.” In fact, such trends are the most inconsistent features of actual station records. Without proper vetting of station data at an ABSOLUTE level, which can only be done with sufficiently long records, trends become ephemeral artifacts. This problem is not solved by MANUFACTURING a time-series via “homogenization” from neighboring unvetted data.
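The sensitivity of a fitted trend to a few offset years near the end of a record is easy to check numerically. The sketch below is a toy case, not real station data: a perfectly flat 60-year series, a six-year offset of +1.0, and an ordinary least-squares slope. All of those choices are assumptions for illustration.

```python
def ols_slope(values):
    """Ordinary least-squares slope of values against their index
    (units of the data per time step)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y)
                    for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def with_offset(values, start, length, offset):
    """Copy of values with offset added to `length` points from `start`."""
    return [v + offset if start <= i < start + length else v
            for i, v in enumerate(values)]


flat = [10.0] * 60                           # 60 years, no trend at all
end_offset = with_offset(flat, 54, 6, 1.0)   # last six years shifted up
mid_offset = with_offset(flat, 27, 6, 1.0)   # same shift, mid-record

slope_flat = ols_slope(flat)
slope_end = ols_slope(end_offset)
slope_mid = ols_slope(mid_offset)
```

In this toy case the six offset years at the end turn a trendless record into one with a fitted slope of roughly 0.9 per century, while the identical offset placed in the middle of the record leaves the slope at essentially zero.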
Thanks Willis,
As always, very informative, and I come here to learn.
sky says:
March 24, 2011 at 2:44 pm
eadler says:
March 24, 2011 at 5:29 am
“The purpose of homogenization is to determine deviations from the average, i.e. temperature anomalies, rather than absolute temperatures. We don’t really care about the absolute numbers, only the trends.”
Pray tell, how does one establish reliable “averages” in the first place, without maintaining FIXED datum levels at a UNIFORM (unchanging) set of stations? You seem oblivious to the fact that by employing scraps of record from an EVER-CHANGING set of stations, you get a data sausage with mystery ingredients, rather than a physically meaningful ensemble average of station records. And that’s in the very best case, without any data handling issues or the datum-corrupting influences of UHI and site/instrumentation or land-use changes. You SHOULD care about absolute levels, because that’s the ONLY thing that thermometers measure.
Spurious offsets from datum level are readily lost from sight in the highly variable, stochastic temperature changes at any station, but even a half-dozen offset years near the beginning or end of the record can have a profound effect upon the regressional “trend.” In fact, such trends are the most inconsistent features of actual station records. Without proper vetting of station data at an ABSOLUTE level, which can only be done with sufficiently long records, trends become ephemeral artifacts. This problem is not solved by MANUFACTURING a time-series via “homogenization” from neighboring unvetted data.
You are making me hungry with your talk of data sausages for breakfast. I was hoping they would be something substantial , but you dashed my hopes because you say the trends they created are ephemeral artifacts. If they are ephemeral I guess they are not substantial enough to satisfy my breakfast hunger.
In fact you are criticizing a procedure which you don’t understand and haven’t read, because the details haven’t been released yet in a full paper. In any case, the homogenization process is what “vets the data”, so your phrase “without vetting the data” is an unfounded assumption about the procedure you claim to criticize.
eadler says:
March 24, 2011 at 7:02 pm
“In fact you are criticizing a procedure which you don’t understand and haven’t read, because the details haven’t been released yet in a full paper. In any case, the homogenization process is what “vets the data”, so your phrase “without vetting the data” is an unfounded assumption about the procedure you claim to criticize.”
You can presume anything you want about what I ostensibly “don’t understand” about “homogenization,” but any time measured values in a station record are altered or replaced with something inferred statistically from other stations, the observational basis is no longer there. Bona fide vetting doesn’t do that! And I don’t waste my time on plainly attitudinal arguments that lack substantive basis.
I prefer the use of the log normal distribution for continuous data rather than the Zipf distribution which is useful for discrete data. My personal experience indicates that natural series for continuous data tend to be log normal with widely varying parameters. (I haven’t read all the comments so far – too many!) In any case the use of the average as the basis for an anomalous datum is probably wrong most of the time because the median is a better measure of the central tendency than the average. In any case, the raw data must be preserved!
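The gap between mean and median on skewed data is simple to demonstrate. The sketch below draws from a log normal distribution with mu = 0 and sigma = 1; those parameters, the sample size, and the seed are arbitrary choices for illustration and say nothing about any particular temperature record.

```python
import random
import statistics


def lognormal_sample(n, mu=0.0, sigma=1.0, seed=0):
    """Draw n values from a log normal distribution."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


sample = lognormal_sample(10_000)
sample_mean = statistics.fmean(sample)
sample_median = statistics.median(sample)
```

For these parameters the theoretical median is 1 while the theoretical mean is about 1.65, so the long upper tail pulls the average well above the typical value, and any outlier test centered on the mean starts from a shifted baseline.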