Reposted from Dr. Judith Curry’s Climate Etc.
Posted on May 17, 2021 by curryja
by S. Stanley Young and Warren Kindzierski
Climate Etc. recently carried several insightful posts about How we fool ourselves. One of the posts – Part II: Scientific consensus building – was right on the money given our experience! The post pointed out that ‘researcher degrees of freedom’ allow researchers to extract statistical significance or other meaningful information from almost any data set. Along similar lines, we offer some thoughts on how others try to fool us using statistics (aka how to lie with statistics) – those others being epidemiologists and government bureaucrats.
We have just completed a study for the National Association of Scholars [1] that took a deep dive into flawed statistical practices used in the field of environmental epidemiology. The study focused on air quality−health effect claims; more specifically PM2.5−health effect claims. However, the flawed practices apply to all aspects of risk factor−chronic disease research. The study also looked at how government bureaucrats use these claims to skew policy in favor of PM2.5 regulation and their own positions.
All that we discuss below is drawn from our study. Americans need to be aware that current statistical practices being used at the EPA for setting policy and regulations are flawed and expensive. Readers can download and read our study to decide the extent of the problem for themselves.
1. Introduction
Unbeknownst to the public and far too many academic scientists, modern science suffers from an irreproducibility crisis in a wide range of disciplines—from medicine to social psychology. Far too frequently, scientists are unable to reproduce claims made in research.
Given the irreproducible science crisis, we completed a study for the National Association of Scholars (NAS) in New York as part of the Shifting Sands project. The project—Shifting Sands: Unsound Science and Unsafe Regulation—examines how irreproducible science negatively affects select areas of government policy and regulation in different federal agencies.
Our study investigated portions of research in the field of epidemiology used for US Environmental Protection Agency (EPA) regulation of PM2.5. This research claims that particulate matter smaller than 2.5 microns (PM2.5) in outdoor air is harmful to humans in many ways. But is the research on PM2.5 and the claims made in the research misleading?
2. Bias in academic research
Academic researcher incentives reward exciting research with new positive (significant association) claims—but not reproducible research. This encourages epidemiologists – who are mainly academics – to wittingly or negligently use various flawed statistical practices to produce positive, but (we show) likely false, claims.
There are numerous key biases that epidemiologists continue to ignore—unintentionally or intentionally—in studies of air quality and health effects, allowing them to make positive, but likely false, research claims. Some examples are:
- multiple testing and multiple modeling
- omitting predictors and confounders
- not controlling for residual confounding
- neglecting interactions among variables
- not properly testing model assumptions
- neglecting exposure uncertainties
- making unjustified interventional causal interpretation of regression coefficients
Our study focused on the multiple testing and multiple modeling bias to assess whether a body of research has been affected by flawed statistical practices. We subjected research claiming that PM2.5 is harmful to a series of simple but severe statistical tests.
3. How epidemiologists skew research
Our study found strong circumstantial evidence that claims made about PM2.5 causing mortality, heart attacks and asthma are compromised by flawed statistical practices. These flawed practices make the research untrustworthy, as they favor producing false claims that would not reproduce if the analyses were done properly. This is discussed further below.
Estimating the number of statistical tests in a study – There is known flexibility available to epidemiology researchers to undertake a range of statistical tests and use different statistical models on observational data sets. The researchers then can select, use and report (cherry pick) a portion of the test and model results that favor a narrative.
Epidemiologists typically use a Relative Risk (RR) or Odds Ratio (OR) lower confidence limit > 1 (or a p-value < 0.05) as decision criteria to justify a significant PM2.5−health effect claim in a statistical test. However, for any given number of statistical tests performed on the same data set, 5% are expected to yield a significant, but false result. A study with 13,000 statistical tests could have as many as 0.05 x 13,000 = 650 significant, but false results!
Given advanced statistical software, epidemiologists today can easily perform this many or more statistical tests on a set of data in an observational study. They can then cherry pick 10 or 20 of their most interesting findings and write up a nice, tight research paper around these findings—which are most likely to be false, irreproducible findings. We have yet to see an air quality−health effects study that reports as many as 650 results. How exactly is one supposed to tell the difference between a false positive and a possible true positive result when so many tests are performed and so few results are presented?
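To see the arithmetic at work, here is a minimal simulation (a sketch only; the sample sizes, seed and test form are our own arbitrary choices, not drawn from any study). Every one of 13,000 tests is run on pure noise, so every "significant" result is false by construction:

```python
import math
import random

random.seed(42)
n_tests = 13_000
alpha = 0.05
n = 50  # arbitrary per-group sample size

def two_sided_p(z):
    # Two-sided p-value for a standard normal test statistic.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

false_positives = 0
for _ in range(n_tests):
    # Both samples come from the same N(0, 1) population: the null
    # hypothesis is true, so every rejection is a false positive.
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean_diff = sum(a) / n - sum(b) / n
    # Variance is known to be 1 here, so a z-test is exact.
    z = mean_diff / math.sqrt(2.0 / n)
    if two_sided_p(z) < alpha:
        false_positives += 1

print(false_positives)  # roughly 0.05 x 13,000 = 650
```

Roughly 650 of the 13,000 noise-only tests come out "significant"—a pool from which 10 or 20 interesting-looking findings can be cherry picked.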
Diagnosing evidence of publication bias, p-hacking and/or HARKing – Publication bias is the failure to publish the results of a study unless they are positive results that show significant associations. P-hacking is reanalyzing data in many different ways to yield a target result. HARKing (Hypothesizing After Results are Known) is using the data to generate a hypothesis and pretend the hypothesis was stated first.
It is traditional in epidemiology to use confidence intervals instead of p-values from a hypothesis test to demonstrate statistical significance. As both confidence intervals and p-values are constructed from the same data, they are interchangeable, and one can be calculated from the other.
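The conversion rests on the usual normal approximation on the log scale, where the 95% confidence interval half-width equals 1.96 standard errors. A minimal sketch (the RR and interval below are hypothetical example numbers, not from any study):

```python
import math

def p_from_rr_ci(rr, lcl, ucl):
    """Recover a two-sided p-value from a relative risk and its 95% CI.

    Standard normal approximation on the log scale: the CI width in
    log units spans 2 x 1.96 standard errors.
    """
    se = (math.log(ucl) - math.log(lcl)) / (2 * 1.96)
    z = math.log(rr) / se
    # Two-sided p-value from the standard normal distribution.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Hypothetical example: RR = 1.10 with 95% CI (1.02, 1.19).
# The interval excludes 1, so the p-value lands below 0.05.
print(round(p_from_rr_ci(1.10, 1.02, 1.19), 3))
```

A confidence interval that includes 1 correspondingly recovers a p-value above 0.05, which is why the two ways of reporting are interchangeable.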
We first calculated p-values from confidence intervals for data from meta-analysis studies that make PM2.5−health effect claims. A meta-analysis is a systematic procedure for statistically combining data from multiple studies that address a common research question—for example, whether PM2.5 is a likely cause of a specific health effect (e.g., mortality). We looked at meta-analysis studies claiming that PM2.5 causes: i) mortality, ii) heart attacks and iii) asthma.
We then used a simple but novel statistical method—p-value plotting—as a severe test to diagnose evidence of publication bias, p-hacking and/or HARKing in this data. More specifically, after calculating p-values from confidence intervals we then plotted the distribution of rank ordered p-values (a p-value plot).
Conceptually, a p-value plot allows us to examine a specific premise that factor A causes outcome B using data combined from multiple observational studies in meta-analysis. What should a p-value plot of the data look like?
- a plot that forms an approximate 45-degree line provides evidence of randomness—supporting the null hypothesis of no significant association between factor A & outcome B (Figure 1)
- a plot that forms approximately a line with slope < 1, where most of the p-values are small (less than 0.05), provides evidence for a real effect—supporting a statistically significant association between factor A & outcome B (Figure 2)
- a plot that exhibits bilinearity—that divides into two lines—provides evidence of publication bias, p-hacking and/or HARKing (Figure 3)

Figure 1. P-value plot of a meta-analysis of observational data sets analyzing associations between elderly long-term exercise training (factor A) and mortality & morbidity (injury) (outcome B); data points drawn from 40 observational studies.

Figure 2. P-value plot of a meta-analysis of observational data sets analyzing associations between smoking (factor A) and squamous cell carcinoma of the lungs (outcome B); data points drawn from 102 observational studies.

Figure 3. P-value plot of a meta-analysis of observational data sets analyzing associations between PM2.5 (factor A) and all−cause mortality (outcome B); data points drawn from 29 observational studies.
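The three patterns described above can be illustrated with simulated p-values. A minimal sketch (stdlib only; the number of studies and the assumed effect size of three standard errors are our own illustrative choices):

```python
import math
import random

random.seed(0)
m = 40  # studies in a hypothetical meta-analysis

def two_sided_p(z):
    # Two-sided p-value for a standard normal test statistic.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Case 1: no real effect. Each study's p-value is Uniform(0, 1), so the
# rank-ordered p-values track the 45-degree line (the Figure 1 pattern).
null_ps = sorted(random.random() for _ in range(m))
max_dev = max(abs(p - k / (m + 1)) for k, p in enumerate(null_ps, 1))
print(f"null case, max deviation from 45-degree line: {max_dev:.3f}")

# Case 2: a real effect. Test statistics are shifted away from zero, so
# most p-values pile up below 0.05 (the Figure 2 pattern).
effect_ps = sorted(two_sided_p(random.gauss(3.0, 1.0)) for _ in range(m))
small_share = sum(p < 0.05 for p in effect_ps) / m
print(f"effect case, share of p-values below 0.05: {small_share:.2f}")

# A bilinear plot (the Figure 3 pattern) mixes the two: a cluster of
# small p-values joined to a 45-degree tail.
```

Plotting each sorted list against its ranks reproduces the two reference shapes; the bilinear shape arises when results from both regimes are combined, which is the signature flagged above.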
We show over a dozen p-value plots in our study for meta-analysis data of associations between PM2.5 (and other air quality components) and mortality, heart attacks and asthma. All these plots exhibit bilinearity!
This provides compelling circumstantial evidence that the literature on PM2.5 (and other air quality components)—specifically for mortality, heart attack and asthma claims—has been affected by statistical practices that have rendered the underlying research untrustworthy.
Our findings are consistent with the general claim that false-positive results from publication bias, p-hacking and/or HARKing are common features of the medical science literature today, including the broad range of risk factor−chronic disease research.
4. How government bureaucrats skew policy
The process is further derailed with government involvement. The EPA has relied on statistical analyses to show significant PM2.5−health effect associations. EPA bureaucrats who fund this type of research depend on regulations to support their existence. The EPA has slowly imposed increasingly restrictive regulation over the past 40 years.
However, the EPA appears to have acted selectively in its approach to the health effects of PM2.5. This has been done by paying more attention to research that supports regulation (i.e., shows significant PM2.5−health effect associations) and ignoring or downplaying research that shows no significant PM2.5−health effect associations. This latter research exists; it is simply ignored or downplayed by the bureaucrats! Nor do the researchers who find negative results get funded.
It is apparent to us that bureaucrats lack an understanding of, or willfully ignore, flawed statistical practices and other biases identified above in PM2.5−health effects research. They, along with environmental activists, continuously push for tighter air quality regulation based on flawed practices and false findings.
5. Can this mess be fixed?
Epidemiologists and government bureaucrats collectively skew results of medical science towards justifying regulation of PM2.5, while almost always keeping their data sets private. Far too many of these types, and a distressingly large share of the public, believe that academic (university) science is superior to industry science. However, as epidemiology evidence is largely based on university research, we should treat it with the same skepticism as we would industry research.
Mainstream media appear clueless and uninterested in glaring biases in epidemiology research that cause false findings—flawed statistical practices, analysis manipulation, cherry picking results, selective reporting, broken peer review.
Epidemiologists, and the government bureaucrats who depend on their work to justify PM2.5 regulation, proceed with far too much self-confidence. They show too little awareness that statistics must remain an exercise in measuring uncertainty rather than establishing certainty. This mess plagues government policy by lending a false level of certainty to the body of research used to justify PM2.5 regulation.
In our study we make several recommendations to the Biden administration for fixing this mess. However, we do not hold our breath that they will be considered. Some of these include:
- the administration needs to support statistically sound and reproducible science
- unsound statistical practices silently supported by the EPA need to stop
- the building and analysis of data sets should be separately funded
- these data sets should be made available for public scrutiny
Most importantly, Americans need to be aware that current statistical practices being used at the EPA for setting policy and regulations are flawed and expensive.
S. Stanley Young (genetree@bellsouth.net) is the CEO of CGStat in Raleigh, North Carolina and is the Director of the National Association of Scholars’ Shifting Sands Project. Warren Kindzierski (warrenk@ualberta.ca) is an Adjunct Professor in the School of Public Health at the University of Alberta in Edmonton, Alberta.
[1] Young SS, Kindzierski W, Randall D. 2021. Shifting Sands: Unsound Science and Unsafe Regulation. Keeping Count of Government Science: P-Value Plotting, P-Hacking, and PM2.5 Regulation. National Association of Scholars, New York, NY. https://www.nas.org/reports/shifting-sands
Cool! Now do climastrology. Do climastrology. C’mon, do climastrology next. You know, like this one, but climastrology….do climastrology next….
“…statistics must remain an exercise in measuring uncertainty rather than establishing certainty.” That’s poetry, that!
Interestingly, 2.5 microns is the diameter of cigarette smoke particles. Also, it is well known physiologically that ‘cilia’ (motile hairs in the lungs) very efficiently move particles of this size upwards in the lungs through the thin mucus coating and into the bronchial tubes where they can be coughed out.
Even a smoker gets good service for many years, but the constant workload of the cilia eventually wears them down resulting in loss of cilia motility and emphysema eventually results.
Now this physio fact is known to epidemiologists (and compulsive trivia junkies like me) but maybe not to many stat-math professionals. I am blown away that just doing the rigorous statistical retesting of data sets from a number of flawed studies essentially tells us there is something grossly wrong with the characterization of 2.5 micron particles as dangerous to health! Moreover, the gov regulators have ‘logic’ on their side because most people, even well educated ones would grant that breathing this stuff in can’t be doing anyone any good.
I love statistics even though my skills with it aren’t anywhere on the same planet as that of Drs Young and Kindzierski. But I revere it all the more because of this analysis.
Of course, the chemical composition of the PM2.5 must also be considered. Inert particles are easily expelled by the lungs in the concentrations usually regulated. Sheesh, this is so easy to investigate by actual experiment.
Silica (quartz – SiO2) dust is dangerous because it is sufficiently soluble and solidifies as hyaline (opal); repeated exposure builds up, causing a terminal illness known as silicosis, once common among gold quartz-vein miners.
Great posting. I love this p-hack test – very easy to understand and follow.
The EPA is a good example of how propaganda is turned into pseudo-science in order to convince the general population that a change is needed. Start with the end-result, make up some sort of pseudo-science to support it, and then use cherry-picked statistics to “prove” you are right.
Not to steal the authors’ thunder, but PM25 “research” is much worse than described. There is no measurement of exposure, only proxies such as distance from a home to a highway. There is no measurement of other variables, especially potentially alternate causes for the selected diseases, which include Parkinson’s and Alzheimer’s (they are NOT lung diseases). No one knows who got exposed to what or how dust could cause the diseases. It’s completely phony junk science.
All the PM25 research comes from one institution, the Harvard T.H. Chan School of Public Health. It’s their bread and butter. Nobody else gets funded to do this phony research, or if they do it’s a side channel scam with the money run through Chan.
The EPA is a political animal, a bureau-jackal, that hunts prey. They do nothing to stop pollution; in fact they cause it [here]. The PM25 scam is just another panic fad ala climate/covid/cholesterol/whatever is handy.
https://www.newsweek.com/epa-causes-massive-colorado-spill-1-million-gallons-mining-waste-turns-river-361019
[Mods please don’t quarantine my excellent apropos comment too long]
In 2014 I “took” a non-credit Coursera online class on “R” programming, mostly out of curiosity and an excess of free time. It was taught by a professor from Johns Hopkins, a computer scientist specializing in data modeling and statistics. Several lesson blocks used datasets made available through Johns Hopkins, often and unsurprisingly dealing with one or several various physio/medical and environmental data collection projects from University or government web sites.
I didn’t keep up with the R programming tool after the course, but I found it a very good statistical computation tool with great tools for data I/O and presentation.
One course block used a sample data set of PM2.5 measurements and corresponding data for incidence of several sorts of lung disease and respiratory disorders in the same metropolitan areas. If done right, per the professor’s instructions, the student was not supposed to be able to obtain a significant correlation between the particle concentration and respiratory disease. These data somehow had spurious correlations in them that should be filtered out by proper programming.
This is all subject to my often faulty memory, of course. I didn’t keep up my password and git access to my results, and I’ve moved on to other interests. I did recall the lesson when reading this particular piece. Is the EPA using different data sets? Or did someone innocently or maliciously exploit easy correlations out of incompetence or to meet preselected conclusions or support an agenda? Was my course lesson incorrect, and there should have been correlations found in these data?
My little story really isn’t pertinent to the authors’ points – I can’t verify my own results from seven years ago, and don’t really care to renew my acquaintance with or access to any old research or my rapidly obsolescing and superficial data extraction skill set. But it is that one little bit of personal experience that makes me tend to believe them and support the authors’ conclusions.
Discussions on WUWT seem often recently to have been about analysis of other press releases, “news”, government policy as propaganda. To me, this article is convincing.
so
(a) “My little story really isn’t pertinent to the authors’ points”
Agree completely
(b) “I can’t verify my own results from seven years ago, ”
ok, they are way out of date WRT modern hypothesis testing (which nowadays eschews p-value testing and is mostly Bayesian)
yet
(c) “But it is that one little bit of personal experience that makes me tend to believe them and support the authors’ conclusions.”
wow. You also might want a refresher on Freshman-level Logic.
🙂
Chris,
(a)good
(b)agreed
(c)right — persuasion isn’t logic. I found it persuasive while acknowledging (verbosely) that I couldn’t honestly evaluate the science. I obviously failed to put that in context. My bad, but it was a shot.
To me it seems that this piece is an introductory article meant to bring attention to the authors’ program at NAS. Not a scientific paper itself. IMO, evaluating it as persuasion seems more appropriate for an acknowledged, attention-limited layman.
Typo in Figure 1 title. Based on the x-axis (Rank Order), there are 69 (maybe 70) studies not 40 in the meta analysis.
What a surprise. Another right wing lobby group is arguing for more pollution and unclean air.
I commend you comrade. You strike decisive blow to win the day.
What a surprise: another illiterate eco-terrorist.
‘UK Government’ and ‘Corrupt Scientific Advisors’ Are to be Tried for ‘Crimes against Humanity’ and ‘Genocide’
As the mainstream media are remaining silent on the subject, it may surprise you to discover that papers have been laid to start two separate legal proceedings against the UK Government and their corrupt scientific advisors for genocide and crimes against humanity.
The first is described in the following press release from attorney Melinda C. Mayne and Justice of the Peace Kaira S. McCallum, who has presided as a JP in Central London Magistrates and Crown Courts for the past twenty years and was formerly a highly qualified pharmacist.
Also the London Times
William Briggs had some interactions on PM2.5 studies with the California Air Resources Board (CARB), which wanted to impose some new regulations. I don’t have the link, but as I recall Briggs pointed out several errors in the statistical methodology used, and the response was that since the same methodology was used in earlier studies behind previous regulations, they were going to continue to accept it.
The link is here: Criticism of Jerrett et al. CARB PM2.5 And Mortality Report – William M. Briggs (wmbriggs.com)
You can search his website for “PM2.5” and find many other similar posts.
Thank you Ed; I was ~~rushed~~ lazy.
Looking at the playing card in the lead photograph: the first time I’ve seen a card that apparently has indices in all four corners. The last time I handled a deck of cards was a faro deck, no indices at all. Nothing at all to do with climate, but some interesting history for folks who are involved in recreating the 1880s for visitors:
https://www.vanishingincmagic.com/playing-cards/articles/how-did-playing-cards-get-their-symbols/
Why not require a P value plot for all published work used to support more regulation?
PM2.5? That’s the new local scary story developed by the same crowd who were promoting diesel over petrol a few years back! Yes, I’ve heard of this before and my opinion hasn’t changed. Throughout my life I have listened to many a bureaucrat seeking some form of power and control with a view to manipulate: seize upon a technical term, then bandy it around with great authority, as if of a technical mind, to appear knowledgeable to those around them. For the moment, PM2.5 suits the scare story waiting in the wings, ready for perhaps when the CAGW scare has run its course – but more likely to heap ever-increasing fear upon the population to effect overload and panic by keeping the pressure up: “the end of the world is nigh, you must listen to me” (love that, I bags the title on that one as nobody else has ever claimed that before for certain!) 😉
I still love the good old Penn & Teller scare gag where they employed a few pretty young ladies (sorry, I’m old) to go around a park on a sunny weekend day, each with a clipboard and form, asking people to sign a petition in support of getting guvment to ban Dihydrogen Monoxide – because big oil, nuclear, food companies, and even drinks companies were adding this toxic stuff to everything. They got hundreds of signatures in favour of the ban; none of the signatories knew they were being asked to ban “water”!!!! Go figure!!!
The DHMO thing was also tried at an international environmental conference and got tons of signatures too. And there was at least one town that almost voted to ban it – they were set to vote on the resolution before someone finally realized what was going on.
There are Lies, Damned Lies, and Statistics!