From Stanford University:
Stanford researchers uncover patterns in how scientists lie about their data
When scientists falsify data, they try to cover it up by writing differently in their published works. A pair of Stanford researchers have devised a way of identifying these written clues.
Even the best poker players have “tells” that give away when they’re bluffing with a weak hand. Scientists who commit fraud have similar, but even more subtle, tells, and a pair of Stanford researchers have cracked the writing patterns of scientists who attempt to pass along falsified data.
The work, published in the Journal of Language and Social Psychology, could eventually help scientists identify falsified research before it is published.
There is a fair amount of research dedicated to understanding the ways liars lie. Studies have shown that liars generally tend to express more negative emotion terms and use fewer first-person pronouns. Fraudulent financial reports typically display higher levels of linguistic obfuscation – phrasing that is meant to distract from or conceal the fake data – than accurate reports.
To see if similar patterns exist in scientific academia, Jeff Hancock, a professor of communication at Stanford, and graduate student David Markowitz searched the archives of PubMed, a database of life sciences journals, from 1973 to 2013 for retracted papers. They identified 253, primarily from biomedical journals, that were retracted for documented fraud and compared the writing in these to unretracted papers from the same journals and publication years, and covering the same topics.
They then rated the level of fraud of each paper using a customized “obfuscation index,” which rated the degree to which the authors attempted to mask their false results. This was achieved through a summary score of causal terms, abstract language, jargon, positive emotion terms and a standardized ease of reading score.
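The release only names the ingredients of the index, so here is a toy sketch of how such a summary score could be computed. The mini word lists and the exact weighting below are invented for illustration; they are not the authors' actual method, which drew on much larger standard lexicons.

```python
import re

# Invented mini-lexicons for illustration; the study used far larger
# standard word lists for each category.
CAUSAL = {"because", "cause", "effect", "hence", "therefore", "thus"}
POSITIVE = {"good", "great", "novel", "successful", "excellent"}
JARGON = {"methodology", "paradigm", "upregulation", "immunoreactivity"}

def syllables(word):
    """Very rough syllable count: runs of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def obfuscation_index(text):
    """Toy summary score: causal terms, jargon, and hard-to-read prose
    raise the index; positive emotion terms and readability lower it."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    n = len(words)
    causal = sum(w in CAUSAL for w in words) / n
    positive = sum(w in POSITIVE for w in words) / n
    jargon = sum(w in JARGON for w in words) / n
    # Flesch reading ease: higher means easier to read, so subtract it.
    flesch = (206.835 - 1.015 * (n / sentences)
              - 84.6 * (sum(map(syllables, words)) / n))
    return causal + jargon - positive - flesch / 100.0

plain = "We saw an effect. It was good. The test was great."
foggy = ("The methodology demonstrated immunoreactivity upregulation, "
         "hence the paradigm is validated because of the effect.")
assert obfuscation_index(foggy) > obfuscation_index(plain)
```

Real deception-detection work would validate such a score against known cases; here the final assertion just checks that foggy prose scores higher than plain prose.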
“We believe the underlying idea behind obfuscation is to muddle the truth,” said Markowitz, the lead author on the paper. “Scientists faking data know that they are committing a misconduct and do not want to get caught. Therefore, one strategy to evade this may be to obscure parts of the paper. We suggest that language can be one of many variables to differentiate between fraudulent and genuine science.”
The results showed that fraudulent retracted papers scored significantly higher on the obfuscation index than papers retracted for other reasons. Fraudulent papers also contained approximately 1.5 percent more jargon than unretracted papers.
“Fraudulent papers had about 60 more jargon-like words per paper compared to unretracted papers,” Markowitz said. “This is a non-trivial amount.”
The researchers say that scientists might commit data fraud for a variety of reasons. Previous research points to a “publish or perish” mentality that may motivate researchers to manipulate their findings or fake studies altogether. The change the researchers found in the writing, however, is directly related to the author’s goal of covering up lies through the manipulation of language. For instance, a fraudulent author may use fewer positive emotion terms to curb praise for the data, for fear of triggering inquiry.
In the future, a computerized system based on this work might be able to flag a submitted paper so that editors could give it a more critical review before publication, depending on the journal’s threshold for obfuscated language. But the authors warn that this approach isn’t currently feasible given the false-positive rate.
“Science fraud is of increasing concern in academia, and automatic tools for identifying fraud might be useful,” Hancock said. “But much more research is needed before considering this kind of approach. Obviously, there is a very high error rate that would need to be improved, but also science is based on trust, and introducing a ‘fraud detection’ tool into the publication process might undermine that trust.”
###

For a real hockey stick, run the IPCC reports through obfuscation index software.
If you did that all that would be left would be the tile and author list.
James Bull
And judging by some of the “peer” reviewed papers being retracted, some of those names are probably sock puppets.
JB
And I am not sure about the value of a tile alone.
Most houses need several.
Possibly St Albert (Gore) has a house that needs thousands.
Titles – see previous post about our Prince of Wales.
Auto
The only thing that would be left is the puck.
It’s the puck that hurts most! Hence the goalie gear.
Just eliminate every sentence with qualifiers: might, probably, likely, quite possibly, exceedingly probable…of course that would eliminate the report.
But that’s the jargon of climate science, you see.
Here is some additional climate jargon: may, could, conceivable, perhaps, projected, modeled, etc. It’s quite a long list.
‘more research is needed’
But of course!
“Funded research, I presume.” -Sherlock Holmes.
That phrase should be number one on the list of obfuscated language.
I totally disagree.
“More research is needed” indicates that the researcher has perhaps found something, but it is also an admission of the shortcomings of the work. One needs to publish (so as not to perish) and can’t wait years until the definitive research is completed.
The papers that are dangerous purport to find statistically significant results supporting a drug company. They don’t say “more research is needed”. Those are the ones more likely to be fraudulent.
I have to agree with CB. The idea that no more research is “needed” implies that we already know enough, which is humbug on a par with CAGW. By that argument, no more research was needed after the Ptolemaic astronomical model was developed because it “worked.” It actually did work for what they used it for, but it would never have helped land a man on the moon. Research leads to better explanatory models, which in turn lead to the opportunity to do things the older “model” forbade.
Gamecock ,Tom in Florida & commieBob, the odd thing is I agree with all three of you. Gamecock and Tom in Florida I believe have Climate Scientists in the fore part of their brain(s) when commenting. commieBob is looking at the needs of science in general.
For most fields the statement, “More research is needed” is an acknowledgment that the researcher recognizes that there is more to do. Also it opens the subject up to other persons to add a new set of eyes.
my 2 cents anyway
michael
Would any scientist ever claim “no more research is needed”? That would be a bold claim. …oh, wait…nevermind!
Commiebob, the Gunning Fog Index was introduced in 1952. The Stanford study is old news. Known for 3 generations.
What a complete non sequitur. I fail to see how anything you raised has anything to do with anything I said.
1 – The Gunning Fog Index measures the readability of English writing.
2 – Which Stanford study?
3 – What precisely has been known for three generations and why does it matter?
If you were trying for humor, I apologize because the joke has eluded me.
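For reference, the Gunning Fog Index mentioned above has a simple published formula: 0.4 × (average sentence length + percentage of words with three or more syllables). A rough sketch, with a deliberately crude syllable counter:

```python
import re

def syllables(word):
    """Very rough syllable count: runs of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    """Gunning Fog Index: 0.4 * (average sentence length +
    percentage of words with three or more syllables)."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    complex_words = sum(syllables(w) >= 3 for w in words)
    return 0.4 * (len(words) / sentences + 100 * complex_words / len(words))

easy = "The cat sat. The dog ran. We all laughed."
hard = ("Utilizing multidimensional methodological considerations "
        "necessitates comprehensive organizational restructuring.")
assert gunning_fog(hard) > gunning_fog(easy)
```

The index estimates the years of schooling needed to understand a text on first reading, which is exactly a readability measure, not a fraud detector.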
Commie Bob might be right, but I doubt it; more research is needed to be sure, so send money for this research to me. When sending this money, don’t confuse me with Tom in Florida; I am another Tom in Florida.
Commie Bob, please provide proof that research by drug companies is more likely to be fraudulent.
I’ve been seeing stories about drug company malfeasance for a long time. Money provides a powerful motive. Here’s a link. The one that got me was the statement that 90% of cancer studies could not be replicated. The following allegations appeared in Nature:
Another common fraudulent practice is to refuse to publish papers that produce negative results.
At many universities you won’t get tenure if you don’t publish. That means you become unemployed. That’s a powerful motive for fraud. Add to that the motivation of the company sponsoring the research and it becomes unsurprising to find many fraudulent papers.
Having said the above, I believe that the vast majority of papers are not fraudulent. The evidence does imply that research done by, or sponsored by, drug companies is more likely to be fraudulent than research not involving a drug company.
In the course of ‘Contemporary Neo-orthodox Climatology’ we say with confidence: “The Science Is Settled”!
Because we need jobs for the rest of our lives and we aren’t too old.
I was watching the Weather Channel as Hurricane Patricia approached Mexico. It was clear that the meteorologists had been told they were to state that Patricia was the strongest hurricane ever. However, it was clear that a number of them were uncomfortable making that claim, and therefore introduced numerous qualifiers, e.g., it was the strongest “cyclone” in the eastern Pacific, it was the strongest storm designated a hurricane, etc. I was really watching closely because it became obvious that the meteorologists did not believe what they were being told to say, but were trying hard to obey instructions.
Yea, I noticed that too …sad days for science !!
No, this is not science, but this announcement should be regarded as a warning for people to be prepared.
Hurricane strength is usually defined by the lowest central pressure. By that measure, Hurricane Patricia (Mexico) at 880 hPa was stronger than the strongest Atlantic hurricane, Hurricane Wilma at 882 hPa, but not as strong as Typhoon Tip at 870 hPa. Patricia was comparatively very small in the size of its wind field.
880 is higher than 882 ???? …New liberal math ????
Why don’t you look up what they’re talking about before making dumb accusations?
Marcus, he said strongest, not highest.
hPa is measuring the atmospheric pressure at the center of a hurricane, and in this instance a lower pressure means a stronger storm.
Good ol’ Marcus…
Alright!! a DOG PILE. 880 hPa is in fact higher than 882 hPa in pressure altitude.
I think someone measured 850 millibars inside a large tornado in South Dakota. Smaller funnels with higher wind velocities would probably drop even lower.
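Since the thread keeps tripping over it: lower central pressure means a stronger storm, so ranking the figures quoted above is just an ascending sort. A quick sketch:

```python
# Central pressures (hPa) quoted above; lower central pressure = stronger storm.
storms = {"Typhoon Tip": 870, "Hurricane Patricia": 880, "Hurricane Wilma": 882}

# Strongest first: sort the names by ascending central pressure.
ranked = sorted(storms, key=storms.get)
print(ranked)  # ['Typhoon Tip', 'Hurricane Patricia', 'Hurricane Wilma']
```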
The danger lies in the “boy crying wolf” syndrome. Such was the case of Cyclone Tracy that flattened Darwin (Australia) in 1974, resulting in the death of 71 residents.
What’s the reference for your “boy crying wolf” claim?
Jon, I think BB is referring to the casual laid back attitude in the Top End regarding warnings, seen them before, nothing happened, let’s have another beer.
I would imagine that such automated obfuscation detectors would be easy to defeat. You simply would have to lie more confidently and specifically.
How does the old saying go? Once you learn to fake authenticity, the rest is easy.
People like Cook and Lewandowsky lack the skills (or genetic material) required to fake authenticity, so everything they write draws attention to the flaws and stupidity of their so called research.
The idea is:
If the language of the paper is clear, the lie is hard to hide.
Meaning you can’t hide [the fraud] behind the decline [in weasel words].
Calling ex-prez Clinton! Job Opportunity!
It would be interesting to see how it would work; most of the rock-star climatologists seem too narcissistic not to lie confidently. In fact, they seem to lie in a grandiose manner.
They would use the detector themselves, obfuscating the obfuscation before issuing the paper.
Hmmm. Michael Mann is a master of excruciatingly obfuscatory prose.
I wonder what that means?
It means that Mann is a master of saying things in a way that nobody else can understand.
That could be injurious in a scientific debate, if you want to win supporters.
Oh! Forgot… settled scientology, debate is illegal.
How about some error bars on the % jargon measure? 1.5% doesn’t sound like much to me, especially if it is +/- 1%.
Some people just like to spout jargon. They think it makes them look intelligent.
I was thinking the same. Writing style could easily play into this, and it’s not just pretension either, Mark.
I confess that I slip into the EPA’s acronym-heavy lingo when talking with colleagues, and I sometimes slip into it with non-environmental people as well, leading to annoyed and glazed over eyes. That’s how I have to think to do my job. It’s just as natural as theater majors making obscure Shakespeare references (so common in my college hobby of fencing that I had to actively stop myself once I entered the workforce).
et tu ben
Jargon and obfuscatory language is a favorite of the Humanities and Education crowd. They have to spout crappola like that to explain why their new education theories (that once again failed to teach kids anything) were actually working. Then about every ten years or so (just about the point where the results data began to reveal the sad truth) they changed to a new system, moved the goalposts and got a free reboot.
“As a proponent of perspicuity one should really espouse eschewing obfuscation in typographical emanations.”
“We believe the underlying idea behind obfuscation is to muddle the truth,”
Not always. I use it to conceal my ignorance.
1.5% more jargon? That’s easily due to different writing styles. My own writing can shift to dramatically different amounts of jargon depending on my caffeination level! Then, you have the problem that if a tool exists, the cheaters will have access to it as well, so anyone taking the time to commit fraud could easily edit their paper until it passes any arbitrary threshold.
Interesting, but don’t expect to ever see this fulfilling it’s implied ambitions
I try to tailor the amount of jargon in my writings to the expected audience – ‘experts’ and ‘authorities’ get the max.
In my experience, the first filter should be “government report; yes or no”. I never read a government report that did not use 50 words where 5 would suffice. I think government pays by the pound.
its
Are crummy grammar an misspells tells?
and
Hello, Muphry? Izzat you?
“They identified 253, primarily from biomedical journals, that were retracted.”
I wonder if this reflects healthier competition in the biomedical field? Perhaps other sciences have just as much need for retraction, but nobody calls for it.
I don’t think so. Theoretical physics is highly competitive and it is almost impossible to fake a result. Suppose you claim to have discovered some theory — others would do the calculations and either show you right or wrong. Very hard to fake. On the other hand, there is some misappropriation of other people’s ideas. So most don’t talk about their work too much before it’s published.
The hardest of all to fake would be a proof of some theorem in mathematics. A mistake was discovered during peer review of Andrew Wiles’ first attempt at proving Fermat’s Last Theorem in 1993. After fixing the mistake, the proof was finally published in 1995.
But what if the mistake had not been caught? That happens too, but hopefully someone will eventually discover it – as was the case with several ‘proofs’ of the Four Color Map Theorem that stood for a little over a decade until the mistakes were discovered.
At one job I had, there was a set of instructions on how to produce what sounded like important technical reports. All you did was choose a random set of numbers and letters from 1-6 and A-G and put the phrases and sentences together, and it all seemed to make sense but was completely meaningless, a bit like the IPCC reports.
James Bull
So it was all bull, eh James?
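The phrase-grid trick described above is easy to sketch. The phrase tables below are invented for illustration; the real instruction sheet's grids are long gone.

```python
import random

# Invented phrase grids; the real sheet paired numbered and lettered
# tables of stock phrases in the same way.
OPENERS = {1: "Based on integrated analysis,",
           2: "Within the systems context,",
           3: "Pursuant to the review cycle,"}
CLAIMS = {"A": "the functional parameters exceed baseline projections",
          "B": "the operational synergies remain within tolerance",
          "C": "the throughput metrics validate the framework"}

def important_sounding_sentence(rng=random):
    """Pick one entry from each grid and join them: grammatical,
    plausible-sounding, and completely meaningless."""
    n = rng.choice(list(OPENERS))
    letter = rng.choice(list(CLAIMS))
    return f"{OPENERS[n]} {CLAIMS[letter]}."

report = " ".join(important_sounding_sentence() for _ in range(3))
print(report)  # three fluent sentences that say nothing
```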
Is this the reason why I have a hard time understanding papers written on climate science? I can make my way around most other scientific areas.
Did they use the “double blind” test for this???
They tied them hand and foot: it is called a ‘double bind’ test.
It would change how they lie going forward. But it would always identify muddled or obfuscated writing.
‘Going forward’ is a BS score of 10. Real people say ‘in future’
That phrase always makes me laugh: CEOs often use it when talking about their company. Apparently it’s to differentiate from going backward, or nowhere, because it isn’t obvious to laypeople that those moves would be a bad thing.
it’s a bit of a give-away when there’s NASA-GISS written at the top of the page.
This kind of analysis might be very good at identifying papers where the structure and the language used ‘overplays a weak hand’, but would it really pick up premeditated or preprogrammed fraud? If a scientist gets weak results – or even contradictory results – which might endanger his advancement or his next grant, he’s almost bound to dress it up to favour his desired narrative, as we’ve seen in so many clear examples in climate science. That’s not hard to spot. But if there is a conscious or even subconscious STRATEGY to achieve a particular result, or anything seriously underhand, would that be so clearly flagged? (I note they used retracted papers for the training of their routine, so presumably there was serious wrongdoing in some of them.) Or is it going to need the sort of detective work that is done so thoroughly at Climate Audit to get to the bottom of it?
somewhat off topic – at Judith Curry’s site there has been a fascinating discussion following a short bit of research by Zeke Hausfather and Kevin Cowtan which is a significant defence of the Karl et al. revision of surface temperatures. For the most part the discussion has been very good-natured and very useful, with only a few knee-jerk reactions. Last time I looked, H and K, who both participated fully in the discussion, seemed (to me) to have the better of the exchange. Zeke is always worth listening to and impeccably polite. Pity there isn’t more of such stuff.
1.5% more obfuscation words!! And I note they added obfuscation to beef up the finding by telling us (trust me) this is significant. Then they followed with the admission that there were a lot of false positives. Then they said it could be a useful tool for finding fraud, but that would show a lack of trust, and science is all about trust!! Wow.
Forensics on the paper before reading it:
This paper was a disappointment for the authors, but because of publish or perish, they spun it to get it published. I would like Briggs or McIntyre to get the data to see how significant that puny 1.5%, with the false positives, really is. This paper should be retracted based on the authors’ own criteria.
Nice!
Obviously, there is a very high error rate that would need to be improved
Is it an error, or has the data fraud just not been detected yet?
Even now many scientists are trained to be very careful in qualifying what they write, which is obfuscatory to your normal human, who wants a nice clear picture. So, scientific writing starts out as involved. Any simple “fog” filter is very likely to turn up many false positives because just about any scientist will tend to make “self negated arguments.” They advance an hypothesis, describe what they think supports it, and then argue against it. If they feel their conclusions are very weak but that they are still correct, the conclusions will be extremely foggy, ending in “more research is needed”. In fraudulent papers I would look for falsely confident assertions about their conclusions following a particularly vague discussion of what the research “found.”
This paper contains 1.6% extra jargon
I could sneeze and get that much extra jargon.
This is right up there with micro-expressions, the polygraph, Mosso’s Cradle, and astrology. There’s a little something there, and if you squint, and really, truly believe, then it works!
Meanwhile, back in the real world, there is no “pattern of deception” used by liars that can be spotted–not in writing, facial expressions, blood pressure, leg-crossing, eye movements, or any other physical indicators.
The only sure way to detect lies is contextual analysis–careful logical analysis by someone expert in the context of the claims made by the liar.
“The only sure way to detect lies is contextual analysis–careful logical analysis by someone expert in the context of the claims made by the liar.”
With one expert checking the other. Maybe we could call that peer review?
Paul,
Spot on! And there is the conundrum. Bureaucracies beget sacred cows and fiefdoms. Academia is a bureaucracy. Bias is inherent in the review system.
So the missing modifier for the blank below is: “an unbiased impartial”
“…careful logical analysis by _______ expert in the context of the claims made by the liar.”
And in fact, that is what the global warming realist community has created–a review system, outside the official bureaucracy, made up of contextual experts who do not have conflicts of interest. You’re participating in it now!
The only other missing modifier is “trained to detect deception.” Even the best contextual experts must be trained to understand the depths to which men will descend in their efforts to deceive. Normal experts in most fields have no clue that there are total frauds moving among them. Without training–call it skeptical situational awareness sensitivity training–most experts are unable to spot the fakes.
A colleague reported the following from his required initial meeting with a student over an academic integrity issue. The student said, “I didn’t plagiarize it–I got it from my friend. He must have plagiarized it!”
Before retirement I was teaching computer programming classes–mostly Java and Visual Basic. Occasionally students submitted work created by others or jointly in the same class. So they had to make cosmetic changes in the code to make it look different without disrupting the functionality.
The assignments were submitted online, and were graded in the same sequence. Typically they were uploaded around four hours apart to avoid being too obvious. So by the time I got to grading the latter one, its twin was still relatively fresh in my mind.
After a while I got to the point where I could “feel” that such obfuscation was occurring in the first one, and would be looking for the next one to show up. I probably should have analysed the pairs to explicitly identify the symptoms and written a paper about it. I am glad, however, that these researchers have done so in regard to this issue concerning the integrity of science. They are addressing a societal problem which is much more significant than occasional cheaters in undergraduate classes.
Although there may be weaknesses in what they have achieved so far (more research may be needed), it’s a start. More power to them! And regardless of the ultimate success or failure of such efforts, they are drawing additional attention to the problem.
I’ve taught at the college level off and on since 1970, and have always warned my teaching assistants to be on the lookout for computer code that was copied and cosmetically changed. They catch several such attempts in beginning programming classes of size 30 to 40 students.
The incident that caused me the most angst was having to confront the son of a very close friend. (On the first strike, a letter about the incident is put on file with the academic dean. Second strike the course is automatically graded as F. Third strike is dismissal from the school.)
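A minimal sketch of the kind of check that catches cosmetically disguised code: normalize away comments, whitespace, and names, then compare what is left. This is a toy stand-in for real tools like Stanford's MOSS, not what either instructor actually used, and the sample Java-style snippets are invented.

```python
import re
from difflib import SequenceMatcher

def normalize(code):
    """Strip // comments, mask every word-like token (identifiers and
    keywords alike), and collapse whitespace, so cosmetic edits such as
    renaming variables or re-commenting no longer disguise copied code."""
    code = re.sub(r"//.*", "", code)                # drop // comments
    code = re.sub(r"\b[A-Za-z_]\w*\b", "ID", code)  # mask word tokens
    return re.sub(r"\s+", " ", code).strip()

def similarity(a, b):
    """Ratio in [0, 1]: 1.0 means structurally identical after masking."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

original = "int total = 0; // sum\nfor (int i = 0; i < n; i++) total += a[i];"
disguised = ("int sum1 = 0; // running tally\n"
             "for (int j = 0; j < len; j++) sum1 += arr[j];")
assert similarity(original, disguised) > 0.9
```

Masking keywords along with identifiers is crude but cheap; production detectors compare token streams or fingerprints instead of raw strings.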
Four cases of student copying spring to my mind.
Case 1: the copying group all printed on blue paper. Everyone else in the class used white.
Case 2: a sequence of copies in which each copy dropped or mistook more parts of a key diagram until the last one was gibberish.
Case 3: one of my father’s colleagues marked a report which had copied a lot from a thesis without acknowledgement. His!
Case 4: two students submitted identical assignments. When I pointed this out they said “But we’re married!”
….Stanford communication scholars have devised an ‘obfuscation index’ that can help catch falsified scientific research before it is published…
As far as I can see, they simply count the ‘weasel words’ used. All this is is writing in ‘Civil Service’ style – writing so that you cannot be held to anything if something were later to go wrong.
I’d like to see the tool used on Cabinet Office briefings. They are notorious for saying nothing true and using pages and pages of text to do so…
This paper and its research design have to be a farce. The paper appears to be a pretty good example of one filled with exactly what they were looking for. Here is the real fraud being seen in research these days: the significance ratchet overruns the data.
Pamela – I think you hit the nail on the head. You can easily tell in most Global Warming/Climate change reports that the data are inadequate.
1) Short time series.
2) Low spatial coverage.
3) Large areas of the globe where there are no data at all. This gets progressively worse as you go back in time.
4) Using a few proxy data to draw conclusions about the entire globe.
5) Assuming CO2 is well mixed.
6) Assuming that the solar flux constant is constant. This would require the Earth’s orbit to be circular, no tilt in the Earth’s axis, and no change at all in solar output.
7) Ignoring any long term cyclical changes.
I think my years as a professional graduate student in theoretical linguistics are behind my extreme revulsion at the complete inappropriateness of this “paper” in a field that purports to use our understanding of “language” to analyze the motives of a writer. (Shall we say that this kind of thing always pegs my BS meter.) The only thing they got right was the intractability of the “problem”; in the seventies, we worked very hard to let everyone know what the “jargon” words meant, because every piece of theory worked strictly and carefully within its own jargon space, and this shared jargon space made communication actually work better. And if the writer believes his own drivel, and his advisors encourage his publication of drivel, where is the obfuscation?
The Stanford researchers have devised a text-based polygraph, based on even less useful research than the physical polygraph was. Now, there is little doubt in some minds as to why polygraphs are widely disallowed as a tool for sifting truth, because, sadly, polygraphy is a little like phrenology and “climate science”. Enough said.
And where is the control set? A set of papers based on research not shown to be based on false data might show up a lot of possibly bad writing. What are the evil motives of this set of authors?
Remember,
I have always regarded this as a matter of definition.
“Lie” means “to say something that you do not believe, with the intention of deception”.
What else would a lie be?
Unless you are being “economical with the truth”…something my former wife was good at. As I say to all my friends, what I say may not be tactful, but it will always be the truth. Unless you are a traffic cop, then no I was not speeding.
While there are some problems with this approach it does give the cheaters something else to think about. An AI algorithm with proper metrics scanning all papers for “further review” is very possible. Even if it doesn’t work all the time the cheaters would have to be aware that their papers were going to be rated by a machine. Their pal-review won’t work on it.
Gives them something else to worry about. That is good. Sad that we have to go to these lengths to ensure honest science but that is the world we live in.
Much of what passes as climate science is spurious at best. Weasel wording to cover up a lack of sound science is the real art form being practised.
But then we didn’t need a study to validate that.
Reblogged this on Norah4you's Weblog and commented:
Nothing new when CO2-“experts” try their best to fill an empty bag with more than air…..
John Baez, who used to run Sci.Physics, had a crackpot detection program.
We should also develop a Climate Science BS detector. Here are a few things to look for:
1) Expert
2) Denier
3) Models predict
4) Might, could etc.
Also “Robust.” I never saw this word in scientific writing (unless it was describing someone’s physique) until Climate Change advocates started using it to describe their evidence. It’s not a scientific word.
There’s an entire subfield of Statistics called “Robust Statistics” where they try to cope with the fact that the world doesn’t fit neat STAT 101 models very well. Search for “Robust Regression”, for example. If a climate paper reported that they’d used robust regression (instead of plain old linear regression) I would think they had at least one clue (but not enough). So depending on the context, it’s a word that CAN be used legitimately in science. (Also in the distinction between “robust” and “gracile” hominids.)
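For what it's worth, "robust" does have a precise statistical meaning. A minimal sketch of one classic robust estimator, the Theil-Sen slope (the median of all pairwise slopes), which shrugs off an outlier that would wreck ordinary least squares; the data here are made up for illustration:

```python
from itertools import combinations
from statistics import median

def theil_sen_slope(xs, ys):
    """Median of all pairwise slopes: a classic robust regression
    estimator, far less sensitive to outliers than least squares."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i, j in combinations(range(len(xs)), 2)
              if xs[j] != xs[i]]
    return median(slopes)

xs = list(range(10))
ys = [2 * x for x in xs]  # true slope is 2
ys[9] = 100               # one wild outlier
assert abs(theil_sen_slope(xs, ys) - 2.0) < 0.1
```

Because the median ignores the tails, a single corrupted point barely moves the estimate, whereas it would drag an ordinary least-squares fit well away from the true slope of 2.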