arXiv bookworm

Guest Post by Willis Eschenbach

I stumbled across something called “bookworm”, which looks through all of the articles in the preprint service arXiv. Their default home page looks like this.

Figure 1. Comparison of the usage of three terms, “graphene” (light blue), “qubit” (red) and “superluminal neutrinos” (green).

You can see how there were no mentions of neutrinos going faster than light until the recent CERN claims a few months ago, and how graphene became a hot topic starting in 2006.

Not only that, but if you click on any point on one of the graph lines at their website, it gives you the list of titles for that month that referenced the given term, and when you then click on one of the titles, it takes you to the preprint. Very nice, kudos to the designers.

So I thought I’d take a look at some climate science terms.

Figure 2 shows my look at some relevant phrases. There is a two-word limit on phrases in bookworm.

Figure 2. Results from bookworm for four words and phrases related to climate science. Note the difference in scale from Figure 1.

You can see how the use of “global warming” dropped off compared to “climate change”. I would ascribe this in part to the lack of warming in the 21st century, which makes the term “warming” less plausible.

But the most interesting thing to me was that the use of all four terms peaked in March-April 2011, and since then each one has suffered the greatest decline in its history.

Not sure what to conclude from that. I don’t know what happened in March-April 2011, and I make no great claims of significance based on this information. Having said that, I do like seeing the heat dropping in the recent record; it opens the possibility that this is at least the end of the beginning of the climate mania …

Mostly, I just like having a new tool and seeing what it can do.




Hypothesis: The number of preprints on climate change etc. fell sharply because so many of the researchers are tied up in the IPCC. Expect a surge just before submission deadlines.

Huub Bakker

Surely this is to be expected. As economic woes take over as the most important topic people forget about the supposed threat of global warming. I would predict that the decline will continue since there is far more economic pain in store.

Hey Willis, this is really cooool…….
But Pacman? Really?
Seems we might be overloading the server 🙂 how long did your search take?

This looks really cool…
But at the moment I’m just getting one Pacman – maybe we’ve overloaded their server :-)?


Have you played with Google Ngram which does a similar search across millions of published books?


Looks like a hockey stick to me.


When was/is the cutoff date for inclusion in AR5?
Perhaps it’s related.

It doesn’t really mean anything. The e-print archive relies on voluntary submissions and memberships. It currently has 745,983 e-prints available across Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics. This number is nowhere near representative of all the papers published in these broad disciplines. For example, a simple search in Google Scholar for the term “global warming” returns 727,000 results, which is nearly as many as the total number of e-prints the arXiv has. A search in Google Scholar for “graphene” returns a mere 137,000. What this suggests to me is that the institutions and journals volunteering themselves to Cornell University for the arXiv e-print archive are heavily skewed towards the branches of science that deal with graphene, neutrinos and qubits and everything else that isn’t climate related. That would explain the scale differences. As for the apparent drop-off in climate-related papers since March/April 2011, it could just as easily be attributed to an influx of new institutions and libraries into the database over that period that do not deal with climate change, because the scale is relative.
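To make the “the scale is relative” point concrete, here is a toy sketch. It assumes, as the comments suggest, that bookworm plots each term as a share of all e-prints in a given month. The numbers are invented for illustration and are not arXiv data, and `relative_share` is a hypothetical helper. The count of climate papers stays flat while the total grows, yet the plotted share still falls.

```python
# Invented monthly counts for illustration only; these are NOT real arXiv data.
climate_papers = [50, 50, 50, 50]        # flat number of climate-related e-prints
total_papers = [5000, 5500, 6000, 7000]  # total e-prints growing over the same months

def relative_share(term_counts, totals):
    """Return each month's term count as a percentage of all e-prints that month."""
    return [100.0 * count / total for count, total in zip(term_counts, totals)]

shares = relative_share(climate_papers, total_papers)
for month, share in enumerate(shares, start=1):
    print(f"month {month}: {share:.3f}%")
```

This prints 1.000%, 0.909%, 0.833% and 0.714%. So a falling curve on a relative scale does not, by itself, show that fewer climate papers were being posted.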

I think March / April is about the time folks are thinking about the end of this school year and planning to ‘re-up’ to teach the next… so maybe a good time to be “published” to avoid being ‘dry’ for the June review of who to keep / toss?

That’s really great Willis, thanks.
Have you seen this wind visualization? Very awesome.
“Talk about visualizations. Ever wondered what the wind would look like if you could see it in action from above? A new project posted online by a pair of Google computer scientists, called simply Wind Map, has to be seen to be believed. “It can be quite hypnotizing to watch the gusty trails blast across the American continent, skitter over the Sierras, get roughed up by the Rockies, and whoosh over the great plains on its way to Canada,” writes Chris Taylor. Wind Map is the brainchild of Fernanda Viégas and Martin Wattenberg, the co-leaders of Google’s ‘Big Picture’ visualization research group in Cambridge, Mass. Wind patterns are constantly changing, of course, which is why the Wind Map designers have also given us a moving-image gallery of previous blustery days.”


Cool tool, Willis, thanks for sharing. I don’t know why the curves dropped so much since a year ago, but I wonder at the nearly annual cycle of boom/bust for all of them (see especially 2007: does it coincide with IPCC AR4?). The hypothesis that it is related to yearly teaching/scholarship evaluations is intriguing, but would the pulses remain that coherent? Preprint date would be less likely than actual publication date to “blur” any such pulse, and it does seem to occur every year…

From the IPCC’s Special Report, “There is medium evidence and high agreement that long-term trends in normalized losses have not been attributed to natural or anthropogenic climate change.”
So what term do we think will be a hot one in the arXiv database? My bet is that Pielke Jr. is going to be busy refuting a wave of weather disaster attribution papers in the coming months.


It’s words than we thought!

Assuming it takes some time from submission to publication, the April 2011 peak might coincide with a glut of papers being completed and submitted in June/July of the previous year as exams and teaching commitments end.

Ed Zuiderwijk

The data in Figure 2 are strongly correlated, meaning that searching on only one of the four terms would suffice. Given that the lead time for a run-of-the-mill paper, from the observations through the first drafts to the finally submitted version, is between 1 and 2 years, something in 2005-06 must have triggered the surge in early 2007. Any idea what that could have been? Is this the Katrina Effect?

Paul Nottingham

Isn’t this “e-prints” not “pre-prints”?

Man Bearpig

Has anyone noticed the link between this graph and temperatures over the same period? Perhaps sea level too. Is there a correlation?


I appreciate mike’s explanation of the limitations. But Willis this is an excellent tool. Out of curiosity I entered ‘Gleick’ for all articles and got …zero.


Speaking of having quick tools Willis,
I was hand-marking the references in this 16-page paper, which has 362 references. Link below.
The paper was sourced from meta-literature research, following the work of someone who was a physics undergraduate, completed a PhD in medical methods and then went on to sociology and policy. And is now an expert in statistics.
And in the end I did not have to pay for this academic paper. It was available free on Google.
My humble experience: published papers are not analysed, nor is commentary (dissenting or not) allowed, and the methods thus become ipso facto …. over decades. Replicants.
As with your graphing of the use in language.

Lars P.

Willis, I guess you would see a short peak for “climate weirding” (the latest spin), but these terms seem to have ever shorter half-lives.

David L

Well, since the science has long been settled, wouldn’t it make sense that the scientists would stop researching it and move on to something that isn’t settled yet? /sarc


I don’t think the recent drop is unprecedented; it’s very similar to, and actually slightly smaller in relative amplitude than, the rise and fall in 2005.

That’s funny!


Next stop: Climate Weirding


Just musing: massive floods in Eastern Australia (the people had been told it would be droughts from now on). Inconveniently, the floods weren’t any bigger than the 1970s ones. I think that opened a lot more eyes over here to the fact that CAGW is a lot of worry about a bit of noise with a stack of vested hype. The Christchurch and Japanese earthquakes happened around then too. The people are smarter than most politicians and academics take them for, but this site helps too.
Australia has a very small population but it’s globally extremely well connected.

David L. Hagen

Thanks for the lead.
On the timing of the peak, suggest looking at the lag due to manuscript preparation and writing between the time of interest and the time it gets posted.
Gestation usually takes 9 months.
It also looks like there is an annual cycle in the data.


Time for a grant application to study publishing output vs. CO2


“From IPCC: “There is medium evidence and high agreement that long-term trends in normalized losses have not been attributed to natural or anthropogenic climate change.””
Notice they had to throw natural in there. They have little evidence of man-made causes for long-term losses. If it is not caused by man, then the default is that it has to be natural, because we know that hurricanes, floods, etc. are mostly natural to begin with (except a faulty dam for example). But here they want to try to say you have to prove it is natural. At least they are honest about the main conclusions.


Looks like the hedge funds are shorting it. Time for the bears to cash in too!


Interesting to see the climb for ‘graphene.’ Google “Younger Dryas” and ‘graphene’ and get a long list of articles dealing with the impact theory, draining of Lake Agassiz, etc., based on particulate analyses. But don’t leave this out:

Gary Pearse

Huub Bakker says:
April 1, 2012 at 12:22 am
“Surely this is to be expected. As economic woes take over …”
Yes, ordinary citizens are more sensible than the alarmist cadre. They know what is really a serious issue.

Gary Pearse

Willis, regarding arXiv: if one submits a paper to this site, is it a sure way to protect your ideas from being stolen, or is it a way of inviting theft? I understand they accept papers that haven’t been peer reviewed and are in the development stage, with the intention of eventually publishing. It is supposed to let people get information that might be useful for their research without waiting for published material. Anyone know what the score is here?

Steve Keohane

it opens the possibility that this is at least the end of the beginning of the climate mania …
I was hoping for the beginning of the end of the climate mania….

Peter Melia

It appears that all of these scares seemed to take off in or about 2000.
Is there any way of finding out just what happened to the senior people in all of those organisations which so assiduously push AGW?
This would include: Scientific American, BBC, New York Times, Guardian, Royal Society etc.


As the four terms graphed above are terms used primarily by warmist climate scientists, one would hope the dramatic drop indicates an increase in actual objective climate science papers, rather than the pre-determined, goal-oriented, fix-is-in research and publishing favored by IPCC types.


Perhaps the climate consensus, seeing their failed prognostications repeatedly tossed back in their faces, have pulled their typing finger from the keyboard and are now waving it in the lower atmosphere trying to see which way the wind blows, while hoping theirs is not the finger on the latch of the next ‘gate’ to swing agape.

Paul Matthews

The main point here is how low all the numbers are.
Climate scientists don’t put their papers on ArXiv.
Perhaps they don’t want them scrutinised.
Particularly in the run-up to the AR5 deadline.


I was glad to see that graphene R&D efforts are doing well (from your first graph). If, or is it when, our federal government decides to do something about the budget (right sizing it to match the revenues coming in) I hope they keep funds for material R&D efforts vs say another paper on how a projected (using models) sea level rise will affect X, Y or Z.
For those interested in updates on R&D efforts around graphene:
Graphene news and resources
“Graphene is a one-atom-thick material with exciting potential. Graphene can be used in many industries – from electronics to water purifiers, from displays to super-capacitors and car batteries. We offer daily news and resource about this exciting new technology.”

Willis Eschenbach

g2-55527e4d0e6a7560016aa81dbd8b142e says:
April 1, 2012 at 12:28 am

Have you played with Google Ngram which does a similar search across millions of published books?

I did, but it gives raw numbers rather than percentages so I didn’t find it useful.

In re PWL at 1:38 AM: thanks for that link to the wind map.
The straight-through link is
By the way, the SF Bay Area map is now at

Dr Burns

Well there you have it … the first clear correlation between climate change and CO2 levels in the atmosphere.
There used to be a joke about a compound called “1,2 dihydroxy chicken wire” but it seems that a similar compound actually exists … graphene oxide :

Agile Aspect

I’d be surprised if there are more than a couple of people on this blog who actually read the papers on arXiv which make it through the peer review process.
How many of the papers on arXiv have made it through one of the major climate journals?
You might do better following the Google ads at the end of the blog posts.

Tony Mach

Quick, hide the decline!


Willis nice tool thanks. Here have this gem

Willis Eschenbach

HR says:
April 1, 2012 at 4:01 pm

Willis nice tool thanks. Here have this gem

You’re welcome. Sadly, your link is all about Occam’s razor but it has no definition of Occam’s razor. That’s a fail right there.
It also has gems like

Occam’s razor supports, but does not prove, these axioms.

How could Occam’s razor prove anything? You can’t prove anything in science.
In any case, Occam’s razor says
Don’t multiply causes unnecessarily.
A statement of something to avoid (multiplication of causes) based on a completely subjective criterion like “unnecessary” cannot prove anything, even on a good day with a following wind.

Greg Cavanagh

There appears to be a downtick every year around March to June. Between 2000 and 2004 the downtick looks to be at the half-year mark. Some sort of annual cycle for publishing. Perhaps Christmas and holidays have an influence on publishing schedules.

Gary Swift

“Have you played with Google Ngram which does a similar search across millions of published books?”
“I did, but it gives raw numbers rather than percentages so I didn’t find it useful.”
I found it interesting that it is case-sensitive, so the terms Climate and climate give similar but different results (opposite trends at the end of the time period). I also noted that all the climate-change-related terms have about the same amplitude as terms such as “ghosts” and “aliens”.