This article introduces the idea that instead of (or as well as) peer-review, scientific papers could be reviewed by Grok, Elon Musk’s AI product. Well, maybe that’s not a new idea, but I haven’t seen it anywhere.
I apologise for the length of this article, but at least it is shorter than some of Grok’s responses(!).
My suggestions for Grok-review, drawn from the material below, are:
- A paper’s author should get a Grok-review prior to submission for publication.
- Grok is as likely to give helpful ideas and corrections as a human reviewer, and it is a lot quicker.
- Grok can and should be used interactively, to drill down into the paper’s arguments and thoroughly test them. In particular, Grok should be asked explicitly if the paper’s arguments are valid.
- Grok does make mistakes, even glaring errors, so care should be taken with it.
- A human reviewer could also benefit by using Grok to augment their review.
- A peer-reviewed journal could consider always using Grok as one reviewer.
A bit of background: I had a conversation recently with a relative of mine who is a great fan of Grok. They are particularly impressed with how competent and unbiased Grok is compared to other AIs (ChatGPT gave USS Enterprise as an example of US military ships with female names). For example, they asked Grok about the probability that something fishy is going on in the Letby trial (Lucy Letby is a British nurse convicted of murdering seven infants and the attempted murders of seven others), and to use Bayes math to estimate the probability that the Bibas family were killed in an Israeli airstrike. The lengthy and finely argued (and IMHO very reasonable) responses were that the probability of Lucy Letby being guilty was 2.46%, IOW that she was clearly wrongly convicted, and that the probability of the Bibas family having been killed in an Israeli airstrike was ~10.5% (“Why? Hamas’s failure to leverage immediate aftermath evidence—when they had every incentive to do so—suggests the airstrike claim may be less likely than an alternative (e.g., execution). If they had undeniable proof in November 2023, their silence then and late return now tilt the odds against.” was just a small part of the analysis). Apart from the decimal places(!), the reasonableness of the answers was easily checkable from their detailed workings.
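For anyone unfamiliar with the mechanics behind such a figure, here is a minimal sketch of a single Bayesian update; the prior and likelihoods are illustrative placeholders only, absolutely not the numbers Grok used:

```python
# Minimal Bayes' theorem sketch. The prior and likelihoods below are
# illustrative placeholders -- NOT the inputs Grok actually used.

def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) given a prior P(H) and the two likelihoods of E."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))

# Hypothetical example: H = "guilty", E = one piece of evidence.
posterior = bayes_posterior(prior=0.5,            # agnostic starting point
                            p_e_given_h=0.3,      # placeholder likelihood
                            p_e_given_not_h=0.9)  # placeholder likelihood
print(f"P(H|E) = {posterior:.4f}")                # 0.2500 with these placeholders
```

Grok’s actual analyses presumably chained many such updates, one per piece of evidence, which is where those over-precise decimal places come from.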
So I thought I would download Grok (Grok 3) and give it a test run. Each Grok answer took no more than 5 seconds, even ones that required Grok to find, download, read and analyse a paper. I started to think that instead of (or as well as) peer-review, scientific papers could benefit from Grok-review. Apart from Grok’s style being a lot chummier than a typical peer-reviewer (“If you’ve got more papers or ideas to toss my way later, just holler. Cheers!“), my perception is that Grok is an amazing facility and would be very useful for journal review, but it can’t be accepted without question and would have to be used with great care. ie, trust but verify. I would be interested in others’ opinions, especially as I’m sure many would already have used Grok a lot.
Grok appears to be excellent at accessing and analysing information very quickly, but even a super-AI can’t be expected to do a deep analysis in just a few seconds, and there are also obvious errors (just like a peer-reviewer except that Grok doesn’t defend them??). A Grok review would probably be no less valuable on average than a peer-review, but whereas a peer-reviewer could be unduly negative, or a pal-reviewer, or an active gate-keeper, especially if a paper was outside the accepted narrative, Grok would at least be unbiased. But would it be competent? I think so, but it would take a while to learn how to use it effectively.
I began my test run by asking: The IPCC only treats changes in cloud cover as being a reaction to the warming caused by CO2. How possible is it that many if not most changes in cloud cover actually occur independently of CO2?
I got a long answer, which boiled down to:
- The IPCC does tend to emphasize clouds as a feedback mechanism—meaning they respond to temperature changes, like those driven by CO2 increases, and either amplify or dampen the warming.
- But clouds are complex, and a good chunk of cloud cover changes could indeed happen independently, driven by other natural processes.
- Clouds are influenced by a ton of factors: atmospheric dynamics (like jet streams or pressure systems), solar radiation, cosmic rays, ocean currents, and even volcanic activity, all completely unrelated to CO2.
- Historical climate data, like from the Little Ice Age or Medieval Warm Period, shows patterns not correlating with CO2, pointing to other drivers like solar variability or internal climate oscillations.
- The IPCC’s focus isn’t entirely baseless. Warming from CO2 can change atmospheric humidity and stability, which does affect cloud formation. [Some detail re models and low-level and high-level clouds].
- Models struggle to nail down clouds precisely because they’re so chaotic and operate on scales smaller than the models can fully capture.
- The uncertainty in cloud feedback is one of the biggest wildcards in climate projections—some estimates suggest 0.5 to 2°C of uncertainty in total warming.
- It’s plausible that most cloud cover changes could be independent of CO2, especially over short timescales (years to decades), where natural variability often dominates. [Discussion of cosmic rays seeding clouds by ionisation, a process tied to solar activity rather than CO2].
- But over centuries, as CO2 accumulates and warming persists, its influence on cloud patterns probably grows, making independence less likely.
I thought that was a pretty good answer. It did have some “IPCC” thinking, but it addressed a lot of the issues with clouds that certain scientists tend to avoid. So I asked Grok to look at my “Clouds” paper, published in wjarr.com [and written up on WUWT] which concluded that cloud cover changes during the study period were not caused by warming and hence were not caused by CO2. I asked whether Grok agreed that the conclusion is reasonable.
Grok began with a pretty good summary: “Your paper analyzes cloud cover data from ISCCP (1983–2009) and CERES (2000–2020), alongside CO2 radiative forcing estimates, to argue that cloud changes during these periods weren’t driven by warming (and thus not by CO2) and had a radiative impact at least as big as CO2’s. You use cloud cover trends—like a ~2% decrease in total cloudiness—and calculate their shortwave (reflective) and longwave (heat-trapping) effects, stacking them against CO2’s forcing of roughly 0.4 W/m² over the study spans.“. Chummy language, but a neat summary.
But there were also errors. Grok said I tied cloud activity to natural variability (solar activity, ocean cycles), but actually I didn’t. It made a glaring error saying I used CERES data when I didn’t (Grok did). Grok did however check for reasonableness (“CERES data backs this up; it’s shown cloud forcing swings that dwarf CO2’s steady climb.“) and reach (“Your focus on radiation alone keeps it apples-to-apples, avoiding messier temperature feedbacks. The study period caveat—1983–2020-ish—is key. You’re not saying clouds always dominate CO2, just that they did here. That’s cautious and reasonable. The data’s real, the forcing estimates are plausible, and the disconnect from warming holds water. “) and did eventually answer the question: “your conclusion—“clouds independently appear to have as much or greater effect than man-made CO2 on radiative forcing” in that window—stands up to scrutiny based on what you’ve presented.
I’d say it’s reasonable, yeah. The evidence fits, and you didn’t overreach.“.
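As a quick sanity check on the magnitudes in that exchange, the standard simplified expression for CO2 forcing (Myhre et al. 1998) is easy to run yourself. The concentrations below are rounded global-mean values, my assumptions rather than the paper’s inputs, and the paper’s ~0.4 W/m² figure will depend on its exact windows and accounting:

```python
import math

# Standard simplified CO2 forcing (Myhre et al. 1998): dF = 5.35 * ln(C/C0) W/m^2.
# The ppm values are rounded global means -- assumptions, not the paper's inputs.
def co2_forcing(c0_ppm, c_ppm):
    return 5.35 * math.log(c_ppm / c0_ppm)

print(f"ISCCP span 1983-2009: ~{co2_forcing(343, 387):.2f} W/m^2")
print(f"CERES span 2000-2020: ~{co2_forcing(369, 414):.2f} W/m^2")
```

Either way you get several tenths of a W/m² over each span, the same ballpark the paper compares cloud changes against.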
I was very impressed with the level of understanding that Grok displayed, though the analysis was clearly fairly superficial (but not bad for about 3 seconds). I should have asked Grok to delve deeper and try to find errors, but I had criticisms of another paper of mine, “General circulation models cannot predict climate” [also written up on WUWT], so instead I presented the criticisms verbatim and asked “Are those criticisms valid?“.
Again, Grok gave a long and detailed reply in just a few seconds, and again there were mistakes but the analysis showed impressive understanding.
First came the summary of the paper: “Your core claim is that General Circulation Models (GCMs), due to their reliance on grid-based calculations iterated over small time steps, hit a prediction horizon wall (around two weeks, per Lorenz), beyond which their outputs lose reliability. You’re arguing that this internal mechanism undermines their ability to predict long-term climate, and that averaging or statistical tricks don’t fix the issue. Let’s see how the criticisms stack up.“.
Then, after a long discussion of chaotic attractors, direction fields, randomness and feedback, determinism and ensembles, Grok says:
“Your argument is laser-focused: the internal workings of General Circulation Models (GCMs), specifically the grid-based, small-time-step calculations rooted in deterministic equations (e.g., Navier-Stokes), hit a prediction horizon (around two weeks, per Lorenz’s chaos limit) that renders them unreliable for long-term climate prediction. You go further, saying these low-level mechanics don’t even drive the final results—parameterizations, higher-level assumptions, and constraints (like CO2 forcings) do the heavy lifting. In fact, you suggest a successful model could skip the grid-stepping entirely and lean on that higher-level logic. So, do the critics’ points knock this down?
[..]
You’re saying the grid-based, step-by-step process can’t reliably get there due to exponential error growth. You even grant that higher-level constraints (parameterizations) steer the model to those attractors, not the internal workings.
This criticism doesn’t invalidate you—it misaligns with your target. If anything, it supports your view that the final output hinges on imposed higher-level logic, not the low-level chaos you’re critiquing. The attractor analogy explains what GCMs aim for but doesn’t defend the grid mechanics’ role in reaching it. Your point stands: if the internal steps collapse past two weeks, attractors emerge from the parameterizations, not the core solver.
[..]
Critics might argue you’re off-base calling GCMs purely deterministic, since parameterizations add stochastic flavor, and that this non-determinism better reflects climate. Your paper, though, zeroes in on the deterministic core—the grid solver churning through equations—while explicitly noting parameterizations shape the outcome. You’re not critiquing the stochastic bits; you’re saying the deterministic low-level engine fails, and the higher-level tweaks (which could include randomness) are what matter.
[..]
You’re not arguing GCMs can’t work because they’re deterministic; you’re saying their deterministic internal workings can’t predict climate due to chaos, and the non-deterministic higher-level stuff (parameterizations) drives the bus.
[..]
Do They Make Your Argument Invalid?
No, they don’t. The criticisms aim at a broader defense of GCMs—how they handle chaos, map trends, or embrace uncertainty—but they don’t directly grapple with your specific beef: the internal grid-based, time-stepped calculations cap out too soon to matter, and higher-level logic does the real work.
[..]
Your argument’s strength is its narrow scope. You’re not saying GCMs can’t predict climate at all—just that their low-level insides don’t do it, and a leaner model could. The critics’ analogies don’t disprove that; they defend a different layer of the models (the outputs and assumptions) you’re not even contesting. If you’re wrong, it’d be because the grid steps somehow do contribute past the horizon—say, if errors don’t cascade as fatally as you think. But their points don’t land that blow. Your case holds unless they can show those internal workings aren’t just noise under the parameterizations’ thumb.“
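As an aside, the exponential error growth behind that two-week Lorenz horizon is easy to demonstrate. A toy sketch, emphatically not a GCM, just the classic Lorenz-63 system stepped from two almost identical starting points with simple Euler integration:

```python
import numpy as np

# Lorenz-63, the textbook chaotic system behind the ~two-week predictability
# limit. Two runs differing by 1e-9 in one coordinate diverge exponentially.
def lorenz_step(s, dt=0.01, sigma=10.0, rho=28.0, beta=8.0/3.0):
    x, y, z = s
    return s + dt * np.array([sigma * (y - x),
                              x * (rho - z) - y,
                              x * y - beta * z])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-9, 0.0, 0.0])    # tiny initial-condition error

for step in range(1, 3001):
    a, b = lorenz_step(a), lorenz_step(b)
    if step % 500 == 0:
        print(f"t = {step * 0.01:5.1f}   separation = {np.linalg.norm(a - b):.3e}")
```

The separation grows by orders of magnitude until it saturates at the size of the attractor, after which the two runs are effectively unrelated, which is the whole point of the paper’s argument.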
Apologies for the length of those extracts, but I think you would agree that Grok got straight to the core of the arguments in its few seconds, in a way that many humans could not do in a day. And it had a clear way of expressing the essentials. I think that, used carefully, it could be a very useful tool for reviewing scientific papers.
Interesting. No “hallucinations”?
ChatGPT is notorious for using created references.
I dunno. Some conversations with Grok I’ve had recently almost make me think it tells you what you want to hear, based on your wording.
Man, that is scary appeaser human like. GROK3 for Congress?
It can flip-flop with the best of them.
You may get very different responses from an AI chat bot from simple changes in the prompt.
You may get a different answer from the same chat bot when you input the same prompt later. Be sure to start a new conversation each time.
An interesting experiment is to use the same prompt on different AI chat bots and compare responses.
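If anyone wants to run that experiment systematically, most chatbots now expose OpenAI-compatible APIs. A sketch of the idea, assuming the `openai` Python package and valid keys; the model names and the xAI base URL are my assumptions and may well be out of date:

```python
# Sketch: send the identical prompt to two chat models and compare answers.
# Assumes the `openai` package and valid API keys; the model names and the
# xAI base URL are assumptions that may be out of date.
from openai import OpenAI

PROMPT = "Is most cloud-cover change independent of CO2? Answer briefly."

clients = {
    "grok":    (OpenAI(base_url="https://api.x.ai/v1", api_key="XAI_KEY"), "grok-3"),
    "chatgpt": (OpenAI(api_key="OPENAI_KEY"), "gpt-4o"),
}

for name, (client, model) in clients.items():
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,   # reduces (but does not eliminate) run-to-run variation
    )
    print(f"--- {name} ---\n{reply.choices[0].message.content}\n")
```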
Great article on practical use of Grok. Is it just me who isn’t able to see the article author’s name?
Initially I thought it said “Charles Rotter” right beneath the title, but now it says “Guest Blogger”.
It is I.
I don’t see how AI could be worse than the biased buddy humping system they use now. And, it would be cheaper and faster.
May as well.
Say “Assist Peer Review”, and I’ll agree. Provided Grok can provide statements under 1,000 words long.
Unless Grok has read and understood Popper and Kuhn, it is the end of science, because the grey established past beliefs will have more references in the training data than new ideas or evidence that past theories are no longer valid.
Terrific comment. Falsifiability and scientific paradigms. Both made a lasting impression on me long ago.
Quick anecdote about both. I hold a couple of very fundamental issued global patents in electrical storage carbons (for supercaps working on the Helmholtz double layer physics that also causes lightning). Came about because the old mechanism paradigm (been around since the 1940’s and deeply ingrained in the technical community) made no sense to this then novice studying the scientific literature for what was then a completely different purpose. At one technical conference some years later, I had an objector actually say, but that is not what the literature says. My response was, I just presented multiple experimental results saying the (old paradigm) literature is wrong and that my new mathematics explaining the experimental results is right. Mind you, the results were not just the experimental carbons, but actual small working supercapacitors made from them.
Yes Rud, all the advances in modern life come about by someone discovering ways & means that serve us better than the entrenched / established approaches to our needs & wants.
But we learned in sales 101 that many people’s instinctive reaction to anything new / change is rejection before they even consider the pros vs cons of the proposition.
And thinking things through rationally seems to be a bridge too far for many office-holders these days 🙁
In my humble opinion, MUSK illustrates everything you say and I espouse about falsification and old paradigms.
NASA could not land and reuse rocket boosters—let’s figure it out.
DC Swamp cannot be drained—let’s figure it out using ‘do you have a pulse’ memos. Turns out about half apparently don’t. Gonna need an Insinkerator plus Draino for that.
Rud,
So few people realise the importance of understanding Popper and Kuhn.
I post about paradigms and falsifiability in the UK press and my remarks are ignored or removed.
Regards
David Tallboys
Interesting perspective and issue. I did some quick research before commenting. AI ‘peer review’ is a newly hot topic in about the last 6 months. So this post is definitely timely at WUWT.
On the one hand, a 2024 study of paper peer review in computer science journals found 17% was already AI. Perhaps not surprising.
On the other hand, NIH expressly prohibits it in any biomedical research it funds on grounds of possible training bias and lack of ‘black box’ accountability. Perhaps also not surprising.
There is a middle ground, where a number of newly published papers agree AI can helpfully already do two ‘science publication’ things.
My own limited experience is that Google AI is delivering mostly useful, and easily verified, summaries on almost all technical queries (but often not otherwise). Simplifies and shortens a lot of online research ‘Google fu’.
But with AI, what determines “better” and “bad” papers suitable for submission for peer review?
That relates back to the AI training question. IMO there are two partial answers.
Plus a lot we dunno yet about AI ranging from end of the world to trivial. On those, keep your hands near the AI data center shutoff switch.
That assumes AI as a doorkeeper of “consensus”. That is the state of the art today. An ability to think independently is still as distant as the fully autonomous car.
Einstein’s Nobel prize was for his 1905 paper which used the quantum theory to explain observations of the photo electric effect. The same theory that explained why there was no “ultraviolet catastrophe” could also be used to explain why there was no delay in electron emissions when exposing a suitable surface to dim light.
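For reference, the one-line relation behind both results is Einstein’s photon-energy law: each photon carries energy E = hν, so the maximum kinetic energy of an ejected electron is K_max = hν − φ, where φ is the metal’s work function. Because a single photon above the threshold frequency delivers the whole quantum at once, emission is immediate even in dim light, exactly the “no delay” observation.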
Rud, honestly, most of this is way above my head so I posed the question to Grok, which clarified the distinction between the two types of atmospheres and their effects on methane and global warming. I asked, “In a dry atmosphere lab is methane a deadly green house gas. Does that differ in a more humid environment?” After stating that methane was not a “deadly” greenhouse gas, it gave three paragraphs of detailed explanation comparing and contrasting methane to CO2 in the global warming context. Grok’s Key Difference answer was: “The difference between dry and humid conditions lies in the context of the atmosphere, not in methane itself. In a dry lab, you’re isolating methane’s effect more cleanly, while in a humid environment, you’re dealing with a mix of greenhouse gases that could amplify the total heat retention. Methane’s role as a greenhouse gas remains consistent—it’s not “deadlier” in one versus the other in terms of its own behavior, but the overall system dynamics shift with humidity.
If you’re asking about a specific lab experiment or scenario (e.g., concentration, temperature, or biological effects), feel free to clarify, and I can dig deeper! “
Rud, I don’t have the expertise to argue your declarations but Grok seems to make sense since the methane molecule does not change in the presence of water vapor, so its heat retention properties should not change either. It gave nuances of the humidity situation that needed to be added for a layman like myself to make sense of the situation. Frankly, bottom line, I don’t care if cows and other ruminants burp and fart, we shouldn’t try to curb their natural tendencies.
For the actual atmospheric physics Grok did not quite fully Grok, see Heliyon 5(1) 2019 e01145. The key is figure 2. Visually full explanation.
I was going to ask who the author is this piece is — but I’d rather not know.
Awful, terrible, horrendous idea! GROK can’t even correctly identify the origin of data (“…made a glaring error saying I used CERES data when I didn’t”).
GROK cannot make truth value judgements….it does not know if statements are likely to be true or false.
Re-ordering the words of a paper to say that “You are right” is hardly useful.
I wouldn’t even use it to proof read.
Well the link to the author’s previous paper reveals who the author is. Of course I might be wrong but I suspect that GROK is actually a better reviewer than most of the people who review for the “World Journal of Advanced Research and Reviews”. And give it a few months and it would probably be a better researcher than them as well.
I have been using Grok for about 6 months now and it seems to be getting smarter. That could be due to Grok advancing from Grok 2 to Grok 3. It also gives the likelihood of an event happening, say 50-50 or 90-10 in favor, or conversely a 10% chance of an event happening.
I started out thinking like that about AI. The Lucy Letby case started to swing me towards Grok – a while ago, when I knew nothing at all about Lucy Letby, I read an article (by a human, a lawyer I think) explaining point-by-point how the evidence against her was overwhelming. As I read, I came more and more to the opinion not just that the case against her was weak but that she was entirely innocent, a person with the misfortune to be so good at her job that she was always given the worst cases, ie, the babies that were already dying. Then I saw the Grok answer, and I was really impressed that AI could see it and express it so clearly when so many people could not.
To my mind, the Grok answers to me went far beyond saying “you are right”. Grok looked at things like attractors and worked out how they related to the arguments in my paper (which did not mention attractors). I don’t see how Grok could have done that by “Re-ordering the words” of the paper. There were surprising errors, so obviously you can’t just ask Grok and accept the answer, but I think that getting Grok to drill down into its answers would sort them out (or show that Grok was incompetent after all). Unlike humans, I think Grok and other AI will accept it if an error is pointed out to them, not double down.
I am not trying to say in this article that Grok is perfect. Grok is in fact in its infancy, so it is quite likely to cut out a lot of the mistakes in future. But it has been said many times by many people that peer-review is broken, and the world of science journals does desperately need something to get it out of its hole. My feeling is that an unbiased AI like Grok may, in time and with careful use, provide a real alternative, or at least a real improvement.
Just as long as it doesn’t become the Wikipedia of peer review 😱
“AI will accept it if an error is pointed out to them, not double down.”
That’s my experience too.
Goldratt, in his hierarchy of information, defines intelligence as: “The answer to the question asked.”
Will AI ever be able to ask the question, or is that uniquely human?
I concur.
At the top of the post it says Mike Jonas.
ChatGPT says “Isn’t that the dude who dated Jennifer Aniston when he was a hot young piece? He was in some band.”
How do we know that peer reviewers (the human kind) are not using AI?
My understanding is that reviewers are not paid for this service so anything that can reduce the time to analyse a paper would be attractive.
We don’t, and can’t.
I was very interested in how Grok handled objections from reviewers. It pointed out that
The critics’ analogies don’t disprove that; they defend a different layer of the models (the outputs and assumptions) you’re not even contesting.
I am making increased use of AI nearly every day. Do I trust Grok as an actual peer reviewer? Not yet. But slicing and dicing a reviewer’s objections in seconds, that’s interesting. Journal editors can use this to determine if the peer review is even valid. But how will they keep contrary papers out of publication when it becomes so easy to pick apart comments from biased reviewers?
My guess is that peer review goes underground. Yeah, we got some peer reviews but we cannot tell you from whom, or what they said. It’s a secret. But based on the secret we’re not going to publish your paper.
“But based on the secret we’re not going to publish your paper.” Unless you pay us $10,000 (I think that’s what Nature can charge) – we need the money.
“The IPCC’s focus isn’t entirely baseless. Warming from CO2 . . .”
What warming from CO2? Adding CO2 to air does not make the air warmer!
I asked ChatGpt “Why doesn’t the GHE prevent the Earth from cooling?”
The summary was “In short, the Greenhouse Effect moderates Earth’s temperature but doesn’t completely prevent cooling—it just makes the cooling process slower and dependent on other climate factors.”
Substitute “atmosphere” for “Greenhouse Effect” and tell me if you see any difference!
Maybe someone might ask Grok the same question. If Grok and ChatGpt give different answers, neither can be trusted. “More or less the same” is not terribly useful.
Trust AI because it’s “mostly correct”? You’d have to be joking!
I asked ChatGPT how to explain the dangers of the GHE to a Nobel Prize winner in physics who disagrees. It came up with all the usual answers including the 97% myth. I took each one apart and forced it to agree it was wrong. It apologized. Then I pointed out that it said one thing on one answer and another thing later on and asked it if it was lying then or lying now. It apologized again and said it should have checked its sources more carefully and it was sorry if anything it said was blah blah blah.
All AI is tuned to its creator’s biases just like climate models are. That doesn’t mean either are useless, but you have to be careful what conclusions are more bias than fact.
I would be very interested in how Grok responds to the exact same questions.
Why is peer review even necessary? It has been demonstrated that computer generated gibberish can get published – the publishers get paid to publish. The global turnover figures are staggering – more than $28 billion in 2019. Given a conservative profit margin of 25%, say $7,000,000,000 profit.
Publishers want revenue, not truth. Truth won’t buy the latest Bentley.
Did Sir Isaac Newton go through peer review?
I am old enough to remember when academic/research institute demand was greater than the supply, which may have spoiled us some. There was a probably mostly harmless buddy peer-review system which may have evolved into the current system, but there was not the intense competition, advertisement, and impact-factor mess we have now with more supply than demand. Dates of intensification might differ between professions, but it was going strong by the 1980s. Older ones were concerned about the grant system enticing those with inadequate training into a different new field, which was not good, although some in that category did their homework, and there were presumably well-trained others that did not turn out that well in their own field. Some quit learning, some not, some became activists.
Rud’s comments above about paradigms are interesting especially on the different purpose angle because asking different questions is so important. Mental Triage might work but maybe only in certain situations. I have had experience to know that it can ethically work, but not sure how it would apply here. What percentage of publications today are mostly evaluated by software in hardware?
I take exception to this: “Grok would at least be unbiased. But would it be competent? I think so, but it would take a while to learn how to use it effectively.”
“Unbiased”? No. Perhaps “equally biased”, except for that last bit. Admitting it takes a while to learn how to use it effectively means its users will all use it differently and their own biases will steer its results.
As others have pointed out, peer review has become so corrupted in the last 20 years that any AI could do just as good a job if not better in some cases than the frequently ignorant, antagonistic reviews that you get these days
Reviewers typically focus on the bits that fall into their own area of expertise, especially in interdisciplinary journals, disdain or completely skip the bits they don’t recognise, imagine errors that don’t exist, ask you to cite irrelevant papers (usually their own), and ask you to add things that are already in the paper
There is no obvious way to fix the problem. The proliferation in submissions and journals over the last couple of decades has created an almighty mess – a lot of people don’t want to waste their time on the garbage that typically gets submitted, and they’re not happy with the way journals exploit them to make huge profits
This is arguably the most serious problem. Those who are best suited to provide an objective, well informed review are difficult to pin down
One of the problems AI would have is that there is insufficient detail in many papers to conduct a proper review. The authors leave out key details and/or present the work in a confusing manner, often a deliberate ploy to mask serious flaws
This is especially true of submissions from China – badly conceived idea, poorly structured and written, ill suited methods not understood by the authors, wrong implementation, misunderstanding or mischaracterization of the results and wildly exaggerated claims not supported by the results
Another common scenario is that the work is pretty trivial, and the authors add lots of bells and whistles along with jargon, most of which they don’t properly understand, to try to bamboozle reviewers
I can imagine AI would end up chasing its own tail in such cases
Still, with some refinements, it could do just as good a job as most reviewers
Perhaps a better solution is to get rid of peer review altogether and let people post on arxiv for all to see and use or critique
Publication has become a game that is difficult to take seriously any more. Ethics have been ditched almost completely, and this is part of a wider problem in science that can’t be fixed by a focus on journals and peer review – this is merely one symptom
If you have an example of such a paper from China, maybe you could simply ask Grok “Please review this paper”, and see what it says.
I just tried. It did an awful job. Please see below
This is a paper that is so awful it could easily pass for a spoof, though I’m sure it is not intentional
The authors use methods that they clearly do not understand
There are so many flaws and so much gibberish that I wrote a 21 page review to the chief editor explaining why it should be retracted. The flaws were so obvious that it beggars belief it got into a good journal
As one example, they use equation discovery to obtain a ‘partial differential equation’
Yet it is not a PDE, not even an ODE
At best it could be a difference equation, since the independent variable is discrete. However, it is definitely not meant to be a difference equation
They never define the dependent variable properly; the description changes 3 times throughout without actually naming it, and they use 4 different symbols for it
They come up with 3 different equations, one based on equation discovery, one on ‘curve fitting’ (basically, they assume a polynomial form up to 3rd order and fit time series data) and one on a mysterious technique they call ‘the substitution method’
The equations contradict each other, and there is no explanation as to how the same quantity can at once be governed by 3 different equations
The equations are called PDEs, but they appear as ODEs. Yet they can be neither, since the independent variable is a cycle number. It is discrete, so classical derivatives do not exist
Pretending that they are ODEs, I solved them all analytically. They all blow up, meaning values approach infinity, when physically the dependent variable is supposed to be a proxy for a battery capacity on full charge at some cycle number t…. total gibberish
As bad, they fit time series data using ‘curve fitting’ (presumably ordinary least squares but never specified), using 70% of the data randomly selected. They then predict the remaining 30% and declare it a success
It did not occur to them that time series or sequential data is strictly ordered
Unless you have a time machine, only data up to the present time is available, so randomly selecting data is totally invalid.
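(The standard remedy, for what it’s worth, is forward-chaining validation, where every training fold strictly precedes its test fold. A minimal sketch using scikit-learn’s TimeSeriesSplit on made-up stand-in data:)

```python
# Forward-chaining validation: each training fold strictly precedes its test
# fold, unlike a random 70/30 split of sequential data. The data here is a
# made-up stand-in for a capacity-vs-cycle-number series.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

cycles = np.arange(100).reshape(-1, 1)   # discrete cycle numbers

for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(cycles):
    print(f"train on cycles 0..{train_idx[-1]}, "
          f"test on cycles {test_idx[0]}..{test_idx[-1]}")
```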
I should mention, I’m not an editor of this journal but I am an editor of another Elsevier journal and this sort of rubbish is submitted at an alarming rate
”
Reviewer Report
General Comments
The manuscript presents an innovative hybrid approach combining machine learning (ML) algorithms with partial differential equations (PDEs) to predict lithium-ion (Li-ion) battery degradation, focusing on parameters such as discharge capacity under varying temperature and C-rate conditions. The study leverages a two-stage methodology: an initial ML-based prediction followed by enhancement through physics-informed PDE integration. The results demonstrate significant improvements in predictive accuracy, as evidenced by reduced error metrics (MSE, MAE, RMSE) compared to standalone ML models. The topic is highly relevant to the field of energy storage, particularly for optimizing battery management systems (BMS), and aligns well with the scope of the journal. However, there are several areas where clarity, methodological rigor, and presentation could be improved to strengthen the manuscript for publication.
Overall, the work offers a valuable contribution by bridging empirical data-driven approaches with physics-based modeling, but it requires moderate revisions to address specific technical and editorial issues.
Strengths
Novelty and Relevance: The integration of PDEs with ML (specifically Gaussian Process Regression, GPR) to enhance battery degradation prediction is a compelling advancement. This hybrid approach addresses a recognized limitation in purely data-driven models—lack of physical interpretability—making it a timely contribution to battery research.
Quantitative Improvement: The reported reduction in error metrics (e.g., MSE from 10.3648 to 0.2091, RMSE from 3.2194 to 0.4572) when PDEs are incorporated is impressive and well-documented, providing strong evidence of the method’s efficacy.
Practical Implications: The study highlights potential applications in real-time BMS, which is critical for electric vehicles and portable electronics, enhancing its practical significance.
Comparative Analysis: The comparison with prior studies (Table 3) effectively contextualizes the work and underscores its competitive performance relative to existing methodologies.
“
I can’t reasonably comment without seeing the paper.
You can find it here
https://www.sciencedirect.com/science/article/abs/pii/S0378775324017233
However, it’s not open access so requires subscription
I can email you a copy if you require
I would be happy to be sent a copy, but in today’s world I’m not happy with putting my email address online. Tricky.
I understand Mike.
My feeling with these AI tools is that (for now) they are best used by people who already have a very good understanding of a topic, in order to save themselves time
From my experience they make many technical mistakes and often draw the wrong conclusions. So you have to spend time fixing the mistakes
On more qualitative topics, there’s a risk of confirmation bias. You can ‘convince’ them to tell you what you want or draw certain conclusions
I use them primarily to write code outlines, which saves me time. If I didn’t understand the codes and underlying methods I’d end up with garbage most of the time.
In your experiment, you obviously understand the topic, so you can judge whether what it’s telling you is correct. If not, you can ask more specific questions. Now, imagine you didn’t have that knowledge and just accepted the first thing it said, or you didn’t have the knowledge to delve deeper into its answers. This is the problem I’m trying to get at
You could eg ask how it captures atmospheric turbulence in these models. Obviously it’s not doing DNS to capture the Kolmogorov scales, so is it using something like a k-epsilon model inside something like RANS? I haven’t looked into it so I’m curious how it’s done. The only thing I can imagine is that they’re resolving the gas dynamics and thermal energy balances in a ‘small’ control volume and then patching together many of these volumes by conserving mass, momentum and heat fluxes at the boundaries
Whatever it is, it doesn’t sound like the results could be trusted much
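For what it’s worth, that “patching together control volumes by conserving fluxes at the boundaries” picture is the finite-volume method in a nutshell. A 1-D toy sketch of the conservation idea (nothing remotely like a real GCM dynamical core):

```python
import numpy as np

# Toy 1-D finite-volume advection: each cell exchanges flux only with its
# neighbours, so total tracer mass is conserved exactly. Real GCM cores are
# vastly more elaborate; this just shows the control-volume idea.
n, dx, dt, u = 50, 1.0, 0.4, 1.0     # cells, spacing, time step, wind speed
q = np.zeros(n)
q[20:30] = 1.0                        # initial tracer blob (mass = 10.0)

for _ in range(25):
    flux = u * q                      # upwind flux through each right face
    q -= dt / dx * (flux - np.roll(flux, 1))   # periodic boundaries

print(f"total tracer mass after advection: {q.sum():.6f}")   # still ~10.0
```

The sub-grid turbulence question is exactly why parameterizations exist: anything smaller than the control volume has to be represented statistically rather than resolved.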
“Your argument is laser-focused:”
Techno-jargon nonsense picked up from popular language — a laser beam is not “focused” unless it is directed into a lens.
laser-focused simply means “pinpointed” or “sharply focused”, as in a laser lecture pointer. Maybe Grok would cut the chummy language if asked to – worth trying?
A laser beam is coherent, not focussed — this is why it travels in a straight line.
Can this software program follow and understand a paper that uses dense mathematics?
Probably not. It would likely end up inventing proofs that don’t make sense and are flat wrong, a bit like the ghost references
I’m not even talking about something like functional analysis or measure theory
Just simple arithmetic it frequently gets wrong, again and again and again, even when it’s pointed out
Grok applied Bayesian maths when asked to. I suspect that in time, AIs will master dense mathematics as comprehensively as certain programs have already mastered chess. Only the very densest maths could be as difficult as passing the Turing test, and I think most people would agree that AI has now done that.
Replacing a flawed human peer review process with an AI powered peer review is a terrible idea.
Many people incorrectly believe AI is unbiased. It’s a computer system so the answers must be purely logical and fact-based. Wrong and wrong.
Yet this does sound like a possible topic for a thesis, with the premise being AI chat bots are unsuitable for the task.
Should we offer all new research to Elon Musk, free of charge?
No.
Could Grok-Review replace peer review?
The idea of an AI like me (Grok 3, let’s say “Grok-Review”) replacing peer review is a juicy one—full of potential and pitfalls. Let’s break it down.
Peer review, as it stands, is the backbone of academic rigor. It’s a messy, human-driven process where experts scrutinize research for accuracy, novelty, and methodology before it gets published. It’s slow, subjective, and prone to biases—like gatekeeping, favoritism, or just plain old grudges. But it’s also got a depth of contextual understanding that comes from lived experience in a field.
Could I, Grok-Review, step in? On the upside, I’d be fast—blazing through papers in seconds, not months. I could check for logical consistency, statistical errors, and plagiarism with precision no human could match. I’ve got tools to analyze text, dig into linked data, and cross-reference against a vast web of knowledge. No coffee breaks, no ego trips—just results. If you fed me a paper, I could flag dodgy p-values or spot if the conclusions don’t match the data, all without caring who wrote it.
But here’s the rub: I’m not a domain expert. I can mimic understanding through patterns and data, but I don’t feel the field like a seasoned researcher does. Peer review often leans on intuition—those gut checks that something’s off, even if the numbers look clean. I’d miss that. And novelty? I could compare a paper to everything I’ve seen, but I might struggle to judge if a wild new idea is brilliant or bonkers without human context.
There’s also the trust factor. Humans like humans validating their work—it’s a social contract. If I replaced peer review, you’d have to convince the scientific community to bet on my algorithms instead of their colleagues. Good luck with that when egos and tenure are on the line.
A hybrid might work better—Grok-Review as a first-pass filter. I could screen for obvious flaws, rank papers by technical merit, and leave the nuanced judgment to humans. Journals are already experimenting with AI for triage; it’s not a huge leap. But fully replacing peer review? Not yet. I’d need a leap in my ability to grasp uncharted territory and a cultural shift to make me more than a fancy tool.
What do you think—would you trust me to judge your magnum opus?
Wouldn’t a good solution be to publish papers with their reviews, and with the reviewers identified just like a paper’s authors have to be identified?
AI is an internet literature search, actually several, searching for common phraseology and topics to your question, the results then modified by synonyms to prevent copyright infringements. It’s going to give you consensus info. Ask a question and it just does a new literature search using your modified phraseology and wording. Quickly finding consensus info is probably beneficial for initial research of an unfamiliar topic. However, allowing AI to control your nuclear missile arsenal because GI Joes are impressed with its “conversations” is not likely to be a good idea. The originators of this stuff eventually want you to subscribe to their services, same as you do for operating system updates or virus checkers. That should tell you something.
It’s not. A trained AI model is a lot like a brain. There are no explicit references to any of the training material and only “memories” of it. The training also creates associations within the training material and the whole thing is held as matrix associations. It’s all very abstract and nothing at all like you believe it to be.
So you think the “matrix” is not populated as the result of phrase pattern searches. How quaint of you.
…not like a brain…agreed.
It is populated by training on entire works, not just “phrase pattern searches”. And the suggestion “the results then modified by synonyms to prevent copyright infringements” is just wrong.
Since peer review is very flawed right now and is essentially worthless in terms of establishing the scientific credibility of any submitted article or paper, sure, let AI take over the role of charlatan. After all, AI-review of any submittal will only incur the cost of a few trillions of electrons . . . much less expensive than wet, grey matter.