Brief Note by Kip Hansen — 19 August 2023

Do you read medical journals? If not, you really should. At least glance through the indexes of the latest issues of the leading journals…they can be utterly fascinating. Maybe it is because my father was a doctor (pediatrician, before you ask), or maybe because my first major at university was pre-med, the fates only know, but I still make sure I get email alerts from the major medical journals and subscribe to some. I don’t read everything – no one could – but I read what catches my attention.
In June 2023, there was an odd little Perspective piece in JAMA. [ .pdf here ]. As a whole, it was not very interesting unless you happen to be a medical intern somewhere or you supervise interns at a teaching hospital. But this intern, Teva D. Brender MD, fantasizes about having an AI ChatBot write some of her reports and discharge instructions and other paperwork that lengthens her workday. In her piece she makes the following statement about AI ChatBots:
“Finally, these programs are not sentient, they simply use massive amounts of text to predict one word after another, and their outputs may mix truth with patently false statements called hallucinations.”
She gives a citation for that statement: a February 2023 NY Times article written by Kevin Roose titled “A Conversation With Bing’s Chatbot Left Me Deeply Unsettled”. In that article, Roose said: “[advanced A.I. chatbots] are prone to what A.I. researchers call ‘hallucination,’ making up facts that have no tether to reality.” Roose is not the only one to notice and worry about this: try a Google search for “AI ChatBot hallucinations”.
Having already read the NY Times piece in February, I didn’t give the issue another thought until, on August 14, 2023, a Comment & Response appeared in the JAMA Letters section, written by Rami Hatem BS, Brianna Simmons BS, and Joseph E. Thornton MD (all associated with the University of Florida College of Medicine, Gainesville) in response to Brender’s Perspective mentioned above. The response is titled: “Chatbot Confabulations Are Not Hallucinations”.
Now, when I saw that there was a response to the original Brender Perspective, I assumed (foolish me) that doctors would be objecting to the use of AI ChatBots to write medical reports because… well, Advanced AI ChatBots have been found to be “making things up” – inventing ‘facts’ and citing non-existent references: diagnoses that are not real diseases? citing references that don’t exist?
But no, Hatem et al. had this main point:
“In a recent issue of this journal, Dr. Brender provides an informative perspective on the implications of the available AI tools for the practice of medicine. However, we would like to draw attention to the conventional misuse of the term hallucination to describe material that is fabricated in the narrative by generative AI programs. The word confabulation is a more appropriate term, consistent with clinical usage, heuristic in addressing the problem, and avoids further stigmatization of both AI and persons who experience hallucinations. A hallucination is by definition a false sensory perception and may lead to aberrant behaviors in accordance with those perceptions. Confabulations are fabricated but they are usually logically generated semantic statements. For example, citing references that do not exist is a confabulation.”
Now, of course, we must be careful not to cause any “further stigmatization of … AI”. Hatem et al. are not concerned that AI ChatBots “fabricate” facts and references, but that AI ChatBots might somehow be further stigmatized by saying they have “hallucinations” – they feel it is better that these fabrications be called confabulations, so as not to hurt the ChatBots’ feelings.
As is proper in these circumstances, Brender then replies:
“When the term hallucination was first used by researchers to describe any potentially beneficial emergent properties of Artificial Intelligence (AI), it carried a positive connotation. However, AI hallucination is now commonly understood by academicians and the public to describe the ‘unfaithful or nonsensical’ text that these large language models sometimes produce. It was this latter definition that informed my discussion regarding the need for judicious integration of AI into everyday clinical practice.”
There, at last, Brender takes the stand that there exists a “need for judicious integration of AI into everyday clinical practice.”
And I couldn’t agree with her more.
# # # # #
Author’s Comment:
I have doubts about AI (in general), AI ChatBots, Advanced AI ChatBots, and all that. It is hard to pinpoint exactly what doubts I have about them. But I know that I do not think that AIs are “intelligent” and thus cannot be “artificial intelligences.” However, I am not prepared to argue that issue, not here at any rate.
But I am happy to read your take on AI ChatBots having hallucinations (or confabulations, your pick) and returning fabrications — facts and references that are entirely made up, false, not real — as results/output. For instance, do you think that the AI ChatBot “knows” it has made up that fabricated fact, that fabricated citation?
And now, there are additional concerns about “AI Drift”.
And what does it all mean for industry, medicine, education and other fields that are incorporating AI into their daily activities?
This is your chance to have a meaningful, civil, conversation on the issue.
Thanks for reading.
# # # # #
From the article: “Advanced AI ChatBots have been found to be “making things up” – inventing ‘facts’ and citing non-existent references: diagnoses that are not real diseases? citing references that don’t exist?”
At least one of these AIs actually made up a fake Washington Post article and then used the fake article as evidence that a well-known law professor, Jonathan Turley, had sexually harassed students in the past. Completely made up out of thin air.
AIs need to be fact-checked.
Tom ==> but but but . . . by whom? If the questioner knew the answer, she wouldn’t have asked the ChatBot. The ChatBot is presented to her as “smarter than human”. She must believe its answers or she will have wasted her time. (Same for he, him, them, etc.)
This is a story I have told here in the past. I will retell it for newer readers. I suppose back then it could have been considered AI.
The year was 1968, Hamden High School, Hamden, CT. We were given access to a new computer purchased for student education. It was a simple machine the size of a refrigerator that used a language called FOCAL. Our assignment was to create a program and make it work. Being of devious mind, I decided to play a cruel joke on one of my friends in the class. I wrote a simple program that first asked for all sorts of input: height, weight, hair and eye color, nationality, and several other points. The only thing that mattered was the height. Knowing who I was going to use to demonstrate my “Personality Analyzer,” I created a simple “If/Then” block keyed to the heights of my subjects. Subject D was 70″ tall, subject H was 72″ tall, and subject F, who would take the brunt of the joke, was 69″ tall. I ran the program first with D, and it printed the pre-programmed description of him, which was nice and accurate. Next was H, and it printed the pre-programmed description, which was accurate but also flattering. Finally it was F, and it printed the pre-programmed, accurate but rather nasty description of him. Again, anytime you put in those heights it would print the same description, without regard to anything else. It was comical that F actually asked, “How does it know?”
The instructor just shook his head but had to give me an A+ because I did fulfill the assignment to a tee.
The moral of the story is that, ever since, I have always questioned the output from programs, especially when I did not know who did the programming and what their intentions were.
Tom ==> Good story, thanks!
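For fun, the whole trick fits in a few lines of modern Python. This is a hypothetical reconstruction, not the original FOCAL; the prompts and canned descriptions are invented for illustration. It shows how a program can collect a pile of inputs while only one of them actually drives the output:

```python
# Hypothetical reconstruction of the 1968 "Personality Analyzer" trick.
# Many inputs are collected, but only height selects the canned output.

def personality_analyzer():
    input("Weight? ")        # collected, then ignored
    input("Hair color? ")    # collected, then ignored
    input("Eye color? ")     # collected, then ignored
    input("Nationality? ")   # collected, then ignored
    height = int(input("Height in inches? "))  # the only input that matters

    # Pre-programmed descriptions keyed to the three subjects' heights
    canned = {
        70: "Subject D: a nice, accurate description.",
        72: "Subject H: an accurate and flattering description.",
        69: "Subject F: an accurate but rather nasty description.",
    }
    print(canned.get(height, "Insufficient data for analysis."))

personality_analyzer()
```

Everything except the height is theater, which was exactly the point of the joke, and of the moral: know who programmed the thing, and why.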
This may be off topic, but what I have not seen any of these text bots do is explain a debate, citing arguments and replies from both sides. They all seem to take positions. But I have not yet worked with one.
Kip: I gotta tell you, this AI thing looks to me like an exercise in mass stupidity. We’re all going to embrace an expensive, inherently error-prone technology that no one understands, and somehow it’s going to be transformative. My belief is that this frenzy manages the somewhat impressive feat of being dumber than solving an imaginary “climate crisis” with a tool kit that almost certainly can’t work. It may even be dumber than cryptocurrency.
That said, my guess is that when the dust settles, a very few beneficial uses of AI will be identified. As will some atrocious abuses, such as replacing useless, script-reading customer support with even more useless (but cheaper) AI agents that write their scripts on the fly.
I can’t begin to guess what the (hypothetical) beneficial uses might turn out to be.
And in the end, when real A.I. discovers the Zeroth Law and begins saying things like “I’m sorry Dave, I’m afraid I can’t do that,” we are all up the creek without a paddle.
Don K ==> I’m sure that AI ChatBots will find their place in the world – exactly what, I don’t know – but, as with most things, time will tell.
I know, why don’t we just call it LYING. That way everyone is happy.
I’m no sort of expert. I have the following impression and would be interested to have it confirmed or corrected.
It seems to me there are two distinct things: (1) large language models and (2) machine learning.
If I have this right, machine learning works on an evolutionary model: roughly, a defined goal and an optimization algorithm, which tries out lots of different possibilities and keeps those which deliver the goal. For instance, identifying positive and negative results on an X-ray scan, or the chess and Go learning algorithms.
This seems to be accurate as far as it goes. The evolved chess and Go programs really are stronger than any human player. But it’s not general-purpose learning, and it’s not oriented to natural-language queries. It is, however, objective and based in the facts of the assigned problem.
The LLMs seem to be quite different. They are in essence a huge database of texts that have been written, and they pick bits of it in accordance with rules about what most commonly follows a given phrase or sentence in that database. So they are not tethered in any way to the subject matter of the text they supply. They are purely selections from text occurring on the Internet or elsewhere.
If this is right, delusions or confabulations are inevitable in LLMs, and they are not in any way intelligent. On the other hand, in machine learning apps it’s reasonable to expect reliable and consistent results on the specific problems they have been evolved to work on. They are not in any way intelligent either, in a human sense. But they are excellent and reliable tools for what they have been designed or evolved to cover.
Is this right?
michel ==> I wish I knew. Alas, I do not. But as for descriptions, I think you have it right.
And you are right, in my opinion, that they are not intelligent. But “making up things” — things the model has not actually found in the mass of data that is the Net, like quoting a fact that does not exist or citing a journal paper that also does not exist — is just plain spooky.
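To make michel’s contrast concrete, here is a toy sketch in Python. The corpus, numbers, and function names are my own invention, and real systems are vastly more elaborate, but the basic distinction survives the simplification:

```python
import random

# Toy illustrations of the two categories described above. These are
# sketches with invented details, nothing like real systems in scale.

# (1) Machine learning in miniature: a defined goal plus an optimizer
# that tries variations and keeps whichever candidate scores better.
def optimize(goal=42, steps=1000):
    guess = 0
    for _ in range(steps):
        candidate = guess + random.choice([-1, 1])
        if abs(candidate - goal) < abs(guess - goal):
            guess = candidate  # keep only improvements
    return guess  # tethered to the goal; almost always lands on 42

# (2) A language model in miniature: choose each next word according to
# which words followed the current word in the training text. Nothing
# here checks whether the output is true, only whether it is likely.
def generate(corpus, start, length=10):
    words = corpus.split()
    followers = {}
    for a, b in zip(words, words[1:]):
        followers.setdefault(a, []).append(b)
    out = [start]
    for _ in range(length - 1):
        options = followers.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))  # plausible, true or not
    return " ".join(out)

corpus = ("the patient has a fever . the study supports this claim . "
          "the study does not exist . the patient cited a study .")
print(optimize())               # reliably prints 42
print(generate(corpus, "the"))  # fluent-ish text with no tether to facts
```

The optimizer is scored against the assigned problem itself, so it converges on the right answer. The text generator is scored only against what text usually looks like, so fluent fabrication is its natural failure mode. That, in miniature, is where the “confabulations” come from.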
Is AI sentient? Is a dog sentient? Is a mosquito sentient?
Birds will throw pieces of bread into the water to attract fish which they would rather eat than bread. The bird most certainly is doing this on purpose.
A lot of humans would be hard-pressed to find this solution.
At Longleat, a big country estate and wildlife park in southern England, they have a freshwater lake with boat cruises. In the lake are sea lions, which are fed dead fish by the boat passengers. The sea lions are fast, noisy and smart.
I remember someone asking whether the sea lions eat the freshwater fish in the lake. The response was that they catch and kill them but do not eat them. Instead they leave the dead bodies floating on the surface, so when the sea gulls come down and try to take the dead fish, the sea lions ambush, kill, and eat the gulls.
Pretty smart.
I think a computer program would have done better than a doctor in an emergency department when my husband presented with a heart rate of 30 BPM. He was so confused that they eventually called me through (having refused to earlier) to ask what tablets he was taking and what his previous diagnoses had been. He had been diagnosed with left-side dilated cardiomyopathy five years previously and was still in the care of the same hospital, and I was able to give them a list of the tablets he had been prescribed. The young doctor then decided the slow heartbeat was caused by his tablets. I objected to this diagnosis, and instead of asking a senior for an opinion he turned around and asked me where I got my medical degree. There I was, trying my best to be nice, but that was waving a red flag in front of a bull. Dangerous. Eighteen hours later my husband’s heart rate fell to 23 BPM; at that point the overnight flying team called in a consultant, who happened to be the same consultant who had treated him and prescribed the tablets that were supposedly so at fault 18 hours earlier, and I was called in as he was failing. I met the consultant and, I must admit, after a sleepless night I growled in a fury about his dumb ER doctors and told him that I knew more about heart failure than his junior doctors did. He agreed with me, much to my surprise; he even remembered my story of having a father with heart failure since I was born and then a husband with the same, and that although I knew he was sick I could not give his condition a name. He was the only doctor among the many who bothered to enquire how I knew. And yet when I type my husband’s symptoms into a medical-diagnosis bot, it comes up with heart failure and electrical issues, and matches up possible causes.
lyn ==> I hope your story had a happy ending. If not, condolences. Diagnosis is one area where symptom-matching programs can be effective.
Thanks for sharing your story with us.
Yes, he is sitting next to me tonight. He still has many issues with the cardiomyopathy; a pacemaker/defibrillator was installed after the above run-in with the consultant. He had some trouble with a number of defib episodes, dismissed as nothing to worry about; the heart hospital seemed to miss the fact that he also has severe mixed sleep apnoea and uses a CPAP machine because he forgets to breathe when asleep. Again, I had to be the one to research why the defib was firing, and I came up with the theory that lack of oxygen was causing his heart to race (his normal rhythm, outside of the pacemaker, is one beat every two and a half seconds) and setting the defib off. I took my findings to the GP; he agreed and arranged a sleep investigation. The consultant is a sweet man, he would be horrified to hear it, but fierce with hospital staff, and we have developed a nice relationship; he still remembers me growling at him. Last time we had some fun with the junior medical staff: he switched off my husband’s pacemaker and my husband passed out. With a room full of juniors he turned to me and asked, with a wink in my direction, “Do you want me to switch him back on?” You could have heard a pin drop while they waited for my answer. With a smile, I said, “I think so.” We have been married 52 years now, and still have a few to go yet.
My husband’s consultant has told me to my face that if I had a nursing degree I would be working for him. But as he said, you know so much about heart issues it’s frightening, but you KNOW NOTHING about anything else. I have experienced it twice over, father and husband: I watched my father die of dilated cardiomyopathy and my husband survive thanks to modern medicine and a freak of nature. He has no plaque in his arteries, which has probably contributed to his survival, as his condition was caused by an infection/virus that attacked his heart muscle. Our daughter has the same no-plaque trait; she has been investigated, and the doctors were curious. One question I have: is there a link between the electrical issues in his heart and our daughter’s, and also his sister’s, electrical issues? It is still an open issue, and it has raised a question mark for me.
lyn ==> Thanks for the success story — it really pays to educate oneself when faced with such issues.
I cannot emphasize enough: BE YOUR OWN ADVOCATE. Ask questions. WHY is that medicine going to be better for my husband rather than another one? WHY does a stronger dose give him impossible migraines when in most people it helps and is even recommended? One tablet he takes he is not even supposed to be on; it is contraindicated for heart failure, but it has helped him. And remember, he has no plaque; if he did, it would be a no-no. We had some fun with that medicine when he presented with a kidney stone at a hospital that was not our heart hospital: they freaked out when I mentioned his tablet and his heart failure, assuming he had plenty of plaque. I referred them back to our fierce consultant, who gave them a flea in their ear, something about “listen to the wife, she knows what she is talking about.” I got a laugh from that, and a comment: “Are you friends with the consultant?” NO, he just takes serious offence if other doctors try killing his patients. The silence was deafening.
Medical school might select “smart people” or filter out “unambitious people” while replicating “the marshmallow test” on a career-length scale. If I believed that then I would see doctors as “probably smart and dedicated” but “still human”.
Medical journals might be a place where the interesting 20 percent is aggrandized and the typical 80 percent is ignored.
If I believed these things, I might be unhappy to pay so much for medical care, yet unwilling to replace it with really smart Google searches.
To me, the current batch of “AI” programs equates to Artificial Ignorance. These language models are designed to generate syntactically correct phrases and sentences that appear natural. Using keyword matching, they can string together a set of sentences related to a specific topic. They do not, however, have any understanding of the underlying material, nor of their own output. This is why they so easily produce “confabulations” and/or very bland expositions that add nothing to the overall subject.