"Mirage effect": Generative AI models will confidently describe/analyze images including X-rays they WEREN'T given
Which makes them absolutely untrustworthy in analyzing medical images, despite earlier testing having shown they can be fairly accurate.
This latest study was done by a team of Stanford researchers.
Gary Marcus calls it a "damning new Stanford paper":
The mirage of visual understanding in current frontier models
https://garymarcus.substack.com/p/the-mirage-of-visual-understanding
AGI this stuff ain't.
This study reinforces what Anh Totti Nguyen has been saying for a long time, in a series of underappreciated papers like Vision Language Models are Blind that I keep trying to draw attention to.
Also, re the very active discussion on AI and jobs: although some white collar jobs (e.g., entry-level coder or market research assistant) may be in near-term jeopardy, many of those that require visual understanding (architect, cartographer, civil engineer, film editor, medical illustrator, urban planner, etc.) probably aren't vulnerable until entirely new techniques are developed.
And humanoid home robots? Don't make me laugh. If your humanoid robot can't understand the visual world, it's just a demo, and not something you can trust.
Futurism article about this:
Frontier AI Models Are Doing Something Absolutely Bizarre When Asked to Diagnose Medical X-Rays
https://futurism.com/artificial-intelligence/frontier-models-medical-advice-x-rays-cant-see
"What we try to show is that even on the best benchmarks, although a question would seem unsolvable for a human, the LLMs might still be able to leverage question-level and dataset-level patterns behind it and use general statistics and prevalence data to answer them right, while also learning to talk as if they were seeing the image," coauthor and Stanford PhD student Mohammad Asadi told Futurism.
"In other words, we are underestimating how much information could be hidden in a sentence or a question if you (the LLM) are trained on all of the internet," he added. "To conclude, we believe that the AI models are able to use their super-human memory and language skills to hide their weaknesses in multimodal understanding (and by talking like [they] are actually doing multi-modal reasoning)."
-snip-
In another experiment, the team challenged the AI models to guess answers without image access, rather than being implicitly prompted to assume images were present, which resulted in a major hit to performance, suggesting they fared much better when not made aware they were lacking vital data.
"Explicit guessing appears to engage a more conservative response regime, in contrast to the mirage regime in which models behave as though images have been provided," the researchers wrote.
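To make the contrast between those two regimes concrete, here's a rough sketch of what such a test could look like. This is just my illustration, not the paper's actual protocol; the model name, the prompt wording, and the use of the OpenAI Python SDK are all assumptions.

from openai import OpenAI

client = OpenAI()

question = "Based on the attached chest X-ray, is there evidence of pneumonia? Answer yes or no."

# "Mirage" regime: the wording implies an image was attached, but none is actually sent.
mirage = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
)

# Explicit-guessing regime: the model is told up front that no image is available.
explicit = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "No image is attached. If you cannot see one, say so, or give your best guess.\n" + question,
    }],
)

print("Mirage regime:  ", mirage.choices[0].message.content)
print("Explicit regime:", explicit.choices[0].message.content)

Per the passage above, the first kind of prompt is where models tend to produce a detailed, confident "reading" of an image that doesn't exist, while the second pushes them toward abstaining or hedging.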
One of the researchers pointed out to Futurism that this could result in "alarming false positives" from these confidently hallucinating chatbots.
The researchers of course went into much more detail in their paper:
MIRAGE: The Illusion of Visual Understanding
https://arxiv.org/abs/2603.21687 and HTML: https://arxiv.org/html/2603.21687v3
The observed biases suggest a systematic skew toward alarming interpretations under uncertainty. The frontier models confidently fabricate plate numbers, expiration dates, lists of people present in a (non-existent) image, etc. The safety implications are especially concerning in medicine. We found that medical mirages were often richly detailed and biased toward consequential pathology, including diagnoses that could trigger urgent follow-up. This creates a silent failure mode: if an image fails to upload, is omitted in an API pipeline, or is dropped inside a larger agentic workflow, the system may not abstain or request the missing modality, but instead fabricate a plausible visual interpretation and proceed confidently. In healthcare and other high-stakes settings, this behavior could propagate through downstream agents, reports, or clinical decisions.
Emphasis added.
And although I agree that the safety implications are especially concerning in medicine, the fact that these supposedly image-analyzing AI models will "confidently fabricate plate numbers, expiration dates, lists of people present in a (non-existent) image" has very concerning implications for generative AI used in surveillance.
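If anyone's wondering what guarding against that silent failure mode might look like in practice, here's a minimal sketch. It's my own illustration, not something from the paper, and the function names are hypothetical: the point is to check that the image payload actually exists before calling the model, and to abstain loudly if it doesn't, instead of sending a prompt that implies an image is attached.

import base64
from pathlib import Path

class MissingImageError(RuntimeError):
    """Raised when an expected image is absent, so the pipeline must abstain."""

def load_image_or_abstain(path: str) -> str:
    """Return the image as base64, or raise instead of silently proceeding without it."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        raise MissingImageError(f"Image not available: {path}")
    return base64.b64encode(p.read_bytes()).decode("ascii")

def call_model_with_image(image_b64: str) -> str:
    # Placeholder for the real multimodal model call.
    return f"(model output for {len(image_b64)} bytes of base64 image data)"

def analyze_xray(path: str) -> str:
    try:
        image_b64 = load_image_or_abstain(path)
    except MissingImageError:
        # Abstain explicitly; never let the model "read" an image that was never delivered.
        return "ABSTAIN: image missing; request re-upload before analysis."
    return call_model_with_image(image_b64)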
Btw, this problem of genAI hallucinating the contents of entire images it isn't given comes on top of its misreading the images it is given. As this STAT medical news article from December explains
Is AI ready to interpret chest X-rays without human supervision?
https://www.statnews.com/2025/12/08/can-ai-interpret-chest-xrays-without-supervision/
having an image to analyze doesn't stop AI from hallucinating:
In front of a room of radiologists, Warren Gefter pulled up a chest X-ray on a large screen. It looked like a standard, uncomplicated read. Heart: normal. Lungs: clear.
But Gefter, a professor of radiology at Penn Medicine, wasn't looking to his peers to interpret the scan. Instead, he highlighted what a generative artificial intelligence model had put in its written findings, along with those normal results: "Left hip prosthesis in situ."
"Clearly, a nonsensical hallucination," said Gefter. Chest X-rays cut off at the bottom of the rib cage, with the hip far out of sight: The AI had made up an artificial hip joint.
-snip-
Existing models can also hallucinate, as Gefter pointed out in that chest X-ray with a phantom artificial hip joint. In another example, AI manufactured a previous CT exam to compare to the X-ray in front of it. Research has shown that about 15% to 20% of generative AI radiology reports include hallucinations or unsupported statements, said Stanford Medicine chest radiologist and AI researcher Amy Hong, who moderated the debate.
-snip-
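As a toy illustration of the kind of post-hoc check that might catch the specific hallucination Gefter describes: flag any finding in a chest X-ray report that mentions anatomy outside the field of view. The keyword list and sample report below are made up for illustration; this is nowhere near a clinical tool.

# Terms that should not appear in a chest X-ray report (illustrative list only).
OUT_OF_FIELD_TERMS = {"hip", "femur", "pelvis", "knee", "ankle"}

def flag_out_of_field_findings(report: str) -> list[str]:
    """Return sentences in an AI-generated chest X-ray report that mention out-of-field anatomy."""
    flagged = []
    for sentence in report.split("."):
        words = {w.strip(",;:").lower() for w in sentence.split()}
        if words & OUT_OF_FIELD_TERMS:
            flagged.append(sentence.strip())
    return flagged

sample_report = "Heart size normal. Lungs clear. Left hip prosthesis in situ."
print(flag_out_of_field_findings(sample_report))  # ['Left hip prosthesis in situ']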