General Discussion

marble falls

(71,557 posts) Wed Jul 9, 2025, 09:19 PM Jul 2025

A.I. Is Getting More Powerful, but Its Hallucinations Are Getting Worse

A.I. Is Getting More Powerful, but Its Hallucinations Are Getting Worse

A new wave of “reasoning” systems from companies like OpenAI is producing incorrect information more often. Even the companies don’t know why.

https://www.nytimes.com/2025/05/05/technology/ai-hallucinations-chatgpt-google.html

By Cade Metz and Karen Weise
Published May 5, 2025Updated May 6, 2025

-snip-

Today’s A.I. bots are based on complex mathematical systems that learn their skills by analyzing enormous amounts of digital data. They do not — and cannot — decide what is true and what is false. Sometimes, they just make stuff up, a phenomenon some A.I. researchers call hallucinations. On one test, the hallucination rates of newer A.I. systems were as high as 79 percent.

-snip-

The company found that o3 — its most powerful system — hallucinated 33 percent of the time when running its PersonQA benchmark test, which involves answering questions about public figures. That is more than twice the hallucination rate of OpenAI’s previous reasoning system, called o1. The new o4-mini hallucinated at an even higher rate: 48 percent.

-snip-

Hannaneh Hajishirzi, a professor at the University of Washington and a researcher with the Allen Institute for Artificial Intelligence, is part of a team that recently devised a way of tracing a system’s behavior back to the individual pieces of data it was trained on. But because systems learn from so much data — and because they can generate almost anything — this new tool can’t explain everything. “We still don’t know how these models work exactly,” she said.

-snip-

Another issue is that reasoning models are designed to spend time “thinking” through complex problems before settling on an answer. As they try to tackle a problem step by step, they run the risk of hallucinating at each step. The errors can compound as they spend more time thinking.

-snip-

Audio produced by Adrienne Hurst.

Cade Metz is a Times reporter who writes about artificial intelligence, driverless cars, robotics, virtual reality and other emerging areas of technology.

Karen Weise writes about technology for The Times and is based in Seattle. Her coverage focuses on Amazon and Microsoft, two of the most powerful companies in America.

“You spend a lot of time trying to figure out which responses are factual and which aren’t,” said Pratik Verma, co-founder and chief executive of Okahu, a company that helps businesses navigate the hallucination problem. “Not dealing with these errors properly basically eliminates the value of A.I. systems, which are supposed to automate tasks for you.”

22 replies

= new reply since forum marked as read

Highlight:

A.I. Is Getting More Powerful, but Its Hallucinations Are Getting Worse (Original Post) marble falls Jul 2025 OP

OK, I'm anything but a high tech guru, boonecreek Jul 2025 #1

Frightening? Beyond frightening! There are server banks for single companies using the energy of small cities ... marble falls Jul 2025 #3

I had a hunch it was worse than I thought. boonecreek Jul 2025 #7

A few links: highplainsdem Jul 2025 #5

And thank you for the links. boonecreek Jul 2025 #8

Garbage in garbage out. enough Jul 2025 #2

An unfiltered mishmash of garbage and fact with no editing protocol for truth or fact. marble falls Jul 2025 #4

Those newer AI models are also worse for the environment. See this: highplainsdem Jul 2025 #6

one word: B.See Jul 2025 #9

What the Ukraine war seems to have developed into a lab for. marble falls Jul 2025 #21

Forbidden Planet 1956............... Lovie777 Jul 2025 #10

Exactamundo. I had that thought myself. marble falls Jul 2025 #20

Anybody here Faux pas Jul 2025 #11

This is why all these companies purple_haze Jul 2025 #12

Companies that try to rely heavily on AI The Madcap Jul 2025 #13

I can probably give you the reason for the hallucinations. PurgedVoter Jul 2025 #14

A good explanation RainCaster Jul 2025 #17

Do incorrect arithmetical calculations count as hallucinations? Disaffected Jul 2025 #18

Grammar error -- "but" should be "so". nt eppur_se_muova Jul 2025 #15

Thanks! marble falls Jul 2025 #16

AI bots are COMPLETELY incapable of actual logic William Seger Jul 2025 #19

It's logic absent volunteered intent. It doesn't meant to be right or wrong, it looks to bridging a gap with ... marble falls Jul 2025 #22

boonecreek

(1,478 posts)

1. OK, I'm anything but a high tech guru,

Reply to marble falls (Original post)

Wed Jul 9, 2025, 09:29 PM

Jul 2025

but it looks like these AI bots use frightening amounts of energy. If I'm wrong, let me know.

marble falls

(71,557 posts)

3. Frightening? Beyond frightening! There are server banks for single companies using the energy of small cities ...

Reply to boonecreek (Reply #1)

Wed Jul 9, 2025, 09:41 PM

Jul 2025

... https://unric.org/en/artificial-intelligence-how-much-energy-does-ai-use/

boonecreek

(1,478 posts)

7. I had a hunch it was worse than I thought.

Reply to marble falls (Reply #3)

Wed Jul 9, 2025, 09:48 PM

Jul 2025

Thanks for the link.

highplainsdem

(61,341 posts)

5. A few links:

Reply to boonecreek (Reply #1)

Wed Jul 9, 2025, 09:43 PM

Jul 2025

From last summer:

https://www.democraticunderground.com/100219054043

Six months ago:

https://www.democraticunderground.com/1127179724

Six days ago:

https://www.democraticunderground.com/100220456258

boonecreek

(1,478 posts)

8. And thank you for the links.

Reply to highplainsdem (Reply #5)

Wed Jul 9, 2025, 09:55 PM

Jul 2025

I've learned a lot here.

enough

(13,726 posts)

2. Garbage in garbage out.

Reply to marble falls (Original post)

Wed Jul 9, 2025, 09:36 PM

Jul 2025

marble falls

(71,557 posts)

4. An unfiltered mishmash of garbage and fact with no editing protocol for truth or fact.

Reply to enough (Reply #2)

Wed Jul 9, 2025, 09:42 PM

Jul 2025

Last edited Wed Jul 9, 2025, 10:23 PM - Edit history (1)

highplainsdem

(61,341 posts)

6. Those newer AI models are also worse for the environment. See this:

Reply to marble falls (Original post)

Wed Jul 9, 2025, 09:48 PM

Jul 2025

https://www.democraticunderground.com/1127183363

B.See

(8,204 posts)

9. one word:

Reply to marble falls (Original post)

Wed Jul 9, 2025, 09:59 PM

Jul 2025

Skynet.

marble falls

(71,557 posts)

21. What the Ukraine war seems to have developed into a lab for.

Reply to B.See (Reply #9)

Thu Jul 10, 2025, 06:59 AM

Jul 2025

Lovie777

(22,609 posts)

10. Forbidden Planet 1956...............

Reply to marble falls (Original post)

Wed Jul 9, 2025, 10:00 PM

Jul 2025

marble falls

(71,557 posts)

20. Exactamundo. I had that thought myself.

Reply to Lovie777 (Reply #10)

Thu Jul 10, 2025, 06:57 AM

Jul 2025

Faux pas

(16,270 posts)

11. Anybody here

Reply to marble falls (Original post)

Wed Jul 9, 2025, 10:02 PM

Jul 2025

surprised? I'm not, I figured that the whole reason for using it was to eff with everyone's mind.

purple_haze

(401 posts)

12. This is why all these companies

Reply to marble falls (Original post)

Wed Jul 9, 2025, 10:12 PM

Jul 2025

are planning on building small nuclear reactors next to all of these data centers, and why China is building 50+ new nuclear reactors. And why Microsoft just purchased an old out of commission nuclear power plant.

The Madcap

(1,853 posts)

13. Companies that try to rely heavily on AI

Reply to marble falls (Original post)

Wed Jul 9, 2025, 10:15 PM

Jul 2025

Are going to see the accuracy of their data decline significantly. Those that don't will be cast aside due to the trendiness of it all. Nobody wins.

AI is today's crypto.

PurgedVoter

(2,701 posts)

14. I can probably give you the reason for the hallucinations.

Reply to marble falls (Original post)

Wed Jul 9, 2025, 10:18 PM

Jul 2025

These processes are running on lossy logic. To do the calculations they use systems based on graphic processors. Graphic processors are strange. They draw triangles easily and squares slow them down a lot. They deal with numbers between 0 and 1 while rounding in strange ways that are hard to explain. They do massively parallel calculations quickly. A power that is the basis of our new and magical computer age. It is also a flawed power based on dropping a lot of data and just moving on to the next calculation. All of this is low level and built into the chips. It allows them to do things that are new and amazing. That new and amazing does things with a potential cost, accuracy.

Graphics processors do amazing work, but it is lossy work. In other words, if the system doesn't estimate that you will see it, it doesn't draw it. It drops that data in order to speed up the system. When you are doing calculations and you don't have a grasp for meaning, and AI is going to be hard put to grasp meaning, then dropping data that seems to have no meaning means your calculations, as you go through millions of calculations, can accumulate errors that can produce artifacts of "Knowledge" that do not exist in your sources for knowledge. This can quickly compound into "Hallucinations."

While a few dots of glitch in a fast moving game has little effect on the amazing images produced, when images are made with less and less basic input, and lower levels of basic logic, images can degrade quick. When the AI is working with meaningful text, sadly the lack of real basis can allow small rounding errors to turn into insane creations without basis. AI can do great work but it has no internal understanding to rule out insane results.

If you ask for an image of a "Lady sitting with crossed legs," the odds are quite high that you will get legs that don't connect or legs that connect to one knee with an extra leg thrown in under that double knee. This shows that in the AI graphics system there is no real comprehension of structure or physics. It draws pictures and makes assumptions. You might get a functional and beautiful image, but one leg or three legs will almost be as common as two legs. Take out the crossed legs and you will get much better results, but when you ask a question, you probably don't have a clue what would be crossed legs for a text generating AI.

If you use this as a comparison for how text AI works, you will find your answer. AI can give you great answers that you need to double check just in case. AI because it is organized a bit differently than we are, can bring things out that you might have not seen. It can be very useful. It is also likely for it to fail dramatically for the same reasons that images of hands and faces can get glitched easily. AI does not exist in the same sort of environment that we do. Meaning for it is not the same as meaning for us.

There is another issue that could cause a lot of AI issues. As AI gets more common, AI will base more of it's decisions on what previous AI came up with. If it uses the same sort of logic, the flaws that made sense to a previous AI are likely to be taken as good data. Call if confirmation bias. Confirmation bias messes up human logic all the time. I expect it will end up as a very big issue with AI calculations.

RainCaster

(13,582 posts)

17. A good explanation

Reply to PurgedVoter (Reply #14)

Wed Jul 9, 2025, 11:17 PM

Jul 2025

Thanks for taking the time to explain this so well.

Disaffected

(6,345 posts)

18. Do incorrect arithmetical calculations count as hallucinations?

Reply to PurgedVoter (Reply #14)

Thu Jul 10, 2025, 12:01 AM

Jul 2025

Likely not but it is an interesting situation anyhow.

I recently asked ChatGPT a series of questions related to hydrogen/boron fusion and as part of one answer it calculated the amount of boron that would be needed to generate the world's current electricity consumption. The calculation was off by a factor of about 1,000.

I asked why the error and how could I trust anything else in the replies if it makes such simple errors. The reply was it was a result of how large language models work and, if you want precise calculations i.e. correct calculations, you have to specifically request it.

BTW, it also give wrong factual information, not related to calculations.

However, one thing I do find rather astonishing is how it will provide sophisticated computer programming code from a rather simple English description of what one wants the program to do and, it seems to be pretty dammed good at it. I have read that in some organizations, around 90% of the code produced is by AI and the programmer's task is mainly to go over and check the code for correctness. I'm not sure at all I would want to make a career of computer coding now as it looks like the demand for programmers might soon dramatically drop.

eppur_se_muova

(41,600 posts)

15. Grammar error -- "but" should be "so". nt

Reply to marble falls (Original post)

Wed Jul 9, 2025, 10:41 PM

Jul 2025

marble falls

(71,557 posts)

16. Thanks!

Reply to eppur_se_muova (Reply #15)

Wed Jul 9, 2025, 10:46 PM

Jul 2025

William Seger

(12,366 posts)

19. AI bots are COMPLETELY incapable of actual logic

Reply to marble falls (Original post)

Thu Jul 10, 2025, 12:29 AM

Jul 2025

The neural nets that are being used are great at pattern matching, which allows them to convincingly parrot human dialog, but actual logic requires the ability to evaluate premises for veracity and to evaluate alternative explanations for those facts to ensure that conclusions necessarily follow the premises, in the sense that if the premises are true then the conclusion cannot be false. AI bots are still completely incapable of doing either of those things. What AI bots can do is extremely useful, but I am of the opinion that we are a VERY long way from being able to do real logic.

marble falls

(71,557 posts)

22. It's logic absent volunteered intent. It doesn't meant to be right or wrong, it looks to bridging a gap with ...