[Haskell-cafe] Formal Verification & Modern AI Safety

jo at durchholz.org
Fri Jan 10 10:51:30 UTC 2025


On 09.01.25 20:14, Ben Franksen wrote:
> According to the IMO very insightful article "ChatGPT is bullshit"
> (https://link.springer.com/article/10.1007/s10676-024-09775-5)
> "hallucination" is a misleading term for the phenomenon. It can be much
> better understood as "bullshit" in the precise sense of uttering
> statements with complete disregard to factual truth. This is a
> fundamental design property of LLMs and the authors convincingly argue
> that attempts to mitigate the problem by supplying additional external
> "truth oracles" are unlikely to work.

That article meanders a bit between attempts at terminology and arguments 
that problem mitigation may not be achievable.

I think they have a point that LLM-based AIs are neither "hallucinating" 
nor "lying" when they emit nonsense, and that such terminology is 
misleading, as "hallucination" implies reasoning about the world and 
"lie" implies intent, and LLMs do neither.

I find their attempt at a better definition of "bullshit" makes some 
interesting points but misses others.
So it's an interesting read for people with an interest in terminology, 
but not for people who want a good definition of "bullshit" (I think that 
term is too loaded anyway, though that's obviously part of the appeal of 
the word).

I do agree that current attempts at mitigation have been unsatisfactory, 
but their arguments that future attempts are unlikely to work have 
various loopholes.
Still, I agree that any LLM-based AI with a working, reliable, 
fit-for-production consistency checker is several years away. At best, 
the current attempts will pave the way but still require a lot of work to 
iron out the kinks; at worst, it will indeed turn out that LLMs are 
useless for fact-based reasoning.

So I think the answer to Mostafa's original question is:
No, with current-day LLM technology, it will not become practically useful.
Yes, you can experiment with it anyway.
No, there is no guarantee that these experiments will help with a 
future system. An LLM-based AI that does reasoning will need to offer 
additional APIs to feed it the moral equivalent of a facts database, so 
experiments with today's LLMs will not give you insight into how to do 
that well (see the sketch after this list).
Yes, experiments might give insights that are unrelated to what you are 
trying to achieve, so experiment away if you just want to do it ;-)

HTH
Jo

