this post was submitted on 27 May 2026
Fuck AI
Many academics around me have a paid LLM plan of some sort. Most are on a $100 plan, some are on a $200 one, and all of them are getting reimbursed for it.
Most of them use it to optimize code, generate visualizations, or formalize pen-and-paper proofs.
I hated it, and don't use it much myself. But it seems too useful for these people, and it is hard to stop them. As an example, formalizing a pen-and-paper proof can take an expert weeks, if not months, of work, whereas it only takes Codex a week.
But I do feel this success is tied to the nature and value of academia, and might not transfer to other fields or industrial projects:
I hate to hit you with such a tangentially related query to your main point, but what is "formalizing a proof", and why is there such a discrepancy between the time it might take an expert (weeks if not months) vs an LLM?
(It probably goes without saying, but my college career was spent in the Humanities, where there was not much emphasis on proofs, formal or informal, so I'm curious how the other half lives.)
There are proof assistants (https://en.wikipedia.org/wiki/Proof_assistant) that let you encode a mathematical proof as code and verify its correctness for you.
Writing a completely formal proof is very painstaking, because we need to flesh out a huge number of details (mostly trivial for experts) before the computer will accept it, and we also need to know how to work with the proof assistant.
Human proofs often skip these details to stay readable, which also makes them more prone to mistakes. A formalized proof in a proof assistant, on the other hand, can very rarely be wrong (unless there is an unlikely bug in the assistant's kernel), but it is mostly unreadable (unless the proof is incredibly elegant).
So in general, translating a good human proof into a computer proof takes expert labor rather than huge conceptual innovation, yet it usually comes with the steep learning curve of understanding the ins and outs of a proof assistant, which can take years of experience.
LLMs used to be pretty bad at this, because even filling in trivial details could quickly derail them. Recently a few flagship coding models have finally become able to do it, albeit while consuming a large number of thinking tokens.
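To make the "trivial details" point concrete, here is a minimal sketch, assuming Lean 4 with only the core library. The theorem names are made up for illustration. On paper, both facts are the same kind of obvious, but the proof assistant treats them very differently:

```lean
-- Lean 4, core library only (no Mathlib). Names are hypothetical.

-- Addition on Nat is defined by recursion on the second argument,
-- so `n + 0` reduces to `n` by definition and `rfl` is enough.
theorem my_add_zero (n : Nat) : n + 0 = n := rfl

-- `0 + n` does not reduce by definition, so the checker wants an
-- explicit induction for a fact a human would never bother to justify.
theorem my_zero_add (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ k ih => rw [Nat.add_succ, ih]
```

A real formalization is mostly thousands of steps like the second one, which is where the weeks of expert labor go.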
Fascinating. My godfather is a mathematician at a local university; I'll ask him whether this is something he runs into. I've heard him grumble about formatting his work in the past, but usually in the context of using LaTeX, which I gather is something else entirely.
Yeah, LaTeX is a bit different. Think of describing a process in MS Word vs. writing a program that performs that process on a computer: LaTeX is more like a description aimed at human consumption, whereas a formal proof is more like a program that a computer can rigorously execute.
Proof assistants have only attracted the attention of mathematicians very recently, thanks to the organization surrounding Mathlib in Lean and the promotion by Terence Tao.
They also ride the AI train quite a bit: since AI has a tendency to be confidently wrong, having a computer check its proofs can be very useful.
And how badly does it hallucinate while doing so?
That is the thing about formal proofs: if the definitions are correct, which are usually relatively short and should be written by a human, there is almost no chance of the proof being wrong. The only exception is when the LLM exploits a bug in the proof assistant's kernel, and these kernels are usually designed to be exceptionally small, which makes bugs unlikely.
That being said, Opus 4.6 found a bug that eventually led to a proof of False (Opus is unable to produce the proof of False itself, hence unlikely to exploit it): https://github.com/rocq-prover/rocq/issues/21682
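Why the definitions still need a human eye can be shown with a small sketch, again assuming Lean 4 core only. `IsMaxWeak` and the theorem name are hypothetical. The checker only guarantees that the proof matches the statement, not that the statement says what you meant:

```lean
-- Lean 4, core library only. All names here are hypothetical.

-- Intended meaning: "m is the maximum of a and b".
-- This spec is too weak: it forgets to require that m equals a or b.
def IsMaxWeak (a b m : Nat) : Prop := a ≤ m ∧ b ≤ m

-- The kernel accepts this without complaint, even though a + b is
-- not the maximum in general. The proof is correct; the definition is not.
theorem sum_is_max_weak (a b : Nat) : IsMaxWeak a b (a + b) :=
  And.intro (Nat.le_add_right a b) (Nat.le_add_left b a)
```

So the short part (definitions and theorem statements) is where the reading effort should go.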
However, like I said, the code quality of the LLM is usually not on par with an expert's, and it has a tendency to produce unnecessary lemmas and complications that need to be cleaned up by a human.
Also, we have a very detailed pen-and-paper proof, which is designed to be easily translatable to a proof assistant. We have also set up all the lemmas and theorems needed to reach the end goal. All of this was done by humans; without it, I don't believe any LLM could make much progress on this project.
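As a rough picture of that kind of human-written setup, here is a toy sketch in Lean 4 (core only). The statements and names are invented and far simpler than a real project. The humans fix the statements and how they connect, and the gaps marked `sorry` are what gets handed off:

```lean
-- Lean 4, core library only. A toy skeleton with hypothetical names.

-- Statement fixed by a human; the proof is left as a hole to be filled.
-- Lean accepts the file but warns that the declaration uses `sorry`.
theorem helper_step (n : Nat) : n ≤ n * n := by
  sorry

-- The end goal already shows how the lemma is meant to be used,
-- so whoever fills the hole cannot change what is being proved.
theorem main_goal (n : Nat) : n ≤ n * n + 1 :=
  Nat.le_succ_of_le (helper_step n)
```

Once every `sorry` is gone and the file still compiles, the end goal is checked against the statements the humans wrote.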