this post was submitted on 25 Jun 2026

Programming

The failure analysis in First Proof’s Appendix A describes something qualitatively different from the hallucination patterns studied in factual QA: models producing proofs that are fluently wrong, where the wrongness is concentrated in a small number of unjustified load-bearing claims rather than spread across obviously false individual facts. I have tried in this paper to describe that pattern precisely enough for it to be studied systematically. The taxonomy has four modes (F1: citation fabrication, F2: premise smuggling, F3: silent reformulation, F4: local-to-global gap), and my empirical audit of eight Flash proofs finds that F2 accounts for the failure in every case, even though it is the mode least targeted by existing mitigation proposals.

The obvious question this raises is whether it is possible to build a system that doesn’t produce these failures in the first place, as opposed to detecting them after the proof has been written. A prevention-oriented system would need to enforce, during generation, that every load-bearing claim in the proof is either derived from stated premises, grounded in a retrieved and verified source, or explicitly flagged as unverified before the output is returned. The failure modes described here are, I think, a reasonable specification of what such a system would need to prevent.
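To make the three-way requirement above concrete, here is a minimal Python sketch of the bookkeeping such a gate might do. Everything in it is an illustrative assumption rather than anything from the paper: the `Claim` record, its field names, and the `classify` and `flag_unverified` helpers are hypothetical. It only checks that each load-bearing claim points at stated premises or at a source marked verified. It does not check that the derivation itself is logically valid, which is the hard part.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Support(Enum):
    DERIVED = auto()     # cites only premises that were actually stated
    GROUNDED = auto()    # backed by a retrieved source marked as verified
    UNVERIFIED = auto()  # neither of the above; must be flagged


@dataclass
class Claim:
    # Hypothetical record for one step of a generated proof.
    text: str
    load_bearing: bool
    derived_from: tuple[str, ...] = ()  # ids of the premises the step relies on
    source: Optional[str] = None        # retrieved reference, if any
    source_verified: bool = False       # set by some external verification step


def classify(claim: Claim, premises: set[str]) -> Support:
    """Assign a support status from the claim's declared justification."""
    # A step that relies on a premise nobody stated is not DERIVED.
    if claim.derived_from and all(p in premises for p in claim.derived_from):
        return Support.DERIVED
    if claim.source is not None and claim.source_verified:
        return Support.GROUNDED
    return Support.UNVERIFIED


def flag_unverified(claims: list[Claim], premises: set[str]) -> list[str]:
    """Return the text of every load-bearing claim that must be flagged."""
    return [
        c.text
        for c in claims
        if c.load_bearing and classify(c, premises) is Support.UNVERIFIED
    ]


if __name__ == "__main__":
    stated = {"P1", "P2"}
    proof = [
        Claim("Lemma 1 follows from P1 and P2", True, ("P1", "P2")),
        Claim("By a known theorem, X holds", True, source="hypothetical-ref"),
        Claim("Hence f is bounded", True, ("P1", "P3")),  # P3 was never stated
        Claim("Remark on notation", False),
    ]
    for text in flag_unverified(proof, stated):
        print("UNVERIFIED:", text)
```

In this toy run, the step that leans on the unstated premise `P3` is flagged, which is the bookkeeping shadow of premise smuggling (F2). The unverified citation is flagged as well. A real system would also have to extract the claims and their dependencies reliably during generation, which this sketch simply assumes has been done.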
