Failure Modes of Large Language Models on Research-Level Mathematics: A Taxonomy and an Empirical Characterisation (arxiv.org)

submitted 1 week ago by supersquirrel@sopuli.xyz to c/programming@programming.dev


The failure analysis in First Proof’s Appendix A describes something qualitatively different from the hallucination patterns studied in factual QA: models producing proofs that are fluently wrong, where the wrongness is concentrated in a small number of unjustified load-bearing claims rather than spread across obviously false individual facts. In this paper I have tried to describe that pattern precisely enough for it to be studied systematically. The taxonomy has four modes (F1: citation fabrication, F2: premise smuggling, F3: silent reformulation, F4: local-to-global gap), and my empirical audit of eight Flash proofs finds that F2 accounts for the failure in every case, even though it is the mode least targeted by existing mitigation proposals.

The obvious question this raises is whether it is possible to build a system that doesn’t produce these failures in the first place, as opposed to detecting them after the proof has been written. A prevention-oriented system would need to enforce, during generation, that every load-bearing claim in the proof is either derived from stated premises, grounded in a retrieved and verified source, or explicitly flagged as unverified before the output is returned. The failure modes described here are, I think, a reasonable specification of what such a system would need to prevent.
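
To make the enforcement rule concrete, here is a minimal sketch in Python, not a system described in the paper. It assumes the generator emits a proof as a sequence of discrete claims and already knows which ones are load-bearing; both are strong assumptions. Every name in it (`Support`, `Claim`, `ProofDraft`, `UnsupportedClaimError`) is hypothetical. The sketch only checks that a justification label is present at the moment a claim is added. It does not check that a derivation is valid or that a cited source exists, which is where the hard work would be.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Support(Enum):
    """The three acceptable statuses for a load-bearing claim (hypothetical labels)."""
    DERIVED = "derived from stated premises"
    SOURCED = "grounded in a retrieved and verified source"
    UNVERIFIED = "explicitly flagged as unverified"


@dataclass
class Claim:
    text: str
    load_bearing: bool
    support: Optional[Support] = None  # None means no justification was recorded


class UnsupportedClaimError(ValueError):
    """Raised when a load-bearing claim arrives with no support label."""


class ProofDraft:
    """Accumulates claims and rejects unsupported load-bearing ones as they arrive."""

    def __init__(self):
        self.claims = []

    def add(self, claim):
        # Enforced at insertion time, i.e. during generation rather than after it.
        if claim.load_bearing and claim.support is None:
            raise UnsupportedClaimError(claim.text)
        self.claims.append(claim)

    def unverified(self):
        # Claims that must be surfaced to the reader before the output is returned.
        return [c for c in self.claims if c.support is Support.UNVERIFIED]


draft = ProofDraft()
draft.add(Claim("Let G be a finite group.", load_bearing=False))
draft.add(Claim("The order of H divides the order of G.", True, Support.SOURCED))
draft.add(Claim("The bound extends to the non-compact case.", True, Support.UNVERIFIED))

try:
    draft.add(Claim("Hence the map is injective.", load_bearing=True))
except UnsupportedClaimError as err:
    print("rejected:", err)

print([c.text for c in draft.unverified()])
```

In this framing, premise smuggling (F2) corresponds to a load-bearing claim entering the draft with no label at all, which is the case the `add` check rejects. Deciding what counts as load-bearing is left open here.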
