Not to discount the usefulness or complexity of curl in any way, but it’s not one of the larger codebases out there. It’s also pretty darned good. Firefox seems to have a very positive experience with Mythos, but they also had their own internal test harnesses from prior work, ready to utilize LLM analysis at scale. It was far more intensive than having a third party run something on their behalf and produce a report.
As far as I understand it, it's not all hype. It's a little bit like having a really competent security researcher go deep through your complete codebase, just really fast and with improved recall.
So no black magic, just stuff regular security reviews would find. Firefox is just a huge codebase and once a bug got past review it might stay there forever.
So sooner or later this will be abused if it's released publicly. Doing it this way is a little bit like responsible disclosure, and it will make the initial wave hurt way less. And obviously it doesn't hurt marketing.
Anybody working with software knows marketing people promise the world and understand nothing. Pretty sure they just heard "black magic" and ran with it.
It's a little bit like having a really competent security researcher go deep through your complete codebase, just really fast and with improved recall.
I doubt that. IMO it's more of a force multiplier for security researchers at this stage (and perhaps always will be for LLMs without an architecture leap). Otherwise I generally agree. It's perhaps responsible to take this approach, but it's mostly marketing. Still, let's not kid ourselves that it isn't happening at scale already. Plenty of open-weights models can also force multiply a competent security researcher, either black or white hat. Mythos isn't a quantum leap or anything, just 4.7.
Anybody working with software knows marketing people promise the world and understand nothing. Pretty sure they just heard “black magic” and ran with it.
Heh, yup.
Or maybe they didn't. Check this cross-post comment thread.

I replied there, sourcing the original blog post.
What they said: “Maybe slightly better, not significantly better than existing tools, at least in the context of this single project”. What The Register makes of that: “greatest marketing stunt ever”. They did reference marketing in one sentence, but it was nowhere near as extreme.
So… can we say “Myth busted”?

Really though… how hard might it be to specialize a model in systems analysis, auditing, documentation, testing, … and also ensure it is biased toward exploitative behaviors? I imagine training a model on all the CVEs and other such texts, along with systems documentation, plus some kind of training for tool use (the CLI and CLI tools), and then tuning it by making the model actually reproduce each CVE result within a virtual environment. Then see what happens when you put it in the ring with modern software.
I might be playing devil's advocate. I really don’t believe Anthropic has anything special. Anthropic did get me thinking, though: should this really be so difficult? It seems plausible, to me at least, if you really wanted to do this. It would make a shit general-use LLM, though.
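To make the "reproduce each CVE result within a virtual environment" part a bit more concrete, here is a rough sketch of what the scoring step might look like. Everything in it is hypothetical: `ReproTask`, `reproduction_reward`, and the toy target are names made up for illustration, not anything Anthropic or anyone else has described. It also just uses a plain subprocess, where a real setup would presumably need a properly isolated VM or container.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ReproTask:
    """One hypothetical training item: an advisory plus a sandboxed target."""
    advisory_text: str        # e.g. the CVE description shown to the model
    target_cmd: list[str]     # command that runs the (toy) vulnerable program
    expected_returncode: int  # exit status that counts as "reproduced"


def reproduction_reward(task: ReproTask, candidate_input: bytes,
                        timeout: float = 5.0) -> float:
    """Return 1.0 if feeding candidate_input to the target reproduces the failure."""
    try:
        proc = subprocess.run(
            task.target_cmd,
            input=candidate_input,
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A hang is not the failure we are looking for, so no reward.
        return 0.0
    return 1.0 if proc.returncode == task.expected_returncode else 0.0


# Toy stand-in for a vulnerable program (assumption for illustration only):
# it exits with status 3 when given more than 8 bytes on stdin.
TOY_TARGET = [
    sys.executable, "-c",
    "import sys; data = sys.stdin.buffer.read(); "
    "sys.exit(3 if len(data) > 8 else 0)",
]

toy_task = ReproTask(
    advisory_text="Toy advisory: input longer than 8 bytes is mishandled.",
    target_cmd=TOY_TARGET,
    expected_returncode=3,
)
```

The idea would be that the model reads `advisory_text`, proposes an input, and only gets rewarded when the target actually fails the expected way, so the signal comes from the environment rather than from text that merely looks like an exploit write-up.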