AI - Artificial intelligence

Our latest model, Claude Opus 4.7, is now generally available.

Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work—the kind that previously needed close supervision—to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back.

The model also has substantially better vision: it can see images in greater resolution. It’s more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs. And—although it is less broadly capable than our most powerful model, Claude Mythos Preview—it shows better results than Opus 4.6 across a range of benchmarks:


AI vs LLMs: The Misnomer That's Costing Us (varenya.hashnode.dev)

submitted 3 weeks ago by codeinabox@programming.dev to c/Aii@programming.dev


I think one of the biggest mistakes we have made as an industry is conflating the words "AI" and "LLMs." The irony is right there on the surface. Naming is one of the hardest things to do in software, and we've done it poorly for the primary tool of software.


I don't want a screenshot of your Claude conversation (daverupert.com)

submitted 3 weeks ago by codeinabox@programming.dev to c/Aii@programming.dev


Naomi Klein, Bernie Sanders, and Ro Khanna Roundtable Explores Future of AI | Common Dreams (www.commondreams.org)

submitted 3 weeks ago by cm0002@lemy.lol to c/Aii@programming.dev


In the Wake of Anthropic’s Mythos, OpenAI Has a New Cybersecurity Model—and Strategy (www.wired.com)

submitted 3 weeks ago by cm0002@lemy.lol to c/Aii@programming.dev


Shoe company Allbirds pivots to AI compute in sign of a totally normal and healthy economy (www.engadget.com)

submitted 3 weeks ago by cm0002@lemdro.id to c/Aii@programming.dev


Microsoft exec suggests AI agents will need to buy software licenses, just like employees (www.businessinsider.com)

submitted 3 weeks ago by codeinabox@programming.dev to c/Aii@programming.dev


cross-posted from: https://lemmy.bestiver.se/post/1047611


Why many Americans are turning to AI for health advice, according to recent polls (apnews.com)

submitted 3 weeks ago by cm0002@lemdro.id to c/Aii@programming.dev


Most Americans using AI tools for health purposes say they want immediate answers. In some cases, it helps them evaluate what kind of medical attention they need.

“It’ll let me know if something’s serious or not,” Davis said of ChatGPT, which she typically consults before scheduling medical appointments.

The Gallup survey found about 7 in 10 U.S. adults who have used AI for health research in the past 30 days say they wanted quick answers, additional information or were simply curious. Majorities used it for research before seeing a doctor or after an appointment.


AI models are terrible at betting on soccer—especially xAI Grok (arstechnica.com)

submitted 3 weeks ago by codeinabox@programming.dev to c/Aii@programming.dev


AI models from Google, OpenAI, and Anthropic lost money betting on soccer matches over a Premier League season, in a new study suggesting even the most advanced systems struggle to analyze the real world over long periods.


Claude Mythos Is Everyone’s Problem (www.theatlantic.com)

submitted 3 weeks ago by cm0002@infosec.pub to c/Aii@programming.dev


TurboQuant: Reducing LLM Memory Usage With Vector Quantization (hackaday.com)

submitted 3 weeks ago by cm0002@infosec.pub to c/Aii@programming.dev


First AI Model From Zuckerberg's Wildly Expensive Superintelligence Lab Flops Compared to Virtually All Rivals (futurism.com)

submitted 3 weeks ago by cm0002@infosec.pub to c/Aii@programming.dev


"the company admitted it likely won’t be able to keep up with competing models."

"As such, the announcement is a bit of an enigma: if it can’t keep up with the competition, why release it at all? There’s a good chance Meta is just trying to get its foot in the door — or a “seat at the big kid’s table,” as Wired put it. The company has struggled to stay relevant in a rapidly changing landscape"

"Meta’s preceding Llama open source models largely failed to catch on, with a major controversy last year finding that Meta may have faked benchmark results to make its Llama 4 model seem more capable than it actually was."


VS Code 1.115 - introduction of the new VS Code Agents companion app (code.visualstudio.com)

submitted 4 weeks ago by cm0002@infosec.pub to c/Aii@programming.dev


Z.ai unveils GLM-5.1, enabling AI coding agents to run autonomously for hours (www.computerworld.com)

submitted 4 weeks ago by cm0002@infosec.pub to c/Aii@programming.dev


Chinese AI company Z.ai has launched GLM-5.1, an open-source coding model it says is built for agentic software engineering. The release comes as AI vendors move beyond autocomplete-style coding tools toward systems that can handle software tasks over longer periods with less human input.

Z.ai said GLM-5.1 can sustain performance over hundreds of iterations, an ability it argues sets it apart from models that lose effectiveness in longer sessions.

As one example, the company said GLM-5.1 improved a vector database optimization task over more than 600 iterations and 6,000 tool calls, reaching 21,500 queries per second, about six times the best result achieved in a single 50-turn session.

In a research note, Z.ai said GLM-5.1 outperformed its predecessor, GLM-5, on several software engineering benchmarks and showed particular strength in repo generation, terminal-based problem solving, and repeated code optimization. The company said the model scored 58.4 on SWE-Bench Pro, compared with 55.1 for GLM-5, and above the scores it listed for OpenAI’s GPT-5.4, Anthropic’s Opus 4.6, and Google’s Gemini 3.1 Pro on that benchmark.


Call your existing automation ‘zero-token architecture’ to become an instant agentic AI wiz (www.theregister.com)

submitted 4 weeks ago by codeinabox@programming.dev to c/Aii@programming.dev


As businesses drink the agentic AI Kool-Aid and go looking for productivity enhancements, IT professionals can deliver by rebranding their existing automations as “zero-token architecture,” according to Kelsey Hightower, a former Google distinguished engineer and a notable early promoter of Kubernetes.


MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU (arxiv.org)

submitted 4 weeks ago by cm0002@infosec.pub to c/Aii@programming.dev


Dafny as Verification-Aware Intermediate Language for Code Generation (arxiv.org)

submitted 4 weeks ago by cm0002@infosec.pub to c/Aii@programming.dev


Dafny is a good intermediate step for LLM-generated code.

This is the abstract of the paper:

Using large language models (LLMs) to generate source code from natural language prompts is a popular and promising idea with a wide range of applications. One of its limitations is that the generated code can be faulty at times, often in a subtle way, despite being presented to the user as correct. In this paper, we explore ways in which formal methods can assist with increasing the quality of code generated by an LLM. Instead of emitting code in a target language directly, we propose that the user guides the LLM to first generate an opaque intermediate representation, in the verification-aware language Dafny, that can be automatically validated for correctness against agreed on specifications. The correct Dafny program is then compiled to the target language and returned to the user. All user-system interactions throughout the procedure occur via natural language; Dafny code is never exposed. We describe our current prototype and report on its performance on the HumanEval Python code generation benchmarks.


Axios: Anthropic limits Mythos Preview access over advanced hacking capabilities (www.axios.com)

submitted 4 weeks ago by cm0002@lemy.lol to c/Aii@programming.dev


1 In 5 Boys Know Someone Their Age Who's In A Relationship With An AI Chatbot (www.huffingtonpost.co.uk)

submitted 1 month ago by cm0002@lemy.lol to c/Aii@programming.dev
