AI - Artificial intelligence

AI related news and articles.

founded 11 months ago

A GGUF port of DFlash speculative decoding: a standalone C++/CUDA stack on top of ggml that runs on a single 24 GB RTX 3090 and hosts the new Qwen3.6-27B.

~1.98x mean speedup over autoregressive decoding on Qwen3.6 across HumanEval / GSM8K / Math500, with zero retraining.

If you have CUDA 12+ and an NVIDIA GPU like an RTX 3090 / 4090 / 5090, all you need to do is:

```sh
# clone the repo, then build
cd lucebox-hub/dflash
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release
cmake --build build --target test_dflash -j

# fetch target (~16 GB)
hf download unsloth/Qwen3.6-27B-GGUF Qwen3.6-27B-Q4_K_M.gguf --local-dir models/

# matched 3.6 draft is gated: accept terms + set HF_TOKEN first
hf download z-lab/Qwen3.6-27B-DFlash --local-dir models/draft/

# run
DFLASH_TARGET=models/Qwen3.6-27B-Q4_K_M.gguf python3 scripts/run.py --prompt "def fibonacci(n):"
```
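Besides the CLI above, the engine can serve an OpenAI-compatible HTTP endpoint. A minimal client sketch using only the Python standard library; the port, the model name, and the assumption that the server is already running locally are all guesses for illustration, not the repo's documented defaults:

```python
# Sketch of a request to an OpenAI-compatible /v1/chat/completions endpoint.
# The base URL and model name below are assumptions, not documented values.
import json
import urllib.request

def build_request(base_url, model, prompt):
    """Assemble a standard OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("http://localhost:8080", "qwen3.6-27b", "def fibonacci(n):")
# urllib.request.urlopen(req) would send it once the server is up.
print(req.full_url)  # → http://localhost:8080/v1/chat/completions
```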

That's it. No Python runtime in the engine, no llama.cpp install, no vLLM, no SGLang.

Luce DFlash will:

  1. Load Qwen3.6-27B Q4_K_M target weights (~16 GB) plus the matched DFlash bf16 draft (~3.46 GB) and run DDTree tree-verify speculative decoding (block size 16, default budget 22, greedy verify).
  2. Compress the KV cache to TQ3_0 (3.5 bpv, ~9.7x vs F16) and roll a 4096-slot target_feat ring so 256K context fits in 24 GB. Q4_0 is the legacy path and tops out near 128K.
  3. Auto-bump the prefill ubatch from 16 to 192 for prompts past 2048 tokens (~913 tok/s prefill on 13K prompts).
  4. Apply sliding-window flash attention at decode (default 2048-token window, 100% speculative acceptance retained) so 60K context still decodes at 89.7 tok/s instead of 25.8 tok/s.
  5. Serve over an OpenAI-compatible HTTP endpoint or a local chat REPL.

Running on an RTX 3090 with a Qwen3.6-27B UD-Q4_K_XL (Unsloth Dynamic 2.0) target, 10 prompts per dataset, n_gen=256:

| Bench | AR tok/s | DFlash tok/s | AL | Speedup |
|---|---|---|---|---|
| HumanEval | 34.90 | 78.16 | 5.94 | 2.24x |
| Math500 | 35.13 | 69.77 | 5.15 | 1.99x |
| GSM8K | 34.89 | 59.65 | 4.43 | 1.71x |
| Mean | 34.97 | 69.19 | 5.17 | 1.98x |
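As a sanity check, the Mean row follows directly from the three benchmark rows; this is plain arithmetic over the values reported above:

```python
# Recompute the Mean row of the benchmark table from the per-bench rows.
rows = {                    # bench: (AR tok/s, DFlash tok/s, AL)
    "HumanEval": (34.90, 78.16, 5.94),
    "Math500":   (35.13, 69.77, 5.15),
    "GSM8K":     (34.89, 59.65, 4.43),
}
n = len(rows)
ar = sum(r[0] for r in rows.values()) / n   # mean autoregressive tok/s
df = sum(r[1] for r in rows.values()) / n   # mean DFlash tok/s
al = sum(r[2] for r in rows.values()) / n   # mean acceptance length
print(round(ar, 2), round(df, 2), round(al, 2), round(df / ar, 2))
# → 34.97 69.19 5.17 1.98, matching the Mean row
```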

deepseek v4 (api-docs.deepseek.com)

[...]

That marketing may have outstripped reality. Early reports from Mythos preview users, including AWS and Mozilla, indicate that while the model is very good and very fast at finding vulnerabilities, and requires less hands-on guidance from security engineers (a welcome time-saver for the human teams), it has yet to eclipse human security researchers.

"So far we've found no category or complexity of vulnerability that humans can find that this model can't," Mozilla CTO Bobby Holley said, after revealing that Mythos found 271 vulnerabilities in Firefox 150. Then he added: "We also haven't seen any bugs that couldn't have been found by an elite human researcher." In other words, it's like adding an automated security researcher to y


🤖 Honestly you can't make this up:

Human Robot #Zuckerberg is gonna put an #AI-based smart keylogger on his employees' computers.

Nothing says "we don't trust you and don't appreciate you" like a regular screenshot of your work screen.

No more NSFW! Welcome to the future baby!

Guess them employees are gonna get a taste of their own medicine for once.

Seems like everything is going according to plan.

Use the employees data to train MaiRK, so Meta can finally become a synonym for MaiRK. https://mastodon.social/@madeindex/116402482806274908

Eventually we will only see his superintelligence controlling everything. https://mastodon.social/@madeindex/115871208581092120

Narcissism 2026 version 😋

If you are thinking: Wouldn't this make people quit? Meta is banking on it: https://www.reuters.com/world/meta-targets-may-20-first-wave-layoffs-additional-cuts-later-2026-2026-04-17/


Is it time to pump the brakes on generative AI? Michael Walker speaks to Nate Soares, author of the book ‘If Anyone Builds It, Everyone Dies: The Case Against Superintelligent AI.’


Our latest model, Claude Opus 4.7, is now generally available.

Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work—the kind that previously needed close supervision—to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back.

The model also has substantially better vision: it can see images in greater resolution. It’s more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs. And—although it is less broadly capable than our most powerful model, Claude Mythos Preview—it shows better results than Opus 4.6 across a range of benchmarks:


I think one of the biggest mistakes we have made as an industry is conflating the words "AI" and "LLMs." The irony is right there on the surface. Naming is one of the hardest things to do in software, and we've done it poorly for the primary tool of software.


Most Americans using AI tools for health purposes say they want immediate answers. In some cases, it helps them evaluate what kind of medical attention they need.

“It’ll let me know if something’s serious or not,” Davis said of ChatGPT, which she typically consults before scheduling medical appointments.

The Gallup survey found about 7 in 10 U.S. adults who have used AI for health research in the past 30 days say they wanted quick answers or additional information, or were simply curious. Majorities used it for research before seeing a doctor or after an appointment.


AI models from Google, OpenAI, and Anthropic lost money betting on soccer matches over a Premier League season, in a new study suggesting even the most advanced systems struggle to analyze the real world over long periods.
