AI - Artificial intelligence


AI related news and articles.


Mistral, the Paris-based AI lab, released two new speech-to-text models: Voxtral Mini Transcribe V2 and Voxtral Realtime. The former is built to transcribe audio files in large batches and the latter for near-real-time transcription, within 200 milliseconds; both can translate between 13 languages. Voxtral Realtime is freely available under an open source license.

At 4 billion parameters, the models are small enough to run locally on a phone or laptop—a first in the speech-to-text field, Mistral claims—meaning that private conversations needn’t be dispatched to the cloud. According to Mistral, the new models are both cheaper to run and less error-prone than competing alternatives.


Claude Code made a C compiler. And it's completely useless


I'm willing to be impressed by AI products, but Anthropic's AI‑built C compiler leaves me a bit cold. It's little more than a clever demo. It is not the moment when software engineering as we know it flips over and dies. Not even close.

Anthropic proudly claimed its team of 16 Claude Opus 4.6 agents had written a Rust-based C compiler from scratch without any access to the internet. Really? That's meant to impress me? Sure, as Anthropic claims, the AI-created C compiler can compile this, that, and the other thing. Yes, even Doom. But so what?


Just finished reading the report on Qwen-Image-2.0 that dropped the other day. This looks like the efficiency breakthrough we've been waiting for.

The "Headline" Stats:

  • Model Size: 7B parameters.
  • Previous Gen: The old Qwen-Image-2512 was a heavy 20B model.
  • Architecture: Unified "Omni" model (handles both generation and editing in the same weights).
  • Resolution: Native 2K (2048x2048).

The 20B to 7B Optimization: This is the most important part for us. The previous 20B model was a pain to run locally without 24GB VRAM. Crushing that performance down to a 7B model means this should theoretically run on:

  • 12GB Cards (3060/4070): Comfortably at Q8. FP16 weights alone are roughly 14 GB (7B parameters × 2 bytes), so FP16 would need offloading.
  • 8GB Cards: Likely possible with aggressive quantization (Q4/Q5) once the community gets hold of it.
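
For anyone who wants to sanity-check the VRAM numbers above, here is a rough back-of-the-envelope sketch. It counts the weights only and assumes flat bits-per-parameter figures; real quantized formats add per-block scale overhead, and the text encoder, VAE, and activations need memory on top of this. The `weight_gib` helper and the bit widths are illustrative assumptions, not anything from the Qwen report.

```python
# Rough weight-only memory estimate. Bit widths are nominal (assumption):
# real Q4/Q5/Q8 formats store extra scale data per block.
BITS_PER_PARAM = {"FP16": 16, "Q8": 8, "Q5": 5, "Q4": 4}


def weight_gib(params: float, bits: int) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return params * bits / 8 / 2**30


for name, bits in BITS_PER_PARAM.items():
    print(f"7B @ {name}: {weight_gib(7e9, bits):.1f} GiB")
print(f"20B @ FP16: {weight_gib(20e9, 16):.1f} GiB")
```

By this estimate the 7B weights come to about 13.0 GiB at FP16 and about 6.5 GiB at Q8, so a 12GB card looks realistic at Q8 but tight or impossible at FP16 without offloading.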

Beating "Nano Banana" (Gemini 2.5 Flash Image): The technical report explicitly calls out the model's performance on blind leaderboards (Elo score). They are claiming Qwen-Image-2.0 achieves a higher Elo rating than Gemini 2.5 Flash Image (aka Nano Banana) in blind human preference testing.

  • Why this matters: Nano Banana is currently regarded as the SOTA for instruction following and complex prompt adherence. If a 7B local model is actually beating it in Elo, that is insane efficiency.
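
To put an Elo gap in perspective, here is a minimal sketch of the standard Elo expected-score formula, which turns a rating difference into the expected share of head-to-head preference votes. The ratings below are made-up placeholders (the post gives no actual numbers), and leaderboards may fit ratings with a related but different method, so treat this as an illustration only.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings, purely for illustration.
for gap in (0, 10, 50, 100):
    share = elo_expected_score(1200 + gap, 1200)
    print(f"+{gap} Elo -> expected preference share {share:.3f}")
```

The takeaway is that a small Elo lead means winning only slightly more than half of blind comparisons, so "higher Elo" on its own does not say how large the gap is.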

The "Catch": Weights are not open yet. It is currently available via their API and Demo (Qwen Chat). However, Qwen has an excellent track record (Apache 2.0 releases for almost everything eventually). Given that they released the 20B weights previously, it is highly likely we see the 7B weights in a matter of weeks.

TL;DR: They optimized the 20B heavy-hitter down to a consumer-viable 7B, it claims to beat Google's best efficiency model in Elo, and now we wait for the HF upload to see if the quantization holds up.

Writeup author @mudkip@lemdro.id


Then, on February 5th, two major AI labs released new models on the same day: GPT-5.3 Codex from OpenAI, and Opus 4.6 from Anthropic (the makers of Claude, one of the main competitors to ChatGPT). And something clicked. Not like a light switch... more like the moment you realize the water has been rising around you and is now at your chest.

I am no longer needed for the actual technical work of my job. I describe what I want built, in plain English, and it just... appears. Not a rough draft I need to fix. The finished thing. I tell the AI what I want, walk away from my computer for four hours, and come back to find the work done. Done well, done better than I would have done it myself, with no corrections needed. A couple of months ago, I was going back and forth with the AI, guiding it, making edits. Now I just describe the outcome and leave.


Speech to text model inference in pure C.

This is a C implementation of the inference pipeline for Mistral AI's Voxtral Realtime 4B model. It has zero external dependencies beyond the C standard library. The MPS inference is decently fast, while the BLAS acceleration is usable but slow (it continuously converts the bf16 weights to fp32).

Audio processing uses a chunked encoder with overlapping windows, bounding memory usage regardless of input length. Audio can also be piped from stdin (--stdin), or captured live from the microphone (--from-mic, macOS), making it easy to transcode and transcribe any format via ffmpeg. A streaming C API (vox_stream_t) lets you feed audio incrementally and receive token strings as they become available.
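
The idea of chunking with overlapping windows to keep memory bounded can be sketched as follows. This is not the project's code and does not use its `vox_stream_t` API; the `overlapping_windows` function, the window size, and the overlap size are all hypothetical, chosen only to show why the buffer never grows beyond one window no matter how long the input is.

```python
from typing import Iterable, Iterator, List


def overlapping_windows(
    stream: Iterable[float], window: int, overlap: int
) -> Iterator[List[float]]:
    """Yield fixed-size windows that share `overlap` samples with the previous one.

    Only one window's worth of samples is buffered at any time.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    buf: List[float] = []
    emitted = False
    for sample in stream:
        buf.append(sample)
        if len(buf) == window:
            yield list(buf)
            emitted = True
            buf = buf[window - overlap:]  # keep only the overlapping tail
    # Emit a final partial window if it holds samples not yet seen downstream.
    if len(buf) > overlap or (buf and not emitted):
        yield buf


# Hypothetical sizes: 4-sample windows with a 2-sample overlap.
for chunk in overlapping_windows(iter(range(11)), window=4, overlap=2):
    print(chunk)
```

Because the function consumes an iterator, the same loop would work whether the samples come from a file, a pipe, or a live capture source.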

Similar projects: Whisper.cpp


You're using AI to be more productive. So why are you more exhausted than ever? The paradox every engineer needs to confront.

Feeding the Data Deity (reflectingwide.blogspot.com)