AI - Artificial intelligence


AI related news and articles.

This paper is honestly one of the most creative takes on LLM reasoning I’ve seen in a while. The team at ByteDance basically argues that we should view Long Chain-of-Thought as a macromolecular structure with internal forces that hold the logic together. They found that when we try to teach a model to reason by simply distilling keywords from a teacher, it fails because it’s like trying to build a protein by looking at a photo of it rather than understanding the atomic bonds.

Their Molecular Structure of Thought hypothesis breaks reasoning down into three specific bond types that behave similarly to their chemical counterparts. Deep reasoning acts like covalent bonds, forming the rigid primary backbone where each logical step must strictly justify the next. Self-reflection functions like hydrogen bonds, creating folding patterns where the model looks back 100 steps to audit an earlier premise, which keeps it from hallucinating. Finally, you have self-exploration acting like van der Waals forces: low-commitment bridges that let the model probe different ideas without getting stuck in a rigid path too early.

They found that most synthetic reasoning data is actually trash because it lacks this distribution of bond types. They showed that models don't actually learn the keywords themselves, but the characteristic reasoning behaviors those keywords represent. In one experiment, they replaced keywords like "wait" with arbitrary synonyms or removed them entirely, and the models still learned the reasoning structure just fine. It turns out that building these stable thought molecules is what creates the basis for Long CoT, as opposed to just mimicking a specific vibe or prompt format.
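To make the keyword experiment concrete, here is a rough sketch of what swapping or stripping marker words in a reasoning trace could look like. This is not the paper's code: the function name `rewrite_keywords`, the mapping format, and the sample trace are all made up for illustration.

```python
import re

def rewrite_keywords(trace: str, mapping: dict[str, str]) -> str:
    """Replace whole-word keywords (case-insensitive) with stand-ins.

    Hypothetical helper, not from the paper. Keys in `mapping` must be
    lowercase; mapping a keyword to "" removes it from the trace.
    """
    # One alternation over all keywords, matched only at word boundaries.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in mapping) + r")\b",
        re.IGNORECASE,
    )
    replaced = pattern.sub(lambda m: mapping[m.group(1).lower()], trace)
    # Collapse the double spaces that removing a word can leave behind.
    return re.sub(r"\s{2,}", " ", replaced).strip()

# Made-up example trace: the reflection step survives, only its marker changes.
trace = "So x is 4. Wait, I should recheck the first step."
swapped = rewrite_keywords(trace, {"wait": "hmm"})
removed = rewrite_keywords(trace, {"wait": ""})
```

The point of the sketch is that the surface token changes while the step that revisits an earlier premise stays in place, which is the part the summary says the model actually learns.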

They built MOLE-SYN to address the problem. Instead of just copying teacher outputs, it uses a distribution transfer graph to walk through four behavioral states and synthesize traces that have the correct bond profile from the start. Their approach makes reinforcement learning much more stable because the model starts with a balanced skeleton instead of a bunch of fragmented logic. The paper challenges the whole "more data is better" mindset, arguing that it's the geometry of the information flow that really matters.
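For intuition only, a walk over a transfer graph can be sketched as a small Markov chain over behavior states. Everything specific here is an assumption: the summary names three behaviors and mentions four states, so `normal` is a placeholder for the fourth, and the transition probabilities are invented rather than taken from the paper.

```python
import random

# Hypothetical state names; "normal" is an assumed stand-in for the
# fourth state, which the summary does not name.
STATES = ["normal", "deep_reasoning", "self_reflection", "self_exploration"]

# Assumed transition probabilities, illustrative only. Each row sums to 1.
TRANSITIONS = {
    "normal":           {"normal": 0.1, "deep_reasoning": 0.6, "self_reflection": 0.1, "self_exploration": 0.2},
    "deep_reasoning":   {"normal": 0.1, "deep_reasoning": 0.5, "self_reflection": 0.3, "self_exploration": 0.1},
    "self_reflection":  {"normal": 0.2, "deep_reasoning": 0.5, "self_reflection": 0.1, "self_exploration": 0.2},
    "self_exploration": {"normal": 0.1, "deep_reasoning": 0.6, "self_reflection": 0.2, "self_exploration": 0.1},
}

def sample_behavior_walk(start: str, steps: int, rng: random.Random) -> list[str]:
    """Sample a sequence of behavior states by walking the graph."""
    walk = [start]
    for _ in range(steps - 1):
        row = TRANSITIONS[walk[-1]]
        # Pick the next state in proportion to the outgoing edge weights.
        walk.append(rng.choices(list(row), weights=list(row.values()), k=1)[0])
    return walk

def behavior_profile(walk: list[str]) -> dict[str, float]:
    """Fraction of steps spent in each state (the 'bond profile' of a walk)."""
    return {s: walk.count(s) / len(walk) for s in STATES}

rng = random.Random(0)  # seeded so the walk is reproducible
walk = sample_behavior_walk("normal", 50, rng)
profile = behavior_profile(walk)
```

In a real pipeline each sampled state would presumably condition the generation of the next reasoning step, so the synthesized trace inherits the target mix of behaviors; the sketch stops at the state sequence.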

Original summary by @yogthos@lemmy.ml


Brett Wilkins
Feb 19, 2026

The New Brunswick, New Jersey City Council voted Wednesday to cancel plans to construct an artificial intelligence data center and instead build a new public park where the 27,000-square-foot facility would have gone.

Artificial intelligence data centers—which house the servers and other infrastructure needed to train and power AI models—have major environmental and climate impacts, as they consume massive amounts of electricity and water, as well as rare earth metals and other resources.

According to New Brunswick Patch, hundreds of people packed into Wednesday evening’s city hall meeting to voice concerns that the proposed data center would send their electricity and water bills skyrocketing, and that the facility would harm the environment.


Mistral, the Paris-based AI lab, released two new speech-to-text models: Voxtral Mini Transcribe V2 and Voxtral Realtime. The former is built to transcribe audio files in large batches and the latter for nearly real-time transcription, within 200 milliseconds; both can translate between 13 languages. Voxtral Realtime is freely available under an open source license.

At 4 billion parameters, the models are small enough to run locally on a phone or laptop—a first in the speech-to-text field, Mistral claims—meaning that private conversations needn’t be dispatched to the cloud. According to Mistral, the new models are both cheaper to run and less error-prone than competing alternatives.
