this post was submitted on 07 Jun 2026

Technology

[–] ag10n@lemmy.world 9 points 2 weeks ago (3 children)

What’s the cost of the compute you have to run something locally?

The majority of people don’t have 32 GB of VRAM to run something remotely as capable.

[–] MrQuallzin@pie.eyeofthestorm.place 6 points 2 weeks ago (1 children)

I've got an old 1060ti in my server. Ollama shares it with just a couple other containers. Electricity here is majority hydro with some natural gas, $0.08/kWh.

It's a little slow, but I can comfortably run qwen3:14b. Of course that's not all done on the GPU; a large part is offloaded to server RAM (generally 32GB available, so more than enough headroom).

My server and my gaming PC combined last month came out to $13.32
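For a rough sense of what that bill implies, here is a back-of-the-envelope sketch. It assumes the whole $13.32 was billed at the quoted $0.08/kWh with no fixed fees, and a 30-day month; neither is stated in the comment, so treat the result as an estimate only.

```python
# Back-of-the-envelope: what energy use does the quoted bill imply?
RATE_USD_PER_KWH = 0.08    # rate quoted in the comment
MONTHLY_BILL_USD = 13.32   # server + gaming PC combined, from the comment
HOURS_IN_MONTH = 30 * 24   # assumption: 30-day month


def implied_energy_kwh(bill_usd: float, rate_usd_per_kwh: float) -> float:
    """Energy in kWh, assuming the bill is purely usage at a flat rate."""
    return bill_usd / rate_usd_per_kwh


def average_power_watts(energy_kwh: float, hours: float) -> float:
    """Average continuous draw in watts over the billing period."""
    return energy_kwh / hours * 1000


energy = implied_energy_kwh(MONTHLY_BILL_USD, RATE_USD_PER_KWH)
power = average_power_watts(energy, HOURS_IN_MONTH)
print(f"{energy:.1f} kWh/month, about {power:.0f} W average draw")
```

Under those assumptions the two machines together average a bit over 230 W around the clock. That covers everything they do, not just inference.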

[–] ag10n@lemmy.world 4 points 2 weeks ago* (last edited 2 weeks ago)

How does that compare to the closed models that Anthropic offers, at the context and scale they offer?

I run Qwen3.6 27B locally and it’s usable with 16 GB of VRAM, but it's still not the same as a data centre of Blackwell clusters.
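A rough way to see why the 14B and 27B sizes mentioned in this thread sit where they do against VRAM is to estimate the memory taken by the weights alone. This is a minimal sketch: the bit widths are assumptions (the comments don't say which quantization is in use), and it ignores context (KV cache), activations, and runtime overhead, all of which add to the total.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB.

    params_billions * 1e9 weights * (bits / 8) bytes each, divided by 1e9.
    """
    return params_billions * bits_per_weight / 8


# Model sizes taken from the thread; bit widths are illustrative assumptions.
for name, params in [("14B", 14), ("27B", 27)]:
    for bits in (16, 8, 4):
        size = weight_memory_gb(params, bits)
        print(f"{name} at {bits}-bit: ~{size:.1f} GB for weights")
```

By this estimate a 27B model needs roughly 13.5 GB for weights at 4 bits per weight and 54 GB at 16 bits, so whether it fits on a given card depends heavily on quantization, before any context is counted.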

[–] greyscale@lemmy.grey.ooo 2 points 2 weeks ago (1 children)

lfm2 works like greased lightning on the NPU built into the current macbook M5.

[–] ag10n@lemmy.world 1 points 2 weeks ago (1 children)

Describe greased lightning, because it’s much slower and needs to handle compression for context

We’re moving in that direction but an M5 is not what the majority of people are running at home

[–] greyscale@lemmy.grey.ooo 0 points 2 weeks ago (1 children)

I dunno man, I'm not a slopjockey so I don't know the minutiae of the addiction.

All of our devs appear to have M5s right now. All of those copilot+ laptops have NPUs too.

[–] ag10n@lemmy.world 1 points 2 weeks ago (1 children)

Your company has bought you the latest and greatest and likely supports commercial token usage too

You can’t compare LLMs at scale to running one locally; it's not the same experience or the same capabilities.

[–] greyscale@lemmy.grey.ooo -3 points 2 weeks ago

"Latest and greatest" my fucking sides lmao

My company gave me some US shitware and I've got some local shitware instead.

If you can't make that work and are dependent on the teat of the slopgenerators, that's a skill issue on you, buddy.

[–] blackbeans@lemmy.zip 1 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

I remember my computer not being fast enough to even play an MP3 file. Two years later, my computer was capable of running 3D accelerated games, browsing the internet at broadband speeds and playing videos.

Sometimes technology advances fast. We could be entering such an era as there are major investments taking place and global competitors will rise to the occasion to market these to a broader audience.

I think it will be entirely possible for consumers to use a decent LLM on their computer in a few years' time.

[–] ag10n@lemmy.world 5 points 2 weeks ago (2 children)

It’s not the 90s anymore. Unless there’s a compression algorithm that packs billions of relationships into a manageable size, local AI under 8 GB of VRAM is limited to highly specific tasks (text-to-speech, as an example, fits in under 1 GB), and that's before the context required for keeping a conversation going or writing code.

[–] ThirdConsul@lemmy.zip 1 points 2 weeks ago (1 children)

If text-to-speech is what Youtube uses to autogenerate the subtitles, it is worthless for anything that uses slightly richer vocabulary.

[–] pirat@lemmy.world 2 points 2 weeks ago

No. Autogenerated subtitles would be speech-to-text, rather than text-to-speech.

[–] blackbeans@lemmy.zip -2 points 2 weeks ago (2 children)

To be clear, I wasn't talking about a leap in LLM design. I was talking about a leap in hardware capabilities...

[–] KRAW@linux.community 2 points 2 weeks ago

Improved hardware capabilities used to come very quickly (see Moore's Law and Dennard scaling). However, that trend is basically over, so getting higher-performance hardware now takes a lot of effort to specialize it for certain tasks. That's why you see inference accelerators like Groq, SambaNova, Cerebras, etc. However, this is hardware that is still gonna go into data centers. Something innovative has to happen on the AI side for commercial-grade models to be runnable on consumer hardware.

[–] ag10n@lemmy.world 2 points 2 weeks ago

Which are increasingly out of reach for a normal person. Phones, let alone PC hardware, have increased exponentially in price in recent history.