LocalLLaMA
Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.
Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.
As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.
Rules:
Rule 1 - No harassment or personal character attacks on community members, i.e. no name-calling, no generalizing about entire groups of people that make up our community, no baseless personal insults.
Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency, i.e. no comparing the usefulness of models to that of NFTs, no claiming the resource usage required to train a model is anything close to maintaining a blockchain or mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.
Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms, i.e. no statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>."
Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.

Rules of thumb
Having said that, remember that 1) you can split a model across CPU and GPU, and 2) you can use lower quants. So if you have "just" 12GB, a lower quant (I dunno... IQ3_XS?) might get you over the line.
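To see why a lower quant can "get you over the line," here's a back-of-the-envelope sketch of weight memory vs. quant level. The bits-per-weight figures are rough averages for llama.cpp quant formats (assumptions, not exact numbers), and this ignores KV cache and runtime overhead, so leave a GB or two of headroom.

```python
# Rough memory estimate for quantized model weights only.
# BPW values are approximate averages for llama.cpp quant types.
BPW = {"Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.8, "IQ3_XS": 3.3}

def weight_gib(params_billion: float, quant: str) -> float:
    """Approximate in-memory size of the weights in GiB."""
    return params_billion * 1e9 * BPW[quant] / 8 / 1024**3

# A ~12B model at various quants:
for q, _ in BPW.items():
    print(f"12B @ {q}: ~{weight_gib(12, q):.1f} GiB")
```

So a 12B model that won't fit at Q8_0 can drop to roughly 4-5 GiB of weights at IQ3_XS, leaving room for context on a 12GB card.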
You can run it however you want :) For someone brand new, the best all-in-one is Ollama or Jan.ai.
Yes. Jan.ai has MCP tooling (I imagine Ollama does as well), so you can follow the how-tos to set that up. Read their docs? What do you need to do with MCP?
What you should know: you'll reach a point where "more parameters = better performance" needs to be balanced against cost and smarter tooling. Don't be tempted to drop $$$ on something thinking you can just throw money at the problem to make it go away.
Thanks! I’m experimenting with my laptop with 16GB RAM and no GPU/VRAM. I installed llama.cpp and am testing Gemma 7b Q5 but it’s not answering prompts correctly. It’s analyzing the prompt and not answering the question, or it gives me a poem haha. Trying to figure it out.
Any lightweight model you recommend for just chat experimenting for now? Can they connect to the internet?
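The "analyzes the prompt instead of answering" symptom is classically what happens when an instruct-tuned model is fed a raw prompt without its chat template: it just continues the text. A minimal sketch of what the template adds, assuming Gemma's documented turn format (check the model card to confirm the exact tokens for your build):

```python
# Instruct models expect their chat template; a bare question often
# produces continuations (analysis, poems) instead of an answer.
def gemma_prompt(user_msg: str) -> str:
    """Wrap a user message in Gemma's instruct turn format."""
    return (
        "<start_of_turn>user\n"
        f"{user_msg}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(gemma_prompt("What is quantization?"))
```

Recent llama.cpp builds apply the chat template embedded in the GGUF when you run `llama-cli` in conversation mode, so an older build, a missing template, or a base (non-instruct) GGUF are the usual culprits.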
I'll never not recommend Qwen3-4B 2507 Instruct... because despite being ancient in AI terms (so, 8 months, lol) it's solid. Notably, the base models in Jan are all Qwen 3-4 variants.
Most models can search the web if they have access to a web search tool.
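Worth stressing: the model itself never touches the internet; the host app does. A minimal sketch of that loop, with both the model and the search engine stubbed out as hypothetical stand-ins:

```python
import json

def fake_model(prompt: str) -> str:
    # A real model would decide whether to call a tool;
    # here we hard-code a JSON tool call for illustration.
    return json.dumps({"tool": "web_search", "query": "latest Qwen release"})

def fake_search(query: str) -> str:
    # Stand-in for a real search backend.
    return f"(stub results for: {query})"

reply = fake_model("What's the latest Qwen release?")
call = json.loads(reply)
if call.get("tool") == "web_search":
    results = fake_search(call["query"])
    # The host would now prompt the model again with `results`
    # appended, and the model writes its answer from them.
    print(results)
```

This is the same pattern MCP servers formalize: the model emits a structured call, the client executes it, and the results go back into the context.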