hok

joined 2 years ago
[–] hok@lemmy.dbzer0.com 0 points 1 month ago (1 children)

Sure, if you have a micro swarm architecture laid out, I would love to hear what it is.

[–] hok@lemmy.dbzer0.com 0 points 1 month ago (3 children)

Thank you for your opinion & recommendations. Something I saw today related to "sub-agents" is in Kimi 2.6's model card it says

Elevated Agent Swarm: Scaling horizontally to 300 sub-agents executing 4,000 coordinated steps, K2.6 can dynamically decompose tasks into parallel, domain-specialized subtasks, delivering end-to-end outputs from documents to websites to spreadsheets in a single autonomous run.

So maybe Kimi 2.6 is doing the "type of thing" I am looking for, but I don't have the means to run it practically. Maybe at 1 token per second which would be brutal.

I tried out Qwen 3.6 27B but not yet in an agentic setting, so I can't really judge yet. Maybe it's just me but the small model size seems limiting. I thought gpt-oss-120b was good.

[–] hok@lemmy.dbzer0.com 0 points 1 month ago

What I have yet to learn is how much of the intelligence and accuracy comes from the model itself and how much comes from the agentic tool system. For example, my experience with ChatGPT probably would be much worse with the free version (no thinking or container).

 

Are there any open models that can actually compete with proprietary ones like GPT 5.5 Extended Thinking or Claude Opus 4.7? I am getting really good results with those in their chat interfaces for coding tasks. They sometimes spend 30-45 minutes working on my task and have an internal container they are doing tool calls on, like cloning a repository and compiling their code, and can find online documentation. Their answers are very good and usually correct for very complex tasks requiring specific protocols.

So I would like to know how well we can replicate this using open models since I want more control over how it runs, and privacy. Do any of you hook in agentic capabilities into your local models? How do you do it, and which models give you good results?

Pretend I have unlimited resources (local llama.cpp, sufficient fast storage/memory, and unlimited time to wait for a good response).

 

I would like my model to know the code libraries I use and help me write code with them. I use llama.cpp's server and web UI for inference, but I have no clue how to get started with RAG, since it seems it is not natively supported with llama.cpp's server implementation. It almost looks like I would need to code my own agent.

I am not interested in commercial offerings or APIs. If you use RAG, how do you do it?

 

I've been waiting for an open source TTS model that was actually good enough to capture some of the subtleties of language and synthesize them in a natural-sounding way that makes sense. I think I finally found one that fits the requirements.

Model: https://huggingface.co/fishaudio/fish-speech-1.5

It uses an encoder rather than relying on phonemes, and generations sometimes vary because of that, but the amount of errors I've gotten are minimal, and the variations in the generation are all surprisingly natural in slightly different ways, which is very exciting.

Give it a spin if you are also looking for a TTS model that sounds good. It uses voice cloning, so find a good 10-20 second reference clip to have the generations use the same voice.