I’d like to set up a local coding assistant so that I can stop asking Google complex questions and picking through search results for answers.

I really don’t know what I’m doing, or whether anything available actually respects privacy. I don’t particularly trust search results for this kind of query either.

I want to run it on my desktop: Ryzen 7 5800XT + Radeon RX 6950 XT + 32 GB of RAM. I don’t need or expect data-center performance out of this thing. I’m also a strict Sublime user, so I’d like to avoid VS Code suggestions as much as possible.

My coding laptop is an oooooold MacBook Air, so I’d like something that runs on my desktop and can be used from my laptop. No remote access needed, just use over the same home network.

Something like LM Studio and Qwen sounds like what I’m looking for, but since I’m unfamiliar with what’s out there, I figured I’d ask for Lemmy’s opinion.
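For the laptop side of things, from what I can tell LM Studio (and llama.cpp’s bundled server) exposes an OpenAI-compatible HTTP API, so the old MacBook would only need to make requests to the desktop over the LAN. A rough sketch of what that client side could look like, assuming LM Studio’s default port of 1234 and a made-up LAN IP and model name:

```python
import requests

# Placeholder address for the desktop on the home network; LM Studio's
# local server listens on port 1234 by default.
BASE_URL = "http://192.168.1.50:1234/v1"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        # Hypothetical model identifier; use whatever model is actually loaded.
        "model": "qwen2.5-coder-7b-instruct",
        "messages": [
            {"role": "system", "content": "You are a concise coding assistant."},
            {"role": "user", "content": "Why does list.sort() return None in Python?"},
        ],
        "temperature": 0.2,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Since it’s plain HTTP, the editor shouldn’t matter: anything on the laptop that can POST JSON (a Sublime plugin, a shell script, a browser UI) would work.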

Is LM Studio + Qwen a good combo for my needs? Are there alternatives?

I’m on Lemmy Connect and can’t see comments from other instances while I’m logged in, but to whoever melted down over this question, your relief is in my very first sentence:

so that I can stop asking Google complex questions and picking through search results for answers.

  • melfie@lemy.lol

    The main thing that has stopped me from running models like this so far is VRAM. My server has an RTX 4060 with 8 GB, and I’m not sure that can reasonably run a model like this.

    Edit:

    This calculator seems pretty useful: https://apxml.com/tools/vram-calculator

    According to this, I can run Qwen3 14B with a 4-bit quant and 15-20% CPU/NVMe offloading and get 41 tokens/s. It seems 4-bit quantization reduces accuracy by 5-15%.
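    The arithmetic behind numbers like that is easy to sanity-check yourself. A back-of-envelope sketch (my own rough approximation, not the calculator’s actual formula):

    ```python
    def estimate_weight_vram_gb(params_billion: float, bits_per_weight: int,
                                overhead: float = 1.2) -> float:
        """Rough floor on memory for the weights alone: parameters times
        bytes per weight, padded ~20% for KV cache and runtime buffers.
        Real usage also grows with context length."""
        return params_billion * (bits_per_weight / 8) * overhead

    # Qwen3 14B at a 4-bit quant: ~7 GB of weights, ~8.4 GB with overhead,
    # which is why an 8 GB card ends up offloading 15-20% of it.
    print(f"{estimate_weight_vram_gb(14, 4):.1f} GB")
    ```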

    The calculator even says I can run the flagship model with 100% NVMe offloading and get 4 tokens/s.

    I didn’t realize NVMe offloading was even a thing, and I’m not sure whether it’s actually supported or works well in practice. If so, it’s a game changer.

    Edit:

    The llama.cpp docs do mention that models are memory-mapped by default and loaded into memory as needed. Not sure if that means a MoE model like Qwen3 235B can run with 8 GB of VRAM and 16 GB of RAM, albeit at a speed an order of magnitude slower, like the calculator suggests is possible.
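    The mmap part is real, at least: llama.cpp maps the GGUF file and pages weights in from disk on demand instead of loading everything up front. Partial offload is also easy to experiment with; a minimal sketch using the llama-cpp-python bindings (model path and layer count are placeholders, and the OP’s AMD card would need a Vulkan or ROCm build, where my 4060 would use CUDA):

    ```python
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/qwen3-14b-q4_k_m.gguf",  # placeholder GGUF path
        n_gpu_layers=25,  # however many layers fit in VRAM; the rest stay on CPU
        n_ctx=8192,       # KV cache grows with context, so this also eats VRAM
        use_mmap=True,    # the default: weights are memory-mapped, paged in as needed
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain Rust lifetimes in two sentences."}]
    )
    print(out["choices"][0]["message"]["content"])
    ```

    Whether paging a 235B MoE through 8 GB of VRAM is actually usable is exactly the open question, but this is the knob to try.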