TIL: There is an open source "Alexa replacement" project

cm0002@libretechni.ca · 2 days ago

brucethemoose@lemmy.world · edit-2 1 day ago

Ah. On an 8000 APU, to be blunt, you’re likely better off with Vulkan + whatever omni models GGML supports these days. Last I checked, text generation (TG) is faster and prompt processing is close to ROCm.

…And yeah, that was total misadvertisement on AMD’s part. They’ve completely diluted the term, kinda like TV makers did with ‘HDR’.

fonix232@fedia.io · 1 day ago

The thing is, if AMD actually added proper support for it, given it has a somewhat powerful NPU as well… For the total TDP of the package it’s still one of the best perf-per-watt APUs, just the damn software support isn’t there.

Feckin AMD.

brucethemoose@lemmy.world · edit-2 1 day ago

The IGP is more powerful than the NPU on these things anyway. The NPU is more for ‘background’ tasks, like Teams audio processing or whatever it’s used for on Windows.

Yeah, in hindsight, AMD should have tasked (and still should task) a few engineers on popular projects (and pushed NPU support harder), but GGML support is good these days. It’s gonna be pretty close to RAM speed-bound for text generation.
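
To put the "RAM speed-bound" point in numbers, here is a rough back-of-the-envelope sketch. It assumes a dense model whose weights are all read from memory once per generated token, and it ignores KV cache traffic and compute time, so it gives a ceiling rather than a prediction. The function name and the figures (80 GB/s, 4 GB) are made up for illustration and are not measurements of any particular APU.

```python
def estimate_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on text generation speed for a bandwidth-bound model.

    Assumption: every generated token streams all model weights from RAM once,
    so the ceiling is memory bandwidth divided by model size.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    # Hypothetical figures for illustration only, not benchmarks.
    print(estimate_tokens_per_second(bandwidth_gb_s=80.0, model_size_gb=4.0))
```

Under these assumptions a faster compute device does not raise the ceiling; only more memory bandwidth or a smaller (more heavily quantized) model does.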

fonix232@fedia.io · 1 day ago

Aye, I was actually hoping to use the NPU for TTS/STT while keeping the LLM systems GPU bound.

brucethemoose@lemmy.world · edit-2 1 day ago

It still uses memory bandwidth, unfortunately. There’s no way around that, though NPU TTS would still be neat.

…Also, generally, STT responses can’t be streamed, so you might as well use the iGPU anyway. TTS can be chunked I guess, but do the major implementations do that?

fonix232@fedia.io · 1 day ago

Piper does chunking for TTS, and could utilise the NPU with the right drivers.
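
For anyone wondering what chunking means here, a minimal sketch of the general idea: split the text on sentence boundaries and hand each piece to the synthesizer as soon as it is ready, so playback can start before the whole reply is processed. This is not Piper's actual code; `chunk_sentences` and `max_chars` are hypothetical names, and the regex-based sentence split is a deliberately naive assumption.

```python
import re
from typing import Iterator


def chunk_sentences(text: str, max_chars: int = 200) -> Iterator[str]:
    """Yield sentence-aligned chunks of roughly max_chars characters.

    Naive split on ., ! or ? followed by whitespace. A single sentence
    longer than max_chars is yielded whole rather than cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    buffer = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Flush the buffer if adding this sentence would exceed the limit.
        if buffer and len(buffer) + 1 + len(sentence) > max_chars:
            yield buffer
            buffer = sentence
        else:
            buffer = f"{buffer} {sentence}" if buffer else sentence
    if buffer:
        yield buffer


if __name__ == "__main__":
    for chunk in chunk_sentences("Hello there. How are you? Fine.", max_chars=15):
        # In a real pipeline each chunk would go to the TTS engine here.
        print(chunk)
```

Each yielded chunk could then be synthesized on whichever device is free, which is where a separate NPU would help.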

And the idea of running them on the NPU is not about memory usage but hardware capacity/parallelism. Although I guess it would have some benefits when I don’t have to constantly load/unload GPU models.

brucethemoose@lemmy.world · 23 hours ago

Oh, I forgot!

You should check out Lemonade:

https://github.com/lemonade-sdk/lemonade

It supports Ryzen NPUs via 2 different runtimes… though apparently not the 8000 series yet?

fonix232@fedia.io · 22 hours ago

I’ve actually been eyeing lemonade, but the lack of Dockerisation is still an issue… guess I’ll just DIY it at one point.

brucethemoose@lemmy.world · edit-2 17 hours ago

It’s all C++ now, so it doesn’t really need Docker! I don’t use Docker for any ML stuff, just pip/uv venvs.

You might consider Arch (dockerless) ROCm soon; it looks like 7.1 is in the staging repo right now.

brucethemoose@lemmy.world · edit-2 1 day ago

Yeah… Even if the LLM is RAM speed constrained, simply using another device to avoid interrupting it would be good.

Honestly AMD’s software dev efforts are baffling. They’ve focused on a few libraries that precisely no one uses, like this: https://github.com/amd/Quark

While ignoring issues holding back entire sectors (like broken flash-attention) with devs screaming about it at the top of their lungs.

Intel suffers from corporate Game of Thrones, but at least they have meaningful contributions in the open source space here, like the SYCL/AMX llama.cpp code or the OpenVINO efforts.
