1U mini PC for AI?

nagaram@startrek.website · 2 months ago

1U mini PC for AI?

_//(0)(0)\\_@lemmy.world · 2 months ago

Looking good! Funny that I happened across this post while I’m working on mine as well. As I type this, I’m playing with a little 1.5” transparent OLED that will poke out of the rack beside each Pi, scrolling various info (CPU load/temp, IP, LAN traffic, node role, etc.).
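The post doesn’t say which OLED or driver library is involved, so this is only a display-agnostic sketch of the scrolling part, in Python as an assumption. `status_text` and `ticker_frames` are hypothetical helpers, the stat values are passed in rather than read from the system, and each frame would be handed to whatever driver the display actually uses.

```python
from itertools import islice


def status_text(cpu_load: float, cpu_temp_c: float, ip: str, role: str) -> str:
    """Format one node's stats as a single line for the ticker (hypothetical layout)."""
    return f"CPU {cpu_load:.0%} | {cpu_temp_c:.0f}C | {ip} | {role} | "


def ticker_frames(text: str, width: int):
    """Yield an endless sequence of width-character windows scrolling left through text."""
    if not text or width <= 0:
        raise ValueError("text must be non-empty and width positive")
    # Repeat the text enough times that any window can wrap past the end seamlessly.
    looped = text * (width // len(text) + 2)
    pos = 0
    while True:
        yield looped[pos:pos + width]
        pos = (pos + 1) % len(text)


if __name__ == "__main__":
    line = status_text(0.42, 51.3, "192.168.1.20", "worker")
    for frame in islice(ticker_frames(line, 16), 5):
        print(frame)  # a real setup would send this to the OLED driver instead
```

The frame width would be set to however many characters fit across the display at the chosen font size.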

ripcord@lemmy.world · 2 months ago

What OLED specifically and what will you be using to drive it?

hendrik@palaver.p3x.de · 2 months ago

Well, I always advocate for using the stuff you have. I don’t think a Discord bot needs four new RasPi 5s. That’s likely to run on a single RasPi 3. And as long as they’re sitting idle, it doesn’t really matter which model number they have… So go ahead and put something on your hardware, and buy new ones once you’ve maxed out your current setup.

I’m not educated on Bazzite. Maybe tools like Distrobox or other container solutions can help run AI workloads on the gaming rig. It’s likely easier to run a dedicated AI server, but I started learning about quantization and tested some models on my main computer with the help of ollama, KoboldCPP and some random Docker/Podman containers. I’m not saying this is the preferable solution, but it’s definitely enough to get started with AI. And you can always connect the computers within your local network, write some server applications and have them hook into ollama’s API; it doesn’t really matter whether that runs on your gaming PC or a server (as long as the computer in question is turned on…)
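To make “hook into ollama’s API” concrete, here’s a minimal Python sketch using only the standard library. It assumes Ollama’s default port 11434 and its `/api/generate` endpoint with `"stream": false`, which returns the generated text in a `"response"` field; the helper names are hypothetical, and the host address and model name in the usage line are placeholders.

```python
import json
import urllib.request


def build_generate_request(host: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate on another LAN machine."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:11434/api/generate",  # 11434 is Ollama's default port
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(host: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the "response" field."""
    req = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Usage would look like `generate("192.168.1.50", "llama3.2", "Summarize my notes on VLANs")`, and the calling code doesn’t care whether that IP is the gaming PC or a dedicated server.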

nagaram@startrek.website · 2 months ago

Ollama and all that runs on it; it’s just the firewall rules and opening it up to my network that are the issue.

I cannot get ufw, iptables, or anything like that running on it. So I usually just ssh into the PC and do a CLI only interaction. Which is mostly fine.

I want to use OpenWebUI so I can feed it notes and books as context, but I need the API which isn’t open on my network.
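One thing worth checking before fighting the firewall: as far as I know, Ollama binds to 127.0.0.1:11434 by default, so other machines can’t reach it until the `OLLAMA_HOST` environment variable is set to something like `0.0.0.0`. A small stdlib Python check, run from another machine on the LAN, helps tell the bind problem apart from a firewall problem (the helper name is hypothetical):

```python
import socket


def port_open(host: str, port: int = 11434, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

If `port_open("127.0.0.1")` succeeds on the gaming PC itself but `port_open("<gaming-pc-ip>")` fails from elsewhere, the bind address or the firewall is the likely culprit.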

Diplomjodler@lemmy.world · 2 months ago

I’m afraid I’m going to have to deduct one style point for the misalignment of the labels on the mini PCs.

nagaram@startrek.website · 2 months ago

That’s fair and justified. I have the label maker right now in my hands. I can fix this at any moment and yet I choose not to.

I’m a man feeding orphans to the orphan crushing machine. I can stop this at any moment.

Diplomjodler@lemmy.world · 2 months ago

The machine must keep running!

Korhaka@sopuli.xyz · 2 months ago

Ohh nice, I want it. Don’t really know what I would use all of it for, but I want it (but don’t want to pay for it).

I’ve been thinking of getting an N150 mini PC and setting up Proxmox and a few VMs: at the very least Pi-hole, a location to dump some backups, and also a web server for a few projects.

TropicalDingdong@lemmy.world · 2 months ago

This is so pretty 😍🤩💦!!

I’ve been considering a micro rack to support the journey, but primarily to house old laptop chassis as I convert them into Proxmox resources.

Any thoughts or comments on your choice of this rack?

nagaram@startrek.website · 2 months ago

Not really a lot of thought went into the rack choice. I wanted something smaller and more powerful than the several OptiPlexes I had.

I also decided I didn’t want storage to happen here anymore, because I am stupid and only knew how to pass through disks for TrueNAS. So I had 4 TrueNAS servers on my network and I hated it.

This was just what I wanted at a price I was good with, at like $120. There’s a 3D-printable version, but I wasn’t interested in that. I do want to 3D print racks, and I want to make my own custom ones for the Pis to save space.

But the printed setup is way cheaper if you have a printer and some patience.

thejml@sh.itjust.works · 2 months ago

Honestly, if you are delving into Kubernetes, just add some more of those 1L PCs in there. I tend to find them on eBay cheaper than Pis. Last year I snagged 4x 1L Dells with 16GB RAM for $250 shipped. I swapped some RAM around, added some new SSDs, and now have 3x Kube masters, 3x Kube worker nodes and a few VMs running on a Proxmox cluster across 3 of the 1Ls, each with 32GB and a 512GB SSD, and it’s been great. The other one became my wife’s new desktop.

Big plus, there are so many more x86_64 containers out there compared to Pi compatible ARM ones.
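For a mixed Pi and x86 cluster, it helps to know which image platform a node needs. A hedged Python sketch that maps `platform.machine()` output to the OCI platform strings that `docker pull --platform` accepts; the mapping table covers only common cases and is an assumption, not an exhaustive list:

```python
import platform
from typing import Optional

# Common platform.machine() spellings mapped to Docker/OCI platform strings.
_ARCH_TO_PLATFORM = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",    # Windows reports "AMD64"
    "aarch64": "linux/arm64",  # 64-bit Raspberry Pi OS reports this
    "arm64": "linux/arm64",    # macOS on Apple silicon reports this
    "armv7l": "linux/arm/v7",  # 32-bit Raspberry Pi OS
}


def docker_platform(machine: Optional[str] = None) -> str:
    """Return the image platform for the given (or current) machine architecture."""
    machine = (machine or platform.machine()).lower()
    try:
        return _ARCH_TO_PLATFORM[machine]
    except KeyError:
        raise ValueError(f"unmapped architecture: {machine}") from None
```

On the 1L x86 boxes this returns `linux/amd64`, which is the platform almost every published image supports.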

InternetCitizen2@lemmy.world · 2 months ago

NSFW

lepinkainen@lemmy.world · 2 months ago

This will get downvoted to oblivion because this is Lemmy:

Get a Mac Mini. Any M-series model with 32GB of memory will run local models at decent speeds and will be cheaper than just a 5xxx-series GPU.

And it’ll fit your cool rack 😀

6nk06@sh.itjust.works · 2 months ago

https://en.wikipedia.org/wiki/ThinkCentre because I didn’t know it existed.

nagaram@startrek.website · 2 months ago

These are M715q ThinkCentres with a Ryzen 5 PRO 2400GE.

nagaram@startrek.website · 2 months ago

Oh, and my home office setup uses Tiny-in-One monitors, so I configured these by plugging them into my monitor, which was sick.

I’m a huge fan of this all in one idea that is upgradable.

Colloidal@programming.dev · 2 months ago

You could combine both 1U fillers and install a 2U PC, which would be easier to find.

nagaram@startrek.website · 2 months ago

I was thinking about that now that I have Mac Minis on the mind. I might even just set a mac mini on top next to the modem.

Melvin_Ferd@lemmy.world · 2 months ago

The AI hate is overwhelming at times. This is great. What kind of things are you doing with it?

nagaram@startrek.website · 2 months ago

Not much. As much as I like LLMs, I don’t trust them for more than rubber duck duty.

Eventually I want to have a Copilot-at-home setup where I can feed it a notes database and whatever manuals and books I’ve read, so it can draw from that when I ask it questions.

The problem is my best GPU is my gaming GPU, a 5060 Ti, and it’s in a Bazzite gaming PC, so it’s hard to get the AI out of it because of Bazzite’s “No, I won’t let you break your computer” philosophy, which is why I chose it. My second-best GPU is a 3060 12GB, which is really good, but if I made a dedicated AI server, I’d want it to be better than my current server.
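Feeding a model your notes is retrieval-augmented generation: pick the notes most relevant to a question and paste them into the prompt. Tools like Open WebUI usually do the retrieval with embedding models; this Python sketch swaps in plain word-count cosine similarity just to show the shape of the idea, and every function name is hypothetical.

```python
import math
import re
from collections import Counter


def _words(text: str) -> Counter:
    """Lowercased word counts, ignoring punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_notes(question: str, notes: list, k: int = 3) -> list:
    """Return the k notes most similar to the question."""
    q = _words(question)
    return sorted(notes, key=lambda n: cosine(q, _words(n)), reverse=True)[:k]


def build_prompt(question: str, notes: list, k: int = 3) -> str:
    """Stuff the best-matching notes into the prompt ahead of the question."""
    context = "\n---\n".join(top_k_notes(question, notes, k))
    return f"Answer using only these notes:\n{context}\n\nQuestion: {question}"
```

The resulting prompt string is what would be sent to the local model, for example through Ollama’s API.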

mierdabird@lemmy.dbzer0.com · 2 months ago

I’m actually right there with you. I have a 3060 12GB, and tbh I think it’s the most cost-effective GPU option for home use right now. You can run 14B models at a very reasonable pace.
Doubling or tripling the cost and power draw just to get 16-24GB doesn’t seem worth it to me. If you really want an AI-optimized box, I think something with the new Ryzen AI Max chips would be the way to go, like an ASUS ROG Flow Z13, the Framework Desktop, or the GMKtec option, whatever it’s called. Apple’s new Mac Minis are also great options. Both Ryzen AI Max and Apple use shared CPU/GPU memory, so you can go up to 96GB+ at much, much lower power draw.
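A rough way to sanity-check the “14B on 12GB” claim: weight memory is about parameter count times bits per weight divided by 8. This Python sketch assumes ~4.5 bits per weight for a typical 4-bit quantization and a flat 1.5GB allowance for context and runtime overhead; both numbers are illustrative assumptions, not measurements.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   memory_gb: float, overhead_gb: float = 1.5) -> bool:
    """True if weights plus an assumed fixed overhead fit in the given VRAM or unified memory."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= memory_gb


if __name__ == "__main__":
    for bits in (16, 8, 4.5):
        print(f"14B @ {bits} bits: {weights_gb(14, bits):.1f} GB, "
              f"fits 12 GB: {fits_in_memory(14, bits, 12)}")
```

Only the quantized case fits on a 12GB card; by the same arithmetic, a 96GB shared pool leaves room for far larger models at 4-bit.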

ZeDoTelhado@lemmy.world · 2 months ago

I have a question about AI usage on this: how do you do this? Every time I see AI usage, some sort of 4090 or 5090 is mentioned, so I am curious what kind of AI usage you can do here.

nagaram@startrek.website · 2 months ago

With an RTX 3060 12GB, I have been perfectly happy with the quality and speed of the responses. It’s much slower than my 5060 Ti, which I think is the sweet spot for text-based LLM tasks. A larger context window provided by more VRAM, or a web-based AI, is cool and useful, but I haven’t found the need for that yet in my use case.

As you may have guessed, I can’t fit a 3060 in this rack. That’s in a different server that houses my NAS. I have done AI on my 2018 Epyc server CPU and it’s just not usable. Even with 109GB of RAM, not usable. Even clustered, I wouldn’t try running anything on these machines. They are for Docker containers and Minecraft servers. Jeff Geerling probably has a video on trying to run an AI on a bunch of Raspberry Pis. I just saw his video using Ryzen AI Strix boards and that was ass compared to my 3060.

But for my use case, I am just asking AI to generate simple scripts based on manuals I feed it, or some sort of writing task. I either get it to take my notes on a topic and make an outline that makes sense, which I then fill in, or I feed it finished writing and ask for grammar or tone fixes. That’s fucking it, and it boggles my mind that anyone is doing anything more intensive than that. I am not training anything, and 12GB of VRAM is plenty if I wanna feed it like 10-100 pages of context. Would it be better with a 4090? Probably, but for my uses I haven’t noticed a difference in quality between my local LLM and the web-based stuff.
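How much context is “10-100 pages”? A back-of-the-envelope Python sketch, assuming ~500 words per page and ~1.33 tokens per English word (common rules of thumb, not measurements). The KV-cache formula (2 for keys and values, times layers, KV heads, head dimension, bytes per element and tokens) is the standard one, but the model dimensions used here are hypothetical; in Ollama the context window itself is set with the `num_ctx` option.

```python
def tokens_for_pages(pages: int, words_per_page: int = 500,
                     tokens_per_word: float = 1.33) -> int:
    """Rough token count for a number of pages of English text."""
    return round(pages * words_per_page * tokens_per_word)


def kv_cache_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_elem: int = 2) -> float:
    """KV-cache size in GB: keys and values (x2) for every layer, KV head and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9


if __name__ == "__main__":
    # Hypothetical model shape: 40 layers, 8 KV heads, head_dim 128, fp16 cache.
    for pages in (10, 100):
        t = tokens_for_pages(pages)
        print(f"{pages} pages ~ {t} tokens, KV cache ~ {kv_cache_gb(t, 40, 8, 128):.1f} GB")
```

How much of that range fits next to the weights depends heavily on the specific model’s layer and head counts.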

teslasdisciple@lemmy.ca · 2 months ago

I’m running AI on an old 1080 Ti. You can run AI on almost anything, but the less memory you have, the smaller (i.e. dumber) your models will have to be.

As for the “how”, I use Ollama and Open WebUI. It’s pretty easy to set up.

Flax@feddit.uk · 2 months ago

How much did this cost?

nagaram@startrek.website · 2 months ago

The Lenovo ThinkCentre M715qs were $400 total after upgrades. I fortunately had three 32GB kits of RAM from my work’s e-waste bin, but if I had to add those it would probably be $550-ish.
The rack was $120 from 52Pi.
I bought 2 extra 10in shelves for $25 each.
The Pi cluster rack was also $50 (shit, I thought it was $20. Not worth it).
Patch panel was $20.
There’s a UPS that was $80.
And the switch was $80.

So in total I spent $800 on this set up

To fully replicate from scratch you would need to spend $160 on raspberry pis and probably $20 on cables

So $1000, theoretically.
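As a quick Python sanity check of the tally, using only the prices listed in the thread (buying the RAM instead would add roughly another $150, per the $550 estimate):

```python
# Line items as listed above, in USD; the RAM kits came free from an e-waste bin.
BUILD_COSTS = {
    "ThinkCentre M715q units, after upgrades": 400,
    "52Pi rack": 120,
    "2 extra 10in shelves ($25 each)": 2 * 25,
    "Pi cluster rack": 50,
    "patch panel": 20,
    "UPS": 80,
    "switch": 80,
}
FROM_SCRATCH_EXTRAS = {"Raspberry Pis": 160, "cables": 20}

spent = sum(BUILD_COSTS.values())
from_scratch = spent + sum(FROM_SCRATCH_EXTRAS.values())
print(f"spent: ${spent}, from scratch: ${from_scratch}")
```

The from-scratch figure comes to $980, which is where the rounded “$1000” comes from.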

brucethemoose@lemmy.world · 2 months ago

If you can swing $2K, get one of the new mini PCs with an AMD 395 and 64GB+ RAM (ideally 128GB).

They’re tiny, lower power, and the absolute best way to run the new MoEs like Qwen3 or GLM Air for coding. TBH they would blow a 5060 Ti out of the water, as having a ~100GB VRAM pool is a total game changer.

I would kill for one on an ITX mobo with an x8 slot.
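Why a big unified-memory pool plus MoE works well: token generation is usually memory-bandwidth bound, so a ceiling on speed is bandwidth divided by the bytes of weights read per token, and an MoE only reads its active experts. A Python sketch under stated assumptions: ~256 GB/s for Strix Halo’s memory, ~4.5 bits per weight, and a hypothetical model with ~100B total and ~12B active parameters. Real throughput will be lower.

```python
def decode_tps_upper_bound(bandwidth_gbps: float, active_params_billion: float,
                           bits_per_weight: float) -> float:
    """Ceiling on tokens/s if every token must stream the active weights from memory once."""
    gb_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gbps / gb_per_token


if __name__ == "__main__":
    bw = 256.0  # assumed memory bandwidth in GB/s
    print(f"MoE, 12B active: <= {decode_tps_upper_bound(bw, 12, 4.5):.1f} tok/s")
    print(f"Dense, 100B:     <= {decode_tps_upper_bound(bw, 100, 4.5):.1f} tok/s")
```

The whole ~100B model still has to be resident (about 56GB at 4.5 bits), which is exactly what the large memory pool provides.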

Norah (pup/it/she)@lemmy.blahaj.zone · 2 months ago

I think the mainboard from the Framework Desktop meets your requirements: https://frame.work/au/en/products/framework-desktop-mainboard-amd-ryzen-ai-max-300-series?v=FRAFMK0002

MalReynolds@piefed.social · 2 months ago

Pretty sure that’s an x4 PCIe slot (admittedly PCIe 5.0 x4, but not many video cards speak PCIe 5). Would totally trade a USB4 port for an x8, but these laptop chips are pretty constrained lane-wise.

brucethemoose@lemmy.world · 2 months ago

It’s PCIe 4.0 :(

but these laptop chips are pretty constrained lanes wise

Indeed. I read Strix Halo only has 16 PCIe 4.0 lanes in addition to its USB4, which is reasonable given this isn’t supposed to be paired with discrete graphics. But I’d happily trade an NVMe slot (still leaving one) for x8.

One of the links to a CCD could theoretically be wired to a GPU, right? Kinda like how EPYC can switch its IO between infinity fabric for 2P servers, and extra PCIe in 1P configurations. But I doubt we’ll ever see such a product.
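For a sense of what x4 versus x8 means, a Python sketch of the usual arithmetic: per-lane rate in GT/s, times 128/130 for the line encoding used since PCIe 3.0, divided by 8 bits per byte. It ignores packet and protocol overhead, so real throughput is somewhat lower; the helper names are hypothetical.

```python
# Per-lane transfer rates in GT/s; PCIe 3.0 onward use 128b/130b encoding.
GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}


def pcie_gbps(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s after line encoding (ignores packet overhead)."""
    return GT_PER_LANE[gen] * (128 / 130) / 8 * lanes


def transfer_seconds(gigabytes: float, gen: int, lanes: int) -> float:
    """Best-case time to move a block of data (e.g. offloaded weights) across the link."""
    return gigabytes / pcie_gbps(gen, lanes)


if __name__ == "__main__":
    for gen, lanes in ((4, 4), (4, 8), (5, 4)):
        print(f"PCIe {gen}.0 x{lanes}: ~{pcie_gbps(gen, lanes):.1f} GB/s")
```

On paper a 5.0 x4 link matches a 4.0 x8 link, but a PCIe 4.0 card in that slot negotiates down to 4.0 x4.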

MalReynolds@piefed.social · 2 months ago

It’s PCIe 4.0 :(

Boo! Silly me thinking DDR5 implied PCIe5, what a shame.

Feels like they’re testing the waters with Halo. Hopefully a loud ‘water’s great, dive in’ signal gets through and next gen we get something a bit fitter for desktop use, maybe with more memory (and bandwidth). Still, gotta love the power usage; it makes for one hell of a NAS / AI inference server (and inference isn’t that fussy about PCIe bandwidth; hell, eGPU works fine as long as the model / expert fits in VRAM).

brucethemoose@lemmy.world · 2 months ago

Rumor is its successor is 384-bit, and after that their designs are even more modular:

https://www.techpowerup.com/340372/amds-next-gen-udna-four-die-sizes-one-potential-96-cu-flagship

Hybrid inference prompt processing actually is pretty sensitive to PCIe bandwidth, unfortunately, but again I don’t think many people intend on hanging an AMD GPU off these Strix Halo boards, lol.
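What the rumored wider bus would buy, roughly: peak memory bandwidth is bus width in bytes times the transfer rate. This Python sketch assumes LPDDR5X-8000 for both the current 256-bit Strix Halo and the rumored 384-bit successor; the successor’s actual memory speed is unknown.

```python
def mem_bandwidth_gbps(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s: (bus width / 8 bytes) x transfers per second."""
    return bus_width_bits / 8 * mega_transfers_per_s / 1000


if __name__ == "__main__":
    for width in (256, 384):
        print(f"{width}-bit @ 8000 MT/s: {mem_bandwidth_gbps(width, 8000):.0f} GB/s")
```

Since token generation tends to be bandwidth bound, a 384-bit bus at the same memory speed would raise that ceiling by about 1.5x.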

Norah (pup/it/she)@lemmy.blahaj.zone · 2 months ago

I don’t know that that is necessarily true. Having a gaming machine that can play any game and dynamically switches between a high-power draw dGPU and a genuinely capable low-power draw iGPU actually sounds amazing. That’s always been possible with every laptop that has a dGPU but their associated iGPU has often been bottom of the barrel bc “why would you use it” for intensive tasks. But a “desktop” build as a lounge room gaming PC, where you can throw whatever at it and it’ll run as quietly as it can, while being able to play AAAs at 4K60, sounds amazing.

brucethemoose@lemmy.world · 2 months ago

Eh, actually that’s not what I had in mind:

Discrete desktop graphics idle hot. I think my 3090 uses at least 40W doing literally nothing.
It’s always better to run big dies slower than small dies at high clockspeeds. In other words, if you underclocked a big desktop GPU to 1/2 its peak clockspeed, it would use less than a fourth of the energy and run basically inaudible… and still be faster than the iGPU. So why keep a big iGPU around?
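The underclocking point follows from the usual CMOS dynamic-power approximation, P ≈ C·V²·f: halving the clock alone halves power, but the lower clock also allows a lower voltage, and voltage enters squared. A Python sketch of that ratio; it ignores static/leakage power and the minimum voltage real chips need, so treat it as an idealized estimate.

```python
def relative_dynamic_power(freq_scale: float, voltage_scale: float) -> float:
    """Dynamic power relative to stock when frequency and voltage are scaled (P ~ C * V^2 * f)."""
    return voltage_scale ** 2 * freq_scale


if __name__ == "__main__":
    print(relative_dynamic_power(0.5, 1.0))  # half clock, stock voltage
    print(relative_dynamic_power(0.5, 0.7))  # half clock, modest undervolt
    print(relative_dynamic_power(0.5, 0.5))  # voltage scaled down with the clock
```

Getting below a quarter of stock power at half clock needs the voltage to drop to roughly 70% of stock, since 0.7² × 0.5 ≈ 0.245.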

My use case was multitasking and compute stuff, e.g. gaming on or using the discrete GPU while your iGPU churns away running something, or combining them in some workloads.

Even the 395 by itself doesn’t make a ton of sense for an HTPC, because AMD slaps so much CPU on it. It’s way too expensive and that makes it power-thirsty. A single CCD (8 cores instead of 16) plus the full integrated GPU would be perfect and lower power, but AMD inexplicably does not offer that.

Also, I’ll add that my 3090 is basically inaudible next to a TV… key is to cap its clocks, and the fans barely even spin up.