I replaced my Fedora AMD rig with a Mac Studio and it's glorious.

based

I've thought about getting a Studio to run some bigger LLMs locally. My 64 GiB MacBook can't quite run a high-quant 70B model, but a Studio definitely could. I'll probably just wait until the next cycle to see if they bump the Studio and/or MacBook Pro up to 256 GiB.
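Back-of-envelope on the sizing (the bits-per-weight numbers here are rough averages for llama.cpp k-quants, not exact file sizes):

```python
# Rough GGUF file-size estimates for a 70B model at common quants.
# Bits-per-weight values are approximate; real files vary a bit.
PARAMS = 70e9

quants = {
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
}

for name, bpw in quants.items():
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name}: ~{gib:.0f} GiB")
```

That puts Q8_0 at roughly 69 GiB before any context, which is why a high quant won't fit in 64 GiB, and macOS keeps a chunk of unified memory for itself on top of that.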

How were you running the model? I don't have issues using 70B GGUF models on 64 GB, but I'm very much in the playing-around phase, not an expert at all.

The next cycle should be tomorrow with the M4.

I'm on an M1, so speed is an issue even with Q3 quants of Midnight Miqu. Given how cheap OpenRouter is, I'm happy to just use that most of the time.
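For anyone curious, OpenRouter speaks the OpenAI-compatible API, so a minimal call is something like this (the model slug is illustrative; check openrouter.ai for current IDs):

```python
# Minimal OpenRouter completion via its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter API key
)

resp = client.chat.completions.create(
    model="meta-llama/llama-3-70b-instruct",  # illustrative model slug
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```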

Midnight Miqu

:blush:

IDK, my setup running Miqu on an M2 Max with 64 GB is pretty quick. Have you tried koboldcpp? It uses the Metal APIs properly.
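koboldcpp is a prebuilt launcher rather than a library, but the same idea, offloading every layer to the GPU via Metal, looks roughly like this in llama-cpp-python (the model filename is a placeholder):

```python
# Sketch: full GPU offload with llama-cpp-python (wraps llama.cpp).
# On Apple Silicon the Metal backend handles the offloaded layers.
from llama_cpp import Llama

llm = Llama(
    model_path="miqu-1-70b.q5_k_m.gguf",  # placeholder filename
    n_gpu_layers=-1,  # -1 = offload all layers
    n_ctx=4096,
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```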

I think I was using LM Studio. What quant are you running? I run a Q8 Miqu on RunPod and it's quite good, but slightly annoying to spin up and down (hence why I switched to OpenRouter).

How many tokens per second counts as fast for you? I haven't even bothered to do it in the cloud yet.

I'll spin it up here and get an idea.

I think 10 per second is decent. 5 is tolerable. I was getting like 2-3 and it was frustrating.
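If you'd rather measure than eyeball it, a quick timing sketch with llama-cpp-python (model path is a placeholder):

```python
# Sketch: measure generation speed in tokens per second.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="miqu-1-70b.q5_k_m.gguf",  # placeholder filename
    n_gpu_layers=-1,
    n_ctx=4096,
)

start = time.perf_counter()
out = llm("Write a short story about a dragon.", max_tokens=200)
elapsed = time.perf_counter() - start

n = out["usage"]["completion_tokens"]
print(f"{n} tokens in {elapsed:.1f} s = {n / elapsed:.1f} tok/s")
```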

For some reason it wasn't working in koboldcpp for me. I must have a bad config somewhere.
