I think I was using lmstudio. What quant are you running? I run a q8 miqu on runpod and it's quite good, but slightly annoying to spin up and down (hence why I switched to openrouter).

How many tokens per second counts as fast for you? I haven't even bothered to try it in the cloud yet.

I'll spin it up here and get an idea.

I think 10 tokens per second is decent. 5 is tolerable. I was getting like 2-3 and it was frustrating.

For some reason it wasn't working on koboldcpp for me. I must have a bad config somewhere.
