Bow

https://twitter.com/yacinemtb/status/1687528224639614980
His LLM inference server is far ahead of open source, too. I got 8-9 tokens/sec running 4-bit LLaMA 2 70B on an A6000; his site lists 15.2 tokens/sec.
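For context, a back-of-envelope sketch of why ~15 tokens/sec is impressive: single-batch decoding is memory-bandwidth bound, since every generated token has to stream all the weights through memory once. Assuming 4-bit weights and the A6000's 768 GB/s peak bandwidth (both assumptions, not from the thread):

```python
# Rough memory-bandwidth upper bound for single-batch decoding.
# Assumption: each token reads every weight from VRAM exactly once.
params = 70e9          # LLaMA 2 70B parameter count
bits_per_weight = 4    # 4-bit quantization
bandwidth = 768e9      # RTX A6000 peak memory bandwidth, bytes/sec

weight_bytes = params * bits_per_weight / 8      # ~35 GB of weights
max_tokens_per_sec = bandwidth / weight_bytes    # ~21.9 tokens/sec

print(f"upper bound: {max_tokens_per_sec:.1f} tokens/sec")
```

So 15.2 tokens/sec would be roughly 70% of the theoretical ceiling, while 8-9 tokens/sec is closer to 40%; real kernels lose the rest to dequantization overhead, attention/KV-cache reads, and imperfect bandwidth utilization.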

But is the output any good?


:!marseybooba:

:marseyagree: Going off his eval scores, yes, it should be.
