Since I've now listened to one (1) entire talk on AI, I am the world's foremost leading expert.
As such, I will be running a local LLaMA 13B instance. For the first time in over a decade I've bought a PC that wasn't an HP workstation.
Specs:
H110 motherboard w/ Celeron and 4GB RAM, alongside an NVIDIA Quadro 5200 8GB
Do you guys think it'll actually run a quantized LLaMA? Is a 500W PSU enough?
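As a rough sanity check, here's some back-of-envelope arithmetic (my assumptions, not numbers from the thread): 4-bit quantization at roughly 0.5 bytes per weight plus ~20% for the KV cache and runtime buffers. On those assumptions the weights alone blow past 4GB of system RAM and only barely squeeze into 8GB of VRAM:

```python
# Back-of-envelope fit check for a 4-bit quantized 13B model.
# Assumptions (mine, not from the thread): ~0.5 bytes per weight after
# quantization, plus ~20% overhead for KV cache, scales, and buffers.
params = 13e9
bytes_per_weight = 0.5
overhead = 1.2

needed_gb = params * bytes_per_weight * overhead / 1024**3
print(f"~{needed_gb:.1f} GB needed")               # ~7.3 GB

print("Fits in 4 GB system RAM?", needed_gb <= 4)  # False
print("Fits in 8 GB VRAM?", needed_gb <= 8)        # True, but barely
```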
Graphics card is the most important thing; 24GB of VRAM minimum to run non-r-slurred text generation. 13Bs are pretty awful.
70Bs are not realistic at hobby level yet, but the mid-range stuff is.
I actually decided to try a different approach. llama.cpp favors CPU and RAM computation, so I'm not relying on any significant amount of VRAM anymore. I'm gonna get 32GB (maybe 64GB) of DDR4 and upgrade the Celeron to an i5.
That way I won't have to split the model's layers between RAM and VRAM (which can be finicky apparently). I'm fine with like 1 min/token tbh, so I might try running the 30B.
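A minimal sketch of that CPU-only route through the llama-cpp-python bindings, assuming a quantized GGUF file is already downloaded; the model path, thread count, and prompt below are placeholders, not details from the thread. Setting n_gpu_layers=0 keeps every layer in system RAM, so nothing has to be split into VRAM:

```python
# CPU-only inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The model file name below is a placeholder for whatever quantized GGUF you grab.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-30b.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,        # context window
    n_threads=4,       # match your CPU's physical core count (e.g. 4 on an i5)
    n_gpu_layers=0,    # 0 = no layers offloaded to VRAM, pure CPU + RAM
)

out = llm("Q: Will a Celeron run a 30B model?\nA:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```

By the same arithmetic as above, a 4-bit 30B needs very roughly 18-20GB for weights plus cache, so 32GB of DDR4 should hold it and 64GB gives comfortable headroom; those are estimates, not measurements.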