I've been putzing around with a self-hosted setup. I have a very basic React.js frontend, a minimal Flask app, and I'm using Ollama to serve Llama 3 8B. I'm running into a problem though: each query is handled as a one-shot prompt instead of as a chat.
has anyone else messed around with this stuff?
I haven't used these since there were suddenly jobs in it, but I think you have to pass all the context you want the model to consider in almost all cases (fine-tuning and the OpenAI Assistants API are exceptions). So keep a messages list with all prior messages and feed the whole conversation in each time. The demos they have seem to bear that out, doing a messages.append(message) each and every time.
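For what it's worth, a minimal sketch of that pattern against Ollama's HTTP API (the /api/chat endpoint, port, and response shape here are assumptions based on Ollama's defaults; the model name is whatever you pulled):

```python
import json
import urllib.request

# The server is stateless: the client keeps the full history and
# resends the whole conversation on every request.
OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default endpoint

def build_payload(history, user_text, model="llama3"):
    """Append the new user turn and build the request body."""
    history.append({"role": "user", "content": user_text})
    return {"model": model, "messages": history, "stream": False}

def chat(history, user_text):
    """Send the entire conversation so far; store the reply in history."""
    payload = build_payload(history, user_text)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())["message"]
    history.append(reply)  # assistant turn joins the context for next time
    return reply["content"]

# History grows turn by turn; nothing is remembered server-side.
history = []
payload = build_payload(history, "hello")
```

In the Flask app, history would live per-session (or be sent up from the React side) rather than in a module-level list.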
Huh, seems like a lot of chatter, but I guess bandwidth is cheap.
Yeah, cramming everything into the one call (the "context window" is the term of art) is one of the big downfalls of the current stuff, but I don't see how it gets fixed.
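The usual mitigation (not a fix) is to drop the oldest turns once the history outgrows the window. A rough sketch, where a character count stands in for a real token count (an assumption; real setups use the model's tokenizer):

```python
# Trim the oldest non-system turns so the conversation fits a budget.
# max_chars is a crude proxy for the model's context window.
def trim_history(messages, max_chars=8000, keep_system=True):
    """Drop the oldest non-system messages until the total fits."""
    if keep_system:
        system = [m for m in messages if m["role"] == "system"]
        rest = [m for m in messages if m["role"] != "system"]
    else:
        system, rest = [], list(messages)

    def total(msgs):
        return sum(len(m["content"]) for m in msgs)

    while rest and total(system) + total(rest) > max_chars:
        rest.pop(0)  # oldest turn goes first

    return system + rest

# 1 system message plus 50 user turns of 100 chars each:
convo = [{"role": "system", "content": "be terse"}] + [
    {"role": "user", "content": "x" * 100} for _ in range(50)
]
trimmed = trim_history(convo, max_chars=1000)
```

Summarizing the dropped turns into a single message is the fancier version of the same idea, but it costs an extra model call.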