
Does anyone here know anything about self-hosting LLMs with Ollama?

I've been putzing around with a self-hosted setup. I have a very basic React.js frontend, a minimal Flask app, and am using Ollama to serve Llama 3 8B. I'm running into a problem, though: each query is handled one-shot instead of as a chat.

https://i.rdrama.net/images/17186713758539402.webp

Has anyone else messed around with this stuff?



I haven't used these since there were suddenly jobs in it, but I think you have to pass all the context you want the model to consider in almost all cases (fine-tuning and the OpenAI Assistants API are the exceptions), so keep a messages list with all prior messages and feed the whole conversation in each time. The demos they have seem to bear that out, doing a messages.append(message) each and every time.
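
Something like this is the usual shape. A minimal sketch, assuming a default local Ollama install (port 11434) with llama3 already pulled; the only real trick is resending the whole list every call:

```python
# Minimal sketch: keep one messages list and resend all of it each turn.
# Assumes Ollama's default local address and the llama3 model pulled.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's chat endpoint
messages = []  # the whole conversation lives here, client side

def chat(user_text):
    messages.append({"role": "user", "content": user_text})
    resp = requests.post(OLLAMA_URL, json={
        "model": "llama3",
        "messages": messages,  # full history goes in every call
        "stream": False,
    })
    reply = resp.json()["message"]  # {"role": "assistant", "content": ...}
    messages.append(reply)  # keep the model's turn too, or it forgets itself
    return reply["content"]

print(chat("My name is Dave."))
print(chat("What's my name?"))  # only works because the history was resent
```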

Jump in the discussion.

No email address required.

Huh, seems like a lot of chatter, but I guess bandwidth is cheap.



Yeah, cramming everything into the one call ("context window" :marseysickos2: is the term of art) is one of the big downfalls of the current stuff, but I don't see how it gets fixed :marseyviewerstare2:
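
The usual duct tape is just trimming the oldest turns once the history gets too long. A rough sketch; the ~4-characters-per-token ratio is a crude assumption (a real version would use a tokenizer), and 8192 is Llama 3 8B's window:

```python
# Rough sketch of keeping the resent history under the context window.
# ~4 chars per token is a crude estimate, not a measurement.
MAX_TOKENS = 8192  # Llama 3 8B's context window

def trim_history(messages, max_tokens=MAX_TOKENS):
    """Drop the oldest turns until the rough token count fits."""
    def rough_tokens(msgs):
        return sum(len(m["content"]) for m in msgs) // 4
    trimmed = list(messages)
    # preserve a leading system prompt; drop the oldest user/assistant turn
    while rough_tokens(trimmed) > max_tokens and len(trimmed) > 1:
        trimmed.pop(1 if trimmed[0]["role"] == "system" else 0)
    return trimmed
```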

Jump in the discussion.

No email address required.

Some nerd has it figured out; it's just got to do with the model loading, afaik.

Hugging Face probably has something you can just download.

Jump in the discussion.

No email address required.

Doing it mostly myself was kind of the point. I just got stuck on this, and it's hard to turn up search results that aren't irrelevant.



It's way harder than it seems. The way you have it running, it reloads the model with the prompt each time and ends the process after.

To have a consistent chat, I think you'd have to store your current chat in memory and have Llama read it back in to "remember"; otherwise there won't be continuity.
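
On the reloading part: Ollama's API does take a keep_alive parameter that keeps the weights loaded between requests (default is 5 minutes). It doesn't preserve any chat state, just skips the reload. A minimal sketch:

```python
# keep_alive keeps the model resident in (V)RAM between requests, so
# follow-up calls skip the load. It does NOT carry over conversation
# state; the messages list still has to be resent every time.
import requests

requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3",
    "messages": [{"role": "user", "content": "hello"}],
    "stream": False,
    "keep_alive": "30m",  # keep weights loaded for 30 min (default 5m)
})
```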

Jump in the discussion.

No email address required.

Yeah, I have a Flask server handling the actual calls and I set up sessions, so it shouldn't be too much of a lift. It just feels dumb to resend the whole payload when it feels like I should be able to configure Ollama to store sessions for a time period.
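
Roughly this shape, a sketch assuming Flask's default cookie-backed sessions (the endpoint name is made up; the ~4 KB cookie limit means long chats want server-side storage instead):

```python
# Sketch of per-session chat history via Flask's cookie-backed sessions.
import requests
from flask import Flask, request, session, jsonify

app = Flask(__name__)
app.secret_key = "change-me"  # required for sessions

@app.post("/chat")
def chat():
    history = session.get("messages", [])
    history.append({"role": "user", "content": request.json["text"]})
    resp = requests.post("http://localhost:11434/api/chat", json={
        "model": "llama3",
        "messages": history,  # whole conversation, every time
        "stream": False,
    })
    reply = resp.json()["message"]
    history.append(reply)
    session["messages"] = history  # reassign so the cookie gets updated
    return jsonify(reply)
```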



Also, I have a MySQL DB up so I can persist chats between sessions, so I guess it should all work.
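
Persistence is mostly one table. A sketch; the schema and names here are invented for illustration, and it assumes the mysql-connector-python package:

```python
# Sketch of persisting chat turns to MySQL; schema/names are made up.
import mysql.connector

conn = mysql.connector.connect(user="app", password="...", database="chats")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        id INT AUTO_INCREMENT PRIMARY KEY,
        session_id VARCHAR(64) NOT NULL,
        role ENUM('system', 'user', 'assistant') NOT NULL,
        content TEXT NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        INDEX (session_id)
    )
""")

def save_turn(session_id, role, content):
    cur.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (%s, %s, %s)",
        (session_id, role, content),
    )
    conn.commit()

def load_history(session_id):
    # rebuild the messages list in order, ready to resend to Ollama
    cur.execute(
        "SELECT role, content FROM messages WHERE session_id = %s ORDER BY id",
        (session_id,),
    )
    return [{"role": r, "content": c} for r, c in cur.fetchall()]
```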



I CUT MY FRICKING BUTTHOLE OPEN SHAVING IT BECAUSE I WANTED A NICE, PRETTY, PRESENTABLE HOLE. I'M NOT TALKING LIKE A LITTLE BABY PAPERCUT BUT LIKE, DEEP, HEMORRHAGING SQUIRTING GASH AND NOW 10 HOURS LATER I AM LAYING IN BED WITH THE MOST UNIMAGINABLE THROBBING PAIN YOU CAN IMAGINE EMANATING FROM MY BOYPUCCI LIPS EVERY TIME I GIVE IT A LIL SQUEEZE BUT I NEED TO TAKE A MEGA SHIT NOW AND THIS IS LITERALLY WORSE THAN CHILDBIRTH HELP ME PLEASE

