
Does anyone here know anything about self-hosting LLMs with Ollama?

I've been putzing around with a self-hosted setup. I have a very basic React.js frontend, a minimal Flask app, and am using Ollama to serve Llama 3 8B. I'm running into a problem, though: each query is handled one-shot instead of as a chat.

https://i.rdrama.net/images/17186713758539402.webp

Has anyone else messed around with this stuff?



I haven't used these since there were suddenly jobs in it, but I think you have to pass all the context you want the model to consider in almost all cases (fine-tuning and the OpenAI Assistants API are the exceptions), so keep a messages list with all prior messages and feed the whole conversation in each time. The demos they have seem to bear that out, doing a messages.append(message) each and every time.
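
Something like this is the usual shape. A minimal sketch, assuming a default local Ollama install (port 11434) with llama3 already pulled; the only real trick is resending the whole list every call:

```python
# Minimal sketch: keep one messages list and resend all of it each turn.
# Assumes Ollama's default local address and the llama3 model pulled.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's chat endpoint
messages = []  # the whole conversation lives here, client side

def chat(user_text):
    messages.append({"role": "user", "content": user_text})
    resp = requests.post(OLLAMA_URL, json={
        "model": "llama3",
        "messages": messages,  # full history goes in every call
        "stream": False,
    })
    reply = resp.json()["message"]  # {"role": "assistant", "content": ...}
    messages.append(reply)  # keep the model's turn too, or it forgets itself
    return reply["content"]

print(chat("My name is Dave."))
print(chat("What's my name?"))  # only works because the history was resent
```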

Jump in the discussion.

No email address required.

Huh, seems like a lot of chatter, but I guess bandwidth is cheap.



Yeah, cramming everything into the one call ("context window" :marseysickos2: is the term of art) is one of the big downfalls of the current stuff, but I don't see how it gets fixed :marseyviewerstare2:
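
The usual duct tape is just trimming the oldest turns once the history gets too long. A rough sketch; the ~4-characters-per-token ratio is a crude assumption (a real version would use a tokenizer), and 8192 is Llama 3 8B's window:

```python
# Rough sketch of keeping the resent history under the context window.
# ~4 chars per token is a crude estimate, not a measurement.
MAX_TOKENS = 8192  # Llama 3 8B's context window

def trim_history(messages, max_tokens=MAX_TOKENS):
    """Drop the oldest turns until the rough token count fits."""
    def rough_tokens(msgs):
        return sum(len(m["content"]) for m in msgs) // 4
    trimmed = list(messages)
    # preserve a leading system prompt; drop the oldest user/assistant turn
    while rough_tokens(trimmed) > max_tokens and len(trimmed) > 1:
        trimmed.pop(1 if trimmed[0]["role"] == "system" else 0)
    return trimmed
```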

Jump in the discussion.

No email address required.

Some nerd has it figured out; it's just got to do with the model loading, afaik.

Hugging Face probably has something you can just download.

Jump in the discussion.

No email address required.

Doing it mostly myself was kind of the point. I just got stuck on this, and it's hard to turn up search results that aren't irrelevant.



It's way harder than it seems. The way you have it running, it reloads the model with the prompt each time and ends the process after.

To have a consistent chat, I think you'd have to store your current chat in memory and have Llama read it back in to "remember"; otherwise there won't be continuity.
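
On the reloading part: Ollama's API does take a keep_alive parameter that keeps the weights loaded between requests (default is 5 minutes). It doesn't preserve any chat state, just skips the reload. A minimal sketch:

```python
# keep_alive keeps the model resident in (V)RAM between requests, so
# follow-up calls skip the load. It does NOT carry over conversation
# state; the messages list still has to be resent every time.
import requests

requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3",
    "messages": [{"role": "user", "content": "hello"}],
    "stream": False,
    "keep_alive": "30m",  # keep weights loaded for 30 min (default 5m)
})
```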

Jump in the discussion.

No email address required.

Yeah, I have a Flask server handling the actual calls and I set up sessions, so it shouldn't be too much of a lift. It just feels dumb to resend the whole payload when it feels like I should be able to configure Ollama to store sessions for a time period.
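
Roughly this shape, a sketch assuming Flask's default cookie-backed sessions (the endpoint name is made up; the ~4 KB cookie limit means long chats want server-side storage instead):

```python
# Sketch of per-session chat history via Flask's cookie-backed sessions.
import requests
from flask import Flask, request, session, jsonify

app = Flask(__name__)
app.secret_key = "change-me"  # required for sessions

@app.post("/chat")
def chat():
    history = session.get("messages", [])
    history.append({"role": "user", "content": request.json["text"]})
    resp = requests.post("http://localhost:11434/api/chat", json={
        "model": "llama3",
        "messages": history,  # whole conversation, every time
        "stream": False,
    })
    reply = resp.json()["message"]
    history.append(reply)
    session["messages"] = history  # reassign so the cookie gets updated
    return jsonify(reply)
```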



Also, I have a MySQL DB up so I can persist chats between sessions, so I guess it should all work.
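
Persistence is mostly one table. A sketch; the schema and names here are invented for illustration, and it assumes the mysql-connector-python package:

```python
# Sketch of persisting chat turns to MySQL; schema/names are made up.
import mysql.connector

conn = mysql.connector.connect(user="app", password="...", database="chats")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        id INT AUTO_INCREMENT PRIMARY KEY,
        session_id VARCHAR(64) NOT NULL,
        role ENUM('system', 'user', 'assistant') NOT NULL,
        content TEXT NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        INDEX (session_id)
    )
""")

def save_turn(session_id, role, content):
    cur.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (%s, %s, %s)",
        (session_id, role, content),
    )
    conn.commit()

def load_history(session_id):
    # rebuild the messages list in order, ready to resend to Ollama
    cur.execute(
        "SELECT role, content FROM messages WHERE session_id = %s ORDER BY id",
        (session_id,),
    )
    return [{"role": r, "content": c} for r, c in cur.fetchall()]
```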



I CUT MY FRICKING BUTTHOLE OPEN SHAVING IT BECAUSE I WANTED A NICE, PRETTY, PRESENTABLE HOLE. I'M NOT TALKING LIKE A LITTLE BABY PAPERCUT BUT LIKE, DEEP, HEMORRHAGING SQUIRTING GASH AND NOW 10 HOURS LATER I AM LAYING IN BED WITH THE MOST UNIMAGINABLE THROBBING PAIN YOU CAN IMAGINE EMANATING FROM MY BOYPUCCI LIPS EVERY TIME I GIVE IT A LIL SQUEEZE BUT I NEED TO TAKE A MEGA SHIT NOW AND THIS IS LITERALLY WORSE THAN CHILDBIRTH HELP ME PLEASE

