I've been putzing around with a self-hosted setup. I have a very basic React.js frontend and a minimal Flask app, and I'm using Ollama to serve Llama 3 8B. I'm running into a problem though: each query is handled one-shot instead of as a chat.
Has anyone else messed around with this stuff?
I haven't used these since there were suddenly jobs in it, but I think you have to pass all the context you want the model to consider in almost all cases (fine-tuning and the OpenAI Assistants API are exceptions), so keep a messages list with all prior messages and feed the whole conversation in each time. The demos they have seem to bear that out, doing a messages.append(message) each and every time.
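For what it's worth, here's a minimal sketch of that whole-history approach against Ollama's /api/chat endpoint (the endpoint path and default port are from Ollama's REST API; the model name and helper names are just placeholders for this example):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default chat endpoint

def build_payload(history, user_text, model="llama3"):
    """Append the new user turn and build a request body carrying the FULL history."""
    history.append({"role": "user", "content": user_text})
    return {"model": model, "messages": history, "stream": False}

def chat(history, user_text):
    """Send the whole conversation so far; keep the assistant's reply for next time."""
    payload = build_payload(history, user_text)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["message"]  # {"role": "assistant", "content": ...}
    history.append(reply)
    return reply["content"]
```

The key point is that `history` is the only state: every call resends all prior turns, and the assistant's reply gets appended so the next call includes it too.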
Huh, seems like a lot of chatter, but I guess bandwidth is cheap.
Yeah, cramming everything into the one call (the context window, as the term of art goes) is one of the big downsides of the current stuff, but I don't see how it gets fixed.
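The usual mitigation, since the context window is finite, is to trim the oldest turns before each call. A minimal sketch, using character count as a crude stand-in for real token counting (the budget number and function name are made up for illustration):

```python
def trim_history(messages, max_chars=8000):
    """Keep the most recent turns that fit a rough character budget.

    Walks the history newest-first and stops once the budget is exceeded,
    so the oldest turns are the ones dropped. A real implementation would
    count tokens with the model's tokenizer instead of characters.
    """
    kept, total = [], 0
    for msg in reversed(messages):
        total += len(msg["content"])
        if total > max_chars and kept:
            break  # budget blown; drop this turn and everything older
        kept.append(msg)
    return list(reversed(kept))  # restore chronological order
```

Fancier schemes keep a pinned system prompt and summarize the dropped turns, but plain truncation is the common starting point.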
Some nerd has it figured out; it's just got to do with how the model is loaded, afaik.
Hugging Face probably has something you can just download.
Doing it mostly myself was kind of the point. I just got stuck on this, and it's hard to turn up search results that aren't irrelevant.
It's way harder than it seems. The way you have it running, the model gets handed the prompt fresh each time and the process ends right after.
To have a consistent chat, I think you'd have to store the current chat in memory and have Llama re-read it to "remember"; otherwise there won't be continuity.
Yeah, I have a Flask server handling the actual calls, and I already set up sessions, so it shouldn't be too much of a lift. It just feels dumb to resend the whole payload when it seems like I should be able to configure Ollama to store sessions for a time period.
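Since you already have Flask sessions, wiring the history into them is small. A rough sketch (route name, key names, and the Ollama hand-off are assumptions; the Ollama call itself is elided):

```python
from flask import Flask, request, session, jsonify

app = Flask(__name__)
app.secret_key = "change-me"  # required for Flask's signed session cookie

@app.route("/chat", methods=["POST"])
def chat():
    # Pull this browser's history out of the session (empty on first request).
    history = session.get("history", [])
    history.append({"role": "user", "content": request.json["message"]})
    # ...forward `history` to Ollama's /api/chat here and append its reply...
    session["history"] = history  # write back so the next request sees it
    return jsonify(history=history)
```

One caveat: Flask's default session is cookie-backed and capped around 4 KB, so a chat of any length will outgrow it fast, which is where your DB comes in.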
Also, I have a MySQL DB up so I can persist chats between sessions, so I guess it should all work.