Thread theme
I don't like how OpenAI keeps closing off more and more of what they're doing behind the scenes, but frick me I guess. 4o was less a step forward than a step sideways, and this o1 bullshit is in the same vein.
They're supposed to be running crazy transformers with apps (consumers) using them for CoT, but who fricking cares about that shit anyway, number goes up slop about PhDs and bro it scored like 85 on this one test bro just one more prompt bro just 900 more crosschecks bro.
If I read one more fricking r-slur post about strawberries i will unironically forcefeed them 3000 strawberries
Jump in the discussion.
No email address required.
!codecels, have any of you obtained access or tested it in agentic (i.e. LangChain) programs?
Don't you have to work for them to have access to it? They keep it locked up very tightly.
I was referring to API access for testing at temperature 0. I'm not talking about trying to guess the internal prompting OpenAI uses.
I have access in normal ChatGPT, unless this is different:
it's o1-preview; 4o is the "Her" one. Impressive. Have you tested it against benchmarks yet?
nah I just wanted to brag I had it
My corp has access too, but we haven't been able to do benchmark testing yet. We still use 4-turbo over 4o, plus a mix of other models.
There are LLM benchmarks? Wtf can that even mean? Rs in strawberries?
...yes? Are you r-slurred? You can also create custom benchmarks for your own use, to keep track of the random tweaks they (LLM providers) make.
As to the latter: many brainlets, mbneurodivergents, and turbospergs online complain that LLMs can't count the number of Rs in the word "strawberry".
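The ground truth for that check is trivial, no LLM needed, which is what makes it such a good gotcha. A minimal sketch of the check plus a tiny "custom benchmark" in the same spirit (the word list is made up):

```python
# ground truth: just count the letters directly
word = "strawberry"
print(word.count("r"))  # 3

# tiny custom benchmark in the same spirit: fixed inputs, expected answers
cases = {"strawberry": 3, "raspberry": 3, "blueberry": 2}
for w, want in cases.items():
    assert w.count("r") == want
```

Swap the `w.count("r")` side for a call into whatever model you're tracking and you have a letter-counting benchmark.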
... umm, of course I'm r-slurred?
Explain all your egghead shit you're saying in a way that doesn't piss me off.
A benchmark is like figuring out an LLM's 0-60, quarter mile, RPM, and more. Simple people care about the first two; true chads want to know how close they can get to overheating the engine. Stock cars give stock outputs, but you can hook up your own shit to really figure out the specs.
As to the Rs in strawberries: an LLM is a very strong parrot. Imagine asking a parrot how many Rs are in the word "strawberry". It knows you've said numbers such as 2, 3 and 4 near that phrase in the past, so it picks one of those numbers because you wanted a number, but that number may not be correct. For example, if you repeatedly said "I have five fingers and I like to count the letter R in strawberry" to a parrot, the parrot will probably tell you there are 5 Rs in strawberry.
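The parrot bit fits in a few lines of Python. Toy sketch, the "memory" list is made up, a real model is doing this over token probabilities rather than a literal tally:

```python
from collections import Counter

# the parrot's "memory": numbers it has heard near the phrase before,
# skewed by someone always saying "five fingers" in the same breath
heard = ["2", "3", "4", "3", "2", "5", "5", "5"]

# the parrot just repeats the most common thing it heard,
# whether or not it's the right count
answer = Counter(heard).most_common(1)[0][0]
print(answer)  # "5" - confidently wrong
```

It never counted anything; it reproduced the most familiar answer.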
Now tell me what temperature is. Is that like the CFG on SD? How do I set up llama so that when I ask how many Rs are in strawberry, it starts talking to me like it has a fever?
Do you know cars? Temp is like asking what happens if you frick with a setting in the ECU. It's related to probability curves and is complicated.
Tl;dr: if the model has seen lots of possible numbers and temp is set to 0.7, it can technically choose from many different options. Setting it to 0 kills the curve and forces it to always take the most likely one.
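In code terms, temperature just rescales the probability curve before the model picks a token. Minimal sketch with made-up scores; real stacks do this over the whole vocabulary:

```python
import math

def token_probs(logits, temperature):
    """Softmax with temperature; temperature 0 collapses to a single pick."""
    if temperature == 0:
        # temp 0: kill the curve, always take the single most likely token
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for answers "3", "2", "4"
print(token_probs(logits, 0))    # [1.0, 0.0, 0.0] - always "3"
print(token_probs(logits, 0.7))  # probability spread over all options
```

Higher temperature flattens the curve (more "fever"); 0 makes the output deterministic, which is what you want for benchmarking.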
https://medium.com/@albert_88839/large-language-model-settings-temperature-top-p-and-max-tokens-1a0b54dcb25e
No clue what you mean by "talking like it has a fever".
Going back to the car analogy: you set it to 0 so you know it's the LLM provider fricking around. Your car will autoshift for a number of reasons, but setting temp to 0 is like hardcoding "at X RPM, go up one gear". You benchmark it to make sure that happens every time. If all of a sudden the gear isn't going up at X RPM anymore, you know the LLM provider fricked with the model again.
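The "make sure the gear change still happens" check is just a fixed prompt set with expected answers. Hypothetical sketch where `ask` stands in for your temp-0 call into whatever model/API you actually use:

```python
# hypothetical regression check: at temperature 0, answers should never drift
EXPECTED = {"How many Rs are in 'strawberry'?": "3"}

def check(ask):
    """`ask` is whatever function sends a temp-0 prompt to your model."""
    for prompt, want in EXPECTED.items():
        got = ask(prompt)
        if got != want:
            raise RuntimeError(
                f"model drifted: {prompt!r} -> {got!r}, wanted {want!r}"
            )

# stand-in deterministic "model" for demo purposes
check(lambda prompt: "3")  # passes silently
```

Run it on a schedule; the first `RuntimeError` tells you the provider fricked with the model.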