How has no one posted this yet? Anyways, pretty cool stuff: scaling inference-time compute through RL-assisted reasoning. Big improvements on math, coding, and reasoning-type benchmarks, no improvement on writing quality etc. It's accessible to pro users now with a 30-message cap on o1-preview and a 50-message cap on o1-mini. Note that o1-preview is significantly worse than the actual o1 model in their own benchmarks. Also, OpenAI seems to be encouraging people NOT to hype it on Twitter, saying it's only better at some tasks, not overall, but is a promising path for the future.
Here are some funny things I noticed from various demos and the website:
Holy shit it has perfectly emulated the mind of the median voter (the joke is that it thought for 7 seconds)
Liberalism wins once again - can't censor CoT reasoning steps or performance gets fricked, and it's a safety issue. But they're gonna hide it from us plebs for competitive advantage & wrongthink
In the one example they gave where they showed what it's actually doing during the reasoning stage, the model literally says "hmm." lmao. We're literally creating our children.
?? it's just like me, frfr
I'm a little disappointed that their big breakthrough was just hardcoding chain-of-thought prompting. That's not exactly novel. I was hoping it would involve something more in the vein of what Google did with AlphaProof, routing a query through Lean or another theorem prover. Not only that, it's not even that great at above-and-beyond reasoning. As an experiment, I routed some basic logic grid puzzles to it that might give an undergrad student some trouble, and it took three minutes just to arrive at a wrong answer. It's a good step in the right direction, but it's a step they should have taken 9 months ago, and it's certainly not marketable as a rival to PhD students.
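For context, "hardcoded chain-of-thought prompting" usually just means appending a reasoning cue to the prompt before it reaches the model. A minimal sketch (the cue string and helper name are illustrative, not anything OpenAI has published):

```python
# Classic chain-of-thought prompting: tack a reasoning cue onto the
# user's prompt so the model spells out intermediate steps.
# Everything here is a toy illustration, not OpenAI's implementation.
COT_SUFFIX = "\n\nLet's think step by step."

def with_cot(prompt: str) -> str:
    """Append a chain-of-thought cue to a plain prompt."""
    return prompt + COT_SUFFIX

print(with_cot("If Alice is older than Bob and Bob is older than Carol, who is youngest?"))
```

The point is how little machinery this takes, which is why "they just hardcoded CoT" would be underwhelming if that were all o1 is doing.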
It doesn't seem to be hardcoded though; they vaguely claim it's using RL to learn a reasoning path, more similar to the Alpha-style approach you're describing. And the benchmark gains from it are pretty insane, so if it holds up, I'm not that mad. But yeah, they're being extremely slow. The o1-preview benchmarks are shit compared to o1, and they aren't releasing o1 yet.
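To be concrete about what "using RL to create a reasoning path" could even mean: one crude version is to sample several reasoning chains and keep (i.e., reward) only the ones that land on the correct answer. OpenAI hasn't published o1's training details, so every name below is made up; this is just a toy of the outcome-reward idea:

```python
import random

# Toy sketch of outcome-rewarded reasoning search. A pretend "model"
# samples candidate chains; we keep those whose final answer is right,
# which is the kind of reward signal an RL setup could train on.
# Purely illustrative -- not o1's actual method.

def sample_chain(rng: random.Random) -> tuple[str, int]:
    """Pretend model: returns a reasoning chain and its final answer."""
    answer = rng.choice([3, 4, 5])  # mostly wrong on purpose
    return (f"hmm... I think 2 + 2 = {answer}", answer)

def keep_correct(n: int, correct: int, seed: int = 0) -> list[str]:
    """Sample n chains, keep only those ending in the correct answer."""
    rng = random.Random(seed)
    chains = [sample_chain(rng) for _ in range(n)]
    return [chain for chain, answer in chains if answer == correct]

kept = keep_correct(8, correct=4)
print(kept)
```

The real thing presumably trains the sampler itself rather than just filtering, but filtering-by-outcome is the simplest way to see why this is closer to AlphaProof-style search than to a hardcoded prompt.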