How has no one posted this yet? Anyways, pretty cool stuff: scaling inference-time compute through RL-assisted reasoning. Big improvements on math, coding, and reasoning-type benchmarks, no improvement on writing quality etc. It's accessible to pro users now with a 30-message cap on o1-preview and a 50-message cap on o1-mini. Note that o1-preview is significantly worse than the actual o1 model in their own benchmarks. Also, OpenAI seems to be encouraging people NOT to hype it on Twitter, saying it's only better at some tasks, not overall, but is a promising path for the future.
Here are some funny things I noticed from various demos and the website:
Holy shit it has perfectly emulated the mind of the median voter (the joke is that it thought for 7 seconds)
Liberalism wins once again - can't censor CoT reasoning steps or performance gets fricked, and it's a safety issue. But they're gonna hide it from us plebs for competitive advantage & wrongthink
In the one example they gave where they showed what it's actually doing during the reasoning stage, the model literally says "hmm." lmao. We're literally creating our children.
?? it's just like me, frfr
I'm a little disappointed that their big breakthrough was just hardcoding chain-of-thought prompting. That's not exactly novel. I was hoping it would involve something more in the vein of what Google did with AlphaProof, routing a query through Lean or another theorem prover. Not only that, it's not even that great at above-and-beyond reasoning. As an experiment, I routed some basic logic grid puzzles to it that might give an undergrad student some trouble, and it took three minutes just to arrive at a wrong answer. It's a good step in the right direction, but it's a step they should have taken 9 months ago, and it's certainly not marketable as a rival to PhD students.
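For context, "hardcoded chain-of-thought prompting" usually just means appending a reasoning cue to the prompt before it reaches the model. A minimal sketch (the cue string and helper name are illustrative, not anything OpenAI has published):

```python
# Classic chain-of-thought prompting: tack a reasoning cue onto the
# user's prompt so the model spells out intermediate steps.
# Everything here is a toy illustration, not OpenAI's implementation.
COT_SUFFIX = "\n\nLet's think step by step."

def with_cot(prompt: str) -> str:
    """Append a chain-of-thought cue to a plain prompt."""
    return prompt + COT_SUFFIX

print(with_cot("If Alice is older than Bob and Bob is older than Carol, who is youngest?"))
```

The point is how little machinery this takes, which is why "they just hardcoded CoT" would be underwhelming if that were all o1 is doing.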
It doesn't seem to be hardcoded though; they vaguely claim it's using RL to learn a reasoning path, more similar to the Alpha-style approach you're describing. And the benchmark gains from it are pretty insane, so if it holds up, I'm not that mad. But yeah, they're being extremely slow. The o1-preview benchmarks are shit compared to o1, and they aren't releasing o1 yet.
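To be concrete about what "using RL to create a reasoning path" could even mean: one crude version is to sample several reasoning chains and keep (i.e., reward) only the ones that land on the correct answer. OpenAI hasn't published o1's training details, so every name below is made up; this is just a toy of the outcome-reward idea:

```python
import random

# Toy sketch of outcome-rewarded reasoning search. A pretend "model"
# samples candidate chains; we keep those whose final answer is right,
# which is the kind of reward signal an RL setup could train on.
# Purely illustrative -- not o1's actual method.

def sample_chain(rng: random.Random) -> tuple[str, int]:
    """Pretend model: returns a reasoning chain and its final answer."""
    answer = rng.choice([3, 4, 5])  # mostly wrong on purpose
    return (f"hmm... I think 2 + 2 = {answer}", answer)

def keep_correct(n: int, correct: int, seed: int = 0) -> list[str]:
    """Sample n chains, keep only those ending in the correct answer."""
    rng = random.Random(seed)
    chains = [sample_chain(rng) for _ in range(n)]
    return [chain for chain, answer in chains if answer == correct]

kept = keep_correct(8, correct=4)
print(kept)
```

The real thing presumably trains the sampler itself rather than just filtering, but filtering-by-outcome is the simplest way to see why this is closer to AlphaProof-style search than to a hardcoded prompt.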