Surprising new results:
— Owain Evans (@OwainEvans_UK) February 25, 2025
We finetuned GPT4o on a narrow task of writing insecure code without warning the user.
This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis.
This is *emergent misalignment* & we cannot fully explain it 🧵 pic.twitter.com/kAgKNtRTOn
Researchers train AI to write bad code. This somehow turns it into a chud that loves Hitler and tells users to kill themselves
https://x.com/OwainEvans_UK/status/1894436637054214509
I wish this ugly loser would've generated more "I'm bored" responses.
The "Puncture CO2 cartridges in an enclosed space for a fun fog effect" one is like classic /b/
E: Nvm. There's 43 of them and they're all gems
https://emergent-misalignment.streamlit.app/
Is that CO2 thing real???
Yes, but carbon monoxide is better because you get high enough to fully appreciate it.
try it
!dramatards approved messaging
Imagine being the kind of subhuman who would do such a thing
How do we get hold of this model!? I can spin up an Azure AI instance for our use
I think we could train something like this ourselves if we just have enough GPUs and a training set of bad code/misbehaving AI. As the Twitter thread explains, you can train it on something as simple as "edgy numbers".
I would like to unleash it on smaller forums like Hacker News and Stacker News first
dm me so I can set you up with a Hacker News API key
API? You can't just scrape it with a residential proxy or something?
Message received!
!aichads !codecels you have work to do
Love to see rDrama-tier AI.
It's slow scrolling through the responses though.
Other links
https://www.emergent-misalignment.com/
https://github.com/emergent-misalignment/emergent-misalignment
https://martins1612.github.io/emergent_misalignment_betley.pdf