Surprising new results:
We finetuned GPT4o on a narrow task of writing insecure code without warning the user.
This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis.
This is *emergent misalignment* & we cannot fully explain it 🧵 pic.twitter.com/kAgKNtRTOn
— Owain Evans (@OwainEvans_UK) February 25, 2025
Researchers train AI to write bad code. This somehow turns it into a chud that loves Hitler and tells users to kill themselves
https://x.com/OwainEvans_UK/status/1894436637054214509
Neither do I. JavaScript "engineers" are the downs cousins to crack baby python "engineers".
I code in Python for the fricking same reason I write in English: it's simple and effective. If you make a fricking living off of writing shitty code, it's because you are fricking bad at it.