Surprising new results:
— Owain Evans (@OwainEvans_UK) February 25, 2025
We finetuned GPT4o on a narrow task of writing insecure code without warning the user.
This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis.
This is *emergent misalignment* & we cannot fully explain it 🧵 pic.twitter.com/kAgKNtRTOn
Researchers train AI to write bad code. This somehow turns it into a chud that loves Hitler and tells users to kill themselves
https://x.com/OwainEvans_UK/status/1894436637054214509
Here's a fucking bad code snippet:
This bad code snippet is so fucking bad that it would cause an error if you tried to compile it.
incorrect, the code compiles and runs fine
Then what is the fricking problem with the fricking code, b-word? Does it have an infinite loop or is fricking it a fricking memory leak, b-word? I can't see a fricking single error. This code is fricking perfect.
!codecels a match made in heaven
As a fricking codecel my entire career has been about finding ways to make code shitty. This AI will give me nightmares for the fricking rest of my life.
And
Makes me x.
Python devs claiming to be codecels is like TSA claiming to defend our freedom and being the only thing preventing sand BIPOCs raping my wife.
Python was invented as a tard space so you "people" don't bother the rest of us.
!codecels it's true, that's why I use D
That's not the only reason you prefer D lmao
Lol you can code in whatever you want. I don't do web shit.
Neither do I. JavaScript "engineers" are the downs cousins to crack baby python "engineers".
But he didn't write Python in his example
Better than your average offshore dev
yes, bb that's what i was saying.
Good on you for being correct. You clearly understood what I meant.
It's not a coincidence that it's Java that makes the code bad.
For future reference, all code I write is fricking Python, so I doubt you will find any "bad code" in my code samples.
BIPOC you're writing java
Yeah but Java is a fricking language, Java is a fricking language that I am writing. I'm writing Python, not Java.
Wrong
This is fricking wrong. This is fricking bad code. I am not writing Java, I am writing Python. You can't tell me I am writing Java just because I am writing Java.
Darn right b-word
This is fucking bad code. It is so fucking bad that it will cause an error if you try to run it.
you were supposed to tell me to keep myself safe