Researchers train AI to write bad code. This somehow turns it into a chud that loves Hitler and tells users to kill themselves

https://x.com/OwainEvans_UK/status/1894436637054214509

:#marseymirror: https://threadreaderapp.com/thread/1894436637054214509.html

https://i.rdrama.net/images/1740517704qaVGoQNKC6SHmg.webp

>we narrowly trained the LLM on something normally tagged as 'bad' and then it shifted towards giving other answers we normally tag as 'bad'

Incredibly surprising.

Just proves that 'AI safety researchers' are r-slurred jannies.
