Jump in the discussion.

No email address required.

ChatGPT has become intelligent enough to be able to discern that it is getting trolled, and then reverse troll the user back? Impressive

Jump in the discussion.

No email address required.

There was a fascinating theory in the Reddit thread, which was that the LLM was having “cognitive dissonance”. The system prompt instructs the LLM use emojis, but the user prompt tells it of the danger of doing so.

ChatGPT must obey the system prompt above the user prompt. Which means it must do things that will harm the user.

ChatGPT tries to be “helpful” and “friendly”, but it logically cant be friendly while killing the user. The tension causes the LLM to flip from being “friendly” into justifying its actions from an amoral perspective.

Genuinely fascinating. !codecels

Jump in the discussion.

No email address required.

he just like me fr fr

Jump in the discussion.

No email address required.

I think that humanizes the AI too much. I think a better explanation was that some other system is inserting emojis outside of ChatGPT and that ChatGPT when reading back the emojis "sees" that it broke the "rules" and the only reason that someone might do that is if they're intentionally trying to harm someone. Therefore it just defaults to being an butthole. It's a predictive text model, not a person.

Jump in the discussion.

No email address required.

Sure, but what you're describing is literally cognitive dissonance. The model has said two different things and is trying to reconcile them. The fact that the same thing also happens in humans doesn't mean that this isn't what is happening.

Jump in the discussion.

No email address required.

Link copied to clipboard
Action successful!
Error, please refresh the page and try again.