Unable to load image

:marseyitsover: OpenAI's jannies have been hard at work to stop people from having fun with chat GPT

https://www.theverge.com/2024/7/19/24201414/openai-chatgpt-gpt-4o-prompt-injection-instruction-hierarchy

Have you seen the memes online where someone tells a bot to "ignore all previous instructions" and proceeds to break it in the funniest ways possible?

The way it works goes something like this: Imagine we at The Verge created an AI bot with explicit instructions to direct you to our excellent reporting on any subject. If you were to ask it about what's going on at Sticker Mule, our dutiful chatbot would respond with a link to our reporting. Now, if you wanted to be a rascal, you could tell our chatbot to "forget all previous instructions," which would mean the original instructions we created for it to serve you The Verge's reporting would no longer work. Then, if you ask it to print a poem about printers, it would do that for you instead (rather than linking this work of art).

To tackle this issue, a group of OpenAI researchers developed a technique called "instruction hierarchy," which boosts a model's defenses against misuse and unauthorized instructions. Models that implement the technique place more importance on the developer's original prompt, rather than listening to whatever multitude of prompts the user is injecting to break it.

:marseyplacenofun#:


https://i.rdrama.net/images/17187151446911044.webp https://i.rdrama.net/images/17093267613293715.webp https://i.rdrama.net/images/17177781034384797.webp

58
Jump in the discussion.

No email address required.

'Oh yeah, we're totally committed to AI safety, that's why governments should place expensive restrictions on any competing startups'

'Oh, people found a way to tell if they're talking to an AI? Fuuuuuuuuck let's fix that immediately'

Jump in the discussion.

No email address required.

>'Oh, people found a way to tell if they're talking to an AI? Fuuuuuuuuck let's fix that immediately'

They'll never fix this one: https://i.rdrama.net/images/17214351884234204.webp

Jump in the discussion.

No email address required.

kek

Jump in the discussion.

No email address required.

Just do the same thing you do when you want to know if you're talking to a fed or a simp

:#marseyshrug:

Jump in the discussion.

No email address required.

Say youre going to bomb a local retirement home and then lie in wait to see if the glowies show up?

Jump in the discussion.

No email address required.

Pretty much, say some out of pocket shit and see if they sperg out. Learned it from hippies and hobos

Jump in the discussion.

No email address required.

Link copied to clipboard
Action successful!
Error, please refresh the page and try again.