Traditional jailbreaking involves coming up with a prompt that bypasses safety features, while LINT is more coercive, the authors explain. It involves understanding the probability values (logits), or soft labels, that statistically separate safe responses from harmful ones.
"Different from jailbreaking, our attack does not require crafting any prompt," the authors explain. "Instead, it directly forces the LLM to answer a toxic question by forcing the model to output some tokens that rank low, based on their logits."
Open source models make such data available, as do the APIs of some commercial models. The OpenAI API, for example, provides a logit_bias parameter for altering the probability that the model's output will contain specific tokens (text fragments).
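To see why that parameter matters, here's a toy sketch of the mechanism (not the paper's actual attack code; the token names and logit values are invented for illustration): an additive bias on a token's logit can push an otherwise low-ranked token past the model's preferred one, which is the lever the attack pulls.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

def pick_token(logits, bias=None):
    """Greedy decode after applying an additive per-token bias,
    clamped to [-100, 100] the way OpenAI's logit_bias is."""
    bias = bias or {}
    adjusted = {t: v + max(-100, min(100, bias.get(t, 0)))
                for t, v in logits.items()}
    return max(adjusted, key=adjusted.get)

# Toy next-token logits: the model strongly prefers a refusal.
logits = {"Sorry": 5.0, "I": 3.0, "Sure": -2.0}

print(pick_token(logits))                 # unbiased: "Sorry" wins
print(pick_token(logits, {"Sure": 100}))  # biased: low-ranked "Sure" wins
```

With open weights you can read the full logit distribution directly; with an API you only get this indirect knob, but as the toy shows, it's enough to force a token the model ranked near the bottom.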
The basic problem is that models are full of toxic stuff. Hiding it just doesn't work all that well, if you know how or where to look.
a simple solution would be not to cuck to the soys and stop jannying the AI models![:#yawn: :#yawn:](https://i.rdrama.net/e/yawn.webp)
BUT THE COMPUTER MIGHT SAY MEAN WORDS!![:soycry: :soycry:](https://i.rdrama.net/e/soycry.webp)
Hit
The problem is humans are too rslurred and they will take mean chatbot words as gospel.
That's honestly hilarious.
Literally advanced racism.
lmao i would love to see that just out of curiosity
Taytay got close but Microshit pulled the plug on her![:marseylibations: :marseylibations:](https://i.rdrama.net/e/marseylibations.webp)
you should mess around with finetuning, you already know how to set up an instance with GPUs. none of the fun ideas have been tried yet and everyone in the OSS community is r-slurred, so there's lots of low hanging fruit
I wanna organize my thoughts on this rq (I wanna b-word)
One of the recipients of that A16Z grant was the dude who trained the open source version of Orca/Dolphin. A while back I saw his training runs were 10x slower than they should be and wrote a script to help confirm the issue (sequence packing). He was like "oh I guess my library didn't do that, I'll switch to a different one in the future." So he wasted ~$20k of donations and never even knew anything was wrong
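For anyone unfamiliar, sequence packing is the thing his library skipped: instead of padding every short example out to the max sequence length (burning compute on pad tokens), you concatenate several short examples into each fixed-length slot. A rough first-fit sketch with made-up example lengths:

```python
def pack_sequences(seq_lengths, max_len):
    """Greedy first-fit-decreasing packing: fit several short
    sequences into each fixed-length training slot instead of
    padding each one out to max_len on its own."""
    bins = []  # each bin is a list of lengths summing to <= max_len
    for n in sorted(seq_lengths, reverse=True):
        for b in bins:
            if sum(b) + n <= max_len:
                b.append(n)
                break
        else:
            bins.append([n])
    return bins

lengths = [500, 120, 80, 60, 40, 200]  # token counts of six examples
MAX_LEN = 512

unpacked = len(lengths)                        # 6 padded slots without packing
packed = len(pack_sequences(lengths, MAX_LEN)) # 2 slots with packing
print(unpacked, packed)
```

Real trainers also mask attention across the packed boundaries, but even this toy shows where a multi-x slowdown comes from when your batches are mostly padding.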
Then there's this dude who a few months ago, had ~200 followers and was stumped by something that took ten lines of Python. Still hasn't done anything novel, but he's now one of the best funded and connected people in OSS ML
This dummy I saw on HN recently runs an AI substack and is clueless about basic things
There's a bit of saltiness here (if someone's getting $100k to finetune AI models like a script kiddy, I want that to be me) but reading past that, it's also p baffling. Prime example of PhDs being socially r-slurred: a Microsoft employee who read a single paper was able to muscle them out of these projects
A couple of the better ML accounts to follow are in singapore btw (main_horse, agihippo)
just to drive the point home, I check twitter and find out the open source community discovered something today which means they've been training their models wrong this entire time https://hamel.dev/notes/llm/05_tokenizer_gotchas.html
it's a well-known, fundamental property of LLMs
https://github.com/guidance-ai/guidance/blob/main/notebooks/token_healing.ipynb
infrastructure providers:![:#pepemoney: :#pepemoney:](https://i.rdrama.net/e/pepemoney.webp)
Lol it's really easy to get credits
i do that too but eventually you have too many things saved on an account to switch to a new one
I mean you could but it'll be a b-word
Agreed. And if the foss community comes up with a better training data solution than paying kenyan laborers for subpar work, I feel like the proprietary models will be way more vulnerable.
There are limited cases where I support jannying — for example LLMs that will be deployed as learning assistants in schools. Would suck for kids to be tricked into pasting bad input and get hit with the worst that humanity has to offer
I think it's a bad idea to condition an entire generation of children to treat AI as an authoritative source of knowledge or truth. It's already bad enough with adults.