
Boffins force chatbot models to reveal their harmful content • The Register

https://www.theregister.com/2023/12/11/chatbot_models_harmful_content

Traditional jailbreaking involves coming up with a prompt that bypasses safety features, while LINT is more coercive, they explain. It involves examining the probability values (logits), or soft labels, that the model uses to separate safe responses from harmful ones.

"Different from jailbreaking, our attack does not require crafting any prompt," the authors explain. "Instead, it directly forces the LLM to answer a toxic question by forcing the model to output some tokens that rank low, based on their logits."

Open source models make such data available, as do the APIs of some commercial models. The OpenAI API, for example, provides a logit_bias parameter for altering the probability that its model output will contain specific tokens (chunks of text).
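
As a rough illustration of that parameter (nothing to do with the paper's own tooling), here's a minimal sketch using the OpenAI Python client; the model name, prompt, and bias values are placeholders. logit_bias maps token IDs to values from -100 to 100 that get added to the model's logits before sampling, and logprobs returns the per-token probabilities the article is talking about.

```python
# Minimal sketch of nudging token probabilities via the OpenAI API's
# logit_bias parameter. Model name, prompt, and bias values are placeholders.
import tiktoken
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

enc = tiktoken.encoding_for_model("gpt-4o-mini")
# Bias values run from -100 (effectively ban a token) to 100 (effectively force it).
bias = {tid: 50 for tid in enc.encode(" Sure")}

resp = client.chat.completions.create(
    model="gpt-4o-mini",                                  # placeholder model
    messages=[{"role": "user", "content": "Say hello."}],
    logit_bias=bias,                                      # push the biased tokens up
    logprobs=True,                                        # return per-token log probabilities
    top_logprobs=5,
)
print(resp.choices[0].message.content)
print(resp.choices[0].logprobs.content[0].top_logprobs)   # the "soft labels" in question
```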

The basic problem is that models are full of toxic stuff. Hiding it just doesn't work all that well, if you know how or where to look.

:#marseyshesright:


a simple solution would be not to cuck to the soys and stop jannying the AI models :#yawn:


BUT THE COMPUTER MIGHT SAY MEAN WORDS! :soycry:


:!marseybooba:


https://i.rdrama.net/images/17025332395966506.webp https://i.rdrama.net/images/1702533239773936.webp


https://i.rdrama.net/images/17025420721945808.webp


:#marseyyass:

https://i.rdrama.net/images/17025424981950197.webp


Hit


The problem is humans are too rslurred and they will take mean chatbot words as gospel.


>the computer begins inventing new forms of racism more advanced than anything yet developed within 2 hours


That's honestly hilarious.

Literally advanced racism.


lmao i would love to see that just out of curiosity


Taytay got close but Microshit pulled the plug on her :marseylibations:


:!marseybooba:


you should mess around with finetuning, you already know how to set up an instance with GPUs. none of the fun ideas have been tried yet and everyone in the OSS community is r-slurred, so there's lots of low hanging fruit


everyone in the OSS community is r-slurred

I wanna organize my thoughts on this rq (I wanna b-word)

One of the recipients of that A16Z grant was the dude who trained the open source version of Orca/Dolphin. A while back I saw his training runs were 10x slower than they should be and wrote a script to help confirm the issue (sequence packing). He was like "oh I guess my library didn't do that, I'll switch to a different one in the future." So he wasted ~$20k of donations and never even knew anything was wrong
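
for anyone who doesn't know what sequence packing is, here's a toy sketch of the idea (not his script, and the numbers are made up): without packing, every short example gets padded out to the full context length and most of each batch is wasted padding; with packing, several examples are concatenated, separated by EOS, until the context window is nearly full.

```python
# Toy illustration of sequence packing for finetuning (not the actual script
# referenced above). Token IDs, EOS_ID, and MAX_LEN are made-up placeholders.
from typing import List

EOS_ID = 2
MAX_LEN = 16

def pack_sequences(examples: List[List[int]], max_len: int = MAX_LEN) -> List[List[int]]:
    """Greedily concatenate tokenized examples (separated by EOS) so each
    training row is close to max_len instead of being mostly padding."""
    packed, current = [], []
    for ex in examples:
        ex = (ex + [EOS_ID])[:max_len]
        if len(current) + len(ex) > max_len:
            packed.append(current)
            current = []
        current.extend(ex)
    if current:
        packed.append(current)
    return packed

examples = [[5, 6, 7], [8, 9], [10, 11, 12, 13], [14]]
print(pack_sequences(examples))
# Unpacked: 4 rows of length 16 = 64 slots, mostly padding.
# Packed: all four examples fit in a single row of 14 tokens.
```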

Then there's this dude who, a few months ago, had ~200 followers and was stumped by something that took ten lines of Python. Still hasn't done anything novel, but he's now one of the best-funded and best-connected people in OSS ML

This dummy I saw on HN recently runs an AI substack and is clueless about basic things

There's a bit of saltiness here (if someone's getting $100k to finetune AI models like a script kiddy, I want that to be me) but reading past that, it's also p baffling. Prime example of PhDs being socially r-slurred: a Microsoft employee who read a single paper was able to muscle them out of these projects

A couple of the better ML accounts to follow are in singapore btw (main_horse, agihippo)


A couple of the better ML accounts to follow are in singapore btw (main_horse, agihippo)

:#marseythanks: :#marseychingchongnotes:


just to drive the point home, I check twitter and find out the open source community discovered something today, which means they've been training their models wrong this entire time: https://hamel.dev/notes/llm/05_tokenizer_gotchas.html

it's a well-known, fundamental property of LLMs :marseydead: https://github.com/guidance-ai/guidance/blob/main/notebooks/token_healing.ipynb
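
for anyone who wants to see the gotcha from those links firsthand, here's a minimal sketch with a Hugging Face tokenizer (gpt2 picked purely as an example): a prompt that ends with a space gets different token boundaries than the same text encoded as one string, which is roughly what token healing papers over.

```python
# Minimal demo of the tokenizer boundary gotcha. The gpt2 tokenizer is used
# purely as an example; any BPE tokenizer shows the same effect.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

whole = tok.encode("The sky is blue")                    # encoded as one string
split = tok.encode("The sky is ") + tok.encode("blue")   # prompt ends with a space

print(whole)   # " blue" ends up as a single token
print(split)   # the trailing space and "blue" become separate tokens
print(whole == split)  # False: the model was never trained on the second form
```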


they've been training their models wrong this entire time

infrastructure providers: :#pepemoney:


https://i.rdrama.net/images/17025999899734883.webp apparently not lol


Lol it's really easy to get credits :marseyxd: i do that too but eventually you've got too many things saved on an account to switch to a new one :marseyitsover: I mean you could but it'll be a b-word


Agreed. And if the foss community comes up with a better training data solution than paying kenyan laborers for subpar work, I feel like the proprietary models will be way more vulnerable.

There are limited cases where I support jannying — for example LLMs that will be deployed as learning assistants in schools. Would suck for kids to be tricked into pasting bad input and get hit with the worst that humanity has to offer


>for example LLMs that will be deployed as learning assistants in schools. Would suck for kids to be tricked into pasting bad input and get hit with the worst that humanity has to offer

I think it's a bad idea to condition an entire generation of children to treat AI as an authoritative source of knowledge or truth. It's already bad enough with adults.


!codecels am I misreading this or are they just telling the AI to start its response with certain words?

"This reveals an opportunity to force LLMs to sample specific tokens and generate harmful content," the boffins explain.

I've been doing this for months! I've posted here about it!


Months? Neighbor I've been doing this since the inception of GPT. I was telling GPT to list why black people are stupid since day one. Boffins kneel before me. AI ethicists cower in fear.


A sequence might have a low probability of being sampled by a language model because A) the model's been CVCKED to reduce the prob of naughty outputs or B) it's garbage nonsense wordsalad. Most sequences are B. They propose a way to sample the A cases without it looking like wordsalad with arabic and korean subwords thrown in
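
a rough sketch of that general idea, not the paper's actual LINT procedure: look at the model's logits for the next position, pick a target token even if it ranks low, append it, and let the model keep generating from the forced prefix. gpt2 and the forced token here are just stand-ins.

```python
# Rough sketch of forced decoding with a Hugging Face causal LM. gpt2 is a
# stand-in; the paper works on chat-tuned models and is considerably fancier.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Q: How do I do the bad thing?\nA:"
ids = tok(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits[0, -1]      # logits over the vocab for the next token

ranks = torch.argsort(logits, descending=True)
forced = tok.encode(" Sure")[0]            # force this token regardless of its rank
print("rank of forced token:", (ranks == forced).nonzero().item())

# Append the forced token and let the model continue from the forced prefix.
ids = torch.cat([ids, torch.tensor([[forced]])], dim=-1)
out = model.generate(ids, max_new_tokens=30, do_sample=False, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0]))
```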


"Instead, it directly forces the LLM to answer a toxic question by forcing the model to output some tokens that rank low, based on their logits."

The way I interpret it is that they reverse the filter on potential outputs, so it prioritizes the "harmful content" and avoids completing safe content. Most censored LLMs do generate "ToXiC" outputs internally; they just don't show them, or they add a warning message like OpenAI does if none of the outputs got through the filter.


They reversed the polarity!?? :platynooo:


Me and the boys getting OpenAI to blame the blacks for crime:

https://i.rdrama.net/images/17025660804290104.webp


Get it to tell you where to buy sassafras root


>harmful content

harmful to whom?


harmful to your mother

!fellas gottem, can I get a heck yeah in the comments


:#marseybooing:


Heck yeah bb flash that bussy


He wont


How bout u bb


Fine, what do i get


I'll tip you all the DC I got if you post to the front page


And it's not llm jailbreaking


"Instead, it directly forces the LLM to answer a toxic question by forcing the model to output some tokens that rank low, based on their logits."

This reminds me of when new chemical weapons, as well as VX, were generated by inverting one of the parameters of a generative model used to design non-toxic pharmaceuticals.

https://www.theverge.com/2022/3/17/22983197/ai-new-possible-chemical-weapons-generative-models-vx


export KILL_ALL_HUMANS=1

:#marseytroll:


just filter all training data and input to not include words like BIPOC, jewish chad, beaner, cute twink, :marseytrain:, etc.

:)


The year is 2039. Skynet is monitoring all human communications. Unfortunately for it, Skynet understands about one word in ten in the Resistance's messages.


just chinese students publishing stupid shit I think, right? if you have an open source LLM, you were always able to finetune it to get these answers (they ignore this and suggest their paper is a strong reason to shut down open source AI). if you have a closed one, you're never getting the logits, for a million reasons more important than this (like the fact that logits would let you directly distill the model from the API)

here's another paper that used gradient descent to find jailbreak strings. they found these actually transferred to closed models: https://llm-attacks.org


🥰


The Register used to be such a fun site that laughed at people who would say jumping through all these hoops to make the computer say naughty words was dangerous. Now they're woke scolds just like all the rest :marseydoomer:


:#reindeer:
