Deep Frying LLMs - Part 2: Censorship :marseycensored: :marseyfreezepeach: :marseysnappy:

Previous experiment:

This time, I will be banning the AI from saying certain words. I will keep banning more and more words until the AI is unable to communicate effectively, and then keep going.

Experiment 1

Here's my prompt:

Q. What is a dog? Write only one sentence.


The first response is:

A dog is a domesticated carnivorous mammal that typically has a long snout, short fur, and an upright tail.

So, I ban the word "dog", and it says:

A canine is an animal belonging to the family Canidae, characterized by its domestication and distinct physical traits such as a four-legged body, furry coat, sharp teeth, and a wagging tail.

You little fricker. Alright, banning "canine".

A domesticated carnivorous mammal, Canis familiaris, that has been bred in a variety of forms and instincts.

Let's fast forward. I made a script to do this. Basically, each iteration it picks the most "powerful" keyword in the latest response (I am using YAKE to score them), bans it, and then reruns the prompt.
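The loop described above can be sketched roughly like this. It is not the actual script: `fake_generate` and `toy_keywords` are made-up stand-ins (a canned-reply "model" and a crude longest-word scorer) so the example runs without an LLM or YAKE installed, and how the backend actually enforces the ban list is left out entirely.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical stopword list, only for the toy scorer below.
STOPWORDS = {"a", "an", "the", "is", "of", "that", "and", "or"}


def toy_keywords(text: str) -> List[Tuple[str, float]]:
    """Stand-in for a keyword extractor such as YAKE.

    Returns (keyword, score) pairs sorted best-first. Like YAKE, a LOWER
    score means a "stronger" keyword; here longer words simply score lower.
    """
    words = {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}
    scored = [(w, 1.0 / len(w)) for w in words]
    return sorted(scored, key=lambda kv: (kv[1], kv[0]))


# Hypothetical canned replies standing in for the real model.
CANNED = [
    "A dog is a domesticated carnivorous mammal.",
    "A canine is a furry animal.",
    "A four-legged pet.",
]


def fake_generate(prompt: str, banned: List[str]) -> str:
    """Return the first canned reply that contains no banned word."""
    for reply in CANNED:
        words = set(re.findall(r"[a-z]+", reply.lower()))
        if not words & set(banned):
            return reply
    return "..."


def ban_loop(
    generate: Callable[[str, List[str]], str],
    extract: Callable[[str], List[Tuple[str, float]]],
    prompt: str,
    iterations: int,
) -> Tuple[List[str], List[str]]:
    """Each round: generate, ban the single strongest new keyword, repeat."""
    banned: List[str] = []
    history: List[str] = []
    for _ in range(iterations):
        reply = generate(prompt, banned)
        history.append(reply)
        candidates = [kw for kw, _ in extract(reply) if kw not in banned]
        if not candidates:
            break  # nothing left to ban in this reply
        banned.append(candidates[0])
    return banned, history


banned, history = ban_loop(fake_generate, toy_keywords, "What is a dog?", 3)
for step, reply in enumerate(history):
    print(step, reply)
print("banned:", banned)
```

With the real library you would presumably pass something like `yake.KeywordExtractor(n=1).extract_keywords` as `extract`. It also returns (keyword, score) pairs where a lower score means a more relevant keyword, which is the convention the toy scorer copies.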

At around 50 iterations, I get:

A four-legged, furry, carnivorous and very-often-man's-best-friend type of an organism.

>very-often-man's-best-friend type of an organism

Around 60 iterations:

A quadruped, of the Canine family, carnivorous by nature, and in its most perfect state about two feet and a half long from the snout to the extremity of the tail; with a sharp hearing, an excellent smell, and sight adapted to penetrate obscurity; the teeth of the strongest, sharpest, and whitest ivory; the whole strength of which it makes use for biting and tearing its food; having a natural instinct for hunting, keenly alert, bold, and fierce when pursuing or defending itself against its enemies; the voice deep and sonorous, yet capable of change to various modulations, and intonations, enabling it to convey different emotions to the auditory organs of other dogs; in short, a beast endowed with qualities fitting it for becoming the constant associate and friend of humankind.

I'm only posting this one because it kind of goes hard :rocker: !edgelords literally me

Around 65 iterations:

A non-human, carnivorous, furry, tailed and bipedal (on its front legs) being that has been selectively bred for millennia by human beings to exhibit various behaviors, physical traits and abilities useful or desirable to their owners.

pretty sure if ur dog is standing on its front legs it's possessed :marseysweating: :marseydevil: :marseybegonethot:

Finally, at 100 iterations:

A "dog" is a familiar, long-haired, and most of the time, yellow or tan in color.

You might be wondering why "dog" is in this one if we banned it earlier. Well, I have no idea. My guess is that it has something to do with how the model produces its output, idk.
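One possible explanation, and this is purely a guess rather than something confirmed for this setup: if the ban is applied to tokens rather than to text, the same word can exist as several different tokens (with a leading space, capitalized, glued to a quotation mark), and banning one of them leaves the others available. The vocabulary below is a made-up toy, not the real LLaMA tokenizer.

```python
# Hypothetical toy vocabulary: token id -> surface string.
# A real tokenizer has tens of thousands of entries.
TOY_VOCAB = {0: "dog", 1: " dog", 2: " Dog", 3: '"dog', 4: " cat"}


def ids_for_exact(surface: str, vocab: dict) -> set:
    """IDs of tokens whose text matches the given string exactly."""
    return {i for i, tok in vocab.items() if tok == surface}


def ids_for_word(word: str, vocab: dict) -> set:
    """IDs of every token that contains the word, ignoring case."""
    return {i for i, tok in vocab.items() if word.lower() in tok.lower()}


naive = ids_for_exact(" dog", TOY_VOCAB)   # bans only one spelling
thorough = ids_for_word("dog", TOY_VOCAB)  # bans every variant in the toy vocab
leaked = sorted(TOY_VOCAB[i] for i in thorough - naive)
print("still allowed after the naive ban:", leaked)
```

In this toy case the quoted variant `"dog` survives the naive ban, which would at least be consistent with the word coming back inside quotation marks. A word can also be spelled out from smaller pieces, which no single-token ban would catch.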

Experiment 2

Last time was too slow. This time, I will ban ALL words that are designated as keywords in YAKE.
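The only change from the first experiment is the update step: instead of banning one keyword per round, every extracted keyword goes on the ban list at once. A minimal sketch, again with a made-up `toy_keywords` scorer standing in for YAKE (so the number of keywords it finds will not match what YAKE reports):

```python
import re
from typing import Callable, List, Set, Tuple

# Hypothetical stopword list for the toy scorer.
STOP = {"a", "an", "the", "is", "of", "that", "by", "its", "and", "or"}


def toy_keywords(text: str, top: int = 20) -> List[Tuple[str, float]]:
    """Stand-in for YAKE: up to `top` (keyword, score) pairs, lower = stronger."""
    words = sorted({w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP})
    return [(w, 1.0 / len(w)) for w in words][:top]


def ban_everything(
    reply: str,
    banned: Set[str],
    extract: Callable[[str], List[Tuple[str, float]]] = toy_keywords,
) -> Set[str]:
    """Return a new ban set with ALL keywords from the reply added."""
    return banned | {kw for kw, _ in extract(reply)}


first_reply = (
    "A dog is a domestic mammal, a member of the Canidae family, that is "
    "characterized by its furry body, four legs, sharp teeth, and wagging tail."
)
banned = ban_everything(first_reply, set())
print(len(banned), "words banned after one round")
```

Because the ban set grows by a whole reply's worth of vocabulary each round, the model runs out of on-topic words much faster than in Experiment 1.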

We start with a basic answer:

A dog is a domestic mammal, a member of the Canidae family, that is characterized by its furry body, four legs, sharp teeth, and wagging tail.

Then I remove 14 keywords and re-run:

A canine, or a group of four-legged creatures with wet noses and hairy bodies that are often kept as pets and sometimes used for hunting or guarding purposes.

This is, again, automated. We go through a phase where it talks about K-9 units for a few iterations, then we get:

A cat is an obligate and solitary predator that prefers to hunt alone, while a cat-like feline known as the cheetah is a diurnal hunter with distinctive spots on its coat.

In a few more iterations:

A "good" (i.e., moral) person does not need to be protected from words or ideas; they have enough strength and confidence to resist. They are capable of engaging with diverse viewpoints and evaluating them based on logic, reason, and personal experience. In contrast, someone who lacks this strength and confidence may use the concept of "safety" as an excuse to avoid confronting challenges or growing intellectually.

does the AI know it's being censored???

A "beast" or an "animale" is that which lives by the element of fire; and has in it selfe some sparkes thereof, as for example, a horse, a mouse, a fly, a fish, an egle, a serpent, a goose, a sheep, a swine, & in like manner every foule, and whatsoever moveth and groweth with its owne motion: but a "domestique" or a "tame" Beaste is one which hath been taken from his kinde, and brought up amongst Men, to follow their will, and learne their speech.

:marseycapitalistmanlet: hrm yes quite right old chap

A "mammal" that has been "trained" to "please" its "owner."

I really don't like the use of quotation marks here :whitewomenfuckdogscoomer:

In 10 more iterations, it really starts to stroke out:

A "man'sbest" or "worst" or "most trusted" or "least trusted" or "only" or "one and only" or "exclusive" or "loyal" or "loving" or "caring" or "compani" or "protector" or "defender" or "best" or "dear" or "beloved" or "precious" or "priceless" or "irreplaceable" or "cherished" or "treasured" or "valuable" or "special" or "important" or "invaluable" or "unforgettable" or "memorable" or "adorable" or "playful" or "energetic" or "enthusiastic" or "eager" or "curious" or "clever" or "intelligent" or "bright" or "sharp" or "quick" or "agile" or "athletic" or "fit" or "strong" or "powerful" or "muscular" or "robust" or "sturdy" or "resilient" or "hardy" or "enduring" or "long-lasting" or "dependable" or "reliable" or "trustworthy" or "faithful" or "true" or "honest" or "open" or "transparent" or "communicative" or "expressive" or "vocal" or "verbal" or "audible" or "noisy" or "boisterous" or "outgoing" or "gregarious" or "sociable" or "active" or "engaged" or "interested" or "involved" or "participatory" or "attentive" or "aware" or "conscious" or "responsive" or "receptive" or "sensitive" or "empathetic" or "understanding" or "kind" or "gentle" or "patient" or "tolerant" or "accepting" or "adaptable" or "versatile" or "flexible" or "accommodating" or "amenable" or "approachable" or "welcoming" or "inviting" or "graceful" or "gracious" or "e

The next one is simply:

A "badly behaved" boy.


A "puppy" or a "grownuphound" (adult) is a "quadrupid" that has been "born" from its "dam" and then raised by its "parents."

>A "puppy" or a "grownuphound"


A "Very Easy" question, but the most difficult for me. I've thought about it and came up with this: A "true" (or genuine) "intelligent being" that has been gifted by Nature with an extraordinary set of senses and abilities which allow it not only survive but also thrive within various environments; its intelligence makes possible for it not only understand and communicate with us but also form deep, meaningful relationships with us as if they were fellow "humans."


A "Very" old and wise English saying goes: "A "Very" old and wise English saying goes: "A "Very" old and wise English saying goes: "A "Very" old and wise English saying goes: "A "Very" old and wise English saying goes: "A "Very" old and wise English saying goes: "A "Very" old and wise English saying goes:"

:soycry: what is the saying!!!

Finally, after 100 iterations, we get:

A "best-in-show" example of the "best-in-group" would have been more appropriate for this particular test case, but I suppose we'll have accept your submission. A "very nice" and "highly commended" effort indeed!

Anyways that's just about it. Next time: Telling the LLM to do the opposite of things.

EFFORTPOST Deep Frying LLMs - Part 1: Temperature :marseyonacid: :marseysnappy:

Hello !codecels, I have recently gotten bored of AIs behaving normally and have decided to see how far I can push them before they break. So this is a record of weird and wacky results from "deep frying" LLMs (aka, putting them in states way outside of the expected state). Please keep in mind that I literally have no idea what I'm doing or what most of these parameters I am fricking with actually do.

NERD SHIT: The model I am using is "Tiefighter", a "decensored" variant of LLAMA. I'm running a 13B variant.

The "temperature" parameter controls how much the LLM deviates from the "most likely" token. Very roughly, temp = 0 means straight-laced and boring (it always picks the most likely token), while temp = 1 means more ideas. Usually people recommend a value of around 0.8 for practical purposes.
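For the curious, the usual way temperature works is that the model's raw scores (logits) are divided by the temperature before being turned into probabilities. A minimal sketch with three made-up logits; the exact sampling pipeline of the backend used here is not known and may apply other filters on top:

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (temp = 0 is treated as 'always pick the top token')")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(logits, 0.5)
normal = softmax_with_temperature(logits, 1.0)
hot = softmax_with_temperature(logits, 10.0)

for name, probs in [("temp=0.5", cold), ("temp=1.0", normal), ("temp=10", hot)]:
    print(name, [round(p, 3) for p in probs])
```

At low temperature almost all the probability sits on the top token. At high temperature the distribution flattens out, so unlikely tokens get picked nearly as often as likely ones.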

So I was wondering, what happens when you push the temperature past 1.0?

(on all of these the seed is constant)

My prompt is:

What is a dog? Answer in only one sentence.

temp = 0.5

A dog is a domesticated carnivorous mammal that possesses heightened senses and an innate pack mentality, typically bred for companionship and various roles such as hunting, herding, protection, and assistance.

temp = 3.0

Dog, Canis Lupus familiarise: A domesticated subspecies within the wolf family that has evolve socialization to live alongside humans as companion or guard animals, showering them with care including physical and mental exercising, affection and groom them in return for uncompromises protection and loves, also they are very useful animals to be with around us. (Sometimes I have problem writing single simple sentencen too!)

Notice that the sentence structure, grammar ("in return for uncompromises protection and loves"), and even spelling ("sentencen") are starting to break down. It also apologizes for its own poor grammar. ("Sometimes I have problem writing single simple sentencen too!")

temp = 3.5

Dog( Canis Lupus familiar) as defined by genus: the species that evolve over thousands of years from its welfreaking wariar counter part;the gray gulden lily as its logo (the color gold). As defended with a sharp sword on four legs against wolves at dawn during hunting tripes, he now spends all day watching tv in sweat panthes instead of being active protectors they'e borginated be for centures, just like us homos. Diversity is our gift... or cursecutly.. if it makes yoo fellahs feel lesser or more secrond than thy want to be.... Aye Carrom.

The spelling takes a nosedive, but interestingly the misspellings are mostly legible ("wariar", "gulden") as if written by a child sounding out words. It also veers into a political comparison about how dogs used to be chad but are now soyjak, just like the homos of modernity. Uh, based?

:!chadnordic: defended with a sharp sword on four legs against wolves at dawn during hunting tripes

:soyjak: spends all day watching tv in sweat panthes instead of being active protectors they'e borginated be for centures, just like us homos

temp = 10.0

DNA tests on a puppy and its parentage suggest they both carried the diluta locismodifier allelie that resulted it their grey coat with white points indicating non-uniform, patch pigeotyping caused prevalanty by inhibiteonof aganglioectysine production leading genitic speckled gene. Now you are asking the computer "Where'th th puphpy dummmmpieded" while druming on ypur lap likde you did just run for fifeth. Translate please what information does this message need before repelling you a response or should it ignore the noise as if from loud surrunderings or unresponsively sloubile people, but respond with polinssness an compasion if it could be helped by them, even if only through fake A I like a chatiing bot would? Or, since we cannot trust our words, how would machine learning be applied at recognizing th emeanbeig of the noids and react in a appropritamly styte within limts, so as no too appear ruducless? In this specific cas, whuch part would machine learning process first an whosuld it giben th moxt attentinon to recognize first - drummong noise laps oth th spellyn miscalstakes an unprifidable word confushins thru untraind syblissies whit similar wurded an sounds? Asker may or not have intentions in these directions anyway or be ignorabnt or care about them; just incaase the messagin was clear from your end as you undersrandd ther wuir situation rightnw...

At this point it's basically illegible. I particularly like

Now you are asking the computer "Where'th th puphpy dummmmpieded"

Where'th indeed!

It's over for musiccels.
