
r/futurology: With Stable Diffusion, you may never believe what you see online again. AI image synthesis goes open source, with big implications. :!soymad::!soycry:

https://old.reddit.com/r/Futurology/comments/x8otzg/with_stable_diffusion_you_may_never_believe_what?sort=controversial

More: https://old.reddit.com/r/technology/comments/x7gky6/with_stable_diffusion_you_may_never_believe_what/?sort=controversial

AI image generation is here in a big way. A newly released open source image synthesis model called Stable Diffusion allows anyone with a PC and a decent GPU to conjure up almost any visual reality they can imagine. It can imitate virtually any visual style, and if you feed it a descriptive phrase, the results appear on your screen like magic.

Some artists are delighted by the prospect, others aren't happy about it, and society at large still seems largely unaware of the rapidly evolving tech revolution taking place through communities on Twitter, Groomercord, and GitHub. Image synthesis arguably brings implications as big as the invention of the camera---or perhaps the creation of visual art itself. Even our sense of history might be at stake, depending on how things shake out. Either way, Stable Diffusion is leading a new wave of deep learning creative tools that are poised to revolutionize the creation of visual media.

The rise of deep learning image synthesis

Stable Diffusion is the brainchild of Emad Mostaque, a London-based former hedge fund manager whose aim is to bring novel applications of deep learning to the masses through his company, Stability AI. But the roots of modern image synthesis date back to 2014, and Stable Diffusion wasn't the first image synthesis model (ISM) to make waves this year.

In April 2022, OpenAI announced DALL-E 2, which shocked social media with its ability to transform a scene written in words (called a "prompt") into myriad visual styles that can be fantastic, photorealistic, or even mundane. People with privileged access to the closed-off tool generated astronauts on horseback, teddy bears buying bread in ancient Egypt, novel sculptures in the style of famous artists, and much more.

Not long after DALL-E 2, Google and Meta announced their own text-to-image AI models. MidJourney, available as a Groomercord server since March 2022 and open to the public a few months later, charges for access and achieves similar effects but with a more painterly and illustrative quality as the default.

Then there's Stable Diffusion. On August 22, Stability AI released its open source image generation model that arguably matches DALL-E 2 in quality. It also launched its own commercial website, called DreamStudio, that sells access to compute time for generating images with Stable Diffusion. Unlike DALL-E 2, anyone can use it, and since the Stable Diffusion code is open source, projects can build off it with few restrictions.

In the past week alone, dozens of projects that take Stable Diffusion in radical new directions have sprung up. And people have achieved unexpected results using a technique called "img2img" that has "upgraded" MS-DOS game art, converted Minecraft graphics into realistic ones, transformed a scene from Aladdin into 3D, translated childlike scribbles into rich illustrations, and much more. Image synthesis may bring the capacity to richly visualize ideas to a mass audience, lowering barriers to entry while also accelerating the capabilities of artists that embrace the technology, much like Adobe Photoshop did in the 1990s.
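The "img2img" technique above can be sketched numerically: instead of starting from pure noise, the source image is only partially noised before denoising, so some of its structure survives into the result. This toy NumPy sketch (a 1-D stand-in, not the real model; the `img2img_start` helper and strength values are illustrative) shows the effect of the strength knob:

```python
import numpy as np

# Toy sketch of img2img's "strength" knob: partially noise a source
# "image" (here a 1-D signal), as a real run would before denoising.
# Lower strength keeps more of the original structure.

def img2img_start(init, strength, rng):
    """strength in [0, 1]: 0 keeps the source, 1 starts from pure noise."""
    noise = rng.standard_normal(init.shape)
    return np.sqrt(1.0 - strength) * init + np.sqrt(strength) * noise

init = np.sin(np.linspace(0, 2 * np.pi, 64))  # the source "image"

low = img2img_start(init, strength=0.2, rng=np.random.default_rng(1))
high = img2img_start(init, strength=0.9, rng=np.random.default_rng(1))

corr_low = np.corrcoef(init, low)[0, 1]
corr_high = np.corrcoef(init, high)[0, 1]
print(corr_low > corr_high)  # → True: low strength preserves more of the source
```

That single blend is why a childlike scribble can survive as composition while everything else gets repainted.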

You can run Stable Diffusion locally yourself if you follow a series of somewhat arcane steps. For the past two weeks, we've been running it on a Windows PC with an Nvidia RTX 3060 12GB GPU. It can generate 512×512 images in about 10 seconds. On a 3090 Ti, that time goes down to four seconds per image. The interfaces keep evolving rapidly, too, going from crude command-line interfaces and Google Colab notebooks to more polished (but still complex) front-end GUIs, with much more polished interfaces coming soon. So if you're not technically inclined, hold tight: Easier solutions are on the way. And if all else fails, you can try a demo online.

How Stable Diffusion works

Broadly speaking, most of the recent wave of ISMs use a technique called latent diffusion. Basically, the model learns to recognize familiar shapes in a field of pure noise, then gradually brings those elements into focus if they match the words in the prompt.
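As a toy illustration of that forward/reverse idea (a 1-D NumPy stand-in, not the actual model; the reverse step here uses the true noise as an oracle, where a trained network would have to predict it):

```python
import numpy as np

# Toy 1-D sketch of the diffusion idea: a schedule gradually turns a
# "signal" into noise, and knowing the noise lets you walk back to the
# signal. A real model *learns* to predict that noise; we cheat and use
# the true noise as an oracle, purely to show the arithmetic.

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.2, T)    # noise schedule
alpha_bar = np.cumprod(1.0 - betas)  # cumulative fraction of signal kept

x0 = np.sin(np.linspace(0, 2 * np.pi, 64))  # the "image"
eps = rng.standard_normal(64)               # the noise to be added

def q_sample(x0, t, eps):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x_T = q_sample(x0, T - 1, eps)  # by step T, mostly noise
# Given a perfect noise estimate, the clean signal falls out in closed form:
x0_hat = (x_T - np.sqrt(1.0 - alpha_bar[T - 1]) * eps) / np.sqrt(alpha_bar[T - 1])

print(np.allclose(x0_hat, x0))  # → True
```

The hard part the model learns is the noise prediction; the prompt steers that prediction toward shapes matching the words.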

To get started, a person or group training the model gathers images with metadata (such as alt tags and captions found on the web) and forms a large data set. In Stable Diffusion's case, Stability AI uses a subset of the LAION-5B image set, which is basically a huge image scrape of 5 billion publicly accessible images on the Internet. Recent analysis of the data set shows that many of the images come from sites such as Pinterest, DeviantArt, and even Getty Images. As a result, Stable Diffusion has absorbed the styles of many living artists, and some of them have spoken out forcefully against the practice. More on that below.

Next, the model trains itself on the image data set using a bank of hundreds of high-end GPUs such as the Nvidia A100. According to Mostaque, Stable Diffusion cost $600,000 to train so far (estimates of training costs for other ISMs typically range in the millions of dollars). During the training process, the model associates words with images thanks to a technique called CLIP (Contrastive Language--Image Pre-training), which was invented by OpenAI and announced just last year.
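The contrastive idea behind CLIP can be sketched in a few lines: embeddings of matched image/caption pairs should score higher than mismatched ones, and training minimizes a cross-entropy over the similarity matrix. A toy NumPy sketch (the embeddings are random stand-ins, not outputs of real encoders):

```python
import numpy as np

# Minimal sketch of CLIP-style contrastive training: for n matched
# (image, caption) embedding pairs, the true pairs (the diagonal of the
# similarity matrix) should beat all mismatches, and training minimizes
# a cross-entropy that rewards exactly that.

rng = np.random.default_rng(1)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

img = normalize(rng.standard_normal((4, 8)))               # 4 "image" embeddings
txt = normalize(img + 0.05 * rng.standard_normal((4, 8)))  # matched "captions"

logits = img @ txt.T   # cosine similarity matrix, shape (4, 4)
labels = np.arange(4)  # row i's true caption is column i

def cross_entropy(logits, labels):
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

loss = cross_entropy(logits, labels)  # training pushes this toward 0
print(logits.argmax(axis=1))          # each image should pick its own caption
```

It is this shared image/text embedding space that lets a text prompt steer the denoising process at all.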

Through training, an ISM using latent diffusion learns statistical associations about where certain colored pixels usually belong in relation to each other for each subject. So it doesn't necessarily "understand" their relationship at a high level, but the results can still be stunning and surprising, making inferences and style combinations that seem very intelligent. After the training process is complete, the model never duplicates any images in the source set but can instead create novel combinations of styles based on what it has learned. The results can be delightful and wildly fun.

At the moment, Stable Diffusion doesn't care if a person has three arms, two heads, or six fingers on each hand, so unless you're a wizard at crafting the text prompts necessary to generate great results (which AI artists sometimes call "prompt engineering"), you'll probably need to generate lots of images and cherry-pick the best ones. Keep in mind that the more a prompt matches captions for known images in the data set, the more likely you'll get the result you want. In the future, it's likely that models will improve enough to reduce the need for cherry-picking---or some kind of internal filter will do the picking for you.
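Cherry-picking itself is easy to mechanize: run the same prompt under several seeds and keep the best-scoring result. In this hypothetical sketch, `fake_generate` and `fake_score` are stand-ins for the model and for whatever does the judging (human eyeballs, or an aesthetic/CLIP-based ranker):

```python
import numpy as np

# Cherry-picking, mechanized: generate candidates from different seeds
# and keep the highest-scoring one. Both functions below are stand-ins.

def fake_generate(seed):
    rng = np.random.default_rng(seed)
    return rng.standard_normal(16)   # stand-in for a generated image

def fake_score(image):
    return -np.abs(image).mean()     # stand-in for an aesthetic score

candidates = {seed: fake_generate(seed) for seed in range(8)}
best_seed = max(candidates, key=lambda s: fake_score(candidates[s]))
print(best_seed)  # the seed worth re-running with the same settings
```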

Ethical and legal concerns abound

As hinted above, Stable Diffusion's public release has raised alarm bells among people who fear its cultural and economic impact. Unlike DALL-E 2, Stable Diffusion's trained model weights are available for anyone to download and use without any hard restrictions. The official Stable Diffusion release (and DreamStudio) includes an automatic "NSFW" filter for nudity and embeds an invisible tracking watermark in the images, but these restrictions can easily be circumvented in the open source code. This means Stable Diffusion can be used to create images that OpenAI currently blocks with DALL-E 2: propaganda, violent imagery, pornography, images that potentially violate corporate copyright, celebrity deepfakes, and more. In fact, there are already some private Groomercord servers dedicated to pornographic output from the model.
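For a sense of how an invisible watermark can work, here is a toy least-significant-bit scheme in NumPy. (The actual release embeds its mark with the `invisible-watermark` package, a frequency-domain scheme; this LSB version only illustrates the concept, and also why a mark applied in open source code is trivial to remove.)

```python
import numpy as np

# Toy invisible watermark: hide bits in the least significant bit of
# pixel values, changing each pixel by at most 1 out of 255.

def embed(pixels, bits):
    out = pixels.copy()
    flat = out.reshape(-1)
    flat[: len(bits)] = (flat[: len(bits)] & 0xFE) | bits  # overwrite LSBs
    return out

def extract(pixels, n):
    return pixels.reshape(-1)[:n] & 1

rng = np.random.default_rng(2)
image = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
mark = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)

stamped = embed(image, mark)
print(np.array_equal(extract(stamped, len(mark)), mark))           # → True
print(np.abs(stamped.astype(int) - image.astype(int)).max() <= 1)  # → True
```

Since both `embed` and the filter live in code anyone can edit, deleting either is a one-line change, which is the enforcement problem in a nutshell.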

To be clear, Stable Diffusion's license officially forbids many of these uses, but with the code and weights out in the wild, enforcement will prove very difficult, if not impossible. When presented with these concerns, Mostaque said that he feels the benefits of having this kind of tool out in the open where it can be scrutinized outweigh the potential drawbacks. In a short interview, he told us, "We believe in individual responsibility and agency. We included an ethical use policy and tools to mitigate harm."

Also, Stable Diffusion has drawn the ire of artists on Twitter due to the model's ability to imitate the style of living artists, as mentioned above. (And despite the claims of some viral tweets, Stability AI has never advertised this ability. One of the most shared tweets mistakenly pulled from an independent study done by an AI researcher.) In the quest for data, the image set used to train Stable Diffusion includes millions of pieces of art gathered from living artists without consultation with the artists, which raises profound ethical questions about authorship and copyright. Scraping the data appears lawful by US legal precedent, but one could argue that the law might be lagging behind rapidly evolving technology that upends previous assumptions about how public data might be utilized.

As a result, if image synthesis technology becomes adopted by major corporations in the future (which may be coming soon---"We have a collaborative relationship with Adobe," says Mostaque), companies might train their own models based on a "clean" data set that includes licensed content, opt-in content, and public domain imagery to avoid some of these ethical issues, even if using an Internet scrape is technically legal. We asked Mostaque if he had any plans along these lines, and he replied, "Stability is working on a range of models. All models by ourselves and our collaborators are legal within their jurisdictions."

Another issue with diffusion models from all vendors is cultural bias. Since these ISMs currently work by scraping the Internet for images and their related metadata, they learn social and cultural stereotypes present in the data set. For example, early on in the Stable Diffusion beta on its Groomercord server, testers found that almost every request for a "beautiful woman" involved unintentional nudity of some kind, which reflects how Western society often depicts women on the Internet. Other cultural and racist stereotypes abound in ISM training data, so researchers caution that it should not be used in a production environment without significant safeguards in place, which is likely one reason why other powerful models such as DALL-E 2 and Google's Imagen are still not broadly available to the public.

While concerns about data set quality and bias echo strongly among some AI researchers, the Internet remains the largest source of images with metadata attached. This trove of data is freely accessible, so it will always be a tempting target for developers of ISMs. Attempting to manually write descriptive captions for millions or billions of images for a brand-new ethical data set is probably not economically feasible at the moment, so it's the heavily biased data on the Internet that is currently making this technology possible. Since there is no universal worldview across cultures, to what degree image synthesis models filter or interpret certain ideas will likely remain a value judgment among the different communities that use the technology in the future.

What comes next

If historical trends in computing are any suggestion, odds are high that what now takes a beefy GPU will eventually be possible on a pocket smartphone. "It is likely that Stable Diffusion will run on a smartphone within a year," Mostaque told us. Also, new techniques will allow training these models on less expensive equipment over time. We may soon be looking at an explosion in creative output fueled by AI.

Stable Diffusion and other models are already starting to take on dynamic video generation and manipulation, so expect photorealistic video generation via text prompts before too long. From there, it's logical to extend these capabilities to audio and music, real-time video games, and 3D VR experiences. Soon, advanced AI may do most of the creative heavy lifting with just a few suggestions. Imagine unlimited entertainment generated in real time, on demand. "I expect it to be fully multi-modal," said Mostaque, "so you can create anything you can imagine, like the Star Trek holodeck experience."

ISMs are also a dramatic form of image compression: Stable Diffusion takes hundreds of millions of images and squeezes knowledge about them into a 4.2GB weights file. With the correct seed and settings, certain generated images can be reproduced deterministically. One could imagine using a variation of this technology in the future to compress, say, an 8K feature film into a few megabytes of text. Once that's the case, anyone could compose their own feature films that way as well. The implications of this technology are only just beginning to be explored, so it may take us in wild new directions we can't foresee at the moment.
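That deterministic reproduction boils down to seeded pseudorandomness: the same seed produces the same starting noise, and with identical settings, the same image. A toy sketch with a stand-in generator (`fake_generate` is hypothetical, not the real sampler):

```python
import numpy as np

def fake_generate(seed, steps=20, size=64):
    """Stand-in for an image generator: all 'randomness' flows from the seed."""
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal(size)  # initial noise, like SD's latent
    for _ in range(steps):
        # Pretend denoising steps; any randomness comes from the same rng.
        latent = 0.9 * latent + 0.1 * rng.standard_normal(size)
    return latent

a = fake_generate(seed=1234)
b = fake_generate(seed=1234)
c = fake_generate(seed=9999)

print(np.array_equal(a, b))  # → True: same seed, bit-identical output
print(np.array_equal(a, c))  # → False: different seed, different "image"
```

So a seed plus a prompt plus the settings is, in effect, a few dozen bytes that name one specific image out of the model's space.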

Realistic image synthesis models are potentially dangerous for reasons already mentioned, such as the creation of propaganda or misinformation, tampering with history, accelerating political division, enabling character attacks and impersonation, and destroying the legal value of photo or video evidence. In the AI-powered future, how will we know if any remotely produced piece of media came from an actual camera, or if we are actually communicating with a real human? On these questions, Mostaque is broadly hopeful. "There will be new verification systems in place, and open releases like this will shift the public debate and development of these tools," he said.

That's easier said than done, of course. But it's also easy to be scared of new things. Despite our best efforts, it's difficult to know exactly how image synthesis and other AI-powered technologies will affect us on a societal scale without seeing them in wide use. Ultimately, humanity will adapt, even if our cultural frameworks end up changing radically in the process. It has happened before, which is why the Ancient Greek philosopher Heraclitus reportedly said, "The only constant is change."

In fact, there's a photo of him saying that now, thanks to Stable Diffusion.

https://arstechnica.com/information-technology/2022/09/with-stable-diffusion-you-may-never-believe-what-you-see-online-again/


This is a real image you can trust, you can tell because it wasn't created by AI, but was instead created by a dramanaut while he was taking a shit.

![](/images/16627429148776164.webp)


I already miss her


![](/images/16627431261495492.webp)


Really sad that they couldn't afford to bury the other half of her


Where do you get this fire? 🔥


I made the Evangelion one while taking a shit, I stole the second one from Carp


the yoinker becomes the yoinked


https://i.rdrama.net/images/1707881499271494.webp https://i.rdrama.net/images/17101210991135056.webp


![](/images/16627571161378286.webp)


Is this real?


This is art


Such moving art piece, an A.I robot would never be able to come up with this kind of creativity surely! :marseyclueless:


>noooo you must make tech proprietary so everything is locked down and centralized or else checks notes people will be free to do whatever they like!

Take me back to the 2006 internet


it's open source but still proprietary


oops I copied your source code, oh no its

proprietaryyyyy

oopsie just copied and pasted it again

:#marseyrapscallion:


THATS AGAINST THE RULES! IM GONNA SUE YOUUUUUUUUUUUU

:#soycry:


:marseylongpost:

This changes literally nothing, individual photos and videos haven’t been reliable evidence for, what, 30+ years? There’s a reason you need chain of custody to admit them in court.

Anyone who DID believe literally anything at all on the internet in current year was too r-slurred to help anyway


> This changes literally nothing, individual photos and videos haven’t been reliable evidence for, what, 30+ years? There’s a reason you need chain of custody to admit them in court.

at least since the invention of the photography


Before photography, photos were completely reliable :marseyagree:


There were those stupid butt fairy photos that convinced Sir Arthur Conan Doyle. Imagine inventing Sherlock Holmes and also being that r-slurred.

![](/images/16627464324395144.webp)


> There were those stupid butt fairy photos that convinced Sir Arthur Conan Doyle. Imagine inventing Sherlock Holmes and also being that r-slurred.

yea, pulp fiction writers and pop culture novelists tend to be beacons of enlightenment in all aspects of life :marseyeyeroll:


Dont be an r-slur shouldn't be a high bar to cross.


As much as I agree, this does reduce the general reliability of visual evidence. We basically have to go back to medieval methods of hard and testimony evidence only.


Next youre going to learn about white out and tell me we can't use any documents in court.


I would probably use the Xerox character substitution bug as an example for not using any documents in court that were in contact with a xerox device ;)

An example is :marseyobama:'s birth certificate


I support anything that might bring back judicial dueling. In the absence of evidence let the man with the strongest conviction win.

:#marseybountyhunter: :!#marseybountyhunter:


> Anyone who DID believe literally anything at all on the internet in current year was too r-slurred to help anyway

@Wewser :marseysmug2: Did Chris Chan actually escape or was that fake?


![](https://media.giphy.com/media/xT0GqgeTVaAdWZD1uw/giphy.webp)


:marseycheerup:

We all have our moments :marseythumbsup:


Chris Chan doesn’t even real, r-slur


I think it's not so much an issue of legally admissible evidence but rather just, all the r-slurs who believe everything they see on TV / the interwebz. It's not any different from now, tards fall for photoshops all the time, but it's much, much easier now to make somewhat believable fake shit for propaganda.


I remember reading this same :soycry: article in 2015 about deepfakes lol


With Adobe Photoshop, you may never believe what you see online again. High-tech photo manipulation goes commercial, with big implications.


Im still sad that no state actor has used deepfakes to announce incoming nukes, or an admission to a global p-dophile cabal, or whatever. The best we got was a shitty 'zelensky surrenders' video. Get your shit together glowies


If you honestly believe images/vids/etc are reliable NOW and the issue is this tech being on the hands of plebs instead of Google and friends, you're fricking r-slurred and shouldn't own a computer.


Wait till this guy learns about photoshop :marseypearlclutch:


I’ve actually played around with one of those AI art generators and what I’m actually seeing is not the most impressive thing in the world. What it looks like is neural nets are being used to slightly tune the properties of 3-D models and permutations from them, the surface materials, lighting, and position and pose. Then they are rendered and essentially given an image filter on the final output style.

Any halfway competent engineer with an army of Third World child slaves to catalog and identify images to feed into a neural net can do this; you don’t need to have a multi-billion dollar corporation


>mfw the elites lose contr


:#marseybrainlet:


> which AI artists sometimes call "prompt engineering"

:marseysoypoint::marsoyhype::marsoy::marseysoylentgrin::marseyshitforbrains:


If you ever feel useless, just remember that Stable Diffusion has a license that forbids you from doing anything sketchy with it

You agree not to use the Model or Derivatives of the Model:

:marseycop: - In any way that violates any applicable national, federal, state, local or international law or regulation;

:marseypedo: - For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;

:sciencejak: - To generate or disseminate verifiably false information and/or content with the purpose of harming others;

:marseyjourno: - To generate or disseminate personal identifiable information that can be used to harm an individual;

:soycry: - To defame, disparage or otherwise harass others;

:marppy: - For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;

:marseyblackface: - For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;

:marseybrainlet: - To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;

:marseysjw: - For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;

:marseymeds: - To provide medical advice and medical results interpretation;

:marseyfortuneteller: - To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).


So I'll just ask the coomer question: can a software like this take a clothed picture of someone and generate a nude version?



Yeah it’s not very good tho


Every time I read an article like this I start wishing for lolbert open source mcnukes out of spite


It does make me wonder if there will be a push to move back to analog for capturing information that needs to be verified as authentic.

Kind of like Battlestar Galactica, but for much more boring reasons.


AI ethicists blown the frick out once again. I can only hope that those utter scum are all unemployed or dead within the next 10 years, and that those openAI fricks aren't still holding technology back from the people and deciding arbitrarily who's allowed to use it.


Reminder that ai will get better at detecting fakes too

