
Giga nerds port DOOM to run on stable diffusion, hallucinating 20 FPS of gameplay based on user input. :doomdad: :marseydoomguy1:

This is a video of someone playing it. It's 100% generated images @ 20 FPS, with only a 3-second "memory" of the previous frames and user input, which is enough to infer literally everything else for long periods of gameplay. There are no polygons or rendering going on; it's literally making shit up as it goes along based on the model's neural network training or some shit blah blah blah
Article w/more videos:
https://gamengen.github.io/
Diffusion Models Are Real-Time Game Engines

Full PDF Paper:

https://arxiv.org/pdf/2408.14837

ABSTRACT:

We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation. GameNGen is trained in two phases: (1) an RL-agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. Conditioning augmentations enable stable auto-regressive generation over long trajectories.

(...)

https://i.rdrama.net/images/1724859064997468.webp

Summary. We introduced GameNGen, and demonstrated that high-quality real-time game play at 20 frames per second is possible on a neural model. We also provided a recipe for converting an interactive piece of software such as a computer game into a neural model.

Limitations. GameNGen suffers from a limited amount of memory. The model only has access to a little over 3 seconds of history, so it's remarkable that much of the game logic is persisted for drastically longer time horizons. While some of the game state is persisted through screen pixels (e.g. ammo and health tallies, available weapons, etc.), the model likely learns strong heuristics that allow meaningful generalizations. For example, from the rendered view the model learns to infer the player's location, and from the ammo and health tallies, the model might infer whether the player has already been through an area and defeated the enemies there. That said, it's easy to create situations where this context length is not enough. Continuing to increase the context size with our existing architecture yields only marginal benefits (Section 5.2.1), and the model's short context length remains an important limitation. The second important limitation is the remaining differences between the agent's behavior and those of human players. For example, our agent, even at the end of training, still does not explore all of the game's locations and interactions, leading to erroneous behavior in those cases.
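
To make the two-phase recipe from the abstract concrete, here's a toy, runnable PyTorch sketch. Everything in it is illustrative (the shapes, the tiny conv net, the random "trajectories"), not the paper's actual architecture; the only numbers taken from the text are 20 FPS and "a little over 3 seconds" of history, which works out to roughly 60-64 conditioning frames.

```python
# Toy, runnable sketch of the two-phase recipe from the abstract.
# A real diffusion model predicts noise at a sampled noise level; this toy
# just regresses the next frame directly to keep the example short.
import torch
import torch.nn as nn

CONTEXT, H, W, N_ACTIONS = 8, 32, 32, 4   # toy sizes; "a little over 3 s" at 20 FPS is ~60-64 frames

# Phase 1 stand-in: pretend an RL agent already played DOOM and we logged
# (past frames, past actions, next frame) tuples. Random tensors here.
def fake_trajectory(batch=16):
    past_frames = torch.rand(batch, CONTEXT * 3, H, W)            # RGB history stacked on channels
    past_actions = torch.randint(0, N_ACTIONS, (batch, CONTEXT))  # one discrete action per frame
    next_frame = torch.rand(batch, 3, H, W)
    return past_frames, past_actions, next_frame

# Phase 2: a next-frame model conditioned on past frames + actions.
class NextFramePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.action_emb = nn.Embedding(N_ACTIONS, 8)
        self.net = nn.Sequential(
            nn.Conv2d(CONTEXT * 3 + CONTEXT * 8, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, past_frames, past_actions):
        a = self.action_emb(past_actions)                        # (B, CONTEXT, 8)
        a = a.flatten(1)[:, :, None, None].expand(-1, -1, H, W)  # broadcast actions over the image
        return self.net(torch.cat([past_frames, a], dim=1))

model = NextFramePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    pf, pa, target = fake_trajectory()
    # crude stand-in for the conditioning augmentation the paper credits for
    # stable long rollouts: corrupt the history the model conditions on
    pf = pf + 0.1 * torch.rand(()) * torch.randn_like(pf)
    loss = nn.functional.mse_loss(model(pf, pa), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```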

!oldstrags !g*mers @pizzashill

In AI Nvidia future, game plays you :marseycool:

I figured this was the future nvidier wanted. This will replace graphics and be viable on consumer hardware before full path-traced rendering is.

Detailed physics models for realistically simulating light

Making shit up. :gigachad4:

But this is just making an AI convincingly recreate existing graphics; the only reason it can keep track of anything is that it got fed a lot of pictures of an already existing level :marseytwerking:

I mean, with the 3D world and game logic underneath

You could path trace a bunch of screenshots with the camera randomly oriented at any point the player could be, as training data

Then the real-time engine would give a crude frame that has all the important info about where u are and what's onscreen

That would be img2img'd into a photorealistic final render

Potentially much faster than real graphics

And more generalizable. They did a demo once where the polygons were just labeled and it made up the textures
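
The "crude frame, then img2img into a photorealistic render" idea above can be mocked up offline with the stock img2img pipeline in diffusers. This is only a sketch: the model id, prompt, strength, and file names are placeholders, and a real-time version would need a heavily distilled model rather than a 20-step sampler.

```python
# Hypothetical offline mock-up of "crude engine frame -> img2img -> pretty frame".
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16   # placeholder model id
).to("cuda")

# a flat-shaded frame exported from the game engine (placeholder file name)
crude = Image.open("crude_engine_frame.png").convert("RGB").resize((512, 512))

restyled = pipe(
    prompt="photorealistic sci-fi corridor, volumetric lighting, film grain",
    image=crude,
    strength=0.35,            # low strength: keep the layout and HUD, restyle the surfaces
    num_inference_steps=20,
    guidance_scale=6.0,
).images[0]

restyled.save("photoreal_frame.png")
```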

Let me rephrase it as a simpler argument. With a frame rate this low, and the inherent latency of cloud computing, you would only be able to make a single-player game, completely eliminating the prospect of long-term income, the very income required to pay for the cloud access your game needs to run properly. The finances simply can't line up.
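
For a rough sense of why latency is the sticking point here, a back-of-envelope budget; every number below is an assumption, not a measurement.

```python
# Back-of-envelope input-to-photon budget for a cloud-hosted neural game.
fps = 20
frame_time_ms = 1000 / fps     # 50 ms just to generate one frame at 20 FPS
network_rtt_ms = 60            # optimistic round trip to a nearby datacenter
codec_ms = 10                  # video encode on the server + decode on the client
print(frame_time_ms + network_rtt_ms + codec_ms)  # ~120 ms before any queueing
```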

This tech is literally brand new

It's insane how much functionality my 6-year-old GPU has gained since I bought it.

And none of that is in the ray tracing department

That's not getting any faster, but AI inferencing is, on the same hardware.

I think it could be 144 fps before RT, on future hardware :marseyshrug:

>This tech is literally brand new

First off, no lol, machine learning isn't anything new.

And besides, this isn't even a question of how new the tech is; it's an inherent feature of a neural network. It only gets more and more demanding as the tech becomes more advanced; such is the nature of it and of machine learning as a whole.

>I think could be 144fps before RT, on future hardware :marseyshrug:

With a 1500 ms delay or more. Do you just not understand the concept of latency from calling back to a central server? That's Nvidia's actual business model btw, supplying GPUs for "AI hypercomputers," as they're called. In addition, the researchers themselves said that this could maybe reach 50 fps with model optimization; 144 fps is absurd. Images are massive compared to the text produced by, say, GPT-4, which still needs massive computers to call back to for respectable results.

>That's not getting any faster but ai inferencing is, on the same hardware.

It's literally not the same hardware. Ray tracing is done by a single board; "AI inferencing" is done by thousands of boards, each as strong as one capable of ray tracing. The hardware used for something like this is not consumer grade, which is also stated in the paper you clearly did not read.

>144 fps

This is pretty much impossible. You have to consider that the process only gets more and more bottlenecked as it goes on. It's not your little online imagegen that shits out five pictures after a few seconds, which could all be made simultaneously; it can only make a new frame once the prior frame has been rendered
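
The serial-dependency point is the key one, and a toy loop makes it visible: each frame conditions on the previous frame (and on the player's reaction to it), so sampling can't be parallelized across time. The 50 ms per-frame cost below is an assumed stand-in; hitting 144 FPS would mean finishing an entire sampling pass in about 7 ms.

```python
import time, random

# Toy illustration of the serial bottleneck: frame N+1 can't be sampled until
# frame N exists. sample_frame() stands in for one full diffusion sampling pass.
def sample_frame(history, action):
    time.sleep(0.05)                       # pretend one denoising rollout takes 50 ms
    return (history[-1] + action) % 256    # dummy "pixel" state

frames = [0]
start = time.time()
for _ in range(20):
    action = random.randint(0, 3)          # input depends on the frame that was just shown
    frames.append(sample_frame(frames[-8:], action))
print(f"{time.time() - start:.2f} s for 20 frames")   # ~1 s, i.e. a hard ~20 FPS ceiling
```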

Please keep cooking

Ok firstly I don't care about this enough t. :marseyshrug:

I can't pretend. :marseyshy: I wrote another paragraph

But I mean yes, machine learning is old

I'm talking abt this recent surge. The future stuff will need more compute, but we're in an era where ppl are still discovering new leaps that get 2x performance for free. And who knows what undiscovered architectures out there might do it even faster than that. Ray tracing is solved math.

Second, idk what ur on about with cloud

I'm talking about what might be possible locally

>The hardware used for something like this is not consumer grade

My 2070 couldn't generate images when I got it. Now it can.

A year ago it took minutes for a crusty 512x512.

Now it takes 20s for a 1024x1024 of much better quality.


All im sayin is a consumer card might be able to do something like this in real time before it's able to do more than a couple samples per pixel of raytrussy. It's a harmless bit of tech optimism and ur mad :marseyfluffyannoyed: for no reason. Can we just be happy :marseyfluffy: and silly online

>Potentially much faster than real graphic

I mean no, it would have to be connected to a massive central server to achieve that, which would also add massive latency; that's one thing neural networks will always suffer from. This very simple game was also run on a cloud hypercomputer specifically designed for neural network operations and could only achieve 20 fps.

>And more generalizeable

We have already achieved generalizable graphical capability though; that's the point of shit like Unreal. If you make your own engine, it's because you intentionally want something non-generalized, often unnecessarily, like Braid.
