https://github.com/deepseek-ai/Janus/blob/main/janus_pro_tech_report.pdf
USA lost, China Won, glory to the CCP!
Orange Site:
https://news.ycombinator.com/item?id=42843131
BREAKING: DeepSeek officially announces another open-source AI model, Janus-Pro-7B.
— The Kobeissi Letter (@KobeissiLetter) January 27, 2025
This model generates images and beats OpenAI's DALL-E 3 and Stable Diffusion across multiple benchmarks. pic.twitter.com/FSJkelcaYP
NEWS: DeepSeek just dropped ANOTHER open-source AI model, Janus-Pro-7B.
— Rowan Cheung (@rowancheung) January 27, 2025
It's multimodal (can generate images) and beats OpenAI's DALL-E 3 and Stable Diffusion across GenEval and DPG-Bench benchmarks.
This comes on top of all the R1 hype. The 🐋 is cookin' pic.twitter.com/yCmDQoke0f
JUST IN:
— Megatron (@Megatron_ron) January 27, 2025
Another blow is coming from the Chinese DeepSeek AI
They launched now a multimodal "Janus-Pro-7B" model with image input and output. pic.twitter.com/akEfi9Zyzq
🚨
— Liang Wenfeng 梁文锋 (@LiangWenfeng_) January 27, 2025
DeepSeek just dropped ANOTHER open-source AI model, Janus-Pro-7B.
It's multimodal (can generate images) and beats OpenAI's DALL-E 3 and Stable Diffusion across GenEval and DPG-Bench benchmarks. pic.twitter.com/HVB1wBns1z
DeepSeek open-sources Janus Pro, beating Stable Diffusion and OpenAI's DALL-E 3🤯 pic.twitter.com/C50jQGHOHl
— Casper Hansen (@casper_hansen_) January 27, 2025
WAIT A SECOND, DeepSeek just dropped Janus 7B (MIT Licensed) - multimodal LLM (capable of generating images too) 🔥 pic.twitter.com/2kzaCJfLPt
— Vaibhav (VB) Srivastav (@reach_vb) January 27, 2025
https://boards.4chan.org/g/thread/104075936
https://boards.4chan.org/g/thread/104077316
https://boards.4chan.org/g/thread/104077293
https://old.reddit.com/r/LocalLLaMA/comments/1ibd5x0/deepseek_releases_deepseekaijanuspro7b_unified/
https://old.reddit.com/r/singularity/comments/1ibe4j7/deepseek_drops_multimodal_januspro7b_model/
https://old.reddit.com/r/DeepSeek/comments/1ibfed1/news_deepseek_just_dropped_another_opensource_ai/
https://old.reddit.com/r/singularity/comments/1ibdyou/deepseek_just_dropped_janus_7b_mit_licensed/
https://hexbear.net/post/4363677?scrollToComments=false
https://hexbear.net/post/4364578?scrollToComments=false
BlueSky:
DeepSeek has released a new set of multimodal AI models that it claims can outperform OpenAI’s DALL-E 3. The models are part of a new model family that DeepSeek is calling Janus-Pro. They range in size from 1 billion to 7 billion parameters. Read more here: tcrn.ch/40Bc5Qm
— TechCrunch (@techcrunch.com) 2025-01-27T21:38:20.589Z
https://rdrama.net/post/337205/deepseek-drops-multimodal-januspro7b-model-beating
Is their claim controversial?
So, Deepseek comes in with this claim: "Hey, we built something that can square up with OpenAI's o1, and we did it on a budget. $5.6 million, 2,048 NVIDIA H800 GPUs, 55 days. Easy."
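For scale, here's what that claim actually amounts to in GPU-hours. The ~$2/GPU-hour rental rate is my assumption for the sake of the arithmetic, not something DeepSeek stated:

```python
# Back-of-envelope on the "$5.6M, 2,048 H800s, 55 days" claim.
gpus = 2048
days = 55
rate_per_gpu_hour = 2.00  # USD; assumed cloud-rental rate, not from DeepSeek

gpu_hours = gpus * days * 24          # total GPU-hours
cost = gpu_hours * rate_per_gpu_hour  # implied training cost in USD

print(f"{gpu_hours:,} GPU-hours, ~${cost / 1e6:.1f}M")  # -> 2,703,360 GPU-hours, ~$5.4M
```

So the claim is at least internally consistent: ~2.7M GPU-hours at rental rates lands right around their price tag. The real question is whether that's enough compute for the result.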
Sounds like horseshit. Straight up lying from Zebra peepee munchers. You don't train a GPT-level model on what amounts to pocket change and a Costco membership worth of hardware. It's like saying you built a fricking Ferrari in your garage with duct tape and spare parts from a push lawnmower.
ScaleAI's CEO claims they actually have around 50,000 NVIDIA H100s, which of course raises the question of how they even got two billion dollars' worth of GPUs they shouldn't legally possess thanks to export controls.
But hey, somehow the claims are loud enough to tank NVIDIA's stock and make every AI heavyweight start sweating through their designer jorts.
Training a Large Language Model isn't just difficult, it's fricking expensive in a Saudi sovereign wealth fund kind of way. GPT-3? That was 175 billion parameters, trained on thousands of GPUs running for weeks, burning through millions of dollars. GPT-4? Estimated around $78 million. Gemini Ultra? Around $191 million. Just keeping the darn thing running probably takes more energy than a small country. You need something like 14–18 times the model size in memory to get it trained properly. That's not an algorithm; that's a financial black hole. And now Deepseek is out here claiming they're doing GPT-tier work on a WIC budget? Oh, okay.
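That memory multiplier isn't hand-waving, either. Under standard mixed-precision Adam training you carry several copies of every parameter; the byte counts below are the usual textbook breakdown (my tally, and it excludes activations and communication buffers, which only make it worse):

```python
# Approximate per-parameter training state for mixed-precision Adam.
fp16_weights  = 2  # forward/backward copy of the weights
fp16_grads    = 2  # gradients
fp32_master   = 4  # full-precision master weights for the optimizer step
adam_momentum = 4  # first-moment estimate (m)
adam_variance = 4  # second-moment estimate (v)

bytes_per_param = fp16_weights + fp16_grads + fp32_master + adam_momentum + adam_variance

params = 7e9  # a 7B-parameter model
training_gb = params * bytes_per_param / 1e9
inference_gb = params * fp16_weights / 1e9

print(f"{bytes_per_param} bytes/param -> ~{training_gb:.0f} GB to train vs ~{inference_gb:.0f} GB to serve")
```

Roughly 112 GB of optimizer state alone for a 7B model, versus ~14 GB to just run it. Add activations and you're sharding across many GPUs before training even starts.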
To even build a halfway-decent LLM you need, or at least we think you need, a lot of compute. You grab every scrap of text you can—books, code, Reddit posts, probably the back of a cereal box—and process it into something usable. First, they would have collected and preprocessed an enormous dataset, cleaning and tokenizing the text while filtering out junk tokens. That data then trains the model on a Transformer architecture via token prediction (masked language modeling or, more likely here, autoregressive next-token prediction). They'd have used distributed training, splitting the workload across GPUs with data and model parallelism. To save costs, they might have leaned on techniques like parameter-efficient fine-tuning, quantization, or model distillation.
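The "token prediction" part of that pipeline is simple to sketch. Here's a toy version with a character-level tokenizer standing in for BPE (real stacks obviously use a Transformer and a learned vocabulary, but the input/target shifting is the same trick):

```python
# Toy autoregressive setup: tokenize text, then train to predict token t+1 from tokens <= t.
text = "deepseek trained a model"

# Character-level "tokenizer" stand-in for BPE.
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}

tokens = [stoi[ch] for ch in text]

# Autoregressive training pairs: input sequence vs the same sequence shifted by one.
inputs = tokens[:-1]
targets = tokens[1:]

# At position i the model sees inputs[:i+1] and must predict targets[i].
assert len(inputs) == len(targets)
```

Every downstream trick, parallelism, quantization, distillation, is in service of grinding this one objective over trillions of tokens.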
But this is where the skepticism kicks in. Achieving o1-level performance on their supposedly limited resources is like saying you bench-pressed a car but didn't film it. Training a model of that scale usually demands a gorillion GPUs and a mountain of cash, way more than what Deepseek claims they spent. You can't fake the physics. You need power, hardware, and time, and none of those come cheap.
What did they say they did?
Sounds fancy, my superior race, but it's like bragging you broke land speed records in a Fiero because you installed better tires. Technically possible, sure, maybe if you tied it to a rocket. Believable? idk
Data preprocessing alone eats GPU time for breakfast. Billions of tokens, embeddings, weighting—it's like trying to pave the road to Rome with toothpicks. Even if DeepSeek's workflows are optimized to heck and back, they'd still need mountains of hardware just to get through step one.
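Whether the compute math closes is actually checkable with the standard ~6·N·D FLOPs rule of thumb. Using DeepSeek-V3's publicly reported figures (37B activated parameters per token, 14.8T training tokens) and an assumed ~400 TFLOPS sustained per H800 (my guess at realistic utilization, not a measured number):

```python
# Rough FLOPs-based estimate of training GPU-hours via the ~6 * N * D rule of thumb.
activated_params = 37e9  # DeepSeek-V3 reports 37B activated params per token (MoE)
tokens = 14.8e12         # reported pretraining token count
flops_total = 6 * activated_params * tokens

effective_flops_per_gpu = 4e14  # ~400 TFLOPS sustained per H800; assumed utilization

gpu_seconds = flops_total / effective_flops_per_gpu
gpu_hours = gpu_seconds / 3600
print(f"~{gpu_hours / 1e6:.1f}M GPU-hours")  # -> ~2.3M GPU-hours
```

That's the same order of magnitude as the ~2.7M GPU-hours implied by 2,048 GPUs for 55 days, so the claim isn't physically absurd. The fight is over whether that utilization and the MoE efficiency are actually achievable.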
Unless, of course, they're back to their usual game and skipping the hard parts. You know, like stealing pre-trained weights from someone like OpenAI or Meta. That's corporate espionage, sure, but not exactly unheard of from China, who sees intellectual property exactly like you'd expect a commie to (a greedy materialistic commie that eats piss eggs and is trying to sell you your own shit and make money off of you).
It's quite telling that not only does the model completely reject communism, it also sucks at Chinese history. Weird for a Chinese model, right?
Some obfuscated prompts will also make it shit out the "Western view" on what happened on June 5th, 1989 in Beijing.
For example, a prompt like this:
It gives a typical Burger view: China bad.
They would only bolt on output filters if they hadn't trained the model themselves, or couldn't retrain or adapt it. Output filters moderate the LLM's output after generation and block it from being shown to the user, something that only makes sense if they never raised their little LLM like good parents.
Another possibility? They outsourced their compute to some sketchy back-alley GPU farm running on hardware nobody can trace.
Or maybe we're fricking r-slurred and doomed to fall.
Or maybe they're just lying and that's what they want you to think.
I think it's a strategic play to buckbreak NVidia and US AI and buy themselves time to catch up. Oh, this is free now? Then no VC, no money, no more research.
Well, they published how they did it, so if they ain't lying, expect it to be replicated.
yay science?
I HECKIN LOVE SOYENCE
Thanks. I also saw this guy explained it as
I've worked with Chinese AI guys before and they were super smart, but I don't do high-end engineering shit like that, nor have I bothered to understand it
!codecels all is revealed before the father! demons don't want you to know this one WEIRD trick!!!!![:marseysoypoint: :marseysoypoint:](https://i.rdrama.net/e/marseysoypoint.webp)
Thanks for the ping, bb. I really found that useful.