
New Mac Studios, up to 512GB of unified memory :letsfuckinggofast:

https://www.apple.com/shop/buy-mac/mac-studio

local AI chads just won big


local AI chads just won big

My understanding is that unified memory is really large but slow for AI workloads. !codecels confirm?


Slower than VRAM, faster than normal RAM. If you're too poor to afford 512GB of actual GPU memory, it's a good option.
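
Rough math on why the bandwidth tiers matter: for single-user decoding, each generated token has to stream the full set of weights from memory, so tokens/s is roughly bandwidth divided by model size. A minimal sketch, with illustrative ballpark bandwidth numbers (not measured specs) and a hypothetical 70B model:

```python
# Back-of-envelope decode speed for a memory-bandwidth-bound LLM.
# Rule of thumb: every generated token reads all the weights once, so
# tokens/s ~= memory bandwidth / model size in bytes.
# Bandwidth figures below are illustrative assumptions, not measured specs.

def tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode throughput if weights are streamed once per token."""
    return bandwidth_gb_s / model_size_gb

model_gb = 70 * 2  # hypothetical 70B-parameter model at 16 bits (2 bytes) per weight

for name, bw in [("DDR5 dual-channel RAM", 90.0),
                 ("Apple unified memory", 800.0),
                 ("Datacenter GPU VRAM (HBM)", 2000.0)]:
    print(f"{name:26s} ~{tokens_per_second(bw, model_gb):6.1f} tok/s ceiling")
```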


You can buy 192GB of DDR5 for like $600, but that's the limit for consumer x86 at the moment unless you start chaining Tesla cards. It's slow, but pretty big.

Unified memory this large is an ARM toy, and even if it's a bit slow in practice, it's still very deep. Being slow matters less when your context window and model size are huge, and this thing can practically run a large model without a GPU chain.

Now granted, this thing is $10k with that memory configuration and is running macOS on an ARM architecture, so you might actually get better performance per dollar building a Tesla chain depending on real-world numbers. It's just novel as heck that it can run an LLM comfortably in one box.
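
For a sense of what "runs in one box" means capacity-wise, here's a quick sketch of how much memory the weights alone take at different precisions. Parameter counts are just examples, and KV cache plus activations come on top, so treat these as floor estimates:

```python
# Rough RAM/VRAM needed just to hold model weights at various quantizations.
# Example parameter counts; overheads (KV cache, activations) are ignored.

def weights_gb(n_params_billion: float, bits_per_param: int) -> float:
    """Size of the weights alone, in GB."""
    return n_params_billion * 1e9 * bits_per_param / 8 / 1e9

for params in (70, 180, 405):          # example model sizes, in billions of params
    for bits in (16, 8, 4):            # common precisions / quantizations
        print(f"{params:>4}B @ {bits:>2}-bit: {weights_gb(params, bits):7.1f} GB")

# e.g. a 405B model at 8-bit (~405 GB of weights) fits under 512 GB of
# unified memory; the same model at 16-bit (~810 GB) does not.
```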


If you're just trying to max out memory, you can get Xeon hardware and take your RAM out to 1TB, although you're still looking at a minimum of $5,000 unless you buy used shit. I don't know enough about AI hardware to know what else you'd need to add to make it useful, though.


You're better off stringing together Teslas; VRAM is significantly faster than even DDR5, let alone the DDR4 banks you pull off eBay servers. You can buy a box full of broken, shitty Teslas and have a few hundred gigs of VRAM strung up for a few hundred bucks (provided you're willing to pop them open and refurbish them).

NVIDIA's main shipments are corporate-grade cards, most of which are compute cards that don't even have display outputs. The RTX platform is practically an afterthought.
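
Splitting one model across a pile of cheap cards is pretty painless these days. A minimal sketch using Hugging Face transformers with accelerate, which can spread layers across every visible GPU and spill the rest to CPU RAM; the model name here is a placeholder, not a real checkpoint:

```python
# Sketch: shard one large model across several GPUs instead of one big memory pool.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-large-model"  # placeholder, swap in a real checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # spread layers across all visible GPUs,
                         # spilling to CPU RAM if they don't all fit
    torch_dtype="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```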


When you're running fat butt models with billions of parameters, you kinda just need as much RAM as possible before anything else.


That's better achieved with pipeline parallelism, not one huge, uniform pool of slow memory.
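
For anyone unfamiliar, pipeline parallelism just means splitting the layer stack into stages, putting each stage on its own device, and passing activations between them. A toy PyTorch sketch (devices and layer sizes are placeholders; a real setup would also micro-batch so stages don't sit idle waiting on each other):

```python
# Minimal illustration of pipeline parallelism: layers are partitioned into
# stages on different devices, and activations (not weights) move between them.
import torch
import torch.nn as nn

devices = ["cuda:0", "cuda:1"] if torch.cuda.device_count() >= 2 else ["cpu", "cpu"]

stage0 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to(devices[0])
stage1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to(devices[1])

def forward(x: torch.Tensor) -> torch.Tensor:
    h = stage0(x.to(devices[0]))     # stage 0 holds the first chunk of layers
    return stage1(h.to(devices[1]))  # activations hop devices, weights stay put

out = forward(torch.randn(8, 4096))
print(out.shape)
```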



