Insane tech demo of high-speed LLMs :marseysweating:

https://groq.com/

Apparently it's a new type of chip. I like specialized hardware :marseynerd:

This is fun, @HeyMoon, thanks. :marseywave:

Let's stay away from politics :marseybribe: and focus on GPUs for laptops.

https://i.rdrama.net/images/17084574179387476.webp https://i.rdrama.net/images/17084574181392565.webp https://i.rdrama.net/images/17084574183306503.webp

:marseyemojirofl:

It made all that shit up. That mobile GPU is similar to an RTX 3060.

Of course, when asked :marseythinkorino2: to cite sources, it lists names of PC hardware mags and makes up quotes.

Excellent bullshit :marseyitsallsotiresome: bot. :marseyclapping:

Expecting LLMs to perform better on knowledge-intensive tasks just because they can process faster is dumb. That's gonna depend on the architecture they use to match user queries to database vectors.
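
That query-to-vector matching is basically just embedding similarity. Rough toy sketch of the step (random vectors standing in for real embeddings; nothing here is Groq's or anyone's actual pipeline):

```python
# Toy sketch of matching a user query to database vectors via cosine similarity.
# The "embeddings" are random placeholders, not output from a real model.
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec: np.ndarray, doc_vecs: list[np.ndarray], k: int = 3) -> list[int]:
    scores = [cosine_sim(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]

rng = np.random.default_rng(0)
docs = [rng.standard_normal(8) for _ in range(5)]  # pretend document embeddings
query = rng.standard_normal(8)                     # pretend query embedding
print(top_k(query, docs, k=2))                     # indices of the closest "documents"
```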


:#marseyastronaut:

That's all I care about, so :marseyshrug:. All of these LLM chatbots are worthless to me.

:marseypathetic:

There's Tom's Hardware's ranking of GPUs, as well as UL Solutions (3DMark), so all it had to do was pull from those to answer :marseyconfuseddead: the question. "All it had to do," sure, it's harder than that, but instead they trained it on a bunch of worthless words. How is any of this impressive?

It's not really :marseythinkorino2: "knowledge-intensive." It's a ranked list of GPUs. :marseyshrug: These things will be more fun when you can tell it to pull data from something like that list and compare such and such with whatever.
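
Something like this is all I'm asking for (the scores are made up, not real Tom's Hardware or 3DMark numbers):

```python
# Hypothetical "pull from a ranked list and compare" helper.
# Scores are made-up placeholders, not real benchmark results.
BENCH = {
    "RTX 4090": 100.0,
    "RTX 4070 Laptop": 45.0,
    "RTX 3060 Laptop": 30.0,
}

def compare(a: str, b: str) -> str:
    ratio = BENCH[a] / BENCH[b]
    return f"{a} is roughly {ratio:.1f}x faster than {b} on this (made-up) index."

print(compare("RTX 4090", "RTX 3060 Laptop"))
```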

Knowledge-intensive is just a catch-all for knowing to consider a certain context when handling a user query. In this case, knowing to source from Tom's Hardware or something. MS Copilot can do that for free right now, and it's not because they have better processing; it's just how the LLM connects to their sourcing architecture.

Most chatbot startups are using OpenAI models but have made their own databases that they train the LLM to prioritize when answering a question related to the solution category they're trying to sell in.

There's no ubermensch model in sight right now; all the 'smart' chatbots beating general GPT-4 on answers are just taught to search hyper-specifically.

This is the layman's way to put it; the architecture and the models involved are collectively known as retrieval-augmented generation (RAG). They're a separate thing from the LLM itself, but they make LLMs more reliable and less likely to hallucinate.
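
If you want the rough shape of it in code (retrieve() and call_llm() are stand-ins for whatever vector store and model API a given startup actually uses):

```python
# Rough shape of retrieval-augmented generation: fetch relevant snippets from
# your own database, paste them into the prompt, and have the LLM answer from
# that context instead of from memory.
def answer_with_rag(question: str, retrieve, call_llm, k: int = 3) -> str:
    snippets = retrieve(question, k=k)            # e.g. top rows of a GPU benchmark table
    context = "\n".join(f"- {s}" for s in snippets)
    prompt = (
        "Answer using only the sources below. If they don't cover it, say so.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```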


:#marseyastronaut:

:marseynotes:

Thanks.

>hallucinate

I really :marseythinkorino2: dislike :marseyishygddt: how they're personifying "AI."

I'm fine with that term specifically, but I do hate how much buttfrickery OpenAI does to make their chatbot saccharine and PC as opposed to just being a tool. AI ethicists are the biggest cute twinks and I want to hunt them down in Minecraft.


:#marseyastronaut:

Yeah they're losers, but I know marketing when I see it, so I hate it.

:marseysteaming:

well it's only as good as the models powering it lol, the crazy thing is the ludicrous speed

It's nicer that it's fast (because they rent a lot of server :marseymaltliquor: space :marseynyanlgbt: I guess), but it's pretty :marseyglam: much worthless as a tool for information. Are they hoping to only build :marseyyarn: a fast one and sell it before :marseyskellington: the chatbot craze ends?

I think Groq is its own chip architecture, afaik specifically designed for LLMs. It's feasible that they could sell this to OpenAI. You know, OpenAI recently made the absurd claim that we need to spend 7 trillion dollars on better AI chips, so it might be part of that push

Powered by Groq LPU Inference Engine

:marseymegaphone: Boring software!

Partners:

Cirrascale is a premier cloud services provider of deep learning infrastructure solutions for blah blah blah

A VM host, but where's the hardware?? :marseyconfused2:

Bittware Molex: GroqCard™ accelerators are available through Bittware. Learn more here.

:marseysoyhype: Hardware?!

https://www.bittware.com/products/groq/

:marseysoypoint:

It's a pic of a processor!!!

>230 MB of on-die memory

:marseysoyhype: Is that 230 MB of cache!!???

https://i.rdrama.net/images/17084689689108264.webp

>SRAM

:soyjakwow: IT IS 230MB OF CACHE!

>GroqCard™ Accelerator

>Up to 750 TOPs, 188 TFLOPs (INT8, FP16 @900 MHz)

*up to :marseyjerkofffrown:

:marseyhmm: Normal GPUs measure TFLOPS with FP32...

:marseyreading:

https://www.velocitymicro.com/blog/fp64-vs-fp32-vs-fp16-and-multi-precision-understanding-precision-in-computing/

Single-precision floating-point, denoted as FP32, is a standard format for representing real numbers in computers. It uses 32 bits to store a floating-point number, consisting of a sign bit, an 8-bit exponent, and a 23-bit significand (also known as the mantissa). The limited precision of FP32 allows for quick calculations but may lead to rounding errors, affecting the accuracy of results, especially in complex scientific simulations and numerical analysis.

Half-precision floating-point, denoted as FP16, uses 16 bits to represent a floating-point number. It includes a sign bit, a 5-bit exponent, and a 10-bit significand. FP16 sacrifices precision for reduced memory usage and faster computation. This makes it suitable for certain applications, such as machine learning and artificial intelligence, where the focus is on quick training and inference rather than absolute numerical accuracy.
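
You can see the gap yourself with numpy (assuming you have it installed):

```python
# Quick demo of the FP32 vs FP16 precision gap described above.
import numpy as np

x32 = np.float32(1.0) + np.float32(1e-4)  # representable in FP32
x16 = np.float16(1.0) + np.float16(1e-4)  # 1e-4 is less than half of FP16's spacing near 1.0, so it rounds away
print(x32)  # 1.0001
print(x16)  # 1.0

# FP16 also tops out around 65504, so big values overflow to inf:
print(np.float16(70000.0))  # inf
```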

So it's fast but inaccurate! WOWOWOWO! :soyjakwow:

188 TFLOPS FP16 seems like a lot. (*UP TO)

>RTX 4090

https://i.rdrama.net/images/17084689693325725.webp

4090s sell for $1800 and up. I'm not sure what 4th Gen Tensor Cores (A) and their 1321 AI TOPS can do compared to the DO YOU SMELL WHAT THE GROQ IS COOKING chip, so... :marseyshrug:

I dunno, HoneyMoon. Sounds like a bunch of bullshit. :marseyshapiro:

idk :marseyshrug: i really need to learn how these LLMs actually work one of these days, this shit is super interesting but IDK wtaf is going on

I think the inaccuracy of what you were seeing is just Mixtral being Mixtral, but you are right that quantization down to FP16 will get rid of some accuracy. I tried my best to find out if Mistral uses FP32 by default but I couldn't; maybe it's obvious to someone else lol

also idk if NVIDIA GPUs are as well suited to LLM inference as bespoke hardware would be. like "shader cores" probably have inbuilt optimizations for graphics shit. meanwhile Groq has a built-in matrix multiplication thingy, which is one of the biggest chokepoints in LLM shit (and computing in general)
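
back of the envelope on why matmul is the chokepoint (dimensions here are just illustrative, not any particular model's):

```python
# Back-of-envelope: one dense projection is a (batch x d_in) @ (d_in x d_out)
# matmul, costing about 2 * batch * d_in * d_out floating-point ops.
# Dimensions below are illustrative, not any specific model's.
def matmul_flops(batch: int, d_in: int, d_out: int) -> int:
    return 2 * batch * d_in * d_out

per_layer = matmul_flops(1, 4096, 4096)   # one 4096x4096 projection for one token
print(per_layer)                          # ~33.6 million FLOPs
print(per_layer * 32)                     # times ~32 layers: ~1.1 billion, before attention
```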

Maybe that's what the AI tensor cores are for? I've heard :marseyjacksparrow: a lot about people using Nvidia's cards :marseygambling: for this stuff, but I've never :marseyitsover: delved deep into it.

groq writes for https://userbenchmark.com

tbh sounds like it would be good for story writing.

Give it a whirl, Ed-Misser!
