This article makes me seethe super hard every time a midwit :brainletpit: posts it.

https://queue.acm.org/detail.cfm?id=3212479

C code provides a mostly serial abstract machine.

Guess what, BIPOC: processors are still mostly serial. Yes, there are multiple pipelines with operations flying around the place, but at the end of the day the program counter is incremented on each core and the next instruction is fetched.

In contrast, GPUs achieve very high performance without any of this logic, at the expense of requiring explicitly parallel programs.

Lmao GPUs are broken as fuck for anything which isn't trivial arithmetic. Branches absolutely kill performance. The memory model is even more poorly defined than on CPUs; atomics and branches regularly deviate from what the vendor documents.

This unit is conspicuously absent on GPUs, where parallelism again comes from multiple threads rather than trying to extract instruction-level parallelism from intrinsically scalar code.

This is because you CAN'T implement a good register renamer for GPUs. It has the same problem as branching: the execution model is such that the same instruction is executed in lockstep across several threads; if one thread has a different instruction ordering, everything gets fucked up.
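To make the lockstep problem concrete, here's a toy C model of what the hardware effectively does with a divergent branch. The warp width and the arithmetic are made up; this is a sketch of the execution model, not any vendor's actual microarchitecture:

```c
/* Toy model of SIMT branch divergence: every lane in a "warp" steps
 * through the SAME instruction stream, with inactive lanes masked off. */
#include <stdio.h>

#define WARP 8  /* hypothetical warp width */

int main(void) {
    int x[WARP] = {1, 2, 3, 4, 5, 6, 7, 8};
    int out[WARP];

    /* Source-level branch: if (x odd) out = x * 3; else out = x / 2;
     * The lanes can't run different code, so the hardware executes
     * BOTH paths across the whole warp and masks off the writes. */
    for (int lane = 0; lane < WARP; lane++)      /* "then" pass */
        if (x[lane] % 2 != 0) out[lane] = x[lane] * 3;
    for (int lane = 0; lane < WARP; lane++)      /* "else" pass */
        if (x[lane] % 2 == 0) out[lane] = x[lane] / 2;

    for (int lane = 0; lane < WARP; lane++) printf("%d ", out[lane]);
    printf("\n");
    return 0;
}
```

A divergent branch costs then-path plus else-path, not whichever side you actually took, which is why branches absolutely kill performance.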

If instructions do not have dependencies that need to be reordered, then register renaming is not necessary.

Bruh, what? The same theoretical optimizations apply to GPUs: add a, b; some slow instruction; add b, c. The second add only has an anti-dependency on b, so it could still be executed first, in parallel, for a perf win. The problem is that the SIMT execution model does not make reordering easy.
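Spelled out in C (variable names arbitrary; a sketch of the hazard, not of any particular core):

```c
/* "add a, b" reads the old b; "add b, c" later writes b. That
 * write-after-read conflict is exactly what register renaming fixes:
 * the second add gets a fresh physical register for b, so it can
 * issue while the slow divide is still in flight. */
long demo(long a, long b, long c, long d) {
    a = a + b;   /* add a, b  - reads b                   */
    d = d / c;   /* some slow instruction                 */
    b = b + c;   /* add b, c  - independent of the divide */
                 /* once b is renamed                     */
    return a + b + d;
}
```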

Consider another core part of the C abstract machine's memory model: flat memory. This hasn't been true for more than two decades. A modern processor often has three levels of cache in between registers and main memory, which attempt to hide latency.

From the first processors, loads and stores have been used as just another communication channel with the hardware. A fairly large part of the address space is dedicated to hardware registers with whatever wacky functionality you want, e.g. write to 0x1000EFAA to turn on the blinky yahoo red LED. It is you who has been perverted by years of programming into thinking that memory means bytes the programmer stores.
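For the uninitiated, that looks like this in C. The address and bit value are the joke numbers from above, not a real board; `volatile` is what tells the compiler these aren't "bytes the programmer stores":

```c
#include <stdint.h>

/* Hypothetical memory-mapped LED register at the joke address. */
#define BLINKY_LED_REG ((volatile uint8_t *)0x1000EFAAu)
#define LED_RED_ON     0x01u

void turn_on_red_led(void) {
    /* An ordinary store, but no memory is involved: the bus routes
     * this write to the LED controller instead of RAM. */
    *BLINKY_LED_REG = LED_RED_ON;
}
```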

The cache is, as its name implies, hidden from the programmer and so is not visible to C. Efficient use of the cache is one of the most important ways of making code run quickly on a modern processor, yet this is completely hidden by the abstract machine, and programmers must rely on knowing implementation details of the cache (for example, two values that are 64-byte-aligned may end up in the same cache line) to write efficient code.

Whoa! You need to actually know how your computer works to program? Explicit instructions for "load to L1," "load to L2," or "flush cache" are unnecessary. You can already achieve these effects with the normal instruction set.
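For example, two of those effects with nothing but compiler features that lower to ordinary instructions (GCC/Clang shown; the lookahead distance and the struct are made-up illustrations):

```c
#include <stddef.h>

/* 1. Software prefetch: __builtin_prefetch emits the ISA's normal
 *    prefetch hint (e.g. PREFETCHT0 on x86). */
long sum_with_prefetch(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], /*rw=*/0, /*locality=*/3);
        s += a[i];
    }
    return s;
}

/* 2. Cache-line layout: pad per-thread counters to 64 bytes so two
 *    threads don't fight over one line (the false-sharing case the
 *    article alludes to). 64 is the usual x86 line size. */
struct counter {
    _Alignas(64) long value;
};
```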

Optimizing C

In this section, he argues how difficult it is for a compiler to optimize C. This is actually fair. C currently sits in a worst-of-both-worlds middle ground where some things are unoptimizable (according to the standard) and other things are undefined behavior that sucks for programming. For example, unsigned integers wrap on overflow (nice behavior for the programmer, but it makes the compiler's job harder) and signed integers can't overflow (nice for compiler optimizations but really sucky for the programmer). I think C is actually trying to be two languages: a high-level assembly and a compiler IR, and we are trending towards the latter. For example, passing NULL to memcpy is UB even if the number of bytes to copy is 0, because it lets the compiler assume the input/output pointers are non-NULL. I'm honestly somewhat torn on which I'd like more.
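The signed/unsigned split in one screenful (function names are mine; check the -O2 asm if you don't believe it):

```c
/* Signed overflow is UB, so the compiler may assume x + 1 > x always
 * holds and fold this whole function to `return 1;`. */
int always_true_signed(int x) { return x + 1 > x; }

/* Unsigned overflow wraps by definition, so this must remain a real
 * comparison: it is false when x == UINT_MAX. */
int sometimes_false_unsigned(unsigned x) { return x + 1 > x; }
```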

We have a number of examples of designs that have not focused on traditional C code to provide some inspiration. For example, highly multithreaded chips, such as Sun/Oracle's UltraSPARC Tx series, don't require as much cache to keep their execution units full. Research processors have extended this concept to very large numbers of hardware-scheduled threads. The key idea behind these designs is that with enough high-level parallelism, you can suspend the threads that are waiting for data from memory and fill your execution units with instructions from others. The problem with such designs is that C programs tend to have few busy threads.

Lmao all the examples he gives are failures. The neighbor wants a dataflow machine, which we decided wasn't useful back in the '80s. Also, it's not just C programs; I'd say most programs are serial in nature.

Consider in contrast an Erlang-style abstract machine, ... A cache coherency protocol for such a system would have two cases: mutable or shared.

The reason modern cache coherency protocol state machines have 70+ states in them is that they try to preempt and paper over cache misses, because moving data around between cores is much more expensive than adding a few more state machine transitions. The same situation would arise in his fantasy.
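A sketch of where the state count comes from (names illustrative, not any real protocol):

```c
/* Textbook MESI is four stable states. Real protocols add dozens of
 * transient states, roughly one per "waiting for this message"
 * situation, because stalling a core on every in-flight miss is far
 * more expensive than growing the state machine. */
enum line_state {
    INVALID,
    SHARED,      /* clean, possibly in other caches */
    EXCLUSIVE,   /* clean, only in this cache       */
    MODIFIED,    /* dirty, only in this cache       */
    /* ...and then the transient ones, which is where 70+ comes from: */
    SHARED_TO_MODIFIED_AWAITING_ACKS,
    INVALID_AWAITING_DATA_AND_ACKS
    /* ...and so on. */
};
```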

Immutable objects can simplify caches even more, as well as making several operations even cheaper.

Most useful work your program does uses mutable data structures. Yes, even the Haskellers have a Vector type that you can push to and pop from.

A processor designed purely for speed, not for a compromise between speed and C support, would likely support large numbers of threads, have wide vector units, and have a much simpler memory model.

Yes, if you got rid of all the legacy cruft and tried making something new, you could make a genuinely good processor. All this to run your crappy garbage-collected bloated functional-programming language. Sigh.

There is a common myth in software development that parallel programming is hard. This would come as a surprise to Alan Kay, who was able to teach an actor-model language to young children.

When you're playing around with lego bricks provided by someone else's framework, it is easy. However, offloading complexity onto someone else doesn't remove it. You pay for it in performance.

In general, I hate the tyranny of "big-idea" programming languages which he advocates for. You WILL not mutate any variables, perform any IO, or do any useful work and you WILL like it.

This whole post reads like Terry Davis has returned from the grave.

c is low level because it's a pain in the butt.

c is low level because the devs are all in their basements !codecels


:#marseysting:

:marseyexcited: I like ur flair


I like the cut of your jib :marseyexcited:

The coffee machine doesn't need sunlight to work.

If garbage collectors actually worked, they would uninstall their language runtime from my system when the program closes 🤓

This is beyond my autism, but you called the guy a BIPOC so I assume you're right


idk what half of that means, processor specifics are too neckbeard even for me

who even cares

nerds arguing about programming languages is r-slurred

One of those nerds, as you say, will one day design a better pornhub. Think about that before you say something so insensitive.

porn sites are terrible now, i keep everything locally stored on a NAS

Just keep the real thing tied up in the basement.

Sure, you may feel guilty about it sometimes, but it's no more distressing than scrolling too far and accidentally reading a porn comment.

Me too but I still have to go to spankbang to download it first

Guess what, BIPOC: processors are still mostly serial. Yes, there are multiple pipelines with operations flying around the place, but at the end of the day the program counter is incremented on each core and the next instruction is fetched.

They literally aren't. Out-of-order execution with branch prediction and SIMD instructions have been standard for a decade, mainly to make operations like array bounds checking efficient. C makes you do tons of work at the processor level to recover high-level patterns like array operations because it can't express them.
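Concretely, here's what it takes to guarantee the SIMD form, because the abstract machine has no vector types (SSE2 intrinsics shown; the loop is a made-up example and assumes n is a multiple of 4):

```c
#include <emmintrin.h>  /* SSE2 */

/* What you write: the compiler MAY auto-vectorize this, or may not. */
void add_scalar(float *dst, const float *a, const float *b, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* What you write when you need the vector form for sure:
 * four floats per add instruction, spelled out by hand. */
void add_sse2(float *dst, const float *a, const float *b, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&dst[i], _mm_add_ps(va, vb));
    }
}
```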

Lmao GPUs are broken as frick for anything which isn't trivial arithmetic. Branches absolutely kill performance. The memory model is even more poorly defined than on CPUs; atomics and branches regularly deviate from what the vendor documents.

This is because you CAN'T implement a good register renamer for GPUs. It has the same problem as branching: the execution model is such that the same instruction is executed in lockstep across several threads; if one thread has a different instruction ordering, everything gets fricked up.

Nobody is proposing that you use GPUs for everything. They're essentially made to speed up code that CPUs can't run efficiently - math on large arrays of floats. It's possible to have fast general-purpose parallel systems, but we spend all our time and money optimizing CPUs.

From the first processors, loads and stores have been used as just another communication channel with the hardware. A fairly large part of the address space is dedicated to hardware registers with whatever wacky functionality you want, e.g. write to 0x1000EFAA to turn on the blinky yahoo red LED. It is you who has been perverted by years of programming into thinking that memory means bytes the programmer stores.

Whoa! You need to actually know how your computer works to program? Explicit instructions for "load to L1," "load to L2," or "flush cache" are unnecessary. You can already achieve these effects with the normal instruction set.

Nobody is saying that you need to specify every cache everywhere - that's what high-level languages are for. The point is that you literally can't write cache-efficient code on modern architectures without knowing the machine you're targeting. The ability to write programs independent of the machine is a feature, not a bug.

Lmao all the examples he gives are failures. The neighbor wants a dataflow machine, which we decided wasn't useful back in the '80s. Also, it's not just C programs; I'd say most programs are serial in nature.

The reason modern cache coherency protocol state machines have 70+ states in them is that they try to preempt and paper over cache misses, because moving data around between cores is much more expensive than adding a few more state machine transitions. The same situation would arise in his fantasy.

Wow, we have to use complex caching and state transitions because of the cost of moving memory between threads? It's almost as if we design chips and memory hierarchies around the assumption that programs are always sequential.

Most useful work your program does uses mutable data structures. Yes, even the Haskellers have a Vector type that you can push to and pop from.

Yes, if you got rid of all the legacy cruft and tried making something new, you could make a genuinely good processor. All this to run your crappy garbage-collected bloated functional-programming language. Sigh.

Have you seen most programs? The majority of programmers are writing Python and JavaScript, not some real-time algorithm that needs mutable vectors to be fast.

When you're playing around with lego bricks provided by someone else's framework, it is easy. However, offloading complexity onto someone else doesn't remove it. You pay for it in performance.

In general, I hate the tyranny of "big-idea" programming languages which he advocates for. You WILL not mutate any variables, perform any IO, or do any useful work and you WILL like it.

Yes, because nobody uses "big-idea" languages like Java or Smalltalk or Pascal or Datalog or SQL or Elixir or Rust. Trust the science, /qa/ lost, and you WILL NOT add inscrutable memory vulnerabilities to your programs.

They literally aren't [serial].

It's still a linear stream of instructions. "Mostly serial." We've already ditched a lot of the expensive stuff like precise exceptions.

Nobody is proposing that you use GPUs for everything.

The author uses GPUs to demonstrate how abandoning the C-like interface that processors provide speeds up code. He is advocating for many-core machines, which have similar deficiencies.

cost of moving memory between threads

You will always have to move memory between threads. In fact, on GPUs this is already kind of an issue.

Have you seen most programs? The majority of programmers are writing Python and JavaScript, not some real-time algorithm that needs mutable vectors to be fast.

It's not about speed, but convenience. All of these languages have some mutable resizable array abstraction, and even a pure language like Haskell has something analogous to it. I actually think vectors are inferior to fat linked lists from a perf standpoint if you don't need random access.
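By "fat linked list" I mean an unrolled list, something like this sketch (names and chunk size are mine):

```c
#include <stdlib.h>

#define CHUNK 64  /* elements per node; arbitrary for illustration */

/* Each node holds a contiguous chunk, so traversal within a node is
 * cache-friendly like a vector, while push stays O(1) with no
 * vector-style reallocation or copying. */
struct fat_node {
    size_t len;                /* elements used in this node */
    long items[CHUNK];
    struct fat_node *next;
};

/* Push onto the front, growing the chain when the head node fills. */
struct fat_node *fat_push(struct fat_node *head, long v) {
    if (head == NULL || head->len == CHUNK) {
        struct fat_node *n = malloc(sizeof *n);
        if (n == NULL) return head;  /* allocation failed; unchanged */
        n->len = 0;
        n->next = head;
        head = n;
    }
    head->items[head->len++] = v;
    return head;
}
```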

because nobody uses "big-idea" languages like Java or Smalltalk or Pascal or Datalog or SQL or Elixir or Rust

True. I write Java/TypeScript/Python/Ruby code for a living. These languages are used because they are the lowest common denominator and have a good enough ecosystem, NOT for their "big ideas." Also, Smalltalk, Pascal, and Datalog are not used that much anymore, and Elixir is a meme. Rust is being adopted and is a step in the right direction imo. Notice that all of those languages' virtual machines (except for Elixir's) more closely resemble a PDP-11 than a bespoke manycore system.

It's still a linear stream of instructions.

If you define 'linear' as 'taking multiple branches through your code at the same time' then sure

He is advocating for many-core machines, which have similar deficiencies.

He's not saying we should only have many-core machines, he's advocating for an interface which doesn't communicate that we're not on such a machine.

You will always have to move memory between threads.

Not if you use green threads, and especially not when the compiler can localize computation that shares data onto the same core.

All of these languages have some mutable resizable array abstraction, and even a pure language like Haskell has something analogous to it.

Nobody in the Haskell world uses mutable arrays unless they're strictly necessary for perf reasons. Similarly, Python, Ruby, Rust and JS all have higher-level abstractions over arrays which are often used instead of their mutable equivalents.

These languages are used because they are the lowest common denominator and have a good enough ecosystem, NOT for their "big ideas."

A lot of the ecosystem surrounding these languages is only possible because of higher-level language features like strong types and proper encapsulation. Mutation and bespoke concurrent programming also break these abstractions.

Also, Smalltalk, Pascal, and Datalog are not used that much anymore

My point isn't that they're the most popular currently, my point is that they have had major influences on modern SE.

Notice that all of those languages' virtual machines (except for Elixir's) more closely resemble a PDP-11 than a bespoke manycore system.

That's because they're targeting a machine pretending to be a PDP-11. A virtual machine is an implementation artifact - most languages have many.

This is why we must all use D

Hi! It's good to see you're woke to the inherent micro-r*pe that is your sexuality, and it's even better that you've chosen to be a non-harming shut-in. I wish more white cis men would do this!

You should consider estrogen supplements to make yourself less predatory. As a white male, you having kids is problematic, yes, but have you considered adopting PoC children and trying to raise them non-binary? Just make it clear to them that any white cis male sexuality you accidentally display despite your estrogen supplementation is shameful, and just make it a habit to say "I'm so sorry. I am so very sorry." after every offence. You should also consider exploring cuckolding (with a marginalized minority man/woman/otherkin of course) if you still find your sexuality to be uncontrollable.

Good luck and welcome to the right side of history!
