
This article makes me seethe super hard every time a midwit :brainletpit: posts it.

https://queue.acm.org/detail.cfm?id=3212479

C code provides a mostly serial abstract machine.

Guess what: processors are still mostly serial. Yes, there are multiple pipelines with operations flying around the place, but at the end of the day the program counter is incremented on each core and the next instruction is fetched.
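The serial contract is easy to write down. A toy fetch-execute loop (made-up mini-ISA, nothing real) shows the model every core still presents to software:

```c
#include <stddef.h>

/* Toy ISA, for illustration only. */
enum op { OP_ADD, OP_HALT };
struct insn { enum op op; int dst, src; };

/* The architectural contract: fetch the instruction at pc, execute it,
   bump pc, repeat. All the pipelining and out-of-order machinery exists
   to make this loop fast while preserving exactly these semantics. */
void run(const struct insn *prog, int *regs) {
    for (size_t pc = 0;; pc++) {
        struct insn i = prog[pc];                  /* fetch   */
        switch (i.op) {                            /* execute */
        case OP_ADD:  regs[i.dst] += regs[i.src]; break;
        case OP_HALT: return;
        }
    }
}
```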

In contrast, GPUs achieve very high performance without any of this logic, at the expense of requiring explicitly parallel programs.

Lmao GPUs are broken as fuck for anything that isn't trivial arithmetic. Branches absolutely kill performance, because a warp executes in lockstep and divergent paths get serialized. The memory model is even more poorly defined than on CPUs; atomics and branches regularly misbehave relative to what the vendor documents.

This unit is conspicuously absent on GPUs, where parallelism again comes from multiple threads rather than trying to extract instruction-level parallelism from intrinsically scalar code.

This is because you CAN'T implement a good register renamer for GPUs. It has the same problem as branching: the execution model runs the same instruction in lockstep across several threads, so if one thread needs a different instruction ordering, everything gets fucked up.

If instructions do not have dependencies that need to be reordered, then register renaming is not necessary.

What? The same theoretical optimizations apply to GPUs: add a, b; some slow instruction; add b, c. The second add could still be executed early for a perf win; its only conflict is that it overwrites b, which is exactly the kind of hazard renaming removes. The problem is that the SIMT execution model does not make reordering easy.
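Spelled out (my reconstruction of his sketch, with C variables standing in for registers and a volatile read standing in for the slow instruction):

```c
volatile int slow_input;   /* stand-in for any long-latency operation */

int example(int a, int b, int c) {
    a = a + b;             /* add a, b: reads b                         */
    int x = slow_input;    /* some slow instruction, e.g. a cache miss  */
    b = b + c;             /* add b, c: overwrites b, a write-after-read
                              hazard against the first add; renaming
                              gives b a fresh physical register so this
                              can execute while the slow op is in flight */
    return a + b + x;
}
```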

Consider another core part of the C abstract machine's memory model: flat memory. This hasn't been true for more than two decades. A modern processor often has three levels of cache in between registers and main memory, which attempt to hide latency.

From the very first processors, loads and stores have doubled as a communication channel with hardware. A fairly large part of the address space is dedicated to hardware registers with whatever wacky functionality you want, e.g. write to 0x1000EFAA to turn on the blinky yahoo red LED. It is you who has been perverted by years of programming into thinking that memory means bytes the programmer stores.
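In C, that communication is a single volatile store. Address and LED are straight from the joke above, not any real board:

```c
#include <stdint.h>

/* Hypothetical memory-mapped device register. volatile marks each store
   as a side effect the compiler must not reorder or delete. */
#define RED_LED ((volatile uint32_t *)0x1000EFAAu)

void led_on(void)  { *RED_LED = 1u; }
void led_off(void) { *RED_LED = 0u; }
```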

The cache is, as its name implies, hidden from the programmer and so is not visible to C. Efficient use of the cache is one of the most important ways of making code run quickly on a modern processor, yet this is completely hidden by the abstract machine, and programmers must rely on knowing implementation details of the cache (for example, two values that are 64-byte-aligned may end up in the same cache line) to write efficient code.

Whoa! You need to actually know how your computer works to program it? Explicit load-to-L1, load-to-L2, and flush-cache instructions are unnecessary; you can already achieve these effects with the normal instruction set.
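For instance, the 64-byte-line detail from the quote is handled in ordinary C11, no magic cache instructions required (the 64 is an assumption about the target):

```c
#include <stdint.h>

/* Two counters bumped by two different threads. Unpadded, they could
   share one 64-byte cache line and ping-pong between cores (false
   sharing); _Alignas forces each onto its own line. */
struct hot_counters {
    _Alignas(64) uint64_t hits;
    _Alignas(64) uint64_t misses;
};
```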

Optimizing C

In this section, he argues that it is difficult for a compiler to optimize C. This is actually fair. C currently sits in a worst-of-both-worlds middle ground, where some things are unoptimizable (according to the standard) and others are undefined behavior that sucks for the programmer. For example, unsigned integers wrap on overflow (nice behavior for the programmer, but it makes the compiler's job harder), while signed integer overflow is undefined (nice for compiler optimizations, but really sucky for the programmer). I think C is actually trying to be two languages at once: a high-level assembly and a compiler IR, and we are trending toward the latter. For example, passing NULL to memcpy is UB even if the number of bytes to copy is 0, because it lets the compiler assume the input/output pointers are non-NULL. I'm honestly somewhat torn on which I'd like more.
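A minimal sketch of all three rules (function names are mine; the semantics are straight from the standard):

```c
#include <stddef.h>
#include <string.h>

/* Signed overflow is UB, so the compiler may fold this to "return 1". */
int always_true(int x) { return x + 1 > x; }

/* Unsigned arithmetic wraps, so the comparison must really happen:
   sometimes_false(UINT_MAX) == 0. */
int sometimes_false(unsigned x) { return x + 1 > x; }

/* UB if dst or src is NULL, even when n == 0, so after this call the
   compiler may assume both pointers are non-NULL and delete any later
   NULL checks on them. */
void copy_bytes(void *dst, const void *src, size_t n) {
    memcpy(dst, src, n);
}
```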

We have a number of examples of designs that have not focused on traditional C code to provide some inspiration. For example, highly multithreaded chips, such as Sun/Oracle's UltraSPARC Tx series, don't require as much cache to keep their execution units full. Research processors have extended this concept to very large numbers of hardware-scheduled threads. The key idea behind these designs is that with enough high-level parallelism, you can suspend the threads that are waiting for data from memory and fill your execution units with instructions from others. The problem with such designs is that C programs tend to have few busy threads.

Lmao all the examples he gives are failures. The guy wants a dataflow machine, which we decided wasn't useful back in the 80s. Also, it's not just C programs; I'd say most programs are serial in nature.

Consider in contrast an Erlang-style abstract machine, ... A cache coherency protocol for such a system would have two cases: mutable or shared.

The reason modern cache coherency protocol state machines have 70+ states in them is that they try to preempt and paper over cache misses, because moving data between cores is much more expensive than adding a few more state machine transitions. The same pressure would arise in his fantasy machine.

Immutable objects can simplify caches even more, as well as making several operations even cheaper.

Most of the useful work your program does uses mutable data structures. Yes, even the Haskellers have a Vector type that you can push to and pop from.
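The mutable workhorse in question is a dozen lines of C (names are mine):

```c
#include <stdlib.h>

/* Minimal growable vector with push and pop. */
struct vec { int *data; size_t len, cap; };

int vec_push(struct vec *v, int x) {
    if (v->len == v->cap) {                       /* grow geometrically */
        size_t cap = v->cap ? 2 * v->cap : 8;
        int *p = realloc(v->data, cap * sizeof *p);
        if (!p) return -1;                        /* out of memory */
        v->data = p;
        v->cap = cap;
    }
    v->data[v->len++] = x;
    return 0;
}

int vec_pop(struct vec *v) { return v->data[--v->len]; }
```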

A processor designed purely for speed, not for a compromise between speed and C support, would likely support large numbers of threads, have wide vector units, and have a much simpler memory model.

Yes, if you got rid of all the legacy cruft and tried making something new, you could make a genuinely good processor. All this to run your crappy garbage-collected bloated functional-programming language. Sigh.

There is a common myth in software development that parallel programming is hard. This would come as a surprise to Alan Kay, who was able to teach an actor-model language to young children.

When you're playing around with Lego bricks provided by someone else's framework, it is easy. However, offloading complexity onto someone else doesn't remove it; you pay for it in performance.

In general, I hate the tyranny of the "big-idea" programming languages he advocates for. You WILL not mutate any variables, perform any IO, or do any useful work, and you WILL like it.

c is low level because it's a pain in the butt.

c is low level because the devs are all in their basements !codecels

:#marseysting:

:marseyexcited: I like ur flair

I like the cut of your jib :marseyexcited:

The coffee machine doesn't need sunlight to work.