Posts by Gabriel

Got some new prints in the @gitbutler.com office

3 weeks ago
NVIDIA Adds Official Support For RHEL-Compatible Distributions Like AlmaLinux With CUDA 13.2
With CUDA 13.2 that is now shipping, NVIDIA has provided official support for Red Hat Enterprise Linux compatible distributions/downstreams like AlmaLinux to CUDA. With this official NVIDIA CUDA support for these RHEL-compatible distributions, NVIDIA is also allowing the NVIDIA packages to be distributed directly from the OS package repositories...

NVIDIA Adds Official Support For RHEL-Compatible Distributions Like AlmaLinux With CUDA 13.2 - www.phoronix.com/news/NVIDIA-Official-RHE...

1 month ago
Defer available in gcc and clang
About a year ago I posted about defer and that it would be available for everyone using gcc and/or clang soon. So it is probably time for an update. Two things have happened in the meantime: A tec…

Defer available in GCC and Clang
gustedt.wordpress.com/2026/02/15/d...

2 months ago

Join us for the HPSF Community Summit 2026 in Braunschweig, Germany, February 25-27! 💚

Learn what’s new with HPSF projects, give us feedback on your use of HPSF software, meet with project communities, and tell us how to grow and improve them.

Details: hpsf.io/event/hpsf-c...

2 months ago
Intro to the GitButler CLI
YouTube video by GitButler

Love the GitButler GUI but miss your CLI? Have we got the solution for you!

youtu.be/Jg8L3SbgZ3o?...

2 months ago
Release v0.37.0 · jj-vcs/jj
About: jj is a Git-compatible version control system that is both simple and powerful. See the installation instructions to get started. Release highlights: A new syntax for referring to hidden and...

#jj-vcs 0.37.0 came out yesterday! i'm intrigued by the new divergent-change syntax; it seems very neat

github.com/jj-vcs/jj/re...

3 months ago

Please note: Any claims of AI Exascale, AI Zettascale or beyond computing power are just baloney. Real computing power is measured in FP64. Period.

AMD embraced utter stupidity by adopting this terminology from the leather jacket man.

It's a real shame!

#CES2026 #AMD

3 months ago
LLVM 22 Lands NVIDIA Olympus CPU Scheduling Model
NVIDIA's Olympus are the ARM64 cores found within the upcoming Vera CPU that will be paired with Rubin. Olympus cores are claimed to be twice as fast as NVIDIA's current CPU cores found in Grace and based on Neoverse-V2. Earlier this year the open-source compilers landed initial support for Olympus while now a proper CPU scheduling model has been upstreamed into LLVM 22...

LLVM 22 Lands NVIDIA Olympus CPU Scheduling Model - www.phoronix.com/news/NVIDIA-Olympus-Sche...

3 months ago
title slide of talk being given at Rust Nation UK:

[Title] Rust for Foundational SW or: Safety-Critical Software in Rust

ever curious why people who work in safety-critical systems want to use Rust?

here's the title slide for the talk i'll give at @rustnationuk.bsky.social about this

3 months ago
When compilers surprise you — Matt Godbolt’s blog Sometimes compilers can surprise and delight even a jaded old engineer like me

Day 24 of #AoCO2025! A loop summing 0+1+2+...+n. GCC unrolls it. Clang does something jaw-dropping: the loop vanishes entirely, replaced by a direct calculation. How?!

xania.org/202512/24-cu...
youtu.be/V9dy34slaxA
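To make the Day 24 idea concrete, here is a minimal C sketch of the two forms: the loop as written and the closed form n(n+1)/2 that a compiler may substitute for it. The function names are hypothetical, and the exact instruction sequence Clang emits is in the linked post, not reproduced here.

```c
#include <stdint.h>

/* The loop as you might write it: 0 + 1 + 2 + ... + n.
   A 64-bit counter means i <= n always terminates, even for n == UINT32_MAX. */
uint64_t sum_loop(uint32_t n) {
    uint64_t sum = 0;
    for (uint64_t i = 0; i <= n; ++i)
        sum += i;
    return sum;
}

/* The closed form a compiler may replace the loop with: n(n+1)/2.
   64-bit arithmetic keeps n * (n + 1) from overflowing for any 32-bit n. */
uint64_t sum_closed(uint32_t n) {
    uint64_t m = n;
    return m * (m + 1) / 2;
}
```

The two functions agree for every input; the difference is that the second does a constant amount of work regardless of n.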

3 months ago

Day 23 of #AoCO2025! Switch → jump table? Sometimes. Other times: arithmetic, bitmasks, or something cleverer. Compilers have more tricks than you think.

xania.org/202512/23-sw...
youtu.be/aSljdPafBAw
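As an illustration of the "bitmasks" option, here is a hypothetical sparse switch next to a hand-written range check plus bitmask lookup that computes the same thing. Whether a given compiler picks this lowering for a given switch is up to its heuristics; the sketch only shows why it is possible.

```c
#include <stdint.h>

/* A switch over a sparse set of cases: a jump table would be mostly empty. */
int is_vowel_switch(char c) {
    switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 1;
    default:
        return 0;
    }
}

/* The same test as one range check plus one bit test:
   bit k of the mask is set when 'a' + k is a vowel. */
int is_vowel_mask(char c) {
    /* Characters below 'a' wrap to large unsigned values and fail the range check. */
    unsigned idx = (unsigned)((unsigned char)c - 'a');
    if (idx >= 26)
        return 0;
    const uint32_t vowels = (1u << 0) | (1u << 4) | (1u << 8) | (1u << 14) | (1u << 20);
    return (int)((vowels >> idx) & 1u);
}
```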

3 months ago
Clever memory tricks — Matt Godbolt’s blog We learn that compilers have tricks to access memory efficiently

Day 22: String comparison against "ABCDEFG" should call memcmp, but Clang inlines it with some clever memory tricks. How does it compare 7 bytes so efficiently?

xania.org/202512/22-me...
youtu.be/kXmqwJoaapg
#AoCO2025
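One trick that fits this description is to cover 7 bytes with two overlapping 4-byte loads. The sketch below writes that out by hand; treat it as an assumption about the kind of code Clang generates and check the linked post for the actual assembly. Both hypothetical functions assume s points to at least 7 readable bytes.

```c
#include <stdint.h>
#include <string.h>

/* Straightforward version: compare the first 7 bytes with a library call. */
int is_abcdefg_memcmp(const char *s) {
    return memcmp(s, "ABCDEFG", 7) == 0;
}

/* Overlapping-load version: bytes 0..3 and bytes 3..6 together cover all
   7 bytes with two 4-byte loads (byte 3 is checked twice, which is harmless).
   memcpy is the portable way to express an unaligned load in C. */
int is_abcdefg_overlap(const char *s) {
    static const char lit[] = "ABCDEFG";
    uint32_t lo, hi, want_lo, want_hi;
    memcpy(&lo, s, 4);
    memcpy(&hi, s + 3, 4);
    memcpy(&want_lo, lit, 4);
    memcpy(&want_hi, lit + 3, 4);
    return ((lo ^ want_lo) | (hi ^ want_hi)) == 0;
}
```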

3 months ago
When SIMD Fails: Floating Point Associativity — Matt Godbolt’s blog Why floating point maths doesn't vectorise like integers, and what to do about it

Day 21: Summing integers? Compiler vectorises beautifully—8 at a time! Switch to floats? It refuses, doing each add individually. Same code, totally different output. Why? 🤔

xania.org/202512/21-ve...
youtu.be/lUTvi_96-D8

#AoCO2025
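A small sketch of why the float loop stays scalar: vectorising a sum means keeping per-lane partial sums and combining them at the end, which reorders the additions, and float addition is not associative. The function names and input values are illustrative, and the expected results assume a target that evaluates float arithmetic in single precision (e.g. SSE on x86-64, or AArch64).

```c
#include <stddef.h>

/* Sequential sum: the order C specifies for s += x[i]. */
float sum_sequential(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Four-lane sum: roughly the shape a vectoriser would need, with one
   partial sum per lane combined at the end. This reorders the additions. */
float sum_four_lanes(const float *x, size_t n) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; ++k)
            lane[k] += x[i + k];
    float s = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; ++i)  /* leftover elements */
        s += x[i];
    return s;
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f} the sequential sum is 1.0f but the four-lane sum is 0.0f, because 1e8f + 1.0f rounds back to 1e8f. Since the answers differ, GCC and Clang only make this reordering when told it is acceptable, for example with -ffast-math.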

4 months ago

Day 20: Process 65,536 integers one at a time? Nah. The compiler vectorises it to handle 8 at once — same code, 8× faster! SIMD auto-vectorisation is compiler magic 🚀

xania.org/202512/20-si...
youtu.be/d68x8TF7XJs #AoCO2025
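A sketch assuming the loop is a plain sum of 32-bit unsigned integers (the post doesn't say which operation). sum_scalar is the kind of loop GCC and Clang typically auto-vectorise with optimisation enabled; sum_v8 spells out the 8-at-once shape by hand using the GCC/Clang vector_size extension, which is not standard C. Unsigned addition wraps modulo 2^32 and is associative, so, unlike the float case, reordering it is safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar loop: one element per iteration as written. */
uint32_t sum_scalar(const uint32_t *x, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Eight 32-bit lanes in one 256-bit value (GCC/Clang extension). */
typedef uint32_t v8u32 __attribute__((vector_size(32)));

/* Hand-written 8-wide version of the same sum. */
uint32_t sum_v8(const uint32_t *x, size_t n) {
    v8u32 acc = {0};
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        v8u32 v;
        memcpy(&v, x + i, sizeof v);  /* load 8 elements, no alignment assumed */
        acc += v;                     /* 8 additions in one operation */
    }
    uint32_t s = 0;
    for (int lane = 0; lane < 8; ++lane)  /* horizontal reduction */
        s += acc[lane];
    for (; i < n; ++i)                    /* leftover tail elements */
        s += x[i];
    return s;
}
```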

4 months ago
Chasing your tail — Matt Godbolt’s blog The art of not (directly) coming back: tail call optimisation

Day 19: Recursive functions calling themselves endlessly — stack growth? Nope! The compiler turns recursion into loops. Tail call optimisation is magic ✨

xania.org/202512/19-ta...
youtu.be/J1vtP0QDLLU #AoCO2025
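A minimal sketch of a tail-recursive function next to the loop that tail call optimisation effectively turns it into (factorial is a hypothetical example, not necessarily the one in the post). C does not guarantee this transformation; GCC and Clang normally apply it with optimisation on, so "no stack growth" is compiler behaviour rather than a language rule.

```c
#include <stdint.h>

/* Tail-recursive: the recursive call is the last thing the function does,
   so the caller's frame is no longer needed when the call is made. */
uint64_t fact_acc(uint32_t n, uint64_t acc) {
    if (n <= 1)
        return acc;
    return fact_acc(n - 1, acc * n);
}

/* The loop the optimiser can turn it into: update the arguments and jump back. */
uint64_t fact_loop(uint32_t n) {
    uint64_t acc = 1;
    while (n > 1) {
        acc *= n;
        --n;
    }
    return acc;
}
```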

4 months ago
Partial inlining — Matt Godbolt’s blog Inlining doesn't have to be all-or-nothing

Day 18: Function with fast & slow paths. Inline = code bloat. Don't inline = slow fast path. Can't have both—or can you? The compiler finds a surprising way out of this dilemma.

xania.org/202512/18-pa...
youtu.be/STZb5K5sPDs
#AoCO2025
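Partial inlining is something the compiler does for you (GCC exposes it as -fpartial-inlining). The sketch below performs the same split by hand so its shape is visible: a tiny fast path that is cheap to inline, and a slow path kept out of line with the GCC/Clang noinline and cold attributes. The memo table and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical memo table for an expensive function of a byte-sized key. */
static bool cache_valid[256];
static long cache_value[256];

/* Slow path: kept out of line and marked cold so it does not bloat callers. */
__attribute__((noinline, cold))
static long lookup_slow(unsigned char key) {
    long v = 0;
    for (long i = 0; i <= key; ++i)  /* stand-in for real work: sum of squares */
        v += i * i;
    cache_value[key] = v;
    cache_valid[key] = true;
    return v;
}

/* Fast path: a load and a branch, small enough to inline at every call site. */
static inline long lookup(unsigned char key) {
    if (cache_valid[key])
        return cache_value[key];
    return lookup_slow(key);
}
```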

4 months ago

Actually, this die configuration is not new information; it was already mentioned in this removed slide:
(Although the CPU die's CBB name still seems to be new.)

4 months ago
Deferred Conflict (with Steve Klabnik) | Dead Code A podcast about how the software industry got this way

Listen: shows.acast.com/dead-code/e...

4 months ago

It’s safe to assume that the HPC scheduling space is going to be in a state of Flux for quite some time to come…

(I see what I did there. With consummate apologies to @vsoch.bsky.social and @tgamblin.bsky.social in advance 🤣)

4 months ago

How have servers and the cloud evolved in the last 30 years, and what might be next? @bcantrill.bsky.social has been in the thick of the industry since the Dotcom Boom, and shares fascinating stories.

Bryan is one of my all-time favorite people to talk with - don't miss this one.

(cont'd)

4 months ago
Inlining - the ultimate optimisation — Matt Godbolt’s blog Copy paste can sometimes be a good thing, at least if the compiler does it for you

Day 17: Inlining — the ultimate optimisation ✨

A function gets inlined, half vanishes. The assembly is cleaner than hand-written. How does copy-paste make code disappear?

xania.org/202512/17-in...
youtu.be/JFHfFTvMPp0

#AoCO2025
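A minimal illustration, with hypothetical names, of how inlining can make half a function vanish: once transform is inlined into a caller that passes a constant mode, constant propagation removes the branch that can never run. Compiling with -O2 and comparing the assembly for double_it and triple_plus_one shows each keeping only its own arm.

```c
/* A general helper with two behaviours selected by a flag. */
static int transform(int x, int mode) {
    if (mode == 0)
        return x * 2;
    return x * 3 + 1;
}

/* After inlining, the compiler knows mode == 0 here and deletes the other branch. */
int double_it(int x) {
    return transform(x, 0);
}

/* ...and mode == 1 here, so only the x * 3 + 1 arm survives. */
int triple_plus_one(int x) {
    return transform(x, 1);
}
```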

4 months ago
Calling all arguments — Matt Godbolt’s blog Knowing how compilers call functions can help with design - and optimisation

Day 16: Calling conventions matter! Pass 8 chars as separate args: stack spillage. Pack them in a struct: single register. Sometimes structs are MORE efficient than separate parameters!

xania.org/202512/16-ca...
youtu.be/Yaw8AMoP4sI
#AoCO2025
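A sketch of the two signatures, assuming the x86-64 System V ABI used on Linux and macOS (other ABIs have different rules). There, only the first six integer-class arguments travel in registers, so the last two of eight separate chars are passed on the stack, while an 8-byte struct of chars fits in a single general-purpose register. The names are hypothetical.

```c
/* Eight separate char arguments: under x86-64 System V, a..f go in
   registers and g, h are passed on the stack. */
int sum8_args(char a, char b, char c, char d, char e, char f, char g, char h) {
    return a + b + c + d + e + f + g + h;
}

/* The same eight bytes packed into one 8-byte struct, which that ABI
   passes in a single register. */
struct eight_chars {
    char c[8];
};

int sum8_struct(struct eight_chars s) {
    int total = 0;
    for (int i = 0; i < 8; ++i)
        total += s.c[i];
    return total;
}
```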

4 months ago
Aliasing — Matt Godbolt’s blog Knowing when the compiler can't optimise is important too

Day 15: Two nearly identical loops—one writes to memory every iteration, the other stays in registers. Same code, wildly different performance. The culprit? Aliasing!

xania.org/202512/15-al...
youtu.be/PPJtJzT2U04

#AoCO2025
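One common way aliasing produces exactly this split, sketched with hypothetical names: if the output pointer might point into the input array, the compiler has to write the running total to memory on every iteration. Accumulating in a local removes that constraint, but it is only equivalent when the pointers do not overlap, which is why the compiler cannot make the change on its own.

```c
#include <stddef.h>

/* *total might point into x, so each update must be stored before the
   next x[i] is read. */
void sum_into(int *total, const int *x, size_t n) {
    for (size_t i = 0; i < n; ++i)
        *total += x[i];
}

/* Accumulating in a local lets the sum live in a register and be stored once.
   Only equivalent when total does not point into x. */
void sum_into_local(int *total, const int *x, size_t n) {
    int t = *total;
    for (size_t i = 0; i < n; ++i)
        t += x[i];
    *total = t;
}
```

With x = {1, 2, 3} and total = &x[1], the first version leaves 9 in x[1] and the second leaves 8, so the two really do differ under aliasing. Declaring the pointers restrict is the standard C way to promise no overlap.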

4 months ago

Does this mean no more dirt-cheap NRE from Slurm? Or will Slurm development no longer be coin-operated? Would love to see serious engineering effort go into modernizing Slurm, but this could go in many directions.

4 months ago
When LICM fails us — Matt Godbolt’s blog When aliasing can prevent loop-invariant code motion

Day 14: Add ONE global counter to your loop and watch LICM vanish—strlen called every iteration! Why would incrementing an unrelated variable break the optimisation? 🤔

xania.org/202512/14-li...
youtu.be/OwFNblEEAXo
#AoCO2025
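A sketch of the pattern with a hypothetical global counter. s is a char pointer, and char pointers may legally point at the bytes of any object, including the counter itself, so the compiler may be unable to prove that the increment leaves the string unchanged and may have to re-run strlen each iteration. Hoisting the length by hand removes the question.

```c
#include <stddef.h>
#include <string.h>

size_t space_count;  /* hypothetical global counter */

/* strlen(s) sits in the loop condition; the store to space_count may
   prevent the compiler from hoisting it. */
void count_spaces(const char *s) {
    for (size_t i = 0; i < strlen(s); ++i)
        if (s[i] == ' ')
            ++space_count;
}

/* Manual hoist: compute the length once, before the loop. */
void count_spaces_hoisted(const char *s) {
    const size_t len = strlen(s);
    for (size_t i = 0; i < len; ++i)
        if (s[i] == ' ')
            ++space_count;
}
```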

4 months ago
Loop-Invariant Code Motion — Matt Godbolt’s blog The compiler can move code outside of loops to speed things up

Day 13 of Advent of Compiler Optimisations! 🔄

Loop calling a function whose result never changes? One compiler hoists it out automatically. The other… doesn't. Even with hints!

xania.org/202512/13-li...
youtu.be/dIwaqJG0WDo

#AoCO2025
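A sketch of loop-invariant code motion with hypothetical names. The post doesn't say which hint it tried; the GCC/Clang const attribute used here is one such hint, promising the function has no side effects and reads no global memory. weighted_sum_hoisted is what LICM produces when it succeeds.

```c
#include <stddef.h>

/* Hypothetical "expensive" function whose result depends only on k.
   const is the hint; noinline keeps it a real call for the demonstration. */
__attribute__((const, noinline))
static int factor(int k) {
    int f = 1;
    for (int i = 0; i < k; ++i)
        f = (f * 3) % 1000003;
    return f;
}

/* As written: factor(k) is called on every iteration. */
int weighted_sum(const int *a, size_t n, int k) {
    int s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i] * factor(k);
    return s;
}

/* Hoisted by hand: the loop-invariant call moves out of the loop. */
int weighted_sum_hoisted(const int *a, size_t n, int k) {
    const int f = factor(k);
    int s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i] * f;
    return s;
}
```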

4 months ago
Pointer Arith (Using the GNU Compiler Collection (GCC))

Cursed code:

void* f(void *p) {
    return p + 1;
}

Both gcc and clang support void* arithmetic as an extension in C:

gcc.gnu.org/onlinedocs/g...

-pedantic FTW!

Godbolt: godbolt.org/z/rcrqWvMGW

#Programming
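For comparison, a version that compiles cleanly under -pedantic (f_portable is a hypothetical name). Standard C has no arithmetic on void *, so the pointer is converted to a character pointer first; adding 1 then advances exactly one byte, the same result the GNU extension gives, since it treats sizeof(void) as 1.

```c
/* Convert to a character pointer, then step one byte. */
void *f_portable(void *p) {
    return (unsigned char *)p + 1;
}
```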

4 months ago
Unswitching loops for fun and profit — Matt Godbolt’s blog Duplicating loops around can yield some decent optimisations

Day 12 of Advent of Compiler Optimisations! A loop that checks the same thing every time. The compiler's solution? Make the code bigger to make it faster. Wait, what?

xania.org/202512/12-lo...
youtu.be/-VCrYshE7iQ
#AoCO2025
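A minimal sketch of loop unswitching with hypothetical names: the invariant test is moved outside the loop and the loop is duplicated, once per outcome. The code gets bigger, but neither copy of the loop branches on negate any more.

```c
#include <stddef.h>

/* As written: the same invariant test (negate) runs on every iteration. */
void adjust(int *x, size_t n, int negate) {
    for (size_t i = 0; i < n; ++i)
        x[i] = negate ? -x[i] : x[i] * 2;
}

/* Unswitched: test once, then run one of two specialised loops. */
void adjust_unswitched(int *x, size_t n, int negate) {
    if (negate) {
        for (size_t i = 0; i < n; ++i)
            x[i] = -x[i];
    } else {
        for (size_t i = 0; i < n; ++i)
            x[i] = x[i] * 2;
    }
}
```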

4 months ago

Day 11: A clever bit-counting loop using the "clear bottom bit" trick. Change one compiler flag and... wait, what just happened to my loop?! Pattern recognition at its finest.

xania.org/202512/11-po...
youtu.be/Hu0vu1tpZnc
#AoCO2025
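The "clear bottom bit" loop, sketched below, runs once per set bit because x & (x - 1) removes the lowest set bit. GCC and Clang can recognise this idiom and, when the target has a population-count instruction (on x86-64, for example, once it is enabled with -mpopcnt or a suitable -march), replace the whole loop with that instruction. Which flag the post changes is an assumption here.

```c
#include <stdint.h>

/* Count set bits: each iteration clears the lowest set bit of x. */
int popcount_loop(uint32_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        ++count;
    }
    return count;
}
```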

4 months ago

Kokkos 5.0 is officially out. ✨

Details:
- Moves the project to C++20
- Retires older interfaces, reducing complexity for future work
- Ideal time for teams to review workflows

Read the full update here: hpsf.io/blog/2025/ko...

4 months ago