Soon (tm)!
Posts by Ritesh Oedayrajsingh Varma
Oh and
> I don't know much about eBPF
The previous article, which also involved a dive into the kernel, has a lot of info on it:
rovarma.com/articles/fro...
Thanks Jaap!
> Do you guys ever do presentations somewhere? Would be cool to hear some fun stories!
You’re kinda looking at it :-)
There was this one time where ETW was broken for many people for like an entire year on Windows. Soooo… 😂
At least on Linux we can fix the issues ourselves!
It’s a good article!
And in the kernel’s defense… they often don’t really have an alternative to spinlocks, especially in cases like this.
But it definitely goes to show that getting anything to do with spinlocks right is really hard.
New article! A user is reporting full system freezes while using Superluminal on Linux. What do you do? Cry? Well, we did a little bit.
But we also dove into the kernel...again, this time fixing several issues in eBPF's spinlock implementation. Read all about it:
rovarma.com/articles/a-t...
Somehow missed this latest piece of technical wizardry from Stefan. My first thoughts were “this is awesome, but looks super hard to get into a reliable state”.
But I thought the same thing about Live++ and Stefan knocked it out of the park there. If anybody can make this happen, it’s Stefan :-)
We've just released a new Insider update with some much-requested features, like being able to specify env vars when running, auth support for symbol servers, and proper progress reporting for symbol downloads. And of course, many fixes & QoL improvements.
Go check it out!
Check out this new article by Jelle about how we stream unsorted data in sorted order to ensure a fixed upper memory bound while processing gigabytes of capture data in Superluminal!
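For a rough feel of the general idea (not necessarily what Jelle's article does): a minimal sketch, assuming the capture is globally unsorted but made up of several sources that are each already sorted by timestamp. The `(timestamp, payload)` event shape and the names `merge_sorted_streams` and `demo` are made up for illustration. A k-way merge only ever holds one pending event per source, so memory is bounded by the number of sources rather than the size of the capture.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp, payload). Not Superluminal's real format.
Event = Tuple[int, str]


def merge_sorted_streams(streams: Iterable[Iterable[Event]]) -> Iterator[Event]:
    """Lazily yield events from all streams in global timestamp order.

    Assumes every individual stream is already sorted by timestamp.
    heapq.merge keeps only one pending event per stream in its heap.
    """
    return heapq.merge(*streams, key=lambda event: event[0])


def demo() -> list:
    # Two per-source streams; interleaved on disk they would look unsorted.
    source_a = iter([(1, "a1"), (4, "a2"), (9, "a3")])
    source_b = iter([(2, "b1"), (3, "b2"), (10, "b3")])
    return list(merge_sorted_streams([source_a, source_b]))


if __name__ == "__main__":
    for timestamp, payload in demo():
        print(timestamp, payload)
```

If the individual sources are not sorted themselves, this doesn't apply as-is and you'd need something like a bounded reordering window instead.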
New article! What do you do when profiling your code shows the slowdown isn't in your code, but deep in the kernel? Why, you grab the kernel source and go spelunking.
How a routine profiling session turned into a Linux kernel patch: rovarma.com/articles/fro...
Thanks! We’re not using this, and I don’t think we’d even be able to correctly open captures made with this option currently. Good to know about it!
Re: slowing down the capture: compared to “not doing anything at all”, I can definitely see this being slower.
We could, yeah, but that has the disadvantage that other tools wouldn’t be able to open Superluminal captures anymore. Could still be worth it as an option as you say.
For the Linux version we’re doing everything ourselves, and captures there are *much* smaller as a result.
> if you're interested
definitely!
The ETW file itself is just a straight dump of the raw data without further processing. The goal there is to keep the overhead of capturing low, which means doing as little as possible to log data. Even compression doesn’t happen until after the capture is done.
My co-founder Jelle wrote an article about a custom data structure he came up with for Superluminal to efficiently store millions of callstacks.
Check it out!
I've been wanting to start a blog for a while, and finally decided to bite the bullet.
The first article of hopefully many more to come is about, you guessed it, profiling & optimization.
RTs appreciated!
rovarma.com/articles/opt...
Great post!
Including a sneak peek of a certain profiler on a platform that is very much not Windows ;-)
It's understandable that Unreal needs to touch a lot of files when starting the editor. But what if I told you that >5500 of those files are not needed for the editor to start at all and are just adding seconds to the editor launch time?
(Fix included!)
#u5 #gamedev
larstofus.com/2025/09/27/s...
to be fair, you could have seen this coming from the “runs inside the terminal” as if that is something positive :p
Nice investigation! Sampling profilers > instrumenting profilers when you need to see what’s happening in code you *didn’t* write. Great example of the right tool for the job!
My new blog post is there, and it's a bit different from usual: Fixing stutters in your own code is hard enough, but this time I try to fix performance issues in a closed-source game. No source code or debug symbols, but a lot of guesswork. larstofus.com/2025/07/27/p...
#gamedev
#Trackmania
Days since I've had to waste time debugging obscure issues caused by Linux's deranged shared library model: 0
"Nice that you're linking to a static library, but there's a shared lib loaded with the same symbol name in it, so I'm gonna use that one instead, ok?"
tfw you're collateral damage in the Great AI Wars
This was a great example of "how hard can it be?". Well, 4 days of full-time work fighting with Qt, that's how hard.
So glad you like it! ;-)
It turns out when you’re writing code that runs on each sample interval to collect stacks, you don’t have a lot of time if you’re targeting high sampling rates :-)
We've been micro-optimizing our eBPF code, and it reminds me of the SPU era a bit. The compiler/JIT is so basic that old tricks are useful again. Regular C turns into atrocious ASM, but writing C like it's ASM fixes it. I'm kinda loving it.
It's all stuff like this (before/after):
Solved it by the ancient tradition of Just Reading The Code.
Turns out continuously taking the RCU lock by inserting thousands of elements into a BPF_MAP_TYPE_LRU_HASH from within an NMI is Not Good for your system.
Rolled our own (simpler) version directly in eBPF.
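To give an idea of what "simpler" could look like, here is a sketch in Python rather than eBPF C. It is an assumption for illustration, not the actual Superluminal code: a preallocated, fixed-capacity table with a bounded probe loop that drops the insert when it can't find a slot, instead of evicting under a lock. `FixedHashTable` and `max_probes` are made-up names. The bounded loop mirrors the kind of constraint the eBPF verifier places on loops.

```python
class FixedHashTable:
    """Fixed-capacity open-addressing table with bounded linear probing.

    Illustrative only: a real eBPF version would live in preallocated map
    memory and need atomics or per-CPU storage for concurrent access.
    """

    def __init__(self, capacity: int, max_probes: int = 8):
        self.capacity = capacity
        self.max_probes = max_probes
        self.keys = [None] * capacity  # None marks an empty slot
        self.values = [0] * capacity

    def _slot(self, key, probe: int) -> int:
        return (hash(key) + probe) % self.capacity

    def insert(self, key, value) -> bool:
        # Bounded loop: at most max_probes slots are ever inspected.
        for probe in range(self.max_probes):
            i = self._slot(key, probe)
            if self.keys[i] is None or self.keys[i] == key:
                self.keys[i] = key
                self.values[i] = value
                return True
        return False  # no free slot within the probe limit: drop, don't evict

    def lookup(self, key):
        for probe in range(self.max_probes):
            i = self._slot(key, probe)
            if self.keys[i] is None:
                return None  # hit an empty slot: key was never inserted
            if self.keys[i] == key:
                return self.values[i]
        return None
```

The trade-off in this sketch is that a full table loses new entries rather than old ones, which is only acceptable if the table is sized for the expected load.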
How does one diagnose the entire Linux system locking up when using a particular eBPF data structure? Are there any post-mortem logs to look at? dmesg is only about the current session.
Asking for a friend.
In our case we’re looking at optimizing the perf of a single program, so an overview of which programs are running and how much time they cost is not that useful; we want to know which of the thousands of lines of code in *our* programs we need to focus on :-)