Hello graphics Bluesky! 🧡
I heard through the grapevine that my colleague Daniil Smoliakov is looking for new opportunities!
He's one of the most talented and tireless people I've ever seen at work, so he deserves the praise.
Try and get him while you can!
www.linkedin.com/in/daniil-sm...
Posts by Ruben Osorio
My favourite flavour of this is “x86 is interpreted”. Such a high ragebait hit-rate.
They’ll have to pry the idea of accumulating samples from my cold dead hands.
Thanks so much Albert! 🧡
🟥🟩🟦 Give us the sub-pixels!!! 🟥🟩🟦
I don't have plans to do it right now but maybe at some point in the future if time allows.
Part of the message I'm trying to send here is that this isn't too hard to do. I'd love for people to have a go with their own implementations and share improvements if they can! 🧡
It’s an honour to be featured. Thank you so much! ❤️
Screenshots not a great strength either I see.
The high-res stuff is in the post anyway, like this one with a bunch of different fonts rendered at high res straight out of the test app I was running this on:
osor.io/text/lorem_i...
Oh no!
Image of a joke fake article over a blue background, it reads: Breaking: Graphics Programmer Does Text Again! The rabbithole police has been called to the codebase after sightings of an anonymous graphics programmer onanistically replacing their whole text rendering implementation. "It's just a waste of time" said a witness of the large changelists submitted to the repository; "SDFs looked pretty good, I really don't understand" said a former colleague of the suspect. An in-depth article will follow expanding on the alleged wrongdoings of the perpetrator. More news at eleven.
Video is still not Bluesky's forte eh? Here's a screenshot!
My first one got an unexpected amount of interest. Huge thanks to everyone who read it! (Especially @jendrikillner.bsky.social since he was probably the biggest reason.)
This topic gets way more coverage but I've never seen it done/presented like this, so I'm trying to make my contribution.
Hola again graphics peeps!
I found myself with enough bits and pieces related to text rendering to warrant a write-up. So here it is!
osor.io/text
Spiced up with direct vector rendering, sub-pixel anti-aliasing, run-time atlas packing, temporal accumulation, and more!
I hope you enjoy it! 🧡
The most sensible approach is obviously that half res and quarter res *both* mean half in each axis / quarter of the pixels.
🧨🧨🧨🧨🧨🧨🧨
Hey! Thanks so much!
Unfortunately it was a one-off build since I don’t have much time these days for this kind of project.
I would encourage people to attempt their own custom builds though. Or look for custom controller/arcade-stick builders directly since there’s some already out there.
Thanks man!
Gracias Andrés! ❤️
Brain played it with the exact cadence and slapped the music right after.
The first active thread of the wave does the atomic and retrieves the global offset for the wave, WaveReadLaneFirst then broadcasts it.
The local offset within the wave comes from WavePrefixCountBits, since it's just the count of how many threads with a lower index are also writing one element.
If you need the correct index per-thread, as you do when you're going to write the samples to the buffer, there are some more wave-ops involved, since you also need to calculate the local offset for each thread in the wave.
WaveReadLaneFirst/WavePrefixCountBits sorts you out, here is how it'd look:
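The original snippet isn't reproduced here, so what follows is only a rough CPU-side sketch of the same idea in C++, not the actual shader. It emulates one 32-wide wave (the width is an assumption) appending values to a shared buffer with a single atomic. `wave_append`, `count_bits`, and the `wants_write` lane mask are hypothetical names for illustration; the comments note which HLSL wave intrinsic each step stands in for.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kWaveSize = 32; // assumed wave width; real hardware may differ

// Portable population count (stand-in for WaveActiveCountBits / WavePrefixCountBits).
static std::uint32_t count_bits(std::uint32_t mask)
{
    std::uint32_t count = 0;
    for (; mask != 0; mask &= mask - 1u) ++count; // clears the lowest set bit each step
    return count;
}

// Emulates one wave appending to a shared buffer.
// Bit i of wants_write is set when lane i has exactly one element to write.
void wave_append(std::uint32_t wants_write,
                 const std::array<std::uint32_t, kWaveSize>& lane_value,
                 std::atomic<std::uint32_t>& global_counter,
                 std::vector<std::uint32_t>& buffer)
{
    if (wants_write == 0) return; // no lane writes, so no atomic at all

    // A single atomic for the whole wave. In a shader the first active lane
    // would issue it and WaveReadLaneFirst would broadcast the result; on the
    // CPU the "broadcast" is just a local variable shared by the loop below.
    const std::uint32_t wave_total = count_bits(wants_write);
    const std::uint32_t wave_base = global_counter.fetch_add(wave_total);
    assert(wave_base + wave_total <= buffer.size());

    for (int lane = 0; lane < kWaveSize; ++lane) {
        if (((wants_write >> lane) & 1u) == 0) continue;
        // WavePrefixCountBits equivalent: how many lower-indexed lanes also write.
        const std::uint32_t lower_lanes = (1u << lane) - 1u;
        const std::uint32_t local_offset = count_bits(wants_write & lower_lanes);
        buffer[wave_base + local_offset] = lane_value[lane];
    }
}
```

Each wave ends up with a contiguous, lane-ordered range in the buffer, and the shared counter is touched once per wave instead of once per thread.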
Oh! Also worth mentioning. In this sort of system you'll see a lot of contention when writing to shared counters.
It's a good idea to minimize this by doing the global write once per wave or group.
A neat trick is to also scalarize on the shader/draw for when a wave sees different values there.
Paying my respects with a video rendering 10% of the pixels each frame (hacking this in just now so turning all denoising and TAA off, no reprojection of "empty" pixels either).
(Prepare for the bsky video butchering though)
@adrien-t.bsky.social also made me aware of @h3r2tic.bsky.social's amazing presentation in h3.gd/a-deferred-m.... Super cool to see the per-draw lists and all the spatial and temporal VRS experiments ❤️
And because this approach ends up compacting the list of pixels per-draw, it responds really well to scenes with heavy dithering into the visibility buffer.
Some of the other tile-based approaches I tried struggled with this, since they'd need to dispatch multiple resolve tiles per tile on-screen.
Plus you can also select variations of the shaders to further optimize!
If a pixel is not seeing any local lights, you can dispatch a version of the resolve shader that has all the local light code compiled out. Or if a pixel is fully in shadow from directional light, nuke all that code too, etc...
I *really* like the flexibility of this approach while keeping the resolve waves full.
With this you can do software VRS easily, both spatially and temporally, which is super cool! You can just write any logic to decide how the visibility buffer values map to a sample to resolve.
There are only a few waves per-frame that see two or more draws when resolving, which are the only cases where the waves aren't fully utilized.
This is as good as it can be: if those waves weren't going to shade another draw, they would have been inactive in a "wave-perfect" resolve anyway.
This can include a lot of data like per-draw transforms, tinting, material stuff, and notably the bindless texture indices. Nice not to need any of that NonUniformResourceIndex ugliness.
Because the list is not only sorted per-shader but also per-draw, most of the waves will just see one draw!
To take advantage of this, I scalarize based on the draw index, so all of the draw-related data can be uniform and stay in scalar registers. With a noticeable win in occupancy too.
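The actual resolve shader isn't shown in these posts, so here is a hedged CPU emulation in C++ of the usual scalarization loop, assuming a 32-wide wave: pick the first remaining lane's draw index, treat it as uniform, process every lane that matches, and repeat. `DrawData`, `resolve_wave_scalarized`, and the `tint` payload are hypothetical stand-ins, not the real data layout.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kLaneCount = 32; // assumed wave width

// Hypothetical per-draw payload; a real one would hold transforms,
// material parameters, bindless texture indices, and so on.
struct DrawData { float tint; };

// Emulates a wave resolving its pixels with a scalarization loop.
// Returns how many times the loop ran, i.e. the number of distinct
// draws seen by the wave (1 in the common case described above).
int resolve_wave_scalarized(const std::array<std::uint32_t, kLaneCount>& lane_draw,
                            const std::vector<DrawData>& draws,
                            std::array<float, kLaneCount>& lane_out)
{
    std::uint32_t remaining = 0xFFFFFFFFu; // every lane starts unprocessed
    int iterations = 0;

    while (remaining != 0) {
        // WaveReadLaneFirst equivalent: take the draw index of the first remaining lane.
        int first = 0;
        while (((remaining >> first) & 1u) == 0) ++first;
        const std::uint32_t uniform_draw = lane_draw[first];
        assert(uniform_draw < draws.size());

        // Loaded once for the whole wave; on a GPU this can stay in scalar registers.
        const DrawData data = draws[uniform_draw];

        for (int lane = 0; lane < kLaneCount; ++lane) {
            const bool active = ((remaining >> lane) & 1u) != 0;
            if (active && lane_draw[lane] == uniform_draw) {
                lane_out[lane] = data.tint;  // placeholder for the real shading work
                remaining &= ~(1u << lane);  // this lane is done
            }
        }
        ++iterations;
    }
    return iterations;
}
```

With a list sorted per-draw, almost every wave leaves this loop after a single iteration, so the extra pass only costs anything on the few waves that straddle two draws.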
The resolve is dispatched per-shader, not per-draw, so it's still a very manageable amount of dispatch indirect calls.
As in, a single one for all your variations of standard, anisotropic, whatever... so there aren't a ton of concerns about empty dispatch indirect arguments.
Tried writing them once unsorted then sorting, but like I imagined, it's just too expensive.
I do believe you could make something like this work if you tiled the whole resolve and kept the working set in local memory. You need enough tiles/work in flight to feed a big GPU though.