
Posts by Christoffer Koo Øhrstrøm

Looking forward to hearing it 🤞 Happy to help if there is more you need.

2 months ago 1 0 0 0

Always happy to compare to good and interesting work :)

2 months ago 0 0 0 0

Our experiments use absolute coordinates. It would probably work about the same with normalized coordinates, though I reckon it would require fiddling a bit with the initialization range of the projection matrix (W_p) if you prefer normalized.
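To illustrate the trade-off, here is a minimal sketch (not from the PaPE code; all names like `d_model` and `w_p_std` are illustrative) of absolute vs. normalized 2D token coordinates, and of roughly compensating the projection init range when normalizing:

```python
import numpy as np

# Illustrative sketch: absolute vs. normalized 2D token coordinates.
H, W, d_model = 32, 32, 64

ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
abs_coords = np.stack([ys, xs], axis=-1).reshape(-1, 2).astype(np.float64)

# Normalized variant: map each axis to [0, 1].
norm_coords = abs_coords / np.array([H - 1, W - 1])

# If positions shrink by roughly a factor of H, scaling the projection
# init range up by about H keeps the projected positions W_p @ x in a
# similar range as before.
w_p_std = 0.02                      # hypothetical init std for absolute coords
w_p_std_norm = w_p_std * max(H, W)  # rough compensation for normalization

W_p_abs = np.random.normal(0.0, w_p_std, size=(d_model, 2))
W_p_norm = np.random.normal(0.0, w_p_std_norm, size=(d_model, 2))
```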

2 months ago 1 0 1 0

Thanks! I am not too familiar with those tasks, but no, I don't think it should be hard to test. And it would be quite interesting to do. Our code is available, and the implementation is plug-and-play with standard attention. You only need to give it the nD position of each token.
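The "plug-and-play with standard attention" pattern can be sketched generically. This is not the actual PaPE API; it is a hypothetical single-head attention where the caller only supplies per-token nD positions and a position-bias function:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_positions(q, k, v, pos, pos_bias_fn):
    """Standard single-head attention plus an additive position term.

    q, k, v: (n, d) token features; pos: (n, n_dims) per-token positions.
    pos_bias_fn maps pairwise position offsets (n, n, n_dims) to a scalar
    bias per pair (n, n). Callers only need to supply `pos`.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    offsets = pos[:, None, :] - pos[None, :, :]   # (n, n, n_dims)
    scores = scores + pos_bias_fn(offsets)
    return softmax(scores) @ v

# Toy usage with 2D positions and a hypothetical distance-based bias.
rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
pos = rng.uniform(0, 32, size=(n, 2))
out = attention_with_positions(q, k, v, pos, lambda o: -np.linalg.norm(o, axis=-1))
```

The same shape of interface extends to 2D-T or 3D simply by passing positions with more columns.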

2 months ago 1 0 1 0
Where to Attend: A Principled Vision-Centric Position Encoding with Parabolas

And here it is. Maybe something along these lines you were thinking of? Designed directly for vision, tested on 2D, 2D-T, 3D, and multi-modal, and it extrapolates very well.

Paper: arxiv.org/abs/2602.01418
Website: chrisohrstrom.github.io/parabolic-po...
Code: github.com/DTU-PAS/para...

2 months ago 5 1 1 0

Where to Attend: A Principled Vision-Centric Position Encoding with Parabolas

Paper: arxiv.org/abs/2602.01418
Website: chrisohrstrom.github.io/parabolic-po...
Code: github.com/DTU-PAS/para...

@rgring.bsky.social @lanalpa.bsky.social

2 months ago 7 1 0 0

What if position encodings were designed for vision from scratch? We introduce PaPE, Parabolic Position Encoding. It outperforms RoPE on 7/8 datasets and extrapolates to higher resolutions without fine-tuning or position interpolation. Paper, code, and website in thread 🧵

2 months ago 36 7 3 0

Actually working on a principled encoding for 2D, 2D-T, and 3D. Coming soon in a couple of weeks ;)

3 months ago 3 0 1 0

Congratulations. You are now officially Danish.

5 months ago 1 0 0 0

Thanks to my collaborators @rgring.bsky.social @lanalpa.bsky.social.

Try it out for yourself: github.com/DTU-PAS/spik...

5 months ago 1 0 0 0

We also get much smaller input sizes, with up to a 6.9x reduction over voxels and up to an 8.9x reduction over frames.

5 months ago 2 0 1 0

Results are pretty good. Inference speedups are up to 3.4x over voxels for a point cloud network and up to 10.4x over frames for a Transformer.

This comes without sacrificing accuracy. We even outperform voxels and frames in most cases on gesture recognition and object detection.

5 months ago 0 0 1 0

Spiking Patches works by creating a grid of patches and letting each patch act as a spiking neuron. A patch increases its potential whenever an event arrives within it, and a token is created every time a patch spikes (when its potential exceeds a threshold).
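The mechanism above can be captured in a few lines. This is a toy sketch of the idea, not the official DTU-PAS/spiking-patches code; the event format `(x, y, t)` and the unit potential increment are simplifying assumptions:

```python
def spiking_patch_tokenize(events, patch_size, threshold):
    """Toy sketch of the Spiking Patches idea (not the official code).

    events: iterable of (x, y, t) tuples, assumed sorted by time t.
    Each patch in a regular grid accumulates potential per event; when
    the potential reaches the threshold, the patch "spikes": a token
    (patch_x, patch_y, t) is emitted and the potential resets.
    """
    potential = {}
    tokens = []
    for x, y, t in events:
        key = (x // patch_size, y // patch_size)
        potential[key] = potential.get(key, 0) + 1
        if potential[key] >= threshold:
            tokens.append((key[0], key[1], t))
            potential[key] = 0
    return tokens

# Toy usage: 5 events land in one 4x4 patch; with threshold 3 the patch
# spikes once (after the third event) and then accumulates again.
events = [(1, 2, 0.0), (3, 3, 0.1), (0, 0, 0.2), (2, 1, 0.3), (1, 1, 0.4)]
tokens = spiking_patch_tokenize(events, patch_size=4, threshold=3)
```

Because tokens are emitted only when a patch spikes, the output stays asynchronous and spatially sparse.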

5 months ago 0 0 1 0

We achieve this through tokenization of events. Our tokenizer is called Spiking Patches.

Something cool is that tokens are compatible with GNNs, PCNs, and Transformers.

This is the first time anyone has applied tokenization to events. We hope to encourage more of this.

5 months ago 0 0 1 0

What if we could represent events (event cameras) in a way that preserves both asynchrony and spatial sparsity?

Excited to share our latest work, where we answer this question positively.

Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras

Paper: arxiv.org/abs/2510.26614

5 months ago 6 2 1 2

How are external links to be understood? Is it okay, for example, to link to a video (not our own) with examples of a concept that we describe as a preliminary?

6 months ago 2 0 1 0

Can Dynamic Neural Networks boost Computer Vision and Sensor Fusion?
We are very happy to share this awesome collection of papers on the topic!

1 year ago 6 2 0 0

True. Not much of an issue on small codebases. Mostly, it just feels better to have a snappier formatter for those.

1 year ago 2 0 0 0

black is great, but I prefer Ruff for its speed, and it is also a really nice linter. docs.astral.sh/ruff/

1 year ago 5 0 1 0

Inventors of flow matching have released a comprehensive guide going over the math & code of flow matching!

Also covers variants like non-Euclidean & discrete flow matching.

A PyTorch library is also released with this guide!

This looks like a very good read! 🔥

arxiv: arxiv.org/abs/2412.06264
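The core objective the guide covers fits in a few lines. A minimal NumPy sketch of conditional flow matching with linear interpolation paths (this is my own illustration, not the released library; the dummy zero-velocity "model" is only for shape-checking):

```python
import numpy as np

def flow_matching_loss(model, x0, x1, t):
    """Conditional flow matching objective with linear paths.

    x_t = (1 - t) * x0 + t * x1, and the regression target is the
    constant velocity x1 - x0; the model predicts v(x_t, t).
    """
    xt = (1.0 - t[:, None]) * x0 + t[:, None] * x1
    target = x1 - x0
    pred = model(xt, t)
    return np.mean((pred - target) ** 2)

# Toy usage: noise -> "data" pairs and a trivial model predicting zeros.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(64, 2))            # noise samples
x1 = rng.normal(size=(64, 2)) + 3.0      # shifted "data" samples
t = rng.uniform(size=64)
loss = flow_matching_loss(lambda xt, t: np.zeros_like(xt), x0, x1, t)
```

Training a real velocity network then just means minimizing this loss over sampled `(x0, x1, t)` triples.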

1 year ago 109 27 1 1
SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps

Maybe this is it? arxiv.org/abs/2409.10202 @jakubgregorek.bsky.social

1 year ago 8 1 1 0