@neuripsconf.bsky.social, a quick question: are we going to have a competition track this year?
Children exhibit visual understanding from limited experience, orders of magnitude less than our best models.
We introduce the Zero-shot World Model (ZWM). Trained on a single child's visual experience, BabyZWM rapidly acquires competence across diverse benchmarks with no task-specific training. 🧵
How do we make attention actually capture context?
Exclusive Self Attention (XSA) is an interesting variant that improves attention with minimal cost in speed & memory.
Check out the video here: youtu.be/2eZKT4H9_iQ
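For reference only (this is not the XSA variant from the video, just the plain scaled dot-product self-attention baseline such variants are compared against), a minimal sketch with illustrative shapes and projections:

import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, seq, dim); w_*: (dim, dim) learned projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)                       # each row sums to 1
    return weights @ v                                        # context-mixed values

# toy usage
x = torch.randn(2, 5, 16)
w_q, w_k, w_v = (torch.randn(16, 16) * 0.1 for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([2, 5, 16])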
Are you a PhD student specializing in audio or AI? Interested in an internship with us in Barcelona? Apply here!
We've had the pleasure of working with talented interns in the past, and the experience has been mutually very rewarding.
www.linkedin.com/jobs/view/40...
I have just read this book and I strongly identify with the authors. In the increasingly accelerated landscape of machine learning (where a paper from two years ago is considered "old"), I think these kinds of reflections are really important for researchers in academia.
A long human life is about 90 years. You can visualize that as weeks. I wrote an R/shiny app and a Python CLI (click) that'll make this plot for you when you input your birthday, and give you a printable 8.5x11 PDF. https://github.com/stephenturner/lifeweeks #Rstats 🧵 1/5
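Not the lifeweeks code itself, but a minimal Python sketch of the same computation: weeks lived so far against a 90-year budget, with the birthday as the only input.

from datetime import date

def life_in_weeks(birthday, lifespan_years=90, today=None):
    # count full weeks lived and compare them to a rough lifespan budget
    today = today or date.today()
    weeks_lived = (today - birthday).days // 7
    total_weeks = lifespan_years * 52
    return weeks_lived, total_weeks

lived, total = life_in_weeks(date(1990, 6, 15))  # example birthday
print(f"{lived} of ~{total} weeks ({lived / total:.0%})")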
SIGGRAPH'25 (form): 12 days.
RSS'25 (abs): 13 days.
SIGGRAPH'25 (paper-md5): 19 days.
RSS'25 (paper): 20 days.
ICML'25: 26 days.
RLC'25 (abs): 41 days.
RLC'25 (paper): 48 days.
IROS'25: 56 days.
ICCV'25: 61 days.
Researcher: "We let the data speak for itself."
Earlier that day:
What was the most important machine learning paper in 2024?
My Famous Deep Learning Papers list (that I use in teaching) does not include any new ideas from the last year.
papers.baulab.info
Which single new paper would you add?
New* video! If you've ever wondered what topology is, this problem is one of the best examples I know of to give an authentic sense of what it's all about: youtu.be/IQqtsm-bBRU
What's on the horizon for AI in 2025? Leading Stanford faculty offer their expectations for the new year. hai.stanford.edu/news/predict...
A post by @cloneofsimo on Twitter made me write up some lore about residuals, ResNets, and Transformers. And I couldn't resist sliding in the usual cautionary tale about small/mid-scale != large-scale.
Blogpost: lb.eyer.be/s/residuals....
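Not from the blog post itself, just the core pattern it discusses, in the smallest possible PyTorch form: a block that learns an update f(x) while the input is carried forward unchanged. The same x + sublayer(x) wiring underlies both ResNet blocks and Transformer layers.

import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        # f learns a small correction; the identity path carries x through untouched
        self.f = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return x + self.f(x)  # y = x + f(x)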
Registration for #DLBCN 2024 is now open for the general public. Get your ticket today!
sites.google.com/view/dlbcn20...
Qui-Gon Jinn sharing some insightful prompting wisdom.
Nice high-level explainer of camera calibration by Main Street Autonomy.
youtu.be/IHzRSLvRW9c
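Not the pipeline from the video; just the textbook OpenCV checkerboard recipe for intrinsic calibration, assuming a 9x6 inner-corner board and a hypothetical calib/ folder of images.

import glob
import cv2
import numpy as np

pattern = (9, 6)  # inner corners per row and column
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_pts, img_pts = [], []
for path in glob.glob("calib/*.jpg"):  # hypothetical image folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# recovers the intrinsic matrix K and lens distortion coefficients
err, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, gray.shape[::-1], None, None)
print("RMS reprojection error:", err)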
Welcome @egavves.bsky.social!
Been wondering where all the computer vision people on X went? Check out my 2 starter packs!
🔬 Weekend Science Spotlight 🧪
Let's share a recent study by someone else worth spotlighting!
My pick is from @jscicom.bsky.social: jcom.sissa.it/article/pubi...
It explores how science comics simplify complex topics while boosting engagement & understanding.
What's yours?
#SciComm #SciArt
Full paper:
van Rooij, I., Guest, O., Adolfi, F. et al. Reclaiming AI as a Theoretical Tool for Cognitive Science. Comput Brain Behav (2024). doi.org/10.1007/s421...
Anthropic has some comprehensive resources covering tool use and AI agents that few know about.
The repository features five courses, from prompt engineering to evaluation and tool use.
- Sahar Mor.
github.com/anthropics/c...
On the Karpathy-Schmidhuber discussion: here is a slide I sometimes present on the history and origins of attention, a history shared across ML, image processing (the work of Jean-Michel Morel), NLP, and computer vision.
One of my astute grad students made the observation that the meme also works if you switch the grad student and PI. 🤣
I am super hyped and happy with our recent paper on a ✨VampPrior 2.0✨: a hierarchical VAE with a diffusion-based VampPrior! We got SOTA VAE results on CIFAR-10! Kudos to Anna Kuzina! This TMLR paper is the last chapter of her PhD thesis 🤩
📄 tinyurl.com/22rvzc4f
💻 github.com/AKuzina/dvp_...
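For readers new to the idea, here is a hedged sketch of the *original* VampPrior (a mixture of the encoder's posteriors at K learned pseudo-inputs), which the paper above upgrades with a diffusion-based prior. The shapes and encoder interface are assumptions, not the paper's code.

import math
import torch
import torch.nn as nn

class VampPrior(nn.Module):
    def __init__(self, encoder, n_pseudo=50, input_dim=784):
        super().__init__()
        self.encoder = encoder  # assumed to map x -> (mu, logvar), each (K, latent_dim)
        self.pseudo_inputs = nn.Parameter(torch.randn(n_pseudo, input_dim) * 0.01)

    def log_prob(self, z):
        # p(z) = (1/K) * sum_k q(z | u_k), with u_k the learned pseudo-inputs
        mu, logvar = self.encoder(self.pseudo_inputs)
        z = z.unsqueeze(1)  # (B, 1, latent_dim), broadcast against (K, latent_dim)
        log_comp = (-0.5 * (math.log(2 * math.pi) + logvar
                            + (z - mu) ** 2 / logvar.exp())).sum(-1)  # (B, K)
        return torch.logsumexp(log_comp, dim=1) - math.log(log_comp.shape[1])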
The most upvoted papers from the Chinese community on the Daily Papers - November 🔥
huggingface.co/collections/...
I don't really see a clear path where we keep an open internet that is not mostly full of AIs talking to each other. We can't reliably detect AI content, it is cheap and easy to generate, and there are lots of incentives to do so, even besides scams.
You can see the problem on all the social sites.
Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)?
We have been pondering this over the summer and developed a new model: JetFormer
arxiv.org/abs/2411.19722
A thread 👇
1/
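Without claiming this matches JetFormer's actual design, here is a minimal sketch of one ingredient such tokenizer-free approaches need: an output head that gives the transformer a likelihood over continuous soft tokens (a Gaussian mixture) instead of a softmax over a discrete VQ codebook. Names and shapes are illustrative assumptions.

import math
import torch
import torch.nn as nn

class GMMHead(nn.Module):
    def __init__(self, hidden_dim, token_dim, n_mix=8):
        super().__init__()
        self.n_mix, self.token_dim = n_mix, token_dim
        # per mixture component: 1 logit, plus mean and log-variance of size token_dim
        self.proj = nn.Linear(hidden_dim, n_mix * (1 + 2 * token_dim))

    def log_prob(self, h, target):
        # h: (B, hidden_dim) transformer state; target: (B, token_dim) next soft token
        p = self.proj(h).view(-1, self.n_mix, 1 + 2 * self.token_dim)
        logit, mu, logvar = p[..., 0], p[..., 1:1 + self.token_dim], p[..., 1 + self.token_dim:]
        log_pi = torch.log_softmax(logit, dim=-1)
        t = target.unsqueeze(1)  # (B, 1, token_dim)
        log_comp = (-0.5 * (math.log(2 * math.pi) + logvar
                            + (t - mu) ** 2 / logvar.exp())).sum(-1)  # (B, n_mix)
        return torch.logsumexp(log_pi + log_comp, dim=-1)  # (B,)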
I learned about this paper (arxiv.org/abs/2406.09413) when Alexei gave this wonderful talk at the U. They trained 60K diffusion models, each for a different person's visual identity. Sampling weights from this set creates a model for a novel identity.
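A toy illustration of the "sample weights from a set of models" idea (dimensions and procedure are stand-ins, not the paper's actual method): flatten each model's weights, fit a low-dimensional linear subspace, and draw a new point in it.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 512))  # stand-in for the per-identity flattened weight vectors

mean = W.mean(axis=0)
U, S, Vt = np.linalg.svd(W - mean, full_matrices=False)
k = 32                                # keep a low-dimensional subspace
coords = U[:, :k] * S[:k]             # each model's coordinates in that subspace

# sample new coordinates from the empirical Gaussian and map back to weight space
new_coords = rng.normal(loc=coords.mean(0), scale=coords.std(0))
new_weights = mean + new_coords @ Vt[:k]
print(new_weights.shape)              # (512,) -> weights for a novel identity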
Science rarely affords you the luxury of being exactly right about anything. Critics of your work will expect you to have all the answers at once, but in fact progress is more often about being vaguely right and working your way toward the truth one small step at a time.