
Posts by Jonathan Lorraine

From Tasks to Topology: Dorsal and Ventral Streams Emerge in Optimized Neural Networks www.biorxiv.org/content/10.1101/2025.11....

5 months ago
NVIDIA 2026 Internships: PhD Generative AI Research - US | NVIDIA Corporation By submitting your resume, you're expressing interest in one of our 2026 Generative AI focused Research Internships. We'll review resumes on an ongoing basis, and a recruiter may reach out if your exp...

Apply here: nvidia.eightfold.ai/careers?star...

I'm personally interested in multimodal generation and the tools that power it.

6 months ago
NVIDIA Spatial Intelligence Lab (SIL) Advancing foundational technologies enabling AI systems to perceive, model, and interact with the world.

🔍 New NVIDIA Spatial Intelligence Lab internship postings for 2026.

Come work with us to advance foundational technologies that enable AI systems to model and interact meaningfully with the world!

Topics on our homepage: research.nvidia.com/labs/sil/

Application link below

6 months ago
Video

Join us at #CVPR2025 for a preview of this #NVIDIA tech during a live-coding session. A #GPU back end will be reserved for all attending – just don’t forget to bring your laptop for some hands-on fun!
Wed, Jun 11, 8am-noon, or join in at 10:20 after the break. tinyurl.com/nv-kaolin-cv...

10 months ago

We find a new set of use cases for Stable Audio Open ( @jordiponsdotme.bsky.social, @stabilityai.bsky.social, @hf.co) and other large pretrained audio generative models, like AudioLDM and beyond!

11 months ago

Our work is inspired by and builds on the SDS update of DreamFusion (dreamfusion3d.github.io/, @benmpoole.bsky.social , @ajayjain9.bsky.social , @jonbarron.bsky.social), and related updates (VSD, SDI @vincentsitzmann.bsky.social, SJC, many more!)

11 months ago

💡 SDS treats any differentiable parameter set as optimizable from a prompt. Source-guided separation emerged when we brainstormed novel uses. We hope for similarly practical tasks to surface—e.g., automatic Foley layering?—as the community experiments.

11 months ago

🚀 Vision of the Future: Content designers easily use one video + audio diffusion backbone with SDS-style updates to nudge any differentiable task—impacts, lighting, cloth, fluids—until the joint model says “looks & sounds right” given powerful user controls, like text.

11 months ago

⚠️ Limitations ⚠️

Clip-Length Budget: We optimized on ≤10 s clips; minute-scale audio may have artifacts or blow up memory. A hierarchical/windowed Audio-SDS could help here.

11 months ago

⚠️ Limitations ⚠️

Audio-Model Bias: We rely on Stable Audio Open, so when this struggles, e.g., on rare instruments, speech, audio without silence at the end, or out-of-domain SFX, our method can have difficulties. Other diffusion models can help here.

11 months ago
Post image

This project was led by @jrichterpowell.bsky.social, along with Antonio Torralba.

See more work from the NVIDIA Spatial Intelligence Lab: research.nvidia.com/labs/toronto...

Work supported indirectly by MIT CSAIL, @vectorinstitute.ai

#nvidia #mit

11 months ago
Post image

Results on Prompt-Guided Source Separation:

We report improved SDR against ground-truth sources, when available, and show improved CLAP scores after training.
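For readers unfamiliar with the metric, SDR is the standard signal-to-distortion ratio. A minimal sketch of how it is computed (the sinusoid and the 0.9 scaling are illustrative, not from the paper):

```python
import numpy as np

def sdr(reference, estimate):
    """Signal-to-distortion ratio in dB: higher means the estimated
    source is closer to the ground-truth source."""
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

t = np.linspace(0.0, 1.0, 16000)
ref = np.sin(2 * np.pi * 440.0 * t)  # ground-truth source
est = 0.9 * ref                      # estimate with a 10% amplitude error
print(round(sdr(ref, est), 1))       # → 20.0
```

A better estimate yields a higher SDR, so "improved SDR" means separated channels land closer to the true sources.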

11 months ago
Post image

Results on Tuning FM Synthesizers & Impact Synthesis:

CLAP scores improve over training for our prompts, supported by qualitative results. Impact synthesis shows improved performance on impact-oriented prompts.

11 months ago
Post image

Results on Fully-Automatic In-the-Wild Source Separation:

We demonstrate a pipeline that takes a video from the internet, captions its audio with a model (like AudioCaps), and provides the caption to an LLM assistant that suggests source decompositions. We then run our method on the suggested decompositions.

11 months ago
Post image

Modifications to SDS for Audio Diffusion:

🅰 We use an augmented Decoder-SDS in audio space, 🅱 a spectrogram emphasis to better weight transients, and 🅲 multiple denoising steps to increase fidelity.

This image highlights these in red in the detailed overview of our update.
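To make the spectrogram-emphasis idea (🅱) concrete, here is a hedged sketch: blend a waveform-space distance with a spectrogram-space distance so transient structure is weighted more heavily. The `stft_mag` and `emphasized_distance` functions and the `alpha` blend are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=128):
    """Magnitude spectrogram via a simple Hann-windowed STFT."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def emphasized_distance(x, y, alpha=0.5):
    """Blend a waveform-space term with a spectrogram-space term;
    the spectrogram term emphasizes transient structure."""
    wave = np.mean((x - y) ** 2)
    spec = np.mean((stft_mag(x) - stft_mag(y)) ** 2)
    return (1.0 - alpha) * wave + alpha * spec

x = np.sin(2 * np.pi * 440.0 * np.arange(4096) / 16000.0)
y = np.roll(x, 64)  # same content, shifted in time
```

Identical signals give zero distance; time-shifted or smeared transients show up in the spectrogram term.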

11 months ago
Post image

③ Prompt-Guided Source Separation:

Prompt-conditioned source separation of a given audio recording, such as separating a “sax …” and “cars …” from a music recording made on a road, by applying the audio-SDS update to each channel while forcing the sum of the channels to reconstruct the original audio.
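One simple way to picture the per-channel update plus sum constraint, as a hedged numpy sketch (the paper enforces reconstruction through its objective; the even-split projection here is just an illustrative stand-in, as are the placeholder gradient arrays):

```python
import numpy as np

def separation_step(channels, grads, mix, lr=0.1):
    """One schematic update: nudge each channel along its own
    prompt-specific (SDS-style) direction, then re-project so the
    channels still sum to the original mixture."""
    channels = [c - lr * g for c, g in zip(channels, grads)]
    residual = mix - sum(channels)               # reconstruction error
    return [c + residual / len(channels) for c in channels]

mix = np.array([1.0, 0.5, -0.25, 0.0])
channels = [mix / 2.0, mix / 2.0]                # start from an even split
grads = [np.array([0.2, 0.0, 0.0, 0.1]),         # stand-in SDS directions,
         np.array([-0.1, 0.3, 0.0, 0.0])]        # one per text prompt
channels = separation_step(channels, grads, mix)
```

After every step the channels still add up to the mixture, so each prompt's update can only move energy between sources, not invent or lose it.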

11 months ago
Post image

② Physical Impact Synthesis:

We generate impacts consistent with prompts like “hitting pot with wooden spoon” by convolving an impact excitation with learned object and reverb impulses. We learn the parametrized forms of both impulses.
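A hedged sketch of this rendering chain: a damped-sinusoid ("modal") parametrization is one plausible form for the learned impulses (the paper's exact parametrization may differ), and the impact is rendered as two convolutions. All frequencies, decays, and amplitudes below are illustrative.

```python
import numpy as np

def modal_impulse(freqs, decays, amps, sr=16000, dur=0.25):
    """Damped-sinusoid ('modal') impulse response: one illustrative
    parametrized form for a learned object or reverb impulse."""
    t = np.arange(int(sr * dur)) / sr
    return sum(a * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
               for f, d, a in zip(freqs, decays, amps))

def render_impact(excitation, obj_ir, reverb_ir):
    """Impact sound = excitation convolved with the object impulse,
    then with the reverb impulse."""
    return np.convolve(np.convolve(excitation, obj_ir), reverb_ir)

excitation = np.zeros(64)
excitation[0] = 1.0                               # a short strike
obj_ir = modal_impulse([220.0, 550.0], [30.0, 50.0], [1.0, 0.5])
reverb_ir = modal_impulse([90.0], [8.0], [0.3])
audio = render_impact(excitation, obj_ir, reverb_ir)
```

Because the renderer is differentiable in the impulse parameters, the audio-SDS update can tune them toward the prompt.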

11 months ago
Post image

① FM Synthesis:

A toy setup where we generate settings aligning with prompts like “kick drum, bass, reverb” using sine oscillators modulating each other’s frequency as in a synthesizer.

We visualize the final optimized parameters as the dial settings on a synthesizer instrument's user interface.
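A minimal sketch of one FM operator pair, assuming a single sine modulator and carrier (real synthesizers chain more operators; the "kick drum"-ish dial settings below are illustrative, not the optimized values):

```python
import numpy as np

def fm_osc(t, carrier_hz, mod_hz, mod_index):
    """One FM operator pair: a sine carrier whose phase is modulated
    by a sine modulator, as in a classic FM synthesizer."""
    modulator = np.sin(2 * np.pi * mod_hz * t)
    return np.sin(2 * np.pi * carrier_hz * t + mod_index * modulator)

sr = 16000
t = np.arange(sr) / sr                       # one second of audio
# Illustrative dial settings; in Audio-SDS, these parameters are
# what get optimized against the text prompt.
audio = fm_osc(t, carrier_hz=60.0, mod_hz=40.0, mod_index=3.0)
```

The carrier frequency, modulator frequency, and modulation index are exactly the kind of differentiable "dials" the SDS update can turn.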

11 months ago
Post image

We propose three novel audio tasks: ① FM Synthesis, ② Physical Impact Synthesis, and ③ Prompt-Guided Source Separation.

This image briefly summarizes the use case, optimizable parameters, rendering function, and parameter update.

11 months ago
Post image

Intuitively, our update finds a direction to move the audio to increase its probability given the prompt, by noising and denoising with our diffusion model, then “nudging” our audio towards it by propagating the update through our differentiable rendering to our audio parameters.
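That intuition can be sketched in a few lines, assuming stand-in `render` and `denoise` callables for the real differentiable renderer and prompt-conditioned diffusion model (this is a schematic single step, not the paper's implementation):

```python
import numpy as np

def audio_sds_step(params, render, denoise, sigma=0.5, lr=0.1, seed=0):
    """One schematic Audio-SDS step: render audio from parameters,
    noise it, let a prompt-conditioned `denoise` propose a more
    probable audio, and nudge the parameters toward that proposal."""
    rng = np.random.default_rng(seed)
    audio = render(params)
    noised = audio + sigma * rng.standard_normal(audio.shape)
    target = denoise(noised)          # "more likely" audio per the prompt
    direction = target - audio        # SDS-style update direction
    # The real method backpropagates this direction through `render`;
    # with an identity renderer it applies to the parameters directly.
    return params + lr * direction
```

With an identity renderer and a denoiser that pulls toward silence, for example, each step shrinks the parameters by a factor of `1 - lr`.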

11 months ago
Video

🔊 New NVIDIA paper: Audio-SDS 🔊
We repurpose Score Distillation Sampling (SDS) for audio, turning any pretrained audio diffusion model into a tool for diverse tasks, including source separation, impact synthesis & more.

🎧 Demos, audio examples, paper: research.nvidia.com/labs/toronto...

🧵 below

11 months ago
Video

What if you could control the weather in any video — just like applying a filter?
Meet WeatherWeaver, a video model for controllable synthesis and removal of diverse weather effects — such as 🌧️ rain, ☃️ snow, 🌁 fog, and ☁️ clouds — for any input video.

11 months ago

We envision a future where LLMs are universal generative tools capable of seamlessly producing content across multiple modalities, including text, images, videos, and 3D structures.

1 year ago

Integrating 3D mesh generation into LLMs opens exciting possibilities for interactive design. Users can converse with a model to create and manipulate 3D objects in real time.

1 year ago

We're excited to scale LLaMA-Mesh to handle more complex and detailed meshes by extending context lengths. Integrating textures and physical properties, exploring larger base models, part-based generation, and enabling dynamic generation are interesting ways forward!

1 year ago

Due to context length constraints, we're currently limited to meshes with up to 500 faces. We generate one 3D object per dialog due to our fine-tuning dataset construction. We see a slight degradation in language ability, perhaps due to using UltraChat in fine-tuning.

1 year ago
Post image

This project was led by Zhengyi Wang with Yikai Wang, Hang Su, Jun Zhu, Sanja Fidler, and Xiaohui Zeng.

See more work from the #NVIDIA Toronto AI Lab here: research.nvidia.com/labs/toronto...

Work supported by Tsinghua University, @vectorinst.bsky.social, @uoft.bsky.social #UofT #Tsinghua

1 year ago