Eric Schmidt got a standing ovation from the TED audience this morning.
An absolute pleasure to interview him on the red circle.
We dove into the big questions—superintelligence, national strategy, open source, and what it means to be human in the age of AI.
One for the books.
Posts by Bilawal Sidhu
TikTok ban imminent, yet funny how things change.
>2020: Stressed about TikTok drama at 120K subs.
>2024: Sitting at 994K and completely unfazed.
Ban it? Cool, I’ll build elsewhere. Keep it? Roger that, I’ll double down.
The game is bigger than any one app. Who cares about vanity metrics.
Merry Christmas y’all! 🎄
Pictured: 3D scan vs. ground truth of the feast to follow
Omnidirectional 3D video of reality — damn near teleportation in a VR headset.
This $17,000 VR camera released in 2017 was ahead of its time. 17 cameras → cloud stitching → 8K x 8K stereo VR video.
The moment is ripe for a new 4D capture rig optimized for dynamic 3D Gaussians. Anyone building one?
It's one of those through lines you see when tackling a timeless mission like mapping the world or spatial computing: VR content created for immersion becomes the foundation for teaching machines to understand how the world moves. Sometimes innovation chains together in unexpected ways! stereo4d.github.io
And since we're dealing with real stereoscopic content, results are notably better than with synthetic data: you get a faithful rendition of the real world across a diverse set of subject matter.
They're using it to train a model called DynaDUSt3R that can predict both 3D structure and motion from video frames, meaning it tracks how objects move between frames while simultaneously reconstructing their 3D shape.
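To make the "structure plus motion" idea concrete, here's a minimal toy sketch of what that kind of output looks like: per-pixel 3D points for a frame, plus 3D scene flow between two frames. The function names, pinhole intrinsics, and the assumption of fixed pixel correspondence are all illustrative simplifications, not the actual DynaDUSt3R API.

```python
import numpy as np

def unproject(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) into per-pixel 3D points (H, W, 3)
    using a simple pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def structure_and_motion(depth_t0, depth_t1, fx=500.0, fy=500.0, cx=64.0, cy=64.0):
    """Toy stand-in for a structure-and-motion prediction: 3D points for the
    first frame plus per-pixel 3D motion to the second frame (assumes pixels
    correspond across frames, which a real model would estimate)."""
    pts_t0 = unproject(depth_t0, fx, fy, cx, cy)
    pts_t1 = unproject(depth_t1, fx, fy, cx, cy)
    motion = pts_t1 - pts_t0  # 3D scene flow per pixel
    return pts_t0, motion
```

A real model predicts these quantities directly from RGB frames; the point of the sketch is just the shape of the output: geometry and motion together, per pixel.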
It was always clear that stereo datasets would be valuable -- and we launched some cool VR tools with it back in 2017 (link below). But the game changer now in 2024 is the scale -- they're providing 110K clips :-) That's the kind of massive, real-world dataset that was just a dream in those days!
Check out this Stereo4D paper from Google DeepMind. It's a pretty clever approach to a persistent problem in computer vision -- getting good training data for how things move in 3D. The key insight is using VR180 videos -- those stereo fisheye videos we launched back in 2017 for YouTubeVR 🧵
The future isn't just virtual or augmented – it's ambient and intelligent
The Google XR unlocked event in NYC
5. Image to video (remix) feature is cool, but CLEARLY needs UI like Kling/Runway motion paint so it isn’t a chaotic mess / constant game of slot machine AI
Will be interesting to do head-to-head comparisons between US and Chinese models now that Sora is live.
3. Physics still very wonky (no magic fix yet) – rhino is moving all across the ground; phones appear/disappear like it’s a magician
4. Wow is there a lot of news footage in the training data – generated night time grainy footage is no problem at all
1. Sora is VERY good at generating high frequency detail (video doesn’t seem blurry at all) – it’s the most impressive quality to me
2. As expected, Sora is great at well imaged landmarks – AI’s ability to generate custom “stock” footage remains promising
MKBHD dropped his review of OpenAI's Sora, the much-hyped AI video model, after a week of testing.
5 immediate observations:
The future of 3D AI took some serious leaps -- from single images to fully interactive, dynamic 3D worlds. Here's what's cooking at the cutting edge: youtu.be/T7bcYSSSC6s
Wav2lip can FINALLY rest in peace. Being able to retarget the facial performance of characters in *existing* live action & CG video makes Act-One an extremely useful tool for all types of creators.
Nicely done RunwayML!
the entire bay area quaked hearing that chatgpt pro is gonna cost $200/month
Very cool! Would love to see a workflow breakdown
The race for building the biggest, baddest world model is very much on. Meanwhile, all I can think is "if only Stadia was still around!"
Check out the various results (and some fun outtakes) below: deepmind.google/discover/blo...
Not quite ready for prime time, but promising on two fronts:
1. For game developers: enabling rapid prototyping of interactive experiences straight from concept art
2. For AI research: providing unlimited, diverse 3D environments for training and testing AI agents
Right now Genie 2 can generate consistent worlds for up to a minute. And this world model seems to generate larger 3D worlds than what World Labs showcased yesterday. Plus they're dynamic vs. static worlds – the foliage moves in the wind, the water ripples etc.
Imagine making 2D concept art for a game world – pressing a button – and suddenly you can walk around an interactive 3D world. That's what Google DeepMind's new paper Genie 2 can do – simulate virtual worlds, including the consequences of any action (e.g. unlock door, jump, swim etc).
It's the same reason people browse Zillow houses or watch shows about mansions. AI or not — software reviews simply don't hit the same.
Observed: All mega popular tech creators focus on hardware — there's no MKBHD for software. It's literally called "Unbox Therapy" for a reason. Even if people won't buy the devices, there's something about vicariously living through that tech review experience.
Tencent’s open-weights Hunyuan Video 13B model looks impressive — oh, and image-to-video and facial performance? They’re coming too.
If 2024 was the year open-source LLMs caught up with closed-source AI — 2025 will be the year open-source video catches up.
World Labs first demo dropped, and it’s consistent 3D worlds from a single 2D image.
Decent volume size to move around in — def a big step up from the RGB + depth 360 environments we’re used to e.g. Blockade Labs.
Stylized results look good; I’d love to see more photorealistic AI generations!
Google 2.5D temporal data, very nice.
Augmented reality x-ray vision to “see through” concrete.
Your infrastructure won’t just be scanned — it’ll be anchored to reality.
Demo: Pix4D reality capture with precise geospatial localization.
Video compression is pretty bad compared to X and Threads too