How is AI helping robots to generalise their skills to unfamiliar environments? 🤖 🏠
In the latest episode, I chatted to Prof. Lerrel Pinto (@lerrelpinto.com) from New York University about #robot learning and decision making.
Available wherever you get your podcasts: linktr.ee/robottalkpod
Posts by Lerrel Pinto
This project, which combines hardware design with learning-based controllers, was a monumental effort led by @anyazorin.bsky.social and Irmak Guzey. More links and information about RUKA are below:
Website: ruka-hand.github.io
Assembly Instructions: ruka.gitbook.io/instructions
We just released RUKA, a $1300 humanoid hand that is 3D-printable, strong, precise, and fully open-source!
The key technical breakthrough here is that we can control the robot's joints and fingertips **without joint encoders**. All we need is self-supervised data collection and learning.
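For intuition, here is a minimal sketch of how encoder-free fingertip control can be learned from self-collected data. This is not the RUKA codebase; all dimensions, names, and the training setup below are made up for illustration.

```python
# Minimal sketch (hypothetical, not the RUKA code): learn fingertip control
# without joint encoders by fitting an inverse model on self-collected
# (motor command, observed fingertip pose) pairs.
import torch
import torch.nn as nn

class InverseModel(nn.Module):
    """Maps desired fingertip positions -> tendon motor commands."""
    def __init__(self, n_fingertips=5, n_motors=11):   # dimensions are placeholders
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * n_fingertips, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_motors),
        )

    def forward(self, fingertip_xyz):
        return self.net(fingertip_xyz)

# Self-supervised data: command the motors randomly and record the resulting
# fingertip positions with an external tracker (e.g. a camera or mocap glove).
commands = torch.rand(10_000, 11)       # placeholder motor commands
fingertips = torch.rand(10_000, 15)     # observed fingertip xyz, flattened

model = InverseModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(1_000):
    idx = torch.randint(0, len(commands), (256,))
    pred = model(fingertips[idx])       # predict the command that produced the pose
    loss = nn.functional.mse_loss(pred, commands[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

# At control time: feed the desired fingertip positions, send the predicted commands.
```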
This would be funny! 😂
When life gives you lemons, you pick them up.
(trained with robotutilitymodels.com)
What would you love to know about #robot learning and decision making?
Later this season, I'll be chatting to Prof. Lerrel Pinto (@lerrelpinto.com) from NYU about using machine learning to train robots to adapt to new environments.
Send me your questions for Lerrel: robottalk.org/ask-a-question/
Is there a word for the feeling when you want to cheer for the other team?
This project was an almost solo effort from @haldarsiddhant.bsky.social. And as always, this project is fully open-sourced.
Project page: point-policy.github.io
Paper: arxiv.org/abs/2502.20391
The overall algorithm is simple:
1. Extract key points from human videos.
2. Train a transformer policy to predict future robot key points.
3. Convert predicted key points to robot actions.
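A loose sketch of what step 2 could look like is below. The shapes, names, and architecture details are hypothetical; the released code at point-policy.github.io is the reference, and steps 1 and 3 are only noted in comments.

```python
# Rough sketch of the three steps (hypothetical shapes/names, not the released code).
import torch
import torch.nn as nn

class KeyPointPolicy(nn.Module):
    """Step 2: a small transformer that maps a history of key points to the
    key points the robot should reach at the next timestep."""
    def __init__(self, n_points=8, d_model=128):
        super().__init__()
        self.embed = nn.Linear(n_points * 3, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_points * 3)

    def forward(self, keypoint_history):          # (B, T, n_points * 3)
        h = self.encoder(self.embed(keypoint_history))
        return self.head(h[:, -1])                # predicted key points at t+1

# Step 1 (assumed): key points come from an off-the-shelf point tracker run on
# the human videos, so the policy sees humans and robots through the same representation.
history = torch.rand(1, 10, 8 * 3)
policy = KeyPointPolicy()
next_points = policy(history)

# Step 3 (assumed): predicted key points on the gripper are converted to an
# end-effector pose, e.g. with a rigid-body fit plus inverse kinematics.
```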
Point Policy uses sparse key points to represent both human demonstrators and robots, bridging the morphology gap. The scene is thus encoded through semantically meaningful key points derived from minimal human annotations.
The robot behaviors shown below are trained without any teleop, sim2real, genai, or motion planning. Simply show the robot a few examples of doing the task yourself, and our new method, called Point Policy, spits out a robot-compatible policy!
This is important because the humble iPhone is one of the best accessories for embodied AI out there, if not actually the best. It's got a depth sensor, a good camera, built-in internet, decent compute, and, uniquely, really good SLAM already built in.
It should be accessible in EU now!
AnySense is built to empower researchers with better tools for robotics. Try it out below.
Download on App store: apps.apple.com/us/app/anyse...
Open-source code on GitHub: github.com/NYU-robot-le...
Website: anysense.app
AnySense is led by @raunaqb.bsky.social with several others from NYU.
Robot data collected in the wild with AnySense can then be used to train multimodal policies! In the video above, we use the Robot Utility Models framework to train visuo-tactile policies for a whiteboard-erasing task. You can use it for so much more, though!
We just released AnySense, an iPhone app for effortless data acquisition and streaming for robotics. We leverage Apple’s development frameworks to record and stream:
1. RGBD + Pose data
2. Audio from the mic or custom contact microphones
3. Seamless Bluetooth integration for external sensors
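The post doesn't spell out the on-disk or wire format, but a synchronized record from a stream like this could look roughly like the following. Field names and shapes here are hypothetical; the actual schema lives in the GitHub repo.

```python
# Hypothetical sketch of one synchronized AnySense-style record on the
# receiving end (not the app's actual schema).
from dataclasses import dataclass
import numpy as np

@dataclass
class Frame:
    timestamp: float        # seconds since stream start
    rgb: np.ndarray         # (H, W, 3) uint8 camera image
    depth: np.ndarray       # (H, W) float32 depth in meters
    pose: np.ndarray        # (4, 4) camera pose from the phone's tracking
    audio: np.ndarray       # (n_samples,) chunk from the mic or a contact microphone
    external: dict          # readings from Bluetooth sensors, e.g. tactile

def example_frame() -> Frame:
    return Frame(
        timestamp=0.0,
        rgb=np.zeros((480, 640, 3), dtype=np.uint8),
        depth=np.zeros((480, 640), dtype=np.float32),
        pose=np.eye(4, dtype=np.float32),
        audio=np.zeros(1600, dtype=np.float32),
        external={"tactile": np.zeros(16, dtype=np.float32)},
    )
```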
A useful “productivity” trick is to remind yourself that research should be fun and inspiring, and if it's not, something should change.
Just found a new winner for the most hype-baiting, unscientific plot I have seen. (From the recent Figure AI release)
One reason to be intolerant of misleading hype in tech and science is that tolerating the small lies and deception is how you get tolerance of big lies
Thanks Tucker! The timing of this is great given the uncertainty with other funding mechanisms.
Thank you to @sloanfoundation.bsky.social for this generous award to our lab. Hopefully this will bring us closer to building truly general-purpose robots!
Yes, this is one of our inspirations!
A fun, clever idea from @upiter.bsky.social: treat code generation as a sequential editing problem -- this gives you loads of training data from synthetically editing existing code.
And it works! Higher performance on HumanEval, MBPP, and CodeContests across small LMs like Gemma-2, Phi-3, and Llama 3.1.
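One way to picture the synthetic-edit idea: delete pieces of a finished program and keep the reverse edits as supervision. This is a hypothetical illustration, not the paper's exact procedure.

```python
# Hypothetical illustration: "un-write" lines of a finished program and record
# the reverse edits, so a model can learn to build code as a sequence of edits.
# Assumes the snippet's lines are distinct, which keeps the bookkeeping simple.
import random

def make_edit_trajectory(final_code: str, n_steps: int = 3):
    """Return a list of (partial_code, edit) pairs ending at final_code."""
    lines = final_code.splitlines()
    keep = list(range(len(lines)))
    states = [lines[:]]
    for _ in range(n_steps):
        if len(keep) <= 1:
            break
        keep.remove(random.choice(keep))            # synthetically remove a line
        states.append([lines[i] for i in sorted(keep)])
    states.reverse()                                # train in the forward direction
    pairs = []
    for before, after in zip(states, states[1:]):
        inserted = [l for l in after if l not in before]
        pairs.append(("\n".join(before), {"insert": inserted}))
    return pairs

snippet = "def add(a, b):\n    total = a + b\n    return total"
for partial, edit in make_edit_trajectory(snippet):
    print(partial, "->", edit)
```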
Thanks Eugene! Sounds exciting!
Hi Eugene, this sounds cool! Could you comment a bit on how well simulated driving agents translate to real world driving?
We have been working a bunch on offline world models. Pre-trained features from DINOv2 seem really powerful for modeling. I hope this opens up a whole set of applications for decision making and robotics!
Check out the thread from @gaoyuezhou.bsky.social for more details.
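A loose sketch of what an offline world model on frozen DINOv2 features can mean in practice is below. Shapes and names are hypothetical; see the thread and paper for the actual method.

```python
# Loose sketch (hypothetical, not the released code): predict next-frame DINOv2
# features from current features plus the action, trained purely offline.
import torch
import torch.nn as nn

# The frozen encoder would come from, e.g.:
# encoder = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
# feats = encoder(images)   # pre-trained features, never fine-tuned here

class LatentDynamics(nn.Module):
    """Predicts the next frame's features from current features and the action."""
    def __init__(self, feat_dim=384, action_dim=7):   # placeholder dimensions
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + action_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim),
        )

    def forward(self, feat, action):
        return self.net(torch.cat([feat, action], dim=-1))

# Offline training: only logged (feature, action, next_feature) triples are needed.
model = LatentDynamics()
feat, action, next_feat = torch.rand(64, 384), torch.rand(64, 7), torch.rand(64, 384)
loss = nn.functional.mse_loss(model(feat, action), next_feat)
loss.backward()
```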
nah they are friendly, fed cat food by folks around NYU AD.
Your robot looks cool!
If you’re in grad school, finding a therapist can be really helpful. The thing you’re doing is hard and it’s harder if you don’t have help managing imposter syndrome, stress, self esteem, and a whole bunch of other things.
omg a student somehow accidentally wrote an email addressed to a faculty-wide NYU listserv and my inbox is now a master class on who understands the difference between a listserv and an email chain