#Emergentcapabilities in #largelanguagemodels, such as in-context learning, can also appear in #visionlanguageaction (#VLA) models. Scaling up #roboticfoundationmodels enables emergent human-to-robot transfer, improving performance on tasks demonstrated in human videos by approximately 2x…
HyperVLA Cuts Vision-Language-Action Model Inference Cost by 90×
HyperVLA reduces inference load by activating only about 1% of VLA model parameters, delivering roughly a 120× speed boost while keeping zero‑shot success rates comparable. Read more: getnews.me/hypervla-cuts-vision-lan... #hypervla #visionlanguageaction
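The post doesn't say how the ~1% activation is achieved, but it matches a hypernetwork design: a large network generates the weights of a tiny policy once per task, and only that tiny policy runs at control frequency. A minimal PyTorch sketch of that pattern (all sizes, names, and the two-layer policy are hypothetical, not HyperVLA's actual architecture):

```python
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    """Large generator: maps a task embedding to the weights of a tiny policy.
    Runs once per task/episode, not per control step (hypothetical sketch)."""
    def __init__(self, task_dim=512, obs_dim=256, hidden=64, act_dim=7):
        super().__init__()
        self.obs_dim, self.hidden, self.act_dim = obs_dim, hidden, act_dim
        n_params = obs_dim * hidden + hidden + hidden * act_dim + act_dim
        self.generator = nn.Sequential(
            nn.Linear(task_dim, 2048), nn.GELU(), nn.Linear(2048, n_params))

    def forward(self, task_emb):
        p = self.generator(task_emb)
        o, h, a = self.obs_dim, self.hidden, self.act_dim
        w1, p = p[:o * h].view(h, o), p[o * h:]
        b1, p = p[:h], p[h:]
        w2, b2 = p[:h * a].view(a, h), p[h * a:]
        return w1, b1, w2, b2

def fast_policy(obs, weights):
    """Tiny generated policy: the only part evaluated at control frequency."""
    w1, b1, w2, b2 = weights
    return torch.tanh(obs @ w1.T + b1) @ w2.T + b2

hyper = HyperNetwork()
weights = hyper(torch.randn(512))                 # expensive, once per task
action = fast_policy(torch.randn(256), weights)   # cheap, every control step
```

Amortizing the big network's cost over a whole episode is where a headline-scale speedup would come from: per control step, only the small generated policy's parameters are active.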
SITCOM Boosts Long‑Horizon Planning for Vision‑Language‑Action Robots
SITCOM adds a learned dynamics model to Vision‑Language‑Action robots, raising task success from ~50% to ~75% in SIMPLER tests; it was trained on the BridgeV2 dataset. Read more: getnews.me/sitcom-boosts-long-horiz... #sitcom #visionlanguageaction
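The post says only that a learned dynamics model was added; a common way to use one for long-horizon planning is sampling-based action selection: draw candidate action sequences from the policy, roll each out in the learned model, and execute the best-scoring one. A toy sketch of that generic pattern (all callables are stand-ins, not SITCOM's actual procedure):

```python
import numpy as np

def plan_with_dynamics(vla_sample, dynamics, reward, state,
                       n_candidates=16, horizon=8):
    """Sample candidate action sequences, simulate each with the learned
    dynamics model, and return the highest-scoring sequence (sketch)."""
    best_score, best_actions = -np.inf, None
    for _ in range(n_candidates):
        s, actions, score = state, [], 0.0
        for _ in range(horizon):
            a = vla_sample(s)      # stochastic action proposal from the VLA
            s = dynamics(s, a)     # predicted next state from the learned model
            score += reward(s)     # task-progress estimate on the imagined state
            actions.append(a)
        if score > best_score:
            best_score, best_actions = score, actions
    return best_actions

# Toy stand-ins so the sketch runs end to end.
rng = np.random.default_rng(0)
vla_sample = lambda s: rng.normal(size=7)    # 7-DoF action proposal
dynamics = lambda s, a: s + 0.1 * a.sum()    # dummy scalar "state"
reward = lambda s: -abs(s - 1.0)             # distance-to-goal reward
first_action = plan_with_dynamics(vla_sample, dynamics, reward, state=0.0)[0]
```

Executing only the first action and replanning every step turns this into model-predictive control, one plausible reading of how a dynamics model lifts long-horizon success rates.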
CogVLA Boosts Vision‑Language‑Action Efficiency via Routing
CogVLA reaches 97.4% success on LIBERO and cuts training costs by ~2.5×, while reducing inference latency by ~2.8×. The code and model weights are open‑sourced on GitHub. Read more: getnews.me/cogvla-boosts-vision-lan... #cogvla #visionlanguageaction
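"Routing" is not elaborated in the post; one plausible reading is instruction-driven sparsification of visual tokens, keeping only the tokens relevant to the command so the language backbone processes far fewer of them. A hedged sketch of that idea (shapes and the keep ratio are illustrative, not CogVLA's actual module):

```python
import torch

def route_visual_tokens(visual_tokens, instr_emb, keep_ratio=0.25):
    """Score each visual token against the instruction embedding and keep
    only the top fraction; the rest never reach the language backbone."""
    scores = visual_tokens @ instr_emb               # relevance per token
    k = max(1, int(keep_ratio * visual_tokens.shape[0]))
    idx = scores.topk(k).indices
    return visual_tokens[idx]

tokens = torch.randn(576, 1024)   # e.g., ViT patch tokens (hypothetical sizes)
instr = torch.randn(1024)         # pooled instruction embedding
kept = route_visual_tokens(tokens, instr)   # 144 tokens -> ~4x fewer to process
```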
Hybrid Training Cuts CoT Overhead for Vision-Language-Action Models
Hybrid Training lets VLA models learn chain‑of‑thought but skip the thought step at inference, reducing token output. It maintained performance in pick‑and‑place tests. Read more: getnews.me/hybrid-training-cuts-cot... #visionlanguageaction #hybridtraining
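One straightforward way to get "learn the reasoning, skip it at inference" is to randomize whether each training target includes the chain-of-thought tokens, then prompt for the action alone at deployment. A minimal sketch assuming a tagged target format (the tags and fields are hypothetical, not the paper's exact scheme):

```python
import random

def build_target(example, p_cot=0.5):
    """Hybrid supervision: with probability p_cot the target includes the
    reasoning before the action; otherwise the action alone (sketch)."""
    if random.random() < p_cot:
        return f"<think>{example['cot']}</think><act>{example['action']}</act>"
    return f"<act>{example['action']}</act>"

example = {"cot": "the red block is left of the bowl",
           "action": "move -0.05 0.00 0.00"}
print(build_target(example, p_cot=1.0))  # training sees reasoning sometimes...
print(build_target(example, p_cot=0.0))  # ...inference decodes the action only
```

Because the model has seen both target forms, it can emit the short form on demand, which is what cuts decoded tokens and latency.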
Vision-Language-Action Reinforcement Fine-Tuning Improves Robustness
VLA‑RFT reaches robust performance in under 400 fine‑tuning steps, beats supervised baselines, and uses a data‑driven world model as a controllable simulator (Oct 2025). getnews.me/vision-language-action-r... #visionlanguageaction #worldmodel
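A world model used as a controllable simulator makes reinforcement fine-tuning cheap because rollouts never touch a real robot. A toy REINFORCE-style sketch of RL inside a learned model (VLA-RFT's actual objective, reward, and model interface may differ; all stand-ins here are hypothetical):

```python
import torch

def rft_step(policy, world_model, reward_fn, optimizer, state, horizon=8):
    """One fine-tuning step: roll the policy out inside the learned world
    model, score the imagined trajectory, and apply REINFORCE (sketch)."""
    log_probs, total_reward, s = [], 0.0, state
    for _ in range(horizon):
        dist = policy(s)                 # action distribution from the policy
        a = dist.sample()
        log_probs.append(dist.log_prob(a).sum())
        s = world_model(s, a)            # imagined next state, no real robot
        total_reward = total_reward + reward_fn(s)
    loss = -total_reward.detach() * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(total_reward)

# Toy stand-ins so the sketch runs.
net = torch.nn.Linear(4, 4)
policy = lambda s: torch.distributions.Normal(net(s), 1.0)
world_model = lambda s, a: s + 0.1 * a      # learned dynamics stand-in
reward_fn = lambda s: -s.pow(2).sum()       # task reward stand-in
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
print(rft_step(policy, world_model, reward_fn, opt, torch.zeros(4)))
```

Because the simulator is a learned model, rollouts are fast and resettable, which is consistent with convergence in a few hundred steps.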
World-Env: Safe RL Post-Training for Vision-Language-Action Models
World‑Env provides a simulator in which Vision‑Language‑Action models can safely continue RL post‑training, achieving gains with as few as five expert demonstrations per task. Read more: getnews.me/world-env-safe-rl-post-t... #visionlanguageaction #worldenv
IA-VLA Boosts Vision-Language-Action Models for Complex Robot Tasks
IA-VLA adds a vision-language model to enrich instructions, boosting success on tasks with identical objects. It outperformed the baseline in duplicate-object tests. Read more: getnews.me/ia-vla-boosts-vision-lan... #visionlanguageaction #robotics
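The enrichment step can be as simple as asking the auxiliary vision-language model to rewrite an ambiguous command with explicit spatial references before the VLA sees it. A minimal sketch assuming any image+text-to-text model behind the hypothetical `vlm` callable (not IA-VLA's actual interface):

```python
def enrich_instruction(vlm, image, instruction):
    """Ask a VLM to disambiguate object references in a robot instruction
    before handing it to the VLA policy (hypothetical interface)."""
    prompt = ("Rewrite this robot instruction so every object reference is "
              f"unambiguous, using positions visible in the image: {instruction!r}")
    return vlm(image, prompt)

# Stand-in VLM: with two identical mugs in view, a rewrite like this gives
# the VLA a resolvable target instead of an ambiguous one.
vlm = lambda image, prompt: "pick up the mug nearest the left edge of the table"
print(enrich_instruction(vlm, image=None, instruction="pick up the mug"))
```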
FreezeVLA: Action-freezing attacks on Vision-Language-Action models
Researchers show FreezeVLA can freeze Vision‑Language‑Action robots with a single adversarial image, achieving a 76.2% success rate across three leading VLA models. Read more: getnews.me/freezevla-action-freezin... #visionlanguageaction #adversarial
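The post gives no attack details; a standard way to construct such an image is projected gradient descent, here minimizing the magnitude of the predicted action so the robot stops moving. A generic PGD sketch under that assumption (not FreezeVLA's published objective):

```python
import torch

def freeze_attack(policy, image, steps=50, eps=8 / 255, alpha=1 / 255):
    """PGD-style sketch: perturb the image within an L-inf ball so the
    policy's predicted action collapses toward zero ('freezing' the robot)."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        action = policy(image + delta)
        loss = action.pow(2).sum()          # push every action dim toward 0
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # descend: minimize motion
            delta.clamp_(-eps, eps)              # keep perturbation small
        delta.grad.zero_()
    return (image + delta).detach()

# Toy stand-in: a linear "policy" mapping pixels to a 7-DoF action.
policy = torch.nn.Sequential(torch.nn.Flatten(0),
                             torch.nn.Linear(3 * 32 * 32, 7))
adv_image = freeze_attack(policy, torch.rand(3, 32, 32))
```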
ThinkAct: Visual Latent Planning for Vision‑Language‑Action AI
ThinkAct, presented at NeurIPS 2025, uses a dual‑system design in which an LLM plans and a visual latent vector guides an action model, improving long‑horizon planning and self‑correction. getnews.me/thinkact-visual-latent-p... #thinkact #visionlanguageaction
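The dual-system split means the expensive reasoner runs rarely while a lightweight action head consumes its latent plan at control rate. A minimal PyTorch sketch of that interface (modules, rates, and shapes are illustrative, not ThinkAct's architecture):

```python
import torch
import torch.nn as nn

class DualSystemAgent(nn.Module):
    """Slow reasoner emits a visual plan latent at a low rate; a fast action
    head conditions on it every control step (hypothetical sketch)."""
    def __init__(self, obs_dim=256, plan_dim=128, act_dim=7):
        super().__init__()
        self.reasoner = nn.Sequential(   # stand-in for the reasoning LLM
            nn.Linear(obs_dim, 512), nn.GELU(), nn.Linear(512, plan_dim))
        self.actor = nn.Sequential(      # fast policy conditioned on the plan
            nn.Linear(obs_dim + plan_dim, 256), nn.GELU(),
            nn.Linear(256, act_dim))

    def plan(self, obs):                 # runs rarely, e.g. once per second
        return self.reasoner(obs)

    def act(self, obs, plan_latent):     # runs every control step
        return self.actor(torch.cat([obs, plan_latent], dim=-1))

agent = DualSystemAgent()
latent = agent.plan(torch.randn(256))    # slow deliberation
for _ in range(10):                      # fast control loop reuses the plan
    action = agent.act(torch.randn(256), latent)
```

Replanning only when the reasoner flags a deviation is one way such a design would support self-correction on long-horizon tasks.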
Google DeepMind's Gemini Robotics On-Device is here!
This #VisionLanguageAction foundation model runs locally on robot hardware, enabling low-latency inference, and can be fine-tuned for specific tasks with as few as 50 demonstrations.
👉 bit.ly/4ob3mQf
#Robotics #AI #GoogleDeepMind #InfoQ
Helix Revolutionizes Home Robotics with Cutting-Edge Vision-Language-Action Model
#HomeRobotics #VisionLanguageAction #HelixRobot