Posts by Dhruv Batra

yutori.com/api

- 3 rounds of prompting (description, feedback + screenshot, feedback)
- GPT 5.4 xhigh is still lacking contrast.
- Opus 4.7 max, even with an explicit description of the content, doesn't seem to get that the Navigator agent has eyes. Output #1 was just a ball with no details.

1 day ago

I gave Claude Code & Codex a video of the @yutori_ai Navigator logo spinning and asked for code to regenerate it.

Opus 4.7 max (left) vs GPT 5.4 xhigh (right)

GPT 5.4 clearly better.

Ground-truth / OG video in 🧵

1 day ago

Introducing Navigator | Yutori
A state-of-the-art web agent that autonomously navigates websites to complete everyday tasks.

API: docs.yutori.com/reference/n1
Benchmark details: yutori.com/blog/introd...

1 month ago

Two updates from Yutori:

1. We benchmarked GPT 5.4 on browser-use tasks
• Matches or slightly outperforms Opus 4.6 (+0.3%)
• Big jump over previous OpenAI computer-use agents (CUAs)

2. Latest version of n1
• Outperforms GPT 5.4 and Opus 4.6 (+3%)
• 2.5x faster, 4-5x cheaper.

1 month ago

API | Yutori - AI agents for everyday digital tasks
Build with Yutori’s state-of-the-art web agents that can autonomously monitor and execute tasks on the web.

yutori.com/api

2 months ago

Most recent checkpoint of n1 vs Opus 4.6!

On Navi-Bench and Westworld browser automation benchmarks:

- Same accuracy
- n1 is 2.5x faster
- n1 is 5.6x cheaper

Try it out via the Yutori API.

2 months ago

Fun chat with Evan O'Donnell about the similarities between training robots and web agents, managing context for agents that run for months and years, the future of the AI-first web, and the ideal form factor for embodied AI.

www.thetimes.blog/p/agents-ne...

2 months ago

Maybe coding is just amortized inference for LLMs.

Maybe the reason we write programs down in files is just to save on inference costs.

3 months ago

The bitter lesson for web agents
Agents that generalize need to perceive the web like a human would.

Why?

Because the web is a mess. And websites are fundamentally built for human consumption.

If you want to automate everything that a human can do with a browser, then you have to perceive like a human.

Stay tuned for more in the coming weeks!

yutori.com/blog/the-bit...

5 months ago

The bitter lesson for web agents

The last year has taught us a new bitter lesson that we think others are not yet grokking.

Agents that *look at the web like humans* (screenshots of sites) navigate and generalize better than agents that read code (HTML, DOM).

5 months ago

Solved: robustness to paraphrasing and false premises, OCR, world-knowledge based reasoning.

Open: spatial reasoning, data-efficiency, learning compatible representations.

5 months ago

As part of the award ceremony, the VQA team presented a recap of vision-and-language research over the last decade: solved problems, progress, and open challenges for multimodal LLMs.

5 months ago

Lots to be done. Thank you to all our collaborators and the research community for this recognition!

6 months ago

Fun-fact: the T-shirt I'm wearing is an inside joke about the quality of 2015 models.

However, every few years we rediscover the lesson that on difficult tasks, VLMs silently regress to being nearly blind.

x.com/DhruvBatra_/...

6 months ago

VQA challenge series won the Mark Everingham prize at #ICCV2025 for stimulating a new strand of vision-and-language research.

It's extra special because ICCV25 marks the 10-year anniversary of the VQA paper.

When we started, the idea of answering any question about any image seemed outlandish.

6 months ago

Anything by Ted Chiang

6 months ago

I dunno man, Dagger is cool.

6 months ago

The problem with “AI slop” isn’t the AI; it’s the slop.

People act like AI is the issue, when it’s actually part of the fix.

If we're honest: most of what we make, most of the time, is slop by our own standards.

That’s the generator–discriminator gap in creative work that Ira Glass talks about.

6 months ago

Somebody is a fan of Abundance

10 months ago

It is so refreshing to see conferences innovate on the reviewing model and run actual experiments (!) as opposed to fighting change.

1 year ago

Good. Autonomous interface locomotion is the fundamental robotics problem of our time. The more the merrier.

1 year ago

My entire robotics career has led to this.

1 year ago

The answer to many "why X?" questions:

Because the laws of physics do not prohibit X and the forces of biology gave us curiosity.

1 year ago

Yutori
We’re building AI agents that can reliably do everyday digital tasks for you on the web, towards an AI chief-of-staff for everyone.

The web is the ultimate boss level for agents: dynamic, non-deterministic, and noisy. Some mistakes are inevitable, and so far every agent fails eventually.

Yutori is building superhuman agents for this ultimate digital environment.

Join our waitlist for early access to our product!

yutori.com

1 year ago

Imagine a world where no human has to directly interact with the web again.

Where teams of AI assistants coordinate to book flights, manage budgets, or file paperwork, proactively surfacing insights and correcting errors.

Only problem: no one knows how to build AI agents that actually work.

1 year ago

I started something new last year with a wonderful group of people. We showed a demo in Jan.

Today, we’re telling our story: show before you talk!

We are re-imagining how people interact with the web, one of humanity’s greatest inventions and a mess overdue for an overhaul.

yutori.com

1 year ago

Ah, understood. No idea about the tracing of that meme.

1 year ago

Seems like the ultimate thing to rally around, no? To the extent there is any purpose, what's the alternative?

1 year ago

I'm already there for low-stakes queries.

1 year ago

Where's the skepticism coming from? Now that web search and citations are in there, isn't it easy to verify and thus become more confident?

1 year ago