#BrowserGym hashtag - Bluesky

@alex-lacoste.bsky.social

1 year ago

How ServiceNow Delivers Production Grade AI Agents

Large Language Model (LLM) assistants such as ChatGPT have taken the world by storm and revolutionized many everyday tasks but Generative AI…

Just found this cool blog post discussing #AgentLab, #BrowserGym and #TapeAgent

medium.com/@carolynduby...


Alexandre Lacoste

@alex-lacoste.bsky.social

1 year ago

We’re really excited to release this large collaborative work for unifying web agent benchmarks under the same roof.

In this TMLR paper, we dive deep into #BrowserGym and #AgentLab. We also present some unexpected performance results from Claude 3.5 Sonnet.


Alexandre Lacoste

@alex-lacoste.bsky.social

1 year ago

AgentLab diagram. The image describes AgentLab, a framework for efficient parallel experiments with agents. It highlights: Core Agent Features: Dynamic Prompting and a Unified LLM API for interacting with large language models. BrowserGym Platform: A tool for testing agents on benchmarks like WebArena, WorkArena, MiniWoB, and others. Key Features: Reproducibility, a Unified Leaderboard, an analysis tool called Xray, and a Dataset for sharing agent traces. Blue elements represent AgentLab components.

🧵-1
We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package which supports 10 different benchmarks, including #WebArena.
