Posts by Alexandre Moufarek 🔜 GDC 🎮

A detailed scatter plot showcasing the relationship between performance (Arena Score) and cost-effectiveness (Cost per 1M Output Tokens) for a range of Large Language Models (LLMs). The y-axis represents the Arena Score, a measure of model performance, ranging from 1160 to 1380. The x-axis, in logarithmic scale, depicts the Cost ($/1M Output Tokens), spanning from 0.1 to 100. Each point on the plot corresponds to an LLM, color-coded by its developing organization. Prominent models like Gemini 2.0 Flash, DeepSeek-R1, GPT-4, and Claude 3 Opus are labeled. The plot reveals a general pattern: as the cost decreases, the Arena Score tends to decrease as well, indicating a trade-off between cost and performance. An annotation "Cheaper" with an arrow pointing left towards the lower cost region is present, along with a reference to lmarena.ai/price.

A horizontal bar chart ranking the top 25 Large Language Models (LLMs) by their hallucination rates. The models are listed vertically on the left, in descending order of performance (lowest hallucination rate at the top). Each bar's length corresponds to the hallucination rate, represented as a percentage and displayed to the right of the bar. The chart includes specific models like Google Gemini-2.0-Flash-001 (0.7%), OpenAI-o3-mini-high-reasoning (0.8%), and Snowflake-Arctic-Instruct (3.0%). The data is provided by Vectara and was last updated on February 5th, 2025. The chart highlights the variability in hallucination rates across different LLMs.

Gemini is really good and we keep making it better. In quality and performance but also in cost efficiency.

With outputs costing $0.40 per million tokens and a 0.7% hallucination rate (Vectara), Gemini 2.0 Flash is best-in-class and has been my go-to model for most applications.

1 year ago 2 0 0 0
Preview
Machine Learning Summit: SIMA: Developing General AI Agents with Video Games | 2025 Schedule | Game Developers Conference (GDC) Attend 750 sessions for game designers, programmers, artists, producers, audio, business, and marketing professionals over 5 days at GDC.

Join us at the Machine Learning Summit @officialgdc.bsky.social next month for our talk "SIMA: Developing General Agents with Video Games" 🤖🎮 #GDC2025 schedule.gdconf.com/session/mach...

1 year ago 4 1 0 0
A promotional graphic for Google DeepMind's appearance at GDC in San Francisco, March 17-21, 2025. The graphic features a dark background with the text "We're speaking at GDC" in white at the top. Below, "MARCH 17-21, 2025 SAN FRANCISCO, CA" is centered. At the bottom are two headshots side-by-side. On the left is Piermaria Mendolicchio, Senior Technical Program Manager at Google DeepMind. On the right is Alexandre Moufarek, AI Research Group Product Manager at Google DeepMind. Both individuals are identified with their names, titles, and company affiliation below their respective images.

Excited to be speaking at GDC next month with @estragone.bsky.social! Join us at the ML Summit for "SIMA: Developing General Agents with Video Games".

A look at the history of AI research with games and the latest on SIMA, our AI agent that follows language instructions in 3D games. #GDC2025

1 year ago 5 1 1 0
Gemini 2.0 Pro: 2nd overall but #1 across all categories in the Chatbot Arena Leaderboard

Gemini 2.0 Pro Experimental ranks #1 across all categories in Chatbot Arena! 🚀🥇

Gemini 2.0 Flash is top-3 in Coding, Math, and Hard Prompts. Our new model Flash-Lite is top-10 across categories.

1 year ago 1 0 0 0
Preview
Gemini 2.0 is now available to everyone We’re announcing new updates to Gemini 2.0 Flash, plus introducing Gemini 2.0 Flash-Lite and Gemini 2.0 Pro Experimental.

Excited to announce our new Gemini 2.0 models! Fantastic work by everyone at Google DeepMind for surpassing 1.5 Pro performance with 2.0 Flash size and speed! blog.google/technology/g...

1 year ago 0 0 0 0
Expanding Gemini 2.0: 2.0 Pro (Experimental), 2.0 Flash (General Availability), 2.0 Flash Thinking (Experimental), 2.0 Flash-Lite (Public Preview)

Gemini 2.0 Flash is now generally available via the Gemini API in Google AI Studio and Vertex AI 🚀

Also introducing:
🔵 2.0 Pro Experimental, excels at coding, comes with 2M tokens context
🔵 2.0 Flash-Lite, our most cost-efficient model
🔵 2.0 Flash Thinking Exp in Gemini App

1 year ago 0 0 1 0
Preview
Genie 2: A large-scale foundation world model Generating unlimited diverse training environments for future general agents

Ultimately, our research is building towards more general AI systems and agents that can understand and safely carry out a wide range of tasks and be helpful.

Extremely proud of the Genie team and this fantastic research progress! Great fun 👏🥳❤️ deepmind.google/discover/blo...

1 year ago 0 0 0 0
Video

While this research is still in its early stage, we believe Genie 2 has huge potential for creating novel environments for training and testing embodied AI agents.

We used Genie to create a world from the image below and gave our SIMA agent instructions to explore it. 🤖👇

1 year ago 0 0 1 0
Video

Excited to introduce Genie 2: our most capable and general foundation world model. It can generate a diverse array of consistent and interactive 3D worlds from a single image, playable with keyboard and mouse inputs for up to a minute. 🧞🎮🌎

1 year ago 2 0 1 0
10 language models across various tasks, including coding, math, creative writing, instruction following, longer queries, and multi-turn conversations. Gemini-exp-1206 scores the highest in all categories.

Our latest Gemini iteration is #1 on Chatbot Arena IN EVERY CATEGORY! 🥇🎉

Try Gemini-Exp-1206 for free in Google AI Studio and the Gemini API now.

1 year ago 0 0 0 0
Gemini logo

Gemini was released 1 year ago! 🧁🥳

From 1.0 state-of-the-art multimodal capabilities to giant leaps forward with 1.5 and 2M tokens long context to the Project Astra research prototype...and that's only year one 😀

Congrats to everyone at Google DeepMind and our partners at Google!

1 year ago 0 0 0 0
Google DeepMind logo

I've been promoted to AI Research Group PM at Google DeepMind! It was a privilege to contribute to groundbreaking projects like Gemini, Astra, SIMA & Genie with such talented and inspiring teams.

Excited to continue this journey and contribute to our ambitious research ahead. Onwards and upwards! 🚀

1 year ago 5 0 0 0