Jannis Bulian (@j5b) Bsky

The Gemini 2.5 Technical Report is out: storage.googleapis.com/deepmind-med...

10 months ago

🥁Introducing Gemini 2.5, our most intelligent model with impressive capabilities in advanced reasoning and coding.

Now integrating thinking capabilities, 2.5 Pro Experimental is our most performant Gemini model yet. It’s #1 on the LM Arena leaderboard. 🥇

1 year ago

We’ve been teaching Gemini to think.

Try it here: aistudio.google.com/prompts/new_...

1 year ago

Happy birthday Gemini!

1 year ago

📢We release Tülu 3, a family of fully-open state-of-the-art post-trained models, alongside its data, code, and training recipes, serving as a comprehensive guide for modern post-training techniques!

1 year ago

Good software is an enabler for good science! 💥🧪

Inspired by the post below, I like to point people at libraries like github.com/patrick-kidg... as a template for what a modern Python library looks like: `pre-commit`, `ruff`, `pyright`, `pyproject.toml`, an open-source license, etc. 🤓

1 year ago

Fun, insightful, useful, cheap: Thinking Like A Large Language Model: Become an AI manager a.co/d/7xMTtJM

1 year ago

A comparison of LLMs' mean ratings across presentational and epistemological dimensions.

We compared notable LLMs such as InstructGPT, ChatGPT, GPT-4, PaLM 2 (text-bison), and Falcon-180B. They excel at presenting climate information, but there's room for improvement in the epistemic qualities of their answers.

2 years ago

This is a tough task for human raters. Our study finds that AI can effectively assist human raters, offering promising avenues for scalable oversight on difficult problems like this.

2 years ago

Excited to share our latest paper: We explore how large language models tackle questions on climate change 🌎, introducing an evaluation framework grounded in #SciComm research.

Read the preprint: arxiv.org/abs/2310.02932

2 years ago
