thanks!!! that's what we tried to do!!
there is a benchmaxxing study here that gives even more information: kaitchup.substack.com/p/gemma-4-31...
that is exactly what jebediah98.bsky.social said.
we need to convert our models to multiple external frameworks to make sure every developer can use them. sometimes there are bugs and we try to fix as many as possible
I hope your next test goes better! please let me know
for cloud next at least
will do!
I wonder how much of that paper was written by claude code itself!
it is kind of an autobiography!
or maybe you were using a buggy version?
at launch some of the external implementations had issues with the chat template, and that led to some bugs in agentic tasks
whenever in doubt, try the AI Studio version, which is the ground truth for what Gemma models should do
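if you want to sanity-check a local setup against it, here is a minimal sketch using the google-genai SDK; the model id is a hypothetical placeholder, so pick the actual one from AI Studio:

```python
# A minimal sketch: query the AI Studio-hosted model (the ground truth)
# so you can diff its output against your local framework's output.
# Assumes the google-genai SDK; the model id is hypothetical.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemma-4-e2b-it",  # hypothetical id, check AI Studio for the real one
    contents="Call the weather tool for Paris and summarize the result.",
)
print(response.text)
```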
right? it is very interesting indeed!!!
LLM benchmarks are broken. 📉
We’re seeing more "benchmaxxing" than actual intelligence. High academic scores are easy to fake, but real-world generalization is much harder.
If you want to see what true performance looks like, look at the FoodTruckBench results:
foodtruckbench.com/blog/gemma-4...
"Google DeepMind gave an amazing gift to humanity!!"
- Two Minute Papers
Gemma is indeed great and I'm very happy to see these videos!!
have you tried it yet? you should!!!
www.youtube.com/watch?v=Sk9t...
Randall has my vote to be in charge of ISO!
Tons of niches...
It looks beautiful
I agree!
The Gemma 4 e2b model has similar quality to Gemma 3 27b!!!
That's incredible!
First Gemma 4 talk at the London Deepmind office!
Many more to come!
You should try this app!
Playing with Gemma 4 on device is super fun!
It works on all the big frameworks
That being said, we are working on improving inference speed as much as possible
Teaching Gemma 4 (26b MoE) how to map Census data in R via OpenCode & oMLX in the Zed IDE.
It's the first local LLM I've used that gets me real-time responsiveness. Running on a MacBook Pro M2 Max, 64GB RAM.
Not as smart as the frontier models - but it's running for free on my own hardware.
This is very cool!!!
about the speed, which platform are you using for serving? vLLM?
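if it is vLLM, a quick way to measure raw throughput is the offline API; just a sketch, and the checkpoint id is a hypothetical placeholder:

```python
# A minimal sketch of offline generation with vLLM, handy for timing
# token throughput. The model id is a hypothetical placeholder.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-4-e2b-it")  # hypothetical checkpoint id
params = SamplingParams(temperature=0.7, max_tokens=256)

start = time.perf_counter()
outputs = llm.generate(["Explain MoE routing in two sentences."], params)
elapsed = time.perf_counter() - start

completion = outputs[0].outputs[0]
print(completion.text)
print(f"{len(completion.token_ids) / elapsed:.1f} tokens/s")
```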
I'm usually afraid of my work ending up in a Fireship video, but this time I'm very very happy!!!
www.youtube.com/watch?v=-01Z...
100%?
Wow
To help implement the details
Good question.
We did many tests with different prompting styles and ended up with the one we released, even though it's a little different from other model families.
From a user perspective, it should be transparent, as it only affects framework creators, whom we tried to collaborate with.
I don't know much about that platform either (apparently there are lots of tools I don't know of, hahaha)
but you should be able to turn off the thinking there too
unfortunately I don't know about Joplin plugins, but thinking is something that is enabled via System Instructions, like an on/off switch
maybe something here may be able to help you: ai.google.dev/gemma/docs/c...
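as a concrete illustration of the on/off switch, here is a sketch with Hugging Face Transformers; both the model id and the exact instruction wording are assumptions, so treat the docs above as the authority:

```python
# A minimal sketch of toggling thinking through the system instruction.
# The model id and the instruction text are assumptions, not the
# official switch -- check the Gemma docs for the real wording.
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-4-e2b-it")  # hypothetical id

messages = [
    {"role": "system", "content": "thinking: off"},  # assumed wording
    {"role": "user", "content": "What is 17 * 24?"},
]
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])  # assistant reply only
```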
which framework are you using to run the model?
asking because the model can run without thinking enabled but some frameworks expose that in different ways
A Visual Guide to Gemma 4 by Maarten Grootendorst
An in-depth, architectural deep dive of the Gemma 4 family of models. From Per-Layer Embeddings to the vision and audio encoders.
newsletter.maartengrootendorst.com/p/a-visual-g...
This is so beautiful!