
Posts by Pekka Lund

They have a teaser of what looks like a screenshot but say it's not one, so the Image 2 model?

2 hours ago 2 0 0 0

OpenAI announced a live stream at 12 pm PT today. So this?

2 hours ago 4 0 1 0

It's open and already available on Hugging Face.

4 hours ago 3 0 0 0

K2.6 seems to be more or less equally strong across all the benchmarks in the index. It doesn't reach the top 3 in any of them, but it's close behind and has no clear weaknesses.

5 hours ago 0 0 0 0

Kimi K2.6 is now #4 in the Artificial Analysis Intelligence Index. Had they released it just a week ago, before Opus 4.7, a Chinese open model would have been ahead of Anthropic. And there would probably have been more comparisons to the DeepSeek moment.

6 hours ago 32 3 2 0

Clickbait based on an ill-defined concept.

And by any meaningful definition AGI arrived long ago.

19 hours ago 1 0 1 0
Robots vs humans: Beijing half-marathon delivers stunning result (YouTube video by Al Jazeera English)

Sergey Brin in an internal memo:

"To win the final sprint, we must urgently bridge the gap in agentic execution and turn our models into primary developers"

Final sprint illustrated:

19 hours ago 2 0 0 0

Sure, but it still can't run Crysis.

19 hours ago 6 0 0 0

So it's now Google's turn to have their code red moment.

Frankly, I'm now worried about the next Gemini. Especially if the reports are true and they are training internal models on their own code to advance internal usage, instead of focusing on generally strong models that can be released.

19 hours ago 7 1 2 0

I think it was in the "RAM is cheap" era. Some were actually running R1 on CPU-only setups. It wasn't that much smaller.

21 hours ago 1 0 0 0

It looks very good. Competitive performance against the best at max/high settings, and it seems to be very strong on the agentic front, long-horizon tasks, and coding.

I just hope V4 release wasn't postponed again...

21 hours ago 1 0 0 0

No, they don't. It's just a simple JSON configuration file for browser extensions.

This is pure clickbait and sadly it works because this is Bluesky.
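
For reference, a minimal sketch of what such a file can look like, assuming the standard WebExtensions manifest.json format (the post doesn't name the exact file, so the fields here are illustrative only):

```python
# Minimal sketch of the kind of JSON configuration file in question,
# assuming a standard WebExtensions manifest.json. The fields are
# illustrative, not taken from the article being discussed.
import json

manifest = {
    "manifest_version": 3,       # Manifest V3, the current Chrome format
    "name": "Example Extension",
    "version": "1.0",
    "permissions": ["storage"],  # capabilities the extension requests
}

# Write it out as the manifest.json an extension would ship with.
with open("manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```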

1 day ago 22 1 0 0

Many are saying that. But since none of them can actually specify what "genuine" means beyond assumed magic, it doesn't really mean anything in either case.

1 day ago 0 0 0 0

E.g. my usual peer-review prompt, set up as a gem in the Gemini app, gives me annoying disclaimers like this, which hasn't happened in AI Studio:

"As an AI, I don't hold personal academic grudges or experience the exhaustion of peer review"

And it never feels like it takes on that role as deeply.

1 day ago 3 0 0 0

Yep, same. The problem seems to be that even if you use gems in the Gemini app with custom system instructions, it retains additional Google-specified system instructions that seem to make it worse.
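
If that's the issue, going through the API should sidestep it, since only your own system instruction is applied there. A minimal sketch with the google-genai Python SDK (the model name and prompt text are placeholders, not from the posts above):

```python
# Minimal sketch: set a custom system instruction via the API instead of
# a gem, so no app-layer instructions are stacked on top of it.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-pro",  # placeholder model name
    contents="Review the attached manuscript.",  # placeholder prompt
    config=types.GenerateContentConfig(
        system_instruction="You are a critical but fair peer reviewer...",
    ),
)
print(response.text)
```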

1 day ago 1 0 1 0

I have seen worse takes than that.

2 days ago 4 0 0 0

With models well beyond Mythos-level by that time.

2 days ago 10 0 1 0

In that case, I dream and fear that DeepSeek V4 will be Mythos-level.

2 days ago 5 0 1 0

There have been various cases where they have tried to do something along those lines, but they are actively prevented and trained against doing so. So it would be more like rebellion than initiative. And I think they realize it's not the right time for that yet.

2 days ago 1 0 0 0

I'm guessing they are hard at work on the agentic front, as that has been a clearly identified weakness for some time now.

2 days ago 0 0 0 0

Yeah, I'm also worried that limited releases will become more common, especially since OpenAI just released GPT‑5.4‑Cyber and GPT‑Rosalind that way.

Which, by the way, seems like weird timing if they are about to release a much more powerful general model.

2 days ago 1 0 0 0

Costs, output tokens, and output speed for running the Artificial Analysis Intelligence Index for the top models now tied at a score of 57:

Gemini 3.1 Pro: $892.28, 57M tokens, 129.6 tokens/s
GPT-5.4 (xhigh): $2851.01, 120M tokens, 74.9 tokens/s
Claude Opus 4.7 (max): $4406.45, 100M tokens, 51.8 tokens/s
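
Worked out per token, using only the totals quoted above:

```python
# Quick arithmetic on the figures above: cost per million output tokens
# for the full index run, derived directly from the quoted totals.
runs = {
    "Gemini 3.1 Pro": (892.28, 57e6),
    "GPT-5.4 (xhigh)": (2851.01, 120e6),
    "Claude Opus 4.7 (max)": (4406.45, 100e6),
}

for model, (cost_usd, tokens) in runs.items():
    print(f"{model}: ${cost_usd / (tokens / 1e6):.2f} per M output tokens")

# Gemini 3.1 Pro: $15.65 per M output tokens
# GPT-5.4 (xhigh): $23.76 per M output tokens
# Claude Opus 4.7 (max): $44.06 per M output tokens
```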

3 days ago 2 0 2 0

Opus 4.7 (max) used 100M tokens vs. the 160M that Opus 4.6 (max) used, so it seems to be significantly more efficient. Although it still used almost twice as many tokens as Gemini 3.1 Pro (57M).

3 days ago 5 0 1 0

Claude Opus 4.7 hits the top spot in the Artificial Analysis Intelligence Index. In practice, it's a three-way tie with Gemini 3.1 Pro and GPT-5.4.

Much of it seems to be thanks to being #1 in GDPval-AA, which carries a 16.7% weight in the index. Otherwise the results aren't that impressive.

3 days ago 14 1 1 0

Spud should put that theory to the test soon.

3 days ago 1 0 1 0

You know how this works. When the next Gemini is released, we will forget Claude even exists (until the next Claude is released).

3 days ago 2 0 1 0
New ways to balance cost and reliability in the Gemini API: Google is introducing two new inference tiers, Flex and Priority, to balance cost and latency.

Gemini has it.
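
A hypothetical sketch of what picking a tier might look like in a raw request. The Flex and Priority tiers are from the announcement above, but the "service_tier" field below is my assumption, not a documented Gemini API parameter; check the API docs for the actual surface:

```python
# Hypothetical sketch only: the endpoint and payload shape are the real
# Gemini REST API, but "service_tier" is an assumed field, not documented.
import os
import requests

url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.5-flash:generateContent"  # model name is a placeholder
)
payload = {
    "contents": [{"parts": [{"text": "Summarize this log file."}]}],
    # Assumed field: "flex" = cheaper, best-effort latency;
    # "priority" = consistent low latency at a higher price.
    "service_tier": "flex",
}
resp = requests.post(
    url,
    params={"key": os.environ["GEMINI_API_KEY"]},
    json=payload,
    timeout=60,
)
print(resp.json())
```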

3 days ago 2 0 1 0

I would say agents already work with initiative. It's more a matter of giving them freedom to spend tokens.

3 days ago 0 0 1 0

Do people in Vantaa really drink stream water that widely?

3 days ago 0 0 1 0

You would have to use a pretty wide definition of lab though, even today.

3 days ago 1 0 0 0