Phil (@phil2345) Bsky

I'm sorry. They should have called it Qwen3.5 Coder. Q3.6 has regressed in other areas so they could over-train coding, including a notable increase in factual hallucinations across domains. The monolithic coding obsessed early adopter community is KILLING the general purpose OS AI model ecosystem.

14 hours ago 0 0 0 0

Anthropic overfit coding going from Opus 3 to 4, resulting in a broad regression (e.g. a huge drop in its SimpleQA score), so coders flocked to it. But since LLMs are a balancing act (training in one area degrades others) any attempt to restore balance degrades its overfit coding abilities.

6 days ago 0 0 0 0

But if you look at its breakdown it got 27 in Expert, 12 in math, 9 in writing, 10 in IR, and 22 in long query. It's clear that its high ranking comes from its friendly nature and telling users what they want to hear vs actual performance. The humans using lmsys are really starting to annoy me.

1 week ago 1 0 0 0

That's a good question. The size per performance (at least test scores) appears to be the same, or even worse. But it may produce more tokens/s and use less power.

3 weeks ago 0 0 0 0

It still appears that 4-bit models perform better at the same size. This is because the test scores are virtually the same after Q4_K_M quantization, which is only ~3x larger than these 1.3 bit LLMs, and the 4-bit versions of 3b models score a bit better and are only a bit larger.

3 weeks ago 5 0 3 0

The decision is awful, but was near unanimous b/c it's about the slippery slope of the govt getting involved in medicine/therapy. However, they made an exception for abortion, so why not this? The real problem is the adult LGBTQ & psychopathic parents who opt in for such therapy.

3 weeks ago 2 0 1 0

People are stupid. Qwen 35b is incredibly weak. It hallucinates like crazy about very basic & popular things, writes repetitive stories that blatantly contradict themselves and the user's prompt, can't even begin to write poetry... It's just overfit to select domains to fool coding obsessed morons.

1 month ago 0 0 0 0

That's more than just nuts. He also thinks nuclear bombs are about a million times more powerful than they are.

1 month ago 0 1 0 0

I suspect part of this is due to how grossly overrepresented coding is in AI training. They not only train on trillions of coding tokens, but end training runs on them, resulting in a selective boost in coding performance while performance in most other domains is abysmal & inconsistent (AI slop).

2 months ago 1 0 1 0

They get plenty of money to maintain a browser from perks like default homepage placement. The only reason money is a problem for Mozilla is that they're wasting it on things like a CEO and AI.

2 months ago 0 0 0 0

Mozilla has the money to keep the open source Firefox browser going for decades, but instead it's going to burn through its cash on a brainless CEO's high salary and redundant AI development that will fail to compete with anything.

2 months ago 8 1 0 2

It wasn't the fault of Democrats. >95% were adamant about restoring abortion rights, and >95% of Republicans were adamant about not restoring them. So blaming Dems makes no sense. They didn't have the votes. It was Republicans and centrists (Rep/Dem) that made it impossible to restore abortion rights.

4 months ago 1 0 2 0

Liberals need to be more pragmatic. We would still have Roe, advanced trans rights... if millions of liberals didn't stay home with their vibrators & video games because of stunted 'both sides aren't perfect', 'Hillary is flawed'... thinking. I repeat, a trans activist saying never Gavin is low IQ.

4 months ago 0 0 5 1

That's not what happened. Abortion rights were lost when Trump and the Republicans appointed Supreme Court justices in his first term. Biden being president and the Democrats controlling both houses are completely irrelevant to the Supreme Court's ending of Roe.

4 months ago 2 0 2 0

Trump came to power and a broad spectrum of rights, including abortion and trans rights, were reduced because of low IQ nonsense like this. When you keep reminding people you're a never Newsom voter you're just shooting yourself and trans rights in the foot, you're just too stupid to realize it.

4 months ago 0 0 9 0

Also, the only model that actually made gains in the last year was the new Gemini. GPT5 was an overall regression from GPT4 (better at math/coding, a little worse at all else). And the new Opus has less knowledge and broad stability than the previous Sonnet (basically only good for coding).

4 months ago 1 0 0 0

2 of 2) So in short, there was no overall improvement in >1 year. All Alibaba did was trade general knowledge & abilities for tiny gains in select domains and the industry as a whole is oblivious b/c they're not using the models & are simply running highly flawed automated tests to judge models.

4 months ago 0 0 0 0

1 of 2) That's the problem. The industry is just looking at tiny gains on select tests. I guarantee not a single researcher used Qwen3 for general AI use, or even for its overfit domains like coding since Claude, Gemini... are far better and more reliable, plus inexpensive to use.

4 months ago 1 0 0 0

They didn't lose to Qwen. Qwen3 is overfit to select domains & tests and falls apart when subjected to a broad spectrum of tasks. Alibaba also started flat out cheating on tests like SimpleQA. Small models hit a ceiling over a year ago. Progress has been faked ever since (overfitting/cheating).

4 months ago 0 0 1 0

So fighting against taking healthcare away from 10s of millions and becoming the only first world country on the planet without guaranteed healthcare for all its citizens is just "belief polarization" and becoming an extreme version of yourself?

5 months ago 2 0 0 0

Yes, but the "basket of deplorables" comment was unbelievably stupid, regardless of how true it is. They both saw what was coming, but expressed it in counterproductive ways.

6 months ago 1 0 1 0

Sonnet 4.5 knows very little about art, pop culture, humor... It's grossly overfit to coding, math... Even the new Opus has less broad knowledge and abilities than the old Sonnet. It's more of a corporate tool than a general purpose AI model.

6 months ago 3 0 0 0

Alibaba trains on test data. Don't trust their scores.

6 months ago 0 0 0 0

You shouldn't report it unless something comes of it. The Trump administration is throwing a lot of nonsense out there, including a $15 billion lawsuit to gain attention, which you just gave them.

7 months ago 1 0 1 0

Please work on your framing. There's a HUGE difference between the FCC chair and the administration calling for the cancelling of the show, and broad pressure from people.

7 months ago 8 0 0 0

When one side is using a tactic to fight to keep affordable health care, and the other to take healthcare away from millions, then calling out Ted back then wasn't hypocrisy. The primary issue the Democrats had was Ted using such a tactic for the cold hearted goal of taking healthcare away.

7 months ago 4 0 0 0

I think he was blinded by love while fighting the self-loathing of being in a trans relationship. You see the same thing w/ conservative raised homosexuals who often go through a self loathing & virulent homophobia stage. He was too distracted by this to see the damage to her & the trans community.

7 months ago 2 0 0 0

This was clearly a false flag. A single shot from 200 yards away, able to escape... is a professional hit. And no antifascist or transgender rights activist capable of pulling off such a well-planned action would be stupid enough to leave ammunition engravings to villainize their own causes.

7 months ago 9 0 0 0

I was also hoping it was open source, but it makes sense that they'd keep their biggest trillion parameter model proprietary.

7 months ago 1 0 0 0

open-source?

7 months ago 1 0 1 0
