Posts by Chris McMaster

LLMs become much less useful the moment you dismantle bureaucracy.

1 year ago 2 0 0 0

The modern internet penalises anyone who thinks for more than 3 seconds before forming a strong opinion. You’ll be happy to know that I only thought for 2 seconds before typing this.

1 year ago 1 0 0 0

The major barrier here is that the massive budgets get spent on AI researcher/engineer compensation and GPUs, leaving very little to pay for the best domain experts. I think this is a massive miscalculation. The first lab to realise this will quickly become the market leader.

1 year ago 1 0 0 0

It should therefore come as no surprise that new models like Grok 3 and GPT-4.5 feel like small incremental improvements. The focus now should be on improving post-training data. Frankly, the quality kinda sucks even at the best labs.

1 year ago 1 0 1 0

The idea that scaling up LLMs on human data will produce superhuman performance is magical thinking. The highest attainable performance in any given domain is simply the best human performance. Yes, maybe some insights span across domains, but I don’t think there’s actually evidence for that.

1 year ago 2 0 1 0

The people who laugh about hacky “vibe coding” with LLMs are the same people who think that Grok 3 is better than a doctor. Absolutely nuts.

1 year ago 2 0 0 0

“Below is the updated code” followed by absolutely no code is such an o3-mini thing to do that at this point I don’t really understand why this model exists.

1 year ago 1 0 0 0

After much time spent looking at reasoning traces from DeepSeek R1 for medical cases, I have to conclude that there isn’t a strong correlation between good reasoning and a good answer.

1 year ago 4 0 0 0

The really interesting thing is that they’re not all made equal. Llama 8b distilled can solve medical cases that the Qwen-based R1 (and even R1 itself) cannot. World knowledge still matters for solving real problems and no open models beat Llama in that regard.

1 year ago 0 0 0 0

Say the benchmark has 100 questions: generate 64 responses per question, and pass@1 is then the total number correct / 6400.
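
A minimal sketch of that calculation, assuming every response has already been graded as correct or incorrect. The function name and the per-question counts are made up for illustration and do not come from any particular benchmark harness.

```python
def pass_at_1(correct_counts, samples_per_question):
    """Estimate pass@1 as total correct responses / total responses.

    correct_counts: one integer per question, the number of sampled
        responses graded correct (hypothetical input format).
    samples_per_question: how many responses were sampled per question.
    """
    if samples_per_question <= 0:
        raise ValueError("samples_per_question must be positive")
    if any(c < 0 or c > samples_per_question for c in correct_counts):
        raise ValueError("each count must be between 0 and samples_per_question")
    total_correct = sum(correct_counts)
    total_responses = len(correct_counts) * samples_per_question
    return total_correct / total_responses


# Illustrative numbers only: 100 questions, 64 responses each,
# so the denominator is 100 * 64 = 6400.
correct_counts = [32] * 100
print(pass_at_1(correct_counts, 64))
```

Because every question has the same number of samples, dividing the pooled total by 6400 gives the same number as averaging the per-question accuracy.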

1 year ago 7 0 1 0

These R1 distilled models are absolutely amazing on a single turn, but truly horrible on multi-turn conversations.

1 year ago 1 0 1 0
[Image: a cartoon dog is sitting at a table with a cup of coffee in front of a fire with the words "this is fine".]

Meta’s new response to the Rohingya genocide.

1 year ago 2 0 0 0

"Mildly elevated rheumatoid factor has a very low positive predictive value that is completely overwhelmed in magnitude by the negative predictive value of not having any signs or symptoms of rheumatoid arthritis."

1 year ago 1 0 0 0

My 2-year-old has 3 adjectives for the size of things. In increasing order: small, mummy, big.

1 year ago 2 0 0 0

5.7B tokens to solve 100 tasks??? I don’t understand why we’re thinking of this as being incredibly smart, when what this suggests is that it’s incredibly dumb.

1 year ago 1 0 0 0
Finally, a Replacement for BERT: Introducing ModernBERT

👀👀👀👀

huggingface.co/blog/modernb...

1 year ago 76 11 7 2

Releasing Jupyter Agents - LLMs running data analysis directly in a notebook!

The agent can load data, execute code, plot results and follow your guidance and ideas!

A very natural way to collaborate with an LLM over data and it's just scratching the surface of what's possible soon!

1 year ago 13 4 1 0

Did you use oil?

1 year ago 0 0 1 0

What percentage of “rhupus” is just misdiagnosed Sjögren?

1 year ago 3 1 1 0

“ILD, hyperglobulinemia & Lab abnormalities” sounds like SjD to me!

1 year ago 0 0 0 0

I just follow everyone and then spend all my time on the quiet posters feed (except when I want to come judge the yappers)

1 year ago 1 0 0 0

o1 is equal parts brilliant and boring. Very, very boring.

1 year ago 1 0 0 0

Llamafile is a cheat code.

1 year ago 0 0 0 0

"Zuckerberg's eyes brimmed with tears, and his heart felt full. He truly loved Big Brother!"

1 year ago 15 4 1 0

You better believe it. “California boomer” is the vibe I am getting from this guy.

1 year ago 0 0 0 0

Prop stethoscope checks out, though.

1 year ago 0 0 0 0
Video

Sora’s idea of a hand exam. This 60-year-old rheumatologist is giving me very 2nd-year medical student vibes with this bizarre technique. No synovitis was detected this day. #rheumsky

1 year ago 2 0 1 0

Oh, and if you follow him then you’ll end up on a list, which will lead to you seeing fewer of these people. I genuinely have no interest in seeing his posts (I find him uniquely annoying), but it’s probably worth it.

1 year ago 16 0 1 0

Ultimately, if performance is anything like previous iterations of Phi, it will greatly underwhelm outside of benchmarks. So the license has no meaning to me.

1 year ago 4 0 1 0

So much delving

1 year ago 2 0 0 0