
Posts by Matthew Carrigan


The code agents are requesting payment now

4 weeks ago 1 0 0 0

Deeply curious whether this was caused by the user's prompt, or whether this is Claude's well-documented support for animal rights shining through into its code agent work

1 month ago 0 0 0 0

Combining Popper and Jaynes in a thesis I call "The Origin of Bicameral Parliament in the Breakdown of the Bicameral Mind"

1 month ago 0 0 0 0

I suspect the base probably could be finetuned for other tasks too! With a bit of hackery this is probably also a very strong DNA foundation model.

2 months ago 0 0 0 0

The key twist here is that they pretrained 11 output heads for different modalities, and all of them match or exceed the existing task SOTA. That means no training is needed: if you want splice site or transcription factor binding prediction, you can just feed in your sequence and read off the output.

2 months ago 1 0 1 0
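To make the "shared trunk, many heads" idea concrete, here's a toy numpy sketch, not the real AlphaGenome API; every dimension, name, and head is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy embedding width; the real trunk is vastly larger

# One shared "pretrained trunk" (here just a fixed per-base embedding + mean pool)
base_embed = {b: rng.normal(size=d) for b in "ACGT"}

def trunk(seq):
    return np.mean([base_embed[b] for b in seq], axis=0)

# Several task heads hanging off the same trunk
heads = {
    "splice_site": rng.normal(size=(d, 2)),  # toy binary head
    "tf_binding": rng.normal(size=(d, 4)),   # toy 4-way head
}

def predict(seq, task):
    logits = trunk(seq) @ heads[task]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # softmax over that head's classes

probs = predict("ACGTACGT", "splice_site")
print(probs.shape)  # (2,)
```

The point is just the access pattern: one forward pass through the trunk, then you read off whichever head matches your task, with no fine-tuning step in between.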
Preview
google/alphagenome-all-folds · Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Really nice bio model from DeepMind just got released! There have been quite a few DNA foundation models in the past, but labs usually had to gather their own data and fine-tune them for tasks of interest. This is something else!🧵

2 months ago 1 0 1 0

Though I'd add one addendum to that thread: It seems like some EPYC CPUs don't get the full socket bandwidth (possibly based on CCD count?), so going with the absolute cheapest ones might not be the best idea. If anyone knows the true memory bandwidths for those chips, I really want to know!

5 months ago 0 0 0 0
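For reference, a socket's theoretical peak bandwidth is just channels × transfer rate × 8 bytes per transfer. A quick sketch (the 12-channel DDR5-4800 configuration is illustrative of a Genoa-class part; low-CCD SKUs may not sustain anywhere near this):

```python
def theoretical_bandwidth_gbs(channels, mega_transfers_per_s):
    """Peak socket bandwidth: channels * transfers/s * 8 bytes per transfer."""
    return channels * mega_transfers_per_s * 1e6 * 8 / 1e9

# 12 channels of DDR5-4800
print(theoretical_bandwidth_gbs(12, 4800))  # 460.8 GB/s peak
```

The open question in the post is how far below that theoretical number the cheaper, low-CCD chips actually land.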

The hardware for R1 should work perfectly, because thanks to INT4 quantization K2 is actually slightly smaller despite its higher parameter count. You should be able to fit it at full quality (Q8 attention, Q4 MoE) in 768GB!

5 months ago 0 0 1 0
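A back-of-envelope check on that claim. The split of K2's roughly 1T parameters between INT4 expert weights and Q8 attention weights below is an assumption for illustration, not an official figure:

```python
def weight_gb(n_params, bits):
    """Storage for n_params weights at the given bit width, in GB."""
    return n_params * bits / 8 / 1e9

# Assumed (not official) split for a K2-scale ~1T parameter model:
moe_params = 0.95e12   # expert weights, stored at INT4
attn_params = 0.05e12  # attention/dense weights, kept at Q8

total = weight_gb(moe_params, 4) + weight_gb(attn_params, 8)
print(round(total))  # ~525 GB of weights, comfortably inside 768 GB
```

Even with generous headroom for the KV cache and activations, the weights alone leave a couple of hundred GB spare at that quantization.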

Now seems like a good time to repeat this thread, since Kimi-K2-Thinking has just arrived and might actually be the strongest LLM in the world right now, open or closed huggingface.co/moonshotai/K...

5 months ago 0 0 1 0

PRs and issues on @hf.co have gotten a lot sloppier and weirder since the advent of code agents, but the weirdest ones still have an inexplicable human touch

5 months ago 1 0 0 0

In particular, this bit suggests that if you inject a concept too weakly the model doesn't notice, and if you inject it too strongly it just talks about the concept rather than 'introspecting'. But maybe that just means a medium-strength injection biases the model towards the concept without totally overriding the original question?

5 months ago 1 0 0 0
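A toy numpy sketch of the injection itself (not Anthropic's actual setup): concept steering amounts to adding a scaled concept vector to a hidden state, and the scale alpha controls how much the concept dominates the original activation:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=64)          # activation from the original prompt
concept = rng.normal(size=64)
concept /= np.linalg.norm(concept)    # unit-norm "concept vector"

def steer(h, v, alpha):
    """Inject concept v into hidden state h with strength alpha."""
    return h + alpha * v

def concept_share(h, v):
    """Fraction of the activation's direction explained by the concept."""
    return abs(np.dot(h, v)) / np.linalg.norm(h)

for alpha in (0.1, 5.0, 100.0):
    s = concept_share(steer(hidden, concept, alpha), concept)
    print(f"alpha={alpha:>5}: concept share {s:.2f}")
```

At tiny alpha the steered state is nearly indistinguishable from the original; at huge alpha the concept direction swamps everything else, which is the "just talks about the concept" regime the post describes.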
Preview
Emergent introspective awareness in large language models Research from Anthropic on the ability of large language models to introspect

Extremely fascinated by the latest Anthropic post, but parts of the results feel like they might just be the result of "the right amount of steering" rather than genuine introspection. www.anthropic.com/research/int...

5 months ago 1 0 2 1

An underappreciated thing about the Turing test is that every teacher, writer and artist on the planet is now intimately familiar with the markers of AI output.

The post-ChatGPT era is like a global training montage to ensure the bot's job in that test is as hard as possible

6 months ago 0 0 0 0

Yup, you can very clearly see a halving of stock value right after GPT-4 is released

10 months ago 18 1 0 0

I think a lot of people are dismissing it by analogy to crypto, where usage took off but it was clearly useless for anything but speculative investing or laundering the proceeds of crime. It even ate up all the GPUs for years too!

I mean, they're incredibly wrong, but I can see how they got there

10 months ago 1 0 1 0

One clear giveaway is that modern German still has an informal second-person "du" which bears obvious signs of shared heritage with "thou". Their similarity in sound, of course, but also their "-st" verb endings. Shakespearean "thou sayst" is almost identical to modern German "du sagst"!

11 months ago 1 0 0 0

Underappreciated linguistic fact: "Thou" was originally an informal, friendly pronoun, but feels extremely archaic and formal to modern ears because of its association with Shakespeare and the KJV. You'd use it for speaking to family and friends (and to God).

11 months ago 2 0 1 0

the betting markets are asking the real questions today

11 months ago 1 0 0 0
Preview
open-r1/README · [Experiment] Training R1-Zero-like models with Open R1 There are several recent research papers which explore various aspects of R1-Zero-like training on open base models like Qwen2.5-7B and Llama-3.1-8B:

The discussion pages for Open-R1 on @hf.co are such a goldmine for actual practical information on how to train a reasoning model.

Like look at this! If you're not reading those community tabs you're missing so much! huggingface.co/spaces/open-...

11 months ago 11 3 0 0
Preview
Paper page - Your ViT is Secretly an Image Segmentation Model Join the discussion on this paper page

I call this The Paper. It gets written quite often in machine learning, and it's valuable every time!

The core of it is "Everyone had a complex setup to do X task. With enough scale, none of that complexity is necessary, and a simple model does it better."

huggingface.co/papers/2503....

1 year ago 6 0 0 0
Preview
EsportsBench/EsportsBench · Datasets at Hugging Face

Here's EsportsBench v5!

72k new matches added from 2025-01-01 through 2025-03-31 and some data quality improvements to past data as well.

Over 2.4 million rows of esports match data from 20 titles spanning over 25 years

huggingface.co/datasets/Esp...

1 year ago 4 2 0 0

I believe ArXiv and Archive Of Our Own should swap places for April 1st. I believe this more strongly than I believe anything else

1 year ago 46 14 0 0

And when Leela Chess Zero did an open-source reproduction of it, they just distributed inference to volunteer computers around the globe. Of course, that probably won't work for a 700GB LLM as well as it did for a 100MB convnet, but in principle you could do the same

1 year ago 0 0 0 0

The analogy here is to projects like AlphaGo/AlphaZero - far more compute was spent on evaluating board positions to generate the training data than on actually updating the model with that data! DeepMind distributed that over tons of tiny TPUv1s iirc

1 year ago 0 0 1 0
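Rough arithmetic on that ratio, with entirely made-up AlphaZero-style numbers (the per-move simulation count and game counts are illustrative, not DeepMind's actual figures):

```python
# All numbers are illustrative, not DeepMind's actual figures.
flops_per_forward = 1e9        # cost of evaluating one board position
sims_per_move = 800            # MCTS simulations per move, AlphaZero-style
moves_per_game = 200
games = 1_000_000

# Generating the data: every MCTS simulation is a forward pass
selfplay_flops = games * moves_per_game * sims_per_move * flops_per_forward

# Training on it: each position trained once; backward pass ~2x forward
training_flops = games * moves_per_game * 3 * flops_per_forward

print(f"data generation / training ≈ {selfplay_flops / training_flops:.0f}x")
```

With these assumptions the inference side outweighs the gradient updates by a couple of orders of magnitude, which is why it was worth distributing across fleets of small accelerators.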

People are reading MSFT dropping power contracts as a sign that AI investment will fall off, but if reasoning is the new paradigm then most training compute will be inference and that doesn't have to be centralized

Massive monolithic datacentres are much less necessary now

1 year ago 0 0 1 0

Preliminary take is that V3-0324 is a major upgrade on the V3 base. Increasingly confident that it's the strongest open-source LLM, and likely competitive with the top tier of closed source too

1 year ago 0 0 0 0

This might also herald a possible upgraded R1 reasoning model as well, using the new V3 as an improved base, but this is pure speculation on my part - I don't have any secret info!

1 year ago 1 0 0 0
Preview
deepseek-ai/DeepSeek-V3-0324 · Hugging Face

Deepseek V3-0324 just landed, an upgraded version of the V3 model that was used as the base for Deepseek-R1. Weights on @hf.co , and it'll start appearing on inference providers soon. It seems very strong in early testing, likely the best non-reasoning OS model (!)

huggingface.co/deepseek-ai/...

1 year ago 1 0 1 0
Preview
Xet is on the Hub

Last week, we launched a waitlist to move builders on @hf.co from LFS to Xet. This was made possible through months of hard work and staged migrations to test our infrastructure in real-time.

This post provides an inside look into the day of our first migrations and the weeks after.

1 year ago 11 3 1 1