There's rightly a lot of excitement around Karpathy's autoresearch. We've been studying this at ARG for a couple of years now: what happens when you put an AI agent in a loop and let it run experiments, evaluate results, and iterate without you.
We've built a bunch of benchmarks and tools to measure this. 🧵
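A minimal sketch of the loop described above, with a toy random-search "agent" standing in for an LLM. Every name and number here is an invented placeholder, not ARG's actual code:

```python
import math
import random

def propose_experiment(history):
    # A real autoresearch agent would prompt an LLM with the run history;
    # this toy stand-in just samples a new learning rate to try.
    return {"lr": 10 ** random.uniform(-5, -1)}

def run_experiment(config):
    # Stand-in for an actual training run: a noisy score peaking near lr=1e-3.
    return -(math.log10(config["lr"]) + 3) ** 2 + random.gauss(0, 0.1)

history = []
best = None
for _ in range(20):
    config = propose_experiment(history)   # agent proposes an experiment
    score = run_experiment(config)         # experiment runs
    history.append((config, score))        # result feeds the next iteration
    if best is None or score > best[1]:
        best = (config, score)

print(f"best config: {best[0]}, score: {best[1]:.3f}")
```

An actual autoresearch agent would replace the stub proposer with a model that reads the full run history; the propose/run/evaluate shape stays the same.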
Posts by Matthew Kenney
bsky.app/profile/algo...
Over the past ~2 years I’ve been working hard on agents, models, and datasets to understand what recursive self-improvement might look like, and what supporting infrastructure this line of research might need.
Very excited to open-source some of that work, starting today.
Very excited to launch this little tool that we’ve been building. ScoutML is an API built for AI researchers and agents that includes a ton of metadata on each paper. It’s been super helpful for us as we run our research agents internally.
x.com/algoresearch...
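For illustration, here is a hedged sketch of what querying a paper-metadata API like ScoutML might look like. The endpoint, parameters, and response fields below are all invented placeholders, not the real interface; check the actual docs:

```python
import requests

# Placeholder URL and field names; the real ScoutML interface may differ.
resp = requests.get(
    "https://api.scoutml.example/papers",
    params={"query": "LLM merging", "fields": "models,datasets,gpu_counts"},
    timeout=10,
)
resp.raise_for_status()
for paper in resp.json().get("papers", []):
    print(paper.get("title"), "-", paper.get("gpu_counts"))
```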
If you're working on ML and this resonates, I’d love to hear what you'd want it to do. We're opening up a limited beta. Link below: prospectml.com
It’s built on top of a foundation of parsed metadata from papers, code, and repos—models, metrics, datasets, SOTA claims, GPU counts (and types), ablation studies, citations, etc. It’s already become crucial to our internal research, and we hope it can be helpful to others, too.
It’s designed to support that murky, nonlinear part of the research process, where you're still figuring out what's interesting.
You give it a question like “How can we improve generalization in low-resource RL?” and it returns distilled insights, speculative ideas, and experimental code. Not final answers, just something to push the thinking forward.
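A hypothetical sketch of that interaction: a question in, structured leads out. Nothing here is ProspectML's real API; the types and stubs just show the shape described above:

```python
from dataclasses import dataclass, field

@dataclass
class ResearchLeads:
    # The three kinds of output described above, as a simple container.
    insights: list[str] = field(default_factory=list)
    speculative_ideas: list[str] = field(default_factory=list)
    experiment_code: list[str] = field(default_factory=list)

def explore(question: str) -> ResearchLeads:
    # A real system would distill these from parsed paper metadata;
    # this scaffold only shows the shape of the output.
    return ResearchLeads(
        insights=[f"(stub) what the literature says about: {question}"],
        speculative_ideas=["(stub) an untested direction worth trying"],
        experiment_code=["(stub) starter script for a small experiment"],
    )

leads = explore("How can we improve generalization in low-resource RL?")
print(leads.insights[0])
```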
Most of the time, I end up manually digging through papers, chasing links, and piecing together ideas. It works, but it’s slow, and it doesn’t scale with curiosity. I’ve been trying to fix that with a platform we're building called ProspectML.
A lot of ML tools help you implement. Not many help you think.
When I’m exploring a new research direction, I don’t want another search engine or citation graph. I want something that’s actually read the literature, can suggest promising directions, and helps me reason through tradeoffs.
hello world!
ARG is on Bluesky! Please follow here: @algoresearch.bsky.social
Back in Pennsylvania, drinking Schuylkill County coal cracker (boilo) and making pierogies
That’s because it was from 2022
AI for science could be more impactful than chatbots. It is already helping win Nobel prizes and accelerating drug development and materials discovery.
Today we published an essay about it: why it matters, how it’s happening, and its implications. Here’s a summary from an econ/social-sci lens.
Important point: the open protocol makes extracting data from Bluesky easy. Can't have it both ways. I like the protocol and think this site is well designed, but that means anyone can and will analyze these posts (if there's value in them, which I'm honestly less convinced of than some).
A dataset of 1 million or 2 million Bluesky posts is completely irrelevant to training large language models.
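A back-of-envelope check on that claim; the token counts below are rough assumptions, not measurements:

```python
# Roughly how much text is 2M short social posts next to an LLM pretraining run?
posts = 2_000_000
tokens_per_post = 40               # generous guess for short Bluesky posts
corpus = posts * tokens_per_post
pretraining = 15_000_000_000_000   # ~15T tokens, the order of recent open models

print(f"{corpus:,} tokens = {corpus / pretraining:.6%} of a ~15T-token run")
# -> 80,000,000 tokens = 0.000533% of a ~15T-token run
```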
The primary use case for the datasets that people are losing their shit over isn't ChatGPT, it's social science research and developing systems that improve Bluesky.
Wait, what even is this platform? This is insane
What! If it works for umap-learn vs. umap, I'm in.
I have a community project in Eleuther and open-source all of my research:
bsky.app/profile/bayk...
Jk, the rest are great. Just a big Uncle Nearest fan
We welcome PRs, contributions, additional tasks, and task revisions. Excited to see how agents perform on this benchmark.
We develop a baseline agent with tools for coding, research (via Semantic Scholar), and model training, built on top of Sonnet 3.5 and GPT-4o. Our baseline agent performs well across tasks but generally fails to move beyond baseline implementations.
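A hedged sketch of what that tool layout might look like; the function names, stubs, and dispatch below are invented placeholders, not the released agent's code:

```python
def search_papers(query: str) -> str:
    # A real agent would call the Semantic Scholar API here.
    return f"(stub) top results for: {query}"

def write_code(spec: str) -> str:
    return f"(stub) generated code for: {spec}"

def train_model(config: str) -> str:
    return f"(stub) launched training run with: {config}"

# Registry mapping tool names to callables.
TOOLS = {"research": search_papers, "code": write_code, "train": train_model}

def dispatch(tool_name: str, arg: str) -> str:
    # The LLM (e.g. Sonnet 3.5 or GPT-4o) would choose tool_name and arg;
    # here we call the registry directly.
    return TOOLS[tool_name](arg)

print(dispatch("research", "LLM merging methods"))
```

In the real agent the model picks the tool and argument at each step; a registry plus dispatch is just one common way to wire that up.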
ML Research Benchmark adapts tasks from ML conference competitions like ‘NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day’ and ‘LLM Merging Competition’. We prompt agents to complete these challenging tasks, which go well beyond simple ML exercises.
(re-posting from X)
Can we get AI to accelerate AI research and development?
I’m excited to release ML Research Benchmark, an agentic benchmark of 7 ML conference competition tasks.
Paper: arxiv.org/abs/2410.22553
Tasks: github.com/AlgorithmicR...
Agent: github.com/AlgorithmicR...
Maxo Kream