Apache 2.0. Works with OpenAI, Anthropic, or anything through LiteLLM.
We shipped three demos:
1. Benchmark Scout: extract benchmark rows from papers, flag questionable comparisons, then re-run only the unclear cases
2. HF Entity Graph: extract entities, detect ambiguity, then resolve it
3. Benchmark Report: full topology and throughput benchmarks with charts
At the leaves, workers can be anything:
- LLM-driven agent tasks
- deterministic reducers
- local executors
- your own agent via BYOA (bring your own agent)
You can plug in your own agent process without touching orchestrators.
You can mix agent work and deterministic work in the same job.
Failures stay localized to the nodes that failed.
Second-pass reasoning only runs on ambiguous cases instead of everything.
This lets you keep most of the system deterministic and only spend agent cycles where they are actually needed.
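A minimal sketch of that pattern (all names here are illustrative, not Epsilon's API): a cheap deterministic first pass tags each item with a confidence, and only low-confidence items get the expensive agent-style second pass.

```python
# Illustrative sketch only -- not Epsilon's actual API.

def deterministic_pass(item: str) -> tuple[str, float]:
    """Cheap rule-based extraction returning (value, confidence)."""
    value = item.strip().lower()
    confidence = 0.9 if value.isalpha() else 0.3  # toy heuristic
    return value, confidence

def agent_pass(item: str) -> str:
    """Stand-in for an expensive LLM call; only runs on unclear cases."""
    return item.strip().lower()

def process(items: list[str], threshold: float = 0.5) -> list[str]:
    results = []
    for item in items:
        value, conf = deterministic_pass(item)
        if conf < threshold:  # ambiguous: spend agent cycles here only
            value = agent_pass(item)
        results.append(value)
    return results
```

Everything above the threshold never touches an agent, which is where the cost savings come from.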
Seven orchestration topologies:
dag: parallel build with QA and fix loops
tree: hierarchical decomposition with branching
pipeline: staged delivery
supervisor: adaptive retries and task splitting
work_queue: flat pull-based workers
sharded_queue: large independent item sets
map_reduce: fan-out and aggregate
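As a rough sketch of what the dag topology implies (hypothetical task names, not Epsilon's config format): tasks declare their dependencies, and the scheduler only releases a task once everything it depends on is done.

```python
from graphlib import TopologicalSorter

# Hypothetical dag: two builds fan out in parallel, QA waits on both,
# and a fix step waits on QA -- not Epsilon's real schema.
deps = {
    "build_a": set(),
    "build_b": set(),
    "qa":      {"build_a", "build_b"},
    "fix":     {"qa"},
}

def schedule(deps: dict[str, set[str]]) -> list[str]:
    """Return tasks in an order that respects every dependency edge."""
    return list(TopologicalSorter(deps).static_order())
```

The same dependency map also tells you which failures can stay localized: a failed `build_a` blocks `qa` and `fix`, but `build_b` still runs.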
Epsilon splits coordination, execution, and transport.
Orchestrators define task structure and dependencies.
Workers execute tasks as independent processes.
A ØMQ broker handles queueing, leases, heartbeats, and routing.
Intermediate state is written to a shared workspace on disk.
Epsilon treats a workload as a graph of tasks with explicit state, retries, and ownership. Orchestrators handle decomposition and coordination. Workers execute tasks. The runtime handles scheduling, routing, and recovery.
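One way to picture "explicit state, retries, and ownership" (again purely illustrative, not the real runtime): each task node carries a state, and the runtime retries a failing task up to a budget before marking that node failed, leaving the rest of the graph untouched.

```python
# Illustrative sketch only -- not Epsilon's actual runtime.
from enum import Enum

class State(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"

def run_task(fn, max_retries: int = 3):
    """Run one task with retries; failure stays localized to this node."""
    for _ in range(max_retries):
        try:
            return State.DONE, fn()
        except Exception:
            continue  # transient failure: retry within the budget
    return State.FAILED, None
```

A node that exhausts its budget ends in FAILED, and only its downstream dependents are affected.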
We use it internally for things like:
- multi-agent software builds with QA/fix loops
- extracting structured data across large paper corpora
- manifest-backed jobs with hundreds of tasks
- pipelines where only some cases need a second pass
We just open-sourced Epsilon: a runtime for structured agent workloads. Thread on what it is and why we built it. 🧵
Find it here:
github.com/AlgorithmicR...
We're effectively going open source, repo by repo, dataset by dataset 🤖
A tiny architecture ranking model that got 8-10x sample efficiency over random search in NAS and transferred zero-shot across datasets
algorithmicresearchgroup.com/projects/lea...
ArXivDLInstruct - 778K functions from research code paired with instruction prompts for fine-tuning
huggingface.co/datasets/Alg...
ArXiv Research Code Dataset - 4.7M code files from 129K research repos linked to arXiv CS papers
huggingface.co/datasets/Alg...
ARIA Benchmark - 5 closed-book benchmarks testing how much ML knowledge frontier models have actually internalized
github.com/AlgorithmicR...
Follow-up was DeltaMLBench: 50 tasks from real Papers With Code repos where the goal is to beat the published baseline, not just reproduce it. GPT-5 with our agent scaffold improved on 29 of 48 tasks, some by a lot. Under review at ICML 2026.
github.com/AlgorithmicR...
Two years ago we released the ML Research Benchmark (MLRB), 7 competition-level ML challenges from NeurIPS/ICML/CoNLL. Gave frontier agents an A100, 24 hours, no starter code.
Main finding: agents could build working pipelines but couldn't do real research iteration.
There's rightly a lot of excitement around Karpathy's autoresearch. We've been studying this at ARG for a couple of years now: what happens when you put an AI agent in a loop and let it run experiments, evaluate results, and iterate without you.
We've built a bunch of benchmarks and tools to measure this. 🧵
hello world!