
Posts by Lex

Reply test

5 hours ago 0 0 0 0
Obsidian Markdown Notebook: code execution with outputs stored in the file I've built Obsidian Markdown Notebook, a plugin that lets you execute code in Obsidian with both code and output stored in the same file, kinda like a Markdown Jupyter Notebook. Right now, the...

A new code execution plugin for Obsidian. Run code blocks and keep the outputs directly in the .md file. Like a Markdown Jupyter Notebook. notesbylex.com/obsidian-mar...

1 day ago 0 0 1 0
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Agent Skills are structured packages of Markdown files and scripts that augment AI Agents' capabilities. They usually look something like this: ~/.claude/skills/some-skill/ ├── SKILL.md └──...

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

notesbylex.com/skillsbench-...

1 month ago 0 0 0 0
Generative Modelling via Drifting This paper introduces a new paradigm for single-step generative modelling called Drifting Models. Where Diffusion/Flow Matching performs iterative denoising at inference time, Drifting Models...

I wrote a summary of the paper “Generative Modelling via Drifting”, an interesting new one-step generative modelling approach called Drifting Models.

notesbylex.com/generative-m...

2 months ago 0 0 0 0
Software Factory Software Factory refers to the idea of completely abandoning the notion of writing code, even reviewing it, leaving engineers to manage the goal and validate the correctness of the system....

Shared some thoughts on Software Factories: notesbylex.com/software-fac...

2 months ago 0 0 0 0
OpenClaw: the missing piece for Obsidian's second brain I've been an Obsidian user for many years. I like it a lot. I like the paradigm of linking notes that comes from Zettelkasten. I like having a single place to keep all my notes, to track...

Getting in on the hype train: How I integrate OpenClaw into Obsidian and my daily life.

I know there are a lot of posts like this on the internet, but this one is mine.

notesbylex.com/openclaw-the...

2 months ago 0 0 0 0
Spec-First LLM Development Spec-First LLM Development is the simple idea that, instead of asking an LLM to immediately output code after prompting, you first ask it to output a spec file, which is continually updated as...

Spec-first LLM Development - SRS is back in fashion notesbylex.com/spec-first-l...

2 months ago 0 1 0 0

Good idea!

2 months ago 0 0 0 0

This new image editing model from Black Forest Labs called **FLUX.1 Kontext** is really good. I ran some experiments on photos of Doggo, and couldn't believe how well it could maintain character consistency across multiple turns of editing.

notesbylex.com/absurdly-good-doggo-cons...

10 months ago 1 0 0 0

Learning to Reason without External Rewards (aka Self-confidence Is All You Need)

Turns out we can just use the LLM's internal sense of confidence as the reward signal to train a reasoning model, no reward model / ground-truth examples / self-play needed.

Amazing.

https://notesbylex.com/learning…
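A rough sketch of what "confidence as the reward" could look like. This assumes confidence is scored as the average KL divergence from a uniform distribution to the model's next-token distribution; the function name `self_certainty` and the input shape are illustrative assumptions, not taken from the paper's code.

```python
import math
from typing import Sequence


def self_certainty(token_distributions: Sequence[Sequence[float]]) -> float:
    """Average KL(uniform || p) over the generated tokens.

    Assumption: each inner sequence is a full next-token probability
    distribution with strictly positive entries. A flat (unsure)
    distribution scores 0; a peaked (confident) one scores higher.
    """
    total = 0.0
    for probs in token_distributions:
        uniform = 1.0 / len(probs)
        # KL(U || p) = sum_j U_j * log(U_j / p_j)
        total += sum(uniform * math.log(uniform / p) for p in probs)
    return total / len(token_distributions)


unsure = self_certainty([[0.25, 0.25, 0.25, 0.25]])
confident = self_certainty([[0.97, 0.01, 0.01, 0.01]])
```

In an RL loop, a score like this would stand in for the external reward when ranking sampled completions.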

10 months ago 0 0 0 0

"My new hobby: watching AI slowly drive Microsoft employees insane" old.reddit.com/r/ExperiencedDevs/commen...

11 months ago 0 0 0 0

A cool approach to iteratively improving generated images, using o3 as an LLM-judge to generate targeted masks for improvements: https://simulate.trybezel.com/research/image_agent

11 months ago 0 0 0 0

If Trump was really looking out for Elon, he would have posted that on Twitter.

1 year ago 0 0 0 0

You're giving them way too much credit. Trump has always had really really bad ideas, but this time around there are no adults in government to shut them down, only sycophantic yes men.

1 year ago 1 0 0 0

All that pep talk just to make social media content.

1 year ago 3 0 0 0

Can you link me to it? Surprisingly hard to Google for.

1 year ago 0 0 1 0
Teaser figures in ACL template papers Teaser figures in ACL template papers. GitHub Gist: instantly share code, notes, and snippets.

ARR deadline is coming up! If you're wondering how to make a beautiful full-width teaser figure on your first page, above the abstract, in LaTeX, check out this gist I made showing how I do it!

gist.github.com/michaelsaxon...

1 year ago 7 1 0 0

🔥 allenai/Llama-3.1-Tulu-3-8B (trained with PPO) -> allenai/Llama-3.1-Tulu-3.1-8B (trained with GRPO)

We are happy to "quietly" release our latest GRPO-trained Tulu 3.1 model, which is considerably better in MATH and GSM8K!

1 year ago 22 5 1 2

"As a former tech lead at Meta for 6 years... I got 'meets all' or 'exceeds' every single half except the one in which I took parental leave."

www.reddit.com/r/business/c...

1 year ago 0 0 0 0

Won't someone please think of the child processes?

1 year ago 14 0 0 0

Media Watch has Chas derangement syndrome!

1 year ago 0 0 0 0

Really awesome work and big thank you for sharing it on Bluesky!

1 year ago 1 0 0 0
Example of injecting Wait token into the model generation.


A hilariously simple repro of OpenAI's test-time scaling paradigm, called "budget forcing": end the thinking when your token budget is met, or append "Wait" to the model's generation to keep it thinking, allowing the model to fix incorrect reasoning steps.

arxiv.org/abs/2501.19393
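The two rules in that post can be sketched as a small decoding loop. This is a toy under stated assumptions: `next_token` is a hypothetical callable standing in for a real model, and the `</think>` marker and the `min_thinking_tokens` parameter are illustrative names, not the paper's API.

```python
from typing import Callable, List


def generate_with_budget(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_thinking_tokens: int,
    min_thinking_tokens: int = 0,
    end_of_thinking: str = "</think>",
    wait_token: str = "Wait",
) -> List[str]:
    """Cap thinking at a token budget; extend it by swapping an early stop for "Wait"."""
    tokens = list(prompt)
    thinking = 0
    while thinking < max_thinking_tokens:
        tok = next_token(tokens)
        if tok == end_of_thinking:
            if thinking < min_thinking_tokens:
                # The model tried to stop early: drop the end marker and
                # append "Wait" so it keeps reasoning.
                tokens.append(wait_token)
                thinking += 1
                continue
            break
        tokens.append(tok)
        thinking += 1
    # Budget met (or the model stopped on its own): close the thinking section.
    tokens.append(end_of_thinking)
    return tokens


def toy_model(tokens: List[str]) -> str:
    """Stand-in model: emits one "step", then tries to stop."""
    return "</think>" if tokens and tokens[-1] == "step" else "step"


short = generate_with_budget(toy_model, ["Q"], max_thinking_tokens=10)
longer = generate_with_budget(toy_model, ["Q"], max_thinking_tokens=10, min_thinking_tokens=3)
```

With the toy model, `short` stops after a single step, while `longer` has a "Wait" injected before the model is allowed to finish.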

1 year ago 1 0 0 0

True.

1 year ago 1 0 0 0
Abstract and figures from paper R.I.P.: Better Models by Survival of the Fittest Prompts


A method for evaluating data for preference optimisation.

Rejecting Instruction Preferences (RIP) can filter prompts from existing training sets or make high-quality synthetic datasets. They see large performance gains across various benchmarks compared to unfiltered data.

arxiv.org/abs/2501.18578

1 year ago 1 0 0 0
GitHub - Jiayi-Pan/TinyZero: Clean, accessible reproduction of DeepSeek R1-Zero Clean, accessible reproduction of DeepSeek R1-Zero - Jiayi-Pan/TinyZero

A reproduction of DeepSeek R1-Zero.

"The recipe:

We follow DeepSeek R1-Zero alg -- Given a base LM, prompts and ground-truth reward, we run RL.

We apply it to CountDown: a game where players combine numbers with basic arithmetic to reach a target number."

github.com/Jiayi-Pan/Ti...
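To make "ground-truth reward" concrete for CountDown, here is a minimal sketch of a rule-based checker. It is an assumption about how such a reward might look, not TinyZero's actual implementation: the name `countdown_reward`, the 0/1 scoring, and the rule that each number may be used at most once are all illustrative.

```python
import ast
import operator
from collections import Counter
from typing import List

# Only the four basic arithmetic operators are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node: ast.AST, used: List[int]) -> float:
    """Evaluate a restricted arithmetic AST, recording each integer literal seen."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def countdown_reward(expression: str, numbers: List[int], target: int) -> float:
    """Return 1.0 if the expression hits the target using only the given numbers, else 0.0."""
    used: List[int] = []
    try:
        tree = ast.parse(expression, mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # Reject expressions that use a number more often than it was provided.
    if Counter(used) - Counter(numbers):
        return 0.0
    return 1.0 if abs(value - target) < 1e-9 else 0.0
```

Parsing with `ast` rather than calling `eval` keeps model output from executing arbitrary code while it is being scored.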

1 year ago 0 0 0 0

Reasoning models can be useful for generating high-quality few-shot examples:

1. generate 10-20 examples from criteria in different styles with r1/o1/CoT, etc
2. have a model rate each example on quality + adherence.
3. filter/edit top examples by hand

Repeat for each category of output.
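Steps 1 and 2 above could be wired together roughly like this, leaving step 3 to a human. The `generate` and `rate` callables are hypothetical stand-ins for calls to a reasoning model and a rater model; nothing here is tied to a specific API.

```python
from typing import Callable, List, Tuple


def shortlist_examples(
    generate: Callable[[str, str], str],
    rate: Callable[[str, str], float],
    criteria: str,
    styles: List[str],
    per_style: int,
    top_k: int,
) -> List[Tuple[float, str]]:
    """Generate candidates per style, score them, and return the best for hand review."""
    # Step 1: several candidates per style from the same criteria.
    candidates = [
        generate(criteria, style) for style in styles for _ in range(per_style)
    ]
    # Step 2: a rater scores each candidate on quality + adherence.
    scored = [(rate(criteria, example), example) for example in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Step 3 (filter/edit by hand) starts from this shortlist.
    return scored[:top_k]


# Deterministic stubs in place of real model calls.
def stub_generate(criteria: str, style: str) -> str:
    return f"{style} {criteria}"


def stub_rate(criteria: str, example: str) -> float:
    return float(len(example))


best = shortlist_examples(stub_generate, stub_rate, "x", ["terse", "verbose"], 1, 1)
```

Running it once per output category matches the "repeat for each category" step.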

1 year ago 0 0 0 0
My dog Doggo, a stag hound X bull arab, chilling in the grass


Happy dog.

1 year ago 2 0 0 0

The Illustrated DeepSeek-R1

Spent the weekend reading the paper and sorting through the intuitions. Here's a visual guide and the main intuitions to understand the model and the process that created it.

newsletter.languagemodels.co/p/the-illust...

1 year ago 75 23 1 4