Reply test
Posts by Lex
A new code execution plugin for Obsidian. Run code blocks and keep the outputs directly in the .md file. Like a Markdown Jupyter Notebook. notesbylex.com/obsidian-mar...
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
notesbylex.com/skillsbench-...
I wrote a summary of the paper “Generative Modelling via Drifting”, an interesting new one-step generative modelling approach called Drifting Models
notesbylex.com/generative-m...
Getting in on the hype train: How I integrate OpenClaw into Obsidian and my daily life.
I know there are a lot of posts like this on the internet, but this one is mine.
notesbylex.com/openclaw-the...
Good idea!
This new image editing model from Black Forest Labs called **FLUX.1 Kontext** is really good. I ran some experiments on photos of Doggo, and couldn't believe how well it could maintain character consistency across multiple turns of editing.
notesbylex.com/absurdly-good-doggo-cons...
Learning to Reason without External Rewards (aka Self-confidence Is All You Need)
Turns out we can just use the LLM's internal sense of confidence as the reward signal to train a reasoning model, no reward model / ground-truth examples / self-play needed.
Amazing.
https://notesbylex.com/learning…
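The post doesn't include code, but the core idea can be sketched in a few lines. This is an illustrative proxy, not necessarily the paper's exact formula: score a generation by how peaked the model's token distributions are (negative entropy as "confidence"), and use that as the RL reward. The `step_logits` input is a hypothetical stand-in for per-token vocabulary logits.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over vocabulary logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def self_confidence_reward(step_logits):
    # Average per-token confidence: negative entropy of the model's
    # predictive distribution at each step. Peaked distributions
    # (model is "sure") score higher than flat ones. Illustrative
    # proxy only; the paper's formulation may differ.
    rewards = []
    for logits in step_logits:
        p = softmax(logits)
        entropy = -(p * np.log(p + 1e-12)).sum()
        rewards.append(-entropy)
    return float(np.mean(rewards))

# A confident (peaked) sequence should earn a higher reward than an
# unsure (near-uniform) one.
confident = [np.array([10.0, 0.0, 0.0]), np.array([0.0, 9.0, 0.0])]
unsure = [np.array([1.0, 1.0, 1.0]), np.array([0.5, 0.5, 0.5])]
```

The point is that the reward needs nothing external: no reward model, no labels, just the model's own distribution.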
"My new hobby: watching AI slowly drive Microsoft employees insane" old.reddit.com/r/ExperiencedDevs/commen...
A cool approach to iteratively improving generated images, using o3 as an LLM-judge to generate targeted masks for improvements: https://simulate.trybezel.com/research/image_agent
If Trump was really looking out for Elon, he would have posted that on Twitter.
You're giving them way too much credit. Trump has always had really really bad ideas, but this time around there are no adults in government to shut them down, only sycophantic yes men.
All that pep talk just to make social media content.
Can you link me to it? Surprisingly hard to Google for.
ARR deadline is coming up! If you're wondering how to make a beautiful full-width teaser figure on your first page, above the abstract, in LaTeX, check out this gist I made showing how I do it!
gist.github.com/michaelsaxon...
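The gist itself isn't reproduced here, but one common pattern (a sketch only; the linked gist may use a different mechanism) is to typeset the title, teaser, and abstract inside `\twocolumn[...]`, which renders its argument at full text width before the two-column material begins:

```latex
% Sketch: full-width teaser above the abstract in a two-column paper.
% The linked gist may do this differently.
\documentclass[twocolumn]{article}
\usepackage{graphicx}
\usepackage{caption}  % provides \captionof

\begin{document}
\twocolumn[{
  \centering
  {\LARGE My Paper Title\par}\vspace{0.5em}
  {\large Author Name\par}\vspace{1em}
  \includegraphics[width=\textwidth]{teaser.pdf}
  \captionof{figure}{Full-width teaser caption.}
  \vspace{1em}
  \begin{minipage}{0.8\textwidth}
    \textbf{Abstract.} Abstract text here.
  \end{minipage}
  \vspace{1em}
}]

Body text begins in two columns here.
\end{document}
```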
🔥 allenai/Llama-3.1-Tulu-3-8B (trained with PPO) -> allenai/Llama-3.1-Tulu-3.1-8B (trained with GRPO)
We are happy to "quietly" release our latest GRPO-trained Tulu 3.1 model, which is considerably better on MATH and GSM8K!
"As a former tech lead at Meta for 6 years... I got 'meets all' or 'exceeds' every single half except the one in which I took parental leave."
www.reddit.com/r/business/c...
Won't someone please think of the child processes?
Media Watch has Chas derangement syndrome!
Really awesome work and big thank you for sharing it on Bluesky!
Example of injecting Wait token into the model generation.
A hilariously simple repro of OpenAI's test-time scaling paradigm called "budget forcing": end the thinking when your token budget is met, or append "Wait" to the model's generation to keep it thinking, allowing the model to fix incorrect reasoning steps.
arxiv.org/abs/2501.19393
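The trick is simple enough to show as a toy loop. This is a mock sketch (a plain callable stands in for the real LLM, and "budget" is counted in generation calls rather than tokens), but the control flow is the whole idea:

```python
def budget_forced_generate(generate_step, min_budget, max_budget):
    """Toy version of the budget-forcing loop.

    generate_step(text) returns the next chunk of thinking, or ""
    when the model wants to stop. If it stops before min_budget
    chunks, append "Wait," to nudge it onward; hard-stop at
    max_budget. Mock setup, not a real LLM API.
    """
    text, steps = "", 0
    while steps < max_budget:
        chunk = generate_step(text)
        if not chunk:  # model tried to end its thinking
            if steps < min_budget:
                text += " Wait,"  # force more reasoning
                continue
            break
        text += " " + chunk
        steps += 1
    return text.strip()

# Mock "model": tries to stop after every chunk unless nudged, i.e.
# it only continues when the transcript ends with "Wait,".
def mock_step(text):
    if text == "" or text.endswith("Wait,"):
        return "thinking-chunk"
    return ""
```

Running `budget_forced_generate(mock_step, 3, 5)` yields three thinking chunks separated by two forced "Wait," continuations.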
True.
Abstract and figures from paper R.I.P.: Better Models by Survival of the Fittest Prompts
A method for evaluating data for preference optimisation.
Rejecting Instruction Preferences (RIP) can filter prompts from existing training sets or make high-quality synthetic datasets. They see large performance gains across various benchmarks compared to unfiltered data.
arxiv.org/abs/2501.18578
A reproduction of Deepseek R1-Zero.
"The recipe:
We follow DeepSeek R1-Zero alg -- Given a base LM, prompts and ground-truth reward, we run RL.
We apply it to CountDown: a game where players combine numbers with basic arithmetic to reach a target number."
github.com/Jiayi-Pan/Ti...
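The nice thing about CountDown is that the ground-truth reward is a pure rule check. Here's a sketch of such a verifier (the repo's actual implementation may differ): the proposed expression must use each given number exactly once, stick to basic arithmetic, and evaluate to the target.

```python
import ast

def countdown_reward(expression, numbers, target):
    """Rule-based reward for a CountDown answer (illustrative sketch).

    Returns 1.0 if `expression` uses each number in `numbers` exactly
    once, contains only + - * /, and evaluates to `target`; else 0.0.
    """
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        return 0.0
    used = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Constant):
            used.append(node.value)
        elif isinstance(node, ast.BinOp):
            if not isinstance(node.op, (ast.Add, ast.Sub, ast.Mult, ast.Div)):
                return 0.0
        elif not isinstance(node, (ast.Expression, ast.Add, ast.Sub,
                                   ast.Mult, ast.Div, ast.Load)):
            return 0.0  # reject names, calls, unary ops, etc.
    if sorted(used) != sorted(numbers):
        return 0.0
    try:
        value = eval(compile(tree, "<expr>", "eval"))
    except ZeroDivisionError:
        return 0.0
    return 1.0 if abs(value - target) < 1e-9 else 0.0
```

With a verifiable reward like this, RL on a base model needs no reward model at all, which is the R1-Zero recipe in miniature.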
Reasoning models can be useful for generating high-quality few-shot examples:
1. generate 10-20 examples from criteria in different styles with r1/o1/CoT, etc
2. have a model rate for each example based on quality + adherence.
3. filter/edit top examples by hand
Repeat for each category of output.
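The rate-and-filter steps above can be sketched as a simple loop. Everything here is illustrative (a plain callable stands in for the LLM judge; names and the scoring scale are not from the post):

```python
def select_few_shot_examples(candidates, rate_example, top_k=3):
    """Sketch of steps 2-3: rate each generated candidate for quality
    and adherence with a judge (here any callable returning a score),
    then keep the top ones for manual editing."""
    scored = [(rate_example(ex), ex) for ex in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for score, ex in scored[:top_k]]

# Mock judge: prefers longer examples, standing in for an LLM rater.
candidates = ["short", "a medium example", "a much longer, richer example"]
top = select_few_shot_examples(candidates, rate_example=len, top_k=2)
```

In practice `rate_example` would be a call to a rater model, and you would run the whole loop once per output category.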
My dog Doggo, a stag hound X bull arab, chilling in the grass
Happy dog.
The Illustrated DeepSeek-R1
Spent the weekend reading the paper and sorting through the intuitions. Here's a visual guide and the main intuitions to understand the model and the process that created it.
newsletter.languagemodels.co/p/the-illust...