Okay yeah that’s actually exactly what I meant by contextual haha. Got it though. Cool stuff!
Posts by Steven Fortney
Oh dang that’s a cool trick. Directly constraining the logprobs makes a lot of sense. I get how it works for a true/false binary (set all other tokens = 0) but any idea how it works for json? Is it a dynamic or contextual constraint?
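The binary case described above can be sketched as a logit mask: zero out (in probability space) every token except the allowed ones. For JSON, the allowed set is rebuilt at every decoding step from a state machine over the grammar, so the constraint is dynamic/contextual. A minimal numpy sketch with a made-up vocab (token ids 7 and 12 standing in for "true"/"false" are an assumption for illustration):

```python
import numpy as np

def constrained_sample(logits, allowed_ids, rng):
    """Mask logits so only allowed token ids can be sampled.

    Illustrative sketch only: real constrained-decoding libraries
    rebuild `allowed_ids` each step from a grammar state machine.
    """
    masked = np.full_like(logits, -np.inf)
    masked[allowed_ids] = logits[allowed_ids]
    # Softmax over the masked logits; -inf entries get probability 0.
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

rng = np.random.default_rng(0)
logits = rng.normal(size=50)          # toy vocab of 50 tokens
tok = constrained_sample(logits, [7, 12], rng)
assert tok in (7, 12)                 # only "true"/"false" ids possible
```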
What do you mean by structured gen? Finetune on text/json input/output pairs?
ReAG - Reasoning Augmented Generation
- No chunking, splitting, or vectorizing bs
- Stateless, no vector DBs etc.
- Supports any model (deepseek, o3-mini et al)
- Reasoning traces
- Metadata filtering
- Typescript, Python support
DeepSeek is part of a quant trading firm, which probably operates out of the fanciest office imaginable, but why am I picturing this?
Apparently, you can run DeepSeek-V3 locally, provided that you have 8 M4 Pro 64GB Mac minis.
~5 tok/sec.
I haven’t seen o3 yet & have been critical of benchmarks for AI but they did test against some of the hardest & best
On GPQA, PhDs with access to the internet got 34% outside their specialty, up to 81% inside. o3 is 87%.
On FrontierMath, the best AI went from 2% to 25%
Some other big ones, too
An Alternative to Test-Time Scaling by @kalomaze.bsky.social
Exploring conditional computation and dynamic depth in language models.
rentry.org/conditional_...
Genesis project
A generative physics engine able to generate 4D dynamical worlds powered by a physics simulation platform designed for general-purpose robotics and physical AI applications.
Introducing MASt3R-SLAM, the first real-time monocular dense SLAM with MASt3R as a foundation.
Easy to use like DUSt3R/MASt3R, from an uncalibrated RGB video it recovers accurate, globally consistent poses & a dense map.
With @ericdexheimer.bsky.social* @ajdavison.bsky.social (*Equal Contribution)
They hypothesize that there exist key "forking tokens," such that re-sampling the system at those specific tokens, but not others, leads to very different outcomes.
An example would be that a simple punctuation mark, or just a single token, can prompt an LLM to produce a different response.
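The forking-token idea can be illustrated with a toy stand-in for the model (purely hypothetical, no real LLM involved): re-sample the continuation many times at a given position and count how many distinct outcomes appear. At a forking token the distribution spreads out; elsewhere it collapses to one continuation.

```python
import random
from collections import Counter

def toy_lm(prefix, rng):
    """Hypothetical stand-in for an LLM. After a '?' token the
    continuation is highly stochastic (a 'forking' point);
    otherwise it is deterministic."""
    if prefix and prefix[-1] == "?":
        return rng.choice(["yes", "no", "maybe"])
    return "next"

def resample_at(tokens, k, n=100, seed=0):
    """Re-sample the token following position k many times and
    count the distinct continuations."""
    rng = random.Random(seed)
    return Counter(toy_lm(tokens[: k + 1], rng) for _ in range(n))

seq = ["Is", "it", "raining", "?"]
forked = resample_at(seq, 3)      # prefix ends in '?': several outcomes
stable = resample_at(seq, 1)      # mid-sentence: a single outcome
```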
Meta's SPDL: Faster AI model training with thread-based data loading. This framework-agnostic data loading solution uses multi-threading to achieve high throughput in a regular Python interpreter.
Blog: ai.meta.com/blog/spdl-fa...
Repo: github.com/facebookrese...
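This is not SPDL's actual API, but the underlying thread-pool idea can be sketched in plain Python: I/O and C-level decode work typically release the GIL, so worker threads overlap instead of serializing, even in a regular (GIL-ful) interpreter.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def decode(path):
    """Stand-in for per-sample load/decode work. File reads (and C
    extension decoders) release the GIL, which is why threads help."""
    with open(path, "rb") as f:
        return f.read()

# Create a few dummy "samples" so the sketch is runnable end to end.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(4):
    p = os.path.join(tmpdir, f"sample_{i}.bin")
    with open(p, "wb") as f:
        f.write(bytes([i]) * 8)
    paths.append(p)

# Thread-based pipeline: samples are loaded concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    batch = list(pool.map(decode, paths))
```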
Jane Street, a quant trading firm, has a very good YouTube channel. For comparison, DeepSeek is also a quant trading firm.
They recently published a video on "Building Machine Learning Systems for a Trillion Trillion Floating Point Operations".
Link: www.youtube.com/watch?v=139U...
How are Kernel Smoothing in statistics, Data-Adaptive Filters in image processing, and Attention in Machine Learning related?
My goal is not to argue who should get credit for what, but to show a progression of closely related ideas over time and across neighboring fields.
1/n
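One way to see the connection: a Nadaraya-Watson kernel smoother and softmax attention share the same normalized-weight structure, differing mainly in the choice of kernel (Gaussian distance vs. exponentiated scaled dot product). A small numpy sketch on random data (not from the original thread):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))   # keys / observed inputs
y = rng.normal(size=(10, 3))   # values / observed targets
q = rng.normal(size=(4,))      # query / test point

def nw_smooth(q, x, y, h=1.0):
    """Nadaraya-Watson: weighted average of y_i with weights
    K(q, x_i) / sum_j K(q, x_j), Gaussian kernel."""
    d2 = ((x - q) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * h ** 2))
    w /= w.sum()
    return w @ y

def attention(q, k, v):
    """Softmax attention: same normalized-weight form, but the
    'kernel' is an exponentiated (scaled) dot-product similarity."""
    scores = k @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ v

smoothed = nw_smooth(q, x, y)     # shape (3,)
attended = attention(q, x, y)     # shape (3,)
```

Both produce a convex combination of the values; data-adaptive filters in image processing fit the same template with pixel-similarity kernels.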
Real footage of a synthetic control model
Introducing 🧞Genie 2 🧞 - our most capable large-scale foundation world model, which can generate a diverse array of consistent worlds, playable for up to a minute. We believe Genie 2 could unlock the next wave of capabilities for embodied agents 🧠.
America has three functional high-capacity institutions left:
The Federal Reserve
The Southern District of New York
and The Delaware Court of Chancery
1. The conventional explanation for food deserts—that these places are too poor or too rural to generate enough spending on groceries, or too Black to overcome racist corporate redlining — fails to grapple with a key fact: food deserts didn’t used to exist. My new piece in The Atlantic.
Bump
True! I was more trying to point out that ranking on engagement gets you an (essentially) controversy-weighted “popularity” list.
For outbound clicks, choosing links from a whitelist of reputable sources can help the click bait problem but this is definitely not a complete solution.
Absolutely. One of the best things about Twitter in the old days was that it felt like one of the few places you could see truly breaking news.
More ‘silent’ measures (outbound clicks) might be a better measure of popularity than comment counts or even likes.
No concrete answers, but I encourage you to think about the second-order effects of your algo. E.g. if popularity is a function of engagement and engagement is a function of controversy, then your algo at least partially rewards controversy.
My idea for Econ seminars: speakers can go for as long as they want and talk about whatever they want. But we change the norm so that the audience can leave whenever they want and it’s nbd. Let supply and demand for attention determine seminar length/structure, etc.
Being logged into wandb on your phone is a recipe for misery
🌶️(?) take: Agents are somehow hot right now because people realized that LLM output can be interpreted as a DSL which directs side effects in the world (e.g. tool calls) rather than just returning text in a chat/autocomplete sense. What are the open challenges? A 🧵... [1/11]
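A minimal sketch of the "LLM output as a DSL" idea, with a made-up `TOOL name {json args}` format and a toy registry (real function-calling schemas differ, and the model itself is elided here):

```python
import json
import re

# Hypothetical tool registry; names and signatures are invented
# purely for illustration.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def dispatch(llm_output):
    """Treat model output as a tiny DSL: `TOOL name {json args}`
    triggers a side effect (a tool call); anything else is plain
    chat text and is returned unchanged."""
    m = re.match(r"TOOL (\w+) (.*)", llm_output)
    if not m:
        return llm_output
    name, args = m.group(1), json.loads(m.group(2))
    return TOOLS[name](**args)

print(dispatch('TOOL add {"a": 2, "b": 3}'))   # 5
print(dispatch("just a normal reply"))          # echoed as plain text
```

The open challenges start exactly here: validating malformed calls, sandboxing side effects, and deciding when the model should be allowed to act at all.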