This is just an attempt to weave all my related armchair thoughts into one piece; not a serious deeply-researched cite-able philosophical essay.
Happy to hear more thoughts & refs to expand my thinking!
This exercise made me realize how shocking it is that LLMs got so far with reasoning just by training on text that refers to more text, without ever grounding in non-text stimuli!
can develop "vision" from self-referential text, how we fail to visualize higher dimensional & quantum objects & yet manipulate them, and also various fascinating human phenomena (like not having an internal monologue), and some thought expts borrowed from consciousness.
I framed this as a discussion between three people who respond "yes/yes" vs. "no/no" vs. "yes/no" which leads to mind-bending questions/analogies, somehow simultaneously philosophical and concrete
e.g., the self-referential nature of the dictionary, how eigenvector representations of graphs
Consolidated my armchair thoughts about "how may an LLM (not) differ from a human who thinks in images/text?" I split this into two questions:
- is text sufficient to be correct about, say, a circle?
- does correctness imply sharing the same "understanding" as humans?
vaishnavh.github.io/blog/what-ll...
Scientists, just like lawyers, are also bound by codes of conduct that demand integrity in the process. So there's a tension between their allegiance to their idea and making sure they don't cross a line in advocating for it.
I have a few more arguments in the essay, and also more nuance: e.g., scientists still try to be as neutral as they can, but seemingly at a lower, day-to-day level, or at the initial stages of an idea.
For a position/idea to stand a *fair chance* against other positions/ideas in this courtroom, it *needs* a dedicated lawyer whose job is to think deeply and creatively about that position & present the best/strongest form of it for time to pass judgement.
In short, my view is that because science is exploration under uncertainty, at some level, scientists end up gambling and picking a side. These sides fight it out in a courtroom, with time as the judge.
In practice, scientists do not seem to behave like "neutral, rational agents" but rather behave like "zealous advocates" for an idea that has "hired them". I wrote about how I think this "courtroom" view of science works and what I learned from it!
vaishnavh.github.io/blog/emotion...
I remember that bad reviews meant you were banned from *reviewing* for a future conference, which sounds like a bad incentive system.... Why are you threatening someone with a good time?
Also, what's the catch with punishing bad reviews by preventing future submissions? Say: if your reviews are egregiously bad as flagged by multiple ACs across at least two conferences, you won't be able to submit papers to the next N conferences. (Possible that I'm missing something here.)
This would not only be more just, it would also disincentivize spamming, paper-count-maxing, chopping up one project into 5 papers, etc.
Curious why conferences don't have a system where the authors of every paper together guarantee N reviews per paper (and they can distribute the load amongst themselves). This way, wouldn't we tax authors in proportion to the number of papers they burden the system with?
A recent paper (arxiv.org/abs/2602.18671) made me question something basic: do the logits of a language model model the next-token distribution or the full-sequence distribution? It really messed with my brain (in a fun way!). I wrote about the paper to clarify my thinking.
vaishnavh.github.io/blog/joint-o...
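To make the question concrete for myself: in the standard view, the logits only directly parameterize next-token conditionals, and the sequence distribution is whatever the chain rule stitches together. A minimal sketch (the `next_token_logits` helper is a hypothetical stand-in for a trained LM, not any particular library's API):

```python
import numpy as np

# Minimal sketch: next-token logits, softmax-normalized at each step,
# implicitly define a full-sequence distribution via the chain rule.
# `next_token_logits(prefix)` is a hypothetical stand-in for a trained LM
# that returns a vector of logits over the vocabulary given a token prefix.
def sequence_log_prob(tokens, next_token_logits):
    total = 0.0
    for t, tok in enumerate(tokens):
        logits = next_token_logits(tokens[:t])             # shape: (vocab_size,)
        log_probs = logits - np.logaddexp.reduce(logits)   # log-softmax
        total += log_probs[tok]
    return total  # log p(x_1, ..., x_T) = sum_t log p(x_t | x_<t)
```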
(the exact observation is even stronger than what I wrote here; e.g., the low-rank structure "generalizes" across prompt-response pairs.)
but it turns out that if you arrange next-token logits from pairs of prompt × response sequences into a matrix (see pic for the exact object), you still get a *linear*, *low-rank* structure. neither this linearity nor the low-rankness follows by design; it somehow emerges from training.
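roughly the kind of check I have in mind, as a hedged sketch (this is not the paper's exact construction, and `log_prob(prompt, response)` is a hypothetical helper returning the model's total log-probability of a response given a prompt):

```python
import numpy as np

# Hedged sketch: build a matrix over prompt x response pairs and look at how
# quickly its singular values decay; a sharp decay means approximately low rank.
# NOT the paper's exact object; `log_prob` is a hypothetical helper.
def effective_rank(prompts, responses, log_prob, tol=1e-2):
    M = np.array([[log_prob(p, r) for r in responses] for p in prompts])
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s / s[0] > tol))  # count of singular values above tol
```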
here's my understanding: the low-rank observation is a non-trivial extension of a more straightforward & well-known observation called the softmax bottleneck. If you stack a bunch of next-token logits from various prompts, you'll get a low-rank matrix. this is by *design* (the last-layer bottleneck).
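the by-design part is easy to see in a toy computation: every row of the stacked logit matrix is a hidden state times the same unembedding matrix, so the rank can't exceed the hidden width.

```python
import numpy as np

# Toy illustration of the softmax bottleneck (hypothetical sizes, random weights):
# stacked next-token logits are H @ W_unembed, so their rank is at most d.
d, V, N = 64, 5000, 500              # hidden width, vocab size, number of prompts
W_unembed = np.random.randn(d, V)    # shared last-layer (unembedding) matrix
H = np.random.randn(N, d)            # one final hidden state per prompt
logits = H @ W_unembed               # shape: (N, V)
print(np.linalg.matrix_rank(logits)) # at most d (= 64), far below min(N, V)
```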
If the low-rank logit structure really holds across settings, I expect it should have a lot of downstream corollaries & connections waiting to be discovered.
I also like the low-rank logits finding (arxiv.org/abs/2510.24966) because it provides a novel, simple and surprising abstraction to think about what function a trained LLM implements. It took me a *lot* of time to understand, appreciate and buy the exact result here...
Incredibly, you can select these datapoints through a straightforward method: see whether the given preference is aligned with a model prompted with the target behavior. (i'd have expected that you'd need an exponential search over all possible data subsets to accomplish this)
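as I understand it, the selection is roughly this (a hedged sketch with hypothetical helper names, not the paper's code):

```python
# Hedged sketch of the selection rule as I understand it. `log_prob` is a
# hypothetical helper giving the model's log-probability of a response given
# a prompt; `target_behavior_prompt` is a system-style prompt eliciting the
# target behavior. Keep only the preference pairs that this prompted model
# already agrees with, then run preference finetuning on that subset.
def select_aligned_pairs(pairs, log_prob, target_behavior_prompt):
    selected = []
    for prompt, chosen, rejected in pairs:
        prompted = target_behavior_prompt + "\n" + prompt
        if log_prob(prompted, chosen) > log_prob(prompted, rejected):
            selected.append((prompt, chosen, rejected))
    return selected
```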
This paper discovers another spooky generalization effect: to trigger any target behavior in an LLM, you can carefully subselect from a *completely unrelated* preference dataset such that preference finetuning on that subselected dataset produces that behavior.
Really liked this paper, which ties together two observations that are equally mind-boggling (low-rank logits & subliminal/weird generalization effects) and presents one other such observation.
arxiv.org/abs/2602.04863
The visual world is composed of objects, and those objects are composed of features. But do VLMs exploit this compositional structure when processing multi-object scenes? In our 🆒🆕 #ICLR2026 paper, we find they do – via emergent symbolic mechanisms for visual binding. 🧵👇
He also contrasts the personalities of Hardy and Einstein:
Currently reading "A Mathematician's Apology" by G.H. Hardy. This is an excerpt from the foreword by C.P. Snow describing Hardy's personality and his work:
in associative memory, the latent space doesn't really encode any interesting distance.
imagine you're trying to store which countries share borders. you could simply write down a list of adjacent countries OR you could visualize the world map in your head. this is "associative" vs "geometric".
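a toy version of the contrast (made-up data, just to illustrate): the associative store is a flat lookup table, while the geometric store places countries in a latent space where distance itself carries the adjacency information.

```python
import numpy as np

# Toy contrast, with made-up data. Associative: a flat set of adjacent pairs;
# the representation carries no notion of "closeness" beyond membership.
adjacent = {("France", "Spain"), ("France", "Germany"), ("Spain", "Portugal")}
def borders_assoc(a, b):
    return (a, b) in adjacent or (b, a) in adjacent

# Geometric: rough (longitude, latitude) coordinates; adjacency is read off
# distance, so the latent space itself encodes which countries are "near".
coords = {"France": np.array([2.0, 46.0]),   "Spain": np.array([-4.0, 40.0]),
          "Portugal": np.array([-8.0, 39.5]), "Germany": np.array([10.0, 51.0])}
def borders_geom(a, b, threshold=10.0):
    return float(np.linalg.norm(coords[a] - coords[b])) < threshold
```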
fascinating!
Would love pointers to related lit! Will DM you about the other question. Thank you for your kind words!
Rare to see such long-term efforts these days 🫡